Leveraging AI for Data Cleanup to Achieve Accurate Customer Segmentation

Accurate customer segmentation is one of the key requirements for a successful outbound campaign. The more specific the segment, the better the opportunity to personalize messaging and achieve a higher response rate. However, business development executives often struggle with messy contact data, which hinders their ability to create targeted campaigns. This article describes how we solved one such problem by fine-tuning a generative AI model.

The Challenge: Inconsistent Contact Data

[Figure: Sample inconsistent data (raw addresses)]

As part of an engagement supporting the business development leader of a large organization, we set out to streamline the client's business development efforts. When we analyzed their workflow, one inefficiency stood out: data cleanup. We focused on their latest campaign, in which a business development executive was working with a contact database of 5,000 entries with inconsistent address formats. One of the criteria for personalizing messages in this campaign was the contact's city or state. However, a team member was manually going through each contact, looking up the city, state, and country, and updating the CRM. This approach was far from ideal, especially now that AI tools are so accessible.

The Solution: Leveraging Generative AI for Data Cleanup 

To address this issue, we fine-tuned a Large Language Model (LLM) specifically for data cleanup and standardization.

Fine-Tuning the Model

Fine-tuning lets you supply the LLM with a set of training examples, each pairing an input with the desired output. Many LLMs can be fine-tuned with as few as 100 examples, but the more diverse the training data, the better the results.
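To make the input/output idea concrete, here is a minimal sketch of a training example and a sanity check you might run before uploading a training set. The `Example` class, the `input`/`output` keys, and the `City | State | Country` output format are illustrative assumptions, not a format required by any particular fine-tuning service.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One training pair: a raw address and the structured answer we want back."""
    address: str
    city: str
    state: str
    country: str

    def to_pair(self) -> dict[str, str]:
        # Assumed convention: the three output fields joined into one delimited string.
        return {"input": self.address, "output": f"{self.city} | {self.state} | {self.country}"}


def check_training_set(examples: list[Example], minimum: int = 100) -> list[str]:
    """Return warnings about the size and diversity of a training set."""
    warnings = []
    if len(examples) < minimum:
        warnings.append(f"only {len(examples)} examples; aim for at least {minimum}")
    if len({e.country for e in examples}) < 2:
        warnings.append("every example has the same country; add more variety")
    duplicates = len(examples) - len({e.address for e in examples})
    if duplicates:
        warnings.append(f"{duplicates} duplicate input address(es)")
    return warnings
```

The diversity check here is deliberately crude (distinct countries and duplicate inputs); in practice you would also want a mix of address styles, abbreviations, and missing fields.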

Automating Training Data Generation 

[Figure: Training data]

While training data can be generated manually, we used our automation capabilities to speed up the process. We extracted the addresses in non-standard formats and used Nominatim, a geocoding search engine for OpenStreetMap data, to generate city, state, and country information for them. This provided accurate location data for a significant portion of the addresses. However, Nominatim's rate limits and the unstructured nature of some addresses left gaps. To fill them, we supplemented the dataset with 10-15 manually entered examples to ensure diversity and accuracy.
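A minimal sketch of this labeling step using only the Python standard library. It calls Nominatim's public `/search` endpoint with `format=jsonv2` and `addressdetails=1`, and follows the public usage policy (an identifying User-Agent, at most one request per second). The User-Agent string is a placeholder, and falling back from `city` to `town`/`village`/`hamlet` is our own assumption about which field to treat as the city; this is not the exact automation we ran.

```python
from __future__ import annotations

import json
import time
import urllib.parse
import urllib.request

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"
# Placeholder identifier: Nominatim's usage policy asks for a User-Agent that identifies your app.
USER_AGENT = "address-cleanup-sketch/0.1 (you@example.com)"
CITY_KEYS = ("city", "town", "village", "hamlet")  # smaller places are not tagged as "city"


def parse_nominatim(results: list[dict]) -> dict[str, str] | None:
    """Extract city/state/country from the top Nominatim result; None if nothing matched."""
    if not results:
        return None
    address = results[0].get("address", {})
    city = next((address[key] for key in CITY_KEYS if key in address), "")
    return {"city": city, "state": address.get("state", ""), "country": address.get("country", "")}


def geocode(raw_address: str) -> dict[str, str] | None:
    """Query Nominatim for one free-form address."""
    query = urllib.parse.urlencode(
        {"q": raw_address, "format": "jsonv2", "addressdetails": 1, "limit": 1}
    )
    request = urllib.request.Request(f"{NOMINATIM_URL}?{query}", headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_nominatim(json.load(response))


def label_addresses(raw_addresses: list[str]) -> tuple[dict[str, dict[str, str]], list[str]]:
    """Label addresses via Nominatim; return (labeled, unresolved) for manual follow-up."""
    labeled: dict[str, dict[str, str]] = {}
    unresolved: list[str] = []
    for raw in raw_addresses:
        try:
            result = geocode(raw)
        except (OSError, ValueError):  # network/HTTP errors (e.g. 429) or a malformed response
            result = None
        if result and result["city"]:
            labeled[raw] = result
        else:
            unresolved.append(raw)  # left for manual labeling
        time.sleep(1.0)  # stay under one request per second
    return labeled, unresolved
```

The `unresolved` list is a natural source for the handful of manually entered examples, since those are exactly the addresses the geocoder could not handle.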

Implementation: Creating a Fine-Tuned Model

[Figure: Fine-tuning Gemini]

We organized the data in a Google Sheet and used Google AI Studio to create a new fine-tuned model. We marked the unstructured address column as the input and the city, state, and country columns as the outputs.
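If the labeled data lives in code first, it can be written out as CSV and imported into Google Sheets in the layout described above. A minimal sketch, assuming the labels are held in a dict mapping each raw address to its city, state, and country; the column names are our own choice.

```python
import csv
import io

COLUMNS = ["address", "city", "state", "country"]  # one input column, three output columns


def to_sheet_csv(labeled: dict) -> str:
    """Render {raw_address: {"city", "state", "country"}} as CSV text for a spreadsheet import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for raw, parts in labeled.items():
        # DictWriter quotes fields that contain commas, which raw addresses often do.
        writer.writerow({"address": raw, **parts})
    return buffer.getvalue()
```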

The training process took a few hours. Because Google allows free fine-tuning of its Gemini models, we had a customized model ready for deployment within a short period.

Results: Efficient Data Standardization 

[Figure: Google AI Studio prompt using the newly created model to extract the city from an address]

Once the model was fine-tuned, the process became much simpler. We quickly developed a script to iterate over the list of 5,000 addresses and generate standardized city, state, and country data for each record. Only a handful of records remained without proper location data, primarily due to incorrect or missing addresses.
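The batch run can be sketched as follows. This is an illustration rather than the script we used: `predict` is a hypothetical callable that sends one address to the fine-tuned model and returns its text response, and we assume the model was trained to answer in a `City | State | Country` format. Records that come back empty or malformed are set aside for review instead of stopping the run.

```python
from __future__ import annotations

from typing import Callable

FIELDS = ("city", "state", "country")


def parse_output(text: str) -> dict[str, str] | None:
    """Parse 'City | State | Country' model output; None if malformed or any part is empty."""
    parts = [part.strip() for part in text.split("|")]
    if len(parts) != len(FIELDS) or not all(parts):
        return None
    return dict(zip(FIELDS, parts))


def standardize_all(
    addresses: list[str], predict: Callable[[str], str]
) -> tuple[list[dict[str, str]], list[str]]:
    """Run every address through the model; return (standardized records, addresses to review)."""
    results: list[dict[str, str]] = []
    needs_review: list[str] = []
    for address in addresses:
        if not address.strip():  # missing address: nothing to send to the model
            needs_review.append(address)
            continue
        try:
            parsed = parse_output(predict(address))
        except Exception:  # e.g. a transient API error: flag the record rather than abort
            parsed = None
        if parsed is None:
            needs_review.append(address)
        else:
            results.append({"address": address, **parsed})
    return results, needs_review
```

With the `google-generativeai` Python SDK, `predict` might be a thin wrapper that calls `generate_content` on a `GenerativeModel` pointed at your tuned model and returns the response text; check the current Gemini API documentation for the exact model name format.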

This AI-powered solution significantly reduced the time and resources required for data cleanup. The business development team was able to segment the data effectively based on accurate location information, leading to more targeted and successful campaigns.
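Once every record carries the same standardized fields, segmenting by location reduces to grouping on one of them. A minimal sketch, assuming each record is a dict with `city`, `state`, and `country` keys; records with a blank value land in an `Unknown` segment.

```python
from __future__ import annotations

from collections import defaultdict


def segment_by(records: list[dict[str, str]], field: str = "state") -> dict[str, list[dict[str, str]]]:
    """Group records by one standardized location field; blanks go to 'Unknown'."""
    segments: defaultdict[str, list[dict[str, str]]] = defaultdict(list)
    for record in records:
        segments[record.get(field) or "Unknown"].append(record)
    return dict(segments)
```

This grouping is only as good as the standardization before it: if the cleanup step emitted both "CA" and "California", they would still form two separate segments.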

Conclusion 

This use case demonstrates how generative AI and automation can streamline operations, save time and resources, and enable businesses to achieve greater efficiency. Whether it's data cleansing, customer segmentation, or location-based filtering, AI-powered solutions offer a powerful way to tackle complex challenges. 

Imagine the possibilities of freeing up your team's time, reducing manual errors, and achieving more targeted campaigns. Would you be open to exploring how AI can help your department achieve these benefits? Contact us for a free consultation.