Dilyan Dimitrov

How to normalize a large, user-generated data-set of company names

Hi,

I would like to know whether there is a way in Salesforce to normalize a large, user-generated data set of company names.

We have user-generated employer names that come in all variations. For example:

Google / Google, Inc. / Google Inc. / Google inc

Does anyone have suggestions on how to normalize the existing entries, and on how to make sure the same normalization is applied to all incoming names as well?
Donald Blay
I would suggest looping through the list and using the String replace method to strip out the common words you don't care about, then adding the results to a Set of names. Since a Set does not allow duplicates, you should end up with a collection of normalized names.

Here is some pseudo-code that should point you in the right direction. I have not compiled or tested it, so you will need to put it into a method, and you may need to tweak it. Also, I cannot claim credit for the toProperCase method; I found it on a different thread (https://developer.salesforce.com/forums/?id=906F00000008ukJIAQ).

Hope that helps
List<String> lstAllAccountNames = new List<String>();
Set<String> setNormalizedAccountNames = new Set<String>();

// Populate the list with all your account names somehow

for (String accountName : lstAllAccountNames) {
	// Normalize to lower case; we can title-case later with a function
	String normalizedName = accountName.toLowerCase();

	// Take out the punctuation and the common words you don't care about.
	// It's best to do them in an order that makes them cumulative.
	// For example, turning commas into spaces and dropping periods first means
	// a single suffix rule takes care of 'Google, Inc.', 'Google Inc.' and 'Google inc'.
	normalizedName = normalizedName.replace(',', ' ');
	normalizedName = normalizedName.replace('.', '');
	// Match the suffixes as whole words so they are also caught at the end of the name
	normalizedName = normalizedName.replaceAll('\\s+(inc|co|ltd)\\b', '');
	// Collapse any leftover runs of whitespace
	normalizedName = normalizedName.replaceAll('\\s+', ' ').trim();

	// Now convert to Title Case using another method
	normalizedName = toProperCase(normalizedName);

	// Now add it to the Set to make sure you won't have duplicates
	setNormalizedAccountNames.add(normalizedName);
}


public static String toProperCase(String value) {
        // Normalize - Convert to lowercase
        value = value.toLowerCase();

        // Hold each word
        List<String> pieces = new List<String>();

        // Split
        for(String s : value.split(' ')) {
            // Capitalize each piece
            s = s.capitalize();

            // Add to pieces
            pieces.add(s);
        }

        // Join
        return String.join(pieces, ' ');
    }
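
For the incoming names, one option is to put the same clean-up into a small helper class and call it whenever a record is saved. This is only a rough sketch and I have not deployed it: the class name CompanyNameNormalizer and both method names are made up for illustration, and I am assuming the employer name lives in Account.Name. Adjust the object and field to wherever your names are actually stored.

```apex
// Hypothetical helper class; the class and method names are illustrative assumptions.
public class CompanyNameNormalizer {

    // Returns a cleaned, title-cased version of a raw company name.
    public static String normalize(String rawName) {
        if (String.isBlank(rawName)) {
            return rawName;
        }

        String name = rawName.toLowerCase();

        // Commas become spaces and periods are dropped, so 'google, inc.' becomes 'google  inc'
        name = name.replace(',', ' ').replace('.', '');

        // Remove the common suffixes when they appear as whole words after the name
        name = name.replaceAll('\\s+(inc|co|ltd)\\b', '');

        // Collapse repeated whitespace and trim the ends
        name = name.replaceAll('\\s+', ' ').trim();

        // Title-case each remaining word
        List<String> pieces = new List<String>();
        for (String word : name.split(' ')) {
            pieces.add(word.capitalize());
        }
        return String.join(pieces, ' ');
    }

    // Meant to be called from a before insert / before update trigger,
    // where changes to the records are saved without extra DML.
    // Assumes the name to normalize is stored in Account.Name.
    public static void normalizeAccounts(List<Account> records) {
        for (Account acc : records) {
            acc.Name = normalize(acc.Name);
        }
    }
}
```

A before insert, before update trigger on Account could then pass Trigger.new to CompanyNameNormalizer.normalizeAccounts, so every new or edited name goes through the same rules as the existing entries. The suffix list is just the three from the example above, so you would extend it for whatever variations show up in your data.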