
General standard/best practice for Lead de-duplication

I'm sure companies value different fields of data differently, but I'm just trying to get a feel for typical de-duplication methods. Do most people rely on email and name, or on company, to determine duplicates? If a duplicate lead is found, what is your general course of action? Do you merge fields, or just discard one of the leads? Do you also check for duplicates among existing contacts? One idea for us is to compare lead sources and, if they differ, combine the data, since different sources mean different kinds of information came with the lead. I'm just looking for some general examples, so any input would be great.




A lot depends upon what your needs are, what your data inflows look like, the population from which they are drawn, and the size of the data.


Matching on emails is simple, but more sophisticated dupe checking tries to deal with such things as:


1. Maggie = Margaret


These might be addressed by using a custom table of equivalent names. But do you have such a table that applies to the populations that your leads/contacts are drawn from?




2. 123 Primrose Lane = 123 Primrose Ln


These can be addressed by normalizing the address according to post office standards. The U.S. post office provides web services that do this, and may also publish algorithms.
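
For illustration, here is a minimal Python sketch of the two normalizations above. The lookup tables and function names are hypothetical and deliberately tiny; a real name table has to be built for the population your leads come from, and a real address normalizer should follow the postal standards rather than this simple word substitution.

```python
import re

# Hypothetical equivalence tables, for illustration only.
NICKNAMES = {"maggie": "margaret", "liz": "elizabeth"}
STREET_WORDS = {"lane": "ln", "street": "st", "avenue": "ave"}


def normalize_first_name(name: str) -> str:
    """Lowercase the name and map known nicknames to a canonical form."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def normalize_address_line(line: str) -> str:
    """Lowercase, drop punctuation, and abbreviate known street words.

    Assumption: replacing every matching word is good enough for a sketch;
    it will mis-handle street names that are themselves words like "Lane".
    """
    words = re.sub(r"[^\w\s]", "", line.lower()).split()
    return " ".join(STREET_WORDS.get(word, word) for word in words)
```

With these tables, "Maggie" and "Margaret" normalize to the same value, as do "123 Primrose Lane" and "123 Primrose Ln.".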



So, deduping often involves a step in which you normalize elements (map "Liz" to "Elizabeth", "Lane" to "Ln", remove punctuation from telephone numbers, etc.).


And then a step in which you compare fragments. For instance, if you have matching zip code, first name, and leading digits of address first line, that's probably a good match.
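
As a rough sketch of that comparison step in Python: group leads by a key built from the zip code, first name, and leading digits of the address, and treat any group with more than one lead as probable duplicates. The field names (`zip`, `first_name`, `address`) and function names are assumptions for the example, not anything from a particular CRM, and the first name is only lowercased here rather than run through a nickname table.

```python
import re


def leading_digits(address_line: str) -> str:
    """Return the house number at the start of the address, or ''."""
    match = re.match(r"\s*(\d+)", address_line)
    return match.group(1) if match else ""


def match_key(lead: dict) -> tuple:
    """Build the fragment key: 5-digit zip, lowercased first name, house number."""
    return (
        lead["zip"].strip()[:5],
        lead["first_name"].strip().lower(),
        leading_digits(lead["address"]),
    )


def probable_duplicates(leads: list) -> list:
    """Group leads sharing a complete fragment key; return groups of 2 or more."""
    groups = {}
    for lead in leads:
        key = match_key(lead)
        if all(key):  # skip leads missing any fragment, to avoid weak matches
            groups.setdefault(key, []).append(lead)
    return [group for group in groups.values() if len(group) > 1]
```

What you do with each group (merge fields, keep the newest, flag for manual review) is a separate decision that depends on your data.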



Thanks for the information.