Pedro Garcia 26

Datacloud.FindDuplicates.findDuplicates with a high volume of data

Hi,

I need to run a batch to identify all the duplicates that were inserted before the duplicate rule was created.

We have more than 1M records, so I created a batch to find the duplicates using the Datacloud.FindDuplicates.findDuplicates class.

But it doesn't work with a high volume of data.
global with sharing class FuzzyMatchBatch implements Database.Batchable<SObject> {

    global final String Query;
    global static Datacloud.FindDuplicatesResult[] results;
    global static List<Account> accountDuplicates = new List<Account>();

    global FuzzyMatchBatch() {
    }

    global Database.QueryLocator start(Database.BatchableContext BC) {
        return Database.getQueryLocator([SELECT id, first_name__c, last_name__c, billingstreet FROM Account]);
    }

    global void execute(Database.BatchableContext BC, List<Account> scope) {
        // Run the active duplicate rules against the current scope of Accounts.
        results = Datacloud.FindDuplicates.findDuplicates(scope);

        // Drill down from each result to the individual matched records.
        for (Datacloud.FindDuplicatesResult findDupeResult : results) {
            System.debug(findDupeResult);
            for (Datacloud.DuplicateResult dupeResult : findDupeResult.getDuplicateResults()) {
                System.debug(dupeResult);
                for (Datacloud.MatchResult matchResult : dupeResult.getMatchResults()) {
                    System.debug(matchResult);
                    for (Datacloud.MatchRecord matchRecord : matchResult.getMatchRecords()) {
                        System.debug('Duplicate Record: ' + matchRecord.getRecord());
                        accountDuplicates.add((Account) matchRecord.getRecord());
                    }
                }
            }
        }
    }

    global void finish(Database.BatchableContext BC) {
    }
}

 
Best Answer chosen by Pedro Garcia 26
David Zhu 🔥
I had a similar project a few months ago.
I think you should use a loop to replace line 17 (results = Datacloud.FindDuplicates.findDuplicates(scope);).
Please refer to my blog post:
https://ideastreeconsulting.blogspot.com/2020/04/salesforce-custom-duplicate-job-to.html
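
For context, findDuplicates accepts at most 50 records per call (while a batch execute() scope defaults to 200), which is why handing the whole scope to a single call breaks on a large job. A minimal sketch of the loop David describes, feeding the scope to findDuplicates in chunks, could look like the following; the chunk size of 50 and the collectDuplicates helper name are assumptions for illustration, not David's exact code:

    // Sketch only: chunk the batch scope so each findDuplicates call stays
    // within the 50-records-per-call limit.
    global void execute(Database.BatchableContext BC, List<Account> scope) {
        List<Account> chunk = new List<Account>();
        for (Account acc : scope) {
            chunk.add(acc);
            if (chunk.size() == 50) {
                collectDuplicates(chunk);
                chunk = new List<Account>();
            }
        }
        if (!chunk.isEmpty()) {
            collectDuplicates(chunk);
        }
    }

    // Hypothetical helper: runs the duplicate rules on one chunk and gathers the matches.
    // Note that static collections reset between batch transactions, so persist the
    // results (e.g. via DML or Database.Stateful) if they are needed later.
    private void collectDuplicates(List<Account> chunk) {
        for (Datacloud.FindDuplicatesResult findDupeResult : Datacloud.FindDuplicates.findDuplicates(chunk)) {
            for (Datacloud.DuplicateResult dupeResult : findDupeResult.getDuplicateResults()) {
                for (Datacloud.MatchResult matchResult : dupeResult.getMatchResults()) {
                    for (Datacloud.MatchRecord matchRecord : matchResult.getMatchRecords()) {
                        accountDuplicates.add((Account) matchRecord.getRecord());
                    }
                }
            }
        }
    }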
 

All Answers

Vinay (Salesforce Developers)
Hi Pedro,

A batch class using Database.QueryLocator can retrieve up to 50 million records per job.

Review the working example below for finding duplicate records using Apex:

http://salesforceduplex.blogspot.com/2016/03/batch-class-using-apex-for-finding.html

Hope the above information was helpful.

Please mark as Best Answer so that it can help others in the future.

Thanks,
Vinay Kumar
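
As a side note, a batch like the one above is launched with Database.executeBatch, which takes an optional scope size. Since findDuplicates handles at most 50 records per call, starting the job with a scope of 50 is another way to keep each execute() within that limit; this is a sketch under that assumption, not the code from Vinay's link:

    // Sketch: launch the batch with an explicit scope size. Database.QueryLocator
    // supports up to 50 million records per job, but each execute() here receives
    // at most 50 records, matching the findDuplicates per-call limit.
    Id jobId = Database.executeBatch(new FuzzyMatchBatch(), 50);
    System.debug('FuzzyMatchBatch job Id: ' + jobId);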
Pedro Garcia 26
Hi David,

It works! Thanks. Just one small detail: on line 75, the "record" variable name must be changed, since it reuses the same variable name as the loop on line 36.

Thanks again.
David Zhu 🔥
Thanks Pedro.