Von V.

Is it possible to have Batch Apex that finds duplicates without hitting SOQL governor limits?

I'm curious whether it is possible to have Batch Apex that can find duplicate records, both within the same object and across objects. An example would be duplicate emails.
Suppose the batch processes five records per run. On the second run, one email (test5@email.com) duplicates an email already seen in the first run.
I would like to know whether it is possible to find duplicates like this in Batch Apex. Also, is it possible to do so without the SOQL queries hitting governor limits?
First Run                                     Second Run
- test1@email.com                     - test11@email.com
- test2@email.com                     - test12@email.com
- test3@email.com                     - test5@email.com
- test4@email.com                     - test13@email.com
- test5@email.com                     - test14@email.com

Thanks.


 
Best Answer chosen by James Loghry
James Loghry
There's always some limit; after all, it is a multi-tenant architecture.  That being said, the original query you make in a batch class (using the Database.QueryLocator, for instance) is limited to 50 million records.  You *should* be good there :)

The batch class runs in transactions of up to 2,000 records each (200 by default), depending on what you use for your batch size.  Within each transaction you can query based on your duplicate logic, and then handle duplicates appropriately (either merge or delete them).  So yes, you should be able to find your duplicates and avoid governor limits with a Batch Apex approach.
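
To make that concrete, here is a minimal sketch of the approach, not a definitive implementation. It assumes the duplicates are on Contact.Email; the class name DuplicateEmailBatch is hypothetical, and it only logs what it finds rather than merging or deleting. Each execute call issues a single aggregate query for the whole chunk, so the number of SOQL queries per transaction stays at one no matter how many records are in scope.

```apex
// Hypothetical class name; Contact.Email is an assumed duplicate key.
public class DuplicateEmailBatch implements Database.Batchable<SObject> {

    // The QueryLocator lets the batch iterate over up to 50 million records.
    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator(
            'SELECT Id, Email FROM Contact WHERE Email != null'
        );
    }

    // Called once per chunk; each call is its own transaction with fresh limits.
    public void execute(Database.BatchableContext bc, List<Contact> scope) {
        // Collect the emails in this chunk.
        Set<String> emails = new Set<String>();
        for (Contact c : scope) {
            emails.add(c.Email);
        }

        // One query for the whole chunk (never a query inside the loop above).
        // It checks all Contacts, so it also catches duplicates that were
        // processed in an earlier or later chunk.
        for (AggregateResult ar : [
            SELECT Email, COUNT(Id) cnt
            FROM Contact
            WHERE Email IN :emails
            GROUP BY Email
            HAVING COUNT(Id) > 1
        ]) {
            // Placeholder handling: replace with your merge or delete logic.
            System.debug('Duplicate email: ' + ar.get('Email')
                + ' (' + ar.get('cnt') + ' records)');
        }
    }

    public void finish(Database.BatchableContext bc) {
        // Optional: send a summary notification here.
    }
}
```

You could start it with something like `Database.executeBatch(new DuplicateEmailBatch(), 200);`. Note that an email whose duplicates fall in different chunks will be reported once per chunk, so any real merge or delete logic should tolerate records that were already handled. For cross-object checks, one option is a second query in execute against the other object (for example Lead, as an assumption) using the same set of emails.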