function readOnly(count){ }
Starting November 20, the site will be set to read-only. On December 4, 2023,
forum discussions will move to the Trailblazer Community.
+ Start a Discussion
smita bhargavasmita bhargava 

remove legacy records with duplicate email

Hello friends
I have a custom object called fan which has ID field(say fanID) and email as an mandatory field.
There is a validation code in apex which says every fan must have a unique emailID.
Before this email validation rule was written,   legacy records were inserted into the fan object, so as of now 30K users have the same email ID but different fanID.
I need to have one Fan ID per unique Email.

so can anyone please suggest how I approach this issue the bestv possible way?

thanks
smita
Malni Chandrasekaran 2Malni Chandrasekaran 2
Smita,
If handling 30k records is your problem, you may try Batch Apex.
https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_batch_interface.htm

Please mark it as solved, if this helps solving your problem.
Ashish DevAshish Dev
Best approach would be to export data into csv file and identify duplicate records in excel through excel formulas. Keep the desired records and mark other records for deletion. Now prepare 2 files, 1 for records updation and other for records deletion,

Let me know if further help is required. 
smita bhargavasmita bhargava
Hi ashish
there are about 25 lakh records, so loading this into csv file , it will hang.
even if I do this, can u please elaborate ur approach more in detail?
so I get a file for updation records. so how to process this?

thanls
smita

 
smita bhargavasmita bhargava
Hi maini
I am looking at the coding aspect also the non coding aspect.
could you pls provide a sample code(execute method) how to process the duplicate email for each fan record?
this will be helpful.

also please let me know ur opinion. will a merge(merge function in salesforce) on fan records help?
if yes cud u pls elaborate with syntax?

or is there a way I can perform a rollup to the last updated fan record? so any future records will have unique email.
pls let me know.

thanks
smita
 
Ashish DevAshish Dev
If there are 25L records then definatly excel wont be able to handle it. You will either need Mac or tool which handle large amount of data like R programming language.
In your case Batch apex would be good approach. Implement Batch in which get the duplicate records of current batch records, keep the desired one and delete other duplicate records. 
 
smita bhargavasmita bhargava
Hi Cud you pls provide sample code?
smita bhargavasmita bhargava
HI ashish

Instead of deleting the duplicate records can I update the email of those records ?

smita
Malni Chandrasekaran 2Malni Chandrasekaran 2
Hi Smita,
As far as I know, merge function only works with standard objects - Account, Contact and Lead. I dont know if this can be done with custom object.
Sample code for Batch apex is given in the link I already shared with you. If you are looking for exact code, I may try it over the weekend (sorry).

In case if you have not found the solution,
Step 1: In Start method, using QueryLocator write a group by (email id) query to get all the records where count is > 1 to fetch all the records  having duplicate email Id. something like 
select EmailId from Fan group by EmailId having count(EmailId) > 1
And write a query to fetch the final list of records which matches with the email id list that we got in our previous query.

Step2:  In Execute method, you may write the logic to delete the records in the list.

If you want to keep one record, you may use Min(Createddate) or max(CreatedDate) aggregate functions and exclude those records in your second query.

Let me know if you need the exact code and I can get back to you over the week end.

 
smita bhargavasmita bhargava
Hi Maini

Could you send me the code please?

thanks
smita
Malni Chandrasekaran 2Malni Chandrasekaran 2
Smita,
Sorry I didnot see your request.
If you haveh not found the solution yet,
Please try,
public class DuplicateEmailCheck implements Database.Batchable<sObject>
{
    public Database.querylocator start(Database.BatchableContext BC)
    {
        //The below query returns all the emailId which has more than one entry
        return Database.getQueryLocator('select emailID__c from fan__c groupby emailID__c having count(emailID__c)>1');
    }
public void execute(Database.BatchableContext BC, List<AggregateResult> scope){
    Set<String> arEmailId = new set<String>();
    //Below loop is to collect all the email ids from aggregateresult of start method to a set which will be used to filter records .
    for (AggregateResult ar : scope)
    {
           arEmailId.add((String)ar.get('emailId__c'));
    }
 List<Fan__c> fanList = [select Name, emailID__c from fan__C where emailId__c in :arEmailId];
 delete fanList;
}    
public void finish(Database.BatchableContext BC)
{
}
}