Reppin__505

Batch Apex Class - Querying Against Large Data Sets

I have a batch Apex class in which I build collections of websites and emails from the scope records, then use those collections to filter other queries, whose results are also loaded into collections. With all the collections built, I want to run a final loop over the scope to perform business processes.

 

Mockup:

 

for (Object o : scope) {
    listEmails.add(o.Email);
    listWebsites.add(o.Website);
}

// Website is the key
Map<String, Account> accounts = gather all accounts where Website NOT IN :listWebsites;
// Email is the key
Map<String, Contact> contacts = gather all contacts where Email NOT IN :listEmails;

for (Object o : scope) {
    Account a = accounts.get(o.Website);
    Contact c = contacts.get(o.Email);

    // Perform business logic here
}

 

The problem is that when I run this batch, it stays in processing for hours. With a rather small database it works fine, but in a larger environment this is perhaps not the best solution.

 

Can anyone help me speed up the batch process with a more effective approach?

 

Thanks.

Reppin__505

Additional Info:

 

I don't supply a batch size. The scope for the first process I ran was only 300 records, so a collection of 300 list items is used to filter the other objects queried, i.e. Accounts and Contacts.

 

It's the queries against those other objects that slow the process dramatically.

sfdcfox

You're effectively querying your ENTIRE database on EVERY pass through your batch code: you stated that you query A from B, and then you query all NOT A from B, and the union of A and NOT A is all of B. So if you have 10,000 records, your code has to process a total of (200 records per batch times 50 batches) times 10,000 records, or 100,000,000 records. You can see this won't scale well. You need to figure out what your business logic is doing, or you won't be able to accomplish your goal. I understand that you may be unable or unwilling to share your "business processes", but if you could share your project with us, we could help you create a better algorithm.

Reppin__505

I just noticed that I used NOT IN in the pseudocode. That's my mistake: the actual code filters for the Accounts and Contacts that are IN the list collections. This is what is so weird to me. Why is the batch process taking so long to run when all I'm trying to do is build lists of emails and websites, then build collections of the accounts and contacts with those emails and websites, so that I can reference those objects in the later scope loop?

 

I want to reference the account and contact records which match the website or email of the final scope records.

 

That's all I'm trying to do here, but the batch process takes a long time to run.
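
For reference, the approach described in this post (collect the emails and websites from the scope, query only the Accounts and Contacts that are IN those collections, then look them up in the final loop) could be sketched roughly as below. This is only a sketch under assumptions: the thread never names the scope object, so Lead is used as a stand-in because it has both Email and Website fields, and the class name WebsiteEmailMatchBatch is hypothetical. Keys are lowercased on the assumption that matching should be case-insensitive, since Apex map keys are case-sensitive while SOQL string filters are not.

```apex
// Sketch only: Lead as the scope object and the class name are assumptions,
// not taken from the original post.
public class WebsiteEmailMatchBatch implements Database.Batchable<sObject> {

    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator('SELECT Id, Email, Website FROM Lead');
    }

    public void execute(Database.BatchableContext bc, List<sObject> scope) {
        // Collect the distinct, non-blank emails and websites in this chunk.
        Set<String> emails = new Set<String>();
        Set<String> websites = new Set<String>();
        for (sObject record : scope) {
            Lead l = (Lead) record;
            if (String.isNotBlank(l.Email)) {
                emails.add(l.Email);
            }
            if (String.isNotBlank(l.Website)) {
                websites.add(l.Website);
            }
        }

        // Query only the Accounts whose Website is IN the collected set.
        Map<String, Account> accountsByWebsite = new Map<String, Account>();
        for (Account a : [SELECT Id, Website FROM Account WHERE Website IN :websites]) {
            accountsByWebsite.put(a.Website.toLowerCase(), a);
        }

        // Query only the Contacts whose Email is IN the collected set.
        Map<String, Contact> contactsByEmail = new Map<String, Contact>();
        for (Contact c : [SELECT Id, Email FROM Contact WHERE Email IN :emails]) {
            contactsByEmail.put(c.Email.toLowerCase(), c);
        }

        // Final pass over the scope: look up the matches for each record.
        for (sObject record : scope) {
            Lead l = (Lead) record;
            Account matchedAccount = String.isBlank(l.Website)
                ? null : accountsByWebsite.get(l.Website.toLowerCase());
            Contact matchedContact = String.isBlank(l.Email)
                ? null : contactsByEmail.get(l.Email.toLowerCase());
            // Business logic using matchedAccount / matchedContact goes here.
        }
    }

    public void finish(Database.BatchableContext bc) {
        // Nothing to clean up in this sketch.
    }
}
```

It could be started with Database.executeBatch(new WebsiteEmailMatchBatch(), 200); the second argument sets the number of records per execute call, which defaults to 200 when omitted. With IN filters, each execute call only pulls the Accounts and Contacts that match the current chunk rather than everything else in the database.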