+ Start a Discussion

Dealing with large datasets

Hi all,


I have a problem when trying to develop a solution that deals with 'large' amounts of data. The situation is the following. 

We have to develop for our customer a solution that handles records inserted in their Salesforce under a custom object (Unprocessed Agreements). Each record from this custom object hosts information that, once processed, will end in Accounts, Contacts and a custom object called 'Agreements'. Basically, each record of the custom object has all the information of an agreement with its account and its contact related. So processing each record of 'Unprocessed Agreements' means to create an Account (if not already in SF); a Contact (if not already in SF) and an Agreement (if is already in Salesforce, updates this agreement). All the logic built to meet this requirements has to rely on a scheduled class that will run and perform this every day.

The number of agreements, Accounts and Contacts might be around 3000 for each entity, in order to not hit the SOQL limits, we developed the functionality so that first 'dumps' all the accounts, contacts and agreements existing in Salesforce into Lists, and we do searches over that lists to check if the Accounts, Contacts or Agreements were created before, then we build the logic on top of this checks. 

Our problem seems that when we want to perform searches over those Lists of records, we hit a limit: ''Too many statements: 200001" when looping over those lists. We make sure to not hit the SOQL query limit, but we faced this limit mentioned above. The logic built works as desired when we keep low the amount of Accounts, Contacts and Agreements. 

Do you have any ideas on how to handle this?

Best regards, 






As you have already found out and pointed out, the large issue is because of attempting to loop over all accounts, contacts and existing agreements which will probably not work in the long run especially as the Accounts and Contacts grows larger.


My first question is, could you just do an upsert?  When your custom object comes into the system with all of this Account/Contact/Agreement data that is already in the system, would it harm anything to just have those records be updated?  Its possible this data could be out of date in which case I understand this would not be an option, but I feel like having the code trigger on the custom object would be far more efficient.


If thats not possible, could you instead use a Set instead of a list to validate if that data already exists?  That way when you pull in all of the accounts, you don't have to loop over all of the records, you can just say:



If it comes back as true, it already exists, if it comes back as false, add it to the list of records to add.


Hopefully this helps a bit.  Best of luck!


Hi Aar3538,


Unfortunately, your first approach is not doable, but the second one is the one I'm trying to explore.


I just have a question. When dealing with Sets (or maps, I guess is the same) can you use the function 'contains' with fields that are not marked as key? e.g.: accounts.contains(myStreet)


Many thanks for your insights.


Best regards,



Noam DganiNoam Dgani

how are the records inserted? data loader? web services? ETL?

The point i'm getting at is that most of the non manual methods for inserting data into SF allow you to set a bulk size. for instance, the data loader basic setup (without bulk API) uses bulks of 200. so basically, uploading batches eliminates the problem.

alternativlly, if by some way you get all ~10K records at one transaction, use batch apex to chop it up to batches that suit your measure.