+ Start a Discussion
Sumesh ChandranSumesh Chandran 

SOQL inside for loop of execute method of a batch class

The below batch class will be processing a little over 2.5 million records. The START method will be sending in the whole 2.5 million records to the execute method and the execute method does the processing on each of the 2.5 million records inside a for loop. 
Also the for loop has a SOQL inside which I believe cannot be avoided to get the right numbers. 
Is this the right way of doing this or are there any other better ways. Please help!

Also when the running the below batch class I get the First Error: Too many query rows error.
 
global class MDUSquadRawDataBatchTest implements Database.Batchable<sObject>, Database.Stateful {  
    List<Address_Master__c> addressList = new List<Address_Master__c>();
    Set<String> addresses = new Set<String>();
   
    // Start Method
    global Database.QueryLocator start(Database.BatchableContext BC) {
        return Database.getQueryLocator('SELECT Street_Address__c,City_Name__c FROM MDU_Squad_Raw_Data__c');
    }   
    
    // Execute method
    global void execute(Database.BatchableContext BC, List<MDU_Squad_Raw_Data__c> rawData) {        
        for(MDU_Squad_Raw_Data__c mduRawData: rawData) {
            List<MDU_Squad_Raw_Data__c> addressData = [SELECT Street_Address__c,City_Name__c,Province_Code__c,Postal_Code__c,Internet_Service__c,Video_Service__c,Phone_Service__c FROM MDU_Squad_Raw_Data__c WHERE Street_Address__c=:mduRawData.Street_Address__c AND City_Name__c=:mduRawData.City_Name__c];
            String fullAddress = addressData[0].Street_Address__c+' '+addressData[0].City_Name__c+' '+addressData[0].Province_Code__c+' '+addressData[0].Postal_Code__c;
            
            Address_Master__c theAddress = new Address_Master__c();
            if(!addresses.contains(fullAddress.substringBeforeLast(' '))) {
                theAddress.Name = addressData[0].Street_Address__c;
                theAddress.City_Name__c = addressData[0].City_Name__c;
                theAddress.Province_Code__c = addressData[0].Province_Code__c;
                theAddress.Postal_Code__c = addressData[0].Postal_Code__c; 
                fullAddress = addressData[0].Street_Address__c+' '+addressData[0].City_Name__c+' '+addressData[0].Province_Code__c+' '+addressData[0].Postal_Code__c;
                theAddress.Full_Address_Ext_Id__c = fullAddress;

                addresses.add(fullAddress.substringBeforeLast(' '));
                addressList.add(theAddress); 
            }                                                     
        }            
        Database.Upsert(addressList, Address_Master__c.Fields.Full_Address_Ext_Id__c, true);
    }
    // Finish Method    
    global void finish(Database.BatchableContext BC) {
        
    } 
}

I request if someone could please help me with this, as I am dealing with this for some time, with no idea on how to fix this.
Best Answer chosen by Sumesh Chandran
Andrew GAndrew G
if you are still struggling, here is a code example that may point you in the right direction:
//Loop1 - get street and city name data into two lists
list<string> streetList = new List<string>();
list<string> cityList = new List<string>();
for(MDU_Squad_Raw_Data__c mduRawData: rawData) {
  streetlist.add(mduRawData.StreetAddress__c);
  cityList.add(mduRawData.CityName__c);
}
//do query:
List<MDU_Squad_Raw_Data__c> addressData = [SELECT Street_Address__c, City_Name__c, Province_Code__c, Postal_Code__c, Internet_Service__c, Video_Service__c, Phone_Service__c 
    FROM MDU_Squad_Raw_Data__c 
    WHERE Street_Address__c IN :streetList 
        AND City_Name__c IN :cityList];

//Loop addressData to make Map with Key street~city
Map<string,MDU_Squad_Raw_Data__c> mapAddressData = new Map......
for(MDU_Squad_Raw_Data__c raw : addressData){
  //psuedo - if no key, then add
  if !(mapAddressData.containsKey(raw.Street_Address__c+'~'+raw.City_Name__c){
      mapAddressData.add(raw.Street_Address__c+'~'+raw.City_Name__c, raw);
  }
}

//now we loop the original list checking the map for the key and then update as required
list<MDU_Squad_Raw_Data__c > updateList = new....
for(MDU_Squad_Raw_Data__c raw: rawData) {
    if(mapAddressData.containsKey(raw.Street_Address__c+'~'+raw.City_Name__c)){
        MDU_Squad_Raw_Data__c address = mapAddressData.get(raw.Street_Address__c+'~'+raw.City_Name__c);
        //now update the raw record with address record
        raw.postalcode = addres.province code;
//etc
        updateList.add(raw);
     } //endi f
}//end loop
if (updateList.size()>0){
   update udpateList;
}

Regards
Andrew

All Answers

Andrew GAndrew G
High level :

loop the raw data to build a list of elements that you want to use in your query.  e.g. street and city
once loop has finished, do the query using those elements
build a map using street and city as keys
loop the raw data and search the Map using the key
update the record if key found and add to list
once loop finishes, update the list 

this if fairly standard process for bulking code where a query is involved in the update of records.

Regards
Andrew
Andrew GAndrew G
if you are still struggling, here is a code example that may point you in the right direction:
//Loop1 - get street and city name data into two lists
list<string> streetList = new List<string>();
list<string> cityList = new List<string>();
for(MDU_Squad_Raw_Data__c mduRawData: rawData) {
  streetlist.add(mduRawData.StreetAddress__c);
  cityList.add(mduRawData.CityName__c);
}
//do query:
List<MDU_Squad_Raw_Data__c> addressData = [SELECT Street_Address__c, City_Name__c, Province_Code__c, Postal_Code__c, Internet_Service__c, Video_Service__c, Phone_Service__c 
    FROM MDU_Squad_Raw_Data__c 
    WHERE Street_Address__c IN :streetList 
        AND City_Name__c IN :cityList];

//Loop addressData to make Map with Key street~city
Map<string,MDU_Squad_Raw_Data__c> mapAddressData = new Map......
for(MDU_Squad_Raw_Data__c raw : addressData){
  //psuedo - if no key, then add
  if !(mapAddressData.containsKey(raw.Street_Address__c+'~'+raw.City_Name__c){
      mapAddressData.add(raw.Street_Address__c+'~'+raw.City_Name__c, raw);
  }
}

//now we loop the original list checking the map for the key and then update as required
list<MDU_Squad_Raw_Data__c > updateList = new....
for(MDU_Squad_Raw_Data__c raw: rawData) {
    if(mapAddressData.containsKey(raw.Street_Address__c+'~'+raw.City_Name__c)){
        MDU_Squad_Raw_Data__c address = mapAddressData.get(raw.Street_Address__c+'~'+raw.City_Name__c);
        //now update the raw record with address record
        raw.postalcode = addres.province code;
//etc
        updateList.add(raw);
     } //endi f
}//end loop
if (updateList.size()>0){
   update udpateList;
}

Regards
Andrew
This was selected as the best answer
Sumesh ChandranSumesh Chandran
Hello Andrew, I really appreciate the time taken to explain this. Just one doubt, does the code below compile addressData list in the right order, means is it going to have the right street_address & city_name combination. I am sorry if this is a newbie question.
List<MDU_Squad_Raw_Data__c> addressData = [SELECT Street_Address__c, City_Name__c, Province_Code__c, Postal_Code__c, Internet_Service__c, Video_Service__c, Phone_Service__c 
    FROM MDU_Squad_Raw_Data__c 
    WHERE Street_Address__c IN :streetList 
        AND City_Name__c IN :cityList];

 
Sumesh ChandranSumesh Chandran
Hello Andrew, Can I request you to look at the below code, its been re-written as per your advise. It runs,  but at the end I get this message on the Apex jobs page "First Error: Too many DML rows: 10001. Thanks in advance.
global class MDUSquadRawDataBatchTest2 implements Database.Batchable<sObject>, Database.Stateful {  
    global List<Address_Master__c> addressMaster = new List<Address_Master__c>();
    //global Set<String> addresses = new Set<String>();
    global Map<String, MDU_Squad_Raw_Data__c> mapAddressData = new Map<String, MDU_Squad_Raw_Data__c>();
   
    // Start Method
    global Database.QueryLocator start(Database.BatchableContext BC) {
        return Database.getQueryLocator('SELECT Street_Address__c,City_Name__c FROM MDU_Squad_Raw_Data__c');
    }   
    
    // Execute method
    global void execute(Database.BatchableContext BC, List<MDU_Squad_Raw_Data__c> rawData) {        
		List<String> streetList = new List<String>();
		List<String> cityList = new List<String>();
		for(MDU_Squad_Raw_Data__c mduRawData: rawData) {
  			streetList.add(mduRawData.Street_Address__c);
  			cityList.add(mduRawData.City_Name__c);
		} 
        
        List<MDU_Squad_Raw_Data__c> addressData = [SELECT Street_Address__c,City_Name__c,Province_Code__c,Postal_Code__c,Internet_Service__c,Video_Service__c,Phone_Service__c 
                                                   FROM MDU_Squad_Raw_Data__c WHERE Street_Address__c IN :streetList AND City_Name__c IN :cityList];
        
        Map<String, MDU_Squad_Raw_Data__c> mapAddressData = new Map<String, MDU_Squad_Raw_Data__c>();
		for(MDU_Squad_Raw_Data__c raw : addressData) {      
  			if (!mapAddressData.containsKey(raw.Street_Address__c+' '+raw.City_Name__c)) {
      			mapAddressData.put(raw.Street_Address__c+' '+raw.City_Name__c, raw);
 			}
		}
        
        //List<MDU_Squad_Raw_Data__c> addressMaster = new List<MDU_Squad_Raw_Data__c>();
        for(MDU_Squad_Raw_Data__c raw: rawData) {
            if(mapAddressData.containsKey(raw.Street_Address__c+' '+raw.City_Name__c)) {
                MDU_Squad_Raw_Data__c mapAddress = mapAddressData.get(raw.Street_Address__c+' '+raw.City_Name__c);
           		Address_Master__c address = new Address_Master__c();                
                address.Name = mapAddress.Street_Address__c;
                address.City_Name__c = mapAddress.City_Name__c;
                address.Province_Code__c = mapAddress.Province_Code__c;
                
                //addresses.add(fullAddress.substringBeforeLast(' '));
                addressMaster.add(address);                                
            }
		}          
        //Database.Upsert(addressList, Address_Master__c.Fields.Full_Address_Ext_Id__c, true);
    }
    // Finish Method    
    global void finish(Database.BatchableContext BC) {
        Database.Upsert(addressMaster, Address_Master__c.Fields.Full_Address_Ext_Id__c, true);
    } 
}
Morales SchofieldMorales Schofield
SOQL queries must be selective, particularly for queries inside of triggers. To avoid long execution times, the system can terminate nonselective SOQL queries. Developers receive an error message when a non-selective query in a trigger executes against an object that contains more than 100,000 records https://www.myaarpmedicare.us
Sumesh ChandranSumesh Chandran
Thanks @Andrew, I was able to make it work .
Sumesh ChandranSumesh Chandran
Hello Morales, thanks for the explanation. What was the link that you put in at the end? I found it totally unrelated.
Andrew GAndrew G
Hi Sumesh
To answer your question re: the SOQL query:
List<MDU_Squad_Raw_Data__c> addressData = [SELECT Street_Address__c, City_Name__c, Province_Code__c, Postal_Code__c, Internet_Service__c, Video_Service__c, Phone_Service__c 
    FROM MDU_Squad_Raw_Data__c 
    WHERE Street_Address__c IN :streetList 
        AND City_Name__c IN :cityList];


The data that is returned will contain all the address data that you require.  In fact, it is highly likely to return more addresses than required.
If we go back a step and consider the two lists we are creating and do some assumptions on the data passed.

Lets assume that we had 2 x initial rawData records with addresses as
First Avenue in Big City
Second Avenue in Other City.
What the lists will contain is (First Avenue:Second Avenue) and (Big City:Other City)
When we run the query it will return all address records with combinations of the two lists so, it may return 

First Avenue in Big City
Second Avenue in Other City 
but also
First Avenue in Other City 
Second Avenue in Big City
assuming those combinations also exist in your address data.

As I now re-read your code, it is possible that a better loop to start would be:
 

global void execute(Database.BatchableContext BC, List<MDU_Squad_Raw_Data__c> rawData) {        
		Set<Id> setRDIds = new Set<Id>();
		for(MDU_Squad_Raw_Data__c mduRawData: rawData) {
  			setRDIds.add(mduRawData.Id);
		} 
        
        List<MDU_Squad_Raw_Data__c> addressData = [SELECT Street_Address__c, City_Name__c, Province_Code__c, Postal_Code__c, Internet_Service__c, Video_Service__c, Phone_Service__c 
                                                   FROM MDU_Squad_Raw_Data__c WHERE Id IN :setRDIds ];
That loop would return definitely only the records that you are chasing.

But i think you would need to adjust your initial start method
// Start Method
    global Database.QueryLocator start(Database.BatchableContext BC) {
        return Database.getQueryLocator('SELECT Id, Street_Address__c,City_Name__c FROM MDU_Squad_Raw_Data__c');
    }

Also, Morales is correct in that the error is due to the nature of the query. SInce the initial query is non selective, it can throw an error. 

Consider if there is a way to reduce the nature/number of records returned by perhaps throwing in a WHERE clause on the query.

Regards
Andrew 

 
Sumesh ChandranSumesh Chandran
Hello Andrew, 
 Below is what I have as Id's in my custom objects MDU_Squad_Data_Raw 
User-added image

I believe, I will still be having duplicates on my initial rawData, if I go by Id's. As you said, I wish we didn't have any duplicates in the initial rawData.
Andrew GAndrew G
If that is your Raw Data ,

Why are there multiple records for "10 Dover PTSE" in "Calgary"? 

What is the difference between the records as shown?


Regards
Andrew
 
Sumesh ChandranSumesh Chandran
I left out the unit number column, it would be units of a building with On-Billing & Off-Billing data.
Sumesh ChandranSumesh Chandran
Hello Andrew, Can you also help me how to handle the "First Error:Too many DML rows:10001" error.
User-added image
Below is the final code I have. My code worked good when I had 20000 records and now I bumped it up to 50000 which I when I get the Toom many DML rows error.

Eventually this code to supposed to run on a record set of 2.5 million records.
 
global Database.QueryLocator start(Database.BatchableContext BC) {
        return Database.getQueryLocator('SELECT Street_Address__c,City_Name__c FROM MDU_Squad_Raw_Data__c');
    }   

global void execute(Database.BatchableContext BC, List<MDU_Squad_Raw_Data__c> rawData) {        
        List<String> streetList = new List<String>();
        List<String> cityList = new List<String>();

        for(MDU_Squad_Raw_Data__c mduRawData: rawData) {
            streetList.add(mduRawData.Street_Address__c);
            cityList.add(mduRawData.City_Name__c);
        } 
        
        List<MDU_Squad_Raw_Data__c> addressData = [SELECT Street_Address__c,City_Name__c,Province_Code__c,Postal_Code__c,Status__c,Account_Number__c,Telephone_Number__c,Internet_Service__c,Video_Service__c,Phone_Service__c 
                                                   FROM MDU_Squad_Raw_Data__c WHERE Street_Address__c IN :streetList AND City_Name__c IN :cityList];
        
        for(MDU_Squad_Raw_Data__c raw : addressData) {   
            String fullAddress = raw.Street_Address__c+' '+raw.City_Name__c+' '+raw.Province_Code__c+' '+raw.Postal_Code__c;
            if(!addresses.contains(fullAddress.substringBeforeLast(' '))) {
                
                Address_Master__c address = new Address_Master__c();                
                address.Name = raw.Street_Address__c;
                address.City_Name__c = raw.City_Name__c;
                address.Province_Code__c = raw.Province_Code__c;                
                address.Full_Address_Ext_Id__c = fullAddress;
                
		addresses.add(fullAddress.substringBeforeLast(' '));
                addressMaster.add(address);            
            }
        }        
        Database.Upsert(addressMaster, Address_Master__c.Fields.Full_Address_Ext_Id__c, true);
    } 

    global void finish(Database.BatchableContext BC) {
		MDUSquadCityDataSummarizer citySummarizer = new MDUSquadCityDataSummarizer();
		database.executeBatch(citySummarizer);
    } 
}

Please help!
Sumesh ChandranSumesh Chandran
Also forgot to mention,this is the way I run it:
 
MDUSquadAddressDataSummarizer c = new MDUSquadAddressDataSummarizer();
database.executeBatch(c);
On the finish method of the above class I have the below:
 
MDUSquadCityDataSummarizer c = new MDUSquadityDataSummarizer();
database.executeBatch(c);

Please help!

Thanks!​​​​​​​
 
Andrew GAndrew G
To overcome the DML limit error, there is fundamentally only 2 options -
1.  reduce the number of records returned by the initial query , or
2.  chain batch jobs together with smaller queries

The other option would be to revisit the business requirement and determine what they are actually trying to do.  

The code takes "raw data" and updates/inserts the "Address Master".
Questions for the business:
Once the Address Master exists, do we need the Raw Data record?  Can we delete it?
Other Questions
If we can't delete it, can we mark it as processed?

possible solution if we can mark the raw data as processed.
If we can mark it processed, can we then update the batch query to ignore processed records?  
If so adjust the query to have a LIMIT of 10000 and set the WHERE clause to ignore processed records
and then in processing, update the flag on the raw data record
Set the schedule to run the batch 

Regards
Andrew

 
Sumesh ChandranSumesh Chandran
Hello Andrew, I was able to fix it this way, Hope this would fix it with any number of rows. The addressMaster list variable above was accumulating all the data from all the batches and sending it to the database, which was actualy not needed. The addition of the below code worked for me.
addressMaster.clear();

 
Louis HolmesLouis Holmes
For best performance, SOQL queries must be selective, particularly for queries inside of triggers. To avoid long execution times, the system can terminate nonselective SOQL queries. Developers receive an error message when a non-selective query in a trigger executes against an object that contains more than 100,000 records. To avoid this error, ensure that the query is selective. HR CVS LOGIN (https://my-hrcvslogin.com)
Kemper SylvesterKemper Sylvester
SOQL inquiries must be specific, especially for questions within triggers. To stay away from long execution times, the framework can end nonselective SOQL inquiries. Designers get a blunder message when a non-particular inquiry in a trigger executes against an item that contains in excess of 100,000 records (https://disneyhub.club/)
ravindar shahravindar shah
To overcome these governor limits, we can write a batch apex and process the data in chunks. To write a batch apex, we have to implement the Database.Batchable interface. This interface contains three methods.

Start Method: The signature of this method is

global (Database.QueryLocator | Iterable<sobject>) start(Database.BatchableContext bc) {}
This method is used to collect the records that we want to process. This method can return either Database.QueryLocator or an Iterable. The Database.QueryLocator return type is used when we want to run a simple select statement. With this return type, the select statement can return up to 50Million records. When we want to write a custom logic (like aggregations), then we have to use the Iterable return type. With this return type, the batch can only process a maximum of 50K records.(https://upser.online/)