Alex.Acosta

Query Timeouts due to Formula Field

Hi there,

 

I've been developing in Salesforce for a few months now. Long story short: I've been using Salesforce's SOAP API since version 15, and I'm currently on version 20 with the Partner WSDL. Within the last few weeks I've noticed an issue that seems tied to a back-end change in how Salesforce retrieves records that contain formula fields. I've talked to Salesforce's developer support team with no result other than getting the timeout extended for our Salesforce Org.

 

This "bug" shows up when I query an object that has over 2 million records and pull back the oldest record with all of its fields. If I exclude the formula field, the query always succeeds, and I'm able to retrieve at least 100k records using the queryAll and queryMore web service calls. So my question is: how else can I get all of my data without knowing the date of my oldest record?

 

Example of query:

SELECT * FROM <objectName> WHERE LastModifiedDate >= '1993-02-23T00:00:00Z' ORDER BY LastModifiedDate LIMIT 1

 

Ideally I'd like to get this working with queryAll, so I could query more than one record at a time, maybe 100k records, and use the queryMore web service call to pull the data down in smaller sets.
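The queryAll/queryMore pattern described above can be sketched as follows. This is only an illustration of the paging loop: FakePartnerClient is a hypothetical stand-in for a real Partner WSDL client, which returns each batch with a queryLocator and a done flag.

```python
class FakePartnerClient:
    """Simulates queryAll/queryMore paging over an in-memory record list.
    A real client would issue SOAP calls; only the paging shape matters here."""
    def __init__(self, records, batch_size=3):
        self._records = records
        self._batch = batch_size

    def query_all(self, soql):
        # First call: start paging from the beginning.
        return self._page(0)

    def query_more(self, locator):
        # Subsequent calls: resume from the locator returned by the last batch.
        return self._page(locator)

    def _page(self, offset):
        chunk = self._records[offset:offset + self._batch]
        next_offset = offset + self._batch
        return {
            "records": chunk,
            "done": next_offset >= len(self._records),
            "queryLocator": next_offset,
        }

def fetch_all(client, soql):
    """Drain every page: one queryAll, then queryMore until done."""
    result = client.query_all(soql)
    records = list(result["records"])
    while not result["done"]:
        result = client.query_more(result["queryLocator"])
        records.extend(result["records"])
    return records
```

The loop itself is the point: nothing is assumed about batch size, only that each response says whether more pages remain.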

 

Please help.

jhurst

Alex,

 

Because of the multitenant nature of the system, there are some limits on queries.  When you run queries with complex formula fields across large data sets, the query may not perform well enough to return the data.  There are a couple of things that you can do to help this along:

 

1. Work on reducing the data set through use of indexed fields.  When the filter is on an indexed field, the results return faster, which will help you out.  The rough threshold for the index is 10% of the entire table size (including soft-deleted records).  So if your query filters match 10% or less of the total table size, the index will be used and the result set will be returned faster.  Unfortunately, with your example, the filters match your entire data set, so an index cannot be used and the system has to look through the entire table.
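The 10% threshold above can be written down as a tiny check. This encodes only the rough heuristic stated in this thread, not Salesforce's exact selectivity rules:

```python
def index_usable(matched_rows, total_rows, threshold=0.10):
    """Rough heuristic from the thread: an index helps only when the
    filter matches at most ~10% of the table (soft-deleted rows included)."""
    return matched_rows <= threshold * total_rows

# With a 2,000,000-row table, a filter must match at most 200,000 rows
# for the index to be considered.
```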

 

2. Depending on your formula, you can apply the formula logic on your side rather than having salesforce.com calculate the formula.  Since formula results are not stored in the database, the system calculates the formula result at runtime.  If the formula is complex, this can delay the query.

 

For #1, there may be a workaround using a custom index.  This would be a bit more work, but if you run the query often, it may be worth it.  The idea would be to create a field that logically chunks your data, and then add an index to that field.  The index can be added by salesforce.com support, or you can add one yourself by making the field an external ID.

 

So, let's say you have 2MM records.  You can chunk the data into groups of 50K: query the first 50K records sorted by LastModifiedDate, then set the external ID field of those "oldest" records to something like "Group 1", the next 50K to "Group 2", and so on.  You can then run your query with filters like:

 

SELECT * FROM <objectName> WHERE External_ID__c in ('Group 1', 'Group 2')

 

This will then use the index and the result set should be returned quickly.
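The bucketing step above reduces to one small function. This is only a sketch of the labeling scheme jhurst describes (the field name and "Group N" values are his illustrative choices, not anything Salesforce defines); rank is a record's 0-based position when sorted by LastModifiedDate, oldest first:

```python
def group_label(rank, chunk_size=50_000):
    """Map a record's sort position to its external-ID bucket:
    positions 0..49,999 -> "Group 1", 50,000..99,999 -> "Group 2", etc."""
    return f"Group {rank // chunk_size + 1}"
```

With 2MM records and 50K chunks you end up with 40 groups, and each indexed filter like `External_ID__c in ('Group 1', 'Group 2')` matches well under the 10% threshold.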

 

Hope this helps.

 

Jay

Alex.Acosta

Hey Jay,

 

First of all, thanks for the quick reply, it is greatly appreciated and I know I'll be using this knowledge in the near future.

 

The problem I'm currently having is that our software provides users with a local database copy of their Salesforce Org, so they can run reporting software, among other things. That means we don't have the flexibility of creating these custom fields on current clients' and prospects' Orgs. Sadly, these queries did work on large objects in the past. The only option I can think of, which I want to avoid because of the vast waste of API calls, is querying within a time frame, e.g. month to month. But then again, we've had clients who could easily top 1 million records within a two-week span, which could still hit the same issue.
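The time-frame idea Alex mentions can be sketched as a window generator: split the full history into fixed-size LastModifiedDate ranges so each query's filter stays selective. This is only an illustration; `<objectName>` is a placeholder exactly as in the thread's own examples, and the two-week span is arbitrary:

```python
from datetime import datetime, timedelta

def soql_windows(start, end, days=14):
    """Split [start, end) into fixed-size date windows and emit one
    LastModifiedDate-bounded query per window."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    clauses = []
    lo = start
    while lo < end:
        hi = min(lo + timedelta(days=days), end)
        clauses.append(
            f"SELECT Id FROM <objectName> "
            f"WHERE LastModifiedDate >= '{lo.strftime(fmt)}' "
            f"AND LastModifiedDate < '{hi.strftime(fmt)}'"
        )
        lo = hi
    return clauses
```

The trade-off Alex points out is visible here: each window costs at least one API call even if it returns nothing, and a busy two-week window can still be huge.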

jhurst

Alex,

 

It is possible that the queries have been slowly reaching the query time limit (default of 2 minutes) as the table has grown.  There has been no change recently to formula fields and how they are generated.

 

If you are doing this as part of a running process, you should look into using the replication API (getUpdated() and getDeleted()) calls which bring back records updated or deleted within a certain timeframe.  The replication API is meant to be run regularly to keep your client in sync with salesforce.com.
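The replication pattern above boils down to a watermark loop. FakeReplicationClient is a hypothetical stand-in (the real getUpdated()/getDeleted() calls take datetimes and return record ids); the point is fetching changes since the last run and advancing the watermark:

```python
class FakeReplicationClient:
    """Simulates getUpdated() over an in-memory change log of
    (timestamp, record_id) pairs; timestamps are plain ints for brevity."""
    def __init__(self, changes):
        self._changes = changes

    def get_updated(self, start, end):
        # Ids of records changed in [start, end).
        return [rid for ts, rid in self._changes if start <= ts < end]

def sync_pass(client, last_sync, now):
    """One replication pass: fetch ids changed since the last run and
    return the new watermark so the next pass starts where this one ended."""
    changed_ids = client.get_updated(last_sync, now)
    return changed_ids, now
```

Run regularly, each pass picks up exactly the records touched since the previous pass, which is what keeps the local copy in sync without re-querying everything.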

 

Other than that, the option is to scope the requests down on an indexed field (like you are doing with LastModifiedDate).

 

Jay

Alex.Acosta

Jay,

 

I have seen that in the recent API release, and I'm planning on using the getUpdated() and getDeleted() methods, but I know they carry a 30-day restriction. Clients tend to move the software and sometimes don't copy the schemas over, and prospects wanting to try the software still have to generate the tables from scratch, so I'm still stuck using LastModifiedDate for those cases. It looks like the majority of the API calls are going to have to be spent during the initial creation of these local tables.

 

Again, I thank you greatly for all your help.

*werewolf*

Just a thought: have you considered using getUpdated and getDeleted rather than querying everything?  It seems like you're running up against some fundamental limitations here, but perhaps with some clever workarounds you can get what you need (which sounds like it's to sync all the data in a given object to your local db).