+ Start a Discussion
MilkovicMilkovic 

query speed

I wrote a loop that query's through every account record I have (150,000 total), and it runs through at about 10 records per second (BatchSize set at 500). Which would take around 4 hours.

My company also uses DataJunction, and when we use it to download all accounts it runs in 20 minutes.

Is their a good reason DJ can run that much faster compared to using the API? I can provide my code if needed.

Thanks

DevAngelDevAngel

Hi WhizA,

I would bump up the batch size to the max (2000).  DJ also might be using compression which would provide significant gains on bandwidth utilization.

 

MilkovicMilkovic

Thanks Dave,

I cannot find any examples for using compression in java. I assume I just need to add the header "Accepting-Encoding: gzip", but I am unsure how to do so. Any ideas?

DevAngelDevAngel

Hi WhizA,

Actually, that's just the tip of the iceburg.  The header is required to let the web service know that you can handle compressed responses.  The tricky part is that the Axis libraries do not have compression/decompression built into the http implementation.  You will have to do this yourself.  Compression is a bonafide part of the http specification, but it seems that almost all SOAP stacks do not implement it.

Here is a link to an article that describes implementing compression in Axis: http://www-106.ibm.com/developerworks/webservices/library/ws-sqzsoap.html I have not tried it though.

 

MilkovicMilkovic

Thanks Dave, that was a great article. I was able to get compression working and it runs about 1/3 as fast, though not as fast as DataJunction

Also, I get java.lang.OutOfMemoryError when I setBatchSize to anything over 500. Is their anyway to extend my memory, or is this a limitation of my PC?

adamgadamg

Try upping the heap size by using this command line switch for Java: -Xmx128m  (where 128 represents the heap size is megs; adjust to taste.)

Google found this tutorial for me that may be useful: http://tutorials.findtutorials.com/read/category/82/id/216/p/1

I'd also be interested in looking at your code to see if we can optomize it; I imagine many developers will have similar questions.

 

 

DevAngelDevAngel

Hi WhizA,

That's great that you got the compressison working.  Would you mind posting the steps and maybe key code snippets at some point?

As far as the out of memory error, I'm not sure what causes that or what to do about it as I'm a very neophyte Java developer.  I will ask around ( there are a lot of very good java developers here ) and see if I can find a solution.  Give me a couple of days on this.

MilkovicMilkovic

Thanks adamg, I had to set it to 200MB but I was able to setBatchSize to 2000 without error.
Here's my function to download a salesforce entity. I stripped it to only query through the records, and not add them to the database. Let me know if you can optimize it at all. DataJunction downloads around 113 records per second, while this code browses through the data at around 41 records per second (with compression).

//Java Code using the Partner API
public boolean download(String sfdc_table, String db_table) { 
  //Verify that we are already authenticated, if not
  //call the login function to do so
  if (!loggedIn)
   if (!login()) 
    return false;
  
  Date now = new Date();
  System.out.println("Download Start: " + now);
  System.currentTimeMillis();

  try
  {
   DescribeSObjectResult describeSObjectResult = binding.describeSObject(sfdc_table);
   Field[] acctFields = describeSObjectResult.getFields();

   String strFieldList = "";
   for (int i = 0;i<acctFields.length;i++) {
    strFieldList = strFieldList + acctFields[i].getName() + ", ";
   } 
 
   //Create a query statement to pull all fields from sfdc
   String strQry = "SELECT " + strFieldList.substring(0,strFieldList.length() - 2) + " FROM " + sfdc_table;
   
   //Get data from sfdc
   QueryResult qr = null;
   _QueryOptions qo = new _QueryOptions();
   qo.setBatchSize(new Integer(2000));
   binding.setHeader("SoapService", "QueryOptions", qo);
   qr = binding.query(strQry); 
   
   if (qr.getSize() != 0)
   {           
    
    long dlCount = 0;
    while (true)
    {
    /*
     * Code to batch insert 500 records into db    
     */
     dlCount += qo.getBatchSize().longValue();
     now = new Date();
     System.out.println(dlCount + " downloaded - " + now);
     if (qr.isDone())
      break;
     else
      qr = binding.queryMore(qr.getQueryLocator());               
    }
   }
   now = new Date();
   System.out.println("Download End: " + now);
   return true;
  }
  catch (RemoteException e) {
   System.out.println("Error (RemoteException): " + e);
   return false;  
  }  
} // end download

Message Edited by WhiZa on 01-14-2004 02:33 PM

MilkovicMilkovic

This is the way I did it, though there's probably an better way ( I also have some neophyte in me   ) . All we need to do is add 10 lines of code to the HTTPSender class.

->Download the apache axis source files http://ws.apache.org/axis/download.cgi

->Download Apache Ant http://ant.apache.org/srcdownload.cgi

->Go into HTTPSender.java and search in the file for the correct placement to insert the code:
 

//Start Add  
import java.util.zip.*;
//End Add  



//process user defined headers for information.
Hashtable userHeaderTable =
(Hashtable) msgContext.getProperty(HTTPConstants.REQUEST_HEADERS);

//Start Add               
//Adding Accept-Encoding header to the hashtable.
if(userHeaderTable == null){userHeaderTable = new Hashtable();} userHeaderTable.put("Accept-Encoding", "gzip");
//End Add



if (null != transferEncoding && transferEncoding.trim().equals(HTTPConstants.HEADER_TRANSFER_ENCODING_CHUNKED))
{ inp = new ChunkedInputStream(inp); }

//Start Add
//Check the content encoding. If it is gzip then
//wrapper the input stream as a GZIPInputStream
String zip = (String)headers.get("content-encoding");
InputStream is = null;

if(zip != null && zip.indexOf("gzip") != -1) {
   GZIPInputStream zipIn = new GZIPInputStream(inp);
   is = zipIn;
} else {
   is = inp;
}
//End Add

//Convert inp -> is
outMsg = new Message( new SocketInputStream(is, sock), false, contentType, contentLocation);


->Recompile axis code using ant

->Replace old HTTPSender.class file with new compiled version

Message Edited by WhiZa on 01-14-2004 03:01 PM

kbkb
when i do a describe global, i get a maxBatchSize of 500. is this wrong? is the real value 2000, as you mention?
DevAngelDevAngel

Hi kb,

Batch size refers to creates, updates and deletes - not queries.