Wardster

Performance Issues with Inserting Data with Command Line Data Loader Interface

Lately I've been scripting a lot of my data exports with Data Loader using the CLI, since I often have to run them repeatedly.  I've been pleased that I can pull 2-3 million records an hour out of Salesforce this way.

 

However, the same cannot be said for inserting data.  I tried this last night and the performance is appalling -- 3,000-4,000 records an hour, maybe even less.  I think it is so slow because of all the logging that is going on.  I'm seeing log4j/commons-logging messages for all the BeanUtils classes, which I frankly don't need.  Is there any way to turn all that junk off?  That is bound to speed things up at least a little.  I'm getting 30-40K records an hour when I just use the GUI.  It looks like the sfdc.debugMessages parameter only affects the actual SOAP calls (which I don't need to see either).

 

I know about the new Bulk API, but I have a unique index on my object (Lead) and the Bulk API is running into all kinds of nasty contention issues.  I've yet to load a single record successfully with the Bulk API on my Lead object.  Granted, there are a bunch of triggers and code hanging off this entity as well, but I'm not running into the same issues when I just use the standard API.  Enabling serial mode seems to have even worse performance than the good ol' fashioned web services API, so that isn't helping me either.

 

I'm sure I could unzip dataloader.jar and update the log4j config properties.  I'm just hoping to find another setting.

 


Best Answer chosen by Admin (Salesforce Developers) 
Wardster

I finally solved my logging issues with inserting data via the Data Loader CLI.  The log4j config file isn't in the jar files; it's actually in the Data Loader install's conf directory.  I had to do two things to get the logging down to a more reasonable level:

 

1.  First I edited the default log-config.xml, adding a category to filter out all the BeanUtils messages I was getting from org.apache.commons:

 

    <category name="org.apache.commons">
        <priority value="warn" />
        <appender-ref ref="fileAppender" />
        <appender-ref ref="STDOUT" />
    </category>

 

Here is the complete log-config.xml file:

 

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration>
    <appender name="fileAppender" class="org.apache.log4j.RollingFileAppender">
        <param name="File"   value="${java.io.tmpdir}/sdl.log" />
        <param name="Append" value="true" />
        <param name="MaxFileSize" value="100KB" />
        <param name="MaxBackupIndex" value="1" />
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern"
                   value="%d %-5p [%t] %C{2} %M (%F:%L) - %m%n"/>
        </layout>       
    </appender>
   
   
    <appender name="STDOUT" class="org.apache.log4j.ConsoleAppender">
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern"
                   value="%d %-5p [%t] %C{2} %M (%F:%L) - %m%n"/>
        </layout>       
    </appender>
   
   
    <category name="org.apache.log4j.xml">
        <priority value="warn" />
        <appender-ref ref="fileAppender" />
        <appender-ref ref="STDOUT" />
    </category>
   
    <category name="org.apache.commons">
        <priority value="warn" />
        <appender-ref ref="fileAppender" />
        <appender-ref ref="STDOUT" />
    </category>
   
     
    <root>
        <priority value="info" />
        <appender-ref ref="fileAppender" />
        <appender-ref ref="STDOUT" />
    </root>
</log4j:configuration>

 

2.  I had copied dataloader.bat to another directory, so my config directory was no longer pointing to the C:\Program Files\Salesforce.com\Apex Data Loader 17.0\conf folder, which contains the default salesforce.com log config file.  Once I copied my updated log-config.xml into my new config directory, it got picked up on the classpath.  dataloader.bat takes the config directory as a command-line argument:  -Dsalesforce.config.dir=%1
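Putting the two steps together, the invocation inside a copied dataloader.bat ends up along these lines (a sketch only -- the install path, conf directory, and the job name "csvInsertLead" are illustrative, not from this post):

```bat
REM Sketch of a Data Loader CLI invocation with an explicit config directory.
REM The -Dsalesforce.config.dir system property points at the folder holding
REM process-conf.xml and the edited log-config.xml.
set DLDIR=C:\Program Files\Salesforce.com\Apex Data Loader 17.0
java -cp "%DLDIR%\dataloader.jar" ^
     -Dsalesforce.config.dir=C:\myjobs\conf ^
     com.salesforce.dataloader.process.ProcessRunner process.name=csvInsertLead
```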

 

That's it.  Now I'm off to the races.

All Answers

Cool_Devloper

Hi,

 

There is a setting you can change in the "process-conf.xml" file in order to turn off the logging, which takes additional time.

CLI parameters: https://na3.salesforce.com/help/doc/en/loader_params.htm
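For context, the per-job settings live in Spring-style beans in process-conf.xml. A minimal sketch of one bean (the bean id, entity, and file path here are illustrative, not taken from this thread):

```xml
<!-- Illustrative process-conf.xml job definition; ids and paths are examples only -->
<bean id="csvInsertLead"
      class="com.salesforce.dataloader.process.ProcessRunner"
      singleton="false">
    <property name="name" value="csvInsertLead"/>
    <property name="configOverrideMap">
        <map>
            <entry key="sfdc.entity" value="Lead"/>
            <entry key="process.operation" value="insert"/>
            <entry key="sfdc.debugMessages" value="false"/>
            <entry key="dataAccess.type" value="csvRead"/>
            <entry key="dataAccess.name" value="C:\myjobs\leads.csv"/>
        </map>
    </property>
</bean>
```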

Cool_D

Wardster

As far as I can see, the only parameter related to log messages is sfdc.debugMessages, but at least according to the documentation this only turns the SOAP messages off and on.  It doesn't seem to affect the rest of the logging, which I think is the real issue here.  I'm seeing huge numbers of log messages from the Jakarta Commons BeanUtils classes as each record is read from the file and converted to Java objects.

 

Maybe I'll try a previous version of the dataloader.jar and see if it has the same problem.

Cool_Devloper

Well, I agree with you. There doesn't seem to be any other setting to reduce the logging!

Maybe you can go ahead and unzip the jar file and make some modifications :(

Cool_D 

Superfell
You shouldn't even need to unzip the jar, just put the modified config file at the front of the classpath, and it'll get picked up in preference to the one in the jar.
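In other words, since classpath entries are searched left to right, a directory containing the edited log-config.xml listed before the jar should win. A sketch (paths and job name are illustrative):

```bat
REM Sketch: a conf directory placed ahead of the jar on the classpath, so its
REM copy of log-config.xml is found before the one bundled in the jar.
java -cp "C:\myconf;C:\Program Files\Salesforce.com\Apex Data Loader 17.0\dataloader.jar" ^
     com.salesforce.dataloader.process.ProcessRunner process.name=csvInsertLead
```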
venkatM

Hi Wardster, I also intend to load 2.2 million records daily, and it is taking too much time. Can you please explain the second step of your solution, as the first one is clear and easy? I am using the Data Loader 18.0 CLI process method to insert the data.

 

Your help will be greatly appreciated!!