PNS

Downloading a Salesforce database via the Partner API

 Hi...

 

I am playing with a test Salesforce database and trying to figure out how to download its contents (i.e., do a full database "dump") into my machine's memory, as SObject objects (returned from QueryResult.getRecords()).

 

I am using the Partner API and so far access only the standard Salesforce objects. As far as I can tell, there are three categories of those:

 

 

  • Fully queryable (most of them, e.g. Account, Opportunity, Contract, User etc.).
  • Partially queryable, i.e. queryable but restricted by implementation (Vote, UserProfileFeed)
  • Non-queryable (FeedPost, AggregateResult, Name, NoteAndAttachment, ActivityHistory, FeedComment, OpenActivity, EmailStatus, ProcessInstanceHistory)
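For what it's worth, the Partner API's describeGlobal() call reports a queryable flag for each object, so the first category can be detected programmatically. Here is a minimal sketch in Python, using plain dicts to stand in for the SOAP DescribeGlobalSObjectResult entries; the object names and flags below are illustrative, not a real org's describe:

```python
# Partition a describeGlobal()-style listing by its "queryable" flag.
# The dicts stand in for DescribeGlobalSObjectResult entries returned
# over SOAP; in real code these come from connection.describeGlobal().

def partition_by_queryable(describe_results):
    """Split describe entries into queryable and non-queryable object names."""
    queryable, non_queryable = [], []
    for entry in describe_results:
        (queryable if entry["queryable"] else non_queryable).append(entry["name"])
    return queryable, non_queryable

sample = [
    {"name": "Account", "queryable": True},
    {"name": "Opportunity", "queryable": True},
    {"name": "FeedPost", "queryable": False},
    {"name": "AggregateResult", "queryable": False},
]

q, nq = partition_by_queryable(sample)
print(q)   # ['Account', 'Opportunity']
print(nq)  # ['FeedPost', 'AggregateResult']
```

Note that the flag does not distinguish the "partially queryable" objects like Vote: those report queryable but reject unrestricted SOQL, so they still need per-object handling.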

 

If, in a database with information stored only in standard objects, I download all fully queryable objects with appropriate queries, plus somehow (join-like) the remaining few (partially and non-queryable), would that be the equivalent of a "full dump" of the database, or would I be leaving data behind?

 

Also, how exactly should I go about downloading the partially- and non-queryable objects?

 

Thanks!

 

Best Answer chosen by Admin (Salesforce Developers) 
Rick.Banister

If all you want to do is pull data into flat files, use the free Salesforce.com Data Loader. But assuming that "into the machine's memory" means using a database: unless you have 3 to 6 months to write a first-draft replication program and want to make that a full-time job, or are out to invent yet another of many replication products, we suggest you license one of the existing solutions on the AppExchange, such as our company's Relational Junction for Salesforce. Developing such a program involves:

  • Dynamically discovering all the metadata (objects, fields, field types, and attributes)
  • Bringing the DBMS into alignment with the Salesforce schema by adding or altering columns
  • Looping through all desired objects, determining when replication last completed, and querying all records modified since then. A cautionary note: using the greatest timestamp retrieved as the next starting point is not a safe strategy; records can be changed during the retrieval itself, so you might miss records by assuming nothing was updated between the start of the query and the greatest timestamp retrieved. It really happens. Using the end time of the query will also miss records. We've sorted this out, but no one else has.
  • Handling deleted records. Unless you truncate your tables daily, you'll need to handle the delete behavior, which varies by object. Some objects are not recycled and are removed from Salesforce with no audit trail at all. We have techniques to detect this without truncating or rebuilding the entire table.
  • Making sure everything has retry-restart logic, or your pager will be going off at all hours of the day and night. Long-running queries are not stable and have a way of losing Salesforce connections.
  • Handling your largest object's record count for the initial download. Salesforce will time out if you request more than a million records in one API call. We have a patent-pending process for handling tens of millions of records in one pass, and Salesforce.com's largest customers use our product when no one else's would work.
  • Error logging and notification
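The looping/watermark point above can be sketched as follows. This is a toy in-memory model, not a real connector: `fetch_pages` stands in for the Partner API's query()/queryMore() pair, and the overlap window is an assumption to tune per org. The key detail is advancing the watermark to the time the query *started*, minus a small overlap, rather than to the greatest modification timestamp retrieved:

```python
from datetime import datetime, timedelta

# Toy incremental-sync loop. fetch_pages stands in for the Partner API's
# query()/queryMore() pagination; each "page" is a list of (id, modstamp).

OVERLAP = timedelta(seconds=30)  # safety margin; an assumption, tune per org

def sync_object(fetch_pages, watermark, query_start):
    """Pull all records modified since `watermark`; return (records, new watermark).

    The new watermark is the query start time minus OVERLAP, NOT the greatest
    modstamp seen: records can be modified mid-retrieval, so trusting the
    retrieved timestamps can silently skip rows. Re-fetching a small overlap
    window is cheap (dedupe by Id on upsert); missing rows is not.
    """
    records = []
    for page in fetch_pages(watermark):  # query() + queryMore() loop
        records.extend(page)
    return records, query_start - OVERLAP

# Fake pages to exercise the loop, yielded one record at a time to
# mimic queryMore batching. The ids and timestamps are made up.
def fake_pages(since):
    data = [("001A", datetime(2011, 5, 1, 12, 0)),
            ("001B", datetime(2011, 5, 1, 12, 5))]
    hits = [r for r in data if r[1] > since]
    for i in range(len(hits)):
        yield hits[i:i + 1]

recs, wm = sync_object(fake_pages, datetime(2011, 5, 1),
                       query_start=datetime(2011, 5, 1, 13))
print(len(recs))  # 2
print(wm)         # 2011-05-01 12:59:30
```

On the next pass, records in the 30-second overlap window come back a second time; an upsert keyed on Id makes that harmless.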

Contact me directly or visit our web site if you're interested in knowing more.

All Answers

PNS

Hmmm... That's what I suspected about the complexity of the task, after playing around with the API. Even 3-6 months seems optimistic, as far as I can tell at the moment, so hopefully I won't have to do it. :-)

 

I have already seen your product, and it does seem very interesting. The Salesforce Data Loader seems much more limited by comparison (little more than a query GUI).

 

Thanks!

gsmithfarmer

Try http://code.google.com/p/sqlforce/wiki/CopyForce

 

Free and open source. We use it for this task every day.

 

On the same site is a java jar that wraps the partner API.

PNS

Thanks for the hint.

 

JavaForce is cool, but it seems to download only queryable tables. So if, for instance, I also want to get attachments, those stored in the Note and Attachment tables will be downloaded, but those submitted via Chatter will not (since they are only retrievable indirectly through FeedPost, which is not queryable).
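For the record, the usual workaround for a non-queryable child like FeedPost is to query a queryable feed parent and follow the lookup, then post-process the returned records. The SOQL shape in the comment below (NewsFeed as the parent, the FeedPost.ContentFileName/ContentType fields) is my reading of the Chatter-era Partner API docs, so verify it against your API version. The helper itself is a plain-Python sketch over dict-shaped records standing in for the returned SObjects:

```python
# Pull Chatter attachment metadata out of feed query results.
# Each dict stands in for an SObject the Partner API might return from
# something like (relationship and field names are assumptions --
# check the docs for your API version):
#   SELECT Id, FeedPost.Id, FeedPost.ContentFileName, FeedPost.ContentType
#   FROM NewsFeed WHERE Type = 'ContentPost'

def chatter_attachments(feed_records):
    """Return (feedpost_id, filename) pairs for feed posts that carry a file."""
    out = []
    for rec in feed_records:
        post = rec.get("FeedPost") or {}   # lookup may be null on plain posts
        if post.get("ContentFileName"):
            out.append((post["Id"], post["ContentFileName"]))
    return out

sample = [
    {"Id": "0D5x1", "FeedPost": {"Id": "0D6a1", "ContentFileName": "specs.pdf"}},
    {"Id": "0D5x2", "FeedPost": {"Id": "0D6a2", "ContentFileName": None}},
    {"Id": "0D5x3", "FeedPost": None},
]
print(chatter_attachments(sample))  # [('0D6a1', 'specs.pdf')]
```

The file bytes themselves would then come from a follow-up retrieve of the content field for each matched post id, which is a separate (and size-limited) call.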