Stamen Nikolov 4

Import Campaign Members

Hi all,

I have a large number of campaign members that I'm trying to import in a timely manner, but without success. There are ~55M records; initially I tried with my default bulk settings (parallel mode) and got locking errors for ~30% of the records. This is the key quote from the Salesforce documentation that lists the possible workarounds for my scenario, none of which I can use: "For optional lookup fields, you can avoid the locks by setting the "Clear the value of this field" option, which does more than just tell Salesforce what to do if your lookup record is deleted. When you set this option, whenever a record that has a lookup field to this lookup record is inserted or updated, Salesforce doesn't lock the lookup records; instead it only validates that the lookup values exist. To avoid unnecessary locks, it’s best to set this option for your lookup fields whenever possible. If it isn’t possible to configure your lookup fields with this option, use a workaround similar to the one you would use for master-detail relationships: Ensure that no lookup values span multiple batches or order your loads by their lookup values to minimize how often the records being inserted in multiple, concurrent batches reference a single lookup value."

For the lookups I use CampaignId and ContactId, and neither field has the "Clear the value of this field" option, I assume because they're standard fields. I can't really adjust my files to work around this, because with ~7000 campaigns, 10M contacts and 55M members it's a big task. Therefore the only working solution I currently have is serial concurrency mode, but the speed is really not acceptable: it would take ~2 weeks to import all members.
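For what it's worth, if anyone ends up scripting the load directly, serial mode can be requested when creating a Bulk API 1.0 job via its `concurrencyMode` element. A minimal sketch of the job payload only; the endpoint (`/services/async/<version>/job`), API version and `X-SFDC-Session` handling are details you'd fill in from your own org:

```python
def serial_job_xml(sobject="CampaignMember", operation="insert"):
    # Bulk API 1.0 jobInfo payload. concurrencyMode=Serial makes
    # Salesforce process batches one at a time, trading throughput
    # for freedom from record-lock errors.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">\n'
        f"  <operation>{operation}</operation>\n"
        f"  <object>{sobject}</object>\n"
        "  <concurrencyMode>Serial</concurrencyMode>\n"
        "  <contentType>CSV</contentType>\n"
        "</jobInfo>\n"
    )
```

You would POST this body to the job endpoint with your session id header, then add CSV batches to the job as usual.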

Please do let me know if you have had this issue before and/or if you know an alternative. Thank you!

Best Answer chosen by Stamen Nikolov 4
Alain Cabon
@Stamen
  1. Your parent objects are all inserted (Campaigns and Contacts)
  2. If you use Talend, you can easily sort the file on the key used for the contacts (Id, External Id of contact).

Changing the sort order is not absurd: I found that someone solved a problem of parallel bulk inserts (lock contention) just by changing the sort order of the input file.

In parallel mode, if you insert thousands of members for the same campaign, split into groups loaded at the same time, more than one thread will try to lock the same campaign record for a long time (lock contention).

A CSV file sorted on the contact (external) id will perhaps have less lock contention during the loading of the campaign members, but there will be more individual locks on the campaign object (which may not be great for performance either, but at least it remains a parallel bulk load).
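To illustrate the sorting idea, here is a minimal Python sketch that re-sorts the members CSV on ContactId before loading. It sorts in memory, so for ~55M rows you would use an external merge sort or Talend's tSortRow instead; the column name is an assumption:

```python
import csv

def sort_members_by_contact(in_path, out_path, key="ContactId"):
    # In-memory sketch: read all rows, sort on the lookup key, write back.
    # For tens of millions of rows, replace the sorted() call with an
    # external merge sort (or sort inside the ETL tool).
    with open(in_path, newline="") as f:
        reader = csv.DictReader(f)
        fields = reader.fieldnames
        rows = sorted(reader, key=lambda r: r[key])
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```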

There is also the problem of lookup skew, but there is no simple solution for that:
https://developer.salesforce.com/blogs/engineering/2013/04/managing-lookup-skew-to-avoid-record-lock-exceptions.html
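To see whether lookup skew is even in play, it can help to count child rows per parent in the CSV before loading; a small sketch (the column name CampaignId is an assumption about your file):

```python
import csv
from collections import Counter

def top_parents(path, field="CampaignId", n=10):
    # Count child rows per parent value to spot skewed lookups:
    # a handful of campaigns holding millions of members is exactly
    # the pattern that serializes locks on those parent records.
    with open(path, newline="") as f:
        counts = Counter(row[field] for row in csv.DictReader(f))
    return counts.most_common(n)
```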
 

All Answers

Alain Cabon
Hi,
  1. Are you using the standard Data Loader?
  2. Do you have problems with the parent objects in bulk mode: campaigns and contacts?
  3. Each contact is linked to about five campaigns (≈55M members / 10M contacts).
  4. The problem occurs when multiple campaign member inserts for the same campaign run into record locking on the campaign, due to the standard rollups.
  5. What is the order of your CSV file for the campaign members?
  6. Did you try with the CSV file for the campaign members sorted on the contact Id (or external contact id field)?

Field: Contacts in Campaign = NumberOfContacts (int) is a rollup summary field, so the Master-Detail locking rules apply.
https://developer.salesforce.com/docs/atlas.en-us.sfFieldRef.meta/sfFieldRef/salesforce_field_reference_Campaign.htm

Master-Detail: On insert or delete of a detail record, the master record is always locked. On update, the master is only locked if rollup summary fields exist on the master for that child type.
Lookups: On update of children, parent records are only locked when the lookup value on the child changes. On insert or update, parent records are only locked "if [the] lookup relationship is not configured to clear the value of this field if the lookup record is deleted," which is the default option (clearing the value, that is).
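Given those lock rules, one mitigation is to build batches so that no campaign's rows span two concurrent batches. A sketch assuming the rows are already sorted by CampaignId; the key and batch size are illustrative:

```python
from itertools import groupby

def pack_batches(rows, key=lambda r: r["CampaignId"], batch_size=10000):
    # Pack rows into batches without splitting a campaign's rows across
    # a batch boundary, so two concurrent batches rarely fight over the
    # same Campaign record lock. Rows must arrive sorted/grouped by key.
    # A group larger than batch_size becomes one oversized batch rather
    # than being split.
    batches, current = [], []
    for _, group in groupby(rows, key=key):
        group = list(group)
        if current and len(current) + len(group) > batch_size:
            batches.append(current)
            current = []
        current.extend(group)
    if current:
        batches.append(current)
    return batches
```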
Stamen Nikolov 4

@Alain,
thanks for the reply. On your questions:

1. I'm using the Talend ETL tool.
2. Yes, I do have problems with the parents - I believe the locking errors I was getting came only from the Campaign object. For Contacts the only error I initially encountered was that I can't update ContactId (I was using UPSERT). I found that ContactId is not actually editable, hence the error, so I changed the job to INSERT.
3. The file is not really sorted by anything and I haven't tested, because it's a huge file; I'll have to do that and see what I can get out of it. And no - I have not tried the CSV file for the campaign members sorted on the contact Id (or external contact id field). I will spend some time testing this and reply back.

 

Stamen Nikolov 4
@Alain,

thanks for the details. I spent some more time reading everything you shared and some other material as well, and I tested some possibilities, but I couldn't make it run without locks. I was able to reduce the number of locks when I sorted the CSV file by the contact external id, but I was still getting some errors, so I don't think this will work for me. I tried sorting by CampaignId as well, but got a pretty big number of locks. So it seems I'll have to live with the current solution of serial mode and the expected ~1 week for the import. Thank you very much for your answers, I've learned something new here. Hope this will be of help to others as well.
Alain Cabon
@Stamen  Thanks very much for your very valuable feedback about your tries.

By definition, it is very difficult (nearly impossible) to reproduce these problems without very large volumes of data.

I have used ETL tools a lot (Informatica, Talend and Pentaho), but never with such volumes (tens of millions), which is why your question is very interesting (a problem for an architect or data expert).

There is one last alternative: coding a solution in native Java or C# for better control of the lock contention, with the dilemma of spending "some" days writing multi-threading code for a "one shot initialization", without any certainty of doing better than Talend in the end (a loss of time, a failed POC).
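As a sketch of what such a custom loader could look like (in Python rather than Java/C#, purely for illustration): if each batch holds only complete campaigns, concurrent workers lock disjoint Campaign records. `send_batch` is a hypothetical wrapper around your Bulk API call:

```python
from concurrent.futures import ThreadPoolExecutor

def load_parallel(batches, send_batch, workers=5):
    # send_batch(batch) is a hypothetical callable that submits one
    # batch to Salesforce. Because no campaign is split across batches,
    # concurrent workers touch disjoint Campaign records, avoiding the
    # lock contention seen with naive parallel mode.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send_batch, batches))
```

The hard part in a real implementation is error handling and retries, which is exactly the multi-threading effort weighed against just accepting serial mode.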

Changing the sort order is not enough, and lookup skew lies in ambush (the "silent killer") without a clear solution except reducing the scope (a basic change always tried first, like the sort order, but not sufficient on its own).

The same question on other forums could bring new feedback
(sfdcfox https://salesforce.stackexchange.com/users/2984/sfdcfox answers very pointed questions)

Best regards
Alain
Stamen Nikolov 4
Hi again,

thanks for your engagement with the topic; it was very interesting for me as well to get some outputs, and I'm sure this will help someone else in the future, so it has additional value. The alternative you're talking about might be worth it if the numbers are really huge - let's say hundreds of millions or more - but in my case it is not really that bad: ~54M records in serial mode will probably finish in about 5 days, so it's not a big deal, and in my case it is just historical data with no rush to use it.