Lindsey Tessmer 4

Failed Apex Batch

We have a batch that fails intermittently. The error message comes back as "batch failed" with no other information. We can reschedule the batch and it will run normally until it fails again. Because the failures are intermittent, we haven't caught anything in the debug logs, even though we had them running for weeks at a time. We have checked our release schedule and the release schedule of the integrated platform, and the timing is not consistent with the failures, so we are assuming our code is not to blame.

Has anyone else run into something like this? If so, was there a resolution?

Example of failed batch.
DarthGarry
A few other details:

The customer org is EE. There are about 5 or 6 batches running on different intervals in the org. This one is generally supposed to run every 15 minutes. Payload size varies, but we have everything bulkified. Things had been running smoothly since early 2016. In early July this year, the first failure happened. Since then, each time we restart the batches, it fails silently again anywhere from 24 hours to two weeks after the restart.

We reviewed the org audit logs, packages, the system the batch integrates with, and the data quality of the records being processed by the batch; everything came up clean. We're having a difficult time discerning the root cause of the failures since we get no errors beyond status = failed.

To help resolve this, we're looking for any intel on whether anything changed with Salesforce that would affect scheduled batches, or any tips on how we could get error logs. When we turn on debug logs, they fill up in 15 minutes, so we have not caught a failure while the "tape was running".

Much appreciated - Garry

 
Luke J Freeland
Hey Garry & Lindsey,

I haven't seen this kind of behavior but here are some suggestions / things to explore:

1) Submit a Salesforce case asking for more information about the error.

2) Add "manual" logging to the code so that it saves "log" records to a custom object, or perhaps sends them to a logging service such as Loggly. That way you have more information after the fact without having to enable Salesforce debug logs and hope they're running when the failure happens.

In fact, I've recently created a logging package that lets you add logging code to your customizations and Apex so that errors and other information can be logged to a database, Loggly, or somewhere else, without having to enable debug logs. My intention is to potentially sell this as a product on the AppExchange for errors like this, but I'd love to have some beta testers first. If you're interested, email me.

3) Perhaps it's a race condition? I.e., the job is still running when 15 minutes have passed and the next job starts processing. Since both jobs are working on the same records, you can get weird errors because one job has locked records for update or the second batch has processed records faster than the first one. Hard to say for sure without more info.
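
For 2) and 3), here is a rough sketch of what I mean. It is not a drop-in fix. The class name LoggedSyncBatch, the custom object Batch_Log__c with its fields, and the Account query are all made up for illustration, so swap in whatever your batch actually uses. It logs caught exceptions per chunk, records the job's final AsyncApexJob status in finish(), and gives the scheduler a way to check whether an earlier run is still in flight.

```apex
// Sketch only. Batch_Log__c, Job_Id__c, Message__c and Stack_Trace__c are
// hypothetical; the Account query stands in for the real batch scope.
public class LoggedSyncBatch implements Database.Batchable<SObject>, Database.Stateful {

    // Kept across execute() calls because the class is Database.Stateful.
    private Integer failedChunks = 0;

    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator('SELECT Id FROM Account');
    }

    public void execute(Database.BatchableContext bc, List<SObject> scope) {
        try {
            // ... existing integration logic for this chunk goes here ...
        } catch (Exception e) {
            // Note: System.LimitException cannot be caught, so governor
            // limit failures will not show up here.
            failedChunks++;
            insert new Batch_Log__c(
                Job_Id__c = bc.getJobId(),
                Message__c = e.getTypeName() + ': ' + e.getMessage(),
                Stack_Trace__c = e.getStackTraceString()
            );
        }
    }

    public void finish(Database.BatchableContext bc) {
        // Record what the platform itself says about the job.
        AsyncApexJob job = [
            SELECT Status, NumberOfErrors, JobItemsProcessed,
                   TotalJobItems, ExtendedStatus
            FROM AsyncApexJob
            WHERE Id = :bc.getJobId()
        ];
        insert new Batch_Log__c(
            Job_Id__c = bc.getJobId(),
            Message__c = 'Status=' + job.Status
                + ', platformErrors=' + job.NumberOfErrors
                + ', caughtFailures=' + failedChunks
                + ', items=' + job.JobItemsProcessed + '/' + job.TotalJobItems
                + ', extendedStatus=' + job.ExtendedStatus
        );
    }

    // True if a previous run of this batch has not finished yet.
    public static Boolean isAlreadyRunning() {
        return [
            SELECT COUNT()
            FROM AsyncApexJob
            WHERE JobType = 'BatchApex'
              AND ApexClass.Name = 'LoggedSyncBatch'
              AND Status IN ('Holding', 'Queued', 'Preparing', 'Processing')
        ] > 0;
    }
}
```

From the scheduled class you would then only start a new run when nothing is in flight, e.g. `if (!LoggedSyncBatch.isAlreadyRunning()) { Database.executeBatch(new LoggedSyncBatch()); }`. One caveat: once the exception is caught, the chunk's earlier DML is no longer rolled back automatically, so rethrow after logging somewhere that survives a rollback if you depend on that. Even without any code changes, running the same AsyncApexJob query with `Status = 'Failed'` in the Developer Console may show something in ExtendedStatus for the runs that already failed.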

Hope it helps,
Luke