The Problem (and my solution at the bottom)
I was having a vexing problem with a set of workflows running on a custom SharePoint list. I would occasionally get a SharePoint workflow error, with no consistent way to reproduce it. Some of the workflows were created by me, and some were there when I took on the role of maintaining the SharePoint site.
The Symptoms I Could Observe
I was not a site administrator. I had the ability to create custom lists, workflows, and pages, but I couldn’t look at any server logs.
The workflows only occasionally failed. Most of the time, everything worked great.
Some of the workflows were set to run when a new record was created, some when a record changed, and some when a record was either created or changed.
Some of the workflows just sent alert emails. Some changed values in certain columns based on conditions set in the workflow.
Running the workflows manually always worked, and that was how I got the records, emails, views, and alerts all caught up when needed.
Does this sound similar to your problem?
My “Aha” Moment
My aha moment came while manually running some of the workflows again to ‘fix up’ the state of the list. I realized that these workflows are not kicked off in any order I control. They don’t even run one after another. I figured that the workflows modifying column values in the same record were trying to run at the same time, and that one or more of them was failing as a result.
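To make that guess concrete, here is a toy Python sketch of the kind of collision I mean. It is not SharePoint code, and I never saw what the server actually does. The names (ListItem, SaveConflict) and the version check are made up for illustration, on the assumption that a save gets rejected when the record has changed since the workflow read it.

```python
class SaveConflict(Exception):
    """Raised when a write is based on a stale copy of the record."""


class ListItem:
    """Toy stand-in for a list record with a version counter (hypothetical)."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self.version = 1

    def read(self):
        # Return a snapshot of the fields plus the version it was read at.
        return dict(self.fields), self.version

    def write(self, changes, based_on_version):
        # Reject the write if the record was saved by someone else meanwhile.
        if based_on_version != self.version:
            raise SaveConflict(
                f"read v{based_on_version}, record is now v{self.version}"
            )
        self.fields.update(changes)
        self.version += 1


item = ListItem(status="New", owner="")

# Both workflows are triggered by the same change, and both read the
# record before either one has written anything back.
_, version_seen_by_a = item.read()
_, version_seen_by_b = item.read()

item.write({"status": "In Review"}, version_seen_by_a)  # workflow A succeeds

try:
    item.write({"owner": "Help Desk"}, version_seen_by_b)  # workflow B is stale
    outcome = "ok"
except SaveConflict as err:
    outcome = f"failed: {err}"

print(outcome)
```

Whether workflow B fails depends entirely on timing, which would fit a problem that only shows up occasionally. Running B again by hand, after A has finished, would read the current version and succeed.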
My “Fix”
Assuming my thinking was correct, I created two main workflows: one for new records and one for changed records. I disabled all of the other workflows. Each task that needed to be done went into a distinct Step within one (or both) of the two workflows as needed. That way, only one workflow would run on any given record creation or change. Since then, none of the problems has returned and none of the individual workflows has needed to be run.
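The shape of that fix can be sketched in Python as well. Again, this is only an illustration and not how SharePoint workflows are written. The step functions, field names, and the outbox list are all hypothetical. The point is that each old workflow becomes one step, and the steps for a given trigger run one after another inside a single workflow.

```python
def set_default_status(record, outbox):
    # Give the record a starting status if it does not have one yet.
    record.setdefault("status", "New")


def escalate_high_priority(record, outbox):
    # Change a column value based on a condition, as some old workflows did.
    if record.get("priority") == "High":
        record["status"] = "Escalated"


def send_alert(record, outbox):
    # Stand-in for an alert email: just collect the message text.
    outbox.append(f"Record {record['id']} is now {record['status']}")


# One ordered list of steps per trigger. A step can appear in both.
ON_CREATE_STEPS = [set_default_status, escalate_high_priority, send_alert]
ON_CHANGE_STEPS = [escalate_high_priority, send_alert]


def run_workflow(steps, record):
    """Run every step in order against the same record."""
    outbox = []
    for step in steps:
        step(record, outbox)
    return outbox


new_record = {"id": 7, "priority": "High"}
sent = run_workflow(ON_CREATE_STEPS, new_record)
print(new_record["status"], sent)
```

Because only one workflow touches the record per trigger, each step sees the changes made by the step before it, and nothing else is writing to the record at the same time.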
There you go.