Identify SQL Server agent jobs that do not run all steps

Problem: Sometimes SQL Server agent jobs do not work as expected due to either syntax or logical errors. In this blog post, I attempt to show you how to identify SQL Server agent jobs that do not provide the desired outcome as a result of logic errors, that is, jobs in which some steps ‘on success‘ or ‘on failure‘ conditions completely omit running others.

Solution: Below is a script attempting to accomplish this goal. I’ve searched extensively and couldn’t find anything to solve this need . Plus, I didn’t see any Microsoft provided procedures/functions allowing one to check a job for conditional flow continuity either. Yes, SQL Server Management Studio (SSMS) does this check upon job save, but the code is buried in their executable.

This script is not pretty, but it does get the job done. By all means, if you have an easier way to accomplish this task, hit me up in the comments.

To go about this task, I first identify all multi-step jobs and cycle through them step-by-step to determine their ‘on_success_action‘ and ‘on_fail_action‘ while saving their conditional next steps in a temporary table. In the end, I have a list of unique job steps traversed during normal job flow. Using this list of unique steps, I can then compare this to the number of actual steps in the job, and if they do not match, then we know at least one step has been missed during the execution.

/* 
Author: Brent Sodtke
Created: 2022-10
Description: Shows all SQL Agent jobs that contain steps that will not be executed (i.e., skipped during regular execution of the job).
----------
License: CC BY-SA (Creative Commons Share-alike)
You are free to:
Share - copy and redistribute the material in any medium or format.
Adapt - remix, transform, and build upon the material for any purpose, even commercially.

Under the following terms:
Attribution - You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
ShareAlike - If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED
*/
SET NOCOUNT ON
GO

USE msdb;
GO

IF OBJECT_ID(N'tempdb.dbo.#multistepjobs') IS NOT NULL
DROP TABLE #multistepjobs;
CREATE TABLE #multistepjobs
(
id INT IDENTITY (1,1)
	, job_id UNIQUEIDENTIFIER NULL
	, job_name SYSNAME NULL
	, num_job_steps INT NULL
	, start_step_id INT NULL
	, flagged TINYINT NULL
);
-- only concerned with multi-step jobs.
INSERT INTO #multistepjobs (job_id, job_name, num_job_steps, start_step_id)
SELECT 
	DISTINCT(sjs.job_id)
	, sj.[name] job_name
	, COUNT(1)
	, sj.start_step_id
FROM msdb.dbo.sysjobsteps sjs
JOIN msdb.dbo.sysjobs sj
ON sjs.job_id = sj.job_id
GROUP BY 
	sjs.job_id
	, sj.[name]
	, sj.start_step_id
HAVING COUNT(sjs.job_id) > 1
ORDER BY sj.[name]

DECLARE @jobid UNIQUEIDENTIFIER;
DECLARE @stepid INT;
DECLARE @numsteps INT;
DECLARE @on_success_action TINYINT;
DECLARE @on_fail_action TINYINT;
DECLARE @on_success_stepid INT;
DECLARE @on_fail_stepid INT;
DECLARE @loopcounter INT;

IF OBJECT_ID(N'tempdb.dbo.#jobflow') IS NOT NULL DROP TABLE #jobflow;
CREATE TABLE #jobflow
(
	job_id UNIQUEIDENTIFIER NULL
	, in_step_id INT NULL
	, out_step_id INT NULL
);

WHILE (SELECT COUNT(1) FROM #multistepjobs WHERE flagged IS NULL) >= 1
-- we are going to loop through all jobs determined to have multiple steps
BEGIN
	SET @jobid = (SELECT job_id FROM #multistepjobs WHERE id = (SELECT min(id) FROM #multistepjobs WHERE flagged IS NULL));
	SET @stepid = (SELECT start_step_id FROM #multistepjobs WHERE job_id = @jobid);
	SET @numsteps = (SELECT num_job_steps FROM #multistepjobs WHERE job_id = @jobid);
	SET @loopcounter = @stepid;

	-- first let's insert the starting step to the job flow table
	INSERT INTO #jobflow (job_id, out_step_id) SELECT @jobid, @loopcounter

	WHILE (@loopcounter <= @numsteps)
	-- this inner loop cycles through each step within the job
	BEGIN
		SET @on_success_action = (SELECT on_success_action FROM msdb.dbo.sysjobsteps WHERE job_id = @jobid AND step_id = @loopcounter);
		SET @on_success_stepid = (SELECT on_success_step_id FROM msdb.dbo.sysjobsteps WHERE job_id = @jobid AND step_id = @loopcounter)
		SET @on_fail_action = (SELECT on_fail_action FROM msdb.dbo.sysjobsteps WHERE job_id = @jobid AND step_id = @loopcounter)
		SET @on_fail_stepid = (SELECT on_fail_step_id FROM msdb.dbo.sysjobsteps WHERE job_id = @jobid AND step_id = @loopcounter)

		-- we are only concerned with actions 3 and 4 since those transfer control to another step within the job logic
		IF @on_success_action = 3 -- i.e., go to next step
		BEGIN
			INSERT INTO #jobflow (job_id, in_step_id, out_step_id) SELECT @jobid, @loopcounter, @loopcounter + 1;
		END
		ELSE
		BEGIN
			IF @on_success_action = 4 -- i.e., go to step number
			BEGIN
				INSERT INTO #jobflow (job_id, in_step_id, out_step_id) SELECT @jobid, @loopcounter, @on_success_stepid;
			END
		END

		-- using the same logic as working with successful steps, we now work with what happens on step failure.
		IF @on_fail_action = 3 -- i.e., go to next step
		BEGIN
			-- note: there is one special case whereby SSMS allows the last step of a job to have a 'go to next step' condition, this must be ignored.
			IF @loopcounter <> @numsteps -- that is, if we are NOT working with the last job step
			BEGIN
				INSERT INTO #jobflow (job_id, in_step_id, out_step_id) SELECT @jobid, @loopcounter, @loopcounter + 1;
			END
		END
		ELSE
		BEGIN
			IF @on_fail_action = 4 -- i.e., go to step number
			BEGIN
				INSERT INTO #jobflow (job_id, in_step_id, out_step_id) SELECT @jobid, @loopcounter, @on_fail_stepid
			END
		END

		SET @loopcounter += 1; -- value of loopcounter should match that of the job step
	END

	UPDATE #multistepjobs SET flagged = 1 WHERE job_id = @jobid;
	SET @jobid = (SELECT job_id FROM #multistepjobs WHERE id = (SELECT min(id) FROM #multistepjobs WHERE flagged IS NULL));
END

-- display results --
-- we return all jobs by id and name along with the number of unique steps visited during the job conditional flow and the total number of steps for that job.
;WITH compareCTE AS
(
SELECT DISTINCT(jf.job_id)
	, msj.job_name
	, COUNT(DISTINCT(jf.out_step_id)) AS unique_steps_visited
	, msj.num_job_steps
FROM #jobflow jf
JOIN #multistepjobs msj
ON jf.job_id = msj.job_id
GROUP BY jf.job_id, msj.job_name, msj.num_job_steps
)
-- all jobs whereby the unique steps visited does not match the number of jobs steps clearly indicate that some steps are skipped during execution.
-- skipping steps may be intentional, but could also be a accidental mistake during job creation/modification.
SELECT * FROM compareCTE WHERE unique_steps_visited <> num_job_steps ORDER BY job_name;

Figure 1 – Sample output showing all jobs that have at least one job step that is not executed during normal job logic flow.

Additional Notes:

  • SQL Server Management Studio (SSMS) does warn the user that a job step cannot be reached in the current job flow upon saving and modifying a job, but this can easily be missed. Sometimes DBAs do get distracted and can just click through the warning prompt.
  • Perhaps the job was created by a PowerShell script, in which case, the job logic might not have been checked extensively.
  • One thing when creating a job within SSMS is the ability (albeit I do not understand why) to allow the last job step to go to the next step on failure. This doesn’t make any sense if there are no other steps that follow. Is this an oversight in the SSMS UI? I would think SSMS should default to ‘quit the job reporting failure’ in this case.

Did you find this useful? Did it help you solve an immediate need within your SQL Server agent estate? If so, you might want to check out SQL Agent Insight, a product I’ve been working on during spare time to assist in determining problem areas of SQL Server agent instances.