First off, thank you for all of your work putting this tool together! It's quite handy!
As I’ve put it on a couple servers, I’ve started noticing that the failed job check -1 misses job failures.
The check looks at msdb..sysjobhistory based on the check’s {LAST_CHECK_DATE}.
The function msdb.dbo.agent_datetime(run_date, run_time) returns the time that the job started executing. The history row is only written once the step finishes, but it carries the start time.
If the failed job check runs once a minute, it will only alert for a job that started after {LAST_CHECK_DATE}. In other words, it will only alert if a job ran for less than 1 minute and failed. If a job runs for an hour and then fails, the check will not catch it.
My suggestion is to update the check and report to add the run_duration to the job start time and then compare that calculated column to {LAST_CHECK_DATE} instead.
For example:
select @output = count(*)
from msdb.dbo.sysjobhistory
where DATEADD(second, DATEDIFF(second, GETDATE(), GETUTCDATE()), -- convert to UTC
        dateadd(ss, -- add seconds for run_duration
            (  CAST(SUBSTRING(right('0000000' + convert(varchar(7), run_duration), 7), 1, 3) AS INT) * 60 * 60
             + CAST(SUBSTRING(right('0000000' + convert(varchar(7), run_duration), 7), 4, 2) AS INT) * 60
             + CAST(SUBSTRING(right('0000000' + convert(varchar(7), run_duration), 7), 6, 2) AS INT)
            ),
            msdb.dbo.agent_datetime(run_date, run_time))) -- add seconds to job start date/time
      >= dateadd(second, -10, '{LAST_CHECK_DATE}')
  and run_status = 0
  and step_id <> 0
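As a side note, the run_duration conversion could probably also be done with integer arithmetic instead of string slicing. This is only a rough sketch, assuming run_duration is the HHMMSS integer that the msdb documentation describes. The sample values below are made up for illustration and do not come from a real server.

```sql
-- Hypothetical sample values; run_duration is an int in HHMMSS form.
SELECT d.run_duration,
       (d.run_duration / 10000) * 3600          -- hours part
     + ((d.run_duration / 100) % 100) * 60      -- minutes part
     + (d.run_duration % 100) AS duration_seconds
FROM (VALUES (5), (130), (10000), (1234530)) AS d(run_duration);
-- 5       ->      5  (5 seconds)
-- 130     ->     90  (1 minute 30 seconds)
-- 10000   ->   3600  (1 hour)
-- 1234530 -> 445530  (123 hours 45 minutes 30 seconds)
```

The resulting number of seconds would then be passed to dateadd in the same place as the SUBSTRING expression above.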