SQLWATCH-INTERNAL-CHECKS job failed every 5 minutes

I installed SQLWATCH 3.0 on three SQL database servers: two servers have SQL 2016 and one server has SQL 2012. The installation created 12 SQL Agent jobs. However, the job “SQLWATCH-INTERNAL-CHECKS” whose default schedule runs every one minute failed on all three servers every 5 minutes.

Here is the log for the failed job:
Date 7/31/2020 12:00:43 PM
Log Job History (SQLWATCH-INTERNAL-CHECKS)

Step ID 1
Server PFAIDS-TEST
Job Name SQLWATCH-INTERNAL-CHECKS
Step Name dbo.usp_sqlwatch_internal_process_checks
Duration 00:00:03
Sql Severity 16
Sql Message ID 0
Operator Emailed
Operator Net sent
Operator Paged
Retries Attempted 0

Message
Executed as user: NT AUTHORITY\SYSTEM. Processing actions for check. [SQLSTATE 01000] (Message 0) Processing actions for check. [SQLSTATE 01000] (Message 0) Processing actions for check. [SQLSTATE 01000] (Message 0) Processing actions for check. [SQLSTATE 01000] (Message 0) Processing actions for check. [SQLSTATE 01000] (Message 0) Processing actions for check. [SQLSTATE 01000] (Message 0) Processing actions for check. [SQLSTATE 01000] (Message 0) Processing actions for check. [SQLSTATE 01000] (Message 0) Processing actions for check. [SQLSTATE 01000] (Message 0) Processing actions for check. [SQLSTATE 01000] (Message 0) Processing actions for check. [SQLSTATE 01000] (Message 0) Errors during execution (usp_sqlwatch_internal_process_checks) [SQLSTATE 42000] (Error 50000) [SQLSTATE 01000] (Error 0) Processing actions for check. [SQLSTATE 01000] (Error 0) Processing actions for check. [SQLSTATE 01000] (Error 0) Processing actions for check. [SQLSTATE 01000] (Error 0). The step failed.

All other SQLWATCH agent jobs completed successfully.

Would you please let me know why the job failed on all three servers with the same pattern? How can I fix it?

thanks!

Version 3.0 is still work in progress. Please use 2.6 if you want a stable version.

As to the error, can you tell me which checks fail? They will have a status of CHECK_ERROR in the logger_check table.

There should also be some Errors in the app_log table.

Hi Marcin,

Sorry, I think SQLWATCH is version 2.5.7507.41097. I found it from table sqlwatch_app_version. I used the latest version of your dbatools to deploy it. We are new to SQLWATCH and like to test it.

The app_log shows the job failed on error_procedure “usp_sqlwatch_logger_xes_long_queries” with error message “XML parsing: line 2304, character 54, illegal name character”.

Please advise and thanks!

The problem you’re describing was actually fixed in 3.0 but 3.0 isn’t yet ready.

The error you’re seeing is because the XE session is stored in the ring buffer which isn’t not very reliable. When the content is too large, SQL Server truncates it and because it’s an XML, it means it breaks the XML.

It was fixed in 3.0 by saving the session to a file which is more resilient.

However, this has nothing to do with the subject of this thread - ie SQLWATCH INTERNAL CHECKS. The broken XML will fail the Performance collector.

Thanks for the info! Do you know when will you release the SQLWATCH 3.0?

thanks!

Hi, I’m having a similar error with the the SQLWATCH INTERNAL CHECKS job every 5 minutes. The error from the mentioned app_log table is: Invalid column name ‘last_backup_finish_date’. The process_message column has the following sql statement which appears to query for databases not in simple mode or tempdb:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED SET NOCOUNT ON SET ANSI_WARNINGS OFF select @output=isnull(max(datediff(minute,last_backup_finish_date,getdate())),999) from sys.databases d left join msdb.dbo.backupset bs on bs.database_name = d.name and bs.type = ‘L’ where d.recovery_model_desc <> ‘SIMPLE’ and d.name not in (‘tempdb’)

For this SQL instance, which is non production, we are not running any databases in full recovery mode. They are all set to simple mode. The databases were set to simple mode after installing SQLWATCH.

Edit:
I reveiwed the query,
the query references the columns ‘last_backup_finish_date’ from msdb.dbo.backupset. That column is actually named [backup_finish_date].
The following edited snippet returns the expected result:
select isnull(max(datediff(minute,backup_finish_date,getdate())),999) from sys.databases d left join msdb.dbo.backupset bs on bs.database_name = d.name and bs.type = ‘L’ where d.recovery_model_desc <> ‘SIMPLE’ and d.name not in (‘tempdb’)

I am on version: 2.5.7507.41097 , installed using dbatools.
Please let me know if there is a fix or workaround. Thanks.

PS - SQLWatch is great :slight_smile:

There’s version 2.6 but I believe this is related https://github.com/marcingminski/sqlwatch/issues/191

Yes, I confirm that I had the failure during initial installation described in issue 191. I simply installed it a second time and the result was successful.

We are talking two different issues.
The 191 is about the incorrect column, which is what you are experiencing.

There is in fact another issue with the failing checks during the installation, but this is something else.

Thanks for your replies, yes they are 2 unrelated issues. I have issue 191 and the install failure referenced in issue 198.

The other issue I believe is https://github.com/marcingminski/sqlwatch/issues/198.
It should all be hopefully fixed in the upcoming release.

(never mind, I did not see your entire response)