In SQL Server, replication is a widely adopted technology for the purpose of real-time data replication between SQL servers. It serves the purpose of keeping data consistent between multiple end points. So the last thing we want to hear from clients is that data is no longer in synch, and as DBA, we don't want to be the last person to realize replication is out of order or broken. So here are some steps you can take to monitor SQL Server replication.
This article is written for and tested in SQL Server 2008.
Pre-requisites for setting up SQL Server monitoring
Of course, the monitoring mechanism I am introducing here is merely one of the many ways to monitor replication. Below is what's needed to set up the monitoring jobs:
1. Database Mail
Make sure your database mail is set up properly and is sending out emails without a problem.
2. SQL Server Agent up and running
What is used to monitor replication?
I am only using three system tables to monitor the replication latency and condition. They are dynamic management viewsys.dm_os_performance_counters, system msdb..sysjobhistory and msdb..sysjobs. In conjunction with the usage of database mail, we can set up replication monitor in five minutes and get notified whenever replication is out of synch or the replication agent fails on execution due to some unexpected cause.
Step 1: Test database mail functionality
Before we setup SQL Agent job for the replication monitoring, open up SQL Server Management Studio and Navigate down to Database Mail
Right click on Database Mail then select "Send Test E-Mail"
In the "To" column, fill in your email and click Send Test E-Mail. This is to ensure that your database mail is functioning normally. You should get a testing email within minutes.
Now we have confirmed that database mail is working. We can proceed with setting up SQL Agent job to monitor replication.
Step 2: Monitoring job #1 - Latency monitoring
In replication monitoring, I usually like to gauge the performance of replication by checking the latency from publisher to distributor then from distributor to subscriber. The latency from Publisher to Distributor can be found through Performance Monitor's counter "LogReader-Delivery Latency," and the latency from Distributor to Subscriber can be found through PerfMon counter "Dist:Delivery Latency." Below are the pictures when searching both counters.
LogReader: Delivery Latency
Dist: Delivery Latency
Starting with SQL 2005, we can directly accessing the LogReader and Distribution delivery latency by querying dynamic management viewsys.dm_os_performance_counters.
To get LogReader Delivery Latency counter value, login into Distributor SQL instance and execute the query below:
select * from sys.dm_os_performance_counters where object_name like '%Replica%'
and counter_name like '%Logreader:%latency%'
and counter_name like '%Logreader:%latency%'
For Distributor Delivery Latency counter value, login into Distributor SQL instance and execute the query below:
select * from sys.dm_os_performance_counters where object_name like '%Replica%'
and counter_name like '%Dist%latency%'
and counter_name like '%Dist%latency%'
Only the SQL Server instance that acts as replication Distributor will provide useful information for delivery latency.
To set up the first monitoring job, follow the steps below:
1. Bring up SQL Server Management Studio
2. Connect to Replication distributor SQL instance
3. Create a new job, then copy the script below into job step command
4. Setup the schedule to run the job every 30 minutes or adjust the frequency according to your environment
5. Update @Tolist and @CClist to the correct email recipients; you can separate the email addresses with a semi-colon (;)
---------------------------------------------------------------------------
--Define monitoring threshold in the scale of minutes
---------------------------------------------------------------------------
declare @minutes int, @threshold int
set @minutes = 30 --> Here is where you define how many minutes latency you would like to be notified
set @threshold = @minutes * 60 * 1000
---------------------------------------------------------------------------
--Specify email distribution list, To and CC
---------------------------------------------------------------------------
declare @Tolist varchar(100)
set @Tolist = 'tiger@companyname123.com'
declare @CClist varchar(100)
set @CClist = 'lion@companyname123.com'
---------------------------------------------------------------------------------------------
--Specify the email subject, @@servername will pick up replication distributor server name
---------------------------------------------------------------------------------------------
declare @mailsubject varchar(100)
set @mailsubject = @@SERVERNAME + ' Replication Latency'
--select datename(hh,GETDATE())
if exists (
select top 1 1 from sys.dm_os_performance_counters where object_name like '%Replica%'
and counter_name like '%Logreader:%latency%' and cntr_value > @threshold
union
select top 1 1 from sys.dm_os_performance_counters where object_name like '%Replica%'
and counter_name like '%Dist%latency%' and cntr_value > @threshold)
begin
DECLARE @tableHTML NVARCHAR(MAX) ;
SET @tableHTML =
N'<H1>'+@@SERVERNAME+' Replication Latency</H1>' +
N'<table border="1">' +
N'<tr><th>Object Name</th><th>Counter Name</th>' +
N'<th>Instance Name</th><th>latency in sec</th>' +
CAST ( ( SELECT td = object_name, '',
td = counter_name, '',
td = instance_name, '',
td = latency_sec, ''
FROM
(select object_name, counter_name, instance_name, round(cntr_value/1000,0) as latency_sec from sys.dm_os_performance_counters where object_name like '%Replica%'
and counter_name like '%Logreader:%latency%' and cntr_value > @threshold
union
select object_name, counter_name, instance_name, round(cntr_value/1000,0) as latency_sec from sys.dm_os_performance_counters where object_name like '%Replica%'
and counter_name like '%Dist%latency%' and cntr_value > @threshold) a
FOR XML PATH('tr'), TYPE
) AS NVARCHAR(MAX) ) +
N'</table>' ;
exec msdb.dbo.sp_send_dbmail
@recipients= @Tolist,
@copy_recipients = @CClist,
@subject = @mailsubject ,
@body_format ='HTML',
@body = @tableHTML
END
--Define monitoring threshold in the scale of minutes
---------------------------------------------------------------------------
declare @minutes int, @threshold int
set @minutes = 30 --> Here is where you define how many minutes latency you would like to be notified
set @threshold = @minutes * 60 * 1000
---------------------------------------------------------------------------
--Specify email distribution list, To and CC
---------------------------------------------------------------------------
declare @Tolist varchar(100)
set @Tolist = 'tiger@companyname123.com'
declare @CClist varchar(100)
set @CClist = 'lion@companyname123.com'
---------------------------------------------------------------------------------------------
--Specify the email subject, @@servername will pick up replication distributor server name
---------------------------------------------------------------------------------------------
declare @mailsubject varchar(100)
set @mailsubject = @@SERVERNAME + ' Replication Latency'
--select datename(hh,GETDATE())
if exists (
select top 1 1 from sys.dm_os_performance_counters where object_name like '%Replica%'
and counter_name like '%Logreader:%latency%' and cntr_value > @threshold
union
select top 1 1 from sys.dm_os_performance_counters where object_name like '%Replica%'
and counter_name like '%Dist%latency%' and cntr_value > @threshold)
begin
DECLARE @tableHTML NVARCHAR(MAX) ;
SET @tableHTML =
N'<H1>'+@@SERVERNAME+' Replication Latency</H1>' +
N'<table border="1">' +
N'<tr><th>Object Name</th><th>Counter Name</th>' +
N'<th>Instance Name</th><th>latency in sec</th>' +
CAST ( ( SELECT td = object_name, '',
td = counter_name, '',
td = instance_name, '',
td = latency_sec, ''
FROM
(select object_name, counter_name, instance_name, round(cntr_value/1000,0) as latency_sec from sys.dm_os_performance_counters where object_name like '%Replica%'
and counter_name like '%Logreader:%latency%' and cntr_value > @threshold
union
select object_name, counter_name, instance_name, round(cntr_value/1000,0) as latency_sec from sys.dm_os_performance_counters where object_name like '%Replica%'
and counter_name like '%Dist%latency%' and cntr_value > @threshold) a
FOR XML PATH('tr'), TYPE
) AS NVARCHAR(MAX) ) +
N'</table>' ;
exec msdb.dbo.sp_send_dbmail
@recipients= @Tolist,
@copy_recipients = @CClist,
@subject = @mailsubject ,
@body_format ='HTML',
@body = @tableHTML
END
Below is the sample email notification you will receive by setting up the above monitoring:
This monitoring notification will provide the latency in seconds for each agent. You can adjust the threshold of the latency by setting the value for @minutes at the beginning of the above script; instead of "set @minutes = 30" which sets monitoring threshold as 30 minutes, you can reduce it to 5 minutes or increase to 60 minutes.
Step 3: Monitoring job #2 à Agent job status monitoring
Just setting up monitoring on delivery latency is not enough; we also need to make sure that all the agents are running without issue. The job below will help.
---------------------------------------------------------------------------
--How much time you want to trace back on agent failure
--Below example, I am tracing back job failure for current day and past 30 min
---------------------------------------------------------------------------
declare @time time
set @time = dateadd(n,-30,getdate()) -- Here I am setting to trace back only past 30 minutes
declare @date date
set @date = convert(date,getdate()) -- The job failure trace back is defined on current day
---------------------------------------------------------------------------
--Specify the publisher SQL instance name
---------------------------------------------------------------------------
declare @publisher varchar(100) --
set @publisher = 'PUBLISHER_SERVERNAME'
---------------------------------------------------------------------------
--Specify email distribution list, To and CC
---------------------------------------------------------------------------
declare @Tolist varchar(100)
set @Tolist = 'tiger@companyname123.com'
declare @CClist varchar(100)
set @CClist = 'lion@companyname123.com'
---------------------------------------------------------------------------------------------
--Specify the email subject, @@servername will pick up replication distributor server name
---------------------------------------------------------------------------------------------
declare @mailsubject varchar(100)
set @mailsubject = @@SERVERNAME + ' Replication failure'
if exists (
select distinct b.name,a.run_date, right('00000'+convert(varchar,a.run_time),6),message from msdb..sysjobhistory a inner join msdb..sysjobs b on a.job_id = b.job_id
where b.name like @publisher+'%' and run_status <> 1 and message like '%error%'
and convert(date,convert(varchar,a.run_date ))= @date
and right('00000'+convert(varchar,a.run_time),6) > replace(convert(varchar(8),@time),':','')
)
begin
DECLARE @tableHTML NVARCHAR(MAX) ;
SET @tableHTML =
N'<H1>'+@@SERVERNAME+' Replication Agent Failed</H1>' +
N'<table border="1">' +
N'<tr><th>Job Name</th><th>Run Date</th>' +
N'<th>Run Time</th><th>Failure Message</th>' +
CAST ( ( SELECT td = name, '',
td = run_date, '',
td = run_time, '',
td = message, ''
FROM
(select name, MAX(run_date) as run_date, MAX(run_time) as run_time, message from
(select distinct b.name,a.run_date, run_time = right('00000'+convert(varchar,a.run_time),6),message from msdb..sysjobhistory a inner join msdb..sysjobs b on a.job_id = b.job_id
where b.name like @publisher+'%' and run_status <> 1 and message like '%error%'
and convert(date,convert(varchar,a.run_date ))= @date
and right('00000'+convert(varchar,a.run_time),6) > replace(convert(varchar(8),@time),':','') ) a
group by name, message
) a
FOR XML PATH('tr'), TYPE
) AS NVARCHAR(MAX) ) +
N'</table>' ;
exec msdb.dbo.sp_send_dbmail
@recipients= @Tolist,
@copy_recipients = @CClist,
@subject = @mailsubject ,
@body_format ='HTML',
@body = @tableHTML
end
--How much time you want to trace back on agent failure
--Below example, I am tracing back job failure for current day and past 30 min
---------------------------------------------------------------------------
declare @time time
set @time = dateadd(n,-30,getdate()) -- Here I am setting to trace back only past 30 minutes
declare @date date
set @date = convert(date,getdate()) -- The job failure trace back is defined on current day
---------------------------------------------------------------------------
--Specify the publisher SQL instance name
---------------------------------------------------------------------------
declare @publisher varchar(100) --
set @publisher = 'PUBLISHER_SERVERNAME'
---------------------------------------------------------------------------
--Specify email distribution list, To and CC
---------------------------------------------------------------------------
declare @Tolist varchar(100)
set @Tolist = 'tiger@companyname123.com'
declare @CClist varchar(100)
set @CClist = 'lion@companyname123.com'
---------------------------------------------------------------------------------------------
--Specify the email subject, @@servername will pick up replication distributor server name
---------------------------------------------------------------------------------------------
declare @mailsubject varchar(100)
set @mailsubject = @@SERVERNAME + ' Replication failure'
if exists (
select distinct b.name,a.run_date, right('00000'+convert(varchar,a.run_time),6),message from msdb..sysjobhistory a inner join msdb..sysjobs b on a.job_id = b.job_id
where b.name like @publisher+'%' and run_status <> 1 and message like '%error%'
and convert(date,convert(varchar,a.run_date ))= @date
and right('00000'+convert(varchar,a.run_time),6) > replace(convert(varchar(8),@time),':','')
)
begin
DECLARE @tableHTML NVARCHAR(MAX) ;
SET @tableHTML =
N'<H1>'+@@SERVERNAME+' Replication Agent Failed</H1>' +
N'<table border="1">' +
N'<tr><th>Job Name</th><th>Run Date</th>' +
N'<th>Run Time</th><th>Failure Message</th>' +
CAST ( ( SELECT td = name, '',
td = run_date, '',
td = run_time, '',
td = message, ''
FROM
(select name, MAX(run_date) as run_date, MAX(run_time) as run_time, message from
(select distinct b.name,a.run_date, run_time = right('00000'+convert(varchar,a.run_time),6),message from msdb..sysjobhistory a inner join msdb..sysjobs b on a.job_id = b.job_id
where b.name like @publisher+'%' and run_status <> 1 and message like '%error%'
and convert(date,convert(varchar,a.run_date ))= @date
and right('00000'+convert(varchar,a.run_time),6) > replace(convert(varchar(8),@time),':','') ) a
group by name, message
) a
FOR XML PATH('tr'), TYPE
) AS NVARCHAR(MAX) ) +
N'</table>' ;
exec msdb.dbo.sp_send_dbmail
@recipients= @Tolist,
@copy_recipients = @CClist,
@subject = @mailsubject ,
@body_format ='HTML',
@body = @tableHTML
end
Below is the sample notification content for monitoring the replication agent status
You can set the monitoring to run every 30 minutes to check if any agent failed to run. If you have one distributor server that services for multiple publishing servers, you will want to set up one job per each publishing server and update "set @publisher = 'PUBLISHER_SERVERNAME'" to the respective publishing server name.
Conclusion
Replication monitoring can go way beyond the above setting, but if you are looking for something that's quick and easy to set up, I believe these two monitoring will serve the purpose just fine.