from: http://www.softwareprojects.com/resources/programming/t-mysql-replication-monitor-1744.html
What is MySQL Replication
MySQL Replication is a great way to improve database performance and availability, by creating live copies of the database on multiple machines.
Read our article about How to setup MySQL Replication for more information about setting up replication.
There are two primary issues you have to deal with, when using MySQL replication:
#1. Replication can often break
When a query cannot process properly on a slave (due to duplicate key or another MySQL error), replication will stop running until you manually resume it.
To avoid replication breaking due to duplicate keys, make sure your /etc/my.cnf configuration file includes these two lines:
slave_exec_mode = IDEMPOTENT
slave-skip-error =1062
#2. Replication can lag
Since replication is serialized (slave runs all queries on a single thread, while master is multi threaded), the slave machine can often fall behind the master. This leads to inconsistency in data, especially when using a master-master replication setup.
-
The issue with monitoring MySQL Replication
The most popular method to monitor MySQL replication, is to use MySQL built-in SHOW SLAVE STATUS command.
Quote: show slave status ;
Slave_IO_State: Waiting for master to send event
Master_Log_File: mysql-bin.000017
Read_Master_Log_Pos: 572720266
Relay_Log_File: api4-relay-bin.000034
Relay_Log_Pos: 2424716
Relay_Master_Log_File: mysql-bin.000017
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Exec_Master_Log_Pos: 572720266
Relay_Log_Space: 97695432
Seconds_Behind_Master: 0
The two important fields are 'Slave_IO_Running' and 'Seconds_Behind_Master', indicating whether the slave thread is running and how many seconds it is lagging behind the master.
Unfortunately, the 'Seconds_Behind_Master' estimate is very unreliable, especially when you have a slow network or when the master/slave servers are not running on the same LAN.
The reason for the inaccuracy lies in the way MySQL calculates 'Seconds_Behind_Master'. All it is doing is subtracting the position of its internal SQL log from the IO log.
In our tests, we measured cases where a slave was 10 minutes behind the master and yet 'Seconds_Behind_Master' returned 0 seconds.
A better approach to Monitoring MySQL Replication
A more reliable approach to measure replication lag,take advantage of MySQL SYSDATE function.
If you insert NOW at 12:00:04 on the master the row will hold exactly 12:00:04 on the slave, regardless of the slave local time. However, the SYSDATE function does not follow this behavior. It always uses the value of the slave's system clock.
If you insert a row with one column holding the value of NOW or CURRENT_TIMESTAMP and the other holding the value of SYSDATE into the master, you can use the difference between the two values on the slave to see how far behind it is. If the slave is in sync the two values will be identical. If the slave is one second behind the column holding SYSDATE will be one second ahead of the column holding NOW. No polling is required to determine the current lag.
Implementation
Step 1: Create Heartbeat table
CREATE TABLE mysql_heartbeat (master_host char(50) default '' not null, master_time timestamp default CURRENT_TIMESTAMP, slave_time timestamp not null, index(master_host));
Step 2: Setup heartbeat cron
Install a script on a cronjob that will run this query on the master server, every 60 seconds
insert into mysql_heartbeat(master_host, slave_time) values('127.0.0.1',SYSDATE());
(Replace 127.0.0.1 with the master host name or ip address)
Step 3: Monitor script
Run this script to display server lag over a given time frame
select master_host,master_time,timediff(slave_time,master_time) from mysql_heartbeat where date(master_time)=date(now()) group by master_host,master_time order by master_host,master_time;