通常是根据 Seconds_Behind_Master 的值来判断slave的延迟。这么做大部分情况下尚可接受,但并不够准确,而应该考虑更多因素。
查看slave的状态
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.0.20
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000150
Read_Master_Log_Pos: 226326418
Relay_Log_File: abce-relay-bin.000003
Relay_Log_Pos: 77106856
Relay_Master_Log_File: mysql-bin.000149
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1046292913
Relay_Log_Space: 330964012
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 30510
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
mysql>
从查询结果可以看出,slave落后master的时间是30510秒(即Seconds_Behind_Master的值)
查看一下slave上的复制进程在做什么:
mysql> show full processlist\G
*************************** 1. row ***************************
Id: 4
User: system user
Host:
db: NULL
Command: Connect
Time: 6018
State: Waiting for master to send event
Info: NULL
*************************** 2. row ***************************
Id: 5
User: system user
Host:
db: abce
Command: Connect
Time: 30501
State: updating
Info: delete from s_item_e
where
(
ng_user_id = 23880
and dt_shift_date in
(
'2017-10-30 00:00:00'
,
'2017-10-29 00:00:00'
)
)
or
(
ng_user_id = 24853
and dt_shift_date in
(
'2017-10-30 00:00:00'
,
'2017-10-29 00:00:00'
)
)
......
sql线程在做updating操作,具体内容是从表中删除数据。Time的值是30501。通常简单的sql不会执行30501秒。
在REPLICATION进程中,Time这列的值还可能有几种情况:
1、SQL线程当前执行的binlog(实际上是relay log)中的timestamp和IO线程最新的timestamp的差值,这就是通常大家认为的Seconds_Behind_Master值,并不是某个SQL的实际执行耗时;
2、SQL线程当前如果没有活跃SQL在执行的话,Time值就是SQL线程的idle time;
而IO线程的Time值则是该线程自从启动以来的总时长(多少秒),如果系统时间在IO线程启动后发生修改的话,可能会导致该Time值异常,比如变成负数,或者非常大。
可以根据以下顺序进行延迟的监控:
1.首先比较 Read_Master_Log_Pos 和 Master_Log_File 是否有差异
2.如果 Relay_Master_Log_File 和 Master_Log_File 是一样的话,再看 Exec_Master_Log_Pos 和 Read_Master_Log_Pos 的差异,对比SQL线程比IO线程慢了多少个binlog事件;
3.如果 Relay_Master_Log_File 和 Master_Log_File 不一样,那说明延迟可能较大,需要从MASTER上取得binlog status,判断当前的binlog和MASTER上的差距;
查看主库的binary日志:
mysql> SHOW BINARY LOGS;
+------------------+------------+
| Log_name | File_size |
+------------------+------------+
| mysql-bin.000142 | 1073778812 |
| mysql-bin.000143 | 1073798740 |
| mysql-bin.000144 | 1073910052 |
| mysql-bin.000145 | 1073781405 |
| mysql-bin.000146 | 1073845859 |
| mysql-bin.000147 | 1073746199 |
| mysql-bin.000148 | 1073863295 |
| mysql-bin.000149 | 1073822822 |
| mysql-bin.000150 | 226343255 |
+------------------+------------+
9 rows in set (0.00 sec)
mysql>
计算延迟应该是mysql-bin.000149的最大事件数减去已经被执行完的事件数,
即 1073822822 - 1046292913 = 27529909 个binlog event,再加上mysql-bin.000150这个binlog已经产生的226343255个binlog event,共226343255个binlog event。即:1073822822 - 1046292913 + 226343255
查看过程中也可以设置pager,只查看关注的几个status值:
mysql> pager cat | egrep -i 'system user|Exec_Master_Log_Pos|Seconds_Behind_Master|Read_Master_Log_Pos'
PAGER set to 'cat | egrep -i 'system user|Exec_Master_Log_Pos|Seconds_Behind_Master|Read_Master_Log_Pos''
mysql>
这时有活跃SQL,Time值是30583,Seconds_Behind_Master是30565
mysql> show processlist; show slave status\G
| 4 | system user | | NULL | Connect | 7168 | Waiting for master to send event | NULL |
| 5 | system user | | abce | Connect | 30583 | updating | delete from s_item_e
3 rows in set (0.00 sec)
Read_Master_Log_Pos: 226364447
Exec_Master_Log_Pos: 1057661705
Seconds_Behind_Master: 30565
1 row in set (0.00 sec)
mysql>