今天试验了一把MySQL MMM.(MySQL Master-Master Replication Manager)一个主主复制的管理工具,它提供了主机监控,MySQL服务监控,复制线程等多个方位的监控,使用VIP自动漂移,可以对应用程序实现透明。
下面是试验步骤:
OS: Ubuntu 11.04 X64 DB1:192.168.1.186 (MASTER1) DB2:192.168.1.187 (MASTER2) DB3:192.168.1.184 (SLAVE) Monitor: 192.168.1.175 虚拟ip:192.168.1.200(writer),192.168.1.201/202(reader)
MySQL V5.1.50(源码安装)
MMM V2.2.1 下载地址:http://mysql-mmm.org/downloads
安装部分参考地址:参考地址:http://mysql-mmm.org/mmm2:guide
Note:在建立复制环境时需要注意的几点:
1、server_id 要设置不同 2、Auto_increment_offset=1 Auto_increment_increment=2 (master1 上设置) Auto_increment_offset=2 Auto_increment_increment=2 (master2上设置)
可以防止在某master假死并恢复的瞬间,两个“master” 都出现数据的更新,导致复制失败
3、master上都要启用 log_slave_updates,log_bin
MMM 安装:
所需用户:
monitor user | used by the mmm monitor to check the health of the MySQL servers | REPLICATION CLIENT |
agent user | used by the mmm agent to change read-only mode, replication master, etc. | SUPER, REPLICATION CLIENT, PROCESS |
relication user | used for replication | REPLICATION SLAVE |
perl 依赖包安装:
(DB需要安装部分:)
monitoring 安装部分:
perl -MCPAN -e shell cpan> install Algorithm::Diff cpan> install Class::Singleton cpan> install Log::Dispatch cpan> install Log::Log4perl cpan> install Mail::Send cpan> install Proc::Daemon cpan> install Thread::Queue cpan> install Time::HiRes cpan> install DBI cpan>install DBD::mysql
下载的二进制包,直接make install (在所有机器上)
MMM主要的功能通过下面三个脚本来实现:
mmm_mond
监控进程,负责所有的监控工作,决定和处理所有节点角色活动
mmm_agentd
运行在每个mysql服务器上的代理进程,完成监控的探针工作和执行简单的远端服务设置
mmm_control
一个简单的脚本,提供管理mmm_mond进程的命令
配置文件部分:
active_master_role writer <host default> cluster_interface eth0 pid_path /var/run/mmm_agentd.pid bin_path /usr/lib/mysql-mmm/ replication_user abc replication_password MMM agent_user mmm_agent agent_password MMM mysql_port 3308 </host> <host db1> ip 192.168.1.186 mode master peer db2 </host> <host db2> ip 192.168.1.187 mode master peer db1 </host> <host db3> ip 192.168.1.184 mode slave </host> <role writer> hosts db2,db1 ips 192.168.1.200 mode exclusive </role> <role reader> hosts db2,db3 ips 192.168.1.201,192.168.1.202 mode balanced </role>
monitoring 部分:
active_master_role writer <host default> cluster_interface eth0 pid_path /var/run/mmm_agentd.pid bin_path /usr/lib/mysql-mmm/ replication_user abc replication_password MMM agent_user mmm_agent agent_password MMM mysql_port 3308 </host> <host db1> ip 192.168.1.186 mode master peer db2 </host> <host db2> ip 192.168.1.187 mode master peer db1 </host> <host db3> ip 192.168.1.184 mode slave </host> <role writer> hosts db2,db1 ips 192.168.1.200 mode exclusive </role> <role reader> hosts db2,db3 ips 192.168.1.201,192.168.1.202 mode balanced </role> 安装好后: mmm_control show 查看状态:
db1(192.168.1.186) master/ONLINE. Roles: writer(192.168.1.200) db2(192.168.1.187) master/ONLINE. Roles: reader(192.168.1.201) db3(192.168.1.184) slave/ONLINE. Roles: reader(192.168.1.202)
测试一、db2 宕机,恢复情况:
日志分析:
2012/08/23 17:02:59 WARN Check 'rep_backlog' on 'db2' is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.1.187:3308, user = mmm_monitor)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111 2012/08/23 17:02:59 WARN Check 'rep_threads' on 'db2' is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.1.187:3308, user = mmm_monitor)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111 2012/08/23 17:03:09 ERROR Check 'mysql' on 'db2' has failed for 10 seconds! Message: ERROR: Connect error (host = 192.168.1.187:3308, user = mmm_monitor)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111 2012/08/23 17:03:09 ERROR Check 'rep_threads' on 'db1' has failed for 10 seconds! Message: ERROR: Replication is broken 2012/08/23 17:03:10 FATAL State of host 'db2' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK) 2012/08/23 17:03:10 INFO Removing all roles from host 'db2': 2012/08/23 17:03:10 INFO Removed role 'reader(192.168.1.201)' from host 'db2' 2012/08/23 17:03:10 INFO Orphaned role 'reader(192.168.1.201)' has been assigned to 'db3'
其中我们可以看到,monitor 通过10次 mysql检查,判定DB2已经从 online状态到 HARD_OFFLINE 主机存在,但是数据库已“死”,DB1的线程也已经停止,从 DB2上移除 reader角色到 DB3.
当把DB2启动起来之后,它的状态会变为:changed from HARD_OFFLINE to AWAITING_RECOVERY,
Check 'rep_threads' on 'db1' is ok! mmm_control set_online db2 手动变为online状态,它会重新获得reader角色: Admin changed state of 'db2' from AWAITING_RECOVERY to ONLINE INFO Moving role 'reader(192.168.1.202)' from host 'db3' to host 'db2'
测试二:
DB1宕机, DB2应该变为writer,db3 slave 把主定向为db2.
日志分析:
2012/08/23 17:14:51 WARN Check 'rep_threads' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.1.186:3308, user = mmm_monitor)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111 2012/08/23 17:14:51 WARN Check 'rep_backlog' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.1.186:3308, user = mmm_monitor)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111 2012/08/23 17:15:00 ERROR Check 'mysql' on 'db1' has failed for 10 seconds! Message: ERROR: Connect error (host = 192.168.1.186:3308, user = mmm_monitor)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111 2012/08/23 17:15:01 ERROR Check 'rep_threads' on 'db2' has failed for 10 seconds! Message: ERROR: Replication is broken 2012/08/23 17:15:01 ERROR Check 'rep_threads' on 'db3' has failed for 10 seconds! Message: ERROR: Replication is broken 2012/08/23 17:15:02 FATAL State of host 'db1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK) 2012/08/23 17:15:02 INFO Removing all roles from host 'db1': 2012/08/23 17:15:02 INFO Removed role 'writer(192.168.1.200)' from host 'db1' 2012/08/23 17:15:02 INFO Orphaned role 'writer(192.168.1.200)' has been assigned to 'db2' 2012/08/23 17:15:06 FATAL State of host 'db3' changed from ONLINE to REPLICATION_FAIL 2012/08/23 17:15:06 INFO Removing all roles from host 'db3': 2012/08/23 17:15:06 INFO Removed role 'reader(192.168.1.201)' from host 'db3' 2012/08/23 17:15:06 INFO Check 'rep_threads' on 'db3' is ok! 2012/08/23 17:15:06 INFO Orphaned role 'reader(192.168.1.201)' has been assigned to 'db2' 2012/08/23 17:15:09 FATAL State of host 'db3' changed from REPLICATION_FAIL to ONLINE 2012/08/23 17:15:09 INFO Moving role 'reader(192.168.1.202)' from host 'db2' to host 'db3'
Monitor 检测到 DB2和DB3 与DB1 之间的复制已经停止,10次检查后,确定DB1 已经 “死掉”,将它的writer ip 转移至 DB2,db3 复制线程停止后,将其拥有的 reader ip摘除,当 db3 重新与 DB2建立主从关系后,又获得 reader 角色;
等DB1恢复后, mmm_control show
db1(192.168.1.186) master/ONLINE. Roles: db2(192.168.1.187) master/ONLINE. Roles: reader(192.168.1.201), writer(192.168.1.200) db3(192.168.1.184) slave/ONLINE. Roles: reader(192.168.1.202)
手动将DB1获得writer角色:
mmm_control move_role writer db1 db1(192.168.1.186) master/ONLINE. Roles: writer(192.168.1.200) db2(192.168.1.187) master/ONLINE. Roles: reader(192.168.1.201) db3(192.168.1.184) slave/ONLINE. Roles: reader(192.168.1.202)
在测试的时候,间隔要超过60s,原因是:MMM防抖动检测:
当主机从Online 状态转换为HARD_OFFLINE
/REPLICATION_FAIL
/REPLICATION_DELAY,又从这些状态转换为online(在小于60s的情况下)
MMM 会阻止这种情况的发生,
我们可以通过配置文件配置它,如果一个主机在flap_duration时间内宕掉了flap_count次,就认为主机处理flap状态,这个主机就不会自动被设置为ONLINE状 态,它将一直处于AWAITING_RECOVERY状态除非手动设置online(mmm_control set online host)。如果auto_set_online>0,处于flapping的主机在flap_duration时间后将制动被设置为ONLINE 状态。
英文参考地址:http://mysql-mmm.org/mysql-mmm.html
中文参考地址:http://linuxguest.blog.51cto.com/195664/578311
转载于:https://blog.51cto.com/weipengfei/971419