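====Simulating MHA automatic failover when the current master's MySQL service is shut down while the slaves' relay logs differ
====1. Failover test without master_ip_failover_script set. Expected result: automatic failover succeeds.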
=============================================
1) On 241, create a relay log inconsistency by stopping the slave IO thread:
mysql> stop slave io_thread;
Query OK, 0 rows affected (0.02 sec)
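To confirm that the slaves really diverge once the IO thread on 241 is stopped, compare how far each slave has read from the master. A minimal sketch, assuming the mysql client is available on the manager host and credentials come from ~/.my.cnf; read_position, slave_position and compare_slaves are hypothetical helper names, not part of MHA.

```shell
#!/bin/sh
# Hypothetical helpers for checking relay log divergence between slaves.
# Assumes mysql client credentials are supplied via ~/.my.cnf.

# Read SHOW SLAVE STATUS\G output on stdin and print
# "<Master_Log_File>:<Read_Master_Log_Pos>", i.e. how far the IO thread got.
read_position() {
  awk -F': ' '
    $1 ~ /^ *Master_Log_File$/     { file = $2 }
    $1 ~ /^ *Read_Master_Log_Pos$/ { pos = $2 }
    END { print file ":" pos }
  '
}

# Query one slave and print its IO thread read position.
slave_position() {
  mysql -h "$1" -e 'SHOW SLAVE STATUS\G' | read_position
}

# Print "<host> <position>" for every host given as an argument.
compare_slaves() {
  for h in "$@"; do
    echo "$h $(slave_position "$h")"
  done
}
```

After writing a few rows on the master, `compare_slaves <slave_a> <slave_b>` should show an older position for the slave whose IO thread is stopped (host names are placeholders; the notes give only "241").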
2) On the original master 108, shut down MySQL:
[root@rac1 script]# mysqladmin -uroot -p shutdown
On the manager host, the log contains no new entries:
tail -30 /var/log/masterha/app1/app1.log
Thu May 17 18:04:09 2012 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Fri May 18 10:26:04 2012 - [info] Got terminate signal. Exit.
It turns out the MHA manager was not running:
[root@racdb ~]# masterha_check_status --conf=/etc/app1.cnf
app1 is stopped(2:NOT_RUNNING).
Start it as a background process:
[root@racdb mha4mysql-manager-0.53]# nohup masterha_manager --conf=/etc/app1.cnf --remove_dead_master_conf < /dev/null > /var/log/masterha/app1/app1.log 2>&1 &
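Since the failover test only makes sense while the manager is monitoring, it helps to check the status before shutting down the master and to start the manager only when needed. A sketch; it assumes the number in parentheses in the masterha_check_status output (2 in "stopped(2:NOT_RUNNING)") is 0 when the manager is monitoring normally (PING_OK). The function names are hypothetical.

```shell
#!/bin/sh
# Extract the numeric status from masterha_check_status output, e.g.
# "app1 is stopped(2:NOT_RUNNING)." -> 2. Assumption: 0 means PING_OK.
mha_status_code() {
  sed -n 's/.*(\([0-9][0-9]*\):[A-Z_]*).*/\1/p'
}

# Start masterha_manager in the background unless it is already running.
ensure_manager_running() {
  conf=$1   # e.g. /etc/app1.cnf
  log=$2    # e.g. /var/log/masterha/app1/app1.log
  code=$(masterha_check_status --conf="$conf" | mha_status_code)
  if [ "$code" = "0" ]; then
    echo "manager already running"
    return 0
  fi
  # Same command line as used in these notes.
  nohup masterha_manager --conf="$conf" --remove_dead_master_conf \
    < /dev/null > "$log" 2>&1 &
  echo "manager started; check $log"
}
```

If the manager exits right after starting, as it did here, running `masterha_check_repl --conf=/etc/app1.cnf` first should show why; the manager appears to run the same configuration checks at startup.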
The log shows that startup failed and the manager exited.
3) Restart MySQL on 108 and run CHANGE MASTER TO with master_host, master_log_file and master_log_pos,
but without master_user and master_password. The replication check then reports the following error:
[root@racdb ~]# masterha_check_repl --conf=/etc/app1.cnf
Wed May 23 11:55:18 2012 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/Server.pm, ln381] 192.168.133.108(192.168.133.108:3306): User does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.
Wed May 23 11:55:18 2012 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln383] Error happend on checking configurations. at /usr/lib/perl5/site_perl/5.8.8/MHA/ServerManager.pm line 1305
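4) Adding master_user and master_password to CHANGE MASTER TO resolves the "user does not exist" problem. Re-running the replication check then reports the error below, even though replication itself is working: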
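The fix in step 4) is to give CHANGE MASTER TO the replication account as well as the binlog coordinates. A sketch that builds and applies the statement; the host, account, password and coordinates are placeholders because the notes do not record the real values, and mysql client credentials are assumed to come from ~/.my.cnf. The helper names are hypothetical.

```shell
#!/bin/sh
# Build a CHANGE MASTER TO statement that includes MASTER_USER and
# MASTER_PASSWORD. Arguments: master host, replication user,
# replication password, master log file, master log position.
change_master_sql() {
  printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
    "$1" "$2" "$3" "$4" "$5"
}

# Re-point the server given as $1 (e.g. the old master 108) at the new
# master and restart replication. Remaining arguments as for change_master_sql.
reattach_as_slave() {
  target=$1
  shift
  { echo 'STOP SLAVE;'; change_master_sql "$@"; echo 'START SLAVE;'; } |
    mysql -h "$target"
}
```

Usage, with placeholder values: `reattach_as_slave 192.168.133.108 <new_master_ip> <repl_user> '<repl_password>' <binlog_file> <binlog_pos>`, then re-run masterha_check_repl.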
[root@racdb ~]# masterha_check_repl --conf=/etc/app1.cnf
Wed May 23 12:03:20 2012 - [info] Connecting to root@192.168.133.108(192.168.133.108:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to rac1-relay-bin.000002
Temporary relay log file is /var/lib/mysql/rac1-relay-bin.000002
Testing mysql connection and privileges..ERROR 1142 (42000) at line 1: CREATE command denied to user 'root'@'rac1.mttang.com' for table 'apply_diff_relay_logs_test'
mysql command failed with rc 1:0!
at /usr/bin/apply_diff_relay_logs line 348
main::check() called at /usr/bin/apply_diff_relay_logs line 467
eval {...} called at /usr/bin/apply_diff_relay_logs line 447
main::main() called at /usr/bin/apply_diff_relay_logs line 110
Wed May 23 12:03:20 2012 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln193] Slaves settings check failed!
Wed May 23 12:03:20 2012 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln372] Slave configuration failed.
Wed May 23 12:03:20 2012 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterMonitor.pm, ln383] Error happend on checking configurations. at
Later the error disappeared on its own without any changes being made. (Recorded as occurrence 1.)
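ERROR 1142 names 'root'@'rac1.mttang.com', which suggests the server resolved the client IP to a hostname and checked privileges against a hostname-based account rather than the one expected. A small diagnostic sketch with hypothetical helper names: one pulls the account out of the error text, the other shows which account a TCP connection actually matches (USER() is the name the client requested, CURRENT_USER() is the account the server authenticated). Credentials are assumed to come from ~/.my.cnf.

```shell
#!/bin/sh
# Extract the 'user'@'host' part from a MySQL "command denied" error line.
denied_account() {
  sed -n "s/.*command denied to user \('[^']*'@'[^']*'\).*/\1/p"
}

# Connect over TCP to $1 and show which account the server matched.
# Run it on the slave itself, connecting to its own IP, to mimic how
# apply_diff_relay_logs connects during the check.
show_matched_account() {
  mysql -h "$1" -u root -e 'SELECT USER(), CURRENT_USER(); SHOW GRANTS;'
}
```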
5) Manually shut down the master 192.168.133.109 again.
app1.log shows the following error:
Wed May 23 17:13:37 2012 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterFailover.pm, ln295] Last failover was done at 2012/05/23 11:04:38. Current time is too early to do failover again. If you want to do failover, manually remove /var/log/masterha/app1/app1.failover.complete and run this script. again.
Wed May 23 17:13:37 2012 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/bin/masterha_master_switch line 53
Solution:
rm -f /var/log/masterha/app1/app1.failover.complete
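Re-run the manual switch (or skip it and simply start the manager); the failover then succeeds.
Next, continue by testing automatic failover with the manager running and no master_ip_failover_script set. Note that because masterha_manager runs with --remove_dead_master_conf, the server1 block has already been removed from app1.cnf after the failover, so the check does not complain about it. To bring the host back into the configuration, add it to /etc/app1.cnf dynamically: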
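The marker file is what blocks the second failover: as far as we understand the MHA documentation, MHA refuses to fail over again if the previous failover finished less than last_failover_minute ago (480 minutes by default), and 11:04:38 to 17:13:37 is only about 369 minutes. A sketch of the cleanup step; clear_failover_marker is a hypothetical helper, and the work directory and application name come from the log above.

```shell
#!/bin/sh
# Remove the "<app>.failover.complete" marker from the MHA work directory.
# $1 = manager work directory, $2 = application name (e.g. app1).
clear_failover_marker() {
  marker="$1/$2.failover.complete"
  if [ -f "$marker" ]; then
    rm -f "$marker"
    echo "removed $marker"
  else
    echo "no marker at $marker"
  fi
}
```

Usage: `clear_failover_marker /var/log/masterha/app1 app1`. Alternatively, masterha_manager can be started with --ignore_last_failover so the marker is not consulted.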
masterha_conf_host --command=add --conf=/etc/app1.cnf --hostname=192.168.133.109 --block=server1 --params="candidate_master=1;ignore_fail=1"
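Because --remove_dead_master_conf deletes the block again after every failover, this add step has to be repeated on each test cycle. A sketch that only runs it when the [server1] section is missing; has_block and readd_server1 are hypothetical helpers, and the hostname and parameters are the ones from the command above.

```shell
#!/bin/sh
# Return 0 if the MHA config file ($1) already has a [$2] section.
has_block() {
  grep -q "^\[$2]" "$1"
}

# Re-add 192.168.133.109 as server1 only if the block is missing.
readd_server1() {
  conf=${1:-/etc/app1.cnf}
  if has_block "$conf" server1; then
    echo "[server1] already present in $conf"
  else
    masterha_conf_host --command=add --conf="$conf" \
      --hostname=192.168.133.109 --block=server1 \
      --params="candidate_master=1;ignore_fail=1"
  fi
}
```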
Repeating the steps above up to step 4) reproduces the problem. If skip-name-resolve is not set on 192.168.133.109, then after MySQL is restarted the check reports:
CREATE command denied to user 'root'@'rac2.mttang.com' for table 'apply_diff_relay_logs_test'
mysql command failed with rc 1:0!
at /usr/bin/apply_diff_relay_logs line 348
main::check() called at /usr/bin/apply_diff_relay_logs line 467
eval {...} called at /usr/bin/apply_diff_relay_logs line 447
main::main() called at /usr/bin/apply_diff_relay_logs line 110
If skip-name-resolve is set in my.cnf on 192.168.133.109 instead, the following error is reported:
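Which of the two errors appears depends on whether skip-name-resolve is really active, so it is worth confirming the option in the [mysqld] section before restarting. With it on, accounts are matched by IP only, which is why an IP-based root account becomes necessary. A sketch under the assumption that the option lives in a single my.cnf (included files are not followed); has_skip_name_resolve is a hypothetical helper.

```shell
#!/bin/sh
# Return 0 if skip-name-resolve (or skip_name_resolve) is set in the
# [mysqld] section of the option file given as $1.
has_skip_name_resolve() {
  awk '
    /^[ \t]*\[/ { in_mysqld = ($0 ~ /^[ \t]*\[mysqld\]/); next }
    in_mysqld && /^[ \t]*skip[-_]name[-_]resolve[ \t]*(=[ \t]*(1|ON|on))?[ \t]*$/ { found = 1 }
    END { exit !found }
  ' "$1"
}
```

After the restart, `SHOW VARIABLES LIKE 'skip_name_resolve';` on the server confirms the running value.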
Thu May 24 14:46:01 2012 - [info] Checking SSH publickey authentication and checking recovery script. configurations on all alive slave servers..
Thu May 24 14:46:01 2012 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=192.168.133.109 --slave_ip=192.168.133.109 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.20-log --manager_version=0.53 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Thu May 24 14:46:01 2012 - [info] Connecting to root@192.168.133.109(192.168.133.109:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to rac2-relay-bin.000002
Temporary relay log file is /var/lib/mysql/rac2-relay-bin.000002
Testing mysql connection and privileges..ERROR 1130 (HY000): Host '192.168.133.109' is not allowed to connect to this MySQL server
mysql command failed with rc 1:0!
at /usr/bin/apply_diff_relay_logs line
Solution:
grant all privileges on *.* to 'root'@'192.168.133.109' identified by '12345678';
With this, the ERROR 1142 seen during MHA failover is resolved: set skip-name-resolve in my.cnf and grant privileges to the IP-based root account.
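Judging from the log, the grant has to exist on 192.168.133.109 itself, because apply_diff_relay_logs runs there over SSH and connects to that server from the same IP. A sketch that builds the statement with the password passed in rather than written into scripts; grant_root_sql and apply_grant are hypothetical helpers, and local credentials are assumed to come from ~/.my.cnf. The GRANT ... IDENTIFIED BY form works on the 5.5.20 server used here, but MySQL 8.0 no longer accepts it; there the account must be created with CREATE USER first.

```shell
#!/bin/sh
# Build the GRANT for an IP-based root account (MySQL 5.5 syntax).
# $1 = client IP, $2 = password.
grant_root_sql() {
  printf "GRANT ALL PRIVILEGES ON *.* TO 'root'@'%s' IDENTIFIED BY '%s';\n" "$1" "$2"
}

# Apply it through the local socket; run this on the server itself.
apply_grant() {
  grant_root_sql "$1" "$2" | mysql -u root
}
```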