环境: mysql version 5.7.22 on redhat6.4 64bit
master-db1,mha-node 192.168.0.100
slave-db2,mha-node 192.168.0.105
slave-db3,mha-manager 192.168.0.107
1. 192.168.0.100 master-db1 与 192.168.0.105 slave-db2 主从复制:略过
192.168.0.100 master-db1 与 192.168.0.107 slave-db3 主从复制:略过
2.master-db1创建mha的管理和监控账号mha,密码123456
create user 'mha'@'%' identified by '123456';
grant all privileges on *.* to mha;
3.安装mha
mha node上:
yum install -y perl-DBD-MySQL ncftp perl-DBI.x86
rpm -ivh mha4mysql-node-0.57-0.el7.noarch.rpm
mha manager上:
yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes ncftp
rpm -ivh mha4mysql-manager-0.57-0.el7.noarch.rpm
4.配置3台服务器之间的ssh互相免密,如100 master-db1上,其他2台同理:
ssh-keygen会生成pub和private key在root home目录下
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.0.105
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.0.107
5.mha manager配置文件:
root@rhel64-64bit Desktop]# mkdir -p /etc/mha---->vim mha.conf
[root@rhel64-64bit Desktop]# mkdir -p /home/mysql-mha
mha.conf配置文件如下:
[server default]
user=mha
password=123456
manager_workdir=/home/mysql-mha
manager_log=/home/mysql-mha/manager.log
remote_workdir=/home/mysql-mha
ssh_user=root
repl_user=replicant
repl_password=123456
ping_interval=1
master_binlog_dir=/var/lib/mysql
master_ip_failover_script=/usr/bin/master_ip_failover
secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.0.100 -s 192.168.0.105 -s 192.168.0.107
[server1]
hostname=192.168.0.100
port=3306
candidate_master=1
[server2]
hostname=192.168.0.105
port=3306
candidate_master=1
[server3]
hostname=192.168.0.107
port=3306
no_master=1
6.mha manager上执行masterha_check_ssh --conf=/etc/mha/mha.conf报错:
[root@rhel64-64bit bin]# masterha_check_ssh --conf=/etc/mha/mha.conf
Mon Jun 11 13:53:21 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jun 11 13:53:21 2018 - [info] Reading application default configuration from /etc/mha/mha.conf..
Mon Jun 11 13:53:21 2018 - [info] Reading server configuration from /etc/mha/mha.conf..
Mon Jun 11 13:53:21 2018 - [info] Starting SSH connection tests..
Mon Jun 11 13:53:40 2018 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63]
Mon Jun 11 13:53:22 2018 - [debug] Connecting via SSH from root@192.168.0.107(192.168.0.107:22) to root@192.168.0.100(192.168.0.100:22)..
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Mon Jun 11 13:53:40 2018 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from root@192.168.0.107(192.168.0.107:22) to root@192.168.0.100(192.168.0.100:22) failed!
Mon Jun 11 13:54:33 2018 - [debug]
Mon Jun 11 13:53:21 2018 - [debug] Connecting via SSH from root@192.168.0.100(192.168.0.100:22) to root@192.168.0.105(192.168.0.105:22)..
Mon Jun 11 13:53:57 2018 - [debug] ok.
Mon Jun 11 13:53:57 2018 - [debug] Connecting via SSH from root@192.168.0.100(192.168.0.100:22) to root@192.168.0.107(192.168.0.107:22)..
SSH Configuration Check Failed!
从报错上看是权限问题,检查100和107机器的/root/.ssh下的id_rsa,id_rsa.pub包括.ssh文件夹等文件权限,600,644,700没有问题的,而且手动测试ssh from107到100也是正常,也是没有找到可以检查的日志,manager.log还没生成。。
查看脚本/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm
my $local_file =
"$workdir/$target->{ssh_host}_$target->{ssh_port}_ssh_check.log";
也木有ssh_check.log日志,全机器搜索也没有,大多网上资料说是公钥文件或权限等问题,建议重新配置
只能把所有机器中的.ssh下面的文件全删,重新配置,遇到新问题:
[root@rhel64-64bit .ssh]# ssh 192.168.0.105
Agent admitted failure to sign using the key.
root@192.168.0.105's password:
解决办法:ssh-add ~/.ssh/id_rsa 应该是把服务器的private key加到ssh-agent的高速缓存中,以便ssh时发送给对方服务器以作认证.
但是重新配置后,masterha_check_ssh问题依旧。。
最终,醒悟到manger和slave-db3是同一个机器,所以应该在107机器上配置ssh到本机的免密登录,问题得以解决:
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.0.107
Mon Jun 11 16:33:42 2018 - [info] All SSH connection tests passed successfully.
7.mha manager上masterha_check_repl --conf=/etc/mha/mha.conf查看复制状态:
[root@rhel64-64bit Desktop]# masterha_check_repl --conf=/etc/mha/mha.conf
MySQL Replication Health is OK.
8.在mha manager上启动manager:
nohup masterha_manager --conf=/etc/mha/mha.conf &
[root@rhel64-64bit Desktop]# ps -ef|grep master
root 6823 6015 0 16:39 pts/10 00:00:00 perl /usr/bin/masterha_manager --conf=/etc/mha/mha.conf
[root@rhel64-64bit Desktop]# jobs -l
[1]+ 8930 Running nohup masterha_manager --conf=/etc/mha/mha.conf &
[root@rhel64-64bit Desktop]# masterha_check_status --conf=/etc/mha/mha.conf
mha (pid:8930) is running(0:PING_OK), master:192.168.0.100
9.在master-db1上,手动配置vip,这个和master_ip_failover里面脚本的配置vip网卡信息等要一致
[root@rhel64-64bit ~]# ifconfig eth0:1 192.168.0.109/24
[root@rhel64-64bit ~]# ip add
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:22:98:4d brd ff:ff:ff:ff:ff:ff
inet 192.168.0.100/24 brd 192.168.0.255 scope global eth0
inet 192.168.0.109/24 brd 192.168.0.255 scope global secondary eth0:1
inet6 fe80::20c:29ff:fe22:984d/64 scope link
valid_lft forever preferred_lft forever
10.故障转移和切换测试:
1>在master-db1上停止mysqld,
结果vip192.168.0.109没有漂移到105上,但是master已经切换到105了
查看/home/mysql-mha/manager.log:
IN SCRIPT TEST====root|sudo /sbin/ifconfig eth0:1 down==root|sudo /sbin/ifconfig eth0:1 192.168.0.109/24===
Enabling the VIP - 192.168.0.109/24 on the new master - 192.168.0.105
sudo: sorry, you must have a tty to run sudo
可以见vip切换失败是因为没有设置好sudo
------------
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.0.100(192.168.0.100:3306)
The latest slave 192.168.0.105(192.168.0.105:3306) has all relay logs for recovery.
Selected 192.168.0.105(192.168.0.105:3306) as a new master.
192.168.0.105(192.168.0.105:3306): OK: Applying all logs succeeded.
192.168.0.105(192.168.0.105:3306): OK: Activated master IP address.
192.168.0.107(192.168.0.107:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.0.107(192.168.0.107:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.0.105(192.168.0.105:3306)
192.168.0.105(192.168.0.105:3306): Resetting slave info succeeded.
Master failover to 192.168.0.105(192.168.0.105:3306) completed successfully.
sudo: sorry, you must have a tty to run sudo
虽vip没有正常漂移,但是主已经从100服务器切换到105服务器了
检查master_ip_failover脚本,并把下面
my $ssh_start_vip = "sudo /sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "sudo /sbin/ifconfig eth0:$key down";
my $ssh_Bcast_arp= "sudo /sbin/arping -I bond0 -c 3 -A #vip#";
修改为:
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
2>重新测试,停止master-db1上的mysqld服务,检查这次vip可正常漂移到105服务器:
[root@rhel64-64bit .ssh]# ip add
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:5c:b2:0e brd ff:ff:ff:ff:ff:ff
inet 192.168.0.105/24 brd 192.168.0.255 scope global eth0
inet 192.168.0.109/24 brd 192.168.0.255 scope global secondary eth0:1
inet6 fe80::20c:29ff:fe5c:b20e/64 scope link
valid_lft forever preferred_lft forever
在107的slave-db3的主已经指向新的master,即slave-db2 192.168.0.105服务器了
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.0.105
Master_User: replicant
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mybinlogg.000015
Read_Master_Log_Pos: 154
Relay_Log_File: rhel64-64bit-relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mybinlogg.000015
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
PS:每次一次自动切换完成后,mha manager进程便停止,并生成一个文件
/home/mysql-mha/mha.failover.complete
需要删除,重启mha开始新的监控,否可可能出现下次切换不成功的情况
11.总结:
mysql-mha特点:
1>当master-db1 down后,mha自动从candidate_master选择新的master,保存源master上的binlog,并识别最新
日志点的slave,应用差异的relay和binlog到新的master,最大程度减小数据的丢失
mysql-mha缺点:
1>并不是绝对保证数据不丢失,有可能不能保存down掉的源master的最新的binlog,硬盘问题或ssh问题都可能导致
日志保存失败
2>SSH免密登录配置存在一定安全风险
3>本身不支持vip转移,需要借助keepliaved第三方工具或自动开发脚本
4>从masterha_check_status输出,它只监控master