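Introduction:
MHA (Master High Availability) is a relatively mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu at DeNA in Japan (he later joined Facebook) and handles failover and slave promotion in a MySQL replication environment. When the master fails, MHA can complete the failover automatically, usually within 10 to 30 seconds, and it preserves data consistency as far as possible during the switch, which is what makes the setup highly available in practice.
The software has two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can run on a dedicated machine and manage several master-slave clusters, or it can run on one of the slaves. MHA Node runs on every MySQL server. MHA Manager probes the master of the cluster at a fixed interval; when the master fails, it promotes the slave with the most recent data to be the new master and repoints all other slaves to it. The whole failover is transparent to the application.
During automatic failover MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. If the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it fails over anyway and the most recent data is lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk, and MHA can be combined with it: as long as one slave has received the latest binary log events, MHA can apply them to all other slaves and so keep every node consistent.
MHA mainly supports a one-master, multiple-slave topology. Building MHA requires at least three database servers in one replication cluster: one master, one standby master, and one more slave. Because of the hardware cost of three servers, Taobao modified MHA, and Taobao's TMHA now supports one master with one slave. For a quick setup, see: MHA快速搭建 (MHA quick setup).
A one-master, one-slave setup also works for our own use, with one limitation: if the master host itself goes down, MHA can neither fail over nor recover the missing binlog events. If only the mysqld process on the master crashes, the failover still succeeds and the binlog is recovered.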
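Official page: https://code.google.com/p/mysql-master-ha/
How it works:
(1) Save the binary log events (binlog events) from the crashed master;
(2) Identify the slave that holds the latest updates;
(3) Apply the differing relay log events to the other slaves;
(4) Apply the binary log events saved from the master;
(5) Promote one slave to be the new master;
(6) Point the other slaves at the new master and resume replication.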
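The six steps above can be walked through with a toy model. This is only a sketch, not MHA's actual implementation: the `Server` class, the `fail_over` function, and the use of plain integers in place of binlog events are assumptions made for illustration, and every slave is assumed to hold a prefix of the master's event list.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Server:
    """Toy model of a MySQL node; `events` stands in for applied binlog events."""
    name: str
    events: List[int] = field(default_factory=list)
    master: Optional[str] = None  # name of the server this node replicates from


def fail_over(dead_master: Server, slaves: List[Server],
              binlog_reachable: bool) -> Tuple[Server, List[int]]:
    """Walk through the six recovery steps; return (new_master, lost_events)."""
    # (1) Save binlog events from the crashed master, if it can still be reached.
    saved = list(dead_master.events) if binlog_reachable else []

    # (2) Identify the slave that has received the most events.
    latest = max(slaves, key=lambda s: len(s.events))

    # (3) Bring every other slave up to the latest slave (relay log differences).
    for s in slaves:
        s.events.extend(latest.events[len(s.events):])

    # (4) Apply the events that exist only in the saved master binlog.
    for s in slaves:
        s.events.extend(saved[len(s.events):])

    # (5) Promote one slave; here simply the one that was most up to date.
    new_master = latest
    new_master.master = None

    # (6) Point the remaining slaves at the new master.
    for s in slaves:
        if s is not new_master:
            s.master = new_master.name

    # Anything the old master had beyond the new master's history is lost.
    lost = dead_master.events[len(new_master.events):]
    return new_master, lost


if __name__ == "__main__":
    old = Server("server5", [1, 2, 3, 4, 5])
    s6 = Server("server6", [1, 2, 3, 4], master="server5")
    s8 = Server("server8", [1, 2], master="server5")
    promoted, lost = fail_over(old, [s6, s8], binlog_reachable=False)
    print(promoted.name, s8.events, s8.master, lost)
```

With `binlog_reachable=False` the model loses event 5, which mirrors the case where the master host is unreachable over SSH; with `binlog_reachable=True` nothing is lost.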
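I. Environment:
[server5]: master 172.25.39.5
[server6]: slave 172.25.39.6
[server8]: slave 172.25.39.8
GTID-based master-slave replication is already set up across the three databases.
II. Deploying MHA:
1. On [server5], edit /etc/masterha/app1.cnf and add the MHA configuration: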
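[server default]
# working directory of the manager
manager_workdir=/etc/masterha/
# log file of the manager
manager_log=/etc/masterha/app1.log
# directory where the master stores its binlogs, so that MHA can find them; here it is the MySQL data directory
master_binlog_dir=/var/lib/mysql
# switch script run during automatic failover (commented out for now)
#master_ip_failover_script= /usr/local/bin/master_ip_failover
# switch script run during a manual switch (commented out for now)
#master_ip_online_change_script= /usr/local/bin/master_ip_online_change
# password of the MySQL root user, i.e. the password set when the monitoring user was created earlier
password=Xa85215295##
# monitoring user
user=root
# interval in seconds between pings to the master; the default is 3, and failover starts automatically after three attempts go unanswered
ping_interval=1
# directory on the remote MySQL hosts where binlogs are saved during a switch
remote_workdir=/tmp
# password of the replication user
repl_password=Xa85215295##
# name of the replication user
repl_user=repl
# script that sends an alert after a switch (commented out)
#report_script=/usr/local/send_report
#secondary_check_script= /usr/local/bin/masterha_secondary_check -s server03 -s server02
# script that shuts down the failed host to prevent split brain (not used here)
shutdown_script=""
# SSH login user
ssh_user=root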
[server5]
hostname=172.25.39.5
port=3306
[server6]
hostname=172.25.39.6
port=3306
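# mark this slave as a candidate master: on failover it is promoted even if it is not the slave with the latest events
candidate_master=1
# by default MHA does not pick a slave that is more than 100 MB of relay logs behind the master, because recovering it takes a long time;
# check_repl_delay=0 makes MHA ignore replication delay when choosing the new master, which is useful for a host with
# candidate_master=1 because that candidate is then certain to become the new master
check_repl_delay=0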
[server8]
hostname=172.25.39.8
port=3306
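To make the effect of candidate_master and check_repl_delay more concrete, here is a minimal sketch that reads an abbreviated copy of the file with Python's configparser and models the choice of the new master. The selection rule is a simplification of the behaviour described in the comments above, not MHA's real code; `load_servers`, `pick_new_master`, and the `lag_bytes` input are hypothetical names introduced for this example.

```python
import configparser

SAMPLE = """
[server default]
manager_workdir=/etc/masterha/
ping_interval=1
ssh_user=root

[server5]
hostname=172.25.39.5
port=3306

[server6]
hostname=172.25.39.6
port=3306
candidate_master=1
check_repl_delay=0

[server8]
hostname=172.25.39.8
port=3306
"""

MAX_LAG_BYTES = 100 * 1024 * 1024  # the 100 MB relay log threshold mentioned above


def load_servers(text):
    """Return {section: options} for each server block, with defaults merged in."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    defaults = dict(parser["server default"])
    return {name: {**defaults, **dict(parser[name])}
            for name in parser.sections() if name != "server default"}


def pick_new_master(servers, dead, lag_bytes):
    """Simplified model: skip slaves that lag too far unless check_repl_delay=0,
    prefer candidate_master=1, and otherwise take the slave with the least lag."""
    alive = [n for n in servers if n != dead]

    def eligible(name):
        ignore_delay = servers[name].get("check_repl_delay", "1") == "0"
        return ignore_delay or lag_bytes[name] <= MAX_LAG_BYTES

    pool = [n for n in alive if eligible(n)]
    if not pool:
        return None
    candidates = [n for n in pool if servers[n].get("candidate_master") == "1"]
    return min(candidates or pool, key=lambda n: lag_bytes[n])


if __name__ == "__main__":
    servers = load_servers(SAMPLE)
    lag = {"server6": 200 * 1024 * 1024, "server8": 0}
    print(pick_new_master(servers, "server5", lag))
```

In this model server6 is chosen even though it is 200 MB behind, because check_repl_delay=0 tells the selection to ignore the delay.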
2. Set up passwordless SSH:
(1) Generate a key pair on [server5]
[root@server5 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
1b:a5:82:cc:70:8b:53:90:85:d2:0a:31:ce:0c:fa:8e root@server5
The key's randomart image is:
+--[ RSA 2048]----+
|+o.+. |
|O.+. |
|o*. o . |
|.. B o o |
| + = . S |
| o . . o |
|E . . |
| |
| |
+-----------------+
(2) Distribute it to every node
[root@server5 ~]# ssh-copy-id 172.25.39.5
The authenticity of host '172.25.39.5 (172.25.39.5)' can't be established.
RSA key fingerprint is ce:b7:35:21:60:9f:f3:8d:f4:25:af:73:ad:ad:bc:ab.
Are you sure you want to continue connecting (yes/no)? yes
[root@server5 ~]# scp -r .ssh/ 172.25.39.6:
The authenticity of host '172.25.39.6 (172.25.39.6)' can't be established.
RSA key fingerprint is ce:b7:35:21:60:9f:f3:8d:f4:25:af:73:ad:ad:bc:ab.
Are you sure you want to continue connecting (yes/no)? yes
[root@server5 ~]# scp -r .ssh/ 172.25.39.8:
The authenticity of host '172.25.39.8 (172.25.39.8)' can't be established.
RSA key fingerprint is ce:b7:35:21:60:9f:f3:8d:f4:25:af:73:ad:ad:bc:ab.
Are you sure you want to continue connecting (yes/no)? yes
(3) Test passwordless login:
[server5] (run the same test on the other nodes)
[root@server5 ~]# ssh 172.25.39.5
Last login: Sat Aug 11 09:40:18 2018 from 172.25.39.250
[root@server5 ~]# logout
Connection to 172.25.39.5 closed.
[root@server5 ~]# ssh 172.25.39.6
Last login: Sat Aug 11 09:40:27 2018 from 172.25.39.250
[root@server6 ~]# logout
Connection to 172.25.39.6 closed.
[root@server5 ~]# ssh 172.25.39.8
Last login: Sat Aug 11 10:11:54 2018 from 172.25.39.250
[root@server8 ~]# logout
Connection to 172.25.39.8 closed.
3. Check the SSH connections between [server5], [server6] and [server8]:
[root@server5 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Sat Aug 11 11:17:37 2018 - [debug] Connecting via SSH from root@172.25.39.8(172.25.39.8:22) to root@172.25.39.5(172.25.39.5:22)..
Sat Aug 11 11:17:37 2018 - [debug] ok.
Sat Aug 11 11:17:37 2018 - [debug] Connecting via SSH from root@172.25.39.8(172.25.39.8:22) to root@172.25.39.6(172.25.39.6:22)..
Sat Aug 11 11:17:37 2018 - [debug] ok.
Sat Aug 11 11:17:38 2018 - [info] All SSH connection tests passed successfully.
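The debug lines above show connections being tested from one node to each of the others. A rough way to reproduce that by hand is to build one nested ssh command per ordered pair of hosts. The sketch below only generates the command strings and does not run them; `ssh_check_commands` is a hypothetical helper, and the idea that every ordered pair must work is inferred from the log rather than taken from MHA's source.

```python
from itertools import permutations

HOSTS = ["172.25.39.5", "172.25.39.6", "172.25.39.8"]


def ssh_check_commands(hosts, user="root"):
    """Build one nested ssh command per ordered (source, target) pair."""
    # BatchMode makes ssh fail instead of prompting for a password.
    opts = "-o BatchMode=yes -o ConnectTimeout=5"
    return [
        f"ssh {opts} {user}@{src} ssh {opts} {user}@{dst} exit 0"
        for src, dst in permutations(hosts, 2)
    ]


if __name__ == "__main__":
    for cmd in ssh_check_commands(HOSTS):
        print(cmd)
```

Three hosts give six ordered pairs. Copying the whole .ssh directory from server5 to the other nodes, as done in step (2), means all three share one key pair and one authorized_keys file, so every pair can log in without a password.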
4. On [server5], grant privileges to root:
mysql> grant all on *.* to root@'%' identified by 'Xa85215295##';
Query OK, 0 rows affected, 1 warning (0.04 sec)
5. Check the replication state of the whole cluster with the masterha_check_repl script:
[root@server5 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sat Aug 11 11:33:56 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Aug 11 11:33:56 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Aug 11 11:33:56 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
MySQL Replication Health is OK.
## If the check reports NOT OK instead:
[server6]
mysql> set GLOBAL read_only=1;
Query OK, 0 rows affected (0.00 sec)
III. Testing the master switch:
1. Manual master switch:
(1) Switch the master from [server5] to [server6]
[root@server5 ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=172.25.39.6 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Sat Aug 11 12:09:31 2018 - [info] * Phase 5: New master cleanup phase..
Sat Aug 11 12:09:31 2018 - [info]
Sat Aug 11 12:09:31 2018 - [info] 172.25.39.6: Resetting slave info succeeded.
Sat Aug 11 12:09:31 2018 - [info] Switching master to 172.25.39.6(172.25.39.6:3306) completed successfully.
## "successfully" in the output means the switch worked
(2) Check the result on [server5] and [server8]
[server5]
[root@server5 ~]# mysql -pXa85215295## ## log in to the database
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.39.6 ## the switch succeeded
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000003
Read_Master_Log_Pos: 1724
Relay_Log_File: server5-relay-bin.000003
Relay_Log_Pos: 841
Relay_Master_Log_File: binlog.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes ## both replication threads are running
[server8]
[root@server8 ~]# mysql -pXa85215295##
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.39.6
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000001
Read_Master_Log_Pos: 450
Relay_Log_File: server8-relay-bin.000003
Relay_Log_Pos: 657
Relay_Master_Log_File: binlog.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
If the two values are not both Yes, reset replication as follows (note that reset master deletes that server's binary logs and clears its GTID history):
[server6]: add the grant
mysql> reset master;
Query OK, 0 rows affected (0.22 sec)
mysql> grant all on *.* to root@'172.25.39.%' identified by 'Xa85215295##';
Query OK, 0 rows affected, 1 warning (0.05 sec)
[server8]
mysql> stop slave;
Query OK, 0 rows affected (0.04 sec)
mysql> reset slave;
Query OK, 0 rows affected (0.35 sec)
mysql> reset master;
Query OK, 0 rows affected (0.27 sec)
mysql> start slave;
Query OK, 0 rows affected (0.38 sec)
Then check the status again.
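The "both values are Yes" check can also be scripted. Below is a minimal sketch that parses the vertical output of show slave status (for example captured with `mysql -e 'show slave status\G'`) and tests the two thread flags plus the master address. `parse_status` and `replication_ok` are hypothetical helper names, and the sample text is a shortened copy of the server5 output above.

```python
SAMPLE = """\
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.39.6
Master_User: repl
Master_Port: 3306
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
"""


def parse_status(text):
    """Turn the 'Key: value' lines of vertical-format output into a dict."""
    status = {}
    for line in text.splitlines():
        # Split at the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if sep and not key.lstrip().startswith("*"):
            status[key.strip()] = value.strip()
    return status


def replication_ok(status, expected_master):
    """True when both replication threads run and point at the expected master."""
    return (status.get("Slave_IO_Running") == "Yes"
            and status.get("Slave_SQL_Running") == "Yes"
            and status.get("Master_Host") == expected_master)


if __name__ == "__main__":
    print(replication_ok(parse_status(SAMPLE), "172.25.39.6"))
```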
2. Automatic master failover:
(1) On the manager host, start the manager process in the background
[root@server5 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --ignore_last_failover &
[1] 2956
[root@server5 ~]# nohup: ignoring input and appending output to `nohup.out'
(2) Kill mysqld on [server6], the current master
[root@server6 ~]# ps ax ## list the processes
2549 pts/0 S 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket
2846 pts/0 Sl 0:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plu
2894 pts/0 R+ 0:00 ps ax
[root@server6 ~]# kill -9 2549
[root@server6 ~]# kill -9 2846
(3) Start the database on [server6] again and log in
[root@server6 ~]# /etc/init.d/mysqld start
Starting mysqld: [ OK ]
[root@server6 ~]# mysql -pXa85215295##
(4) Reconnect [server6] as a slave of the new master [server5]
mysql> change master to master_host='172.25.39.5', master_user='repl', master_password='Xa85215295##', master_auto_position=1;
mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> reset slave;
Query OK, 0 rows affected (0.27 sec)
mysql> reset master;
Query OK, 0 rows affected (0.12 sec)
mysql> start slave;
Query OK, 0 rows affected (0.07 sec)
(5) On the new master [server5], check the master status
mysql> show master status;
+---------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
| binlog.000001 | 1645 | | | d7c9bd1d-9d07-11e8-b333-525400f9cbe2:1,
e6d178a2-9d09-11e8-83df-525400b9273c:1-5 |
+---------------+----------+--------------+------------------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
(6) Check the replication status on [server6] and [server8]
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.25.39.5 ## the master has changed
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binlog.000001
Read_Master_Log_Pos: 1645
Relay_Log_File: server6-relay-bin.000002
Relay_Log_Pos: 651
Relay_Master_Log_File: binlog.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes ## replication is running
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1645
Relay_Log_Space: 860
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 5
Master_UUID: d7c9bd1d-9d07-11e8-b333-525400f9cbe2
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: d7c9bd1d-9d07-11e8-b333-525400f9cbe2:1
Executed_Gtid_Set: d7c9bd1d-9d07-11e8-b333-525400f9cbe2:1
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
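GTID sets like the Executed_Gtid_Set values in the two outputs above can be compared programmatically. The following is a minimal sketch with hypothetical helper names (`parse_gtid_set`, `missing_on_replica`); it expands ranges into Python sets, which is only reasonable for small examples. On a live server, MySQL's built-in GTID_SUBSET() and GTID_SUBTRACT() functions do the same job.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7, uuid2:1' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid, set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single number such as '7' has no end, so it is its own range.
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def missing_on_replica(master_set, replica_set):
    """Return the transactions present on the master but not on the replica."""
    master = parse_gtid_set(master_set)
    replica = parse_gtid_set(replica_set)
    diff = {u: sorted(n - replica.get(u, set())) for u, n in master.items()}
    return {u: n for u, n in diff.items() if n}


if __name__ == "__main__":
    master_set = ("d7c9bd1d-9d07-11e8-b333-525400f9cbe2:1,\n"
                  "e6d178a2-9d09-11e8-83df-525400b9273c:1-5")
    replica_set = "d7c9bd1d-9d07-11e8-b333-525400f9cbe2:1"
    print(missing_on_replica(master_set, replica_set))
```

For the values shown above, the sketch reports transactions 1 to 5 of e6d178a2-9d09-11e8-83df-525400b9273c as absent from the replica's Executed_Gtid_Set.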
(7) Insert data on [server5], then query it on [server6] and [server8]
mysql> use westos
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> insert into userlist values('user3','333');
Query OK, 1 row affected (0.08 sec)
mysql> insert into userlist values('user4','444');
Query OK, 1 row affected (0.21 sec)
mysql> select * from userlist;
+----------+----------+
| username | password |
+----------+----------+
| user1 | 111 |
| user2 | 222 |
| user3 | 333 |
| user4 | 444 |
+----------+----------+
4 rows in set (0.00 sec)
IV. Adding a VIP and testing client connections:
1. Change the configuration on [server5]:
(1) In app1.cnf, uncomment the lines for the master_ip_failover and master_ip_online_change scripts
[root@server5 masterha]# pwd
/etc/masterha
[root@server5 masterha]# vim app1.cnf
(2) Edit the master_ip_failover script
(3) Edit the master_ip_online_change script
2. Move the scripts and make them executable:
[root@server5 MHA]# mv master_ip_* /usr/local/bin/
[root@server5 MHA]# cd /usr/local/bin/
[root@server5 bin]# ls
master_ip_failover master_ip_online_change
[root@server5 bin]# chmod +x *
[root@server5 bin]# ll
total 8
-rwxr-xr-x 1 root root 2172 Aug 11 15:46 master_ip_failover
-rwxr-xr-x 1 root root 3847 Aug 11 15:48 master_ip_online_change
3. [server5] is the master at this point:
[root@server5 ~]# ip addr add 172.25.39.100/24 dev eth0 # add the VIP
[root@server5 ~]# mv master_ip_* /usr/local/bin
[root@server5 ~]# cd /usr/local/bin
[root@server5 bin]# chmod +x *
[root@server5 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --ignore_last_failover & # run the manager in the background
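The contents of the two VIP scripts are not shown in this walkthrough. As a rough idea of what such a hook does, the sketch below maps each phase to the shell commands it might run; it only builds the command strings. Everything here is an assumption for illustration: the `vip_commands` helper is hypothetical, the phase names (status, stop, stopssh, start) follow my understanding of the sample script shipped with MHA, and the VIP and device are taken from the `ip addr add` command above.

```python
VIP = "172.25.39.100/24"  # from the `ip addr add` command above
DEVICE = "eth0"


def vip_commands(command, orig_master, new_master, ssh_user="root"):
    """Return the shell commands a VIP hook might run for one failover phase."""
    release = f"ssh {ssh_user}@{orig_master} ip addr del {VIP} dev {DEVICE}"
    acquire = f"ssh {ssh_user}@{new_master} ip addr add {VIP} dev {DEVICE}"
    if command == "stopssh":
        # Old master still reachable over SSH: drop the VIP there first.
        return [release]
    if command == "stop":
        # Old master unreachable over SSH: nothing can be run on it.
        return []
    if command == "start":
        # New master is ready: bring the VIP up on it.
        return [acquire]
    if command == "status":
        return [f"ssh {ssh_user}@{orig_master} ip addr show dev {DEVICE}"]
    raise ValueError(f"unknown command: {command}")


if __name__ == "__main__":
    for phase in ("status", "stopssh", "start"):
        print(phase, vip_commands(phase, "172.25.39.5", "172.25.39.6"))
```

Releasing the address on the old master before adding it on the new one avoids two hosts answering for the same VIP at the same time.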
4. Connect through the VIP and test:
[root@foundation39 Desktop]# ssh root@172.25.39.100
mysql> use westos
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from userlist;
+----------+----------+
| username | password |
+----------+----------+
| user1 | 111 |
| user2 | 222 |
| user3 | 333 |
| user4 | 444 |
+----------+----------+
mysql> insert into userlist values ('user5','555'); # insert a row
5. Check the new row on [server6] and [server8]:
mysql> use westos
mysql> select * from userlist;
+----------+----------+
| username | password |
+----------+----------+
| user1 | 111 |
| user2 | 222 |
| user3 | 333 |
| user4 | 444 |
| user5 | 555 |
+----------+----------+
5 rows in set (0.00 sec)
6. Kill the database on [server5]; the master role fails over automatically to another node
[root@server5 ~]# ps ax
2333 pts/1 S 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket
2626 pts/1 Sl 0:14 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plu
3053 ? S 0:00 pickup -l -t fifo -u
[root@server5 masterha]# kill -9 2333
[root@server5 masterha]# kill -9 2626
7. The master role has moved from [server5] to server6:
[root@server6 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 52:54:00:b9:27:3c brd ff:ff:ff:ff:ff:ff
inet 172.25.39.6/24 brd 172.25.39.255 scope global eth0
inet 172.25.39.100/24 scope global secondary eth0
inet6 fe80::5054:ff:feb9:273c/64 scope link
valid_lft forever preferred_lft forever
The VIP that was on server5 has moved to server6, which shows that server6 is now the master.
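Checking which node holds the VIP can be automated by scanning the `ip addr` output for the address. This is a small sketch with hypothetical helper names (`addresses`, `holds_vip`); the sample text is a shortened copy of the server6 output above.

```python
import re

SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 172.25.39.6/24 brd 172.25.39.255 scope global eth0
    inet 172.25.39.100/24 scope global secondary eth0
    inet6 fe80::5054:ff:feb9:273c/64 scope link
"""


def addresses(ip_addr_output):
    """Return every IPv4 address (without prefix length) found on inet lines."""
    pattern = r"^\s*inet (\d+\.\d+\.\d+\.\d+)/\d+"
    return re.findall(pattern, ip_addr_output, re.M)


def holds_vip(ip_addr_output, vip="172.25.39.100"):
    """True when the given address is configured on this host."""
    return vip in addresses(ip_addr_output)


if __name__ == "__main__":
    print(addresses(SAMPLE), holds_vip(SAMPLE))
```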
8. Reconnect server5 to the new master as a slave:
[root@server5 ~]# /etc/init.d/mysqld start
[root@server5 ~]# mysql -p
mysql> change master to master_host='172.25.39.6', master_user='repl', master_password='Xa85215295##', master_auto_position=1;
mysql> start slave;
mysql> use westos;
mysql> select * from userlist;
+----------+----------+
| username | password |
+----------+----------+
| user1 | 111 |
| user2 | 222 |
| user3 | 333 |
| user4 | 444 |
| user5 | 555 |
+----------+----------+
5 rows in set (0.00 sec)