MySQL高可用之MHA
MHA 是由日本人 yoshinorim(原就职于DeNA现就职于FaceBook)开发的比较成熟的 MySQL 高可用方案。MHA 能够在30秒内实现故障切换,并能在故障切换中,最大可能的保证数据一致性。目前淘宝也正在开发相似产品 TMHA, 目前已支持一主一从。(本文不对理论做过多的叙述,直接演示配置)
实验环境
192.168.186.161 manager
192.168.186.162 master
192.168.186.163 slaver1
192.168.186.164 slaver2
实验准备
所有机器做好相互解析,传公钥,做好时间对时。【基础操作,不赘述。】
安装mysql(除了manager节点之外的所有节点)
#配置一个安装mysql的源
[root@master ~]# cd /etc/yum.repos.d/
[root@master yum.repos.d]# vim mysql5.6.repo
[mysql5.6]
name=mysql-5.6-community
baseurl=http://repo.mysql.com/yum/mysql-5.6-community/el/7/x86_64/
enabled=1
gpgcheck=1
gpgkey=http://repo.mysql.com/RPM-GPG-KEY-mysql
[root@master yum.repos.d]# yum -y install mysql-community-server
修改配置文件
#master
[root@master ~]# vim /etc/my.cnf
[mysqld]
server-id = 1
log-bin = master-log
relay-log = relay-log
#slaver1
[root@slave1 ~]# vim /etc/my.cnf
[mysqld]
server-id = 2
relay-log = relay-log
log-bin = master-log
read_only = ON
relay_log_purge = 0
log_slave_updates = 1
#slaver2
[root@slave2 ~]# vim /etc/my.cnf
[mysqld]
server-id = 3
relay-log = relay-log
log-bin = master-log
read_only = ON
relay_log_purge = 0
log_slave_updates = 1
#启动所有数据库服务
[root@master ~]# systemctl start mysql.service
配置一主多从复制
在所有节点上执行以下语句
mysql > delete from mysql.user where user="";
mysql > grant replication slave,replication client on *.* to 'slave'@'192.168.%.%' identified by '123';
mysql > grant all on *.* to 'mha'@'192.168.%.%' identified by '123';
查看master的同步起点
mysql> show master status;
+-------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-------------------+----------+--------------+------------------+-------------------+
| master-log.000003 | 349 | | | |
+-------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
所有slave节点执行以下语句
mysql> change master to master_host='192.168.186.162',
-> master_user='slave',
-> master_password='123',
-> master_log_file='master-log.000003',
-> master_log_pos=349;
Query OK, 0 rows affected, 2 warnings (0.03 sec)
mysql> start slave;
Query OK, 0 rows affected (0.01 sec)
mysql> show slave status\G; 【看到两个线程正常即可】
...
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...
安装配置MHA
Manager节点需要另外多安装一个包。具体需要安装的内容如下:
四个节点都需安装:mha4mysql-node-0.56-0.el6.norch.rpm
Manager 节点另需要安装:mha4mysql-manager-0.56-0.el6.noarch.rpm
下载地址:https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
下载地址:https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
[root@manager ~]# yum -y install ./mha4mysql-node-0.56-0.el6.noarch.rpm
[root@manager ~]# yum -y install ./mha4mysql-manager-0.56-0.el6.noarch.rpm
[root@master ~]# yum -y install ./mha4mysql-node-0.56-0.el6.noarch.rpm
[root@slaver1 ~]# yum -y install ./mha4mysql-node-0.56-0.el6.noarch.rpm
[root@slaver2 ~]# yum -y install ./mha4mysql-node-0.56-0.el6.noarch.rpm
初始化 MHA ,进行配置
[root@manager ~]# mkdir /etc/mha_master
[root@manager ~]# vim /etc/mha_master/mha.cnf
[server default]
user=mha
password=123
manager_workdir=/etc/mha_master/app1
manager_log=/etc/mha_master/manager.log
remote_workdir=/mydata/mha_master/app1
ssh_user=root
repl_user=slave
repl_password=123
ping_interval=1
[server1]
hostname=192.168.186.162
ssh_port=22
candidate_master=1
[server2]
hostname=192.168.186.163
ssh_port=22
candidate_master=1
[server3]
hostname=192.168.186.164
ssh_port=22
candidate_master=1
对四个节点进行检测
#1)检测各节点间 ssh 互信通信配置是否 ok
[root@manager ~]# masterha_check_ssh -conf=/etc/mha_master/mha.cnf
Tue Nov 10 19:34:39 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 10 19:34:39 2020 - [info] Reading application default configuration from /etc/mha_master/mha.cnf..
Tue Nov 10 19:34:39 2020 - [info] Reading server configuration from /etc/mha_master/mha.cnf..
Tue Nov 10 19:34:39 2020 - [info] Starting SSH connection tests..
Tue Nov 10 19:34:40 2020 - [debug]
Tue Nov 10 19:34:39 2020 - [debug] Connecting via SSH from root@192.168.186.162(192.168.186.162:22) to root@192.168.186.163(192.168.186.163:22)..
Tue Nov 10 19:34:39 2020 - [debug] ok.
Tue Nov 10 19:34:39 2020 - [debug] Connecting via SSH from root@192.168.186.162(192.168.186.162:22) to root@192.168.186.164(192.168.186.164:22)..
Tue Nov 10 19:34:39 2020 - [debug] ok.
Tue Nov 10 19:34:40 2020 - [debug]
Tue Nov 10 19:34:39 2020 - [debug] Connecting via SSH from root@192.168.186.163(192.168.186.163:22) to root@192.168.186.162(192.168.186.162:22)..
Tue Nov 10 19:34:39 2020 - [debug] ok.
Tue Nov 10 19:34:39 2020 - [debug] Connecting via SSH from root@192.168.186.163(192.168.186.163:22) to root@192.168.186.164(192.168.186.164:22)..
Tue Nov 10 19:34:40 2020 - [debug] ok.
Tue Nov 10 19:34:41 2020 - [debug]
Tue Nov 10 19:34:40 2020 - [debug] Connecting via SSH from root@192.168.186.164(192.168.186.164:22) to root@192.168.186.162(192.168.186.162:22)..
Tue Nov 10 19:34:40 2020 - [debug] ok.
Tue Nov 10 19:34:40 2020 - [debug] Connecting via SSH from root@192.168.186.164(192.168.186.164:22) to root@192.168.186.163(192.168.186.163:22)..
Tue Nov 10 19:34:40 2020 - [debug] ok.
Tue Nov 10 19:34:41 2020 - [info] All SSH connection tests passed successfully.
#2)检查管理的MySQL复制集群的连接配置参数是否OK
[root@manager ~]# masterha_check_repl -conf=/etc/mha_master/mha.cnf
Tue Nov 10 19:39:14 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 10 19:39:14 2020 - [info] Reading application default configuration from /etc/mha_master/mha.cnf..
Tue Nov 10 19:39:14 2020 - [info] Reading server configuration from /etc/mha_master/mha.cnf..
Tue Nov 10 19:39:14 2020 - [info] MHA::MasterMonitor version 0.56.
Tue Nov 10 19:39:15 2020 - [info] GTID failover mode = 0
Tue Nov 10 19:39:15 2020 - [info] Dead Servers:
Tue Nov 10 19:39:15 2020 - [info] Alive Servers:
Tue Nov 10 19:39:15 2020 - [info] 192.168.186.162(192.168.186.162:3306)
Tue Nov 10 19:39:15 2020 - [info] 192.168.186.163(192.168.186.163:3306)
Tue Nov 10 19:39:15 2020 - [info] 192.168.186.164(192.168.186.164:3306)
Tue Nov 10 19:39:15 2020 - [info] Alive Slaves:
Tue Nov 10 19:39:15 2020 - [info] 192.168.186.163(192.168.186.163:3306) Version=5.6.50-log (oldest major version between slaves) log-bin:enabled
Tue Nov 10 19:39:15 2020 - [info] Replicating from 192.168.186.162(192.168.186.162:3306)
Tue Nov 10 19:39:15 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 10 19:39:15 2020 - [info] 192.168.186.164(192.168.186.164:3306) Version=5.6.50-log (oldest major version between slaves) log-bin:enabled
Tue Nov 10 19:39:15 2020 - [info] Replicating from 192.168.186.162(192.168.186.162:3306)
Tue Nov 10 19:39:15 2020 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 10 19:39:15 2020 - [info] Current Alive Master: 192.168.186.162(192.168.186.162:3306)
Tue Nov 10 19:39:15 2020 - [info] Checking slave configurations..
Tue Nov 10 19:39:15 2020 - [info] Checking replication filtering settings..
Tue Nov 10 19:39:15 2020 - [info] binlog_do_db= , binlog_ignore_db=
Tue Nov 10 19:39:15 2020 - [info] Replication filtering check ok.
Tue Nov 10 19:39:15 2020 - [info] GTID (with auto-pos) is not supported
Tue Nov 10 19:39:15 2020 - [info] Starting SSH connection tests..
Tue Nov 10 19:39:18 2020 - [info] All SSH connection tests passed successfully.
Tue Nov 10 19:39:18 2020 - [info] Checking MHA Node version..
Tue Nov 10 19:39:19 2020 - [info] Version check ok.
Tue Nov 10 19:39:19 2020 - [info] Checking SSH publickey authentication settings on the current master..
Tue Nov 10 19:39:19 2020 - [info] HealthCheck: SSH to 192.168.186.162 is reachable.
Tue Nov 10 19:39:20 2020 - [info] Master MHA Node version is 0.56.
Tue Nov 10 19:39:20 2020 - [info] Checking recovery script configurations on 192.168.186.162(192.168.186.162:3306)..
Tue Nov 10 19:39:20 2020 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/mydata/mha_master/app1/save_binary_logs_test --manager_version=0.56 --start_file=master-log.000003
Tue Nov 10 19:39:20 2020 - [info] Connecting to root@192.168.186.162(192.168.186.162:22)..
Creating /mydata/mha_master/app1 if not exists.. Creating directory /mydata/mha_master/app1.. done.
ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to master-log.000003
Tue Nov 10 19:39:20 2020 - [info] Binlog setting check done.
Tue Nov 10 19:39:20 2020 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Nov 10 19:39:20 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.186.163 --slave_ip=192.168.186.163 --slave_port=3306 --workdir=/mydata/mha_master/app1 --target_version=5.6.50-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Tue Nov 10 19:39:20 2020 - [info] Connecting to root@192.168.186.163(192.168.186.163:22)..
Creating directory /mydata/mha_master/app1.. done.
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to relay-log.000002
Temporary relay log file is /var/lib/mysql/relay-log.000002
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Nov 10 19:39:20 2020 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.186.164 --slave_ip=192.168.186.164 --slave_port=3306 --workdir=/mydata/mha_master/app1 --target_version=5.6.50-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Tue Nov 10 19:39:20 2020 - [info] Connecting to root@192.168.186.164(192.168.186.164:22)..
Creating directory /mydata/mha_master/app1.. done.
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to relay-log.000002
Temporary relay log file is /var/lib/mysql/relay-log.000002
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Nov 10 19:39:20 2020 - [info] Slaves settings check done.
Tue Nov 10 19:39:20 2020 - [info]
192.168.186.162(192.168.186.162:3306) (current master)
+--192.168.186.163(192.168.186.163:3306)
+--192.168.186.164(192.168.186.164:3306)
Tue Nov 10 19:39:20 2020 - [info] Checking replication health on 192.168.186.163..
Tue Nov 10 19:39:20 2020 - [info] ok.
Tue Nov 10 19:39:20 2020 - [info] Checking replication health on 192.168.186.164..
Tue Nov 10 19:39:21 2020 - [info] ok.
Tue Nov 10 19:39:21 2020 - [warning] master_ip_failover_script is not defined.
Tue Nov 10 19:39:21 2020 - [warning] shutdown_script is not defined.
Tue Nov 10 19:39:21 2020 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
启动 MHA
[root@manager ~]# nohup masterha_manager -conf=/etc/mha_master/mha.cnf &> /etc/mha_master/manager.log &
[1] 11924
[root@manager ~]# masterha_check_status -conf=/etc/mha_master/mha.cnf
mha (pid:11924) is running(0:PING_OK), master:192.168.186.162
测试 MHA 故障转移
先创建一些数据,测试主从集群是没有问题
#主节点写入
mysql> create database if not exists MHA default character set='utf8';
Query OK, 1 row affected (0.00 sec)
mysql> use MHA;
Database changed
mysql> create table test (
-> id int,
-> name varchar(20)
-> );
Query OK, 0 rows affected (0.01 sec)
mysql> insert into test values ( 1,'jack' );
Query OK, 1 row affected (0.00 sec)
mysql> insert into test values ( 2,'jerry' );
Query OK, 1 row affected (0.00 sec)
mysql> insert into test values ( 3,'tom' );
Query OK, 1 row affected (0.00 sec)
#从节点查询
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| MHA |
| mysql |
| performance_schema |
+--------------------+
4 rows in set (0.00 sec)
mysql> use MHA;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from test;
+------+-------+
| id | name |
+------+-------+
| 1 | jack |
| 2 | jerry |
| 3 | tom |
+------+-------+
3 rows in set (0.01 sec)
在 master 节点关闭 mysql服务,模拟主节点数据崩溃
[root@master ~]# systemctl stop mysql.service
[root@master ~]# rm -rf /var/lib/mysql/*
在 manger 节点查看日志
[root@manager ~]# tail -20 /etc/mha_master/manager.log
Tue Nov 10 19:50:45 2020 - [info] Master failover to 192.168.186.163(192.168.186.163:3306) completed successfully.
Tue Nov 10 19:50:45 2020 - [info]
----- Failover Report -----
mha: MySQL Master failover 192.168.186.162(192.168.186.162:3306) to 192.168.186.163(192.168.186.163:3306) succeeded
Master 192.168.186.162(192.168.186.162:3306) is down!
Check MHA Manager logs at manager:/etc/mha_master/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 192.168.186.163(192.168.186.163:3306) has all relay logs for recovery.
Selected 192.168.186.163(192.168.186.163:3306) as a new master.
192.168.186.163(192.168.186.163:3306): OK: Applying all logs succeeded.
192.168.186.164(192.168.186.164:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.186.164(192.168.186.164:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.186.163(192.168.186.163:3306)
192.168.186.163(192.168.186.163:3306): Resetting slave info succeeded.
Master failover to 192.168.186.163(192.168.186.163:3306) completed successfully.
[1]+ Done nohup masterha_manager -conf=/etc/mha_master/mha.cnf &>/etc/mha_master/manager.log
#表示 manager 检测到192.168.186.162节点故障, 而后自动执行故障转移, 将192.168.186.163提升为主节点。注意,故障转移完成后, manager将会自动停止。
在192.168.186.164进行查看
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.186.163
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-log.000003
Read_Master_Log_Pos: 2493
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 284
Relay_Master_Log_File: master-log.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
.........
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| MHA |
| mysql |
| performance_schema |
+--------------------+
4 rows in set (0.00 sec)
mysql> use MHA;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select * from test;
+------+-------+
| id | name |
+------+-------+
| 1 | jack |
| 2 | jerry |
| 3 | tom |
+------+-------+
3 rows in set (0.00 sec)
实验成功!