听说很多大公司都在使用mysql ha(mysql-master-ha)的环境,下面,我也试着搭建测试一下。
一.MHA原理及其特点:
1.原理:
MHA在主服务器发生故障时,自动实现故障转移,快速将从服务器晋级为主服务器(通常在10-30s),而不影响复制的一致性,适用于任何存储引擎
MHA 提供在线服务器切换,将从服务器器提升为主服务器,时间在0.5-2s,这段时间,数据是无法写入的。
MHA manager 是通过ssh 连接到 mysql slave服务器上
MHA 每次从宕掉的主服务器上保存二进制日志,但不是每次都成功(主服务器硬件故障或无法通过ssh访问)
mysql 5.5支持半同步复制,MHA与半同步复制结合,可以大大降低数据丢失的风险,
2.特性:
(1).主服务器的自动监控和故障转移
(2).交互式主服务器故障迁移(使用MHA实现故障迁移,不监控主服务器,出现故障,手工调用MHA切换)
(3).非交互式的主故障迁移(不监控主服务器,但自动实现故障转移)
(4).在线切换主服务器
注: MHA 可以应用于任何复制结构(此点待定)
二.MHA的前提条件:
1.搭建的前提条件:
(1).mysql 主从复制已配置好(master-slave)
(2).4台server 设置SSH公钥免密码登录
(3).操作系统为linux系统(MHA 只支持linux系统)
(4).单台可写主服务器和多台从服务器或只读服务器
(5).mysql 版本为5.0或以上版本
(6).候选主服务器的log-bin必须开启
(7).
所有服务器上的二进制日志和中级日志过滤规则必须相同
(8).候选服务器上的复制用户必须存在
(9).
保留中继日志和定期清理
参考文章:http://www.vmcd.org/2012/04/mysql-high-availability-mha/
2.MHA的基本说明:
MHA由node 和 manage 组成,
node 安装在每一台mysql服务器上(主从上都要装),
manage端运行在独立服务器上,也需要安装node节点
3.MHA 切换过程:
(1).监控和故障转移过程
检测复制设置和确定当前主服务器
监控主服务器
检测主服务器当掉
再次检测从服务器配置
关闭当掉的主服务器(可选)
恢复一个新的主服务器
激活新的主服务器
恢复其余的从服务器
告警(可选)
(2).在线切换过程
检测复制设置和确定当前主服务器
确定新的主服务器
阻塞写入到当前主服务器
等待所有从服务器赶上复制
授予写入到新的主服务器
重新设置从服务器
三.MHA 部署实例:
主DB:192.168.1.101
从DB1:192.168.1.102
从DB2:192.168.1.103
MHA管理端:192.168.1.100
系统版本:red hat linux 6
数据库版本:mysql 5.5
注:默认mysql主从配置已经配置完毕
1.对mysql的主从进行半同步配置、从上建立复制的用户和授予相应的权限、日志清除设置
- # 所有mysql数据库服务器,安装半同步插件(semisync_master.so,semisync_slave.so)
- mysql> install plugin rpl_semi_sync_master soname 'semisync_master.so';
- mysql> install plugin rpl_semi_sync_slave soname 'semisync_slave.so';
- # 注:其实master是安装在主,slave是安装在从上,但是我们是涉及从升级为主,所以主从都装
- # 卸载插件使用命令:
- # mysql> uninstall plugin rpl_semi_sync_slave;
- # mysql> uninstall plugin rpl_semi_sync_master;
- # 修改所有数据库服务器的参数文件(/etc/my.cnf),添加内容如下:
- [root@client101-103 ~]# vim /etc/my.cnf
- rpl_semi_sync_master_enabled=1
- rpl_semi_sync_master_timeout=1000
- rpl_semi_sync_slave_enabled=1
- relay_log_purge=0 # 此参数将定期清除中继日志功能关闭
- # 重启数据库,查看同步状态和半同步参数、状态
- [root@client101-103 ~]# /etc/init.d/mysql restart
- # 查看半同步参数:
- mysql> show variables like '%sync%';
- # 查看半同步状态:
- mysql> show status like '%sync%';
- # 有几个状态参数值得关注的:
- rpl_semi_sync_master_status:显示主服务是异步复制模式还是半同步复制模式
- rpl_semi_sync_master_clients:显示有多少个从服务器配置为半同步复制模式
- rpl_semi_sync_master_yes_tx:显示从服务器确认成功提交的数量
- rpl_semi_sync_master_no_tx:显示从服务器确认不成功提交的数量
- rpl_semi_sync_master_tx_avg_wait_time:事务因开启semi_sync,平均需要额外等待的时间
- rpl_semi_sync_master_net_avg_wait_time:事务进入等待队列后,到网络平均等待时间
- # 两台从数据库建立用于复制的用户和密码
- mysql> GRANT REPLICATION SLAVE ON *.* TO 'slave'@'192.168.1.%' IDENTIFIED BY 'slave';
- mysql> flush privileges;
2.4台server建立ssh无密码验证登陆
- 注:ssh-keygen -t rsa:
- # manage 端
- [root@client100 ~]# ssh-keygen -t rsa
- [root@client100 ~]# ssh-copy-id root@192.168.1.101
- [root@client100 ~]# ssh-copy-id root@192.168.1.102
- [root@client100 ~]# ssh-copy-id root@192.168.1.103
- # 主DB
- [root@client101 ~]# ssh-keygen -t rsa
- [root@client101 ~]# ssh-copy-id root@192.168.1.100
- [root@client101 ~]# ssh-copy-id root@192.168.1.102
- [root@client101 ~]# ssh-copy-id root@192.168.1.103
- # 从DB1
- [root@client102 ~]# ssh-keygen -t rsa
- [root@client102 ~]# ssh-copy-id root@192.168.1.100
- [root@client102 ~]# ssh-copy-id root@192.168.1.102
- [root@client102 ~]# ssh-copy-id root@192.168.1.103
- # 从DB2
- [root@client103 ~]# ssh-keygen -t rsa
- [root@client103 ~]# ssh-copy-id root@192.168.1.100
- [root@client103 ~]# ssh-copy-id root@192.168.1.101
- [root@client103 ~]# ssh-copy-id root@192.168.1.102
3.在3台数据库节点安装MHA的node节点:
- # 如果安装下面包,报依赖关系错,请先安装mysql-share-compat包
- # 先安装下面的 perl-dbd-mysql包
- # 在下面执行perl时,如果出现报错,需要安装如下这几个perl包: perl-devel perl-CPAN
- ## 在192.168.1.101上执行如下操作
- [root@client101 ~]# yum -y install perl-DBD-MySQL
- [root@client101 ~]# tar -xf mha4mysql-node-0.54.tar.gz
- [root@client101 ~]# cd mha4mysql-node-0.54
- [root@client101 mha4mysql-node-0.54]# perl Makefile.PL
- [root@client101 mha4mysql-node-0.54]# make && make install
- # 在192.168.1.102上执行如下操作
- [root@client102 ~]# yum -y install perl-DBD-MySQL
- [root@client102 ~]# tar -xf mha4mysql-node-0.54.tar.gz
- [root@client102 ~]# cd mha4mysql-node-0.54
- [root@client102 mha4mysql-node-0.54]# perl Makefile.PL
- [root@client102 mha4mysql-node-0.54]# make && make install
- # 在192.168.1.103上执行如下操作
- [root@client103 ~]# yum -y install perl-DBD-MySQL
- [root@client103 ~]# tar -xf mha4mysql-node-0.54.tar.gz
- [root@client103 ~]# cd mha4mysql-node-0.54
- [root@client103 mha4mysql-node-0.54]# perl Makefile.PL
- [root@client103 mha4mysql-node-0.54]# make && make install
4.管理节点安装MHA的manager和node(192.168.1.100):
- # 管理节点一定记得装mha4mysql-node-0.54.tar.gz 和 mha4mysql-manager-0.55.tar.gz
- # 管理节点,不装node 会报错的
- # 安装相关perl包(有些包,系统没有自带,需要自行下载安装)
- [root@client100 ~]# yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles
- [root@client100 ~]# tar -xf mha4mysql-node-0.54.tar.gz
- [root@client100 ~]# cd mha4mysql-node-0.54
- [root@client100 mha4mysql-node-0.54]# perl Makefile.PL
- [root@client100 mha4mysql-node-0.54]# make && make install
- [root@client100 ~]# tar -xf mha4mysql-manager-0.55.tar.gz
- [root@client100 ~]# cd mha4mysql-manager-0.55
- [root@client100 mha4mysql-manager-0.55]# perl Makefile.PL
- [root@client100 mha4mysql-manager-0.55]# make && make install
5.编辑管理节点的配置文件
- # 建立配置文件存放目录,并将模板配置文件复制到新建目录
- [root@client100 ~]# mkdir /etc/mastermha
- [root@client100 ~]# mkdir -p /mastermha/app1
- [root@client100 ~]# cd /tmp/mha4mysql-manager-0.55
- [root@client100 mha4mysql-manager-0.55]# cp samples/conf/* /etc/mastermha/
- [root@client100 ~]# vim /etc/mastermha/app1.cnf
- [server default]
- manager_workdir=/masterha/app1
- manager_log=/masterha/app1/manager.log
- user=mha_mon
- password=123456
- ssh_user=root
- repl_user=slave
- repl_password=slave
- ping_interval=1
- shutdown_script=""
- master_ip_online_change_script=""
- report_script=""
- [server1]
- hostname=192.168.1.101
- master_binlog_dir=/var/lib/mysql/
- candidate_master=1
- [server2]
- hostname=192.168.1.102
- master_binlog_dir=/var/lib/mysql
- candidate_master=1
- [server3]
- hostname=192.168.1.103
- #master_binlog_dir=/var/lib/mysql
- no_master=1
6.在所有数据库节点的机器上进行授权操作(192.168.1.101/192.168.1.102/192.168.1.103)
- mysql> grant all privileges on *.* to 'mha_mon'@'192.168.1.%' identified by '123456';
- Query OK, 0 rows affected (1.00 sec)
- mysql> flush privileges;
7.测试ssh连接是否正常
- # 所有都为ok 即为正常
- [root@client100 mastermha]# masterha_check_ssh --global_conf=/etc/mastermha/masterha_default.cnf --conf=/etc/mastermha/app1.cnf
- Sun Mar 2 14:52:53 2014 - [info] Reading default configuratoins from /etc/mastermha/masterha_default.cnf..
- Sun Mar 2 14:52:53 2014 - [info] Reading application default configurations from /etc/mastermha/app1.cnf..
- Sun Mar 2 14:52:53 2014 - [info] Reading server configurations from /etc/mastermha/app1.cnf..
- Sun Mar 2 14:52:53 2014 - [info] Starting SSH connection tests..
- Sun Mar 2 14:53:04 2014 - [debug]
- Sun Mar 2 14:52:53 2014 - [debug] Connecting via SSH from root@192.168.1.102(192.168.1.102:22) to root@192.168.1.101(192.168.1.101:22)..
- Sun Mar 2 14:53:03 2014 - [debug] ok.
- Sun Mar 2 14:53:03 2014 - [debug] Connecting via SSH from root@192.168.1.102(192.168.1.102:22) to root@192.168.1.103(192.168.1.103:22)..
- Sun Mar 2 14:53:04 2014 - [debug] ok.
- Sun Mar 2 14:53:04 2014 - [debug]
- Sun Mar 2 14:52:54 2014 - [debug] Connecting via SSH from root@192.168.1.103(192.168.1.103:22) to root@192.168.1.101(192.168.1.101:22)..
- Sun Mar 2 14:53:04 2014 - [debug] ok.
- Sun Mar 2 14:53:04 2014 - [debug] Connecting via SSH from root@192.168.1.103(192.168.1.103:22) to root@192.168.1.102(192.168.1.102:22)..
- Sun Mar 2 14:53:04 2014 - [debug] ok.
- Sun Mar 2 14:54:53 2014 - [debug]
- Sun Mar 2 14:52:53 2014 - [debug] Connecting via SSH from root@192.168.1.101(192.168.1.101:22) to root@192.168.1.102(192.168.1.102:22)..
- Sun Mar 2 14:53:53 2014 - [debug] ok.
- Sun Mar 2 14:53:53 2014 - [debug] Connecting via SSH from root@192.168.1.101(192.168.1.101:22) to root@192.168.1.103(192.168.1.103:22)..
- Sun Mar 2 14:54:53 2014 - [debug] ok.
- Sun Mar 2 14:54:53 2014 - [info] All SSH connection tests passed successfully.
8.测试数据库之间的复制情况:
- # 如果正常点额话,下面会有ok显示
- [root@client100 mastermha]# masterha_check_repl --global_conf=/etc/mastermha/masterha_default.cnf --conf=/etc/mastermha/app1.cnf
- Sun Mar 2 15:10:16 2014 - [info] Reading default configuratoins from /etc/mastermha/masterha_default.cnf..
- Sun Mar 2 15:10:16 2014 - [info] Reading application default configurations from /etc/mastermha/app1.cnf..
- Sun Mar 2 15:10:16 2014 - [info] Reading server configurations from /etc/mastermha/app1.cnf..
- Sun Mar 2 15:10:16 2014 - [info] MHA::MasterMonitor version 0.55.
- Sun Mar 2 15:10:25 2014 - [info] Dead Servers:
- Sun Mar 2 15:10:25 2014 - [info] 192.168.1.101(192.168.1.101:3306)
- Sun Mar 2 15:10:25 2014 - [info] Alive Servers:
- Sun Mar 2 15:10:25 2014 - [info] 192.168.1.102(192.168.1.102:3306)
- Sun Mar 2 15:10:25 2014 - [info] 192.168.1.103(192.168.1.103:3306)
- Sun Mar 2 15:10:25 2014 - [info] Alive Slaves:
- Sun Mar 2 15:10:25 2014 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.5.36-log (oldest major version between slaves) log-bin:enabled
- Sun Mar 2 15:10:25 2014 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
- Sun Mar 2 15:10:25 2014 - [info] Primary candidate for the new Master (candidate_master is set)
- Sun Mar 2 15:10:25 2014 - [info] 192.168.1.103(192.168.1.103:3306) Version=5.5.36-log (oldest major version between slaves) log-bin:enabled
- Sun Mar 2 15:10:25 2014 - [info] Replicating from 192.168.1.101(192.168.1.101:3306)
- Sun Mar 2 15:10:25 2014 - [warning] MySQL master is not currently alive!
- Sun Mar 2 15:10:25 2014 - [info] Checking slave configurations..
- Sun Mar 2 15:10:25 2014 - [info] read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306).
- Sun Mar 2 15:10:25 2014 - [warning] relay_log_purge=0 is not set on slave 192.168.1.102(192.168.1.102:3306).
- Sun Mar 2 15:10:25 2014 - [info] Checking replication filtering settings..
- Sun Mar 2 15:10:25 2014 - [info] Replication filtering check ok.
- Sun Mar 2 15:10:25 2014 - [info] Starting SSH connection tests..
- Sun Mar 2 15:10:26 2014 - [info] All SSH connection tests passed successfully.
- Sun Mar 2 15:10:26 2014 - [info] Checking MHA Node version..
- Sun Mar 2 15:10:26 2014 - [info] Version check ok.
- Sun Mar 2 15:10:26 2014 - [info] Getting current master (maybe dead) info ..
- Sun Mar 2 15:10:26 2014 - [info] Identified master is 192.168.1.101(192.168.1.101:3306).
- Sun Mar 2 15:10:26 2014 - [info] Checking SSH publickey authentication settings on the current master..
- Sun Mar 2 15:10:31 2014 - [warning] HealthCheck: Got timeout on checking SSH connection to 192.168.1.101! at /usr/local/share/perl5/MHA/HealthCheck.pm line 298.
- Sun Mar 2 15:10:31 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
- Sun Mar 2 15:10:31 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_mon' --slave_host=192.168.1.102 --slave_ip=192.168.1.102 --slave_port=3306 --workdir=/data/log/masterha --target_version=5.5.36-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
- Sun Mar 2 15:10:31 2014 - [info] Connecting to root@192.168.1.102(192.168.1.102:22)..
- Creating directory /data/log/masterha.. done.
- Checking slave recovery environment settings..
- Opening /var/lib/mysql/relay-log.info ... ok.
- Relay log found at /var/lib/mysql, up to mysql-relay-bin.000007
- Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000007
- Testing mysql connection and privileges.. done.
- Testing mysqlbinlog output.. done.
- Cleaning up test file(s).. done.
- Sun Mar 2 15:10:31 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_mon' --slave_host=192.168.1.103 --slave_ip=192.168.1.103 --slave_port=3306 --workdir=/data/log/masterha --target_version=5.5.36-log --manager_version=0.55 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
- Sun Mar 2 15:10:31 2014 - [info] Connecting to root@192.168.1.103(192.168.1.103:22)..
- Creating directory /data/log/masterha.. done.
- Checking slave recovery environment settings..
- Opening /var/lib/mysql/relay-log.info ... ok.
- Relay log found at /var/lib/mysql, up to mysql-relay-bin.000009
- Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000009
- Testing mysql connection and privileges.. done.
- Testing mysqlbinlog output.. done.
- Cleaning up test file(s).. done.
- Sun Mar 2 15:10:32 2014 - [info] Slaves settings check done.
- Sun Mar 2 15:10:32 2014 - [info]
- 192.168.1.101 (current master)
- +--192.168.1.102
- +--192.168.1.103
- Sun Mar 2 15:10:32 2014 - [info] Checking replication health on 192.168.1.102..
- Sun Mar 2 15:10:32 2014 - [info] ok.
- Sun Mar 2 15:10:32 2014 - [info] Checking replication health on 192.168.1.103..
- Sun Mar 2 15:10:32 2014 - [info] ok.
- Sun Mar 2 15:10:32 2014 - [warning] master_ip_failover_script is not defined.
- Sun Mar 2 15:10:32 2014 - [warning] shutdown_script is not defined.
- Sun Mar 2 15:10:32 2014 - [info] Got exit code 0 (Not master dead).
- MySQL Replication Health is OK.
9.启动管理节点进程:
- [root@client100 mastermha]#nohup masterha_manager --conf=/etc/mastermha/app1.cnf > /tmp/mha_manager.log </dev/null 2>&1 &
- # 可以用以下命令,查看mha 进程是否启动
- [root@client100 mastermha]# ps -ef |grep mha
- # 检查MHA状态:
- [root@client100 ~]# masterha_check_status --global_conf=/etc/mastermha/masterha_default.cnf --conf=/etc/mastermha/app1.cnf
- app1 (pid:2835) is running(0:PING_OK), master:192.168.1.101
- # 正常情况下,如果为ok,则mha已经配置成功
- # 也可以查看相关日志,/masterha/app1/manager.log 此日记记录详细切换过程
10.测试:
关闭目前为主的mysql,仔细查看日志,系统会自动进行切换
注:切换失败后,再次启动管理端,是启不来的,需要删除目录下相关app1.failover.complete,才可正常开启
各位自行测试,有问题,可以互相探讨。