mysql高可用MHA的搭建与测试
MHA分管理节点和数据库节点,数据库节点由mysql主从或者主主从构成,当主库挂掉后,管理节点会自动将从节点提升为主节点;管理节点的角色类似于oracle数据库中的fast start failover中的observer,但mha上层可以通过keepalive部署VIP,程序连接数据库使用VIP,从而实现后台数据库的故障切换透明化
MHA节点包含三个脚本,依赖perl模块;
save_binary_logs:保存和复制当掉的主服务器二进制日志;
apply_diff_relay_logs:识别差异的relay log事件,并应用于其他salve服务器;
purge_relay_logs:清除relay log文件;
需要在所有mysql服务器上安装MHA节点,MHA管理服务器也需要安装。MHA管理节点模块内部依赖MHA节点模块;
MHA管理节点通过ssh连接管理mysql服务器和执行MHA节点脚本。MHA节点依赖perl的DBD::mysql模块;
本文环境介绍,操作系统均为rhel5.4
在开始之前,请先配置好服务器间的时间同步和名称解析
规划
192.168.186.141 MYSQL.COM | 主: node |
192.168.186.142 SLAVE1.COM | 从1: node |
192.168.186.146 SLAVE2.COM | 从2: node |
192.168.186.144 MANAGER.COM | 管理机:manager node |
一:搭建mysql 主从复制,配置主机间的ssh互信(免密码登陆)
注意点:在主上创建复制用户的时候在从上也创建一遍搭建半同步复制:原因:使用半同步复制可以解决当主服务器当掉后,无法ssh到主服务器上保存尚未发送的二进制日志事件。
二:安装node与manager
安装perl环境(每台机器都需要做的)- [root@SLAVE2 data]# yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles ncftp perl-Params-Validate perl-CPAN perl-Test-Mock-LWP.noarch perl-LWP-Authen-Negotiate.noarch perl-devel
- [root@SLAVE2 data]#yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker 这个百度文档没有但是显然是要装的
安装node(每台机器都要做的)
- [root@MANAGER src]# tar -xf mha4mysql-node-0.56.tar.gz
- [root@MANAGER src]# cd mha4mysql-node-0.53
- [root@MANAGER mha4mysql-node-0.53]# perl Makefile.PL
- [root@MANAGER mha4mysql-node-0.53]# make && make install
安装manager(管理机器上要安装的,就是管理机器上同时存在node与manager )
点击(此处)折叠或打开
- [root@MANAGER src]# tar -xf mha4mysql-manager-0.56.tar.gz
- [root@MANAGER src]# cd mha4mysql-manager-0.53
- [root@MANAGER mha4mysql-manager-0.53]# perl Makefile.PL
- [root@MANAGER mha4mysql-manager-0.53]# make && make install
- 根据提示输入 如果中间有卡顿现象 直接crtl+c 然后继续会出现下载的进度条,说明是正常的
- [root@MANAGER src]# mkdir /etc/masterha
- [root@MANAGER mha]# mkdir -p /master/app1
- [root@MANAGERmha]# mkdir -p /scripts
- [root@MANAGER mha]# cp samples/conf/* /etc/masterha/ --拷贝配置文件
- [root@MANAGERmha]# cp samples/scripts/* /scripts
- [root@MANAGER mha4mysql-manager-0.53]# cp samples/conf/* /etc/masterha/
三:配置
- [root@MANAGER masterha]# vi app1.cnf
- 内容如下;
- [server default]
- manager_workdir=/masterha/app1
- manager_log=/masterha/app1/manager.log
- user=mha_mon 监控用户,每台机器上都要建的
- password=123
- ssh_user=root ssh无密码的用户
- repl_user=slave 做主从的用户 每台都要授权的
- repl_password=yunwei123 做主从的密码
- ping_interval=1
- shutdown_script=""
- report_script="" ###切换时发邮件
-
####master_ip_failover_script=/usr/local/bin/master_ip_failover ###自动切换
####master_ip_online_change_script=/usr/local/bin/master_ip_online_change_script ###手工切换,了解这两个脚本可参考http://blog.csdn.net/largetalk/article/details/10006899
####secondary_check_script= /usr/local/bin/masterha_secondary_check -s node1 -s node2 --user=root --master_host=node2 --master_ip=192.168.6.115 --master_port=3306
####//一旦MHA到node2(主)的监控之间出现问题,MHA Manager将会尝试从node1(从,主机名)登录到node2(主,主机名),检查更详细
- [server1]
- hostname=192.168.186.141
- master_binlog_dir=/data/mysql/data ####binlog目录
- candidate_master=1
-
- [server2]
- hostname=192.168.186.142
- master_binlog_dir=/data/mysql/data ####binlog目录
- candidate_master=1 ---备主,当master当掉之后会自动切换这台为主机
-
- [server3]
- hostname=192.168.186.146
- master_binlog_dir=/data/mysql/data ####binlog目录
- no_master=1 -----表示永不做主
-
- 保存退出!
- [root@MANAGER masterha]# >masterha_default.cnf
-
登入每台数据库
mysql> grant all privileges on *.* to mha_mon@'%' identified by '123'; Query OK, 0 rows affected (1.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)
四:测试ssh,与复制
测试ssh- [root@MANAGER masterha]# masterha_check_ssh --global_conf=/etc/masterha/masterha_default.cnf --conf=/etc/masterha/app1.cnf
- Wed Jul 9 02:26:57 2014 - [info] Reading default configuratoins from /etc/masterha/masterha_default.cnf..
- Wed Jul 9 02:26:57 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
- Wed Jul 9 02:26:57 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
- Wed Jul 9 02:26:57 2014 - [info] Starting SSH connection tests..
- Wed Jul 9 02:26:58 2014 - [debug]
- Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.141(192.168.186.141:22) to root@192.168.186.142(192.168.186.142:22)..
- Wed Jul 9 02:26:57 2014 - [debug] ok.
- Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.141(192.168.186.141:22) to root@192.168.186.146(192.168.186.146:22)..
- Wed Jul 9 02:26:57 2014 - [debug] ok.
- Wed Jul 9 02:26:58 2014 - [debug]
- Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.142(192.168.186.142:22) to root@192.168.186.141(192.168.186.141:22)..
- Wed Jul 9 02:26:57 2014 - [debug] ok.
- Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.142(192.168.186.142:22) to root@192.168.186.146(192.168.186.146:22)..
-
- Wed Jul 9 02:26:58 2014 - [debug] ok.
- Wed Jul 9 02:26:58 2014 - [debug]
- Wed Jul 9 02:26:58 2014 - [debug] Connecting via SSH from root@192.168.186.146(192.168.186.146:22) to root@192.168.186.141(192.168.186.141:22)..
- Wed Jul 9 02:26:58 2014 - [debug] ok.
- Wed Jul 9 02:26:58 2014 - [debug] Connecting via SSH from root@192.168.186.146(192.168.186.146:22) to root@192.168.186.142(192.168.186.142:22)..
- Wed Jul 9 02:26:58 2014 - [debug] ok.
- Wed Jul 9 02:26:58 2014 - [info] All SSH connection tests passed successfully.
注意
[root@SLAVE1 ~]# ln -s /usr/local/mysql/bin/* /usr/bin ---rpm安装的mysql不需要,源码安装的就这样
在每台 MYSQL 服务器上做这件事情 极度重要哦
mysql>set global read_only=1; set global relay_log_purge=0; --read-only的意义是普通用户没有create,update,insert等修改的权限,而具有super-privs超级权限(all on *.*)的是不受这个参数约束的!
在备用节点和从节点的/etc/mysql/my.cnf中加入选项:
read_only=1 #这个设置待商榷,备选master设为read only之后,master转移到备选master后数据库不可写(有super权限的用户还是可写,切换到新的主库上后,会在新的主库上执行 SET GLOBAL read_only =0 ),,这个参数开放设置,根据环境来设置
relay_log_purge=0(这个值主从上都写上,因为mha主从补齐数据会用到relay_log)
slave-skip-errors=1396
为什么要跳过这个错误呢 因为啊在主里面删除用户的时候 从会报错说没有这个用户所以跳过这个错误吧
如果数据库存在空的用户 域名的用户 一定要删除否则 MHA 连接 MYSQL 会报错连不上 一般只要在从上面删除 如果直接没删除也 OK 那就 OK 吧如果报错登录不了就删除掉吧 或者跳过域名解析,或者你授权的时候记得也授权域名等等方法多种 我的方式是跳过域名解析的 skip-name-resolve
mysql> select user,host from mysql.user; +---------+---------------+ | user | host |
+---------+---------------+ | root | 127.0.0.1 |
| mha_mon | 192.168.186.% |
| repl | 192.168.186.% |
| slave | 192.168.186.% |
| root | ::1 |
| | SLAVE2.COM |
| root | SLAVE2.COM |
| root | localhost |
+---------+---------------+ 8 rows in set (0.00 sec)
mysql> drop user 'root'@SLAVE2.COM; Query OK, 0 rows affected (0.00 sec)
测试复制进程
点击(此处)折叠或打开
- [root@MANAGER masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf
- Wed Jul 9 04:23:16 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
- Wed Jul 9 04:23:16 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
- Wed Jul 9 04:23:16 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
- Wed Jul 9 04:23:16 2014 - [info] MHA::MasterMonitor version 0.53.
- Wed Jul 9 04:23:17 2014 - [info] Dead Servers:
- Wed Jul 9 04:23:17 2014 - [info] Alive Servers:
- Wed Jul 9 04:23:17 2014 - [info] 192.168.186.141(192.168.186.141:3306)
- Wed Jul 9 04:23:17 2014 - [info] 192.168.186.142(192.168.186.142:3306)
- Wed Jul 9 04:23:17 2014 - [info] SLAVE2.COM(192.168.186.146:3306)
- Wed Jul 9 04:23:17 2014 - [info] Alive Slaves:
- Wed Jul 9 04:23:17 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
- Wed Jul 9 04:23:17 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
- Wed Jul 9 04:23:17 2014 - [info] Primary candidate for the new Master (candidate_master is set)
- Wed Jul 9 04:23:17 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
- Wed Jul 9 04:23:17 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
- Wed Jul 9 04:23:17 2014 - [info] Not candidate for the new Master (no_master is set)
- Wed Jul 9 04:23:17 2014 - [info] Current Alive Master: 192.168.186.141(192.168.186.141:3306)
- Wed Jul 9 04:23:17 2014 - [info] Checking slave configurations..
- Wed Jul 9 04:23:17 2014 - [info] Checking replication filtering settings..
- Wed Jul 9 04:23:17 2014 - [info] binlog_do_db= , binlog_ignore_db=
- Wed Jul 9 04:23:17 2014 - [info] Replication filtering check ok.
- Wed Jul 9 04:23:17 2014 - [info] Starting SSH connection tests..
- Wed Jul 9 04:23:18 2014 - [info] All SSH connection tests passed successfully.
- Wed Jul 9 04:23:18 2014 - [info] Checking MHA Node version..
- Wed Jul 9 04:23:19 2014 - [info] Version check ok.
- Wed Jul 9 04:23:19 2014 - [info] Checking SSH publickey authentication settings on the current master..
- Wed Jul 9 04:23:19 2014 - [info] HealthCheck: SSH to 192.168.186.141 is reachable.
- Wed Jul 9 04:23:19 2014 - [info] Master MHA Node version is 0.53.
- Wed Jul 9 04:23:19 2014 - [info] Checking recovery script configurations on the current master..
- Wed Jul 9 04:23:19 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000001
- Wed Jul 9 04:23:19 2014 - [info] Connecting to root@192.168.186.141(192.168.186.141)..
- Creating /var/tmp if not exists.. ok.
- Checking output directory is accessible or not..
- ok.
- Binlog found at /data/mysql/data, up to mysql-bin.000001
- Wed Jul 9 04:23:20 2014 - [info] Master setting check done.
- Wed Jul 9 04:23:20 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
- Wed Jul 9 04:23:20 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=mha_mon --slave_host=192.168.186.142 --slave_ip=192.168.186.142 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.10-log --manager_version=0.53 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
- Wed Jul 9 04:23:20 2014 - [info] Connecting to root@192.168.186.142(192.168.186.142:22)..
- Checking slave recovery environment settings..
- Opening /data/mysql/data/relay-log.info ... ok.
- Relay log found at /data/mysql/data, up to SLAVE1-relay-bin.000002
- Temporary relay log file is /data/mysql/data/SLAVE1-relay-bin.000002
- Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
- done.
- Testing mysqlbinlog output.. done.
- Cleaning up test file(s).. done.
- Wed Jul 9 04:23:20 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=mha_mon --slave_host=SLAVE2.COM --slave_ip=192.168.186.146 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.10-log --manager_version=0.53 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
- Wed Jul 9 04:23:20 2014 - [info] Connecting to root@192.168.186.146(SLAVE2.COM:22)..
- Checking slave recovery environment settings..
- Opening /data/mysql/data/relay-log.info ... ok.
- Relay log found at /data/mysql/data, up to slave2-relay-bin.000002
- Temporary relay log file is /data/mysql/data/slave2-relay-bin.000002
- Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
- done.
- Testing mysqlbinlog output.. done.
- Cleaning up test file(s).. done.
- Wed Jul 9 04:23:21 2014 - [info] Slaves settings check done.
- Wed Jul 9 04:23:21 2014 - [info]
- 192.168.186.141 (current master)
- +--192.168.186.142
- +--SLAVE2.COM
-
- Wed Jul 9 04:23:21 2014 - [info] Checking replication health on 192.168.186.142..
- Wed Jul 9 04:23:21 2014 - [info] ok.
- Wed Jul 9 04:23:21 2014 - [info] Checking replication health on SLAVE2.COM..
- Wed Jul 9 04:23:21 2014 - [info] ok.
- Wed Jul 9 04:23:21 2014 - [warning] master_ip_failover_script is not defined.
- Wed Jul 9 04:23:21 2014 - [warning] shutdown_script is not defined.
- Wed Jul 9 04:23:21 2014 - [info] Got exit code 0 (Not master dead).
-
- MySQL Replication Health is OK.
-
- 至此说明你的MHA 已经配置好了
五:启动/关闭mha
- [root@dg55 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log < /dev/null 2>&1 &
- [1] 25516
- [root@dg55 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf --检查一下
- app1 (pid:25516) is running(0:PING_OK), master:192.168.123.13
-
- [root@node4 app1]#masterha_stop --conf=/etc/app1.cnf --关闭
# 如果不能停止, 加 --abort选项
六测试故障转移
将 MYSQL.COM 机器上的 MYSQL 服务关闭 ,注意观察 manager.log 日志会发现 切换到了 SLAVE1.COM 并且 SLAVE1.COM 变成了主 而 SLAVE2.COM 则变成了 SLAVE1.COM 的从
点击(此处)折叠或打开
- root@MANAGER app1]# tail -f manager.log 这是启动后还没关闭主数据库的日志内容
- 192.168.186.141 (current master)
- +--192.168.186.142
- +--SLAVE2.COM
-
- Wed Jul 9 18:52:32 2014 - [warning] master_ip_failover_script is not defined.
- Wed Jul 9 18:52:32 2014 - [warning] shutdown_script is not defined.
- Wed Jul 9 18:52:32 2014 - [info] Set master ping interval 1 seconds.
- Wed Jul 9 18:52:32 2014 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
- Wed Jul 9 18:52:32 2014 - [info] Starting ping health check on 192.168.186.141(192.168.186.141:3306)..
- Wed Jul 9 18:52:32 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
-
- [root@MYSQL ~]# service mysqld stop
- Shutting down MySQL..... SUCCESS!
-
-
- [root@MANAGER app1]# tail -f manager.log 最要看最后几行 就知道有没有切换成功
- 192.168.186.141 (current master)
- +--192.168.186.142
- +--SLAVE2.COM
- Wed Jul 9 18:56:47 2014 - [info] Dead Servers:
- Wed Jul 9 18:56:47 2014 - [info] 192.168.186.141(192.168.186.141:3306)
- Wed Jul 9 18:56:47 2014 - [info] Alive Servers:
- Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306)
- Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306)
- Wed Jul 9 18:56:47 2014 - [info] Alive Slaves:
- Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
- Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
- Wed Jul 9 18:56:47 2014 - [info] Primary candidate for the new Master (candidate_master is set)
- Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
- Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
- Wed Jul 9 18:56:47 2014 - [info] Not candidate for the new Master (no_master is set)
- Wed Jul 9 18:56:47 2014 - [info] Checking slave configurations..
- Wed Jul 9 18:56:47 2014 - [info] Checking replication filtering settings..
- Wed Jul 9 18:56:47 2014 - [info] Replication filtering check ok.
- Wed Jul 9 18:56:47 2014 - [info] Master is down!
- Wed Jul 9 18:56:47 2014 - [info] Terminating monitoring script.
- Wed Jul 9 18:56:47 2014 - [info] Got exit code 20 (Master dead).
- Wed Jul 9 18:56:47 2014 - [info] MHA::MasterFailover version 0.53.
- Wed Jul 9 18:56:47 2014 - [info] Starting master failover.
- Wed Jul 9 18:56:47 2014 - [info] * Phase 2: Dead Master Shutdown Phase completed.
- Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
- Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
- Wed Jul 9 18:56:47 2014 - [info] Primary candidate for the new Master (candidate_master is set)
- Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
- Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
- Wed Jul 9 18:56:47 2014 - [info] Not candidate for the new Master (no_master is set)
- Wed Jul 9 18:56:47 2014 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:214
- Wed Jul 9 18:56:47 2014 - [info] Oldest slaves:
- Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
- Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
- Wed Jul 9 18:56:47 2014 - [info] Primary candidate for the new Master (candidate_master is set)
- Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
- Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
- Wed Jul 9 18:56:47 2014 - [info] Not candidate for the new Master (no_master is set)
- Wed Jul 9 18:56:47 2014 - [info]
- Wed Jul 9 18:56:47 2014 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
- Wed Jul 9 18:56:47 2014 - [info]
- Wed Jul 9 18:56:48 2014 - [info] Fetching dead master's binary logs..
- Wed Jul 9 18:56:48 2014 - [info] Executing command on the dead master 192.168.186.141(192.168.186.141:3306): save_binary_logs --command=save --start_file=mysql-bin.000001 --start_pos=214 --binlog_dir=/data/mysql/data --output_file=/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53
- Creating /var/tmp if not exists.. ok.
- Concat binary/relay logs from mysql-bin.000001 pos 214 to mysql-bin.000001 EOF into /var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog ..
- Dumping binlog format description event, from position 0 to 120.. ok.
- Dumping effective binlog data from /data/mysql/data/mysql-bin.000001 position 214 to tail(237).. ok.
- Concat succeeded.
- Wed Jul 9 18:56:48 2014 - [info] scp from root@192.168.186.141:/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog to local:/masterha/app1/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog succeeded.
- Wed Jul 9 18:56:49 2014 - [info] HealthCheck: SSH to 192.168.186.142 is reachable.
- Wed Jul 9 18:56:49 2014 - [info] HealthCheck: SSH to SLAVE2.COM is reachable.
- Wed Jul 9 18:56:49 2014 - [info]
- Wed Jul 9 18:56:49 2014 - [info] * Phase 3.3: Determining New Master Phase..
- Wed Jul 9 18:56:49 2014 - [info]
- Wed Jul 9 18:56:49 2014 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
- Wed Jul 9 18:56:49 2014 - [info] All slaves received relay logs to the same position. No need to resync each other.
- Wed Jul 9 18:56:49 2014 - [info] Searching new master from slaves..
- Wed Jul 9 18:56:49 2014 - [info] Candidate masters from the configuration file:
- Wed Jul 9 18:56:49 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
- Wed Jul 9 18:56:49 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
- Wed Jul 9 18:56:49 2014 - [info] Primary candidate for the new Master (candidate_master is set)
- Wed Jul 9 18:56:49 2014 - [info] Non-candidate masters:
- Wed Jul 9 18:56:49 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
- Wed Jul 9 18:56:49 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
- Wed Jul 9 18:56:49 2014 - [info] Not candidate for the new Master (no_master is set)
- Wed Jul 9 18:56:49 2014 - [info] Searching from candidate_master slaves which have received the latest relay log events..
- Wed Jul 9 18:56:49 2014 - [info] New master is 192.168.186.142(192.168.186.142:3306)
- Wed Jul 9 18:56:49 2014 - [info] Starting master failover..
- Wed Jul 9 18:56:49 2014 - [info]
- From:
- 192.168.186.141 (current master)
- +--192.168.186.142
- +--SLAVE2.COM
-
- To:
- 192.168.186.142 (new master)
- +--SLAVE2.COM
- Wed Jul 9 18:56:49 2014 - [info]
- Wed Jul 9 18:56:49 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
- Wed Jul 9 18:56:49 2014 - [info]
- Wed Jul 9 18:56:49 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
- Wed Jul 9 18:56:49 2014 - [info] Sending binlog..
- Wed Jul 9 18:56:50 2014 - [info] scp from local:/masterha/app1/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog to root@192.168.186.142:/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog succeeded.
- Wed Jul 9 18:56:50 2014 - [info]
- Wed Jul 9 18:56:50 2014 - [info] * Phase 3.4: Master Log Apply Phase..
- Wed Jul 9 18:56:50 2014 - [info]
- Wed Jul 9 18:56:50 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
- Wed Jul 9 18:56:50 2014 - [info] Starting recovery on 192.168.186.142(192.168.186.142:3306)..
- Wed Jul 9 18:56:50 2014 - [info] Generating diffs succeeded.
- Wed Jul 9 18:56:50 2014 - [info] Waiting until all relay logs are applied.
- Wed Jul 9 18:56:50 2014 - [info] done.
- Wed Jul 9 18:56:50 2014 - [info] Getting slave status..
- Wed Jul 9 18:56:50 2014 - [info] This slave(192.168.186.142)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:214). No need to recover from Exec_Master_Log_Pos.
- Wed Jul 9 18:56:50 2014 - [info] Connecting to the target slave host 192.168.186.142, running recover script..
- Wed Jul 9 18:56:50 2014 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=mha_mon --slave_host=192.168.186.142 --slave_ip=192.168.186.142 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog --workdir=/var/tmp --target_version=5.6.10-log --timestamp=20140709185647 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
- Wed Jul 9 18:56:50 2014 - [info]
- MySQL client version is 5.6.10. Using --binary-mode.
- Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog on 192.168.186.142:3306. This may take long time...
- Applying log files succeeded.
- Wed Jul 9 18:56:50 2014 - [info] All relay logs were successfully applied.
- Wed Jul 9 18:56:50 2014 - [info] Getting new master's binlog name and position..
- Wed Jul 9 18:56:50 2014 - [info] mysql-bin.000007:504
- Wed Jul 9 18:56:50 2014 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.186.142', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000007', MASTER_LOG_POS=504, MASTER_USER='repl', MASTER_PASSWORD='xxx';
- Wed Jul 9 18:56:50 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
- Wed Jul 9 18:56:50 2014 - [info] Setting read_only=0 on 192.168.186.142(192.168.186.142:3306)..
- Wed Jul 9 18:56:50 2014 - [info] ok.
- Wed Jul 9 18:56:50 2014 - [info] ** Finished master recovery successfully.
- Wed Jul 9 18:56:50 2014 - [info] * Phase 3: Master Recovery Phase completed.
- Wed Jul 9 18:56:50 2014 - [info]
- Wed Jul 9 18:56:50 2014 - [info] * Phase 4: Slaves Recovery Phase..
- Wed Jul 9 18:56:50 2014 - [info]
- Wed Jul 9 18:56:50 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
- Wed Jul 9 18:56:50 2014 - [info]
- Wed Jul 9 18:56:50 2014 - [info] -- Slave diff file generation on host SLAVE2.COM(192.168.186.146:3306) started, pid: 3135. Check tmp log /masterha/app1/SLAVE2.COM_3306_20140709185647.log if it takes time..
- Wed Jul 9 18:56:50 2014 - [info]
- Wed Jul 9 18:56:50 2014 - [info] Log messages from SLAVE2.COM ...
- Wed Jul 9 18:56:50 2014 - [info]
- Wed Jul 9 18:56:50 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
- Wed Jul 9 18:56:50 2014 - [info] End of log messages from SLAVE2.COM.
- Wed Jul 9 18:56:50 2014 - [info] -- SLAVE2.COM(192.168.186.146:3306) has the latest relay log events.
- Wed Jul 9 18:56:50 2014 - [info] Generating relay diff files from the latest slave succeeded.
- Wed Jul 9 18:56:50 2014 - [info]
- Wed Jul 9 18:56:50 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
- Wed Jul 9 18:56:50 2014 - [info]
- Wed Jul 9 18:56:50 2014 - [info] -- Slave recovery on host SLAVE2.COM(192.168.186.146:3306) started, pid: 3137. Check tmp log /masterha/app1/SLAVE2.COM_3306_20140709185647.log if it takes time..
- Wed Jul 9 18:56:51 2014 - [info]
- Wed Jul 9 18:56:51 2014 - [info] Log messages from SLAVE2.COM ...
- Wed Jul 9 18:56:51 2014 - [info]
- Wed Jul 9 18:56:50 2014 - [info] Sending binlog..
- Wed Jul 9 18:56:51 2014 - [info] scp from local:/masterha/app1/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog to root@SLAVE2.COM:/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog succeeded.
- Wed Jul 9 18:56:51 2014 - [info] Starting recovery on SLAVE2.COM(192.168.186.146:3306)..
- Wed Jul 9 18:56:51 2014 - [info] Generating diffs succeeded.
- Wed Jul 9 18:56:51 2014 - [info] Waiting until all relay logs are applied.
- Wed Jul 9 18:56:51 2014 - [info] done.
- Wed Jul 9 18:56:51 2014 - [info] Getting slave status..
- Wed Jul 9 18:56:51 2014 - [info] This slave(SLAVE2.COM)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:214). No need to recover from Exec_Master_Log_Pos.
- Wed Jul 9 18:56:51 2014 - [info] Connecting to the target slave host SLAVE2.COM, running recover script..
- Wed Jul 9 18:56:51 2014 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=mha_mon --slave_host=SLAVE2.COM --slave_ip=192.168.186.146 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog --workdir=/var/tmp --target_version=5.6.10-log --timestamp=20140709185647 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
- Wed Jul 9 18:56:51 2014 - [info]
- MySQL client version is 5.6.10. Using --binary-mode.
- Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog on SLAVE2.COM:3306. This may take long time...
- Applying log files succeeded.
- Wed Jul 9 18:56:51 2014 - [info] All relay logs were successfully applied.
- Wed Jul 9 18:56:51 2014 - [info] Resetting slave SLAVE2.COM(192.168.186.146:3306) and starting replication from the new master 192.168.186.142(192.168.186.142:3306)..
- Wed Jul 9 18:56:51 2014 - [info] Executed CHANGE MASTER.
- Wed Jul 9 18:56:51 2014 - [info] Slave started.
- Wed Jul 9 18:56:51 2014 - [info] End of log messages from SLAVE2.COM.
- Wed Jul 9 18:56:51 2014 - [info] -- Slave recovery on host SLAVE2.COM(192.168.186.146:3306) succeeded.
- Wed Jul 9 18:56:51 2014 - [info] All new slave servers recovered successfully.
- Wed Jul 9 18:56:51 2014 - [info]
- Wed Jul 9 18:56:51 2014 - [info] * Phase 5: New master cleanup phease..
- Wed Jul 9 18:56:51 2014 - [info]
- Wed Jul 9 18:56:51 2014 - [info] Resetting slave info on the new master..
- Wed Jul 9 18:56:51 2014 - [info] 192.168.186.142: Resetting slave info succeeded.
- Wed Jul 9 18:56:51 2014 - [info] Master failover to 192.168.186.142(192.168.186.142:3306) completed successfully.
- Wed Jul 9 18:56:51 2014 - [info]
-
- ----- Failover Report -----
-
- app1: MySQL Master failover 192.168.186.141 to 192.168.186.142 succeeded
-
- Master 192.168.186.141 is
-
- Check MHA Manager logs at MANAGER.COM:/masterha/app1/manager.log for details.
-
- Started automated(non-interactive) failover.
- The latest slave 192.168.186.142(192.168.186.142:3306) has all relay logs for recovery.
- Selected 192.168.186.142 as a new master.
- 192.168.186.142: OK: Applying all logs succeeded.
- SLAVE2.COM: This host has the latest relay log events.
- Generating relay diff files from the latest slave succeeded.
- SLAVE2.COM: OK: Applying all logs succeeded. Slave started, replicating from 192.168.186.142.
- 192.168.186.142: Resetting slave info succeeded.
- 192.168.186.142: Resetting slave info succeeded.
- 看到192.168.186.142: Resetting slave info succeeded.
-
- Master failover to 192.168.186.142(192.168.186.142:3306) completed successfully. 说明切换成功了
-
- [root@slave2 ~]# mysql -e "show slave status\G"
- *************************** 1. row ***************************
- Slave_IO_State: Waiting for master to send event
- Master_Host: 192.168.186.142
- Master_User: repl
- Master_Port: 3306
- Connect_Retry: 60
- Master_Log_File: mysql-bin.000007
- Read_Master_Log_Pos: 504
- Relay_Log_File: slave2-relay-bin.000002
- Relay_Log_Pos: 283
- Relay_Master_Log_File: mysql-bin.000007
- Slave_IO_Running: Yes
- Slave_SQL_Running: Yes
- 看到已经切换到和142同步了 本来是和141同步的 此时的SALVE1.COM已经变成主了 说明已然生效
切换后旧master的修复及重新上线
master已经由192.168.17.199 3306 切到了192.168.17.200 3306 实际环境中数据是在不断的变化的,而在切换点mha没有记录当时新master的log-file和log-pos 所以要想直接启动192.168.17.199 3306 然后change master to 192.168.17.200 3306的话是不行的,只能对新主或slave2做一个全备然后再恢复再change。
另外,当执行切换后管理节点上的masterha_manager进程会自动stop,所以等修复好后要再次执行启动----不要纠结ip不对,这个地方是个思路
重构
重构我想就不要我做了吧,就是这时候等于你的主挂了 切换在 SLAVE1.COM 上变成了主 因此重构我提供一种方案(方案多种),拿出一台新的服务器重新加入以 142为主做成从 再 app1.conf 配置文件中加入 新机器的信息
[root@MANAGER app1]# rm -rf app1.failover.complete
删除该文件后 再次启动 manager 端即可
七.扩展 Keepalived
- [root@MYSQL src]# wget http://www.keepalived.org/software/keepalived-1.2.12.tar.gz
- [root@MYSQL src]# tar -xf keepalived-1.2.12.tar.gz
- [root@MYSQL src]# cd keepalived-1.2.12
- [root@MYSQL src]# yum -y install gcc gcc-c++ gcc-g77 ncurses-devel bison libaio-devel cmake libnl* libpopt* popt-static openssl-devel
- [root@MYSQL keepalived-1.2.12]# ./configure
- [root@MYSQL keepalived-1.2.12]# make && make install
- [root@MYSQL src]#mkdir /etc/keepalived/
-
- [root@MYSQL src]# cp /usr/local/etc/keepalived/keepalived.conf /etc/keepalived/
- [root@MYSQL src]# cp /usr/local/etc/rc.d/init.d/keepalived /etc/init.d/
- [root@MYSQL src]# cp /usr/local/etc/sysconfig/keepalived /etc/sysconfig/
- [root@MYSQL src]#cp /usr/local/sbin/keepalived /usr/sbin/
2. 配置 keepalived
- [root@MYSQL keepalived]# vi keepalived.conf
- ! Configuration File for keepalived
-
- global_defs {
- notification_email {
- acassen@firewall.loc
- failover@firewall.loc
- sysadmin@firewall.loc
- }
- notification_email_from Alexandre.Cassen@firewall.loc
- smtp_server 192.168.200.1
- smtp_connect_timeout 30
- router_id LVS_DEVEL ##配置是为了标识当前节点,两个节点的此项设置可相同,也可不相同
- }
-
- vrrp_instance VI_1 {
- state MASTER #指定A节点为主节点 备用节点上设置为BACKUP即可
- interface eth0
- virtual_router_id 51 #VRRP组名,两个节点的设置必须一样,以指明各个节点属于同一VRRP
- priority 100 #主节点的优先级(1-254之间),备用节点必须比主节点优先级低
- advert_int 1
- authentication { #设置验证信息,两个节点必须一致
- auth_type PASS
- auth_pass 1111
- }
- virtual_ipaddress {
- 192.168.200.16 =====vip,指定一个vip
- }
- }
3. 检测 mysql 服务脚本
-
[root@MYSQL keepalived]# vi /root/check_mysql.sh
- [root@node1 ~]# vi checkmysql.sh
#!/bin/bash
MYSQL=/usr/bin/mysql
MYSQL_HOST=127.0.0.1
MYSQL_USER=root
MYSQL_PASSWORD=ESBecs00
CHECK_TIME=3
MYSQL_OK=1
function check_mysql_helth (){
$MYSQL -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD -e "show status;" >/dev/null 2>&1
if [ $? = 0 ] ;then
MYSQL_OK=1
else
MYSQL_OK=0
fi
return $MYSQL_OK
}
#echo "$MYSQL_OK $CHECK_TIME"
while [ $CHECK_TIME -ne 0 ]
do
let "CHECK_TIME -= 1"
check_mysql_helth
if [ $MYSQL_OK = 1 ] ; then
echo "#### Mysql is alive ####"
CHECK_TIME=0
exit 0
fi
if [ $MYSQL_OK -eq 0 ] && [ $CHECK_TIME -eq 0 ]
then
#pkill keepalived
echo "#### Mysql Down ####"
echo "####now kill keepalive#####"
/etc/init.d/keepalived stop
exit 1
fi
sleep 1
done
该脚本做一个计划任务每分钟做一次检查 MYSQL 服务如果挂掉了的话 VIP 就会在SALVE1.COM 上生效 这样的话 client 端连接的 VIP 就会从 MYSQL.COM 上变到了SLAVE1.COM 上 此时 SLAVE1.COM 由于 MHA 的生效 已经变为主对外提供服务了 VIP 也在 SALVE1.COM 上 所以从头到尾 客户端只需要连接 VIP 就可以了
[root@MYSQL ~]# crontab -l */1 * * * * bash /root/checkmysql.sh
以上步骤再在下一次切换的 SLAVE 上做一次也就是 SLAVE1.COM 上做一次 keepalived.conf 配置文件不同的地方已做注释
4. 启动测试
两边启动 keepalived
[root@MYSQL keepalived]# keepalived -f /etc/keepalived/keepalived.conf
[root@MYSQL keepalived]# ps -ef |grep keep
root 3230 1 0 23:27 ? 00:00:00 keepalived -f /etc/keepalived/keepalived.conf root 3231 3230 0 23:27 ? 00:00:00 keepalived -f /etc/keepalived/keepalived.conf root 3232 3230 0 23:27 ? 00:00:00 keepalived -f /etc/keepalived/keepalived.conf root 3234 2538 0 23:27 pts/0 00:00:00 grep keep [root@MYSQL keepalived]# ip a 1: lo: mtu 16436 qdisc noqueue state UNKNOWN link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 00:0c:29:c9:85:ba brd ff:ff:ff:ff:ff:ff inet 192.168.186.141/24 brd 192.168.186.255 scope global eth0 inet 192.168.200.16/32 scope global eth0 inet6 fe80::20c:29ff:fec9:85ba/64 scope link valid_lft forever preferred_lft forever
测试关闭该主机的 MYSQL 服务看看 VIP 是否飘到配好 keepalive 的 SLAVE1.COM上 MHA 中的 MYSQL 主是不是也变成了 SLAVE1.COM
如是 至此全部完成。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。谢谢!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
一点注意事项: MHA时基于传统的log + postion开发的,维护GTID肯定不灵光了,但是mha0.56已经开始支持gtid了,可以去作者的博客看到原文地址: http://www.tuicool.com/articles/uaeE3y
参考文章: http://ylw6006.blog.51cto.com/470441/890360/
附上node与manager的下载地址
http://pan.baidu.com/s/1DixZo
http://pan.baidu.com/s/1hqtZtac
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/29096438/viewspace-1818461/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/29096438/viewspace-1818461/