【MHA】mysql高可用之MHA

最新推荐文章于 2022-01-03 00:03:31 发布

cml2016

最新推荐文章于 2022-01-03 00:03:31 发布

阅读量188

点赞数

文章标签：运维数据库操作系统

mysql高可用MHA的搭建与测试

mha官网https://code.google.com/p/mysql-master-ha/

MHA分管理节点和数据库节点，数据库节点由mysql主从或者主主从构成，当主库挂掉后，管理节点会自动将从节点提升为主节点；管理节点的角色类似于oracle数据库中的fast start failover中的observer，但mha上层可以通过keepalive部署VIP，程序连接数据库使用VIP，从而实现后台数据库的故障切换透明化

MHA节点包含三个脚本，依赖perl模块；
save_binary_logs:保存和复制当掉的主服务器二进制日志；
apply_diff_relay_logs:识别差异的relay log事件，并应用于其他salve服务器；
purge_relay_logs:清除relay log文件；
需要在所有mysql服务器上安装MHA节点，MHA管理服务器也需要安装。MHA管理节点模块内部依赖MHA节点模块；
MHA管理节点通过ssh连接管理mysql服务器和执行MHA节点脚本。MHA节点依赖perl的DBD::mysql模块;

本文环境介绍，操作系统均为rhel5.4

在开始之前，请先配置好服务器间的时间同步和名称解析

规划

192.168.186.141 MYSQL.COM	主： node
192.168.186.142 SLAVE1.COM	从1： node
192.168.186.146 SLAVE2.COM	从2： node
192.168.186.144 MANAGER.COM	管理机：manager node

一：搭建mysql 主从复制，配置主机间的ssh互信（免密码登陆）

注意点：在主上创建复制用户的时候在从上也创建一遍
搭建半同步复制：原因：使用半同步复制可以解决当主服务器当掉后，无法ssh到主服务器上保存尚未发送的二进制日志事件。

二：安装node与manager

安装perl环境（每台机器都需要做的）

[root@SLAVE2 data]# yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles ncftp perl-Params-Validate perl-CPAN perl-Test-Mock-LWP.noarch perl-LWP-Authen-Negotiate.noarch perl-devel
[root@SLAVE2 data]#yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker 这个百度文档没有但是显然是要装的

安装node（每台机器都要做的）

[root@MANAGER src]# tar -xf mha4mysql-node-0.56.tar.gz
[root@MANAGER src]# cd mha4mysql-node-0.53
[root@MANAGER mha4mysql-node-0.53]# perl Makefile.PL
[root@MANAGER mha4mysql-node-0.53]# make && make install

安装manager（管理机器上要安装的，就是管理机器上同时存在node与manager ）

点击(此处)折叠或打开

[root@MANAGER src]# tar -xf mha4mysql-manager-0.56.tar.gz
[root@MANAGER src]# cd mha4mysql-manager-0.53
[root@MANAGER mha4mysql-manager-0.53]# perl Makefile.PL
[root@MANAGER mha4mysql-manager-0.53]# make && make install
根据提示输入如果中间有卡顿现象直接crtl+c 然后继续会出现下载的进度条，说明是正常的
[root@MANAGER src]# mkdir /etc/masterha
[root@MANAGER mha]# mkdir -p /master/app1
[root@MANAGERmha]# mkdir -p /scripts
[root@MANAGER mha]# cp samples/conf/* /etc/masterha/ --拷贝配置文件
[root@MANAGERmha]# cp samples/scripts/* /scripts
[root@MANAGER mha4mysql-manager-0.53]# cp samples/conf/* /etc/masterha/

三：配置

[root@MANAGER masterha]# vi app1.cnf
内容如下;
[server default]
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
user=mha_mon 监控用户，每台机器上都要建的
password=123
ssh_user=root ssh无密码的用户
repl_user=slave 做主从的用户每台都要授权的
repl_password=yunwei123 做主从的密码
ping_interval=1
shutdown_script=""
report_script="" ###切换时发邮件
####master_ip_failover_script=/usr/local/bin/master_ip_failover ###自动切换

####master_ip_online_change_script=/usr/local/bin/master_ip_online_change_script ###手工切换，了解这两个脚本可参考http://blog.csdn.net/largetalk/article/details/10006899

####secondary_check_script= /usr/local/bin/masterha_secondary_check -s node1 -s node2 --user=root --master_host=node2 --master_ip=192.168.6.115 --master_port=3306
####//一旦MHA到node2(主)的监控之间出现问题，MHA Manager将会尝试从node1（从，主机名）登录到node2（主，主机名），检查更详细
[server1]
hostname=192.168.186.141
master_binlog_dir=/data/mysql/data ####binlog目录
candidate_master=1
[server2]
hostname=192.168.186.142
master_binlog_dir=/data/mysql/data ####binlog目录
candidate_master=1 ---备主，当master当掉之后会自动切换这台为主机
[server3]
hostname=192.168.186.146
master_binlog_dir=/data/mysql/data ####binlog目录
no_master=1 -----表示永不做主
保存退出！
[root@MANAGER masterha]# >masterha_default.cnf

登入每台数据库
mysql> grant all privileges on *.* to mha_mon@'%' identified by '123'; Query OK, 0 rows affected (1.00 sec)
 
mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)

四：测试ssh，与复制

测试ssh

[root@MANAGER masterha]# masterha_check_ssh --global_conf=/etc/masterha/masterha_default.cnf --conf=/etc/masterha/app1.cnf
Wed Jul 9 02:26:57 2014 - [info] Reading default configuratoins from /etc/masterha/masterha_default.cnf..
Wed Jul 9 02:26:57 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Wed Jul 9 02:26:57 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Wed Jul 9 02:26:57 2014 - [info] Starting SSH connection tests..
Wed Jul 9 02:26:58 2014 - [debug]
Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.141(192.168.186.141:22) to root@192.168.186.142(192.168.186.142:22)..
Wed Jul 9 02:26:57 2014 - [debug] ok.
Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.141(192.168.186.141:22) to root@192.168.186.146(192.168.186.146:22)..
Wed Jul 9 02:26:57 2014 - [debug] ok.
Wed Jul 9 02:26:58 2014 - [debug]
Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.142(192.168.186.142:22) to root@192.168.186.141(192.168.186.141:22)..
Wed Jul 9 02:26:57 2014 - [debug] ok.
Wed Jul 9 02:26:57 2014 - [debug] Connecting via SSH from root@192.168.186.142(192.168.186.142:22) to root@192.168.186.146(192.168.186.146:22)..
Wed Jul 9 02:26:58 2014 - [debug] ok.
Wed Jul 9 02:26:58 2014 - [debug]
Wed Jul 9 02:26:58 2014 - [debug] Connecting via SSH from root@192.168.186.146(192.168.186.146:22) to root@192.168.186.141(192.168.186.141:22)..
Wed Jul 9 02:26:58 2014 - [debug] ok.
Wed Jul 9 02:26:58 2014 - [debug] Connecting via SSH from root@192.168.186.146(192.168.186.146:22) to root@192.168.186.142(192.168.186.142:22)..
Wed Jul 9 02:26:58 2014 - [debug] ok.
Wed Jul 9 02:26:58 2014 - [info] All SSH connection tests passed successfully.

注意

[root@SLAVE1 ~]# ln -s /usr/local/mysql/bin/* /usr/bin    ---rpm安装的mysql不需要，源码安装的就这样

在每台 MYSQL 服务器上做这件事情极度重要哦

mysql>set global read_only=1; set global relay_log_purge=0;    --read-only的意义是普通用户没有create，update，insert等修改的权限，而具有super-privs超级权限（all on *.*）的是不受这个参数约束的！

在备用节点和从节点的/etc/mysql/my.cnf中加入选项：

read_only=1 #这个设置待商榷，备选master设为read only之后，master转移到备选master后数据库不可写(有super权限的用户还是可写，切换到新的主库上后，会在新的主库上执行 SET GLOBAL read_only =0 ），，这个参数开放设置，根据环境来设置

relay_log_purge=0（这个值主从上都写上，因为mha主从补齐数据会用到relay_log）

       slave-skip-errors=1396

为什么要跳过这个错误呢因为啊在主里面删除用户的时候从会报错说没有这个用户所以跳过这个错误吧

如果数据库存在空的用户域名的用户一定要删除否则 MHA 连接 MYSQL 会报错连不上一般只要在从上面删除如果直接没删除也 OK 那就 OK 吧如果报错登录不了就删除掉吧或者跳过域名解析，或者你授权的时候记得也授权域名等等方法多种我的方式是跳过域名解析的 skip-name-resolve

mysql> select user,host from mysql.user; +---------+---------------+ | user    | host          |
+---------+---------------+ | root    | 127.0.0.1     |
| mha_mon | 192.168.186.% |
| repl    | 192.168.186.% |
| slave   | 192.168.186.% |
| root    | ::1           |
|         | SLAVE2.COM    |
| root    | SLAVE2.COM    |
| root    | localhost     |
+---------+---------------+ 8 rows in set (0.00 sec)
mysql> drop user 'root'@SLAVE2.COM; Query OK, 0 rows affected (0.00 sec)

测试复制进程

点击(此处)折叠或打开

[root@MANAGER masterha]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Wed Jul 9 04:23:16 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jul 9 04:23:16 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Wed Jul 9 04:23:16 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Wed Jul 9 04:23:16 2014 - [info] MHA::MasterMonitor version 0.53.
Wed Jul 9 04:23:17 2014 - [info] Dead Servers:
Wed Jul 9 04:23:17 2014 - [info] Alive Servers:
Wed Jul 9 04:23:17 2014 - [info] 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 04:23:17 2014 - [info] 192.168.186.142(192.168.186.142:3306)
Wed Jul 9 04:23:17 2014 - [info] SLAVE2.COM(192.168.186.146:3306)
Wed Jul 9 04:23:17 2014 - [info] Alive Slaves:
Wed Jul 9 04:23:17 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 04:23:17 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 04:23:17 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jul 9 04:23:17 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 04:23:17 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 04:23:17 2014 - [info] Not candidate for the new Master (no_master is set)
Wed Jul 9 04:23:17 2014 - [info] Current Alive Master: 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 04:23:17 2014 - [info] Checking slave configurations..
Wed Jul 9 04:23:17 2014 - [info] Checking replication filtering settings..
Wed Jul 9 04:23:17 2014 - [info] binlog_do_db= , binlog_ignore_db=
Wed Jul 9 04:23:17 2014 - [info] Replication filtering check ok.
Wed Jul 9 04:23:17 2014 - [info] Starting SSH connection tests..
Wed Jul 9 04:23:18 2014 - [info] All SSH connection tests passed successfully.
Wed Jul 9 04:23:18 2014 - [info] Checking MHA Node version..
Wed Jul 9 04:23:19 2014 - [info] Version check ok.
Wed Jul 9 04:23:19 2014 - [info] Checking SSH publickey authentication settings on the current master..
Wed Jul 9 04:23:19 2014 - [info] HealthCheck: SSH to 192.168.186.141 is reachable.
Wed Jul 9 04:23:19 2014 - [info] Master MHA Node version is 0.53.
Wed Jul 9 04:23:19 2014 - [info] Checking recovery script configurations on the current master..
Wed Jul 9 04:23:19 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.53 --start_file=mysql-bin.000001
Wed Jul 9 04:23:19 2014 - [info] Connecting to root@192.168.186.141(192.168.186.141)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql/data, up to mysql-bin.000001
Wed Jul 9 04:23:20 2014 - [info] Master setting check done.
Wed Jul 9 04:23:20 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Jul 9 04:23:20 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=mha_mon --slave_host=192.168.186.142 --slave_ip=192.168.186.142 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.10-log --manager_version=0.53 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Wed Jul 9 04:23:20 2014 - [info] Connecting to root@192.168.186.142(192.168.186.142:22)..
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to SLAVE1-relay-bin.000002
Temporary relay log file is /data/mysql/data/SLAVE1-relay-bin.000002
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Jul 9 04:23:20 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=mha_mon --slave_host=SLAVE2.COM --slave_ip=192.168.186.146 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.10-log --manager_version=0.53 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Wed Jul 9 04:23:20 2014 - [info] Connecting to root@192.168.186.146(SLAVE2.COM:22)..
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to slave2-relay-bin.000002
Temporary relay log file is /data/mysql/data/slave2-relay-bin.000002
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Jul 9 04:23:21 2014 - [info] Slaves settings check done.
Wed Jul 9 04:23:21 2014 - [info]
192.168.186.141 (current master)
+--192.168.186.142
+--SLAVE2.COM
Wed Jul 9 04:23:21 2014 - [info] Checking replication health on 192.168.186.142..
Wed Jul 9 04:23:21 2014 - [info] ok.
Wed Jul 9 04:23:21 2014 - [info] Checking replication health on SLAVE2.COM..
Wed Jul 9 04:23:21 2014 - [info] ok.
Wed Jul 9 04:23:21 2014 - [warning] master_ip_failover_script is not defined.
Wed Jul 9 04:23:21 2014 - [warning] shutdown_script is not defined.
Wed Jul 9 04:23:21 2014 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
至此说明你的MHA 已经配置好了

五：启动/关闭mha

[root@dg55 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log < /dev/null 2>&1 &
[1] 25516
[root@dg55 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf --检查一下
app1 (pid:25516) is running(0:PING_OK), master:192.168.123.13
[root@node4 app1]#masterha_stop --conf=/etc/app1.cnf --关闭
# 如果不能停止，加 --abort选项

六测试故障转移

将 MYSQL.COM 机器上的 MYSQL 服务关闭，注意观察 manager.log 日志会发现切换到了 SLAVE1.COM 并且 SLAVE1.COM 变成了主而 SLAVE2.COM 则变成了 SLAVE1.COM 的从

点击(此处)折叠或打开

root@MANAGER app1]# tail -f manager.log 这是启动后还没关闭主数据库的日志内容
192.168.186.141 (current master)
+--192.168.186.142
+--SLAVE2.COM
Wed Jul 9 18:52:32 2014 - [warning] master_ip_failover_script is not defined.
Wed Jul 9 18:52:32 2014 - [warning] shutdown_script is not defined.
Wed Jul 9 18:52:32 2014 - [info] Set master ping interval 1 seconds.
Wed Jul 9 18:52:32 2014 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Wed Jul 9 18:52:32 2014 - [info] Starting ping health check on 192.168.186.141(192.168.186.141:3306)..
Wed Jul 9 18:52:32 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
[root@MYSQL ~]# service mysqld stop
Shutting down MySQL..... SUCCESS!
[root@MANAGER app1]# tail -f manager.log 最要看最后几行就知道有没有切换成功
192.168.186.141 (current master)
+--192.168.186.142
+--SLAVE2.COM
Wed Jul 9 18:56:47 2014 - [info] Dead Servers:
Wed Jul 9 18:56:47 2014 - [info] 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:47 2014 - [info] Alive Servers:
Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306)
Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306)
Wed Jul 9 18:56:47 2014 - [info] Alive Slaves:
Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:47 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:47 2014 - [info] Not candidate for the new Master (no_master is set)
Wed Jul 9 18:56:47 2014 - [info] Checking slave configurations..
Wed Jul 9 18:56:47 2014 - [info] Checking replication filtering settings..
Wed Jul 9 18:56:47 2014 - [info] Replication filtering check ok.
Wed Jul 9 18:56:47 2014 - [info] Master is down!
Wed Jul 9 18:56:47 2014 - [info] Terminating monitoring script.
Wed Jul 9 18:56:47 2014 - [info] Got exit code 20 (Master dead).
Wed Jul 9 18:56:47 2014 - [info] MHA::MasterFailover version 0.53.
Wed Jul 9 18:56:47 2014 - [info] Starting master failover.
Wed Jul 9 18:56:47 2014 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:47 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:47 2014 - [info] Not candidate for the new Master (no_master is set)
Wed Jul 9 18:56:47 2014 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:214
Wed Jul 9 18:56:47 2014 - [info] Oldest slaves:
Wed Jul 9 18:56:47 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:47 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jul 9 18:56:47 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:47 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:47 2014 - [info] Not candidate for the new Master (no_master is set)
Wed Jul 9 18:56:47 2014 - [info]
Wed Jul 9 18:56:47 2014 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Jul 9 18:56:47 2014 - [info]
Wed Jul 9 18:56:48 2014 - [info] Fetching dead master's binary logs..
Wed Jul 9 18:56:48 2014 - [info] Executing command on the dead master 192.168.186.141(192.168.186.141:3306): save_binary_logs --command=save --start_file=mysql-bin.000001 --start_pos=214 --binlog_dir=/data/mysql/data --output_file=/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53
Creating /var/tmp if not exists.. ok.
Concat binary/relay logs from mysql-bin.000001 pos 214 to mysql-bin.000001 EOF into /var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog ..
Dumping binlog format description event, from position 0 to 120.. ok.
Dumping effective binlog data from /data/mysql/data/mysql-bin.000001 position 214 to tail(237).. ok.
Concat succeeded.
Wed Jul 9 18:56:48 2014 - [info] scp from root@192.168.186.141:/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog to local:/masterha/app1/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog succeeded.
Wed Jul 9 18:56:49 2014 - [info] HealthCheck: SSH to 192.168.186.142 is reachable.
Wed Jul 9 18:56:49 2014 - [info] HealthCheck: SSH to SLAVE2.COM is reachable.
Wed Jul 9 18:56:49 2014 - [info]
Wed Jul 9 18:56:49 2014 - [info] * Phase 3.3: Determining New Master Phase..
Wed Jul 9 18:56:49 2014 - [info]
Wed Jul 9 18:56:49 2014 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Jul 9 18:56:49 2014 - [info] All slaves received relay logs to the same position. No need to resync each other.
Wed Jul 9 18:56:49 2014 - [info] Searching new master from slaves..
Wed Jul 9 18:56:49 2014 - [info] Candidate masters from the configuration file:
Wed Jul 9 18:56:49 2014 - [info] 192.168.186.142(192.168.186.142:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:49 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:49 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jul 9 18:56:49 2014 - [info] Non-candidate masters:
Wed Jul 9 18:56:49 2014 - [info] SLAVE2.COM(192.168.186.146:3306) Version=5.6.10-log (oldest major version between slaves) log-bin:enabled
Wed Jul 9 18:56:49 2014 - [info] Replicating from 192.168.186.141(192.168.186.141:3306)
Wed Jul 9 18:56:49 2014 - [info] Not candidate for the new Master (no_master is set)
Wed Jul 9 18:56:49 2014 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Wed Jul 9 18:56:49 2014 - [info] New master is 192.168.186.142(192.168.186.142:3306)
Wed Jul 9 18:56:49 2014 - [info] Starting master failover..
Wed Jul 9 18:56:49 2014 - [info]
From:
192.168.186.141 (current master)
+--192.168.186.142
+--SLAVE2.COM
To:
192.168.186.142 (new master)
+--SLAVE2.COM
Wed Jul 9 18:56:49 2014 - [info]
Wed Jul 9 18:56:49 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Wed Jul 9 18:56:49 2014 - [info]
Wed Jul 9 18:56:49 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jul 9 18:56:49 2014 - [info] Sending binlog..
Wed Jul 9 18:56:50 2014 - [info] scp from local:/masterha/app1/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog to root@192.168.186.142:/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog succeeded.
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] * Phase 3.4: Master Log Apply Phase..
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Jul 9 18:56:50 2014 - [info] Starting recovery on 192.168.186.142(192.168.186.142:3306)..
Wed Jul 9 18:56:50 2014 - [info] Generating diffs succeeded.
Wed Jul 9 18:56:50 2014 - [info] Waiting until all relay logs are applied.
Wed Jul 9 18:56:50 2014 - [info] done.
Wed Jul 9 18:56:50 2014 - [info] Getting slave status..
Wed Jul 9 18:56:50 2014 - [info] This slave(192.168.186.142)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:214). No need to recover from Exec_Master_Log_Pos.
Wed Jul 9 18:56:50 2014 - [info] Connecting to the target slave host 192.168.186.142, running recover script..
Wed Jul 9 18:56:50 2014 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=mha_mon --slave_host=192.168.186.142 --slave_ip=192.168.186.142 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog --workdir=/var/tmp --target_version=5.6.10-log --timestamp=20140709185647 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Wed Jul 9 18:56:50 2014 - [info]
MySQL client version is 5.6.10. Using --binary-mode.
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog on 192.168.186.142:3306. This may take long time...
Applying log files succeeded.
Wed Jul 9 18:56:50 2014 - [info] All relay logs were successfully applied.
Wed Jul 9 18:56:50 2014 - [info] Getting new master's binlog name and position..
Wed Jul 9 18:56:50 2014 - [info] mysql-bin.000007:504
Wed Jul 9 18:56:50 2014 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.186.142', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000007', MASTER_LOG_POS=504, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Wed Jul 9 18:56:50 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Wed Jul 9 18:56:50 2014 - [info] Setting read_only=0 on 192.168.186.142(192.168.186.142:3306)..
Wed Jul 9 18:56:50 2014 - [info] ok.
Wed Jul 9 18:56:50 2014 - [info] ** Finished master recovery successfully.
Wed Jul 9 18:56:50 2014 - [info] * Phase 3: Master Recovery Phase completed.
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] * Phase 4: Slaves Recovery Phase..
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] -- Slave diff file generation on host SLAVE2.COM(192.168.186.146:3306) started, pid: 3135. Check tmp log /masterha/app1/SLAVE2.COM_3306_20140709185647.log if it takes time..
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] Log messages from SLAVE2.COM ...
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jul 9 18:56:50 2014 - [info] End of log messages from SLAVE2.COM.
Wed Jul 9 18:56:50 2014 - [info] -- SLAVE2.COM(192.168.186.146:3306) has the latest relay log events.
Wed Jul 9 18:56:50 2014 - [info] Generating relay diff files from the latest slave succeeded.
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Jul 9 18:56:50 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] -- Slave recovery on host SLAVE2.COM(192.168.186.146:3306) started, pid: 3137. Check tmp log /masterha/app1/SLAVE2.COM_3306_20140709185647.log if it takes time..
Wed Jul 9 18:56:51 2014 - [info]
Wed Jul 9 18:56:51 2014 - [info] Log messages from SLAVE2.COM ...
Wed Jul 9 18:56:51 2014 - [info]
Wed Jul 9 18:56:50 2014 - [info] Sending binlog..
Wed Jul 9 18:56:51 2014 - [info] scp from local:/masterha/app1/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog to root@SLAVE2.COM:/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog succeeded.
Wed Jul 9 18:56:51 2014 - [info] Starting recovery on SLAVE2.COM(192.168.186.146:3306)..
Wed Jul 9 18:56:51 2014 - [info] Generating diffs succeeded.
Wed Jul 9 18:56:51 2014 - [info] Waiting until all relay logs are applied.
Wed Jul 9 18:56:51 2014 - [info] done.
Wed Jul 9 18:56:51 2014 - [info] Getting slave status..
Wed Jul 9 18:56:51 2014 - [info] This slave(SLAVE2.COM)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:214). No need to recover from Exec_Master_Log_Pos.
Wed Jul 9 18:56:51 2014 - [info] Connecting to the target slave host SLAVE2.COM, running recover script..
Wed Jul 9 18:56:51 2014 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=mha_mon --slave_host=SLAVE2.COM --slave_ip=192.168.186.146 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog --workdir=/var/tmp --target_version=5.6.10-log --timestamp=20140709185647 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Wed Jul 9 18:56:51 2014 - [info]
MySQL client version is 5.6.10. Using --binary-mode.
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.186.141_3306_20140709185647.binlog on SLAVE2.COM:3306. This may take long time...
Applying log files succeeded.
Wed Jul 9 18:56:51 2014 - [info] All relay logs were successfully applied.
Wed Jul 9 18:56:51 2014 - [info] Resetting slave SLAVE2.COM(192.168.186.146:3306) and starting replication from the new master 192.168.186.142(192.168.186.142:3306)..
Wed Jul 9 18:56:51 2014 - [info] Executed CHANGE MASTER.
Wed Jul 9 18:56:51 2014 - [info] Slave started.
Wed Jul 9 18:56:51 2014 - [info] End of log messages from SLAVE2.COM.
Wed Jul 9 18:56:51 2014 - [info] -- Slave recovery on host SLAVE2.COM(192.168.186.146:3306) succeeded.
Wed Jul 9 18:56:51 2014 - [info] All new slave servers recovered successfully.
Wed Jul 9 18:56:51 2014 - [info]
Wed Jul 9 18:56:51 2014 - [info] * Phase 5: New master cleanup phease..
Wed Jul 9 18:56:51 2014 - [info]
Wed Jul 9 18:56:51 2014 - [info] Resetting slave info on the new master..
Wed Jul 9 18:56:51 2014 - [info] 192.168.186.142: Resetting slave info succeeded.
Wed Jul 9 18:56:51 2014 - [info] Master failover to 192.168.186.142(192.168.186.142:3306) completed successfully.
Wed Jul 9 18:56:51 2014 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.186.141 to 192.168.186.142 succeeded
Master 192.168.186.141 is
Check MHA Manager logs at MANAGER.COM:/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 192.168.186.142(192.168.186.142:3306) has all relay logs for recovery.
Selected 192.168.186.142 as a new master.
192.168.186.142: OK: Applying all logs succeeded.
SLAVE2.COM: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
SLAVE2.COM: OK: Applying all logs succeeded. Slave started, replicating from 192.168.186.142.
192.168.186.142: Resetting slave info succeeded.
192.168.186.142: Resetting slave info succeeded.
看到192.168.186.142: Resetting slave info succeeded.
Master failover to 192.168.186.142(192.168.186.142:3306) completed successfully. 说明切换成功了
[root@slave2 ~]# mysql -e "show slave status\G"
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.186.142
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000007
Read_Master_Log_Pos: 504
Relay_Log_File: slave2-relay-bin.000002
Relay_Log_Pos: 283
Relay_Master_Log_File: mysql-bin.000007
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
看到已经切换到和142同步了本来是和141同步的此时的SALVE1.COM已经变成主了说明已然生效

切换后旧master的修复及重新上线
master已经由192.168.17.199 3306 切到了192.168.17.200 3306 实际环境中数据是在不断的变化的，而在切换点mha没有记录当时新master的log-file和log-pos 所以要想直接启动192.168.17.199 3306 然后change master to 192.168.17.200 3306的话是不行的，只能对新主或slave2做一个全备然后再恢复再change。
另外，当执行切换后管理节点上的masterha_manager进程会自动stop，所以等修复好后要再次执行启动----不要纠结ip不对，这个地方是个思路

重构

重构我想就不要我做了吧，就是这时候等于你的主挂了切换在 SLAVE1.COM 上变成了主因此重构我提供一种方案（方案多种），拿出一台新的服务器重新加入以 142为主做成从再 app1.conf 配置文件中加入新机器的信息

[root@MANAGER app1]# rm -rf app1.failover.complete

删除该文件后再次启动 manager 端即可

七．扩展 Keepalived

1master与slave端都安装keepalive

[root@MYSQL src]# wget http://www.keepalived.org/software/keepalived-1.2.12.tar.gz
[root@MYSQL src]# tar -xf keepalived-1.2.12.tar.gz
[root@MYSQL src]# cd keepalived-1.2.12
[root@MYSQL src]# yum -y install gcc gcc-c++ gcc-g77 ncurses-devel bison libaio-devel cmake libnl* libpopt* popt-static openssl-devel
[root@MYSQL keepalived-1.2.12]# ./configure
[root@MYSQL keepalived-1.2.12]# make && make install
[root@MYSQL src]#mkdir /etc/keepalived/
[root@MYSQL src]# cp /usr/local/etc/keepalived/keepalived.conf /etc/keepalived/
[root@MYSQL src]# cp /usr/local/etc/rc.d/init.d/keepalived /etc/init.d/
[root@MYSQL src]# cp /usr/local/etc/sysconfig/keepalived /etc/sysconfig/
[root@MYSQL src]#cp /usr/local/sbin/keepalived /usr/sbin/

2. 配置 keepalived

[root@MYSQL keepalived]# vi keepalived.conf
! Configuration File for keepalived
global_defs {
notification_email {
acassen@firewall.loc
failover@firewall.loc
sysadmin@firewall.loc
}
notification_email_from Alexandre.Cassen@firewall.loc
smtp_server 192.168.200.1
smtp_connect_timeout 30
router_id LVS_DEVEL ##配置是为了标识当前节点，两个节点的此项设置可相同，也可不相同
}
vrrp_instance VI_1 {
state MASTER #指定A节点为主节点备用节点上设置为BACKUP即可
interface eth0
virtual_router_id 51 #VRRP组名，两个节点的设置必须一样，以指明各个节点属于同一VRRP
priority 100 #主节点的优先级（1-254之间），备用节点必须比主节点优先级低
advert_int 1
authentication { #设置验证信息，两个节点必须一致
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.200.16 =====vip，指定一个vip
}
}

3. 检测 mysql 服务脚本

[root@MYSQL keepalived]# vi /root/check_mysql.sh

[root@node1 ~]# vi checkmysql.sh
#!/bin/bash
MYSQL=/usr/bin/mysql
MYSQL_HOST=127.0.0.1
MYSQL_USER=root
MYSQL_PASSWORD=ESBecs00
CHECK_TIME=3
MYSQL_OK=1
function check_mysql_helth (){
$MYSQL -h $MYSQL_HOST -u $MYSQL_USER -p$MYSQL_PASSWORD -e "show status;" >/dev/null 2>&1
if [ $? = 0 ] ;then
MYSQL_OK=1
else
MYSQL_OK=0
fi
return $MYSQL_OK
}
#echo "$MYSQL_OK $CHECK_TIME"

while [ $CHECK_TIME -ne 0 ]
do
let "CHECK_TIME -= 1"
check_mysql_helth
if [ $MYSQL_OK = 1 ] ; then
echo "#### Mysql is alive ####"
CHECK_TIME=0
exit 0
fi
if [ $MYSQL_OK -eq 0 ] && [ $CHECK_TIME -eq 0 ]
then
#pkill keepalived
echo "#### Mysql Down ####"
echo "####now kill keepalive#####"
/etc/init.d/keepalived stop
exit 1
fi
sleep 1
done

该脚本做一个计划任务每分钟做一次检查 MYSQL 服务如果挂掉了的话 VIP 就会在SALVE1.COM 上生效这样的话 client 端连接的 VIP 就会从 MYSQL.COM 上变到了SLAVE1.COM 上此时 SLAVE1.COM 由于 MHA 的生效已经变为主对外提供服务了 VIP 也在 SALVE1.COM 上所以从头到尾客户端只需要连接 VIP 就可以了

 [root@MYSQL ~]# crontab  -l */1 * * * * bash /root/checkmysql.sh

以上步骤再在下一次切换的 SLAVE 上做一次也就是 SLAVE1.COM 上做一次 keepalived.conf 配置文件不同的地方已做注释

4. 启动测试

两边启动 keepalived

[root@MYSQL keepalived]# keepalived -f /etc/keepalived/keepalived.conf

 [root@MYSQL keepalived]# ps -ef |grep keep

root 3230 1 0 23:27 ? 00:00:00 keepalived -f /etc/keepalived/keepalived.conf
root 3231 3230 0 23:27 ? 00:00:00 keepalived -f /etc/keepalived/keepalived.conf
root 3232 3230 0 23:27 ? 00:00:00 keepalived -f /etc/keepalived/keepalived.conf
root 3234 2538 0 23:27 pts/0 00:00:00 grep keep
 
[root@MYSQL keepalived]# ip a 1: lo:  mtu 16436 qdisc noqueue state UNKNOWN  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00  inet 127.0.0.1/8 scope host lo  inet6 ::1/128 scope host   valid_lft forever preferred_lft forever 2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000  link/ether 00:0c:29:c9:85:ba brd ff:ff:ff:ff:ff:ff  inet 192.168.186.141/24 brd 192.168.186.255 scope global eth0  inet 192.168.200.16/32 scope global eth0  inet6 fe80::20c:29ff:fec9:85ba/64 scope link   valid_lft forever preferred_lft forever

测试关闭该主机的 MYSQL 服务看看 VIP 是否飘到配好 keepalive 的 SLAVE1.COM上 MHA 中的 MYSQL 主是不是也变成了 SLAVE1.COM

如是至此全部完成。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。谢谢！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！！

一点注意事项： MHA时基于传统的log + postion开发的，维护GTID肯定不灵光了，但是mha0.56已经开始支持gtid了，可以去作者的博客看到

原文地址： http://www.tuicool.com/articles/uaeE3y
参考文章： http://ylw6006.blog.51cto.com/470441/890360/

附上node与manager的下载地址
http://pan.baidu.com/s/1DixZo
http://pan.baidu.com/s/1hqtZtac

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/29096438/viewspace-1818461/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/29096438/viewspace-1818461/

cml2016

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
【MHA】mysql高可用之MHA

mysql高可用MHA的搭建与测试 mha官网https://code.google.com/p/mysql-master-ha/ MHA分管理节点和数据库节点，数据库节点由mysql主从或...
复制链接

扫一扫