MHA+KEEPALIVED

最新推荐文章于 2023-01-04 17:35:54 发布

ciyiyi3536

最新推荐文章于 2023-01-04 17:35:54 发布

阅读量115

点赞数

文章标签：数据库运维开发工具

MHA是由日本人用Perl写的一套Mysql故障切换方案以保障数据库的高可用性，它的功能是能在0-30s之内实现主Mysql故障转移.MHA里有两个角色一个是node节点一个是manager节点，要实现这个MHA，必须最少要三台数据库服务器，
一主多备，即一台充当master，一台充当master的备份机，另外一台是从属机，这里实验为了实现更好的效果使用四台机器，需要说明的是一旦主服务器宕机，备份机即开始充当master提供服务，如果主服务器上线也不会再成为master了，因为如果这样数据库的一致性就被改变了。
mha下载地址
https://code.google.com/p/mysql-master-ha/
mha4mysql-manager-0.56-0.el6.noarch.rpm
mha4mysql-node-0.56-0.el6.noarch.rpm

192.168.1.231 master
192.168.1.232 slave_1
192.168.1.233 slave_2
192.168.1.234 manager

注：
1、每台数据库服务都要开启bin-log日志
2、user表不要有空的值 (user,host)
3、 每个数据库都要建立复制的用户和密码，并且要一样。

在从库中
set global read_only=1; //开启只读
set global relay_log_purge=0; //不开启read_only日志自动清理
或将其写入my.cnf配置文件中

4台机器面ssh密码连接
ssh-keygen -t rsa
cd /root/.ssh/
-rw------- 1 sweet sweet 1679 Dec 21 11:56 id_rsa
-rw-r--r-- 1 sweet sweet 394 Dec 21 11:56 id_rsa.pub
-rw-r--r-- 1 sweet sweet 1110 Dec 21 21:19 known_hosts

scp ./id_rsa.pub 192.168.1.232/233/234:/root/.ssh/authorized_keys 所有的服务都要给一个
如果多个服务都要往authorized_keys里写密匙
cat scp过来的文件 >>authorized_keys

集群中有几个服务器就做几次，实现四台主机之间相互免密钥登录

4台服务器全部安装
# rpm -ivh http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm
# yum clean all
# yum make
# yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles ncftp perl-Params-Validate perl-CPAN perl-Test-Mock-LWP.noarch perl-LWP-Authen-Negotiate.noarch perl-devel
# yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker
安装node包，manager也要装mha4mysql-node-0.56-0.el6.noarch.rpm
# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm

manager 234
需要安装包
rpm -ivh perl-Mail-Sender-0.8.16-1.el6.rf.noarch.rpm
rpm -ivh perl-Mail-Sendmail-0.79-1.2.el6.rf.noarch.rpm
rpm -ivh perl-Parallel-ForkManager-0.7.5-2.2.el6.rf.noarch.rpm
rpm -ivh perl-Config-Tiny-2.12-1.el6.rfx.noarch.rpm
yum install perl-Log-Dispatch
yum install perl-Time-HiRes
安装管理软件
rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm

#mkdir /etc/masterha 配置文件路径
#mkdir -p /master/app1 工作路径
#mkdir -p /scripts
#cd /etc/masterha
#touch app1.cnf

#vim /etc/masterha/app1.cnf
[server default]
manager_workdir=/master/app1
manager_log=/master/app1/manager.log
user=manager ------下面会创建（集群管理用户）
password=oracle ------下面创建的密码
ssh_user=root ---------ssh的用户名
repl_user=backup ---------主从同步的用户名
repl_password=backup ------主从同步的密码
ping_interval=1 ------ping间隔，用来检测master是否正常
shutdown_script=""
master_ip_online_change_script=""
report_script=""

注：数据库端口不是默认的要加上port=端口号
[server1]
hostname=192.168.1.232
master_binlog_dir=/data/
candidate_master=1 #备用切换的master
check_repl_delay=0 #切换时忽略延迟

[server2]
hostname=192.168.1.231
master_binlog_dir=/data/
candidate_master=1

[server3]
hostname=192.168.1.233
master_binlog_dir=/data/
no_master=1
ignore_fail=1 # 1 从挂了不影响集群，一样可以用

测试信任登录是否成功 ----检测集群中的ssh是否相互免密钥登录
# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Wed Dec 9 19:09:08 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Dec 9 19:09:08 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Dec 9 19:09:08 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Dec 9 19:09:08 2015 - [info] Starting SSH connection tests..
Wed Dec 9 19:09:09 2015 - [debug]
Wed Dec 9 19:09:08 2015 - [debug] Connecting via SSH from root@192.168.1.232(192.168.1.232:22) to root@192.168.1.231(192.168.1.231:22)..
Wed Dec 9 19:09:09 2015 - [debug] ok.
Wed Dec 9 19:09:09 2015 - [debug] Connecting via SSH from root@192.168.1.232(192.168.1.232:22) to root@192.168.1.233(192.168.1.233:22)..
Wed Dec 9 19:09:09 2015 - [debug] ok.
Wed Dec 9 19:09:10 2015 - [debug]
Wed Dec 9 19:09:09 2015 - [debug] Connecting via SSH from root@192.168.1.233(192.168.1.233:22) to root@192.168.1.232(192.168.1.232:22)..
Wed Dec 9 19:09:10 2015 - [debug] ok.
Wed Dec 9 19:09:10 2015 - [debug] Connecting via SSH from root@192.168.1.233(192.168.1.233:22) to root@192.168.1.231(192.168.1.231:22)..
Wed Dec 9 19:09:10 2015 - [debug] ok.
Wed Dec 9 19:09:10 2015 - [debug]
Wed Dec 9 19:09:09 2015 - [debug] Connecting via SSH from root@192.168.1.231(192.168.1.231:22) to root@192.168.1.232(192.168.1.232:22)..
Wed Dec 9 19:09:09 2015 - [debug] ok.
Wed Dec 9 19:09:09 2015 - [debug] Connecting via SSH from root@192.168.1.231(192.168.1.231:22) to root@192.168.1.233(192.168.1.233:22)..
Wed Dec 9 19:09:10 2015 - [debug] ok.
Wed Dec 9 19:09:10 2015 - [info] All SSH connection tests passed successfully.
这为正常

登入每台数据库执行
grant all privileges on *.* to manager@'192.168.1.%' identified by 'oracle'; ----上面配置文件的用户名密码
flush privileges

每台数据库服务器都要执行
ln -s /opt/mysql/bin/* /usr/bin ***********很重要，每台数据库服务器都要执行***********************************

manager上 234
masterha_check_repl工具验证mysql复制是否成功
# masterha_check_repl --conf=/etc/masterha/app1.cnf
Wed Dec 9 19:59:13 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Dec 9 19:59:13 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Dec 9 19:59:13 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Dec 9 19:59:13 2015 - [info] MHA::MasterMonitor version 0.56.
Wed Dec 9 19:59:13 2015 - [info] GTID failover mode = 0
Wed Dec 9 19:59:13 2015 - [info] Dead Servers:
Wed Dec 9 19:59:13 2015 - [info] Alive Servers:
Wed Dec 9 19:59:13 2015 - [info] 192.168.1.232(192.168.1.232:3306)
Wed Dec 9 19:59:13 2015 - [info] 192.168.1.231(192.168.1.231:3306)
Wed Dec 9 19:59:13 2015 - [info] 192.168.1.233(192.168.1.233:3306)
Wed Dec 9 19:59:13 2015 - [info] Alive Slaves:
Wed Dec 9 19:59:13 2015 - [info] 192.168.1.232(192.168.1.232:3306) Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Wed Dec 9 19:59:13 2015 - [info] Replicating from 192.168.1.231(192.168.1.231:3306)
Wed Dec 9 19:59:13 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Dec 9 19:59:13 2015 - [info] 192.168.1.233(192.168.1.233:3306) Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Wed Dec 9 19:59:13 2015 - [info] Replicating from 192.168.1.231(192.168.1.231:3306)
Wed Dec 9 19:59:13 2015 - [info] Not candidate for the new Master (no_master is set)
Wed Dec 9 19:59:13 2015 - [info] Current Alive Master: 192.168.1.231(192.168.1.231:3306)
Wed Dec 9 19:59:13 2015 - [info] Checking slave configurations..
Wed Dec 9 19:59:13 2015 - [info] Checking replication filtering settings..
Wed Dec 9 19:59:13 2015 - [info] binlog_do_db= , binlog_ignore_db=
Wed Dec 9 19:59:13 2015 - [info] Replication filtering check ok.
Wed Dec 9 19:59:13 2015 - [info] GTID (with auto-pos) is not supported
Wed Dec 9 19:59:13 2015 - [info] Starting SSH connection tests..
Wed Dec 9 19:59:15 2015 - [info] All SSH connection tests passed successfully.
Wed Dec 9 19:59:15 2015 - [info] Checking MHA Node version..
Wed Dec 9 19:59:15 2015 - [info] Version check ok.
Wed Dec 9 19:59:15 2015 - [info] Checking SSH publickey authentication settings on the current master..
Wed Dec 9 19:59:15 2015 - [info] HealthCheck: SSH to 192.168.1.231 is reachable.
Wed Dec 9 19:59:16 2015 - [info] Master MHA Node version is 0.56.
Wed Dec 9 19:59:16 2015 - [info] Checking recovery script configurations on 192.168.1.231(192.168.1.231:3306)..
Wed Dec 9 19:59:16 2015 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000004
Wed Dec 9 19:59:16 2015 - [info] Connecting to root@192.168.1.231(192.168.1.231:22)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/, up to mysql-bin.000004
Wed Dec 9 19:59:16 2015 - [info] Binlog setting check done.
Wed Dec 9 19:59:16 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Dec 9 19:59:16 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='manager' --slave_host=192.168.1.232 --slave_ip=192.168.1.232 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.14-log --manager_version=0.56 --relay_log_info=/data/relay-log.info --relay_dir=/data/ --slave_pass=xxx
Wed Dec 9 19:59:16 2015 - [info] Connecting to root@192.168.1.232(192.168.1.232:22)..
Checking slave recovery environment settings..
Opening /data/relay-log.info ... ok.
Relay log found at /data, up to log-relay-bin.000004
Temporary relay log file is /data/log-relay-bin.000004
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Dec 9 19:59:16 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='manager' --slave_host=192.168.1.233 --slave_ip=192.168.1.233 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.14-log --manager_version=0.56 --relay_log_info=/data/relay-log.info --relay_dir=/data/ --slave_pass=xxx
Wed Dec 9 19:59:16 2015 - [info] Connecting to root@192.168.1.233(192.168.1.233:22)..
Checking slave recovery environment settings..
Opening /data/relay-log.info ... ok.
Relay log found at /data, up to proxy-relay-bin.000004
Temporary relay log file is /data/proxy-relay-bin.000004
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Dec 9 19:59:17 2015 - [info] Slaves settings check done.
Wed Dec 9 19:59:17 2015 - [info]
192.168.1.231(192.168.1.231:3306) (current master)
+--192.168.1.232(192.168.1.232:3306)
+--192.168.1.233(192.168.1.233:3306)

Wed Dec 9 19:59:17 2015 - [info] Checking replication health on 192.168.1.232..
Wed Dec 9 19:59:17 2015 - [info] ok.
Wed Dec 9 19:59:17 2015 - [info] Checking replication health on 192.168.1.233..
Wed Dec 9 19:59:17 2015 - [info] ok.
Wed Dec 9 19:59:17 2015 - [warning] master_ip_failover_script is not defined.
Wed Dec 9 19:59:17 2015 - [warning] shutdown_script is not defined.
Wed Dec 9 19:59:17 2015 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
---这为正常

manager上 234
启动管理
masterha_manager --conf=/etc/masterha/app1.cnf --ignore_last_failover ---自己加上后台运行
--ignore_last_failover：检查上一个故障转移状态如果最后一个故障转移以错误结束，或最近一次故障转移最近完成，则MHA将在此停止，并且不会启动故障转移。可以使用ignore_last_failover和wait_on_failover_error参数更改此行为。

测试切换
关闭master的数据库，看日志状态，还有主是否切换到你指定的被主上了

注意：iptable -L和selinux 折腾惨了，本来以为关闭了，结果只是临时关闭的，机器有重启过

slave 232
show slave status //查看从的状态，应该为空、
show full processlist; //查看从的链接进程

在切换回来（切换到231上）
在原来的主执行 (231)
reset master;
change master to master_host='192.168.1.232',master_user='backup',master_password='backup',master_log_file='mysql-bin.000011',master_log_pos=120;
将232设为主

manager上 234
查看主从状态
# masterha_check_repl --conf=/etc/masterha/app1.cnf

手动在线切换主从
# masterha_master_switch --master_state=alive --conf=/etc/masterha/app1.cnf ********************这样切换现在的master(232)将不启动(确保要切换的主和现在的主数据是同步的*********************
Dec 11 10:39:14 2015 - [info] MHA::MasterRotate version 0.56.
Fri Dec 11 10:39:14 2015 - [info] Starting online master switch..
Fri Dec 11 10:39:14 2015 - [info]
Fri Dec 11 10:39:14 2015 - [info] * Phase 1: Configuration Check Phase..
Fri Dec 11 10:39:14 2015 - [info]
Fri Dec 11 10:39:14 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Dec 11 10:39:14 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Fri Dec 11 10:39:14 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Fri Dec 11 10:39:15 2015 - [info] GTID failover mode = 0
Fri Dec 11 10:39:15 2015 - [info] Current Alive Master: 192.168.1.232(192.168.1.232:3306)
Fri Dec 11 10:39:15 2015 - [info] Alive Slaves:
Fri Dec 11 10:39:15 2015 - [info] 192.168.1.231(192.168.1.231:3306) Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Fri Dec 11 10:39:15 2015 - [info] Replicating from 192.168.1.232(192.168.1.232:3306)
Fri Dec 11 10:39:15 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Dec 11 10:39:15 2015 - [info] 192.168.1.233(192.168.1.233:3306) Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Fri Dec 11 10:39:15 2015 - [info] Replicating from 192.168.1.232(192.168.1.232:3306)
Fri Dec 11 10:39:15 2015 - [info] Not candidate for the new Master (no_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.1.232(192.168.1.232:3306)? (YES/no): yes
Fri Dec 11 10:39:17 2015 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Fri Dec 11 10:39:17 2015 - [info] ok.
Fri Dec 11 10:39:17 2015 - [info] Checking MHA is not monitoring or doing failover..
Fri Dec 11 10:39:17 2015 - [info] Checking replication health on 192.168.1.231..
Fri Dec 11 10:39:17 2015 - [info] ok.
Fri Dec 11 10:39:17 2015 - [info] Checking replication health on 192.168.1.233..
Fri Dec 11 10:39:17 2015 - [info] ok.
Fri Dec 11 10:39:17 2015 - [info] Searching new master from slaves..
Fri Dec 11 10:39:17 2015 - [info] Candidate masters from the configuration file:
Fri Dec 11 10:39:17 2015 - [info] 192.168.1.231(192.168.1.231:3306) Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Fri Dec 11 10:39:17 2015 - [info] Replicating from 192.168.1.232(192.168.1.232:3306)
Fri Dec 11 10:39:17 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Dec 11 10:39:17 2015 - [info] 192.168.1.232(192.168.1.232:3306) Version=5.6.14-log log-bin:enabled
Fri Dec 11 10:39:17 2015 - [info] Non-candidate masters:
Fri Dec 11 10:39:17 2015 - [info] 192.168.1.233(192.168.1.233:3306) Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Fri Dec 11 10:39:17 2015 - [info] Replicating from 192.168.1.232(192.168.1.232:3306)
Fri Dec 11 10:39:17 2015 - [info] Not candidate for the new Master (no_master is set)
Fri Dec 11 10:39:17 2015 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Fri Dec 11 10:39:17 2015 - [info]
From:
192.168.1.232(192.168.1.232:3306) (current master)
+--192.168.1.231(192.168.1.231:3306)
+--192.168.1.233(192.168.1.233:3306)

To:
192.168.1.231(192.168.1.231:3306) (new master)
+--192.168.1.233(192.168.1.233:3306)

Starting master switch from 192.168.1.232(192.168.1.232:3306) to 192.168.1.231(192.168.1.231:3306)? (yes/NO): yes
Fri Dec 11 10:39:19 2015 - [info] Checking whether 192.168.1.231(192.168.1.231:3306) is ok for the new master..
Fri Dec 11 10:39:19 2015 - [info] ok.
Fri Dec 11 10:39:19 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Dec 11 10:39:19 2015 - [info]
Fri Dec 11 10:39:19 2015 - [info] * Phase 2: Rejecting updates Phase..
Fri Dec 11 10:39:19 2015 - [info]
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
Fri Dec 11 10:39:28 2015 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Fri Dec 11 10:39:28 2015 - [info] Executing FLUSH TABLES WITH READ LOCK..
Fri Dec 11 10:39:28 2015 - [info] ok.
Fri Dec 11 10:39:28 2015 - [info] Orig master binlog:pos is log-bin.000003:1508.
Fri Dec 11 10:39:28 2015 - [info] Waiting to execute all relay logs on 192.168.1.231(192.168.1.231:3306)..
Fri Dec 11 10:39:28 2015 - [info] master_pos_wait(log-bin.000003:1508) completed on 192.168.1.231(192.168.1.231:3306). Executed 0 events.
Fri Dec 11 10:39:28 2015 - [info] done.
Fri Dec 11 10:39:28 2015 - [info] Getting new master's binlog name and position..
Fri Dec 11 10:39:28 2015 - [info] mysql-bin.000009:120
Fri Dec 11 10:39:28 2015 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.231', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000009', MASTER_LOG_POS=120, MASTER_USER='backup', MASTER_PASSWORD='xxx';
Fri Dec 11 10:39:28 2015 - [info]
Fri Dec 11 10:39:28 2015 - [info] * Switching slaves in parallel..
Fri Dec 11 10:39:28 2015 - [info]
Fri Dec 11 10:39:28 2015 - [info] -- Slave switch on host 192.168.1.233(192.168.1.233:3306) started, pid: 26287
Fri Dec 11 10:39:28 2015 - [info]
Fri Dec 11 10:39:28 2015 - [info] Log messages from 192.168.1.233 ...
Fri Dec 11 10:39:28 2015 - [info]
Fri Dec 11 10:39:28 2015 - [info] Waiting to execute all relay logs on 192.168.1.233(192.168.1.233:3306)..
Fri Dec 11 10:39:28 2015 - [info] master_pos_wait(log-bin.000003:1508) completed on 192.168.1.233(192.168.1.233:3306). Executed 0 events.
Fri Dec 11 10:39:28 2015 - [info] done.
Fri Dec 11 10:39:28 2015 - [info] Resetting slave 192.168.1.233(192.168.1.233:3306) and starting replication from the new master 192.168.1.231(192.168.1.231:3306)..
Fri Dec 11 10:39:28 2015 - [info] Executed CHANGE MASTER.
Fri Dec 11 10:39:28 2015 - [info] Slave started.
Fri Dec 11 10:39:28 2015 - [info] End of log messages from 192.168.1.233 ...
Fri Dec 11 10:39:28 2015 - [info]
Fri Dec 11 10:39:28 2015 - [info] -- Slave switch on host 192.168.1.233(192.168.1.233:3306) succeeded.
Fri Dec 11 10:39:28 2015 - [info] Unlocking all tables on the orig master:
Fri Dec 11 10:39:28 2015 - [info] Executing UNLOCK TABLES..
Fri Dec 11 10:39:28 2015 - [info] ok.
Fri Dec 11 10:39:28 2015 - [info] All new slave servers switched successfully.
Fri Dec 11 10:39:28 2015 - [info]
Fri Dec 11 10:39:28 2015 - [info] * Phase 5: New master cleanup phase..
Fri Dec 11 10:39:28 2015 - [info]
Fri Dec 11 10:39:28 2015 - [info] 192.168.1.231: Resetting slave info succeeded.
Fri Dec 11 10:39:28 2015 - [info] Switching master to 192.168.1.231(192.168.1.231:3306) completed successfully.
这为ok

232上
重新指定主（这时232上既不是主又不是从，需要重新指定）
reset master；
change............;

或者（也是手工在线切换）
# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.1.231 --orig_master_is_new_slave
*********指向新的master，并把现在的master变为slave，不加的--orig_master_is_new_slave的话现在的master将不启动,最好是这样启动（但要注意，确保232上面之前没有写入数据，如有写入，需要将231和232同步后，232锁库）*************************

[root@manager ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.1.231 --orig_master_is_new_slave
Fri Dec 18 13:11:57 2015 - [info] MHA::MasterRotate version 0.56.
Fri Dec 18 13:11:57 2015 - [info] Starting online master switch..
Fri Dec 18 13:11:57 2015 - [info]
Fri Dec 18 13:11:57 2015 - [info] * Phase 1: Configuration Check Phase..
Fri Dec 18 13:11:57 2015 - [info]
Fri Dec 18 13:11:57 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Dec 18 13:11:57 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Fri Dec 18 13:11:57 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Fri Dec 18 13:11:57 2015 - [info] GTID failover mode = 0
Fri Dec 18 13:11:57 2015 - [info] Current Alive Master: 192.168.1.232(192.168.1.232:3306)
Fri Dec 18 13:11:57 2015 - [info] Alive Slaves:
Fri Dec 18 13:11:57 2015 - [info] 192.168.1.231(192.168.1.231:3306) Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Fri Dec 18 13:11:57 2015 - [info] Replicating from 192.168.1.232(192.168.1.232:3306)
Fri Dec 18 13:11:57 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Dec 18 13:11:57 2015 - [info] 192.168.1.233(192.168.1.233:3306) Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Fri Dec 18 13:11:57 2015 - [info] Replicating from 192.168.1.232(192.168.1.232:3306)
Fri Dec 18 13:11:57 2015 - [info] Not candidate for the new Master (no_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.1.232(192.168.1.232:3306)? (YES/no): yes
Fri Dec 18 13:12:08 2015 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Fri Dec 18 13:12:08 2015 - [info] ok.
Fri Dec 18 13:12:08 2015 - [info] Checking MHA is not monitoring or doing failover..
Fri Dec 18 13:12:08 2015 - [info] Checking replication health on 192.168.1.231..
Fri Dec 18 13:12:08 2015 - [info] ok.
Fri Dec 18 13:12:08 2015 - [info] Checking replication health on 192.168.1.233..
Fri Dec 18 13:12:08 2015 - [info] ok.
Fri Dec 18 13:12:08 2015 - [info] 192.168.1.231 can be new master.
Fri Dec 18 13:12:08 2015 - [info]
From:
192.168.1.232(192.168.1.232:3306) (current master)
+--192.168.1.231(192.168.1.231:3306)
+--192.168.1.233(192.168.1.233:3306)

To:
192.168.1.231(192.168.1.231:3306) (new master)
+--192.168.1.233(192.168.1.233:3306)
+--192.168.1.232(192.168.1.232:3306)

Starting master switch from 192.168.1.232(192.168.1.232:3306) to 192.168.1.231(192.168.1.231:3306)? (yes/NO): yes
Fri Dec 18 13:12:15 2015 - [info] Checking whether 192.168.1.231(192.168.1.231:3306) is ok for the new master..
Fri Dec 18 13:12:15 2015 - [info] ok.
Fri Dec 18 13:12:15 2015 - [info] 192.168.1.232(192.168.1.232:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Fri Dec 18 13:12:15 2015 - [info] 192.168.1.232(192.168.1.232:3306): Resetting slave pointing to the dummy host.
Fri Dec 18 13:12:15 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Dec 18 13:12:15 2015 - [info]
Fri Dec 18 13:12:15 2015 - [info] * Phase 2: Rejecting updates Phase..
Fri Dec 18 13:12:15 2015 - [info]
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
Fri Dec 18 13:12:37 2015 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Fri Dec 18 13:12:37 2015 - [info] Executing FLUSH TABLES WITH READ LOCK..
Fri Dec 18 13:12:37 2015 - [info] ok.
Fri Dec 18 13:12:37 2015 - [info] Orig master binlog:pos is log-bin.000003:1508.
Fri Dec 18 13:12:37 2015 - [info] Waiting to execute all relay logs on 192.168.1.231(192.168.1.231:3306)..
Fri Dec 18 13:12:37 2015 - [info] master_pos_wait(log-bin.000003:1508) completed on 192.168.1.231(192.168.1.231:3306). Executed 0 events.
Fri Dec 18 13:12:37 2015 - [info] done.
Fri Dec 18 13:12:37 2015 - [info] Getting new master's binlog name and position..
Fri Dec 18 13:12:37 2015 - [info] mysql-bin.000001:120
Fri Dec 18 13:12:37 2015 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.231', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=120, MASTER_USER='backup', MASTER_PASSWORD='xxx';
Fri Dec 18 13:12:37 2015 - [info]
Fri Dec 18 13:12:37 2015 - [info] * Switching slaves in parallel..
Fri Dec 18 13:12:37 2015 - [info]
Fri Dec 18 13:12:37 2015 - [info] -- Slave switch on host 192.168.1.233(192.168.1.233:3306) started, pid: 23956
Fri Dec 18 13:12:37 2015 - [info]
Fri Dec 18 13:12:37 2015 - [info] Log messages from 192.168.1.233 ...
Fri Dec 18 13:12:37 2015 - [info]
Fri Dec 18 13:12:37 2015 - [info] Waiting to execute all relay logs on 192.168.1.233(192.168.1.233:3306)..
Fri Dec 18 13:12:37 2015 - [info] master_pos_wait(log-bin.000003:1508) completed on 192.168.1.233(192.168.1.233:3306). Executed 0 events.
Fri Dec 18 13:12:37 2015 - [info] done.
Fri Dec 18 13:12:37 2015 - [info] Resetting slave 192.168.1.233(192.168.1.233:3306) and starting replication from the new master 192.168.1.231(192.168.1.231:3306)..
Fri Dec 18 13:12:37 2015 - [info] Executed CHANGE MASTER.
Fri Dec 18 13:12:37 2015 - [info] Slave started.
Fri Dec 18 13:12:37 2015 - [info] End of log messages from 192.168.1.233 ...
Fri Dec 18 13:12:37 2015 - [info]
Fri Dec 18 13:12:37 2015 - [info] -- Slave switch on host 192.168.1.233(192.168.1.233:3306) succeeded.
Fri Dec 18 13:12:37 2015 - [info] Unlocking all tables on the orig master:
Fri Dec 18 13:12:37 2015 - [info] Executing UNLOCK TABLES..
Fri Dec 18 13:12:37 2015 - [info] ok.
Fri Dec 18 13:12:37 2015 - [info] Starting orig master as a new slave..
Fri Dec 18 13:12:37 2015 - [info] Resetting slave 192.168.1.232(192.168.1.232:3306) and starting replication from the new master 192.168.1.231(192.168.1.231:3306)..
Fri Dec 18 13:12:37 2015 - [info] Executed CHANGE MASTER.
Fri Dec 18 13:12:37 2015 - [info] Slave started.
Fri Dec 18 13:12:37 2015 - [info] All new slave servers switched successfully.
Fri Dec 18 13:12:37 2015 - [info]
Fri Dec 18 13:12:37 2015 - [info] * Phase 5: New master cleanup phase..
Fri Dec 18 13:12:37 2015 - [info]
Fri Dec 18 13:12:37 2015 - [info] 192.168.1.231: Resetting slave info succeeded.
Fri Dec 18 13:12:37 2015 - [info] Switching master to 192.168.1.231(192.168.1.231:3306) completed successfully
这里需要手动输入yes

在到刚刚切换的主上看（231上）
show full processlist;

每次切换后一定注意app1.failover.complete这个文件，切换完成后要把这个文件删除掉，不然下次切换回报错

**********************************************************************************************************************************************************************************************************
安装keepalived

master和slave_1上安装 (231,232)
下载地址：
http://www.keepalived.org/software/
keepalived-1.2.19.tar.gz
解压
cd keepalived-1.2.19
要先安装openssl-devel包
# yum install openssl-devel -y
#./configure
#make
#make install
#mkdir /etc/keepalived/
# cp /usr/local/etc/keepalived/keepalived.conf /etc/keepalived/
# cp /usr/local/etc/rc.d/init.d/keepalived /etc/init.d/
# cp /usr/local/etc/sysconfig/keepalived /etc/sysconfig/
# cp /usr/local/sbin/keepalived /usr/sbin/

# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
notification_email {
xxx@126.com 接收邮箱，可以有多个，一行一个
}
notification_email_from ccc@126.com 从谁那发出
smtp_server mail.xxxxx.com 邮箱服务器
smtp_connect_timeout 30 发送邮箱的超时时间
router_id master 主机名
}

vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 60
priority 100 优先级，主的要不从的高（主是100，从就80吧）
advert_int 1
nopreempt 不抢占资源，及时服务重启了也不会把vip抢回来（主上配置就好了，从不要配置）
authentication {
auth_type PASS 认证方式
auth_pass 1111 认证密码（主从一样）
}
virtual_ipaddress {
192.168.1.236 虚拟ip
}
}

virtual_server 192.168.1.236 3306 { 端口要和服务的端口一致
delay_loop 2 健康检查时间
lb_algo wrr 调度算法rr|wrr|lc|wlc|lblc|sh|dh
lb_kind DR 负载均衡转发规则 NAT|DR|TUN
nat_mask 255.255.255.0
persistence_timeout 50 回话保持时间
protocol TCP

real_server 192.168.1.231 3306 { （主从的ip不一样）
weight 1 默认为1,0为失效
#notify_down /opt/script/mysql_down.sh 检测到server down后，执行脚本 (没用这种方法，不知为什么脚本不执行)
TCP_CHECK{
connect_timeout 3 连接超时时间
nb_get_retry 3 重连次数
connect_port 3306 健康检查的端口的端口
}
}
}

master 231（只在主上做就好了）
用脚本检测服务，并控制vip和keepalive服务
脚本如下：
root@master bin]# vim /opt/script/check.sh
#! /bin/bash
MYSQL=/opt/mysql/bin/mysql
MYSQL_HOST=127.0.0.1
MYSQL_USER=root
MYSQL_PASSWORD=oracle
CHECK_TIME=3
#mysql is working MYSQL_OK is 1 , mysql down MYSQL_OK is 0
MYSQL_OK=1
function check_mysql_helth(){
$MYSQL -u $MYSQL_USER -p$MYSQL_PASSWORD -e "select version();" >/dev/null 2>&1
if [ $? = 0 ] ;then
MYSQL_OK=1
else
MYSQL_OK=0
fi
return $MYSQL_OK
}
while [ $CHECK_TIME -ne 0 ]
do
let "CHECK_TIME -= 1"
check_mysql_helth
if [ $MYSQL_OK = 1 ] ; then
CHECK_TIME=0
exit 0
fi
if [ $MYSQL_OK -eq 0 ] && [ $CHECK_TIME -eq 0 ]
then
pkill keepalived
exit 1
fi
sleep 1
done

将脚本写入定时任务里（保证每3秒执行一次检测脚本）
[root@master bin]# crontab -l
*/1 * * * * bash /opt/script/check.sh
*/1 * * * * sleep 3 && /opt/script/check.sh
*/1 * * * * sleep 6 && /opt/script/check.sh
*/1 * * * * sleep 9 && /opt/script/check.sh
*/1 * * * * sleep 12 && /opt/script/check.sh
*/1 * * * * sleep 15 && /opt/script/check.sh
*/1 * * * * sleep 18 && /opt/script/check.sh
*/1 * * * * sleep 21 && /opt/script/check.sh
*/1 * * * * sleep 24 && /opt/script/check.sh
*/1 * * * * sleep 27 && /opt/script/check.sh
*/1 * * * * sleep 30 && /opt/script/check.sh
*/1 * * * * sleep 33 && /opt/script/check.sh
*/1 * * * * sleep 36 && /opt/script/check.sh
*/1 * * * * sleep 39 && /opt/script/check.sh
*/1 * * * * sleep 42 && /opt/script/check.sh
*/1 * * * * sleep 45 && /opt/script/check.sh
*/1 * * * * sleep 48 && /opt/script/check.sh
*/1 * * * * sleep 51 && /opt/script/check.sh
*/1 * * * * sleep 54 && /opt/script/check.sh
*/1 * * * * sleep 57 && /opt/script/check.sh
没隔3秒执行一次，这样能够保证vip快速的飘移到另一机器上去

测试：
1.关闭主库，查看主从的情况，和vip是否漂移。
2.关闭服务器，查看主从的情况，和vip是否漂移。