MHA+KEEPALIVED

MHA是由日本人用Perl写的一套Mysql故障切换方案以保障数据库的高可用性,它的功能是能在0-30s之内实现主Mysql故障转移.MHA里有两个角色一个是node节点 一个是manager节点,要实现这个MHA,必须最少要三台数据库服务器,
一主多备,即一台充当master,一台充当master的备份机,另外一台是从属机,这里实验为了实现更好的效果使用四台机器,需要说明的是一旦主服务器宕机,备份机即开始充当master提供服务,如果主服务器上线也不会再成为master了,因为如果这样数据库的一致性就被改变了。
mha下载地址
https://code.google.com/p/mysql-master-ha/    
mha4mysql-manager-0.56-0.el6.noarch.rpm
mha4mysql-node-0.56-0.el6.noarch.rpm


192.168.1.231  master
192.168.1.232  slave_1
192.168.1.233  slave_2
192.168.1.234  manager

注:
1、每台数据库服务都要开启bin-log日志
2、user表不要有空的值 (user,host)
3、 每个数据库都要建立复制的用户和密码,并且要一样。

在从库中
set global read_only=1;                //开启只读
set global relay_log_purge=0;       //不开启read_only日志自动清理
或将其写入my.cnf配置文件中


4台机器面ssh密码连接
ssh-keygen -t rsa
cd /root/.ssh/
-rw------- 1 sweet sweet 1679 Dec 21 11:56 id_rsa
-rw-r--r-- 1 sweet sweet  394 Dec 21 11:56 id_rsa.pub
-rw-r--r-- 1 sweet sweet 1110 Dec 21 21:19 known_hosts


scp ./id_rsa.pub 192.168.1.232/233/234:/root/.ssh/authorized_keys  所有的服务都要给一个
如果多个服务都要往authorized_keys里写密匙
cat scp过来的文件 >>authorized_keys


集群中有几个服务器就做几次,实现四台主机之间相互免密钥登录


4台服务器全部安装
# rpm -ivh http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm  
# yum clean all
# yum make
# yum  -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles  ncftp perl-Params-Validate  perl-CPAN perl-Test-Mock-LWP.noarch perl-LWP-Authen-Negotiate.noarch perl-devel
# yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker
安装node包,manager也要装mha4mysql-node-0.56-0.el6.noarch.rpm 
# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm


manager 234
需要安装包
rpm -ivh perl-Mail-Sender-0.8.16-1.el6.rf.noarch.rpm
rpm -ivh perl-Mail-Sendmail-0.79-1.2.el6.rf.noarch.rpm
rpm -ivh perl-Parallel-ForkManager-0.7.5-2.2.el6.rf.noarch.rpm
rpm -ivh perl-Config-Tiny-2.12-1.el6.rfx.noarch.rpm

yum install perl-Log-Dispatch
yum install perl-Time-HiRes

安装管理软件
rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm


#mkdir /etc/masterha          配置文件路径
#mkdir -p /master/app1      工作路径
#mkdir -p /scripts
#cd /etc/masterha
#touch app1.cnf


#vim /etc/masterha/app1.cnf
[server default]
manager_workdir=/master/app1
manager_log=/master/app1/manager.log
user=manager      ------下面会创建(集群管理用户)
password=oracle   ------下面创建的密码
ssh_user=root      ---------ssh的用户名
repl_user=backup   ---------主从同步的用户名
repl_password=backup  ------主从同步的密码
ping_interval=1       ------ping间隔,用来检测master是否正常
shutdown_script=""
master_ip_online_change_script=""
report_script=""

注:数据库端口不是默认的要加上port=端口号
[server1]
hostname=192.168.1.232
master_binlog_dir=/data/
candidate_master=1      #备用切换的master
check_repl_delay=0       #切换时忽略延迟

[server2]
hostname=192.168.1.231
master_binlog_dir=/data/
candidate_master=1


[server3]
hostname=192.168.1.233
master_binlog_dir=/data/
no_master=1
ignore_fail=1       # 1  从挂了不影响集群,一样可以用

测试信任登录是否成功      ----检测集群中的ssh是否相互免密钥登录
# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Wed Dec  9 19:09:08 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Dec  9 19:09:08 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Dec  9 19:09:08 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Dec  9 19:09:08 2015 - [info] Starting SSH connection tests..
Wed Dec  9 19:09:09 2015 - [debug] 
Wed Dec  9 19:09:08 2015 - [debug]  Connecting via SSH from root@192.168.1.232(192.168.1.232:22) to root@192.168.1.231(192.168.1.231:22)..
Wed Dec  9 19:09:09 2015 - [debug]   ok.
Wed Dec  9 19:09:09 2015 - [debug]  Connecting via SSH from root@192.168.1.232(192.168.1.232:22) to root@192.168.1.233(192.168.1.233:22)..
Wed Dec  9 19:09:09 2015 - [debug]   ok.
Wed Dec  9 19:09:10 2015 - [debug] 
Wed Dec  9 19:09:09 2015 - [debug]  Connecting via SSH from root@192.168.1.233(192.168.1.233:22) to root@192.168.1.232(192.168.1.232:22)..
Wed Dec  9 19:09:10 2015 - [debug]   ok.
Wed Dec  9 19:09:10 2015 - [debug]  Connecting via SSH from root@192.168.1.233(192.168.1.233:22) to root@192.168.1.231(192.168.1.231:22)..
Wed Dec  9 19:09:10 2015 - [debug]   ok.
Wed Dec  9 19:09:10 2015 - [debug] 
Wed Dec  9 19:09:09 2015 - [debug]  Connecting via SSH from root@192.168.1.231(192.168.1.231:22) to root@192.168.1.232(192.168.1.232:22)..
Wed Dec  9 19:09:09 2015 - [debug]   ok.
Wed Dec  9 19:09:09 2015 - [debug]  Connecting via SSH from root@192.168.1.231(192.168.1.231:22) to root@192.168.1.233(192.168.1.233:22)..
Wed Dec  9 19:09:10 2015 - [debug]   ok.
Wed Dec  9 19:09:10 2015 - [info] All SSH connection tests passed successfully.
这为正常


登入每台数据库执行
grant all privileges on *.* to manager@'192.168.1.%' identified by 'oracle';   ----上面配置文件的用户名密码
flush privileges

每台数据库服务器都要执行
ln -s /opt/mysql/bin/* /usr/bin   ***********很重要,每台数据库服务器都要执行***********************************

manager上 234
masterha_check_repl工具验证mysql复制是否成功
# masterha_check_repl --conf=/etc/masterha/app1.cnf
Wed Dec  9 19:59:13 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Dec  9 19:59:13 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Wed Dec  9 19:59:13 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Wed Dec  9 19:59:13 2015 - [info] MHA::MasterMonitor version 0.56.
Wed Dec  9 19:59:13 2015 - [info] GTID failover mode = 0
Wed Dec  9 19:59:13 2015 - [info] Dead Servers:
Wed Dec  9 19:59:13 2015 - [info] Alive Servers:
Wed Dec  9 19:59:13 2015 - [info]   192.168.1.232(192.168.1.232:3306)
Wed Dec  9 19:59:13 2015 - [info]   192.168.1.231(192.168.1.231:3306)
Wed Dec  9 19:59:13 2015 - [info]   192.168.1.233(192.168.1.233:3306)
Wed Dec  9 19:59:13 2015 - [info] Alive Slaves:
Wed Dec  9 19:59:13 2015 - [info]   192.168.1.232(192.168.1.232:3306)  Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Wed Dec  9 19:59:13 2015 - [info]     Replicating from 192.168.1.231(192.168.1.231:3306)
Wed Dec  9 19:59:13 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Dec  9 19:59:13 2015 - [info]   192.168.1.233(192.168.1.233:3306)  Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Wed Dec  9 19:59:13 2015 - [info]     Replicating from 192.168.1.231(192.168.1.231:3306)
Wed Dec  9 19:59:13 2015 - [info]     Not candidate for the new Master (no_master is set)
Wed Dec  9 19:59:13 2015 - [info] Current Alive Master: 192.168.1.231(192.168.1.231:3306)
Wed Dec  9 19:59:13 2015 - [info] Checking slave configurations..
Wed Dec  9 19:59:13 2015 - [info] Checking replication filtering settings..
Wed Dec  9 19:59:13 2015 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Dec  9 19:59:13 2015 - [info]  Replication filtering check ok.
Wed Dec  9 19:59:13 2015 - [info] GTID (with auto-pos) is not supported
Wed Dec  9 19:59:13 2015 - [info] Starting SSH connection tests..
Wed Dec  9 19:59:15 2015 - [info] All SSH connection tests passed successfully.
Wed Dec  9 19:59:15 2015 - [info] Checking MHA Node version..
Wed Dec  9 19:59:15 2015 - [info]  Version check ok.
Wed Dec  9 19:59:15 2015 - [info] Checking SSH publickey authentication settings on the current master..
Wed Dec  9 19:59:15 2015 - [info] HealthCheck: SSH to 192.168.1.231 is reachable.
Wed Dec  9 19:59:16 2015 - [info] Master MHA Node version is 0.56.
Wed Dec  9 19:59:16 2015 - [info] Checking recovery script configurations on 192.168.1.231(192.168.1.231:3306)..
Wed Dec  9 19:59:16 2015 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000004 
Wed Dec  9 19:59:16 2015 - [info]   Connecting to root@192.168.1.231(192.168.1.231:22).. 
  Creating /var/tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/, up to mysql-bin.000004
Wed Dec  9 19:59:16 2015 - [info] Binlog setting check done.
Wed Dec  9 19:59:16 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Dec  9 19:59:16 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='manager' --slave_host=192.168.1.232 --slave_ip=192.168.1.232 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.14-log --manager_version=0.56 --relay_log_info=/data/relay-log.info  --relay_dir=/data/  --slave_pass=xxx
Wed Dec  9 19:59:16 2015 - [info]   Connecting to root@192.168.1.232(192.168.1.232:22).. 
  Checking slave recovery environment settings..
    Opening /data/relay-log.info ... ok.
    Relay log found at /data, up to log-relay-bin.000004
    Temporary relay log file is /data/log-relay-bin.000004
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Dec  9 19:59:16 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='manager' --slave_host=192.168.1.233 --slave_ip=192.168.1.233 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.14-log --manager_version=0.56 --relay_log_info=/data/relay-log.info  --relay_dir=/data/  --slave_pass=xxx
Wed Dec  9 19:59:16 2015 - [info]   Connecting to root@192.168.1.233(192.168.1.233:22).. 
  Checking slave recovery environment settings..
    Opening /data/relay-log.info ... ok.
    Relay log found at /data, up to proxy-relay-bin.000004
    Temporary relay log file is /data/proxy-relay-bin.000004
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Dec  9 19:59:17 2015 - [info] Slaves settings check done.
Wed Dec  9 19:59:17 2015 - [info] 
192.168.1.231(192.168.1.231:3306) (current master)
 +--192.168.1.232(192.168.1.232:3306)
 +--192.168.1.233(192.168.1.233:3306)


Wed Dec  9 19:59:17 2015 - [info] Checking replication health on 192.168.1.232..
Wed Dec  9 19:59:17 2015 - [info]  ok.
Wed Dec  9 19:59:17 2015 - [info] Checking replication health on 192.168.1.233..
Wed Dec  9 19:59:17 2015 - [info]  ok.
Wed Dec  9 19:59:17 2015 - [warning] master_ip_failover_script is not defined.
Wed Dec  9 19:59:17 2015 - [warning] shutdown_script is not defined.
Wed Dec  9 19:59:17 2015 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
---这为正常


manager上 234
启动管理
masterha_manager --conf=/etc/masterha/app1.cnf    --ignore_last_failover    ---自己加上后台运行
--ignore_last_failover: 检查上一个故障转移状态如果最后一个故障转移以错误结束,或最近一次故障转移最近完成,则MHA将在此停止,并且不会启动故障转移。 可以使用ignore_last_failover和wait_on_failover_error参数更改此行为。

测试切换
关闭master的数据库,看日志状态,还有主是否切换到你指定的被主上了


注意:iptable -L和selinux  折腾惨了,本来以为关闭了,结果只是临时关闭的,机器有重启过


slave 232
show slave status             //查看从的状态,应该为空、
show full processlist;        //查看从的链接进程


在切换回来(切换到231上)
在原来的主执行 (231)
reset master;
change master to master_host='192.168.1.232',master_user='backup',master_password='backup',master_log_file='mysql-bin.000011',master_log_pos=120;
将232设为主


manager上 234
查看主从状态
# masterha_check_repl --conf=/etc/masterha/app1.cnf


手动在线切换主从
# masterha_master_switch --master_state=alive --conf=/etc/masterha/app1.cnf     ********************这样切换现在的master(232)将不启动(确保要切换的主和现在的主数据是同步的*********************
Dec 11 10:39:14 2015 - [info] MHA::MasterRotate version 0.56.
Fri Dec 11 10:39:14 2015 - [info] Starting online master switch..
Fri Dec 11 10:39:14 2015 - [info] 
Fri Dec 11 10:39:14 2015 - [info] * Phase 1: Configuration Check Phase..
Fri Dec 11 10:39:14 2015 - [info] 
Fri Dec 11 10:39:14 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Dec 11 10:39:14 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Fri Dec 11 10:39:14 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Fri Dec 11 10:39:15 2015 - [info] GTID failover mode = 0
Fri Dec 11 10:39:15 2015 - [info] Current Alive Master: 192.168.1.232(192.168.1.232:3306)
Fri Dec 11 10:39:15 2015 - [info] Alive Slaves:
Fri Dec 11 10:39:15 2015 - [info]   192.168.1.231(192.168.1.231:3306)  Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Fri Dec 11 10:39:15 2015 - [info]     Replicating from 192.168.1.232(192.168.1.232:3306)
Fri Dec 11 10:39:15 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Dec 11 10:39:15 2015 - [info]   192.168.1.233(192.168.1.233:3306)  Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Fri Dec 11 10:39:15 2015 - [info]     Replicating from 192.168.1.232(192.168.1.232:3306)
Fri Dec 11 10:39:15 2015 - [info]     Not candidate for the new Master (no_master is set)


It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.1.232(192.168.1.232:3306)? (YES/no): yes
Fri Dec 11 10:39:17 2015 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Fri Dec 11 10:39:17 2015 - [info]  ok.
Fri Dec 11 10:39:17 2015 - [info] Checking MHA is not monitoring or doing failover..
Fri Dec 11 10:39:17 2015 - [info] Checking replication health on 192.168.1.231..
Fri Dec 11 10:39:17 2015 - [info]  ok.
Fri Dec 11 10:39:17 2015 - [info] Checking replication health on 192.168.1.233..
Fri Dec 11 10:39:17 2015 - [info]  ok.
Fri Dec 11 10:39:17 2015 - [info] Searching new master from slaves..
Fri Dec 11 10:39:17 2015 - [info]  Candidate masters from the configuration file:
Fri Dec 11 10:39:17 2015 - [info]   192.168.1.231(192.168.1.231:3306)  Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Fri Dec 11 10:39:17 2015 - [info]     Replicating from 192.168.1.232(192.168.1.232:3306)
Fri Dec 11 10:39:17 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Dec 11 10:39:17 2015 - [info]   192.168.1.232(192.168.1.232:3306)  Version=5.6.14-log log-bin:enabled
Fri Dec 11 10:39:17 2015 - [info]  Non-candidate masters:
Fri Dec 11 10:39:17 2015 - [info]   192.168.1.233(192.168.1.233:3306)  Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Fri Dec 11 10:39:17 2015 - [info]     Replicating from 192.168.1.232(192.168.1.232:3306)
Fri Dec 11 10:39:17 2015 - [info]     Not candidate for the new Master (no_master is set)
Fri Dec 11 10:39:17 2015 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Fri Dec 11 10:39:17 2015 - [info] 
From:
192.168.1.232(192.168.1.232:3306) (current master)
 +--192.168.1.231(192.168.1.231:3306)
 +--192.168.1.233(192.168.1.233:3306)


To:
192.168.1.231(192.168.1.231:3306) (new master)
 +--192.168.1.233(192.168.1.233:3306)


Starting master switch from 192.168.1.232(192.168.1.232:3306) to 192.168.1.231(192.168.1.231:3306)? (yes/NO): yes
Fri Dec 11 10:39:19 2015 - [info] Checking whether 192.168.1.231(192.168.1.231:3306) is ok for the new master..
Fri Dec 11 10:39:19 2015 - [info]  ok.
Fri Dec 11 10:39:19 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Dec 11 10:39:19 2015 - [info] 
Fri Dec 11 10:39:19 2015 - [info] * Phase 2: Rejecting updates Phase..
Fri Dec 11 10:39:19 2015 - [info] 
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
Fri Dec 11 10:39:28 2015 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Fri Dec 11 10:39:28 2015 - [info] Executing FLUSH TABLES WITH READ LOCK..
Fri Dec 11 10:39:28 2015 - [info]  ok.
Fri Dec 11 10:39:28 2015 - [info] Orig master binlog:pos is log-bin.000003:1508.
Fri Dec 11 10:39:28 2015 - [info]  Waiting to execute all relay logs on 192.168.1.231(192.168.1.231:3306)..
Fri Dec 11 10:39:28 2015 - [info]  master_pos_wait(log-bin.000003:1508) completed on 192.168.1.231(192.168.1.231:3306). Executed 0 events.
Fri Dec 11 10:39:28 2015 - [info]   done.
Fri Dec 11 10:39:28 2015 - [info] Getting new master's binlog name and position..
Fri Dec 11 10:39:28 2015 - [info]  mysql-bin.000009:120
Fri Dec 11 10:39:28 2015 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.231', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000009', MASTER_LOG_POS=120, MASTER_USER='backup', MASTER_PASSWORD='xxx';
Fri Dec 11 10:39:28 2015 - [info] 
Fri Dec 11 10:39:28 2015 - [info] * Switching slaves in parallel..
Fri Dec 11 10:39:28 2015 - [info] 
Fri Dec 11 10:39:28 2015 - [info] -- Slave switch on host 192.168.1.233(192.168.1.233:3306) started, pid: 26287
Fri Dec 11 10:39:28 2015 - [info] 
Fri Dec 11 10:39:28 2015 - [info] Log messages from 192.168.1.233 ...
Fri Dec 11 10:39:28 2015 - [info] 
Fri Dec 11 10:39:28 2015 - [info]  Waiting to execute all relay logs on 192.168.1.233(192.168.1.233:3306)..
Fri Dec 11 10:39:28 2015 - [info]  master_pos_wait(log-bin.000003:1508) completed on 192.168.1.233(192.168.1.233:3306). Executed 0 events.
Fri Dec 11 10:39:28 2015 - [info]   done.
Fri Dec 11 10:39:28 2015 - [info]  Resetting slave 192.168.1.233(192.168.1.233:3306) and starting replication from the new master 192.168.1.231(192.168.1.231:3306)..
Fri Dec 11 10:39:28 2015 - [info]  Executed CHANGE MASTER.
Fri Dec 11 10:39:28 2015 - [info]  Slave started.
Fri Dec 11 10:39:28 2015 - [info] End of log messages from 192.168.1.233 ...
Fri Dec 11 10:39:28 2015 - [info] 
Fri Dec 11 10:39:28 2015 - [info] -- Slave switch on host 192.168.1.233(192.168.1.233:3306) succeeded.
Fri Dec 11 10:39:28 2015 - [info] Unlocking all tables on the orig master:
Fri Dec 11 10:39:28 2015 - [info] Executing UNLOCK TABLES..
Fri Dec 11 10:39:28 2015 - [info]  ok.
Fri Dec 11 10:39:28 2015 - [info] All new slave servers switched successfully.
Fri Dec 11 10:39:28 2015 - [info] 
Fri Dec 11 10:39:28 2015 - [info] * Phase 5: New master cleanup phase..
Fri Dec 11 10:39:28 2015 - [info] 
Fri Dec 11 10:39:28 2015 - [info]  192.168.1.231: Resetting slave info succeeded.
Fri Dec 11 10:39:28 2015 - [info] Switching master to 192.168.1.231(192.168.1.231:3306) completed successfully.   
这为ok


232上
重新指定主(这时232上既不是主又不是从,需要重新指定)
reset master;
change............;


或者(也是手工在线切换)
# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.1.231 --orig_master_is_new_slave     
*********指向新的master,并把现在的master变为slave,不加的--orig_master_is_new_slave的话现在的master将不启动,最好是这样启动(但要注意,确保232上面之前没有写入数据,如有写入,需要将231和232同步后,232锁库)*************************


[root@manager ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.1.231 --orig_master_is_new_slave
Fri Dec 18 13:11:57 2015 - [info] MHA::MasterRotate version 0.56.
Fri Dec 18 13:11:57 2015 - [info] Starting online master switch..
Fri Dec 18 13:11:57 2015 - [info] 
Fri Dec 18 13:11:57 2015 - [info] * Phase 1: Configuration Check Phase..
Fri Dec 18 13:11:57 2015 - [info] 
Fri Dec 18 13:11:57 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Dec 18 13:11:57 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Fri Dec 18 13:11:57 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Fri Dec 18 13:11:57 2015 - [info] GTID failover mode = 0
Fri Dec 18 13:11:57 2015 - [info] Current Alive Master: 192.168.1.232(192.168.1.232:3306)
Fri Dec 18 13:11:57 2015 - [info] Alive Slaves:
Fri Dec 18 13:11:57 2015 - [info]   192.168.1.231(192.168.1.231:3306)  Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Fri Dec 18 13:11:57 2015 - [info]     Replicating from 192.168.1.232(192.168.1.232:3306)
Fri Dec 18 13:11:57 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Dec 18 13:11:57 2015 - [info]   192.168.1.233(192.168.1.233:3306)  Version=5.6.14-log (oldest major version between slaves) log-bin:enabled
Fri Dec 18 13:11:57 2015 - [info]     Replicating from 192.168.1.232(192.168.1.232:3306)
Fri Dec 18 13:11:57 2015 - [info]     Not candidate for the new Master (no_master is set)


It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.1.232(192.168.1.232:3306)? (YES/no): yes
Fri Dec 18 13:12:08 2015 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Fri Dec 18 13:12:08 2015 - [info]  ok.
Fri Dec 18 13:12:08 2015 - [info] Checking MHA is not monitoring or doing failover..
Fri Dec 18 13:12:08 2015 - [info] Checking replication health on 192.168.1.231..
Fri Dec 18 13:12:08 2015 - [info]  ok.
Fri Dec 18 13:12:08 2015 - [info] Checking replication health on 192.168.1.233..
Fri Dec 18 13:12:08 2015 - [info]  ok.
Fri Dec 18 13:12:08 2015 - [info] 192.168.1.231 can be new master.
Fri Dec 18 13:12:08 2015 - [info] 
From:
192.168.1.232(192.168.1.232:3306) (current master)
 +--192.168.1.231(192.168.1.231:3306)
 +--192.168.1.233(192.168.1.233:3306)


To:
192.168.1.231(192.168.1.231:3306) (new master)
 +--192.168.1.233(192.168.1.233:3306)
 +--192.168.1.232(192.168.1.232:3306)


Starting master switch from 192.168.1.232(192.168.1.232:3306) to 192.168.1.231(192.168.1.231:3306)? (yes/NO): yes
Fri Dec 18 13:12:15 2015 - [info] Checking whether 192.168.1.231(192.168.1.231:3306) is ok for the new master..
Fri Dec 18 13:12:15 2015 - [info]  ok.
Fri Dec 18 13:12:15 2015 - [info] 192.168.1.232(192.168.1.232:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Fri Dec 18 13:12:15 2015 - [info] 192.168.1.232(192.168.1.232:3306): Resetting slave pointing to the dummy host.
Fri Dec 18 13:12:15 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Dec 18 13:12:15 2015 - [info] 
Fri Dec 18 13:12:15 2015 - [info] * Phase 2: Rejecting updates Phase..
Fri Dec 18 13:12:15 2015 - [info] 
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes            
Fri Dec 18 13:12:37 2015 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Fri Dec 18 13:12:37 2015 - [info] Executing FLUSH TABLES WITH READ LOCK..
Fri Dec 18 13:12:37 2015 - [info]  ok.
Fri Dec 18 13:12:37 2015 - [info] Orig master binlog:pos is log-bin.000003:1508.
Fri Dec 18 13:12:37 2015 - [info]  Waiting to execute all relay logs on 192.168.1.231(192.168.1.231:3306)..
Fri Dec 18 13:12:37 2015 - [info]  master_pos_wait(log-bin.000003:1508) completed on 192.168.1.231(192.168.1.231:3306). Executed 0 events.
Fri Dec 18 13:12:37 2015 - [info]   done.
Fri Dec 18 13:12:37 2015 - [info] Getting new master's binlog name and position..
Fri Dec 18 13:12:37 2015 - [info]  mysql-bin.000001:120
Fri Dec 18 13:12:37 2015 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.231', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=120, MASTER_USER='backup', MASTER_PASSWORD='xxx';
Fri Dec 18 13:12:37 2015 - [info] 
Fri Dec 18 13:12:37 2015 - [info] * Switching slaves in parallel..
Fri Dec 18 13:12:37 2015 - [info] 
Fri Dec 18 13:12:37 2015 - [info] -- Slave switch on host 192.168.1.233(192.168.1.233:3306) started, pid: 23956
Fri Dec 18 13:12:37 2015 - [info] 
Fri Dec 18 13:12:37 2015 - [info] Log messages from 192.168.1.233 ...
Fri Dec 18 13:12:37 2015 - [info] 
Fri Dec 18 13:12:37 2015 - [info]  Waiting to execute all relay logs on 192.168.1.233(192.168.1.233:3306)..
Fri Dec 18 13:12:37 2015 - [info]  master_pos_wait(log-bin.000003:1508) completed on 192.168.1.233(192.168.1.233:3306). Executed 0 events.
Fri Dec 18 13:12:37 2015 - [info]   done.
Fri Dec 18 13:12:37 2015 - [info]  Resetting slave 192.168.1.233(192.168.1.233:3306) and starting replication from the new master 192.168.1.231(192.168.1.231:3306)..
Fri Dec 18 13:12:37 2015 - [info]  Executed CHANGE MASTER.
Fri Dec 18 13:12:37 2015 - [info]  Slave started.
Fri Dec 18 13:12:37 2015 - [info] End of log messages from 192.168.1.233 ...
Fri Dec 18 13:12:37 2015 - [info] 
Fri Dec 18 13:12:37 2015 - [info] -- Slave switch on host 192.168.1.233(192.168.1.233:3306) succeeded.
Fri Dec 18 13:12:37 2015 - [info] Unlocking all tables on the orig master:
Fri Dec 18 13:12:37 2015 - [info] Executing UNLOCK TABLES..
Fri Dec 18 13:12:37 2015 - [info]  ok.
Fri Dec 18 13:12:37 2015 - [info] Starting orig master as a new slave..
Fri Dec 18 13:12:37 2015 - [info]  Resetting slave 192.168.1.232(192.168.1.232:3306) and starting replication from the new master 192.168.1.231(192.168.1.231:3306)..
Fri Dec 18 13:12:37 2015 - [info]  Executed CHANGE MASTER.
Fri Dec 18 13:12:37 2015 - [info]  Slave started.
Fri Dec 18 13:12:37 2015 - [info] All new slave servers switched successfully.
Fri Dec 18 13:12:37 2015 - [info] 
Fri Dec 18 13:12:37 2015 - [info] * Phase 5: New master cleanup phase..
Fri Dec 18 13:12:37 2015 - [info] 
Fri Dec 18 13:12:37 2015 - [info]  192.168.1.231: Resetting slave info succeeded.
Fri Dec 18 13:12:37 2015 - [info] Switching master to 192.168.1.231(192.168.1.231:3306) completed successfully
这里需要手动输入yes




在到刚刚切换的主上看 (231上)
show full processlist;


每次切换后一定注意app1.failover.complete这个文件,切换完成后要把这个文件删除掉,不然下次切换回报错


**********************************************************************************************************************************************************************************************************
安装keepalived


master和slave_1上安装 (231,232)
下载地址:
http://www.keepalived.org/software/
keepalived-1.2.19.tar.gz
解压
cd keepalived-1.2.19
要先安装openssl-devel包
# yum install openssl-devel -y
#./configure
#make
#make install 
#mkdir /etc/keepalived/
# cp /usr/local/etc/keepalived/keepalived.conf /etc/keepalived/
# cp /usr/local/etc/rc.d/init.d/keepalived /etc/init.d/
# cp /usr/local/etc/sysconfig/keepalived /etc/sysconfig/
# cp /usr/local/sbin/keepalived /usr/sbin/


# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived


global_defs {
   notification_email {
        xxx@126.com      接收邮箱,可以有多个,一行一个
   }
   notification_email_from ccc@126.com    从谁那发出
   smtp_server mail.xxxxx.com      邮箱服务器
   smtp_connect_timeout 30    发送邮箱的超时时间
   router_id master       主机名
}


vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 60
    priority 100      优先级,主的要不从的高  (主是100,从就80吧)
    advert_int 1
    nopreempt      不抢占资源,及时服务重启了也不会把vip抢回来(主上配置就好了,从不要配置)
    authentication {
        auth_type PASS   认证方式
        auth_pass 1111   认证密码 (主从一样)
    }
    virtual_ipaddress {
        192.168.1.236     虚拟ip
    }
}


virtual_server 192.168.1.236 3306 {    端口要和服务的端口一致
    delay_loop 2      健康检查时间
    lb_algo wrr       调度算法rr|wrr|lc|wlc|lblc|sh|dh
    lb_kind DR        负载均衡转发规则 NAT|DR|TUN
    nat_mask 255.255.255.0 
    persistence_timeout 50    回话保持时间
    protocol TCP


    real_server 192.168.1.231 3306 {    (主从的ip不一样)
        weight 1    默认为1,0为失效
        #notify_down /opt/script/mysql_down.sh   检测到server down后,执行脚本 (没用这种方法,不知为什么脚本不执行)
        TCP_CHECK{
            connect_timeout 3 连接超时时间
            nb_get_retry 3 重连次数
            connect_port 3306 健康检查的端口的端口
        }
    }
}


master 231(只在主上做就好了)
用脚本检测服务,并控制vip和keepalive服务 
脚本如下:
root@master bin]# vim /opt/script/check.sh 
#! /bin/bash
MYSQL=/opt/mysql/bin/mysql
MYSQL_HOST=127.0.0.1
MYSQL_USER=root
MYSQL_PASSWORD=oracle
CHECK_TIME=3
#mysql  is working MYSQL_OK is 1 , mysql down MYSQL_OK is 0
MYSQL_OK=1
function check_mysql_helth(){
$MYSQL -u $MYSQL_USER -p$MYSQL_PASSWORD -e "select version();" >/dev/null 2>&1
if [ $? = 0 ] ;then
     MYSQL_OK=1
else
     MYSQL_OK=0
fi
     return $MYSQL_OK
}
while [ $CHECK_TIME -ne 0 ]
do
     let "CHECK_TIME -= 1"
     check_mysql_helth
if [ $MYSQL_OK = 1 ] ; then
     CHECK_TIME=0
     exit 0
fi
if [ $MYSQL_OK -eq 0 ] &&  [ $CHECK_TIME -eq 0 ]
then
     pkill keepalived
exit 1
fi
sleep 1
done


将脚本写入定时任务里(保证每3秒执行一次检测脚本)
[root@master bin]# crontab -l
*/1 * * * * bash /opt/script/check.sh
*/1 * * * * sleep 3 &&  /opt/script/check.sh
*/1 * * * * sleep 6 &&  /opt/script/check.sh
*/1 * * * * sleep 9 &&  /opt/script/check.sh
*/1 * * * * sleep 12 &&  /opt/script/check.sh
*/1 * * * * sleep 15 &&  /opt/script/check.sh
*/1 * * * * sleep 18 &&  /opt/script/check.sh
*/1 * * * * sleep 21 &&  /opt/script/check.sh
*/1 * * * * sleep 24 &&  /opt/script/check.sh
*/1 * * * * sleep 27 &&  /opt/script/check.sh
*/1 * * * * sleep 30 &&  /opt/script/check.sh
*/1 * * * * sleep 33 &&  /opt/script/check.sh
*/1 * * * * sleep 36 &&  /opt/script/check.sh
*/1 * * * * sleep 39 &&  /opt/script/check.sh
*/1 * * * * sleep 42 &&  /opt/script/check.sh
*/1 * * * * sleep 45 &&  /opt/script/check.sh
*/1 * * * * sleep 48 &&  /opt/script/check.sh
*/1 * * * * sleep 51 &&  /opt/script/check.sh
*/1 * * * * sleep 54 &&  /opt/script/check.sh
*/1 * * * * sleep 57 &&  /opt/script/check.sh
没隔3秒执行一次,这样能够保证vip快速的飘移到另一机器上去


测试:
1.关闭主库,查看主从的情况,和vip是否漂移。
2.关闭服务器,查看主从的情况,和vip是否漂移。


















来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/26909803/viewspace-1876923/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/26909803/viewspace-1876923/

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值