MHA failover log


weixin_33762130


Tags: operations, operating systems, databases

Original article: http://blog.51cto.com/14096233/2368895


I. Initial detection

Determining whether the master is down

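MHA checked the master repeatedly: the first ping failed, and the reconnect attempts that followed failed as well (the log counts four failures in total). Every time, mysqld on the master was unavailable, but the master host could still be reached over SSH. That matters because SSH is how MHA copies binlogs off the master.

The output below shows that SSH availability is checked by running the save_binary_logs script (with --command=test). In this case only the database is down.
If the log said "SSH is NOT reachable" instead, the master host itself would be down as well (the operating system is down).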
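To triage a manager log quickly, the two signals described above (repeated MySQL connection failures, then the SSH verdict) can be pulled out with a couple of string checks. This is a hypothetical helper, not part of MHA; it only matches the message strings that appear in this log, and other MHA versions may word them differently.

```python
import re


def triage_health_check(log_text):
    """Classify an MHA health-check excerpt (hypothetical helper, not part of MHA)."""
    # Each "Connection failed N time(s).." line reports the running failure count.
    counts = [int(n) for n in re.findall(r"Connection failed (\d+) time\(s\)", log_text)]
    master_down = "Master is not reachable from health checker!" in log_text
    # Check the negative message first: "SSH is NOT reachable" means the OS is down too.
    if "SSH is NOT reachable" in log_text:
        ssh_ok = False
    elif "SSH is reachable" in log_text:
        ssh_ok = True
    else:
        ssh_ok = None  # no SSH verdict in this excerpt
    if not master_down:
        diagnosis = "master reachable"
    elif ssh_ok is True:
        diagnosis = "database down, host up"
    elif ssh_ok is False:
        diagnosis = "host down"
    else:
        diagnosis = "unknown"
    return {"failures": max(counts, default=0), "ssh_reachable": ssh_ok, "diagnosis": diagnosis}


excerpt = """\
Sat Jun 16 15:11:28 2018 - [warning] Connection failed 4 time(s)..
Sat Jun 16 15:11:28 2018 - [warning] Master is not reachable from health checker!
Sat Jun 16 15:11:28 2018 - [warning] SSH is reachable."""
print(triage_health_check(excerpt))
# {'failures': 4, 'ssh_reachable': True, 'diagnosis': 'database down, host up'}
```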

Sat Jun 16 15:10:58 2018 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query)
Sat Jun 16 15:10:58 2018 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/arch --output_file=/home/masterha/binlogdir/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-bin
Sat Jun 16 15:10:58 2018 - [info] HealthCheck: SSH to 10.17.28.11 is reachable.
Sat Jun 16 15:11:08 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sat Jun 16 15:11:08 2018 - [warning] Connection failed 2 time(s)..
Sat Jun 16 15:11:18 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sat Jun 16 15:11:18 2018 - [warning] Connection failed 3 time(s)..
Sat Jun 16 15:11:28 2018 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Sat Jun 16 15:11:28 2018 - [warning] Connection failed 4 time(s)..
Sat Jun 16 15:11:28 2018 - [warning] Master is not reachable from health checker!
Sat Jun 16 15:11:28 2018 - [warning] Master 10.17.28.11(10.17.28.11:3306) is not reachable! ## the MySQL checks failed; SSH was tested earlier by running save_binary_logs
Sat Jun 16 15:11:28 2018 - [warning] SSH is reachable. ## the host itself is still up
Sat Jun 16 15:11:28 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /data/masterha/app_test/test.cnf again, and trying to connect to all servers to check server status..
Sat Jun 16 15:11:28 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
Sat Jun 16 15:11:28 2018 - [info] Reading application default configuration from /data/masterha/app_test/test.cnf..
Sat Jun 16 15:11:28 2018 - [info] Reading server configuration from /data/masterha/app_test/test.cnf..
Sat Jun 16 15:11:29 2018 - [warning] SQL Thread is stopped(no error) on 10.17.28.12(10.17.28.12:3306)
Sat Jun 16 15:11:29 2018 - [warning] SQL Thread is stopped(no error) on 10.17.28.13(10.17.28.13:3306)
Sat Jun 16 15:11:29 2018 - [info] GTID failover mode = 0
Sat Jun 16 15:11:29 2018 - [info] Dead Servers:
Sat Jun 16 15:11:29 2018 - [info]   10.17.28.11(10.17.28.11:3306)
Sat Jun 16 15:11:29 2018 - [info] Alive Servers:
Sat Jun 16 15:11:29 2018 - [info]   10.17.28.12(10.17.28.12:3306)
Sat Jun 16 15:11:29 2018 - [info]   10.17.28.13(10.17.28.13:3306)
Sat Jun 16 15:11:29 2018 - [info] Alive Slaves:
Sat Jun 16 15:11:29 2018 - [info]   10.17.28.12(10.17.28.12:3306)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Jun 16 15:11:29 2018 - [info]     Replicating from 10.17.28.11(10.17.28.11:3306)
Sat Jun 16 15:11:29 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jun 16 15:11:29 2018 - [info]   10.17.28.13(10.17.28.13:3306)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Jun 16 15:11:29 2018 - [info]     Replicating from 10.17.28.11(10.17.28.11:3306)
Sat Jun 16 15:11:29 2018 - [info]     Not candidate for the new Master (no_master is set)
Sat Jun 16 15:11:29 2018 - [info] Checking slave configurations..
Sat Jun 16 15:11:29 2018 - [info] Checking replication filtering settings..
Sat Jun 16 15:11:29 2018 - [info]  Replication filtering check ok.
Sat Jun 16 15:11:29 2018 - [info] Master is down!
Sat Jun 16 15:11:29 2018 - [info] Terminating monitoring script.
Sat Jun 16 15:11:29 2018 - [info] Got exit code 20 (Master dead).
Sat Jun 16 15:11:29 2018 - [info] MHA::MasterFailover version 0.57.
Sat Jun 16 15:11:29 2018 - [info] Starting master failover.
Sat Jun 16 15:11:29 2018 - [info]

II. The failover itself

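1. Phase 1 checks the MHA configuration and double-checks whether the master is available.

It reads the MHA configuration files.

It checks slave settings such as the read_only parameter and whether replication filtering rules are configured. As the output shows, SQL threads that were stopped cleanly (no error) do not block the failover: MHA simply starts them again.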
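The handling of stopped SQL threads can be modelled as a small decision over SHOW SLAVE STATUS fields. This is a simplified sketch: the log only shows the "stopped (no error)" case, where MHA starts the SQL thread again. Treating a thread stopped with an error as a reason to halt is an assumption here, not something this log demonstrates.

```python
def sql_thread_action(status):
    """Decide what to do with one slave's SQL thread before failover.

    `status` maps SHOW SLAVE STATUS column names to their string values.
    Hypothetical, simplified model; not MHA's actual code.
    """
    if status["Slave_SQL_Running"] == "Yes":
        return "ok"
    if int(status["Last_SQL_Errno"]) == 0:
        # Stopped cleanly: restarting it is what the log shows MHA doing.
        return "START SLAVE SQL_THREAD"
    # Assumption: a thread stopped by an SQL error needs manual review.
    return "stop and investigate"


print(sql_thread_action({"Slave_SQL_Running": "No", "Last_SQL_Errno": "0"}))
# START SLAVE SQL_THREAD
```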

Sat Jun 16 15:11:29 2018 - [info] * Phase 1: Configuration Check Phase..
Sat Jun 16 15:11:29 2018 - [info] 
Sat Jun 16 15:11:30 2018 - [warning] SQL Thread is stopped(no error) on 10.17.28.12(10.17.28.12:3306)
Sat Jun 16 15:11:30 2018 - [warning] SQL Thread is stopped(no error) on 10.17.28.13(10.17.28.13:3306)
Sat Jun 16 15:11:30 2018 - [info] GTID failover mode = 0
Sat Jun 16 15:11:30 2018 - [info] Dead Servers:
Sat Jun 16 15:11:30 2018 - [info]   10.17.28.11(10.17.28.11:3306)
Sat Jun 16 15:11:30 2018 - [info] Checking master reachability via MySQL(double check)...
Sat Jun 16 15:11:30 2018 - [info]  ok.
Sat Jun 16 15:11:30 2018 - [info] Alive Servers:
Sat Jun 16 15:11:30 2018 - [info]   10.17.28.12(10.17.28.12:3306)
Sat Jun 16 15:11:30 2018 - [info]   10.17.28.13(10.17.28.13:3306)
Sat Jun 16 15:11:30 2018 - [info] Alive Slaves:
Sat Jun 16 15:11:30 2018 - [info]   10.17.28.12(10.17.28.12:3306)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Jun 16 15:11:30 2018 - [info]     Replicating from 10.17.28.11(10.17.28.11:3306)
Sat Jun 16 15:11:30 2018 - [info]     Primary candidate for the new Master (candidate_master is set) ## candidate master
Sat Jun 16 15:11:30 2018 - [info]   10.17.28.13(10.17.28.13:3306)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Jun 16 15:11:30 2018 - [info]     Replicating from 10.17.28.11(10.17.28.11:3306)
Sat Jun 16 15:11:30 2018 - [info]     Not candidate for the new Master (no_master is set)
Sat Jun 16 15:11:30 2018 - [info]  Starting SQL thread on 10.17.28.12(10.17.28.12:3306) ..
Sat Jun 16 15:11:30 2018 - [info]   done.
Sat Jun 16 15:11:30 2018 - [info]  Starting SQL thread on 10.17.28.13(10.17.28.13:3306) ..
Sat Jun 16 15:11:30 2018 - [info]   done.
Sat Jun 16 15:11:30 2018 - [info] Starting Non-GTID based failover.
Sat Jun 16 15:11:30 2018 - [info] 
Sat Jun 16 15:11:30 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Jun 16 15:11:30 2018 - [info]

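2. Phase 2: shut down the dead master.

MHA takes the dead master out of service by pointing the master's domain name at the invalid IP 1.1.1.1. It would also run shutdown_script, but that script is not defined in the configuration file, so the step is skipped.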
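The DNS change is performed by a site-specific master_ip_failover.pl whose source is not shown. The request it sends can still be reconstructed from the wget output: an HTTP GET to the CMDB endpoint with six query parameters. A sketch that rebuilds that URL with urllib.parse.urlencode; the function name is hypothetical and the endpoint is copied from the log.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the wget output; specific to this site.
DNS_API = "http://oms.yihaodian.com.cn/cmdb/dns/api/domain_dba"


def dns_edit_url(env, domain, old_ip, new_ip):
    """Rebuild the DNS-edit request URL seen in the failover log (hypothetical helper)."""
    query = urlencode({
        "action": "edit",
        "env": env,
        "oldDomain": domain,
        "oldIP": old_ip,
        "newDomain": domain,  # the domain stays the same; only its IP changes
        "newIP": new_ip,
    })
    return DNS_API + "?" + query


print(dns_edit_url("prod", "mysql-mhatest2.jq.int.yihaodian.com", "10.17.28.11", "1.1.1.1"))
```

In Phase 2 the old master's IP is swapped for 1.1.1.1; in Phase 3.4 the same endpoint is called with oldIP=1.1.1.1 and newIP=10.17.28.12.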

Sat Jun 16 15:11:30 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat Jun 16 15:11:30 2018 - [info] 
Sat Jun 16 15:11:30 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Sat Jun 16 15:11:30 2018 - [info] Executing master IP deactivation script:
Sat Jun 16 15:11:30 2018 - [info]   /data/masterha/ymha/pls/master_ip_failover.pl  --master_domain_name=mysql-mhatest2.jq.int.yihaodian.com --master_invalid_ip=1.1.1.1 --app_config_workdir=/data/masterha/app_test  --orig_master_host=10.17.28.11 --orig_master_ip=10.17.28.11 --orig_master_port=3306 --command=stopssh --ssh_user=root  
Change mysql-mhatest2.jq.int.yihaodian.com to a invalid IP: 1.1.1.1 
1.1.1.1
10.17.28.11
mysql-mhatest2.jq.int.yihaodian.com

--2018-06-16 15:11:30--  http://oms.yihaodian.com.cn/cmdb/dns/api/domain_dba?action=edit&env=prod&oldDomain=mysql-mhatest2.jq.int.yihaodian.com&oldIP=10.17.28.11&newDomain=mysql-mhatest2.jq.int.yihaodian.com&newIP=1.1.1.1
Resolving oms.yihaodian.com.cn... 10.17.34.190
Connecting to oms.yihaodian.com.cn|10.17.34.190|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://oms.yihaodian.com.cn/api/dns/domain_dba/?action=edit&env=prod&oldDomain=mysql-mhatest2.jq.int.yihaodian.com&oldIP=10.17.28.11&newDomain=mysql-mhatest2.jq.int.yihaodian.com&newIP=1.1.1.1 [following]
--2018-06-16 15:11:30--  http://oms.yihaodian.com.cn/api/dns/domain_dba/?action=edit&env=prod&oldDomain=mysql-mhatest2.jq.int.yihaodian.com&oldIP=10.17.28.11&newDomain=mysql-mhatest2.jq.int.yihaodian.com&newIP=1.1.1.1
Reusing existing connection to oms.yihaodian.com.cn:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: “/data/masterha/app_test/disable_dns_msg.log”
     0K                                                        9.94M=0s

2018-06-16 15:11:32 (9.94 MB/s) - “/data/masterha/app_test/disable_dns_msg.log” saved [42]

Sat Jun 16 15:11:32 2018 - [info]  done.

Sat Jun 16 15:11:32 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat Jun 16 15:11:32 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.

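3. Phase 3: master recovery

3.1 Determine which slave has the latest binary log (judged by each slave's I/O thread position).

The output below shows that, across all slaves, the latest binary log position is mysql-bin.000037:2046 and the oldest is mysql-bin.000037:1532. Both come from Master_Log_File and Read_Master_Log_Pos in SHOW SLAVE STATUS.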
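Comparing positions means comparing the binlog file sequence number first and the offset second. A minimal sketch, assuming all slaves read from binlogs with the same basename (mysql-bin here), using the Read_Master_Log_Pos values from this failover:

```python
def binlog_key(position):
    """Turn 'mysql-bin.000037:2046' into (37, 2046) so positions sort correctly."""
    name, offset = position.rsplit(":", 1)
    return int(name.rsplit(".", 1)[1]), int(offset)


# Read_Master_Log_Pos per slave, as reported in this log.
read_pos = {
    "10.17.28.12": "mysql-bin.000037:1532",
    "10.17.28.13": "mysql-bin.000037:2046",
}
latest = max(read_pos, key=lambda host: binlog_key(read_pos[host]))
oldest = min(read_pos, key=lambda host: binlog_key(read_pos[host]))
print(latest, oldest)  # 10.17.28.13 10.17.28.12
```

Parsing the sequence number as an integer keeps the comparison correct across file rotations (mysql-bin.000100 after mysql-bin.000099).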

Sat Jun 16 15:11:32 2018 - [info] 
Sat Jun 16 15:11:32 2018 - [info] * Phase 3: Master Recovery Phase..
Sat Jun 16 15:11:32 2018 - [info] 
Sat Jun 16 15:11:32 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat Jun 16 15:11:32 2018 - [info] 
Sat Jun 16 15:11:32 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000037:2046  ## 10.17.28.13 (S2)
Sat Jun 16 15:11:32 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat Jun 16 15:11:32 2018 - [info]   10.17.28.13(10.17.28.13:3306)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Jun 16 15:11:32 2018 - [info]     Replicating from 10.17.28.11(10.17.28.11:3306)
Sat Jun 16 15:11:32 2018 - [info]     Not candidate for the new Master (no_master is set)

Sat Jun 16 15:11:32 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000037:1532  ## 10.17.28.12 (S1, candidate master)
Sat Jun 16 15:11:32 2018 - [info] Oldest slaves:
Sat Jun 16 15:11:32 2018 - [info]   10.17.28.12(10.17.28.12:3306)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Jun 16 15:11:32 2018 - [info]     Replicating from 10.17.28.11(10.17.28.11:3306)
Sat Jun 16 15:11:32 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jun 16 15:11:32 2018 - [info]

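3.2 Save the dead master's binlog

Since the latest position received by any slave is mysql-bin.000037:2046, MHA concatenates everything in the dead master's binlog after that position into a single file under /home/masterha/binlogdir on 10.17.28.11, then copies it with scp to the MHA manager's working directory, /data/masterha/app_test.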
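The byte layout of the saved file can be shown with plain slicing. This is a simplified model, not save_binary_logs itself (the real tool parses events and handles checksums). It keeps bytes 0 to 120 (the binlog magic number plus the format description event) and everything from position 2046 to the end of the 2562-byte file, so the result is 120 + (2562 - 2046) = 636 bytes, which matches the "position 120 to tail(636)" reported when the file is applied in Phase 3.4.

```python
def concat_tail(binlog: bytes, header_end: int, start_pos: int) -> bytes:
    """Simplified model of `save_binary_logs --command=save`.

    Keeps the file header and format description event (bytes 0..header_end)
    and appends every byte from start_pos to EOF. Event parsing is not modelled.
    """
    if not 0 < header_end <= start_pos <= len(binlog):
        raise ValueError("positions out of range")
    return binlog[:header_end] + binlog[start_pos:]


# Stand-in for the 2562-byte mysql-bin.000037 from the log.
fake_binlog = bytes(i % 251 for i in range(2562))
saved = concat_tail(fake_binlog, header_end=120, start_pos=2046)
print(len(saved))  # 636
```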

Sat Jun 16 15:11:32 2018 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Sat Jun 16 15:11:32 2018 - [info] 
Sat Jun 16 15:11:32 2018 - [info] Fetching dead master's binary logs..
Sat Jun 16 15:11:32 2018 - [info] Executing command on the dead master 10.17.28.11(10.17.28.11:3306): save_binary_logs --command=save --start_file=mysql-bin.000037  --start_pos=2046 --binlog_dir=/data/mysql/arch --output_file=/home/masterha/binlogdir/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57

## To run it by hand:
#ssh 10.17.28.11
#save_binary_logs --command=save --start_file=mysql-bin.000037  --start_pos=2046 --binlog_dir=/data/mysql/arch --output_file=/home/masterha/binlogdir/saved_master_binlog_0616.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57

  Creating /home/masterha/binlogdir if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000037 pos 2046 to mysql-bin.000037 EOF into /home/masterha/binlogdir/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog ..
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 120.. ok.
  Dumping effective binlog data from /data/mysql/arch/mysql-bin.000037 position 2046 to tail(2562).. ok.
 Binlog Checksum enabled
 Concat succeeded.
Sat Jun 16 15:11:33 2018 - [info] scp from root@10.17.28.11:/home/masterha/binlogdir/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog to local:/data/masterha/app_test/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog succeeded. ## copied to the MHA manager
Sat Jun 16 15:11:33 2018 - [info] HealthCheck: SSH to 10.17.28.12 is reachable.
Sat Jun 16 15:11:33 2018 - [info] HealthCheck: SSH to 10.17.28.13 is reachable.
Sat Jun 16 15:11:33 2018 - [info]

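3.3 Choose the new master

   First, MHA checks whether the latest slave still has the relay logs from the oldest position (mysql-bin.000037:1532) onward, since the other slaves will be recovered from them.
   Then it chooses the new master.
   Because candidate_master is set on 10.17.28.12, 10.17.28.12 becomes the new master, even though it is not the latest slave.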
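The search order printed in the log (first candidate_master slaves that received the latest relay log events, then any candidate_master slave) can be sketched as a list of pools tried in turn. Steps 1 and 2 are what this log shows; steps 3 and 4 are assumed fallbacks for when no candidate_master slave exists, and MHA's real selection weighs further conditions that are not modelled here.

```python
def choose_new_master(slaves, latest):
    """Pick a new master following the search order printed in the log.

    slaves: dict host -> {"candidate": bool, "no_master": bool}
    latest: set of hosts that received the latest relay log events.
    Hypothetical sketch; steps 3-4 are assumptions.
    """
    candidates = [h for h, s in slaves.items() if s["candidate"] and not s["no_master"]]
    eligible = [h for h, s in slaves.items() if not s["no_master"]]
    for pool in (
        [h for h in candidates if h in latest],  # 1. candidate_master and latest
        candidates,                              # 2. any candidate_master
        [h for h in eligible if h in latest],    # 3. assumed: latest, not no_master
        eligible,                                # 4. assumed: any slave without no_master
    ):
        if pool:
            return pool[0]
    return None


slaves = {
    "10.17.28.12": {"candidate": True, "no_master": False},
    "10.17.28.13": {"candidate": False, "no_master": True},
}
print(choose_new_master(slaves, latest={"10.17.28.13"}))  # 10.17.28.12
```

With this configuration, step 1 finds nothing (the log's "Not found."), and step 2 returns 10.17.28.12.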

Sat Jun 16 15:11:33 2018 - [info] * Phase 3.3: Determining New Master Phase..
Sat Jun 16 15:11:33 2018 - [info] 
Sat Jun 16 15:11:33 2018 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Sat Jun 16 15:11:33 2018 - [info] Checking whether 10.17.28.13 has relay logs from the oldest position..
Sat Jun 16 15:11:33 2018 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000037 --latest_rmlp=2046 --target_mlf=mysql-bin.000037 --target_rmlp=1532 --server_id=2813 --workdir=/home/masterha/binlogdir --timestamp=20180616151129 --manager_version=0.57 --relay_dir=/data/mysql/relaylog --current_relay_log=mysql-relay-bin.000012  :

# To run it by hand:
#ssh 10.17.28.13
#apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000037 --latest_rmlp=2046 --target_mlf=mysql-bin.000037 --target_rmlp=1532 --server_id=2813 --workdir=/home/masterha/binlogdir --timestamp=20180616 --manager_version=0.57 --relay_dir=/data/mysql/relaylog --current_relay_log=mysql-relay-bin.000012

 Relay log found at /data/mysql/relaylog, up to mysql-relay-bin.000012
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:mysql-relay-bin.000012, start_pos:1695.  ## relay log position needed for recovery
Target relay log FOUND!
Sat Jun 16 15:11:33 2018 - [info] OK. 10.17.28.13 has all relay logs.

Sat Jun 16 15:11:33 2018 - [info] Searching new master from slaves..
Sat Jun 16 15:11:33 2018 - [info]  Candidate masters from the configuration file:
Sat Jun 16 15:11:33 2018 - [info]   10.17.28.12(10.17.28.12:3306)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Jun 16 15:11:33 2018 - [info]     Replicating from 10.17.28.11(10.17.28.11:3306)
Sat Jun 16 15:11:33 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jun 16 15:11:33 2018 - [info]  Non-candidate masters:
Sat Jun 16 15:11:33 2018 - [info]   10.17.28.13(10.17.28.13:3306)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Jun 16 15:11:33 2018 - [info]     Replicating from 10.17.28.11(10.17.28.11:3306)
Sat Jun 16 15:11:33 2018 - [info]     Not candidate for the new Master (no_master is set)
Sat Jun 16 15:11:33 2018 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Sat Jun 16 15:11:33 2018 - [info]   Not found.
Sat Jun 16 15:11:33 2018 - [info]  Searching from all candidate_master slaves..
Sat Jun 16 15:11:33 2018 - [info] New master is 10.17.28.12(10.17.28.12:3306)

Sat Jun 16 15:11:33 2018 - [info] Starting master failover..
Sat Jun 16 15:11:33 2018 - [info] 
From:
10.17.28.11(10.17.28.11:3306) (current master)
 +--10.17.28.12(10.17.28.12:3306)
 +--10.17.28.13(10.17.28.13:3306)

To:
10.17.28.12(10.17.28.12:3306) (new master)
 +--10.17.28.13(10.17.28.13:3306)
Sat Jun 16 15:11:33 2018 - [info]

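3.3 Generate the diff logs the new master needs. There are two parts:

- the relay log events the new master is missing, i.e. the difference between its relay log and the latest slave's;
- the binlog tail: the difference between the latest slave and the original master, saved on the MHA manager.

The relay log diff is obtained as follows:
- ssh to 10.17.28.13;
- run apply_diff_relay_logs to generate the relay log diff;
- scp the diff to 10.17.28.12.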
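The positions in the log fit together with simple arithmetic. The new master is missing master positions 1532 to 2046, i.e. 514 bytes of events; the latest slave's current relay log ends at 2209 (the "tail(2209)" in the output), and 2209 - 514 = 1695, exactly the start_pos found on 10.17.28.13. A sketch of that estimate, assuming the missing events sit at the end of a single relay log file with the same sizes as on the master; MHA's own "fast relay log position search" may work differently.

```python
def estimate_relay_start(relay_tail, latest_rmlp, target_rmlp):
    """Estimate where the target's missing events begin in the latest slave's relay log.

    Assumption: the events between target_rmlp and latest_rmlp (master
    positions) are stored contiguously at the end of one relay log file,
    byte for byte the same size as on the master.
    """
    missing = latest_rmlp - target_rmlp
    if missing < 0 or missing > relay_tail:
        raise ValueError("inconsistent positions")
    return relay_tail - missing


# Values from this failover: tail 2209, --latest_rmlp=2046, --target_rmlp=1532.
print(estimate_relay_start(2209, 2046, 1532))  # 1695
```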

Sat Jun 16 15:11:33 2018 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Sat Jun 16 15:11:33 2018 - [info] 
Sat Jun 16 15:11:33 2018 - [info] Server 10.17.28.12 received relay logs up to: mysql-bin.000037:1532
Sat Jun 16 15:11:33 2018 - [info] Need to get diffs from the latest slave(10.17.28.13) up to: mysql-bin.000037:2046 (using the latest slave's relay logs)
Sat Jun 16 15:11:33 2018 - [info] Connecting to the latest slave host 10.17.28.13, generating diff relay log files..
Sat Jun 16 15:11:33 2018 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=10.17.28.12 --latest_mlf=mysql-bin.000037 --latest_rmlp=2046 --target_mlf=mysql-bin.000037 --target_rmlp=1532 --server_id=2813 --diff_file_readtolatest=/home/masterha/binlogdir/relay_from_read_to_latest_10.17.28.12_3306_20180616151129.binlog --workdir=/home/masterha/binlogdir --timestamp=20180616151129 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --relay_dir=/data/mysql/relaylog --current_relay_log=mysql-relay-bin.000012 ## this relay log position comes from the find check above

# To run it by hand:
#ssh 10.17.28.13
#apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=10.17.28.12 --latest_mlf=mysql-bin.000037 --latest_rmlp=2046 --target_mlf=mysql-bin.000037 --target_rmlp=1532 --server_id=2813 --diff_file_readtolatest=/home/masterha/binlogdir/relay_from_read_to_latest_10.17.28.12_3306_20180616151129.binlog --workdir=/home/masterha/binlogdir --timestamp=20180616151129 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --relay_dir=/data/mysql/relaylog --current_relay_log=mysql-relay-bin.000012

Sat Jun 16 15:11:34 2018 - [info] 
    Relay log found at /data/mysql/relaylog, up to mysql-relay-bin.000012
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:mysql-relay-bin.000012, start_pos:1695.
 Concat binary/relay logs from mysql-relay-bin.000012 pos 1695 to mysql-relay-bin.000012 EOF into /home/masterha/binlogdir/relay_from_read_to_latest_10.17.28.12_3306_20180616151129.binlog ..
 Binlog Checksum enabled
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 283.. ok.
  Dumping effective binlog data from /data/mysql/relaylog/mysql-relay-bin.000012 position 1695 to tail(2209).. ok.
 Binlog Checksum enabled
 Binlog Checksum enabled
 Concat succeeded.
 Generating diff relay log succeeded. Saved at /home/masterha/binlogdir/relay_from_read_to_latest_10.17.28.12_3306_20180616151129.binlog .
 scp db-jq-28-13:/home/masterha/binlogdir/relay_from_read_to_latest_10.17.28.12_3306_20180616151129.binlog to root@10.17.28.12(22) succeeded. ## copy the missing relay log to the new master
Sat Jun 16 15:11:34 2018 - [info]  Generating diff files succeeded.

Sat Jun 16 15:11:34 2018 - [info] Sending binlog..
Sat Jun 16 15:11:34 2018 - [info] scp from local:/data/masterha/app_test/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog to root@10.17.28.12:/home/masterha/binlogdir/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog succeeded.  ## copy the remaining binlog tail
Sat Jun 16 15:11:34 2018 - [info]

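3.4 Apply the binlog events saved from the master

First wait until all existing relay logs have been applied. Each server applies every relay log its own I/O thread received, because the diffs fetched earlier start from the positions the slaves' I/O threads had reached.
Then apply the relay log diff and the binlog tail with apply_diff_relay_logs.
Once that is done, read the new master's binlog file and position; the other slaves use them to replicate from the new master.
Finally, run the master_ip_failover script, which:
1. sets read_only=0 on the new master;
2. points the domain name at the new master.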
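The CHANGE MASTER statement printed after the apply step can be assembled from the new master's binlog coordinates (mysql-bin.000019:1106 in this failover). A minimal sketch; the function name is hypothetical, and the values are interpolated without escaping, so it is only suitable for trusted input.

```python
def change_master_sql(host, port, binlog_pos, user, password):
    """Build the CHANGE MASTER statement other slaves run after failover."""
    log_file, log_pos = binlog_pos.rsplit(":", 1)
    return (
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={int(log_pos)}, "
        f"MASTER_USER='{user}', MASTER_PASSWORD='{password}';"
    )


print(change_master_sql("10.17.28.12", 3306, "mysql-bin.000019:1106", "repl", "xxx"))
```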

Sat Jun 16 15:11:34 2018 - [info] * Phase 3.4: Master Log Apply Phase..
Sat Jun 16 15:11:34 2018 - [info] 
Sat Jun 16 15:11:34 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Sat Jun 16 15:11:34 2018 - [info] Starting recovery on 10.17.28.12(10.17.28.12:3306)..
Sat Jun 16 15:11:34 2018 - [info]  Generating diffs succeeded.
Sat Jun 16 15:11:34 2018 - [info] Waiting until all relay logs are applied.
Sat Jun 16 15:11:34 2018 - [info]  done.
Sat Jun 16 15:11:34 2018 - [info] Getting slave status..
Sat Jun 16 15:11:34 2018 - [info] This slave(10.17.28.12)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000037:1532). No need to recover from Exec_Master_Log_Pos.

Sat Jun 16 15:11:34 2018 - [info] Connecting to the target slave host 10.17.28.12, running recover script..
Sat Jun 16 15:11:34 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=10.17.28.12 --slave_ip=10.17.28.12  --slave_port=3306 --apply_files=/home/masterha/binlogdir/relay_from_read_to_latest_10.17.28.12_3306_20180616151129.binlog,/home/masterha/binlogdir/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog --workdir=/home/masterha/binlogdir --target_version=5.6.27-log --timestamp=20180616151129 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --slave_pass=xxx  ## apply the relay log diff + binlog tail

# To run it by hand:
#ssh 10.17.28.12
#apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=10.17.28.12 --slave_ip=10.17.28.12  --slave_port=3306 --apply_files=/home/masterha/binlogdir/relay_from_read_to_latest_10.17.28.12_3306_20180616151129.binlog,/home/masterha/binlogdir/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog --workdir=/home/masterha/binlogdir --target_version=5.6.27-log --timestamp=20180616151129 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --slave_pass=xxx

Sat Jun 16 15:11:34 2018 - [info] 
 Concat all apply files to /home/masterha/binlogdir/total_binlog_for_10.17.28.12_3306.20180616151129.binlog ..
 Copying the first binlog file /home/masterha/binlogdir/relay_from_read_to_latest_10.17.28.12_3306_20180616151129.binlog to /home/masterha/binlogdir/total_binlog_for_10.17.28.12_3306.20180616151129.binlog.. ok.
  Dumping binlog head events (rotate events), skipping format description events from /home/masterha/binlogdir/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog..  Binlog Checksum enabled
dumped up to pos 120. ok.
 /home/masterha/binlogdir/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog has effective binlog events from pos 120.
  Dumping effective binlog data from /home/masterha/binlogdir/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog position 120 to tail(636).. ok.
 Concat succeeded.
All apply target binary logs are concatinated at /home/masterha/binlogdir/total_binlog_for_10.17.28.12_3306.20180616151129.binlog .
MySQL client version is 5.6.27. Using --binary-mode.
Applying differential binary/relay log files /home/masterha/binlogdir/relay_from_read_to_latest_10.17.28.12_3306_20180616151129.binlog,/home/masterha/binlogdir/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog on 10.17.28.12:3306. This may take long time...
Applying log files succeeded.
Sat Jun 16 15:11:34 2018 - [info]  All relay logs were successfully applied.

Sat Jun 16 15:11:34 2018 - [info] Getting new master's binlog name and position..
Sat Jun 16 15:11:34 2018 - [info]  mysql-bin.000019:1106
Sat Jun 16 15:11:34 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.17.28.12', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000019', MASTER_LOG_POS=1106, MASTER_USER='repl', MASTER_PASSWORD='xxx';

Sat Jun 16 15:11:34 2018 - [info] Executing master IP activate script:
Sat Jun 16 15:11:34 2018 - [info]   /data/masterha/ymha/pls/master_ip_failover.pl  --master_domain_name=mysql-mhatest2.jq.int.yihaodian.com --master_invalid_ip=1.1.1.1 --app_config_workdir=/data/masterha/app_test  --command=start --ssh_user=root --orig_master_host=10.17.28.11 --orig_master_ip=10.17.28.11 --orig_master_port=3306 --new_master_host=10.17.28.12 --new_master_ip=10.17.28.12 --new_master_port=3306 --new_master_user='root'   --new_master_password=xxx
Set read_only=0 on the new master.
Change mysql-mhatest2.jq.int.yihaodian.com to a new master IP: 10.17.28.12 
--2018-06-16 15:11:34--  http://oms.yihaodian.com.cn/cmdb/dns/api/domain_dba?action=edit&env=prod&oldDomain=mysql-mhatest2.jq.int.yihaodian.com&oldIP=1.1.1.1&newDomain=mysql-mhatest2.jq.int.yihaodian.com&newIP=10.17.28.12
Resolving oms.yihaodian.com.cn... 10.17.34.190
Connecting to oms.yihaodian.com.cn|10.17.34.190|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://oms.yihaodian.com.cn/api/dns/domain_dba/?action=edit&env=prod&oldDomain=mysql-mhatest2.jq.int.yihaodian.com&oldIP=1.1.1.1&newDomain=mysql-mhatest2.jq.int.yihaodian.com&newIP=10.17.28.12 [following]
--2018-06-16 15:11:34--  http://oms.yihaodian.com.cn/api/dns/domain_dba/?action=edit&env=prod&oldDomain=mysql-mhatest2.jq.int.yihaodian.com&oldIP=1.1.1.1&newDomain=mysql-mhatest2.jq.int.yihaodian.com&newIP=10.17.28.12
Reusing existing connection to oms.yihaodian.com.cn:80.
HTTP request sent, awaiting response... Read error (Connection timed out) in headers.
Giving up.

Waiting until change dns...
--2018-06-16 15:11:46--  http://oms.yihaodian.com.cn/cmdb/dns/api/domain_dba?action=edit&env=prod&oldDomain=mysql-mhatest2.jq.int.yihaodian.com&oldIP=1.1.1.1&newDomain=mysql-mhatest2.jq.int.yihaodian.com&newIP=10.17.28.12
Resolving oms.yihaodian.com.cn... 10.17.34.190
Connecting to oms.yihaodian.com.cn|10.17.34.190|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://oms.yihaodian.com.cn/api/dns/domain_dba/?action=edit&env=prod&oldDomain=mysql-mhatest2.jq.int.yihaodian.com&oldIP=1.1.1.1&newDomain=mysql-mhatest2.jq.int.yihaodian.com&newIP=10.17.28.12 [following]
--2018-06-16 15:11:46--  http://oms.yihaodian.com.cn/api/dns/domain_dba/?action=edit&env=prod&oldDomain=mysql-mhatest2.jq.int.yihaodian.com&oldIP=1.1.1.1&newDomain=mysql-mhatest2.jq.int.yihaodian.com&newIP=10.17.28.12
Reusing existing connection to oms.yihaodian.com.cn:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: “/data/masterha/app_test/change_dns_msg.log”

     0K                                                        12.5M=0s

2018-06-16 15:11:46 (12.5 MB/s) - “/data/masterha/app_test/change_dns_msg.log” saved [51]

Change stag mysql-mhatest2.jq.int.yihaodian.com to a new master IP: 10.17.28.12 
--2018-06-16 15:11:46--  http://oms.yihaodian.com.cn/cmdb/dns/api/domain_dba?action=edit&env=stag&oldDomain=mysql-mhatest2.jq.int.yihaodian.com&oldIP=10.17.28.11&newDomain=mysql-mhatest2.jq.int.yihaodian.com&newIP=10.17.28.12
Resolving oms.yihaodian.com.cn... 10.17.34.190
Connecting to oms.yihaodian.com.cn|10.17.34.190|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://oms.yihaodian.com.cn/api/dns/domain_dba/?action=edit&env=stag&oldDomain=mysql-mhatest2.jq.int.yihaodian.com&oldIP=10.17.28.11&newDomain=mysql-mhatest2.jq.int.yihaodian.com&newIP=10.17.28.12 [following]
--2018-06-16 15:11:46--  http://oms.yihaodian.com.cn/api/dns/domain_dba/?action=edit&env=stag&oldDomain=mysql-mhatest2.jq.int.yihaodian.com&oldIP=10.17.28.11&newDomain=mysql-mhatest2.jq.int.yihaodian.com&newIP=10.17.28.12
Reusing existing connection to oms.yihaodian.com.cn:80.
HTTP request sent, awaiting response... Read error (Connection timed out) in headers.
Giving up.

Sat Jun 16 15:12:07 2018 - [info]  OK.
Sat Jun 16 15:12:07 2018 - [info] ** Finished master recovery successfully.
Sat Jun 16 15:12:07 2018 - [info] * Phase 3: Master Recovery Phase completed.
Sat Jun 16 15:12:07 2018 - [info]
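The DNS API was flaky during this failover: the prod request timed out and succeeded on the next attempt, while the stag request also timed out and was given up without a retry, yet the phase was still reported OK. A bounded retry wrapper is one way to harden such a call. This is a hypothetical sketch, not code from master_ip_failover.pl; the request callable is assumed to raise OSError on failure (urllib.error.URLError and TimeoutError are both OSError subclasses).

```python
import time


def call_with_retry(request, attempts=3, delay=10.0, sleep=time.sleep):
    """Call `request()` up to `attempts` times, sleeping `delay` seconds between tries.

    Returns the first successful result; re-raises the last OSError if every
    attempt fails, so the caller cannot mistake a failed DNS change for success.
    """
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except OSError:
            if attempt == attempts:
                raise
            sleep(delay)


calls = []


def flaky_dns_call():
    # Fake request: times out twice, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError("Connection timed out")
    return "200 OK"


print(call_with_retry(flaky_dns_call, sleep=lambda s: None))  # 200 OK
```

The `sleep` parameter is injectable so the retry logic can be exercised without real waiting.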

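4. Phase 4: slave recovery

4.1 Because 10.17.28.13 already has the latest relay log, no relay log diff needs to be generated for it.
4.2 Slave recovery proper:

Copy the master's binlog tail saved on the MHA manager to the slave.
Apply it.
Reset the old replication settings and run CHANGE MASTER to start replicating from the new master.
With several slaves, this recovery runs in parallel.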
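The parallel recovery of several slaves can be sketched with a thread pool. MHA itself starts a separate process per slave (note the pid in the log); threads are used here only to keep the sketch short, and `recover_one` is a hypothetical stand-in for the per-slave steps listed above.

```python
from concurrent.futures import ThreadPoolExecutor


def recover_slaves(slaves, recover_one):
    """Run recover_one(host) for every slave concurrently; return {host: result}.

    recover_one stands in for: copy the binlog tail, apply it, CHANGE MASTER,
    START SLAVE. An exception in any slave is re-raised when results are collected.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(slaves))) as pool:
        futures = {host: pool.submit(recover_one, host) for host in slaves}
        return {host: future.result() for host, future in futures.items()}


print(recover_slaves(["10.17.28.13"], lambda host: f"{host}: slave started"))
# {'10.17.28.13': '10.17.28.13: slave started'}
```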

Sat Jun 16 15:12:07 2018 - [info] * Phase 4: Slaves Recovery Phase..
Sat Jun 16 15:12:07 2018 - [info] 
Sat Jun 16 15:12:07 2018 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Sat Jun 16 15:12:07 2018 - [info] 
Sat Jun 16 15:12:07 2018 - [info] -- Slave diff file generation on host 10.17.28.13(10.17.28.13:3306) started, pid: 27742. Check tmp log /data/masterha/app_test/10.17.28.13_3306_20180616151129.log if it takes time..
Sat Jun 16 15:12:08 2018 - [info] 
Sat Jun 16 15:12:08 2018 - [info] Log messages from 10.17.28.13 ...
Sat Jun 16 15:12:08 2018 - [info] 
Sat Jun 16 15:12:07 2018 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Sat Jun 16 15:12:08 2018 - [info] End of log messages from 10.17.28.13.
Sat Jun 16 15:12:08 2018 - [info] -- 10.17.28.13(10.17.28.13:3306) has the latest relay log events.
Sat Jun 16 15:12:08 2018 - [info] Generating relay diff files from the latest slave succeeded.
Sat Jun 16 15:12:08 2018 - [info] 
Sat Jun 16 15:12:08 2018 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Sat Jun 16 15:12:08 2018 - [info] 
Sat Jun 16 15:12:08 2018 - [info] -- Slave recovery on host 10.17.28.13(10.17.28.13:3306) started, pid: 27745. Check tmp log /data/masterha/app_test/10.17.28.13_3306_20180616151129.log if it takes time..
Sat Jun 16 15:12:09 2018 - [info] 
Sat Jun 16 15:12:09 2018 - [info] Log messages from 10.17.28.13 ...
Sat Jun 16 15:12:09 2018 - [info] 
Sat Jun 16 15:12:08 2018 - [info] Sending binlog..
Sat Jun 16 15:12:08 2018 - [info] scp from local:/data/masterha/app_test/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog to root@10.17.28.13:/home/masterha/binlogdir/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog succeeded.

Sat Jun 16 15:12:08 2018 - [info] Starting recovery on 10.17.28.13(10.17.28.13:3306)..
Sat Jun 16 15:12:08 2018 - [info]  Generating diffs succeeded.
Sat Jun 16 15:12:08 2018 - [info] Waiting until all relay logs are applied.
Sat Jun 16 15:12:08 2018 - [info]  done.
Sat Jun 16 15:12:08 2018 - [info] Getting slave status..
Sat Jun 16 15:12:08 2018 - [info] This slave(10.17.28.13)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000037:2046). No need to recover from Exec_Master_Log_Pos.

Sat Jun 16 15:12:08 2018 - [info] Connecting to the target slave host 10.17.28.13, running recover script..
Sat Jun 16 15:12:08 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=10.17.28.13 --slave_ip=10.17.28.13  --slave_port=3306 --apply_files=/home/masterha/binlogdir/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog --workdir=/home/masterha/binlogdir --target_version=5.6.27-log --timestamp=20180616151129 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --slave_pass=xxx  ## apply the binlog tail

# To run it by hand:
#ssh 10.17.28.13
#apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=10.17.28.13 --slave_ip=10.17.28.13  --slave_port=3306 --apply_files=/home/masterha/binlogdir/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog --workdir=/home/masterha/binlogdir --target_version=5.6.27-log --timestamp=20180616151129 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --slave_pass=xxx  ## apply the binlog tail

Sat Jun 16 15:12:08 2018 - [info] 
MySQL client version is 5.6.27. Using --binary-mode.
Applying differential binary/relay log files /home/masterha/binlogdir/saved_master_binlog_from_10.17.28.11_3306_20180616151129.binlog on 10.17.28.13:3306. This may take long time...
Applying log files succeeded.
Sat Jun 16 15:12:08 2018 - [info]  All relay logs were successfully applied.

Sat Jun 16 15:12:08 2018 - [info]  Resetting slave 10.17.28.13(10.17.28.13:3306) and starting replication from the new master 10.17.28.12(10.17.28.12:3306)..
Sat Jun 16 15:12:08 2018 - [info]  Executed CHANGE MASTER.
Sat Jun 16 15:12:08 2018 - [info]  Slave started.
Sat Jun 16 15:12:09 2018 - [info] End of log messages from 10.17.28.13.
Sat Jun 16 15:12:09 2018 - [info] -- Slave recovery on host 10.17.28.13(10.17.28.13:3306) succeeded.

Sat Jun 16 15:12:09 2018 - [info] All new slave servers recovered successfully.
Sat Jun 16 15:12:09 2018 - [info] 
Sat Jun 16 15:12:09 2018 - [info] * Phase 5: New master cleanup phase..
Sat Jun 16 15:12:09 2018 - [info] 
Sat Jun 16 15:12:09 2018 - [info] Resetting slave info on the new master..
Sat Jun 16 15:12:09 2018 - [info]  10.17.28.12: Resetting slave info succeeded.
Sat Jun 16 15:12:09 2018 - [info] Master failover to 10.17.28.12(10.17.28.12:3306) completed successfully.
Sat Jun 16 15:12:09 2018 - [info] 

----- Failover Report -----

test: MySQL Master failover 10.17.28.11(10.17.28.11:3306) to 10.17.28.12(10.17.28.12:3306) succeeded

Master 10.17.28.11(10.17.28.11:3306) is down!

Check MHA Manager logs at db-jq-28-19:/data/masterha/app_test/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 10.17.28.11(10.17.28.11:3306)
The latest slave 10.17.28.13(10.17.28.13:3306) has all relay logs for recovery.
Selected 10.17.28.12(10.17.28.12:3306) as a new master.
10.17.28.12(10.17.28.12:3306): OK: Applying all logs succeeded.
10.17.28.12(10.17.28.12:3306): OK: Activated master IP address.
10.17.28.13(10.17.28.13:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.17.28.13(10.17.28.13:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.17.28.12(10.17.28.12:3306)
10.17.28.12(10.17.28.12:3306): Resetting slave info succeeded.
Master failover to 10.17.28.12(10.17.28.12:3306) completed successfully.
Sat Jun 16 15:12:09 2018 - [info] Sending mail..
Unknown option: conf
