Problems encountered while installing MySQL MHA
1. Running masterha_check_repl --conf=/etc/masterha/app1.cnf fails with:
Can't exec "mysqlbinlog": No such file or directory at /usr/local/perl5/MHA/BinlogManager.pm line 99.
Run which mysqlbinlog on the node. On my machine, for example, the result is:
[localhost~]$ which mysqlbinlog
/usr/local/mysql/bin/mysqlbinlog
A symbolic link is needed:
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
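A likely reason that which finds mysqlbinlog while MHA does not is that MHA runs its helper commands over non-interactive SSH sessions, where a PATH set in a login profile may not apply. The sketch below wraps the symlink step in a small helper and tries it out in a scratch directory. The name link_tool and the scratch layout are assumptions made for illustration, not part of MHA.

```shell
# Hypothetical helper: link a binary into a directory that is on the default PATH.
link_tool() {
    src=$1
    dest_dir=$2
    name=$(basename "$src")
    if [ ! -x "$src" ]; then
        echo "missing or not executable: $src" >&2
        return 1
    fi
    # -f replaces a stale link, -n treats an existing link as a file, not a directory
    ln -sfn "$src" "$dest_dir/$name"
}

# Demo in a scratch directory so nothing on the real system is touched.
demo=$(mktemp -d)
mkdir -p "$demo/mysql/bin" "$demo/usr/bin"
printf '#!/bin/sh\necho fake mysqlbinlog\n' > "$demo/mysql/bin/mysqlbinlog"
chmod +x "$demo/mysql/bin/mysqlbinlog"

link_tool "$demo/mysql/bin/mysqlbinlog" "$demo/usr/bin"
link_result=$("$demo/usr/bin/mysqlbinlog")
echo "$link_result"
```

On a real node the equivalent call, run as root, would be link_tool /usr/local/mysql/bin/mysqlbinlog /usr/bin, and it has to be repeated on every node because the failover scripts run mysqlbinlog on the master and on the slaves.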
2. Running masterha_check_ssh --conf=/etc/masterha/app1.cnf fails with:
connection via SSH from root@192.168.17.199 to root@192.168.17.200 ...
permission denied (publickey,gssapi-keyex,gssapi-with-mic,password)
[error] [/usr/local/share/perl5/MHA/SSHcheck.pm,ln163]
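This is usually a public key problem. Delete the entries for the affected IPs from /root/.ssh/known_hosts, regenerate the keys, and it should work.
3. Resolve the Perl dependencies in advance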
yum -y install perl-Config-Tiny perl-Params-Validate perl-Log-Dispatch perl-Parallel-ForkManager
yum -y install perl-DBD-MySQL ncftp
Automatic installation with the CPAN module, method 1:
You need to be online before installing, and you need root privileges.
perl -MCPAN -e shell
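The first time CPAN runs it asks a number of setup questions. If your machine is connected directly to the Internet (dial-up, leased line, etc.), just press Enter all the way through; the only choice that matters is the last step, where you pick the CPAN mirror closest to you. I chose ftp://www.perl87.cn/CPAN/, which is located in China. If your machine sits behind a firewall, you also need to configure an FTP or HTTP proxy. The common cpan commands are listed below.
Get help:
cpan>help
List all modules on CPAN:
cpan>m
Install a module; this handles the whole process for Net::Server, from download to installation:
cpan>install Net::Server
Quit:
cpan>quit
Automatic installation with the CPAN module, method 2:
cpan -i module_name, for example: cpan -i Net::Server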
Installation and configuration
192.168.17.199 | node | manager
192.168.17.200 | node |
192.168.17.201 | node |
First download the mha-manager and mha-node packages from https://code.google.com/p/mysql-master-ha/downloads/list
I downloaded mha4mysql-manager-0.54.tar.gz and mha4mysql-node-0.54.tar.gz.
After downloading, install the Perl dependency modules first:
yum -y install perl-Config-Tiny perl-Params-Validate perl-Log-Dispatch perl-Parallel-ForkManager
yum -y install perl-DBD-MySQL ncftp
1. Install mha-node (on all three machines)
[local]# tar -zxvf mha4mysql-node-0.54.tar.gz -C /usr/local/
[local]# cd /usr/local/mha4mysql-node-0.54/
[local]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.609)
- DBD::mysql ...loaded. (4.013)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::node
[local]# make && make install
2. Install the manager (on 192.168.17.199)
[local]# tar -zxvf mha4mysql-manager-0.54.tar.gz -C /usr/local/
[local]# cd /usr/local/mha4mysql-manager-0.54/
[local]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.609)
- DBD::mysql ...loaded. (4.013)
- Time::HiRes ...loaded. (1.9721)
- Config::Tiny ...loaded. (2.19)
- Log::Dispatch ...loaded. (2.41)
- Parallel::ForkManager ...loaded. (1.05)
- MHA::NodeConst ...loaded. (0.54)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::manager
[local]# make && make install
3. Edit the configuration file
[local]# mkdir /etc/masterha
[local]# mkdir -p /masterha/app1
[local]# cp samples/conf/* /etc/masterha/
[local]# cat /etc/masterha/app1.cnf
[server default]
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
#mysql user and password
user=king
password=king123
# ssh_user=root
repl_user=repl
repl_password=repl
ping_interval=1
shutdown_script=""
#master_ip_failover_script="/data/master_ip_failover"
master_ip_online_change_script=""
report_script=""
[server1]
hostname=192.168.17.199
master_binlog_dir="/data/mydb/db01/logs/binlog/"
candidate_master=1
[server2]
hostname=192.168.17.200
master_binlog_dir="/data/mydb/db01/logs/binlog/"
candidate_master=1
[server3]
hostname=192.168.17.201
master_binlog_dir="/data/mydb/db01/logs/binlog/"
candidate_master=1
[local]#
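A typo in app1.cnf only shows up when the MHA tools run, so a quick pre-flight check can save a round trip. The sketch below is a minimal example that only verifies every [serverN] section has a hostname line. The function name check_mha_cnf is hypothetical, and the check does not replace masterha_check_repl.

```shell
# Hypothetical pre-flight check: every [serverN] section must contain hostname=...
check_mha_cnf() {
    awk '
        /^\[.*\]$/ {                      # a new [section] starts
            if (sect ~ /^server[0-9]+$/ && !has_host) { bad = bad " " sect }
            sect = substr($0, 2, length($0) - 2)
            has_host = 0
            if (sect ~ /^server[0-9]+$/) { n++ }
            next
        }
        /^[ \t]*#/ { next }               # skip comment lines
        /^hostname=/ { has_host = 1 }
        END {
            if (sect ~ /^server[0-9]+$/ && !has_host) { bad = bad " " sect }
            if (bad != "") { print "no hostname in:" bad; exit 1 }
            print (n + 0) " server sections ok"
        }
    ' "$1"
}

# Demo against a trimmed copy of the configuration shown above.
cnf=$(mktemp)
cat > "$cnf" <<'EOF'
[server default]
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
[server1]
hostname=192.168.17.199
candidate_master=1
[server2]
hostname=192.168.17.200
candidate_master=1
[server3]
hostname=192.168.17.201
candidate_master=1
EOF

cnf_result=$(check_mha_cnf "$cnf")
echo "$cnf_result"
```

Because the real file stores the MySQL and replication passwords in plain text, restricting it with chmod 600 /etc/masterha/app1.cnf is a reasonable extra step.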
4. Set up SSH public key trust among the three machines
5. Test the SSH connections
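Steps 4 and 5 are not spelled out in the original notes. One common way to do them is sketched below using ssh-keygen, ssh-copy-id, and masterha_check_ssh. The wrapper names run and ssh_trust_plan and the DRY_RUN switch are assumptions added so that the script only prints the commands unless DRY_RUN=0 is set.

```shell
# Hosts from the table above; with DRY_RUN=1 (the default here) commands are only printed.
hosts="192.168.17.199 192.168.17.200 192.168.17.201"
DRY_RUN=${DRY_RUN:-1}
key="${HOME:-/root}/.ssh/id_rsa"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

ssh_trust_plan() {
    # 1. Create a key pair without a passphrase if none exists yet.
    [ -f "$key" ] || run ssh-keygen -t rsa -N "" -f "$key"

    # 2. Append the public key to authorized_keys on every host, including this one.
    for h in $hosts; do
        run ssh-copy-id -i "$key.pub" "root@$h"
    done

    # 3. On the manager only: let MHA test every host-to-host connection.
    run masterha_check_ssh --conf=/etc/masterha/app1.cnf
}

plan=$(ssh_trust_plan)
echo "$plan"
```

Parts 1 and 2 would be run as root on each of the three machines, since MHA needs every node to reach every other node without a password prompt.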
6. Configure master-slave replication (details omitted)
192.168.17.199:3306 master
192.168.17.200:3306 slave1
192.168.17.201:3306 slave2
Create the king user and the repl user in MySQL on all three machines:
GRANT ALL PRIVILEGES ON *.* TO 'king'@'%' IDENTIFIED BY 'king123';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'repl';
7. Test replication
[local]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Tue Nov 19 02:27:17 2013 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 19 02:27:17 2013 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Tue Nov 19 02:27:17 2013 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Tue Nov 19 02:27:17 2013 - [info] MHA::MasterMonitor version 0.54.
Tue Nov 19 02:27:17 2013 - [info] Dead Servers:
Tue Nov 19 02:27:17 2013 - [info] Alive Servers:
Tue Nov 19 02:27:17 2013 - [info] 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 02:27:17 2013 - [info] 192.168.17.200(192.168.17.200:3306)
Tue Nov 19 02:27:17 2013 - [info] 192.168.17.201(192.168.17.201:3306)
Tue Nov 19 02:27:17 2013 - [info] Alive Slaves:
Tue Nov 19 02:27:17 2013 - [info] 192.168.17.200(192.168.17.200:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 02:27:17 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 02:27:17 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 02:27:17 2013 - [info] 192.168.17.201(192.168.17.201:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 02:27:17 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 02:27:17 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 02:27:17 2013 - [info] Current Alive Master: 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 02:27:17 2013 - [info] Checking slave configurations..
Tue Nov 19 02:27:17 2013 - [info] read_only=1 is not set on slave 192.168.17.200(192.168.17.200:3306).
Tue Nov 19 02:27:17 2013 - [warning] relay_log_purge=0 is not set on slave 192.168.17.200(192.168.17.200:3306).
Tue Nov 19 02:27:17 2013 - [info] read_only=1 is not set on slave 192.168.17.201(192.168.17.201:3306).
Tue Nov 19 02:27:17 2013 - [warning] relay_log_purge=0 is not set on slave 192.168.17.201(192.168.17.201:3306).
Tue Nov 19 02:27:17 2013 - [info] Checking replication filtering settings..
Tue Nov 19 02:27:17 2013 - [info] binlog_do_db= , binlog_ignore_db= information_schema.%,mysql.%
Tue Nov 19 02:27:17 2013 - [info] Replication filtering check ok.
Tue Nov 19 02:27:17 2013 - [info] Starting SSH connection tests..
Tue Nov 19 02:27:19 2013 - [info] All SSH connection tests passed successfully.
Tue Nov 19 02:27:19 2013 - [info] Checking MHA Node version..
Tue Nov 19 02:27:20 2013 - [info] Version check ok.
Tue Nov 19 02:27:20 2013 - [info] Checking SSH publickey authentication settings on the current master..
Tue Nov 19 02:27:20 2013 - [info] HealthCheck: SSH to 192.168.17.199 is reachable.
Tue Nov 19 02:27:20 2013 - [info] Master MHA Node version is 0.54.
Tue Nov 19 02:27:20 2013 - [info] Checking recovery script configurations on the current master..
Tue Nov 19 02:27:20 2013 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mydb/db01/logs/binlog/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000009
Tue Nov 19 02:27:20 2013 - [info] Connecting to root@192.168.17.199(192.168.17.199)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not.. ok.
Binlog found at /data/mydb/db01/logs/binlog/, up to mysql-bin.000009
Tue Nov 19 02:27:20 2013 - [info] Master setting check done.
Tue Nov 19 02:27:20 2013 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Nov 19 02:27:20 2013 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='king' --slave_host=192.168.17.200 --slave_ip=192.168.17.200 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.33-rel31.1-log --manager_version=0.54 --relay_log_info=/data/mydb/db01/data/relay-log.info --relay_dir=/data/mydb/db01/data/ --slave_pass=xxx
Tue Nov 19 02:27:20 2013 - [info] Connecting to root@192.168.17.200(192.168.17.200:22)..
Checking slave recovery environment settings..
Opening /data/mydb/db01/data/relay-log.info ... ok.
Relay log found at /data/mydb/db01/data, up to relay-bin.000004
Temporary relay log file is /data/mydb/db01/data/relay-bin.000004
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Nov 19 02:27:21 2013 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='king' --slave_host=192.168.17.201 --slave_ip=192.168.17.201 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.33-rel31.1-log --manager_version=0.54 --relay_log_info=/data/mydb/db01/data/relay-log.info --relay_dir=/data/mydb/db01/data/ --slave_pass=xxx
Tue Nov 19 02:27:21 2013 - [info] Connecting to root@192.168.17.201(192.168.17.201:22)..
Checking slave recovery environment settings..
Opening /data/mydb/db01/data/relay-log.info ... ok.
Relay log found at /data/mydb/db01/data, up to relay-bin.000004
Temporary relay log file is /data/mydb/db01/data/relay-bin.000004
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Nov 19 02:27:21 2013 - [info] Slaves settings check done.
Tue Nov 19 02:27:21 2013 - [info]
192.168.17.199 (current master)
 +--192.168.17.200
 +--192.168.17.201
Tue Nov 19 02:27:21 2013 - [info] Checking replication health on 192.168.17.200..
Tue Nov 19 02:27:21 2013 - [info] ok.
Tue Nov 19 02:27:21 2013 - [info] Checking replication health on 192.168.17.201..
Tue Nov 19 02:27:21 2013 - [info] ok.
Tue Nov 19 02:27:21 2013 - [warning] master_ip_failover_script is not defined.
Tue Nov 19 02:27:21 2013 - [warning] shutdown_script is not defined.
Tue Nov 19 02:27:21 2013 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
[local]#
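The check passes, but the output still carries warnings that are easy to miss among the [info] lines (relay_log_purge=0 not set on the slaves, no master_ip_failover_script, no shutdown_script). A small filter such as the one below makes them stand out. The function name mha_problems is an assumption, and the sample lines are copied from the output above.

```shell
# Print only the [warning] and [error] lines of an MHA log, without the timestamps.
mha_problems() {
    grep -E '\[(warning|error)\]' "$1" | sed -E 's/^.* - \[/[/'
}

# Demo on a few lines taken from the masterha_check_repl run above.
log=$(mktemp)
cat > "$log" <<'EOF'
Tue Nov 19 02:27:17 2013 - [info] Checking slave configurations..
Tue Nov 19 02:27:17 2013 - [info] read_only=1 is not set on slave 192.168.17.200(192.168.17.200:3306).
Tue Nov 19 02:27:17 2013 - [warning] relay_log_purge=0 is not set on slave 192.168.17.200(192.168.17.200:3306).
Tue Nov 19 02:27:21 2013 - [warning] master_ip_failover_script is not defined.
EOF

problems=$(mha_problems "$log")
echo "$problems"
```

In practice the input would be a file holding both stdout and stderr of masterha_check_repl, or the manager log itself.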
8. Start the manager process
[local]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log < /dev/null 2>&1 &
[local]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:22852) is running(0:PING_OK), master:192.168.17.199
[local]#
9. Test master failover
On 192.168.17.199 (the manager), run tail -f /masterha/app1/manager.log, then stop the MySQL instance on port 3306 of 192.168.17.199 and watch manager.log:
]# tail -f /masterha/app1/manager.log 192.168.17.199 (current master) +--192.168.17.200 +--192.168.17.201 Tue Nov 19 00:32:04 2013 - [warning] master_ip_failover_script is not defined. Tue Nov 19 00:32:04 2013 - [warning] shutdown_script is not defined. Tue Nov 19 00:32:04 2013 - [info] Set master ping interval 1 seconds. Tue Nov 19 00:32:04 2013 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes. Tue Nov 19 00:32:04 2013 - [info] Starting ping health check on 192.168.17.199(192.168.17.199:3306).. Tue Nov 19 00:32:04 2013 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond.. Tue Nov 19 17:59:07 2013 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) Tue Nov 19 17:59:07 2013 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mydb/db01/logs/binlog/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --binlog_prefix=mysql-bin Tue Nov 19 17:59:08 2013 - [info] HealthCheck: SSH to 192.168.17.199 is reachable. Tue Nov 19 17:59:08 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Tue Nov 19 17:59:08 2013 - [warning] Connection failed 1 time(s).. Tue Nov 19 17:59:09 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Tue Nov 19 17:59:09 2013 - [warning] Connection failed 2 time(s).. Tue Nov 19 17:59:10 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Tue Nov 19 17:59:10 2013 - [warning] Connection failed 3 time(s).. Tue Nov 19 17:59:10 2013 - [warning] Master is not reachable from health checker! Tue Nov 19 17:59:10 2013 - [warning] Master 192.168.17.199(192.168.17.199:3306) is not reachable! 
Tue Nov 19 17:59:10 2013 - [warning] SSH is reachable. Tue Nov 19 17:59:10 2013 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status.. Tue Nov 19 17:59:10 2013 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Nov 19 17:59:10 2013 - [info] Reading application default configurations from /etc/masterha/app1.cnf.. Tue Nov 19 17:59:10 2013 - [info] Reading server configurations from /etc/masterha/app1.cnf.. Tue Nov 19 17:59:10 2013 - [info] Dead Servers: Tue Nov 19 17:59:10 2013 - [info] 192.168.17.199(192.168.17.199:3306) Tue Nov 19 17:59:10 2013 - [info] Alive Servers: Tue Nov 19 17:59:10 2013 - [info] 192.168.17.200(192.168.17.200:3306) Tue Nov 19 17:59:10 2013 - [info] 192.168.17.201(192.168.17.201:3306) Tue Nov 19 17:59:10 2013 - [info] Alive Slaves: Tue Nov 19 17:59:10 2013 - [info] 192.168.17.200(192.168.17.200:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled Tue Nov 19 17:59:10 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306) Tue Nov 19 17:59:10 2013 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 19 17:59:10 2013 - [info] 192.168.17.201(192.168.17.201:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled Tue Nov 19 17:59:10 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306) Tue Nov 19 17:59:10 2013 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 19 17:59:10 2013 - [info] Checking slave configurations.. Tue Nov 19 17:59:10 2013 - [info] read_only=1 is not set on slave 192.168.17.200(192.168.17.200:3306). Tue Nov 19 17:59:10 2013 - [warning] relay_log_purge=0 is not set on slave 192.168.17.200(192.168.17.200:3306). Tue Nov 19 17:59:10 2013 - [info] read_only=1 is not set on slave 192.168.17.201(192.168.17.201:3306). 
Tue Nov 19 17:59:10 2013 - [warning] relay_log_purge=0 is not set on slave 192.168.17.201(192.168.17.201:3306). Tue Nov 19 17:59:10 2013 - [info] Checking replication filtering settings.. Tue Nov 19 17:59:10 2013 - [info] Replication filtering check ok. Tue Nov 19 17:59:10 2013 - [info] Master is down! Tue Nov 19 17:59:10 2013 - [info] Terminating monitoring script. Tue Nov 19 17:59:10 2013 - [info] Got exit code 20 (Master dead). Tue Nov 19 17:59:10 2013 - [info] MHA::MasterFailover version 0.54. Tue Nov 19 17:59:10 2013 - [info] Starting master failover. Tue Nov 19 17:59:10 2013 - [info] Tue Nov 19 17:59:10 2013 - [info] * Phase 1: Configuration Check Phase.. Tue Nov 19 17:59:10 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] Dead Servers: Tue Nov 19 17:59:11 2013 - [info] 192.168.17.199(192.168.17.199:3306) Tue Nov 19 17:59:11 2013 - [info] Checking master reachability via mysql(double check).. Tue Nov 19 17:59:11 2013 - [info] ok. Tue Nov 19 17:59:11 2013 - [info] Alive Servers: Tue Nov 19 17:59:11 2013 - [info] 192.168.17.200(192.168.17.200:3306) Tue Nov 19 17:59:11 2013 - [info] 192.168.17.201(192.168.17.201:3306) Tue Nov 19 17:59:11 2013 - [info] Alive Slaves: Tue Nov 19 17:59:11 2013 - [info] 192.168.17.200(192.168.17.200:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306) Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 19 17:59:11 2013 - [info] 192.168.17.201(192.168.17.201:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306) Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 19 17:59:11 2013 - [info] ** Phase 1: Configuration Check Phase completed. 
Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] * Phase 2: Dead Master Shutdown Phase.. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] Forcing shutdown so that applications never connect to the current master.. Tue Nov 19 17:59:11 2013 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master ip address. Tue Nov 19 17:59:11 2013 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Tue Nov 19 17:59:11 2013 - [info] * Phase 2: Dead Master Shutdown Phase completed. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] * Phase 3: Master Recovery Phase.. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:2386 Tue Nov 19 17:59:11 2013 - [info] Latest slaves (Slaves that received relay log files to the latest): Tue Nov 19 17:59:11 2013 - [info] 192.168.17.200(192.168.17.200:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306) Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 19 17:59:11 2013 - [info] 192.168.17.201(192.168.17.201:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306) Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 19 17:59:11 2013 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:2386 Tue Nov 19 17:59:11 2013 - [info] Oldest slaves: Tue Nov 19 17:59:11 2013 - [info] 192.168.17.200(192.168.17.200:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) 
log-bin:enabled Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306) Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 19 17:59:11 2013 - [info] 192.168.17.201(192.168.17.201:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306) Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] Fetching dead master's binary logs.. Tue Nov 19 17:59:11 2013 - [info] Executing command on the dead master 192.168.17.199(192.168.17.199:3306): save_binary_logs --command=save --start_file=mysql-bin.000009 --start_pos=2386 --binlog_dir=/data/mydb/db01/logs/binlog/ --output_file=/var/tmp/saved_master_binlog_from_192.168.17.199_3306_20131119175910.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.54 Creating /var/tmp if not exists.. ok. Concat binary/relay logs from mysql-bin.000009 pos 2386 to mysql-bin.000009 EOF into /var/tmp/saved_master_binlog_from_192.168.17.199_3306_20131119175910.binlog .. Dumping binlog format description event, from position 0 to 107.. ok. Dumping effective binlog data from /data/mydb/db01/logs/binlog//mysql-bin.000009 position 2386 to tail(2405).. ok. sh: mysqlbinlog: command not found Failed to save binary log: /var/tmp/saved_master_binlog_from_192.168.17.199_3306_20131119175910.binlog is broken! at /usr/local/bin/save_binary_logs line 170 Tue Nov 19 17:59:11 2013 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln577] Failed to save binary log events from the orig master. Maybe disks on binary logs are not accessible or binary log itself is corrupt? 
Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] * Phase 3.3: Determining New Master Phase.. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Tue Nov 19 17:59:11 2013 - [info] All slaves received relay logs to the same position. No need to resync each other. Tue Nov 19 17:59:11 2013 - [info] Searching new master from slaves.. Tue Nov 19 17:59:11 2013 - [info] Candidate masters from the configuration file: Tue Nov 19 17:59:11 2013 - [info] 192.168.17.200(192.168.17.200:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306) Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 19 17:59:11 2013 - [info] 192.168.17.201(192.168.17.201:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306) Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 19 17:59:11 2013 - [info] Non-candidate masters: Tue Nov 19 17:59:11 2013 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Tue Nov 19 17:59:11 2013 - [info] New master is 192.168.17.200(192.168.17.200:3306) Tue Nov 19 17:59:11 2013 - [info] Starting master failover.. Tue Nov 19 17:59:11 2013 - [info] From: 192.168.17.199 (current master) +--192.168.17.200 +--192.168.17.201 To: 192.168.17.200 (new master) +--192.168.17.201 Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] This server has all relay logs. No need to generate diff files from the latest slave. 
Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] * Phase 3.4: Master Log Apply Phase.. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Tue Nov 19 17:59:11 2013 - [info] Starting recovery on 192.168.17.200(192.168.17.200:3306).. Tue Nov 19 17:59:11 2013 - [info] This server has all relay logs. Waiting all logs to be applied.. Tue Nov 19 17:59:11 2013 - [info] done. Tue Nov 19 17:59:11 2013 - [info] All relay logs were successfully applied. Tue Nov 19 17:59:11 2013 - [info] Getting new master's binlog name and position.. Tue Nov 19 17:59:11 2013 - [info] mysql-bin.000008:2606 Tue Nov 19 17:59:11 2013 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.17.200', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=2606, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Tue Nov 19 17:59:11 2013 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address. Tue Nov 19 17:59:11 2013 - [info] ** Finished master recovery successfully. Tue Nov 19 17:59:11 2013 - [info] * Phase 3: Master Recovery Phase completed. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] * Phase 4: Slaves Recovery Phase.. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] -- Slave diff file generation on host 192.168.17.201(192.168.17.201:3306) started, pid: 38557. Check tmp log /masterha/app1/192.168.17.201_3306_20131119175910.log if it takes time.. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] Log messages from 192.168.17.201 ... Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] This server has all relay logs. No need to generate diff files from the latest slave. 
Tue Nov 19 17:59:11 2013 - [info] End of log messages from 192.168.17.201. Tue Nov 19 17:59:11 2013 - [info] -- 192.168.17.201(192.168.17.201:3306) has the latest relay log events. Tue Nov 19 17:59:11 2013 - [info] Generating relay diff files from the latest slave succeeded. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] -- Slave recovery on host 192.168.17.201(192.168.17.201:3306) started, pid: 38559. Check tmp log /masterha/app1/192.168.17.201_3306_20131119175910.log if it takes time.. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] Log messages from 192.168.17.201 ... Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] Starting recovery on 192.168.17.201(192.168.17.201:3306).. Tue Nov 19 17:59:11 2013 - [info] This server has all relay logs. Waiting all logs to be applied.. Tue Nov 19 17:59:11 2013 - [info] done. Tue Nov 19 17:59:11 2013 - [info] All relay logs were successfully applied. Tue Nov 19 17:59:11 2013 - [info] Resetting slave 192.168.17.201(192.168.17.201:3306) and starting replication from the new master 192.168.17.200(192.168.17.200:3306).. Tue Nov 19 17:59:11 2013 - [info] Executed CHANGE MASTER. Tue Nov 19 17:59:11 2013 - [info] Slave started. Tue Nov 19 17:59:11 2013 - [info] End of log messages from 192.168.17.201. Tue Nov 19 17:59:11 2013 - [info] -- Slave recovery on host 192.168.17.201(192.168.17.201:3306) succeeded. Tue Nov 19 17:59:11 2013 - [info] All new slave servers recovered successfully. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] * Phase 5: New master cleanup phase.. Tue Nov 19 17:59:11 2013 - [info] Tue Nov 19 17:59:11 2013 - [info] Resetting slave info on the new master.. Tue Nov 19 17:59:12 2013 - [info] 192.168.17.200: Resetting slave info succeeded. 
Tue Nov 19 17:59:12 2013 - [info] Master failover to 192.168.17.200(192.168.17.200:3306) completed successfully. Tue Nov 19 17:59:12 2013 - [info] ----- Failover Report ----- app1: MySQL Master failover 192.168.17.199 to 192.168.17.200 succeeded Master 192.168.17.199 is down! Check MHA Manager logs at rhel-king-01:/masterha/app1/manager.log for details. Started automated(non-interactive) failover. The latest slave 192.168.17.200(192.168.17.200:3306) has all relay logs for recovery. Selected 192.168.17.200 as a new master. 192.168.17.200: OK: Applying all logs succeeded. 192.168.17.201: This host has the latest relay log events. Generating relay diff files from the latest slave succeeded. 192.168.17.201: OK: Applying all logs succeeded. Slave started, replicating from 192.168.17.200. 192.168.17.200: Resetting slave info succeeded. Master failover to 192.168.17.200(192.168.17.200:3306) completed successfully.
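10. Repairing the old master after failover and bringing it back online
The master has switched from 192.168.17.199:3306 to 192.168.17.200:3306. In a real environment the data keeps changing, and at the switch point I had no record of the new master's log file and log position, so simply starting 192.168.17.199:3306 and running CHANGE MASTER TO against 192.168.17.200:3306 does not work. The only option is to take a full backup of the new master or of slave2, restore it on the old master, and then run CHANGE MASTER TO.
Also, once a failover has been executed, the masterha_manager process on the manager node stops automatically, so after the repair it has to be started again:
[local]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log < /dev/null 2>&1 &
[2] 41276
[local]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:41276) is running(0:PING_OK), master:192.168.17.200
The detailed log: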
Tue Nov 19 20:52:38 2013 - [info] MHA::MasterMonitor version 0.54. Tue Nov 19 20:52:38 2013 - [info] Dead Servers: Tue Nov 19 20:52:38 2013 - [info] Alive Servers: Tue Nov 19 20:52:38 2013 - [info] 192.168.17.199(192.168.17.199:3306) Tue Nov 19 20:52:38 2013 - [info] 192.168.17.200(192.168.17.200:3306) Tue Nov 19 20:52:38 2013 - [info] 192.168.17.201(192.168.17.201:3306) Tue Nov 19 20:52:38 2013 - [info] Alive Slaves: Tue Nov 19 20:52:38 2013 - [info] 192.168.17.199(192.168.17.199:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled Tue Nov 19 20:52:38 2013 - [info] Replicating from 192.168.17.200(192.168.17.200:3306) Tue Nov 19 20:52:38 2013 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 19 20:52:38 2013 - [info] 192.168.17.201(192.168.17.201:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled Tue Nov 19 20:52:38 2013 - [info] Replicating from 192.168.17.200(192.168.17.200:3306) Tue Nov 19 20:52:38 2013 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 19 20:52:38 2013 - [info] Current Alive Master: 192.168.17.200(192.168.17.200:3306) Tue Nov 19 20:52:38 2013 - [info] Checking slave configurations.. Tue Nov 19 20:52:38 2013 - [info] read_only=1 is not set on slave 192.168.17.199(192.168.17.199:3306). Tue Nov 19 20:52:38 2013 - [warning] relay_log_purge=0 is not set on slave 192.168.17.199(192.168.17.199:3306). Tue Nov 19 20:52:38 2013 - [info] read_only=1 is not set on slave 192.168.17.201(192.168.17.201:3306). Tue Nov 19 20:52:38 2013 - [warning] relay_log_purge=0 is not set on slave 192.168.17.201(192.168.17.201:3306). Tue Nov 19 20:52:38 2013 - [info] Checking replication filtering settings.. Tue Nov 19 20:52:38 2013 - [info] binlog_do_db= , binlog_ignore_db= information_schema.%,mysql.% Tue Nov 19 20:52:38 2013 - [info] Replication filtering check ok. Tue Nov 19 20:52:38 2013 - [info] Starting SSH connection tests.. 
Tue Nov 19 20:52:40 2013 - [info] All SSH connection tests passed successfully. Tue Nov 19 20:52:40 2013 - [info] Checking MHA Node version.. Tue Nov 19 20:52:41 2013 - [info] Version check ok. Tue Nov 19 20:52:41 2013 - [info] Checking SSH publickey authentication settings on the current master.. Tue Nov 19 20:52:41 2013 - [info] HealthCheck: SSH to 192.168.17.200 is reachable. Tue Nov 19 20:52:41 2013 - [info] Master MHA Node version is 0.54. Tue Nov 19 20:52:41 2013 - [info] Checking recovery script configurations on the current master.. Tue Nov 19 20:52:41 2013 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mydb/db01/logs/binlog/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000008 Tue Nov 19 20:52:41 2013 - [info] Connecting to root@192.168.17.200(192.168.17.200).. Creating /var/tmp if not exists.. ok. Checking output directory is accessible or not.. ok. Binlog found at /data/mydb/db01/logs/binlog/, up to mysql-bin.000008 Tue Nov 19 20:52:42 2013 - [info] Master setting check done. Tue Nov 19 20:52:42 2013 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers.. Tue Nov 19 20:52:42 2013 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='king' --slave_host=192.168.17.199 --slave_ip=192.168.17.199 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.33-rel31.1-log --manager_version=0.54 --relay_log_info=/data/mydb/db01/data/relay-log.info --relay_dir=/data/mydb/db01/data/ --slave_pass=xxx Tue Nov 19 20:52:42 2013 - [info] Connecting to root@192.168.17.199(192.168.17.199:22).. Checking slave recovery environment settings.. Opening /data/mydb/db01/data/relay-log.info ... ok. Relay log found at /data/mydb/db01/data, up to relay-bin.000002 Temporary relay log file is /data/mydb/db01/data/relay-bin.000002 Testing mysql connection and privileges.. done. Testing mysqlbinlog output.. 
done. Cleaning up test file(s).. done. Tue Nov 19 20:52:42 2013 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='king' --slave_host=192.168.17.201 --slave_ip=192.168.17.201 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.33-rel31.1-log --manager_version=0.54 --relay_log_info=/data/mydb/db01/data/relay-log.info --relay_dir=/data/mydb/db01/data/ --slave_pass=xxx Tue Nov 19 20:52:42 2013 - [info] Connecting to root@192.168.17.201(192.168.17.201:22).. Checking slave recovery environment settings.. Opening /data/mydb/db01/data/relay-log.info ... ok. Relay log found at /data/mydb/db01/data, up to relay-bin.000002 Temporary relay log file is /data/mydb/db01/data/relay-bin.000002 Testing mysql connection and privileges.. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Tue Nov 19 20:52:42 2013 - [info] Slaves settings check done. Tue Nov 19 20:52:42 2013 - [info] 192.168.17.200 (current master) +--192.168.17.199 +--192.168.17.201 Tue Nov 19 20:52:42 2013 - [warning] master_ip_failover_script is not defined. Tue Nov 19 20:52:42 2013 - [warning] shutdown_script is not defined. Tue Nov 19 20:52:42 2013 - [info] Set master ping interval 1 seconds. Tue Nov 19 20:52:42 2013 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes. Tue Nov 19 20:52:42 2013 - [info] Starting ping health check on 192.168.17.200(192.168.17.200:3306).. Tue Nov 19 20:52:42 2013 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
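Returning to the old-master repair in step 10: the failover log from step 9 does contain a "Statement should be: CHANGE MASTER TO ..." line with the new master's binlog file and position at the moment of promotion. The sketch below pulls that statement out of a manager log. The function name last_change_master is an assumption, and the sample lines are copied from the log above. Whether the old master can safely resume from those coordinates depends on it holding no events that the slaves never received. In the run above, saving the dead master's binlog failed (mysqlbinlog: command not found), so the full backup and restore described in step 10 remains the cautious route.

```shell
# Print the last CHANGE MASTER TO statement that MHA wrote to a manager log.
last_change_master() {
    grep -o 'CHANGE MASTER TO [^;]*;' "$1" | tail -n 1
}

# Demo on three lines taken from Phase 3.4 of the failover log above.
log=$(mktemp)
cat > "$log" <<'EOF'
Tue Nov 19 17:59:11 2013 - [info] Getting new master's binlog name and position..
Tue Nov 19 17:59:11 2013 - [info] mysql-bin.000008:2606
Tue Nov 19 17:59:11 2013 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.17.200', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=2606, MASTER_USER='repl', MASTER_PASSWORD='xxx';
EOF

stmt=$(last_change_master "$log")
echo "$stmt"
```

The log masks the replication password as xxx, so it has to be filled in by hand before the statement is used.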