MySQL-MHA: Installation, Configuration, and Problems Encountered

weixin_34376986

Original link: http://blog.51cto.com/kingbox/1329603

Problems encountered while installing MySQL-MHA

1. Running masterha_check_repl --conf=/etc/masterha/app1.cnf fails with:

Can't exec "mysqlbinlog": No such file or directory at /usr/local/perl5/MHA/BinlogManager.pm line 99.

On each node, run which mysqlbinlog. On my machine the result is:

[localhost~]$ which mysqlbinlog
/usr/local/mysql/bin/mysqlbinlog

Create a symbolic link under /usr/bin so that MHA can find it:

ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
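The link step can be wrapped so that it is safe to rerun on every node. This is a minimal sketch, assuming the MySQL binaries live in /usr/local/mysql/bin as on my machine; ensure_link is a made-up helper name, not part of MHA. It checks the destination directory rather than the interactive PATH, because which can succeed in a login shell while MHA still fails to find the tool.

```shell
# Link a tool into a standard directory unless something is already there.
# ensure_link is a hypothetical helper name, not part of MHA.
ensure_link() {
    tool=$1      # e.g. mysqlbinlog
    src_dir=$2   # e.g. /usr/local/mysql/bin
    dest_dir=$3  # e.g. /usr/bin
    if [ -e "$dest_dir/$tool" ]; then
        echo "$dest_dir/$tool already exists"
        return 0
    fi
    if [ ! -x "$src_dir/$tool" ]; then
        echo "$src_dir/$tool is missing or not executable" >&2
        return 1
    fi
    ln -s "$src_dir/$tool" "$dest_dir/$tool" &&
        echo "linked $dest_dir/$tool -> $src_dir/$tool"
}

# Usage (as root, on every node):
#   ensure_link mysqlbinlog /usr/local/mysql/bin /usr/bin
```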

2. Running masterha_check_ssh --conf=/etc/masterha/app1.cnf fails with:

connection via SSH from root@192.168.17.199 to root@192.168.17.200 ...

permission denied (publickey,gssapi-keyex,gssapi-with-mic,password)

[error] [/usr/local/share/perl5/MHA/SSHcheck.pm,ln163]

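This is usually a problem with the SSH keys. Delete the entries for the affected IPs from /root/.ssh/known_hosts and regenerate them, and the check passes.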
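Instead of editing known_hosts by hand, the stale entries can be dropped with ssh-keygen -R. The sketch below assumes the three hosts used in this article; KNOWN_HOSTS, DRY_RUN and the function names are illustrative, not MHA settings. The entries are re-learned on the next SSH login to each host.

```shell
# Drop cached host keys for the MHA hosts so they are re-learned on next login.
# KNOWN_HOSTS and DRY_RUN are illustrative variables, not MHA settings.
KNOWN_HOSTS="${KNOWN_HOSTS:-/root/.ssh/known_hosts}"

forget_host() {
    # ssh-keygen -R removes every key stored for the given host;
    # -f selects the known_hosts file to edit.
    ${DRY_RUN:+echo} ssh-keygen -R "$1" -f "$KNOWN_HOSTS"
}

forget_mha_hosts() {
    for ip in 192.168.17.199 192.168.17.200 192.168.17.201; do
        forget_host "$ip"
    done
}

# Usage:
#   DRY_RUN=1 forget_mha_hosts   # only print the commands
#   forget_mha_hosts             # actually edit known_hosts
```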

3. Resolve the Perl dependencies in advance

yum -y install perl-Config-Tiny perl-Params-Validate perl-Log-Dispatch perl-Parallel-ForkManager

yum -y install perl-DBD-MySQL ncftp

Automatic installation with the CPAN module, method 1:

You need an Internet connection and root privileges before installing.

perl -MCPAN -e shell

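The first time you run CPAN it asks for some settings. If your machine is connected directly to the Internet (dial-up, leased line, etc.), just press Enter all the way through; you only need to pick the CPAN mirror closest to you at the last step. For example, I chose ftp://www.perl87.cn/CPAN/, which is located in China. Otherwise, if your machine is behind a firewall, you also need to configure an FTP or HTTP proxy. Common cpan commands are listed below.

Get help

cpan>help

List all modules on CPAN

cpan>m

Install a module; this handles the whole process for Net::Server, from download to installation.

cpan>install Net::Server

Quit

cpan>quit

Automatic installation with the CPAN module, method 2:

cpan -i <module name>, for example: cpan -i Net::Server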
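Before reaching for CPAN it helps to know which modules are actually missing. The sketch below relies on the fact that perl -MFoo -e 1 exits non-zero when module Foo cannot be loaded; missing_perl_modules is a hypothetical helper name, and the module list in the usage comment is taken from the Makefile.PL output shown later in this article.

```shell
# Print the Perl modules from the argument list that cannot be loaded.
# missing_perl_modules is a hypothetical helper name.
missing_perl_modules() {
    for mod in "$@"; do
        # "perl -MFoo -e 1" exits non-zero when module Foo cannot be loaded.
        perl "-M$mod" -e 1 >/dev/null 2>&1 || echo "$mod"
    done
}

# Usage: install only what is still missing.
#   for m in $(missing_perl_modules DBI DBD::mysql Time::HiRes \
#              Config::Tiny Log::Dispatch Parallel::ForkManager); do
#       cpan -i "$m"
#   done
```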

Installation and configuration

192.168.17.199	node	manager
192.168.17.200	node
192.168.17.201	node

First download the mha-manager and mha-node packages from https://code.google.com/p/mysql-master-ha/downloads/list

I downloaded mha4mysql-manager-0.54.tar.gz and mha4mysql-node-0.54.tar.gz.

After downloading, install the Perl dependency modules first:

yum -y install perl-Config-Tiny perl-Params-Validate perl-Log-Dispatch perl-Parallel-ForkManager

yum -y install perl-DBD-MySQL ncftp

1. Install mha-node (on all three machines)

[local]# tar -zxvf mha4mysql-node-0.54.tar.gz -C /usr/local/

[local]# cd /usr/local/mha4mysql-node-0.54/

[local]#perl Makefile.PL

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.609)
- DBD::mysql ...loaded. (4.013)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::node

[local]#make && make install

2. Install the manager (on 192.168.17.199)

[local]# tar -zxvf mha4mysql-manager-0.54.tar.gz -C /usr/local/
[local]# cd /usr/local/mha4mysql-manager-0.54/
[local]#perl Makefile.PL

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.609)
- DBD::mysql ...loaded. (4.013)
- Time::HiRes ...loaded. (1.9721)
- Config::Tiny ...loaded. (2.19)
- Log::Dispatch ...loaded. (2.41)
- Parallel::ForkManager ...loaded. (1.05)
- MHA::NodeConst ...loaded. (0.54)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::manager
[local]#make && make install

3. Edit the configuration file

[local]#mkdir /etc/masterha
[local]#mkdir -p /masterha/app1
[local]#cp samples/conf/* /etc/masterha/

[local]#cat /etc/masterha/app1.cnf

[server default]
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
#mysql user and password
user=king
password=king123
#
ssh_user=root
repl_user=repl
repl_password=repl
ping_interval=1
shutdown_script=""
#master_ip_failover_script="/data/master_ip_failover"
master_ip_online_change_script=""
report_script=""
[server1]
hostname=192.168.17.199
master_binlog_dir="/data/mydb/db01/logs/binlog/"
candidate_master=1
[server2]
hostname=192.168.17.200
master_binlog_dir="/data/mydb/db01/logs/binlog/"
candidate_master=1
[server3]
hostname=192.168.17.201
master_binlog_dir="/data/mydb/db01/logs/binlog/"
candidate_master=1

[local]#
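Several later steps need the list of servers from this file, so it can be handy to read it from the config instead of retyping the IPs. A small sketch, assuming every server section uses a plain hostname=... line as above; mha_hosts is a made-up helper name.

```shell
# Print the hostname of every server section in an MHA config file.
# mha_hosts is a hypothetical helper; the default path is the one used here.
mha_hosts() {
    conf=${1:-/etc/masterha/app1.cnf}
    # Keep only "hostname = value" lines (commented lines do not match)
    # and print the value.
    sed -n 's/^[[:space:]]*hostname[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$conf"
}

# Usage: confirm every configured host answers over SSH.
#   for h in $(mha_hosts); do ssh "root@$h" hostname; done
```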

4. Set up SSH public key trust among the three machines

On 192.168.17.199:
[local]# ssh-keygen -t rsa
[local]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.17.200
[local]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.17.201

On 192.168.17.200:
[local]# ssh-keygen -t rsa
[local]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.17.199
[local]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.17.201

On 192.168.17.201:
[local]# ssh-keygen -t rsa
[local]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.17.200
[local]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.17.199
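The same pattern repeats on every machine: copy the local public key to each of the other hosts. It can be written once as a loop. This is a sketch under the assumption that the key pair already exists at /root/.ssh/id_rsa; HOSTS, DRY_RUN and copy_key_to_peers are illustrative names.

```shell
# Copy this machine's public key to every other host in the cluster.
# HOSTS mirrors the three machines in this article; copy_key_to_peers and
# DRY_RUN are illustrative names.
HOSTS="192.168.17.199 192.168.17.200 192.168.17.201"

copy_key_to_peers() {
    self=$1   # IP of the machine this is run on
    for h in $HOSTS; do
        if [ "$h" != "$self" ]; then
            ${DRY_RUN:+echo} ssh-copy-id -i /root/.ssh/id_rsa.pub "root@$h"
        fi
    done
}

# Usage on 192.168.17.199 (after ssh-keygen -t rsa):
#   DRY_RUN=1 copy_key_to_peers 192.168.17.199   # preview
#   copy_key_to_peers 192.168.17.199             # run, prompts for passwords
```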

5. Test the SSH connections

[local]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Tue Nov 19 02:19:56 2013 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 19 02:19:56 2013 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Tue Nov 19 02:19:56 2013 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Tue Nov 19 02:19:56 2013 - [info] Starting SSH connection tests..
Tue Nov 19 02:19:57 2013 - [debug]
Tue Nov 19 02:19:56 2013 - [debug] Connecting via SSH from root@192.168.17.199(192.168.17.199:22) to root@192.168.17.200(192.168.17.200:22)..
Tue Nov 19 02:19:56 2013 - [debug] ok.
Tue Nov 19 02:19:56 2013 - [debug] Connecting via SSH from root@192.168.17.199(192.168.17.199:22) to root@192.168.17.201(192.168.17.201:22)..
Tue Nov 19 02:19:57 2013 - [debug] ok.
Tue Nov 19 02:19:57 2013 - [debug]
Tue Nov 19 02:19:56 2013 - [debug] Connecting via SSH from root@192.168.17.200(192.168.17.200:22) to root@192.168.17.199(192.168.17.199:22)..
Tue Nov 19 02:19:57 2013 - [debug] ok.
Tue Nov 19 02:19:57 2013 - [debug] Connecting via SSH from root@192.168.17.200(192.168.17.200:22) to root@192.168.17.201(192.168.17.201:22)..
Tue Nov 19 02:19:57 2013 - [debug] ok.
Tue Nov 19 02:19:58 2013 - [debug]
Tue Nov 19 02:19:57 2013 - [debug] Connecting via SSH from root@192.168.17.201(192.168.17.201:22) to root@192.168.17.199(192.168.17.199:22)..
Tue Nov 19 02:19:57 2013 - [debug] ok.
Tue Nov 19 02:19:57 2013 - [debug] Connecting via SSH from root@192.168.17.201(192.168.17.201:22) to root@192.168.17.200(192.168.17.200:22)..
Tue Nov 19 02:19:58 2013 - [debug] ok.

Tue Nov 19 02:19:58 2013 - [info] All SSH connection tests passed successfully.

[local]#

6. Configure master/slave replication (details omitted)

192.168.17.199:3306 master

192.168.17.200:3306 slave1

192.168.17.201:3306 slave2

Create the king user and the repl user in MySQL on all three machines:

GRANT ALL PRIVILEGES ON *.* TO 'king'@'%' IDENTIFIED BY 'king123';

GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'repl';
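Before running the MHA checks it is worth confirming that the new account can log in to every instance from the manager. A minimal sketch using the standard mysql client options -h, -P, -u, -p and -e; check_mysql_logins is a hypothetical helper name. Note that a password given on the command line is visible in the process list, so this is only meant for a quick test.

```shell
# Try a trivial query against each host and report the result per host.
# check_mysql_logins is a hypothetical helper name.
check_mysql_logins() {
    user=$1
    pass=$2
    shift 2
    for h in "$@"; do
        if mysql -h "$h" -P 3306 -u "$user" "-p$pass" -e 'SELECT 1' >/dev/null 2>&1; then
            echo "$h: ok"
        else
            echo "$h: FAILED"
        fi
    done
}

# Usage:
#   check_mysql_logins king king123 192.168.17.199 192.168.17.200 192.168.17.201
```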

7. Test replication

[local]#masterha_check_repl --conf=/etc/masterha/app1.cnf

Tue Nov 19 02:27:17 2013 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 19 02:27:17 2013 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Tue Nov 19 02:27:17 2013 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Tue Nov 19 02:27:17 2013 - [info] MHA::MasterMonitor version 0.54.
Tue Nov 19 02:27:17 2013 - [info] Dead Servers:
Tue Nov 19 02:27:17 2013 - [info] Alive Servers:
Tue Nov 19 02:27:17 2013 - [info] 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 02:27:17 2013 - [info] 192.168.17.200(192.168.17.200:3306)
Tue Nov 19 02:27:17 2013 - [info] 192.168.17.201(192.168.17.201:3306)
Tue Nov 19 02:27:17 2013 - [info] Alive Slaves:
Tue Nov 19 02:27:17 2013 - [info] 192.168.17.200(192.168.17.200:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 02:27:17 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 02:27:17 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 02:27:17 2013 - [info] 192.168.17.201(192.168.17.201:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 02:27:17 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 02:27:17 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 02:27:17 2013 - [info] Current Alive Master: 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 02:27:17 2013 - [info] Checking slave configurations..
Tue Nov 19 02:27:17 2013 - [info] read_only=1 is not set on slave 192.168.17.200(192.168.17.200:3306).
Tue Nov 19 02:27:17 2013 - [warning] relay_log_purge=0 is not set on slave 192.168.17.200(192.168.17.200:3306).
Tue Nov 19 02:27:17 2013 - [info] read_only=1 is not set on slave 192.168.17.201(192.168.17.201:3306).
Tue Nov 19 02:27:17 2013 - [warning] relay_log_purge=0 is not set on slave 192.168.17.201(192.168.17.201:3306).
Tue Nov 19 02:27:17 2013 - [info] Checking replication filtering settings..
Tue Nov 19 02:27:17 2013 - [info] binlog_do_db= , binlog_ignore_db= information_schema.%,mysql.%
Tue Nov 19 02:27:17 2013 - [info] Replication filtering check ok.
Tue Nov 19 02:27:17 2013 - [info] Starting SSH connection tests..
Tue Nov 19 02:27:19 2013 - [info] All SSH connection tests passed successfully.
Tue Nov 19 02:27:19 2013 - [info] Checking MHA Node version..
Tue Nov 19 02:27:20 2013 - [info] Version check ok.
Tue Nov 19 02:27:20 2013 - [info] Checking SSH publickey authentication settings on the current master..
Tue Nov 19 02:27:20 2013 - [info] HealthCheck: SSH to 192.168.17.199 is reachable.
Tue Nov 19 02:27:20 2013 - [info] Master MHA Node version is 0.54.
Tue Nov 19 02:27:20 2013 - [info] Checking recovery script configurations on the current master..
Tue Nov 19 02:27:20 2013 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mydb/db01/logs/binlog/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000009
Tue Nov 19 02:27:20 2013 - [info] Connecting to root@192.168.17.199(192.168.17.199)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mydb/db01/logs/binlog/, up to mysql-bin.000009
Tue Nov 19 02:27:20 2013 - [info] Master setting check done.
Tue Nov 19 02:27:20 2013 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Nov 19 02:27:20 2013 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='king' --slave_host=192.168.17.200 --slave_ip=192.168.17.200 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.33-rel31.1-log --manager_version=0.54 --relay_log_info=/data/mydb/db01/data/relay-log.info --relay_dir=/data/mydb/db01/data/ --slave_pass=xxx
Tue Nov 19 02:27:20 2013 - [info] Connecting to root@192.168.17.200(192.168.17.200:22)..
Checking slave recovery environment settings..
Opening /data/mydb/db01/data/relay-log.info ... ok.
Relay log found at /data/mydb/db01/data, up to relay-bin.000004
Temporary relay log file is /data/mydb/db01/data/relay-bin.000004
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Nov 19 02:27:21 2013 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='king' --slave_host=192.168.17.201 --slave_ip=192.168.17.201 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.33-rel31.1-log --manager_version=0.54 --relay_log_info=/data/mydb/db01/data/relay-log.info --relay_dir=/data/mydb/db01/data/ --slave_pass=xxx
Tue Nov 19 02:27:21 2013 - [info] Connecting to root@192.168.17.201(192.168.17.201:22)..
Checking slave recovery environment settings..
Opening /data/mydb/db01/data/relay-log.info ... ok.
Relay log found at /data/mydb/db01/data, up to relay-bin.000004
Temporary relay log file is /data/mydb/db01/data/relay-bin.000004
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Nov 19 02:27:21 2013 - [info] Slaves settings check done.
Tue Nov 19 02:27:21 2013 - [info]
192.168.17.199 (current master)
+--192.168.17.200
+--192.168.17.201

Tue Nov 19 02:27:21 2013 - [info] Checking replication health on 192.168.17.200..
Tue Nov 19 02:27:21 2013 - [info] ok.
Tue Nov 19 02:27:21 2013 - [info] Checking replication health on 192.168.17.201..
Tue Nov 19 02:27:21 2013 - [info] ok.
Tue Nov 19 02:27:21 2013 - [warning] master_ip_failover_script is not defined.
Tue Nov 19 02:27:21 2013 - [warning] shutdown_script is not defined.
Tue Nov 19 02:27:21 2013 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

[local]#
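The check passes, but the output above notes that read_only=1 and relay_log_purge=0 are not set on either slave. Both are dynamic MySQL global variables, so they can be changed at runtime. The sketch below only prints the statements; slave_settings_sql is a made-up helper name. SET GLOBAL does not survive a restart, so the same values would also need to go into my.cnf, and with relay_log_purge=0 the relay logs have to be cleaned up some other way (MHA node ships a purge_relay_logs script for that).

```shell
# Emit the statements that address the two slave-side notes from
# masterha_check_repl. slave_settings_sql is a hypothetical helper name.
slave_settings_sql() {
    printf '%s\n' \
        'SET GLOBAL relay_log_purge = 0;' \
        'SET GLOBAL read_only = 1;'
}

# Usage: apply on each slave, not on the master.
#   slave_settings_sql | mysql -h 192.168.17.200 -u king -p
#   slave_settings_sql | mysql -h 192.168.17.201 -u king -p
```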

8. Start the manager process

[local]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log < /dev/null 2>&1 &

[local]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:22852) is running(0:PING_OK), master:192.168.17.199
[local]#
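The two commands above can be combined into a "start only if not already running" wrapper. This sketch assumes that masterha_check_status exits with status 0 only while the manager is running (the 0 in 0:PING_OK); I have not verified this against every MHA version. ensure_manager_running is a hypothetical name.

```shell
# Start masterha_manager in the background unless it is already running.
# Assumption: masterha_check_status exits 0 only when the manager is up.
# ensure_manager_running is a hypothetical helper name.
ensure_manager_running() {
    conf=${1:-/etc/masterha/app1.cnf}
    log=${2:-/tmp/mha_manager.log}
    if masterha_check_status --conf="$conf" >/dev/null 2>&1; then
        echo "manager already running"
    else
        nohup masterha_manager --conf="$conf" > "$log" < /dev/null 2>&1 &
        echo "manager started"
    fi
}

# Usage:
#   ensure_manager_running /etc/masterha/app1.cnf /tmp/mha_manager.log
```

After a failover, repair the replication topology first and only then start the manager again.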

9. Test master failover

On 192.168.17.199 (the manager), follow the log with tail -f /masterha/app1/manager.log

Then stop the MySQL instance on port 3306 of 192.168.17.199 and watch manager.log:

[local]# tail -f /masterha/app1/manager.log
192.168.17.199 (current master)
+--192.168.17.200
+--192.168.17.201

Tue Nov 19 00:32:04 2013 - [warning] master_ip_failover_script is not defined.
Tue Nov 19 00:32:04 2013 - [warning] shutdown_script is not defined.
Tue Nov 19 00:32:04 2013 - [info] Set master ping interval 1 seconds.
Tue Nov 19 00:32:04 2013 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Tue Nov 19 00:32:04 2013 - [info] Starting ping health check on 192.168.17.199(192.168.17.199:3306)..
Tue Nov 19 00:32:04 2013 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Tue Nov 19 17:59:07 2013 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Nov 19 17:59:07 2013 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mydb/db01/logs/binlog/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --binlog_prefix=mysql-bin
Tue Nov 19 17:59:08 2013 - [info] HealthCheck: SSH to 192.168.17.199 is reachable.
Tue Nov 19 17:59:08 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov 19 17:59:08 2013 - [warning] Connection failed 1 time(s)..
Tue Nov 19 17:59:09 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov 19 17:59:09 2013 - [warning] Connection failed 2 time(s)..
Tue Nov 19 17:59:10 2013 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov 19 17:59:10 2013 - [warning] Connection failed 3 time(s)..
Tue Nov 19 17:59:10 2013 - [warning] Master is not reachable from health checker!
Tue Nov 19 17:59:10 2013 - [warning] Master 192.168.17.199(192.168.17.199:3306) is not reachable!
Tue Nov 19 17:59:10 2013 - [warning] SSH is reachable.
Tue Nov 19 17:59:10 2013 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Tue Nov 19 17:59:10 2013 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 19 17:59:10 2013 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Tue Nov 19 17:59:10 2013 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Tue Nov 19 17:59:10 2013 - [info] Dead Servers:
Tue Nov 19 17:59:10 2013 - [info] 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:10 2013 - [info] Alive Servers:
Tue Nov 19 17:59:10 2013 - [info] 192.168.17.200(192.168.17.200:3306)
Tue Nov 19 17:59:10 2013 - [info] 192.168.17.201(192.168.17.201:3306)
Tue Nov 19 17:59:10 2013 - [info] Alive Slaves:
Tue Nov 19 17:59:10 2013 - [info] 192.168.17.200(192.168.17.200:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:10 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:10 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:10 2013 - [info] 192.168.17.201(192.168.17.201:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:10 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:10 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:10 2013 - [info] Checking slave configurations..
Tue Nov 19 17:59:10 2013 - [info] read_only=1 is not set on slave 192.168.17.200(192.168.17.200:3306).
Tue Nov 19 17:59:10 2013 - [warning] relay_log_purge=0 is not set on slave 192.168.17.200(192.168.17.200:3306).
Tue Nov 19 17:59:10 2013 - [info] read_only=1 is not set on slave 192.168.17.201(192.168.17.201:3306).
Tue Nov 19 17:59:10 2013 - [warning] relay_log_purge=0 is not set on slave 192.168.17.201(192.168.17.201:3306).
Tue Nov 19 17:59:10 2013 - [info] Checking replication filtering settings..
Tue Nov 19 17:59:10 2013 - [info] Replication filtering check ok.
Tue Nov 19 17:59:10 2013 - [info] Master is down!
Tue Nov 19 17:59:10 2013 - [info] Terminating monitoring script.
Tue Nov 19 17:59:10 2013 - [info] Got exit code 20 (Master dead).
Tue Nov 19 17:59:10 2013 - [info] MHA::MasterFailover version 0.54.
Tue Nov 19 17:59:10 2013 - [info] Starting master failover.
Tue Nov 19 17:59:10 2013 - [info]
Tue Nov 19 17:59:10 2013 - [info] * Phase 1: Configuration Check Phase..
Tue Nov 19 17:59:10 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Dead Servers:
Tue Nov 19 17:59:11 2013 - [info] 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info] Checking master reachability via mysql(double check)..
Tue Nov 19 17:59:11 2013 - [info] ok.
Tue Nov 19 17:59:11 2013 - [info] Alive Servers:
Tue Nov 19 17:59:11 2013 - [info] 192.168.17.200(192.168.17.200:3306)
Tue Nov 19 17:59:11 2013 - [info] 192.168.17.201(192.168.17.201:3306)
Tue Nov 19 17:59:11 2013 - [info] Alive Slaves:
Tue Nov 19 17:59:11 2013 - [info] 192.168.17.200(192.168.17.200:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info] 192.168.17.201(192.168.17.201:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Nov 19 17:59:11 2013 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
Tue Nov 19 17:59:11 2013 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Nov 19 17:59:11 2013 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 3: Master Recovery Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:2386
Tue Nov 19 17:59:11 2013 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Nov 19 17:59:11 2013 - [info] 192.168.17.200(192.168.17.200:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info] 192.168.17.201(192.168.17.201:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:2386
Tue Nov 19 17:59:11 2013 - [info] Oldest slaves:
Tue Nov 19 17:59:11 2013 - [info] 192.168.17.200(192.168.17.200:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info] 192.168.17.201(192.168.17.201:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Fetching dead master's binary logs..
Tue Nov 19 17:59:11 2013 - [info] Executing command on the dead master 192.168.17.199(192.168.17.199:3306): save_binary_logs --command=save --start_file=mysql-bin.000009 --start_pos=2386 --binlog_dir=/data/mydb/db01/logs/binlog/ --output_file=/var/tmp/saved_master_binlog_from_192.168.17.199_3306_20131119175910.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.54
Creating /var/tmp if not exists.. ok.
Concat binary/relay logs from mysql-bin.000009 pos 2386 to mysql-bin.000009 EOF into /var/tmp/saved_master_binlog_from_192.168.17.199_3306_20131119175910.binlog ..
Dumping binlog format description event, from position 0 to 107.. ok.
Dumping effective binlog data from /data/mydb/db01/logs/binlog//mysql-bin.000009 position 2386 to tail(2405).. ok.
sh: mysqlbinlog: command not found
Failed to save binary log: /var/tmp/saved_master_binlog_from_192.168.17.199_3306_20131119175910.binlog is broken!
at /usr/local/bin/save_binary_logs line 170
Tue Nov 19 17:59:11 2013 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln577] Failed to save binary log events from the orig master. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 3.3: Determining New Master Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Nov 19 17:59:11 2013 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Nov 19 17:59:11 2013 - [info] Searching new master from slaves..
Tue Nov 19 17:59:11 2013 - [info] Candidate masters from the configuration file:
Tue Nov 19 17:59:11 2013 - [info] 192.168.17.200(192.168.17.200:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info] 192.168.17.201(192.168.17.201:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 17:59:11 2013 - [info] Replicating from 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 17:59:11 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 17:59:11 2013 - [info] Non-candidate masters:
Tue Nov 19 17:59:11 2013 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Tue Nov 19 17:59:11 2013 - [info] New master is 192.168.17.200(192.168.17.200:3306)
Tue Nov 19 17:59:11 2013 - [info] Starting master failover..
Tue Nov 19 17:59:11 2013 - [info]
From:
192.168.17.199 (current master)
+--192.168.17.200
+--192.168.17.201

To:
192.168.17.200 (new master)
+--192.168.17.201
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Nov 19 17:59:11 2013 - [info] Starting recovery on 192.168.17.200(192.168.17.200:3306)..
Tue Nov 19 17:59:11 2013 - [info] This server has all relay logs. Waiting all logs to be applied..
Tue Nov 19 17:59:11 2013 - [info] done.
Tue Nov 19 17:59:11 2013 - [info] All relay logs were successfully applied.
Tue Nov 19 17:59:11 2013 - [info] Getting new master's binlog name and position..
Tue Nov 19 17:59:11 2013 - [info] mysql-bin.000008:2606
Tue Nov 19 17:59:11 2013 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.17.200', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=2606, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Nov 19 17:59:11 2013 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Tue Nov 19 17:59:11 2013 - [info] ** Finished master recovery successfully.
Tue Nov 19 17:59:11 2013 - [info] * Phase 3: Master Recovery Phase completed.
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 4: Slaves Recovery Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] -- Slave diff file generation on host 192.168.17.201(192.168.17.201:3306) started, pid: 38557. Check tmp log /masterha/app1/192.168.17.201_3306_20131119175910.log if it takes time..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Log messages from 192.168.17.201 ...
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Nov 19 17:59:11 2013 - [info] End of log messages from 192.168.17.201.
Tue Nov 19 17:59:11 2013 - [info] -- 192.168.17.201(192.168.17.201:3306) has the latest relay log events.
Tue Nov 19 17:59:11 2013 - [info] Generating relay diff files from the latest slave succeeded.
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] -- Slave recovery on host 192.168.17.201(192.168.17.201:3306) started, pid: 38559. Check tmp log /masterha/app1/192.168.17.201_3306_20131119175910.log if it takes time..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Log messages from 192.168.17.201 ...
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Starting recovery on 192.168.17.201(192.168.17.201:3306)..
Tue Nov 19 17:59:11 2013 - [info] This server has all relay logs. Waiting all logs to be applied..
Tue Nov 19 17:59:11 2013 - [info] done.
Tue Nov 19 17:59:11 2013 - [info] All relay logs were successfully applied.
Tue Nov 19 17:59:11 2013 - [info] Resetting slave 192.168.17.201(192.168.17.201:3306) and starting replication from the new master 192.168.17.200(192.168.17.200:3306)..
Tue Nov 19 17:59:11 2013 - [info] Executed CHANGE MASTER.
Tue Nov 19 17:59:11 2013 - [info] Slave started.
Tue Nov 19 17:59:11 2013 - [info] End of log messages from 192.168.17.201.
Tue Nov 19 17:59:11 2013 - [info] -- Slave recovery on host 192.168.17.201(192.168.17.201:3306) succeeded.
Tue Nov 19 17:59:11 2013 - [info] All new slave servers recovered successfully.
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] * Phase 5: New master cleanup phase..
Tue Nov 19 17:59:11 2013 - [info]
Tue Nov 19 17:59:11 2013 - [info] Resetting slave info on the new master..
Tue Nov 19 17:59:12 2013 - [info] 192.168.17.200: Resetting slave info succeeded.
Tue Nov 19 17:59:12 2013 - [info] Master failover to 192.168.17.200(192.168.17.200:3306) completed successfully.
Tue Nov 19 17:59:12 2013 - [info]

----- Failover Report -----

app1: MySQL Master failover 192.168.17.199 to 192.168.17.200 succeeded

Master 192.168.17.199 is down!

Check MHA Manager logs at rhel-king-01:/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 192.168.17.200(192.168.17.200:3306) has all relay logs for recovery.
Selected 192.168.17.200 as a new master.
192.168.17.200: OK: Applying all logs succeeded.
192.168.17.201: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.17.201: OK: Applying all logs succeeded. Slave started, replicating from 192.168.17.200.
192.168.17.200: Resetting slave info succeeded.
Master failover to 192.168.17.200(192.168.17.200:3306) completed successfully.
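Note that the report says the failover succeeded even though Phase 3.2 logged "sh: mysqlbinlog: command not found" and an [error] line about failing to save the dead master's binary log, which is the symlink problem described at the top of this article. Such lines are easy to miss in a long log, so a quick scan helps. A minimal sketch; mha_log_problems is a hypothetical helper, and the three patterns are just the ones that appear in this log.

```shell
# Print, with line numbers, the manager log lines that indicate trouble.
# mha_log_problems is a hypothetical helper name.
mha_log_problems() {
    log=${1:-/masterha/app1/manager.log}
    grep -n -e '\[error\]' -e 'command not found' -e 'Failed to' "$log"
}

# Usage:
#   mha_log_problems /masterha/app1/manager.log
```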

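10. Repairing the old master after the switch and bringing it back online

The master has now moved from 192.168.17.199:3306 to 192.168.17.200:3306. In a real environment the data keeps changing, and MHA did not record the new master's log file and log position at the switch point, so simply starting 192.168.17.199:3306 and running CHANGE MASTER TO against 192.168.17.200:3306 does not work. The only option is to take a full backup of the new master or of slave2, restore it on the old master, and then run CHANGE MASTER TO.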
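One caveat: the failover log above does contain a line with "Statement should be: CHANGE MASTER TO ...", which gives the new master's binlog file and position at failover time. It can only serve as a starting point if the old master is known to hold no events that the new master never received, which is not guaranteed here because saving the dead master's binlog failed in this run. The sketch below just pulls that statement out of the log; change_master_from_log is a hypothetical helper, and the password is masked as 'xxx' in the log, so it has to be filled in by hand.

```shell
# Print the most recent CHANGE MASTER TO statement suggested in the manager log.
# change_master_from_log is a hypothetical helper name.
change_master_from_log() {
    log=${1:-/masterha/app1/manager.log}
    sed -n 's/.*Statement should be: \(CHANGE MASTER TO .*;\).*/\1/p' "$log" | tail -n 1
}

# Usage:
#   change_master_from_log /masterha/app1/manager.log
```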

Also, after a failover the masterha_manager process on the manager node stops automatically, so once the repair is done you have to start it again:

[local]#nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log < /dev/null 2>&1 &
[2] 41276
[local]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:41276) is running(0:PING_OK), master:192.168.17.200

The detailed log:

Tue Nov 19 20:52:38 2013 - [info] MHA::MasterMonitor version 0.54.
Tue Nov 19 20:52:38 2013 - [info] Dead Servers:
Tue Nov 19 20:52:38 2013 - [info] Alive Servers:
Tue Nov 19 20:52:38 2013 - [info] 192.168.17.199(192.168.17.199:3306)
Tue Nov 19 20:52:38 2013 - [info] 192.168.17.200(192.168.17.200:3306)
Tue Nov 19 20:52:38 2013 - [info] 192.168.17.201(192.168.17.201:3306)
Tue Nov 19 20:52:38 2013 - [info] Alive Slaves:
Tue Nov 19 20:52:38 2013 - [info] 192.168.17.199(192.168.17.199:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 20:52:38 2013 - [info] Replicating from 192.168.17.200(192.168.17.200:3306)
Tue Nov 19 20:52:38 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 20:52:38 2013 - [info] 192.168.17.201(192.168.17.201:3306) Version=5.5.33-rel31.1-log (oldest major version between slaves) log-bin:enabled
Tue Nov 19 20:52:38 2013 - [info] Replicating from 192.168.17.200(192.168.17.200:3306)
Tue Nov 19 20:52:38 2013 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Nov 19 20:52:38 2013 - [info] Current Alive Master: 192.168.17.200(192.168.17.200:3306)
Tue Nov 19 20:52:38 2013 - [info] Checking slave configurations..
Tue Nov 19 20:52:38 2013 - [info] read_only=1 is not set on slave 192.168.17.199(192.168.17.199:3306).
Tue Nov 19 20:52:38 2013 - [warning] relay_log_purge=0 is not set on slave 192.168.17.199(192.168.17.199:3306).
Tue Nov 19 20:52:38 2013 - [info] read_only=1 is not set on slave 192.168.17.201(192.168.17.201:3306).
Tue Nov 19 20:52:38 2013 - [warning] relay_log_purge=0 is not set on slave 192.168.17.201(192.168.17.201:3306).
Tue Nov 19 20:52:38 2013 - [info] Checking replication filtering settings..
Tue Nov 19 20:52:38 2013 - [info] binlog_do_db= , binlog_ignore_db= information_schema.%,mysql.%
Tue Nov 19 20:52:38 2013 - [info] Replication filtering check ok.
Tue Nov 19 20:52:38 2013 - [info] Starting SSH connection tests..
Tue Nov 19 20:52:40 2013 - [info] All SSH connection tests passed successfully.
Tue Nov 19 20:52:40 2013 - [info] Checking MHA Node version..
Tue Nov 19 20:52:41 2013 - [info] Version check ok.
Tue Nov 19 20:52:41 2013 - [info] Checking SSH publickey authentication settings on the current master..
Tue Nov 19 20:52:41 2013 - [info] HealthCheck: SSH to 192.168.17.200 is reachable.
Tue Nov 19 20:52:41 2013 - [info] Master MHA Node version is 0.54.
Tue Nov 19 20:52:41 2013 - [info] Checking recovery script configurations on the current master..
Tue Nov 19 20:52:41 2013 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mydb/db01/logs/binlog/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000008
Tue Nov 19 20:52:41 2013 - [info] Connecting to root@192.168.17.200(192.168.17.200)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mydb/db01/logs/binlog/, up to mysql-bin.000008
Tue Nov 19 20:52:42 2013 - [info] Master setting check done.
Tue Nov 19 20:52:42 2013 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Nov 19 20:52:42 2013 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='king' --slave_host=192.168.17.199 --slave_ip=192.168.17.199 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.33-rel31.1-log --manager_version=0.54 --relay_log_info=/data/mydb/db01/data/relay-log.info --relay_dir=/data/mydb/db01/data/ --slave_pass=xxx
Tue Nov 19 20:52:42 2013 - [info] Connecting to root@192.168.17.199(192.168.17.199:22)..
Checking slave recovery environment settings..
Opening /data/mydb/db01/data/relay-log.info ... ok.
Relay log found at /data/mydb/db01/data, up to relay-bin.000002
Temporary relay log file is /data/mydb/db01/data/relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Nov 19 20:52:42 2013 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='king' --slave_host=192.168.17.201 --slave_ip=192.168.17.201 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.33-rel31.1-log --manager_version=0.54 --relay_log_info=/data/mydb/db01/data/relay-log.info --relay_dir=/data/mydb/db01/data/ --slave_pass=xxx
Tue Nov 19 20:52:42 2013 - [info] Connecting to root@192.168.17.201(192.168.17.201:22)..
Checking slave recovery environment settings..
Opening /data/mydb/db01/data/relay-log.info ... ok.
Relay log found at /data/mydb/db01/data, up to relay-bin.000002
Temporary relay log file is /data/mydb/db01/data/relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Nov 19 20:52:42 2013 - [info] Slaves settings check done.
Tue Nov 19 20:52:42 2013 - [info]
192.168.17.200 (current master)
+--192.168.17.199
+--192.168.17.201

Tue Nov 19 20:52:42 2013 - [warning] master_ip_failover_script is not defined.
Tue Nov 19 20:52:42 2013 - [warning] shutdown_script is not defined.
Tue Nov 19 20:52:42 2013 - [info] Set master ping interval 1 seconds.
Tue Nov 19 20:52:42 2013 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Tue Nov 19 20:52:42 2013 - [info] Starting ping health check on 192.168.17.200(192.168.17.200:3306)..
Tue Nov 19 20:52:42 2013 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
