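Note: before reading this article, please read the official MHA site first. Come back to this article afterwards if you run into VIP switching problems during installation and deployment.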

I. Environment

Operating system: Red Hat 6.3

MySQL: MySQL 5.5

MHA: mha4mysql-manager-0.55, mha4mysql-node-0.54

Perl: ActivePerl-5.16

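II. Installation

1. Install MHA Node (on every machine that runs a MySQL instance, including the MHA Manager machine)

a. Install the required Perl modules

# Install cpan

#yum install cpan -y

Required packages (listed in the Makefile.PL inside the downloaded tar.gz package):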


YAML

DBD::mysql

Install the modules listed above one by one.

A sample command (assuming the list above is saved in the file /tmp/mha.list):

#for i in $(cat /tmp/mha.list);do yes| cpan $i;done

Piping the yes command into cpan answers its many confirmation prompts automatically.

If a module fails to install, try a forced install.
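The loop above can be made a little more defensive. The following is only a sketch of a variant, not something shipped with MHA: the helper names `list_modules` and `install_modules` and the `DRY_RUN` switch are my own assumptions. It skips blank lines in the list file and can print the cpan commands instead of running them.

```shell
#!/bin/sh
# list_modules FILE: print one module name per line, trimming whitespace and
# dropping blank lines and lines starting with '#'.
# (Hypothetical helper, not part of MHA or cpan.)
list_modules() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' -e '/^#/d' "$1"
}

# install_modules FILE: install each listed module with cpan.
# With DRY_RUN=1 the commands are only printed, not executed.
install_modules() {
    list_modules "$1" | while read -r module; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "cpan $module"
        else
            # yes answers cpan's confirmation prompts, as in the loop above
            yes | cpan "$module"
        fi
    done
}
```

For a first pass, something like `DRY_RUN=1; install_modules /tmp/mha.list` shows what would be installed without touching the system.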

b. Download and install MHA Node

#cd /tmp/

#wget https://mysql-master-ha.googlecode.com/files/mha4mysql-node-0.54.tar.gz

#tar zxvf mha4mysql-node-0.54.tar.gz

#cd /tmp/mha4mysql-node-0.54

#perl Makefile.PL

#make

#make install


Output of make install:

Installing /usr/local/share/perl5/MHA/BinlogPosFinderXid.pm

Installing /usr/local/share/perl5/MHA/SlaveUtil.pm

Installing /usr/local/share/perl5/MHA/BinlogPosFindManager.pm

Installing /usr/local/share/perl5/MHA/BinlogManager.pm

Installing /usr/local/share/perl5/MHA/NodeUtil.pm

Installing /usr/local/share/perl5/MHA/BinlogHeaderParser.pm

Installing /usr/local/share/perl5/MHA/BinlogPosFinderElp.pm

Installing /usr/local/share/perl5/MHA/BinlogPosFinder.pm

Installing /usr/local/share/perl5/MHA/NodeConst.pm

Installing /usr/local/share/man/man1/filter_mysqlbinlog.1

Installing /usr/local/share/man/man1/save_binary_logs.1

Installing /usr/local/share/man/man1/apply_diff_relay_logs.1

Installing /usr/local/share/man/man1/purge_relay_logs.1

Installing /usr/local/bin/purge_relay_logs

Installing /usr/local/bin/save_binary_logs

Installing /usr/local/bin/filter_mysqlbinlog

Installing /usr/local/bin/apply_diff_relay_logs

Appending installation info to /usr/lib64/perl5/perllocal.pod


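2. Install MHA Manager (only on the management machine)

Recommendation: install it on a dedicated server rather than on a production MySQL server. If the manager were installed on the production master's server and that server's OS went down, the manager could not fail over that master instance.

a. Install the required Perl modules

# Install cpan

#yum install cpan -y

Required packages (listed in the Makefile.PL inside the downloaded tar.gz package):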

YAML

DBI

DBD::mysql

Time::HiRes

Config::Tiny

Log::Dispatch

Parallel::ForkManager

MHA::NodeConst

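Install the modules listed above one by one. MHA::NodeConst is not a CPAN module; it comes with the mha4mysql-node package installed in step 1.

A sample command (assuming the list above is saved in the file /tmp/mha.list):

#for i in $(cat /tmp/mha.list);do yes| cpan $i;done

Piping the yes command into cpan answers its many confirmation prompts automatically.

If a module fails to install, try a forced install.

b. Download and install MHA Manager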

#cd /tmp

#wget https://mysql-master-ha.googlecode.com/files/mha4mysql-manager-0.55.tar.gz

#tar zxvf mha4mysql-manager-0.55.tar.gz

#cd mha4mysql-manager-0.55

#perl Makefile.PL

# make

#make install

Output of make install:

/usr/bin/perl "-Iinc" Makefile.PL --config= --installdeps=MHA::NodeConst,0

*** Installing dependencies...

*** Installing MHA::NodeConst...

CPAN: Storable loaded ok (v2.20)

Going to read '/root/.cpan/Metadata'

 Database was generated on Fri, 21 Mar 2014 05:17:02 GMT

*** Could not find a version 0 or above for MHA::NodeConst; skipping.

*** Module::AutoInstall installation finished.

Installing /usr/local/share/perl5/MHA/ManagerUtil.pm

Installing /usr/local/share/perl5/MHA/HealthCheck.pm

Installing /usr/local/share/perl5/MHA/Config.pm

Installing /usr/local/share/perl5/MHA/ServerManager.pm

Installing /usr/local/share/perl5/MHA/ManagerConst.pm

Installing /usr/local/share/perl5/MHA/FileStatus.pm

Installing /usr/local/share/perl5/MHA/ManagerAdmin.pm

Installing /usr/local/share/perl5/MHA/MasterFailover.pm

Installing /usr/local/share/perl5/MHA/ManagerAdminWrapper.pm

Installing /usr/local/share/perl5/MHA/MasterRotate.pm

Installing /usr/local/share/perl5/MHA/MasterMonitor.pm

Installing /usr/local/share/perl5/MHA/Server.pm

Installing /usr/local/share/perl5/MHA/SSHCheck.pm

Installing /usr/local/share/perl5/MHA/DBHelper.pm

Installing /usr/local/share/man/man1/masterha_check_repl.1

Installing /usr/local/share/man/man1/masterha_check_ssh.1

Installing /usr/local/share/man/man1/masterha_check_status.1

Installing /usr/local/share/man/man1/masterha_conf_host.1

Installing /usr/local/share/man/man1/masterha_manager.1

Installing /usr/local/share/man/man1/masterha_master_monitor.1

Installing /usr/local/share/man/man1/masterha_master_switch.1

Installing /usr/local/share/man/man1/masterha_secondary_check.1

Installing /usr/local/share/man/man1/masterha_stop.1

Installing /usr/local/bin/masterha_stop

Installing /usr/local/bin/masterha_conf_host

Installing /usr/local/bin/masterha_check_repl

Installing /usr/local/bin/masterha_check_status

Installing /usr/local/bin/masterha_master_monitor

Installing /usr/local/bin/masterha_check_ssh

Installing /usr/local/bin/masterha_master_switch

Installing /usr/local/bin/masterha_secondary_check

Installing /usr/local/bin/masterha_manager

Appending installation info to /usr/lib64/perl5/perllocal.pod


III. VIP failover through the bundled Perl scripts

1. Two files are involved:

master_ip_failover

master_ip_online_change

Keep the configuration in both files identical, as follows:


my $vip = '10.0.0.100/24';  # Virtual IP

my $key = "3306";

my $nic = "em1";

my $ssh_start_vip = "sudo /sbin/ifconfig $nic:$key $vip";

my $ssh_stop_vip = "sudo /sbin/ifconfig $nic:$key down";


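10.0.0.100 is the virtual IP. It moves when the master goes down and when the master is switched manually.

The key 3306 can be anything you like, as long as it does not conflict with the key of another virtual IP. My environment runs several instances per server, so I use the port number as the name of the virtual IP alias.

em1 is the network interface device name, which you can see with ifconfig.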
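To see what the two command strings expand to for a given instance, the same composition can be sketched in shell. The function name `vip_command` is hypothetical and exists only for illustration; the values are the ones from the configuration above. It prints the commands and does not run them.

```shell
#!/bin/sh
# vip_command ACTION NIC KEY VIP
# Prints the ifconfig command that the Perl variables $ssh_start_vip and
# $ssh_stop_vip would hold for the given values.
# (Hypothetical helper for illustration; it never executes ifconfig.)
vip_command() {
    action=$1
    nic=$2
    key=$3
    vip=$4
    case "$action" in
        start) echo "sudo /sbin/ifconfig $nic:$key $vip" ;;
        stop)  echo "sudo /sbin/ifconfig $nic:$key down" ;;
        *)     echo "usage: vip_command start|stop NIC KEY VIP" >&2
               return 1 ;;
    esac
}

# The values used in this article
vip_command start em1 3306 10.0.0.100/24
vip_command stop  em1 3306 10.0.0.100/24
```

With one alias per instance port (em1:3306, em1:3307 and so on), each instance on a shared server gets its own VIP that can be moved independently.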

IV. Running MHA Manager from daemontools

Currently the MHA Manager process does not run as a daemon. If failover completes successfully or the manager process is killed by accident, the manager stops working. To run it as a daemon, daemontools or any external daemon program can be used. Here is an example of running it from daemontools.

1. Install daemontools

1) Download daemontools

wget ftp://ftp.pbone.net/mirror/ftp5.gwdg.de/pub/opensuse/repositories/utilities/RHEL_6/x86_64/daemontools-doc-0.76-2.2.x86_64.rpm

2). 

2. Configure daemontools (see http://www.linuxquestions.org/questions/linux-server-73/svc-warning-unable-to-control-service-qmail-smtpd-file-does-not-exist-948161/)


1) Create the file /etc/init/svscan_test1.conf. If you have several MHA clusters, create one file per cluster:

start on runlevel [345]

respawn

exec masterha_manager --global_conf=/etc/mha/masterha_default.cnf --conf=/etc/mha/masterha_test1/conf/app.conf --wait_on_monitor_error=60 --wait_on_failover_error=60 >> /var/log/masterha/test1/app.log 2>&1

2) Run the following Linux commands:

# initctl reload-configuration

# initctl start svscan_test1

If you skip the steps above, you will get errors like these:

svc: warning: unable to control /service/masterha_test1: file does not exist

svc: warning: unable to control /service/masterha_test1: supervise not running
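With several clusters, the per-cluster job files differ only in the cluster name, so they can be generated. The sketch below makes two assumptions of mine: the function name `upstart_job` is hypothetical, and the paths /etc/mha/masterha_NAME/conf/app.conf and /var/log/masterha/NAME/app.log are extrapolated from the test1 example above.

```shell
#!/bin/sh
# upstart_job NAME: print an Upstart job definition for the MHA cluster NAME.
# (Hypothetical helper; the path layout is assumed from the test1 example.)
upstart_job() {
    name=$1
    cat <<EOF
start on runlevel [345]

respawn

exec masterha_manager --global_conf=/etc/mha/masterha_default.cnf --conf=/etc/mha/masterha_${name}/conf/app.conf --wait_on_monitor_error=60 --wait_on_failover_error=60 >> /var/log/masterha/${name}/app.log 2>&1
EOF
}

# Print the job for cluster "test1"; redirect the output to
# /etc/init/svscan_test1.conf to install it.
upstart_job test1
```

After writing a new file, the initctl reload-configuration and initctl start steps still have to be run for that job.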


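V. Notes:

1. MHA Manager does not set up the virtual IP initially; it only moves it. So before starting MHA Manager, add the VIP on the master with the following command:

#/sbin/ifconfig em1:3306 10.0.0.100/24

2. If the master used to be a slave and its slave threads have been stopped, reset its slave configuration; otherwise the manager reports an error on startup. The reset command is: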

mysql> reset slave all;
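For note 1, it is easy to forget whether the VIP is already up on the master. A small check along the following lines can help; `has_vip` is a hypothetical helper name of mine, and it only inspects text on standard input, so it works with the output of either ifconfig or ip addr.

```shell
#!/bin/sh
# has_vip ADDRESS: read an interface listing on stdin (for example the
# output of /sbin/ifconfig or `ip -4 addr show`) and succeed if ADDRESS
# appears in it as a complete IPv4 address. Pass the address without the
# /24 prefix length. (Hypothetical helper, not part of MHA.)
has_vip() {
    # Escape the dots so they match literally.
    pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
    # Require a non-address character (or a line boundary) on both sides,
    # so that 10.0.0.10 does not match inside 10.0.0.100.
    grep -Eq "(^|[^0-9.])${pattern}([^0-9]|\$)"
}

# Usage on the master (not executed here):
#   /sbin/ifconfig | has_vip 10.0.0.100 || echo "VIP is missing, add it first"
```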


VI. Log:

Finally, here is the log of a normal manager startup:

Tue Mar 25 18:16:38 2014 - [info] MHA::MasterMonitor version 0.55.

Tue Mar 25 18:16:38 2014 - [info] Dead Servers:

Tue Mar 25 18:16:38 2014 - [info] Alive Servers:

Tue Mar 25 18:16:38 2014 - [info]   db3dg.example.com(10.0.0.177:3306)

Tue Mar 25 18:16:38 2014 - [info]   db4dg.example.com(10.0.0.178:3306)

Tue Mar 25 18:16:38 2014 - [info]   db5dg.example.com(10.0.0.179:3306)

Tue Mar 25 18:16:38 2014 - [info] Alive Slaves:

Tue Mar 25 18:16:38 2014 - [info]   db4dg.example.com(10.0.0.178:3306)  Version=5.5.31-log (oldest major version between slaves) log-bin:enabled

Tue Mar 25 18:16:38 2014 - [info]     Replicating from 10.0.0.177(10.0.0.177:3306)

Tue Mar 25 18:16:38 2014 - [info]     Primary candidate for the new Master (candidate_master is set)

Tue Mar 25 18:16:38 2014 - [info]   db5dg.example.com(10.0.0.179:3306)  Version=5.5.31-log (oldest major version between slaves) log-bin:enabled

Tue Mar 25 18:16:38 2014 - [info]     Replicating from 10.0.0.177(10.0.0.177:3306)

Tue Mar 25 18:16:38 2014 - [info]     Primary candidate for the new Master (candidate_master is set)

Tue Mar 25 18:16:38 2014 - [info] Current Alive Master: db3dg.example.com(10.0.0.177:3306)

Tue Mar 25 18:16:38 2014 - [info] Checking slave configurations..

Tue Mar 25 18:16:38 2014 - [info]  read_only=1 is not set on slave db4dg.example.com(10.0.0.178:3306).

Tue Mar 25 18:16:38 2014 - [warning]  relay_log_purge=0 is not set on slave db4dg.example.com(10.0.0.178:3306).

Tue Mar 25 18:16:38 2014 - [info]  read_only=1 is not set on slave db5dg.example.com(10.0.0.179:3306).

Tue Mar 25 18:16:38 2014 - [warning]  relay_log_purge=0 is not set on slave db5dg.example.com(10.0.0.179:3306).

Tue Mar 25 18:16:38 2014 - [info] Checking replication filtering settings..

Tue Mar 25 18:16:38 2014 - [info]  binlog_do_db= , binlog_ignore_db=

Tue Mar 25 18:16:38 2014 - [info]  Replication filtering check ok.

Tue Mar 25 18:16:38 2014 - [info] Starting SSH connection tests..

Tue Mar 25 18:16:40 2014 - [info] All SSH connection tests passed successfully.

Tue Mar 25 18:16:40 2014 - [info] Checking MHA Node version..

Tue Mar 25 18:16:40 2014 - [info]  Version check ok.

Tue Mar 25 18:16:40 2014 - [info] Checking SSH publickey authentication settings on the current master..

Tue Mar 25 18:16:40 2014 - [info] HealthCheck: SSH to db3dg.example.com is reachable.

Tue Mar 25 18:16:41 2014 - [info] Master MHA Node version is 0.54.

Tue Mar 25 18:16:41 2014 - [info] Checking recovery script configurations on the current master..

Tue Mar 25 18:16:41 2014 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/installed/mysql_offline_multi/data --output_file=/data/installed/mysql_offline_multi/masterha/save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000159

Tue Mar 25 18:16:41 2014 - [info]   Connecting to root@db3dg.example.com(db3dg.example.com)..

Warning: Permanently added 'db3dg.example.com' (RSA) to the list of known hosts.

 Creating /data/installed/mysql_offline_multi/masterha if not exists..    ok.

 Checking output directory is accessible or not..

  ok.

 Binlog found at /data/installed/mysql_offline_multi/data, up to mysql-bin.000159

Tue Mar 25 18:16:41 2014 - [info] Master setting check done.

Tue Mar 25 18:16:41 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..

Tue Mar 25 18:16:41 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=db4dg.example.com --slave_ip=10.0.0.178 --slave_port=3306 --workdir=/data/installed/mysql_offline_multi/masterha --target_version=5.5.31-log --manager_version=0.55 --relay_log_info=/data/installed/mysql_offline_multi/data/relay-log.info  --relay_dir=/data/installed/mysql_offline_multi/data/  --slave_pass=xxx

Tue Mar 25 18:16:41 2014 - [info]   Connecting to root@10.0.0.178(db4dg.example.com:22)..

 Checking slave recovery environment settings..

   Opening /data/installed/mysql_offline_multi/data/relay-log.info ... ok.

   Relay log found at /data/installed/mysql_offline_multi/data, up to mysql-relay-bin.000447

   Temporary relay log file is /data/installed/mysql_offline_multi/data/mysql-relay-bin.000447

   Testing mysql connection and privileges.. done.

   Testing mysqlbinlog output.. done.

   Cleaning up test file(s).. done.

Tue Mar 25 18:16:41 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=db5dg.example.com --slave_ip=10.0.0.179 --slave_port=3306 --workdir=/data/installed/mysql_offline_multi/masterha --target_version=5.5.31-log --manager_version=0.55 --relay_log_info=/data/installed/mysql_offline_multi/data/relay-log.info  --relay_dir=/data/installed/mysql_offline_multi/data/  --slave_pass=xxx

Tue Mar 25 18:16:41 2014 - [info]   Connecting to root@10.0.0.179(db5dg.example.com:22)..

 Checking slave recovery environment settings..

   Opening /data/installed/mysql_offline_multi/data/relay-log.info ... ok.

   Relay log found at /data/installed/mysql_offline_multi/data, up to mysql-relay-bin.000447

   Temporary relay log file is /data/installed/mysql_offline_multi/data/mysql-relay-bin.000447

   Testing mysql connection and privileges.. done.

   Testing mysqlbinlog output.. done.

   Cleaning up test file(s).. done.

Tue Mar 25 18:16:42 2014 - [info] Slaves settings check done.

Tue Mar 25 18:16:42 2014 - [info]

db3dg.example.com (current master)

+--db4dg.example.com

+--db5dg.example.com


Tue Mar 25 18:16:42 2014 - [info] Checking master_ip_failover_script status:

Tue Mar 25 18:16:42 2014 - [info]   /etc/mha/masterha_offline_multi/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=db3dg.example.com --orig_master_ip=10.0.0.177 --orig_master_port=3306



IN SCRIPT TEST====sudo /sbin/ifconfig em1:3306 down==sudo /sbin/ifconfig em1:3306 10.0.0.210/24===


Checking the Status of the script.. OK

Tue Mar 25 18:16:42 2014 - [info]  OK.

Tue Mar 25 18:16:42 2014 - [warning] shutdown_script is not defined.

Tue Mar 25 18:16:42 2014 - [info] Set master ping interval 3 seconds.

Tue Mar 25 18:16:42 2014 - [info] Set secondary check script: masterha_secondary_check -s db3dg.example.com -s db4dg.example.com -s db5dg.example.com

Tue Mar 25 18:16:42 2014 - [info] Starting ping health check on db3dg.example.com(10.0.0.177:3306)..

Tue Mar 25 18:16:42 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
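A startup log like the one above is long, and the lines worth a second look are the [warning] and [error] ones. As a minimal sketch (the function name `mha_warnings` is my own, and the pattern assumes the "date - [level]" layout seen in this log), they can be filtered out like this:

```shell
#!/bin/sh
# mha_warnings: read an MHA manager log on stdin and print only the lines
# logged at [warning] or [error] level. Prints nothing, and still succeeds,
# when there are none. (Hypothetical helper, not part of MHA.)
mha_warnings() {
    grep -E ' - \[(warning|error)\]' || true
}

# Usage (not executed here):
#   mha_warnings < /var/log/masterha/test1/app.log
```

On the log above this would show the relay_log_purge warnings for both slaves and the shutdown_script warning.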

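VII. Other configuration:

The notes above record the places where I ran into problems; together they add up to a working MHA setup.

For the rest of the MHA configuration, see my project and the official MHA wiki.

My project is here:

https://code.google.com/p/mha-config

If you have questions or suggestions, please leave a comment.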

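VIII. Other details:

1. Make sure automatic relay log purging is turned off on the master and on every slave, both in the configuration file and in the output of show global variables:
relay_log_purge=0
To purge relay logs on a schedule instead, add the following line to /etc/crontab, adjusting it for the port of each MySQL instance:
0 5 * * * root echo "-----33-06----" >>/var/log/masterha/purge_relay_logs.log && /usr/local/bin/purge_relay_logs --host=127.0.0.1 --port=3306 --user=mha --password='abcd' --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1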
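On a server with several instances, one such crontab line is needed per port. The sketch below generates them; the function name `purge_cron_line` and the second port 3307 are assumptions for illustration, while the user, the placeholder password and the log path are the ones from the line above.

```shell
#!/bin/sh
# purge_cron_line PORT: print an /etc/crontab entry that runs
# purge_relay_logs for the MySQL instance listening on PORT.
# (Hypothetical helper; 'abcd' is the placeholder password from the article.)
purge_cron_line() {
    port=$1
    log=/var/log/masterha/purge_relay_logs.log
    printf '0 5 * * * root echo "-----%s----" >> %s && /usr/local/bin/purge_relay_logs --host=127.0.0.1 --port=%s --user=mha --password=%s --disable_relay_log_purge >> %s 2>&1\n' \
        "$port" "$log" "$port" "'abcd'" "$log"
}

# One line per instance; 3307 is only an example of a second port.
for port in 3306 3307; do
    purge_cron_line "$port"
done
```

Review the generated lines before appending them to /etc/crontab.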

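2. The MySQL user password in the configuration file masterha_default.cnf must not contain special characters; otherwise MHA Manager cannot log in to MySQL to perform the failover. Worst of all, the VIP cannot move either, even though the master/slave operations themselves still go through. I suspect the scripts/master_ip_failover script is simply not written robustly enough.

If the VIP cannot move, /var/log/masterha/test1/app.log shows errors like the following: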

Tue Apr  1 15:18:50 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='db4dg.example.com or 10.47.7.178', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000189', MASTER_LOG_POS=547798028, MASTER_USER='repl', MASTER_PASSWORD='xxx';

Tue Apr  1 15:18:50 2014 - [info] Executing master IP activate script:

Tue Apr  1 15:18:50 2014 - [info]   /etc/mha/masterha_offline_multi/scripts/master_ip_failover --command=start --ssh_user=root --orig_master_host=db3dg.example.com --orig_master_ip=10.47.7.177 --orig_master_port=3306 --new_master_host=db4dg.example.com --new_master_ip=10.47.7.178 --new_master_port=3306 --new_master_user='mha' --new_master_password='Adyav3984\#'

DBI connect(';host=10.0.0.100;port=33306;mysql_connect_timeout=4','mha',...) failed: Access denied for user 'mha'@'10.47.7.217' (using password: YES) at /usr/local/share/perl5/MHA/DBHelper.pm line 181

at /etc/mha/masterha_offline_multi/scripts/master_ip_failover line 69



IN SCRIPT TEST====sudo /sbin/ifconfig em1:3306 down==sudo /sbin/ifconfig em1:3306 10.0.0.100/24===


Tue Apr  1 15:18:50 2014 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln1262]  Failed to activate master IP address for db4dg.example.com with return code 10:0
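Since the failure only shows up at failover time, it is worth checking the password beforehand. These notes do not establish exactly which characters cause trouble (the log above shows a password ending in #), so the rule in this sketch, letters and digits only, is a deliberately conservative assumption of mine, and `password_is_plain` is a hypothetical name.

```shell
#!/bin/sh
# password_is_plain PASSWORD: succeed only if PASSWORD is non-empty and
# consists solely of ASCII letters and digits.
# (Hypothetical helper; the "letters and digits only" rule is a conservative
# assumption, not a documented MHA requirement.)
password_is_plain() {
    case "$1" in
        ''|*[!A-Za-z0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage (not executed here):
#   password_is_plain "$mha_password" || echo "pick a password without special characters"
```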

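3. Online switch command, for the following requirements:

 1. The original master stays alive and keeps running as a slave.

 2. Non-interactive.

 3. The port is 9306 rather than the default 3306.

Switch command: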

masterha_master_switch --orig_master_is_new_slave --master_state=alive  --global_conf=/etc/mha/masterha_default.cnf --conf=/etc/mha/masterha_test/conf/app.conf --interactive=0 --new_master_host=test1 --new_master_port=9306

Note: if MySQL does not listen on the default port, the original master cannot CHANGE MASTER to the new master. The errors look like this:

Mon Apr 14 16:33:48 2014 - [info]  Executed CHANGE MASTER.

Mon Apr 14 16:34:02 2014 - [error][/usr/local/share/perl5/MHA/Server.pm, ln744] Slave could not be started on test2(192.168.3.107:9306)! Check slave status.

Mon Apr 14 16:34:02 2014 - [error][/usr/local/share/perl5/MHA/Server.pm, ln817] Starting slave IO/SQL thread on test2(192.168.3.107:9306) failed!

Mon Apr 14 16:34:02 2014 - [error][/usr/local/share/perl5/MHA/MasterRotate.pm, ln561]  Failed!

Mon Apr 14 16:34:02 2014 - [error][/usr/local/share/perl5/MHA/MasterRotate.pm, ln590] Switching master to test1(192.168.3.12:9306) done, but switching slaves partially failed.


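IX. Things to take care of after an automatic failover:

  1. Start the original master as a slave. The CHANGE MASTER details (such as the binlog file and position) can be found in the MHA log.

  2. Adjust monitoring, for example the slave monitoring in Zabbix.

  3. Other related tasks.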
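For step 1, the manager writes the statement into its log on a line containing "Statement should be:". A minimal sketch for pulling out the most recent one follows; the function name `last_change_master` is hypothetical, and the pattern relies on the log wording shown earlier in this article.

```shell
#!/bin/sh
# last_change_master: read an MHA manager log on stdin and print the most
# recent CHANGE MASTER statement suggested by the manager.
# (Hypothetical helper; relies on the "Statement should be:" log wording.)
last_change_master() {
    sed -n 's/.*Statement should be: \(CHANGE MASTER TO .*;\).*/\1/p' | tail -n 1
}

# Usage (not executed here):
#   last_change_master < /var/log/masterha/test1/app.log
```

The extracted statement is not ready to run as is: the log prints MASTER_HOST as "host or ip" and masks the password as xxx, so both have to be edited by hand before executing it on the old master.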


References:

https://code.google.com/p/mysql-master-ha/wiki/Installation

http://cr.yp.to/daemontools.html