MHA Installation and Configuration

Latest recommended article published on 2024-07-02 12:28:55

donghaixiaolongwang


Reference: http://www.cnblogs.com/gomysql/p/3675429.html

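1 Deploying MHA

1.1 Planning

Role          IP address  Hostname  server_id  Type
monitor       10.1.9.61   server01  -          monitoring
master        10.1.9.62   server02  1          writes
master_slave  10.1.9.63   server03  2          reads
slave         10.1.9.64   server04  3          reads

The master serves writes. The candidate master (actually a slave, hostname server03) serves reads, and the slave also serves reads. If the master goes down, the candidate master is promoted to be the new master and the slave is repointed to it.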

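1.2 Install the EPEL yum repository, then use yum to install the Perl modules that MHA Node depends on

rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm ## install the EPEL yum repository

yum install perl-DBD-MySQL -y ## run on every machine in the plan above

1.3 Install MHA Node on all nodes. Note that it must be installed on the monitor as well.

Download mha4mysql-node-0.53.tar.gz ## already downloaded into the current directory here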

tar xf mha4mysql-node-0.53.tar.gz

cd mha4mysql-node-0.53

perl Makefile.PL

Can't locate CPAN.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/AutoInstall.pm line 279.

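If the error above appears, install CPAN first:

CPAN-1.9304.tar.gz ## download CPAN; it is already in the current directory here

tar xf CPAN-1.9304.tar.gz

cd CPAN-1.9304

perl Makefile.PL

make

make install

Then return to the mha4mysql-node-0.53 directory, run perl Makefile.PL again, and build MHA Node:

make && make install

After installation, the following scripts are generated under /usr/local/bin:

-r-xr-xr-x 1 root root 15498 Apr 20 10:05 apply_diff_relay_logs
-r-xr-xr-x 1 root root  4807 Apr 20 10:05 filter_mysqlbinlog
-r-xr-xr-x 1 root root  7401 Apr 20 10:05 purge_relay_logs
-r-xr-xr-x 1 root root  7263 Apr 20 10:05 save_binary_logs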

1.4 Install MHA Manager (on the monitor only)

rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm ## EPEL yum repository

yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y ## first install the Perl modules that MHA Manager depends on (via yum here)

mha4mysql-manager-0.53.tar.gz ## download mha4mysql-manager-0.53.tar.gz

tar xf mha4mysql-manager-0.53.tar.gz

cd mha4mysql-manager-0.53

perl Makefile.PL

make && make install

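After installation, the following scripts are present under /usr/local/bin (their purposes are covered in the reference article and are not repeated here):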

apply_diff_relay_logs masterha_check_ssh masterha_manager masterha_secondary_check purge_relay_logs unrar

filter_mysqlbinlog masterha_check_status masterha_master_monitor masterha_stop rar

masterha_check_repl masterha_conf_host masterha_master_switch nload save_binary_logs

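Copy the sample scripts to /usr/local/bin. They ship with the unpacked package. This step is optional, because the scripts are incomplete and have to be adapted; the developers left that part to the user. If you enable any parameter in the configuration that points to one of these scripts without having modified the script, MHA throws an error, which cost me a lot of time.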

cd mha4mysql-manager-0.53/samples/scripts/

cp ./* /usr/local/bin/

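1.5 Configure passwordless SSH login. Note that every pair of hosts must be able to SSH to each other without a password, in both directions.

vim /etc/hosts ## used instead of DNS resolution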

10.1.9.61 server01

10.1.9.62 server02

10.1.9.63 server03

10.1.9.64 server04

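ssh-keygen -t rsa ## generate a private/public key pair on every machine

ssh-copy-id -i ~/.ssh/id_rsa.pub serverNN ## copy the public key to each of the other servers (serverNN stands for server01 to server04)

An example follows. Note that every machine has to copy its key to all the other machines; the list below is the one to run on 10.1.9.64.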

for i in 10.1.9.61 10.1.9.62 10.1.9.63 ; do ssh-copy-id -i ~/.ssh/id_rsa.pub $i; done
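Because MHA needs every host to reach every other host, it can help to enumerate all ordered host pairs and test each one. The following is only a sketch under assumptions: the function names ssh_pairs and check_ssh_mesh are made up for illustration, and the hostnames are taken from the plan in 1.1. masterha_check_ssh (section 1.11) remains the authoritative check for the database nodes.

```shell
#!/bin/sh
# Hostnames from the plan in section 1.1 (assumed resolvable via /etc/hosts).
HOSTS="server01 server02 server03 server04"

# Print one "source target" line for every ordered pair of distinct hosts.
ssh_pairs() {
    for src in $HOSTS; do
        for dst in $HOSTS; do
            [ "$src" != "$dst" ] && echo "$src $dst"
        done
    done
    return 0
}

# Hypothetical helper: log in to each source host and try to reach each target.
# BatchMode=yes makes ssh fail instead of prompting for a password;
# -n keeps the outer ssh from swallowing the pair list on stdin.
check_ssh_mesh() {
    ssh_pairs | while read -r src dst; do
        if ssh -n -o BatchMode=yes "$src" ssh -o BatchMode=yes "$dst" true; then
            echo "OK   $src -> $dst"
        else
            echo "FAIL $src -> $dst"
        fi
    done
}
```

With four hosts, ssh_pairs lists 12 pairs; any FAIL line from check_ssh_mesh points at a missing ssh-copy-id step.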

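1.6 Set up the master/slave environment:

Prepare the replication configuration files yourself (they differ from company to company). Each server-id must be different, and binary logging must be enabled.

Note: replication filter settings such as binlog-do-db and replicate-ignore-db must be identical on all servers. MHA checks the filtering rules at startup; if they differ, it will not start monitoring or failover.

Take a backup on the master (server02):

mysqldump -h127.0.0.1 -uroot -p --master-data=2 --single-transaction -R --triggers -A > ./all.sql ## full backup of the master

Here --master-data=2 records the master's binlog file name and position at backup time (as a comment in the dump), --single-transaction takes a consistent snapshot, -R dumps stored procedures and functions, --triggers dumps triggers, and -A dumps all databases. See mysqldump --help for more.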

scp all.sql server03:/tmp

scp all.sql server04:/tmp

mysql -h127.0.0.1 -uroot -p < /tmp/all.sql ## restore on server03 and server04

The replication account must be created in the database on server02, server03 and server04.

grant replication slave on *.* to 'repl'@'10.1.9.%' identified by '123456';

flush privileges;

Look up the master's binary log file name and position, which the slaves need in order to replicate from the master:

head -n 30 all.sql |grep 'CHANGE MASTER TO'
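The grep above prints a commented line of the form `-- CHANGE MASTER TO MASTER_LOG_FILE='...', MASTER_LOG_POS=...;`. As a rough sketch, the two values can be pulled out and combined with the host and account into a complete statement. The function names dump_coordinates and build_change_master are hypothetical, and the host, user and password are whatever the caller passes in.

```shell
#!/bin/sh
# Print "file position" parsed from the commented CHANGE MASTER TO line
# that mysqldump --master-data=2 writes near the top of the dump.
dump_coordinates() {
    head -n 50 "$1" \
        | sed -n "s/^-- CHANGE MASTER TO MASTER_LOG_FILE='\([^']*\)', MASTER_LOG_POS=\([0-9]*\);.*/\1 \2/p" \
        | head -n 1
}

# Build the full statement for a slave.
# $1 = dump file, $2 = master host, $3 = replication user, $4 = password
build_change_master() {
    coords=$(dump_coordinates "$1")
    if [ -z "$coords" ]; then
        echo "no CHANGE MASTER TO line found in $1" >&2
        return 1
    fi
    file=${coords% *}   # text before the space
    pos=${coords#* }    # text after the space
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$2" "$3" "$4" "$file" "$pos"
}
```

For example, `build_change_master /tmp/all.sql 10.1.9.62 repl 123456` prints a statement that can be pasted into the mysql client on server03 and server04.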

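Log in to the MySQL service on server03 and server04 and run the following, using the master's address (10.1.9.62 in this plan) and the log file name and position found in the dump:

CHANGE MASTER TO MASTER_HOST='10.1.9.62',MASTER_USER='repl', MASTER_PASSWORD='123456',MASTER_LOG_FILE='mysql-bin.000010',MASTER_LOG_POS=112;

start slave;

show slave status\G ## check the slave's replication status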

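1.7 Set read_only on both slave servers (the slaves serve reads; this is not written to the configuration file because a slave may be promoted to master at any time)

set global read_only=1; ## run in the MySQL service on server03 and server04. Note that the root user can still write.

1.8 Create the monitoring user

Log in to server02, server03 and server04 and run the following. All of them need this account, otherwise the next startup will have problems.

grant all privileges on *.* to root@'10.1.9.%' identified by '123456';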

flush privileges;

1.9 Configure MHA

Create the MHA working directory and the configuration file (a sample configuration file is included in the unpacked package).

mkdir -p /etc/masterha

cp mha4mysql-manager-0.53/samples/conf/app1.cnf /etc/masterha/

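Edit app1.cnf. The resulting content is shown below (note that the comments must be removed from the real configuration file; they are here only for explanation):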

[server default]

manager_log=/var/log/masterha/app1/manager.log # manager log file

manager_workdir=/var/log/masterha/app1 # manager working directory

master_binlog_dir=/var/lib/mysql # where the master stores its binlogs, so that MHA can find them; here it is the MySQL data directory

master_ip_failover_script=/usr/local/bin/master_ip_failover # switch script used during automatic failover

master_ip_online_change_script=/usr/local/bin/master_ip_online_change # switch script used during a manual (online) switch

password=123456 # password of the MySQL root user, i.e. the monitoring user created in section 1.8

ping_interval=5 # interval between the ping packets used to monitor the master; the default is 3 seconds, and failover is triggered automatically after three attempts with no response

remote_workdir=/tmp # where the remote MySQL hosts save binlogs when a switch happens

repl_password=123456 # password of the replication user

repl_user=repl # replication user name

report_script=/usr/local/bin/send_report # script that sends an alert after a switch

secondary_check_script=/usr/local/bin/masterha_secondary_check -s server03 -s server02

shutdown_script=""

ssh_user=root # SSH login user

user=root # monitoring user (root)

[server1]

hostname=server02

port=3306

[server2]

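candidate_master=1 # marks this host as the candidate master; with this set, this slave is promoted to master after a failover even if it is not the slave with the most recent events in the cluster

check_repl_delay=0 # by default MHA will not choose a slave as the new master if it is more than 100 MB of relay logs behind the master, because recovering that slave would take a long time; with check_repl_delay=0, MHA ignores replication delay when choosing the new master. This is very useful for a host with candidate_master=1, because it guarantees that the candidate becomes the new master during the switch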

hostname=server03

[server3]

hostname=server04
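Before running the MHA checks, it can be handy to list which hosts the configuration file actually declares and which one is marked as candidate master. The snippet below is a minimal sketch using awk; list_mha_servers is a hypothetical helper name, and it only understands the simple key=value layout shown above.

```shell
#!/bin/sh
# Print "section hostname candidate_flag" for every [serverN] block
# of an MHA configuration file given as $1.
list_mha_servers() {
    awk '
        function emit() {
            if (section ~ /^server[0-9]+$/ && host != "") print section, host, cand
        }
        /^\[/ {
            emit()                              # finish the previous block
            section = $0
            gsub(/^\[|\]$/, "", section)        # drop the surrounding brackets
            host = ""; cand = 0
            next
        }
        { sub(/[ \t]*#.*/, "") }                # strip explanatory comments
        /^hostname[ \t]*=/         { split($0, kv, "="); host = kv[2]; gsub(/[ \t]/, "", host) }
        /^candidate_master[ \t]*=/ { split($0, kv, "="); cand = kv[2] + 0 }
        END { emit() }
    ' "$1"
}
```

Running `list_mha_servers /etc/masterha/app1.cnf` against the file above should list server02, server03 and server04, with a 1 in the last column only for server03.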

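1.10 Configure how relay logs are purged (on every slave node):

mysql -e 'set global relay_log_purge=0'

Note:

During a failover, recovery of the slaves depends on relay log information, so automatic relay log purging must be set to OFF and relay logs purged manually. By default, relay logs on a slave are deleted automatically once the SQL thread has finished executing them. In an MHA environment, however, those relay logs may be needed to recover other slaves, so automatic deletion has to be disabled. Periodic purging has to take replication delay into account: on an ext3 file system, deleting a large file takes a noticeable amount of time and can cause serious replication delay. To avoid this, a hard link to the relay log is created temporarily first, because on Linux deleting a large file through a hard link is fast. (The same hard-link approach is commonly used when dropping large tables in MySQL.)

Set up a script to purge relay logs periodically (on both slave servers):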

[root@192.168.0.60 ~]#cat purge_relay_log.sh

#!/bin/bash

host=10.1.9.64 ## target host

user=root ## database account

passwd=123456 ## database password

port=3306

log_dir='/tmp/log' ## where this script writes its log

work_dir='/tmp' ## temporary working directory

purge='/usr/local/bin/purge_relay_logs'

if [ ! -d $log_dir ]

then

mkdir $log_dir -p

fi

$purge --host=$host --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1

Add it to crontab so that it runs periodically:

[root@192.168.0.60 ~]# crontab -l

0 4 * * * /bin/bash /root/purge_relay_log.sh

The purge_relay_logs script removes relay logs without blocking the SQL thread. Running it by hand shows what happens:

[root@192.168.0.60 ~]# purge_relay_logs --user=root --password=123456 --port=3306 --disable_relay_log_purge --workdir=/tmp

2014-04-20 15:47:24: purge_relay_logs script started.

Found relay_log.info: /data/mysql/relay-log.info

Removing hard linked relay log files server03-relay-bin* under /data/.. done.

Current relay log file: /data/mysql/server03-relay-bin.000002

Archiving unused relay log files (up to /data/mysql/server03-relay-bin.000001) ...

Creating hard link for /data/mysql/server03-relay-bin.000001 under /data//server03-relay-bin.000001 .. ok.

Creating hard links for unused relay log files completed.

Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.

Removing hard linked relay log files server03-relay-bin* under /data/.. done.

2014-04-20 15:47:27: All relay log purging operations succeeded.

1.11 Check the SSH configuration

masterha_check_ssh --conf=/etc/masterha/app1.cnf

Expected result: All SSH connection tests passed successfully.

1.12 Check the replication environment

First comment out the master_ip_failover_script=/usr/local/bin/master_ip_failover line in /etc/masterha/app1.cnf (prefix it with #).

masterha_check_repl --conf=/etc/masterha/app1.cnf

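1.13 Check the status of MHA Manager

masterha_check_status --conf=/etc/masterha/app1.cnf

Note: if monitoring is running normally, the output shows "PING_OK"; otherwise it shows "NOT_RUNNING", which means MHA monitoring has not been started.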
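For use in a wrapper or monitoring script, the two keywords can be mapped to a simple state. This is only an illustrative sketch: classify_mha_status is a made-up name, and it relies on nothing more than the "PING_OK" and "NOT_RUNNING" strings mentioned above.

```shell
#!/bin/sh
# Classify the text printed by masterha_check_status (passed as $1).
classify_mha_status() {
    case "$1" in
        *PING_OK*)     echo running ;;
        *NOT_RUNNING*) echo stopped ;;
        *)             echo unknown ;;
    esac
}

# Usage on the monitor host (requires MHA Manager to be installed):
#   classify_mha_status "$(masterha_check_status --conf=/etc/masterha/app1.cnf 2>&1)"
```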

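1.14 Start MHA Manager monitoring

nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null >> /var/log/masterha/app1/manager.log 2>&1 &

Startup parameters:

--remove_dead_master_conf After a failover, the old master's entry is removed from the configuration file.

--manager_log Location of the log file.

--ignore_last_failover By default, if MHA detects consecutive failures and less than 8 hours have passed since the previous failover, it will not fail over again; this limit avoids a ping-pong effect. The parameter tells MHA to ignore the file left behind by the previous failover. By default, after a failover MHA creates app1.failover.complete in the manager working directory (manager_workdir, set above to /var/log/masterha/app1). If that file is still present at the next failover attempt, the failover is refused unless the file was deleted manually after the first one. For convenience, --ignore_last_failover is set here.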
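If you prefer not to pass --ignore_last_failover, a small check can tell you whether a recent marker file is still in place. The sketch below assumes the marker is named after the configuration file (app1.failover.complete) and lives in manager_workdir, as described above; recent_failover_marker is a hypothetical name, and find -mmin is a GNU/BSD extension rather than strict POSIX.

```shell
#!/bin/sh
# Return 0 if a failover marker younger than 8 hours exists, 1 otherwise.
# $1 = manager_workdir, $2 = application name (config file name without .cnf)
recent_failover_marker() {
    marker="$1/$2.failover.complete"
    [ -f "$marker" ] || return 1
    # -mmin -480: modified less than 480 minutes (8 hours) ago
    [ -n "$(find "$marker" -mmin -480 2>/dev/null)" ]
}

# Usage:
#   if recent_failover_marker /var/log/masterha/app1 app1; then
#       echo "remove the marker or start the manager with --ignore_last_failover"
#   fi
```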

Check whether monitoring has started:

masterha_check_status --conf=/etc/masterha/app1.cnf

1.15 View the startup log

tail -n20 /var/log/masterha/app1/manager.log

1.16 Stop MHA Manager monitoring

masterha_stop --conf=/etc/masterha/app1.cnf

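1.17 Configure the VIP

The VIP is managed by a script here, by editing /usr/local/bin/master_ip_failover. Other languages such as PHP would also work; a PHP failover script is not covered here. When the VIP is managed by script, it must first be bound manually on the master. The modified script follows the bind command.

On the master (server02):

ifconfig eth1:1 10.1.9.65/24

On server01 (the monitor):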

vim /usr/local/bin/master_ip_failover

#!/usr/bin/env perl

use strict;

use warnings FATAL => 'all';

use Getopt::Long;

my (

$command, $ssh_user, $orig_master_host, $orig_master_ip,

$orig_master_port, $new_master_host, $new_master_ip, $new_master_port

);

my $vip = '10.1.9.65/24';

my $key = '1';

my $ssh_start_vip = "/sbin/ifconfig eth1:$key $vip";

my $ssh_stop_vip = "/sbin/ifconfig eth1:$key down";

GetOptions(

'command=s' => \$command,

'ssh_user=s' => \$ssh_user,

'orig_master_host=s' => \$orig_master_host,

'orig_master_ip=s' => \$orig_master_ip,

'orig_master_port=i' => \$orig_master_port,

'new_master_host=s' => \$new_master_host,

'new_master_ip=s' => \$new_master_ip,

'new_master_port=i' => \$new_master_port,

);

exit &main();

sub main {

print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

if ( $command eq "stop" || $command eq "stopssh" ) {

my $exit_code = 1;

eval {

print "Disabling the VIP on old master: $orig_master_host \n";

&stop_vip();

$exit_code = 0;

};

if ($@) {

warn "Got Error: $@\n";

exit $exit_code;

}

exit $exit_code;

}

elsif ( $command eq "start" ) {

my $exit_code = 10;

eval {

print "Enabling the VIP - $vip on the new master - $new_master_host \n";

&start_vip();

$exit_code = 0;

};

if ($@) {

warn $@;

exit $exit_code;

}

exit $exit_code;

}

elsif ( $command eq "status" ) {

print "Checking the Status of the script.. OK \n";

exit 0;

}

else {

&usage();

exit 1;

}

}

sub start_vip() {

`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;

}

sub stop_vip() {

return 0 unless ($ssh_user);

`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;

}

sub usage {

print

"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";

}

Check the configuration again, this time with the master_ip_failover_script=/usr/local/bin/master_ip_failover line enabled (uncommented) in the configuration file.

masterha_check_repl --conf=/etc/masterha/app1.cnf

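1.18 Start monitoring again:

nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null >> /var/log/masterha/app1/manager.log 2>&1 &

1.19 Shut down the master and check whether a failover takes place.

1.20 Repair the failed master

After an automatic failover the original master is usually out of service. Once the host has been repaired, and provided its data is intact, you may want to bring it back as a slave of the new master. The MHA log written at the time of the failover contains what is needed for that. The following command extracts the relevant entry: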

[root@192.168.0.20 app1]#grep -i "All other slaves should start" manager.log

Mon Apr 21 22:28:33 2014 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.60', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000022', MASTER_LOG_POS=506716, MASTER_USER='repl', MASTER_PASSWORD='xxx';
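The log masks the replication password as 'xxx', so the statement has to be completed before it can be run on the repaired host. As a rough sketch, the last such entry can be extracted and the password filled in. change_master_from_log is a hypothetical helper name, and the substitution assumes the password contains no characters that are special to sed (such as / or &).

```shell
#!/bin/sh
# Print the last CHANGE MASTER TO statement recorded in an MHA manager log,
# with the masked password replaced.
# $1 = path to manager.log, $2 = real replication password
change_master_from_log() {
    grep -i "All other slaves should start" "$1" \
        | tail -n 1 \
        | sed -e 's/.*Statement should be: //' \
              -e "s/MASTER_PASSWORD='xxx'/MASTER_PASSWORD='$2'/"
}

# Usage:
#   change_master_from_log /var/log/masterha/app1/manager.log 123456
```

The printed statement is then run in the mysql client on the repaired host, followed by start slave.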

1.21 Email alert script: vim /usr/local/bin/send_report

#!/usr/bin/perl

# This program is free software; you can redistribute it and/or modify

# it under the terms of the GNU General Public License as published by

# the Free Software Foundation; either version 2 of the License, or

# (at your option) any later version.

# This program is distributed in the hope that it will be useful,

# but WITHOUT ANY WARRANTY; without even the implied warranty of

# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License

# along with this program; if not, write to the Free Software

# Foundation, Inc.,

# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;

use warnings FATAL => 'all';

use Mail::Sender;

use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded

my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );

my $smtp='smtp.qq.com';

my $mail_from='1250134974@qq.com';

my $mail_user='1250134974';

my $mail_pass='**********';

my $mail_to=['18210967234@163.com']; ## separate multiple recipients with commas

GetOptions(

'orig_master_host=s' => \$dead_master_host,

'new_master_host=s' => \$new_master_host,

'new_slave_hosts=s' => \$new_slave_hosts,

'subject=s' => \$subject,

'body=s' => \$body,

);

mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

sub mailToContacts {

my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;

open my $DEBUG, "> /tmp/monitormail.log"

or die "Can't open the debug file:$!\n";

my $sender = new Mail::Sender {

ctype => 'text/plain; charset=utf-8',

encoding => 'utf-8',

smtp => $smtp,

from => $mail_from,

auth => 'LOGIN',

TLS_allowed => '0',

authid => $user,

authpwd => $passwd,

to => $mail_to,

subject => $subject,

debug => $DEBUG

};

$sender->MailMsg(

{ msg => $msg,

debug => $DEBUG

}

) or print $Mail::Sender::Error;

return 1;

}

# Do whatever you want here

exit 0;

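1.22 Online switch

In many situations the current master has to be moved to another server: a hardware fault on the master, a RAID controller that needs rebuilding, a move to a more powerful machine, and so on. Maintenance on the master degrades performance and causes downtime during which, at the very least, data cannot be written. In addition, blocking or killing running sessions can leave the old and new master with inconsistent data. MHA provides a fast switch with graceful write blocking. The switch takes only 0.5-2 s, during which writes are blocked. In many cases 0.5-2 s of blocked writes is acceptable, so switching the master does not require a scheduled maintenance window.

Outline of the MHA online switch:
1. Check the replication settings and identify the current master
2. Determine the new master
3. Block writes on the current master
4. Wait for all slaves to catch up with replication
5. Enable writes on the new master
6. Repoint the slaves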

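To keep the data fully consistent and finish the switch as quickly as possible, an MHA online switch only succeeds when all of the following conditions hold; otherwise it fails.

1. The IO thread is running on every slave

2. The SQL thread is running on every slave

3. Seconds_Behind_Master in the show slave status output of every slave is less than or equal to running_updates_limit seconds. If running_updates_limit is not specified for the switch, it defaults to 1 second.

4. On the master, show processlist shows no update that has been running for longer than running_updates_limit seconds.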
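Conditions 1 to 3 can be checked in advance from the text of show slave status\G on each slave. The following is a minimal sketch under assumptions: slave_ready_for_switch is a made-up helper, it takes the status text as an argument rather than querying MySQL itself, and it does not cover condition 4.

```shell
#!/bin/sh
# Return 0 if the slave looks ready for an online switch, 1 otherwise.
# $1 = text of SHOW SLAVE STATUS\G, $2 = running_updates_limit in seconds (default 1)
slave_ready_for_switch() {
    printf '%s\n' "$1" | awk -v limit="${2:-1}" '
        $1 == "Slave_IO_Running:"      { io  = $2 }
        $1 == "Slave_SQL_Running:"     { sql = $2 }
        $1 == "Seconds_Behind_Master:" { lag = $2 }
        END {
            # lag must be a number (it is NULL when replication is broken)
            ok = (io == "Yes" && sql == "Yes" && lag ~ /^[0-9]+$/ && lag + 0 <= limit + 0)
            exit (ok ? 0 : 1)
        }'
}

# Usage on a slave:
#   slave_ready_for_switch "$(mysql -uroot -p -e 'show slave status\G')" 1 && echo ready
```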

The online switch steps are as follows.

First, stop MHA monitoring:

[root@192.168.0.20 ~]# masterha_stop --conf=/etc/masterha/app1.cnf

Online switch command:

[root@192.168.0.20 ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=server03 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000

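Note: server03 is the hostname of the new master. An IP address cannot be used here, otherwise the switch fails. It is best to save the log of the switch; at the very least, record the statement printed during the switch: CHANGE MASTER TO MASTER_HOST='server03 or 192.168.99.162', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-b.000009', MASTER_LOG_POS=120, MASTER_USER='repl', MASTER_PASSWORD='xxx';

1.23 Manual failover (MHA Manager must not be running, and the current master must already be down; to simulate this, stop the MySQL service on the current master)

Manual failover applies when automatic MHA failover is not enabled for the service: when the master fails, an operator invokes MHA by hand to perform the failover.

Note: if MHA Manager finds no dead server, it reports an error and stops the failover: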

Mon Apr 21 21:23:33 2014 - [info] Dead Servers:

Mon Apr 21 21:23:33 2014 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln181] None of server is dead. Stop failover.

Mon Apr 21 21:23:33 2014 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_master_switch line 53

The manual failover command is:

[root@192.168.0.20 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=server03 --dead_master_port=3306 --new_master_host=server02 --new_master_port=3306 --ignore_last_failover

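Note that server03 and server02 are the corresponding hostnames. Using IP addresses makes the switch fail.

### Switching the VIP by script is now complete. The keepalived approach to switching the VIP follows.

Appendix: VIP failover with keepalived

(1) Download and install the software (on both masters; strictly speaking one is the master and the other is the candidate master, which is a slave until a switch happens):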

[root@192.168.0.50 ~]#wget http://www.keepalived.org/software/keepalived-1.2.12.tar.gz

tar xf keepalived-1.2.12.tar.gz

cd keepalived-1.2.12

./configure --prefix=/usr/local/keepalived

make && make install

cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/

cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/

mkdir /etc/keepalived

cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/

cp /usr/local/keepalived/sbin/keepalived /usr/sbin/

(2) Write the keepalived configuration file on the master. Remove the comments from the real file.

! Configuration File for keepalived

global_defs {

notification_email {

saltstack@163.com ## account that receives the emails

}

notification_email_from dba@dbserver.com ## sender

smtp_server 127.0.0.1 ## SMTP server

smtp_connect_timeout 30

router_id MySQL-HA ## must be the same on both keepalived services

}

vrrp_instance VI_1 {

state BACKUP ## working mode is backup

interface eth1 ## device on which the IP addresses will be configured

virtual_router_id 51

priority 150 ## priority; it only needs to differ between the two keepalived services

advert_int 1

nopreempt

authentication {

auth_type PASS ## authentication method between keepalived services: simple password

auth_pass 1111 ## authentication password

}

virtual_ipaddress { ## IP addresses that will be configured on the active keepalived node (the MHA master); one or more are allowed

192.168.99.231

192.168.99.232

192.168.99.233

}

}

Configuration on the candidate master:

! Configuration File for keepalived

global_defs {

notification_email {

saltstack@163.com

}

notification_email_from dba@dbserver.com

smtp_server 127.0.0.1

smtp_connect_timeout 30

router_id MySQL-HA

}

vrrp_instance VI_1 {

state BACKUP

interface eth1

virtual_router_id 51

priority 120

advert_int 1

nopreempt

authentication {

auth_type PASS

auth_pass 1111

}

virtual_ipaddress {

192.168.99.231

192.168.99.232

192.168.99.233

}

}

(3) Start the keepalived service on the master and check the log

/etc/init.d/keepalived start

Starting keepalived: [ OK ]

[root@192.168.0.50 ~]#tail -f /var/log/messages

Apr 20 20:22:16 192 Keepalived_healthcheckers[15334]: Opening file '/etc/keepalived/keepalived.conf'.

Apr 20 20:22:16 192 Keepalived_healthcheckers[15334]: Configuration is using : 7231 Bytes

Apr 20 20:22:16 192 kernel: IPVS: Connection hash table configured (size=4096, memory=64Kbytes)

Apr 20 20:22:16 192 kernel: IPVS: ipvs loaded.

Apr 20 20:22:16 192 Keepalived_healthcheckers[15334]: Using LinkWatch kernel netlink reflector...

Apr 20 20:22:19 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Transition to MASTER STATE

Apr 20 20:22:20 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Entering MASTER STATE

Apr 20 20:22:20 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) setting protocol VIPs.

Apr 20 20:22:20 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.0.88

Apr 20 20:22:20 192 Keepalived_healthcheckers[15334]: Netlink reflector reports IP 192.168.0.88 added

Apr 20 20:22:25 192 Keepalived_vrrp[15335]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 192.168.0.88

(4) Start the keepalived service on the other server, the candidate master, and observe

[root@192.168.0.60 ~]# /etc/init.d/keepalived start ;tail -f /var/log/messages

Starting keepalived: [ OK ]

Apr 20 20:26:18 192 Keepalived_vrrp[9472]: Registering gratuitous ARP shared channel

Apr 20 20:26:18 192 Keepalived_vrrp[9472]: Opening file '/etc/keepalived/keepalived.conf'.

Apr 20 20:26:18 192 Keepalived_vrrp[9472]: Configuration is using : 62976 Bytes

Apr 20 20:26:18 192 Keepalived_vrrp[9472]: Using LinkWatch kernel netlink reflector...

Apr 20 20:26:18 192 Keepalived_vrrp[9472]: VRRP_Instance(VI_1) Entering BACKUP STATE

Apr 20 20:26:18 192 Keepalived_vrrp[9472]: VRRP sockpool: [ifindex(3), proto(112), unicast(0), fd(10,11)]

Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP 192.168.80.138 added

Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP 192.168.0.60 added

Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP fe80::20c:29ff:fe9d:6a9e added

Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Netlink reflector reports IP fe80::20c:29ff:fe9d:6aa8 added

Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Registering Kernel netlink reflector

Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Registering Kernel netlink command channel

Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Opening file '/etc/keepalived/keepalived.conf'.

Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Configuration is using : 7231 Bytes

Apr 20 20:26:18 192 kernel: IPVS: Registered protocols (TCP, UDP, AH, ESP)

Apr 20 20:26:18 192 kernel: IPVS: Connection hash table configured (size=4096, memory=64Kbytes)

Apr 20 20:26:18 192 kernel: IPVS: ipvs loaded.

Apr 20 20:26:18 192 Keepalived_healthcheckers[9471]: Using LinkWatch kernel netlink reflector...

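Note:

Both keepalived instances above are set to BACKUP. keepalived can run in two modes, master->backup and backup->backup, and they behave very differently. In master->backup mode, when the master goes down the virtual IP moves to the slave automatically; once the master is repaired and keepalived starts again, it takes the virtual IP back, and this happens even if non-preemptive mode (nopreempt) is set. In backup->backup mode, when the master goes down the virtual IP moves to the slave automatically; when the original master recovers and its keepalived starts, it does not take the virtual IP back from the new master, even if its priority is higher. To reduce the number of IP moves, the repaired master is normally used as the new standby.

(5) Integrate keepalived with MHA (MHA stops keepalived when the MySQL service process dies):

To bring keepalived into MHA, only the script triggered at switch time, master_ip_failover, has to be changed: add the keepalived handling for the case where the master goes down.

Edit /usr/local/bin/master_ip_failover on the MHA Manager host. The modified content is as follows: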

#!/usr/bin/env perl

use strict;

use warnings FATAL => 'all';

use Getopt::Long;

my (

$command, $ssh_user, $orig_master_host, $orig_master_ip,

$orig_master_port, $new_master_host, $new_master_ip, $new_master_port

);

my $vip = '192.168.0.88';

my $ssh_start_vip = "/etc/init.d/keepalived start";

my $ssh_stop_vip = "/etc/init.d/keepalived stop";

GetOptions(

'command=s' => \$command,

'ssh_user=s' => \$ssh_user,

'orig_master_host=s' => \$orig_master_host,

'orig_master_ip=s' => \$orig_master_ip,

'orig_master_port=i' => \$orig_master_port,

'new_master_host=s' => \$new_master_host,

'new_master_ip=s' => \$new_master_ip,

'new_master_port=i' => \$new_master_port,

);

exit &main();

sub main {

print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

if ( $command eq "stop" || $command eq "stopssh" ) {

my $exit_code = 1;

eval {

print "Disabling the VIP on old master: $orig_master_host \n";

&stop_vip();

$exit_code = 0;

};

if ($@) {

warn "Got Error: $@\n";

exit $exit_code;

}

exit $exit_code;

}

elsif ( $command eq "start" ) {

my $exit_code = 10;

eval {

print "Enabling the VIP - $vip on the new master - $new_master_host \n";

&start_vip();

$exit_code = 0;

};

if ($@) {

warn $@;

exit $exit_code;

}

exit $exit_code;

}

elsif ( $command eq "status" ) {

print "Checking the Status of the script.. OK \n";

#`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;

exit 0;

}

else {

&usage();

exit 1;

}

}

# A simple system call that enables the VIP on the new master

sub start_vip() {

`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;

}

# A simple system call that disable the VIP on the old_master

sub stop_vip() {

return 0 unless ($ssh_user);

`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;

}

sub usage {

print

"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";

}

(6) On the monitor, check the cluster state and see whether any errors remain.

masterha_check_repl --conf=/etc/masterha/app1.cnf

(7) Start MHA Manager on the monitor

nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null >> /var/log/masterha/app1/manager.log 2>&1 &

(8) Check that it is running normally

masterha_check_status --conf=/etc/masterha/app1.cnf

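(9) Simulate a failure of the master and test whether MHA invokes keepalived and the VIP moves from the master to the candidate master.

Stop the mysqld service on the master: service mysqld stop

(10) Recovering the failed host works the same way as above. First restore the mysqld service. Once that is fine, the repaired server is used directly as a slave. Then start keepalived again, and start the MHA Manager service again to return to the normal state.

Steps: after the repair, restart MySQL with service mysqld start; on the MHA Manager host, find the new master's binary log file and position so the old master can be pointed at the new one, using grep -i "All other slaves should start" manager.log; start the keepalived service (on the old master); start the MHA Manager service (preferably appending to the log). The exact commands are given above.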