author:skate
time:2012/07/12
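Dual master + heartbeat for automatic failover
heartbeat mainly handles host failure; it does not fail over when only a service fails. For service-level failover you have to write your own script that checks the state of the service and, when the service is unhealthy, calls heartbeat's switchover script to complete the failover.
Environment: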
os:rht5.2
mysql:percona5.5
heartbeat:2.1.3
1. First install MySQL on both machines
Reference: http://blog.csdn.net/wyzxg/article/details/7695663
Add the following settings to my.cnf on master1.skate.com
server-id = 1
log_bin=/data/mysql/binlog/master_binlog.log
binlog-do-db=skate
log_slave_updates=1
auto_increment_increment=2
auto_increment_offset=1
binlog_format=mixed
expire_logs_days=7
Add the following settings to my.cnf on master2.skate.com
server-id = 2
log_bin=/data/mysql/binlog/master_binlog.log
binlog-do-db=skate
log_slave_updates=1
auto_increment_increment=2
auto_increment_offset=2
binlog_format=mixed
expire_logs_days=7
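The two auto_increment settings are what keep the masters from handing out the same key: with auto_increment_increment=2, an offset of 1 on master1 gives odd values and an offset of 2 on master2 gives even ones. A minimal sketch of that arithmetic follows; the function name auto_increment_values is made up for illustration and it only mimics the rule value = offset + n * increment, it does not talk to MySQL.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): print the first COUNT values
# a server would generate for the given offset and increment.
auto_increment_values() {
    offset=$1; increment=$2; count=$3
    n=0; out=""
    while [ "$n" -lt "$count" ]; do
        out="$out$((offset + n * increment)) "
        n=$((n + 1))
    done
    # drop the trailing space before printing
    echo "${out% }"
}

echo "master1: $(auto_increment_values 1 2 5)"   # master1: 1 3 5 7 9
echo "master2: $(auto_increment_values 2 2 5)"   # master2: 2 4 6 8 10
```

If both servers used the same offset, the two lists would be identical and concurrent inserts on the two masters could collide.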
Create the replication account
mysql>GRANT REPLICATION SLAVE ON *.* TO 'rep'@'%' IDENTIFIED BY 'rep';
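2. Then create the slave online
Reference: http://www.percona.com/doc/percona-xtrabackup/howtos/setting_up_replication.html
3. Once the slave has been created, check that it is replicating correctly.
To keep the data consistent, first run the following on the slave
mysql> FLUSH TABLES WITH READ LOCK; -- this blocks replication on the slave
mysql> show master status;
4.
The slave is now locked as a whole and will not be updated, so run the following on the master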
mysql> stop slave;
mysql> CHANGE MASTER TO
-> MASTER_HOST='master2.skate.com',
-> MASTER_USER='rep',
-> MASTER_PASSWORD='rep',
-> MASTER_PORT=3306,
-> MASTER_LOG_FILE='master-bin.001', -- File value from show master status on the slave
-> MASTER_LOG_POS=4; -- Position value from show master status on the slave
mysql> start slave;
mysql> show slave status\G;
Finally, release the lock on the slave
mysql> unlock tables;
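The File and Position values have to be copied by hand from the slave's show master status output into the CHANGE MASTER TO statement, which is easy to get wrong. A small sketch of generating the statement instead: build_change_master is a hypothetical helper, the sample binlog name and position are invented, and the account rep/rep and port 3306 are the ones used above.

```shell
#!/bin/sh
# Hypothetical helper: read one tab-separated line of
# `show master status` output (File, Position, ...) on stdin and
# print the matching CHANGE MASTER TO statement for the given peer host.
build_change_master() {
    peer_host=$1
    read -r log_file log_pos _rest
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='rep', MASTER_PASSWORD='rep', MASTER_PORT=3306, MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$peer_host" "$log_file" "$log_pos"
}

# Sample input with invented values, only to show the shape of the output.
printf 'master_binlog.000003\t107\tskate\t\n' | build_change_master master2.skate.com
```

On the real hosts the input line could come from `mysql -N -e 'show master status'` run on the locked slave, and the printed statement would then be executed on the master between stop slave and start slave.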
At this point the dual-master setup is complete. Next, configure heartbeat for automatic failover
Environment details
vip:192.168.211.163
master1:eth0/192.168.211.127 public
eth1/172.16.0.11 heartbeat
master2:eth0/192.168.211.199 public
eth1/172.16.0.12 heartbeat
The following files need to be checked and edited:
/etc/hosts
/etc/host.conf
/etc/resolv.conf
/etc/sysconfig/network
/etc/sysconfig/network-scripts/ifcfg-eth0
/etc/sysconfig/network-scripts/ifcfg-eth1
hosts on master1
[root@master1 mysql]# more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 master1.skate.com master1 localhost.localdomain localhost
192.168.211.127 master1.skate.com
192.168.211.199 master2.skate.com
::1 localhost6.localdomain6 localhost6
hosts on master2
[root@master2 ~]# more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 master2.skate.com master2 localhost.localdomain localhost
192.168.211.127 master1.skate.com
192.168.211.199 master2.skate.com
::1 localhost6.localdomain6 localhost6
host.conf on master1 and master2
# more /etc/host.conf
order hosts,bind
resolv.conf on master1 and master2
# more /etc/resolv.conf
nameserver 202.106.0.20
nameserver 202.106.196.115
search localhost
network on master1
[root@master1 mysql]# more /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=master1.skate.com
GATEWAY="192.168.211.1"
GATEWAYDEV="eth0" #NIC used to reach the gateway
ONBOOT=YES
FORWARD_IPV4="yes"
network on master2
[root@master2 ~]# more /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=master2.skate.com
GATEWAY="192.168.211.1"
GATEWAYDEV="eth0" #NIC used to reach the gateway
ONBOOT=YES
FORWARD_IPV4="yes"
ifcfg-eth0 and ifcfg-eth1 on master1
[root@master1 mysql]# more /etc/sysconfig/network-scripts/ifcfg-eth0
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
HWADDR=08:00:27:3e:3e:09
NETMASK=255.255.255.0
IPADDR=192.168.211.127
GATEWAY=192.168.211.1
TYPE=Ethernet
[root@master1 mysql]# more /etc/sysconfig/network-scripts/ifcfg-eth1
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=none
HWADDR=08:00:27:9a:96:11
TYPE=Ethernet
NETMASK=255.255.255.0
IPADDR=172.16.0.11
USERCTL=no
IPV6INIT=no
PEERDNS=yes
ifcfg-eth0 and ifcfg-eth1 on master2
[root@master2 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth0
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth0
BOOTPROTO=none
HWADDR=08:00:27:a8:84:fc
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes
NETMASK=255.255.255.0
IPADDR=192.168.211.199
GATEWAY=192.168.211.1
[root@master2 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth1
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=none
HWADDR=08:00:27:38:98:26
NETMASK=255.255.255.0
IPADDR=172.16.0.12
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes
Check each host's own hostname, then ping the other machine by name
[root@master2 ~]# uname -n
master2.skate.com
[root@master2 ~]# ping master1.skate.com
PING master1.skate.com (192.168.211.127) 56(84) bytes of data.
64 bytes from master1.skate.com (192.168.211.127): icmp_seq=1 ttl=64 time=0.551 ms
The hosts are now configured. Next, install heartbeat and its dependencies
1.
Install on both master1 and master2
# yum install heartbeat
# yum install ipvsadm
# yum install libnet
heartbeat has three configuration files:
- ha.cf
- authkeys
- haresources
[root@master1 mysql]# cd /usr/share/doc/heartbeat-2.1.3/
[root@master1 heartbeat-2.1.3]# cp ha.cf /etc/ha.d/
[root@master1 heartbeat-2.1.3]# cp haresources /etc/ha.d/
[root@master1 heartbeat-2.1.3]# cp authkeys /etc/ha.d/
2.
First configure ha.cf (the same on both nodes, except that a ucast line, if used, must point at the peer's IP)
[root@master1 ha.d]# pwd
/etc/ha.d
[root@master1 ha.d]# more ha.cf
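logfile /var/log/ha-log #location of the heartbeat log file; create the directory by hand if it does not exist
logfacility local0 #syslog facility used for heartbeat's log messages
keepalive 2 #interval between heartbeats, here 2 seconds
warntime 4 #seconds without a heartbeat before a warning is logged
deadtime 20 #seconds without a heartbeat before the peer is considered dead
initdead 60 #grace period after a reboot (for example, while the network is still coming up the heartbeat cannot get through, but no failover should happen yet)
#send heartbeats as UDP broadcast on eth1 (heartbeat needs at least one medium, so uncomment either this line or the ucast line)
#bcast eth1
#send heartbeats as UDP unicast on eth1; the IP is the peer's IP. Unicast is recommended. When several such clusters share one network segment, unicast is required, otherwise each cluster sees the other clusters' nodes and reports errors.
#ucast eth1 172.16.0.12
##use UDP port 694 for heartbeats
udpport 694
auto_failback off #whether to switch back automatically once the failed node recovers; usually off
##hostname of node 1; must match the output of uname -n
node master1.skate.com
##hostname of node 2
node master2.skate.com
##ping node used to check network connectivity (normally the gateway)
ping 172.16.0.12
hopfudge 1
#time after which a ping node is considered dead
deadping 10
#processes started and stopped together with heartbeat
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
#whether to use v2-style mode; must be turned on with three or more nodes
#crm on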
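Because each node entry must match uname -n exactly and the two entries must differ, a quick check before starting heartbeat can catch copy-and-paste mistakes. The sketch below is an assumption-laden illustration, not part of heartbeat: ha_nodes and check_ha_nodes are hypothetical helper names that only parse the uncommented node lines of an ha.cf file.

```shell
#!/bin/sh
# Hypothetical helper: list the node names declared in an ha.cf file.
# Commented-out lines start with "#", so their first field is never "node".
ha_nodes() {
    awk '$1 == "node" { for (i = 2; i <= NF; i++) print $i }' "$1"
}

# Hypothetical helper: return 0 if ha.cf declares at least two distinct
# nodes and one of them equals the given hostname.
check_ha_nodes() {
    ha_cf=$1; this_host=$2
    total=$(ha_nodes "$ha_cf" | wc -l | tr -d ' ')
    distinct=$(ha_nodes "$ha_cf" | sort -u | wc -l | tr -d ' ')
    if [ "$distinct" -ne "$total" ]; then
        echo "duplicate node entry in $ha_cf"
        return 1
    fi
    if [ "$distinct" -lt 2 ]; then
        echo "fewer than two nodes in $ha_cf"
        return 1
    fi
    if ! ha_nodes "$ha_cf" | grep -qxF "$this_host"; then
        echo "$this_host is not listed as a node in $ha_cf"
        return 1
    fi
    echo "node entries look consistent"
}

# Intended use on a cluster node:
#   check_ha_nodes /etc/ha.d/ha.cf "$(uname -n)"
```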
3.
Edit the file used by the two nodes to authenticate each other: authkeys
[root@master1 ha.d]# more authkeys
auth 1
1 crc
[root@master1 ha.d]# chmod 600 authkeys #the permissions on authkeys must be 600
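The crc method only detects corrupted packets and involves no secret. authkeys also accepts md5 and sha1 with a shared key, which may be preferable when the heartbeat link is not a dedicated cable. A minimal sketch of generating such a file: write_authkeys is a hypothetical helper name and the key is simply 16 random bytes printed as hex.

```shell
#!/bin/sh
# Hypothetical helper: write an authkeys file that uses sha1 with a random
# shared key, readable by the owner only.
write_authkeys() {
    target=$1
    # 16 random bytes rendered as 32 hex characters
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    # umask in a subshell so the file is never world-readable, even briefly
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$target" )
    chmod 600 "$target"
}

# Intended use (as root), then copy the same file to the other node:
#   write_authkeys /etc/ha.d/authkeys
```

Both nodes need an identical authkeys file, so generate it once and copy it across rather than running the helper on each node.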
4.
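Edit the cluster resource file: haresources (what the standby has to do at failover)
[root@master1 ha.d]# more haresources
#node-name resource1 resource2 ... resourceN
master1.skate.com 192.168.211.163 test
#Here 192.168.211.163 is the VIP. The test script is required: without it heartbeat cannot run. The script carries out whatever actions the failover needs. Because this is a dual-master setup, only the VIP has to move at failover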
[root@master1 ha.d]# more /etc/init.d/test
#!/bin/bash
echo "" > /dev/null
In a master/slave setup, the test script would have to turn the slave into a master on start, and turn the master back into a slave on stop
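A resource script named in haresources is called with an action argument such as start, stop or status. The skeleton below sketches where the promote and demote steps would go; test_resource is a hypothetical function name, the echo lines are placeholders, and the actual SQL needed to promote or demote a server depends on your replication setup and is deliberately left out.

```shell
#!/bin/sh
# Hypothetical skeleton for /etc/init.d/test. The bodies are placeholders.
test_resource() {
    case "$1" in
        start)
            # Dual master: nothing to do, moving the VIP is enough.
            # Master/slave: promote the local slave to master here.
            echo "test: start"
            ;;
        stop)
            # Master/slave: demote the local master back to a slave here.
            echo "test: stop"
            ;;
        status)
            # Placeholder answer; a real script would report actual state.
            echo "test: stopped"
            ;;
        *)
            echo "Usage: test {start|stop|status}"
            return 1
            ;;
    esac
    return 0
}

# In the real script the last lines would be:
#   test_resource "$1"
#   exit $?
```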
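The heartbeat + dual-master mode is still the recommended one, because it keeps data loss to a minimum. Use the InnoDB storage engine and set innodb_flush_log_at_trx_commit = 1, so that almost every committed transaction is recorded in ib_logfile* and can be recovered on the secondary node, which reduces the loss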
5.
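Testing
A. Unplug the heartbeat network cable to simulate a network failure
B. Shut down the host to simulate a host going down
C. Cut power to the host to simulate a failure
D. Switch manually: run the script "/usr/lib/heartbeat/hb_standby" so that heartbeat tells the peer node that this node is asking to become the standby. The VIP moves to the peer node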
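After each of these tests it helps to confirm which node actually holds the VIP. A small sketch, assuming the interface listing is piped in from /sbin/ifconfig or ip addr; has_ip is a hypothetical helper name and it compares the whole address, so 192.168.211.16 does not match 192.168.211.163.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the given IPv4 address appears in the
# interface listing read from stdin. Handles both the ifconfig style
# ("inet addr:1.2.3.4") and the ip addr style ("inet 1.2.3.4/24").
has_ip() {
    awk -v want="$1" '
        $1 == "inet" {
            addr = $2
            sub(/^addr:/, "", addr)   # strip the ifconfig prefix
            sub(/\/.*$/, "", addr)    # strip the prefix length
            if (addr == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Intended use on either node:
#   /sbin/ifconfig | has_ip 192.168.211.163 && echo "VIP is on $(uname -n)"
```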
If the MySQL service has a problem while the host itself is healthy, the setup so far cannot fail over. We can write our own script for that
cat /usr/local/mysql/bin/moniter.sh
#!/bin/bash
mysql_path=/usr/local/mysql/bin/
user="root"
password="skate"
logfile=/var/log/moniter.log
date=`(date +%y-%m-%d--%H:%M:%S)`
sleeptime=30
ip=$(/sbin/ifconfig | grep "inet addr" | grep -v "127.0.0.1" | awk '{print $2;}' |awk -F':' '{print $2;}' |head -1)
Slave_IO_Running=$(mysql -u$user -p$password -e 'show slave status\G' |grep "Slave_IO_Running" | awk '{print $2}')
Slave_SQL_Running=$(mysql -u$user -p$password -e 'show slave status\G' | grep "Slave_SQL_Running" | awk '{print $2}')
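#check that mysqld accepts connections and can run a statement
mysql -u$user -p$password -e "use test;"
if [[ $? != 0 ]]
then
#give up the vip; the manual test above calls this script as /usr/lib/heartbeat/hb_standby, so use whichever path your package installs
/usr/share/heartbeat/hb_standby
echo "release vip, and become standby!!"
#call your alerting script here
else
echo "mysql is ok"
fi
#check that both replication threads are running
if [ "$Slave_IO_Running" = "Yes" -a "$Slave_SQL_Running" = "Yes" ]
then
echo "Slave is running!" >/dev/null
else
echo "${ip}_replicate error please fix it "
#call your alerting script here
fi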
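With this, the script can detect a MySQL service failure and trigger the failover, and it can also raise an alert (using whatever alerting method you have)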
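The script defines sleeptime and logfile but runs only once, so it still has to be scheduled somehow. One possible sketch, under the assumption that it is run in a loop rather than from cron: slave_threads_ok parses show slave status\G output in a single pass, and monitor_loop reruns a script at a fixed interval. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `show slave status\G` output on stdin and
# return 0 only when both replication threads report Yes.
slave_threads_ok() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit ((io == "Yes" && sql == "Yes") ? 0 : 1) }
    '
}

# Hypothetical helper: run a check script forever, appending its output
# to a log file and sleeping between runs.
monitor_loop() {
    script=$1; logfile=$2; sleeptime=$3
    while true; do
        "$script" >> "$logfile" 2>&1
        sleep "$sleeptime"
    done
}

# Intended use, with the paths and interval from the script above:
#   monitor_loop /usr/local/mysql/bin/moniter.sh /var/log/moniter.log 30 &
```

Matching the field name exactly, including the colon, avoids picking up other status fields whose names merely begin with Slave_SQL_Running.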
6. Maintenance
How to start and stop heartbeat:
# /etc/init.d/heartbeat start or service heartbeat start
# /etc/init.d/heartbeat stop or service heartbeat stop
------end-------