Reference: https://blog.csdn.net/wzy0623/article/details/80916567
1. Install keepalived
yum install -y keepalived
2. Enable keepalived to start at boot
systemctl enable keepalived.service
3. Configure keepalived to provide high availability for MySQL master-slave replication
# On the master server, install keepalived and configure it:
vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
notification_email {
admin@test.com
}
notification_email_from admin@test.com
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id MYSQL_HA
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51
priority 100
advert_int 1
nopreempt
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.42.88
}
}
virtual_server 192.168.42.88 3306 {
delay_loop 2
lb_algo rr
lb_kind NAT
persistence_timeout 50
protocol TCP
real_server 192.168.42.131 3306 {
weight 3
notify_down /tmp/mysql.sh
TCP_CHECK {
connect_timeout 3
nb_get_retry 3
delay_before_retry 3
}
}
}
# On the slave server, install keepalived and configure it:
vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
notification_email {
admin@test.com
}
notification_email_from admin@test.com
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id MYSQL_HA
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51
priority 90
advert_int 1
nopreempt
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.42.88
}
}
virtual_server 192.168.42.88 3306 {
delay_loop 2
lb_algo rr
lb_kind NAT
persistence_timeout 50
protocol TCP
real_server 192.168.42.132 3306 {
weight 3
notify_down /tmp/mysql.sh
TCP_CHECK {
connect_timeout 3
nb_get_retry 3
delay_before_retry 3
#connect_port 3306
}
}
}
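The two files above differ only in priority and in the real_server address; virtual_router_id, the authentication settings and the VIP have to match on both nodes. As a rough sanity check, the sketch below compares those fields. It works on trimmed inline copies of the two vrrp_instance blocks, and get_field is a hypothetical helper, not part of keepalived.

```shell
#!/bin/bash
# Trimmed copies of the two vrrp_instance blocks from above (sample data).
master_conf='vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 100
    nopreempt
}'
slave_conf='vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 90
    nopreempt
}'

# get_field <config text> <keyword>: print the value that follows the keyword.
# Hypothetical helper for this sketch only.
get_field() {
    printf '%s\n' "$1" | awk -v key="$2" '$1 == key { print $2; exit }'
}

m_vrid="$(get_field "$master_conf" virtual_router_id)"
s_vrid="$(get_field "$slave_conf" virtual_router_id)"
m_prio="$(get_field "$master_conf" priority)"
s_prio="$(get_field "$slave_conf" priority)"

# Both nodes must share the router id; the intended master needs the higher priority.
if [ "$m_vrid" = "$s_vrid" ] && [ "$m_prio" -gt "$s_prio" ]; then
    status="ok"
else
    status="mismatch"
fi
echo "vrid: $m_vrid/$s_vrid priority: $m_prio/$s_prio -> $status"
```

On real hosts the same helper could be fed with "$(cat /etc/keepalived/keepalived.conf)" from each node.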
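4. On both the master and the slave, create mysql.sh. keepalived runs this script when the MySQL service goes down; the script kills keepalived so that the VIP fails over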
# vim /tmp/mysql.sh
#!/bin/bash
pkill keepalived
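The script above kills keepalived without leaving any trace, which makes a failover hard to reconstruct afterwards. A possible variant, shown only as a sketch, writes a timestamped line to a log first. The log path and the DRY_RUN switch are assumptions added here for safe testing; they are not part of the original setup.

```shell
#!/bin/bash
# Hypothetical variant of /tmp/mysql.sh: log the event, then stop keepalived.
LOG_FILE="${LOG_FILE:-/tmp/mysql_notify_down.log}"   # assumed log location
DRY_RUN="${DRY_RUN:-1}"                              # 1 = only print, 0 = really kill

stop_keepalived() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') MySQL check failed, stopping keepalived" >> "$LOG_FILE"
    if [ "$DRY_RUN" = "1" ]; then
        echo "dry-run: would run pkill keepalived"
    else
        pkill keepalived
    fi
}

result="$(stop_keepalived)"
echo "$result"
```

Since keepalived calls the script without any extra environment, the DRY_RUN default would have to be changed to 0 in the file itself once the dry run looks right.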
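5. Make the script executable on both servers
cd /tmp #change to the directory that contains mysql.sh
chmod u+x mysql.sh #grant execute permission
Note:
#chmod is the permission-management command, short for "change the permissions mode of a file"; u stands for the owner (user); x stands for execute permission; + means the permission is added.
#chmod u+x file.sh therefore adds execute permission for the owner of file.sh in the current directory.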
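To see what chmod u+x changes, the following sketch applies it to a throwaway file and reads the owner execute bit from the ls -l output before and after. The temporary directory and the file name file.sh are used only for this demonstration.

```shell
#!/bin/bash
# Work on a throwaway file so the real /tmp/mysql.sh is not touched.
workdir="$(mktemp -d)"
script="$workdir/file.sh"
printf '#!/bin/bash\necho ok\n' > "$script"

chmod u-x "$script"                      # make sure the owner execute bit starts cleared
before="$(ls -l "$script" | cut -c4)"    # 4th character of the mode string = owner execute bit
chmod u+x "$script"                      # same command as in step 5
after="$(ls -l "$script" | cut -c4)"

echo "owner execute bit before: $before, after: $after"
rm -rf "$workdir"
```

The bit shows as "-" before the change and as "x" after it; the group and other bits are left alone.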
6. Reboot the machine first (reboot command), then start keepalived
systemctl start keepalived
7. Check whether the VIP is present on the master and on the slave
# Master server
[root@cluster4 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:d8:05:d4 brd ff:ff:ff:ff:ff:ff
inet 192.168.42.131/24 brd 192.168.42.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 192.168.42.88/32 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::ed82:44f9:b0d2:19f2/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::a4db:5ac7:6fbd:2deb/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::d613:4ea7:fcef:9fe2/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
# Slave server
[root@cluster5 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:df:b5:d8 brd ff:ff:ff:ff:ff:ff
inet 192.168.42.132/24 brd 192.168.42.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 192.168.42.88/32 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::ed82:44f9:b0d2:19f2/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::a4db:5ac7:6fbd:2deb/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::d613:4ea7:fcef:9fe2/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
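Both the master and the slave have bound the VIP (192.168.42.88), which is clearly wrong. The cause may be an error in /etc/keepalived/keepalived.conf, or it may be the problem described below.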
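The check done by eye above can also be scripted. The sketch below looks for the VIP in the ip addr output of each node and reports how many nodes hold it. It runs on abbreviated copies of the two listings above; has_vip is a hypothetical helper.

```shell
#!/bin/bash
VIP="192.168.42.88"

# Abbreviated "ip addr" output of both nodes, copied from the listings above.
master_out='2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP
    inet 192.168.42.131/24 brd 192.168.42.255 scope global noprefixroute ens33
    inet 192.168.42.88/32 scope global ens33'
slave_out='2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP
    inet 192.168.42.132/24 brd 192.168.42.255 scope global noprefixroute ens33
    inet 192.168.42.88/32 scope global ens33'

# has_vip <ip addr output>: succeed if the VIP appears as an inet address.
has_vip() {
    printf '%s\n' "$1" | grep -qF "inet ${VIP}/"
}

holders=0
has_vip "$master_out" && holders=$((holders + 1))
has_vip "$slave_out" && holders=$((holders + 1))

if [ "$holders" -gt 1 ]; then
    verdict="split-brain: VIP bound on $holders nodes"
else
    verdict="ok: VIP bound on $holders node(s)"
fi
echo "$verdict"
```

On a live node the helper could be called as has_vip "$(ip addr show ens33)".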
Troubleshooting:
First, capture packets with tcpdump on one of the hosts, watching ens33, the NIC the VIP is bound to. The first attempt gives:
[root@cluster5 ~]# sudo tcpdump -i ens33 vrrp -n
sudo: tcpdump: command not found
[root@cluster5 ~]# yum install tcpdump
The capture tool has to be installed first. On the master server (131), the capture then shows the following:
[root@cluster5 ~]# sudo tcpdump -i ens33 vrrp -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens33, link-type EN10MB (Ethernet), capture size 262144 bytes
23:15:30.526364 IP 192.168.42.132 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 90, authtype simple, intvl 1s, length 20
23:15:30.930987 IP 192.168.42.131 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
23:15:31.527923 IP 192.168.42.132 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 90, authtype simple, intvl 1s, length 20
23:15:31.932857 IP 192.168.42.131 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
23:15:32.529671 IP 192.168.42.132 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 90, authtype simple, intvl 1s, length 20
23:15:32.933840 IP 192.168.42.131 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
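131 (the master) and 132 (the backup) take turns sending advertisements to 224.0.0.18 (the VRRP multicast address). In theory, while the master is active, a backup that receives the master's advertisements does not send multicast advertisements of its own. So the backup is evidently not receiving the master's multicast packets.
A capture on 132 shows the same output.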
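The reasoning above comes down to counting how many distinct hosts send advertisements for the same vrid. A minimal sketch of that count, run on lines copied from the capture rather than on a live tcpdump:

```shell
#!/bin/bash
# Advertisement lines copied from the capture above (sample data).
capture='23:15:30.526364 IP 192.168.42.132 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 90, authtype simple, intvl 1s, length 20
23:15:30.930987 IP 192.168.42.131 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
23:15:31.527923 IP 192.168.42.132 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 90, authtype simple, intvl 1s, length 20
23:15:31.932857 IP 192.168.42.131 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20'

# Field 3 of each advertisement line is the sender address.
senders="$(printf '%s\n' "$capture" | awk '/VRRPv2, Advertisement/ { print $3 }' | sort -u)"
sender_count="$(printf '%s\n' "$senders" | grep -c .)"

if [ "$sender_count" -gt 1 ]; then
    echo "$sender_count nodes are advertising vrid 51: split-brain suspected"
else
    echo "only one node is advertising: normal"
fi
```

The vrid 51 in the message is hard-coded to match this setup. In a healthy pair only the current MASTER should appear in the sender list.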
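After rechecking the configuration repeatedly and finding nothing wrong, the problem was narrowed down to communication between the two nodes and the multicast IP. The way to allow the multicast traffic through firewalld is: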
[root@cluster5 ~]# firewall-cmd --direct --permanent --add-rule ipv4 filter INPUT 0 --in-interface ens33 --destination 224.0.0.18 --protocol vrrp -j ACCEPT
success
[root@cluster5 ~]# firewall-cmd --reload;
success
In the "INPUT 0 --in-interface ens33" part, ens33 is the NIC the VIP is bound to; replace it with the name of your own interface.
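Because the rule has to be applied on both nodes and the interface name may differ, it can help to generate the command instead of retyping it. The sketch below only prints the command line for review and does not run firewall-cmd. vrrp_rule_cmd is a hypothetical helper, and eth0 is an assumed interface name that does not come from this setup.

```shell
#!/bin/bash
# vrrp_rule_cmd <interface>: print the firewall-cmd line from above for the
# given interface instead of running it (hypothetical helper).
vrrp_rule_cmd() {
    echo "firewall-cmd --direct --permanent --add-rule ipv4 filter INPUT 0 --in-interface $1 --destination 224.0.0.18 --protocol vrrp -j ACCEPT"
}

# eth0 is an assumed interface name used only for illustration.
cmd="$(vrrp_rule_cmd eth0)"
echo "$cmd"
echo "firewall-cmd --reload"
```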
After running this on both the master and the backup, check the VIP binding again; it is back to normal:
# The master server has the VIP bound
[root@cluster4 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:d8:05:d4 brd ff:ff:ff:ff:ff:ff
inet 192.168.42.131/24 brd 192.168.42.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 192.168.42.88/32 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::ed82:44f9:b0d2:19f2/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::a4db:5ac7:6fbd:2deb/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::d613:4ea7:fcef:9fe2/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
[root@cluster4 ~]#
# The slave server does not have the VIP bound
[root@cluster5 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:df:b5:d8 brd ff:ff:ff:ff:ff:ff
inet 192.168.42.132/24 brd 192.168.42.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::ed82:44f9:b0d2:19f2/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::a4db:5ac7:6fbd:2deb/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::d613:4ea7:fcef:9fe2/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
8. On both MySQL servers, allow root to log in remotely
# mysql -uroot -proot
mysql> grant all on *.* to 'root'@'192.168.42.%' identified by 'root';
mysql> flush privileges;
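9. Test high availability
Connect through the VIP with a MySQL client and check that the connection succeeds.
Here another machine on the same subnet is used for the test: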
# mysql -h192.168.42.88 -uroot -proot
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> select * from testdb.user;
+--------+--------+
| number | name   |
+--------+--------+
|      1 | testid |
+--------+--------+
1 row in set (0.01 sec)
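The connection succeeds and the query returns data correctly. Next, stop the MySQL service on the master and check whether the VIP fails over to the slave; the ip addr command shows which server currently holds the VIP.
10. Check for the VIP on the master; the VIP is on the master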
[root@cluster4 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:d8:05:d4 brd ff:ff:ff:ff:ff:ff
inet 192.168.42.131/24 brd 192.168.42.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 192.168.42.88/32 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::ed82:44f9:b0d2:19f2/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::a4db:5ac7:6fbd:2deb/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::d613:4ea7:fcef:9fe2/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
11. Stop the MySQL service on the master
# systemctl stop mysqld
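The VIP stays bound to the master, yet running mysql.sh by hand does make it move. So the script itself is fine; keepalived is simply not executing it. The main cause is SELinux, and disabling it resolves the problem.
#Disable temporarily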
setenforce 0
#Disable permanently
sed -i "s/^SELINUX=.*/SELINUX=disabled/g" /etc/selinux/config
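Before editing /etc/selinux/config in place, the effect of the sed expression can be previewed on a sample file. The sketch below assumes typical file contents (they are not taken from these hosts) and writes the result to stdout instead of using -i. It shows that only the SELINUX= line is rewritten and SELINUXTYPE= is left alone.

```shell
#!/bin/bash
# Sample stand-in for /etc/selinux/config (assumed typical contents).
sample="$(mktemp)"
printf '# sample config\nSELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$sample"

# Same expression as above, written to stdout instead of editing in place.
result="$(sed "s/^SELINUX=.*/SELINUX=disabled/g" "$sample")"
echo "$result"

changed="$(printf '%s\n' "$result" | grep -c '^SELINUX=disabled$')"
type_line="$(printf '%s\n' "$result" | grep '^SELINUXTYPE=')"
rm -f "$sample"
```

The change in the config file takes effect at the next boot, which is why setenforce 0 is used as well for the running system.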
The VIP then fails over, as shown below:
#Stopping mysqld on the master triggers the VIP failover
[root@cluster4 ~]# systemctl stop mysqld
[root@cluster4 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:d8:05:d4 brd ff:ff:ff:ff:ff:ff
inet 192.168.42.131/24 brd 192.168.42.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::ed82:44f9:b0d2:19f2/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::a4db:5ac7:6fbd:2deb/64 scope link noprefixroute
valid_lft forever preferred_lft forever
As shown, once the MySQL service is stopped, keepalived is stopped as well, and the VIP appears on the slave server.
#The slave server has the VIP bound
[root@cluster5 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:df:b5:d8 brd ff:ff:ff:ff:ff:ff
inet 192.168.42.132/24 brd 192.168.42.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 192.168.42.88/32 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::ed82:44f9:b0d2:19f2/64 scope link noprefixroute
valid_lft forever preferred_lft forever
The keepalived processes on the master have also been killed:
[root@cluster4 ~]# ps -ef | grep keepalived | grep -v grep
[root@cluster4 ~]#
Before the service was stopped, the keepalived processes looked like this:
[root@cluster4 ~]# ps -ef | grep keepalived | grep -v grep
root 1349 1 0 18:03 ? 00:00:00 /usr/sbin/keepalived -D
root 1350 1349 0 18:03 ? 00:00:00 /usr/sbin/keepalived -D
root 1351 1349 0 18:03 ? 00:00:00 /usr/sbin/keepalived -D
[root@cluster4 ~]#
12. Recover the failed master server
Start the MySQL service and the keepalived service on the master
[root@cluster4 ~]# systemctl start mysqld
[root@cluster4 ~]# systemctl start keepalived
The VIP remains on the slave server.
#Master server
[root@cluster4 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:d8:05:d4 brd ff:ff:ff:ff:ff:ff
inet 192.168.42.131/24 brd 192.168.42.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::ed82:44f9:b0d2:19f2/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::a4db:5ac7:6fbd:2deb/64 scope link noprefixroute
valid_lft forever preferred_lft forever
#Slave server
[root@cluster5 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:df:b5:d8 brd ff:ff:ff:ff:ff:ff
inet 192.168.42.132/24 brd 192.168.42.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 192.168.42.88/32 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::ed82:44f9:b0d2:19f2/64 scope link noprefixroute
valid_lft forever preferred_lft forever
inet6 fe80::a4db:5ac7:6fbd:2deb/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
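This is due to the nopreempt parameter in /etc/keepalived/keepalived.conf.
nopreempt: disables preemption. Note that this option may only be set on a node whose state is BACKUP. When the MASTER fails, a BACKUP is elected as the new MASTER. When the former MASTER comes back online, does it become MASTER again or stay BACKUP? By default preemption is enabled, so the former MASTER takes the MASTER role back as soon as it is up. Such repeated switching is unacceptable for the application; the recovered node should come back as BACKUP, so nopreempt must be set. Because nopreempt only works with state BACKUP, the node intended as master must also be configured with state BACKUP; in other words, both the master and the slave set state to BACKUP. Giving the two BACKUP nodes different priority values lets them hold an election when they start together, and the node with the higher priority becomes the initial MASTER.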
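The rule described above can be put into a tiny decision function. This is a simplified model for illustration, not keepalived's actual election code: it only answers who holds the VIP after a failed node rejoins, given the two priorities and whether nopreempt is set. The function name is hypothetical.

```shell
#!/bin/bash
# vip_holder_after_rejoin <holder_priority> <rejoining_priority> <nopreempt: yes|no>
# Simplified model, not keepalived code: prints which node holds the VIP
# after a previously failed node starts keepalived again.
vip_holder_after_rejoin() {
    if [ "$3" = "no" ] && [ "$2" -gt "$1" ]; then
        echo "rejoining node"
    else
        echo "current holder"
    fi
}

# Priorities from this setup: the slave (90) holds the VIP, the master (100) rejoins.
with_nopreempt="$(vip_holder_after_rejoin 90 100 yes)"
without_nopreempt="$(vip_holder_after_rejoin 90 100 no)"
echo "nopreempt set:   VIP stays with the $with_nopreempt"
echo "nopreempt unset: VIP moves to the $without_nopreempt"
```

With nopreempt set, the slave keeps the VIP even though the recovered master has the higher priority, which matches what step 12 shows.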