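1 Introduction to Keepalived
Keepalived is a high-availability solution built on VRRP (Virtual Router Redundancy Protocol). Its main job is IP address failover: a virtual IP moves between servers so that a service stays reachable when one node fails. It can also load-balance traffic across backend servers through LVS. Keepalived is commonly used in front of critical services such as web servers and database servers.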
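1.1 Introduction to VRRP
VRRP is a network protocol that provides redundancy across multiple routers. Several routers form a virtual router group in which one acts as the master router (Master) and the others act as backup routers (Backup). When the master fails, a backup automatically takes over its duties, so the network stays available.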
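As a rough illustration of the election described above, the sketch below picks the router with the highest priority from a list. The node names and priority values are made-up inputs. Real VRRP also breaks ties by IP address and reacts to missed advertisements, and neither is modeled here.

```shell
#!/bin/sh
# Illustrative only: choose the router with the highest VRRP priority.
# Input on stdin: one "name priority" pair per line (hypothetical sample data).
elect_master() {
    sort -k2,2nr | head -n 1 | awk '{print $1}'
}

# node-2 has the higher priority, so it is selected.
printf '%s\n' 'node-1 90' 'node-2 100' | elect_master   # prints node-2
```

If the line for node-2 is removed from the input, the same function returns node-1, which mirrors a backup taking over after the master disappears.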
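1.2 How Keepalived Works
Keepalived uses VRRP to implement IP address failover. The servers elect a master (Master), and the virtual IP address (VIP) is bound to it. When the master fails, Keepalived automatically moves the VIP to a backup server (Backup), which keeps the service highly available.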
2 Installing Keepalived
2.1 Preparing the Environment
## Install the dependency packages
yum install gcc make openssl openssl-devel -y
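Before compiling, it can help to confirm that the build tools are actually on the PATH. The helper below is a small sketch, and `missing_tools` is a name invented for this example. It only checks commands, so a library package such as openssl-devel is not covered by it.

```shell
#!/bin/sh
# Print every command from the argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Prints nothing when all three tools are installed.
missing_tools gcc make openssl
```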
2.2 Downloading Keepalived
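The original gives no download command for this step. The sketch below is one possible way to fetch the tarball. It assumes the source archive is published under the software directory of keepalived.org, that `wget` is installed, and that /root/tool is the working directory (inferred from the paths used in later steps). The function names are invented for this example.

```shell
#!/bin/sh
# Build the assumed download URL for a given Keepalived version.
keepalived_url() {
    printf 'https://www.keepalived.org/software/keepalived-%s.tar.gz\n' "$1"
}

# Download the tarball into a directory. $1 = version, $2 = target directory.
# Defined here but not called, so nothing is fetched until you invoke it.
download_keepalived() {
    mkdir -p "$2" && cd "$2" && wget "$(keepalived_url "$1")"
}

# Show the URL that would be used for the version built in this guide.
keepalived_url 2.2.8
```

To perform the download, call `download_keepalived 2.2.8 /root/tool` after checking that the printed URL is the one you want.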
2.3 Extracting, Compiling, and Installing
tar -zxvf keepalived-2.2.8.tar.gz
cd keepalived-2.2.8
./configure --prefix=/usr/local/keepalived
make && make install
2.4 Copying the Keepalived Init Script to the System Service Directory
cp /root/tool/keepalived-2.2.8/keepalived/etc/init.d/keepalived /etc/rc.d/init.d
2.5 Copying the Configuration File
mkdir -p /etc/keepalived
cp /usr/local/keepalived/etc/keepalived/keepalived.conf.sample /etc/keepalived/keepalived.conf
2.6 Enabling Start on Boot
chkconfig --add keepalived
chkconfig keepalived on
2.7 Verifying the Installation
/usr/local/keepalived/sbin/keepalived --version
[root@node-2 ~]# /usr/local/keepalived/sbin/keepalived --version
Keepalived v2.2.8 (04/04,2023), git commit v2.2.7-154-g292b299e+
Copyright(C) 2001-2023 Alexandre Cassen, <acassen@gmail.com>
Built with kernel headers for Linux 3.10.0
Running on Linux 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020
Distro: CentOS Linux 7 (Core)
configure options: --prefix=/usr/local/keepalived
Config options: LVS VRRP VRRP_AUTH VRRP_VMAC OLD_CHKSUM_COMPAT INIT=systemd
System options: VSYSLOG RTA_ENCAP RTA_EXPIRES RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTA_VIA IFA_FLAGS NET_LINUX_IF_H_COLLISION LIBIPTC_LINUX_NET_IF_H_COLLISION IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE SO_MARK
[root@node-2 ~]#
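3 Configuring Keepalived
This section uses a MySQL master-slave replication setup as the example and configures IP address failover between two servers.
3.1 Configuration Example
3.1.1 Master Server Configuration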
cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf_bak
>/etc/keepalived/keepalived.conf
vi /etc/keepalived/keepalived.conf
# Edit the file and add the following content
vrrp_script vs_mysql_dbchk {
script "/etc/keepalived/chk_mysql.sh"
interval 10
}
vrrp_sync_group VG_1 {
group {
VM_1
}
}
vrrp_instance VM_1 {
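state MASTER # change to BACKUP on the backup node
interface ens33 # NIC the VIP binds to; ens33 is only an example, use the actual NIC name on your host
virtual_router_id 51
priority 100 # the value on the backup node must be lower than this
nopreempt # only takes effect when the initial state is BACKUP
advert_int 1
authentication {
auth_type PASS # must be the same on the backup node
auth_pass La123456 # must be the same on the backup node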
}
track_script {
vs_mysql_dbchk
}
virtual_ipaddress {
192.168.10.29/24 dev ens33 label ens33:0 # replace with your own VIP and the actual NIC name on your host
}
}
virtual_server 192.168.10.29 3306 { # replace with your own VIP
delay_loop 6
lb_algo rr # LVS load-balancing algorithm
lb_kind DR # LVS forwarding mode
nat_mask 255.255.255.0
persistence_timeout 50
protocol TCP
real_server 192.168.10.30 3306 { # replace with your own real server
weight 2
TCP_CHECK {
connect_timeout 3
nb_get_retry 3
delay_before_retry 3
connect_port 3306
}
}
real_server 192.168.10.31 3306 { # replace with your own real server
weight 2
TCP_CHECK {
connect_timeout 3
nb_get_retry 3
delay_before_retry 3
connect_port 3306
}
}
}
3.1.2 Backup Server Configuration
cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf_bak
>/etc/keepalived/keepalived.conf
vim /etc/keepalived/keepalived.conf
## Edit the file and add the following content
vrrp_script vs_mysql_dbchk {
script "/etc/keepalived/chk_mysql.sh"
interval 10
}
vrrp_sync_group VG_1 {
group {
VM_1
}
}
vrrp_instance VM_1 {
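state BACKUP # the backup node is set to BACKUP
interface ens33 # NIC the VIP binds to (same as on the master node)
virtual_router_id 51 # VRRP router ID; must match the master node
priority 90 # priority; the backup's value must be lower than the master's
nopreempt # disable preemption
advert_int 1 # VRRP advertisement interval (seconds)
authentication {
auth_type PASS # authentication type; must match the master node
auth_pass La123456 # authentication password; must match the master node
}
track_script {
vs_mysql_dbchk # the VRRP script check to track
}
virtual_ipaddress {
192.168.10.29/24 dev ens33 label ens33:0 # VIP address and the NIC it binds to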
}
}
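virtual_server 192.168.10.29 3306 { # VIP address and port
delay_loop 6 # health-check interval (seconds)
lb_algo rr # LVS load-balancing algorithm (round robin)
lb_kind DR # LVS forwarding mode (direct routing)
nat_mask 255.255.255.0
persistence_timeout 50 # persistent-connection timeout (seconds)
protocol TCP # protocol type
real_server 192.168.10.30 3306 { # real server 1
weight 2 # weight
TCP_CHECK {
connect_timeout 3 # connection timeout (seconds)
nb_get_retry 3 # number of retries
delay_before_retry 3 # delay between retries (seconds)
connect_port 3306 # port to check
}
}
real_server 192.168.10.31 3306 { # real server 2 (when there are several backend servers)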
weight 2
TCP_CHECK {
connect_timeout 3
nb_get_retry 3
delay_before_retry 3
connect_port 3306
}
}
}
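Several values in the two files must agree (virtual_router_id, advert_int, auth_type, auth_pass), while priority must be higher on the master. The sketch below compares two copies of the configuration for exactly these points. It assumes both files are available on one machine, for example after copying the backup's file over, and the function names are invented for this example. It reads only the first occurrence of each directive and strips everything after `#`, so it would misread a password that contains `#`.

```shell
#!/bin/sh
# Print the first value of a directive, ignoring trailing "#" comments.
# $1 = config file, $2 = directive name
get_directive() {
    awk -v key="$2" '{ sub(/#.*/, "") } $1 == key { print $2; exit }' "$1"
}

# Compare a master and a backup config. $1 = master file, $2 = backup file.
# Returns 0 when the pair looks consistent, 1 otherwise.
check_pair() {
    rc=0
    for key in virtual_router_id advert_int auth_type auth_pass; do
        m=$(get_directive "$1" "$key")
        b=$(get_directive "$2" "$key")
        if [ "$m" != "$b" ]; then
            echo "MISMATCH $key: master=$m backup=$b"
            rc=1
        fi
    done
    mp=$(get_directive "$1" priority)
    bp=$(get_directive "$2" priority)
    if [ "${mp:-0}" -le "${bp:-0}" ]; then
        echo "PRIORITY: master ($mp) should be greater than backup ($bp)"
        rc=1
    fi
    return $rc
}

# Demo on two throwaway files that mimic the settings used in this guide.
tmp=$(mktemp -d)
printf '%s\n' 'vrrp_instance VM_1 {' 'virtual_router_id 51' \
    'priority 100 # master' 'advert_int 1' \
    'auth_type PASS' 'auth_pass La123456' '}' > "$tmp/master.conf"
sed 's/priority 100/priority 90/' "$tmp/master.conf" > "$tmp/backup.conf"
check_pair "$tmp/master.conf" "$tmp/backup.conf" && echo "configs are consistent"
rm -rf "$tmp"
```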
3.2 Configuration Walkthrough
vrrp_script vs_mysql_dbchk {
script "/etc/keepalived/chk_mysql.sh"
interval 10
}
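VRRP script check
- Purpose: defines a VRRP script check that monitors the state of the MySQL service
- script: path of the check script, /etc/keepalived/chk_mysql.sh, which tests whether MySQL is running normally
- interval: the check runs every 10 seconds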
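Keepalived judges a tracked script by its exit status: 0 counts as success and any other value counts as failure. Before wiring a script into the configuration, it can be run by hand to see which status it returns. The wrapper below is a minimal sketch, and `report_check` is a name invented for this example.

```shell
#!/bin/sh
# Run a command and report how a tracked script with that exit status
# would be interpreted (0 = healthy, anything else = failed).
report_check() {
    if "$@" >/dev/null 2>&1; then
        echo "healthy (exit 0)"
    else
        echo "failed (exit $?)"
    fi
}

report_check true    # prints: healthy (exit 0)
report_check false   # prints: failed (exit 1)
```

On a configured node the same wrapper could be pointed at the real script, for example `report_check /etc/keepalived/chk_mysql.sh`.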
vrrp_sync_group VG_1 {
group {
VM_1
}
}
VRRP sync group
- Purpose: groups several VRRP instances so that they change state together
- group: contains a single VRRP instance, VM_1
vrrp_instance VM_1 {
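state MASTER # change to BACKUP on the backup node
interface ens33 # NIC the VIP binds to
virtual_router_id 51 # VRRP router ID; must be identical on master and backup
priority 100 # priority; the backup's value must be lower than this
nopreempt # disable preemption
advert_int 1 # VRRP advertisement interval (seconds)
authentication {
auth_type PASS # authentication type; must be identical on the backup
auth_pass La123456 # authentication password; must be identical on the backup
}
track_script {
vs_mysql_dbchk # the VRRP script check to track
}
virtual_ipaddress {
192.168.10.29/24 dev ens33 label ens33:0 # VIP address and the NIC it binds to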
}
}
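VRRP instance
- state: initial role of this node; MASTER marks the master node, and the backup node must use BACKUP
- interface: name of the NIC the VIP binds to (for example ens33)
- virtual_router_id: VRRP router ID; must be identical on the master and backup nodes
- priority: priority; the master's value should be higher than the backup's
- nopreempt: disables preemption, so a recovered master does not take the VIP back; Keepalived honors it only when the node's initial state is BACKUP
- virtual_ipaddress: the VIP address and the NIC it binds to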
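The advert_int and priority values also determine how quickly a backup notices a dead master. Under VRRP version 2 (RFC 3768) a backup waits for the master-down interval, which is 3 × advert_int plus a skew time of (256 − priority) / 256 seconds. The sketch below evaluates that formula. It assumes the instance runs VRRP version 2, and the function name is invented for this example.

```shell
#!/bin/sh
# Master-down interval in seconds, per the VRRPv2 formula.
# $1 = advert_int (seconds), $2 = priority of the backup node
master_down_interval() {
    LC_ALL=C awk -v adv="$1" -v prio="$2" \
        'BEGIN { printf "%.3f\n", 3 * adv + (256 - prio) / 256 }'
}

# Backup node from this guide: advert_int 1, priority 90.
master_down_interval 1 90   # prints 3.648
```

With the settings in this guide, the backup would therefore start taking over roughly 3.6 seconds after the last advertisement it received.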
virtual_server 192.168.10.29 3306 { # VIP address and port
delay_loop 6 # health-check interval (seconds)
lb_algo rr # LVS load-balancing algorithm (round robin)
lb_kind DR # LVS forwarding mode (direct routing)
nat_mask 255.255.255.0
persistence_timeout 50 # persistent-connection timeout (seconds)
protocol TCP # protocol type
real_server 192.168.10.30 3306 { # real server 1
weight 2 # weight
TCP_CHECK {
connect_timeout 3 # connection timeout (seconds)
nb_get_retry 3 # number of retries
delay_before_retry 3 # delay between retries (seconds)
connect_port 3306 # port to check
}
}
real_server 192.168.10.31 3306 { # real server 2
weight 2
TCP_CHECK {
connect_timeout 3
nb_get_retry 3
delay_before_retry 3
connect_port 3306
}
}
}
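Virtual server configuration
- virtual_server: defines the virtual server (VIP) and its port
- lb_algo: LVS load-balancing algorithm; rr means round robin
- lb_kind: LVS forwarding mode; DR means direct routing
- real_server: defines a real server (a backend MySQL server) and its health-check settings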
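To make the rr setting concrete, the sketch below shows plain round robin over a list of real servers: the n-th new connection goes to server n modulo the number of servers. It is a toy model with an invented function name. The actual scheduling happens inside the kernel's IPVS code, and it ignores persistence_timeout, which keeps a returning client on the same real server for a while.

```shell
#!/bin/sh
# Return the real server for the n-th connection (0-based) under round robin.
# $1 = connection index, remaining arguments = real servers
rr_pick() {
    n=$1
    shift
    shift $(( n % $# ))
    printf '%s\n' "$1"
}

rr_pick 0 192.168.10.30 192.168.10.31   # prints 192.168.10.30
rr_pick 1 192.168.10.30 192.168.10.31   # prints 192.168.10.31
rr_pick 2 192.168.10.30 192.168.10.31   # prints 192.168.10.30
```

Both real servers carry the same weight in this guide, so plain alternation is a reasonable mental model for them.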
4 Starting the Keepalived Service
systemctl start keepalived
[root@node-1 keepalived-2.2.8]# systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
Active: active (running) since Sun 2025-03-02 21:09:04 CST; 2s ago
Docs: man:keepalived(8)
man:keepalived.conf(5)
man:genhash(1)
https://keepalived.org
Process: 41974 ExecStart=/usr/local/keepalived/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 41975 (keepalived)
Tasks: 3
Memory: 928.0K
CGroup: /system.slice/keepalived.service
├─41975 /usr/local/keepalived/sbin/keepalived -D
├─41976 /usr/local/keepalived/sbin/keepalived -D
└─41977 /usr/local/keepalived/sbin/keepalived -D
Mar 02 21:09:04 node-1 Keepalived_vrrp[41977]: Assigned address 192.168.10.30 for interface ens33
Mar 02 21:09:04 node-1 Keepalived_healthcheckers[41976]: Gained quorum 1+0=1 <= 4 for VS [192.168.10.29]:tcp:3306
Mar 02 21:09:04 node-1 Keepalived_healthcheckers[41976]: Activating healthchecker for service [192.168.10.30]:tcp:3306 for VS [192.168.10.29]:tcp:3306
Mar 02 21:09:04 node-1 Keepalived_healthcheckers[41976]: Activating healthchecker for service [192.168.10.31]:tcp:3306 for VS [192.168.10.29]:tcp:3306
Mar 02 21:09:04 node-1 Keepalived_vrrp[41977]: Registering gratuitous ARP shared channel
Mar 02 21:09:04 node-1 Keepalived_vrrp[41977]: (VM_1) removing VIPs.
Mar 02 21:09:04 node-1 Keepalived[41975]: Startup complete
Mar 02 21:09:04 node-1 Keepalived_vrrp[41977]: (VM_1) Entering BACKUP STATE (init)
Mar 02 21:09:04 node-1 Keepalived_vrrp[41977]: VRRP sockpool: [ifindex( 2), family(IPv4), proto(112), fd(12,13) multicast, address(224.0.0.18)]
Mar 02 21:09:06 node-1 Keepalived_healthcheckers[41976]: TCP connection to [192.168.10.30]:tcp:3306 success.
[root@node-1 keepalived-2.2.8]#
5 Checking the VIP Binding
ip addr show ens33
[root@node-2 ~]# ip addr show ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:8f:5d:8e brd ff:ff:ff:ff:ff:ff
inet 192.168.10.31/24 brd 192.168.10.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 192.168.10.29/24 scope global secondary ens33:0
valid_lft forever preferred_lft forever
[root@node-2 ~]#
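The same check can be scripted instead of read by eye. The sketch below looks for the VIP in `ip addr` output supplied on standard input. `has_vip` is a name invented for this example, and the sample text is taken from the output shown above so the snippet runs on any machine.

```shell
#!/bin/sh
# Succeed if the given IPv4 address appears in `ip addr` output read from stdin.
# The leading "inet " and trailing "/" keep partial addresses from matching.
has_vip() {
    grep -Fq "inet $1/"
}

sample='    inet 192.168.10.31/24 brd 192.168.10.255 scope global noprefixroute ens33
    inet 192.168.10.29/24 scope global secondary ens33:0'

printf '%s\n' "$sample" | has_vip 192.168.10.29 && echo "VIP present"   # prints VIP present
```

On a real node the input would come from the live command, for example `ip addr show ens33 | has_vip 192.168.10.29`.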
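6 Triggering a Failover Manually
Stop the Keepalived service on the master node by hand and check whether the VIP moves to the backup node.
## Run the stop command on the master node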
[root@node-2 ~]# systemctl stop keepalived
[root@node-2 ~]#
## Check on the backup node whether the VIP is now bound
[root@node-1 keepalived-2.2.8]# ip addr show ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:e5:d1:9d brd ff:ff:ff:ff:ff:ff
inet 192.168.10.30/24 brd 192.168.10.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 192.168.10.29/24 scope global secondary ens33:0
valid_lft forever preferred_lft forever
[root@node-1 keepalived-2.2.8]#
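The VIP does not move at the exact moment the master stops, so a scripted failover test usually needs to poll for a few seconds. The helper below is a generic sketch with an invented name. It retries any command once per second until it succeeds or a timeout expires.

```shell
#!/bin/sh
# Poll a command once per second until it succeeds or the timeout is reached.
# $1 = timeout in seconds, remaining arguments = command to run
wait_until() {
    timeout=$1
    shift
    elapsed=0
    while ! "$@"; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Trivial demo: `true` succeeds on the first attempt.
wait_until 2 true && echo "condition met"   # prints condition met
```

On the backup node this could be combined with a VIP test, for example by defining `vip_here() { ip addr show ens33 | grep -Fq "inet 192.168.10.29/"; }` and running `wait_until 10 vip_here`. The interface name and address are the ones used in this guide.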
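7 Appendix
7.1 MySQL Health Check Script
#!/bin/bash
# Runs every 10 seconds (set in the keepalived configuration file) and checks that the mysqld process exists and the server responds; if the checks keep failing, it stops the keepalived service so this node stops serving traffic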
user=root
pass=La12345678
sockfile=/tmp/mysql.sock
logfile=/var/log/keepalived/mysql_check.log
count=1
while true
do
mysql -u${user} -p${pass} -S ${sockfile} -e "show status;" > /dev/null 2>&1
i=$?
ps aux | grep mysqld | grep -v grep > /dev/null 2>&1
j=$?
if [ $i = 0 ] && [ $j = 0 ]
then
exit 0
else
if [ $count -gt 5 ]
then
echo "$(date): MySQL check failed, stopping Keepalived" >> $logfile
systemctl stop keepalived
exit 1
fi
let count++
sleep 10
continue
fi
done
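7.2 Keepalived Monitoring Script
#!/bin/bash
# Monitors keepalived. If the keepalived service is running, the script exits without doing anything. If it is not, the script checks the MySQL process and status every 5 seconds, up to 10 times, and starts keepalived again once both are healthy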
user=root
pass=La12345678
sockfile=/tmp/mysql.sock
logfile=/var/log/keepalived/monitor.log
count=1
while true
do
# Check for a Keepalived process
h=$(ps aux | grep keepalived | grep -v grep | wc -l)
# If a Keepalived process exists, exit the script
if [ $h -gt 0 ]
then
echo "$(date): Keepalived is running, exiting" >> $logfile
exit 0
fi
# Check the MySQL service status
mysql -u${user} -p${pass} -S ${sockfile} -e "show status;" > /dev/null 2>&1
i=$?
ps aux | grep mysqld | grep -v grep > /dev/null 2>&1
j=$?
# If MySQL is healthy, start Keepalived again
if [ $i = 0 ] && [ $j = 0 ]
then
echo "$(date): MySQL is healthy, restarting Keepalived" >> $logfile
systemctl start keepalived
exit 0
else
# If more than 10 retries have been made, exit the script
if [ $count -gt 10 ]
then
echo "$(date): MySQL check failed after 10 retries, exiting" >> $logfile
exit 1
fi
# Wait 5 seconds before retrying
sleep 5
let count++
continue
fi
done