Introduction to single points of failure and high availability
Single point of failure (SPOF): a critical function exists in only one copy; if that one point fails, the whole system becomes unusable.
Single point of failure: in essence, there is only one copy, with no backup.
Single point of failure: a critical application runs on only 1 node; if that node fails, the service becomes unavailable.
High Availability (HA): everything has a backup; if one fails, another takes over, and core business is essentially unaffected.
High Availability (HA) means improving the availability of systems and applications by minimizing the downtime caused by routine maintenance (planned) and sudden system crashes (unplanned). It is related to, but distinct from, the fault-tolerance techniques used for non-stop operation. HA systems are an enterprise's most effective means of preventing core computer systems from going down due to failures.
High Availability: at least 2 nodes provide the service and back each other up; if one fails, the other can take over.
Disaster recovery --> deploy more machines --> cost increases
master: the primary node, the one serving traffic
backup: the standby node; it does not serve traffic while the master is healthy. Once the master fails, the backup immediately takes over the master's work and becomes the new master.
HA software: keepalived, HAProxy, heartbeat
Case study
Architecture diagram
Environment: 2 load balancers (CentOS 7.9)
1. Install nginx on both load balancers; nginx performs layer-7 load balancing
Compile and install nginx with a one-step script
[root@lb-1 ~]# nginx
[root@lb-1 ~]# ps aux|grep nginx
root 1379 0.0 0.0 47268 1012 ? Ss 14:57 0:00 nginx: master process nginx
hanwei 1380 0.2 0.0 48144 3980 ? S 14:57 0:00 nginx: worker process
hanwei 1381 0.2 0.0 48144 3980 ? S 14:57 0:00 nginx: worker process
root 1383 0.0 0.0 112832 2368 pts/0 S+ 14:57 0:00 grep --color=auto nginx
[root@lb2 ~]# ps aux|grep nginx
root 1444 0.0 0.0 47268 1012 ? Ss 14:57 0:00 nginx: master process nginx
hanwei 1445 0.0 0.0 48144 4164 ? S 14:57 0:00 nginx: worker process
hanwei 1446 0.0 0.0 48144 4164 ? S 14:57 0:00 nginx: worker process
root 1449 0.0 0.0 112832 2312 pts/0 S+ 14:57 0:00 grep --color=auto nginx
2. Edit the /etc/hosts file on the client (note the format: IP address first, then hostname)
vim /etc/hosts
192.168.227.188 www.sc.com
192.168.227.199 www.sc.com
3. The user accesses www.sc.com
curl www.sc.com
4. Install the keepalived software on both load balancers
[root@lb-1 conf]# yum install keepalived -y
[root@lb2 conf]# yum install keepalived -y
keepalived architectures:
Single-VIP architecture: only the master holds the VIP; the backup has none. The master is busy while the backup sits idle, so hardware utilization is low.
Dual-VIP architecture: start 2 VRRP instances on each machine, one acting as master and one as backup, with 2 VIPs; each machine holds one VIP and both VIPs serve traffic. This avoids the one-busy-one-idle situation of the single-VIP setup and improves hardware utilization.
Single-VIP architecture:
1. Edit the configuration file
[root@lb-1 conf]# cd /etc/keepalived/
[root@lb-1 keepalived]# ls
keepalived.conf
[root@lb-1 keepalived]# cat keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr
   #vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}

vrrp_instance VI_1 {
    state MASTER
    interface ens33
    virtual_router_id 58
    priority 120
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.227.188
    }
}
[root@lb2 keepalived]# cat keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr
   #vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 58
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.227.188
    }
}
# Restart the keepalived service
[root@lb-1 keepalived]# service keepalived restart
Redirecting to /bin/systemctl restart keepalived.service
[root@lb2 keepalived]# service keepalived restart
Redirecting to /bin/systemctl restart keepalived.service
[root@lb2 keepalived]# ps aux|grep keepa
root 1708 0.0 0.0 123020 2032 ? Ss 16:14 0:00 /usr/sbin/keepalived -D
root 1709 0.0 0.1 133992 7892 ? S 16:14 0:00 /usr/sbin/keepalived -D
root 1712 0.0 0.1 133860 6160 ? S 16:14 0:00 /usr/sbin/keepalived -D
root 1719 0.0 0.0 112832 2392 pts/0 S+ 16:14 0:00 grep --color=auto keepa
2. On the lb-1 server we can see the VIP
[root@lb-1 keepalived]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:37:fb:39 brd ff:ff:ff:ff:ff:ff
inet 192.168.227.144/24 brd 192.168.227.255 scope global noprefixroute dynamic ens33
valid_lft 1075sec preferred_lft 1075sec
inet 192.168.227.188/32 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::fa78:740d:a3e5:50c4/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::7e6:9c4b:c52f:f311/64 scope link noprefixroute
valid_lft forever preferred_lft forever
Dual-VIP architecture, step by step:
1. Enable 2 VRRP instances on each machine
[root@lb-1 keepalived]# pwd
/etc/keepalived
[root@lb-1 keepalived]# ls
keepalived.conf
[root@lb-1 keepalived]# vim keepalived.conf
[root@lb-1 keepalived]# cat keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr
   #vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}

vrrp_instance VI_1 {
    state MASTER
    interface ens33
    virtual_router_id 59
    priority 120
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.227.188
    }
}

vrrp_instance VI_2 {
    state BACKUP
    interface ens33
    virtual_router_id 60
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.227.199
    }
}
# Configuration on the second machine
[root@lb2 keepalived]# cat keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr
   #vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 59
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.227.188
    }
}

vrrp_instance VI_2 {
    state MASTER
    interface ens33
    virtual_router_id 60
    priority 120
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.227.199
    }
}
# Restart the keepalived service
[root@lb-1 keepalived]# service keepalived restart
Redirecting to /bin/systemctl restart keepalived.service
[root@lb2 keepalived]# service keepalived restart
Redirecting to /bin/systemctl restart keepalived.service
Each machine now holds one VIP that serves traffic
[root@lb2 keepalived]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:22:26:bd brd ff:ff:ff:ff:ff:ff
inet 192.168.227.148/24 brd 192.168.227.255 scope global noprefixroute dynamic ens33
valid_lft 1098sec preferred_lft 1098sec
inet 192.168.227.199/32 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::fa78:740d:a3e5:50c4/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::7e6:9c4b:c52f:f311/64 scope link noprefixroute
valid_lft forever preferred_lft forever
inet6 fe80::fdf9:8436:1c1:d532/64 scope link noprefixroute
valid_lft forever preferred_lft forever
[root@lb-1 keepalived]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:37:fb:39 brd ff:ff:ff:ff:ff:ff
inet 192.168.227.144/24 brd 192.168.227.255 scope global noprefixroute dynamic ens33
valid_lft 1415sec preferred_lft 1415sec
inet 192.168.227.188/32 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::fa78:740d:a3e5:50c4/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::7e6:9c4b:c52f:f311/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::fdf9:8436:1c1:d532/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
When keepalived starts normally, it runs 3 processes:
a parent process that monitors its children, a VRRP child process, and a checker child process;
both children are supervised by the system watchdog, and each handles its own duties.
The healthcheck (checker) child checks the health of the services on its own server, e.g. HTTP or LVS. If it detects that a service on the master has become unavailable, it notifies the local VRRP child to stop sending advertisements, release the virtual IP, and transition to the BACKUP state.
If the nginx process on a load balancer breaks, is keepalived still useful?
keepalived's value rests on nginx working normally. If nginx is down, this machine is no longer a functioning load balancer; it should give up its master role by lowering its priority and yielding to another machine. This requires a health-check mechanism behind the scenes.
Exercise: monitor whether the local nginx process is running; if it is not, immediately lower the priority by 30 and observe whether the VIP drifts.
How to tell whether nginx is running:
1. pidof nginx
2. killall -0 nginx
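The two checks above can be sketched in a self-contained way. Here `sleep` stands in for nginx (an assumption so the demo runs anywhere); the technique is identical for any process name:

```shell
#!/bin/bash
# Two common ways to test whether a named process is running.
# "sleep" is a stand-in for nginx so the demo is self-contained.

is_running_pidof() {
    # pidof exits 0 when at least one process with this name exists
    pidof "$1" >/dev/null
}

is_running_kill0() {
    # kill -0 sends no signal; it only checks that the PID is alive.
    # "killall -0 <name>" (from psmisc) does the same lookup by name.
    local pid
    pid=$(pidof "$1") || return 1
    kill -0 $pid 2>/dev/null
}

sleep 30 &                                     # throwaway test process
is_running_pidof sleep && echo "pidof: sleep is running"
is_running_kill0 sleep && echo "kill -0: sleep is alive"
kill $!                                        # clean up
```

Both methods report status purely through their exit code, which is exactly what keepalived's vrrp_script consumes.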
1. Write the nginx monitoring script
[root@lb-1 nginx]# pwd
/nginx
[root@lb-1 nginx]# cat check_nginx.sh
#!/bin/bash
# Check whether nginx is running: exit 0 if it is, 1 if it is not
if /usr/sbin/pidof nginx &>/dev/null; then
    exit 0
else
    exit 1
fi
[root@lb-1 nginx]# chmod +x check_nginx.sh
[root@lb-1 nginx]# ll
total 4
-rwxr-xr-x 1 root root 102 Dec 21 15:01 check_nginx.sh
keepalived judges whether the script succeeded by its exit status:
0 = success
non-zero = failure
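A minimal sketch of what keepalived does with that exit status, using this section's numbers (base priority 120, weight -30); this is illustrative shell, not keepalived's actual code:

```shell
#!/bin/bash
# Sketch of keepalived's weight handling: a vrrp_script exit status of 0
# leaves the priority alone; any non-zero status applies the negative weight.

priority=120           # base priority of the MASTER
weight=-30             # weight from the vrrp_script block

check() { return 1; }  # simulate a failed health check (non-zero exit)

if check; then
    :                                  # 0 = success: priority unchanged
else
    priority=$(( priority + weight ))  # failure: 120 - 30 = 90
fi
echo "effective priority: $priority"   # 90 is below the backup's 100
```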
2. Write the script on both load balancers and make it executable
[root@lb2 keepalived]# mkdir /nginx
[root@lb2 keepalived]# cd /nginx/
[root@lb2 nginx]# ls
[root@lb2 nginx]# vim check_nginx.sh
[root@lb2 nginx]# ll
total 4
-rw-r--r-- 1 root root 128 Dec 21 15:06 check_nginx.sh
[root@lb2 nginx]# chmod +x check_nginx.sh
[root@lb2 nginx]# ll
total 4
-rwxr-xr-x 1 root root 128 Dec 21 15:06 check_nginx.sh
3. Define the monitoring script in keepalived
# Define the monitoring script chk_nginx
vrrp_script chk_nginx {
    # When /nginx/check_nginx.sh exits 0, the "weight -30" below is not applied;
    # only when the script fails (non-zero exit) is the priority reduced by 30
    script "/nginx/check_nginx.sh"
    interval 1
    weight -30
}
4. Reference the monitoring script in keepalived
[root@lb-1 keepalived]# cat keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr
   #vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}

# Define the monitoring script chk_nginx
vrrp_script chk_nginx {
    # When /nginx/check_nginx.sh exits 0, the "weight -30" below is not applied;
    # only when the script fails (non-zero exit) is the priority reduced by 30
    script "/nginx/check_nginx.sh"
    interval 1
    weight -30
}

vrrp_instance VI_1 {
    state MASTER
    interface ens33
    virtual_router_id 59
    priority 120
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.227.188
    }
    # Call the monitoring script
    track_script {
        chk_nginx
    }
}

vrrp_instance VI_2 {
    state BACKUP
    interface ens33
    virtual_router_id 60
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.227.199
    }
}
When this node takes on a given role, we can run a script:
# notify_master: script run after the state changes to MASTER
notify_master /nginx/master.sh
# notify_backup: script run after the state changes to BACKUP
notify_backup /nginx/backup.sh
# notify_stop: script run after VRRP stops
notify_stop /nginx/stop.sh
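A hypothetical minimal notify script; the file name, log path, and log format here are assumptions, since keepalived simply executes whatever file the notify_* option points at:

```shell
#!/bin/bash
# Hypothetical /nginx/master.sh: record the state transition with a
# timestamp. A backup.sh twin would log "became BACKUP" instead.

logfile=/tmp/keepalived_state.log
echo "$(date '+%F %T') $(hostname) became MASTER" >> "$logfile"
```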
If the nginx process is detected as down, how do we shut down keepalived itself?
Step 1: write the script
[root@lb-1 nginx]# pwd
/nginx
[root@lb-1 nginx]# ls
check_nginx.sh halt_keepalived.sh
[root@lb-1 nginx]# cat halt_keepalived.sh
#!/bin/bash
service keepalived stop
Step 2: call the script from the VRRP instance with notify_backup
# When this machine becomes backup, run the script below immediately
notify_backup "/nginx/halt_keepalived.sh"
# Called from within the VRRP instance
vrrp_instance VI_1 {
    state MASTER
    interface ens33
    virtual_router_id 59
    priority 120
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.227.188
    }
    # Call the monitoring script
    track_script {
        chk_nginx
    }
    # When this machine becomes backup, run the script below immediately
    notify_backup "/nginx/halt_keepalived.sh"
}
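To reason about whether the VIP drifts after a failed check, compare the adjusted priorities. A small sketch under this section's numbers (lb-1 at base priority 120 with weight -30 applied, lb2 at 100):

```shell
#!/bin/bash
# VRRP election sketch: the higher priority wins and holds the VIP.
# Numbers assume nginx just died on lb-1, so chk_nginx subtracted 30.

lb1=$(( 120 - 30 ))   # lb-1 after the failed check -> 90
lb2=100               # lb2 unchanged

if [ "$lb1" -gt "$lb2" ]; then
    echo "lb-1 keeps the VIP"
else
    echo "VIP drifts to lb2"   # 90 < 100, so lb2 preempts
fi
```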