Lab environment: RHEL 7
High-availability cluster nodes: server1, server2
pacemaker
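Pacemaker is a cluster resource manager. It uses the messaging and membership capabilities of the underlying cluster infrastructure (OpenAIS or Heartbeat) to detect and recover from node-level and resource-level failures, which makes cluster services (also called resources) highly available.
Note: the name "pacemaker" literally refers to a cardiac pacemaker, but Pacemaker does not send heartbeats itself. It is only the cluster manager.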
corosync
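Corosync is part of the cluster management stack and is normally combined with a resource manager such as Pacemaker. A simple configuration file defines how messages are passed and which protocols are used, and Corosync carries the HA heartbeat (membership) messages between the nodes.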
Environment preparation (server1, server2)
- Hostname resolution
- Password-less SSH between server1 and server2
- firewalld and SELinux disabled
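The original does not show the commands for these steps. A minimal bash sketch of them, assuming server1 is 172.25.5.1 and server2 is 172.25.5.2 (the addresses that appear in the `ip addr` output later in this article). The function names and the DRY_RUN switch are illustrative helpers, not part of any tool:

```shell
#!/usr/bin/env bash
# Sketch of the preparation steps; run as root on server1 and on server2.
# Assumption: server1=172.25.5.1, server2=172.25.5.2 (from the ip addr output in this article).
# With DRY_RUN=1 the system-changing commands are printed instead of executed.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Hostname resolution: append each "IP name" pair unless the name is already present
add_hosts() {
  local hosts_file=${1:-/etc/hosts}
  local entry
  for entry in "172.25.5.1 server1" "172.25.5.2 server2"; do
    # ${entry#* } strips the IP, leaving the hostname; -w matches it as a whole word
    grep -qw "${entry#* }" "$hosts_file" || echo "$entry" >> "$hosts_file"
  done
}

# 2. Password-less SSH to a node (ssh-copy-id asks for that node's root password once)
setup_ssh() {
  local peer=$1
  [ -f ~/.ssh/id_rsa ] || run ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
  run ssh-copy-id "root@$peer"
}

# 3. Stop firewalld now and at boot; SELinux permissive now, disabled after the next reboot
disable_security() {
  run systemctl disable --now firewalld
  run setenforce 0 || true   # fails harmlessly if SELinux is already disabled
  run sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
}
```

For example, on server1: `add_hosts; setup_ssh server1; setup_ssh server2; disable_security`, then the same on server2. The change in /etc/selinux/config only takes full effect after a reboot; `setenforce 0` covers the current boot.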
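Software installation (server1, server2)
On RHEL, the installation image ships additional add-on repositories for the High Availability and Resilient Storage suites.
They are located in the addons directory of the image.
Therefore, a yum repository has to be configured for them (server1, server2):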
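The repository file itself is not shown in the original. A sketch that writes one, assuming the RHEL 7 installation tree is served at the hypothetical URL http://172.25.5.250/rhel7.6 (replace it with your own HTTP server or a `file://` path to the mounted ISO). The repository section names are arbitrary:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical location of the RHEL 7 installation tree; adjust to your environment.
BASE_URL=${BASE_URL:-http://172.25.5.250/rhel7.6}

# Write a .repo file with the base OS repository plus the two add-ons from the image
write_ha_repo() {
  local repo_file=${1:-/etc/yum.repos.d/rhel7.repo}
  cat > "$repo_file" <<EOF
[rhel7]
name=rhel7
baseurl=$BASE_URL
gpgcheck=0

[HighAvailability]
name=HighAvailability
baseurl=$BASE_URL/addons/HighAvailability
gpgcheck=0

[ResilientStorage]
name=ResilientStorage
baseurl=$BASE_URL/addons/ResilientStorage
gpgcheck=0
EOF
}
```

After writing the file on both nodes, `yum clean all && yum repolist` should list the HighAvailability and ResilientStorage repositories.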
Then install the packages (server1, server2):
yum install pacemaker corosync -y
Install the cluster management tool pcs and its dependencies psmisc and policycoreutils-python.
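The text names the packages without showing the command. A sketch that installs everything from this section on both nodes over SSH, assuming root can log in to server1 and server2 without a password (including from a node to itself). `install_cluster_pkgs` is a hypothetical helper:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

NODES=(server1 server2)
# pacemaker and corosync, plus pcs and the dependencies named in the text
CLUSTER_PKGS=(pacemaker corosync pcs psmisc policycoreutils-python)

install_cluster_pkgs() {
  local node
  for node in "${NODES[@]}"; do
    # relies on the password-less SSH configured during environment preparation
    run ssh "root@$node" yum install -y "${CLUSTER_PKGS[@]}"
  done
}
```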
The pcs command-line tool talks to the pcsd service, so start pcsd and enable it at boot:
[root@server1 ~]# systemctl enable pcsd --now
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
Creating the cluster
Setting the authentication password (server1, server2)
Set a password for the user hacluster, which pcs uses for node authentication:
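The command for this step is not shown. One way to set the same password on both nodes, using the RHEL-specific `passwd --stdin` option over the password-less SSH configured earlier. `set_hacluster_password` is a hypothetical helper, and the password you pass in is your own choice:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Set the same hacluster password on server1 and server2
set_hacluster_password() {
  local password=$1 node
  for node in server1 server2; do
    # passwd --stdin reads the new password from standard input (fed by the here-string),
    # so the password never appears on the command line
    run ssh "root@$node" passwd --stdin hacluster <<<"$password"
  done
}
```

With the password set, `pcs cluster auth server1 server2 -u hacluster -p <password>` authenticates without prompting.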
Authenticating server1 and server2
[root@server1 ~]# pcs cluster auth server1 server2
server1: Already authorized
server2: Already authorized
Adding server1 and server2 to the cluster
pcs cluster setup --name mycluster server1 server2
Starting the cluster
[root@server1 ~]# pcs cluster start --all
server1: Starting Cluster (corosync)...
server2: Starting Cluster (corosync)...
server1: Starting Cluster (pacemaker)...
server2: Starting Cluster (pacemaker)...
Verifying the cluster (ring) status
[root@server1 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 172.25.5.1
status = ring 0 active with no faults
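Reading `corosync-cfgtool -s` by eye works for two nodes; in a script, a small check can fail unless every ring reports `no faults`. This is a sketch: `ring_ok` is a hypothetical helper, and it only looks at the `status = ...` lines in the format shown above:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Read corosync-cfgtool -s output on stdin. Succeed only if at least one ring
# status line is present and every status line reports "no faults".
ring_ok() {
  awk '
    /^[ \t]*status[ \t]*=/ {
      seen = 1
      if ($0 !~ /no faults/) bad = 1
    }
    END { if (seen && !bad) exit 0; exit 1 }
  '
}
```

Run it on each node, e.g. `corosync-cfgtool -s | ring_ok && echo "ring OK"`.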
Viewing the cluster status
[root@server1 ~]# pcs cluster status
Cluster Status:
Stack: corosync
Current DC: server1 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Thu Aug 6 12:00:02 2020
Last change: Wed Aug 5 09:10:47 2020 by root via cibadmin on server2
2 nodes configured
2 resources configured
PCSD Status:
server1: Online
server2: Online
The high-availability cluster has been created successfully.
Placing resources
pcs resource create vip ocf:heartbeat:IPaddr2 ip=172.25.5.99 op monitor interval=30s
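This adds a resource named vip (IP address 172.25.5.99) to the cluster and monitors it every 30 s.
ocf:heartbeat:IPaddr2 is the resource agent (start script) used for this resource.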
[root@server1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: server1 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Thu Aug 6 13:35:45 2020
Last change: Thu Aug 6 13:35:27 2020 by root via cibadmin on server1
2 nodes configured
1 resource configured
Online: [ server1 server2 ]
Full list of resources:
vip (ocf::heartbeat:IPaddr2): Started server2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
As shown, the vip has been placed on server2:
[root@server2 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:12:90:b5 brd ff:ff:ff:ff:ff:ff
inet 172.25.5.2/16 brd 172.25.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet 172.25.5.99/16 brd 172.25.255.255 scope global secondary eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe12:90b5/64 scope link
valid_lft forever preferred_lft forever
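The same check can be scripted while repeating the failover tests below. A sketch, where `has_vip` is a hypothetical helper that reads `ip addr show` output on standard input:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeed if the `ip addr show` output on stdin carries the given IPv4 address.
# -F matches literally (the dots are not regex wildcards), and the trailing "/"
# keeps 172.25.5.9 from matching 172.25.5.99. grep without -q reads all of its
# input, so the producer in a pipeline is never cut off early.
has_vip() {
  local vip=$1
  grep -F "inet ${vip}/" >/dev/null
}
```

For example: `ip addr show dev eth0 | has_vip 172.25.5.99 && echo "vip is on $(hostname)"`.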
- Stopping the cluster services on server2 makes server1 take over the vip:
[root@server1 ~]# pcs cluster stop server2
server2: Stopping Cluster (pacemaker)...
server2: Stopping Cluster (corosync)...
[root@server1 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:49:4e:8f brd ff:ff:ff:ff:ff:ff
inet 172.25.5.1/16 brd 172.25.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet 172.25.5.99/16 brd 172.25.255.255 scope global secondary eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe49:4e8f/64 scope link
valid_lft forever preferred_lft forever
[root@server1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: server1 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Thu Aug 6 13:43:13 2020
Last change: Thu Aug 6 13:35:27 2020 by root via cibadmin on server1
2 nodes configured
1 resource configured
Online: [ server1 ]
OFFLINE: [ server2 ]
Full list of resources:
vip (ocf::heartbeat:IPaddr2): Started server1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
The vip is no longer present on server2:
[root@server2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:12:90:b5 brd ff:ff:ff:ff:ff:ff
inet 172.25.5.2/16 brd 172.25.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe12:90b5/64 scope link
valid_lft forever preferred_lft forever
- If the vip is deleted manually, the cluster re-creates it automatically:
[root@server1 ~]# ip addr del 172.25.5.99 dev eth0
Warning: Executing wildcard deletion to stay compatible with old scripts.
Explicitly specify the prefix length (172.25.5.99/32) to avoid this warning.
This special behaviour is likely to disappear in further releases,
fix your scripts!
[root@server1 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:49:4e:8f brd ff:ff:ff:ff:ff:ff
inet 172.25.5.1/16 brd 172.25.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe49:4e8f/64 scope link
valid_lft forever preferred_lft forever
[root@server1 ~]#
[root@server1 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:49:4e:8f brd ff:ff:ff:ff:ff:ff
inet 172.25.5.1/16 brd 172.25.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet 172.25.5.99/16 brd 172.25.255.255 scope global secondary eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fe49:4e8f/64 scope link
valid_lft forever preferred_lft forever
When the monitor operation finds that the vip is missing, Pacemaker calls the resource agent ocf:heartbeat:IPaddr2 to bring the vip back.
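Recovery is not instantaneous: a missing vip is only noticed at the next monitor run (`interval=30s` in the resource definition), so a check right after `ip addr del` can still show it missing. A sketch of a polling helper; `wait_until` is hypothetical and simply retries any command until it succeeds or a timeout in seconds expires:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Re-run "$@" every <interval> seconds until it succeeds; give up after <timeout> seconds.
wait_until() {
  local timeout=$1 interval=$2 waited=0
  shift 2
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

For example, after deleting the vip: `wait_until 60 5 sh -c 'ip addr show dev eth0 | grep -qF "inet 172.25.5.99/"' && echo "vip is back"`. A timeout of about two monitor intervals leaves room for the restart itself.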
- Network failure (take down the NIC of the node that holds the vip):
ifdown eth0
The vip then moves to server2:
[root@server2 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: server2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Thu Aug 6 14:01:20 2020
Last change: Thu Aug 6 13:35:27 2020 by root via cibadmin on server1
2 nodes configured
1 resource configured
Online: [ server2 ]
OFFLINE: [ server1 ]
Full list of resources:
vip (ocf::heartbeat:IPaddr2): Started server2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
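Resource groups
If a service needs both httpd and the vip, creating them as independent resources as above may place httpd and the vip on different servers. Use a resource group instead.
Resource for a single service
Available resource standards (the ways a resource agent can be invoked):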
[root@server1 ~]# pcs resource standards
lsb
ocf
service
systemd
To run Apache as a cluster resource, use the systemd standard:
pcs resource create apache systemd:httpd op monitor interval=1min
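The `curl 172.25.5.99` test later in this section returns the name of the node that answered, which implies that httpd is installed on both nodes and that each one serves a page containing its own hostname; that setup is not shown. A sketch, assuming the default DocumentRoot /var/www/html; `write_index` and `prepare_httpd` are hypothetical helpers. httpd is deliberately left disabled at boot because the cluster starts and stops it through `systemd:httpd`:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Write an index page containing the node name (defaults to the short hostname)
write_index() {
  local docroot=${1:-/var/www/html} name=${2:-$(hostname -s)}
  mkdir -p "$docroot"
  echo "$name" > "$docroot/index.html"
}

# Run on server1 and server2
prepare_httpd() {
  run yum install -y httpd
  write_index
  # Not enabled at boot: Pacemaker decides where and when httpd runs
  run systemctl disable httpd
}
```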
[root@server1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: server2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Thu Aug 6 14:31:20 2020
Last change: Thu Aug 6 14:31:06 2020 by root via cibadmin on server1
2 nodes configured
2 resources configured
Online: [ server1 server2 ]
Full list of resources:
vip (ocf::heartbeat:IPaddr2): Started server2
apache (systemd:httpd): Started server1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
As shown, apache and vip are running on different servers, so httpd cannot be reached through the vip.
Creating the resource group
pcs resource group add webgroup vip apache
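Note: resources in a group always run on the same node and start in the listed order (vip first, then apache); they stop in reverse order.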
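To confirm from a script that the group keeps its members together, the `Started <node>` entries in `pcs status` can be compared. A sketch; `resource_nodes` and `colocated` are hypothetical helpers that parse the output format shown in this article:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Read `pcs status` output on stdin and print, without duplicates, the nodes on
# which the named resources are reported as Started.
resource_nodes() {
  awk -v names="$*" '
    BEGIN { n = split(names, list, " "); for (i = 1; i <= n; i++) wanted[list[i]] = 1 }
    ($1 in wanted) && /Started/ { print $NF }
  ' | sort -u
}

# Succeed if all named resources are started on exactly one node
colocated() {
  [ "$(resource_nodes "$@" | wc -l)" -eq 1 ]
}
```

For example: `pcs status | colocated vip apache && echo "vip and apache are on the same node"`. Before the group was created (apache on server1, vip on server2) this check fails; after `pcs resource group add webgroup vip apache` it succeeds.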
[root@server1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: server2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Thu Aug 6 14:34:23 2020
Last change: Thu Aug 6 14:33:26 2020 by root via cibadmin on server1
2 nodes configured
2 resources configured
Online: [ server1 server2 ]
Full list of resources:
Resource Group: webgroup
vip (ocf::heartbeat:IPaddr2): Started server2
apache (systemd:httpd): Started server2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@server1 ~]# curl 172.25.5.99
server2
Now stop server2; vip and apache migrate together to server1:
[root@server1 ~]# pcs cluster stop server2
server2: Stopping Cluster (pacemaker)...
server2: Stopping Cluster (corosync)...
[root@server1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: server1 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Thu Aug 6 14:39:00 2020
Last change: Thu Aug 6 14:33:26 2020 by root via cibadmin on server1
2 nodes configured
2 resources configured
Online: [ server1 ]
OFFLINE: [ server2 ]
Full list of resources:
Resource Group: webgroup
vip (ocf::heartbeat:IPaddr2): Started server1
apache (systemd:httpd): Started server1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@server1 ~]# curl 172.25.5.99
server1