A Pacemaker cluster on Linux

I. Introduction to Pacemaker

Pacemaker is a cluster resource manager. It uses resource-level monitoring and recovery to keep cluster services (a.k.a. resources) as available as possible. It relies on the infrastructure layer you are comfortable with (Corosync or Heartbeat) for messaging and membership management.

As the resource manager of a Linux high-availability (HA) stack, Pacemaker sits at the resource management / resource agent layer of the cluster architecture; it does not carry the heartbeat traffic itself (that is handled here by Corosync, and Corosync is not the only option for the heartbeat layer: Heartbeat can be used instead). Pacemaker manages resources through scripts: the handling of a given service can be written in shell, Python and so on, and when the same service runs on several nodes, a failure of the service on one node is detected by Pacemaker through the resource agent script, which reports the service as unavailable on that node.
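To illustrate the script-based model (this is not part of the lab, and myapp is a made-up service name), an LSB-style agent is essentially an init script that answers start/stop/status with meaningful exit codes:

#!/bin/bash
# /etc/init.d/myapp -- sketch of an LSB-style agent for a made-up service "myapp"
# Pacemaker (as lsb:myapp) only calls start, stop and status and judges health by the exit code.
PIDFILE=/var/run/myapp.pid
case "$1" in
    start)
        /usr/local/bin/myapp &
        echo $! > "$PIDFILE"
        ;;
    stop)
        [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")"
        rm -f "$PIDFILE"
        ;;
    status)
        # exit 0 = running, non-zero = stopped; this is what a monitor operation checks
        [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")"
        ;;
    *)
        echo "Usage: $0 {start|stop|status}"
        exit 1
        ;;
esac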

Lab environment:

server1:172.25.37.1

server2:172.25.37.2

server3: 172.25.37.3 

II. Building the Pacemaker cluster

1. Setting up the yum repositories

server1:
[root@server1 ~]# vim /etc/yum.repos.d/rhel-source.repo

[rhel-source]
name=Red Hat Enterprise Linux $releasever - $basearch - Source
baseurl=http://172.25.37.250/rhel6.5
enabled=1
gpgcheck=0

[HighAvailability]
name=Red Hat Enterprise Linux HighAvailability
baseurl=http://172.25.37.250/rhel6.5/HighAvailability
enabled=1
gpgcheck=0

[LoadBalancer]
name=Red Hat Enterprise Linux LoadBalancer
baseurl=http://172.25.37.250/rhel6.5/LoadBalancer
enabled=1
gpgcheck=0

[ResilientStorage]
name=Red Hat Enterprise Linux ResilientStorage
baseurl=http://172.25.37.250/rhel6.5/ResilientStorage
enabled=1
gpgcheck=0

[ScalableFileSystem]
name=Red Hat Enterprise Linux ScalableFileSystem
baseurl=http://172.25.37.250/rhel6.5/ScalableFileSystem
enabled=1
gpgcheck=0
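A quick sanity check (not in the original transcript) confirms that the extra channels are usable:

[root@server1 ~]# yum clean all
[root@server1 ~]# yum repolist     # HighAvailability, LoadBalancer, etc. should be listed with a package count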

[root@server1 ~]# scp /etc/yum.repos.d/rhel-source.repo root@172.25.37.2:/etc/yum.repos.d/rhel-source.repo   # copy the repo file to server2


[root@server1 ~]# yum install pacemaker -y
[root@server1 ~]# scp root@172.25.37.250:/home/kiosk/desktop/pssh-2.3.1-2.1.x86_64.rpm root@172.25.37.250:/home/kiosk/desktop/crmsh-1.2.6-0.rc2.2.1.x86_64.rpm /root/
[root@server1 ~]# yum install pssh-2.3.1-2.1.x86_64.rpm crmsh-1.2.6-0.rc2.2.1.x86_64.rpm -y
[root@server1 ~]# cd /etc/corosync/
[root@server1 corosync]# ls
[root@server1 corosync]# cp corosync.conf.example corosync.conf
[root@server1 corosync]# vim corosync.conf
[root@server1 corosync]# scp corosync.conf root@172.25.37.2:/etc/corosync/
[root@server1 corosync]# /etc/init.d/corosync start
[root@server2 corosync]# /etc/init.d/corosync start
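The corosync.conf edit above is not shown; the parts that matter are the bind network and the service stanza that makes corosync start pacemaker in plugin mode (ver: 0, which matches the "classic openais (with plugin)" stack reported by crm_mon later). A minimal sketch, keeping the example file's default multicast address and port:

# /etc/corosync/corosync.conf (only the parts that matter here)
totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 172.25.37.0     # the lab network
                mcastaddr: 226.94.1.1        # example-file default, assumed here
                mcastport: 5405
                ttl: 1
        }
}
service {
        name: pacemaker                      # let corosync start pacemaker (plugin mode)
        ver: 0
}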

Set up name resolution on both server1 and server2:

[root@server1 corosync]# vim   /etc/hosts
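The resolution entries just map the lab addresses to the node names, along the lines of:

172.25.37.1    server1
172.25.37.2    server2
172.25.37.3    server3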

[root@server1 corosync]# crm   # enter the interactive crm shell
crm(live)# configure 
crm(live)configure# 
crm(live)configure# property 
usage: property [$id=<set_id>] <option>=<value>
crm(live)configure# property stonith-enabled=false
crm(live)configure# verify  # check the configuration for syntax errors
crm(live)configure# commit   # save the change
crm(live)configure# show  # display the cluster configuration
node server1
node server2
property $id="cib-bootstrap-options" \
	dc-version="1.1.10-14.el6-368c726" \
	cluster-infrastructure="classic openais (with plugin)" \
	expected-quorum-votes="2" \
	stonith-enabled="false"
crm(live)configure# Ctrl-C, leaving
server2:
[root@server2 ~]# yum install pssh-2.3.1-2.1.x86_64.rpm crmsh-1.2.6-0.rc2.2.1.x86_64.rpm -y
[root@server2 ~]# yum install pacemaker -y
[root@server2 ~]# cd /etc/corosync/
[root@server2 corosync]# ls
[root@server2 corosync]# cp corosync.conf.example corosync.conf
[root@server2 corosync]# /etc/init.d/corosync start

III. Elections before node priorities are configured
When a node fails and the Pacemaker resource manager has to move resources, it follows the voting (quorum) rule: with two nodes and expected_votes=2, the required quorum is expected_votes/2 + 1 = 2 votes.

If the partition cannot muster 2 votes (i.e. either node is lost), the whole cluster stops managing resources.

1. Monitoring the cluster

[root@server1 corosync]# crm_mon    # live cluster monitor
Last updated: Tue Jan 15 20:41:19 2019
Last change: Tue Jan 15 19:10:30 2019 via crmd on server1
Stack: classic openais (with plugin)
Current DC: server1 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
0 Resources configured


Online: [ server1 server2 ]

 
[root@server1 corosync]#  crm
crm(live)# node 
crm(live)node# standby    ## take server1 offline gracefully (standby)
crm(live)node# 

[root@server1 corosync]# crm_mon
Last updated: Tue Jan 15 20:46:06 2019
Last change: Tue Jan 15 20:44:35 2019 via crm_attribute	on server1
Stack: classic openais (with plugin)
Current DC: server1 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
0 Resources configured


Node server1: standby
Online: [ server2 ]

[root@server1 corosync]#  crm
crm(live)# node 
crm(live)node# online    ## bring server1 back online
crm(live)node#

[root@server1 corosync]# crm_mon 
Last updated: Tue Jan 15 20:48:23 2019
Last change: Tue Jan 15 20:47:04 2019 via crm_attribute	on server1
Stack: classic openais (with plugin)
Current DC: server1 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
0 Resources configured


Online: [ server1 server2 ]

2. Configuring the VIP resource

[root@server1 corosync]# crm
crm(live)# configure 
crm(live)configure# primitive vip ocf:heartbeat:IPaddr2 params ip=172.25.37.100 nic=eth0 cidr_netmask=24
 crm(live)configure# verify 
crm(live)configure# commit 
crm(live)configure# show 
node server1 \
	attributes standby="off"
node server2
primitive vip ocf:heartbeat:IPaddr2 \
	params ip="172.25.37.100" nic="eth0" cidr_netmask="24"
property $id="cib-bootstrap-options" \
	dc-version="1.1.10-14.el6-368c726" \
	cluster-infrastructure="classic openais (with plugin)" \
	expected-quorum-votes="2" \
	stonith-enabled="false"
crm(live)configure# property no-quorum-policy=ignore  ## ignore loss of quorum
crm(live)configure# verify 
crm(live)configure# commit 
crm(live)configure# show 
node server1 \
	attributes standby="off"
node server2
primitive vip ocf:heartbeat:IPaddr2 \
	params ip="172.25.37.100" nic="eth0" cidr_netmask="24"
property $id="cib-bootstrap-options" \
	dc-version="1.1.10-14.el6-368c726" \
	cluster-infrastructure="classic openais (with plugin)" \
	expected-quorum-votes="2" \
	stonith-enabled="false" \
	no-quorum-policy="ignore"
crm(live)configure# 

3. Test:

With quorum enforced, powering off server2 breaks the whole cluster; after server2 is rebooted the cluster recovers.

With no-quorum-policy=ignore, shutting down server1 keeps the cluster working, and when server1 comes back up the resources do not fail back.
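To see which node holds the VIP at any point during these tests, a one-shot status and the interface address list can be checked (commands not in the original transcript):

[root@server1 ~]# crm_mon -1                                 # one-shot cluster status
[root@server1 ~]# ip addr show eth0 | grep 172.25.37.100     # only the node running vip shows the address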


 

IV. Adding the httpd service as a resource

1. Adding the resource

Do not start httpd manually on the nodes; the cluster is responsible for starting it.

server1:
[root@server1 ~]# yum install httpd -y
[root@server1 ~]# cd /var/www/html/
[root@server1 html]# vim index.html
server1
server2:
[root@server2 ~]# yum install httpd -y
[root@server2 ~]# cd /var/www/html/
[root@server2 html]# vim index.html
server2
[root@server1 html]# crm
crm(live)# configure
crm(live)configure# primitive webserver lsb:httpd
crm(live)configure# verify
crm(live)configure# commit
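At this point vip and webserver are two independent primitives, and Pacemaker will by default tend to spread them across the two nodes, which is exactly why the next step ties them together. Where each resource landed can be seen with a one-shot status:

[root@server1 html]# crm_mon -1    # vip and webserver may well be running on different nodes here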

2. Keeping the resources together

Method 1: put them in the same resource group

crm(live)configure# group webgroup vip webserver
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd ..
crm(live)# resource
crm(live)resource# show
 Resource Group: webgroup
     vip    (ocf::heartbeat:IPaddr2):    Started
     webserver    (lsb:httpd):    Started

Test:

When node server1 goes offline, server2 takes over; when server1 comes back online, server2 keeps serving and the resources do not fail back.

Deleting the group:

crm(live)# resource
crm(live)resource# show
 Resource Group: webgroup
     vip    (ocf::heartbeat:IPaddr2):    Started
     webserver    (lsb:httpd):    Started
crm(live)resource# stop webgroup           ### always stop the group first
crm(live)resource# cd ..
crm(live)# configure
crm(live)configure# delete webgroup
crm(live)configure# verify
crm(live)configure# commit

 

Method 2: a colocation constraint (the resources are kept on the same node without being grouped, and they do not fail back).

crm(live)configure# colocation webserver_with_vip inf: webserver vip
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd ..
crm(live)# resource
crm(live)resource# show
 vip    (ocf::heartbeat:IPaddr2):    Started
 webserver    (lsb:httpd):    Started

V. Failback (resources go to whichever node has the higher score)

1. First define the resource start order: the VIP first, then httpd

crm(live)configure# order vip_before_webserver Mandatory:  vip webserver
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show

2. Define the resources' preference for nodes (assigning scores)

Note: once the scores are assigned, as long as the higher-scoring node is online, the resources will be placed on it.

crm(live)configure# location vip_on_server2 vip rule 50: #uname eq server2 ## give server2 a score of 50 for vip; server1 has no score set and defaults to 0
crm(live)configure# verify
crm(live)configure# commit

3. Define the resources' stickiness to the node they run on

With stickiness, the score is attached to the resource: whichever node the resource is currently running on gains that score and therefore has the highest total, so the resource stays where it is. (Once stickiness is defined, the location preference only takes effect when the resource is being started.)

crm(live)configure# property default-resource-stickiness=50  ## two resources, so the sticky score is 50*2=100, which outweighs the location score of 50
crm(live)configure# verify
crm(live)configure# commit
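To check how the location score and the stickiness add up on each node, crm_simulate can print the allocation scores (an optional check with the standard pacemaker CLI, not in the original transcript):

[root@server1 ~]# crm_simulate -sL | grep -E 'vip|webserver'   # -s shows allocation scores, -L uses the live cluster state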

VI. Adding monitor operations

Note:

If a resource has no monitor operation and the service behind it (httpd) dies on a node, the cluster reports no error: the status still looks healthy even though the resource is actually gone on that node.

With a monitor operation added, the cluster detects and reports the failure. (Since vip and webserver were already defined above, they need to be removed, or edited in place, before being re-created with the monitor operations shown below.)

crm(live)configure# primitive vip ocf:heartbeat:IPaddr2 params ip=172.25.37.100 nic=eth0 cidr_netmask=24 op monitor interval=10s timeout=20s           ### check every 10s, with a 20s timeout per check
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# primitive  webserver lsb:httpd op monitor interval=10s timeout=20s
crm(live)configure# verify
crm(live)configure# commit
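A quick way to watch the monitor work (not in the original transcript) is to kill httpd behind Pacemaker's back on the node that owns it:

[root@server1 ~]# /etc/init.d/httpd stop     # simulate the service dying
[root@server1 ~]# crm_mon -1                 # within ~10s a failed webserver monitor is reported and httpd is restarted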

VII. Adding shared storage to the cluster


1. Set up the NFS export on server3:
[root@server3 ~]# yum install nfs-utils rpcbind -y
[root@server3 ~]# /etc/init.d/rpcbind start
Starting rpcbind:                                          [  OK  ]
[root@server3 ~]# /etc/init.d/nfs start
Starting NFS services:                                     [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting RPC idmapd:                                       [  OK  ]
[root@server3 ~]# mkdir -p /web/httpdocs
[root@server3 ~]# chmod o+w /web/httpdocs/
[root@server3 ~]# ls -ld /web/httpdocs/
drwxr-xrwx 2 root root 4096 Jan  3 16:22 /web/httpdocs/
[root@server3 ~]# vim /etc/exports  ## export policy
/web/httpdocs           172.25.37.0/24(rw)
 
[root@server3 ~]# exportfs -r
[root@server3 ~]# showmount -e   
Export list for server3:
/web/httpdocs 172.25.37.0/24
[root@server3 ~]# cd /web/httpdocs/
[root@server3 httpdocs]# vim index.html
server3

2. Test that the export can be mounted remotely

[root@server1 html]# mount -t nfs 172.25.37.3:/web/httpdocs /mnt

[root@server1 html]# df
Filesystem                   1K-blocks   Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root   8813300 982296   7383312  12% /
tmpfs                           251136  21744    229392   9% /dev/shm
/dev/sda1                       495844  33475    436769   8% /boot
172.25.37.3:/web/httpdocs      8813312 905728   7459904  11% /mnt
[root@server1 html]# umount /mnt
[root@server2 html]# mount -t nfs 172.25.37.3:/web/httpdocs /mnt

[root@server2 html]# df
Filesystem                   1K-blocks   Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root   8813300 982296   7383312  12% /
tmpfs                           251136  21744    229392   9% /dev/shm
/dev/sda1                       495844  33475    436769   8% /boot
172.25.37.3:/web/httpdocs      8813312 905728   7459904  11% /mnt
[root@server2 html]# umount /mnt

3. Add the NFS storage as a cluster resource

[root@server1 ~]# crm
crm(live)# configure 
crm(live)configure# primitive webdata ocf:heartbeat:Filesystem params device="172.25.37.3:/web/httpdocs" directory='/var/www/html' fstype='nfs' op monitor interval=20s timeout=40s op start timeout=60s stop timeout=60s 
crm(live)configure# verify 
WARNING: webdata: default timeout 20s for stop is smaller than the advised 60
crm(live)configure# commit 
WARNING: webdata: default timeout 20s for stop is smaller than the advised 60
crm(live)configure# group webgroup vip webserver webdata 
crm(live)configure# verify 
WARNING: webdata: default timeout 20s for stop is smaller than the advised 60
crm(live)configure# commit 
crm(live)configure# cd ..
crm(live)# node 
crm(live)node# standby server2
crm(live)node# cd ..
crm(live)# resource 
crm(live)resource# cleanup webgroup 
Cleaning up vip on server1
Cleaning up vip on server2
Cleaning up webserver on server1
Cleaning up webserver on server2
Cleaning up webdata on server1
Cleaning up webdata on server2
Waiting for 1 replies from the CRMd. OK
crm(live)resource# show 
 Resource Group: webgroup
     vip	(ocf::heartbeat:IPaddr2):	Started 
     webserver	(lsb:httpd):	Started 
     webdata	(ocf::heartbeat:Filesystem):	Started 
crm(live)resource# 
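The repeated warning about the 20s stop timeout suggests that no explicit stop operation made it into the configuration: in the primitive above, "stop timeout=60s" is not introduced by its own "op" keyword. A definition with an explicit stop operation should avoid the warning, for example:

crm(live)configure# primitive webdata ocf:heartbeat:Filesystem params device="172.25.37.3:/web/httpdocs" directory="/var/www/html" fstype="nfs" op monitor interval=20s timeout=40s op start timeout=60s op stop timeout=60s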

Test:

[root@foundation37 Desktop]# curl 172.25.37.100
server3
[root@foundation37 Desktop]# curl 172.25.37.100
server3

VIII. Fencing

1. Create /etc/cluster on server1 and server2

[root@server1 ~]#  mkdir  -p  /etc/cluster

[root@server2 ~]#  mkdir  -p  /etc/cluster

[root@foundation37 ~]# systemctl start fence_virtd
[root@foundation37 cluster]# scp -r fence_xvm.key root@172.25.37.1:/etc/cluster/   # copy the fence key to server1
[root@foundation37 cluster]# scp -r fence_xvm.key root@172.25.37.2:/etc/cluster/   # copy the fence key to server2

2. Add the fence mechanism to the cluster

[root@server1 cluster]# yum install -y fence-virt 
[root@server1 ~]# crm
crm(live)# configure
crm(live)configure# property stonith-enabled=true      # enable fencing; set to true so that a failed node is fenced and its resources migrated
crm(live)configure# commit

[root@server2 cluster]# yum install -y fence-virt             
[root@server2 cluster]# crm
crm(live)# configure
crm(live)configure# primitive vmfence stonith:fence_xvm params pcmk_host_map="vm1:test1;vm2:test2" op monitor interval=1min                 ### add the vmfence stonith resource
crm(live)configure# commit
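pcmk_host_map maps a cluster node name to the name the fence device knows the machine by (for fence_xvm, the libvirt domain), in node:domain;node:domain form, so the entries should correspond to the node names server1/server2 and the actual VM domain names on the host. Two quick checks with the standard fence-virt and pacemaker tools (not in the original transcript):

[root@server1 ~]# fence_xvm -o list        # lists the VM domains reachable through fence_virtd
[root@server1 ~]# stonith_admin -B server2 # ask the cluster to fence (reboot) server2 through vmfence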

Test:

If the resources are on server1 and server1 crashes, the vmfence resource immediately switches over to server2.

When server1 recovers, the vmfence resource fails back to server1.
