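corosync & pacemaker cluster: commands

Configuring a corosync & pacemaker cluster with the pcs shell

Pacemaker
Pacemaker is the Cluster Resource Manager (CRM). It manages the whole HA stack, and clients manage and monitor the entire cluster through it.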

The CRM supports two resource types, ocf and lsb:

ocf resource scripts live under /usr/lib/ocf/resource.d/.
lsb scripts are usually under /etc/rc.d/init.d/.

1. Common cluster management tools:
(1) Command line
crm shell/pcs

(2) Graphical
pygui/hawk/lcmc/pcs

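2. Related resource files:
(1) /usr/lib/ocf/resource.d: location of the pacemaker resource agents. Install the resource-agents package to get more ocf resources.
(2) /usr/sbin/fence_***: executable scripts for fencing devices. Install the fence-agents package to get more fencing agents.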

3. Reading the manuals:
[shell]# man ocf_heartbeat_*** ## OCF resource agent manual, e.g. man ocf_heartbeat_apache
[shell]# man fence_*** ## fencing agent manual, e.g. man fence_vmware

4. References
https://github.com/ClusterLabs
http://clusterlabs.org/doc/
http://www.linux-ha.org/doc/man-pages/man-pages.html
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/index.html

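Many excellent articles on the Internet were consulted while configuring this cluster; thanks to their authors.

The notes below record the steps used to configure a corosync & pacemaker cluster with pcs commands on VMware ESXi 5.5 + CentOS 6.6. My experience is limited, so treat them as a reference only: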

--------------------------------------------------

1. Install the cluster software:
[shell]# yum -y install corosync pacemaker pcs
[shell]# yum -y install fence-agents resource-agents

2. Copy the configuration file and set up the startup scripts
[shell]# mkdir -p /etc/cluster/
[shell]# ln -s /etc/rc.d/init.d/corosync /etc/rc.d/init.d/cman
[shell]# ln -s /usr/sbin/corosync-cmapctl /usr/sbin/corosync-objctl
[shell]# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf

Note: the cluster requires strict time synchronization, and if a firewall is enabled the relevant ports must be opened.
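
As a rough illustration of the firewall note above, the sketch below only prints iptables rules; it does not apply them. It assumes the stock defaults (corosync on UDP 5404 and 5405, pcsd on TCP 2224). The function name cluster_fw_rules is made up for this example, and a different mcastport in corosync.conf would need different numbers.

```shell
#!/bin/sh
# Hypothetical helper: print iptables rules for the cluster ports.
# Assumed defaults: corosync on UDP 5404/5405, pcsd on TCP 2224.
cluster_fw_rules() {
    udp_ports="${1:-5404,5405}"   # corosync totem traffic
    tcp_port="${2:-2224}"         # pcsd (used by "pcs cluster auth")
    echo "iptables -I INPUT -p udp -m multiport --dports ${udp_ports} -j ACCEPT"
    echo "iptables -I INPUT -p tcp --dport ${tcp_port} -j ACCEPT"
}

# Review the printed rules before running them on each node.
cluster_fw_rules
```

After running the printed rules by hand, service iptables save keeps them across reboots on CentOS 6.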

--------------------------------------------------

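Examples of configuring the cluster with PCS (Pacemaker/Corosync configuration system) commands:

I. Building the cluster:

1. Authenticate the cluster nodes as the hacluster user (pcsd must be running on every node and the hacluster password set beforehand):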
[shell]# pcs cluster auth node11 node12

2. Create a two-node cluster
[shell]# pcs cluster setup --name mycluster node11 node12
[shell]# pcs cluster start --all ## start the cluster on all nodes

3. Set the default resource stickiness (keeps resources from failing back)
[shell]# pcs resource defaults resource-stickiness=100
[shell]# pcs resource defaults

4. Set the default operation timeout
[shell]# pcs resource op defaults timeout=90s
[shell]# pcs resource op defaults

5. With only two nodes, ignore loss of quorum
[shell]# pcs property set no-quorum-policy=ignore

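6. Without a fencing device, disable STONITH
With stonith-enabled="false", resources such as the Distributed Lock Manager (DLM) and every service that depends on DLM (for example cLVM2, GFS2 and OCFS2) cannot start.
[shell]# pcs property set stonith-enabled=false
[shell]# crm_verify -L -V ## validate the cluster configuration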

II. Creating cluster resources

1. List available resources
[shell]# pcs resource list ## list supported resource agents, e.g. pcs resource list ocf:heartbeat
[shell]# pcs resource describe agent_name ## show an agent's parameters, e.g. pcs resource describe ocf:heartbeat:IPaddr2

2. Configure a virtual IP
[shell]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
ip="192.168.10.15" cidr_netmask=32 nic=eth0 op monitor interval=30s

3. Configure Apache (httpd)
[shell]# pcs resource create WebServer ocf:heartbeat:apache \
httpd="/usr/sbin/httpd" configfile="/etc/httpd/conf/httpd.conf" \
statusurl="http://localhost/server-status" op monitor interval=1min

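4. Configure Nginx (as an alternative to Apache)
[shell]# pcs resource create WebServer ocf:heartbeat:nginx \
httpd="/usr/sbin/nginx" configfile="/etc/nginx/nginx.conf" \
status10url="http://localhost/ngx_status" op monitor interval=30s
## the nginx agent names its status URL parameter status10url; confirm with pcs resource describe ocf:heartbeat:nginx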

5.1. Configure a Filesystem resource (by device path, or by UUID with -U)
[shell]# pcs resource create WebFS ocf:heartbeat:Filesystem \
device="/dev/sdb1" directory="/var/www/html" fstype="ext4"

[shell]# pcs resource create WebFS ocf:heartbeat:Filesystem \
device="-U 32937d65eb" directory="/var/www/html" fstype="ext4"

5.2. Configure a Filesystem resource on NFS
[shell]# pcs resource create WebFS ocf:heartbeat:Filesystem \
device="192.168.10.18:/mysqldata" directory="/var/lib/mysql" fstype="nfs" \
options="rw,noatime" \
op start timeout=60s op stop timeout=60s op monitor interval=20s timeout=60s
## options is passed to mount -o, so it takes no leading -o; username= and password= are CIFS options, not NFS ones

6. Configure iSCSI
[shell]# pcs resource create WebData ocf:heartbeat:iscsi \
portal="192.168.10.18" target="iqn.2008-08.com.starwindsoftware:" \
op monitor depth="0" timeout="30" interval="120"

[shell]# pcs resource create WebFS ocf:heartbeat:Filesystem \
device="-U 32937d65eb" directory="/var/www/html" fstype="ext4" options="_netdev"

7. Configure DRBD
[shell]# pcs resource create WebData ocf:linbit:drbd \
drbd_resource=wwwdata op monitor interval=60s

[shell]# pcs resource master WebDataClone WebData \
master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

[shell]# pcs resource create WebFS ocf:heartbeat:Filesystem \
device="/dev/drbd1" directory="/var/www/html" fstype="ext4"
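
The DRBD commands above create the resources but not the constraints that keep WebFS on the node where DRBD is promoted. A minimal sketch of that missing step follows; it prints the two constraint commands instead of running them. The function name drbd_fs_constraints is hypothetical, and the constraint syntax is assumed from pcs 0.9.x, so compare it with pcs constraint --help on your version.

```shell
#!/bin/sh
# Hypothetical helper: print the constraints that bind a filesystem
# to the node where its DRBD master/slave resource is promoted.
# Syntax assumed from pcs 0.9.x; compare with "pcs constraint --help".
drbd_fs_constraints() {
    fs="$1"       # Filesystem resource, e.g. WebFS
    master="$2"   # master/slave resource, e.g. WebDataClone
    echo "pcs constraint colocation add ${fs} with ${master} INFINITY with-rsc-role=Master"
    echo "pcs constraint order promote ${master} then start ${fs}"
}

drbd_fs_constraints WebFS WebDataClone
```

The colocation keeps the filesystem with the Master instance, and the order constraint delays the mount until the promotion has finished.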

8. Configure MySQL
[shell]# pcs resource create MySQL ocf:heartbeat:mysql \
binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" datadir="/var/lib/mysql" \
pid="/var/run/mysqld/mysql.pid" socket="/tmp/mysql.sock" \
op start timeout=180s op stop timeout=180s op monitor interval=20s timeout=60s

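9. Configure pingd to check connectivity between each node and its targets
[shell]# pcs resource create PingCheck ocf:heartbeat:pingd \
dampen=5s multiplier=100 host_list="192.168.10.1 router" \
op monitor interval=30s timeout=10s
## pingd is deprecated; ocf:pacemaker:ping accepts the same dampen, multiplier and host_list parameters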

10. Clone resources; a cloned resource starts on every node
[shell]# pcs resource clone PingCheck
[shell]# pcs resource clone ClusterIP clone-max=2 clone-node-max=2 globally-unique=true ## clone-max=2 splits incoming packets into 2 buckets
[shell]# pcs resource update ClusterIP clusterip_hash=sourceip ## distribute requests across the clones by source IP

III. Adjusting cluster resources

1. Configure resource constraints
[shell]# pcs resource group add WebSrvs ClusterIP ## create a resource group; its members run on the same node
[shell]# pcs resource group remove WebSrvs ClusterIP ## remove a resource from the group
[shell]# pcs resource master WebDataClone WebData ## create a multi-state resource, e.g. DRBD master/slave
[shell]# pcs constraint colocation add WebServer ClusterIP INFINITY ## colocate WebServer with ClusterIP
[shell]# pcs constraint colocation remove WebServer ClusterIP ## remove the colocation constraint between the two resources
[shell]# pcs constraint order ClusterIP then WebServer ## set the start order
[shell]# pcs constraint order remove ClusterIP ## remove ClusterIP from the order constraints
[shell]# pcs constraint ## show constraints; pcs constraint --full also shows their IDs
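
When several resources must run together and start in sequence, the colocation and order commands above repeat for each adjacent pair. The sketch below generates them from a list given in start order; it only prints the commands. The function name chain_constraints is hypothetical, and WebFS stands in for any middle resource.

```shell
#!/bin/sh
# Hypothetical helper: for resources given in start order, print one
# colocation and one order constraint per adjacent pair.
chain_constraints() {
    prev=""
    for rsc in "$@"; do
        if [ -n "$prev" ]; then
            echo "pcs constraint colocation add ${rsc} ${prev} INFINITY"
            echo "pcs constraint order ${prev} then ${rsc}"
        fi
        prev="$rsc"
    done
}

chain_constraints ClusterIP WebFS WebServer
```

A resource group gives the same colocation and ordering with less typing; explicit constraints are mainly useful when clones or master/slave resources are involved, since those cannot be group members.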

2. Configure resource location
[shell]# pcs constraint location WebServer prefers node11 ## prefer a node; node11=50 sets the score to add
[shell]# pcs constraint location WebServer avoids node11 ## avoid a node; node11=50 sets the score to subtract
[shell]# pcs constraint location remove location-WebServer ## remove a location constraint by ID; get the ID from pcs config or pcs constraint --full
[shell]# pcs constraint location WebServer prefers node11=INFINITY ## move a resource by hand: give the node a score of INFINITY
[shell]# crm_simulate -sL ## check the per-node resource scores

3. Modify resource configuration
[shell]# pcs resource update WebFS ## update a resource's options (append name=value pairs)
[shell]# pcs resource delete WebFS ## delete a resource

4. Manage cluster resources
[shell]# pcs resource disable ClusterIP ## disable a resource
[shell]# pcs resource enable ClusterIP ## enable a resource
[shell]# pcs resource failcount show ClusterIP ## show a resource's fail count
[shell]# pcs resource failcount reset ClusterIP ## reset a resource's fail count
[shell]# pcs resource cleanup ClusterIP ## clear a resource's state and fail count

IV. Configuring fencing devices and enabling STONITH

1. Query fence agents
[shell]# pcs stonith list ## list supported fence agents
[shell]# pcs stonith describe agent_name ## show a fence agent's parameters, e.g. pcs stonith describe fence_vmware_soap

2. Configure a fence device resource
[shell]# pcs stonith create ipmi-fencing fence_ipmilan \
pcmk_host_list="pcmk-1 pcmk-2" ipaddr="10.0.0.1" login=testuser passwd=acd123 \
op monitor interval=60s

Notes:
If the device does not support the standard port parameter or may provide additional ones, you may also need to set the special pcmk_host_argument parameter. See man stonithd for details.
If the device does not know how to fence nodes based on their uname, you may also need to set the special pcmk_host_map parameter. See man stonithd for details.
If the device does not support the list command, you may also need to set the special pcmk_host_list and/or pcmk_host_check parameters. See man stonithd for details.
If the device does not expect the victim to be specified with the port parameter, you may also need to set the special pcmk_host_argument parameter. See man stonithd for details.
example: pcmk_host_argument="uuid" pcmk_host_map="node11:4;node12:5;node13:6" pcmk_host_list="node11,node12" pcmk_host_check="static-list"

3. Configure VMware (fence_vmware_soap)
Important: in this setup, fencing only worked with the third form shown in 3.2 (pcs stonith create vmware-fencing fence_vmware_soap), the one that sets the pcmk_* parameters.

3.1. Check the state of the VMware virtual machines:
[shell]# fence_vmware_soap -o list -a vcenter.example.com -l cluster-admin -p <password> -z ## list the VMs and their UUIDs
[shell]# fence_vmware_soap -o status -a vcenter.example.com -l cluster-admin -p <password> -z -U <UUID> ## query power status by UUID
[shell]# fence_vmware_soap -o status -a vcenter.example.com -l cluster-admin -p <password> -z -n <vm name> ## query power status by VM name

3.2. Configure fence_vmware_soap (three forms: by VM name, by UUID, by pcmk_* host mapping)
[shell]# pcs stonith create vmware-fencing-node11 fence_vmware_soap \
action="reboot" ipaddr="192.168.10.10" login="vmuser" passwd="vmuserpd" ssl="1" \
port="node11" shell_timeout=60s login_timeout=60s op monitor interval=90s

[shell]# pcs stonith create vmware-fencing-node11 fence_vmware_soap \
action="reboot" ipaddr="192.168.10.10" login="vmuser" passwd="vmuserpd" ssl="1" \
uuid="421dec5f-c484-3d69-ddfb-65af46530581" shell_timeout=60s login_timeout=60s op monitor interval=90s

[shell]# pcs stonith create vmware-fencing fence_vmware_soap \
action="reboot" ipaddr="192.168.10.10" login="vmuser" passwd="vmuserpd" ssl="1" \
pcmk_host_argument="uuid" pcmk_host_check="static-list" pcmk_host_list="node11,node12" \
pcmk_host_map="node11:421dec5f-c484-3d69-ddfb-65af46530581;node12:421dec5f-c484-3d69-ddfb-65af46530582" \
shell_timeout=60s login_timeout=60s op monitor interval=90s

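Note: if the fence_vmware_soap device does not recognize port=<vm name> during testing, use uuid=<vm uuid> instead.
It is best to state the node-to-port mapping explicitly with the pcmk_host_argument, pcmk_host_map, pcmk_host_check and pcmk_host_list parameters, in this format: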
pcmk_host_argument="uuid" pcmk_host_map="node11:uuid4;node12:uuid5;node13:uuid6" pcmk_host_list="node11,node12,node13" pcmk_host_check="static-list"
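
Typing the pcmk_host_map value by hand gets error-prone as nodes are added. The sketch below assembles it from node:uuid pairs and rejects anything that lacks that shape. The function name build_host_map is hypothetical; the UUIDs are the sample values used in 3.2.

```shell
#!/bin/sh
# Hypothetical helper: join node:uuid pairs into a pcmk_host_map value.
build_host_map() {
    map=""
    for pair in "$@"; do
        case "$pair" in
            ?*:?*) ;;   # require the node:uuid shape
            *) echo "bad pair: $pair" >&2; return 1 ;;
        esac
        map="${map:+${map};}${pair}"   # separate entries with ";"
    done
    printf '%s\n' "$map"
}

build_host_map \
    "node11:421dec5f-c484-3d69-ddfb-65af46530581" \
    "node12:421dec5f-c484-3d69-ddfb-65af46530582"
```

The printed string can be pasted into pcmk_host_map="..." on the pcs stonith create command line; the UUIDs themselves come from fence_vmware_soap -o list.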

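4. Configure SCSI
[shell]# ls /dev/disk/by-id/wwn-* ## find the WWN ID of the fencing disk; the disk must be unformatted
[shell]# pcs stonith create iscsi-fencing fence_scsi \
devices="/dev/disk/by-id/wwn-0x600e002" meta provides=unfencing
## fence_scsi fences by removing the node's SCSI reservation key (on/off actions), so action="reboot" is left out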

5. Configure Dell DRAC
[shell]# pcs stonith create dell-fencing-node11 fence_drac
.....

6. Manage STONITH
[shell]# pcs resource clone vmware-fencing ## clone the stonith resource so it can start on several nodes
[shell]# pcs property set stonith-enabled=true ## enable STONITH
[shell]# pcs stonith cleanup vmware-fencing ## clear the fence resource's state and fail count
[shell]# pcs stonith fence node11 ## fence the given node

V. Cluster operation commands

1. Verify the cluster installation
[shell]# pacemakerd -F ## show the pacemaker version and supported features; see also ps axf | grep pacemaker
[shell]# corosync-cfgtool -s ## show the corosync ring status
[shell]# corosync-cmapctl | grep members ## corosync 2.3.x
[shell]# corosync-objctl | grep members ## corosync 1.4.x

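2. Inspect cluster resources
[shell]# pcs resource standards ## list supported resource standards
[shell]# pcs resource providers ## list resource providers
[shell]# pcs resource agents ## list all resource agents
[shell]# pcs resource list ## list supported resources
[shell]# pcs stonith list ## list supported fence agents
[shell]# pcs property list --all ## show all cluster properties with their defaults
[shell]# crm_simulate -sL ## check resource scores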

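3. Work on a saved configuration file
[shell]# pcs cluster cib ra_cfg ## save the cluster configuration to the given file
[shell]# pcs -f ra_cfg resource create ## create a resource in the file only, not in the running configuration (resource arguments omitted here)
[shell]# pcs -f ra_cfg resource show ## show the configuration in the file; review it before pushing
[shell]# pcs cluster cib-push ra_cfg ## load the file into the running configuration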
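
The four commands above form one workflow: snapshot, edit offline, review, push. The sketch below wraps them in a function with a dry-run switch, so the sequence can be inspected before anything touches the cluster. The names run, stage_and_push and DRY_RUN are made up for this example; the ClusterIP arguments are the ones from section II.

```shell
#!/bin/sh
# Sketch of the staged workflow: edit an offline CIB copy, then push it.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

stage_and_push() {
    cfg="$1"; shift
    run pcs cluster cib "$cfg" || return 1          # snapshot the live CIB
    run pcs -f "$cfg" "$@" || return 1              # change only the file
    run pcs -f "$cfg" resource show || return 1     # print the staged result
    run pcs cluster cib-push "$cfg"                 # apply in one step
}

stage_and_push ra_cfg resource create ClusterIP ocf:heartbeat:IPaddr2 \
    ip=192.168.10.15 cidr_netmask=32 op monitor interval=30s
```

Setting DRY_RUN=0 runs the commands for real. In that mode the function pushes right after printing the staged configuration, so for a manual review it is safer to run the first three steps by hand and push separately.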

4. STONITH device operations
[shell]# stonith_admin -I ## list installed fence agents
[shell]# stonith_admin -M -a agent_name ## show a fence agent's metadata, e.g. stonith_admin -M -a fence_vmware_soap
[shell]# stonith_admin --reboot nodename ## test the STONITH device by rebooting a node

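5. Inspect the cluster configuration
[shell]# crm_verify -L -V ## check the configuration for errors
[shell]# pcs property ## show cluster properties
[shell]# pcs stonith ## show stonith resources
[shell]# pcs constraint ## show resource constraints
[shell]# pcs config ## show the cluster resource configuration
[shell]# pcs cluster cib ## print the cluster configuration as XML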

6. Manage the cluster
[shell]# pcs status ## show cluster status
[shell]# pcs status cluster
[shell]# pcs status corosync
[shell]# pcs cluster stop [node11] ## stop the cluster
[shell]# pcs cluster start --all ## start the cluster
[shell]# pcs cluster standby node11 ## put the node in standby; reverse with pcs cluster unstandby node11
[shell]# pcs cluster destroy [--all] ## destroy the cluster on this node (removes the CIB files and corosync.conf); --all does so on every node
[shell]# pcs resource cleanup ClusterIP ## clear a resource's state and fail count
[shell]# pcs stonith cleanup vmware-fencing ## clear the fence resource's state and fail count