Common Pacemaker Commands

Latest recommended article published on 2024-09-28 13:53:01

Ankele

Category: Cloud Computing. Tags: server, linux, operations

Article link: https://blog.csdn.net/OldHusband/article/details/127847438

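Written from the Red Hat High Availability Add-On configuration guide. Only the simpler parts are excerpted; for details, see the original guide or the pcs help.

Installation and configuration ❤️

Run on every node:

Install

yum -y install pcs pacemaker fence-agents-all

Start pcsd

systemctl enable --now pcsd

Set the hacluster password

echo 'pacemakerFortristack' | passwd --stdin hacluster

Authenticate the nodes from the first node

pcs cluster auth cnode1 cnode2 cnode3

Create and start the cluster from the first node

pcs cluster setup --name mycluster cnode1 cnode2 cnode3
pcs cluster start --all

Add a remote node from the first node

pcs cluster auth cnode4
pcs cluster node add-remote cnode4

Add STONITH devices from the first node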

[root@cnode1 ~]# pcs stonith create cnode1_ipmi fence_ipmilan pcmk_host_list=cnode1 ipaddr=192.168.3.11 login=ADMIN passwd=ADMIN lanplus=1 cipher=1 op monitor interval=60s
[root@cnode1 ~]# pcs stonith create cnode2_ipmi fence_ipmilan pcmk_host_list=cnode2 ipaddr=192.168.3.12 login=ADMIN passwd=ADMIN lanplus=1 cipher=1 op monitor interval=60s
[root@cnode1 ~]# pcs stonith create cnode3_ipmi fence_ipmilan pcmk_host_list=cnode3 ipaddr=192.168.3.13 login=ADMIN passwd=ADMIN lanplus=1 cipher=1 op monitor interval=60s
[root@cnode1 ~]# pcs stonith create cnode4_ipmi fence_ipmilan pcmk_host_list=cnode4 ipaddr=192.168.3.14 login=ADMIN passwd=ADMIN lanplus=1 cipher=1 op monitor interval=60s

Configure location constraints (each fence device avoids the node it fences)

[root@cnode1 ~]# pcs constraint location cnode1_ipmi avoids cnode1
[root@cnode1 ~]# pcs constraint location cnode2_ipmi avoids cnode2
[root@cnode1 ~]# pcs constraint location cnode3_ipmi avoids cnode3
[root@cnode1 ~]# pcs constraint location cnode4_ipmi avoids cnode4
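
The per-node commands above differ only in the node name and BMC address, so they can be generated in a loop. The following is a minimal sketch, assuming the four node/address pairs shown above. The DRY_RUN switch and the run and fence_all helpers are hypothetical names introduced here, not part of pcs. By default the sketch only prints the commands.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) prints commands instead of running them.
# This switch is an illustrative convention, not a pcs feature.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        # stdin is detached so the command cannot consume the node list below
        "$@" </dev/null
    fi
}

# One fence_ipmilan device per node, plus a constraint that keeps the
# device off the node it fences. Node names and addresses come from the text.
fence_all() {
    while read -r node bmc; do
        run pcs stonith create "${node}_ipmi" fence_ipmilan \
            pcmk_host_list="$node" ipaddr="$bmc" login=ADMIN passwd=ADMIN \
            lanplus=1 cipher=1 op monitor interval=60s
        run pcs constraint location "${node}_ipmi" avoids "$node"
    done <<EOF
cnode1 192.168.3.11
cnode2 192.168.3.12
cnode3 192.168.3.13
cnode4 192.168.3.14
EOF
}

fence_all
```

Review the printed commands first, then rerun with DRY_RUN=0 on a cluster node to apply them.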

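5 Fencing: configuring STONITH ❤️

5.1 Available STONITH devices

pcs stonith list [filter]

pcs stonith disable stonith_id # disable a fence device; this prevents any node from using it

pcs constraint location cnode1_ipmi avoids cnode1 # location constraint that keeps the device off the node it fences

pcs property set stonith-enabled=false # disable fencing entirely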

5.2 General properties of fence devices

Field	Type	Default	Description
pcmk_host_map	string		A mapping of host names to port numbers for devices that do not support host names. For example: node1:1;node2:2,3 tells the cluster to use port 1 for node1 and ports 2 and 3 for node2
pcmk_host_list	string		List of machines controlled by this device (optional, but required when pcmk_host_check=static-list)
pcmk_host_check	string	dynamic-list	How to determine which machines the device controls. Allowed values: dynamic-list (query the device), static-list (check the pcmk_host_list attribute), none (assume every device can fence every machine)

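5.3 Displaying device-specific fencing options

pcs stonith describe stonith_agent

5.4 Creating a fence device

pcs stonith create cnode1_ipmi fence_ipmilan pcmk_host_list=cnode1 ipaddr=192.168.3.11 login=ADMIN passwd=ADMIN lanplus=1 cipher=1 op monitor interval=60s

Some fence devices need host names mapped to a form the device understands, such as a port number (see pcmk_host_map)

5.5 Displaying fence devices

pcs stonith show [stonith_id] [--full]

5.6 Modifying and deleting fence devices

pcs stonith update stonith_id [stonith_device_options]

pcs stonith delete stonith_id

5.7 Managing nodes with fence devices

pcs stonith fence nodexx [--off] # with --off, STONITH powers the node off instead of rebooting it

If no STONITH device is able to fence a node, the cluster may be unable to recover the resources on that node even when the node is no longer active.

If this happens, first confirm manually that the node is powered down, then run the following command to tell the cluster that the node is down and to free its resources for recovery.

pcs stonith confirm nodexx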

5.8 Other fencing configuration options

pcmk_host_argument
pcmk_reboot_action √
pcmk_reboot_timeout
pcmk_reboot_retries
pcmk_off_action
pcmk_off_timeout
pcmk_off_retries
pcmk_list_action
pcmk_list_timeout
pcmk_list_retries
pcmk_monitor_action
pcmk_monitor_timeout
pcmk_monitor_retries
pcmk_status_action
pcmk_status_timeout
pcmk_status_retries
pcmk_delay_base
pcmk_delay_max
pcmk_action_limit
pcmk_on_action
pcmk_on_timeout
pcmk_on_retries

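5.9 Configuring fencing levels

Pacemaker supports fencing a node with multiple devices, a feature called fencing topologies. To use a topology, create the individual fence devices as usual, then define one or more fencing levels.

Levels are tried in ascending integer order, starting at 1.
If a device fails, processing of the current level stops. No further devices in that level are tried, and the next level is attempted instead.
If all devices in a level succeed, the level has succeeded and no other levels are tried.
The operation ends when one level has passed (success) or all levels have been attempted (failure).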
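
The level rules above can be traced with a small simulation. This is only a sketch of the decision logic as described in the text, not of how Pacemaker implements it. The function name fence_by_levels and the "ok"/"fail" outcome strings are made up for illustration.

```shell
# Each argument is one level: a comma-separated list of per-device outcomes
# ("ok" or "fail"), in the order the devices would be tried.
# Prints the number of the first level in which every device succeeded,
# or "failed" when all levels have been tried without success.
fence_by_levels() {
    level=0
    for devices in "$@"; do
        level=$((level + 1))
        level_ok=1
        old_ifs=$IFS
        IFS=,
        for outcome in $devices; do
            if [ "$outcome" != "ok" ]; then
                level_ok=0
                break    # remaining devices in this level are not tried
            fi
        done
        IFS=$old_ifs
        if [ "$level_ok" = "1" ]; then
            echo "$level"
            return 0
        fi
    done
    echo "failed"
    return 1
}

# Level 1 has a failing device, level 2 succeeds completely.
fence_by_levels "fail,ok" "ok,ok"
```

In the example call, the failure of the first device ends level 1 immediately, so the second device of that level is never consulted and level 2 decides the outcome.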

Add a fencing level

pcs stonith level add level node devices # devices is a comma-separated list

List the levels

pcs stonith level

For example

# pcs stonith level add 1 rh7-2 my_ilo
# pcs stonith level add 2 rh7-2 my_apc
# pcs stonith level
 Node: rh7-2
  Level 1 - my_ilo
  Level 2 - my_apc

Remove and clear levels

pcs stonith level remove level [node_id] [stonith_id] ... [stonith_id]

pcs stonith level clear [node|stonith_id(s)]

6 Configuring cluster resources ❤️

6.1 Creating resources

pcs resource create resource_id type [resource_options] [op operation_options]
pcs resource delete resource_id

6.2 Resource properties

pcs resource list # list all available resource agents
pcs resource list <filter> # list the agents that match filter

6.3 Resource-specific parameters

pcs resource describe [standard:provider:]type # show the options of a resource agent

6.4 Modifying meta options

pcs resource meta resource_id|group_id|clone_id|master_id meta_options
pcs resource show resource_id # show resource details
pcs resource defaults [options] # set resource defaults, for example resource-stickiness=100

6.5 Resource groups

pcs resource group list
pcs resource group add <group id> <resource id> # creates the group if it does not exist
pcs resource group remove <group id> <resource id>

6.6 Resource operations

Three operations: monitor, start, stop

Properties: timeout, on-fail, enabled, interval, name, id

pcs resource op add resource_id operation_action [operation_properties] # add an operation
pcs resource op remove resource_id operation_name operation_properties
An existing operation can only be changed like this:
pcs resource update VirtualIP op stop interval=0s timeout=40s # any option not listed is reset to its default

6.7 Displaying resources

pcs resource [show] # list all configured resources
pcs resource show resource_id # show the parameters of a resource

6.8 Modifying resource parameters

pcs resource update resource_id [resource_options]
For example: pcs resource update VirtualIP ip=192.168.0.120

See 8.5

6.9 Multiple monitor operations

A single resource can have several monitor operations that run at different intervals

pcs resource op add VirtualIP monitor interval=60s OCF_CHECK_LEVEL=10

6.10 Enabling and disabling cluster resources

pcs resource enable resource_id
pcs resource disable resource_id

6.11 Cleaning up cluster resources

pcs resource failcount show httpd # show the fail count
pcs resource failcount reset httpd # reset the fail count

pcs resource cleanup # for failed resources: forget the operation history and re-detect the current state
pcs resource cleanup resource_id # clean up only the given resource
pcs resource refresh # same, for resources in any state

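7 Resource constraints ❤️

Location constraints, order constraints, and colocation constraints (where a resource is placed relative to other resources)

location order colocation

7.1 Location constraints

Besides location constraints, resource stickiness also influences which node a resource runs on

The following command creates a location constraint that makes a resource prefer the given nodes:

pcs constraint location resource_id prefers node[=score] [node[=score]] ...    # the default score is INFINITY

The following command creates a location constraint that makes a resource avoid the given nodes:

pcs constraint location resource_id avoids node[=score] [node[=score]] ...

Starting with RHEL 7.4, the resource can be given as a regular expression. The following sets a location constraint for dummy0 through dummy9:

pcs constraint location 'regexp%dummy[0-9]' prefers node1 # or dummy[[:digit:]]

The resource-discovery option of pcs constraint location controls whether Pacemaker performs resource discovery for the given resource on that node. Limiting discovery can improve performance noticeably when there are many nodes.
The command is:

pcs constraint location add id resource_id node score [resource-discovery=option]

option can be one of:

always: always perform resource discovery for the resource on this node (default)
never: never perform resource discovery for the resource on this node
exclusive: perform resource discovery for the resource only on this node; several nodes can be marked exclusive
Use this option only with more than eight nodes

For more complex location constraints, use Pacemaker rules (chapter 11):

pcs constraint location resource_id rule [resource-discovery=option] [role=master|slave] [score=score] expression

expression is one of:

defined|not_defined attribute
attribute lt|gt|lte|gte|eq|ne [string|integer|version] value
date gt|lt date
date in_range date to date
date in_range date to duration duration_options
date-spec date_spec_options
expression and|or expression
(expression)

The following command configures an expression that is true from 9 am to 5 pm, Monday through Friday. Note that an hours value of 16 matches up to 16:59:59, because the hour number still matches.

pcs constraint location WebServer rule score=INFINITY date-spec hours="9-16" weekdays="1-5"

7.1.4 Opt-in cluster (whitelist): by default no resource can run anywhere

To configure an opt-in cluster, first set
pcs property set symmetric-cluster=false
# pcs constraint location Webserver prefers example-1=200
# pcs constraint location Webserver prefers example-3=0
# pcs constraint location Database prefers example-2=200
# pcs constraint location Database prefers example-3=0
A score of 0 means the resource may run on that node, but it is not the preferred one

7.1.4 Opt-out cluster (blacklist): by default every resource can run anywhere

To configure an opt-out cluster, first set
pcs property set symmetric-cluster=true
# pcs constraint location Webserver prefers example-1=200
# pcs constraint location Webserver avoids example-2=INFINITY
# pcs constraint location Database avoids example-1=INFINITY
# pcs constraint location Database prefers example-2=200
Both resources can fail over to node example-3

pcs resource defaults resource-stickiness=1
With a stickiness of 0, Pacemaker's default behavior is to move resources so that they are spread evenly across the cluster nodes. Healthy resources may then move more often than you want.
Setting it to 1 gives a small value that other constraints you create can easily override, while still keeping Pacemaker from needlessly moving healthy resources around the cluster.
If a location constraint score is higher than the resource stickiness, the cluster can still move a healthy resource to the node the constraint points at.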
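
The interplay of stickiness and location scores can be illustrated with simple arithmetic. This sketch assumes a deliberately simplified model: the current node's total is its location score plus resource-stickiness, and the resource moves only if another node scores strictly higher. Real Pacemaker scoring has more inputs (colocation, INFINITY handling, tie-breaking), and the function name stays_or_moves is hypothetical.

```shell
# stays_or_moves CURRENT_NODE_SCORE STICKINESS OTHER_NODE_SCORE
# Simplified model: stickiness is added to the score of the node where the
# resource already runs; ties are treated as "stays" (an assumption).
stays_or_moves() {
    current_total=$(( $1 + $2 ))
    if [ "$3" -gt "$current_total" ]; then
        echo "moves"
    else
        echo "stays"
    fi
}

stays_or_moves 0 1 0     # equal location scores, stickiness 1
stays_or_moves 0 1 200   # a prefers=200 constraint on the other node
```

With stickiness 1, a node with an equal location score does not attract the resource, while a constraint such as prefers example-1=200 still outweighs it.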

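7.2 Order constraints

pcs constraint order [action] resource_id then [action] resource_id [options]

Possible values for action: start / stop / promote / demote

Possible options:
kind: Optional (applies only if both resources are performing the specified action), Mandatory (the default; if the first resource stops or cannot start, the second resource must be stopped), Serialize (no two start/stop actions run concurrently for the resources)
symmetrical (true, the default: stop the resources in reverse order)

pcs constraint order set RSC1 RSC2 RSC3
pcs constraint order remove RSC1 RSC2 RSC3

7.3 Colocation

pcs constraint colocation add [master|slave] source_resource with [master|slave] target_resource [score] [options]
source_resource depends on target_resource, so the cluster decides where target_resource goes first and then places source_resource accordingly
A positive score means the resources should run on the same node; a negative score means they should not. The default is +INFINITY

7.3.1 Mandatory placement

With a score of +INFINITY or -INFINITY the placement is mandatory: if the condition cannot be met, source_resource does not run
For example, to require that RSC1 and RSC2 always run on the same machine:

pcs constraint colocation add RSC1 with RSC2 score=INFINITY

If RSC2 cannot run on any node, RSC1 is not allowed to run either

7.3.2 Advisory placement

Mandatory placement means "must"; advisory placement means "I would prefer if". With a score between -INFINITY and +INFINITY, the cluster tries to honor the constraint but may ignore it rather than stop resources.

7.3.3 Colocation sets

pcs constraint colocation set RSC1 RSC2 set RSC3 RSC4 ... # the members of each set are related to one another; here the set RSC1 RSC2 is colocated with the set RSC3 RSC4

7.3.4 Removing colocation constraints

pcs constraint colocation remove source_resource target_resource

7.4 Displaying constraints

pcs constraint [show|list] # list all constraints

pcs constraint location [show [resources|nodes [<node> | <resource>]...] [--full]]
For example: pcs constraint location show resources openstack-nova-compute-clone

pcs constraint order show # list all order constraints

pcs constraint colocation show [--full] # list all colocation constraints

pcs constraint ref resource_id # list the constraints that reference a resource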

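8 Managing cluster resources ❤️

8.1 Moving resources manually

Moving a resource off its current node

pcs resource move resource_id [dest-node] [--master] [lifetime=lifetime]

This command actually adds a location constraint with a score of INFINITY (for dest-node, when given) or -INFINITY (for the current node)

The constraint can be removed with pcs resource clear or pcs constraint delete

With --master, the constraint applies only to the master role, and master_id must be given instead of resource_id

Adjusting the cluster property

pcs property set cluster-recheck-interval=value

The expiry of a lifetime is only checked at this interval, so set it to suit the lifetime you pass to move

pcs resource move resource1 node2 lifetime=PT1H30M # ISO 8601 duration: M before the T means months, M after the T means minutes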
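
Because the lifetime value is an ISO 8601 duration, it is easy to confuse months (P1M) with minutes (PT1M). The helper below is a small sketch that builds the time-only form from hours and minutes. The name iso_duration is made up here, and the final line only prints the resulting pcs command rather than running it.

```shell
# iso_duration HOURS MINUTES -> ISO 8601 duration such as PT1H30M
# Only hours and minutes are handled; both arguments must be integers >= 0.
iso_duration() {
    out="PT"
    if [ "$1" -gt 0 ]; then out="${out}${1}H"; fi
    if [ "$2" -gt 0 ]; then out="${out}${2}M"; fi
    # A zero-length duration still needs one component after "PT".
    if [ "$out" = "PT" ]; then out="PT0M"; fi
    echo "$out"
}

# resource1 and node2 are the placeholder names used in the text.
echo "pcs resource move resource1 node2 lifetime=$(iso_duration 1 30)"
```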

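Moving resources to their preferred node

After a failover or a manual move, a resource may no longer be on its original node. To relocate resources to their preferred node:

pcs resource relocate run [resource1] [resource2] ... # with no resources given, all resources are relocated to their preferred nodes

To remove the constraints created by pcs resource relocate run, use pcs resource relocate clear

To display the current status of resources and their optimal node ignoring resource stickiness, use pcs resource relocate show

8.2 Moving resources due to failure

When creating a resource, you can set migration-threshold so that it moves to a new node after a given number of failures

Once the threshold is reached, the node is not allowed to run the resource again until the fail count is reset with pcs resource failcount reset

migration-threshold defaults to INFINITY

For example, to add a migration threshold of 10 to RSC1, so that the resource moves to a new node after 10 failures:

pcs resource meta RSC1 migration-threshold=10

The following sets a default migration threshold for the whole cluster

pcs resource defaults migration-threshold=10

There are special cases:

If the cluster property start-failure-is-fatal is true (the default), a start failure sets the fail count to INFINITY,
so the resource moves immediately

Stop failures are different: if STONITH is not enabled, the cluster has no way to continue

8.3 Moving resources due to connectivity changes

Create a ping resource as a clone so that it runs on all nodes

pcs resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 host_list=192.168.4.11 clone

Then configure a location constraint rule for WebServer: if the host it currently runs on cannot ping 192.168.4.11, WebServer moves to a host that can

pcs constraint location WebServer rule score=-INFINITY pingd lt 1 or not_defined pingd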
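
To see why this rule works, recall that the ping agent stores the number of reachable hosts multiplied by multiplier in the pingd node attribute. The sketch below mirrors that arithmetic and the rule's condition. Both function names are hypothetical, and an empty argument stands in for an attribute that is not defined.

```shell
# pingd_value REACHABLE_HOSTS MULTIPLIER -> value stored in the pingd attribute
pingd_value() {
    echo $(( $1 * $2 ))
}

# rule_bans PINGD -> "yes" if "pingd lt 1 or not_defined pingd" matches,
# meaning the -INFINITY score applies and the resource must leave the node.
rule_bans() {
    if [ -z "$1" ] || [ "$1" -lt 1 ]; then
        echo "yes"
    else
        echo "no"
    fi
}

rule_bans "$(pingd_value 1 1000)"   # 192.168.4.11 reachable
rule_bans "$(pingd_value 0 1000)"   # 192.168.4.11 unreachable
rule_bans ""                        # attribute not set yet
```

With one host in host_list and multiplier=1000, pingd is either 1000 or 0, so the rule bans exactly the nodes that cannot reach the host or have not reported yet.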

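8.4 Enabling, disabling, and banning cluster resources

Disable and enable

pcs resource disable resource_id [--wait=[n]] # with --wait but no n, the default is said to be 60 minutes; in my test it was not
pcs resource enable resource_id [--wait=[n]]

Ban: equivalent to adding a location constraint with a score of -INFINITY

pcs resource ban resource_id [node] [--master] [lifetime=lifetime] [--wait=[n]]

The constraint can be removed with pcs resource clear or pcs constraint delete

Restart

pcs resource restart resource_id # only a running resource can be restarted

debug-start

pcs resource debug-start resource_id # shows whether the start succeeded and, if not, why

pcs cluster start --all starts the cluster on all nodes; check /var/log/cluster/corosync.log or /var/log/messages for failures

8.5 Disabling a monitor operation

The simplest way to stop a monitor is to delete it, but sometimes you only want to disable it temporarily

Use pcs resource update to set enabled="false"; to restore it, set enabled="true"

When pcs resource update changes an operation, any option not specified is reset to its default
See 6.8

pcs resource update RSC1 op monitor enabled=false

For example, if the monitor was configured with a timeout of 600s, you must specify that timeout again when updating other values in order to keep it

pcs resource update RSC1 op monitor timeout=600 enabled=true

8.6 Managed resources

Setting a resource to unmanaged means it stays in the configuration but Pacemaker does not manage it

pcs resource unmanage resource1 [resource2] ...

pcs resource manage resource1 [resource2] ...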

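9 Advanced configuration ❤️

9.1 Resource clones

Clone a resource so that it can be active on multiple nodes

Any resource can be cloned provided its resource agent supports it. A clone consists of one resource or one resource group

Only resources that can be active on multiple nodes at the same time are suitable for cloning

Creating a cloned resource

pcs resource create resource_id standard:provider:type|type [resource options] clone [meta clone_options] # the clone is named resource_id-clone

A resource group and its clone cannot be created in a single command, but an existing resource or group can be cloned like this

pcs resource clone resource_id | group_name [clone_options] ... # resource_id-clone or group_name-clone

When you create a clone B that depends on another clone A, set interleave=true so that copies of B can start or stop as soon as the copy of A on the same node has started or stopped. Without it, if a node leaves the cluster and later returns and A starts on that node, every copy of B on every node is restarted. The reason is that without interleave, each instance of the dependent clone depends on every running instance of the clone it depends on.

Removing the clone of a resource or resource group does not remove the resource or group itself:

pcs resource unclone resource_id | group_name
For example
pcs resource unclone ping # the ping resource created earlier

What can clone_options contain?

Field	Description
priority, target-role, is-managed	Inherited from the resource, see 6.3
clone-max	Number of copies of the resource to start. Defaults to the number of nodes in the cluster
clone-node-max	Number of copies of the resource that can be started on a single node; the default is 1
notify	When stopping or starting a copy of the clone, tell all other copies beforehand and when the action has succeeded. false or true; the default is false
globally-unique	Whether each copy of the clone performs a different function, true or false. If false, the resource behaves identically everywhere it runs, so only one copy can be active per machine. If true, copies on the same machine are distinct. Defaults to true if clone-node-max is greater than 1, otherwise false
ordered	Whether the copies are started in series instead of in parallel; the default is false
interleave	Changes the behavior of order constraints between clones so that B can start or stop without waiting for A to finish starting or stopping on every node; the default is false
clone-min	If a value is specified, any clones which are ordered after this clone will not be able to start until the specified number of instances of the original clone are running, even if the `interleave` option is set to true.

Clone constraints
When clone-max is smaller than the total number of nodes, location constraints can steer where the copies run. They are written like regular location constraints, except that the clone's id must be used

pcs constraint location SRC-clone prefers node1

…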

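9.2 Multistate resources: resources that have multiple modes

A multistate resource is a specialization of a clone resource

Instances are in one of two modes, Master and Slave

When an instance is started, it must come up in the Slave state

Creating

pcs resource create resource_id standard:provider:type|type [resource opts] master # the resource is named resource_id-master

Creating from an existing resource

pcs resource master master/slave_name resource_id|group_name [master_options]

Multistate resource properties (master_options):

Field	Description
id	Name of the multistate resource
priority, target-role, is-managed	See 6.3
clone-max, clone-node-max, notify, globally-unique, ordered, interleave	See `clone-options`
master-max	Number of copies that can be promoted to `master` status; the default is 1
master-node-max	Maximum number of copies that can be promoted to `master` status on a single node; the default is 1

9.2.1 Monitoring multistate resources

To add a monitor operation for the master role only, add an additional monitor operation to the resource. Note that every monitor operation on a resource must have a different interval

For example:

pcs resource op add RSC monitor interval=11s role=Master

9.2.2 Multistate constraints

In most cases, a multistate resource has one copy on each active cluster node

If that is not the case, resource location constraints can tell the cluster which nodes to prefer. These constraints are written no differently from those for regular resources.

See 7.1 Location constraints

Use colocation constraints to specify whether a resource is in the master or slave role

pcs constraint colocation add [master|slave] source_resource with [master|slave] target_resource [score] [options]

See 7.3 Colocation

Ordering can be constrained as well, see 7.2

pcs constraint order [action] resource_id then [action] resource_id [options]

Multistate stickiness

…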

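9.3 Configuring a virtual machine as a resource

A virtual machine managed by libvirt can be configured as a resource with the VirtualDomain resource type

Also take the following into account:

Shut the virtual machine down before configuring it as a cluster resource
Once configured, start, stop, and migrate it only with the cluster tools
Do not configure a virtual machine that is a cluster resource to start at boot
All nodes must have access to the virtual machine's configuration files and storage devices

To also manage services inside the virtual machine, configure it as a guest node, see 9.4

Virtualization Deployment and Administration Guide

…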

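9.4 The pacemaker_remote service

pacemaker_remote lets nodes that do not run corosync integrate into the cluster, and the cluster manages their resources as if they were real cluster nodes

Among the capabilities pacemaker_remote provides:

Scaling beyond the Red Hat support limit of 32 nodes for RHEL 7.7 (the limit for 7.6 appears to be 16 nodes)
Managing a virtual environment as a cluster resource, and also managing individual services inside it as cluster resources

Terms used to describe pacemaker_remote

cluster node: a node running the High Availability services (pacemaker and corosync)
remote node: a node running pacemaker_remote, without corosync. It is added to the cluster as a resource through the ocf:pacemaker:remote agent
guest node: a virtual machine node running pacemaker_remote. The virtual machine is a cluster resource that the cluster also integrates as a remote node
pacemaker_remote: an enhanced version of the local resource management daemon (LRMD) that runs on remote nodes and guest nodes (KVM and LXC) in a Pacemaker cluster and can manage resources remotely on nodes without corosync
LXC: a Linux container defined by the libvirt-lxc Linux container driver

A Pacemaker cluster running pacemaker_remote has the following characteristics:

Remote nodes and guest nodes run the pacemaker_remote service
The cluster stack (pacemaker and corosync) on the cluster nodes connects to the pacemaker_remote service on the remote nodes
The cluster stack on the cluster nodes launches the guest nodes and immediately connects to the pacemaker_remote service on them

The key difference from cluster nodes is that the remote nodes and guest nodes they manage do not run the cluster stack (corosync). This means remote and guest nodes have the following limitations:

They do not take part in quorum
They do not execute fencing actions
They are not eligible to be the cluster's Designated Controller (DC)
They do not themselves run the full range of pcs commands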

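9.4.1 Host and guest authentication

Cluster nodes and remote nodes must share the same private key. By default the key is in /etc/pacemaker/authkey on both cluster nodes and remote nodes

Starting with RHEL 7.4:

pcs cluster node add-guest sets up the authkey for guest nodes

pcs cluster node add-remote sets up the authkey for remote nodes

9.4.2 Guest node resource options

remote-node: name of the guest node this resource defines
remote-port: port to use for the guest connection to pacemaker_remote; the default is 3121
remote-addr: IP address or host name to connect to if remote-node is not the guest's host name
remote-connect-timeout: the default is 60s

9.4.3 Remote node resource options

…

9.4.4 Changing the default port

Edit /etc/sysconfig/pacemaker and set the PCMK_remote_port variable (PCMK_remote_port=3121 by default)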

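9.4.5 Configuration overview: KVM guest node

An overview of having Pacemaker launch a virtual machine and manage it as a cluster resource

Configure the VirtualDomain resource
On RHEL 7.3 and earlier, put the authkey in place on every node:
1. mkdir -p --mode=0750 /etc/pacemaker
2. chgrp haclient /etc/pacemaker
3. Create the key once and copy it to all nodes: dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1
On RHEL 7.3 and earlier: pcs cluster remote-node add hostname resource_id [options]
Starting with RHEL 7.4: pcs cluster node add-guest hostname resource_id [options]
pcs constraint location webserver prefers guest1

9.4.6 Configuration overview: remote node (RHEL 7.4)

Open the firewall: firewall-cmd --permanent --add-service=high-availability, then firewall-cmd --reload
Install on the remote node: yum -y install pacemaker-remote resource-agents pcs
Start pcsd: systemctl enable --now pcsd
In principle you also need to run passwd hacluster on the remote node to set the authentication password
On a cluster node: pcs cluster node add-remote <remote_node>
pcs constraint location nginx-clone avoids <remote_node>
Configure a fence resource for the remote node just as for cluster nodes; note that only cluster nodes actually fence other nodes

9.4.7 Configuration overview: remote node (RHEL 7.3 and earlier)

…

9.4.8 System upgrades and pacemaker_remote

…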

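9.5 Pacemaker support for Docker containers

9.5.1 Configuring a Pacemaker bundle resource

pcs resource bundle create bundle_id container docker [container_options] [network network_options] [port-map port_options]...  [storage-map storage_options]... [meta meta_options] [--disabled] [--wait[=n]]

Every node that runs the bundle must have a working Docker installation and the Docker image already present

Docker Parameters

image: Docker image
replicas: default is the value of promoted-max if that is positive, otherwise 1. A positive integer giving the number of container instances to launch
replicas-per-host: default 1. A positive integer giving the number of container instances allowed on a single node
promoted-max: default 0. A non-negative integer; if positive, the containerized service is treated as a multistate service, and this many replicas may run the service in the master role
network: if specified, passed to docker run as the container's network setting
run-command: default /usr/sbin/pacemaker_remote or none. The command run inside the container; if the bundle contains a resource, it must start the pacemaker-remoted daemon
options: extra command-line options passed to docker run

Bundle Network Parameters

add-host
ip-range-start
host-netmask
host-interface
control-port

Bundle port-map parameters

id
port
internal-port
range

Bundle storage-map parameters

id
source-dir
source-dir-root
target-dir
options

If source-dir does not exist, the container or the resource agent is expected to create it

If the bundle contains a Pacemaker resource, Pacemaker automatically maps source-dir=/etc/pacemaker/authkey target-dir=/etc/pacemaker/authkey and source-dir-root=/var/log/pacemaker/bundles target-dir=/var/log into the container, so there is no need to configure these under storage-map

The PCMK_authkey_location environment variable must not be set to anything other than /etc/pacemaker/authkey on any node in the cluster

9.5.2 Configuring Pacemaker resources in a bundle

If a bundle contains a Pacemaker resource, the container image must include pacemaker_remote

Containers in a bundle that contains a resource must have a reachable network environment, so that Pacemaker on the cluster nodes can contact Pacemaker Remote inside the container

…

9.5.3 Limitations of Pacemaker bundles

A bundle cannot be included in a group or explicitly cloned with pcs
Restarting Pacemaker while a bundle is unmanaged or the cluster is in maintenance mode may cause the bundle to fail
A bundle has no instance attributes, utilization attributes, or operations, although a resource contained in the bundle may have them
A bundle that contains a resource can run on a Pacemaker Remote node only if the bundle uses a distinct control-port

9.5.4 Pacemaker bundle configuration example

Create a bundle named httpd-bundle that contains an httpd resource (ocf:heartbeat:apache)

This procedure has the following prerequisites:

Docker has been installed and enabled on every node in the cluster
there is an existing Docker image, named pcmktest:http
the container image includes the Pacemaker Remote daemon
the container image includes a configured Apache web server
every node in the cluster has directories /var/local/containers/httpd-bundle-0, /var/local/containers/httpd-bundle-1, and /var/local/containers/httpd-bundle-2, each containing an index.html file for the web server root. In production, a single shared document root is more likely, but for this example the configuration lets you make the index.html file different on each host, so that you can connect to the web server and verify which index.html file is being served.

The procedure configures the bundle with these parameters:

The bundle id is httpd-bundle
The Docker image is pcmktest:http
Three container instances are launched
The command-line option --log-driver=journald is passed to docker run. It is not required; it is included to show how to pass an extra option to the docker command. A value of journald means that system logs inside the container are logged in the underlying host's systemd journal.
Three consecutive implicit ocf:heartbeat:IPaddr2 resources are created, one per container, starting at IP address 192.168.122.131
The IP addresses are created on the eth0 interface
The CIDR netmask is 24
A port map named http-port is created; it maps port 80 to the container
A storage map named httpd-root is created:
- source-dir-root is /var/local/containers
- target-dir is /var/www/html
- mounted rw
- Pacemaker maps source-dir=/etc/pacemaker/authkey into the container automatically, so it need not be specified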
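
Put together, the parameters listed above correspond to commands along the following lines. This is a sketch assembled from that parameter list using the pcs resource bundle create syntax from 9.5.1; it has not been run against a cluster here. The bundle_commands function is a hypothetical wrapper that only prints the commands, and the second command assumes the httpd resource is added with the bundle keyword of pcs resource create.

```shell
# Print the two commands for the httpd-bundle example instead of running them.
# All values come from the parameter list in the text.
bundle_commands() {
    cat <<'EOF'
pcs resource bundle create httpd-bundle container docker image=pcmktest:http replicas=3 options=--log-driver=journald network ip-range-start=192.168.122.131 host-interface=eth0 host-netmask=24 port-map id=http-port port=80 storage-map id=httpd-root source-dir-root=/var/local/containers target-dir=/var/www/html options=rw
pcs resource create httpd ocf:heartbeat:apache bundle httpd-bundle
EOF
}

bundle_commands
```

Printing first makes it easy to compare each key=value pair with the list above before pasting the commands on a cluster node.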

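9.6 Utilization and placement strategy

…

9.7 Configuring startup order for resource dependencies not managed by Pacemaker (RHEL 7.4)

9.8 Querying a Pacemaker cluster with SNMP (RHEL 7.5)

9.9 Configuring resources to remain stopped on clean node shutdown (RHEL 7.8)

10 Cluster quorum ❤️

A RHEL High Availability Add-On cluster uses the votequorum service together with fencing to avoid split brain

A number of votes is assigned to each system in the cluster, and cluster operations are allowed to proceed only when a majority of the votes is present.

The service must be loaded on all cluster nodes; if it is loaded on only a subset of them, the results are unpredictable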
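
The majority rule can be written down as integer arithmetic. The sketch below assumes one vote per node and ignores options such as auto_tie_breaker and last_man_standing, which change the outcome in specific cases. The function names are hypothetical.

```shell
# quorum_threshold TOTAL_VOTES -> votes needed for a strict majority
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum PRESENT_VOTES TOTAL_VOTES -> "quorate" or "inquorate"
has_quorum() {
    if [ "$1" -ge "$(quorum_threshold "$2")" ]; then
        echo "quorate"
    else
        echo "inquorate"
    fi
}

has_quorum 2 3   # two of three nodes form a majority
has_quorum 2 4   # an even split of four nodes does not
```

The second call shows why even-sized clusters need extra help: a 2/2 split leaves neither half with a majority.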

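10.1 Configuring quorum options

Options for pcs cluster setup:

--auto_tie_breaker: when enabled, the cluster can survive up to 50% of the nodes failing at the same time. The cluster partition, or the set of nodes still in contact with the nodeid configured in auto_tie_breaker_node (or the lowest nodeid if that is not set), remains quorate. The other nodes are inquorate. The option is mainly used for clusters with an even number of nodes, since it lets the cluster keep working after an even split.
--wait_for_all: when enabled, the cluster becomes quorate for the first time only after all nodes have been visible at least once at the same time. The option is mainly used for two-node clusters and for even-node clusters using the quorum device lms (last man standing) algorithm. It is enabled automatically when a cluster has two nodes, does not use a quorum device, and auto_tie_breaker is disabled. You can override this by explicitly setting wait_for_all to 0.
--last_man_standing: when enabled, the cluster can dynamically recalculate expected_votes and quorum under specific circumstances. wait_for_all must be enabled together with this option. Not compatible with quorum devices.
--last_man_standing_window: time in milliseconds to wait before recalculating expected_votes and quorum after the cluster loses nodes.

10.2 Quorum administration commands

Show the cluster quorum configuration

pcs quorum [config]

Show the cluster quorum status

pcs quorum status

Change the expected votes directly, so that the cluster can continue operating without quorum
pcs quorum expected-votes votes

10.3 Modifying quorum options

pcs quorum update [auto_tie_breaker=[0|1]] [last_man_standing=[0|1]] [last_man_standing_window=[time-in-ms]] [wait_for_all=[0|1]]

10.4 The quorum unblock command

When you know the cluster is inquorate but want it to proceed with resource management, the following command stops the cluster from waiting for all nodes when establishing quorum.

Use this command with extreme caution. Before running it, make sure that the nodes not currently in the cluster are switched off and have no access to shared resources.

pcs cluster quorum unblock

10.5 Quorum devices

A quorum device is recommended for clusters with an even number of nodes

…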

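11 Pacemaker rules ❤️

Rules make the cluster configuration more dynamic

One use of rules is to assign machines to different groups (with a node attribute) and then use that attribute when creating location constraints

Each rule can contain several expressions, date expressions, and even other rules

The results of the expressions are combined according to the rule's boolean-op field to determine whether the rule ultimately evaluates to true or false. What happens next depends on the context in which the rule is used.

Rule properties:

Field	Description
role	The rule applies only when the resource is in that role. Allowed values: `started`, `slave` and `master`
score	Score to apply if the rule evaluates to true. Only for rules that are part of location constraints
score-attribute	Node attribute to look up and use as the score if the rule evaluates to true. Only for rules that are part of location constraints
boolean-op	How to combine the results of multiple expression objects. Allowed values: `and`, `or`. The default is `and`.

11.1 Node attribute expressions

Node attribute expressions control a resource based on the attributes defined by a node

Field	Description
attribute	Node attribute to test
type	How the values are tested: string, integer or version; the default is string
operation	Comparison to perform. Allowed values: lt gt lte gte eq ne defined not_defined
value	Value for the comparison (required)

Besides the attributes added by the administrator, the cluster defines built-in node attributes

Field	Description
#uname	Node name
#id	Node ID
#kind	Node type: cluster, remote or container.
#is_dc	Whether the node is the DC
#cluster_name	Value of the cluster-name cluster property
#site_name	Value of the site-name node attribute; defaults to #cluster_name if unset
#role	Role the relevant multistate resource has on this node. Valid only within a rule for a location constraint on a multistate resource.

11.2 Time/date based expressions

…

11.3 Date formats

…

11.4 Durations

11.5 Configuring rules with pcs

To configure a rule with pcs, see 7.1.3 Using rules to determine resource location

If the rule being removed is the last rule in its constraint, the constraint is removed as well

pcs constraint rule remove rule_id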

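12 Pacemaker cluster properties ❤️

12.1 Overview of cluster properties and options

Cluster properties:

Option	Default	Description
batch-limit	0	Number of resource actions the cluster may execute in parallel. The right value depends on network load and speed
migration-limit	-1 (unlimited)	Number of migration jobs allowed to run in parallel on a node
no-quorum-policy	stop	What to do when the cluster has no quorum. Allowed values: ignore (continue all resource management), freeze, stop, suicide
symmetric-cluster	true	Whether resources can run on any node by default
stonith-enabled	true	Whether STONITH is enabled: failed nodes, and nodes with resources that cannot be stopped, should be fenced
stonith-action	reboot	Action to send to the STONITH device. Allowed values: reboot, off. poweroff is also allowed, but only for legacy devices
cluster-delay	60s	Round-trip delay over the network (excluding action execution). The "correct" value depends on the speed and load of the network and cluster nodes
stop-orphan-resources	true	Whether deleted resources should be stopped
stop-orphan-actions	true	Whether deleted actions should be canceled
start-failure-is-fatal	true	Whether a failure to start a resource on a node prevents further start attempts on that node.
pe-error-series-max	-1 (all)
pe-warn-series-max	-1 (all)
pe-input-series-max	-1 (all)
cluster-infrastructure		Messaging stack Pacemaker is currently running on. For informational and diagnostic purposes; not user-configurable.
dc-version		Version of Pacemaker on the cluster's Designated Controller (DC). For diagnostic purposes; not user-configurable.
last-lrm-refresh		Last refresh of the local resource manager, in seconds since the epoch. For diagnostic purposes; not user-configurable.
cluster-recheck-interval	15min	Polling interval for time-based changes to options, resource parameters, and constraints. Allowed values: 0 disables polling; a positive value is an interval in seconds (unless other SI units are given, such as 5min). Note that this is the maximum time between checks; if a cluster event occurs sooner than the interval, the check is done sooner.
maintenance-mode	false	Maintenance mode puts the cluster in a "hands off" mode in which it does not start or stop any services until told otherwise. When maintenance mode ends, the cluster checks the current state of every service and then stops or starts whatever needs it.
shutdown-escalation	20min
stonith-timeout	60s	How long to wait for a STONITH action to complete.
stop-all-resources	false	Whether the cluster should stop all resources
enable-acl	false	Whether the cluster can use access control lists, as set with the pcs acl command.
placement-strategy	default
fence-reaction	stop

12.2 Setting and removing cluster properties

Set the value of a cluster property

pcs property set property=value

Remove a cluster property

pcs property unset property

Restore the default value

pcs property set property=   # yes, leave the value empty

12.3 Querying cluster property settings

pcs property list # show the properties that have been set

pcs property list --all # show all properties, including defaults that were not set explicitly

pcs property show <property>

pcs property list --defaults # show all default values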

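13 Triggering scripts for cluster events ❤️

A Pacemaker cluster is an event-driven system, where an event may be a resource or node failure, a configuration change, or a resource starting or stopping

Cluster alerts can be configured in two ways:

Starting with RHEL 7.3, Pacemaker alerts can be configured through alert agents, which the cluster calls much as it calls resource agents, see 13.1
The ocf:pacemaker:ClusterMon resource can monitor the cluster status and trigger an alert on each cluster event. It runs the crm_mon command in the background at regular intervals, see 13.2

13.1 Pacemaker alert agents

13.1.1 Using the sample alert agents

First copy the alert script

install --mode=0755 /usr/share/pacemaker/alerts/alert_file.sh.sample /var/lib/pacemaker/alert_file.sh

Create the log file, then the alert and its recipient

touch /var/log/pcmk_alert_file.log
chown hacluster:haclient /var/log/pcmk_alert_file.log
chmod 600 /var/log/pcmk_alert_file.log
pcs alert create id=alert_file description="Log events to a file" path=/var/lib/pacemaker/alert_file.sh
pcs alert recipient add alert_file id=my-alert_logfile value=/var/log/pcmk_alert_file.log

Create an SNMP alert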

# install --mode=0755 /usr/share/pacemaker/alerts/alert_snmp.sh.sample /var/lib/pacemaker/alert_snmp.sh
# pcs alert create id=snmp_alert path=/var/lib/pacemaker/alert_snmp.sh meta timestamp-format="%Y-%m-%d,%H:%M:%S.%01N"
# pcs alert recipient add snmp_alert value=192.168.1.2
# pcs alert
Alerts:
 Alert: snmp_alert (path=/var/lib/pacemaker/alert_snmp.sh)
  Meta options: timestamp-format=%Y-%m-%d,%H:%M:%S.%01N.
  Recipients:
   Recipient: snmp_alert-recipient (value=192.168.1.2)

Create an email alert

# install --mode=0755 /usr/share/pacemaker/alerts/alert_smtp.sh.sample /var/lib/pacemaker/alert_smtp.sh
# pcs alert create id=smtp_alert path=/var/lib/pacemaker/alert_smtp.sh options email_sender=donotreply@example.com
# pcs alert recipient add smtp_alert value=admin@example.com
# pcs alert
Alerts:
 Alert: smtp_alert (path=/var/lib/pacemaker/alert_smtp.sh)
  Options: email_sender=donotreply@example.com
  Recipients:
   Recipient: smtp_alert-recipient (value=admin@example.com)

13.1.2 Creating an alert

pcs alert create path=path [id=alert-id] [description=description] [options [option=value]...] [meta [meta-option=value]...]

13.1.3 Displaying, modifying, and removing alerts

pcs alert [config|show] # display

pcs alert update alert-id [path=path] [description=description] [options [option=value]...] [meta [meta-option=value]...]

pcs alert remove alert-id

13.1.4 Alert recipients

Usually an alert is directed at a recipient, so each alert can additionally be configured with one or more recipients. The cluster calls the agent separately for each recipient.

pcs alert recipient add alert-id ... # add

pcs alert recipient update recipient-id ... # update

pcs alert recipient remove recipient-id # remove

13.1.5 Alert meta options

As with resource agents, meta options can be configured for alert agents to affect how Pacemaker calls them

Meta-Attribute	Default	Description
timestamp-format	%H:%M:%S.%06N
timeout	30s	If the alert agent does not complete within this time, it is terminated

# pcs alert create id=my-alert path=/path/to/myscript.sh meta timeout=15s
# pcs alert recipient add my-alert value=someuser@example.com id=my-alert-recipient1 meta timestamp-format="%D %H:%M"
# pcs alert recipient add my-alert value=otheruser@example.com id=my-alert-recipient2 meta timestamp-format=%c

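13.1.6 Alert configuration command examples

…

13.1.7 Writing an alert agent

There are three types of Pacemaker alerts: node alerts, fencing alerts, and resource alerts

Environment variables passed to alert agents

Environment variable	Description
CRM_alert_kind	Type of alert (node, fencing, or resource)
CRM_alert_version	Version of Pacemaker sending the alert
CRM_alert_recipient	The configured recipient
CRM_alert_node_sequence
CRM_alert_timestamp
CRM_alert_node	Name of the affected node
CRM_alert_desc	Detail about the event
CRM_alert_nodeid	ID of the node whose status changed
CRM_alert_task	The requested fencing or resource operation
CRM_alert_rc	Numerical return code of the fencing or resource operation (fencing and resource alerts only)
CRM_alert_rsc	Name of the affected resource (resource alerts only)
CRM_alert_interval	Interval of the resource operation (resource alerts only)
CRM_alert_target_rc	Expected numerical return code of the operation (resource alerts only)
CRM_alert_status	Numerical code Pacemaker uses to represent the operation result (resource alerts only)

Keep the following in mind when writing an alert agent

An alert agent may be called with no recipient (if none is configured), so the agent must be able to handle that case, even if it only exits. Users may change the configuration in stages and add a recipient later.
If more than one recipient is configured for an alert, the alert agent is called once per recipient. If an agent cannot run concurrently, configure it with a single recipient only. The agent is free, however, to interpret the recipient as a list.
When a cluster event occurs, all alerts are fired off at the same time as separate processes. Depending on how many alerts and recipients are configured and on what the alert agents do, this can cause a significant load burst. An agent can be written to take this into account, for example by queueing resource-intensive actions into some other instance instead of executing them directly.
Alert agents run as the hacluster user, which has a minimal set of permissions. If an agent needs additional privileges, configure sudo so that the agent can run the necessary commands as another user with the appropriate privileges.
Take care to validate and sanitize user-configured parameters such as CRM_alert_timestamp (whose content is set by the user-configured timestamp-format), CRM_alert_recipient, and all alert options. This is needed to protect against configuration errors. In addition, if some user can modify the CIB without having hacluster-level access to the cluster nodes, this is a potential security concern as well, and you should avoid the possibility of code injection.
If a cluster contains resources with operations whose on-fail parameter is set to fence, there are multiple fence notifications on failure: one for each resource for which this parameter is set, plus one additional notification. Both the STONITH daemon and the crmd daemon send notifications. Pacemaker performs only one actual fence operation in this case, however many notifications are sent.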
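
A few of the points above (no recipient, per-kind handling, cautious output of user-controlled values) can be seen in a very small agent. This is a sketch only, loosely in the spirit of a file-logging agent, and it is not the shipped alert_file.sh.sample. The function name log_alert and the message layout are made up here, and the recipient is assumed to be a writable file path.

```shell
#!/bin/sh
# Minimal alert agent sketch. Pacemaker exports the CRM_alert_* variables
# before calling the agent; log_alert is a hypothetical name.
log_alert() {
    # With no recipient configured there is nowhere to write: exit quietly.
    if [ -z "${CRM_alert_recipient:-}" ]; then
        return 0
    fi
    case "${CRM_alert_kind:-}" in
        node)
            msg="node ${CRM_alert_node:-unknown}: ${CRM_alert_desc:-}"
            ;;
        fencing)
            msg="fencing ${CRM_alert_task:-unknown} of ${CRM_alert_node:-unknown}, rc=${CRM_alert_rc:-unknown}"
            ;;
        resource)
            msg="resource ${CRM_alert_rsc:-unknown} ${CRM_alert_task:-unknown} on ${CRM_alert_node:-unknown}, rc=${CRM_alert_rc:-unknown}"
            ;;
        *)
            msg="unhandled alert kind: ${CRM_alert_kind:-none}"
            ;;
    esac
    # A fixed printf format keeps alert text from being interpreted as a format.
    printf '%s %s\n' "${CRM_alert_timestamp:-}" "$msg" >> "$CRM_alert_recipient"
}

log_alert
```

Saved as a script and registered with pcs alert create path=..., it would append one line per event to the file named by each recipient's value.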

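The alerts interface is designed to be backward compatible with the external scripts interface used by the ocf:pacemaker:ClusterMon resource. To preserve this compatibility, the environment variables passed to alert agents are available prefixed with both CRM_notify_ and CRM_alert_. One break in compatibility is that the ClusterMon resource ran external scripts as the root user, while alert agents run as the hacluster user. For details on configuring scripts triggered by ClusterMon, see section 13.2, "Event notification with monitoring resources".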