Part III. Administrative Tasks

zhumin726


5. Configuring and initializing Heartbeat

5.1. The ha.cf file
5.2. The authkeys file
5.3. Propagating the cluster configuration to cluster nodes
5.4. Starting Heartbeat services
5.5. Where to go from here

6. Upgrading Heartbeat

6.1. Upgrading from CRM-less Heartbeat 2.1 clusters

6.1.1. Stopping Heartbeat services
6.1.2. Upgrading software
6.1.3. Enabling the Heartbeat cluster to use Pacemaker
6.1.4. Restarting Heartbeat

6.2. Upgrading from CRM-enabled Heartbeat 2.1 clusters

6.2.1. Placing the cluster in unmanaged mode
6.2.2. Backing up the CIB
6.2.3. Stopping Heartbeat services
6.2.4. Wiping files related to the CRM
6.2.5. Restoring the CIB
6.2.6. Upgrading software
6.2.7. Restarting Heartbeat services
6.2.8. Returning the cluster to managed mode
6.2.9. Upgrading the CIB schema

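For any Heartbeat cluster, the following configuration files must be available:

/etc/ha.d/ha.cf: the global cluster configuration file.
/etc/ha.d/authkeys: a file containing the keys used for mutual node authentication.

5.1. The `ha.cf` file

The following is an example of a small and simple ha.cf file: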

autojoin none
mcast bond0 239.0.0.43 694 1 0
bcast eth2
warntime 5
deadtime 15
initdead 60
keepalive 2
node alice
node bob
pacemaker respawn

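Setting autojoin to none disables automatic discovery of cluster nodes and requires that cluster nodes be listed explicitly, using the node options. This speeds up cluster start-up in clusters with a fixed, small number of nodes.

This example assumes that bond0 is the cluster's interface to the shared network, and that eth2 is an interface dedicated to DRBD replication between both nodes. Thus, bond0 can be used for multicast heartbeat, whereas broadcast is acceptable on eth2 because eth2 is not a shared network.

The next options configure node failure detection. They set the time after which Heartbeat issues a warning that a peer node that is no longer reachable may be dead (warntime), the time after which Heartbeat considers a node confirmed dead (deadtime), and the maximum time it waits for other nodes to check in at cluster startup (initdead). keepalive sets the interval at which Heartbeat keep-alive packets are sent. All of these options are given in seconds.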
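
The relationship between these timers can be checked mechanically before a cluster start. The following is a minimal sketch, not part of Heartbeat: `check_hacf_timers` is a hypothetical helper name, and the thresholds it applies (keepalive below warntime, warntime below deadtime, initdead at least twice deadtime) are rule-of-thumb assumptions rather than limits stated in this section. It also assumes plain integer seconds.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Heartbeat: sanity-check the
# failure-detection timers in an ha.cf-style file given as $1.
# Assumes plain integer seconds (no unit suffixes).
check_hacf_timers() {
    awk '
        BEGIN { bad = 0 }
        # Remember the last value seen for each timer directive.
        $1 == "keepalive" || $1 == "warntime" || $1 == "deadtime" || $1 == "initdead" { t[$1] = $2 + 0 }
        END {
            n = split("keepalive warntime deadtime initdead", names, " ")
            for (i = 1; i <= n; i++)
                if (!(names[i] in t)) { print "missing: " names[i]; bad = 1 }
            if (bad) exit 1
            # Rule-of-thumb orderings (assumptions, see the text above the block).
            if (t["keepalive"] >= t["warntime"]) { print "keepalive should be below warntime"; bad = 1 }
            if (t["warntime"] >= t["deadtime"])  { print "warntime should be below deadtime"; bad = 1 }
            if (t["initdead"] < 2 * t["deadtime"]) { print "initdead should be at least 2 x deadtime"; bad = 1 }
            if (!bad) print "timers look consistent"
            exit bad
        }
    ' "$1"
}
```

A typical call would be `check_hacf_timers /etc/ha.d/ha.cf`; the function only reads the file and returns a non-zero status when one of the checks fails.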

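The node options identify cluster members. The values listed here must exactly match the host names of the cluster nodes as reported by uname -n.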
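
Because a mismatch between a node entry and the output of uname -n is easy to overlook, a quick check can help. The sketch below is illustrative only: `hacf_has_node` is a hypothetical helper name, and it does a plain, case-sensitive string comparison against the node entries of the file it is given.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a host name appears as a "node" entry in
# an ha.cf-style file. $1 is the file, $2 the name (default: uname -n).
hacf_has_node() {
    hacf=$1
    name=${2:-$(uname -n)}
    awk -v want="$name" '
        # Walk all fields so that a line carrying several names is handled too.
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$hacf"
}
```

For example, `hacf_has_node /etc/ha.d/ha.cf || echo "local host name is not listed"` would flag a node whose own name is missing from the file.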

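pacemaker respawn enables the Pacemaker cluster manager and ensures that Pacemaker is restarted automatically in case of failure.

	Note
	Prior to Heartbeat release 3.0.4, the `pacemaker` keyword was named crm. Newer versions still retain the old name as a compatibility alias, but pacemaker is the preferred syntax.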

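5.2. The `authkeys` file

/etc/ha.d/authkeys contains the pre-shared secrets used for mutual authentication between cluster nodes. It should be readable by root only, and it follows this format: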

auth <num>
<num> <algorithm> <secret>

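num is a simple key index, starting with 1. Usually, you will have only one key in your authkeys file.
algorithm is the signature algorithm being used. You may use either md5 or sha1; the use of crc (a simple cyclic redundancy check, which is not secure) is not recommended.
secret is the actual authentication key.

You may create an authkeys file, using a generated secret, with the following shell commands: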

( echo -ne "auth 1\n1 sha1 "; \
  dd if=/dev/urandom bs=512 count=1 | openssl md5 ) \
  > /etc/ha.d/authkeys
chmod 0600 /etc/ha.d/authkeys
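
After generating the file, it can be worth confirming the two properties described above: owner-only permissions and a key line that matches the index named on the auth line. The following is a minimal sketch under stated assumptions: `check_authkeys` is a hypothetical helper name, and it treats exactly mode 0600 (as set by the chmod above) as the only acceptable permission.

```shell
#!/bin/sh
# Hypothetical helper: check an authkeys-style file ($1) for mode 0600 and
# for a key line whose index matches the "auth <num>" line.
check_authkeys() {
    f=$1
    # find prints the path only if the permission bits are exactly 0600.
    if [ -z "$(find "$f" -prune -perm 600 -print 2>/dev/null)" ]; then
        echo "permissions are not 0600"
        return 1
    fi
    awk '
        /^[ \t]*#/ || NF == 0 { next }            # skip comments and blank lines
        $1 == "auth"          { active = $2; next }
        NF >= 2               { algo[$1] = $2 }   # "<num> <algorithm> [<secret>]"
        END {
            if (active == "" || !(active in algo)) {
                print "no key line for the active index"
                exit 1
            }
            if (algo[active] == "crc") print "warning: crc is not secure"
            print "active key " active " uses " algo[active]
        }
    ' "$f"
}
```

Running `check_authkeys /etc/ha.d/authkeys` as root prints the active key index and algorithm, or a short reason for the failure.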

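5.3. Propagating the cluster configuration to cluster nodes

In order to propagate the contents of the ha.cf and authkeys configuration files, you may use the ha_propagate command, which you would invoke using either

/usr/lib/heartbeat/ha_propagate

or

/usr/lib64/heartbeat/ha_propagate

This utility copies the configuration files to every node listed in /etc/ha.d/ha.cf using scp. Afterwards, it also connects to the nodes using ssh and issues chkconfig heartbeat on in order to enable Heartbeat services on system startup.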
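
To make the effect of a propagation run easier to picture, the sketch below prints the commands such a run would amount to, without executing anything. It is not the ha_propagate implementation: `propagate_dry_run` is a hypothetical name, and both the `root@` login and the decision to skip the local node are assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative only: print one scp and one ssh command per remote node found
# in an ha.cf-style file ($1). Nothing is executed. $2 is the local node name
# (default: uname -n).
propagate_dry_run() {
    hacf=$1
    self=${2:-$(uname -n)}
    awk -v self="$self" '
        $1 == "node" {
            for (i = 2; i <= NF; i++) {
                if ($i == self) continue    # assumption: the local node is skipped
                print "scp /etc/ha.d/ha.cf /etc/ha.d/authkeys root@" $i ":/etc/ha.d/"
                print "ssh root@" $i " chkconfig heartbeat on"
            }
        }
    ' "$hacf"
}
```

Reviewing the output of `propagate_dry_run /etc/ha.d/ha.cf` before running the real utility shows which hosts need working SSH access.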

5.4. Starting Heartbeat services

Heartbeat is started just like any other system service on your machine. Depending on your distribution, you may use one of the following commands:

/etc/init.d/heartbeat start

service heartbeat start

rcheartbeat start

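With the pacemaker option set in your ha.cf, Heartbeat now also starts the Pacemaker daemon (named crmd for historical reasons) along with its other services. After a few seconds, you should be able to see the Heartbeat processes in your process table: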

# ps -AHfww | grep heartbeat
root      2772  1639  0 14:27 pts/0    00:00:00         grep heartbeat
root      4175     1  0 Nov08 ?        00:37:57   heartbeat: master control process
root      4224  4175  0 Nov08 ?        00:01:13     heartbeat: FIFO reader
root      4227  4175  0 Nov08 ?        00:01:28     heartbeat: write: bcast eth2
root      4228  4175  0 Nov08 ?        00:01:29     heartbeat: read: bcast eth2
root      4229  4175  0 Nov08 ?        00:01:35     heartbeat: write: mcast bond0
root      4230  4175  0 Nov08 ?        00:01:32     heartbeat: read: mcast bond0
102       4233  4175  0 Nov08 ?        00:03:37     /usr/lib/heartbeat/ccm
102       4234  4175  0 Nov08 ?        00:15:02     /usr/lib/heartbeat/cib
root      4235  4175  0 Nov08 ?        00:17:14     /usr/lib/heartbeat/lrmd -r
root      4236  4175  0 Nov08 ?        00:02:48     /usr/lib/heartbeat/stonithd
102       4237  4175  0 Nov08 ?        00:00:54     /usr/lib/heartbeat/attrd
102       4238  4175  0 Nov08 ?        00:08:32     /usr/lib/heartbeat/crmd
102       5724  4238  0 Nov08 ?        00:04:47       /usr/lib/heartbeat/pengine
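
Scanning a long process table by eye is error-prone, so the comparison can be scripted. The following sketch is a hypothetical helper (`missing_heartbeat_daemons` is not a Heartbeat tool) that reads `ps -AHfww` output on standard input and lists which of the daemons from the listing above are absent. It assumes the column layout shown above, where the command is the eighth field.

```shell
#!/bin/sh
# Hypothetical helper: read "ps -AHfww" output on stdin and print the names
# of expected daemons that do not appear. Assumes the command is column 8.
missing_heartbeat_daemons() {
    awk '
        # Record the basename of the command column for every process.
        { n = split($8, parts, "/"); seen[parts[n]] = 1 }
        END {
            n = split("ccm cib lrmd stonithd attrd crmd", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) print want[i]
        }
    '
}
```

Used as `ps -AHfww | missing_heartbeat_daemons`, an empty result means all six daemons were found.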

Finally, you should verify that the cluster is up and running, using the crm_mon command:

# crm_mon -1
============
Last updated: Mon Dec 13 14:29:36 2010
Stack: Heartbeat
Current DC: alice (083146b9-6e26-4ac8-a705-317095d0ba57) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, unknown expected votes
24 Resources configured.
============

Online: [ alice bob ]
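
The same check can be automated, for example in a post-start script. The sketch below is a hypothetical helper (`all_nodes_online` is an illustrative name) that reads `crm_mon -1` output on standard input and succeeds only if every node named on the command line appears on the Online line. It assumes the output format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read "crm_mon -1" output on stdin; succeed only if all
# node names passed as arguments are listed on the "Online: [ ... ]" line.
all_nodes_online() {
    awk -v want="$*" '
        $1 == "Online:" { for (i = 2; i <= NF; i++) online[$i] = 1 }
        END {
            n = split(want, names, " ")
            for (i = 1; i <= n; i++)
                if (!(names[i] in online)) { print names[i] " is not online"; bad = 1 }
            exit (bad ? 1 : 0)
        }
    '
}
```

A call such as `crm_mon -1 | all_nodes_online alice bob` returns a non-zero status and names the missing node if one has not checked in yet.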

5.5. Where to go from here

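Now that you have a working Heartbeat configuration, you will want to continue by configuring Pacemaker and adding cluster resources.

The following documents are available for your perusal: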

Clusters From Scratch is an excellent guide for configuring Pacemaker clusters. The document primarily covers Pacemaker on Corosync, but starting with its chapter "Using Pacemaker Tools" applies identically to Heartbeat clusters.
The DRBD User’s Guide has a chapter dedicated to integrating DRBD with Pacemaker clusters.
Several Technical Guides covering Heartbeat/Pacemaker clusters for a variety of applications are available from the LINBIT web site.

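Chapter 6. Upgrading Heartbeat

6.1. Upgrading from CRM-less Heartbeat 2.1 clusters

For Heartbeat 2.1 clusters that do not use the CRM (that is, clusters configured through an haresources file), upgrading to 3.0 requires converting the current configuration to one suitable for Pacemaker.

	Note
	The upgrade procedure incurs a brief period of downtime. However, if the upgrade is properly planned, tested, and executed, this downtime can be reduced to minutes or even seconds, depending on the configuration.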

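6.1.1. Stopping Heartbeat services

You should start the upgrade process on your current standby node, that is, the cluster node that is not currently running any resources. If your cluster uses an active/active configuration (resources running on both nodes), select one node and issue the following command to transfer all of its resources to the peer node: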

# hb_standby

Then, on that node only, stop Heartbeat services:

# /etc/init.d/heartbeat stop

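6.1.2. Upgrading software

While upgrading, it is important to recall that the monolithic Heartbeat 2.1 tree has been split up into modular parts. Thus you will replace Heartbeat with three individual pieces of software: Cluster Glue, Pacemaker, and Heartbeat 3, which comprises just the cluster messaging layer.

Upgrading from source: In the unpacked archive that you installed Heartbeat 2.1 from, run make uninstall. Then, install Cluster Glue and Heartbeat.
Upgrading using locally built packages: When installing packages manually, uninstall the heartbeat package first. Then install Cluster Glue, the Heartbeat 3 package, the resource agents, and Pacemaker.
Upgrading using a package repository: When upgrading using an APT, YUM, or Zypper repository, you should just be able to run the install command for Heartbeat 3 and Pacemaker, and the dependencies will be resolved automatically.

Do not restart Heartbeat services at this point.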

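6.1.3. Enabling the Heartbeat cluster to use Pacemaker

You must now instruct the cluster messaging layer to start Pacemaker on cluster startup. To do so, add

crm respawn

to your ha.cf configuration file.

	Important
	At this point, you should also check your ha.cf file against the ha.cf(5) man page and remove any obsolete options.

When your ha.cf modifications are complete, copy the file to the peer node.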
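
Adding the directive by hand on every node invites duplicates, so the edit can be made idempotent. The following is a minimal sketch: `enable_crm` is a hypothetical helper name, and the choice to leave any existing crm or pacemaker line untouched (rather than rewrite it) is an assumption made to keep the helper conservative. It also assumes the file ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper: append "crm respawn" to an ha.cf-style file ($1)
# unless a crm or pacemaker directive is already present. Existing
# directives are reported and left for manual review.
enable_crm() {
    hacf=$1
    existing=$(grep -E '^[[:space:]]*(crm|pacemaker)[[:space:]]' "$hacf" || true)
    if [ -n "$existing" ]; then
        echo "existing directive left unchanged: $existing"
    else
        printf 'crm respawn\n' >> "$hacf"
        echo "added crm respawn"
    fi
}
```

Running `enable_crm /etc/ha.d/ha.cf` twice adds the line only once; the second run reports the directive it found.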

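6.1.4. Restarting Heartbeat

You can now restart the cluster in Pacemaker-enabled mode. Proceed as follows:

Run /etc/init.d/heartbeat stop on your active node. This shuts down your cluster resources.
Run /etc/init.d/heartbeat start on your standby node (the one where you created your CIB). This starts the local Heartbeat instance and Pacemaker. The node then waits for the other cluster node to check in.
Run /etc/init.d/heartbeat start on your other node. This starts the local Heartbeat instance and Pacemaker, lets the CIB propagate automatically, and then starts your applications.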

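6.2. Upgrading from CRM-enabled Heartbeat 2.1 clusters

This section outlines the steps necessary to upgrade a Heartbeat 2.1 cluster with the built-in CRM enabled to Heartbeat 3.0 with Pacemaker.

	Note
	When properly planned and executed, the upgrade procedure can be completed in a matter of minutes, with no application downtime. We strongly recommend that you read and understand the steps described in this section before attempting to upgrade a production cluster. All commands described in this section must be run as root. Do not execute the individual steps on all cluster nodes in parallel. Instead, complete the procedure on each node before moving on to the next.

6.2.1. Placing the cluster in unmanaged mode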

With this step, the cluster temporarily relinquishes control of its resources. This means that the cluster no longer monitors its nodes or resources for the duration of the upgrade, and will not rectify any application or node failures during this time. Currently running resources, however, will continue to run.

# crm_attribute -t crm_config -n is_managed_default -v false

In most configurations, individual resources do not set the is_managed attribute individually, and hence the cluster-wide attribute is_managed_default applies to all of them.

If in your specific configuration you do have resources that have this attribute set, you should remove it to make sure the default applies:

# crm_resource -t primitive -r <resource-id> -p is_managed -D

6.2.2. Backing up the CIB

At this point, it is important to save a copy of the Cluster Information Base (CIB). The CIB to save is stored in a file named cib.xml, normally located in /var/lib/heartbeat/crm.

# cp /var/lib/heartbeat/crm/cib.xml ~

You need to perform this step on only one node currently connected to the cluster. Do not delete this file; it will be restored later.
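
Since the rest of the procedure depends on this copy, it can be worth verifying it immediately. The sketch below is a hypothetical helper (`backup_cib` is an illustrative name) that copies the CIB, refuses to overwrite an earlier backup, and confirms that the copy is byte-identical. The default paths mirror the cp command above; both can be overridden.

```shell
#!/bin/sh
# Hypothetical helper: copy a CIB file ($1) into a directory ($2) as cib.xml,
# refuse to overwrite an existing backup, and verify the copy with cmp.
backup_cib() {
    src=${1:-/var/lib/heartbeat/crm/cib.xml}
    destdir=${2:-$HOME}
    dest="$destdir/cib.xml"
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    cp -p "$src" "$dest" || return 1
    cmp -s "$src" "$dest" || { echo "backup differs from source" >&2; return 1; }
    echo "$dest"
}
```

On success the function prints the path of the backup, which is the file to restore later in the procedure.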

6.2.3. Stopping Heartbeat services

You may now stop Heartbeat with /etc/init.d/heartbeat stop or the preferred command to stop a system service on your distribution (service heartbeat stop, rcheartbeat stop, etc.).

In case you are running a legacy version of Heartbeat affected by a shutdown bug, then graceful crmd shutdown will not work properly in unmanaged mode.

In this case, after you have initiated a graceful service shutdown with the above command, kill the crmd process manually:

use ps -AHfww to retrieve the process ID of crmd;
kill crmd with a TERM signal.
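
The two manual steps above can be sketched as a small filter. `crmd_pids` is a hypothetical helper name; it reads `ps -AHfww` output on standard input and prints the process ID of every process whose command is a binary named crmd, assuming the command is the eighth column as in a standard `ps -f` listing.

```shell
#!/bin/sh
# Hypothetical helper: read "ps -AHfww" output on stdin and print the PID
# (column 2) of each process whose command (column 8) has basename crmd.
crmd_pids() {
    awk '{ n = split($8, parts, "/"); if (parts[n] == "crmd") print $2 }'
}
```

The result can then be passed to kill, for example `kill -TERM $(ps -AHfww | crmd_pids)`, once the graceful shutdown has been initiated.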

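6.2.4. Wiping files related to the CRM

	Warning
	Before proceeding with this section, make sure that you have created a backup copy of the CIB on one of your cluster nodes, as described in Section 6.2.2, “Backing up the CIB”.

You must now wipe the local CRM-related files from your node. To do so, remove all files from the directory where the CRM stores CIB information, commonly /var/lib/heartbeat/crm.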

# rm /var/lib/heartbeat/crm/*
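
Because this step is destructive, a guard against running it without a backup may be useful on the node that holds the backup copy. The following is a minimal sketch: `wipe_crm_files` is a hypothetical helper name, and both paths are passed in explicitly. On nodes that do not hold the backup, the existence of the backup has to be confirmed by other means.

```shell
#!/bin/sh
# Hypothetical helper: remove the files in a CRM state directory ($1) only
# if a non-empty CIB backup file ($2) exists.
wipe_crm_files() {
    crmdir=$1
    backup=$2
    if [ ! -s "$backup" ]; then
        echo "no usable backup at $backup, nothing removed" >&2
        return 1
    fi
    # Same scope as the rm command above: files matched by * only.
    rm -f "$crmdir"/*
}
```

For example, `wipe_crm_files /var/lib/heartbeat/crm ~/cib.xml` removes the files only when ~/cib.xml is present and non-empty.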

6.2.5. Restoring the CIB

	Note
	You should only proceed with this step if Heartbeat is still stopped on all cluster nodes, and all cluster nodes have had their CIB contents wiped. If you still have remaining nodes that have a residual CIB configuration, proceed as outlined in Section 6.2.4, “Wiping files related to the CRM”.

Restoring the CIB means copying the CIB backup described in Section 6.2.2, “Backing up the CIB” to the /var/lib/heartbeat/crm directory.

# cp ~/cib.xml /var/lib/heartbeat/crm/cib.xml
# chown hacluster:haclient /var/lib/heartbeat/crm/cib.xml
# chmod 0600 /var/lib/heartbeat/crm/cib.xml

You must perform this step on one node only, namely the first node on which you are about to upgrade the cluster software. On all other nodes, the /var/lib/heartbeat/crm directory must remain empty — Pacemaker distributes the CIB automatically.

6.2.6. Upgrading software

While upgrading, it is important to recall that the monolithic Heartbeat 2.1 tree has been split up into modular parts. Thus you will replace Heartbeat with three individual pieces of software: Cluster Glue, Pacemaker, and Heartbeat 3 which comprises just the cluster messaging layer.

Upgrading from source: In the unpacked archive that you installed Heartbeat 2.1 from, run make uninstall. Then, install Cluster Glue and Heartbeat.
Upgrading using locally built packages: When installing packages manually, uninstall the heartbeat package first. Then install cluster-glue, the version 3 heartbeat package, resource-agents, and pacemaker.
Upgrading using a package repository: When upgrading using an APT, YUM, or Zypper repository, you should just be able to run the install command for heartbeat version 3 and pacemaker, and the dependencies will be resolved automatically.

If this is the last node to be upgraded in your cluster, and your package management system did not restart Heartbeat services after the software upgrade, you should now proceed to Section 6.2.7, “Restarting Heartbeat services”. Otherwise, you should move to the next node and proceed as outlined in Section 6.2.3, “Stopping Heartbeat services” through Section 6.2.6, “Upgrading software”.

6.2.7. Restarting Heartbeat services

	Note
	This step may be omitted in case your package management system automatically restarts Heartbeat services during post-install.

First, restart Heartbeat on the node where you restored the CIB (see Section 6.2.5, “Restoring the CIB”) with /etc/init.d/heartbeat start. Then, repeat this command on the remaining cluster nodes. At this time,

the cluster is still in unmanaged mode (meaning it does not start, stop, or monitor any resources),
the cluster redistributes the old CIB among its nodes, and
the cluster is still using the pre-upgrade CIB schema.

6.2.8. Returning the cluster to managed mode

Once the cluster software has been upgraded, it is recommended to return the cluster into managed mode:

# crm_attribute -t crm_config -n is_managed_default -v true

6.2.9. Upgrading the CIB schema

Although an upgraded cluster can theoretically operate on the pre-upgrade CIB schema indefinitely, it is strongly recommended to upgrade the CIB to the current schema. To do so, run the following command after cluster communications between all nodes have been re-established: