Linux High-Availability Clustering with Pacemaker and Corosync

@shitouji

Category: Linux运维 (Linux operations)

Article link: https://blog.csdn.net/minioesina/article/details/114005248

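I. Environment
OS version: CentOS 7.4 x64
node1 (primary node) IP: 192.168.1.101, hostname: lb01
node2 (secondary node) IP: 192.168.1.102, hostname: lb02
node3 (secondary node) IP: 192.168.1.103, hostname: lb03
Virtual IP address (VIP): 192.168.1.250

(node1): configured on the primary node only
(node2): configured on the secondary node only
(node3): configured on the secondary node only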

II. Preparing the environment

1. Set the hostname and hosts records

#lb01 configuration
[root@lb01 ~]# cat /etc/hostname 
lb01
[root@lb01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.103   lb03
192.168.1.102   lb02
192.168.1.101   lb01
 
#lb02 configuration
[root@lb02 ~]# cat /etc/hostname 
lb02
[root@lb02 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.103   lb03
192.168.1.102   lb02
192.168.1.101   lb01

#lb03 configuration
[root@lb03 ~]# cat /etc/hostname 
lb03
[root@lb03 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.103   lb03
192.168.1.102   lb02
192.168.1.101   lb01
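
The three hosts files above carry identical records, so the step can be scripted. Below is a minimal sketch, assuming the addresses shown in the listings; the function name `add_cluster_hosts` and its file-path parameter are illustrative and not part of the original procedure.

```shell
# Append the three cluster host records to a hosts file, skipping any
# hostname that already has an entry. The file path is a parameter so the
# function can be tried on a scratch copy before touching /etc/hosts.
add_cluster_hosts() {
    local hosts_file="${1:-/etc/hosts}"
    local entry ip name
    for entry in "192.168.1.101 lb01" "192.168.1.102 lb02" "192.168.1.103 lb03"; do
        ip="${entry%% *}"
        name="${entry##* }"
        # -w matches the hostname as a whole word, so lb01 does not match lb010
        if ! grep -qw -- "$name" "$hosts_file" 2>/dev/null; then
            printf '%s   %s\n' "$ip" "$name" >> "$hosts_file"
        fi
    done
}
```

Running `add_cluster_hosts` as root on each node targets /etc/hosts; running it a second time adds nothing.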

2. SSH mutual trust between the nodes

#lb01
[root@lb01 ~]# ssh-keygen -t rsa -P ''
...
 
[root@lb01 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@192.168.1.102
[root@lb01 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@192.168.1.103
#Test
[root@lb01 ~]# ssh 192.168.1.102 'ifconfig'
[root@lb01 ~]# ssh 192.168.1.103 'ifconfig'
 
#lb02
[root@lb02 ~]# ssh-keygen -t rsa -P ''
...
 
[root@lb02 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@192.168.1.101
[root@lb02 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@192.168.1.103
#Test
[root@lb02 ~]# ssh 192.168.1.101 'ifconfig'
[root@lb02 ~]# ssh 192.168.1.103 'ifconfig'

#lb03
[root@lb03 ~]# ssh-keygen -t rsa -P ''
...
 
[root@lb03 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@192.168.1.101
[root@lb03 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@192.168.1.102
#Test
[root@lb03 ~]# ssh 192.168.1.101 'ifconfig'
[root@lb03 ~]# ssh 192.168.1.102 'ifconfig'
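
The per-node key distribution above follows one pattern: copy the local public key to every node except the local one. A sketch of that loop is below, assuming a key already generated with ssh-keygen as shown; `distribute_ssh_key` and the `DRY_RUN` switch are hypothetical helpers added for illustration.

```shell
# Copy this node's public key to every other cluster node.
# First argument: the local node; remaining arguments: all cluster nodes.
# Set DRY_RUN=1 to print the commands instead of running them.
distribute_ssh_key() {
    local self="$1"
    shift
    local node
    for node in "$@"; do
        # Skip the local node itself
        [ "$node" = "$self" ] && continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@$node"
        else
            ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@$node"
        fi
    done
}
```

On lb01 this would be called as `distribute_ssh_key lb01 lb01 lb02 lb03`; the same argument list works on the other nodes with only the first argument changed.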

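3. Check iptables, then turn off the firewall and SELinux

#lb01
[root@lb01 ~]# setenforce 0
[root@lb01 ~]# systemctl stop firewalld
#getenforce prints Permissive after setenforce 0 on an enforcing system; Disabled is printed here because SELinux was already disabled at boot on this host
[root@lb01 ~]# getenforce
Disabled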
[root@lb01 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)
 
#Same on lb02 and lb03
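
`setenforce 0` only lasts until the next reboot. To make the setting persistent, the `SELINUX=` line in /etc/selinux/config has to change as well. A minimal sketch follows; the function name `set_selinux_mode` is an illustrative assumption, and it relies on GNU sed's `-i` option as shipped with CentOS.

```shell
# Rewrite the SELINUX= line of an SELinux config file to the given mode
# (enforcing, permissive or disabled). The change applies at the next boot.
# The pattern requires "=" right after SELINUX, so SELINUXTYPE= is untouched.
set_selinux_mode() {
    local mode="$1"
    local config="${2:-/etc/selinux/config}"
    sed -i "s/^SELINUX=.*/SELINUX=${mode}/" "$config"
}
```

For example, `set_selinux_mode disabled` on each node, together with `systemctl disable firewalld`, keeps the demo settings across reboots.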

4. Time synchronization

#An NTP server runs on the local network; the nodes synchronize against it
[root@lb01 ~]# vi /etc/ntp.conf 
server 192.168.1.200
fudge 192.168.1.200 stratum 8
[root@lb01 ~]# ntpdate 192.168.1.200
[root@lb02 ~]# vi /etc/ntp.conf 
server 192.168.1.200
fudge 192.168.1.200 stratum 8
[root@lb02 ~]# ntpdate 192.168.1.200
#Repeat on lb03

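III. Pacemaker

1. About Pacemaker

Pacemaker is a resource manager; it does not provide heartbeat messaging. Because every high-availability cluster needs a heartbeat mechanism, beginners often assume that Pacemaker detects heartbeats itself. In fact, Pacemaker relies on Corosync or Heartbeat for that layer. Pacemaker is the continuation of the CRM (also known as the Heartbeat V2 resource manager): it was originally written for Heartbeat and has since become an independent project.

Heartbeat v2 was a major rework of Heartbeat. The resource manager could run as a standalone process and accept user requests; it was called crm. At run time every node runs a daemon called crmd, which usually listens on a socket on port 5560. The server side is therefore crmd, and the client side is crm (the crm shell), a command-line interface for talking to the server-side crmd. Heartbeat also has a graphical tool, heartbeat-GUI, through which the cluster can be configured. Building on Heartbeat 1.x, Heartbeat 2.0 introduced a modular configuration approach: the Cluster Resource Manager (CRM).

The CRM model supports up to 16 nodes and is configured through an XML-based Cluster Information Base (CIB).
The last official stable release of Heartbeat 2.x is 2.1.4.

Heartbeat v3.x

From v3 onward, the Heartbeat project was split by function into separate subprojects that are developed independently. The HA principle is essentially the same as in Heartbeat 2.x, and so is the configuration. After the split the components are heartbeat, pacemaker, and cluster-glue; with the architecture separated, each part can work together with other components.

Heartbeat: the former messaging layer, now a project of its own. The new heartbeat only maintains information about the cluster nodes and the communication between them.
Cluster Glue: an intermediate layer that ties heartbeat and pacemaker together. It consists mainly of two parts, the LRM and STONITH.
Resource Agent: a collection of scripts that start, stop, and monitor services. The LRM calls these scripts to start, stop, and monitor the various resources.
Pacemaker: the Cluster Resource Manager (CRM), the control center of the whole HA setup. Clients configure, manage, and monitor the entire cluster through Pacemaker.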

2. Installing and configuring Pacemaker

#Install the packages
[root@lb01 ~]# yum install pacemaker pcs psmisc policycoreutils-python -y
Installation automatically creates the hacluster user, which is used to operate the cluster and run pcs commands
[root@lb01 ~]# tail /etc/passwd
...
hacluster:x:189:189:cluster user:/home/hacluster:/sbin/nologin
#Set a password for the user
[root@lb01 ~]# echo pacemaker | passwd --stdin hacluster

#################
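The CentOS 7 firewall ships with a predefined service for high availability. It is stored in "/usr/lib/firewalld/services/high-availability.xml" and covers the ports used by most HA software. Allow the service with the following commands:
firewall-cmd --add-service=high-availability --zone=public --permanent
firewall-cmd --reload
The firewall is turned off in this demo environment, so this step can be skipped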
#################

Start pcsd and enable it at boot
[root@lb01 ~]# systemctl start pcsd
[root@lb01 ~]# systemctl enable pcsd

#Repeat the same steps on lb02 and lb03

********************************
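Configuring Corosync synchronization
#Before configuring Corosync, authenticate the "hacluster" user as the cluster's authentication user on all nodes with the following command
[root@lb01 ~]# pcs cluster auth -u hacluster -p pacemaker lb01 lb02 lb03

#After successful authentication a token file is written to "/var/lib/pcsd/tokens". By default every node in the cluster stores the authentication information of the other members. The file content looks roughly like this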
[root@lb01 ~]# cat /var/lib/pcsd/tokens 
{
  "format_version": 3,
  "data_version": 4,
  "tokens": {
    "lb01": "eca8b776-c078-4fa4-8201-bf3a98219c94",
    "lb02": "bc3933dc-5a29-4d91-b8b8-c727ea1e7497",
    "lb03": "32a68e65-7775-478b-aaf8-587cf159c90d"
  },
  "ports": {
    "lb01": 2224,
    "lb02": 2224,
    "lb03": 2224
  }
}
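
To confirm quickly which nodes have been authenticated, the keys of the "tokens" object can be listed. The sketch below assumes the pretty-printed layout shown above, with one entry per line; the function name `list_authenticated_nodes` is illustrative, and a real JSON parser would be more robust.

```shell
# Print the node names stored under "tokens" in a pcsd tokens file.
# Relies on the one-entry-per-line layout of the file shown above.
list_authenticated_nodes() {
    local tokens_file="${1:-/var/lib/pcsd/tokens}"
    awk '
        # Start collecting after the line that opens the "tokens" object
        /"tokens" *:/ { in_tokens = 1; next }
        # Stop at the first closing brace after that
        in_tokens && /}/ { exit }
        # The first quoted string on each entry line is the node name
        in_tokens && match($0, /"[^"]+"/) {
            print substr($0, RSTART + 1, RLENGTH - 2)
        }
    ' "$tokens_file"
}
```

Called without arguments on a cluster node, it reads /var/lib/pcsd/tokens and prints one node name per line.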

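#Once authentication has succeeded, create the cluster and synchronize the Corosync configuration with the following command
#--name sets the cluster name, here web-cluster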
[root@lb01 ~]# pcs cluster setup --name web-cluster lb01 lb02 lb03
Destroying cluster on nodes: lb01, lb02, lb03...
lb01: Stopping Cluster (pacemaker)...
lb02: Stopping Cluster (pacemaker)...
lb03: Stopping Cluster (pacemaker)...
lb01: Successfully destroyed cluster
lb02: Successfully destroyed cluster
lb03: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'lb01', 'lb02', 'lb03'
lb01: successful distribution of the file 'pacemaker_remote authkey'
lb02: successful distribution of the file 'pacemaker_remote authkey'
lb03: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
lb01: Succeeded
lb02: Succeeded
lb03: Succeeded

Synchronizing pcsd certificates on nodes lb01, lb02, lb03...
lb02: Success
lb03: Success
lb01: Success
Restarting pcsd on the nodes in order to reload the certificates...
lb02: Success
lb03: Success
lb01: Success

Check the configuration before starting the cluster
[root@lb01 ~]# crm_verify -L -V
   error: unpack_resources:	Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:	Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:	NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

#This is a demo environment without a STONITH device, so STONITH has to be disabled
[root@lb01 ~]# pcs property set stonith-enabled=false
#The check now passes without errors
[root@lb01 ~]# crm_verify -L -V

Start the cluster
[root@lb01 ~]# pcs cluster start --all
Check the status
[root@lb01 ~]# pcs status
Cluster name: web-cluster
Stack: corosync
Current DC: lb03 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Fri Feb 26 18:01:08 2021
Last change: Fri Feb 26 10:00:01 2021 by root via cibadmin on lb01

3 nodes configured
0 resource instances configured

Online: [ lb01 lb02 lb03 ]

No resources

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled

Enable the cluster to start at boot
[root@lb01 ~]# pcs cluster enable --all
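
When the status check is scripted, the "Online:" line of the `pcs status` output above can be parsed to confirm that every node has joined. This is a minimal sketch based on the output format shown in this article; `online_nodes` and `all_nodes_online` are hypothetical helper names, and other Pacemaker versions may format the line differently.

```shell
# Print the node names from the "Online: [ ... ]" line of pcs status output.
# Reads the status text on stdin.
online_nodes() {
    sed -n 's/^Online: \[ \(.*\) ]$/\1/p'
}

# Succeed only when the number of online nodes equals the expected count.
# Reads pcs status text on stdin.
all_nodes_online() {
    local expected="$1"
    local nodes
    nodes="$(online_nodes)"
    # Word-split the node list into positional parameters to count it
    set -- $nodes
    [ "$#" -eq "$expected" ]
}
```

A typical call would be `pcs status | all_nodes_online 3 && echo "all nodes joined"`.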

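---Command reference---
Check the quorum status
[root@lb01 ~]# pcs status quorum
[root@lb01 ~]# corosync-quorumtool
Check the Corosync status
[root@lb01 ~]# pcs status corosync
View the CIB
Once the cluster has started, it generates a CIB (Cluster Information Base) file that holds the node and resource information. The CIB is synchronized across all nodes in the cluster, and requests to modify it are processed by the cluster
[root@lb01 ~]# pcs cluster cib
[root@lb01 ~]# cat /var/lib/pacemaker/cib/cib.xml
View the cluster log
According to the settings in the Corosync configuration file "/etc/corosync/corosync.conf", the cluster log is written to "/var/log/cluster/corosync.log" by default
[root@lb01 ~]# tail -f /var/log/cluster/corosync.log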
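
As background for the quorum check in the list above: with one vote per node, Corosync's votequorum requires a strict majority, that is floor(n/2) + 1 votes. The sketch below only does that arithmetic; the function names are illustrative, and special modes such as two_node are not covered.

```shell
# Votes needed for quorum when every node has one vote: floor(n/2) + 1
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# Number of node failures the cluster can absorb while keeping quorum
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}
```

For the three-node cluster in this article, `quorum_votes 3` gives 2 and `tolerated_failures 3` gives 1, so the cluster keeps quorum with one node down.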

***********************************************

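3. Managing the cluster through the GUI

pcsd serves a web GUI on port 2224. Once the cluster is running, it can be opened in a browser at:
https://192.168.1.101:2224/login

Log in with the hacluster user and the password set earlier

Click the Add Existing button to add the cluster, then enter a node IP

Confirm with Add Existing and enter the login password again; the managed cluster web-cluster then appears

Click the web-cluster cluster to open its management page. The menu contains nodes (NODES), resources (RESOURCES), fence devices (FENCE DEVICES), access control (ACLs), and cluster properties (CLUSTER PROPERTIES)

Click RESOURCES to manage resources, then choose Add to add a resource

Two resources are added here: a VIP (virtual IP) and the HTTPD service (install the web server first with yum install -y httpd)

Put the resources into one group: select them and create a group

Set the group name

After creation, constraints and other settings can be applied to the resources. Here Resource Location Preferences is set to lb01=100, which means lb01 holds the resources whenever it is online and healthy. This can be tested by shutting down and rebooting lb01

The result is visible in the configuration file below, where lb01 clearly has priority for the VIP resource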
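
The same resources, group, and location preference can also be created from the command line. The sketch below uses the pcs 0.9 syntax that ships with CentOS 7 and the names used in this article (VIP, web, web-group); the `PCS` override variable is a hypothetical addition so that the commands can be previewed with `PCS='echo pcs'` before they are run for real.

```shell
# Create the VIP and Apache resources in one group and prefer lb01.
# Set PCS='echo pcs' to print the commands instead of running them.
create_web_group() {
    local pcs="${PCS:-pcs}"
    # Virtual IP from the environment description, monitored every 10 seconds
    $pcs resource create VIP ocf:heartbeat:IPaddr2 ip=192.168.1.250 \
        op monitor interval=10s --group web-group
    # Apache resource with default parameters, added to the same group
    $pcs resource create web ocf:heartbeat:apache \
        op monitor interval=10s --group web-group
    # Location preference: VIP prefers lb01 with a score of 100
    $pcs constraint location VIP prefers lb01=100
}
```

Because both resources are in one group, they start on the same node in the listed order, so the web service follows the VIP.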

[root@lb01 ~]# cat /var/lib/pacemaker/cib/cib.xml
<cib crm_feature_set="3.0.14" validate-with="pacemaker-2.10" epoch="15" num_updates="0" admin_epoch="0" cib-last-written="Mon Mar  1 20:09:01 2021" update-origin="lb01" update-client="cibadmin" update-user="hacluster" have-quorum="1" dc-uuid="2">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.23-1.el7_9.1-9acf116022"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="web-cluster"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="lb01">
        <instance_attributes id="nodes-1">
          <nvpair id="nodes-1-lb01" name="lb01" value="inf"/>
        </instance_attributes>
      </node>
      <node id="2" uname="lb02">
        <instance_attributes id="nodes-2"/>
      </node>
      <node id="3" uname="lb03"/>
    </nodes>
    <resources>
      <group id="web-group">
        <primitive class="ocf" id="VIP" provider="heartbeat" type="IPaddr2">
          <instance_attributes id="VIP-instance_attributes">
            <nvpair id="VIP-instance_attributes-ip" name="ip" value="192.168.1.250"/>
          </instance_attributes>
          <operations>
            <op id="VIP-monitor-interval-10s" interval="10s" name="monitor" timeout="20s"/>
            <op id="VIP-start-interval-0s" interval="0s" name="start" timeout="20s"/>
            <op id="VIP-stop-interval-0s" interval="0s" name="stop" timeout="20s"/>
          </operations>
        </primitive>
        <primitive class="ocf" id="web" provider="heartbeat" type="apache">
          <operations>
            <op id="web-monitor-interval-10s" interval="10s" name="monitor" timeout="20s"/>
            <op id="web-start-interval-0s" interval="0s" name="start" timeout="40s"/>
            <op id="web-stop-interval-0s" interval="0s" name="stop" timeout="60s"/>
          </operations>
        </primitive>
      </group>
    </resources>
    <constraints>
      <rsc_location id="location-VIP-lb01-100" node="lb01" rsc="VIP" score="100"/>
    </constraints>
  </configuration>
</cib>
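
The location preference in the CIB above can be exercised without rebooting lb01 by putting the node into standby and bringing it back. This is a sketch only: it assumes the `pcs node standby` and `pcs node unstandby` subcommands available in recent pcs 0.9 releases, and the `PCS` override variable is a hypothetical addition for previewing the commands.

```shell
# Move resources off a node, show where they run, then release the node.
# Set PCS='echo pcs' to print the commands instead of running them.
failover_test() {
    local node="${1:-lb01}"
    local pcs="${PCS:-pcs}"
    # Standby: the node stays in the cluster but may not host resources
    $pcs node standby "$node"
    # The web-group resources should now be reported on another node
    $pcs status resources
    # Unstandby: with a preference score of 100, the VIP is expected to return to lb01
    $pcs node unstandby "$node"
}
```

Resources may take a few seconds to move, so in practice it helps to check `pcs status` again after each step.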

Reference: https://www.daehub.com/archives/8486.html
