Deploying NFS + nginx with corosync + pacemaker

ls_jokerking

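High availability for NFS + nginx with corosync + pacemaker (managed with crm) on CentOS 7

pcs-related configuration (on CentOS 7, pcs is well supported, while crmsh takes more effort to set up):

Environment, two CentOS 7 hosts: node1: 172.25.0.29, node2: 172.25.0.30

Prerequisites for configuring the cluster:

1. Time synchronization

2. The hosts can reach each other by host name

3. A decision on whether to use a quorum device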
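Prerequisite 2 can be checked before going any further. The following is a minimal sketch and not part of the original procedure: `missing_hosts` is a hypothetical helper that reports which node names have no entry in a hosts-format file, assumed here to be /etc/hosts with the node names used in this article.

```shell
#!/bin/sh
# Hypothetical helper: print every NAME that has no entry in the hosts-format FILE.
# Usage: missing_hosts FILE NAME...
missing_hosts() {
    file=$1
    shift
    for name in "$@"; do
        # A name counts as present when it appears as a whole field after the
        # address column on a line that is not a comment.
        awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file" || echo "$name"
    done
}

# Example use on a cluster node:
#   missing_hosts /etc/hosts node1 node2
```

The helper prints nothing when every name has an entry, so an empty result means the hosts file is complete on that node.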

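The main cluster lifecycle management tools are:

pcs: agent-based (pcsd); used with corosync + pacemaker

crmsh: agentless; it works over SSH through pssh, the same model that tools such as ansible use

I. Install corosync + pacemaker and the crm management package

1. First configure host name resolution and time synchronization on each host:

node1: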

[root@node1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.25.0.29 node1
172.25.0.30 node2
[root@node1 ~]# crontab -e
*/5 * * * * ntpdate cn.pool.ntp.org   ### add this job

node2:

[root@node2 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.25.0.29 node1 
172.25.0.30 node2
[root@node2 ~]# crontab -e
*/5 * * * * ntpdate cn.pool.ntp.org   ### add this job

The time-sync job is now listed on both node1 and node2:

[root@node1 ~]# crontab -l
*/5 * * * * ntpdate cn.pool.ntp.org
[root@node2 ~]# crontab -l
*/5 * * * * ntpdate cn.pool.ntp.org

Set up SSH key trust between node1 and node2:

[root@node1 ~]# ssh-keygen 
[root@node1 ~]# ssh-copy-id node2
The authenticity of host 'node2 (172.25.0.30)' can't be established.
ECDSA key fingerprint is ae:88:02:59:f9:7f:e9:4f:48:8d:78:d2:6f:c7:7a:f1.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.

The warning appears only because I had already copied the key earlier.

2. Run on both node1 and node2:

[root@node1 corosync]# yum install -y pacemaker pcs psmisc policycoreutils-python
[root@node2 corosync]# yum install -y pacemaker pcs psmisc policycoreutils-python

3. Start pcsd on node1 and node2 and enable it at boot:

[root@node1 corosync]# systemctl start pcsd.service
[root@node1 corosync]# systemctl enable pcsd
[root@node2 corosync]# systemctl start pcsd.service
[root@node2 corosync]# systemctl enable pcsd

4. Set the password of the hacluster user on both hosts:

[root@node1 corosync]# echo 123456 | passwd --stdin hacluster
[root@node2 corosync]# echo 123456 | passwd --stdin hacluster

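The remaining steps only need to be run on one host; pcs synchronizes the configuration to the other node.

On node1:

5. Authenticate the cluster hosts with pcs (authentication uses the hacluster user and its password):

[root@node1 corosync]# pcs cluster auth node1 node2     ## choose which cluster nodes to authenticate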
node2: Already authorized
node1: Already authorized

6. Create the cluster from the two nodes:

[root@node1 corosync]# pcs cluster setup --name mycluster node1 node2 --force   ## set up the cluster

7. The corosync configuration file has now been generated on the node:

[root@node1 ~]# cd /etc/corosync/  ## enter the corosync directory
[root@node1 corosync]# ls
corosync.conf  corosync.conf.example  corosync.conf.example.udpu  corosync.xml.example  uidgid.d

# corosync.conf has been generated.

8. Start the cluster:

[root@node2 corosync]# pcs cluster start --all
node1: Starting Cluster...
node2: Starting Cluster...
## this is equivalent to starting corosync and pacemaker:
[root@node1 corosync]#  ps -ef | grep corosync
root      19586      1  0 18:05 ?        00:00:40 corosync
root      29230  21295  0 19:13 pts/1    00:00:00 grep --color=auto corosync
[root@node1 corosync]# ps -ef | grep pacemaker
root       1843      1  0 11:21 ?        00:00:04 /usr/libexec/pacemaker/lrmd
haclust+   1845      1  0 11:21 ?        00:00:03 /usr/libexec/pacemaker/pengine
root      19593      1  0 18:05 ?        00:00:01 /usr/sbin/pacemakerd -f
haclust+  19594  19593  0 18:05 ?        00:00:01 /usr/libexec/pacemaker/cib
root      19595  19593  0 18:05 ?        00:00:00 /usr/libexec/pacemaker/stonithd
haclust+  19596  19593  0 18:05 ?        00:00:00 /usr/libexec/pacemaker/attrd
haclust+  19597  19593  0 18:05 ?        00:00:01 /usr/libexec/pacemaker/crmd
root      29288  21295  0 19:14 pts/1    00:00:00 grep --color=auto pacemaker
### corosync and pacemaker are both running

9. Check the cluster status:

[root@node1 corosync]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id= 172.25.0.29
status= ring 0 active with no faults
[root@node1 corosync]# ssh node2 corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
id= 172.25.0.30
status= ring 0 active with no faults
### ring 0 is active with no faults on both node1 and node2.

10. At this point, check the cluster configuration for errors:

[root@node1 corosync]# crm_verify -L -V
   error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
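## The check reports errors: STONITH must be configured or disabled. To keep the following steps from failing, disable stonith-enabled for now (as the NOTE above says, clusters with shared data need STONITH to ensure data integrity).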
[root@node1 corosync]# pcs property set stonith-enabled=false
[root@node1 corosync]# crm_verify -L -V
[root@node1 corosync]# pcs property list
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: mycluster
 dc-version: 1.1.16-12.el7_4.2-94ff4df
 have-watchdog: false
 stonith-enabled: false
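When this check is scripted, the STONITH complaint can be told apart from other validation errors by its message text. The following is a minimal sketch, assuming the wording shown in the output above (it may differ between pacemaker versions); `stonith_unconfigured` is a hypothetical helper that reads the output of `crm_verify -L -V` on standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the text on stdin contains the
# "no STONITH resources have been defined" error printed by crm_verify.
stonith_unconfigured() {
    grep -q 'no STONITH resources have been defined'
}

# Example use on a cluster node (2>&1 captures the messages whichever
# stream crm_verify writes them to):
#   crm_verify -L -V 2>&1 | stonith_unconfigured && echo "STONITH is not configured"
```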

11. Now download and install crmsh, which is used for the rest of the configuration (download it from GitHub, unpack it, and install it directly):

https://codeload.github.com/ClusterLabs/crmsh/tar.gz/2.3.2

On node1:

[root@node1 ~]# cd /usr/local/src/
[root@node1 src]# ls
crmsh-2.3.2.tar      
[root@node1 src]# tar xvf crmsh-2.3.2.tar
[root@node1 src]# ls
crmsh-2.3.2.tar crmsh-2.3.2
[root@node1 src]# cd crmsh-2.3.2
[root@node1 crmsh-2.3.2]# python setup.py install  ## build and install

On node2: same steps as on node1

II. Install nginx from source and install NFS

### Install nginx on node1 and node2; the steps below are shown for node1:

1. Install the packages needed to build nginx:

yum -y groupinstall "Development Tools" "Server Platform Development"
yum -y install openssl-devel pcre-devel

2. On all hosts, install wget, which is used to download the nginx package:

[root@node1 src]# yum install wget -y               ## install wget

3. Download the nginx package:

[root@node1 src]# wget http://****/download/nginx-1.12.0.tar.gz

4. Add the user that nginx runs as:

[root@node1 sbin]# useradd nginx

5. Unpack the nginx package:

[root@node1 src]# tar zxvf nginx-1.12.0.tar.gz
[root@node1 src]# cd nginx-1.12.0/

6. Configure, build, and install nginx:

[root@node1 nginx-1.12.0]# ./configure --prefix=/usr/local/nginx --user=nginx --group=nginx --with-http_ssl_module --with-http_flv_module --with-http_stub_status_module --with-http_gzip_static_module  --with-pcre
### build and install
[root@node1 nginx-1.12.0]# make && make install

Once nginx is installed on both node1 and node2, test it.

7. Test nginx:

On node1:

[root@node1 nginx]# cd /usr/local/nginx/
[root@node1 nginx]# echo node1 > html/index.html
[root@node1 nginx]# /usr/local/nginx/sbin/nginx

On node2:

[root@node2 nginx]# cd /usr/local/nginx/
[root@node2 nginx]# echo node2 > html/index.html
[root@node2 nginx]# /usr/local/nginx/sbin/nginx

Access the web service:

[root@node1 nginx]# curl 172.25.0.29
node1
[root@node1 nginx]# curl 172.25.0.30
node2

Both node1 and node2 serve their page correctly.
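This check can be scripted so that it is easy to repeat after later changes. The following is a minimal sketch: `check_page` is a hypothetical helper, and the 5-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: succeed when http://HOST/ returns exactly EXPECTED.
# Returns 2 when the request itself fails, 1 when the body differs.
# Usage: check_page HOST EXPECTED
check_page() {
    # -s hides the progress meter, --max-time bounds the whole request
    body=$(curl -s --max-time 5 "http://$1/") || return 2
    [ "$body" = "$2" ]
}

# Example use with the addresses from this article:
#   check_page 172.25.0.29 node1 && check_page 172.25.0.30 node2 && echo "both nodes OK"
```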

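Stop nginx now, because corosync and pacemaker will manage it automatically from here on.

Create a systemd unit file for nginx, which is needed to start it later; create it on both node1 and node2: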

[root@node1 ~]# cat /etc/systemd/system/nginx.service 
[Unit]
Description=nginx
After=network.target
  
[Service]
Type=forking
ExecStart=/usr/local/nginx/sbin/nginx
ExecReload=/usr/local/nginx/sbin/nginx -s reload
ExecStop=/usr/local/nginx/sbin/nginx -s quit
PrivateTmp=true
  
[Install]
WantedBy=multi-user.target 
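## do the same on node2

Give the unit file execute permission (systemd does not need this for unit files and logs a warning when a unit file is marked executable, so this step can be skipped):

[root@node1 ~]# chmod a+x /etc/systemd/system/nginx.service 
[root@node2 ~]# chmod a+x /etc/systemd/system/nginx.service
[root@node1 ~]# systemctl enable nginx
[root@node2 ~]# systemctl enable nginx   ## with the systemd resource class, a unit has to be enabled before crm recognizes it, so enable nginx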

NFS setup:

NFS only has to be installed on one host to provide the shared directory; here I install it on node1.

[root@node1 ~]# yum install -y rpcbind nfs-utils
[root@node1 ~]# mkdir /www   ### create the /www directory that will be shared
[root@node1 ~]# cat /etc/exports
/www  *(rw,async,no_root_squash) 
[root@node1 ~]# systemctl restart nfs     ### restart nfs
[root@node1 ~]# showmount -e 172.25.0.29
Export list for 172.25.0.29:
/www *                     ## the /www directory is now exported
[root@node1 ~]# echo node > /www/index.html    ### add index.html to the shared directory; it is what the virtual IP will serve
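The export check can also be done from a script, for example on node2 before the Filesystem resource is defined. The following is a minimal sketch that relies on the `showmount -e` output layout shown above (one header line, then one line per export); `is_exported` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed when SERVER lists PATH in its export list.
# Usage: is_exported SERVER PATH
is_exported() {
    # Skip the "Export list for ..." header, then compare the first field
    # of each remaining line with the requested path.
    showmount -e "$1" | awk -v p="$2" 'NR > 1 && $1 == p { found = 1 } END { exit !found }'
}

# Example use with the server from this article:
#   is_exported 172.25.0.29 /www && echo "/www is exported"
```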

III. Making NFS + nginx highly available

1. Using resource agents:

Configure on node1:

[root@node1 ~]# crm ra
crm(live)ra# info systemd:nginx
systemd unit file for nginx (systemd:nginx)
Cluster Controlled nginx
Operations' defaults (advisory minimum):
    start         timeout=100
    stop          timeout=100
    status        timeout=100
    monitor       timeout=100 interval=60

2. Enter configure mode:

crm(live)ra# cd
crm(live)# cd configure
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=172.25.0.100  ### add the virtual IP
## after configuring, inspect the result with show
crm(live)configure# show
node 1: node1
node 2: node2
primitive webip IPaddr \
        params ip=172.25.0.100
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-12.el7_4.2-94ff4df \
        cluster-infrastructure=corosync \
        cluster-name=mycluster \
        stonith-enabled=false
crm(live)configure# verify      # check the configuration for errors
crm(live)configure# commit      ## commit and save
crm(live)configure# cd

3. Define the web service resource:

Enter configure mode:

crm(live)configure# primitive webserver systemd:nginx      ## add the nginx service
crm(live)configure# verify
WARNING: webserver: default timeout 20s for start is smaller than the advised 100
WARNING: webserver: default timeout 20s for stop is smaller than the advised 100
### The default timeout is shorter than the advised value, hence the warnings; they can be ignored.
crm(live)configure# commit  
WARNING: webserver: default timeout 20s for start is smaller than the advised 100
WARNING: webserver: default timeout 20s for stop is smaller than the advised 100

## The commit prints the same warnings; ignore them:

crm(live)configure# show
node 1: node1 \
        attributes standby=off
node 2: node2
primitive vip IPaddr \
        params ip=172.25.0
primitive web systemd:nginx \
        op monitor interval=30s timeout=100s \
        op start timeout=100s interval=0 \
        op stop timeout=100s interval=0
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-12.el7_4.4-94ff4df \
        cluster-infrastructure=corosync \
        cluster-name=mycluster \
        stonith-enabled=false

## Check the status; there are now two resources:

crm(live)configure# cd 
crm(live)# status
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.2-94ff4df) - partition with quorum
Last updated: Sat Oct 14 21:20:59 2017
Last change: Sat Oct 14 21:17:43 2017 by root via cibadmin on node1
2 nodes configured
2 resources configured
Online: [ node1 node2 ]
Full list of resources:
 webip  (ocf::heartbeat:IPaddr):        Started node2
 webserver      (systemd:nginx):        Started node1

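## By default the cluster spreads resources across the nodes, so webip and webserver were started on different nodes. They have to run on the same node, so put the two resources in one group (this is what lets the service fail over as a unit).

Add both to the same group:

crm(live)# configure
crm(live)configure# group webservice webip webserver   ## put webip and webserver in the same group, webservice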
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd ..
crm(live)# status
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.2-94ff4df) - partition with quorum
Last updated: Sat Oct 14 21:24:17 2017
Last change: Sat Oct 14 21:24:12 2017 by root via cibadmin on node1
2 nodes configured
2 resources configured
Online: [ node1 node2 ]
Full list of resources:
 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):        Started node1
     webserver  (systemd:nginx):        Started node1                 ## webip and webserver are now in the same group

4. Define the NFS resource:

Look at the parameters of the Filesystem resource agent:

crm(live)ra# info ocf:heartbeat:Filesystem 
device* (string): block device
    The name of block device for the filesystem, or -U, -L options for mount, or NFS mount specification.
directory* (string): mount point
    The mount point for the filesystem.
fstype* (string): filesystem type
    The type of filesystem to be mounted.

### There are three required parameters

## Start configuring

crm(live)configure# primitive webstore ocf:heartbeat:Filesystem params device="172.25.0.29:/www" directory="/usr/local/nginx/html" fstype="nfs" op start timeout=60s op stop timeout=60s op monitor interval=20s timeout=40s      ### mount /www on /usr/local/nginx/html

5. Define the colocation constraint:

crm(live)configure# colocation webserver_with_webstore_and_webip inf: webserver ( webip webstore)
crm(live)configure# verify
WARNING: webserver_with_webstore_and_webip: resource webserver is grouped, constraints should apply to the group
WARNING: webserver_with_webstore_and_webip: resource webip is grouped, constraints should apply to the group
crm(live)configure# commit

## Inspect the configuration:

crm(live)configure# show
node 1: node1 \
        attributes standby=off
node 2: node2
primitive webip IPaddr \
        params ip=172.25.0.100
primitive webserver systemd:nginx \
        op monitor interval=30s timeout=100s \
        op start timeout=60s interval=0 \
        op stop timeout=60s interval=0
primitive webstore Filesystem \
        params device="172.25.0.29:/www" directory="/usr/local/nginx/html" fstype=nfs \
        op start timeout=60s interval=0 \
        op stop timeout=60s interval=0 \
        op monitor interval=20s timeout=40s
group webservice webip webserver
colocation webserver_with_webstore_and_webip inf: webserver ( webip webstore )
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-12.el7_4.4-94ff4df \
        cluster-infrastructure=corosync \
        cluster-name=mycluster \
        stonith-enabled=false

6. Define the start order:

crm(live)configure# order webstore_after_webip Mandatory: webip webstore
crm(live)configure# verify
crm(live)configure# order webserver_after_webstore Mandatory: webstore webserver
crm(live)configure# commit

### Check the status

crm(live)# status
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Wed Oct 25 20:46:41 2017
Last change: Wed Oct 25 16:56:52 2017 by root via cibadmin on node1
2 nodes configured
3 resources configured
Online: [ node1 node2 ]
Full list of resources:
 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):        Started node1
     webserver  (systemd:nginx):        Started node1
 webstore       (ocf::heartbeat:Filesystem):    Started node1
## The status lists the resources as webip, webserver, webstore; the start order defined above is webip, then webstore, then webserver
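As the warnings in step 5 point out, constraints are normally applied to a group rather than to its members. The members of a pacemaker group are kept on the same node and started in the order listed (and stopped in reverse), so one alternative to the separate colocation and order constraints is to put all three resources in a single group. This is a sketch of that alternative, not what the original setup ran; it assumes the group is defined from scratch, before the constraints from steps 5 and 6 exist.

```
crm(live)configure# group webservice webip webstore webserver
crm(live)configure# verify
crm(live)configure# commit
```

With this layout the virtual IP comes up first, the NFS share is mounted next, and nginx starts last, which matches the order defined in step 6.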

7. Test

[root@node1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:49:e9:da brd ff:ff:ff:ff:ff:ff
    inet 172.25.0.29/24 brd 172.25.0.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 172.25.0.100/24 brd 172.25.0.255 scope global secondary ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe49:e9da/64 scope link

The VIP is up.

Next, access the web service:

[root@node1 ~]# curl 172.25.0.100
node

The response is the content of /www/index.html.

Now stop pacemaker and corosync on node1:

[root@node1 ~]# systemctl stop pacemaker    ## stop pacemaker first
[root@node1 ~]# systemctl stop corosync

On node2, the status shows that node2 has taken over:

[root@node2 crmsh-2.3.2]# crm
crm(live)# status
Stack: corosync
Current DC: node2 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Wed Oct 25 20:54:33 2017
Last change: Wed Oct 25 16:56:52 2017 by root via cibadmin on node1
2 nodes configured
3 resources configured
Online: [ node2 ]
OFFLINE: [ node1 ]
Full list of resources:
 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):        Started node2
     webserver  (systemd:nginx):        Started node2
 webstore       (ocf::heartbeat:Filesystem):    Started node2
crm(live)# exit
[root@node2 crmsh-2.3.2]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:64:00:b1 brd ff:ff:ff:ff:ff:ff
    inet 172.25.0.30/24 brd 172.25.0.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 172.25.0.100/24 brd 172.25.0.255 scope global secondary ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe64:b1/64 scope link

## The VIP has moved to node2

[root@node2 crmsh-2.3.2]# df -h
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/cl-root   18G  2.5G   16G  14% /
devtmpfs             226M     0  226M   0% /dev
tmpfs                237M   86M  151M  37% /dev/shm
tmpfs                237M  8.6M  228M   4% /run
tmpfs                237M     0  237M   0% /sys/fs/cgroup
/dev/sda1           1014M  197M  818M  20% /boot
tmpfs                 48M     0   48M   0% /run/user/0
172.25.0.29:/www      18G  2.5G   16G  14% /usr/local/nginx/html

### /www is also mounted on /usr/local/nginx/html

[root@node2 crmsh-2.3.2]# curl 172.25.0.100
node

### The web content is also reachable, so the failover works.

Start corosync and pacemaker again on node1, then check the status:

[root@node1 ~]# crm
crm(live)# status
Stack: corosync
Current DC: node2 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Wed Oct 25 21:00:40 2017
Last change: Wed Oct 25 16:56:52 2017 by root via cibadmin on node1
2 nodes configured
3 resources configured
Online: [ node1 node2 ]
Full list of resources:
 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):        Started node2
     webserver  (systemd:nginx):        Started node2
 webstore       (ocf::heartbeat:Filesystem):    Started node2
crm(live)#

### node1 is online again, and the resources stay on node2, which took over.
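During failover tests like this one it helps to read the resource location from a script instead of scanning the status by eye. The following is a minimal sketch, assuming the whitespace-separated status layout that `crm_mon` prints (resource name, agent, state, node); `resource_node` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read cluster status text on stdin and print the node
# on which resource NAME is reported as Started.
# Usage: resource_node NAME
resource_node() {
    # Match lines whose first field is the resource name and whose
    # second-to-last field is "Started"; the last field is the node.
    awk -v r="$1" '$1 == r && $(NF - 1) == "Started" { print $NF }'
}

# Example use on a cluster node (crm_mon -1 prints the status once and exits):
#   crm_mon -1 | resource_node webip
```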

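IV. Other tuning

To make the service move back to a preferred node (preemptive mode), add a location constraint:

crm(live)configure# location nginx_in_node1 webservice inf: node1   ### pins the webservice group to node1; use with caution

Service management

crm(live)configure# property  migration-limit=1      ### migration-limit caps how many migration jobs the cluster may run in parallel on a node. The behavior intended here, moving a service to the other host once it has failed locally, is controlled by the migration-threshold resource meta-attribute instead.

Editing the configuration with crm

crm(live)# configure
crm(live)configure# edit    ### opens the configuration in an editor, which behaves like vim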
node 1: node1 \
        attributes standby=off
node 2: node2
primitive webip IPaddr \
        params ip=172.25.0.100
primitive webserver systemd:nginx \
        op monitor interval=30s timeout=100s \
        op start timeout=60s interval=0 \
        op stop timeout=60s interval=0
primitive webstore Filesystem \
        params device="172.25.0.29:/www" directory="/usr/local/nginx/html" fstype=nfs \
        op start timeout=60s interval=0 \
        op stop timeout=60s interval=0 \
        op monitor interval=20s timeout=40s
group webservice webip webserver
order webserver_after_webstore Mandatory: webstore webserver
colocation webserver_with_webstore_and_webip inf: webserver ( webip webstore )
order webstore_after_webip Mandatory: webip webstore
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-12.el7_4.4-94ff4df \
        cluster-infrastructure=corosync \
        cluster-name=mycluster \
        stonith-enabled=false \
        migration-limit=1

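### The configuration entered so far is shown and can be added to, deleted from, or modified.

That is the whole of my NFS + nginx deployment based on pacemaker + corosync.

Source: http://www.qdjyedu.com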