High-Availability Clustering on Linux: corosync in Detail

1. corosync fills the same role as heartbeat: it provides the Messaging Layer, collecting heartbeats and other membership information between nodes.

 pacemaker fills the same role as haresources: it provides the CRM (cluster resource manager) that manages resource information.

2. Lab setup: two cluster nodes, node1.willow.com at IP 1.1.1.18 and node2.willow.com at IP 1.1.1.19.

Configure node1.willow.com as follows (the configuration on node2.willow.com is identical):
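The steps that follow assume the nodes resolve each other's names and that root can ssh between them without a password (scp and ssh are used below). A minimal setup sketch, run on node1 (adjust names and IPs to your environment):

# echo '1.1.1.18 node1.willow.com node1' >> /etc/hosts

# echo '1.1.1.19 node2.willow.com node2' >> /etc/hosts

# ssh-keygen -t rsa

# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2

Add the same /etc/hosts entries on node2 as well.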

2.1. Install corosync, pacemaker, and the other required packages

 cluster-glue-1.0.6-1.6.el5.i386.rpm

 cluster-glue-libs-1.0.6-1.6.el5.i386.rpm

 corosync-1.2.7-1.1.el5.i386.rpm

 corosynclib-1.2.7-1.1.el5.i386.rpm

 heartbeat-3.0.3-2.3.el5.i386.rpm

 heartbeat-libs-3.0.3-2.3.el5.i386.rpm

 libesmtp-1.0.4-5.el5.i386.rpm

 pacemaker-1.1.5-1.1.el5.i386.rpm

 pacemaker-cts-1.1.5-1.1.el5.i386.rpm

 pacemaker-libs-1.1.5-1.1.el5.i386.rpm

 resource-agents-1.0.4-1.1.el5.i386.rpm

# yum --nogpgcheck localinstall *.rpm
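A quick way to confirm the packages installed cleanly (names taken from the list above):

# rpm -q corosync corosynclib pacemaker heartbeat cluster-glue resource-agents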

2.2. Edit the corosync configuration file

# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf

# vim /etc/corosync/corosync.conf

totem {
        version: 2                     # totem protocol version; always 2
        secauth: on                    # authenticate and encrypt totem traffic using /etc/corosync/authkey
        threads: 2                     # worker threads used for authentication/encryption
        interface {
                ringnumber: 0          # the first (and only) ring
                bindnetaddr: 1.1.1.0   # network address of the cluster subnet, not a host IP
                mcastaddr: 226.98.1.21 # multicast group used for cluster messaging
                mcastport: 5405        # multicast port
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes                # write to the logfile defined below
        to_syslog: no
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on                  # prefix each entry with a timestamp
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}

service {                              # load pacemaker as a corosync plugin
        ver: 0
        name: pacemaker
}

aisexec {                              # run the AIS executive as root
        user: root
        group: root
}
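Note that bindnetaddr is the network address of the subnet the cluster interface sits on, not a host IP: for 1.1.1.18/24 it is 1.1.1.0. To double-check the address and mask of the interface (eth0 is an assumption here):

# ifconfig eth0 | grep 'inet addr'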

2.3. Generate the authkey authentication file

# corosync-keygen
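corosync-keygen reads from /dev/random, so on an idle machine it may pause while gathering entropy; typing on the console speeds it up. The key it writes must stay readable by root only:

# ls -l /etc/corosync/authkey    # expect mode -r-------- (0400), owner root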

2.4. Copy authkey and corosync.conf from node1 to node2 so the contents stay identical, and create the log directory on both nodes

# mkdir /var/log/cluster

# cd /etc/corosync/

# scp -p authkey corosync.conf node2:/etc/corosync/

# ssh node2 'mkdir /var/log/cluster'

2.5. Start the corosync service

# service corosync start

# ssh node2 'service corosync start'
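Before reading the logs, a quick sanity check that corosync bound the right interface and the ring is healthy:

# corosync-cfgtool -s               # local ring status; should report 'ring 0 active with no faults'

# ssh node2 'corosync-cfgtool -s'   # same check on the other node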

2.6. Inspect the log

# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log

Aug 05 09:36:14 corosync [MAIN  ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.

Aug 05 09:36:14 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

# grep TOTEM /var/log/cluster/corosync.log

Aug 05 09:36:14 corosync [TOTEM ] Initializing transport (UDP/IP).

Aug 05 09:36:14 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).

Aug 05 09:36:15 corosync [TOTEM ] The network interface [1.1.1.18] is now up.

Aug 05 09:36:15 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

Aug 05 09:36:42 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

# grep ERROR: /var/log/cluster/corosync.log

(The STONITH errors below are expected at this point; they disappear once STONITH is disabled in step 3.)

Aug 05 09:37:17 node1.willow.com pengine: [9917]: ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined

Aug 05 09:37:17 node1.willow.com pengine: [9917]: ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option

Aug 05 09:37:17 node1.willow.com pengine: [9917]: ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity

Aug 05 09:52:17 node1.willow.com pengine: [9917]: ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined

Aug 05 09:52:17 node1.willow.com pengine: [9917]: ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option

Aug 05 09:52:17 node1.willow.com pengine: [9917]: ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity

# grep pcmk_startup /var/log/cluster/corosync.log

Aug 05 09:36:15 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized

Aug 05 09:36:15 corosync [pcmk  ] Logging: Initialized pcmk_startup

Aug 05 09:36:15 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 4294967295

Aug 05 09:36:15 corosync [pcmk  ] info: pcmk_startup: Service: 9

Aug 05 09:36:15 corosync [pcmk  ] info: pcmk_startup: Local hostname: node1.willow.com

2.7. Monitor the cluster with crm_mon
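crm_mon with no options runs as a continuously refreshing full-screen monitor (exit with Ctrl-C); for scripts or one-off checks, -1 prints the status once and exits:

# crm_mon -1    # one-shot status; both nodes should show Online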

3. Configure cluster-wide properties: disable STONITH

# crm configure property stonith-enabled=false

# crm configure verify

# crm configure commit

4. View the current configuration with the following command:

# crm configure show

5. Add cluster resources

5.1. Add an IP resource

# crm configure primitive WebIP ocf:heartbeat:IPaddr params ip=1.1.1.100
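In practice you usually pin down the netmask and interface too, and add a monitor operation so a failed address is detected. An alternative definition sketch (eth0 and the /24 mask are assumptions here):

# crm configure primitive WebIP ocf:heartbeat:IPaddr params ip=1.1.1.100 nic=eth0 cidr_netmask=24 op monitor interval=30s timeout=20s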

5.2. Add the httpd resource:

# crm configure primitive httpd lsb:httpd

5.3. Check resource status

# crm status

5.4. Simulate failure of the current node and watch the resulting status

# crm node standby

The status now shows node1.willow.com offline, yet the WebIP resource has not started on node2.willow.com. The reason is that the cluster is in the "WITHOUT quorum" state: the surviving node holds only one of two votes, which is not more than half, so a two-node cluster can never retain quorum after a node failure, and without quorum the cluster refuses to run resources. Since this default makes no sense for a two-node cluster, tell it to ignore loss of quorum:

# crm configure property no-quorum-policy=ignore

Note: crm status above shows WebIP and httpd running on different nodes, which is clearly not what we want.

Summary:

# crm node online           # bring the current node back online

# crm resource cleanup WebIP   # clear the resource's failure state

# crm resource cleanup httpd   # clear the resource's failure state

# crm configure edit        # edit the generated configuration directly

6. Use a group to keep the two resources WebIP and httpd on the same node

# crm configure group webservice WebIP httpd

# crm configure verify

# crm configure commit

# crm status               # confirm both resources now run on the same node

7. Manage resources with constraints (first remove the group created in step 6):

# crm resource stop webservice

# crm resource cleanup webservice

# crm resource cleanup WebIP

# crm resource cleanup httpd

# crm configure delete webservice

# crm configure verify

# crm configure commit

7.1. Colocation constraint

# crm configure colocation httpd_with_WebIP inf: httpd WebIP

# crm configure show xml

# crm configure verify

# crm configure commit

7.2. Ordering constraint: start WebIP before httpd

# crm configure order WebIP_before_httpd mandatory: WebIP httpd

# crm configure verify

# crm configure commit

7.3. Location constraint: prefer running on a specific node

# crm configure location WebIP_on_node1 WebIP rule 100: #uname eq node1.willow.com

# crm configure verify

# crm configure commit
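To confirm where a resource actually landed once the constraints take effect:

# crm_resource -W -r WebIP    # prints the node currently running WebIP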

8. Resource stickiness: make resources prefer to stay on the node where they are currently running

# crm configure rsc_defaults resource-stickiness=200

# crm configure verify

# crm configure commit
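To see how stickiness interacts with the location constraint from 7.3: suppose WebIP has failed over to node2. When node1 comes back, the location rule scores node1 at 100, while stickiness scores staying on node2 at 200; since 200 > 100, the resource stays on node2 and no automatic failback occurs. Set the stickiness below 100 (or the location score above 200) if failback is desired.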

9. Sharing the web root over NFS

# crm configure primitive filesystem ocf:heartbeat:Filesystem params device=1.1.1.20:/web/ha directory=/var/www/html/ fstype=nfs
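This assumes the NFS server at 1.1.1.20 already exports /web/ha to the cluster subnet; a minimal sketch on that server:

# echo '/web/ha 1.1.1.0/24(rw)' >> /etc/exports

# service nfs restart         # or exportfs -arv if nfs is already running

# showmount -e localhost      # verify the export is visible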

10. Failover test commands

# crm node standby

# crm node online 

11. Summary: you can also enter the interactive crm shell and run configuration commands step by step, or look up help for any command

# crm

crm(live)# configure 

crm(live)configure# help group 

crm(live)configure# verify

crm(live)configure# commit
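The crm shell can also introspect resource agents, which helps when writing primitives like the ones above:

crm(live)# ra

crm(live)ra# list ocf heartbeat          # list all OCF agents from the heartbeat provider

crm(live)ra# meta ocf:heartbeat:IPaddr   # show the agent's parameters and defaults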