Corosync:
corosync: votes
corosync: votequorum
cman+corosync
cman+rgmanager, cman+pacemaker
corosync+pacemaker
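Prerequisites
1) This setup uses two test nodes, hadoop1.abc.com and hadoop2.abc.com, whose IP addresses are 172.16.100.15 and 172.16.100.16 respectively;
2) The clustered service is Apache httpd;
3) The web service is offered on 172.16.100.11, i.e. the VIP;
4) The operating system is CentOS 6.4, 64-bit.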
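1. Preparation
To make a Linux host an HA node, the following preparation is usually required:
1) Hostname resolution must work on every node, and each node's hostname must match the output of "uname -n"; therefore, /etc/hosts on both nodes must contain the following:
192.168.1.3 hadoop1.abc.com hadoop1
192.168.1.4 hadoop2.abc.com hadoop2
To keep these hostnames after a reboot, also run commands like the following on each node: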
Node1:
# sed -i 's@\(HOSTNAME=\).*@\1hadoop1.abc.com@g' /etc/sysconfig/network
# hostname hadoop1.abc.com
Node2:
# sed -i 's@\(HOSTNAME=\).*@\1hadoop2.abc.com@g' /etc/sysconfig/network
# hostname hadoop2.abc.com
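Requirement 1) can be checked mechanically on each node. The following is a minimal sketch; the helper name hosts_has_name is hypothetical. It only reports whether the name printed by uname -n appears as a hostname or alias in a hosts file, so it does not exercise DNS or other resolver sources (getent hosts "$(uname -n)" covers those).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if NAME appears as a hostname/alias in HOSTS_FILE.
hosts_has_name() {
  local hosts_file=$1 name=$2
  # Skip comment lines; field 1 is the IP, fields 2..NF are names and aliases.
  awk -v n="$name" '
    /^[[:space:]]*#/ { next }
    { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$hosts_file"
}

# Usage on each node:
# if hosts_has_name /etc/hosts "$(uname -n)"; then echo OK; else echo "$(uname -n) missing from /etc/hosts"; fi
```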
2. Install pacemaker
[root@hadoop1 corosync]# yum install pacemaker
[root@hadoop2 corosync]# yum install pacemaker
3. Configure corosync
[root@hadoop1 ~]# yum install corosync
[root@hadoop1 ~]# cd /etc/corosync/
[root@hadoop1 corosync]# ll
total 16
-rw-r--r--. 1 root root 2663 Oct 15 2014 corosync.conf.example
-rw-r--r--. 1 root root 1073 Oct 15 2014 corosync.conf.example.udpu
drwxr-xr-x. 2 root root 4096 Oct 15 2014 service.d
drwxr-xr-x. 2 root root 4096 Oct 15 2014 uidgid.d
[root@hadoop1 corosync]# cp corosync.conf.example corosync.conf
[root@hadoop1 corosync]# vim corosync.conf
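Then edit corosync.conf and add the following, which makes corosync start pacemaker automatically when corosync starts:

service {
    ver: 0
    name: pacemaker
    # use_mgmtd: yes
}

aisexec {
    user: root
    group: root
}

Also set the IP address after bindnetaddr in this file to the network address of the network your NIC is on. The two nodes here are on the 192.168.1.0 network, so set it as follows:

bindnetaddr: 192.168.1.0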
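bindnetaddr is derived from a node's own address by ANDing it with the netmask. The following minimal bash sketch computes it; the function name network_addr is hypothetical, and the /24 prefix for 192.168.1.0 and the interface name eth0 in the usage comment are assumptions about this setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IPv4 network address for IP/PREFIX,
# i.e. the value to put after bindnetaddr.
network_addr() {
  local ip=$1 prefix=$2 a b c d
  IFS=. read -r a b c d <<< "$ip"
  # 10# forces base 10 so octets like "08" are not parsed as octal.
  local addr=$(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
  # Bash arithmetic is 64-bit, so mask back down to 32 bits after the shift.
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  local net=$(( addr & mask ))
  printf '%d.%d.%d.%d\n' $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
    $(( (net >> 8) & 255 )) $(( net & 255 ))
}

# Usage (eth0 is an assumption; "ip -o -4 addr show" prints e.g. 192.168.1.3/24):
# cidr=$(ip -o -4 addr show dev eth0 | awk '{print $4}')
# network_addr "${cidr%/*}" "${cidr#*/}"
```

With these inputs, network_addr 192.168.1.3 24 prints 192.168.1.0, while the 172.16.100.15 address from the prerequisites with a /16 prefix would give 172.16.0.0.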
4. Install crmsh
Starting with 6.4, RHEL no longer ships crmsh, the command-line cluster configuration tool, and provides pcs instead; if you are used to the crm command, you can download and install the packages yourself. crmsh depends on pssh, so install that as well.
[root@hadoop1 ~]# cd /etc/yum.repos.d/
[root@hadoop1 yum.repos.d]# wget http://download.opensuse.org/repositories/network:ha-clustering:Stable/RedHat_RHEL-6/network:ha-clustering:Stable.repo
[root@hadoop1 yum.repos.d]# yum install crmsh
[root@hadoop1 yum.repos.d]# yum install pssh
[root@hadoop1 corosync]# ll
total 28
-rw-r--r--. 1 root root  989 Jul 14 19:05 \
-r--------. 1 root root  128 Jul 14 19:30 authkey    // the authkey file has now been generated
-rw-r--r--. 1 root root 2811 Jul 14 19:15 corosync.conf
-rw-r--r--. 1 root root 2663 Oct 15 2014 corosync.conf.example
-rw-r--r--. 1 root root 1073 Oct 15 2014 corosync.conf.example.udpu
drwxr-xr-x. 2 root root 4096 Oct 15 2014 service.d
drwxr-xr-x. 2 root root 4096 Oct 15 2014 uidgid.d
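The listing shows an authkey file, but the command that produced it is not shown. With corosync 1.x the key is normally generated on one node with corosync-keygen, which reads from /dev/random (so it may stall until enough entropy is available) and writes /etc/corosync/authkey readable only by root. Before the key is copied, a quick sanity check could look like this minimal sketch; the function name check_authkey is hypothetical, and the expected mode 400 and size of 128 bytes are taken from the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the authkey exists, is mode 0400 and is 128 bytes.
check_authkey() {
  local f=${1:-/etc/corosync/authkey}
  [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
  local mode size
  mode=$(stat -c %a "$f")   # GNU stat: octal permission bits, e.g. 400
  size=$(stat -c %s "$f")   # size in bytes
  if [ "$mode" != "400" ] || [ "$size" -ne 128 ]; then
    echo "unexpected authkey: mode=$mode size=$size" >&2
    return 1
  fi
  return 0
}

# Usage on hadoop1:
# corosync-keygen && check_authkey && echo "authkey looks sane"
```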
Copy corosync.conf and authkey to hadoop2:
[root@hadoop1 corosync]# scp -p authkey corosync.conf hadoop2:/etc/corosync/
authkey          100%  128   0.1KB/s   00:00
corosync.conf    100% 2811   2.8KB/s   00:00
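Both nodes must use identical corosync.conf and authkey files, so after the copy it can be worth comparing checksums over the same ssh connection that scp used. This is a minimal sketch assuming password-less ssh from hadoop1 to hadoop2; the function name remote_matches is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare md5 sums of local files with the same paths on HOST.
remote_matches() {
  local host=$1; shift
  local f local_sum remote_sum rc=0
  for f in "$@"; do
    local_sum=$(md5sum < "$f" | awk '{print $1}')
    # Path is single-quoted for the remote shell; fine for paths without quotes.
    remote_sum=$(ssh "$host" "md5sum < '$f'" | awk '{print $1}')
    if [ "$local_sum" != "$remote_sum" ]; then
      echo "mismatch: $f" >&2
      rc=1
    fi
  done
  return $rc
}

# Usage on hadoop1:
# remote_matches hadoop2 /etc/corosync/authkey /etc/corosync/corosync.conf && echo "copies match"
```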
5. Start corosync
[root@hadoop1 corosync]# service corosync start
Starting Corosync Cluster Engine (corosync): [OK]
[root@hadoop1 corosync]# ssh hadoop2 'service corosync start'
Starting Corosync Cluster Engine (corosync): [OK]
[root@hadoop1 corosync]#
Check whether the corosync engine started properly:
[root@hadoop1 cluster]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Jul 14 19:36:33 corosync [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Jul 14 19:36:33 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Check whether the initial membership notifications were sent properly:
[root@hadoop1 cluster]# grep TOTEM /var/log/cluster/corosync.log
Jul 14 19:36:33 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Jul 14 19:36:33 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul 14 19:36:33 corosync [TOTEM ] The network interface [192.168.1.3] is now up.
Jul 14 19:36:33 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors occurred during startup. The error messages below say that pacemaker will soon no longer run as a corosync plugin and recommend using cman as the cluster infrastructure service instead; they can be safely ignored here.
[root@hadoop1 cluster]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Jul 14 19:36:33 corosync [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Jul 14 19:36:33 corosync [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
Check whether pacemaker started properly:
[root@hadoop1 cluster]# grep pcmk_startup /var/log/cluster/corosync.log
Jul 14 19:36:33 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Jul 14 19:36:33 corosync [pcmk ] Logging: Initialized pcmk_startup
Jul 14 19:36:33 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Jul 14 19:36:33 corosync [pcmk ] info: pcmk_startup: Service: 9
Jul 14 19:36:33 corosync [pcmk ] info: pcmk_startup: Local hostname: hadoop1.abc.com
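The four log checks above can be bundled into one function that reports which check failed. This is a minimal sketch: the function name check_corosync_log is hypothetical, and the patterns are copied from the log lines shown in this section (corosync 1.4.7 with the pacemaker plugin), so other versions may word their messages differently. As above, unpack_resources lines and the known plugin-deprecation ERROR are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the startup checks against one corosync log file.
check_corosync_log() {
  local log=${1:-/var/log/cluster/corosync.log}
  local rc=0
  grep -q "Corosync Cluster Engine.*started and ready" "$log" \
    || { echo "engine did not start" >&2; rc=1; }
  grep -q "Successfully read main configuration file" "$log" \
    || { echo "configuration file not read" >&2; rc=1; }
  grep -q "TOTEM.*new membership was formed" "$log" \
    || { echo "no membership formed" >&2; rc=1; }
  grep -q "pcmk_startup: CRM: Initialized" "$log" \
    || { echo "pacemaker plugin did not start" >&2; rc=1; }
  # Any ERROR line left after filtering out the expected noise counts as a failure.
  if grep "ERROR:" "$log" | grep -v unpack_resources \
       | grep -v -e "Pacemaker plugin for Corosync" -e "Clusters from Scratch" \
       | grep -q .; then
    echo "unexpected ERROR lines in $log" >&2
    rc=1
  fi
  return $rc
}

# Usage on hadoop1:
# check_corosync_log && echo "corosync and pacemaker started cleanly"
```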
If all of the commands above run without problems, start corosync on hadoop2 with the following command:
[root@hadoop1 ~]# ssh hadoop2 -- /etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync): [OK]
Note: start hadoop2 from hadoop1 with the command above; do not start it directly on hadoop2. The relevant log entries on hadoop1 follow.
[root@hadoop1 ~]# tail /var/log/cluster/corosync.log
Jul 15 15:44:28 [1771] hadoop1.abc.com pengine: info: determine_online_status: Node hadoop2.abc.com is online
Jul 15 15:44:28 [1771] hadoop1.abc.com pengine: notice: stage6: Delaying fencing operations until there are resources to manage
Jul 15 15:44:28 [1772] hadoop1.abc.com crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jul 15 15:44:28 [1772] hadoop1.abc.com crmd: info: do_te_invoke: Processing graph 6 (ref=pe_calc-dc-1436946268-37) derived from /var/lib/pacemaker/pengine/pe-input-14.bz2
Jul 15 15:44:28 [1772] hadoop1.abc.com crmd: notice: run_graph: Transition 6 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-14.bz2): Complete
Jul 15 15:44:28 [1772] hadoop1.abc.com crmd: info: do_log: FSA: Input I_TE_SUCCESS from notify_
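To list the nodes that pengine has reported online, like hadoop2.abc.com in the entries above, a small sketch such as the following could be used; the function name online_nodes is hypothetical. Because it reads log history, a node that later dropped out is still listed, so for the current cluster state crm_mon -1 (or crm status from crmsh) is the more reliable source.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each node name that pengine logged as online, once.
online_nodes() {
  local log=${1:-/var/log/cluster/corosync.log}
  # Matches e.g. "determine_online_status: Node hadoop2.abc.com is online";
  # field 3 of the matched text is the node name.
  grep -o 'determine_online_status: Node [^ ]* is online' "$log" \
    | awk '{print $3}' | sort -u
}

# Usage on hadoop1:
# online_nodes
```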