Building a web HA cluster with heartbeat v2 haresources

weixin_33827590

Original link: http://blog.51cto.com/renfeng/1424737

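Heartbeat has had three major versions so far: v1, v2, and v3.

v1 is quite old; v2 is the one in common use.

Heartbeat v2 provides not only a messaging layer but also cluster resource manager (CRM) functionality. It ships two resource managers: haresources (compatible with the v1 haresources) and crm.

With v3 the code was split into three projects: heartbeat, pacemaker, and cluster-glue.

At its core, heartbeat does two things: heartbeat detection and resource takeover.

Heartbeat detection happens in the heartbeat messaging layer, and resource takeover is also handled by heartbeat itself. What the administrator has to do is tell heartbeat which resources exist, i.e. define the resources.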

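An ordinary high-availability cluster for a web service has three kinds of resources:

vip: the single address that clients connect to

httpd: the web service, which must be installed on every HA node

shared storage: ensures that users see the same content whichever node serves them

The resources also have to be brought up in a certain order; for example, the shared storage has to be mounted before httpd is started.

With that sorted out, here is the test environment:

HA nodes: 192.168.1.106 (node1), 192.168.1.107 (node2) (messaging layer)

vip: 192.168.1.22

shared storage: 192.168.1.170, an NFS server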

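1. Set up the NFS server to provide shared storage

Note: in a lab environment, local storage is perfectly fine for testing the effect.

Set up the NFS share that holds the web page index.html: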

[root@one ~]# mkdir -pv /web/htmldoc
mkdir: created directory `/web
[root@one ~]# echo  >> /web/htmldoc/index.html
[root@one ~]# echo  >> /etc/exports 
[root@one ~]# service nfs restart
Shutting down NFS daemon:                                  [FAILED]
Shutting down NFS mountd:                                  [FAILED]
Shutting down NFS quotas:                                  [FAILED]
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Stopping RPC idmapd:                                       [  OK  ]
Starting RPC idmapd:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
[root@one ~]# showmount -e 192.168.1.170
Export list for 192.168.1.170:
/web/htmldoc 192.168.1.0/255.255.255.0
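
The argument of the `echo >> /etc/exports` command did not survive in the transcript above. As a rough illustration only, the sketch below builds an exports line that would be consistent with the `showmount` output. The helper name `exports_line` and the `ro` option are assumptions, not taken from the original setup.

```shell
# Build one /etc/exports line of the form: <directory> <client>(<options>)
# exports_line is a hypothetical helper; the "ro" option is an assumption.
exports_line() {
    printf '%s %s(%s)\n' "$1" "$2" "$3"
}

# Print the line for the share used in this walkthrough.
exports_line /web/htmldoc 192.168.1.0/255.255.255.0 ro

# To apply it on the NFS server (not executed here):
#   exports_line /web/htmldoc 192.168.1.0/255.255.255.0 ro >> /etc/exports
#   exportfs -ra
```

Printing the line first makes it easy to review before it is appended to the real file.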

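2. Configure mutual trust between the HA nodes (passwordless access) and time synchronization

Since this is a cluster, how do the nodes manage one another? By running tasks over ssh.

Time can be synchronized with ntp.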
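
The article gives no commands for this step. The following is a minimal sketch of one common way to do it, run from node1. The peer name `node2` comes from this walkthrough, while the NTP server name `ntp.example.com`, the key path, and the `run` dry-run wrapper are assumptions made for illustration. With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
# Sketch: passwordless SSH from this node to the peer, plus a one-off clock sync.
DRY_RUN=${DRY_RUN:-1}                       # 1 = only print the commands
PEER=${PEER:-node2}                         # peer host name from this walkthrough
NTP_SERVER=${NTP_SERVER:-ntp.example.com}   # hypothetical NTP server

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_trust() {
    # Create a key pair without a passphrase and install the public key on the peer.
    run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@$PEER"
    # Sync both clocks against the same server.
    run ntpdate "$NTP_SERVER"
    run ssh "$PEER" ntpdate "$NTP_SERVER"
}

setup_trust
```

The same steps would be repeated on node2 with `PEER=node1` so that trust works in both directions.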

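3. Configure host name resolution for the nodes

Either DNS or /etc/hosts can be used. /etc/hosts is recommended, because a DNS failure would otherwise break the cluster.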
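
As a sketch of the /etc/hosts approach, the helper below appends the two node entries to a file only if they are not there yet. The addresses and names are the ones used in this article; the function names `hosts_entries` and `add_hosts` are hypothetical, and the target file is passed in so the sketch can be tried on a scratch file first.

```shell
# Print the hosts entries for the two HA nodes of this walkthrough.
hosts_entries() {
    cat <<'EOF'
192.168.1.106 node1
192.168.1.107 node2
EOF
}

# Append each entry to the target file unless the host name is already present.
add_hosts() {
    target=$1
    hosts_entries | while read -r ip name; do
        grep -qw "$name" "$target" 2>/dev/null ||
            printf '%s\t%s\n' "$ip" "$name" >> "$target"
    done
}

# Example (not executed here): add_hosts /etc/hosts
# The names must match the output of `uname -n` on each node.
```

Running it twice leaves the file unchanged, so it is safe to apply on both nodes.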

4. Install the httpd service

5. Install heartbeat

The packages can be downloaded from EPEL:

http://dl.fedoraproject.org/pub/epel/5/x86_64/repoview/letter_h.group.html

heartbeat - Heartbeat subsystem for High-Availability Linux
heartbeat-devel - Heartbeat development package
heartbeat-gui - Provides a gui interface to manage heartbeat clusters
heartbeat-ldirectord - Monitor daemon for maintaining high availability resources (generates ipvs rules automatically and health-checks the back-end real servers for ipvs high availability)
heartbeat-pils - Provides a general plugin and interface loading library
heartbeat-stonith - Provides an interface to Shoot The Other Node In The Head

heartbeat-devel does not need to be installed and heartbeat-ldirectord is not needed for now; install the remaining packages.

[root@node1 ~]# rpm -ivh heartbeat-2.1.4-11.el5.x86_64.rpm heartbeat-gui-2.1.4-11.el5.x86_64.rpm heartbeat-pils-2.1.4-11.el5.x86_64.rpm heartbeat-stonith-2.1.4-11.el5.x86_64.rpm 
warning: heartbeat-2.1.4-11.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID 217521f6
error: Failed dependencies:
        libltdl.so.3()(64bit)  needed by heartbeat-2.1.4-11.el5.x86_64
        libnet.so.1()(64bit)  needed by heartbeat-2.1.4-11.el5.x86_64
        libltdl.so.3()(64bit)  needed by heartbeat-gui-2.1.4-11.el5.x86_64
        libltdl.so.3()(64bit)  needed by heartbeat-pils-2.1.4-11.el5.x86_64
        libltdl.so.3()(64bit)  needed by heartbeat-stonith-2.1.4-11.el5.x86_64
        libopenhpi.so.2()(64bit)  needed by heartbeat-stonith-2.1.4-11.el5.x86_64

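The system is missing dependencies, so install with yum instead, which resolves them. libnet is not in the configured yum repositories and has to be downloaded from EPEL separately.
Install with yum: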
[root@node1 ~]# yum --nogpgcheck localinstall heartbeat-2.1.4-11.el5.x86_64.rpm heartbeat-gui-2.1.4-11.el5.x86_64.rpm heartbeat-pils-2.1.4-11.el5.x86_64.rpm heartbeat-stonith-2.1.4-11.el5.x86_64.rpm  libnet-1.1.6-7.el5.x86_64.rpm

[root@node1 ~]# ssh node2 yum --nogpgcheck localinstall heartbeat-2.1.4-11.el5.x86_64.rpm heartbeat-gui-2.1.4-11.el5.x86_64.rpm heartbeat-pils-2.1.4-11.el5.x86_64.rpm heartbeat-stonith-2.1.4-11.el5.x86_64.rpm  libnet-1.1.6-7.el5.x86_64.rpm

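6. Configure heartbeat

heartbeat's main directory is /etc/ha.d/. The rpm installation does not create the configuration files; they can be copied from the sample files.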

[root@node1 ~]# ll  /etc/ha.d/
total 24
-rwxr-xr-x 1 root root  745 Mar 21  2010 harc
drwxr-xr-x 2 root root 4096 Jun 10 05:48 rc.d
-rw-r--r-- 1 root root  692 Mar 21  2010 README.config
drwxr-xr-x 2 root root 4096 Jun 10 05:48 resource.d
-rw-r--r-- 1 root root 7864 Mar 21  2010 shellfuncs
[root@node1 ~]# cp -p /usr/share/doc/heartbeat-2.1.4/
apphbd.cf            COPYING.LGPL         GettingStarted.txt   hb_report.html       README               startstop
authkeys             DirectoryMap.txt     ha.cf                hb_report.txt        Requirements.html    
AUTHORS              faqntips.html        HardwareGuide.html   heartbeat_api.html   Requirements.txt     
ChangeLog            faqntips.txt         HardwareGuide.txt    heartbeat_api.txt    rsync.html           
COPYING              GettingStarted.html  haresources          logd.cf              rsync.txt            
[root@node1 ~]# cp -p /usr/share/doc/heartbeat-2.1.4/{authkeys,ha.cf,haresources} /etc/ha.d/

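The three main configuration files in detail:

authkeys: the key file; its permissions must be 600

The authkeys file sets the authentication method heartbeat uses. Three methods are available: crc, md5, and sha1. Their security increases in that order, and so does their use of system resources. If the heartbeat cluster runs on a secure network, crc is sufficient; if every HA node has powerful hardware, sha1 is recommended, since it offers the highest level of security; md5 is a compromise between network security and resource usage.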

[root@node1 ~]# cat /etc/ha.d/authkeys 
auth 3
#1 crc
#2 sha1 HI!
3 md5 Hello!
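
The sample keys `HI!` and `Hello!` are placeholders from the shipped example file. As a hedged sketch, a random key could be generated as shown below. The function name `write_authkeys` is hypothetical, and the choice of sha1 with index 1 is an assumption that follows the recommendation above rather than the md5 entry used in this article.

```shell
# Write an authkeys file with a random sha1 key and mode 600.
# write_authkeys is a hypothetical helper; pass the target path as $1.
write_authkeys() {
    target=$1
    # 16 random bytes rendered as 32 hex characters.
    key=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # Create the file inside a subshell so the restrictive umask does not leak out.
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$target" )
    # Also tighten the mode in case the file already existed.
    chmod 600 "$target"
}

# Example (not executed here):
#   write_authkeys /etc/ha.d/authkeys
#   scp -p /etc/ha.d/authkeys node2:/etc/ha.d/
```

Both nodes need an identical authkeys file, which is why the example copies it to node2 instead of generating a second key there.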

ha.cf: heartbeat's main configuration file

ha.cf mainly defines the cluster configuration, such as the cluster nodes and heartbeat monitoring.

[root@node1 ~]# cat /etc/ha.d/ha.cf 
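#debugfile /var/log/ha-debug         file for heartbeat debug messages
#logfile        /var/log/ha-log      location of the heartbeat log
#If neither of the two log files above is defined, heartbeat sends its log messages to the syslog facility set by logfacility (local0, which ends up in /var/log/messages here). If none of the three parameters is defined, ha-debug and ha-log are created under /var/log/ by default.
logfacility     local0
#keepalive 2       interval between heartbeat packets; the default unit is seconds. To use milliseconds, append ms, e.g. 1500ms means 1.5s
#deadtime 30       how long a node waits without heartbeats before it declares the peer dead
#warntime 10       heartbeat delay threshold of 10 seconds: if the backup node receives no heartbeat from the primary node within 10 seconds, a warning is written to the log, but services are not switched over.
#initdead 120      on some systems the network needs a while to come up after boot or reboot; this option covers that interval. It should be at least twice deadtime.
#udpport        694    port used for broadcast/unicast heartbeat communication; 694 is the default
#baud   19200          speed of the serial line when a serial port carries the heartbeat

#serial /dev/ttyS0      # Linux      serial device used for the heartbeat
#serial /dev/cuaa0      # FreeBSD    serial device used for the heartbeat
#serial /dev/cuad0      # FreeBSD 6.x   serial device used for the heartbeat
#serial /dev/cua/a      # Solaris        serial device used for the heartbeat
#bcast  eth0            # Linux          NIC used for broadcast
#bcast  eth1 eth2       # Linux
#bcast  le0             # Solaris
#bcast  le1 le2         # Solaris
#mcast eth0 225.0.0.1 694 1 0           multicast NIC, group address, and related settings
#ucast eth0 192.168.1.2                 unicast NIC and peer address
auto_failback on                               whether resources fail back to the primary node when it returns
#stonith_host *     baytech 10.0.0.3 mylogin mysecretpassword      defines stonith, whose main job is to remove a faulty node from the cluster so that its resources are released and two nodes never compete for the same resource. This protects the safety and integrity of the shared data.
#stonith_host ken3  rps10 /dev/ttyS1 kathy 0 
#stonith_host kathy rps10 /dev/ttyS1 ken3 0 

#watchdog /dev/watchdog       the watchdog: if the node produces no heartbeat for one minute, it reboots itself. To use this feature, a kernel module has to be loaded to create the actual device file. If the system does not have this module, it has to be selected and the kernel recompiled.
#After compiling, enter [command missing in the source] to load the module. Then enter [missing] (should be 10) and [missing] (should be 130). Finally, create the device file: [missing]. The feature can then be used.
#node   ken3     the nodes in the cluster; note: the node name must match uname -n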
node1 192.168.1.106
node2 192.168.1.107

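#ping 10.10.10.254    ping and the ping_group directive below define pseudo cluster members. They must be used together with the commented-out ipfail directive further down. Their purpose is to monitor the physical link: if a cluster node cannot reach the pseudo member, it is not allowed to take over resources or services and releases the ones it holds.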
#ping_group group1 10.10.10.254 10.10.10.253
#hbaping fc-card-name
#
#
#       Processes started and stopped with heartbeat.  Restarted unless
#               they exit with rc=100
#
#respawn userid /path/name/to/run
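#respawn hacluster /usr/lib/heartbeat/ipfail     optional; lists processes that are started and stopped together with heartbeat, usually plugins integrated with heartbeat. They are restarted automatically if they fail.
#The most commonly used one is ipfail, which detects and handles network failures. It relies on the ping nodes set with the ping directive to test connectivity. hacluster is the user the ipfail process runs as.

#apiauth client-name gid=gidlist uid=uidlist   sets the permissions for the processes you configured to start
#apiauth ipfail gid=haclient uid=hacluster         sets the permissions for the processes you configured to start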

#hopfudge 1
#deadping 30
#hbgenmethod time
#realtime off
#debug 1
#apiauth ipfail uid=hacluster
#apiauth ccm uid=hacluster
#apiauth cms uid=hacluster
#apiauth ping gid=haclient uid=alanr,root
#apiauth  gid=haclient

#msgfmt  classic/netstring

# use_logd yes/no
#conn_logd_time 60
#compression    bz2
#compression_threshold 2

After modification:

[root@node2 ~]# cat  /etc/ha.d/ha.cf  | grep -v ^# | grep -v ^$
logfacility     local0
keepalive 1
udpport 694
bcast   eth0            # Linux
auto_failback on
node  node1
node  node2
ping 192.168.1.253
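
The modified file only sets keepalive and leaves the other timers at their defaults. If deadtime, warntime, and initdead are set explicitly, a small check like the sketch below can catch inconsistent values. It enforces the rule from the comments above (initdead at least twice deadtime) plus the assumption that warntime should be below deadtime so that the warning comes before the failover. The function names are hypothetical, and values are assumed to be whole seconds without an `ms` suffix.

```shell
# Print the last uncommented value of directive $2 in file $1.
cf_value() {
    awk -v key="$2" '$1 == key { val = $2 } END { print val }' "$1"
}

# Check timer relationships in an ha.cf-style file; returns non-zero on a problem.
check_ha_timing() {
    cf=$1
    dead=$(cf_value "$cf" deadtime)
    warn=$(cf_value "$cf" warntime)
    init=$(cf_value "$cf" initdead)
    rc=0
    if [ -n "$dead" ] && [ -n "$warn" ] && [ "$warn" -ge "$dead" ]; then
        echo "warntime ($warn) should be smaller than deadtime ($dead)"
        rc=1
    fi
    if [ -n "$dead" ] && [ -n "$init" ] && [ "$init" -lt $((dead * 2)) ]; then
        echo "initdead ($init) should be at least twice deadtime ($dead)"
        rc=1
    fi
    return $rc
}

# Example (not executed here): check_ha_timing /etc/ha.d/ha.cf
```

Directives that are missing or commented out are skipped, so the check passes on the minimal file shown above.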

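haresources

This file defines the resources or services of the cluster (resource definition + resource agent).

Format: nodename resource1::arg1::arg2::argN resource2::arg1::arg2::argN resourceN

nodename is the name of one node in the cluster (the primary node).

resource: each resource is a shell script. The scripts are searched for in /etc/init.d/ (LSB-based; no parameters can be passed, and only stop|start|restart|status are provided)

and in /etc/ha.d/resource.d, as well as /usr/lib64/heartbeat (OCF-based, the Open Cluster Framework, which allows many parameters to be passed).

For example, by default the cluster vip is configured on an alias of the NIC that is in the same subnet as the vip; this relies on /usr/lib64/heartbeat/findif.

The resources are defined as follows: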

node1 IPaddr::192.168.1.22/24/eth0  Filesystem::192.168.1.170:/web/htmldoc::/var/www/html::nfs httpd
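
To make the format concrete, here is a small sketch (not heartbeat's own code) of how such a line maps to resource-script calls. It assumes that `::` separates a script name from its arguments and that resources are started left to right and stopped in reverse, which matches the stop order visible in the log later in this article. `expand_haresources` is a hypothetical helper name.

```shell
# Expand one haresources line into the script invocations for an action.
# Usage: expand_haresources <start|stop> <nodename> <resource>...
expand_haresources() {
    action=$1
    shift 2          # drop the action and the node name
    list=""
    for res in "$@"; do
        # "Script::arg1::arg2" becomes "Script arg1 arg2 <action>".
        cmd="$(printf '%s' "$res" | sed 's/::/ /g') $action"
        if [ "$action" = stop ]; then
            # Stopping happens in reverse order, so prepend.
            list="$cmd
$list"
        else
            list="$list$cmd
"
        fi
    done
    printf '%s' "$list"
}

expand_haresources stop node1 \
    IPaddr::192.168.1.22/24/eth0 \
    Filesystem::192.168.1.170:/web/htmldoc::/var/www/html::nfs \
    httpd
# Prints:
#   httpd stop
#   Filesystem 192.168.1.170:/web/htmldoc /var/www/html nfs stop
#   IPaddr 192.168.1.22/24/eth0 stop
```

The single colon inside `192.168.1.170:/web/htmldoc` is left alone because only the double colon is treated as a separator.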

Start heartbeat

[root@node1 ~]# service heartbeat start
Starting High-Availability services: 
2014/06/10_07:58:08 INFO:  Resource  stopped
                                                           [  OK  ]
[root@node1 ~]# ssh node2 service heartbeat start
Starting High-Availability services: 
2014/06/10_07:58:11 INFO:  Resource  stopped
[  OK  ]

Check the resources

[root@node1 ~]# ifconfig  -a
eth0      Link encap:Ethernet  HWaddr 00:0C:29:29:C5:5D  
          inet addr:192.168.1.106  Bcast:255.255.255.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe29:c55d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:23218 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23367 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:5099502 (4.8 MiB)  TX bytes:6811677 (6.4 MiB)

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:29:C5:5D  
          inet addr:192.168.1.22  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1674 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1674 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:4341233 (4.1 MiB)  TX bytes:4341233 (4.1 MiB)

sit0      Link encap:IPv6--IPv4  
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

[root@node1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       16G  2.6G   13G  18% /
/dev/sda1              99M   13M   82M  14% /boot
tmpfs                 501M     0  501M   0% /dev/shm
/dev/scd0             3.5G  3.5G     0 100% /mnt
192.168.1.170:/web/htmldoc
                       16G  2.1G   13G  14% /var/www/html
[root@node1 ~]# netstat -tnpl | grep :80
tcp        0      0 :::80                       :::*                        LISTEN      29856/httpd
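
Reading the full `ifconfig` output by eye works, but a one-line check is handy when switching resources back and forth. The sketch below is an assumption-laden helper, not part of heartbeat: `has_vip` is a hypothetical name, and it reads an interface listing on stdin, accepting both the `inet addr:` form shown above and the `inet A.B.C.D/NN` form printed by `ip addr`.

```shell
# Succeed if the address $1 appears as an IPv4 address in the listing on stdin.
has_vip() {
    # Escape the dots so they only match literal dots.
    re=$(printf '%s' "$1" | sed 's/\./\\./g')
    grep -Eq "inet (addr:)?${re}[ /]"
}

# Examples (not executed here):
#   ifconfig -a | has_vip 192.168.1.22 && echo "vip is on this node"
#   ssh node2 ifconfig -a | has_vip 192.168.1.22 && echo "vip is on node2"
```

The trailing `[ /]` keeps an address such as 192.168.1.221 from being mistaken for the vip.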

Access test:

To test the HA function, heartbeat provides a test script:

[root@node1 ~]# /usr/lib64/heartbeat/hb_standby 
2014/06/10_08:00:13 Going standby [all

Check the log:

[root@node1 ~]# tail -f /var/log/messages 
Jun 10 07:58:49 node1 heartbeat: [29288]: info: remote resource transition completed.
Jun 10 07:58:49 node1 heartbeat: [29288]: info: node1 wants to go standby [foreign]
Jun 10 07:58:50 node1 heartbeat: [29288]: info: standby: node2 can take our foreign resources
Jun 10 07:58:50 node1 heartbeat: [29903]: info: give up foreign HA resources (standby).
Jun 10 07:58:50 node1 heartbeat: [29903]: info: foreign HA resource release completed (standby).
Jun 10 07:58:50 node1 heartbeat: [29288]: info: Local standby process completed [foreign].
Jun 10 07:58:50 node1 heartbeat: [29288]: WARN: 1 lost packet(s)  [node2] [76:78]
Jun 10 07:58:50 node1 heartbeat: [29288]: info: remote resource transition completed.
Jun 10 07:58:50 node1 heartbeat: [29288]: info: No pkts missing from node2!
Jun 10 07:58:50 node1 heartbeat: [29288]: info: Other node completed standby takeover of foreign resources.
Jun 10 08:00:13 node1 heartbeat: [29288]: info: node1 wants to go standby [all]
Jun 10 08:00:14 node1 heartbeat: [29288]: info: standby: node2 can take our all resources
Jun 10 08:00:14 node1 heartbeat: [29938]: info: give up all HA resources (standby).
Jun 10 08:00:14 node1 ResourceManager[29951]: info: Releasing resource group: node1 IPaddr::192.168.1.22/24/eth0 Filesystem::192.168.1.170:/web/htmldoc::/var/www/html::nfs httpd
Jun 10 08:00:14 node1 ResourceManager[29951]: info: Running /etc/init.d/httpd  stop
Jun 10 08:00:14 node1 ResourceManager[29951]: info: Running /etc/ha.d/resource.d/Filesystem 192.168.1.170:/web/htmldoc /var/www/html nfs stop
Jun 10 08:00:14 node1 Filesystem[30025]: INFO: Running stop  192.168.1.170:/web/htmldoc on /var/www/html
Jun 10 08:00:14 node1 Filesystem[30025]: INFO: Trying to unmount /var/www/html
Jun 10 08:00:14 node1 Filesystem[30025]: INFO: unmounted /var/www/html successfully
Jun 10 08:00:14 node1 Filesystem[30014]: INFO:  Success
Jun 10 08:00:14 node1 ResourceManager[29951]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.22/24/eth0 stop
Jun 10 08:00:14 node1 IPaddr[30143]: INFO: ifconfig eth0:0 down
Jun 10 08:00:14 node1 avahi-daemon[3313]: Withdrawing address record  192.168.1.22 on eth0.
Jun 10 08:00:14 node1 IPaddr[30114]: INFO:  Success
Jun 10 08:00:14 node1 heartbeat: [29938]: info: all HA resource release completed (standby).
Jun 10 08:00:14 node1 heartbeat: [29288]: info: Local standby process completed [all].
Jun 10 08:00:15 node1 heartbeat: [29288]: WARN: 1 lost packet(s)  [node2] [164:166]
Jun 10 08:00:15 node1 heartbeat: [29288]: info: remote resource transition completed.
Jun 10 08:00:15 node1 heartbeat: [29288]: info: No pkts missing from node2!
Jun 10 08:00:15 node1 heartbeat: [29288]: info: Other node completed standby takeover of all resources.

Access the site again

Check the resources on node2

[root@node2 ~]# ifconfig  -a
eth0      Link encap:Ethernet  HWaddr 00:0C:29:3F:4F:0A  
          inet addr:192.168.1.107  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe3f:4f0a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11302 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8589 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:4326627 (4.1 MiB)  TX bytes:1420870 (1.3 MiB)

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:3F:4F:0A  
          inet addr:192.168.1.22  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:6126 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6126 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:10490665 (10.0 MiB)  TX bytes:10490665 (10.0 MiB)

sit0      Link encap:IPv6--IPv4  
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

[root@node2 ~]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/scd0 on /mnt type iso9660 (ro)
192.168.1.170:/web/htmldoc on /var/www/html type nfs (rw,addr=192.168.1.170)
[root@node2 ~]# netstat -tnpl | grep :80
tcp        0      0 :::80
