Learning Linux HA with inthirties (Heartbeat: applying and testing Heartbeat)

Verifying failover and takeover with heartbeat
Now let's test whether the failover and takeover provided by our heartbeat setup actually work.

First, stop the heartbeat service on both nodes.
service heartbeat stop
Stop heartbeat on node 2 first:
[root@www2 ~]# service heartbeat stop
Stopping High-Availability services:
[ OK ]

Then stop heartbeat on node 1:
[root@www1 ~]# service heartbeat stop
Stopping High-Availability services:
[ OK ]

Check the interfaces with ifconfig:
[root@www2 ~]# ifconfig eth0:0
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:01:1A:6C
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:185 Base address:0x1400

[root@www1 ~]# ifconfig eth0:0
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:56:84:0F
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:185 Base address:0x1400

As you can see, eth0:0 has no IP address on either node, so the virtual IP is not active anywhere.
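
This check can also be scripted instead of read by eye. The sketch below assumes the net-tools ifconfig output format shown above, where a configured alias carries an "inet addr:" line; the function name alias_has_ip is made up for illustration.

```shell
# Hypothetical helper: succeeds if the ifconfig output on stdin
# carries an IPv4 address (net-tools format, "inet addr:<ip>").
alias_has_ip() {
    grep -q 'inet addr:'
}

# Usage on a live node:
#   ifconfig eth0:0 | alias_has_ip && echo "VIP present" || echo "VIP absent"
```

With both heartbeat services stopped, the check should fail on both nodes.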

Now monitor /var/log/ha-log.
On both nodes, run tail -s 1 -n 50 -f /var/log/ha-log:
[root@www1 ~]# echo "" > /var/log/ha-log
[root@www1 ~]# tail -s 1 -n 50 -f /var/log/ha-log

[root@www2 ~]# echo "" > /var/log/ha-log
[root@www2 ~]# tail -s 1 -n 50 -f /var/log/ha-log

Watch how the logs change.

Start heartbeat on node 1:
[root@www1 ~]# service heartbeat start
Starting High-Availability services:
2010/01/31_08:42:43 INFO: IPaddr Resource is stopped
[ OK ]
The log on node 1 changes:
heartbeat[8952]: 2010/01/31_08:42:43 WARN: Traditional compression selected. Realtime behavior will likely be impacted(!)
heartbeat[8952]: 2010/01/31_08:42:43 info: See http://linux-ha.org/ha.cf/TraditionalCompressionDirective for more information.
heartbeat[8952]: 2010/01/31_08:42:43 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[8952]: 2010/01/31_08:42:43 info: **************************
heartbeat[8952]: 2010/01/31_08:42:43 info: Configuration validated. Starting heartbeat 2.0.4
heartbeat[8953]: 2010/01/31_08:42:43 info: heartbeat: version 2.0.4
heartbeat[8953]: 2010/01/31_08:42:43 info: Heartbeat generation: 7
heartbeat[8953]: 2010/01/31_08:42:43 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[8953]: 2010/01/31_08:42:43 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[8953]: 2010/01/31_08:42:43 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
heartbeat[8953]: 2010/01/31_08:42:43 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat[8953]: 2010/01/31_08:42:43 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
heartbeat[8953]: 2010/01/31_08:42:43 info: glib: ping heartbeat started.
heartbeat[8953]: 2010/01/31_08:42:43 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[8953]: 2010/01/31_08:42:43 info: Local status now set to: 'up'
heartbeat[8953]: 2010/01/31_08:42:43 info: Exiting write_hostcachedata process 8961 returned rc 0.
heartbeat[8953]: 2010/01/31_08:42:44 info: Link www1:eth1 up.
heartbeat[8953]: 2010/01/31_08:42:44 info: Link 192.168.137.1:192.168.137.1 up.
heartbeat[8953]: 2010/01/31_08:42:44 info: Status update for node 192.168.137.1: status ping
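The service started successfully, but the log shows nothing yet about 192.168.137.37 or the httpd service.
Accessing http://192.168.137.37 also fails.

Remember the initdead value we set earlier? After startup, a node waits 120 seconds before it declares its peer dead, so we have to wait two minutes.
Let's look at how the log changes: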
heartbeat[8953]: 2010/01/31_08:44:44 WARN: node www2: is dead
heartbeat[8953]: 2010/01/31_08:44:44 info: Comm_now_up(): updating status to active
heartbeat[8953]: 2010/01/31_08:44:44 info: Local status now set to: 'active'
heartbeat[8953]: 2010/01/31_08:44:44 info: Starting child client "/usr/lib64/heartbeat/ipfail" (502,503)
heartbeat[8962]: 2010/01/31_08:44:44 info: Starting "/usr/lib64/heartbeat/ipfail" as uid 502 gid 503 (pid 8962)
heartbeat[8953]: 2010/01/31_08:44:44 WARN: No STONITH device configured.
heartbeat[8953]: 2010/01/31_08:44:44 WARN: Shared disks are not protected.
heartbeat[8953]: 2010/01/31_08:44:44 info: Resources being acquired from www2.
harc[8963]: 2010/01/31_08:44:44 info: Running /etc/ha.d/rc.d/status status
mach_down[8983]: 2010/01/31_08:44:44 info: /usr/lib64/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[8983]: 2010/01/31_08:44:44 info: mach_down takeover complete for node www2.
heartbeat[8953]: 2010/01/31_08:44:44 info: mach_down takeover complete.
heartbeat[8953]: 2010/01/31_08:44:44 info: Initial resource acquisition complete (mach_down)
IPaddr[9019]: 2010/01/31_08:44:45 INFO: IPaddr Resource is stopped
heartbeat[8964]: 2010/01/31_08:44:45 info: Local Resource acquisition completed.
harc[9126]: 2010/01/31_08:44:45 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[9126]: 2010/01/31_08:44:45 received ip-request-resp 192.168.137.37 OK yes
ResourceManager[9141]: 2010/01/31_08:44:45 info: Acquiring resource group: www1 192.168.137.37 ipvsadm httpd
IPaddr[9165]: 2010/01/31_08:44:45 INFO: IPaddr Resource is stopped
ResourceManager[9141]: 2010/01/31_08:44:45 info: Running /etc/ha.d/resource.d/IPaddr 192.168.137.37 start
IPaddr[9354]: 2010/01/31_08:44:46 INFO: /sbin/ifconfig eth0:0 192.168.137.37 netmask 255.255.255.0
IPaddr[9354]: 2010/01/31_08:44:46 INFO: Sending Gratuitous Arp for 192.168.137.37 on eth0:0 [eth0]
IPaddr[9354]: 2010/01/31_08:44:46 INFO: /usr/lib64/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.137.37 eth0 192.168.137.37 auto 192.168.137.37 ffffffffffff
IPaddr[9284]: 2010/01/31_08:44:46 INFO: IPaddr Success
ResourceManager[9141]: 2010/01/31_08:44:46 info: Running /etc/init.d/ipvsadm start
ResourceManager[9141]: 2010/01/31_08:44:47 info: Running /etc/init.d/httpd start
heartbeat[8953]: 2010/01/31_08:44:54 info: Local Resource acquisition completed. (none)
heartbeat[8953]: 2010/01/31_08:44:54 info: local resource transition completed.

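There is what we were waiting for. Look at the timestamps in the log.
The earlier one: 2010/01/31_08:42:44
The later one: 2010/01/31_08:44:44
Exactly 2 minutes apart, matching the initdead setting to the second.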
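
The interval can be computed rather than eyeballed. This is a minimal sketch that assumes GNU date (for the -d option); halog_epoch is a hypothetical helper name, and the two timestamps are the ones quoted above.

```shell
# Convert a ha-log timestamp (YYYY/MM/DD_HH:MM:SS) to epoch seconds.
# Assumes GNU date; -u sidesteps daylight-saving effects.
halog_epoch() {
    date -u -d "$(echo "$1" | tr '/_' '- ')" +%s
}

start=$(halog_epoch "2010/01/31_08:42:44")
end=$(halog_epoch "2010/01/31_08:44:44")
echo $((end - start))   # prints 120
```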

Now look at eth0:0:
[root@www1 ~]# ifconfig eth0:0
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:56:84:0F
inet addr:192.168.137.37 Bcast:192.168.137.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:185 Base address:0x1400
OK

Accessing http://192.168.137.37 now succeeds.

Now for node 2. Heartbeat has not been started there yet, so its log is still empty.

Now start heartbeat on node 2:
[root@www2 ~]# service heartbeat start
Starting High-Availability services:
2010/01/31_01:41:28 INFO: IPaddr Resource is stopped
[ OK ]
Node 2 starts. Because node 1 is healthy at this point, node 2 does not take over the resources; it waits for its chance to step in.
Here is its log:
heartbeat[13844]: 2010/01/31_01:41:29 WARN: Traditional compression selected. Realtime behavior will likely be impacted(!)
heartbeat[13844]: 2010/01/31_01:41:29 info: See http://linux-ha.org/ha.cf/TraditionalCompressionDirective for more information.
heartbeat[13844]: 2010/01/31_01:41:29 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[13844]: 2010/01/31_01:41:29 info: **************************
heartbeat[13844]: 2010/01/31_01:41:29 info: Configuration validated. Starting heartbeat 2.0.4
heartbeat[13845]: 2010/01/31_01:41:29 info: heartbeat: version 2.0.4
heartbeat[13845]: 2010/01/31_01:41:29 info: Heartbeat generation: 2
heartbeat[13845]: 2010/01/31_01:41:29 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[13845]: 2010/01/31_01:41:29 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[13845]: 2010/01/31_01:41:29 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
heartbeat[13845]: 2010/01/31_01:41:29 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat[13845]: 2010/01/31_01:41:29 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
heartbeat[13845]: 2010/01/31_01:41:29 info: glib: ping heartbeat started.
heartbeat[13845]: 2010/01/31_01:41:29 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[13845]: 2010/01/31_01:41:29 info: Local status now set to: 'up'
heartbeat[13845]: 2010/01/31_01:41:29 info: Exiting write_hostcachedata process 13853 returned rc 0.
heartbeat[13845]: 2010/01/31_01:41:30 info: Link 192.168.137.1:192.168.137.1 up.
heartbeat[13845]: 2010/01/31_01:41:30 info: Status update for node 192.168.137.1: status ping
heartbeat[13845]: 2010/01/31_01:41:30 info: Link www2:eth1 up.
heartbeat[13845]: 2010/01/31_01:41:30 info: Link www1:eth1 up.
heartbeat[13845]: 2010/01/31_01:41:30 info: Status update for node www1: status active
heartbeat[13845]: 2010/01/31_01:41:31 info: Exiting write_hostcachedata process 13855 returned rc 0.
harc[13854]: 2010/01/31_01:41:31 info: Running /etc/ha.d/rc.d/status status
heartbeat[13845]: 2010/01/31_01:41:31 info: Comm_now_up(): updating status to active
heartbeat[13845]: 2010/01/31_01:41:31 info: Local status now set to: 'active'
heartbeat[13845]: 2010/01/31_01:41:31 info: Starting child client "/usr/lib64/heartbeat/ipfail" (502,503)
heartbeat[13865]: 2010/01/31_01:41:31 info: Starting "/usr/lib64/heartbeat/ipfail" as uid 502 gid 503 (pid 13865)
heartbeat[13845]: 2010/01/31_01:41:31 info: remote resource transition completed.
heartbeat[13845]: 2010/01/31_01:41:31 info: remote resource transition completed.
heartbeat[13845]: 2010/01/31_01:41:31 info: Local Resource acquisition completed. (none)
heartbeat[13845]: 2010/01/31_01:41:32 info: www1 wants to go standby [foreign]
heartbeat[13845]: 2010/01/31_01:41:33 info: standby: acquire [foreign] resources from www1
heartbeat[13866]: 2010/01/31_01:41:33 info: acquire local HA resources (standby).
heartbeat[13866]: 2010/01/31_01:41:33 info: local HA resource acquisition completed (standby).
heartbeat[13845]: 2010/01/31_01:41:33 info: Standby resource acquisition done [foreign].
heartbeat[13845]: 2010/01/31_01:41:33 info: Initial resource acquisition complete (auto_failback)
heartbeat[13845]: 2010/01/31_01:41:34 info: remote resource transition completed.
heartbeat[13845]: 2010/01/31_01:41:45 info: www2 wants to go standby [foreign]
heartbeat[13845]: 2010/01/31_01:41:46 info: standby: www1 can take our foreign resources
heartbeat[13876]: 2010/01/31_01:41:46 info: give up foreign HA resources (standby).
ResourceManager[13886]: 2010/01/31_01:41:46 info: Releasing resource group: www1 192.168.137.37 ipvsadm httpd
ResourceManager[13886]: 2010/01/31_01:41:46 info: Running /etc/init.d/httpd stop
ResourceManager[13886]: 2010/01/31_01:41:46 ERROR: Return code 1 from /etc/init.d/httpd
ResourceManager[13886]: 2010/01/31_01:41:47 info: Retrying failed stop operation [httpd]
ResourceManager[13886]: 2010/01/31_01:41:47 info: Running /etc/init.d/httpd stop
ResourceManager[13886]: 2010/01/31_01:41:48 ERROR: Return code 1 from /etc/init.d/httpd
ResourceManager[13886]: 2010/01/31_01:41:49 info: Retrying failed stop operation [httpd]
ResourceManager[13886]: 2010/01/31_01:41:49 info: Running /etc/init.d/httpd stop
ResourceManager[13886]: 2010/01/31_01:41:49 ERROR: Return code 1 from /etc/init.d/httpd
ResourceManager[13886]: 2010/01/31_01:41:50 info: Retrying failed stop operation [httpd]
ResourceManager[13886]: 2010/01/31_01:41:50 info: Running /etc/init.d/httpd stop
ResourceManager[13886]: 2010/01/31_01:41:50 ERROR: Return code 1 from /etc/init.d/httpd
ResourceManager[13886]: 2010/01/31_01:41:51 info: Retrying failed stop operation [httpd]
ResourceManager[13886]: 2010/01/31_01:41:51 info: Running /etc/init.d/httpd stop
ResourceManager[13886]: 2010/01/31_01:41:52 ERROR: Return code 1 from /etc/init.d/httpd
ResourceManager[13886]: 2010/01/31_01:41:53 info: Retrying failed stop operation [httpd]
ResourceManager[13886]: 2010/01/31_01:41:53 info: Running /etc/init.d/httpd stop
ResourceManager[13886]: 2010/01/31_01:41:53 ERROR: Return code 1 from /etc/init.d/httpd
ResourceManager[13886]: 2010/01/31_01:41:54 info: Retrying failed stop operation [httpd]
ResourceManager[13886]: 2010/01/31_01:41:54 info: Running /etc/init.d/httpd stop
ResourceManager[13886]: 2010/01/31_01:41:54 ERROR: Return code 1 from /etc/init.d/httpd
ResourceManager[13886]: 2010/01/31_01:41:56 info: Retrying failed stop operation [httpd]
ResourceManager[13886]: 2010/01/31_01:41:56 info: Running /etc/init.d/httpd stop
ResourceManager[13886]: 2010/01/31_01:41:56 ERROR: Return code 1 from /etc/init.d/httpd
ResourceManager[13886]: 2010/01/31_01:41:57 info: Retrying failed stop operation [httpd]
ResourceManager[13886]: 2010/01/31_01:41:57 info: Running /etc/init.d/httpd stop
ResourceManager[13886]: 2010/01/31_01:41:57 ERROR: Return code 1 from /etc/init.d/httpd
ResourceManager[13886]: 2010/01/31_01:41:58 info: Retrying failed stop operation [httpd]
ResourceManager[13886]: 2010/01/31_01:41:58 info: Running /etc/init.d/httpd stop
ResourceManager[13886]: 2010/01/31_01:41:59 ERROR: Return code 1 from /etc/init.d/httpd
ResourceManager[13886]: 2010/01/31_01:42:00 info: Retrying failed stop operation [httpd]
ResourceManager[13886]: 2010/01/31_01:42:00 info: Running /etc/init.d/httpd stop
ResourceManager[13886]: 2010/01/31_01:42:00 ERROR: Return code 1 from /etc/init.d/httpd
ResourceManager[13886]: 2010/01/31_01:42:00 ERROR: Resource script for httpd probably not LSB-compliant.
ResourceManager[13886]: 2010/01/31_01:42:00 WARN: it (httpd) MUST succeed on a stop when already stopped
ResourceManager[13886]: 2010/01/31_01:42:00 WARN: Machine reboot narrowly avoided!
ResourceManager[13886]: 2010/01/31_01:42:00 info: Running /etc/init.d/ipvsadm stop
ResourceManager[13886]: 2010/01/31_01:42:00 info: Running /etc/ha.d/resource.d/IPaddr 192.168.137.37 stop
IPaddr[14409]: 2010/01/31_01:42:01 INFO: IPaddr Success
heartbeat[13876]: 2010/01/31_01:42:01 info: foreign HA resource release completed (standby).
heartbeat[13845]: 2010/01/31_01:42:01 info: Local standby process completed [foreign].
heartbeat[13845]: 2010/01/31_01:42:02 WARN: 1 lost packet(s) for [www1] [209:211]
heartbeat[13845]: 2010/01/31_01:42:02 info: remote resource transition completed.
heartbeat[13845]: 2010/01/31_01:42:02 info: No pkts missing from www1!
heartbeat[13845]: 2010/01/31_01:42:02 info: Other node completed standby takeover of foreign resources.

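The log shows clearly what node 2 did.
Since node 1 is healthy, node 2 has to release the resource group, even though httpd and ipvsadm were already stopped on node 2. Heartbeat is a bit rigid here and runs stop anyway. The httpd init script returns 1 when the service is already stopped, which produces the ERROR lines above. They can be ignored in this test.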
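
The log's complaint is that stop must succeed when the service is already stopped. One way to get that behaviour is a small wrapper around the real stop command. The sketch below is illustrative only and is not part of heartbeat or the httpd package; idempotent_stop and both of its arguments are hypothetical names.

```shell
# Hypothetical wrapper that makes "stop" idempotent.
# $1 = command that exits 0 while the service is running
# $2 = command that stops the service
idempotent_stop() {
    if ! "$1" >/dev/null 2>&1; then
        return 0    # already stopped: report success
    fi
    "$2"            # running: pass through the real stop result
}

# Usage sketch (the helper functions are assumptions, not existing scripts):
#   httpd_running() { pgrep -x httpd; }
#   httpd_stop()    { /etc/init.d/httpd stop; }
#   idempotent_stop httpd_running httpd_stop
```

With a wrapper like this in the resource script, the repeated retries seen above should not be triggered by an already stopped service.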

Now check the services on node 2:
[root@www2 ~]# ps -ef | grep httpd
There are no httpd processes.
On node 1:
[root@www1 ~]# ps -ef | grep httpd
A whole list of httpd processes appears. The startup test has succeeded.
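
One caveat with ps -ef | grep httpd is that the grep process itself can show up in the output, so "no httpd processes" may still print one line. A small sketch of a filter that counts only real matches follows; count_httpd is a made-up name.

```shell
# Hypothetical helper: count lines of `ps -ef` output (on stdin) that
# mention httpd, ignoring the grep process that is doing the searching.
count_httpd() {
    awk '/httpd/ && !/grep/ { n++ } END { print n + 0 }'
}

# Usage on a live node:
#   ps -ef | count_httpd
```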

Under normal conditions, the virtual IP and the resources live on node 1 (primary). Node 2 (standby) holds neither the virtual IP nor the resources.

Next, let's see what happens when the primary runs into trouble.

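Start with a network outage test:
take eth0 down on one node with ifdown.
Keep watching the logs on both nodes.
On node 1, run ifdown eth0.
Node 1 is now off the network, so we can no longer see its log.
We can still see the log change on node 2: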
heartbeat[13845]: 2010/01/31_01:50:03 info: www1 wants to go standby [all]
heartbeat[13845]: 2010/01/31_01:50:29 info: standby: acquire [all] resources from www1
heartbeat[14550]: 2010/01/31_01:50:29 info: acquire all HA resources (standby).
ResourceManager[14560]: 2010/01/31_01:50:29 info: Acquiring resource group: www1 192.168.137.37 ipvsadm httpd
IPaddr[14584]: 2010/01/31_01:50:29 INFO: IPaddr Resource is stopped
ResourceManager[14560]: 2010/01/31_01:50:30 info: Running /etc/ha.d/resource.d/IPaddr 192.168.137.37 start
IPaddr[14773]: 2010/01/31_01:50:30 INFO: /sbin/ifconfig eth0:0 192.168.137.37 netmask 255.255.255.0
IPaddr[14773]: 2010/01/31_01:50:30 INFO: Sending Gratuitous Arp for 192.168.137.37 on eth0:0 [eth0]
IPaddr[14773]: 2010/01/31_01:50:30 INFO: /usr/lib64/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.137.37 eth0 192.168.137.37 auto 192.168.137.37 ffffffffffff
IPaddr[14703]: 2010/01/31_01:50:30 INFO: IPaddr Success
ResourceManager[14560]: 2010/01/31_01:50:30 info: Running /etc/init.d/ipvsadm start
ResourceManager[14560]: 2010/01/31_01:50:31 info: Running /etc/init.d/httpd start
heartbeat[14550]: 2010/01/31_01:50:31 info: all HA resource acquisition completed (standby).
heartbeat[13845]: 2010/01/31_01:50:31 info: Standby resource acquisition done [all].
heartbeat[13845]: 2010/01/31_01:50:32 info: remote resource transition completed.
The picture is clear: node 2 finally gets its moment. It has taken over the resources from node 1, and the virtual IP and the resources now run on node 2.

A quick check confirms it. First the virtual IP:
[root@www2 ~]# ifconfig eth0:0
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:01:1A:6C
inet addr:192.168.137.37 Bcast:192.168.137.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:185 Base address:0x1400
The virtual IP is configured.
Now the httpd resource:
[root@www2 ~]# ps -ef | grep httpd
A whole list of processes appears, so everything is OK.
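
To watch the takeover from a client machine, one option is to poll the virtual IP until it answers. The following is a minimal sketch; wait_for is a hypothetical helper, and the usage line assumes curl is installed on the client.

```shell
# Retry a command until it succeeds or the attempts run out.
# $1 = maximum attempts, $2 = seconds between attempts, rest = command
wait_for() {
    tries=$1; delay=$2; shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        "$@" && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (assumes curl is available):
#   wait_for 60 1 curl -fs -o /dev/null http://192.168.137.37/ && echo "VIP is serving"
```

Running this just before ifdown eth0 gives a rough feel for how long the service stays unreachable during the switch.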

Now bring eth0 back up on node 1.
The log on node 2:
heartbeat[13845]: 2010/01/31_01:54:36 info: www2 wants to go standby [foreign]
heartbeat[13845]: 2010/01/31_01:54:36 info: standby: www1 can take our foreign resources
heartbeat[14941]: 2010/01/31_01:54:36 info: give up foreign HA resources (standby).
ResourceManager[14951]: 2010/01/31_01:54:36 info: Releasing resource group: www1 192.168.137.37 ipvsadm httpd
ResourceManager[14951]: 2010/01/31_01:54:36 info: Running /etc/init.d/httpd stop
ResourceManager[14951]: 2010/01/31_01:54:48 info: Running /etc/init.d/ipvsadm stop
IPaddr 192.168.137.37 stop
IPaddr[15112]: 2010/01/31_01:54:48 INFO: /sbin/route -n del -host 192.168.137.37
IPaddr[15112]: 2010/01/31_01:54:48 INFO: /sbin/ifconfig eth0:0 192.168.137.37 down
IPaddr[15112]: 2010/01/31_01:54:48 INFO: IP Address 192.168.137.37 released
IPaddr[15042]: 2010/01/31_01:54:48 INFO: IPaddr Success
heartbeat[14941]: 2010/01/31_01:54:48 info: foreign HA resource release completed (standby).
heartbeat[13845]: 2010/01/31_01:54:48 info: Local standby process completed [foreign].
heartbeat[13845]: 2010/01/31_01:54:50 WARN: 1 lost packet(s) for [www1] [602:604]
heartbeat[13845]: 2010/01/31_01:54:50 info: remote resource transition completed.
heartbeat[13845]: 2010/01/31_01:54:50 info: No pkts missing from www1!
heartbeat[13845]: 2010/01/31_01:54:50 info: Other node completed standby takeover of foreign resources.

The log makes it quite clear.
Node 2 is back in standby mode, and its life as a supporting actor resumes.
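
When the log gets long, the order in which resources were started or stopped can be pulled out mechanically. The sketch below assumes the ha-log line layout seen in this article (process tag, timestamp, level, message); resource_actions is a made-up name.

```shell
# Print "<HH:MM:SS> <command>" for every action that ResourceManager ran.
# Reads ha-log lines on stdin.
resource_actions() {
    awk '$1 ~ /^ResourceManager\[/ && $4 == "Running" {
        split($2, ts, "_")                 # ts[2] holds the HH:MM:SS part
        cmd = $5
        for (i = 6; i <= NF; i++) cmd = cmd " " $i
        print ts[2], cmd
    }'
}

# Usage:
#   resource_actions < /var/log/ha-log
```

For the release shown above, this lists the httpd stop first and the ipvsadm stop after it, the reverse of the start order.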

Node 1, the lead actor, is in charge once again.

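If you are interested, you can go on to run other experiments to verify the setup. That is as far as inthirties will go here; if you have questions, feel free to contact me and we can look into them together.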

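This HA setup is a simple one that only provides failover. HA also covers load balancing (LB) and parallel computing, neither of which is implemented here. Compared with Oracle RAC, its functionality is quite limited. It is a low-cost HA solution, worth trying for applications that need to keep costs down.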

In the next article, inthirties will use an Oracle service as an example of building low-cost HA with heartbeat, to take a closer look at how heartbeat is applied in practice. Stay tuned.


