Installing and configuring heartbeat on CentOS

redfox1985

Category: Servers. Tags: heartbeat, ha, centos

Article link: https://blog.csdn.net/redfox1985/article/details/49994179

Step 1: The default CentOS yum repositories do not carry heartbeat, so first add the Fedora EPEL repository to the CentOS system.

wget http://mirrors.sohu.com/fedora-epel/6/x86_64/epel-release-6-8.noarch.rpm

rpm -ivh epel-release-6-8.noarch.rpm

Step 2: Run yum install heartbeat to install heartbeat on the server.

Looking at heartbeat's configuration directory, /etc/ha.d/ contains a README.config file, which reads:

You need three configuration files to make heartbeat happy,
and they all go in this directory.

They are:
        ha.cf           Main configuration file
        haresources     Resource configuration file
        authkeys        Authentication information

These first two may be readable by everyone, but the authkeys file
must not be.

The good news is that sample versions of these files may be found in
the documentation directory (providing you installed the documentation).

If you installed heartbeat using rpm packages then
this command will show you where they are on your system:
                rpm -q heartbeat -d

If you installed heartbeat using Debian packages then
the documentation should be located in /usr/share/doc/heartbeat

Running that command lists the relevant paths:

[root@TEST-43 log]# rpm -q heartbeat -d
/usr/share/doc/heartbeat-3.0.4/AUTHORS
/usr/share/doc/heartbeat-3.0.4/COPYING
/usr/share/doc/heartbeat-3.0.4/COPYING.LGPL
/usr/share/doc/heartbeat-3.0.4/ChangeLog
/usr/share/doc/heartbeat-3.0.4/README
/usr/share/doc/heartbeat-3.0.4/apphbd.cf
/usr/share/doc/heartbeat-3.0.4/authkeys
/usr/share/doc/heartbeat-3.0.4/ha.cf
/usr/share/doc/heartbeat-3.0.4/haresources
/usr/share/man/man1/cl_status.1.gz
/usr/share/man/man1/hb_addnode.1.gz
/usr/share/man/man1/hb_delnode.1.gz
/usr/share/man/man1/hb_standby.1.gz
/usr/share/man/man1/hb_takeover.1.gz
/usr/share/man/man5/authkeys.5.gz
/usr/share/man/man5/ha.cf.5.gz
/usr/share/man/man8/apphbd.8.gz
/usr/share/man/man8/heartbeat.8.gz

Then copy authkeys, ha.cf and haresources into the /etc/ha.d directory.
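
The copy step can be sketched as a small shell function. This is only a sketch under assumptions: the function name install_ha_samples is hypothetical, the source directory is the heartbeat-3.0.4 documentation path printed by rpm above, and the chmod 600 follows from the README's statement that authkeys must not be readable by everyone.

```shell
#!/bin/sh
# Hypothetical helper: copy the three sample files into the config directory.
# $1 = documentation directory, $2 = target directory (normally /etc/ha.d).
install_ha_samples() {
    src=$1
    dst=$2
    for f in authkeys ha.cf haresources; do
        cp "$src/$f" "$dst/$f" || return 1
    done
    # authkeys must not be readable by other users.
    chmod 600 "$dst/authkeys"
}

# Usage (as root), assuming the paths shown above:
# install_ha_samples /usr/share/doc/heartbeat-3.0.4 /etc/ha.d
```

Taking both directories as parameters keeps the function usable on a scratch directory before touching /etc/ha.d.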

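ha.cf configuration:

Edit the relevant settings. I ran into some problems here. Start with ha.cf, which configures the heartbeat service itself; its main parts are described below.

First, the heartbeat interval and the failure-detection timeouts: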

#     A note on specifying "how long" times below...
#
#     The default time unit is seconds
#          10 means ten seconds
#
#     You can also specify them in milliseconds
#          1500ms means 1.5 seconds
#
#
#     keepalive: how long between heartbeats? [note: interval at which heartbeats are sent]
#
keepalive 2
#
#     deadtime: how long-to-declare-host-dead? [note: how long without a heartbeat before the peer is considered down]
#
#          If you set this too low you will get the problematic
#          split-brain (or cluster partition) problem.
#          See the FAQ for how to use warntime to tune deadtime.
#
deadtime 30
#
#     warntime: how long before issuing "late heartbeat" warning? [note: how long without a heartbeat before a warning is logged]
#     See the FAQ for how to use warntime to tune deadtime.
#
warntime 10
#
#
#     Very first dead time (initdead) [note: how long after system startup before heartbeats are checked]
#
#     On some machines/OSes, etc. the network takes a while to come up
#     and start working right after you've been rebooted.  As a result
#     we have a separate dead time for when things first come up.
#     It should be at least twice the normal dead time.
#
initdead 120
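
The relationships between these timers can be checked mechanically. The sketch below is an assumption-laden illustration, not part of heartbeat: check_ha_timers is a hypothetical name, it only understands plain second values (not the "1500ms" form), it takes "initdead should be at least twice deadtime" from the comments above, and it assumes that warntime is only useful when it is shorter than deadtime.

```shell
#!/bin/sh
# Hypothetical check: read warntime/deadtime/initdead from a ha.cf file and
# verify the relationships described above. $1 = path to ha.cf.
check_ha_timers() {
    cf=$1
    # Print the value of the last uncommented "<key> <value>" line.
    get() { awk -v k="$1" '$1 == k { v = $2 } END { print v }' "$cf"; }
    warntime=$(get warntime)
    deadtime=$(get deadtime)
    initdead=$(get initdead)
    [ "$warntime" -lt "$deadtime" ] || { echo "warntime should be below deadtime"; return 1; }
    [ "$initdead" -ge $((deadtime * 2)) ] || { echo "initdead should be at least 2 x deadtime"; return 1; }
    echo "timers ok"
}

# Usage: check_ha_timers /etc/ha.d/ha.cf
```

With the values shown above (warntime 10, deadtime 30, initdead 120) both conditions hold.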

Network configuration:

#
#     What UDP port to use for bcast/ucast communication? [note: applies to broadcast or unicast]
#
#udpport     3800
#
#     Baud rate for serial ports...
#
#baud     19200
#    
#     serial     serialportname ... [note: if heartbeats go over a serial link rather than the network, set the parameters below]
#serial     /dev/ttyS0     # Linux
#serial     /dev/cuaa0     # FreeBSD
#serial /dev/cuad0      # FreeBSD 6.x
#serial     /dev/cua/a     # Solaris
#
#     Set up a multicast heartbeat medium [note: multicast settings]
#     mcast [dev] [mcast group] [port] [ttl] [loop]
#
#     [dev]          device to send/rcv heartbeats on [note: the NIC used to send and receive heartbeats, e.g. eth0, eth1, ...]
#     [mcast group]     multicast group to join (class D multicast address
#               224.0.0.0 - 239.255.255.255)
#               [note: set a multicast group address in the class D range]
#     [port]          udp port to sendto/rcvfrom (set this value to the
#               same value as "udpport" above)
#               [note: the multicast port]
#     [ttl]          the ttl value for outbound heartbeats.  this effects
#               how far the multicast packet will propagate.  (0-255)
#               Must be greater than zero.
#     [loop]          toggles loopback for outbound multicast heartbeats.
#               if enabled, an outbound packet will be looped back and
#               received by the interface it was sent on. (0 or 1)
#               Set this value to zero.
#               [note: whether multicast packets are looped back; disabled by default]
#         
#
#mcast eth0 225.0.0.1 694 1 0
#
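#     Set up a unicast / udp heartbeat medium
#     [note: unicast means point-to-point communication; it suits a system with only two hosts and is used together with the udpport setting above]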
#     ucast [dev] [peer-ip-addr]
#
#     [dev]          device to send/rcv heartbeats on
#     [peer-ip-addr]     IP address of peer to send packets to
#
ucast eth0 172.17.1.45

Whether resources fail back automatically:

#
#     About boolean values...
#
#     Any of the following case-insensitive values will work for true:
#          true, on, yes, y, 1
#     Any of the following case-insensitive values will work for false:
#          false, off, no, n, 0
#
#
#
#     auto_failback:  determines whether a resource will
#     automatically fail back to its "primary" node, or remain
#     on whatever node is serving it until that node fails, or
#     an administrator intervenes.
#     [note: when the primary node recovers it takes the resources back from the backup node, and the backup returns to standby]
#
#     The possible values for auto_failback are:
#          on     - enable automatic failbacks
#          off     - disable automatic failbacks
#          legacy     - enable automatic failbacks in systems
#               where all nodes do not yet support
#               the auto_failback option.
#
#     auto_failback "on" and "off" are backwards compatible with the old
#          "nice_failback on" setting.
#
#     See the FAQ for information on how to convert
#          from "legacy" to "on" without a flash cut.
#          (i.e., using a "rolling upgrade" process)
#
#     The default value for auto_failback is "legacy", which
#     will issue a warning at startup.  So, make sure you put
#     an auto_failback directive in your ha.cf file.
#     (note: auto_failback can be any boolean or "legacy")
#
auto_failback on

Cluster node list:

#      
#     Tell what machines are in the cluster
#     node     nodename ...     -- must match uname -n [note: each node name must be identical to what uname -n reports on that host]
node     TEST-43
node     TEST-45
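
Because a mismatch between the node lines and uname -n is easy to overlook, a quick check may help. The following is a minimal sketch; node_listed is a hypothetical name, and it performs an exact string comparison, which may be stricter than what heartbeat itself does.

```shell
#!/bin/sh
# Hypothetical check: does a host name appear on a "node" line of ha.cf?
# $1 = path to ha.cf, $2 = host name (defaults to the output of uname -n).
node_listed() {
    cf=$1
    name=${2:-$(uname -n)}
    awk -v n="$name" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$cf"
}

# Usage: node_listed /etc/ha.d/ha.cf || echo "this host is not listed as a node"
```

Commented lines are skipped automatically because their first field is "#" rather than "node".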

authkeys configuration:

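authkeys configures how the nodes in the cluster authenticate each other. The number after auth selects one of the methods listed below it; the simplest is crc and the most complex is sha1. The list must be in order (it cannot contain entry 2 without entry 1), and the method selected by auth must appear in the list.
#
auth 1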
1 crc
#2 sha1 HI!
#3 md5 Hello
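
Since crc only guards against packet corruption, a keyed method such as sha1 is usually preferable outside a physically secure network. The sketch below writes such a file; write_authkeys is a hypothetical name, and the 32-byte random key length is an arbitrary choice made for this example.

```shell
#!/bin/sh
# Hypothetical helper: write an authkeys file that uses sha1 with a random key.
# $1 = output path (normally /etc/ha.d/authkeys).
write_authkeys() {
    out=$1
    # 32 random bytes rendered as 64 hex characters.
    key=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
    old_umask=$(umask)
    umask 077                       # a newly created file gets no group/other access
    printf 'auth 1\n1 sha1 %s\n' "$key" > "$out"
    umask "$old_umask"
    chmod 600 "$out"                # also covers a file that already existed
}

# Usage (as root): write_authkeys /etc/ha.d/authkeys
```

The resulting file has to be copied unchanged to the other node, because both sides need the same key.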

haresources configuration:

This file configures the cluster resources, chiefly the cluster's virtual IP and the services and resources managed by heartbeat.

#
#     This is a list of resources that move from machine to machine as
#     nodes go down and come up in the cluster.  Do not include
#     "administrative" or fixed IP addresses in this file.
#
# <VERY IMPORTANT NOTE>
#     The haresources files MUST BE IDENTICAL on all nodes of the cluster.
#
#     The node names listed in front of the resource group information
#     is the name of the preferred node to run the service.  It is
#     not necessarily the name of the current machine.  If you are running
#     auto_failback ON (or legacy), then these services will be started
#     up on the preferred nodes - any time they're up.
#
#     If you are running with auto_failback OFF, then the node information
#     will be used in the case of a simultaneous start-up, or when using
#     the hb_standby {foreign,local} command.
#
#     BUT FOR ALL OF THESE CASES, the haresources files MUST BE IDENTICAL.
#     If your files are different then almost certainly something
#     won't work right.
# </VERY IMPORTANT NOTE>
#
#    
#     We refer to this file when we're coming up, and when a machine is being
#     taken over after going down.
#
#     You need to make this right for your installation, then install it in
#     /etc/ha.d
#
#     Each logical line in the file constitutes a "resource group".
#     A resource group is a list of resources which move together from
#     one node to another - in the order listed.  It is assumed that there
#     is no relationship between different resource groups.  These
#     resource in a resource group are started left-to-right, and stopped
#     right-to-left.  Long lists of resources can be continued from line
#     to line by ending the lines with backslashes ("\").
#
#     These resources in this file are either IP addresses, or the name
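#     of scripts to run to "start" or "stop" the given resource.
#     [note: a resource is either an IP address or the name of a script that starts or stops the resource]
#
#     The format is like this:
#     [note: node name, space, resource 1, space, resource 2, ...; heartbeat passes each resource to its handler script]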
#
#node-name resource1 resource2 ... resourceN
#
#
#     If the resource name contains an :: in the middle of it, the
#     part after the :: is passed to the resource script as an argument.
#       Multiple arguments are separated by the :: delimeter
#
#     In the case of IP addresses, the resource script name IPaddr is
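#     implied.
#     [note: an IP address is also a resource; it means a virtual IP is created. The general form is resource-name::argument. For an IP address the resource name is IPaddr and may be omitted, so IPaddr::172.17.1.49 is equivalent to 172.17.1.49]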
#
#     For example, the IP address 135.9.8.7 could also be represented
#     as IPaddr::135.9.8.7
#
#     THIS IS IMPORTANT!!     vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#
#     The given IP address is directed to an interface which has a route
#     to the given address.  This means you have to have a net route
#     set up outside of the High-Availability structure.  We don't set it
#     up here -- we key off of it.
#
#     The broadcast address for the IP alias that is created to support
#     an IP address defaults to the highest address on the subnet.
#
#     The netmask for the IP alias that is created defaults to the same
#     netmask as the route that it selected in in the step above.
#
#     The base interface for the IPalias that is created defaults to the
#     same netmask as the route that it selected in in the step above.
#
#     If you want to specify that this IP address is to be brought up
#     on a subnet with a netmask of 255.255.255.0, you would specify
#     this as IPaddr::135.9.8.7/24 . 
#
#     If you wished to tell it that the broadcast address for this subnet
#     was 135.9.8.210, then you would specify that this way:
#          IPaddr::135.9.8.7/24/135.9.8.210
#
#     If you wished to tell it that the interface to add the address to
#     is eth0, then you would need to specify it this way:
#          IPaddr::135.9.8.7/24/eth0
#
#       And this way to specify both the broadcast address and the
#       interface:
#          IPaddr::135.9.8.7/24/eth0/135.9.8.210
#
#     The IP addresses you list in this file are called "service" addresses,
#     since they're they're the publicly advertised addresses that clients
#     use to get at highly available services.
#
#     For a hot/standby (non load-sharing) 2-node system with only
#     a single service address,
#     you will probably only put one system name and one IP address in here.
#     The name you give the address to is the name of the default "hot"
#     system.
#
#     Where the nodename is the name of the node which "normally" owns the
#     resource.  If this machine is up, it will always have the resource
#     it is shown as owning.
#
#     The string you put in for nodename must match the uname -n name
#     of your machine.  Depending on how you have it administered, it could
#     be a short name or a FQDN.
#
#-------------------------------------------------------------------
#
#     Simple case: One service address, default subnet and netmask
#          No servers that go up and down with the IP address
#
#just.linux-ha.org     135.9.216.110
#     [note: gives this node the virtual IP 135.9.216.110 with the default gateway, default NIC and default subnet]
#
#-------------------------------------------------------------------
#
#     Assuming the adminstrative addresses are on the same subnet...
#     A little more complex case: One service address, default subnet
#     and netmask, and you want to start and stop http when you get
#     the IP address...
#
#just.linux-ha.org     135.9.216.110 http
#     [note: gives this node the virtual IP 135.9.216.110 with the default gateway, default NIC and default subnet, and binds an Apache service to it]
#-------------------------------------------------------------------
#
#     A little more complex case: Three service addresses, default subnet
#     and netmask, and you want to start and stop http when you get
#     the IP address...
#
#just.linux-ha.org     135.9.216.110 135.9.215.111 135.9.216.112 httpd
#     [note: gives this node a list of virtual IPs with the default gateway, default NIC and default subnet, and binds an Apache service to them]
#-------------------------------------------------------------------
#
#     One service address, with the subnet, interface and bcast addr
#       explicitly defined.
#
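#just.linux-ha.org     135.9.216.3/28/eth0/135.9.216.12 httpd
#     [note: gives this node the virtual IP 135.9.216.3 with the default gateway, a 28-bit netmask, NIC eth0 and the explicitly given broadcast address 135.9.216.12, and binds an Apache service. With a /28 mask the all-ones host address would be 135.9.216.15; the sample simply sets .12 explicitly]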
#
#-------------------------------------------------------------------
#
#       An example where a shared filesystem is to be used.
#       Note that multiple aguments are passed to this script using
#       the delimiter '::' to separate each argument.
#
#node1  10.0.0.170 Filesystem::/dev/sda1::/data1::ext2
#     [note: this line uses a shared disk; the double colon separates multiple arguments. node1 is given the virtual IP 10.0.0.170 (default gateway, default NIC, default subnet), and then the equivalent of "mount -t ext2 /dev/sda1 /data1" is run]
#
#     Regarding the node-names in this file:
#
#     They must match the names of the nodes listed in ha.cf, which in turn
#     must match the `uname -n` of some node in the cluster.  So they aren't
#     virtual in any sense of the word.
#

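The configuration above must be exactly identical on the primary node and the standby node; otherwise resources cannot be taken over.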
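
To make the resource syntax from the sample file concrete, the sketch below splits one haresources token into a script name and its arguments. It only illustrates the rules quoted above (a bare IP address implies IPaddr, and "::" separates arguments); explain_resource is a hypothetical name and this is not how heartbeat itself is implemented.

```shell
#!/bin/sh
# Hypothetical helper: show how one haresources token is interpreted.
# Prints the script name followed by its space-separated arguments.
explain_resource() {
    res=$1
    case $res in
        *::*)
            script=${res%%::*}      # text before the first "::"
            args=${res#*::}         # everything after the first "::"
            ;;
        [0-9]*.[0-9]*.[0-9]*.[0-9]*)
            script=IPaddr           # a bare IP address implies IPaddr
            args=$res
            ;;
        *)
            script=$res             # plain script name, no arguments
            args=
            ;;
    esac
    if [ -n "$args" ]; then
        printf '%s %s\n' "$script" "$(printf '%s' "$args" | sed 's/::/ /g')"
    else
        printf '%s\n' "$script"
    fi
}

explain_resource 172.17.1.49                            # IPaddr 172.17.1.49
explain_resource Filesystem::/dev/sda1::/data1::ext2    # Filesystem /dev/sda1 /data1 ext2
explain_resource httpd                                  # httpd
```

Reading a resource group this way, left to right, also shows the order in which the resources are started.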

Problem:

After installation and configuration, heartbeat failed to start. The error log was:

Nov 20 09:40:44 TEST-43 heartbeat: [6961]: WARN: heartbeat: udp port 3800 reserved for service "pwgpsi".
Nov 20 09:40:44 TEST-43 heartbeat: [6961]: info: Pacemaker support: false
Nov 20 09:40:44 TEST-43 heartbeat: [6961]: WARN: Logging daemon is disabled --enabling logging daemon is recommended

Nov 20 09:40:44 TEST-43 heartbeat: [6961]: info: **************************
Nov 20 09:40:44 TEST-43 heartbeat: [6961]: info: Configuration validated. Starting heartbeat 3.0.4
Nov 20 09:40:44 TEST-43 heartbeat: [6962]: info: heartbeat: version 3.0.4
Nov 20 09:40:45 TEST-43 heartbeat: [6962]: info: Heartbeat generation: 1447925863
Nov 20 09:40:45 TEST-43 heartbeat: [6962]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
Nov 20 09:40:45 TEST-43 heartbeat: [6962]: info: glib: ucast: bound send socket to device: eth0
Nov 20 09:40:45 TEST-43 heartbeat: [6962]: ERROR: glib: ucast: error setting option SO_REUSEPORT(w): Protocol not available
Nov 20 09:40:45 TEST-43 heartbeat: [6962]: ERROR: make_io_childpair: cannot open ucast eth0
Nov 20 09:40:46 TEST-43 heartbeat: [6966]: CRIT: Emergency Shutdown: Master Control process died.
Nov 20 09:40:46 TEST-43 heartbeat: [6966]: CRIT: Killing pid 6962 with SIGTERM
Nov 20 09:40:46 TEST-43 heartbeat: [6966]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.

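Red Hat's bug tracker has a related report, but I did not find the exact cause. The bug description and discussion follow; a build patch attached to the bug resolves the problem.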

=======================================================================================

Bug 1028127 - Heartbeat not working on centos6 after last update [NEEDINFO]

Status: CLOSED ERRATA
Product: Fedora EPEL
Component: heartbeat
Version: el6
Hardware: x86_64 Linux
Priority: low
Severity: low
Assigned To: Kevin Fenzi
QA Contact: Fedora Extras Quality Assurance
Duplicates: 1028957
Depends On: 869826
Reported: 2013-11-07 12:44 EST by qmic
Modified: 2015-04-16 04:37 EDT
Fixed In Version: heartbeat-3.0.4-2.el6
Doc Type: Bug Fix
Last Closed: 2013-12-17 19:19:11 EST
Flags: tony.abohwo: needinfo?

Attachments:
make heartbeat compile against rhel cluster-glue-libs-devel 1.0.5 and fix init script (2.84 KB, patch), 2013-11-30 05:01 EST, Lars Ellenberg

External Trackers:
Red Hat Knowledge Base (Solution) 638323

Description qmic 2013-11-07 12:44:42 EST

Description of problem:
Heartbeat`s init.d scripts stopped working after update.


Version-Release number of selected component (if applicable):
heartbeat-3.0.4-1.el6.x86_64

How reproducible:
Try to restart service

Steps to Reproduce:
1. service heartbeat status or service heartbeat start
2.
3.

Actual results:
Nothing (empty line) 

Expected results:
heartbeat OK [pid 1377 et al] is running on et [et]...
Or any error. 

Additional info:
In logs there is  no any information related

Comment 1 Robert Scheck 2013-11-07 14:14:33 EST

I wonder if this is maybe SELinux related? Can you try "setenforce 0"?

Comment 2 Kevin Fenzi 2013-11-07 17:38:36 EST

Also a 'sh -x /etc/init.d/heartbeat start' output might be good if you can attach it.

Comment 3 Nick Hope 2013-11-07 20:07:44 EST

This appears to be related to the update of the resource-agents package (resource-agents-3.9.2-21.el6_4.8). The HA_BIN variable is set in /usr/lib/ocf/resource.d/heartbeat/.ocf-directories.

In previous versions of the package, this was set to the following:

: ${HA_BIN:=/usr/lib64/heartbeat}

In resource-agents-3.9.2-21.el6_4.8, HA_BIN is set to the following:

: ${HA_BIN:=/usr/libexec/heartbeat}

The heartbeat package from EPEL places the heartbeat binary in the /usr/lib64/heartbeat directory. A workaround of symlinking /usr/lib64/heartbeat/heartbeat to /usr/libexec/heartbeat/heartbeat seems to work for now.
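
The symlink workaround described in this comment can be sketched as follows. This is an illustration only: link_heartbeat_binary is a hypothetical name, the optional root prefix exists purely so the function can be tried on a scratch directory, and the workaround targets the init script symptom reported in this bug, not necessarily the SO_REUSEPORT error shown in the log earlier.

```shell
#!/bin/sh
# Hypothetical helper for the symlink workaround from comment 3.
# $1 = optional root prefix (empty on a live system, a scratch dir when testing).
link_heartbeat_binary() {
    root=${1:-}
    target=$root/usr/lib64/heartbeat/heartbeat
    linkdir=$root/usr/libexec/heartbeat
    [ -e "$target" ] || { echo "missing $target" >&2; return 1; }
    mkdir -p "$linkdir" || return 1
    # -f replaces an existing link; -n avoids descending into it if it is a directory link.
    ln -sfn "$target" "$linkdir/heartbeat"
}

# Usage (as root): link_heartbeat_binary
```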

Comment 4 qmic 2013-11-08 07:18:05 EST

This isn`t SELinux related.
Nick Hope, thanks that worked!

Comment 5 Kevin Fenzi 2013-11-08 10:35:35 EST

Would someone be willing to file a new bug or move this over to resource-agents then? 

it would be better if it would get addressed there if possible.

Comment 6 qmic 2013-11-08 13:42:27 EST

I cannot find resource-agents on components list.

Comment 7 Kevin Fenzi 2013-11-08 15:25:22 EST

Are you looking under RHEL?

https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%206

Comment 9 Fabio Massimo Di Nitto 2013-11-11 04:00:33 EST

David, it looks like this is a consequence of multilib support and PATH handling.

Either heartbeat or resource-agents should ship a compatibility symlink.

In any case this bug doesn´t affect RHEL since we don´t ship or support heartbeat.

Comment 10 Tuomo Soini 2013-11-11 15:10:44 EST

I'd suggest doing the same change for HA_BIN in heartbeat package.

Comment 11 Robert Scheck 2013-11-11 15:12:00 EST

Cross-filed case 00979415 on the Red Hat customer portal as it breaks all of
our existing legacy heartbeat setups as it seems. The case requests that it's 
solved either or by the compatibility symlink and if the compat symlink gets
refused for RHEL that Red Hat supports the EPEL package maintainer.

Comment 12 Fabio Massimo Di Nitto 2013-11-11 15:19:03 EST

(In reply to Robert Scheck from comment #11)
> Cross-filed case 00979415 on the Red Hat customer portal as it breaks all of
> our existing legacy heartbeat setups as it seems. The case requests that
> it's 
> solved either or by the compatibility symlink and if the compat symlink gets
> refused for RHEL that Red Hat supports the EPEL package maintainer.

We have no problem to help the EPEL package maintainer.

The reason why the path was changed was to accommodate multilib requirement for RHEL.

I don't honestly know if using a compatibility symlink will cause multilib issues, but definitely we will find a solution.

As for the customer case, please remember that we do not support heartbeat in RHEL and that even resource-agents for pacemaker (the heartbeat portions) are still TechPreview (not supported) by RHEL.

Comment 13 Robert Scheck 2013-11-11 15:28:32 EST

(In reply to Fabio Massimo Di Nitto from comment #12)
> I don't honestly know if using a compatibility symlink will cause multilib
> issues, but definitely we will find a solution.

That's why I wonder why you use %{_libexecdir} instead of %{_libdir} now, but
that's out of my target.

> As for the customer case, please remember that we do not support heartbeat
> in RHEL and that even resource-agents for pacemaker (the heartbeat portions)
> are still TechPreview (not supported) by RHEL.

Yes, I am aware about that. But it's still not nice to break things thus I try
to provide valuable feedback - and still care about our own customers at work.

You also should keep in mind that AFAIK Linbit (the DRBD developers) still 
supports Heartbeat and their customers won't be amused about this, I guess.
Correct me but a customer case is the only real chance to get some attention
to IMHO not so well thought package changes during RHEL 6.x (sorry!).

Comment 14 Kevin Fenzi 2013-11-11 15:34:07 EST

FYI, from my side (heartbeat maintainer in EPEL), I'm happy to try and make changes or add any interested folks who would like to co-maintain that might have more time than I do to help maintain.

Comment 15 Fabio Massimo Di Nitto 2013-11-11 15:37:57 EST

(In reply to Robert Scheck from comment #13)
> (In reply to Fabio Massimo Di Nitto from comment #12)
> > I don't honestly know if using a compatibility symlink will cause multilib
> > issues, but definitely we will find a solution.
> 
> That's why I wonder why you use %{_libexecdir} instead of %{_libdir} now, but
> that's out of my target.
> 
> > As for the customer case, please remember that we do not support heartbeat
> > in RHEL and that even resource-agents for pacemaker (the heartbeat portions)
> > are still TechPreview (not supported) by RHEL.
> 
> Yes, I am aware about that. But it's still not nice to break things thus I
> try
> to provide valuable feedback - and still care about our own customers at
> work.
> 

Yes we all agree. That's why we will find a fix in one way or another (that being in resource-agents or heartbeat in EPEL).

> You also should keep in mind that AFAIK Linbit (the DRBD developers) still 
> supports Heartbeat and their customers won't be amused about this, I guess.

Linbit ships their own set of packages. I doubt they will be affected by this changes. But then again, we never claimed full support to allow us to change packages as necessary. EPEL and RHEL packaging guidelines are different.

> Correct me but a customer case is the only real chance to get some attention
> to IMHO not so well thought package changes during RHEL 6.x (sorry!).

Not really no.. the bug was getting attention without the customer case. GSS can't do much either way. They don't maintain EPEL nor they provide support for TP components.

We can also agree that this breakage could have been avoided tho.

Comment 16 Fabio Massimo Di Nitto 2013-11-11 15:39:00 EST

(In reply to Kevin Fenzi from comment #14)
> FYI, from my side (heartbeat maintainer in EPEL), I'm happy to try and make
> changes or add any interested folks who would like to co-maintain that might
> have more time than I do to help maintain.

Let's wait Wed for David to come back and discuss quickly the correct fix.

Comment 17 Lars Ellenberg 2013-11-12 03:29:32 EST

My colleague pointed me to this bug.
If you get other bugs regarding heartbeat or resource-agents,
feel free to add me to Cc proactively,
so later I can not pretend I did not know about it ;-)

A new *resource-agents* version, 3.9.6 is overdue,
it was announced to be released last month :-/

As soon as I find the time we'll release that.
(Which should be by the end of this November (honestly!)
 (or someone else takes over)).

Then I can move all binaries and other stuff that does not belong into libdir
(according to what guidelines? can someone shoot me a link please?)
in the *heartbeat* package to libexecdir as well,
also drop the useless legacy init script dependency on the
HA_BIN definition meanwhile split into the resource-agents package,
tag a heartbeat 3.0.6, and have that require resource-agents >= 3.9.6
for good measure.

Meanwhile, symlinks or patching the heartbeat init script is necessary
for recent (newer than ~ July 2013) resource-agents with old heartbeat.

AFAICS, the only heartbeat "dependency" on that variable is in fact the
use of $HA_BIN/heartbeat in the init script, so only patching that
would be an option as well, but the the heartbeat package would still
violate the "multilib guidelines", I guess (did I mention I'd like a pointer
as to which guidelines to apply?), by putting executable binaries into libdir.

Other ideas?

    Lars

Comment 18 Fabio Massimo Di Nitto 2013-11-14 05:43:36 EST

*** Bug 1028957 has been marked as a duplicate of this bug. ***

Comment 19 David Vossel 2013-11-15 11:09:45 EST

(In reply to Lars Ellenberg from comment #17)
> My colleague pointed me to this bug.
> If you get other bugs regarding heartbeat or resource-agents,
> feel free to add me to Cc proactively,
> so later I can not pretend I did not know about it ;-)
> 
> A new *resource-agents* version, 3.9.6 is overdue,
> it was announced to be released last month :-/
> 
> As soon as I find the time we'll release that.
> (Which should be by the end of this November (honestly!)
>  (or someone else takes over)).
> 
> Then I can move all binaries and other stuff that does not belong into libdir
> (according to what guidelines? can someone shoot me a link please?)

Here are a couple of documents I found searching google.

1. https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/s1-filesystem-fhs.html

"/usr/lib, used for object files and libraries that are not designed to be directly utilized by shell scripts or users"

"/usr/libexec, contains small helper programs called by other programs"


2. http://www.centos.org/docs/5/html/Deployment_Guide-en-US/s1-filesystem-fhs.html

"lib/ contains object files and libraries that are not designed to be directly utilized by users or shell scripts"

"libexec/ directory contains small helper programs called by other programs"


> in the *heartbeat* package to libexecdir as well,
> also drop the useless legacy init script dependency on the
> HA_BIN definition meanwhile split into the resource-agents package,
> tag a heartbeat 3.0.6, and have that require resource-agents >= 3.9.6
> for good measure.

Sounds great :)

> 
> Meanwhile, symlinks or patching the heartbeat init script is necessary
> for recent (newer than ~ July 2013) resource-agents with old heartbeat.
> 
> AFAICS, the only heartbeat "dependency" on that variable is in fact the
> use of $HA_BIN/heartbeat in the init script, so only patching that
> would be an option as well,

If the init script is the only reference to HA_BIN, then that seems like the best fix.

>  but the the heartbeat package would still
> violate the "multilib guidelines", I guess (did I mention I'd like a pointer
> as to which guidelines to apply?), by putting executable binaries into
> libdir.

Have heartbeat install the binaries in /usr/libexec/heartbeat as well. But do not depend on the resource-agent package to create or even use that directory (no coupling of the two packages)

> Other ideas?
>
>     Lars

Comment 20 Christoph Galuschka 2013-11-25 15:19:31 EST

would it be worthwhile to also file a bug against EPEL for hearbeat (if not allready done)?

Comment 21 David Vossel 2013-11-25 15:49:47 EST

(In reply to Christoph Galuschka from comment #20)
> would it be worthwhile to also file a bug against EPEL for hearbeat (if not
> allready done)?

Actually, this bug should be moved to heartbeat.

Comment 22 Christoph Galuschka 2013-11-26 05:17:36 EST

David, can you do that, or is a new bug required?
Thanks

Comment 23 Robert Scheck 2013-11-26 06:13:57 EST

Is there any workaround or intermediate solution that could go into the EPEL
heartbeat package? I read about the symlink, so is there any reason that we do
not simply put that one into the heartbeat package in EPEL?

Comment 24 David Vossel 2013-11-26 09:43:47 EST

(In reply to Robert Scheck from comment #23)
> Is there any workaround or intermediate solution that could go into the EPEL
> heartbeat package? I read about the symlink, so is there any reason that we
> do
> not simply put that one into the heartbeat package in EPEL?

I do not maintain that package, but from what I've gathered there's a workaround involving a symlink in the /usr/libexec/heartbeat folder that points to a binary in the /usr/lib/heartbeat folder.

Comment 25 Christoph Galuschka 2013-11-26 09:47:34 EST

David: Thanks

Comment 26 Robert Scheck 2013-11-26 10:24:40 EST

(In reply to David Vossel from comment #24)
> I do not maintain that package, but from what I've gathered there's a
> workaround involving a symlink in the /usr/libexec/heartbeat folder that
> points to a binary in the /usr/lib/heartbeat folder.

I tried to rebuild heartbeat with that symlink for the time being, however
any heartbeat rebuild on RHEL 6.5 fails with "error: 'HA_LIBHBDIR' undeclared 
(first use in this function)" due to cluster-glue-libs-devel-1.0.5-6.el6. It
was first reported via bug #869826.

Comment 27 Lars Ellenberg 2013-11-26 11:02:52 EST

please just fix the heartbeat init script for now.

Something like this
 
sed -i -e 's,\$HA_BIN/heartbeat,$HEARTBEAT,g' heartbeat/init.d/heartbeat.in
sed -i -e $'/^### END INIT INFO/ a\\\n\\\nHEARTBEAT=@libdir@/heartbeat\n\n' heartbeat/init.d/heartbeat.in

[sorry, right now I'm deep in other deep shit ;-)
 or I'd prepare, test and commit upstream this "hotfix" myself]

There is no point using "$HA_BIN" in this init script at all.
Much less so now that variable is in fact provided by an other project
which just happens to be some decendend of something that used to be
part of heartbeat...

Comment 28 Robert Scheck 2013-11-26 11:05:52 EST

As long as heartbeat does not build on RHEL 6 anymore, we are even not able
to ship this workaround, but users have to execute these sed calls themself.

Comment 29 Lars Ellenberg 2013-11-26 11:08:55 EST

uhm, those sed calls where meant to be executed in a heartbeat source checkout ;)
but yes, similar would work to patch the init script in a "live" system.

But you are right, as long as heartbeat does not currently rebuild at all
for your setups, this does not really help much.

Comment 30 Dimitri Maziuk 2013-11-29 14:06:18 EST

(quote Lars Ellenberg from comment #29)
... 
> But you are right, as long as heartbeat does not currently rebuild at all
> for your setups, this does not really help much.

And if rebuilding it is going to take some work, you might consider adding a little more work while you're at it and repackaging it so it doesn't depend on any of the current linux-ha stuff like cluster-glue and resource-agents.

The rationale is that a) heartbeat is unmaintained legacy code while the curent stuff is still a moving target so there's sure to be more incompatible changes coming to break things. And that "things" heartbeat tends to be used for are  mission-critical installations and when those break b) people tend to get really pissed off at Fedora, RedHat, and Linux-HA, which is not something anyone wants (including the pissed-off: it's bad for our blood pressure).

Comment 31 Lars Ellenberg 2013-11-29 15:32:33 EST

(In reply to Dimitri Maziuk from comment #30)
> (quote Lars Ellenberg from comment #29)
> ... 
> > But you are right, as long as heartbeat does not currently rebuild at all
> > for your setups, this does not really help much.
> 
> And if rebuilding it is going to take some work, you might consider adding a
> little more work while you're at it and repackaging it 

Yes, as has been mentioned before,
we likely need to repackage it according to the "guidelines",
put binaries in "libexec", put dynamically loadable stuff in lib{,64}.


> so it doesn't depend
> on any of the current linux-ha stuff like cluster-glue and resource-agents.

Certainly impossible.
That has all been one monolithic package.
When the package split was done,
the messaging and ipc was moved into "glue",
so heartbeat will always have to depend on glue.

And the resource-agents, well, have been moved into "resource-agents",
and what remains in heartbeat is only wrappers to use the "ocf" resource agents
from the old "haresources" ResourceManager script.

Which means this dependency is *also* a hard dependency.

> The rationale is that a) heartbeat is unmaintained legacy code

We very much *do* maintain it.
We do use it in production a lot,
and we do use it in production also with pacemaker on top.

Just that there have not been much code changes in a while
does not mean it is unmaintained.
It's "stable".
It is pretty ugly in its insides at many places, but so are many other projects.
But it does work.

So apart from the occasional bug report about possible misbehaviour
in certain corner cases, there won't be any "development" happening:
it just has no intention to go anywhere, being "feature complete".

> while the current stuff is still a moving target so there's sure to be more
> incompatible changes coming to break things.

Uhm, no.
Glue was no moving target at all, it only tried to keep up with all the
breakage caused by pacemaker progress ;-)

And after having rewritten pretty much everything that used to be glue,
using libqb as messaging layer, using its own re-written-from scratch
lrmd, using its own stonithd and so on,
pacemaker no longer depends on glue in any way
(unless you try to use it on top of heartbeat,
as it then needs to use that messaging layer).

> And that "things" heartbeat
> tends to be used for are  mission-critical installations and when those
> break b) people tend to get really pissed off at Fedora, RedHat, and
> Linux-HA, which is not something anyone wants (including the pissed-off:
> it's bad for our blood pressure).

Never change a running system ;-)

Or use a supported stack.

(You also did notice that this "does not build" breakage is because
of some incompatibility with a 3.5 years old glue-devel,
and that heartbeat *does* build fine against
the current glue devel, right?)

Comment 32 Dimitri Maziuk 2013-11-29 16:01:13 EST

(In reply to Lars Ellenberg from comment #31)
...
> We very much *do* maintain it.
> We do use it in production a lot,
> and we do use it in production also with pacemaker on top.
> 
> Just that there have not been much code changes in a while
> does not mean it is unmaintained.
> It's "stable".

and

> Or use a supported stack.

When I did my Software Engineering 101 "maintained" meant "supported". (Admittedly it wasn't this century and I sort of stopped paying attention sometime after design patterns. So maybe it doesn't anymore, what do I know.)

> Glue was no moving target at all, it only tried to keep up with all the
> breakage caused by pacemaker progress ;-)

"Keep up" by "not moving" is way too zen for me.

So cluster-glue is "stable" and does not have to keep up with pacemaker progress anymore, great. Could there be a package heartbeat-resource-agents that similarly doesn't have to keep up while standing still?

> Never change a running system ;-)

Why bother releasing updates if I'm supposed to never install them?

Comment 33 Lars Ellenberg 2013-11-30 04:58:29 EST

(In reply to Dimitri Maziuk from comment #32)
> (In reply to Lars Ellenberg from comment #31)
> ...
> > We very much *do* maintain it.
> > We do use it in production a lot,
> > and we do use it in production also with pacemaker on top.
> > 
> > Just that there have not been much code changes in a while
> > does not mean it is unmaintained.
> > It's "stable".
> 
> and
> 
> > Or use a supported stack.
> 
> When I did my Software Engineering 101 "maintained" meant "supported".
> (Admittedly it wasn't this century and I sort of stopped paying attention
> sometime after design patterns. So maybe it doesn't anymore, what do I know.)

I don't think "maintained" and "supported" are exact synonyms.
But even for your narrow definition:
SuSE supports it.
Linbit supports it.
Others may, too...
 
> > Glue was no moving target at all, it only tried to keep up with all the
> > breakage caused by pacemaker progress ;-)
> 
> "Keep up" by "not moving" is way too zen for me.

Think prey and archer (standing still, but keeping up his aim)
 
> So cluster-glue is "stable" and does not have to keep up with pacemaker
> progress anymore, great. Could there be a package heartbeat-resource-agents
> that similarly doesn't have to keep up while standing still?

Hey I have been pissed off by all those unnecessary and incompatible
package splits and breakages all the time, believe it or not.

But neither the "unmaintained" nor the "moving target" thing
was your central point, I hope, and the "unmaintained" simply struck
a nerve here, I very much dislike heartbeat being spoken of as not maintained,
when it is.

> > Never change a running system ;-)
> 
> Why bother releasing updates if I'm supposed to never install them?

But it is ok to blame upstream heartbeat for a breakage that happened
in the 3.5 years old RHEL-only glue package?

I was simply trying to get through that
 * you are blaming the wrong package
 * you are suggesting the wrong cure...

But alas, there is no point.
it's never the fault of he who broke it,
it is always he who no longer works.

I'll add an attachment to "make heartbeat compile again" in a minute,
to make you all happy again... who cares for the blame game, after all,
we all want "it" to "just work", right?

*sigh*

Comment 34 Lars Ellenberg 2013-11-30 05:01:15 EST

Created attachment 830886 [details]
make heartbeat compile against rhel cluster-glue-libs-devel 1.0.5 and fix init script

make heartbeat compile against rhel cluster-glue-libs-devel 1.0.5
and fix init script which was broken by recent HA_BIN redefinition
in resource-agents.

May be incomplete, and resulting package is untested.
But should get you going.
Let me know the final fix you settle on,
so I can push similar in heartbeat upstream, too.

Comment 35 Kevin Fenzi 2013-11-30 12:50:26 EST

Thanks very much Lars. I actually dug into this earlier in the week, but didn't get a fully building package, I was going to look again today. ;) 

Anyhow, I took your patch and added another one to fix another compile issue and have a scratch build: 

http://koji.fedoraproject.org/koji/taskinfo?taskID=6241293

Could folks please test this and provide feedback? If it looks good, I can push an update.

Comment 36 Christoph Galuschka 2013-11-30 14:09:39 EST

Kevin: I will try to test those builds next week on monday (hopefully). Thanks for providing them.

Comment 37 Dimitri Maziuk 2013-11-30 16:13:26 EST

(In reply to Kevin Fenzi from comment #35)

> Could folks please test this and provide feedback? If it looks good, I can
> push an update.

/etc/init.d/ patch gets heartbeat started -- I have it running since the day it broke.

There are 11 other binaries and a bunch of .so's in subdirs in /usr/lib64/heartbeat installed by heartbeat rpm. There are 3 binaries in /usr/libexec/heartbeat installed by resource-agents.

Does anyone know if anything else in the location formerly known as $HA_BIN is ever used?

Comment 38 Dimitri Maziuk 2013-11-30 16:36:13 EST

(In reply to Lars Ellenberg from comment #33)

> SuSE supports it.
> Linbit supports it.

As a centos user filing a bug against epel rpm, I'm glad to hear that. Especially from a guy known to start his replies on linux-ha mailing list with "are you a paying suse customer?"

You have a day job, we get it, so do I, so does Kevin. Until linbit and suse start paying him for his effort, the fact that they support their customers is irrelevant here.

> I was simply trying to get through that
>  * you are blaming the wrong package
>  * you are suggesting the wrong cure...

I am saying that *epel* package X is not being developed or updated, nor supported by anything other than goodness of Kevin's heart. It depends on *rhel* package Y that is being developed as part of redhat's effort to bring software Z into their distro. To me that spells high chance of X being broken by an update to Z, which will be unfixable in the current package layout.

You are suggesting that will never happen -- after all the history of linux-ha and all the splits and forks you yourself aren't happy about. I hope you're right, it'd sure make my life easier.

Comment 39 Christoph Galuschka 2013-12-01 07:01:51 EST

(In reply to Kevin Fenzi from comment #35)
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=6241293
> 
> Could folks please test this and provide feedback? If it looks good, I can
> push an update.

Kevin: I tested today with two VMs (x64 and i386) and 6.5 and it is looking good. I will do some more testing on real iron and for a longer period tomorrow.

Comment 40 Christoph Galuschka 2013-12-02 05:11:40 EST

Kevin. I now have the new heartbeat also running on real iron machines (6.5 with current resource-agents) where IPs are monitored - so far looking good.

Comment 41 Lars Ellenberg 2013-12-02 05:42:38 EST

(In reply to Dimitri Maziuk from comment #38)
> (In reply to Lars Ellenberg from comment #33)
> 
> > SuSE supports it.
> > Linbit supports it.
> 
> As a centos user filing a bug against epel rpm, I'm glad to hear that.
> Especially from a guy known to start his replies on linux-ha mailing list
> with "are you a paying suse customer?"

You know, thank you, but I'm the Lars. (Ellenberg)
*That* is *the other* Lars ;-) (Marowsky-Brée).

> You have a day job, we get it, so do I, so does Kevin

You realized that I was certainly NOT fighting with Kevin,
but telling you (Dimitri) that even for your incorrect and too narrow definition
of "maintained", you are wrong?

> Until linbit and suse
> start paying him for his effort, the fact that they support their customers
> is irrelevant here.

Absolutely.

> I am saying that *epel* package X is not being developed or updated,

You are complaining that an unsupported stack stopped working.
And that trying to fix that by rebuilding the no longer working package
does not work either, because yet another package dropped a define
and heartbeat has not compiled on the platform you chose
*for over three years*

And if I say "don't complain that it broke, you used an unsupported stack.
Your options are to either use a supported stack,
or fix who has broken it (not who was broken by it)",
then you complain even louder, and forbid me to speak?
 :-)

I would have skipped this comment altogether
if you had not mistaken me for lmb; me is lge.
So if you insist on arguing with me further about definitions,
wording, and the blame game, take it to private mail ...

Comment 42 Lars Ellenberg 2013-12-02 05:57:07 EST

Guys,

if you rebuild heartbeat anyways,
please use current mercurial tip
not 3 years old 3.0.4.

Ok, "current" as in, was committed 8 months ago.
(Strange. I thought I wrote those patches together with those other 2012 ones.)

There are several highly relevant fixes.
Flaky network (first packet drop, then communication loss) could
 * potentially cause heartbeat core to eat up 100 % cpu, 
 * potentially preventing heartbeat from ever connecting to that node again
And
 * potentially heartbeat would segfault given bad timing of a node dead event
 * potentially heartbeat would not even notice a node as dead
   if it had massive packet loss just before that
 * in certain situations (again: packet loss helps to trigger it)
   the ccm would not converge, so nodes would not agree on membership

If it helps I can tag that as 3.0.6 "soon".
I'll cross-post this comment in the other bug, too.

Comment 43 Kevin Fenzi 2013-12-02 11:19:15 EST

Well, how about we push this old one out with the fix for this issue now... and then when you tag 3.0.6 push it out as soon as it's available?

I'd prefer to get people working as they were before without too many changes in one update...

Comment 44 Fedora Update System 2013-12-02 11:43:30 EST

heartbeat-3.0.4-2.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/heartbeat-3.0.4-2.el6

Comment 45 Dimitri Maziuk 2013-12-02 13:21:45 EST

(In reply to Lars Ellenberg from comment #41)

> You know, thank you, but I'm the Lars. (Ellenberg)
> *That* is *the other* Lars ;-) (Marowsky-Brée).

Sorry, brain fart. Always knew working on weekend's bad for me.

> And if I say "don't complain that it broke, you used an unsupported stack.
> Your options are to either use a supported stack,
> or fix who has broken it (not who was broken by it)",
> then you complain even louder, and forbid me to speak?

No, I'm saying we can't fix who has broken it because it has nothing to do with any of us. It's RHEL pulling in the other stack in order to pull in the other other stack (RDO) -- the situation known as "too many cooks". 

I suppose I can add resource-agents to yum.conf's exclude list...
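
For reference, a minimal sketch of that exclude approach. `exclude=` is a standard yum.conf option that takes space-separated package globs. The helper name `add_yum_exclude` is made up for illustration, and the function only handles the simple case of an `exclude=` line written without spaces around `=`, in a file where `[main]` is the last (or only) section, as in a stock /etc/yum.conf.

```shell
#!/bin/sh
# add_yum_exclude FILE PATTERN
# Hypothetical helper: make sure PATTERN appears on the exclude= line of FILE.
add_yum_exclude() {
    conf=$1
    pattern=$2
    if grep -q '^exclude=' "$conf"; then
        # Split the existing line into words; if PATTERN is already listed,
        # leave the file untouched.
        if grep '^exclude=' "$conf" | tr ' =' '\n\n' | grep -qxF "$pattern"; then
            return 0
        fi
        # Otherwise append PATTERN to the existing exclude= line.
        sed -i "s|^exclude=.*|& $pattern|" "$conf"
    else
        # No exclude= line yet: add one at the end of the file.
        printf 'exclude=%s\n' "$pattern" >> "$conf"
    fi
}

# Example (as root): add_yum_exclude /etc/yum.conf 'resource-agents*'
```

Pinning a package this way also blocks its security updates, so it trades one kind of breakage risk for another.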

Comment 46 Dimitri Maziuk 2013-12-02 13:46:30 EST

> (In reply to Lars Ellenberg from comment #41)

PS. I get it, it's heartbeat's bug: wrong path in /etc/init.d script. This time. Next time RHEL fixes something else in resource-agents and it won't be. What matters is heartbeat will be broke again.

Comment 47 Fedora Update System 2013-12-02 20:23:15 EST

Package heartbeat-3.0.4-2.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing heartbeat-3.0.4-2.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-12278/heartbeat-3.0.4-2.el6
then log in and leave karma (feedback).

Comment 48 Robert Scheck 2013-12-04 17:37:09 EST

Lars, Kevin, thank you very much for your time and work! The update in EPEL
testing works here fine and as expected. No issues so far. Great work! :)

Comment 49 Christoph Galuschka 2013-12-10 12:16:29 EST

Kevin: Is it possible the change to heartbeat also changed the behaviour of ifconfig (as it does no longer return the HA-IP)? 'ip addr list' works however.
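
A quick way to confirm this observation is to ask `ip` directly instead of `ifconfig`. As far as I know, net-tools `ifconfig` generally lists only an interface's primary address plus labelled aliases (eth0:0 style), so an address added without a label shows up only in `ip addr`; whether that is what changed here is not established in this thread. The sketch below uses a made-up function name, `has_inet_addr`, and reads `ip -o -4 addr show` output on stdin so it can also be tried against saved output.

```shell
#!/bin/sh
# has_inet_addr ADDRESS
# Reads `ip -o -4 addr show` output on stdin and succeeds if ADDRESS is
# configured on any interface. (Illustrative helper, not part of heartbeat.)
has_inet_addr() {
    # In -o (one-line) output each address appears as "inet ADDRESS/PREFIX";
    # the trailing "/" keeps 10.0.0.1 from matching 10.0.0.10.
    grep -qF " inet $1/"
}

# Example: ip -o -4 addr show | has_inet_addr 172.31.29.243 && echo present
```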

Comment 50 Dimitri Maziuk 2013-12-10 12:50:13 EST

(In reply to Christoph Galuschka from comment #49)
> Kevin: Is it possible the change to heartbeat also changed the behaviour of
> ifconfig (as it does no longer return the HA-IP)? 'ip addr list' works
> however.

My guess is it's whatever resource-agent that ends up handling IPAddr that did it. I've a vague memory that the pacemaker's resource agent's been doing that for quite some time, it just probably never made it into redhat until now.

Which is why I was bitching upthread: wrong $HA_BIN location is not the only thing that changed. RHEL is making more changes to resource-agents and they have no reason to maintain compatibility with EPEL's heartbeat RPM.

Comment 51 Kevin Fenzi 2013-12-10 12:55:23 EST

Can we stop piling on unrelated issues here please?

The ip addr thing I seem to recall was a difference between using: 
/etc/ha.d/resource.d/IPaddr
and
/etc/ha.d/resource.d/IPaddr2
resources, but I don't recall fully. 

If you have another concrete bug, please file a new bug on it. Thanks.

Comment 52 Dimitri Maziuk 2013-12-10 13:21:39 EST

(In reply to Kevin Fenzi from comment #51)

> The ip addr thing I seem to recall was a difference between using: 
> /etc/ha.d/resource.d/IPaddr
> and
> /etc/ha.d/resource.d/IPaddr2

No. Unless you're saying the Lars got unstuck in space-time and rewrote several haresources files here to invoke IPaddr2 instead of IPaddr while I was yum-updating resource-agents.

Comment 53 Fedora Update System 2013-12-17 19:19:11 EST

heartbeat-3.0.4-2.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 54 Smile hosting 2014-02-11 08:16:18 EST

Hi almighties,

I just applied this minor update to a few of our clusters and guess what -> the clusters are dead. I explain below:

This update (3.0.4-1.el6 to 3.0.4-2.el6) just broke unicast functionality on our clusters. The cause appears to be the new patch pushed for this bug report.

Related broken patch: heartbeat-3.0.4-duplicate-ucast.patch

The result is that heartbeat cannot start, because ucast (used in /etc/ha.d/ha.cf) fails with the following errors in the logs:
info: glib: Starting serial heartbeat on tty /dev/ttyS1 (19200 baud)
info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on br1
info: glib: ucast: bound send socket to device: br1
ERROR: glib: ucast: error setting option SO_REUSEPORT(w): Protocol not available
ERROR: make_io_childpair: cannot open ucast br1
CRIT: Emergency Shutdown: Master Control process died.
CRIT: Killing pid 11194 with SIGTERM
CRIT: Killing pid 11198 with SIGTERM
CRIT: Killing pid 11199 with SIGTERM
CRIT: Emergency Shutdown(MCP dead): Killing ourselves.

When I downgrade to version 3.0.4-1.el6, everything works again.
So the patch applied in this bug report creates a regression in unicast functionality.

Please roll back or finish/stabilize the patch "heartbeat-3.0.4-duplicate-ucast.patch".

I can test a new version, if you want me to, before you push it to the stable repo.

Regards, Aurélien Lemaire from Smile Hosting.

Comment 55 Lars Ellenberg 2014-02-12 09:29:41 EST

Actual bug is: SO_REUSEPORT defined by headers, but not supported by kernel.

> --- Comment #17 from Smile hosting <hosting@smile.fr> ---
> just applied this minor update to our few cluster and guess what -> clusters is
> dead . I explain below :
> 
> This new version update( 3.0.4-1.el6 to 3.0.4-2.el6 ) just broke our clusters
> 's unicast fonctionnality taking origine to this new  patch puches by this
> bugreport version.
> 
> related broken patch : heartbeat-3.0.4-duplicate-ucast.patch
> 
> the result is heartbeat cannot start cause ucast (used in /etc/ha.d/ha.cf)
> cannot work with following error in logs :
> info: glib: Starting serial heartbeat on tty /dev/ttyS1 (19200 baud)
> info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on br1
> info: glib: ucast: bound send socket to device: br1
> ERROR: glib: ucast: error setting option SO_REUSEPORT(w): Protocol not
> available

> When i downgrade to version 3.0.4-1.el6 it's all working back well.
> So the patch applied in this bug report create a regression on unicast
> functionality.

No, it does not.

But at the time that -1 binary package was built, SO_REUSEPORT was not
defined...  when the -2 binary package was built, apparently the define
was there.  But your kernel does not support it (yet).

If you try to rebuild the -1 package now, against the same headers
the -2 package was built against, it will break with a compile-time error.
That compile-time error was what said patch tries to trivially fix.

Only that this then breaks at runtime when compiled against
too recent headers but run on a too old Linux kernel.

See upstream mercurial for my attempt at fixing this:
http://hg.linux-ha.org/heartbeat-STABLE_3_0/rev/37f57a36a2dd

I suggest you update to upstream mercurial,
or replace your ucast patch with the above.


Cheers,
        Lars


Comment 56 Smile hosting 2014-02-12 12:12:51 EST

Hi Lars,

Thanks for your answer.
I now owe you the context:

I'm on a fully up-to-date vanilla CentOS 6 (which is what this bug report is about), without any home-cooked rebuild of any package, using the vanilla EL6 repo for the heartbeat packages.

uname -a  : Linux HOSTNAME 2.6.32-358.14.1.el6.x86_64 #1 SMP Tue Jul 16 23:51:20 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Vanilla packages:
me-filer5:~# rpm -qa |egrep 'heartbeat|kernel-'
heartbeat-libs-3.0.4-1.el6.x86_64 (working version)
heartbeat-3.0.4-1.el6.x86_64  (working version)
kernel-2.6.32-358.14.1.el6.x86_64


The EPEL EL6 repo currently offers an update of both heartbeat packages to:
heartbeat-libs-3.0.4-2.el6.x86_64 (not working version)
heartbeat-3.0.4-2.el6.x86_64 (not working version)

The patch I referred to is the one I noticed in a diff of the -1 and -2 SRC.rpm of the heartbeat package made by the EPEL heartbeat package maintainer.

I confirm that the -2 version of those packages no longer works on a vanilla EL6 with a vanilla EL6 kernel, which I suppose is not intended but unfortunate.

Hope it helps in understanding the situation.

regards, Aurelien Lemaire

Comment 57 Christoph Galuschka 2014-02-12 12:21:09 EST

As this might solve it - with regards to the kernel age - the one you use is from 6.4 and thus from July last year. 2.6.32-431.5.1 would be the current version.
All I can add is, those packages from EPEL work fine for me.
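
Comparing the running kernel against a reference release can be scripted. The sketch below assumes GNU `sort -V` is available for version ordering. The function name `kernel_at_least` is illustrative, and the 2.6.32-431 threshold in the example is simply the release series named in this thread, not a statement of where SO_REUSEPORT support was actually introduced.

```shell
#!/bin/sh
# kernel_at_least MINIMUM [RELEASE]
# Succeeds if RELEASE (default: the running kernel) sorts at or after MINIMUM.
kernel_at_least() {
    min=$1
    cur=${2:-$(uname -r)}
    # sort -V orders version strings; if MINIMUM comes out first (or the two
    # are identical), the release is considered new enough.
    first=$(printf '%s\n%s\n' "$min" "$cur" | sort -V | head -n 1)
    [ "$first" = "$min" ]
}

# Example: kernel_at_least 2.6.32-431 || echo "kernel older than 2.6.32-431"
```

Version sorting of full release strings is approximate once suffixes such as `.el6` are involved, so it is safest to pass a bare `version-build` prefix as the minimum, as in the example.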

Comment 58 Robert Scheck 2014-02-12 12:25:29 EST

Please also note that Fedora EPEL 6 officially only supports the latest
version of RHEL/CentOS 6, so currently RHEL 6.5. It might work (or not)
with RHEL 6.4, 6.3, etc. Aside of that I can not see any issues here...

Comment 59 Smile hosting 2014-02-13 04:57:41 EST

Hi,

Now I owe you all my facepalm mea culpa.

My Puppet servant has been excluding my kernel from updates since 2.6.32-358. Thus I was indeed using an old kernel.

In conclusion, after fix and update:
with a vanilla UP-TO-DATE kernel the EPEL heartbeat package works like a charm with the following package versions:
rpm -qa |egrep 'heartbeat|kernel-2.6'
heartbeat-libs-3.0.4-2.el6.x86_64
kernel-2.6.32-431.el6.x86_64
heartbeat-3.0.4-2.el6.x86_64
kernel-2.6.32-358.14.1.el6.x86_64

My bad.

Regards, Aurélien Lemaire

Comment 60 Tony 2015-01-28 05:20:48 EST

Hi all,

I have an issue: I have configured heartbeat to run on a 2-node httpd cluster. Heartbeat seems to be running when I check the logs, and I see that node1 comes up on the web page, but when I shut down heartbeat so that node2 would take over, failover does not work. This is the log I see on node1:

tailf /var/log/ha-log
Jan 28 09:48:04 node1 heartbeat: [2420]: info: Configuration validated. Starting heartbeat 3.0.4
Jan 28 09:48:04 node1 heartbeat: [2421]: info: heartbeat: version 3.0.4
Jan 28 09:48:04 node1 heartbeat: [2421]: info: Heartbeat generation: 1422435302
Jan 28 09:48:04 node1 heartbeat: [2421]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Jan 28 09:48:04 node1 heartbeat: [2421]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Jan 28 09:48:04 node1 heartbeat: [2421]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 28 09:48:04 node1 heartbeat: [2421]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 28 09:48:04 node1 heartbeat: [2421]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jan 28 09:48:04 node1 heartbeat: [2421]: info: Local status now set to: 'up'
Jan 28 09:48:04 node1 heartbeat: [2421]: info: Link node1:eth0 up.
Jan 28 09:50:05 node1 heartbeat: [2421]: WARN: node node2: is dead
Jan 28 09:50:05 node1 heartbeat: [2421]: info: Comm_now_up(): updating status to active
Jan 28 09:50:05 node1 heartbeat: [2421]: info: Local status now set to: 'active'
Jan 28 09:50:05 node1 heartbeat: [2421]: WARN: No STONITH device configured.
Jan 28 09:50:05 node1 heartbeat: [2421]: WARN: Shared disks are not protected.
Jan 28 09:50:05 node1 heartbeat: [2421]: info: Resources being acquired from node2.
harc(default)[2433]:    2015/01/28_09:50:05 info: Running /etc/ha.d//rc.d/status status
mach_down(default)[2469]:       2015/01/28_09:50:05 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[2469]:       2015/01/28_09:50:05 info: mach_down takeover complete for node node2.
Jan 28 09:50:05 node1 heartbeat: [2421]: info: mach_down takeover complete.
Jan 28 09:50:05 node1 heartbeat: [2421]: info: Initial resource acquisition complete (mach_down)
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[2501]:  2015/01/28_09:50:05 INFO:  Resource is stopped
Jan 28 09:50:05 node1 heartbeat: [2434]: info: Local Resource acquisition completed.
harc(default)[2588]:    2015/01/28_09:50:06 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp(default)[2588]: 2015/01/28_09:50:06 received ip-request-resp 172.31.29.243 OK yes
ResourceManager(default)[2611]: 2015/01/28_09:50:06 info: Acquiring resource group: node1 172.31.29.243 httpd
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[2639]:  2015/01/28_09:50:06 INFO:  Resource is stopped
ResourceManager(default)[2611]: 2015/01/28_09:50:06 info: Running /etc/ha.d/resource.d/IPaddr 172.31.29.243 start
IPaddr(IPaddr_172.31.29.243)[2737]:     2015/01/28_09:50:06 INFO: Adding inet address 172.31.29.243/20 with broadcast address 172.31.31.255 to device eth0
IPaddr(IPaddr_172.31.29.243)[2737]:     2015/01/28_09:50:06 INFO: Bringing device eth0 up
IPaddr(IPaddr_172.31.29.243)[2737]:     2015/01/28_09:50:06 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.31.29.243 eth0 172.31.29.243 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[2723]:  2015/01/28_09:50:06 INFO:  Success
Jan 28 09:50:16 node1 heartbeat: [2421]: info: Local Resource acquisition completed. (none)
Jan 28 09:50:16 node1 heartbeat: [2421]: info: local resource transition completed.





On node2 I see this:

tailf /var/log/ha-log
Jan 28 09:27:22 node2 heartbeat: [1646]: info: Configuration validated. Starting heartbeat 3.0.4
Jan 28 09:27:22 node2 heartbeat: [1647]: info: heartbeat: version 3.0.4
Jan 28 09:27:22 node2 heartbeat: [1647]: info: Heartbeat generation: 1422435301
Jan 28 09:27:22 node2 heartbeat: [1647]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Jan 28 09:27:22 node2 heartbeat: [1647]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Jan 28 09:27:22 node2 heartbeat: [1647]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 28 09:27:22 node2 heartbeat: [1647]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 28 09:27:22 node2 heartbeat: [1647]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jan 28 09:27:22 node2 heartbeat: [1647]: info: Local status now set to: 'up'
Jan 28 09:27:22 node2 heartbeat: [1647]: info: Link node2:eth0 up.
Jan 28 09:29:23 node2 heartbeat: [1647]: WARN: node node1: is dead
Jan 28 09:29:23 node2 heartbeat: [1647]: info: Comm_now_up(): updating status to active
Jan 28 09:29:23 node2 heartbeat: [1647]: info: Local status now set to: 'active'
Jan 28 09:29:23 node2 heartbeat: [1647]: WARN: No STONITH device configured.
Jan 28 09:29:23 node2 heartbeat: [1647]: WARN: Shared disks are not protected.
Jan 28 09:29:23 node2 heartbeat: [1647]: info: Resources being acquired from node1.
Jan 28 09:29:23 node2 heartbeat: [1656]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys node2] to acquire.
harc(default)[1655]:    2015/01/28_09:29:23 info: Running /etc/ha.d//rc.d/status status
mach_down(default)[1685]:       2015/01/28_09:29:23 info: Taking over resource group 172.31.29.243
ResourceManager(default)[1712]: 2015/01/28_09:29:23 info: Acquiring resource group: node1 172.31.29.243 httpd
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[1740]:  2015/01/28_09:29:23 INFO:  Resource is stopped
ResourceManager(default)[1712]: 2015/01/28_09:29:23 info: Running /etc/ha.d/resource.d/IPaddr 172.31.29.243 start
IPaddr(IPaddr_172.31.29.243)[1838]:     2015/01/28_09:29:23 INFO: Adding inet address 172.31.29.243/20 with broadcast address 172.31.31.255 to device eth0
IPaddr(IPaddr_172.31.29.243)[1838]:     2015/01/28_09:29:23 INFO: Bringing device eth0 up
IPaddr(IPaddr_172.31.29.243)[1838]:     2015/01/28_09:29:23 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.31.29.243 eth0 172.31.29.243 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[1824]:  2015/01/28_09:29:23 INFO:  Success
mach_down(default)[1685]:       2015/01/28_09:29:23 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[1685]:       2015/01/28_09:29:23 info: mach_down takeover complete for node node1.
Jan 28 09:29:23 node2 heartbeat: [1647]: info: mach_down takeover complete.
Jan 28 09:29:23 node2 heartbeat: [1647]: info: Initial resource acquisition complete (mach_down)
Jan 28 09:29:33 node2 heartbeat: [1647]: info: Local Resource acquisition completed. (none)
Jan 28 09:29:33 node2 heartbeat: [1647]: info: local resource transition completed.
^Z
[1]+  Stopped                 tailf /var/log/ha-log
[root@ip-172-31-29-242 ~]# tailf /var/log/ha-log
IPaddr(IPaddr_172.31.29.243)[1838]:     2015/01/28_09:29:23 INFO: Adding inet address 172.31.29.243/20 with broadcast address 172.31.31.255 to device eth0
IPaddr(IPaddr_172.31.29.243)[1838]:     2015/01/28_09:29:23 INFO: Bringing device eth0 up
IPaddr(IPaddr_172.31.29.243)[1838]:     2015/01/28_09:29:23 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.31.29.243 eth0 172.31.29.243 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[1824]:  2015/01/28_09:29:23 INFO:  Success
mach_down(default)[1685]:       2015/01/28_09:29:23 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[1685]:       2015/01/28_09:29:23 info: mach_down takeover complete for node node1.
Jan 28 09:29:23 node2 heartbeat: [1647]: info: mach_down takeover complete.
Jan 28 09:29:23 node2 heartbeat: [1647]: info: Initial resource acquisition complete (mach_down)
Jan 28 09:29:33 node2 heartbeat: [1647]: info: Local Resource acquisition completed. (none)
Jan 28 09:29:33 node2 heartbeat: [1647]: info: local resource transition completed.
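
Both logs above contain a `WARN: node ...: is dead` line for the peer, which is a quick thing to look for when failover does not behave as expected. A small sketch for pulling those lines out of an ha-log follows. The function name `dead_nodes` is illustrative, and the pattern is taken from the log lines quoted above rather than from heartbeat documentation.

```shell
#!/bin/sh
# dead_nodes
# Reads heartbeat log lines on stdin and prints each node name that a
# "WARN: node NAME: is dead" line reports, once per name.
dead_nodes() {
    sed -n 's/.*WARN: node \([^:]*\): is dead.*/\1/p' | sort -u
}

# Example: dead_nodes < /var/log/ha-log
```

Running it on each node shows at a glance whether the nodes consider each other dead while both are in fact up.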

Comment 61 Tony 2015-01-28 05:21:30 EST

Hi all,

I have an issue: I have configured heartbeat to run on a 2-node httpd cluster. Heartbeat seems to be running when I check the logs, and I see that node1 comes up on the web page, but when I shut down heartbeat so that node2 would take over, failover does not work. This is the log I see on node1:

tailf /var/log/ha-log
Jan 28 09:48:04 node1 heartbeat: [2420]: info: Configuration validated. Starting heartbeat 3.0.4
Jan 28 09:48:04 node1 heartbeat: [2421]: info: heartbeat: version 3.0.4
Jan 28 09:48:04 node1 heartbeat: [2421]: info: Heartbeat generation: 1422435302
Jan 28 09:48:04 node1 heartbeat: [2421]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Jan 28 09:48:04 node1 heartbeat: [2421]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Jan 28 09:48:04 node1 heartbeat: [2421]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 28 09:48:04 node1 heartbeat: [2421]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 28 09:48:04 node1 heartbeat: [2421]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jan 28 09:48:04 node1 heartbeat: [2421]: info: Local status now set to: 'up'
Jan 28 09:48:04 node1 heartbeat: [2421]: info: Link node1:eth0 up.
Jan 28 09:50:05 node1 heartbeat: [2421]: WARN: node node2: is dead
Jan 28 09:50:05 node1 heartbeat: [2421]: info: Comm_now_up(): updating status to active
Jan 28 09:50:05 node1 heartbeat: [2421]: info: Local status now set to: 'active'
Jan 28 09:50:05 node1 heartbeat: [2421]: WARN: No STONITH device configured.
Jan 28 09:50:05 node1 heartbeat: [2421]: WARN: Shared disks are not protected.
Jan 28 09:50:05 node1 heartbeat: [2421]: info: Resources being acquired from node2.
harc(default)[2433]:    2015/01/28_09:50:05 info: Running /etc/ha.d//rc.d/status status
mach_down(default)[2469]:       2015/01/28_09:50:05 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[2469]:       2015/01/28_09:50:05 info: mach_down takeover complete for node node2.
Jan 28 09:50:05 node1 heartbeat: [2421]: info: mach_down takeover complete.
Jan 28 09:50:05 node1 heartbeat: [2421]: info: Initial resource acquisition complete (mach_down)
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[2501]:  2015/01/28_09:50:05 INFO:  Resource is stopped
Jan 28 09:50:05 node1 heartbeat: [2434]: info: Local Resource acquisition completed.
harc(default)[2588]:    2015/01/28_09:50:06 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp(default)[2588]: 2015/01/28_09:50:06 received ip-request-resp 172.31.29.243 OK yes
ResourceManager(default)[2611]: 2015/01/28_09:50:06 info: Acquiring resource group: node1 172.31.29.243 httpd
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[2639]:  2015/01/28_09:50:06 INFO:  Resource is stopped
ResourceManager(default)[2611]: 2015/01/28_09:50:06 info: Running /etc/ha.d/resource.d/IPaddr 172.31.29.243 start
IPaddr(IPaddr_172.31.29.243)[2737]:     2015/01/28_09:50:06 INFO: Adding inet address 172.31.29.243/20 with broadcast address 172.31.31.255 to device eth0
IPaddr(IPaddr_172.31.29.243)[2737]:     2015/01/28_09:50:06 INFO: Bringing device eth0 up
IPaddr(IPaddr_172.31.29.243)[2737]:     2015/01/28_09:50:06 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.31.29.243 eth0 172.31.29.243 auto not_used not_used
            
 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[2723]:  2015/01/28_09:50:06 INFO:  Success 
            
 Jan 28 09:50:16 node1 heartbeat: [2421]: info: Local Resource acquisition completed. (none) 
            
 Jan 28 09:50:16 node1 heartbeat: [2421]: info: local resource transition completed. 
            
 On node2 I see this: 
            
 tailf /var/log/ha-log 
            
 Jan 28 09:27:22 node2 heartbeat: [1646]: info: Configuration validated. Starting heartbeat 3.0.4 
            
 Jan 28 09:27:22 node2 heartbeat: [1647]: info: heartbeat: version 3.0.4 
            
 Jan 28 09:27:22 node2 heartbeat: [1647]: info: Heartbeat generation: 1422435301 
            
 Jan 28 09:27:22 node2 heartbeat: [1647]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0 
            
 Jan 28 09:27:22 node2 heartbeat: [1647]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1 
            
 Jan 28 09:27:22 node2 heartbeat: [1647]: info: G_main_add_TriggerHandler: Added signal manual handler 
            
 Jan 28 09:27:22 node2 heartbeat: [1647]: info: G_main_add_TriggerHandler: Added signal manual handler 
            
 Jan 28 09:27:22 node2 heartbeat: [1647]: info: G_main_add_SignalHandler: Added signal handler for signal 17 
            
 Jan 28 09:27:22 node2 heartbeat: [1647]: info: Local status now set to: 'up' 
            
 Jan 28 09:27:22 node2 heartbeat: [1647]: info: Link node2:eth0 up. 
            
 Jan 28 09:29:23 node2 heartbeat: [1647]: WARN: node node1: is dead 
            
 Jan 28 09:29:23 node2 heartbeat: [1647]: info: Comm_now_up(): updating status to active 
            
 Jan 28 09:29:23 node2 heartbeat: [1647]: info: Local status now set to: 'active' 
            
 Jan 28 09:29:23 node2 heartbeat: [1647]: WARN: No STONITH device configured. 
            
 Jan 28 09:29:23 node2 heartbeat: [1647]: WARN: Shared disks are not protected. 
            
 Jan 28 09:29:23 node2 heartbeat: [1647]: info: Resources being acquired from node1. 
            
 Jan 28 09:29:23 node2 heartbeat: [1656]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys node2] to acquire. 
            
 harc(default)[1655]:    2015/01/28_09:29:23 info: Running /etc/ha.d//rc.d/status status 
            
 mach_down(default)[1685]:       2015/01/28_09:29:23 info: Taking over resource group 172.31.29.243 
            
 ResourceManager(default)[1712]: 2015/01/28_09:29:23 info: Acquiring resource group: node1 172.31.29.243 httpd 
            
 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[1740]:  2015/01/28_09:29:23 INFO:  Resource is stopped 
            
 ResourceManager(default)[1712]: 2015/01/28_09:29:23 info: Running /etc/ha.d/resource.d/IPaddr 172.31.29.243 start 
            
 IPaddr(IPaddr_172.31.29.243)[1838]:     2015/01/28_09:29:23 INFO: Adding inet address 172.31.29.243/20 with broadcast address 172.31.31.255 to device eth0 
            
 IPaddr(IPaddr_172.31.29.243)[1838]:     2015/01/28_09:29:23 INFO: Bringing device eth0 up 
            
 IPaddr(IPaddr_172.31.29.243)[1838]:     2015/01/28_09:29:23 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.31.29.243 eth0 172.31.29.243 auto not_used not_used 
            
 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[1824]:  2015/01/28_09:29:23 INFO:  Success 
            
 mach_down(default)[1685]:       2015/01/28_09:29:23 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired 
            
 mach_down(default)[1685]:       2015/01/28_09:29:23 info: mach_down takeover complete for node node1. 
            
 Jan 28 09:29:23 node2 heartbeat: [1647]: info: mach_down takeover complete. 
            
 Jan 28 09:29:23 node2 heartbeat: [1647]: info: Initial resource acquisition complete (mach_down) 
            
 Jan 28 09:29:33 node2 heartbeat: [1647]: info: Local Resource acquisition completed. (none) 
            
 Jan 28 09:29:33 node2 heartbeat: [1647]: info: local resource transition completed. 
            
 ^Z 
            
 [1]+  Stopped                 tailf /var/log/ha-log 
            
 [root@ip-172-31-29-242 ~]# tailf /var/log/ha-log 
            
 IPaddr(IPaddr_172.31.29.243)[1838]:     2015/01/28_09:29:23 INFO: Adding inet address 172.31.29.243/20 with broadcast address 172.31.31.255 to device eth0 
            
 IPaddr(IPaddr_172.31.29.243)[1838]:     2015/01/28_09:29:23 INFO: Bringing device eth0 up 
            
 IPaddr(IPaddr_172.31.29.243)[1838]:     2015/01/28_09:29:23 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.31.29.243 eth0 172.31.29.243 auto not_used not_used 
            
 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[1824]:  2015/01/28_09:29:23 INFO:  Success 
            
 mach_down(default)[1685]:       2015/01/28_09:29:23 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired 
            
 mach_down(default)[1685]:       2015/01/28_09:29:23 info: mach_down takeover complete for node node1. 
            
 Jan 28 09:29:23 node2 heartbeat: [1647]: info: mach_down takeover complete. 
            
 Jan 28 09:29:23 node2 heartbeat: [1647]: info: Initial resource acquisition complete (mach_down) 
            
 Jan 28 09:29:33 node2 heartbeat: [1647]: info: Local Resource acquisition completed. (none) 
            
 Jan 28 09:29:33 node2 heartbeat: [1647]: info: local resource transition completed.
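
 To compare the two logs side by side, it may help to pull out only the "is dead" warnings and check whether each node reported the other one as dead. The following is a minimal sketch that assumes the ha-log line format shown above; the names find_dead_reports and mutual_dead_pairs are hypothetical helpers for illustration, not part of heartbeat.

```python
import re

# Matches lines such as:
#   Jan 28 09:50:05 node1 heartbeat: [2421]: WARN: node node2: is dead
# Assumption: the syslog-style prefix and message wording seen in the logs above.
DEAD_RE = re.compile(
    r"^\s*(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<reporter>\S+)\s+heartbeat:"
    r"\s+\[\d+\]:\s+WARN:\s+node\s+(?P<peer>\S+?):\s+is dead"
)


def find_dead_reports(lines):
    """Return (timestamp, reporting_node, peer_declared_dead) tuples."""
    reports = []
    for line in lines:
        match = DEAD_RE.match(line)
        if match:
            reports.append(
                (match.group("ts"), match.group("reporter"), match.group("peer"))
            )
    return reports


def mutual_dead_pairs(reports):
    """Return sorted node pairs where each node declared the other dead."""
    seen = {(reporter, peer) for _, reporter, peer in reports}
    pairs = {
        tuple(sorted((reporter, peer)))
        for reporter, peer in seen
        if (peer, reporter) in seen
    }
    return sorted(pairs)


if __name__ == "__main__":
    # Lines copied from the node1 and node2 logs above.
    sample = [
        "Jan 28 09:48:04 node1 heartbeat: [2421]: info: Link node1:eth0 up.",
        "Jan 28 09:50:05 node1 heartbeat: [2421]: WARN: node node2: is dead",
        "Jan 28 09:29:23 node2 heartbeat: [1647]: WARN: node node1: is dead",
    ]
    reports = find_dead_reports(sample)
    for ts, reporter, peer in reports:
        print(f"{ts}: {reporter} declared {peer} dead")
    print("mutual:", mutual_dead_pairs(reports))
```

 In practice the lines of /var/log/ha-log from both nodes would be passed in instead of the sample list. A non-empty result from mutual_dead_pairs only shows that each node logged the other as dead at some point; it does not say why the nodes could not see each other.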