Getting Started with Linux-HA (Heartbeat)

Getting Started

The first thing you'll need is two computers. You need not have identical hardware in both machines (or amount of memory, etc.), but if you did, it would make your life that much easier when a component fails.

Now you have to decide on some of your implementation. Your "cluster" is established via a "heartbeat" between the two computers (nodes) generated by the software package of the same name. However, this heartbeat needs one or more media paths (serial via a null modem cable, Ethernet via a crossover cable, etc.) between the nodes.

At this point, you're actually ready to begin hardware-wise. Of course, since you're looking into HA, you'll most likely want to avoid a single point of failure. In this case, that would be your null modem cable/serial port or network interface card (NIC)/crossover cable. So, you need to decide whether you wish to add a second serial/null modem connection or a second NIC/crossover connection to each node. See Appendix A for instructions on how to build a Cat-5 crossover cable. My heartbeat path setup uses one serial port and one extra NIC because I only had one null modem cable, had an extra NIC on hand and thought it was good to have two medium types for the heartbeats.

Once your hardware is in order, you must install your OS and configure your networking (I used Red Hat). Assuming you have 2 NICs, one should be configured for your "normal" network and the other as a private network between your clustered nodes (via the crossover cable). For an example, we will assume that our cluster will have the following addresses:

Node 1 (linuxha1)   192.168.85.1   (normal 192x net)
                    10.0.0.1       (private 10x net for heartbeat)
Node 2 (linuxha2)   192.168.85.2   (192x)
                    10.0.0.2       (10x)
Note: None of these addresses should be your "cluster address" - the address handled by Heartbeat and failed over between nodes!

Most *nix distributions make this easy during installation. However, if you are having any problems, refer to either the Ethernet HOWTO or the documentation for your distribution. To check your configuration, type:

ifconfig 

This will show your network interfaces and their configuration. You can obtain your network routing information from "netstat -nr".

If it looks good, make sure you can ping between both nodes on all interfaces.
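The ping check can be wrapped in a small helper so that no interface gets forgotten. This is only a sketch: the function name check_peers is made up for illustration, and the addresses in the usage comment are the example ones from above.

```shell
#!/bin/sh
# Ping every address passed as an argument once and report the result.
# Returns non-zero if any address failed to answer.
check_peers() {
    failed=0
    for addr in "$@"; do
        # -c 1: send a single echo request
        if ping -c 1 "$addr" >/dev/null 2>&1; then
            echo "$addr ok"
        else
            echo "$addr UNREACHABLE"
            failed=1
        fi
    done
    return "$failed"
}

# Example, run on linuxha1 to reach linuxha2 over both networks:
#   check_peers 192.168.85.2 10.0.0.2
```

Run it on each node with the other node's addresses, so that both directions are covered.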

Next, if you're using one, you'll need to test your serial connection. On one node, which will be the receiver, type:

cat </dev/ttyS0 

On the other node, type:

echo hello >/dev/ttyS0

You should see the text on the receiver node. If it works, change their roles and try again. If it doesn't, it may be as simple as having the wrong device file. Volker's HA Hardware Guide and the Serial HOWTO are two good resources for troubleshooting your serial connection.

Installing Heartbeat

You can now install the Heartbeat package. If you're reading this, you probably already have it, but in any case it's available at:

There are binary RPMs at the website, or you can build Heartbeat from source. Grab the tarball (or install the source RPM). Untar it into your favorite source directory. From the top of the source tree, type "./ConfigureMe configure", followed by "make" and "make install". If you have problems installing the RPMs found at the website and want a way to make your own, there may be help in the FAQ.

Configuring Heartbeat

Configuring ha.cf

There are three files you will need to configure before starting up Heartbeat. The first is ha.cf. This will be placed in the /etc/ha.d directory that is created after installation. It tells Heartbeat what types of media paths to use and how to configure them. The ha.cf in the source directory contains all the various options you can use. I'll go through it line by line...

serial /dev/ttyS0 
Use a serial heartbeat - if you don't use a serial heartbeat, you must use another medium, such as a bcast (Ethernet) heartbeat. Replace /dev/ttyS0 with the appropriate device file for your required serial heartbeat.
watchdog /dev/watchdog 
Optional. The watchdog function provides a way to have a system that is still minimally functioning, but not providing a heartbeat, reboot itself after a minute of being sick. This could help to avoid a scenario where the machine recovers its heartbeat after being pronounced dead. If that happened and a disk mount failed over, you could have two nodes mounting a disk simultaneously. If you wish to use this feature, then in addition to this line, you will need to load the "softdog" kernel module and create the actual device file. To do this, first type "insmod softdog" to load the module. Then, type "grep misc /proc/devices" and note the number it reports (should be 10). Next, type "cat /proc/misc | grep watchdog" and note that number (should be 130). Now you can create the device file from those two numbers by typing "mknod /dev/watchdog c 10 130".
bcast eth1 
Specifies to use a broadcast heartbeat over the eth1 interface (replace with eth0, eth2, or whatever you use).
keepalive 2 
Sets the time between heartbeats to 2 seconds.
warntime 10 
Time in seconds before issuing a "late heartbeat" warning in the logs.
deadtime 30 
Node is pronounced dead after 30 seconds.
initdead 120 
With some configurations, the network takes some time to start working after a reboot. This is a separate "deadtime" to handle that case. It should be at least twice the normal deadtime.
baud 19200 
Speed at which to run the serial line (bps).
udpport 694 
Use port number 694 for bcast or ucast communication. This is the default, and the official IANA registered port number.
auto_failback on 
Required. For those familiar with Tru64 Unix, Heartbeat acts as if in "favored member" mode. The master listed in the haresources file holds all the resources until a failover, at which time the slave takes over. When auto_failback is set to on, the master will take everything back from the slave once it comes back online. When set to off, this option will prevent the master node from re-acquiring cluster resources after a failover. This option is similar to the obsolete nice_failback option. If you want to upgrade from a cluster which had nice_failback set off to this or later versions, special considerations apply if you want to avoid requiring a flash cut. Please see the FAQ for details on how to deal with this situation.
node linuxha1.linux-ha.org 
Mandatory. Hostname of machine in cluster as described by uname -n.
node linuxha2.linux-ha.org 
Mandatory. Hostname of machine in cluster as described by uname -n.
respawn userid cmd 
Optional: Lists a command to be spawned and monitored. E.g.: To spawn ccm daemons the following line has to be added:
respawn hacluster /usr/lib/heartbeat/ccm 
Tells Heartbeat to spawn the command with the credentials of userid (hacluster, in this example) and to monitor the health of the process, respawning it if it dies. For ipfail, the line would be:
respawn hacluster /usr/lib/heartbeat/ipfail 
NOTE: If the process dies with exit code 100, the process is not respawned.
ping ping1.linux-ha.org ping2.linux-ha.org .... 
Optional: Specify ping nodes. These nodes are not considered as cluster nodes. They are used to check network connectivity for modules like ipfail.
ping_group name ping1.linux-ha.org ping2.linux-ha.org .... 
Optional: Specify a group of ping nodes. These are similar to ping nodes, but if any node in a group is available then the group is considered available. The group name can be any string and is used to uniquely identify the group. Each group must appear on a separate line. Like a ping node, a group is not considered to be a cluster node; it is used only to check network connectivity for modules like ipfail.
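
The timing directives are easy to get out of proportion, so a quick check before starting Heartbeat may help. The following is a minimal sketch, not part of Heartbeat: the function name check_ha_timings is hypothetical, the initdead rule comes from the description above, and the other two comparisons are assumed sanity checks.

```shell
#!/bin/sh
# Check the timing directives of a ha.cf-style file.
# Prints one line per problem and returns non-zero if any were found.
check_ha_timings() {
    awk '
        $1 == "keepalive" { keepalive = $2 }
        $1 == "warntime"  { warntime  = $2 }
        $1 == "deadtime"  { deadtime  = $2 }
        $1 == "initdead"  { initdead  = $2 }
        END {
            bad = 0
            # From the text above: initdead should be at least 2 x deadtime.
            if (initdead != "" && deadtime != "" && initdead + 0 < 2 * deadtime) {
                print "initdead (" initdead ") should be at least twice deadtime (" deadtime ")"
                bad = 1
            }
            # Assumed sanity checks, not stated in the text.
            if (warntime != "" && deadtime != "" && warntime + 0 >= deadtime + 0) {
                print "warntime (" warntime ") should be below deadtime (" deadtime ")"
                bad = 1
            }
            if (keepalive != "" && warntime != "" && keepalive + 0 >= warntime + 0) {
                print "keepalive (" keepalive ") should be below warntime (" warntime ")"
                bad = 1
            }
            exit bad
        }
    ' "$1"
}

# Example:
#   check_ha_timings /etc/ha.d/ha.cf
```

With the sample values above (keepalive 2, warntime 10, deadtime 30, initdead 120) the check prints nothing.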

Configuring haresources

Once you've got your ha.cf set up, you need to configure haresources. This file specifies the services for the cluster and who the default owner is.

Note: This file must be the same on both nodes or BadThingsWillHappen.

For our example, we'll assume the high availability services are Apache and Samba. The IP for the cluster is mandatory, and you must not configure the cluster IP outside of the haresources file! The haresources file will need one line:

linuxha1.linux-ha.org 192.168.85.3 httpd smb 

So, this line dictates that on startup, have linuxha1 serve the IP 192.168.85.3 and start Apache and Samba as well. On shutdown, Heartbeat will first stop smb, then Apache, then give up the IP. This assumes that the command "uname -n" spits out "linuxha1.linux-ha.org" - yours may well produce "linuxha1" and if it does, use that instead!
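
To make the ordering concrete, here is a small sketch that takes one haresources line and prints the resources in the order they would be released on shutdown, that is, the reverse of the order in which they are listed. The function name stop_order is made up for illustration; Heartbeat does this internally.

```shell
#!/bin/sh
# Given one haresources line, print its resources in stop order
# (the reverse of the order in which they are listed).
stop_order() {
    # Split the line into words; the first word is the preferred node.
    set -- $1
    shift
    reversed=""
    for res in "$@"; do
        reversed="$res${reversed:+ $reversed}"
    done
    echo "$reversed"
}

# Example:
#   stop_order "linuxha1.linux-ha.org 192.168.85.3 httpd smb"
# prints: smb httpd 192.168.85.3
```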

Note: httpd and smb are the names of the startup scripts for Apache and Samba, respectively. Heartbeat will look for startup scripts of the same name in the following paths:

/etc/ha.d/resource.d 
/etc/init.d

These scripts must start services via "scriptname start" and stop them via "scriptname stop". So you can use any services as long as they conform to the above standard.

Should you need to pass arguments to a custom script, the format would be:

scriptname::argument 

So, if we added a service "maid" which needed the argument "vacuum", our haresources line would change to the following:

linuxha1 192.168.85.3 httpd smb maid::vacuum 
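
A resource script for the imaginary "maid" service might look like the sketch below. It is hypothetical throughout: the echo lines stand in for whatever really starts and stops the service. Judging by the sample log later in this document ("datadisk drbd0 start"), the argument given after "::" is assumed to arrive before the start/stop action.

```shell
#!/bin/sh
# Hypothetical resource script "maid".
# Called as "maid start", "maid stop" or "maid ARGUMENT start|stop".
maid_main() {
    if [ "$#" -eq 2 ]; then
        task=$1
        action=$2
    else
        task=""
        action=$1
    fi
    case "$action" in
        start) echo "maid: starting${task:+ ($task)}" ;;   # launch the real service here
        stop)  echo "maid: stopping${task:+ ($task)}" ;;   # shut the real service down here
        *)     echo "usage: maid [argument] {start|stop}" >&2
               return 1 ;;
    esac
}

# When installed as /etc/ha.d/resource.d/maid, hand the script's own
# arguments to the function; with no arguments, do nothing.
if [ "$#" -gt 0 ]; then
    maid_main "$@"
fi
```

Keeping the logic in a function makes it easy to try "start" and "stop" by hand before trusting Heartbeat with the script.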

This brings us to some added flexibility with the service IP address. We are actually using a shorthand notation above. The actual line could have read (we've canned the maid):

linuxha1 IPaddr::192.168.85.3 httpd smb 

Where IPaddr is the name of our service script, taking the argument 192.168.85.3. Sure enough, if you look in the directory /etc/ha.d/resource.d, you will find a script called IPaddr. This script will also allow you to manipulate the netmask, broadcast address and base interface of this IP service. To specify a subnet with 32 addresses, you could define the service as (leaving off the IPaddr because we can!):

linuxha1 192.168.85.3/27 httpd smb 

This sets the IP service address to 192.168.85.3, the netmask to 255.255.255.224 and the broadcast address would default to 192.168.85.31 (which is the highest address on the subnet). The last parameter you can set is the broadcast address. To override the default and set it to 192.168.85.16, your entry would read:

linuxha1 192.168.85.3/27/192.168.85.16 httpd smb 
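
The default netmask and broadcast address quoted for the /27 example can be checked with a little shell arithmetic. This is a sketch for verification only (the names to_quad and subnet_info are made up), and it assumes a shell with at least 64-bit integer arithmetic, which is the usual case on current systems.

```shell
#!/bin/sh
# Convert a 32-bit number into dotted-quad notation.
to_quad() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the netmask and the highest address (default broadcast)
# for an argument of the form ADDRESS/PREFIX.
subnet_info() {
    addr=${1%/*}
    prefix=${1#*/}
    # Split the dotted quad into four octets.
    oldifs=$IFS
    IFS=.
    set -- $addr
    IFS=$oldifs
    ip=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    bcast=$(( ($ip & $mask) | (~$mask & 0xFFFFFFFF) ))
    echo "netmask $(to_quad "$mask") broadcast $(to_quad "$bcast")"
}

# Example:
#   subnet_info 192.168.85.3/27
# prints: netmask 255.255.255.224 broadcast 192.168.85.31
```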

You may be wondering whether any of the above is necessary for you. It depends. If you've properly established a net route (independent of Heartbeat) for the service's IP address, with the correct netmask and broadcast address, then no, it's not necessary for you. However, this case won't fit everybody and that's why the option's there! In addition, you may have more than one possible interface that could be used for the service IP. Read on to see how Heartbeat treats this...

Once you straighten out your haresources file, copy ha.cf and haresources to /etc/ha.d and you're ready to start!

Configuring ipfail

The ipfail plugin attempts to provide detection of network failures, and then intelligently react, directing the cluster to failover resources as necessary. In order to accomplish this goal, it uses ping nodes or ping groups which work as "dumb" third parties in the cluster. Provided both HA nodes can communicate with each other, ipfail can reliably detect when one of their network links has become unusable, and compensate.

To configure ipfail, the following steps must be performed.

  1. Select good ping node candidates.

    • It is essential that good strategic ping nodes be selected. The better your choices, the stronger your HA cluster becomes. Choosing solid network devices like switches and routers is a good idea. Do not choose either of the members of the HA cluster. Nor should you select someone's workstation. It is also important to select ping nodes that reflect the connectivity of your HA nodes. If you wish to monitor the connectivity of two interfaces, it is wise to select a ping node for each interface, that is reachable exclusively from said interface. Consult ipfail-diagram.pdf for a graphical representation of this idea.
  2. Set auto_failback to on or off.

    • ipfail will only operate if auto_failback has been set to something other than legacy. In ha.cf, set the auto_failback option to on or off like so:
      • auto_failback on 
        
      or
      • auto_failback off 
        
  3. Configure your ha.cf to start ipfail.

    • Add a line like the following to ha.cf (assuming your compile PREFIX is /usr):
      • respawn hacluster /usr/lib/heartbeat/ipfail 
        
  4. Add the ping nodes to ha.cf.

    • The ping nodes can be added to the cluster by using a line like the following:
      • ping pnode1 pnode2 pnodeN 
        
      Simply replace pnode1, pnode2, ... pnodeN with the IP addresses of your ping nodes.

Ensure that the above configuration directives are added to the ha.cf on both members of the cluster, and that they are identical.

  • NOTE: You will want to check on the availability of the ping nodes prior to using them. If you cannot ping them from both of the HA nodes, they are useless.

Selecting an Interface

One important aspect of configuring the haresources file for a machine which has multiple Ethernet interfaces is to know how Heartbeat selects which interface will wind up supporting the service addresses that are configured in haresources. After all, no interface was specified in the haresources file.

Heartbeat decides which interface will be used by looking at the routing table. It tries to select the lowest cost route to the IP address to be taken over. In the case of a tie, it chooses the first route found. For most configurations this means the default route will be least preferred.

If you don't specify a netmask for the IP address in the haresources file, the netmask associated with the selected route will be used. Similarly, if an interface is not specified, then the virtual IP address will be added to the interface associated with the selected route. If the broadcast address is omitted then the highest address in the subnet is used.
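
To get a rough preview of which interface is likely to be picked, you can ask the kernel which route it would use for the service address. This is an approximation under stated assumptions: "ip route get" (from the iproute2 tools) reports the kernel's own choice, which is not necessarily identical to Heartbeat's lookup, and it is only meaningful before the service address has been brought up on the node. The helper name route_dev is made up.

```shell
#!/bin/sh
# Read "ip route get" output on stdin and print the interface
# name that follows the word "dev".
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Example:
#   ip route get 192.168.85.3 | route_dev
```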

Configuring Authkeys

The third file to configure determines your authentication keys. There are three types of authentication methods available: crc, md5, and sha1. "Well, which should I use?", you ask. Since this document is called "Getting Started", we'll keep it simple......

If your Heartbeat runs over a secure network, such as the crossover cable in our example, you'll want to use crc. This is the cheapest method from a resources perspective. If the network is insecure, but you're either not very paranoid or concerned about minimizing CPU resources, use md5. Finally, if you want the best authentication without regard for CPU resources, use sha1. It's the hardest to crack.

The format of the file is as follows:

auth <number> 
<number> <authmethod> [<authkey>]

So, for sha1, a sample /etc/ha.d/authkeys could be:

auth 1 
1 sha1 key-for-sha1-any-text-you-want

For md5, you could use the same as the above, but replace "sha1" with "md5".

Finally, for crc, a sample might be:

auth 2 
2 crc

Whatever index you put after the keyword auth must be found below in the keys listed in the file. If you put "auth 4", then there must be a "4 signaturetype" line in the list below.

Make sure the file's permissions are safe, like 600. Also, "any text you want" is not quite right: there's a limit to the number of characters you can use. That's it!
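
One way to create the file with safe permissions from the start is sketched below. The function name write_authkeys is hypothetical, and the key text is whatever you choose to pass in.

```shell
#!/bin/sh
# Write a one-key sha1 authkeys file readable only by its owner.
# $1: destination path, $2: key text.
write_authkeys() {
    (
        umask 077   # create the file without group/other access
        printf 'auth 1\n1 sha1 %s\n' "$2" > "$1"
    ) || return 1
    # Also tighten the mode in case the file already existed.
    chmod 600 "$1"
}

# Example (as root):
#   write_authkeys /etc/ha.d/authkeys "key-for-sha1-any-text-you-want"
```

Remember that the same authkeys file has to be present on both nodes.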

Starting and testing Heartbeat

From Red Hat, or other distributions which use /etc/init.d startup files, simply type /etc/init.d/heartbeat start on both nodes. I would recommend starting on the system master (in our example linuxha1) first.

If you want Heartbeat to run on startup, what to do will differ depending on your distribution. You may need to place links to the startup script in the appropriate init level directories, but the RPM versions will do this for you. I have Heartbeat start at its default sequential priority (75, which means it starts after services 74 and lower and before services with priority 76-99), end at its default sequential priority (05), and only care about the 0 (halt), 6 (reboot), 3 (text-only) and 5 (X) run levels.

So, if I had to do it by hand, I'd need to type in the following (as root, of course):

cd /etc/rc.d/rc0.d ; ln -s ../init.d/heartbeat K05heartbeat
cd /etc/rc.d/rc3.d ; ln -s ../init.d/heartbeat S75heartbeat
cd /etc/rc.d/rc5.d ; ln -s ../init.d/heartbeat S75heartbeat
cd /etc/rc.d/rc6.d ; ln -s ../init.d/heartbeat K05heartbeat

The last time I ran Slackware, there was no /etc/rc.d/init.d directory (may have changed by now) and to do the same thing, I would have placed in /etc/rc.d/rc.local:

/etc/ha.d/heartbeat start 

Note: This assumes you copy the file ha.rc to /etc/ha.d/heartbeat. If you can't find /etc/rc.d/init.d with your distribution and you're unsure of how processes start, you can use the rc.local method. But you're on your own for shutdown; I just don't remember...

Note: If you use the watchdog function, you'll need to load its module at bootup as well. You can put the following command at the bottom of the /etc/rc.d/rc.sysinit file:

/sbin/insmod softdog 

For the rc.local method, just put the same line right above where you start Heartbeat.
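
If the watchdog device file still has to be created, the numbers for mknod can be looked up rather than typed from memory. The sketch below only prints the command instead of running it; the function name watchdog_mknod_cmd is made up, and it takes the two files to read as arguments so that it can be tried against copies first.

```shell
#!/bin/sh
# Print (do not run) the mknod command for the watchdog device, using
# the major number of "misc" and the minor number of "watchdog".
# $1: file in /proc/devices format, $2: file in /proc/misc format.
watchdog_mknod_cmd() {
    major=$(awk '$2 == "misc" { print $1; exit }' "$1")
    minor=$(awk '$2 == "watchdog" { print $1; exit }' "$2")
    if [ -z "$major" ] || [ -z "$minor" ]; then
        echo "watchdog not registered; is the softdog module loaded?" >&2
        return 1
    fi
    echo "mknod /dev/watchdog c $major $minor"
}

# Typical use, as root, after "insmod softdog":
#   watchdog_mknod_cmd /proc/devices /proc/misc
```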

Once you've started Heartbeat, take a peek at your log file (default is /var/log/ha-log) before testing it. If all is peachy, the service owner's log (linuxha1 in our example) should look something like this:

heartbeat: 2003/02/10_13:52:22 info: Neither logfile nor logfacility found. 
heartbeat: 2003/02/10_13:52:22 info: Logging defaulting to /var/log/ha-log
heartbeat: 2003/02/10_13:52:22 info: **************************
heartbeat: 2003/02/10_13:52:22 info: Configuration validated. Starting heartbeat 0.4.9f
heartbeat: 2003/02/10_13:52:22 info: nice_failback is in effect.
heartbeat: 2003/02/10_13:52:22 info: heartbeat: version 0.4.9f
heartbeat: 2003/02/10_13:52:22 info: Heartbeat generation: 17
heartbeat: 2003/02/10_13:52:22 info: Starting serial heartbeat on tty /dev/ttyS0 (19200 baud)
heartbeat: 2003/02/10_13:52:22 info: UDP Broadcast heartbeat started on port 694 (694) interface eth1
heartbeat: 2003/02/10_13:52:23 info: pid 28140 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28137 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28139 locked in memory.
heartbeat: 2003/02/10_13:52:23 notice: Using watchdog device: /dev/watchdog
heartbeat: 2003/02/10_13:52:23 info: pid 28141 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: Local status now set to: 'up'
heartbeat: 2003/02/10_13:52:23 info: pid 28138 locked in memory.
heartbeat: 2003/02/10_13:52:23 info: pid 28134 locked in memory.
heartbeat: 2003/02/10_13:52:25 info: Link linuxha1.linux-ha.org:eth1 up.
heartbeat: 2003/02/10_13:53:23 WARN: node linuxha2.linux-ha.org: is dead
heartbeat: 2003/02/10_13:53:23 info: Dead node linuxha2.linux-ha.org held no resources.
heartbeat: 2003/02/10_13:53:23 info: Resources being acquired from linuxha2.linux-ha.org.
heartbeat: 2003/02/10_13:53:23 info: Local status now set to: 'active'
heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:53:23 info: /usr/lib/heartbeat/mach_down: nice_failback: acquiring foreign resources
heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete.
heartbeat: 2003/02/10_13:53:23 info: mach_down takeover complete for node linuxha2.linux-ha.org.
heartbeat: 2003/02/10_13:53:23 info: Acquiring resource group: linuxha1.linux-ha.org 192.168.85.3 datadisk::drbd0 datadisk::drbd1 mirror
heartbeat: 2003/02/10_13:53:23 info: Running /etc/ha.d/resource.d/IPaddr 192.168.85.3 start
heartbeat: 2003/02/10_13:53:23 info: /sbin/ifconfig eth0:0 192.168.85.3 netmask 255.255.255.0  broadcast 192.168.85.255
heartbeat: 2003/02/10_13:53:23 info: Sending Gratuitous Arp for 192.168.85.3 on eth0:0 [eth0]
heartbeat: 2003/02/10_13:53:23 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:24 info: Running /etc/ha.d/resource.d/datadisk drbd0 start
heartbeat: 2003/02/10_13:53:24 info: Running /etc/ha.d/resource.d/datadisk drbd1 start
heartbeat: 2003/02/10_13:53:25 info: Running /etc/ha.d/resource.d/mirror  start
heartbeat: 2003/02/10_13:53:25 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:26 info: Resource acquisition completed.
heartbeat: 2003/02/10_13:53:28 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:30 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:32 /usr/lib/heartbeat/send_arp eth0 192.168.85.3 00304823BD48 192.168.85.3 ffffffffffff
heartbeat: 2003/02/10_13:53:33 info: Local Resource acquisition completed. (none)
heartbeat: 2003/02/10_13:53:33 info: local resource transition completed.
heartbeat: 2003/02/10_13:56:30 info: Link linuxha2.linux-ha.org:eth1 up.
heartbeat: 2003/02/10_13:56:30 info: Status update for node linuxha2.linux-ha.org: status up
heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:56:30 info: Status update for node linuxha2.linux-ha.org: status active
heartbeat: 2003/02/10_13:56:30 info: remote resource transition completed.
heartbeat: 2003/02/10_13:56:30 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2003/02/10_13:56:31 info: Link linuxha2.linux-ha.org:/dev/ttyS0 up.

NOTE: Your log may differ depending on when you started Heartbeat on linuxha2! I started Heartbeat on linuxha2 at 13:56:30.

OK, now try to ping your cluster's IP (192.168.85.3 in the example). If this works, ssh to it and verify you're on linuxha1. Next, make sure your services are tied to the .3 address. Bring up Netscape and type in 192.168.85.3 for the URL. For Samba, try to map the drive "//192.168.85.3/test" assuming you set up a share called "test". See Samba docs to get that going. As an aside, however, you'll want to use the "netbios name" parameter to have your Samba share listed under the cluster name and not the hostname of your cluster member!

NOTE: If you can't bring up the service IP address and you get ha-log entries similar to this:

SIOCSIFADDR: No such device
SIOCSIFFLAGS: No such device
SIOCSIFNETMASK: No such device
SIOCSIFBRDADDR: No such device
SIOCSIFFLAGS: No such device
SIOCADDRT: No such device

  • It may mean that you need to enable IP aliasing in your kernel build. Check /usr/src/linux/.config for "CONFIG_IP_ALIAS=y". If you don't have it, you'll have the line "CONFIG_IP_ALIAS is not set". Rebuild your kernel with IP aliasing enabled.

If this all works, you've got availability. Now let's see if we have High Availability :-)

Take down linuxha1. Kill power, kill Heartbeat, whatever you have the stomach for, but don't just yank both the serial and eth1 heartbeat cables. If you do that, you'll have services running on both nodes and, when you re-connect the heartbeat, a bit of chaos. Now ping the cluster IP. Approximately 5-10 seconds later it should start responding again. Log in again and verify you're on linuxha2. If it happens but takes more like 30 seconds, something is wrong.

If you get this far, it's probably working, but you should probably check all your heartbeats, too. First, check your serial heartbeat. Unplug the crossover cable from your eth1 NIC that you're using for your bcast heartbeat. Wait about 10 seconds. Now, look at /var/log/ha-log on linuxha2 and make sure there's no line like this:

1999/08/16_12:40:58 node linuxha1.linux-ha.org: is dead 
If you get that, your serial heartbeat isn't working and your second node is taking over. To avoid any problems, shut down Heartbeat on the first node, then test your null modem cable. Run the above serial tests again.

If your log is clean, great. Re-connect the crossover cable. Once that's done, disconnect the serial cable, wait 10 seconds and check the linuxha2 log again. If it's clean, congrats! If not, you can check /var/log/ha-log and /var/log/ha-debug for more clues.
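
Scanning the log by eye gets tedious when you repeat these tests, so a small filter can help. This sketch (the function name dead_nodes is made up) lists every node that the given log has declared dead, based on the "is dead" message format shown above.

```shell
#!/bin/sh
# List the nodes a Heartbeat log has declared dead, one name per line.
# $1: path to the log file.
dead_nodes() {
    sed -n 's/.*node \([^ :]*\): is dead.*/\1/p' "$1" | sort -u
}

# Example, on linuxha2 after unplugging one heartbeat path:
#   dead_nodes /var/log/ha-log
```

Keep in mind that the log accumulates, so entries from earlier takeover tests will show up as well; compare timestamps before drawing conclusions.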

Appendix A - Ethernet Crossover Cable Construction

Your cable diagram should be as follows:

Connector A   Connector B
Pin #         Pin #
1             3
2             6
3             1
4             2
5             7
6             8
7             4
8             5