ClusterXL and State Synchronization

The health of ClusterXL can be examined using a number of different commands:

cphaprob -a if

cphaprob state

cphaprob list

cpstat ha -f all | more

fw ctl pstat

Use the ‘cphaprob -a if’ command on the cluster members to check which interfaces have been configured for state synchronization and verify the sync mode is consistent on the cluster members:

Example output:

[Expert@Zulu]# cphaprob -a if

eth1c0 non sync(non secured)

eth2c0 non sync(non secured)

eth3c0 non sync(non secured)

eth4c0 sync(secured), multicast

Virtual cluster interfaces: 3

eth1c0 192.168.1.1

eth2c0 192.168.2.1

eth3c0 10.1.1.1

[Expert@Zulu]#

[Expert@Shaka]# cphaprob -a if

eth1c0 non sync(non secured)

eth2c0 non sync(non secured)

eth3c0 non sync(non secured)

eth4c0 sync(secured), broadcast

Virtual cluster interfaces: 3

eth1c0 192.168.1.1

eth2c0 192.168.2.1

eth3c0 10.1.1.1

[Expert@Shaka]#

In the above example, interface eth4c0 has been configured on both cluster members for state sync, but the sync mode is inconsistent: one member is using multicast and the other broadcast mode. Ensure the cluster members use the same mode. (The default mode is multicast.)

The following document explains how to change between broadcast and multicast mode:

sk20576: How to set ClusterXL Control Protocol (CCP) in broadcast mode in ClusterXL
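
The sync mode comparison described above can be scripted once the output of ‘cphaprob -a if’ has been captured from each member. The following is a minimal Python sketch, not a Check Point tool. The function names and the way the output is collected are assumptions, and the parsing only covers the line layout shown in the example output.

```python
import re

# Matches lines such as "eth4c0 sync(secured), multicast" from the example
# output. Lines marked "non sync(non secured)" do not match, because the
# interface name must be followed directly by "sync(".
SYNC_LINE = re.compile(r"^(\S+)\s+sync\(secured\),\s*(\w+)\s*$")


def sync_modes(output):
    """Return {interface: mode} for every sync interface in the captured text."""
    modes = {}
    for line in output.splitlines():
        match = SYNC_LINE.match(line.strip())
        if match:
            modes[match.group(1)] = match.group(2).lower()
    return modes


def mode_mismatches(member_outputs):
    """Return {interface: {member: mode}} where the members disagree."""
    per_interface = {}
    for member, output in member_outputs.items():
        for interface, mode in sync_modes(output).items():
            per_interface.setdefault(interface, {})[member] = mode
    return {
        interface: modes
        for interface, modes in per_interface.items()
        if len(set(modes.values())) > 1
    }


# Shortened copies of the example output from the two members.
ZULU = """\
eth1c0 non sync(non secured)
eth4c0 sync(secured), multicast
"""
SHAKA = """\
eth1c0 non sync(non secured)
eth4c0 sync(secured), broadcast
"""

print(mode_mismatches({"Zulu": ZULU, "Shaka": SHAKA}))
```

For the two captures above, the result flags eth4c0 with multicast on Zulu and broadcast on Shaka. An empty result means every sync interface reports the same mode on all members that were compared.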

Use the ‘cphaprob state’ command to check if state sync is up and running. The local and remote state synchronization IP addresses should be displayed and their state should be shown as ‘Active’ on the HA Master and ‘Standby’ on the HA Backup. In a load-sharing cluster the state should be shown as ‘Active’ on both the local and remote firewalls:

Example output - HA:

[Expert@Zulu]# cphaprob state

Cluster Mode: New High Availability (Active Up)

Number Unique Address Assigned Load State

1 (local) 1.1.1.1 100% Active

2 1.1.1.2 0% Standby

[Expert@Zulu]#

In an HA cluster configuration (above), one member should be Active and the other Standby.

Example output – Load-Sharing:

[Expert@Dingaan]# cphaprob state

Cluster Mode: New High Availability (Active Up)

Number Unique Address Assigned Load State

1 (local) 1.1.1.3 50% Active

2 1.1.1.4 50% Active

[Expert@Dingaan]#

In a load-sharing cluster configuration (above), both members should be shown as Active.
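
The expected states for HA and load-sharing clusters can be checked automatically from captured ‘cphaprob state’ output. The sketch below is an illustration only: the function names are hypothetical, the cluster type is passed in by the caller rather than read from the ‘Cluster Mode’ line, and the regular expression assumes the row layout shown in the examples above.

```python
import re

# One member row: number, optional "(local)" marker, address, load, state.
ROW = re.compile(
    r"^(\d+)\s+(\(local\)\s+)?(\d+\.\d+\.\d+\.\d+)\s+(\d+)%\s+(.+?)\s*$"
)


def parse_members(output):
    """Return one dict per member row found in the captured text."""
    members = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            members.append({
                "number": int(match.group(1)),
                "local": match.group(2) is not None,
                "address": match.group(3),
                "load": int(match.group(4)),
                "state": match.group(5),
            })
    return members


def states_as_expected(members, load_sharing):
    """HA: exactly one Active, the rest Standby. Load sharing: all Active."""
    states = [m["state"] for m in members]
    if len(states) < 2:
        return False  # the partner is not listed at all
    if load_sharing:
        return all(s == "Active" for s in states)
    return states.count("Active") == 1 and all(
        s in ("Active", "Standby") for s in states
    )


HA_OUTPUT = """\
Number Unique Address Assigned Load State
1 (local) 1.1.1.1 100% Active
2 1.1.1.2 0% Standby
"""
LS_OUTPUT = """\
Number Unique Address Assigned Load State
1 (local) 1.1.1.3 50% Active
2 1.1.1.4 50% Active
"""

print(states_as_expected(parse_members(HA_OUTPUT), load_sharing=False))
print(states_as_expected(parse_members(LS_OUTPUT), load_sharing=True))
```

Both example outputs pass their respective check. Any other state, or a member list with fewer than two rows, makes the check fail and warrants a closer look at the raw output.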

Example output – HA or Load-Sharing:

[Expert@Zulu]# cphaprob state

Cluster Mode: New High Availability (Active Up)

Number Unique Address Assigned Load State

1 (local) 1.1.1.1 100% Active

[Expert@Zulu]#

Remote cluster partner is missing!

If the remote partner is not shown, it is usually due to one of the following:

  There is no network connectivity between the members of the cluster on the state sync network

  The partner does not have state synchronization enabled

  One partner is using broadcast mode and the other is using multicast mode

  One of the monitored processes has an issue, such as no policy loaded

  The partner firewall is down

Example output - HA or Load-Sharing:

[Expert@Zulu]# cphaprob state

Cluster Mode: New High Availability (Active Up)

Number Unique Address Assigned Load State

1 (local) 1.1.1.1 100% Active

2 1.1.1.2 0% Ready

[Expert@Zulu]#

Partner is in the ‘Ready’ state. If one of the partners is in the ‘Ready’ state it indicates that there is an issue with state synchronization.

The ‘Ready’ state is normally caused by another member of the cluster running a higher version of code or HFA, for example, as would happen during an upgrade. This state is also seen when CoreXL has been configured to use a different number of cores on the individual cluster members. For further information see:

sk42096: Cluster member with CoreXL is in 'Ready' state

The ‘Ready’ state can also occur if a cluster member receives state synchronization traffic from a different cluster that is using the same MAC magic number and the other cluster is running a higher version of code. For further information see:

sk36913: Connecting several clusters on the same network

Example output - HA or Load-Sharing:

[Expert@Zulu]# cphaprob state

Cluster Mode: New High Availability (Active Up)

Number Unique Address Assigned Load State

1 (local) 1.1.1.1 100% Active

2 1.1.1.2 0% Down

[Expert@Zulu]#

A remote cluster member in the ‘Down’ state indicates that there is either a problem on the remote member or that the state synchronization network between the cluster members is broken.
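
The three problem cases described above (partner missing, partner ‘Ready’, partner ‘Down’) can be mapped to their likely causes from captured ‘cphaprob state’ output. This is a rough sketch under stated assumptions: the names are hypothetical, the hint texts only summarize this section, and the row pattern assumes the layout of the example output. Output that contains no recognizable remote row at all, including empty output, is reported as a missing partner.

```python
import re

# Member row; group 1 is the "(local)" marker, group 2 is the state.
ROW = re.compile(r"^\d+\s+(\(local\)\s+)?\S+\s+\d+%\s+(\S.*?)\s*$")

HINTS = {
    "missing": (
        "Remote partner not listed: check sync network connectivity, that "
        "state synchronization is enabled on the partner, that both members "
        "use the same broadcast/multicast mode, that the monitored processes "
        "are OK (for example, a policy is loaded), and that the partner is up."
    ),
    "Ready": (
        "Partner is Ready: check for a version or HFA mismatch, a different "
        "number of CoreXL cores, or another cluster using the same MAC magic "
        "number (sk42096, sk36913)."
    ),
    "Down": (
        "Partner is Down: check the remote member and the state "
        "synchronization network."
    ),
}


def diagnose(output):
    """Return a list of hints for the remote members in the captured text."""
    remote_states = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match and match.group(1) is None:  # skip the "(local)" row
            remote_states.append(match.group(2))
    if not remote_states:
        return [HINTS["missing"]]
    return [HINTS[state] for state in remote_states if state in HINTS]


ONLY_LOCAL = """\
Number Unique Address Assigned Load State
1 (local) 1.1.1.1 100% Active
"""
PARTNER_READY = """\
Number Unique Address Assigned Load State
1 (local) 1.1.1.1 100% Active
2 1.1.1.2 0% Ready
"""

for hint in diagnose(ONLY_LOCAL) + diagnose(PARTNER_READY):
    print(hint)
```

An empty list means no remote member is in one of the three problem conditions covered here. It does not prove the cluster is healthy.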

To investigate why a member shows itself to be locally ‘Down’, use the ‘cpstat ha -f all | more’ command on the firewall that shows ‘Down’. This command displays the Problem Notification Table and the state of health of the monitored processes:

Example output (truncated):

[Expert@Zulu]# cpstat ha -f all | more

Problem Notification table

-------------------------------------------------

|Name |Status |Priority|Verified|Descr|

-------------------------------------------------

|Synchronization|OK | 0| 3383| |

|Filter |OK | 0| 3383| |

|cphad |OK | 0| 0| |

|fwd |OK | 0| 0| |

-------------------------------------------------

All monitored processes have the ‘OK’ status.

Example output (truncated):

[Expert@Shaka]# cpstat ha -f all | more

Problem Notification table

-------------------------------------------------

|Name |Status |Priority|Verified|Descr|

-------------------------------------------------

|Synchronization|problem| 0| 3383| |

|Filter |problem| 0| 3383| |

|cphad |OK | 0| 0| |

|fwd |OK | 0| 0| |

-------------------------------------------------

State synchronization is in a problem state because the policy is unloaded on this cluster member. Installing the policy will fix this issue.
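
The Problem Notification table is pipe-delimited, so the rows that are not ‘OK’ can be picked out of a captured copy of the output with a few lines of Python. This is a sketch only. The function name is hypothetical and the parsing assumes the five-column layout shown in the truncated examples above.

```python
def problem_devices(output):
    """Return (name, status) for every table row whose status is not OK."""
    failing = []
    for line in output.splitlines():
        line = line.strip()
        # Table rows start and end with a pipe; separator lines do not.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if len(cells) < 2 or cells[0] == "Name":
            continue  # skip the header row and anything malformed
        name, status = cells[0], cells[1]
        if status.upper() != "OK":
            failing.append((name, status))
    return failing


# Copy of the second example table (member Shaka).
SHAKA_TABLE = """\
Problem Notification table
-------------------------------------------------
|Name |Status |Priority|Verified|Descr|
-------------------------------------------------
|Synchronization|problem| 0| 3383| |
|Filter |problem| 0| 3383| |
|cphad |OK | 0| 0| |
|fwd |OK | 0| 0| |
-------------------------------------------------
"""

for name, status in problem_devices(SHAKA_TABLE):
    print(name, status)
```

For the Shaka example this reports Synchronization and Filter, which matches the unloaded-policy case described above.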

Alternatively, the ‘cphaprob list’ command displays the same information plus some additional details:

Example output:

[Expert@Zulu]# cphaprob list

Registered Devices:

Device Name: Synchronization

Registration number: 0

Timeout: none

Current state: OK

Time since last report: 12139.6 sec

Device Name: Filter

Registration number: 1

Timeout: none

Current state: OK

Time since last report: 12124.5 sec

Device Name: cphad

Registration number: 2

Timeout: 5 sec

Current state: OK

Time since last report: 0.6 sec

Device Name: fwd

Registration number: 3

Timeout: 5 sec

Current state: OK

Time since last report: 0.6 sec

All monitored processes are shown as ‘OK’.

Assuming that state synchronization on the cluster is healthy, use the following command to check if the state tables are synchronized:

fw tab -t connections -s

Run the command on both cluster members at the same time and compare the #VALS values. If the state synchronization mechanism is working, the values on both firewalls should be similar, unless a lot of delayed notification is in use.

Example output:

[Expert@Zulu]# fw tab -t connections -s

HOST NAME ID #VALS #PEAK #SLINKS

localhost connections 8158 3222 38026 9820

[Expert@Zulu]#

[Expert@Shaka]# fw tab -t connections -s

HOST NAME ID #VALS #PEAK #SLINKS

localhost connections 8158 3187 38026 9808

[Expert@Shaka]#

The #PEAK may be different depending on the uptime and when the last peak number of connections occurred.

The #VALS on an HA pair should always be similar.
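
The #VALS comparison can be done on captured output from the two members. The sketch below is illustrative: the function names are hypothetical, and the 10% tolerance is an arbitrary assumption, since this document only says the values should be ‘similar’. Adjust the tolerance to what is normal for the cluster in question.

```python
def table_summary(output):
    """Return the summary row of 'fw tab -t connections -s' as a dict."""
    lines = [line.split() for line in output.splitlines() if line.strip()]
    # The data row is the line directly after the header that names #VALS.
    for header, row in zip(lines, lines[1:]):
        if "#VALS" in header and len(row) == len(header):
            return dict(zip(header, row))
    raise ValueError("no #VALS header with a matching data row found")


def vals_similar(output_a, output_b, tolerance=0.10):
    """True if the two #VALS counts differ by at most `tolerance` (relative)."""
    a = int(table_summary(output_a)["#VALS"])
    b = int(table_summary(output_b)["#VALS"])
    largest = max(a, b)
    if largest == 0:
        return True  # both tables are empty
    return abs(a - b) / largest <= tolerance


ZULU = """\
HOST NAME ID #VALS #PEAK #SLINKS
localhost connections 8158 3222 38026 9820
"""
SHAKA = """\
HOST NAME ID #VALS #PEAK #SLINKS
localhost connections 8158 3187 38026 9808
"""

print(vals_similar(ZULU, SHAKA))
```

With the example values 3222 and 3187 the relative difference is about 1%, so the check passes under the assumed tolerance.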

Examine the output of the sync section of ‘fw ctl pstat’.

Example output:

Sync: Version: new

Status: Able to Send/Receive sync packets

Sync packets sent:

total : 13880231, retransmitted : 5, retrans reqs : 524, acks : 70

Sync packets received:

total : 692409645, were queued : 720, dropped by net : 517

retrans reqs : 5, received 43019 acks

retrans reqs for illegal seq : 0

dropped updates as a result of sync overload: 0

Callback statistics: handled 42940 cb, average delay : 1, max delay : 4

If the ‘dropped by net’ counter has incremented then some sync packets have been lost and the problem needs to be investigated to find the cause.
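
Whether the counter has incremented can only be judged by comparing two samples taken some time apart. The following sketch extracts the ‘dropped by net’ value from two captured copies of the ‘fw ctl pstat’ output and reports the difference. The function names are hypothetical, and the second sample value (530) is invented purely for illustration.

```python
import re

# Matches "dropped by net : 517" in the Sync section of the output.
DROPPED = re.compile(r"dropped by net\s*:\s*(\d+)")


def dropped_by_net(output):
    """Return the 'dropped by net' counter from captured fw ctl pstat text."""
    match = DROPPED.search(output)
    if match is None:
        raise ValueError("'dropped by net' counter not found")
    return int(match.group(1))


def lost_since(previous_output, current_output):
    """Return how far the counter moved between two samples.

    A negative result suggests the counters were reset between the samples
    (for example by a reboot), so the comparison is not meaningful.
    """
    return dropped_by_net(current_output) - dropped_by_net(previous_output)


FIRST_SAMPLE = """\
Sync packets received:
total : 692409645, were queued : 720, dropped by net : 517
"""
# Hypothetical later sample; the value 530 is made up for this example.
SECOND_SAMPLE = """\
Sync packets received:
total : 692500000, were queued : 720, dropped by net : 530
"""

print(lost_since(FIRST_SAMPLE, SECOND_SAMPLE))
```

A result of zero over a representative interval suggests no sync packets are currently being lost on the network. A positive result means the loss is ongoing and should be investigated.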

For further information please refer to:

sk34476: Explanation of Sync section in the output of fw ctl pstat command