Troubleshooting VMware High Availability (HA)

Details
You are experiencing:
  • VMware High Availability (HA) failover errors:

    HA agent on <server> in cluster <cluster> in <datacenter> has an error


    Insufficient resources to satisfy HA failover level on cluster

  • HA agent configuration errors on ESX hosts:

    Failed to connect to host

    OR

    Failed to install the VirtualCenter agent

    OR

    cmd addnode failed for primary node: Internal AAM Error - agent could not start

    OR

    cmd addnode failed for primary node:/opt/vmware/aam/bin/ft_startup failed

  • Configuration of hosts IP address is inconsistent on host <hostname> address resolved to <IP> and <IP>
  • Port errors:

    Ports not freed after stop_ftbb

This article provides you with steps to:
  • Troubleshoot an ESX Server that cannot be added to a VMware HA cluster
  • Troubleshoot HA configuration errors that are reported on the cluster
  • Address insufficient resources messages on your HA cluster
  • Correct the following error:

    gethostbyname error:2
Solution
This article guides you through troubleshooting a VMware HA cluster. It identifies common configuration problems and helps you confirm that the required resources are available on your ESX Server.
 
Validate that each troubleshooting step below is true for your environment. Each step provides instructions or a link to a document so that you can eliminate possible causes and take corrective action as necessary. The steps are ordered to isolate the issue and identify the proper resolution. Do not skip a step.
 
Note: If you perform a corrective action in any of the following steps, attempt to Reconfigure for VMware HA again.
  1. Check the release notes for current releases to see whether the problem has already been resolved in a bug fix. Release notes are kept with the downloads.

    For vSphere 4
    http://www.vmware.com/download/vsphere/

    For VI 3
    http://www.vmware.com/download/vi/

  2. Verify that there are enough licenses to configure VMware HA. For more information, see Verifying that the feature is licensed (1003692).

  3. Verify that name resolution is correctly configured on the ESX Server. For more information, see Identifying issues with and setting up name resolution on ESX Server (1003735).

  4. Verify that name resolution is correctly configured on the vCenter Server. For more information, see Configuring name resolution for VMware VirtualCenter (1003713).

  5. Run the date command on each ESX Server to verify that the time is correct on all hosts. For more information on setting up time synchronization with ESX Server, see Installing and Configuring NTP on VMware ESX Server (1339).

  6. Verify that network connectivity exists from the VirtualCenter Server to the ESX Server. For more information, see Testing network connectivity with the Ping command (1003486).

  7. Verify that network connectivity exists from the ESX Server to the isolation response address. For more information, see Testing network connectivity with the Ping command (1003486).

  8. Verify that all of the required network ports are open. For more information, see Testing port connectivity with the Telnet command (1003487).

  9. Determine whether there is a cluster resource issue. For more information, see Advanced Configuration options for VMware High Availability (1006421).

  10. Verify that the correct version of the VirtualCenter agent service is installed. For more information, see Verifying and reinstalling the correct version of VMware VirtualCenter Server agent (1003714).

  11. Verify the VirtualCenter Server Service has been restarted. To restart the VirtualCenter Server Service, see Stopping, starting, or restarting the vCenter Server service (1003895).

  12. Verify that VMware HA is attempting to configure on only one Service Console. For more information, see VMware High Availability configuration issues when an iSCSI Service Console is on the same network (1003789).

  13. Verify that the VMware HA cluster is not corrupted. To do this, create another cluster as a test. For more information, see Recreating VMware High Availability Cluster (1003715).
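
The name resolution checks (steps 3 and 4) and the port check (step 8) can also be scripted from a workstation that uses the same DNS servers as your hosts. The following is a minimal Python sketch, not a VMware tool: the function names are hypothetical, and it assumes that a host name resolving to more than one IPv4 address is worth investigating, as in the "address resolved to <IP> and <IP>" message. It does not replace the procedures in the linked articles, and the ports to test must be taken from your own environment's documentation.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.gethostbyname_ex):
    """Return the sorted, de-duplicated IPv4 addresses a name resolves to.

    An empty list means the lookup failed, which is the situation behind
    a gethostbyname error.
    """
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return []
    return sorted(set(addresses))


def resolution_problem(hostname, resolver=socket.gethostbyname_ex):
    """Return a short description of a resolution problem, or None."""
    addresses = resolve_ipv4(hostname, resolver)
    if not addresses:
        return f"{hostname}: name does not resolve"
    if len(addresses) > 1:
        # Assumption: one address per host name is expected.
        return f"{hostname}: resolves to several addresses: {', '.join(addresses)}"
    return None


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolved
        return False
```

To use the sketch, call resolution_problem() once with the short name and once with the fully qualified name of each ESX Server and of the vCenter Server; a return value of None means the name resolved to exactly one address. Call tcp_port_open() with each host and each port that must be reachable. A result of False only shows that a TCP connection could not be made from the machine running the script, so confirm it from the host itself before changing any firewall configuration.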
Keywords
vmware high availability, vmware ha, ha