Enabling HDFS HA Using Cloudera Manager
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
You can use Cloudera Manager to configure your CDH 4 or CDH 5 cluster for HDFS HA and automatic failover. In Cloudera Manager 5, HA is implemented using Quorum-based storage. Quorum-based storage relies upon a set of JournalNodes, each of which maintains a local edits directory that logs the modifications to the namespace metadata. Enabling HA enables automatic failover as part of the same command.
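To make the quorum idea concrete, the following minimal Python sketch shows the majority arithmetic usually associated with quorum-based storage: an edit counts as durable once a majority of JournalNodes has acknowledged it, so a set of N JournalNodes can lose (N - 1) / 2 of them, rounded down, and keep working. The function names are hypothetical and are not part of CDH or Cloudera Manager.

```python
def quorum_size(journal_nodes: int) -> int:
    """Smallest number of JournalNodes that forms a majority."""
    if journal_nodes < 1:
        raise ValueError("at least one JournalNode is required")
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes: int) -> int:
    """How many JournalNodes can be down while a majority is still reachable."""
    return journal_nodes - quorum_size(journal_nodes)


def edit_is_committed(acks: int, journal_nodes: int) -> bool:
    """An edit counts as committed once a majority has acknowledged it."""
    return acks >= quorum_size(journal_nodes)


if __name__ == "__main__":
    # Columns: JournalNodes, majority size, tolerated failures.
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under these assumptions, four JournalNodes tolerate no more failures than three, which is one way to read the recommendation to use an odd number of JournalNodes.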
Important:
- Enabling or disabling HA causes the previous monitoring history to become unavailable.
- The following parameters are set automatically once you have enabled JobTracker HA. To change any of them from its default value, use an advanced configuration snippet:
- mapred.jobtracker.restart.recover: true
- mapred.job.tracker.persist.jobstatus.active: true
- mapred.ha.automatic-failover.enabled: true
- mapred.ha.fencing.methods: shell(/bin/true)
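Advanced configuration snippets for Hadoop services are typically written as XML property elements. As a rough illustration, the Python sketch below renders the parameters listed above in that form. The helper name `to_snippet` is hypothetical, and which snippet field the output belongs in depends on your Cloudera Manager version, so treat this as a sketch rather than the documented procedure.

```python
from xml.sax.saxutils import escape

# Values that are set automatically, as listed above.
JOBTRACKER_HA_DEFAULTS = {
    "mapred.jobtracker.restart.recover": "true",
    "mapred.job.tracker.persist.jobstatus.active": "true",
    "mapred.ha.automatic-failover.enabled": "true",
    "mapred.ha.fencing.methods": "shell(/bin/true)",
}


def to_snippet(properties: dict) -> str:
    """Render name/value pairs as Hadoop-style <property> elements."""
    blocks = []
    for name, value in properties.items():
        blocks.append(
            "<property>\n"
            f"  <name>{escape(name)}</name>\n"
            f"  <value>{escape(value)}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Print the automatically set values in snippet form.
    print(to_snippet(JOBTRACKER_HA_DEFAULTS))
```

Editing a value in the dictionary before rendering gives the XML for an override of that parameter.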
Enabling High Availability and Automatic Failover
The Enable High Availability workflow leads you through adding a second (standby) NameNode and configuring JournalNodes. During the workflow, Cloudera Manager creates a federated namespace.
- Perform all the configuration and setup tasks described under Configuring Hardware for HDFS HA.
- Ensure that you have a ZooKeeper service.
- Go to the HDFS service.
- Select
- Specify a name for the nameservice or accept the default name nameservice1 and click Continue.
- In the NameNode Hosts field, click Select a host. The host selection dialog box displays.
- Select the checkbox next to the host where you want the standby NameNode to be set up and click OK. The standby NameNode cannot be on the same host as the active NameNode, and the chosen host should have the same hardware configuration (RAM, disk space, number of cores, and so on) as the active NameNode.
- In the JournalNode Hosts field, click Select hosts. The host selection dialog box displays.
- Select the checkboxes next to an odd number of hosts (a minimum of three) to act as JournalNodes and click OK. JournalNodes should run on hosts with hardware specifications similar to those of the NameNodes. Cloudera recommends that you put one JournalNode on each of the hosts running the active and standby NameNodes, and the third JournalNode on similar hardware, such as the JobTracker.
- Click Continue.
- In the JournalNode Edits Directory property, enter the location of the JournalNode edits directory in the field for each JournalNode host.
- You may enter only one directory for each JournalNode. The paths do not need to be the same on every JournalNode.
- The directories you specify should be empty, and must have the appropriate permissions.
- Extra Options: Decide whether Cloudera Manager should clear existing data in ZooKeeper, the standby NameNode, and the JournalNodes. If the directories are not empty (for example, because you are re-enabling a previous HA configuration), Cloudera Manager does not delete the contents automatically; you can choose to delete them by keeping the default checkbox selection. The recommended default is to clear the directories. If you choose not to clear them, the data should be in sync across the edits directories of the JournalNodes and should have the same version data as the NameNodes.
- Click Continue.
A screen showing the hosts that are eligible to run a standby NameNode and the JournalNodes displays.
- If you want to use other services in a cluster with HA configured, follow the procedures in Configuring Other CDH Components to Use HDFS HA.
- If you are running CDH 4.0 or 4.1, the standby NameNode may fail at the bootstrapStandby command with the error Unable to read transaction ids 1-7 from the configured shared edits storage. Use rsync or a similar tool to copy the contents of the dfs.name.dir directory from the active NameNode to the standby NameNode and start the standby NameNode.
- Stop the HDFS service.
- Configure the service RPC port:
- Go to the HDFS service.
- Click the Configuration tab.
- Select .
- Select .
- Locate the NameNode Service RPC Port property or search for it by typing its name in the Search box.
- Change the port value as needed.
If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- On a ZooKeeper server host, run zookeeper-client.
- Execute the following to remove the configured nameservice. This example assumes the name of the nameservice is nameservice1. You can identify the nameservice from the Federation and High Availability section on the HDFS Instances tab:
rmr /hadoop-ha/nameservice1
- Click the Instances tab.
- Select .
- Start the HDFS service.
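The path removed in the zookeeper-client step above follows the pattern /hadoop-ha/ followed by the nameservice name. The short Python sketch below builds that command for a given name so that it is not mistyped. `rmr_command` is a hypothetical helper, the parent path is taken from the example above, and the character check is a conservative assumption rather than a rule enforced by HDFS or ZooKeeper.

```python
import re

PARENT_ZNODE = "/hadoop-ha"  # parent path taken from the example above


def rmr_command(nameservice: str) -> str:
    """Build the zookeeper-client command that removes a nameservice znode."""
    # Conservative assumption: the name starts with a letter or digit and
    # otherwise contains only letters, digits, '.', '_' and '-'.
    if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9._-]*", nameservice):
        raise ValueError(f"unexpected nameservice name: {nameservice!r}")
    return f"rmr {PARENT_ZNODE}/{nameservice}"


if __name__ == "__main__":
    print(rmr_command("nameservice1"))  # prints: rmr /hadoop-ha/nameservice1
```

Rejecting names that contain a slash keeps the generated command from pointing at a znode outside the parent path.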
Fencing Methods
To ensure that only one NameNode is active at a time, a fencing method is required for the shared edits directory. During a failover, the fencing method ensures that the previous active NameNode no longer has access to the shared edits directory, so that the new active NameNode can safely write to it.
By default, Cloudera Manager configures HDFS to use a shell fencing method (shell(./cloudera_manager_agent_fencer.py)) that takes advantage of the Cloudera Manager Agent. However, you can configure HDFS to use the sshfence method, or you can add your own shell fencing scripts, instead of or in addition to the one Cloudera Manager provides.
The fencing parameters are found in the
category under the configuration properties for your HDFS service. For details of the fencing methods supplied with CDH 5, and how fencing is configured, see Fencing Configuration.
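When several fencing methods are configured, Hadoop attempts them in order until one reports success. The Python sketch below models only that control flow, and it is not how the failover controller is actually implemented. The `fence` function and the simulated methods are hypothetical stand-ins; a real shell method would report success through its exit status.

```python
from typing import Callable, Iterable, Optional, Tuple

# A fencing method is modelled as (name, callable returning True on success).
FencingMethod = Tuple[str, Callable[[], bool]]


def fence(methods: Iterable[FencingMethod]) -> Optional[str]:
    """Try each method in order; return the name of the first that succeeds."""
    for name, attempt in methods:
        try:
            if attempt():
                return name
        except Exception:
            # A method that raises is treated as a failed attempt.
            continue
    return None  # no method succeeded


if __name__ == "__main__":
    configured = [
        ("sshfence", lambda: False),         # simulated failure
        ("shell(/bin/true)", lambda: True),  # simulated success
    ]
    print(fence(configured))
```

In this model a `None` result means that no method could confirm the previous active NameNode was fenced, which is the case in which a failover should not go ahead.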