How to Enable HDFS HA Using Cloudera Manager

IT爱好者菜鸟努力中

Enabling HDFS HA Using Cloudera Manager

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

You can use Cloudera Manager to configure your CDH 4 or CDH 5 cluster for HDFS HA and automatic failover. In Cloudera Manager 5, HA is implemented using Quorum-based storage. Quorum-based storage relies upon a set of JournalNodes, each of which maintains a local edits directory that logs the modifications to the namespace metadata. Enabling HA enables automatic failover as part of the same command.
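As a rough illustration of why the number of JournalNodes matters, the sketch below assumes the usual majority rule of quorum-based storage: an edit counts as durable once more than half of the JournalNodes have logged it. The function names are hypothetical and exist only for this example.

```python
def journal_quorum(num_journal_nodes: int) -> int:
    """Smallest number of JournalNodes that forms a majority."""
    if num_journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return num_journal_nodes // 2 + 1


def tolerated_failures(num_journal_nodes: int) -> int:
    """How many JournalNodes can be down while a majority remains."""
    return num_journal_nodes - journal_quorum(num_journal_nodes)


# Compare a few cluster sizes: nodes, quorum size, tolerated failures.
for n in (3, 4, 5):
    print(n, journal_quorum(n), tolerated_failures(n))
```

Under this assumption, a fourth JournalNode tolerates no more failures than three do, which is one way to read the advice to use an odd number of hosts.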

Important:

Enabling or disabling HA causes the previous monitoring history to become unavailable.
Some parameters will be automatically set as follows once you have enabled JobTracker HA. If you want to change the value from the default for these parameters, use an advanced configuration snippet.
- mapred.jobtracker.restart.recover: true
- mapred.job.tracker.persist.jobstatus.active: true
- mapred.ha.automatic-failover.enabled: true
- mapred.ha.fencing.methods: shell(/bin/true)

Enabling High Availability and Automatic Failover

The Enable High Availability workflow leads you through adding a second (standby) NameNode and configuring JournalNodes. During the workflow, Cloudera Manager creates a federated namespace.

Perform all the configuration and setup tasks described under Configuring Hardware for HDFS HA.
Ensure that you have a ZooKeeper service.
Go to the HDFS service.
Select Actions > Enable High Availability. A screen is displayed showing the hosts that are eligible to run a standby NameNode and the JournalNodes.
1. Specify a name for the nameservice or accept the default name nameservice1 and click Continue.
2. In the NameNode Hosts field, click Select a host. The host selection dialog box displays.
3. Check the checkbox next to the host where you want the standby NameNode to be set up and click OK. The standby NameNode cannot be on the same host as the active NameNode, and the host that is chosen should have the same hardware configuration (RAM, disk space, number of cores, and so on) as the active NameNode.
4. In the JournalNode Hosts field, click Select hosts. The host selection dialog box displays.
5. Check the checkboxes next to an odd number of hosts (a minimum of three) to act as JournalNodes and click OK. JournalNodes should run on hosts with hardware specifications similar to those of the NameNodes. Cloudera recommends that you place one JournalNode on each of the hosts running the active and standby NameNodes, and the third JournalNode on similar hardware, such as the JobTracker host.
6. Click Continue.
7. In the JournalNode Edits Directory property, enter a directory location for the JournalNode edits directory into the fields for each JournalNode host.
  - You may enter only one directory for each JournalNode. The paths do not need to be the same on every JournalNode.
  - The directories you specify should be empty, and must have the appropriate permissions.
8. Extra Options: Decide whether Cloudera Manager should clear existing data in ZooKeeper, the standby NameNode, and the JournalNodes. If the directories are not empty (for example, you are re-enabling a previous HA configuration), Cloudera Manager will not delete the contents automatically; to have them deleted, keep the default checkbox selection. The recommended default is to clear the directories. If you choose not to do so, the data should be in sync across the edits directories of the JournalNodes and should have the same version data as the NameNodes.
9. Click Continue.
Cloudera Manager executes a set of commands that will stop the dependent services, delete, create, and configure roles and directories as appropriate, create a nameservice and failover controller, and restart the dependent services and deploy the new client configuration.
If you want to use other services in a cluster with HA configured, follow the procedures in Configuring Other CDH Components to Use HDFS HA.
If you are running CDH 4.0 or 4.1, the standby NameNode may fail at the bootstrapStandby command with the error Unable to read transaction ids 1-7 from the configured shared edits storage. Use rsync or a similar tool to copy the contents of the dfs.name.dir directory from the active NameNode to the standby NameNode and start the standby NameNode.
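The host-selection rules in the workflow above can be sketched as a small pre-flight check. This is an illustrative helper only, not part of Cloudera Manager; the function name and the host names in the usage example are hypothetical.

```python
def check_ha_layout(active_nn, standby_nn, journal_nodes):
    """Return a list of problems with a proposed HA host layout."""
    problems = []
    # The standby NameNode cannot share a host with the active NameNode.
    if standby_nn == active_nn:
        problems.append("standby NameNode must not be on the active NameNode host")
    # Count distinct JournalNode hosts: an odd number, minimum of three.
    unique_jns = set(journal_nodes)
    if len(unique_jns) < 3:
        problems.append("at least three JournalNode hosts are required")
    elif len(unique_jns) % 2 == 0:
        problems.append("the number of JournalNode hosts should be odd")
    return problems


# Hypothetical layout: JournalNodes on both NameNode hosts plus the JobTracker host.
print(check_ha_layout("nn1", "nn2", ["nn1", "nn2", "jt1"]))
```

An empty list means the layout satisfies the rules stated in the steps. The check does not cover hardware similarity or directory permissions.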

Important: If you change the NameNode Service RPC Port (dfs.namenode.servicerpc-address) while automatic failover is enabled, the NameNode address saved in the ZooKeeper /hadoop-ha znode will no longer match the NameNode address that the Failover Controller is configured with. This mismatch prevents the Failover Controllers from restarting. If you need to change the NameNode Service RPC Port after automatic failover has been enabled, you must do the following to re-initialize the znode:

Stop the HDFS service.
Configure the service RPC port:
1. Go to the HDFS service.
2. Click the Configuration tab.
3. Select Scope > NameNode.
4. Select Category > Ports and Addresses.
5. Locate the NameNode Service RPC Port property or search for it by typing its name in the Search box.
6. Change the port value as needed.
  If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
On a ZooKeeper server host, run zookeeper-client.
1. Execute the following to remove the configured nameservice. This example assumes the name of the nameservice is nameservice1. You can identify the nameservice from the Federation and High Availability section on the HDFS Instances tab:
```
rmr /hadoop-ha/nameservice1
```
Click the Instances tab.
Select Actions > Initialize High Availability State in ZooKeeper.
Start the HDFS service.
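If the nameservice is not called nameservice1, the znode path in the rmr command changes accordingly. The sketch below only builds the command string to paste into zookeeper-client; it does not connect to ZooKeeper. The helper names are hypothetical.

```python
def ha_znode_path(nameservice: str, parent: str = "/hadoop-ha") -> str:
    """Build the znode path that holds HA state for one nameservice."""
    if not nameservice or "/" in nameservice:
        raise ValueError("nameservice must be a non-empty name without '/'")
    return f"{parent}/{nameservice}"


def rmr_command(nameservice: str) -> str:
    """Return the zookeeper-client command that removes the znode."""
    return f"rmr {ha_znode_path(nameservice)}"


print(rmr_command("nameservice1"))  # prints: rmr /hadoop-ha/nameservice1
```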

Fencing Methods

In order to ensure that only one NameNode is active at a time, a fencing method is required for the shared edits directory. During a failover, the fencing method is responsible for ensuring that the previous active NameNode no longer has access to the shared edits directory, so that the new active NameNode can safely proceed writing to it.

By default, Cloudera Manager configures HDFS to use a shell fencing method (shell(./cloudera_manager_agent_fencer.py)) that takes advantage of the Cloudera Manager Agent. However, you can configure HDFS to use the sshfence method, or you can add your own shell fencing scripts, instead of or in addition to the one Cloudera Manager provides.
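When several fencing methods are configured, Hadoop tries them in the listed order until one reports success. That behaviour comes from the general HDFS HA documentation rather than from the text above. The sketch models that ordering only. The callables are stand-ins for real fencing actions, and their outcomes are made up for illustration.

```python
from typing import Callable, List, Optional, Tuple

# A fencing method: its configured name plus a stand-in for the action.
FenceMethod = Tuple[str, Callable[[], bool]]


def fence(methods: List[FenceMethod]) -> Optional[str]:
    """Try each method in order and return the first one that succeeds."""
    for name, attempt in methods:
        if attempt():
            return name
    # No method succeeded, so fencing failed.
    return None


# Hypothetical outcome: the agent-based fencer fails, sshfence succeeds.
methods = [
    ("shell(./cloudera_manager_agent_fencer.py)", lambda: False),
    ("sshfence", lambda: True),
]
print(fence(methods))  # prints: sshfence
```

The order of the list matters here: a method placed later is only attempted after every earlier method has failed.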

The fencing parameters are found in the Service-Wide > High Availability category under the configuration properties for your HDFS service.

For details of the fencing methods supplied with CDH 5, and how fencing is configured, see Fencing Configuration.
