NameNode Optimization Notes [RPC & FBR & Monitoring]

1、Background

We have seen many incidents of overloaded HDFS NameNodes caused by 1) misconfigurations or 2) “bad” MR jobs or Hive queries that generate a large number of RPC requests in a short period of time. Quite a few features have been introduced in HDP 2.3/2.4 to protect the HDFS NameNode. This article summarizes the deployment steps for these features, together with an incomplete list of known issues and possible solutions.

2、Optimization Checklist

  • Enable Async Audit Logging (configuration covered in this article)
  • Dedicated Service RPC Port (configuration covered in this article)
  • Dedicated Lifeline RPC Port for HA (configuration covered in this article)
  • Enable FairCallQueue on Client RPC Port (configuration covered in this article)
  • Enable RPC Client Backoff on Client RPC Port
  • Enable RPC Caller Context to track the “bad” jobs
  • Enable Response time based backoff with DecayedRpcScheduler
  • Check JMX for NameNode client RPC call queue length and average queue time
  • Check JMX for NameNode DecayRpcScheduler when FCQ is enabled
  • NNtop (HDFS-6982)
  • Configuration tuning for slow deletion of large directories (configuration covered in this article)
  • Backport patches to improve FBR after NameNode restart

3、Enable Async Audit Logging

Enable async audit logging by setting dfs.namenode.audit.log.async to true in hdfs-site.xml. This can minimize the impact of audit log I/O on NameNode performance.

<property>  
  <name>dfs.namenode.audit.log.async</name>  
  <value>true</value>
</property>

4、Dedicated Service RPC Port

Configuring a separate service RPC port can improve the responsiveness of the NameNode by allowing DataNode and client requests to be processed via separate RPC queues. DataNodes and all other internal services should connect to the new service RPC address, while clients continue to connect to the well-known address specified by dfs.namenode.rpc-address.

Adding a service RPC port to an HA cluster with automatic failover via ZKFCs (with or without Kerberos) requires the following additional steps:

1. Add the following settings to hdfs-site.xml.

<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8040</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
  <value>nn2.example.com:8040</value>
</property>

2. If the cluster is not Kerberos enabled, skip this step.

If the cluster is Kerberos enabled, create two new hdfs_jaas.conf files for nn1 and nn2 and copy them to /etc/hadoop/conf/hdfs_jaas.conf on the respective hosts.

nn1:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  useTicketCache=false
  keyTab="/etc/security/keytabs/nn.service.keytab"
  principal="nn/c6401.ambari.apache.org@EXAMPLE.COM";
};

nn2:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  useTicketCache=false
  keyTab="/etc/security/keytabs/nn.service.keytab"
  principal="nn/c6402.ambari.apache.org@EXAMPLE.COM";
};

Add the following to hadoop-env.sh:

export HADOOP_NAMENODE_OPTS="-Dzookeeper.sasl.client=true \
-Dzookeeper.sasl.client.username=zookeeper \
-Djava.security.auth.login.config=/etc/hadoop/conf/hdfs_jaas.conf \
-Dzookeeper.sasl.clientconfig=Client ${HADOOP_NAMENODE_OPTS}"

3. Restart NameNodes

4. Restart DataNodes

Restart DataNodes so that they connect to the new NameNode service RPC port instead of the NameNode client RPC port.

5. Stop the ZKFC

Stop the ZKFC processes on both NameNodes.

6. -formatZK

Run the following command to reset the ZKFC state in ZooKeeper:

hdfs zkfc -formatZK

Known issues:

  • 1. Without step 6, you will see the following exception after ZKFC restart:

java.lang.RuntimeException: Mismatched address stored in ZK for NameNode
  • 2. Without step 2 in a Kerberos-enabled HA cluster, you will see the following exception when running step 6:

16/03/23 03:30:53 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/hdp64ha from ZK...
16/03/23 03:30:53 ERROR ha.ZKFailoverController: Unable to clear zk parent znode
java.io.IOException: Couldn't clear parent znode /hadoop-ha/hdp64ha
    at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:380)
    at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:267)
    at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:212)
    at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:61)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:172)
    at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:442)
    at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:168)
    at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:183)

Caused by: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /hadoop-ha/hdp64ha
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.zookeeper.ZKUtil.deleteRecursive(ZKUtil.java:54)
    at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:375)
    at org.apache.hadoop.ha.ActiveStandbyElector$1.run(ActiveStandbyElector.java:372)
    at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1041)
    at org.apache.hadoop.ha.ActiveStandbyElector.clearParentZNode(ActiveStandbyElector.java:372)
    ... 11 more

5、Dedicated Lifeline RPC Port for HA

HDFS-9311 allows using a separate RPC address to isolate health checks and liveness reporting from the client RPC port, which could be exhausted by “bad” jobs. Here is an example of configuring this feature in an HA cluster.

<property>
  <name>dfs.namenode.lifeline.rpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8050</value>
</property>

<property>
  <name>dfs.namenode.lifeline.rpc-address.mycluster.nn2</name>
  <value>nn2.example.com:8050</value>
</property>

To summarize, the parameters for the RPC port split (patch) are as follows (cluster name "gaofeng" in this example):

dfs.namenode.servicerpc-address.gaofeng.nn1=gaofeng-nn-01:8022
dfs.namenode.servicerpc-address.gaofeng.nn2=gaofeng-nn-02:8022
dfs.namenode.lifeline.rpc-address.gaofeng.nn1=gaofeng-nn-01:8023
dfs.namenode.lifeline.rpc-address.gaofeng.nn2=gaofeng-nn-02:8023
dfs.namenode.service.handler.count=50
dfs.namenode.lifeline.handler.count=50

After applying the configuration above, restart the affected components and then run -formatZK.

However, this RPC split has a known bug in Hadoop 3: a sendLifeline NPE.
The NameNode hits an NPE while processing lifeline messages sent by DataNodes, which corrupts the maxLoad value computed by the NN.
Because DataNodes are then flagged as busy during DataNode selection and no available node can be allocated, the resulting retry loop drives CPU usage high and degrades cluster throughput.
Fix: apply HDFS-15556.

The duplicate issue is HDFS-14042.

6、Enable FairCallQueue on Client RPC Port

Further reading on FairCallQueue (a minimal configuration sketch follows the list):

《聊聊RPC的拥塞控制》 (On RPC congestion control)
《RPC Congestion Control with FairCallQueue》
《FairCallQueue.html官方文档》 (the official FairCallQueue documentation)
《FairCallQueue滴滴技术文摘》 (DiDi tech digest on FairCallQueue)
《Quality of Service in Hadoop性能测试图》 (Quality of Service in Hadoop benchmark charts)
《华为FairCallQueue配置说明》 (Huawei FairCallQueue configuration notes)
《唯品会 HDFS 性能挑战和优化实践》 (Vipshop: HDFS performance challenges and optimization practices)
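
As a minimal sketch (not a definitive recipe), FairCallQueue can be switched on for the client RPC port in core-site.xml. Port 8020 is assumed here to match the examples above; the optional priority-level property is shown with its documented default:

<!-- core-site.xml: minimal sketch, assuming the NameNode client RPC port is 8020 -->
<property>
  <name>ipc.8020.callqueue.impl</name>
  <value>org.apache.hadoop.ipc.FairCallQueue</value>
</property>

<!-- Optional: number of priority levels used by the call queue scheduler (default 4) -->
<property>
  <name>ipc.8020.scheduler.priority.levels</name>
  <value>4</value>
</property>

A NameNode restart (or, on recent versions, hdfs dfsadmin -refreshCallQueue) is typically needed for the new call queue implementation to take effect.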

7、Enable RPC Client Backoff on Client RPC port

Enable client backoff so that, when the NameNode is overloaded, clients are told to retry later instead of having their calls pile up in the queue.
TODO…
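
A minimal sketch of what enabling backoff could look like, again assuming the client RPC port is 8020; the property follows the ipc.[port].backoff.enable naming convention described in the FairCallQueue documentation:

<!-- core-site.xml: minimal sketch, assuming the NameNode client RPC port is 8020 -->
<!-- When the call queue is full, respond with a retriable exception so clients back off
     instead of piling up inside the NameNode -->
<property>
  <name>ipc.8020.backoff.enable</name>
  <value>true</value>
</property>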

8、Enable RPC Caller Context to track the “bad” jobs

TODO…
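
A minimal sketch: hadoop.caller.context.enabled in core-site.xml turns on caller-context propagation so that the originating job or query can be recorded alongside each RPC in the NameNode audit log:

<!-- core-site.xml: minimal sketch for RPC caller context -->
<!-- When enabled, callers (e.g. MapReduce / Hive jobs) can attach a context string that is
     written into hdfs-audit.log, making it possible to trace an RPC spike back to the offending job -->
<property>
  <name>hadoop.caller.context.enabled</name>
  <value>true</value>
</property>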

9、Enable Response time based backoff with DecayedRpcScheduler

TODO…
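
A minimal sketch, assuming port 8020 again; these properties follow the ipc.[port].decay-scheduler.* naming from the FairCallQueue documentation, and the threshold values are illustrative only:

<!-- core-site.xml: minimal sketch, assuming the NameNode client RPC port is 8020 -->
<property>
  <name>ipc.8020.scheduler.impl</name>
  <value>org.apache.hadoop.ipc.DecayRpcScheduler</value>
</property>
<property>
  <name>ipc.8020.backoff.enable</name>
  <value>true</value>
</property>
<!-- Back off callers when average response times exceed the per-priority-level thresholds -->
<property>
  <name>ipc.8020.decay-scheduler.backoff.responsetime.enable</name>
  <value>true</value>
</property>
<!-- Illustrative thresholds, one per priority level -->
<property>
  <name>ipc.8020.decay-scheduler.backoff.responsetime.thresholds</name>
  <value>10s,20s,30s,40s</value>
</property>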

10、HDFS JMX & Health Checks

JMX

The HDFS monitoring commands I often use in production are summarized below:

【NN audit cmd count】

Count the audit log entries per minute in hdfs-audit.log:
cat /var/log/hadoop/ocdp/hdfs-audit.log | awk '{print $2}' | awk -F ':' '{print $1":"$2}' | sort | uniq -c

Quick health check from the shell:

hdfs dfsadmin -report | head

Estimate the block count (assuming 3x replication):

hdfs dfsadmin -report | grep 'Num of Blocks' | awk -F ':' '{print $2}' | awk '{sum += $1}; END {print sum/3}'
(this is an approximate value)

curl --silent http://192.168.1.1:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem | grep -i "blocktotal"
(this is the value shown on the 50070 web UI)

Check PendingDeletionBlocks

  • (via Ambari)
curl -u admin:admin -X GET http://192.168.1.1:8080/api/v1/clusters/testqjcluster/hosts/host-192-168-1-1/host_components/NAMENODE?fields=metrics/rpc
PendingDeletionBlocks
  • (via the NameNode JMX endpoint)
curl --silent http://192.168.1.1:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem | grep -i "PendingDeletionBlocks"

Check RPC metrics

  • (via Ambari)
curl -u admin:admin -X GET http://192.168.1.1:8080/api/v1/clusters/cluster1/hosts/host-192-168-1-1/host_components/NAMENODE?fields=metrics/rpc
  • (via the NameNode JMX endpoint)
    • Based on the port layout configured above, monitor the following:

    • (client RPC = client ↔ NN traffic)
      curl --silent http://192.168.1.1:50070/jmx?qry=Hadoop:service=NameNode,name=RpcActivityForPort8020

    • (service RPC = DataNode and other internal service ↔ NN traffic)
      curl --silent http://192.168.1.1:50070/jmx?qry=Hadoop:service=NameNode,name=RpcActivityForPort8040

    • (lifeline RPC = DN ↔ NN health/lifeline traffic)
      curl --silent http://192.168.1.1:50070/jmx?qry=Hadoop:service=NameNode,name=RpcActivityForPort8050

11、Delete Optimization

Reference: Vipshop (唯品会) – Lin Yiqun (林意群)

HDFS-13831 patch

Apply the HDFS-13831 patch and lower dfs.namenode.block.deletion.increment (default 1000) to 100.

FoldedTreeSet defragmentation threshold

dfs.namenode.storageinfo.defragment.ratio=0.75 -> 0.9
ipc.8020.callqueue.impl=org.apache.hadoop.ipc.FairCallQueue

Following the committer's advice, raise the defragmentation threshold of FoldedTreeSet (the data structure Hadoop 3 uses to store block info).

ipc.server.read.threadpool.size

Number of Reader threads, raised from the default 1 to 100.

dfs.namenode.service.handler.count

Number of Handler threads, raised from the default 10 to 361.

ipc.server.handler.queue.size

Maximum call queue length per Handler, raised from the default 100 to 1000. A consolidated configuration sketch of these settings follows.
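
Putting the values above together, a rough sketch of where these settings live (dfs.* in hdfs-site.xml, ipc.* in core-site.xml); the numbers are the ones quoted above and should be tuned to your own cluster:

<!-- hdfs-site.xml (values as quoted above; tune per cluster) -->
<property>
  <name>dfs.namenode.block.deletion.increment</name>
  <value>100</value>
</property>
<property>
  <name>dfs.namenode.storageinfo.defragment.ratio</name>
  <value>0.9</value>
</property>
<property>
  <name>dfs.namenode.service.handler.count</name>
  <value>361</value>
</property>

<!-- core-site.xml -->
<property>
  <name>ipc.8020.callqueue.impl</name>
  <value>org.apache.hadoop.ipc.FairCallQueue</value>
</property>
<property>
  <name>ipc.server.read.threadpool.size</name>
  <value>100</value>
</property>
<property>
  <name>ipc.server.handler.queue.size</name>
  <value>1000</value>
</property>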

12、Backport patches to improve FBR after NameNode restart

On a production cluster of about 1,400 nodes, the full block reports (FBR) for 200+ million blocks used to complete within an hour of a NameNode restart. After upgrading from Hadoop 2.7 to 3.1, block reporting took around 4 hours, which seriously affected the production environment.
Backporting the following patches resolves the problem:
HDFS-14366
HDFS-14859
HDFS-14632
HDFS-14171
