A node in our production Hadoop environment raised a DataNode Pause Duration alert. The error message shown on the page was:
Pause Duration
Average time spent paused was 44.4 second(s) (74.00%) per minute over the previous 5 minute(s). Critical threshold: 60.00%.
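As a quick sanity check on those numbers, the arithmetic can be sketched as below. The function names and the "good"/"warning"/"critical" labels are made up for illustration, and comparing with `>=` is an assumption about how the thresholds are applied. The 30% warning value is the default quoted from the Cloudera documentation further down.

```python
def pause_percentage(paused_seconds_per_minute: float) -> float:
    """Share of each minute spent paused, expressed as a percentage."""
    return paused_seconds_per_minute / 60.0 * 100.0


def classify(percent: float, warning: float = 30.0, critical: float = 60.0) -> str:
    """Map a pause percentage onto a health state.

    The labels and the >= comparison are assumptions for this sketch,
    not taken from Cloudera Manager itself.
    """
    if percent >= critical:
        return "critical"
    if percent >= warning:
        return "warning"
    return "good"


# Value from the alert: 44.4 seconds paused per minute.
percent = pause_percentage(44.4)
print(f"{percent:.2f}% -> {classify(percent)}")  # prints "74.00% -> critical"
```

So 44.4 seconds out of every 60 is 74%, well above the 60% critical threshold.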
The JVM heap usage on that node was as follows:

Looking through the Cloudera documentation, I found the following explanation of the DataNode Pause Duration health test:
DataNode Pause Duration
This DataNode health test checks that the DataNode threads are not experiencing long scheduling pauses. The test uses a pause monitoring thread in the DataNode that tracks scheduling delay by noting if it is run on its requested schedule. If the thread is not run on its requested schedule, the delay is noted and considered pause time. The health test checks that no more than some percentage of recent time is spent paused. A failure of this health test may indicate that the DataNode is not getting enough CPU resources, or that it is spending too much time doing garbage collection. Inspect the DataNode logs for any pause monitor output and check garbage collection metrics exposed by the DataNode. This test can be configured using the Pause Duration Thresholds and Pause Duration Monitoring Period DataNode monitoring settings.
Short Name: Pause Duration
Property Name: Pause Duration Monitoring Period
Description: The period to review when computing the moving average of extra time the pause monitor spent paused.
Template Name: datanode_pause_duration_window
Default Value: 5
Unit: MINUTES

Property Name: Pause Duration Thresholds
Description: The health test thresholds for the weighted average extra time the pause monitor spent paused. Specified as a percentage of elapsed wall clock time.
Template Name: datanode_pause_duration_thresholds
Default Value: critical:60.0, warning:30.0
Unit: no unit
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ht_datanode.html
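To make the idea of a pause monitoring thread more concrete, here is a minimal sketch in Python of the technique the documentation describes: ask to sleep for a fixed interval, then treat any extra elapsed time as pause time. This is not the DataNode's actual implementation; `measure_pauses`, `paused_percent`, and the interval values are hypothetical names and numbers chosen for the example.

```python
import time


def measure_pauses(iterations=5, interval=0.05, sleep=time.sleep, clock=time.monotonic):
    """Sleep `interval` seconds repeatedly and record how late each wake-up is.

    A wake-up that arrives later than requested suggests the process was
    not scheduled on time (CPU starvation, a long GC pause, and so on).
    `sleep` and `clock` are injectable so the logic can be tested
    without real waiting.
    """
    extras = []
    for _ in range(iterations):
        start = clock()
        sleep(interval)
        elapsed = clock() - start
        # Only time beyond the requested interval counts as pause time.
        extras.append(max(0.0, elapsed - interval))
    return extras


def paused_percent(extras, window_seconds):
    """Total extra time as a percentage of the observed wall-clock window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return sum(extras) / window_seconds * 100.0


if __name__ == "__main__":
    started = time.monotonic()
    extras = measure_pauses()
    window = time.monotonic() - started
    print(f"paused {paused_percent(extras, window):.2f}% of {window:.3f}s")
```

On a healthy, idle machine the reported percentage should stay close to zero. A process starved of CPU or stuck in long garbage collections would see the extra time grow, which is the condition the health test flags.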
