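Notes on fixing a cluster problem:
One day at work I noticed Ambari reporting that hiveserver2 was refusing connections. Out of habit, I restarted the hiveserver2 service first.
The error was still there after the restart, so I logged in to the server and ran the "hive" command directly, which printed the following: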
The number of live datanodes 3 has reached the minimum number 0.
Safe mode will be turned off automatically once the thresholds have been reached.
Caused by: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Log not rolled.
Name node is in safe mode.
The reported blocks 632758 needs additional 5114 blocks to reach the threshold 0.9990 of total blocks 638510.
The number of live datanodes 3 has reached the minimum number 0.
Safe mode will be turned off automatically once the thresholds have been reached.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1209)
... 12 more
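The phrase "safe mode" caught my eye, and I remembered that the office campus had lost power for an hour the day before. Putting the two together, I worked out the cause:
The power outage left the DataNodes with missing data blocks. The number of missing blocks exceeded the threshold (the log shows 632758 reported blocks, short of the required 0.9990 of 638510 total), so HDFS went into safe mode automatically. The NameNode rejects writes while in safe mode, and as a result hiveserver2 refused external connections.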
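The shortfall in the log can be checked by hand. The following is a rough sketch of the arithmetic only, using the three numbers from the safe mode message; the variable names are made up for illustration, and rounding the threshold up to a whole block is an assumption rather than Hadoop's exact code.

```shell
#!/usr/bin/env bash
# Numbers copied from the safe mode message in the log above.
reported=632758     # blocks reported so far by the DataNodes
total=638510        # total blocks known to the NameNode
threshold=0.9990    # fraction of blocks that must be reported

# Blocks still needed = ceil(total * threshold) - reported.
# Rounding up is an assumption made for this sketch.
needed=$(awk -v r="$reported" -v t="$total" -v th="$threshold" 'BEGIN {
    need = t * th          # 637871.49 for the values above
    c = int(need)
    if (c < need) c++      # round up to a whole block
    print c - r
}')

echo "blocks still needed: $needed"
```

With these inputs the result agrees with the "needs additional 5114 blocks" figure in the log, which supports the reading that roughly five thousand blocks had not been reported after the outage.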
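Solution:
Run the command hadoop dfsadmin -safemode leave to exit safe mode (on newer Hadoop versions, hdfs dfsadmin -safemode leave is the non-deprecated form of the same command).
Restart HDFS so that it automatically repairs the blocks that went missing because of the power outage.
After these steps, Ambari no longer reported any errors.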
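The steps above could be wrapped in a small script along these lines. This is only a sketch, assuming the hdfs CLI is on the PATH and the user has HDFS superuser rights; the function names safemode_is_on and recover_from_safemode are hypothetical, and listing corrupt blocks before leaving safe mode is an extra precaution that was not part of the original fix.

```shell
#!/usr/bin/env bash
# Sketch: check safe mode and block health before forcing the NameNode out.
# Function names are illustrative; only the hdfs subcommands are real.

safemode_is_on() {
    # `hdfs dfsadmin -safemode get` reports "Safe mode is ON" or "Safe mode is OFF".
    hdfs dfsadmin -safemode get | grep -q "Safe mode is ON"
}

recover_from_safemode() {
    if safemode_is_on; then
        # fsck is read-only, so it can be run while the NameNode is in safe mode.
        hdfs fsck / -list-corruptfileblocks
        # Force the NameNode out of safe mode (same effect as the command above).
        hdfs dfsadmin -safemode leave
    else
        echo "NameNode is not in safe mode; nothing to do."
    fi
}
```

Calling recover_from_safemode on a NameNode host runs the check and, if needed, the leave command. Forcing safe mode off does not bring back blocks whose replicas are all gone, so the fsck output is worth reading before files are trusted again.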