cronjob不执行java_hadoop群集出现crontab job不执行的情况

今天hadoop群集出现crontab job不执行的情况,手动运行job,报错如下:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RetriableException): org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot

delete /user/hdfs/.staging/job_1441592436807_1892.Name node is in safe mode.

The reported blocks 4710619 needs additional 51773 blocks to reach the threshold 1.0000 of total blocks 4762391.

The number of live datanodes 34 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1211)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3354)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3314)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3298)

at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:733)

at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete

(ClientNamenodeProtocolServerSideTranslatorPB.java:547)

at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod

(ClientNamenodeProtocolProtos.java)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:415)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

Caused by: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /user/hdfs/.staging/job_1441592436807_1892. Name node is in safe

mode.

The reported blocks 4710619 needs additional 51773 blocks to reach the threshold 1.0000 of total blocks 4762391.

The number of live datanodes 34 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1207)

... 14 more

at org.apache.hadoop.ipc.Client.call(Client.java:1410)

at org.apache.hadoop.ipc.Client.call(Client.java:1363)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)

报错中提示Namenode safe mode,

我查看namenode节点,hadoop dfsadmin -safemode get

但是状态显示的是off,很奇怪,

是不是这个namenode节点进程死掉了?

我尝试将另外的namenode节点调整为active状态,

hdfs haadmin -transitionToActive --forcemanual nn2

nn2节点变成了active状态,之后查看nn1

hdfs haadmin -getServiceState nn1尽然还是active状态,

手动将它调整为standby试试,hdfs haadmin -transitionToStandby --forcemanual nn1

有时候会报错:forcefence and forceactive flags not supported with auto-failover enabled.

意思是自动切换,不能手动。可以关闭这个Namenode节点服务,重新启动。

折腾一下,跑个MR,终于成功了,记录下,帮助遇到这个问题的朋友。

至于什么原因造成的,大概是近期一直在进行大量的MR并同时进行-put上传操作造成的。

阅读(1488) | 评论(0) | 转发(0) |

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值