Hadoop troubleshooting notes: DataNode fails to come up after startup, or comes up and then disappears after a while

A few days ago the test Hadoop cluster I run in virtual machines ran into trouble. When I started the cluster from the NameNode, everything appeared to start "normally": the NameNode and all DataNode processes came up. The problem showed itself when I logged in to the HDFS admin page on port 50070 and saw the situation in the screenshot below,

and the DataNode process on the DataNode host had crashed. Baidu and Google results mostly said the namespaceIDs of the NameNode and DataNodes were inconsistent, suggested editing configuration files, and so on.
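For reference, that standard suggestion amounts to comparing the namespaceID field in the VERSION files on the NameNode and on each DataNode. A minimal sketch, assuming the default Hadoop 1.x layout under hadoop.tmp.dir (the /path/to/hadoop-tmp placeholder is mine, not a value from this cluster):

cat /path/to/hadoop-tmp/dfs/name/current/VERSION    # on the NameNode
cat /path/to/hadoop-tmp/dfs/data/current/VERSION    # on each DataNode

If the namespaceID values differ, that is the classic cause of this symptom; here, as it turned out, it was not the problem.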

None of those attempts fixed anything, so I went through slave1's log:

The DataNode log on slave1:

2013-07-16 14:23:53,531 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = slaves1/192.168.20.136
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
************************************************************/
2013-07-16 14:23:56,389 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-07-16 14:23:56,464 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2013-07-16 14:23:56,465 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-07-16 14:23:56,466 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2013-07-16 14:23:57,422 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-07-16 14:23:57,437 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-07-16 14:23:58,490 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2013-07-16 14:24:00,782 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 0 time(s).
2013-07-16 14:24:01,785 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 1 time(s).
2013-07-16 14:24:02,786 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 2 time(s).
2013-07-16 14:24:03,788 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 3 time(s).
2013-07-16 14:24:04,803 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 4 time(s).
2013-07-16 14:24:05,805 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 5 time(s).
2013-07-16 14:24:06,808 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 6 time(s).
2013-07-16 14:24:07,847 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 7 time(s).
2013-07-16 14:24:08,849 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 8 time(s).
2013-07-16 14:24:09,850 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 9 time(s).
2013-07-16 14:24:09,852 INFO org.apache.hadoop.ipc.RPC: Server at master/192.168.20.135:9000 not available yet, Zzzzz...
2013-07-16 14:24:11,855 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 0 time(s).
2013-07-16 14:24:12,856 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 1 time(s).
2013-07-16 14:24:30,484 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Incompatible build versions: namenode BV = ; datanode BV = 1393290
2013-07-16 14:24:30,775 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible build versions: namenode BV = ; datanode BV = 1393290
    at org.apache.hadoop.hdfs.server.datanode.DataNode.handshake(DataNode.java:566)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:362)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)

2013-07-16 14:24:30,894 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at slaves1/192.168.20.136
************************************************************/

At first glance it looks as if the DataNode simply cannot reach the NameNode, yet ping between the hosts still works, so the problem has to be inside the cluster itself. The FATAL line gives the real cause away: the handshake fails because the two daemons report different build versions (the NameNode's build version is empty, the DataNode's is 1393290).
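A quick way to jump straight to such lines, assuming the default Hadoop 1.x log directory and file-name pattern (adjust to your installation):

grep -E 'FATAL|ERROR' $HADOOP_HOME/logs/hadoop-*-datanode-*.log    # run on slave1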

While I was scratching my head, it suddenly occurred to me that a few days earlier, following a book, I had built the Hadoop Eclipse plugin on the NameNode,

and that its second step had me run ant compile in the Hadoop installation directory.

Checking the Hadoop version in the terminal with the hadoop version command gave:

Hadoop 1.0.4-SNAPSHOT
Subversion  -r
Compiled by jelon on Mon Jul 15 19:44:03 CST 2013
From source with checksum a34c7c3a1218f2023cb9ced9cd6033c0

The Hadoop version on the NameNode had become Hadoop 1.0.4-SNAPSHOT, while the DataNodes were still on Hadoop 1.0.4. I didn't yet know exactly what the extra SNAPSHOT meant, but it lines up with the "Incompatible build versions" error: the ant compile run had rebuilt Hadoop on the NameNode without a Subversion revision, which is why its build version shows up empty in the log. The fix (roughly the commands sketched below):

1. Copy the NameNode's entire Hadoop installation directory to all of the data nodes.
2. Delete everything under the tmp directory that hadoop.tmp.dir in core-site.xml points to.
3. Reformat HDFS.
4. Restart Hadoop and log in to the 50070 admin page again.
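A rough sketch of those steps as shell commands (the install path /usr/local/hadoop, the tmp path, and the slave host name are assumptions for illustration, not values taken from this cluster):

scp -r /usr/local/hadoop slave1:/usr/local/      # push the rebuilt installation to each data node
ssh slave1 'rm -rf /usr/local/hadoop/tmp/*'      # clear the hadoop.tmp.dir contents on the slave (path assumed)
rm -rf /usr/local/hadoop/tmp/*                   # and on the master
hadoop namenode -format                          # reformat HDFS; this destroys any existing HDFS data
start-all.sh                                     # Hadoop 1.x script that starts HDFS and MapReduce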

Startup succeeded, and checking with jps on the DataNodes showed the DataNode process there and staying up.
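As an extra check (a standard Hadoop 1.x command, suggested here rather than taken from the original steps), the NameNode's own view of the cluster can be queried to confirm the DataNodes actually registered:

hadoop dfsadmin -report    # lists the live DataNodes and their reported capacity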

