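Hadoop SecondaryNameNode exception: Inconsistent checkpoint fields
With no traffic at all, the NameNode process was using 100% CPU and a very large amount of memory, and it wrote no errors to its log.
The SecondaryNameNode reported this error: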
java.io.IOException: Inconsistent checkpoint fields.
LV = -57 namespaceID = 371613059 cTime = 0 ; clusterId = CID-b8a5f273-515a-434c-87c0-4446d4794c85 ; blockpoolId = BP-1082677108-127.0.0.1-1433842542163.
Expecting respectively: -57; 1687946377; 0; CID-603ff285-de5a-41a0-85e8-f033ea1916fc; BP-2591078-127.0.0.1-1433770362761.
    at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:134)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:531)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:395)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:361)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:411)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:357)
    at java.lang.Thread.run(Thread.java:662)
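To see quickly which fields disagree, the exception message can be split into its two halves and compared field by field. The following is a minimal sketch in Python, not part of Hadoop; the function names and the field labels in FIELDS are my own, and it assumes the message keeps the shape shown above (five "name = value" pairs, then five values after "Expecting respectively:").

```python
import re

# Labels for the five positions in the message (names chosen for this sketch).
FIELDS = ["layoutVersion", "namespaceID", "cTime", "clusterID", "blockpoolID"]

# The message text from the exception above, without the stack trace.
MESSAGE = (
    "Inconsistent checkpoint fields."
    "LV = -57 namespaceID = 371613059 cTime = 0 ; "
    "clusterId = CID-b8a5f273-515a-434c-87c0-4446d4794c85 ; "
    "blockpoolId = BP-1082677108-127.0.0.1-1433842542163."
    "Expecting respectively: -57; 1687946377; 0; "
    "CID-603ff285-de5a-41a0-85e8-f033ea1916fc; "
    "BP-2591078-127.0.0.1-1433770362761."
)


def parse_checkpoint_error(message):
    """Return (reported, expected) value lists taken from the message."""
    reported_part, _, expected_part = message.partition("Expecting respectively:")
    # Reported side: the token that follows each '=' sign.
    reported = [v.rstrip(".;") for v in re.findall(r"=\s*(\S+)", reported_part)]
    # Expected side: ';'-separated values; keep only the first token of each piece.
    expected = [
        piece.split()[0].rstrip(".;")
        for piece in expected_part.split(";")
        if piece.strip()
    ]
    return reported, expected


def mismatched_fields(message):
    """Map each differing field name to its (reported, expected) pair."""
    reported, expected = parse_checkpoint_error(message)
    return {
        name: (r, e)
        for name, r, e in zip(FIELDS, reported, expected)
        if r != e
    }


for name, (reported, expected) in mismatched_fields(MESSAGE).items():
    print(f"{name}: reported {reported}, expected {expected}")
```

For the message in this post the differing fields are the namespace ID, cluster ID and block pool ID, while the layout version and cTime agree.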
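This exception can have many causes. One of them is that the edit logs in the SecondaryNameNode's data directory do not match the current version of the NameNode's data. In the message above, the namespaceID, clusterId and blockpoolId differ between the two sides, while LV and cTime agree.
Solution:
Manually delete the files under the SecondaryNameNode directory, then restart Hadoop.
On inspection, the edit logs under the SecondaryNameNode directory turned out to be from a long time ago:
/opt/hadoop-2.5.1/dfs/tmp/dfs/namesecondary/current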
[root@hbase current]# ll
total 116
-rw-r--r-- 1 root root 42 Jun 8 2015 edits_0000000000000000001-0000000000000000002
-rw-r--r-- 1 root root 8991 Jun 8 2015 edits_0000000000000000003-0000000000000000089
-rw-r--r-- 1 root root 4370 Jun 8 2015 edits_0000000000000000090-0000000000000000123
-rw-r--r-- 1 root root 3817 Jun 9 2015 edits_0000000000000000124-0000000000000000152
-rw-r--r-- 1 root root 2466 Jun 9 2015 edits_0000000000000000153-0000000000000000172
-rw-r--r-- 1 root root 2466 Jun 9 2015 edits_0000000000000000173-0000000000000000192
-rw-r--r-- 1 root root 2466 Jun 9 2015 edits_0000000000000000193-0000000000000000212
-rw-r--r-- 1 root root 2466 Jun 9 2015 edits_0000000000000000213-0000000000000000232
-rw-r--r-- 1 root root 2466 Jun 9 2015 edits_0000000000000000233-0000000000000000252
-rw-r--r-- 1 root root 2466 Jun 9 2015 edits_0000000000000000253-0000000000000000272
-rw-r--r-- 1 root root 2466 Jun 9 2015 edits_0000000000000000273-0000000000000000292
-rw-r--r-- 1 root root 2466 Jun 9 2015 edits_0000000000000000293-0000000000000000312
-rw-r--r-- 1 root root 2466 Jun 9 2015 edits_0000000000000000313-0000000000000000332
-rw-r--r-- 1 root root 2466 Jun 9 2015 edits_0000000000000000333-0000000000000000352
-rw-r--r-- 1 root root 2466 Jun 9 2015 edits_0000000000000000353-0000000000000000372
-rw-r--r-- 1 root root 2466 Jun 9 2015 edits_0000000000000000373-0000000000000000392
-rw-r--r-- 1 root root 2466 Jun 9 2015 edits_0000000000000000393-0000000000000000412
-rw-r--r-- 1 root root 6732 Jun 9 2015 edits_0000000000000000413-0000000000000000468
-rw-r--r-- 1 root root 4819 Jun 9 2015 edits_0000000000000000469-0000000000000000504
-rw-r--r-- 1 root root 2839 Jun 9 2015 fsimage_0000000000000000468
-rw-r--r-- 1 root root 62 Jun 9 2015 fsimage_0000000000000000468.md5
-rw-r--r-- 1 root root 2547 Jun 9 2015 fsimage_0000000000000000504
-rw-r--r-- 1 root root 62 Jun 9 2015 fsimage_0000000000000000504.md5
-rw-r--r-- 1 root root 199 Jun 9 2015 VERSION
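The cleanup step can also be done a little more cautiously by moving the stale files aside instead of deleting them outright. This is a substitution of mine, not what the post did: the sketch below is a hypothetical helper (move_checkpoint_aside and the backup location are my own names) and it assumes the SecondaryNameNode has been stopped first and that the user running it may write to both directories.

```python
import shutil
import time
from pathlib import Path


def move_checkpoint_aside(checkpoint_dir, backup_root):
    """Move every entry inside checkpoint_dir into a timestamped backup folder.

    Returns the backup folder and the names of the entries that were moved.
    The checkpoint directory itself is kept, just emptied.
    """
    checkpoint_dir = Path(checkpoint_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_dir = Path(backup_root) / f"namesecondary-{stamp}"
    # Fail rather than mix two backups if the folder already exists.
    backup_dir.mkdir(parents=True, exist_ok=False)

    moved = []
    for entry in sorted(checkpoint_dir.iterdir()):
        shutil.move(str(entry), str(backup_dir / entry.name))
        moved.append(entry.name)
    return backup_dir, moved


# Example call (paths are illustrative; the backup location is an assumption):
# move_checkpoint_aside("/opt/hadoop-2.5.1/dfs/tmp/dfs/namesecondary",
#                       "/opt/hadoop-2.5.1/dfs/backup")
```

Once Hadoop has been restarted and checkpoints succeed again, the backup folder can be removed.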
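The fix above assumes that hadoop.tmp.dir has been configured. If it has not, the edit log files will not be at a path like the one shown here, because hadoop.tmp.dir then falls back to its default of /tmp/hadoop-${user.name}. It is better to set it explicitly, in core-site.xml or hdfs-site.xml.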
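If it is not obvious where the SecondaryNameNode keeps its files, the location can be worked out from the configuration. The sketch below is a simplified approximation of how Hadoop resolves it, not Hadoop's own code: it reads only core-site.xml and hdfs-site.xml from a given configuration directory, uses the stock defaults for hadoop.tmp.dir and dfs.namenode.checkpoint.dir when they are unset, and expands only the ${hadoop.tmp.dir} and ${user.name} variables. The function names are my own.

```python
import getpass
import xml.etree.ElementTree as ET
from pathlib import Path


def read_properties(xml_path):
    """Return {name: value} from a Hadoop *-site.xml file, or {} if it is missing."""
    path = Path(xml_path)
    if not path.is_file():
        return {}
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def resolve_checkpoint_dir(conf_dir):
    """Work out where the SecondaryNameNode stores its checkpoints."""
    props = {}
    # Later files override earlier ones, as with Hadoop's own resource loading.
    for filename in ("core-site.xml", "hdfs-site.xml"):
        props.update(read_properties(Path(conf_dir) / filename))

    # Default from core-default.xml when hadoop.tmp.dir is not set.
    tmp_dir = props.get("hadoop.tmp.dir") or "/tmp/hadoop-${user.name}"
    if "${user.name}" in tmp_dir:
        tmp_dir = tmp_dir.replace("${user.name}", getpass.getuser())

    # Default from hdfs-default.xml when the checkpoint dir is not set.
    checkpoint = props.get(
        "dfs.namenode.checkpoint.dir",
        "file://${hadoop.tmp.dir}/dfs/namesecondary",
    )
    return checkpoint.replace("${hadoop.tmp.dir}", tmp_dir)


# Example call (the configuration directory is illustrative):
# print(resolve_checkpoint_dir("/opt/hadoop-2.5.1/etc/hadoop"))
```

The result keeps the file:// prefix used in Hadoop's default value; strip it to get a plain filesystem path.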
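The hadoop.tmp.dir parameter specifies the default temporary path used by HDFS, and it is best to set it. If a DataNode inexplicably fails to start, for example after adding a node, deleting the tmp directory under this path on that DataNode is enough to fix it. If you delete this directory on the NameNode machine, however, you have to run the NameNode format command again.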
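Before deleting anything on a DataNode, it can help to check whether its stored cluster ID still matches the NameNode's, since a mismatch after a reformat is one common reason a DataNode refuses to start. The sketch below is my own helper, not a Hadoop tool: it assumes the usual key=value layout of the VERSION files found in the current directory of each storage directory, and the paths in the example call are placeholders.

```python
from pathlib import Path


def read_version(version_path):
    """Parse a VERSION file (key=value lines, '#' comments) into a dict."""
    fields = {}
    for line in Path(version_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def cluster_ids_match(namenode_version, datanode_version):
    """True only if both files carry a clusterID and the two values are equal."""
    nn_id = read_version(namenode_version).get("clusterID")
    dn_id = read_version(datanode_version).get("clusterID")
    return nn_id is not None and nn_id == dn_id


# Example call (both paths are placeholders for your own storage directories):
# cluster_ids_match("/path/to/name/current/VERSION",
#                   "/path/to/data/current/VERSION")
```

If the IDs differ, the DataNode's storage directory predates the current NameNode format, which is the situation the deletion described above clears up.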