2012-12-17 10:58:59,925 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure config value: 3
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:1025)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:305)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1606)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1546)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1564)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1690)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1707)
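I set up a new cluster of a few machines, and the datanode reported this error on startup.
The main cause is that dfs.datanode.failed.volumes.tolerated was set to 3.
The meaning of this parameter: The number of volumes that are allowed to fail before a datanode stops offering service. By default any volume failure will cause a datanode to shutdown.
In other words, it is the number of failed disks a datanode will tolerate. At startup the datanode uses the directories configured in dfs.data.dir (which store the blocks); if some of them are unusable and their number exceeds the value configured above, startup fails. The code is in org.apache.hadoop.hdfs.server.datanode.FSDataset: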
public FSDataset(DataStorage storage, Configuration conf) throws IOException {
  this.maxBlocksPerDir = conf.getInt("dfs.datanode.numblocks", 64);
  // The number of volumes required for operation is the total number
  // of volumes minus the number of failed volumes we can tolerate.
  final int volFailuresTolerated =
    conf.getInt("dfs.datanode.failed.volumes.tolerated", 0);
  String[] dataDirs = conf.getTrimmedStrings(DataNode.DATA_DIR_KEY);
  int volsConfigured = (dataDirs == null) ? 0 : dataDirs.length;
  int volsFailed = volsConfigured - storage.getNumStorageDirs();
  validVolsRequired = volsConfigured - volFailuresTolerated;
  if (volFailuresTolerated < 0 || volFailuresTolerated >= volsConfigured) {
    throw new DiskErrorException("Invalid volume failure "
        + " config value: " + volFailuresTolerated);
  }
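Because dfs.data.dir was configured with only one directory, volsConfigured is 1, so a tolerated value of 3 trips the check volFailuresTolerated >= volsConfigured and the exception is thrown before any disk is actually examined. The value must be at least 0 and smaller than the number of configured directories. After setting dfs.datanode.failed.volumes.tolerated to 0, the problem was solved.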
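To make the rule concrete, here is a minimal standalone sketch of the same range check. It does not use any Hadoop classes. The class name VolumeToleranceCheck, the helper methods, and the directory paths are hypothetical names chosen for illustration, and the comma splitting only approximates how the dfs.data.dir value is turned into a list of directories.

```java
public class VolumeToleranceCheck {

    // Counts the comma-separated entries in a dfs.data.dir style value,
    // ignoring surrounding whitespace and empty entries.
    // (Assumption: a simplified stand-in for Hadoop's own parsing.)
    static int countDataDirs(String dataDirValue) {
        if (dataDirValue == null || dataDirValue.trim().isEmpty()) {
            return 0;
        }
        int count = 0;
        for (String dir : dataDirValue.split(",")) {
            if (!dir.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Mirrors the condition in the FSDataset excerpt: the tolerated count
    // must be >= 0 and strictly less than the number of configured volumes.
    static boolean isValidTolerance(int volFailuresTolerated, int volsConfigured) {
        return volFailuresTolerated >= 0 && volFailuresTolerated < volsConfigured;
    }

    public static void main(String[] args) {
        // Hypothetical directory lists, for illustration only.
        String oneDir = "/data/1/dfs/dn";
        String fourDirs = "/data/1/dfs/dn, /data/2/dfs/dn, /data/3/dfs/dn, /data/4/dfs/dn";

        System.out.println("1 dir, tolerated=3 valid? "
                + isValidTolerance(3, countDataDirs(oneDir)));
        System.out.println("1 dir, tolerated=0 valid? "
                + isValidTolerance(0, countDataDirs(oneDir)));
        System.out.println("4 dirs, tolerated=3 valid? "
                + isValidTolerance(3, countDataDirs(fourDirs)));
    }
}
```

With a single data directory the only value that passes is 0. A value of 3 only becomes valid once at least four directories are configured.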