1. Problem description
My cluster has one NameNode with three DataNodes attached. After running ./start-all.sh on the NameNode, the DataNode process on one of the slaves failed to start, although the TaskTracker on that same machine started fine; the other two DataNodes came up normally.
2. Error log
2013-12-30 13:25:57,781 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-12-30 13:25:57,793 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2013-12-30 13:25:57,794 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-12-30 13:25:57,794 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2013-12-30 13:25:57,893 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-12-30 13:25:58,296 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
2013-12-30 13:25:58,307 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened data transfer server at 50010
2013-12-30 13:25:58,310 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2013-12-30 13:25:58,370 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2013-12-30 13:25:58,452 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2013-12-30 13:25:58,465 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2013-12-30 13:25:58,465 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
2013-12-30 13:25:58,465 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
2013-12-30 13:25:58,466 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
2013-12-30 13:25:58,466 INFO org.mortbay.log: jetty-6.1.26
2013-12-30 13:25:58,794 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
2013-12-30 13:25:58,800 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2013-12-30 13:25:58,801 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source DataNode registered.
2013-12-30 13:25:59,069 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2013-12-30 13:25:59,073 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort50020 registered.
2013-12-30 13:25:59,073 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort50020 registered.
2013-12-30 13:25:59,075 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(slave1.hadoop:50010, storageID=, infoPort=50075, ipcPort=50020)
2013-12-30 13:25:59,189 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.ipc.RemoteException: java.io.IOException: verifyNodeRegistration: unknown datanode slave1.hadoop:50010
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyNodeRegistration(FSNamesystem.java:4743)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:2538)
at org.apache.hadoop.hdfs.server.namenode.NameNode.register(NameNode.java:1006)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
at org.apache.hadoop.ipc.Client.call(Client.java:1107)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy5.register(Unknown Source)
at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:740)
at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1549)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1609)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1734)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1751)
2013-12-30 13:25:59,191 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at slave1.hadoop/192.168.18.206
************************************************************/
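Everything in the log above is a normal startup until the RemoteException: the NameNode refuses the registration because it considers slave1.hadoop an "unknown datanode". One quick way to pull the decisive line out of a long DataNode log is a `grep -o` on the failure signature. The sketch below writes a small stand-in log inline so it is self-contained; in practice, point the grep at your real `hadoop-*-datanode-*.log` under `$HADOOP_HOME/logs`.

```shell
# Stand-in log file so this sketch runs anywhere; replace with your real
# hadoop-*-datanode-*.log under $HADOOP_HOME/logs.
LOGFILE="datanode-sample.log"
printf '%s\n' \
  'INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened data transfer server at 50010' \
  'ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: verifyNodeRegistration: unknown datanode slave1.hadoop:50010' \
  > "$LOGFILE"
# Extract which datanode was rejected at registration time:
grep -o 'unknown datanode [^ ]*' "$LOGFILE"
# prints: unknown datanode slave1.hadoop:50010
```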
3. Solution
Investigation showed that during an earlier decommissioning test, this node had been added to the exclude file. The configuration was as follows.
The following lines had been added to hdfs-site.xml:
<property>
  <name>dfs.hosts.exclude</name>
  <value>/usr/hadoop/conf/excludes</value>
  <description>Names a file that contains a list of hosts that are not permitted to connect to the namenode.
  The full pathname of the file must be specified. If the value is empty, no hosts are excluded.
  </description>
</property>
The excludes file contained the following entry:
slave1.hadoop
Simply emptying the excludes file and restarting fixed it: the DataNode then started normally.
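The recovery can be sketched as below. The file is emptied rather than deleted, since dfs.hosts.exclude still points at it. On a live cluster, `hadoop dfsadmin -refreshNodes` makes the NameNode re-read the exclude list without a full restart, and `hadoop-daemon.sh start datanode` starts just the missing daemon; those two steps are left commented out here because they need a running Hadoop 1.x cluster, and the real path would be /usr/hadoop/conf/excludes as configured above.

```shell
# Empty the exclude file. EXCLUDES defaults to a local stand-in so the
# sketch runs anywhere; on the real cluster use /usr/hadoop/conf/excludes.
EXCLUDES="${EXCLUDES:-excludes-sample}"
: > "$EXCLUDES"
wc -c < "$EXCLUDES"     # prints 0: no hosts are excluded any more
# On the namenode, re-read the include/exclude lists without restarting HDFS:
#   hadoop dfsadmin -refreshNodes
# On slave1.hadoop, start only the missing daemon:
#   hadoop-daemon.sh start datanode
```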