1. Problem
Start the distributed environment by running:
hadoop@master:${HADOOP_HOME}$ sbin/start-dfs.sh
Symptom: the NameNode and SecondaryNameNode start successfully on the master node, but the DataNode on the slave nodes fails to start.
Checking the log hadoop-hadoop-datanode-master.log shows the following:
2018-11-27 05:37:59,970 INFO org.apache.hadoop.http.HttpServer2: HttpServer.start() threw a non Bind IOException
java.net.BindException: Port in use: localhost:0
at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:919)
at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:856)
at org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.<init>(DatanodeHttpServer.java:104)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:760)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1112)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:429)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2374)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2261)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2308)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2485)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2509)
Caused by: java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:914)
... 10 more
2018-11-27 05:37:59,981 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2018-11-27 05:37:59,983 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at master/192.168.83.138
**********************************************************/
2. Log Analysis
Naturally, two key lines stand out in this log:
(1) java.net.BindException: Port in use: localhost:0
(2) Caused by: java.net.BindException: Cannot assign requested address
The first line is the symptom. It claims a port is in use, but the only details it gives are the host, localhost, and port 0; port 0 simply means the DataNode's HTTP server asked the OS for any free ephemeral port, so this line by itself does not tell us which port, if any, is actually occupied, nor what caused the failure.
The second line gives the underlying cause: the bind failed because the requested address could not be assigned, i.e. localhost itself could not be bound on this machine. At first glance this does not look much more useful, but it is the real clue: it points at hostname resolution rather than at a genuine port conflict.
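Before changing any configuration, it is worth confirming on the affected node that localhost really cannot be resolved. A quick check, assuming the standard getent and ping utilities are available:
[hadoop@master etc]$ getent hosts localhost
[hadoop@master etc]$ ping -c 1 localhost
If getent prints nothing and ping reports an unknown host, the loopback entries are missing from /etc/hosts, which is exactly the situation fixed in the next section.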
3. Solution
(1) Added the NameNode and DataNode directory settings, as follows (note that the dfs.namenode.name.dir and dfs.datanode.data.dir properties belong in hdfs-site.xml, while hadoop.tmp.dir belongs in core-site.xml):
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/software/hadoop-2.7.3/data/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/software/hadoop-2.7.3/data/data</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/software/hadoop-2.7.3/data/tmp</value>
</property>
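If you edit these configuration files on the master, remember to copy them to every slave node as well. A minimal sketch, assuming the same install path /home/hadoop/software/hadoop-2.7.3 on each node:
[hadoop@master hadoop-2.7.3]$ scp etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml slave1:/home/hadoop/software/hadoop-2.7.3/etc/hadoop/
[hadoop@master hadoop-2.7.3]$ scp etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml slave2:/home/hadoop/software/hadoop-2.7.3/etc/hadoop/
On its own, though, this step did not make the DataNode start; the real cause is described next.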
(2) The key point, and the real root cause: the /etc/hosts file.
Following some blog post read earlier, the supposedly redundant IP-to-hostname mappings had been deleted from the hosts file (at the time IPC communication was also failing, and the 127.0.0.1 and ::1 entries were removed), leaving it like this:
192.168.83.138 master
192.168.83.139 slave1
192.168.83.140 slave2
#127.0.0.1 localhost localhost.localdomain localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6.localdomain6
However, the file should actually look like this:
[hadoop@master etc]$ more hosts
192.168.83.138 master
192.168.83.139 slave1
192.168.83.140 slave2
127.0.0.1 localhost localhost.localdomain localhost4.localdomain4
::1 localhost localhost.localdomain localhost6.localdomain6
[hadoop@master etc]$
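The same loopback entries must be present on every node that runs a DataNode, not only on the master. A quick way to verify them across the cluster, assuming passwordless ssh between the nodes is already configured:
[hadoop@master etc]$ for h in master slave1 slave2; do ssh $h "grep localhost /etc/hosts"; done
Each host should print the 127.0.0.1 and ::1 lines shown above.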
With the hosts file restored, format the NameNode again (note that this erases any existing HDFS metadata, so it is only appropriate on a fresh or disposable cluster): bin/hdfs namenode -format
Restart HDFS: sbin/start-dfs.sh
It now starts successfully:
[hadoop@master etc]$ jps
10244 NodeManager
10132 ResourceManager
9704 NameNode
9978 SecondaryNameNode
11162 Jps
10830 DataNode
[hadoop@master etc]$
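As a further check that the DataNodes on the slave nodes have registered with the NameNode, hdfs dfsadmin -report lists the live DataNodes together with their capacity:
[hadoop@master hadoop-2.7.3]$ bin/hdfs dfsadmin -report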