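Problem
After starting the HDFS and YARN clusters from the master node, I checked which daemons were running on each node with jps. Everything on the master node started successfully, but nothing started on the slave nodes.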
[root@master-node ~]# jps
6432 Jps
5427 NameNode
5811 ResourceManager
5531 DataNode
5917 NodeManager
[root@slave-node1 ~]# jps
2244 Jps
[root@slave-node2 ~]# jps
1818 Jps
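Cause
The DataNode log on the slave node reported a hostname resolution error. Checking the hostname mapping file /etc/hosts showed that only the master node had entries for the cluster hosts; the slave nodes had none.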
[root@slave-node1 logs]# head -100 hadoop-root-datanode-slave-node1.log
[root@slave-node1 ~]# vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
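The same check can be scripted rather than done by eye in an editor. The following is a minimal sketch, not part of the original procedure: `missing_hosts` is a hypothetical helper name, and it only inspects the file it is given (it does not query DNS).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every hostname that has no entry in a
# hosts-format file (IP address in column 1, names in the columns after it).
# Usage: missing_hosts <hosts-file> <hostname>...
missing_hosts() {
    local hosts_file=$1
    shift
    local name
    for name in "$@"; do
        # Skip comment lines, then look for an exact match in columns 2..NF.
        if ! awk -v n="$name" '
                /^[[:space:]]*#/ { next }
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }' "$hosts_file"; then
            echo "$name"
        fi
    done
}

# Example, to be run on each node:
# missing_hosts /etc/hosts master-node slave-node1 slave-node2
```

On a slave node whose /etc/hosts contains only the two localhost lines shown above, the example call would list all three cluster hostnames as missing.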
Copy the master node's /etc/hosts file to every slave node in the cluster with scp.
[root@master-node ~]# vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.159.10 master-node
192.168.159.11 slave-node1
192.168.159.12 slave-node2
[root@master-node ~]# scp /etc/hosts root@slave-node1:/etc/hosts
[root@master-node ~]# scp /etc/hosts root@slave-node2:/etc/hosts
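With more than a couple of slave nodes, the scp commands above could be wrapped in a loop. This is a rough sketch under a few assumptions: `push_hosts` is a hypothetical function name, root SSH access to each node works as in the commands above, and overwriting the remote /etc/hosts wholesale is acceptable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a hosts file to /etc/hosts on every listed node.
# Returns non-zero if at least one copy fails.
# Usage: push_hosts <local-hosts-file> <node>...
push_hosts() {
    local src=$1
    shift
    local node rc=0
    for node in "$@"; do
        if scp "$src" "root@${node}:/etc/hosts"; then
            echo "updated $node"
        else
            # Keep going so that one unreachable node does not block the rest.
            echo "FAILED $node" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example, to be run on master-node:
# push_hosts /etc/hosts slave-node1 slave-node2
```

The function keeps going after a failure so that the remaining nodes are still updated, and reports the overall result through its exit status.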
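Solution
Stop the cluster, start it again, and check which daemons are running on each node. The fix has taken effect: the daemons on the slave nodes now start.
master-node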
[root@master-node ~]# cd /opt/module/hadoop-2.7.4/sbin/
[root@master-node sbin]# start-dfs.sh
[root@master-node sbin]# start-yarn.sh
[root@master-node sbin]# jps
5427 NameNode
5811 ResourceManager
5531 DataNode
5917 NodeManager
6685 Jps
slave-node1
[root@slave-node1 ~]# jps
1922 DataNode
1994 SecondaryNameNode
2286 Jps
slave-node2
[root@slave-node2 ~]# jps
1586 DataNode
1818 Jps
1679 NodeManager
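Reading jps output by eye gets tedious as the cluster grows, so the check can be automated. The sketch below is an illustration only: `missing_daemons` is a hypothetical helper that reads jps output on standard input and prints each expected daemon that is absent. Which daemons to expect on which node depends on your cluster layout and has to be supplied by you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon name that does not appear in it.
# Usage: jps | missing_daemons <DaemonName>...
missing_daemons() {
    local jps_output name
    jps_output=$(cat)
    for name in "$@"; do
        # jps prints "<pid> <ClassName>"; compare the second column exactly.
        if ! printf '%s\n' "$jps_output" |
                awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            echo "$name"
        fi
    done
}

# Example, to be run on slave-node2:
# jps | missing_daemons DataNode NodeManager
```

For the slave-node2 output shown above, the example call prints nothing, because both DataNode and NodeManager are running.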