Test environment: Fedora 20; master host: Master.Hadoop, IP 192.168.1.105
Virtual machines: Slave1.Hadoop: 192.168.1.3
Slave2.Hadoop: 192.168.1.4
Slave3.Hadoop: 192.168.1.5
Hadoop version: Hadoop 0.20.2
Java version: JDK 1.7.0_51
Running the following command produced this error:
[hadoop@Master ~]$hadoop dfsadmin -report
report: org.apache.hadoop.security.AccessControlException: Superuser privilege is required
[hadoop@Master ~]$stop-all.sh
no jobtracker to stop
192.168.1.4: no tasktracker to stop
192.168.1.3: no tasktracker to stop
192.168.1.5: no tasktracker to stop
no namenode to stop
192.168.1.3: no datanode to stop
192.168.1.4: no datanode to stop
192.168.1.5: no datanode to stop
192.168.1.105: stopping secondarynamenode
Only the secondarynamenode was actually stopped.
On inspection, I found that my hdfs-site.xml was misconfigured. The correct version is:
<configuration>
<property>
<name>dfs.replication</name>
<!-- I have 3 slave nodes -->
<value>3</value>
</property>
</configuration>
Then run the following commands:
[hadoop@Master ~]$ rm -rf /usr/hadoop/tmp
[hadoop@Master ~]$ mkdir /usr/hadoop/tmp
[hadoop@Master ~]$ rm -rf /tmp/hadoop*
Perform the same operations on each data node.
Cause: every `hadoop namenode -format` creates a new namespaceID, while tmp/dfs/data still holds the ID from the previous format. Formatting clears the NameNode's data but does not clear the DataNodes' data, so startup fails. The fix is to clear everything under tmp before each format.
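The mismatch can be seen directly by comparing the `namespaceID` lines in the NameNode's and a DataNode's VERSION files. The sketch below only simulates the comparison with made-up IDs under a scratch directory (`/tmp/nsid-demo` is not a real Hadoop path); on a real cluster the files live under `dfs.name.dir` and `dfs.data.dir`, e.g. `/usr/hadoop/tmp/dfs/name/current/VERSION` and `/usr/hadoop/tmp/dfs/data/current/VERSION`:

```shell
# Simulate a NameNode VERSION file (fresh format) and a DataNode VERSION
# file holding a stale, pre-format namespaceID. The IDs are hypothetical.
mkdir -p /tmp/nsid-demo/name /tmp/nsid-demo/data
echo "namespaceID=1498231904" > /tmp/nsid-demo/name/VERSION   # fresh format
echo "namespaceID=942370193" > /tmp/nsid-demo/data/VERSION    # stale ID

# Extract and compare the two IDs, as you would on real VERSION files.
nn_id=$(cut -d= -f2 /tmp/nsid-demo/name/VERSION)
dn_id=$(cut -d= -f2 /tmp/nsid-demo/data/VERSION)
if [ "$nn_id" != "$dn_id" ]; then
  echo "namespaceID mismatch: this DataNode will fail to start"
fi
```

On a real cluster, editing the DataNode's VERSION file so its namespaceID matches the NameNode's is a less destructive alternative to wiping tmp, though on a fresh cluster with no data to preserve, deleting is simpler.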
Next:
[hadoop@Master ~]$ hadoop namenode -format
[hadoop@Master ~]$ start-all.sh
starting namenode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-namenode-Master.Hadoop.out
192.168.1.4: starting datanode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-datanode-Slave2.Hadoop.out
192.168.1.3: starting datanode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-datanode-Slave1.Hadoop.out
192.168.1.5: starting datanode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-datanode-Slave3.Hadoop.out
192.168.1.105: starting secondarynamenode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-Master.Hadoop.out
starting jobtracker, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-jobtracker-Master.Hadoop.out
192.168.1.5: starting tasktracker, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-tasktracker-Slave3.Hadoop.out
192.168.1.3: starting tasktracker, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-tasktracker-Slave1.Hadoop.out
192.168.1.4: starting tasktracker, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-tasktracker-Slave2.Hadoop.out
Then another problem appeared, as shown below:
[hadoop@Master current]$jps
4452 Jps
4315 JobTracker
4019 NameNode
4191 SecondaryNameNode
[hadoop@Master current]$ stop-all.sh
stopping jobtracker
192.168.1.4: no tasktracker to stop
192.168.1.3: no tasktracker to stop
192.168.1.5: no tasktracker to stop
stopping namenode
192.168.1.4: no datanode to stop
192.168.1.3: no datanode to stop
192.168.1.5: no datanode to stop
192.168.1.105: stopping secondarynamenode
Unable to find the cause, I checked the logs and found the following exception:
2014-02-05 13:46:52,237 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000, call addBlock(/usr/hadoop/tmp/mapred/system/jobtracker.info, DFSClient_-729425152) from 192.168.1.105:39751: error: java.io.IOException: File /usr/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /usr/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
2014-02-05 13:46:52,246 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=hadoop,hadoop ip=/192.168.1.105 cmd=delete src=/usr/hadoop/tmp/mapred/system/jobtracker.info dst=null perm=null
From material collected online, the suggestion was to put the master's IP (in my case 192.168.1.105) into both the "masters" and "slaves" files; this was supposed to solve the problem. After reformatting, all 4 nodes did start up.
As follows:
[hadoop@Master ~]$ more /usr/hadoop/conf/masters
192.168.1.105
[hadoop@Master ~]$ more /usr/hadoop/conf/slaves
192.168.1.105 // note: this line must not be omitted
192.168.1.3
192.168.1.4
192.168.1.5
Then:
Format the NameNode:
[hadoop@Master ~]$ hadoop namenode -format
Start the cluster:
[hadoop@Master ~]$start-all.sh
starting namenode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-namenode-Master.Hadoop.out
192.168.1.105: starting datanode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-datanode-Master.Hadoop.out
192.168.1.3: starting datanode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-datanode-Slave1.Hadoop.out
192.168.1.5: starting datanode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-datanode-Slave3.Hadoop.out
192.168.1.4: starting datanode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-datanode-Slave2.Hadoop.out
192.168.1.105: starting secondarynamenode, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-Master.Hadoop.out
starting jobtracker, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-jobtracker-Master.Hadoop.out
192.168.1.105: starting tasktracker, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-tasktracker-Master.Hadoop.out
192.168.1.4: starting tasktracker, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-tasktracker-Slave2.Hadoop.out
192.168.1.5: starting tasktracker, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-tasktracker-Slave3.Hadoop.out
192.168.1.3: starting tasktracker, logging to /usr/hadoop/bin/../logs/hadoop-hadoop-tasktracker-Slave1.Hadoop.out
[hadoop@Master ~]$ jps
5227 DataNode
5558 JobTracker
5727 TaskTracker
5880 Jps
5108 NameNode
5440 SecondaryNameNode
[hadoop@Master ~]$ hadoop dfsadmin -report
Configured Capacity: 25668202496 (23.91 GB)
Present Capacity: 20160196623 (18.78 GB)
DFS Remaining: 20160172032 (18.78 GB)
DFS Used: 24591 (24.01 KB)
DFS Used%: 0%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Name: 192.168.1.105:50010
Decommission Status : Normal
Configured Capacity: 25668202496 (23.91 GB)
DFS Used: 24591 (24.01 KB)
Non DFS Used: 5508005873 (5.13 GB)
DFS Remaining: 20160172032(18.78 GB)
DFS Used%: 0%
DFS Remaining%: 78.54%
Last contact: Wed Feb 05 13:54:11 CST 2014
But stopping the cluster still failed, as follows:
[hadoop@Master ~]$stop-all.sh
stopping jobtracker
192.168.1.4: no tasktracker to stop
192.168.1.3: no tasktracker to stop
192.168.1.5: no tasktracker to stop
stopping namenode
192.168.1.3: no datanode to stop
192.168.1.4: no datanode to stop
192.168.1.5: no datanode to stop
192.168.1.105: stopping secondarynamenode
This proves that adding the master's IP to "slaves" was not the right fix, so I reverted that change. Running jps on the slave machines showed that the datanode and tasktracker processes were being killed automatically after a short while.
Solution:
Turn off SELinux.
Then turn off the firewall.
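On Fedora 20, a rough sketch of both steps looks like the following (run as root; Fedora uses systemd and firewalld, which differs from older SysV-init distributions):

```shell
# Put SELinux into permissive mode for the current boot:
setenforce 0
# Persist across reboots by setting SELINUX=permissive in the config file:
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
# Stop the firewall now and keep it off after reboot (Fedora 20 ships firewalld):
systemctl stop firewalld
systemctl disable firewalld
```

Disabling the firewall entirely is the blunt fix; opening just the Hadoop ports (e.g. 9000 and the 50010/50030/50070 range) would be the more careful alternative.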
Note:
1. Fedora handles some of this differently from other Linux distributions.
2. Then run the same cleanup commands as before:
[hadoop@Master ~]$ rm -rf /usr/hadoop/tmp
[hadoop@Master ~]$ mkdir /usr/hadoop/tmp
[hadoop@Master ~]$ rm -rf /tmp/hadoop*
[hadoop@Slave ~]$ rm -rf /usr/hadoop/tmp
[hadoop@Slave ~]$ mkdir /usr/hadoop/tmp
[hadoop@Slave ~]$ rm -rf /tmp/hadoop*
3. Reboot the machines.