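For how to configure a Hadoop cluster, see: Hadoop集群配置(最全面总结) (Hadoop Cluster Configuration: The Most Comprehensive Summary)
1. In HADOOP_HOME/conf/hadoop-env.sh, uncomment the line export HADOOP_PID_DIR=/var/hadoop/pids, or set a path of your own, for example export HADOOP_PID_DIR=${HADOOP_HOME}/pids. By default the PID files go into /tmp, and that directory is cleaned out from time to time. Once that happens, jps no longer shows the namenode process, and the stop scripts cannot find the daemons either.
2. Change the hadoop.tmp.dir property in core-site.xml, as well as the corresponding properties in hdfs-site.xml and mapred-site.xml. Never place them under /tmp.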
<property><name>hadoop.tmp.dir</name><value>/home/admin/hadoop/tmp</value></property>
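3. It is best not to install and run Hadoop as root, otherwise you may run into permission problems later. Make sure all files have the same owner.
4. The host names in /etc/hosts must match the ones used in core-site.xml and mapred-site.xml. Normally you should use the real host name, but in my attempts that did not work while localhost did, and I do not know why. One thing worth checking (an untested guess): the host name below, lenovo_3, contains an underscore, which is not a legal host-name character and which Java's URI parsing does not accept as a host. The /etc/hosts file looked like this: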
210.32.xxx.xx lenovo_3 # Added by NetworkManager
#127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost
#::1 lenovo_3 localhost6.localdomain6 localhost6
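The checks in items 1, 2 and 4 can be scripted. The following is only a sketch in POSIX shell: the helper names is_under_tmp, get_prop and lookup_host are made up for illustration and are not part of Hadoop, and get_prop extracts values with sed, which works for simple files like the ones shown here but is not a real XML parser.

```shell
# Return 0 if the path is /tmp itself or lies below /tmp, 1 otherwise.
is_under_tmp() {
    case "$1" in
        /tmp|/tmp/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the <value> that follows <name>$1</name> in the Hadoop XML file $2.
# Newlines are removed first so that name and value may sit on separate lines.
get_prop() {
    tr -d '\n' < "$2" |
        sed -n "s|.*<name>$1</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}

# Print the address that the hosts-format file $2 maps host name $1 to.
# Comments are stripped first, so commented-out entries are ignored.
lookup_host() {
    awk -v h="$1" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == h) { print $1; exit } }
    ' "$2"
}

# Example use (the paths are assumptions about your installation):
#   is_under_tmp "${HADOOP_PID_DIR:-/tmp}" && echo "PID dir is under /tmp"
#   is_under_tmp "$(get_prop hadoop.tmp.dir "$HADOOP_HOME/conf/core-site.xml")" \
#       && echo "hadoop.tmp.dir is under /tmp"
#   lookup_host localhost /etc/hosts
```

If lookup_host prints nothing for the host name used in core-site.xml, that name has no active entry in /etc/hosts.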
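Below are some example errors and their fixes, collected from elsewhere:
1. Running Hadoop under the root account (not the hadoop account)
The namenode fails to start, and the log reports: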
INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage /opt/data/hadoop/hdfs/name. The directory is already locked.
2013-08-06 09:54:45,052 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
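This happens because Hadoop was started once under the root account. Although it was interrupted with Ctrl+C, the namenode on the master had already been started as root. When you then switch to the hadoop account and start Hadoop again, the directory is still locked by the namenode process owned by root, so FSNamesystem initialization fails.
Fix: switch to root, list the running processes with jps, and kill them.
If Hadoop was started under root (assuming the account that owns Hadoop is not root) and you switch to the hadoop account after stopping it, you get Permission denied: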
2013-08-06 09:51:50,882 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.FileNotFoundException: /opt/data/hadoop/hdfs/name/in_use.lock (Permission denied)
This is because files written while running as root end up owned by root. Restore the ownership with chown -R hadoop:hadoop /opt/data/hadoop/hdfs/*
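Before running chown it can help to see which files actually changed owner. A minimal sketch, assuming the data directory and the hadoop user from the example above; list_foreign_owned is a hypothetical helper name:

```shell
# List every entry under directory $1 that is NOT owned by user $2.
# An empty result means the ownership is already consistent.
list_foreign_owned() {
    find "$1" ! -user "$2"
}

# Example use (directory and user taken from the log above):
#   list_foreign_owned /opt/data/hadoop/hdfs hadoop
```

Any path printed here, such as in_use.lock, is a candidate for the chown command above.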
Hadoop fails to start (1)
After running $ bin/start-all.sh, Hadoop does not start.
Exception 1
Exception in thread "main" java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.
localhost: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:214)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:135)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:119)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:481)
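Fix: this happens because the NameNode address has not been configured. The message points at fs.defaultFS (fs.default.name in older releases), which is still at its default of file:/// and therefore has no host and port. In version 0.21.0 the settings go in conf/mapred-site.xml; in earlier versions they go in core-site.xml. In 0.20.2, settings in mapred-site.xml have no effect, so only core-site.xml can be used.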
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9001</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Hadoop fails to start (2)
Exception 2
starting namenode, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-namenode-aist.out
localhost: starting datanode, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-datanode-aist.out
localhost: starting secondarynamenode, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-secondarynamenode-aist.out
localhost: Exception in thread "main" java.lang.NullPointerException
localhost: at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:134)
localhost: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:156)
localhost: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:160)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:131)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:115)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:469)
starting jobtracker, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-jobtracker-aist.out
localhost: starting tasktracker, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-tasktracker-aist.out
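Fix: as with exception 1, this happens because the NameNode address (fs.default.name) has not been configured. In version 0.21.0 the settings go in conf/mapred-site.xml; in earlier versions they go in core-site.xml. In 0.20.2, settings in mapred-site.xml have no effect, so only core-site.xml can be used.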
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9001</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Hadoop fails to start (3)
Exception 3
starting namenode, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-namenode-aist.out
localhost: starting datanode, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-datanode-aist.out
localhost: Error: JAVA_HOME is not set.
localhost: starting secondarynamenode, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-secondarynamenode-aist.out
localhost: Error: JAVA_HOME is not set.
starting jobtracker, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-jobtracker-aist.out
localhost: starting tasktracker, logging to /home/xixitie/hadoop/bin/../logs/hadoop-root-tasktracker-aist.out
localhost: Error: JAVA_HOME is not set.
Fix:
Set the JDK environment variables in $hadoop/conf/hadoop-env.sh:
JAVA_HOME=/home/xixitie/jdk
CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME CLASSPATH
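A quick way to confirm that the value is usable before starting the daemons is sketched below. check_java_home is a hypothetical helper, and the only assumption it makes about a JDK is that it contains an executable bin/java.

```shell
# Check that $1 is a non-empty path containing an executable bin/java.
# Prints a short diagnosis and returns 0 on success, 1 otherwise.
check_java_home() {
    if [ -z "$1" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$1/bin/java" ]; then
        echo "no executable found at $1/bin/java"
        return 1
    fi
    echo "JAVA_HOME looks usable: $1"
}

# Example use:
#   check_java_home "$JAVA_HOME"
```

Note that the start scripts log in to each node over ssh, so a JAVA_HOME exported only in your interactive shell may not be visible there. That is why the setting belongs in hadoop-env.sh.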
Hadoop fails to start (4)
Exception 4: the file system address is configured as localhost:9000 instead of hdfs://localhost:9000
The messages are as follows:
11/04/20 23:33:25 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/04/20 23:33:25 WARN fs.FileSystem: "localhost:9000" is a deprecated filesystem name. Use "hdfs://localhost:9000/" instead.
11/04/20 23:33:25 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
11/04/20 23:33:25 WARN fs.FileSystem: "localhost:9000" is a deprecated filesystem name. Use "hdfs://localhost:9000/" instead.
11/04/20 23:33:25 WARN fs.FileSystem: "localhost:9000" is a deprecated filesystem name. Use "hdfs://localhost:9000/" instead.
Fix:
In the configuration (mapred-site.xml in this note; fs.default.name normally belongs in core-site.xml), use hdfs://localhost:9000 rather than localhost:9000:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9001</value>
</property>
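The warning only concerns the missing scheme. As an illustration of the rule, the sketch below adds hdfs:// to a value that has no scheme and leaves other values alone. normalize_fs_uri is a hypothetical helper, not something Hadoop provides.

```shell
# Print $1 unchanged if it already has a scheme (contains "://"),
# otherwise print it with "hdfs://" in front.
normalize_fs_uri() {
    case "$1" in
        *://*) printf '%s\n' "$1" ;;
        *)     printf 'hdfs://%s\n' "$1" ;;
    esac
}

# Example use:
#   normalize_fs_uri localhost:9000
```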
Hadoop fails to start (5)
Exception 5: fixing the "no namenode to stop" problem
The messages are as follows:
11/04/20 21:48:50 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
11/04/20 21:48:51 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
11/04/20 21:48:52 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
11/04/20 21:48:53 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
11/04/20 21:48:54 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
11/04/20 21:48:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
11/04/20 21:48:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
11/04/20 21:48:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
11/04/20 21:48:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
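Fix:
This problem occurs because the namenode never started. As for why "no namenode to stop" is reported, data left over from earlier runs may be interfering with the namenode.
You need to reformat the namenode. Be aware that this erases the existing HDFS metadata:
$ bin/hadoop namenode -format    # if this still fails, try sudo
Then
$ bin/start-all.sh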
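Since formatting is destructive, it is worth confirming first that the namenode really is not running. A minimal sketch that reads jps-style output ("<pid> <class name>" per line) from standard input; daemon_running is a hypothetical helper. It compares the whole second field, so SecondaryNameNode is not mistaken for NameNode.

```shell
# Return 0 if a line of the jps-style listing on stdin names daemon $1
# exactly in its second field, 1 otherwise.
daemon_running() {
    awk -v d="$1" '$2 == d { found = 1 } END { exit (found ? 0 : 1) }'
}

# Example use:
#   if jps | daemon_running NameNode; then
#       echo "NameNode is running"
#   else
#       echo "NameNode is not running"
#   fi
```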
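Hadoop fails to start (6)
Exception 6: fixing the "no datanode to stop" problem
Sometimes the stored data gets into a bad state and the datanode cannot start (typically because the namespaceID kept by the datanode no longer matches a freshly formatted namenode).
Reformatting with hadoop namenode -format alone does not help, because the files in /tmp are not cleared.
You also have to remove the files under /tmp/hadoop*.
Steps:
1. First delete /tmp in HDFS (hadoop:///tmp)
hadoop fs -rmr /tmp
2. Stop Hadoop
stop-all.sh
3. Delete /tmp/hadoop*
rm -rf /tmp/hadoop*
4. Format the namenode
hadoop namenode -format
5. Start Hadoop
start-all.sh
After this, the datanode should start again.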
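The five steps above can be wrapped in one function. This is only a sketch: run and reset_hdfs are hypothetical names, and the DRY_RUN switch is an addition for safety, because the sequence destroys all data in HDFS. By default the commands are printed, not executed.

```shell
# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# WARNING: with DRY_RUN=0 this wipes HDFS and reformats the namenode.
# The steps run in order; a failing step does not stop the later ones.
reset_hdfs() {
    run hadoop fs -rmr /tmp               # 1. delete /tmp inside HDFS
    run stop-all.sh                       # 2. stop Hadoop
    run sh -c 'rm -rf /tmp/hadoop*'       # 3. delete local /tmp/hadoop* (glob expanded by sh)
    run hadoop namenode -format           # 4. format the namenode
    run start-all.sh                      # 5. start Hadoop
}

# Example use:
#   reset_hdfs              # show what would be run
#   DRY_RUN=0 reset_hdfs    # actually run it
```

If hadoop.tmp.dir has been moved out of /tmp as recommended at the top of this post, step 3 would have to target that directory instead.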
Exception 7: name node in safe mode
org.apache.hadoop.dfs.SafeModeException: Cannot delete /user/hadoop/input. Name node is in safe mode
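When the distributed file system starts, it begins in safe mode. While it is in safe mode, the contents of the file system can be neither modified nor deleted, and this lasts until safe mode ends. Safe mode exists mainly so that, at startup, the system can check the validity of the data blocks on each DataNode and replicate or delete blocks as the policy requires. Safe mode can also be entered at run time with a command. In practice, modifying or deleting files right after startup triggers this safe-mode error, and usually you only need to wait a little while.
Fix: bin/hadoop dfsadmin -safemode leave
This turns off Hadoop's safe mode. If it does not work, try sudo.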
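Instead of forcing safe mode off, a script can check the state first. The sketch below assumes that hadoop dfsadmin -safemode get reports the state as text containing "Safe mode is ON" or "Safe mode is OFF". safemode_is_on is a hypothetical helper that inspects such a status string.

```shell
# Return 0 if the status text $1 says safe mode is on, 1 otherwise.
safemode_is_on() {
    case "$1" in
        *"Safe mode is ON"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use:
#   status=$(bin/hadoop dfsadmin -safemode get)
#   if safemode_is_on "$status"; then
#       echo "still in safe mode, waiting..."
#       bin/hadoop dfsadmin -safemode wait   # blocks until safe mode ends
#   fi
```

Waiting lets the namenode finish its block checks, whereas -safemode leave skips them.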