1. DataNode times out connecting to the NameNode
Error in datanode.log:
2014-02-24 16:10:36,194 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (storage id unknown) service to node2/10.103.243.23:9010 starting to offer service
2014-02-24 16:10:36,230 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2014-02-24 16:10:36,234 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2014-02-24 16:10:37,373 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:38,374 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:39,375 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:40,376 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:41,377 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:42,379 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:43,380 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:44,381 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:45,383 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:46,384 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:9010. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:46,390 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: node2/10.103.243.23:9010
Error in nodemanager.log:
2014-02-24 16:10:55,925 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /node started at 8042
2014-02-24 16:10:56,678 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2014-02-24 16:10:56,734 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at node2/10.103.243.23:8031
2014-02-24 16:10:57,788 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:58,789 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:10:59,790 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:00,791 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:01,792 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:02,793 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:03,794 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:04,796 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:05,797 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:06,798 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-02-24 16:11:37,810 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/10.103.243.23:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Searching online turned up many similar reports, but none with a fix. Passwordless SSH login worked and the configuration files all checked out, which made the failure puzzling. The problem turned out to be the hosts file: on the master node, the entry for its own hostname must point to the machine's actual IP address, not 127.0.0.1. After changing every node's own hostname entry to its real IP, the cluster started successfully.
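The root cause can be checked mechanically: any non-localhost hostname mapped to a loopback address in the hosts file makes the daemon bind to 127.0.0.1, so remote nodes can never reach it. A minimal sketch of such a check, run here against a throwaway sample file (the path /tmp/hosts.sample and its contents are made up for illustration; on a real node you would point it at /etc/hosts):

```shell
# Build a sample hosts file that reproduces the bad configuration.
cat > /tmp/hosts.sample <<'EOF'
127.0.0.1       localhost
127.0.0.1       node2
10.103.243.23   node2
EOF

# Flag any hostname other than "localhost" that resolves to a loopback
# address -- exactly the entry that broke the cluster above.
awk '$1 ~ /^127\./ && $2 != "localhost" {print "SUSPECT:", $0}' /tmp/hosts.sample
```

The awk line prints the offending `127.0.0.1 node2` entry; the fix is to delete it and keep only the real-IP mapping.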
2. DataNode fails to start
Error in the log:
2006-01-01 23:19:23,737 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1873323460-127.0.1.1-1393249411040 (storage id DS-738355263-127.0.0.1-50010-1393191985983) service to node2/10.103.243.23:9010
java.io.IOException: Incompatible clusterIDs in /home/tseg637/hadoop-2.2.0/dfsdata/data: namenode clusterID = CID-ea0e9701-33a3-4f77-999f-a0da13b502f1; datanode clusterID = CID-9b4729d5-bb23-4609-9a85-8e6047efd956
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:837)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:808)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
        at java.lang.Thread.run(Thread.java:662)
An online search explained the cause: every run of hadoop namenode -format creates a new namenode (cluster) ID, while /home/tseg637/hadoop-2.2.0/dfsdata/data still holds the ID from the previous format. Formatting clears the data under the name directory but not the data directory, so the DataNode fails at startup. The remedy is to empty /home/tseg637/hadoop-2.2.0/dfsdata/data before each format; the DataNode then starts successfully.
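The mismatch can be seen directly in the VERSION files under each storage directory's current/ subdirectory. A minimal sketch of the comparison, using throwaway sample files under /tmp instead of the real dfsdata paths (the directory layout is simplified and the two clusterIDs are copied from the log above):

```shell
# Stand-in directories for the NameNode name dir and DataNode data dir.
mkdir -p /tmp/dfsdemo/name/current /tmp/dfsdemo/data/current
echo 'clusterID=CID-ea0e9701-33a3-4f77-999f-a0da13b502f1' > /tmp/dfsdemo/name/current/VERSION
echo 'clusterID=CID-9b4729d5-bb23-4609-9a85-8e6047efd956' > /tmp/dfsdemo/data/current/VERSION

# Compare the two clusterIDs the way the DataNode does at startup.
nn_id=$(grep '^clusterID=' /tmp/dfsdemo/name/current/VERSION)
dn_id=$(grep '^clusterID=' /tmp/dfsdemo/data/current/VERSION)
if [ "$nn_id" != "$dn_id" ]; then
    echo "Incompatible clusterIDs: clear the data dir before re-formatting"
    # On the real cluster this would be (destructive, so commented out):
    # rm -rf /home/tseg637/hadoop-2.2.0/dfsdata/data/*
fi
```

Instead of wiping the data directory, one can also copy the NameNode's clusterID into the DataNode's VERSION file, which preserves existing blocks; clearing the directory is the simpler option on a fresh cluster.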
3. The bundled wordcount example in Hadoop 2.2.0 fails
Running the bundled example hadoop-mapreduce-examples-2.2.0.jar fails with:
14/02/24 15:27:36 INFO mapreduce.Job: Job job_1393225741554_0003 failed with state FAILED due to: Application application_1393225741554_0003 failed 2 times due to Error launching appattempt_1393225741554_0003_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1393313243534 found 1393227455640
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
        at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)
. Failing the application.
This turned out to be clock skew between nodes: one node's date was simply wrong (the two millisecond timestamps in the error are roughly a day apart, far beyond the container token's lifetime). Bring the clocks into agreement, either by setting the time with date -s or by running ntpdate time-a.nist.gov to synchronize; once the times matched, the job ran fine.
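A rough sketch of the kind of skew check that would have caught this before submitting a job. In practice the reference timestamp would come from another node (e.g. ssh node2 date +%s); that call is stubbed out here with the local time so the snippet runs standalone, and the 30-second threshold is an arbitrary choice for illustration:

```shell
# Epoch seconds on this node.
local_ts=$(date +%s)

# Hypothetical reference clock; on a real cluster replace with e.g.:
#   ref_ts=$(ssh node2 date +%s)
ref_ts=$local_ts

# Absolute skew in seconds; complain if it exceeds the threshold.
skew=$(( local_ts - ref_ts ))
if [ "${skew#-}" -gt 30 ]; then
    echo "CLOCK SKEW: ${skew}s -- sync with ntpdate before running jobs"
else
    echo "clocks OK"
fi
```

With the stubbed reference the skew is zero, so this prints "clocks OK"; against a node whose date is a day off it would print the skew warning.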