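1. Background
To make sure that only trusted datanodes and tasktrackers can join the hadoop cluster, and so improve cluster security, the following configuration was added.
a. Add the datanode allow list to hdfs-site.xml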
<!-- security -->
<property>
<name>dfs.hosts</name>
<value>/data0/hadoop/hosts/include</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/data0/hadoop/hosts/exclude</value>
</property>
b. Add the task node allow list to mapred-site.xml
<!-- security -->
<property>
<name>mapred.hosts</name>
<value>/data0/hadoop/hosts/include</value>
</property>
<property>
<name>mapred.hosts.exclude</name>
<value>/data0/hadoop/hosts/exclude</value>
</property>
c. Add the IPs of the permitted machines to /data0/hadoop/hosts/include
10.1.55.60
10.1.55.61
10.1.55.62
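Before restarting any daemon, it can help to confirm which include and exclude files a site file actually points to. The following is a minimal sketch, not part of the original setup: the helper name `read_properties` is hypothetical, and the sample text assumes the properties sit inside the usual `<configuration>` root element of a hadoop site file.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> in a hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Sample content mirroring the mapred-site.xml properties shown above.
MAPRED_SITE = """<configuration>
  <property>
    <name>mapred.hosts</name>
    <value>/data0/hadoop/hosts/include</value>
  </property>
  <property>
    <name>mapred.hosts.exclude</name>
    <value>/data0/hadoop/hosts/exclude</value>
  </property>
</configuration>"""

props = read_properties(MAPRED_SITE)
print(props["mapred.hosts"])
print(props["mapred.hosts.exclude"])
```

In practice the text would be read from the real mapred-site.xml and hdfs-site.xml rather than from an inline string.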
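2. Problem
Restarting the namenode after this change worked fine. Some days later the jobtracker was restarted, and the monitoring page showed 0 nodes: the tasktracker process had exited on every node, and restarting it failed as well. The log showed the following: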
2012-10-11 09:38:59,335 INFO org.apache.hadoop.mapred.TaskTracker: Tasktracker disallowed by JobTracker.
2012-10-11 09:38:59,373 INFO org.apache.hadoop.util.AsyncDiskService: Shutting down all AsyncDiskService threads...
2012-10-11 09:38:59,376 INFO org.apache.hadoop.util.AsyncDiskService: All AsyncDiskService threads are terminated.
2012-10-11 09:38:59,376 INFO org.apache.hadoop.util.MRAsyncDiskService: Deleting toBeDeleted directory.
2012-10-11 09:38:59,377 INFO org.apache.hadoop.mapred.TaskTracker: Shutting down: Map-events fetcher for all reduce tasks ......
2012-10-11 09:38:59,377 INFO org.apache.hadoop.ipc.Server: Stopping server on 55338
2012-10-11 09:38:59,377 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 55338: exiting
2012-10-11 09:38:59,377 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 55338: exiting
2012-10-11 09:38:59,377 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 55338: exiting
2012-10-11 09:38:59,377 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 55338: exiting
2012-10-11 09:38:59,377 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 55338
2012-10-11 09:38:59,378 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2012-10-11 09:38:59,378 INFO org.apache.hadoop.mapred.TaskTracker: Shutting down StatusHttpServer
2012-10-11 09:38:59,378 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50060
2012-10-11 09:38:59,482 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
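3. Analysis
The log line "Tasktracker disallowed by JobTracker." shows that the jobtracker is refusing connections from the tasktrackers.
The nodes are already listed in the include file, so why are they rejected? The connections between the datanodes and the namenode, meanwhile, are fine.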
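The mismatch can be illustrated with a small model of an allow-list check. This is a simplified sketch under stated assumptions, not the jobtracker's actual code: it assumes that the check is an exact string match against the name the node is known by, and that an empty include list admits every node. The function names `parse_hosts` and `is_allowed` are hypothetical.

```python
def parse_hosts(text):
    """Collect whitespace-separated entries from an include/exclude file."""
    hosts = set()
    for line in text.splitlines():
        hosts.update(line.split())
    return hosts


def is_allowed(reported_name, include, exclude):
    """Model of the admission check (assumption: exact string comparison)."""
    # Assumption: an empty include list means every node is admitted.
    listed = not include or reported_name in include
    return listed and reported_name not in exclude


# Include list holding IP addresses only, as in the original setup.
ip_only = parse_hosts("10.1.55.60\n10.1.55.61\n10.1.55.62\n")
# The same list with hostnames added.
with_names = ip_only | parse_hosts("hadoop001\nhadoop002\nhadoop003\n")

# A node identified by hostname is rejected when only IPs are listed.
print(is_allowed("hadoop001", ip_only, set()))     # False
print(is_allowed("hadoop001", with_names, set()))  # True
```

Under these assumptions, a list of IP addresses never matches a node that is identified by its hostname, which is consistent with the symptom seen in the log.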
4. Solution
Searching Google for the keywords "Tasktracker disallowed by JobTracker" turned up the cause.
Reference: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201003.mbox/%3CC7D682B6.3E08%25awittenauer@linkedin.com%3E
The jobtracker validates a tasktracker by its hostname, not by its IP.
Change /data0/hadoop/hosts/include to:
10.1.55.60
10.1.55.61
10.1.55.62
10.1.55.63
hadoop001
hadoop002
hadoop003
After restarting the jobtracker and starting the tasktrackers, the problem was solved.
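To avoid forgetting one form or the other when nodes are added later, the include file can be generated from a single node table. The sketch below is illustrative only: `build_include` is a hypothetical helper, and the pairing of hostnames with IPs in `NODES` is an assumption, since the post does not state which hostname belongs to which address.

```python
def build_include(nodes):
    """Build include-file text from (hostname, ip) pairs: IPs first, then hostnames."""
    ips, names = [], []
    for hostname, ip in nodes:
        if ip not in ips:
            ips.append(ip)
        if hostname not in names:
            names.append(hostname)
    return "\n".join(ips + names) + "\n"


# Assumed hostname-to-IP pairing, for illustration only.
NODES = [
    ("hadoop001", "10.1.55.60"),
    ("hadoop002", "10.1.55.61"),
    ("hadoop003", "10.1.55.62"),
]

include_text = build_include(NODES)
print(include_text, end="")
```

Listing both forms keeps one file usable for the datanode list and the tasktracker list, as in the configuration above.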