Fixing Hadoop's "mapred.ReduceTask: java.net.ConnectException: Connection timed out" error

  One node in the cluster, node 91, developed a fault, and the following appeared in its log:

2013-11-08 08:32:13,908 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201311061017_18902_r_000000_0 copy failed: attempt_201311061017_18902_m_000003_0 from node-192
2013-11-08 08:32:13,921 WARN org.apache.hadoop.mapred.ReduceTask: java.net.ConnectException: Connection timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
	at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
	at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
	at java.net.SocksSocketImpl.connect(Unknown Source)
	at java.net.Socket.connect(Unknown Source)
	at sun.net.NetworkClient.doConnect(Unknown Source)
	at sun.net.www.http.HttpClient.openServer(Unknown Source)
	at sun.net.www.http.HttpClient.openServer(Unknown Source)
	at sun.net.www.http.HttpClient.<init>(Unknown Source)
	at sun.net.www.http.HttpClient.New(Unknown Source)
	at sun.net.www.http.HttpClient.New(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1631)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1588)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1488)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1399)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1331)

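Looking at the Hadoop code, the TaskTracker picks the hostname it reports as follows. Reducers fetch map output over HTTP from that hostname, so a name that resolves to the wrong address makes the copy time out: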

    localFs = FileSystem.getLocal(fConf);
    // An explicitly configured slave.host.name takes precedence.
    if (fConf.get("slave.host.name") != null) {
      this.localHostname = fConf.get("slave.host.name");
    }
    // Otherwise derive the hostname from the configured DNS interface and nameserver.
    if (localHostname == null) {
      this.localHostname =
        DNS.getDefaultHost
        (fConf.get("mapred.tasktracker.dns.interface","default"),
         fConf.get("mapred.tasktracker.dns.nameserver","default"));
    }
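
The selection order in the excerpt can be tried outside Hadoop with a small sketch. This is only an illustration under assumptions: a plain `Map` stands in for the job configuration, `InetAddress.getLocalHost()` stands in for Hadoop's `DNS.getDefaultHost(...)`, and the class and method names (`LocalHostnameSketch`, `pickLocalHostname`) are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;

/** Illustration of the hostname selection order; not Hadoop's actual class. */
final class LocalHostnameSketch {

    /**
     * Returns the configured "slave.host.name" if present; otherwise falls
     * back to the name the JVM reports for the local host.
     */
    static String pickLocalHostname(Map<String, String> conf) {
        String configured = conf.get("slave.host.name");
        if (configured != null) {
            return configured;
        }
        try {
            // Assumption: stand-in for DNS.getDefaultHost(interface, nameserver).
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            throw new IllegalStateException("cannot determine local hostname", e);
        }
    }

    public static void main(String[] args) {
        // With the key set, the configured value is used as-is.
        System.out.println(pickLocalHostname(Map.of("slave.host.name", "node-191")));
        // Without it, the result depends on this machine's own name resolution.
        System.out.println(pickLocalHostname(Map.of()));
    }
}
```

Either way, the chosen name is only useful if the other nodes can resolve it to the right address, which is what the rest of this post checks.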


Ping that hostname from the node:

ping node-191
PING node-128-191.localhost (220.250.64.228) 56(84) bytes of data.
64 bytes from 220.250.64.228: icmp_seq=1 ttl=247 time=14.8 ms
64 bytes from 220.250.64.228: icmp_seq=2 ttl=247 time=14.3 ms
64 bytes from 220.250.64.228: icmp_seq=3 ttl=247 time=14.4 ms

The address that comes back is not 191's IP at all.
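
The same check can be scripted so it is easy to repeat on every node. The following is a minimal sketch, assuming the expected IP is known to the operator; the class and method names (`ResolveCheck`, `resolvesTo`) are hypothetical. It asks the JVM resolver, which normally consults the hosts file and DNS as the operating system is configured.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Checks whether a hostname resolves to the IP address we expect. */
final class ResolveCheck {

    /** True if any address returned for host equals expectedIp. */
    static boolean resolvesTo(String host, String expectedIp) {
        try {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                if (addr.getHostAddress().equals(expectedIp)) {
                    return true;
                }
            }
        } catch (UnknownHostException e) {
            // An unresolvable name cannot match the expected IP.
        }
        return false;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: ResolveCheck <hostname> <expected-ip>");
            return;
        }
        System.out.println(args[0] + " resolves to " + args[1] + ": "
                + resolvesTo(args[0], args[1]));
    }
}
```

Run it with the hostname and the IP the node is supposed to have; a `false` result on any node points at that node's name resolution.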

Checking the hosts file on that node showed that 191's hostname was not configured there either.
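
Looking for a missing hosts entry can also be automated. This is a rough sketch, assuming the usual hosts-file layout (an IP followed by one or more names, with `#` starting a comment); `HostsLookup` and `findIp` are hypothetical names, and the sample addresses in the comments are documentation-range placeholders, not this cluster's real IPs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

/** Looks up a hostname in hosts-file style lines. */
final class HostsLookup {

    /** Returns the IP mapped to hostname, or empty if no line lists it. */
    static Optional<String> findIp(List<String> lines, String hostname) {
        for (String raw : lines) {
            // Drop trailing comments, e.g. "192.0.2.10 node-190 # rack 1".
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\s+");
            // fields[0] is the IP; the remaining fields are names and aliases.
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(hostname)) {
                    return Optional.of(fields[0]);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        String hostname = args.length > 0 ? args[0] : "node-191";
        List<String> lines = Files.readAllLines(Paths.get("/etc/hosts"));
        Optional<String> ip = findIp(lines, hostname);
        System.out.println(ip.isPresent()
                ? hostname + " -> " + ip.get()
                : hostname + " has no entry in /etc/hosts");
    }
}
```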

That explains the problem.

Add 191's hostname to the hosts file on every node in the cluster, then restart the TaskTracker. That fixes it.
