Hadoop DataNode communication failure [error when starting a Hadoop cluster: org.apache.hadoop.ipc.Client: Retrying connect to server]

buster2014


Source: http://anyoneking.com/archives/594

Hadoop DataNode communication failure

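A few days ago our Hadoop cluster became very unstable: one data node would frequently drop out. Checking with jps showed that both the TaskTracker and the DataNode processes were still running and had not crashed. The log showed: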
org.apache.hadoop.ipc.Client: Retrying connect to server: xxxxx/192.168.0.xxxx:9001. Already tried 9 time(s).
org.apache.hadoop.ipc.Client: Retrying connect to server: xxxxx/192.168.0.xxxx:9001. Already tried 8 time(s).
org.apache.hadoop.ipc.Client: Retrying connect to server: xxxxx/192.168.0.xxxx:9001. Already tried 7 time(s).
org.apache.hadoop.ipc.Client: Retrying connect to server: xxxxx/192.168.0.xxxx:9001. Already tried 6 time(s).
org.apache.hadoop.ipc.Client: Retrying connect to server: xxxxx/192.168.0.xxxx:9001. Already tried 5 time(s).
In other words, the node could not communicate with the NameNode.
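When the log is long, it can help to condense these retry messages. The following is a minimal sketch, not part of the original post: the function name `summarize_retries` and the sample host `nn1/192.168.0.10` are hypothetical, and the pattern assumes the message format shown in the excerpt above.

```python
import re

# Pattern for the retry message format shown in the log excerpt above.
RETRY_RE = re.compile(
    r"Retrying connect to server: (?P<server>\S+?):(?P<port>\d+)\. "
    r"Already tried (?P<tries>\d+) time\(s\)"
)


def summarize_retries(lines):
    """Return {(server, port): highest retry count seen} for matching lines."""
    summary = {}
    for line in lines:
        match = RETRY_RE.search(line)
        if match is None:
            continue  # not a retry message
        key = (match.group("server"), int(match.group("port")))
        tries = int(match.group("tries"))
        summary[key] = max(summary.get(key, 0), tries)
    return summary


if __name__ == "__main__":
    # Hypothetical sample lines; real input would come from the daemon log file.
    sample = [
        "org.apache.hadoop.ipc.Client: Retrying connect to server: "
        "nn1/192.168.0.10:9001. Already tried 5 time(s).",
        "org.apache.hadoop.ipc.Client: Retrying connect to server: "
        "nn1/192.168.0.10:9001. Already tried 9 time(s).",
    ]
    print(summarize_retries(sample))
```

A server that repeatedly reaches a high retry count is the one the node cannot reach.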
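From the cluster's point of view, nothing had been changed recently.
First I stopped the data node with hadoop-daemon.sh stop datanode and hadoop-daemon.sh stop tasktracker.
Then I started it again with hadoop-daemon.sh start datanode and hadoop-daemon.sh start tasktracker.
Both steps completed normally, with no error messages.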
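The stop/start sequence above could be wrapped in a small helper so it runs in the same order every time. This is only a sketch under assumptions: `restart_commands` and `restart_node` are hypothetical names, and it assumes hadoop-daemon.sh is on the PATH of the user running it.

```python
import subprocess

# Daemons restarted in the post, in the order they were stopped and started.
DAEMONS = ("datanode", "tasktracker")


def restart_commands(script="hadoop-daemon.sh"):
    """Build the stop commands followed by the start commands."""
    commands = [[script, "stop", daemon] for daemon in DAEMONS]
    commands += [[script, "start", daemon] for daemon in DAEMONS]
    return commands


def restart_node(run=subprocess.check_call):
    """Run each command in order; check_call raises if one exits non-zero."""
    for command in restart_commands():
        run(command)
```

Passing a different `run` callable (for example `print`) gives a dry run that only shows the commands.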
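But after running for a while, or after one or two MR jobs, the load on the server hosting that data node began to spike.
After that it could no longer communicate with the NameNode.
So I checked things one at a time.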
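One basic check, not described in the original post, is whether a plain TCP connection to the address from the log succeeds at all. A minimal sketch, with `can_connect` as a hypothetical helper name and the host and port to be taken from your own log:

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

If this returns False from the data node but True from another machine, the problem is more likely local to that node (name resolution, load, firewall) than to the server it is trying to reach.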
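The node configuration and the HDFS settings showed nothing abnormal. While checking the server configuration, though, I noticed something odd.
An entry had been added to /etc/hosts:
127.0.1.1 xxxxxx
127.0.1.1 is a loopback address that Debian uses to map the machine's own hostname. It caused Hadoop to resolve the hostname incorrectly, and nobody knew who had added the entry.
After commenting out the entry the problem persisted, so the only option was to reboot the server. After the reboot everything was normal.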
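This kind of entry is easy to scan for automatically. The sketch below is an illustration, not the author's tooling: `loopback_aliases` is a hypothetical name, `node07` in the sample is a made-up hostname, and it treats any name that starts with "localhost" as expected on a loopback address.

```python
import ipaddress


def loopback_aliases(hosts_text):
    """Return (ip, name) pairs where a non-localhost name maps to 127.0.0.0/8."""
    findings = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        ip_text, names = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(ip_text)
        except ValueError:
            continue  # skip malformed lines
        if ip.version != 4 or not ip.is_loopback:
            continue
        for name in names:
            if not name.startswith("localhost"):
                findings.append((ip_text, name))
    return findings


if __name__ == "__main__":
    with open("/etc/hosts") as handle:
        for ip_text, name in loopback_aliases(handle.read()):
            print(f"{name} resolves to loopback address {ip_text}")
```

Any pair it reports means the listed hostname resolves to the local machine only, which is the situation described above.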

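The takeaway is that keeping the Hadoop cluster's environment clean matters. This also gave me useful experience for diagnosing Hadoop cluster failures in the future. A Hadoop cluster's configuration rarely changes much, but Hadoop depends heavily on the server environment, so checking whether the server environment has changed is a good place to start troubleshooting. Noting it here for reference.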
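One way to make "has the server environment changed?" answerable is to keep checksums of the files that matter and compare them later. This is a sketch of that idea under assumptions, not something the original post did: `snapshot` and `changed_files` are hypothetical names, and which files to track (for example /etc/hosts) is up to you.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each path to the SHA-256 hex digest of its contents (None if missing)."""
    result = {}
    for entry in paths:
        path = Path(entry)
        if path.is_file():
            result[str(entry)] = hashlib.sha256(path.read_bytes()).hexdigest()
        else:
            result[str(entry)] = None
    return result


def changed_files(old, new):
    """Return the sorted paths whose digest differs between two snapshots."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

Saving the result of `snapshot(["/etc/hosts"])` while the cluster is healthy and comparing it with a fresh snapshot when a node misbehaves would have flagged the added entry immediately.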

Permalink: http://anyoneking.com/archives/603 | 懒散狂徒的博客
Tags: error, hadoop, Retrying connect to server

Originally published: 2013-04-21 16:37 | By: 懒散狂徒 | Hadoop