reduce端缓存数据过多出现FGC,导致reduce生成的数据无法写到hdfs

某搜索推荐job在执行过程中,由于最后一个reduce任务的JVM老年代耗尽,频繁执行FGC,造成服务暂停,进而导致dfsclient超时无法写入HDFS。异常表现为SocketTimeoutException,分析可能原因包括:DN线程耗尽、网络带宽问题、dfsclient故障(本次因reduce OOM)。排除带宽问题后,确认是reduce内存溢出引起的dfsclient挂起。
摘要由CSDN通过智能技术生成

     搜索推荐有一个job,1000多个map,200个reduce,运行到最后只剩一个reduce(10.39.6.130上)的时候,出现以下异常,导致job失败:

2014-12-04 15:49:04,297 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 12 segments left of total size: 11503294914 bytes
2014-12-04 15:49:04,314 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2014-12-04 15:49:04,394 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.lzo_deflate]
2014-12-04 16:02:26,889 WARN [ResponseProcessor for block BP-1386326728-10.39.2.131-1382089338395:blk_1394149732_320469086] org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-1386326728-10.39.2.131-1382089338395:blk_1394149732_320469086
java.io.IOException: Bad response ERROR for block BP-1386326728-10.39.2.131-1382089338395:blk_1394149732_320469086 from datanode 10.39.5.193:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:819)
2014-12-04 16:02:26,889 WARN [ResponseProcessor for block BP-1386326728-10.39.2.131-1382089338395:blk_1394153869_320473223] org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-1386326728-10.39.2.131-1382089338395:blk_1394153869_320473223
java.io.IOException: Bad response ERROR for block BP-1386326728-10.39.2.131-1382089338395:blk_1394153869_320473223 from datanode 10.39.1.90:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:819)
2014-12-04 16:02:26,891 WARN [DataStreamer for file /dw_ext/recmd/mds6/mds_filter_relation_10/20141203/_temporary/1/_temporary/attempt_1415948652989_195149_r_000158_3/user-r-00158 block BP-1386326728-10.39.2.131-1382089338395:blk_1394149732_320469086] org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-1386326728-10.39.2.131-1382089338395:blk_1394149732_320469086 in pipeline 10.39.6.130:50010, 10.39.5.185:50010, 10.39.5.193:50010: bad datanode 10.39.5.193:50010
2014-12-04 16:02:26,891 WARN [DataStreamer for file /dw_ext/recmd/mds6/mds_filter_relation_10/20141203/_temporary/1/_temporary/attempt_1415948652989_195149_r_000158_3/exposure-r-00158 block BP-1386326728-10.39.2.131-1382089338395:blk_1394153869_320473223] org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-1386326728-10.39.2.131-1382089338395:blk_1394153869_320473223 in pipeline 10.39.6.130:50010, 10.39.1.89:50010, 10.39.1.90:50010: bad datanode 10.39.1.90:50010
java.io.EOFException: Premature EOF: no length prefix available
	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1987)
	at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:796)

2014-12-04 16:05:23,743 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2734)
	at java.util.Vector.ensureCapacityHelper(Vector.java:226)
	at java.util.Vector.add(Vector.java:728)
	at rec.CommonUtil.pack_Treeset(CommonUtil.java:395)
	at rec.ConvertExposure10$MyReducer.collect_exposure(ConvertExposure10.java:259)
	at rec.ConvertExposure10$MyReducer.reduce(ConvertExposure10.java:329)
	at rec.ConvertExposure10$MyReducer.reduce(ConvertExposure10.java:234)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1550)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
从异常上看,首先是reduce在往hdfs写数据时,发现建pipeline时,没有收到pipeline上最后一个节点的回应:
2014-12-04 16:02:26,889 WARN [ResponseProcessor for block BP-1386326728-10.39.2.131-1382089338395:blk_1394149732_320469086] org.apache.hadoop.hdfs.DF
  • 2
    点赞
  • 9
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值