[root@localhost local_input]# hadoop jar MyWordCount.jar MyWordCount input output4
11/12/07 16:03:29 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/12/07 16:03:30 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
11/12/07 16:03:30 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/12/07 16:03:30 INFO input.FileInputFormat: Total input paths to process : 7
11/12/07 16:03:30 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
11/12/07 16:03:30 INFO mapreduce.JobSubmitter: number of splits:7
11/12/07 16:03:30 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
11/12/07 16:03:30 INFO mapreduce.Job: Running job: job_201112071307_0003
11/12/07 16:03:31 INFO mapreduce.Job: map 0% reduce 0%
11/12/07 16:03:45 INFO mapreduce.Job: map 28% reduce 0%
11/12/07 16:03:46 INFO mapreduce.Job: map 57% reduce 0%
11/12/07 16:03:52 INFO mapreduce.Job: map 71% reduce 0%
11/12/07 16:03:54 INFO mapreduce.Job: map 71% reduce 9%
11/12/07 16:03:57 INFO mapreduce.Job: map 100% reduce 9%
11/12/07 16:04:35 INFO mapreduce.Job: Task Id : attempt_201112071307_0003_m_000003_0, Status : FAILED
Too many fetch-failures
11/12/07 16:04:35 WARN mapreduce.Job: Error reading task outputConnection refused
11/12/07 16:04:35 WARN mapreduce.Job: Error reading task outputConnection refused
11/12/07 16:04:39 INFO mapreduce.Job: map 85% reduce 9%
11/12/07 16:04:45 INFO mapreduce.Job: map 100% reduce 9%
11/12/07 16:05:00 INFO mapreduce.Job: map 100% reduce 23%
11/12/07 16:05:14 INFO mapreduce.Job: Task Id : attempt_201112071307_0003_m_000002_0, Status : FAILED
Too many fetch-failures
11/12/07 16:05:14 WARN mapreduce.Job: Error reading task outputConnection refused
11/12/07 16:05:14 WARN mapreduce.Job: Error reading task outputConnection refused
11/12/07 16:05:18 INFO mapreduce.Job: map 85% reduce 23%
11/12/07 16:05:24 INFO mapreduce.Job: map 100% reduce 23%
Root cause:
The first phase after a reduce task starts is the shuffle, in which it fetches data from the map side. Each fetch can fail for reasons such as a connect timeout, a read timeout, or a checksum error. The reduce task keeps a counter for each map, recording how many times fetching that map's output has failed. When the failure count for a map reaches a certain threshold, the reduce task notifies the JobTracker that fetching that map's output has failed too many times, and prints a log line like:
Failed to fetch map-output from attempt_201105261254_102769_m_001802_0 even after MAX_FETCH_RETRIES_PER_MAP retries... reporting to the JobTracker
The threshold is computed as:
max(MIN_FETCH_RETRIES_PER_MAP,
    getClosestPowerOf2((this.maxBackoff * 1000 / BACKOFF_INIT) + 1));
By default MIN_FETCH_RETRIES_PER_MAP=2, maxBackoff=300, and BACKOFF_INIT=4000, so the default threshold is 6. It can be tuned via the mapred.reduce.copy.backoff parameter.
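The default threshold of 6 can be reproduced from the formula above. The sketch below is a standalone recomputation, not Hadoop's source: getClosestPowerOf2 is reimplemented here on the assumption that, like the Hadoop helper of the same name, it returns the exponent of the power of two closest to its argument; the constants are the defaults quoted above.

```java
// Hedged sketch: recompute the default fetch-failure threshold.
public class FetchRetryThreshold {
    static final int MIN_FETCH_RETRIES_PER_MAP = 2;
    static final int BACKOFF_INIT = 4000; // ms

    // Assumption: mirrors the Hadoop helper — returns the exponent n
    // such that 2^n is the power of two closest to value.
    static int getClosestPowerOf2(int value) {
        int power = 0;
        int approx = 1;
        while (approx < value) {  // find smallest 2^power >= value
            power++;
            approx <<= 1;
        }
        // step back if the previous power of two was closer to value
        if (power > 0 && (value - (approx >> 1)) < (approx - value)) {
            power--;
        }
        return power;
    }

    static int fetchRetriesPerMap(int maxBackoffSeconds) {
        return Math.max(MIN_FETCH_RETRIES_PER_MAP,
                getClosestPowerOf2(maxBackoffSeconds * 1000 / BACKOFF_INIT + 1));
    }

    public static void main(String[] args) {
        // Defaults: 300*1000/4000 + 1 = 76; 2^6 = 64 is the closest
        // power of two to 76, so the threshold is max(2, 6) = 6.
        System.out.println(fetchRetriesPerMap(300));
    }
}
```

With the default maxBackoff of 300 seconds this prints 6, matching the default threshold stated above.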
Once the threshold is reached, the reduce task reports it to the TaskTracker over the umbilical protocol, and the TaskTracker forwards the report to the JobTracker on its next heartbeat. When the JobTracker sees that more than 50% of the reduces have repeatedly failed to fetch a given map's output, it fails that map task and reschedules it, printing a log line like:
"Too many fetch-failures for output of task: attempt_201105261254_102769_m_001802_0 ... killing it"
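As noted above, the retry threshold can be raised through mapred.reduce.copy.backoff. A sketch of the corresponding mapred-site.xml entry follows; the value 600 is purely illustrative, not a recommendation, and raising it only buys more retries rather than fixing the underlying connectivity problem.

```xml
<!-- mapred-site.xml: raise maxBackoff so each map output is retried
     more times before the reduce reports a fetch failure.
     The value 600 (seconds) is an illustrative assumption. -->
<property>
  <name>mapred.reduce.copy.backoff</name>
  <value>600</value>
</property>
```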
Solution:
None yet.