Three reasons why large numbers of MapReduce tasks get KILLED_UNCLEAN

In one Hadoop cluster, certain jobs ran for an extremely long time because thousands of their tasks were repeatedly killed for no obvious reason. For example, one job ran for 12 hours and 20 minutes and spawned about 13,000 tasks, yet only 4,118 map tasks finished successfully, 8,708 were killed, and just 1 failed. The detectives found that the killed tasks belonged to resource-intensive ad-hoc Hive queries and were usually killed young, 6 to 16 minutes after launch.
Request received to kill task 'attempt_201411191723_2827635_r_000009_0' by user
-------
Task has been KILLED_UNCLEAN by the user
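
To confirm the scale of the killing, one option is to count those kill requests in the JobTracker log and compare the number against the job's own task statistics. A minimal sketch (the log path below is an assumption and must be adjusted to your installation):

    # Count kill requests recorded by the JobTracker for this job
    # (the log path is an assumption -- adjust for your cluster layout)
    grep "Request received to kill task" /var/log/hadoop/*jobtracker*.log \
        | grep "201411191723_2827635" | wc -l

    # Cross-check against the job's overall task counts
    hadoop job -status job_201411191723_2827635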

There are three usual suspects (see the configuration sketch after this list):
1. An impatient user (armed with the "mapred job -kill-task" command)
2. The JobTracker (to kill a speculative duplicate, or when a whole job fails)
3. The Fair Scheduler (but diplomatically, it calls it "preemption")
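
For suspects 2 and 3, the quickest sanity check is configuration. Below is a minimal shell sketch using the MRv1 property names; the jar and class names are hypothetical placeholders, and the -D options only take effect for jobs that parse generic options:

    # Suspect 2: disable speculative execution for a single job
    # (my-job.jar and MyJob are hypothetical placeholders)
    hadoop jar my-job.jar MyJob \
        -Dmapred.map.tasks.speculative.execution=false \
        -Dmapred.reduce.tasks.speculative.execution=false \
        /input /output

    # Suspect 3: see whether Fair Scheduler preemption is switched on
    grep -B1 -A2 "mapred.fairscheduler.preemption" "$HADOOP_CONF_DIR"/mapred-site.xml

For the ad-hoc Hive queries mentioned above, the same two speculative-execution properties can instead be set per session with Hive's SET command.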


An article from abroad tells the story in more detail:

This is one of the most bloodcurdling stories (and one of my favorites) that we have recently seen in our 190-square-meter Hadoopland. In a nutshell, some jobs were surprisingly running extremely long, because thousands of their tasks were constantly being killed for some unknown reason by someone (or something).

For example, a photo taken by our detectives shows a job running for 12hrs:20min that had spawned around 13,000 tasks up to that moment. However, (only) 4,118 map tasks had finished successfully, while 8,708 were killed (!) and, surprisingly, only 1 task failed (?), obviously spreading panic in the Hadoopland.

When murdering, the killer was leaving the same message each time: "KILLED_UNCLEAN by the user" (however, even our uncle…
