Common causes of compute job failures
1) Subprocess returns error code 1
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:544)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:153)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child.main(Child.java:174)
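Diagnosis: the map or reduce program returned 1. The MapReduce framework collects the application's return value, so "subprocess failed with code 1" means the program itself exited with status 1. The developer needs to pin down the detailed cause from the Task Logs and the program source.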
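Since the diagnosis depends on what ends up in the Task Logs, it helps if the streaming program reports why it failed before returning 1. Hadoop Streaming captures the subprocess's stderr in the task logs. The following is a minimal sketch, assuming a Python mapper; `run_task` and `word_length` are hypothetical names used only for illustration.

```python
import sys
import traceback


def run_task(process_line, lines, out=sys.stdout, err=sys.stderr):
    """Feed each input line to process_line and return a process exit code.

    On the first failure, the offending line number and the traceback are
    written to err (stderr ends up in the Task Logs) and 1 is returned.
    """
    for lineno, line in enumerate(lines, start=1):
        try:
            result = process_line(line)
        except Exception:
            err.write("failed on input line %d: %r\n" % (lineno, line))
            traceback.print_exc(file=err)
            return 1
        if result is not None:
            out.write(result + "\n")
    return 0


def word_length(line):
    # Hypothetical map function: expects "key<TAB>value" and emits
    # "key<TAB>len(value)". A line without a tab raises ValueError.
    key, value = line.rstrip("\n").split("\t")
    return "%s\t%d" % (key, len(value))


# In a real streaming script the last line would be something like:
#     sys.exit(run_task(word_length, sys.stdin))
```

With this shape, a "subprocess failed with code 1" error in the job output is accompanied by the failing input line and a traceback in the task's stderr log.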
2) Subprocess returns error code 137
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 137
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:568)
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:130)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
at org.apache.hadoop.mapred.Child.main(Child.java:174)
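Diagnosis: the map or reduce program exceeded the platform's memory limit and was killed (137 - 128 = 9, and signal 9 is SIGKILL). Most platforms configure a default memory limit, for example 800 MB. When this happens, logging in to the compute node and checking dmesg usually shows a message like: killer: killing process….
P.S. In newer Hadoop versions, a single program that uses more than 12 hours of CPU time is also killed and returns 137.
If the kill is confirmed to be caused by exceeding the memory limit, a larger limit can be set with the stream.memory.limit parameter.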
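The "code minus 128" arithmetic used for 137 applies to the other signal-related codes on this page as well. The sketch below decodes an exit code the way a POSIX shell reports it; `describe_exit_code` is a hypothetical helper, and it assumes a Linux node. A program can also exit with such a value on its own, so the result is a strong hint rather than proof.

```python
import signal


def describe_exit_code(code):
    """Interpret a subprocess exit code using the 128 + signal convention."""
    if code == 0:
        return "success"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a known signal number (e.g. 255 - 128 = 127), so treat
            # the value as an ordinary exit status instead.
            return "exited with status %d" % code
        return "killed by signal %d (%s)" % (signum, name)
    return "exited with status %d" % code


for code in (1, 137, 139, 141, 255):
    print(code, "->", describe_exit_code(code))
```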
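3) Subprocess returns error code 139
The program crashed and dumped core (139 - 128 = 11, and signal 11 is SIGSEGV, a segmentation fault).
4) Subprocess returns error code 141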
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 141
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:568)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:158)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child.main(Child.java:174)
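Diagnosis: the map or reduce program exited abnormally while the platform kept pushing data into the pipe, which produced a pipe error (141 - 128 = 13; signal 13 is SIGPIPE, which is raised on a write to a pipe whose reading end has been closed). The root cause is still the program's abnormal exit. The developer needs to pin down the detailed cause from the Task Logs and the program source.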
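To make the SIGPIPE mechanism concrete, here is a small standalone demonstration, not Hadoop code: a child process keeps writing to a pipe after the reader has closed its end, and is terminated by signal 13. It assumes a POSIX system; `run_writer_with_closed_reader` is a hypothetical name. Python's subprocess module reports death by signal as a negative return code, which corresponds to 128 + 13 = 141 in the shell convention.

```python
import signal
import subprocess
import sys

# The child restores the default SIGPIPE action (the Python interpreter
# ignores SIGPIPE at startup) and then writes to stdout forever.
CHILD_CODE = (
    "import signal, sys\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "while True:\n"
    "    sys.stdout.write('x' * 65536)\n"
    "    sys.stdout.flush()\n"
)


def run_writer_with_closed_reader():
    """Start the writer, read one byte, close the pipe, return the exit info."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                            stdout=subprocess.PIPE)
    proc.stdout.read(1)   # make sure the child is up and writing
    proc.stdout.close()   # the reading end goes away
    return proc.wait()    # negative value: child was killed by that signal


returncode = run_writer_with_closed_reader()
print("subprocess returncode:", returncode)
print("shell-style exit code:", 128 + (-returncode))
```

In a streaming job the same thing can happen inside the user's program, for example when one stage of a pipeline it runs stops reading early, so the program's own logic and logs are the place to look.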
5) Subprocess returns error code 255
PipeMapRed.waitOutputThreads(): subprocess failed with code 255
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:568)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:158)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child.main(Child.java:174)
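Diagnosis: the map or reduce program exited abnormally with return value -1. Only the low 8 bits of an exit status are reported to the parent process, so -1 appears as 255. The developer needs to pin down the detailed cause from the Task Logs and the program source.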
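The relationship between -1 and 255 can be checked directly. The sketch below, assuming a POSIX system, runs a child that exits with a requested value and returns the status the parent observes; `observed_status` is a hypothetical helper name.

```python
import subprocess
import sys


def observed_status(requested):
    """Run a child that calls sys.exit(requested); return what the parent sees."""
    child = subprocess.run(
        [sys.executable, "-c", "import sys; sys.exit(%d)" % requested]
    )
    return child.returncode


print(observed_status(-1))   # the parent sees the low 8 bits of -1
print((-1) & 0xFF)           # the same value computed directly
```

So a 255 in the streaming error usually points at a code path in the program that returns or exits with -1.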