Broken Pipe Error causes a streaming Elastic MapReduce job on AWS to fail

Everything works fine locally when I do this:

cat input | python mapper.py | sort | python reducer.py
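The local pipeline works because it reproduces the contract Hadoop Streaming relies on: the mapper writes tab-separated `key<TAB>value` lines to stdout, the framework sorts them so that equal keys are adjacent, and the reducer aggregates each run of equal keys. The following is a minimal in-process sketch of that flow in Python 3. `reducer.py` is not shown in the question, so `reduce_phase` assumes it sums the counts per content type; all function names here are hypothetical.

```python
from itertools import groupby


def map_phase(content_types):
    # Mirrors mapper.py: one "key\t1" line per record.
    return ["%s\t%s" % (ct, 1) for ct in content_types]


def reduce_phase(sorted_lines):
    # Hypothetical reducer (reducer.py is not shown): sum the counts per key.
    # groupby only merges adjacent lines, which is why the sort step matters.
    totals = {}
    for key, group in groupby(sorted_lines, key=lambda line: line.split("\t", 1)[0]):
        totals[key] = sum(int(line.split("\t", 1)[1]) for line in group)
    return totals


def run_local_pipeline(content_types):
    # In-process equivalent of `mapper | sort | reducer`.
    return reduce_phase(sorted(map_phase(content_types)))
```

Python's `sorted` and the shell's `sort` may order keys differently depending on locale, but both place equal keys next to each other, which is all the reducer needs.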

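However, when I run the streaming MapReduce job on AWS Elastic MapReduce, the job does not complete successfully. mapper.py runs for a while (I know this because it writes to stderr along the way). The mapper is then interrupted by a "Broken pipe" error, which I can retrieve from the task attempt's syslog after it fails: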

java.io.IOException: Broken pipe
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:282)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
    at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:109)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
2012-03-26 07:19:05,400 WARN org.apache.hadoop.streaming.PipeMapRed (main): java.io.IOException: Broken pipe
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:282)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at java.io.DataOutputStream.flush(DataOutputStream.java:106)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:579)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:124)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
2012-03-26 07:19:05,400 INFO org.apache.hadoop.streaming.PipeMapRed (main): mapRedFinished
2012-03-26 07:19:05,400 WARN org.apache.hadoop.streaming.PipeMapRed (main): java.io.IOException: Bad file descriptor
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:282)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at java.io.DataOutputStream.flush(DataOutputStream.java:106)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:579)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
2012-03-26 07:19:05,400 INFO org.apache.hadoop.streaming.PipeMapRed (main): mapRedFinished
2012-03-26 07:19:05,405 INFO org.apache.hadoop.streaming.PipeMapRed (Thread-13): MRErrorThread done
2012-03-26 07:19:05,408 INFO org.apache.hadoop.mapred.TaskLogsTruncater (main): Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-03-26 07:19:05,519 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2012-03-26 07:19:05,520 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Got UserName hadoop for UID 106 from the native implementation
2012-03-26 07:19:05,522 WARN org.apache.hadoop.mapred.Child (main): Error running child
java.io.IOException: log:null
R/W/S=7018/3/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=hadoop
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |text/html 1|
Date: Mon Mar 26 07:19:05 UTC 2012
java.io.IOException: Broken pipe
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:282)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
    at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:109)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:125)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
2012-03-26 07:19:05,525 INFO org.apache.hadoop.mapred.Task (main): Runnning cleanup for the task
2012-03-26 07:19:05,526 INFO org.apache.hadoop.mapred.DirectFileOutputCommitter (main): Nothing to clean up on abort since there are no temporary files written
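In the first trace, the failing write is `TextInputWriter.writeUTF8`, called from `PipeMapper.map`, which is Hadoop writing an input record to the mapper's stdin. A "Broken pipe" (EPIPE) on such a write generally means the read end of the pipe was already closed, most often because the child process had exited. The minimal Python 3 sketch below reproduces that pattern outside Hadoop; the child command and the data are illustrative only. It shows what the error means, not why mapper.py stopped.

```python
import subprocess
import sys


def reproduce_broken_pipe():
    """Return True if writing to an exited child's stdin raises BrokenPipeError."""
    # The child reads one line and exits, like a mapper that stops consuming input.
    child = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdin.readline()"],
        stdin=subprocess.PIPE,
    )
    child.stdin.write(b"first record\n")
    child.stdin.flush()
    child.wait()  # the child is gone, so nothing reads the pipe any more
    try:
        # The parent keeps feeding records, as Hadoop does in PipeMapper.map.
        child.stdin.write(b"next record\n" * 100000)
        child.stdin.flush()
    except BrokenPipeError:
        return True
    finally:
        try:
            child.stdin.close()  # may try to flush leftover bytes and fail again
        except BrokenPipeError:
            pass
    return False
```

Python ignores SIGPIPE by default, so the failed write surfaces as a `BrokenPipeError` exception here, much as it surfaces as `java.io.IOException: Broken pipe` on the Java side.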

Here is mapper.py. Note that I write to stderr to give myself debugging information:

#!/usr/bin/env python
import sys
from warc import ARCFile

def main():
    warc_file = ARCFile(fileobj=sys.stdin)
    for web_page in warc_file:
        print >> sys.stderr, '%s\t%s' % (web_page.header.content_type, 1)  # For debugging
        print '%s\t%s' % (web_page.header.content_type, 1)
    print >> sys.stderr, 'done'  # For debugging

if __name__ == "__main__":
    main()

Here is what I get in the task attempt's stderr when mapper.py runs:

text/html 1
text/html 1
text/html 1

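Basically, the loop runs 3 times and then stops abruptly without Python raising any error. (Note: it should output thousands of lines.) Even an uncaught exception should have shown up in stderr.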
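To tell an exception apart from a loop that simply ran out of records, the mapper's loop can be wrapped so that it always reports how it ended and flushes stderr explicitly. A minimal Python 3 sketch, assuming the loop is fed an iterable of content-type strings; `run_mapper` is a hypothetical helper, not part of the `warc` library.

```python
import sys
import traceback


def run_mapper(records, out=sys.stdout, err=sys.stderr):
    """Emit "key\t1" per record; always report on err how the loop ended.

    Returns a process exit code: 0 if the input was exhausted, 1 on exception.
    """
    count = 0
    try:
        for content_type in records:
            out.write("%s\t%s\n" % (content_type, 1))
            count += 1
    except Exception:
        traceback.print_exc(file=err)  # full traceback, even for odd parser errors
        err.write("failed after %d records\n" % count)
        err.flush()
        return 1
    err.write("done after %d records\n" % count)
    err.flush()
    return 0
```

In mapper.py, `records` would be a generator over `web_page.header.content_type` for each record of the `ARCFile`; that wiring is omitted because reading binary stdin differs between Python 2 and Python 3. If neither the "failed" nor the "done" line appears in stderr, the process was probably stopped from outside (for example by a signal), since the handler never got a chance to run.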

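Because the MapReduce job runs perfectly fine on my local machine, my guess is that the problem lies in how Hadoop handles the output I print from mapper.py, but I have no idea what the problem could be.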
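One difference between the local pipe and Hadoop Streaming is worth ruling out, offered as a hypothesis rather than a diagnosis: with the default TextInputFormat, Hadoop Streaming hands the mapper its input one line at a time, and each line reaches stdin terminated by a bare `\n`. (The `R/W/S=7018/3/0` line in the log appears to count input records passed in versus output records returned, which fits a line-per-record view of the input.) ARC record headers carry a length field, so a reader that trusts that length can fall out of step if CR LF sequences inside a record are rewritten. The record layout below is a simplified, hypothetical stand-in for ARC, and `to_streaming_bytes` only approximates the line handling.

```python
def make_record(body: bytes) -> bytes:
    # Hypothetical simplified record "<length>\n<body>\n" (not the real ARC layout).
    return str(len(body)).encode("ascii") + b"\n" + body + b"\n"


def read_records(data: bytes) -> list:
    # Trust the declared length, as length-prefixed readers do.
    records = []
    pos = 0
    while pos < len(data):
        newline = data.index(b"\n", pos)
        length = int(data[pos:newline])  # ValueError once we are out of step
        start = newline + 1
        records.append(data[start:start + length])
        pos = start + length + 1  # skip the body and its trailing "\n"
    return records


def to_streaming_bytes(data: bytes) -> bytes:
    # Approximation of line-oriented input: split on LF, CR or CRLF,
    # then re-terminate every line with a bare LF.
    return b"".join(line + b"\n" for line in data.splitlines())
```

With `original = make_record(b"A\r\nB") + make_record(b"C\r\nD")`, `read_records(original)` returns both bodies intact, while `read_records(to_streaming_bytes(original))` raises `ValueError`: each `\r\n` lost one byte, so the first read swallows the next record's start and the parser lands on an empty length field.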
