Troubleshooting Python MRJob Errors on Hadoop

1) Running a Python MRJob script on Hadoop fails with the following error:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2019-01-14 13:00:53,010 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2) When a Python MRJob script fails like this, the cause is usually the Python installation or its libraries on the cluster nodes.

3) Export the application's logs with the following command to see where the failure occurs:

yarn logs -applicationId application_1545890266346_0066 > yarn.log

4) The log contains the following:

Container: container_1545890266346_0066_01_000007 on CDH2_55798
=================================================================
LogType:stderr
Log Upload Time:Tue Jan 15 10:03:24 +0800 2019
LogLength:242
Log Contents:
+ __mrjob_PWD=/HDFS/yarn/local/usercache/hdfs/appcache/application_1545890266346_0066/container_1545890266346_0066_01_000007
+ exec
+ python -c 'import fcntl; fcntl.flock(9, fcntl.LOCK_EX)'
setup-wrapper.sh: line 6: python: command not found

The last line shows that the python command was not found on the node that ran this container.
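An exported log can be long, so a small script helps to locate lines like this one. The following is only a sketch under assumptions: it expects the "Container: ... on ..." header and the shell's "line N: cmd: command not found" wording exactly as they appear in the excerpt above, and the function name find_missing_commands is made up for illustration.

```python
import re

# Matches the shell message "<script>: line N: <cmd>: command not found".
NOT_FOUND = re.compile(
    r"^(?P<script>\S+): line \d+: (?P<cmd>\S+): command not found$"
)


def find_missing_commands(log_text):
    """Return (container_id, command) pairs for each 'command not found' line."""
    container = None
    hits = []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Container: "):
            # Header looks like: "Container: container_..._000007 on CDH2_55798"
            container = line.split()[1]
            continue
        match = NOT_FOUND.match(line)
        if match:
            hits.append((container, match.group("cmd")))
    return hits
```

To use it, read yarn.log into a string (for example with open("yarn.log", errors="replace").read()) and pass that string to find_missing_commands.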

5) The fix is as follows:

First, install the mrjob library on the Hadoop cluster: pip install mrjob
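Before rerunning the job, it can be worth confirming on a node that a python command resolves on PATH and that mrjob can be imported. The sketch below uses only the standard library; check_environment is a hypothetical helper name, and note that the import check applies to the interpreter running this snippet, which is not necessarily the one a task would pick up.

```python
import importlib.util
import shutil


def check_environment(command="python", module="mrjob"):
    """Report where `command` resolves on PATH and whether `module` is importable."""
    return {
        # Full path of the command, or None if it is not on PATH.
        "command_path": shutil.which(command),
        # True if the current interpreter can locate the top-level module.
        "module_found": importlib.util.find_spec(module) is not None,
    }
```

A command_path of None corresponds to the "python: command not found" message in the log above.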

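1. Add these two lines at the top of the .py script. The path after #! has to be the Python interpreter executable on the cluster nodes (often /usr/bin/python), so adjust it to match the actual installation:

    #!/usr/lib/python
    # encoding:utf-8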
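A wrong interpreter path in the first line is easy to miss, for instance when it names a directory rather than an executable. As a rough check, the sketch below reads a script's first line and tests whether the path after #! is an existing executable file on the machine where it runs. The helper name check_shebang is hypothetical, and the check only reflects the local machine, not the other nodes.

```python
import os


def check_shebang(script_path):
    """Return (interpreter, ok) for the script's first line.

    ok is True only if the path after '#!' is an existing regular file
    that the current user may execute.
    """
    with open(script_path, "rb") as f:
        first_line = f.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None, False
    parts = first_line[2:].split()
    if not parts:
        return None, False
    interpreter = parts[0]
    ok = os.path.isfile(interpreter) and os.access(interpreter, os.X_OK)
    return interpreter, ok
```

For a shebang such as "#!/usr/bin/env python" the function reports /usr/bin/env, since that is the program the kernel actually starts.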

2. Make the script executable:

    chmod +x mrtop.py

3. Run the script:

    ./mrtop.py -r hadoop hdfs:///tmp/wordcount/data1

4. The job now runs successfully:

job output is in hdfs:///user/root/tmp/mrjob/mrtop.root.20190115.025644.808237/output
Streaming final output from hdfs:///user/root/tmp/mrjob/mrtop.root.20190115.025644.808237/output...
"xiaojun"       2
"python"        2
Removing HDFS temp directory hdfs:///user/root/tmp/mrjob/mrtop.root.20190115.025644.808237...
Removing temp directory /tmp/mrtop.root.20190115.025644.808237... 
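The two result lines above can be read back into Python values for a quick sanity check. This sketch assumes the job writes mrjob's default output format, a JSON-encoded key and a JSON-encoded value separated by a tab (the tab appears as spaces in the listing). parse_output_line is a hypothetical helper name.

```python
import json


def parse_output_line(line):
    """Split one 'key<TAB>value' output line and JSON-decode both halves."""
    key, value = line.rstrip("\n").split("\t", 1)
    return json.loads(key), json.loads(value)
```

If the job is configured with a different output protocol, the lines will not be in this form and the helper would need to change accordingly.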
