Pig Replicated Join Failure

Original post, 2015-07-07 17:11:23

While using one of Pig's specialized joins, I hit an error like the one below. It turned out to be a known Pig bug, tracked at https://issues.apache.org/jira/browse/PIG-3725


Error message

Affected tests: Join_6, MergeJoin_5, Join_8, Join_7, MergeJoin_2, MergeJoin_3, MergeJoin_8, MergeJoin_1, MultiQuery_14, MergeJoin_4, MergeJoin_9, MergeJoin_6, MergeJoin_7.

In these tests, Pig needs to read a local file distributed through the distributed cache. However, Pig tries to read from HDFS instead. Here is the stack trace:

org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:263)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.setUpHashMap(POFRJoin.java:398)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.getNextTuple(POFRJoin.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:127)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:281)
... 14 more
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://hor14n23.gq1.ygridcore.net:8020/user/hrt_qa/pigrepl_scope-75_831941592_1390506968802_1
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:146)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:95)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:123)
... 15 more

Solution

Index: src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java
===================================================================
--- src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java	(revision 1561195)
+++ src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java	(working copy)
@@ -28,6 +28,7 @@
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.pig.ExecType;
+import org.apache.pig.backend.hadoop.executionengine.HExecutionEngine;
 import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce;
 import org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil;
 
@@ -94,7 +95,8 @@
             }
         }
         Properties props = ConfigurationUtil.toProperties(localConf);
-        props.setProperty(MapRedUtil.FILE_SYSTEM_NAME, "file:///");
+        props.setProperty(HExecutionEngine.FILE_SYSTEM_LOCATION, "file:///");
+        props.setProperty(HExecutionEngine.ALTERNATIVE_FILE_SYSTEM_LOCATION, "file:///");
         return props;
     }
 }
Index: src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java
===================================================================
--- src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java	(revision 1561195)
+++ src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java	(working copy)
@@ -68,8 +68,8 @@
 public class HExecutionEngine {
     
     public static final String JOB_TRACKER_LOCATION = "mapred.job.tracker";
-    private static final String FILE_SYSTEM_LOCATION = "fs.default.name";
-    private static final String ALTERNATIVE_FILE_SYSTEM_LOCATION = "fs.defaultFS";
+    public static final String FILE_SYSTEM_LOCATION = "fs.default.name";
+    public static final String ALTERNATIVE_FILE_SYSTEM_LOCATION = "fs.defaultFS";
     
     private static final String HADOOP_SITE = "hadoop-site.xml";
     private static final String CORE_SITE = "core-site.xml";
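
The effect of setting both keys can be illustrated with a small sketch. It rests on an assumption that the post does not confirm: that the file system is resolved from the newer fs.defaultFS key whenever that key is present, so overriding only fs.default.name leaves the HDFS address in effect. Only the two key names come from the patch; the resolveDefaultFs helper, the class name, and the hdfs://namenode:8020 address are hypothetical, and plain java.util.Properties stands in for the Hadoop configuration.

```java
import java.util.Properties;

class LocalFsOverrideSketch {
    // Key names taken from the patch to HExecutionEngine.
    static final String FILE_SYSTEM_LOCATION = "fs.default.name";
    static final String ALTERNATIVE_FILE_SYSTEM_LOCATION = "fs.defaultFS";

    // Hypothetical cluster settings: both keys point at HDFS.
    static Properties clusterProps() {
        Properties props = new Properties();
        props.setProperty(FILE_SYSTEM_LOCATION, "hdfs://namenode:8020");
        props.setProperty(ALTERNATIVE_FILE_SYSTEM_LOCATION, "hdfs://namenode:8020");
        return props;
    }

    // Hypothetical resolver (an assumption, not Pig or Hadoop code):
    // prefer the newer key and fall back to the older one.
    static String resolveDefaultFs(Properties props) {
        String fs = props.getProperty(ALTERNATIVE_FILE_SYSTEM_LOCATION);
        return fs != null ? fs : props.getProperty(FILE_SYSTEM_LOCATION);
    }

    // Before the patch: only the older key is overridden.
    static Properties overrideOldKeyOnly(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty(FILE_SYSTEM_LOCATION, "file:///");
        return props;
    }

    // After the patch: both keys are overridden.
    static Properties overrideBothKeys(Properties base) {
        Properties props = overrideOldKeyOnly(base);
        props.setProperty(ALTERNATIVE_FILE_SYSTEM_LOCATION, "file:///");
        return props;
    }

    public static void main(String[] args) {
        Properties cluster = clusterProps();
        // prints "old key only: hdfs://namenode:8020"
        System.out.println("old key only: " + resolveDefaultFs(overrideOldKeyOnly(cluster)));
        // prints "both keys: file:///"
        System.out.println("both keys: " + resolveDefaultFs(overrideBothKeys(cluster)));
    }
}
```

Under that assumption, the first case matches the symptom in the stack trace (an hdfs:// input path for a file that only exists locally), and the second matches the patched behavior.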




