A Detailed Look at Pig EXPLAIN

Consider the following script:

b = load '/in_off/tree/20140101/*' as (date,uid);
c = sample b 0.001;
d = order c by uid;

Now run EXPLAIN on each alias in turn.

explain b;

2014-06-10 10:09:50,697 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
b: (Name: LOStore Schema: date#3:bytearray,uid#4:bytearray)
|
|---b: (Name: LOLoad Schema: date#3:bytearray,uid#4:bytearray)RequiredFields:[0, 1]

#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
b: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-0

2014-06-10 10:09:50,859 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-06-10 10:09:50,930 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-06-10 10:09:50,930 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--------------------------------------------------
# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node scope-2
Map Plan
b: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------

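First, the New Logical Plan shows how b is produced at the logical level:
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------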
b: (Name: LOStore Schema: date#3:bytearray,uid#4:bytearray)
|
|---b: (Name: LOLoad Schema: date#3:bytearray,uid#4:bytearray)RequiredFields:[0, 1]

Next, the Physical Plan:

#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
b: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-0

Finally, the MapReduce plan:

2014-06-10 10:09:50,859 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-06-10 10:09:50,930 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-06-10 10:09:50,930 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--------------------------------------------------
# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node scope-2
Map Plan
b: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------
Notice that loading alone compiles to a single map-only job: there is no Reduce Plan.

Next, sample b, keeping roughly 0.1% of its rows (a sampling fraction of 0.001).

grunt> c = sample b 0.001;
2014-06-10 10:10:42,092 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
grunt> explain c;          
2014-06-10 10:10:46,421 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
Read the logical plan from the bottom up: b is loaded at the bottom, passes through a filter, and is finally stored as c.

#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
c: (Name: LOStore Schema: date#36:bytearray,uid#37:bytearray)
|
|---c: (Name: LOFilter Schema: date#36:bytearray,uid#37:bytearray)
    |   |
    |   (Name: LessThan Type: boolean Uid: 43)
    |   |
    |   |---(Name: UserFunc(org.apache.pig.builtin.RANDOM) Type: double Uid: 41)
    |   |
    |   |---(Name: Constant Type: double Uid: 42)
    |
    |---b: (Name: LOLoad Schema: date#36:bytearray,uid#37:bytearray)RequiredFields:[0, 1]
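For simple linear plans, this bottom-up reading can be done mechanically. The Java sketch below is a hypothetical helper, not part of Pig, and rests on several assumptions:

- Each operator line is either the first (root) line or a line whose trimmed text begins with `|---`.
- Everything else, such as the `|   |---` expression lines under the filter, is an operator detail and is skipped.
- The plan is linear. Plans with joins or splits would need a real tree parser.

The helper reverses the operators into dataflow order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper: lists the operators of a linear EXPLAIN plan in
 * dataflow order (source first). Assumes operator lines are the root line or
 * lines whose trimmed text starts with the tree connector; expression lines
 * nested under an operator are skipped. Branching plans are not handled.
 */
public class PlanReader {

    public static List<String> dataflowOrder(String planText) {
        List<String> ops = new ArrayList<>();
        boolean first = true;
        for (String line : planText.split("\n")) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) continue; // skip blanks and headers
            if (first) {                 // root operator (the final Store) has no connector
                ops.add(t);
                first = false;
            } else if (t.startsWith("|---")) {
                ops.add(t.substring(4)); // drop the connector, keep the operator text
            }
        }
        Collections.reverse(ops);        // printed root-first, executed leaf-first
        return ops;
    }

    public static void main(String[] args) {
        String plan = String.join("\n",
            "c: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-8",
            "|",
            "|---c: Filter[bag] - scope-4",
            "    |   |",
            "    |   Less Than[boolean] - scope-7",
            "    |   |",
            "    |   |---POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-5",
            "    |   |",
            "    |   |---Constant(0.0010) - scope-6",
            "    |",
            "    |---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-3");
        for (String op : dataflowOrder(plan)) {
            System.out.println(op);
        }
    }
}
```

Applied to the physical plan of c, it prints three operators in order: the Load of b, the Filter, and the Store of c.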

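In the physical plan, note the RANDOM call: SAMPLE is implemented as a Filter that keeps a tuple only when RANDOM() is less than Constant(0.0010).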
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
c: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-8
|
|---c: Filter[bag] - scope-4
    |   |
    |   Less Than[boolean] - scope-7
    |   |
    |   |---POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-5
    |   |
    |   |---Constant(0.0010) - scope-6
    |
    |---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-3

2014-06-10 10:10:46,477 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-06-10 10:10:46,481 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-06-10 10:10:46,481 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1

Still no reduce phase: sampling runs entirely in a map-only job.
#--------------------------------------------------
# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node scope-9
Map Plan
c: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-8
|
|---c: Filter[bag] - scope-4
    |   |
    |   Less Than[boolean] - scope-7
    |   |
    |   |---POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-5
    |   |
    |   |---Constant(0.0010) - scope-6
    |
    |---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-3--------
Global sort: false
----------------
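The Filter/RANDOM pair above explains why SAMPLE needs no reduce: each tuple is kept or dropped on its own. The following Java sketch models that behavior outside Pig, with `java.util.Random.nextDouble()` standing in for the RANDOM UDF. The class name and the `seed` parameter are illustrative assumptions, not Pig APIs. Because the decision is made per tuple, the result size is only approximately `fraction × n`, not an exact count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Models SAMPLE as compiled above: a map-side filter keeping a tuple when a
 * uniform random double in [0, 1) is below the fraction. java.util.Random
 * stands in for Pig's RANDOM UDF; the seed is a hypothetical addition so the
 * sketch is reproducible.
 */
public class SampleFilter {

    public static <T> List<T> sample(List<T> rows, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            // Equivalent of LessThan(RANDOM(), Constant(fraction)); no grouping needed.
            if (rnd.nextDouble() < fraction) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) rows.add(i);
        List<Integer> c = sample(rows, 0.001, 42L);
        // Roughly 1,000 rows are kept; the exact count depends on the seed.
        System.out.println("kept " + c.size() + " of " + rows.size());
    }
}
```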

Now let's bring in a reduce phase by sorting c with ORDER.

grunt> d = order c by uid;
2014-06-10 10:13:12,689 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
grunt> explain d;
2014-06-10 10:13:17,037 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
d: (Name: LOStore Schema: date#64:bytearray,uid#65:bytearray)
|
|---d: (Name: LOSort Schema: date#64:bytearray,uid#65:bytearray)
    |   |
    |   uid:(Name: Project Type: bytearray Uid: 65 Input: 0 Column: 1)
    |
    |---c: (Name: LOFilter Schema: date#64:bytearray,uid#65:bytearray)
        |   |
        |   (Name: LessThan Type: boolean Uid: 71)
        |   |
        |   |---(Name: UserFunc(org.apache.pig.builtin.RANDOM) Type: double Uid: 69)
        |   |
        |   |---(Name: Constant Type: double Uid: 70)
        |
        |---b: (Name: LOLoad Schema: date#64:bytearray,uid#65:bytearray)RequiredFields:[0, 1]

#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
d: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-17
|
|---d: POSort[bag]() - scope-16
    |   |
    |   Project[bytearray][1] - scope-15
    |
    |---c: Filter[bag] - scope-11
        |   |
        |   Less Than[boolean] - scope-14
        |   |
        |   |---POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-12
        |   |
        |   |---Constant(0.0010) - scope-13
        |
        |---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-10

2014-06-10 10:13:17,052 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-06-10 10:13:17,089 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2014-06-10 10:13:17,089 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
#--------------------------------------------------
# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node scope-18
Map Plan
Store(hdfs://namenode:9000/tmp/temp-1458176514/tmp-990933258:org.apache.pig.impl.io.InterStorage) - scope-19
|
|---c: Filter[bag] - scope-11
    |   |
    |   Less Than[boolean] - scope-14
    |   |
    |   |---POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-12
    |   |
    |   |---Constant(0.0010) - scope-13
    |
    |---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-10--------
Global sort: false
----------------

MapReduce node scope-21
Map Plan
d: Local Rearrange[tuple]{tuple}(false) - scope-25
|   |
|   Constant(all) - scope-24
|
|---New For Each(false)[tuple] - scope-23
    |   |
    |   Project[bytearray][1] - scope-22
    |
    |---Load(hdfs://namenode:9000/tmp/temp-1458176514/tmp-990933258:org.apache.pig.impl.builtin.RandomSampleLoader('org.apache.pig.impl.io.InterStorage','100')) - scope-20--------
Reduce Plan
Store(hdfs://namenode:9000/tmp/temp-1458176514/tmp-67539995:org.apache.pig.impl.io.InterStorage) - scope-34
|
|---New For Each(false)[tuple] - scope-33
    |   |
    |   POUserFunc(org.apache.pig.impl.builtin.FindQuantiles)[tuple] - scope-32
    |   |
    |   |---Project[tuple][*] - scope-31
    |
    |---New For Each(false,false)[tuple] - scope-30
        |   |
        |   Constant(-1) - scope-29
        |   |
        |   Project[bag][1] - scope-27
        |
        |---Package[tuple]{chararray} - scope-26--------
Global sort: false
Secondary sort: true
----------------

MapReduce node scope-36
Map Plan
d: Local Rearrange[tuple]{bytearray}(false) - scope-37
|   |
|   Project[bytearray][1] - scope-15
|
|---Load(hdfs://namenode:9000/tmp/temp-1458176514/tmp-990933258:org.apache.pig.impl.io.InterStorage) - scope-35--------
Reduce Plan
d: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-17
|
|---New For Each(true)[tuple] - scope-40
    |   |
    |   Project[bag][1] - scope-39
    |
    |---PackageLite[tuple]{bytearray} - scope-38--------
Global sort: true

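This time a reduce phase appears, and the MR plan size is 3: ORDER compiles into three MapReduce jobs.

1. The first (scope-18) is map-only. It loads b, applies the sample filter, and stores c to a temporary file with InterStorage.
2. The second (scope-21) reads that file through RandomSampleLoader and gathers all sampled uid values under one constant key. Its reduce phase runs FindQuantiles and writes the quantiles to a second temporary file.
3. The third (scope-36) reads the first temporary file again, rearranges tuples by uid, and sorts in its reduce phase. It is the only job marked Global sort: true. The quantiles from the second job are what let each of its reducers receive a contiguous uid range, so the reducer outputs are sorted as a whole.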
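To see why the sort needs the extra sampling job, here is a simplified Java model of jobs 2 and 3. It takes a sample of keys, picks evenly spaced cut points, routes every key to the reducer that owns its range, and sorts within each reducer. The names `RangeSortSketch`, `quantiles`, `partition` and `sort` are hypothetical. Pig's real FindQuantiles and its partitioner are more involved; for instance, they take into account how often a key appears in the sample. Treat this as an illustration of range partitioning, not Pig's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Simplified model of jobs 2 and 3 of the ORDER plan (not Pig's real code).
 * Job 2: sort a sample of keys and pick numReducers - 1 cut points.
 * Job 3: route every key to the reducer owning its range, sort inside each
 * reducer, and read the reducer outputs back in reducer order.
 */
public class RangeSortSketch {

    /** Evenly spaced cut points from the sample; an empty sample means one reducer. */
    public static List<String> quantiles(List<String> sample, int numReducers) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> cuts = new ArrayList<>();
        if (sorted.isEmpty()) return cuts;
        for (int i = 1; i < numReducers; i++) {
            cuts.add(sorted.get(i * sorted.size() / numReducers));
        }
        return cuts;
    }

    /** Reducer index = number of cut points <= key, which is monotone in key. */
    public static int partition(String key, List<String> cuts) {
        int r = 0;
        while (r < cuts.size() && key.compareTo(cuts.get(r)) >= 0) r++;
        return r;
    }

    /** Map side routes keys by range; each "reducer" then sorts only its own range. */
    public static List<String> sort(List<String> keys, List<String> cuts) {
        List<List<String>> reducers = new ArrayList<>();
        for (int i = 0; i <= cuts.size(); i++) reducers.add(new ArrayList<>());
        for (String k : keys) reducers.get(partition(k, cuts)).add(k);
        List<String> out = new ArrayList<>();
        for (List<String> part : reducers) {
            Collections.sort(part);   // local sort within one key range
            out.addAll(part);         // ranges are already in ascending order
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> uids = Arrays.asList(
            "u07", "u03", "u10", "u01", "u05", "u08", "u02", "u09", "u04", "u06");
        // Pretend the first six uids are what the sampling job read.
        List<String> cuts = quantiles(uids.subList(0, 6), 3);
        System.out.println("cut points: " + cuts);   // cut points: [u05, u08]
        System.out.println("sorted: " + sort(uids, cuts));
    }
}
```

Each reducer only sorts its own range, yet concatenating the outputs in reducer order yields a globally sorted result. That is the property the third job's Global sort: true reflects.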