Consider the following code:
<pre name="code" class="java"> b = load '/in_off/tree/20140101/*' as (date,uid);
c = sample b 0.001;
d = order c by uid;
Run explain on each alias in turn.
<pre name="code" class="php">explain b;
2014-06-10 10:09:50,697 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
b: (Name: LOStore Schema: date#3:bytearray,uid#4:bytearray)
|
|---b: (Name: LOLoad Schema: date#3:bytearray,uid#4:bytearray)RequiredFields:[0, 1]
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
b: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-0
2014-06-10 10:09:50,859 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-06-10 10:09:50,930 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-06-10 10:09:50,930 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-2
Map Plan
b: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------
First, the logical plan:
#-----------------------------------------------
# New Logical Plan:
How is b produced logically? A single LOLoad feeds an LOStore:
b: (Name: LOStore Schema: date#3:bytearray,uid#4:bytearray)
|
|---b: (Name: LOLoad Schema: date#3:bytearray,uid#4:bytearray)RequiredFields:[0, 1]
Next, the physical plan:
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
b: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-0
Finally, what does the MapReduce plan look like?
2014-06-10 10:09:50,859 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-06-10 10:09:50,930 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-06-10 10:09:50,930 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-2
Map Plan
b: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-1
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------
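Notice that a plain load compiles to a single map-only job: there is a Map Plan but no Reduce Plan.
Moving on, sample b, keeping a fraction of 0.001 (that is, 0.1%) of the rows.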
<pre name="code" class="java">grunt> c = sample b 0.001;
2014-06-10 10:10:42,092 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
grunt> explain c;
2014-06-10 10:10:46,421 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
Read the logical plan from bottom to top: b sits at the bottom, passes through a filter, and c comes out at the top.
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
c: (Name: LOStore Schema: date#36:bytearray,uid#37:bytearray)
|
|---c: (Name: LOFilter Schema: date#36:bytearray,uid#37:bytearray)
| |
| (Name: LessThan Type: boolean Uid: 43)
| |
| |---(Name: UserFunc(org.apache.pig.builtin.RANDOM) Type: double Uid: 41)
| |
| |---(Name: Constant Type: double Uid: 42)
|
|---b: (Name: LOLoad Schema: date#36:bytearray,uid#37:bytearray)RequiredFields:[0, 1]
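In the physical plan, note RANDOM: sample is implemented as a filter that keeps a row when RANDOM is less than the constant 0.001.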
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
c: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-8
|
|---c: Filter[bag] - scope-4
| |
| Less Than[boolean] - scope-7
| |
| |---POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-5
| |
| |---Constant(0.0010) - scope-6
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-3
2014-06-10 10:10:46,477 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-06-10 10:10:46,481 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-06-10 10:10:46,481 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
Still no reduce phase:
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-9
Map Plan
c: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-8
|
|---c: Filter[bag] - scope-4
| |
| Less Than[boolean] - scope-7
| |
| |---POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-5
| |
| |---Constant(0.0010) - scope-6
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-3--------
Global sort: false
----------------
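The plans for c show sample compiled into a Filter whose condition is RANDOM less than a constant, which is why the job stays map-only: each row is kept or dropped on its own, with no need to see any other row. A minimal Python sketch of that idea follows. It is not Pig's implementation; the function name `sample`, the fixed seed, and the generated rows are assumptions made for illustration.

```python
import random

def sample(rows, rate, rng):
    """Keep each row independently when a uniform draw in [0, 1) is below rate."""
    return [row for row in rows if rng.random() < rate]

# Hypothetical input standing in for the (date, uid) rows loaded into b.
rows = [("20140101", "uid%d" % i) for i in range(100000)]

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
kept = sample(rows, 0.001, rng)
print(len(kept))  # roughly 100; the exact count depends on the random draws
```

Because the decision is made per row, the output size is only approximately rate times the input size, and two runs over the same input can return different rows.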
Now let's bring in a reduce phase by adding an order by.
<pre name="code" class="java">grunt> d = order c by uid;
2014-06-10 10:13:12,689 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
grunt> explain d;
2014-06-10 10:13:17,037 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
d: (Name: LOStore Schema: date#64:bytearray,uid#65:bytearray)
|
|---d: (Name: LOSort Schema: date#64:bytearray,uid#65:bytearray)
| |
| uid:(Name: Project Type: bytearray Uid: 65 Input: 0 Column: 1)
|
|---c: (Name: LOFilter Schema: date#64:bytearray,uid#65:bytearray)
| |
| (Name: LessThan Type: boolean Uid: 71)
| |
| |---(Name: UserFunc(org.apache.pig.builtin.RANDOM) Type: double Uid: 69)
| |
| |---(Name: Constant Type: double Uid: 70)
|
|---b: (Name: LOLoad Schema: date#64:bytearray,uid#65:bytearray)RequiredFields:[0, 1]
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
d: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-17
|
|---d: POSort[bag]() - scope-16
| |
| Project[bytearray][1] - scope-15
|
|---c: Filter[bag] - scope-11
| |
| Less Than[boolean] - scope-14
| |
| |---POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-12
| |
| |---Constant(0.0010) - scope-13
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-10
2014-06-10 10:13:17,052 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-06-10 10:13:17,089 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2014-06-10 10:13:17,089 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-18
Map Plan
Store(hdfs://namenode:9000/tmp/temp-1458176514/tmp-990933258:org.apache.pig.impl.io.InterStorage) - scope-19
|
|---c: Filter[bag] - scope-11
| |
| Less Than[boolean] - scope-14
| |
| |---POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-12
| |
| |---Constant(0.0010) - scope-13
|
|---b: Load(/in_off/tree/20140101/*:org.apache.pig.builtin.PigStorage) - scope-10--------
Global sort: false
----------------
MapReduce node scope-21
Map Plan
d: Local Rearrange[tuple]{tuple}(false) - scope-25
| |
| Constant(all) - scope-24
|
|---New For Each(false)[tuple] - scope-23
| |
| Project[bytearray][1] - scope-22
|
|---Load(hdfs://namenode:9000/tmp/temp-1458176514/tmp-990933258:org.apache.pig.impl.builtin.RandomSampleLoader('org.apache.pig.impl.io.InterStorage','100')) - scope-20--------
Reduce Plan
Store(hdfs://namenode:9000/tmp/temp-1458176514/tmp-67539995:org.apache.pig.impl.io.InterStorage) - scope-34
|
|---New For Each(false)[tuple] - scope-33
| |
| POUserFunc(org.apache.pig.impl.builtin.FindQuantiles)[tuple] - scope-32
| |
| |---Project[tuple][*] - scope-31
|
|---New For Each(false,false)[tuple] - scope-30
| |
| Constant(-1) - scope-29
| |
| Project[bag][1] - scope-27
|
|---Package[tuple]{chararray} - scope-26--------
Global sort: false
Secondary sort: true
----------------
MapReduce node scope-36
Map Plan
d: Local Rearrange[tuple]{bytearray}(false) - scope-37
| |
| Project[bytearray][1] - scope-15
|
|---Load(hdfs://namenode:9000/tmp/temp-1458176514/tmp-990933258:org.apache.pig.impl.io.InterStorage) - scope-35--------
Reduce Plan
d: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-17
|
|---New For Each(true)[tuple] - scope-40
| |
| Project[bag][1] - scope-39
|
|---PackageLite[tuple]{bytearray} - scope-38--------
Global sort: true
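As you can see, the reduce phase is now in use: order by compiles into three MapReduce jobs, and the last two each have a Reduce Plan.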
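The MapReduce plan for d has three nodes: scope-18 filters b and stores a temporary file, scope-21 reads that file through RandomSampleLoader and runs FindQuantiles in its reduce, and scope-36 reads the file again, rearranges rows by uid, and is the only node marked Global sort: true. The Python sketch below shows why a sampling pass helps a distributed sort: cut points taken from a sample let each reducer own one key range, so concatenating the sorted ranges gives a total order. This is a simplified model under my own assumptions, not Pig's code; `find_quantiles`, `range_partition`, `order_by`, `num_reducers`, and `sample_size` are hypothetical names.

```python
import bisect
import random

def find_quantiles(sample_keys, num_reducers):
    """Pick num_reducers - 1 cut points from the sorted sample keys."""
    ordered = sorted(sample_keys)
    if not ordered:
        return []
    return [ordered[len(ordered) * i // num_reducers] for i in range(1, num_reducers)]

def range_partition(rows, cuts, key):
    """Route each row to the partition whose key range contains its key."""
    parts = [[] for _ in range(len(cuts) + 1)]
    for row in rows:
        parts[bisect.bisect_right(cuts, key(row))].append(row)
    return parts

def order_by(rows, key, num_reducers, sample_size, rng):
    # Sampling job: draw some keys and derive the cut points from them.
    picked = rng.sample(rows, min(sample_size, len(rows)))
    cuts = find_quantiles([key(r) for r in picked], num_reducers)
    # Sort job: partition by range, sort each partition, read partitions in order.
    parts = range_partition(rows, cuts, key)
    return [row for part in parts for row in sorted(part, key=key)]

# Hypothetical (date, uid) rows standing in for the sampled relation c.
data_rng = random.Random(7)
rows = [("20140101", "u%03d" % data_rng.randrange(1000)) for _ in range(500)]
result = order_by(rows, key=lambda r: r[1], num_reducers=4,
                  sample_size=100, rng=random.Random(1))
print(result[:3])
```

The sample only affects how evenly rows spread across partitions; the output is correctly ordered for any choice of cut points, because every key in one partition is below every key in the next.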