1 Comparison of aggregation operations
1.1 Presto GROUP BY
explain select sum(totalprice),orderpriority from orders group by orderpriority;
- Output[_col0, orderpriority] => [sum:double, orderpriority:varchar(15)]
_col0 := sum
- RemoteExchange[GATHER] => sum:double, orderpriority:varchar(15)
- Project => [sum:double, orderpriority:varchar(15)]
- Aggregate(FINAL)[orderpriority] => [orderpriority:varchar(15), $hashvalue:bigint, sum:double] // final aggregation
sum := "sum"("sum_9")
- RemoteExchange[REPARTITION] => orderpriority:varchar(15), sum_9:double, $hashvalue:bigint // remotely pulls the results from the partial-aggregation nodes
- Aggregate(PARTIAL)[orderpriority] => [orderpriority:varchar(15), $hashvalue_11:bigint, sum_10:double]
sum_10 := "sum"("totalprice") // partial aggregation
- Project => [$hashvalue_11:bigint, orderpriority:varchar(15), totalprice:double]
$hashvalue_11 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("orderpriority"), 0))
- TableScan[tpch:tpch:orders:sf1.0, originalConstraint = true] => [totalprice:double, orderpriority:varchar(15)] // table scan
totalprice := tpch:totalprice
orderpriority := tpch:orderpriority
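The plan above reads bottom-up: each worker computes a partial sum per orderpriority, the partial sums are repartitioned by a hash of the group key, and a final aggregation adds them up before the results are gathered. The following is a minimal Python sketch of that flow, not Presto's implementation. All function names are hypothetical, Python's built-in hash() stands in for the $hashvalue column, and the input splits are made-up sample rows.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Aggregate(PARTIAL): sum totalprice per orderpriority within one split."""
    sums = defaultdict(float)
    for orderpriority, totalprice in rows:
        sums[orderpriority] += totalprice
    return dict(sums)


def repartition(partials, num_partitions):
    """RemoteExchange[REPARTITION]: route each partial sum by a hash of its key.

    Assumption: hash() is only a stand-in for the plan's $hashvalue column.
    """
    partitions = [[] for _ in range(num_partitions)]
    for partial in partials:
        for key, partial_sum in partial.items():
            partitions[hash(key) % num_partitions].append((key, partial_sum))
    return partitions


def final_aggregate(partition):
    """Aggregate(FINAL): add up the partial sums that share a key."""
    sums = defaultdict(float)
    for key, partial_sum in partition:
        sums[key] += partial_sum
    return dict(sums)


def two_phase_sum(splits, num_partitions=2):
    """Run partial aggregation, repartition, final aggregation, then gather."""
    partials = [partial_aggregate(split) for split in splits]
    partitions = repartition(partials, num_partitions)
    result = {}
    for partition in partitions:  # RemoteExchange[GATHER]
        result.update(final_aggregate(partition))
    return result


if __name__ == "__main__":
    # Two hypothetical splits of (orderpriority, totalprice) rows.
    splits = [
        [("1-URGENT", 10.0), ("2-HIGH", 5.0)],
        [("1-URGENT", 2.5), ("3-MEDIUM", 1.0)],
    ]
    print(two_phase_sum(splits))
```

Because every row with the same key lands in the same partition after the repartition step, each final aggregator sees all partial sums for its keys, so the gathered result does not depend on the number of partitions.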
1.2 Hive GROUP BY
explain select sum(o_totalprice),o_orderpriority from orders where o_orderkey>100 group by o_orderpriority;
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan // table scan
alias: orders
Statistics: Num rows: 6000000 Data size: 596779236 Basic stats: COMPLETE Column stats: NONE
Filter Operator // row filter
predicate: (o_orderkey > 100) (type: boolean)
Statistics: Num rows: 2000000 Data size: 198926412 Basic stats: COMPLETE Column stats: NONE
Select Operator // column projection
expressions: o_orderpriority (type: string), o_totalprice (type: double)
outputColumnNames: o_orderpriority, o_totalprice
Statistics: Num rows: 2000000 Data size: 198926412 Basic stats: COMPLETE Column stats: NONE
Group By Operator // local (map-side) group-by, hash-based, over the rows left after scan, filter, and projection
aggregations: sum(o_totalprice) // aggregate function
keys: o_orderpriority (type: string)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 2000000 Data size: 198926412 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator // emits the result of the local group-by