Presto source code analysis (comparison with Hive execution plans)

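This article compares the execution plans that Presto and Hive produce for aggregation, sorting, and JOIN operations. Presto's GROUP BY, sorting, and JOIN plans show its optimization strategy, including the split into a partial aggregation stage and a final (global) aggregation stage. For Hive's sort, the article raises an open question: the global sort may take place in the reduce phase. For JOIN, Presto uses data distribution and partial aggregation, while some details of the Hive plan are not given.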

1 Aggregation comparison

1.1 Presto GROUP BY

explain select sum(totalprice),orderpriority from orders group by orderpriority;

 - Output[_col0, orderpriority] => [sum:double, orderpriority:varchar(15)]
         _col0 := sum
     - RemoteExchange[GATHER] => sum:double, orderpriority:varchar(15)
         - Project => [sum:double, orderpriority:varchar(15)]
             - Aggregate(FINAL)[orderpriority] => [orderpriority:varchar(15), $hashvalue:bigint, sum:double] // final aggregation
                     sum := "sum"("sum_9")
                 - RemoteExchange[REPARTITION] => orderpriority:varchar(15), sum_9:double, $hashvalue:bigint // pulls results remotely from the partial-aggregation nodes
                     - Aggregate(PARTIAL)[orderpriority] => [orderpriority:varchar(15), $hashvalue_11:bigint, sum_10:double] 
                             sum_10 := "sum"("totalprice") // partial aggregation
                         - Project => [$hashvalue_11:bigint, orderpriority:varchar(15), totalprice:double]
                                 $hashvalue_11 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("orderpriority"), 0))
                             - TableScan[tpch:tpch:orders:sf1.0, originalConstraint = true] => [totalprice:double, orderpriority:varchar(15)] // table scan
                                     totalprice := tpch:totalprice
                                     orderpriority := tpch:orderpriority
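
The plan above reads bottom-up: each worker scans its part of the table and computes a partial sum per orderpriority, the partial sums are repartitioned by a hash of the grouping key, a final aggregation merges them, and the results are gathered for output. The following is a minimal Python sketch of that data flow, not Presto's actual implementation. The function names, the in-memory "splits", and the use of Python's built-in hash() in place of $hashvalue are all assumptions made for illustration.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Aggregate(PARTIAL): sum totalprice per orderpriority within one split."""
    sums = defaultdict(float)
    for orderpriority, totalprice in rows:
        sums[orderpriority] += totalprice
    return dict(sums)


def repartition(partials, num_partitions):
    """RemoteExchange[REPARTITION]: route each partial sum by a hash of its key.

    Python's hash() is only a stand-in for the $hashvalue column in the plan.
    """
    partitions = [[] for _ in range(num_partitions)]
    for partial in partials:
        for key, partial_sum in partial.items():
            partitions[hash(key) % num_partitions].append((key, partial_sum))
    return partitions


def final_aggregate(partition):
    """Aggregate(FINAL): merge the partial sums that landed in one partition."""
    sums = defaultdict(float)
    for key, partial_sum in partition:
        sums[key] += partial_sum
    return dict(sums)


def run_group_by(splits, num_partitions=2):
    """Run partial -> repartition -> final, then gather everything into one dict."""
    partials = [partial_aggregate(split) for split in splits]
    result = {}
    for partition in repartition(partials, num_partitions):
        # RemoteExchange[GATHER]: collect every partition's final result.
        result.update(final_aggregate(partition))
    return result


# Two hypothetical splits of (orderpriority, totalprice) rows.
splits = [
    [("1-URGENT", 10.0), ("2-HIGH", 5.0)],
    [("1-URGENT", 2.5), ("3-MEDIUM", 1.0)],
]
totals = run_group_by(splits)
```

Because all partial sums for one key hash to the same partition, each key is finalized in exactly one place, which is why the gather step can simply concatenate the per-partition results.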

1.2 Hive GROUP BY

explain select sum(o_totalprice),o_orderpriority from orders where o_orderkey>100 group by o_orderpriority;


STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan // table scan
            alias: orders
            Statistics: Num rows: 6000000 Data size: 596779236 Basic stats: COMPLETE Column stats: NONE
            Filter Operator // row filter
              predicate: (o_orderkey > 100) (type: boolean)
              Statistics: Num rows: 2000000 Data size: 198926412 Basic stats: COMPLETE Column stats: NONE
              Select Operator // projection
                expressions: o_orderpriority (type: string), o_totalprice (type: double)
                outputColumnNames: o_orderpriority, o_totalprice
                Statistics: Num rows: 2000000 Data size: 198926412 Basic stats: COMPLETE Column stats: NONE
                Group By Operator // local, hash-based GROUP BY over the scanned, filtered, and projected rows
                  aggregations: sum(o_totalprice) // aggregate function
                  keys: o_orderpriority (type: string)
                  mode: hash
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 2000000 Data size: 198926412 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator // emits the result of the local GROUP BY