Spark Source Code - 2.3 Physical Implementation of Aggregate: the 3 Physical Aggregation Operators

Overview

Pre-processing in the Optimizer

When a query computes distinct aggregates over multiple columns, the Optimizer applies the RewriteDistinctAggregates rule, which expands the multi-column distinct by inserting an Expand operator. The non-distinct aggregate columns and each distinct aggregate column are split into separate groups (say N groups); each group becomes one row tagged with a group id, so every input row is expanded into N rows. Two levels of Aggregate operators then compute over the expanded data: the first level aggregates by the groups described above, and the second level aggregates those results. The example from the RewriteDistinctAggregates comments illustrates this:

val data = Seq(
    ("a", "ca1", "cb1", 10),
    ("a", "ca1", "cb2", 5),
    ("b", "ca1", "cb1", 13))
    .toDF("key", "cat1", "cat2", "value")
data.createOrReplaceTempView("data")

val agg = data.groupBy($"key")
    .agg(
        countDistinct($"cat1").as("cat1_cnt"),
        countDistinct($"cat2").as("cat2_cnt"),
        sum($"value").as("total"))

Original logical plan:

Aggregate(
    key = ['key]
    functions = [
        COUNT(DISTINCT 'cat1), 
        COUNT(DISTINCT 'cat2), 
        sum('value)]
    output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
    LocalTableScan [...]

Rewritten logical plan:

Aggregate(
    key = ['key]
    functions = [
        count(if (('gid = 1)) 'cat1 else null), 
        count(if (('gid = 2)) 'cat2 else null), 
        first(if (('gid = 0)) 'total else null) ignore nulls]
    output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
    Aggregate(
        key = ['key, 'cat1, 'cat2, 'gid]
        functions = [
            sum('value)]
        output = ['key, 'cat1, 'cat2, 'gid, 'total])
        Expand(
            projections = [
                ('key, null, null, 0, cast('value as bigint)),
                ('key, 'cat1, null, 1, null),
                ('key, null, 'cat2, 2, null)]
            output = ['key, 'cat1, 'cat2, 'gid, 'value])
            LocalTableScan [...]
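
The rewrite can be verified directly in spark-shell with the data and agg defined above; a hedged sketch (the plan text in the comments is abbreviated and its exact formatting varies by Spark version):

// Inspect the optimized logical plan of the multi-distinct query defined above.
// The key point is the Expand node and the two stacked Aggregates.
println(agg.queryExecution.optimizedPlan)
// Aggregate [key], [count(if ((gid = 1)) cat1 else null) AS cat1_cnt, ...]
// +- Aggregate [key, cat1, cat2, gid], [key, cat1, cat2, gid, sum(value) AS total]
//    +- Expand [[key, null, null, 0, value], [key, cat1, null, 1, null], [key, null, cat2, 2, null]]
//       +- LocalTableScan [key, cat1, cat2, value]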

When there is only one distinct aggregate column, the rule leaves the plan unchanged.

Converting to a physical plan

The Aggregation strategy of SparkPlanner converts the logical plan into physical plans. Its core code:

object Aggregation extends Strategy {
    def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
      case PhysicalAggregation(
          groupingExpressions, aggregateExpressions, resultExpressions, child) =>

        val (functionsWithDistinct, functionsWithoutDistinct) = ...
        
        val aggregateOperator =
          if (functionsWithDistinct.isEmpty) {
            aggregate.AggUtils.planAggregateWithoutDistinct(...)
          } else {
            aggregate.AggUtils.planAggregateWithOneDistinct(...)
          }
        aggregateOperator
      case _ => Nil
    }
  }

PhysicalAggregation destructures the logical Aggregate into grouping expressions, aggregate expressions, and so on; the conversion into physical Aggregate operators is then delegated to AggUtils.

AggUtils

AggUtils splits aggregation into two cases: aggregation with no distinct functions, and aggregation with exactly one distinct column group (RewriteDistinctAggregates has already rewritten multi-column distinct into non-distinct aggregation).

Without distinct

A logical Aggregate without distinct is converted into 2 physical Aggregates (see the spark-shell sketch after the code below):

  • 1. Partial aggregation
  • 2. Final aggregation

def planAggregateWithoutDistinct(
      groupingExpressions: Seq[NamedExpression],
      aggregateExpressions: Seq[AggregateExpression],
      resultExpressions: Seq[NamedExpression],
      child: SparkPlan): Seq[SparkPlan] = {
    // Check if we can use HashAggregate.

    // 1. Create an Aggregate Operator for partial aggregations.
    val groupingAttributes = groupingExpressions.map(_.toAttribute)
    val partialAggregateExpressions = aggregateExpressions.map(_.copy(mode = Partial))
    val partialAggregateAttributes = ...
    val partialResultExpressions = ...
    val partialAggregate = createAggregate(
        ...
        child = child)

    // 2. Create an Aggregate Operator for final aggregations.
    val finalAggregateExpressions = aggregateExpressions.map(_.copy(mode = Final))
    // The attributes of the final aggregation buffer, which is presented as input to the result
    // projection:
    val finalAggregateAttributes = ...
    val finalAggregate = createAggregate(
        ...
        child = partialAggregate)

    finalAggregate :: Nil
  }
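
As a concrete illustration, a minimal spark-shell sketch of the resulting two-level plan (assuming spark.implicits._ is in scope, as it is in spark-shell; the plan in the comments is abbreviated and its exact formatting varies by Spark version):

// A simple non-distinct aggregation plans as two HashAggregates:
// Partial below the shuffle, Final above it.
val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")
df.groupBy($"key").agg(sum($"value")).explain()
// == Physical Plan ==
// HashAggregate(keys=[key], functions=[sum(value)])                <- 2. Final
// +- Exchange hashpartitioning(key, 200)
//    +- HashAggregate(keys=[key], functions=[partial_sum(value)])  <- 1. Partial
//       +- LocalTableScan [key, value]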

With exactly one distinct

A logical Aggregate with exactly one distinct column group is converted into 4 physical Aggregates (see the spark-shell sketch after the code below):

  • 1. Partial aggregation, grouped by the original group-by columns plus the distinct columns
  • 2. PartialMerge, grouped by the original group-by columns plus the distinct columns
  • 3. Partial aggregation of the distinct functions (PartialMerge of the rest), grouped by the original group-by columns only
  • 4. Final aggregation

def planAggregateWithOneDistinct(
      groupingExpressions: Seq[NamedExpression],
      functionsWithDistinct: Seq[AggregateExpression],
      functionsWithoutDistinct: Seq[AggregateExpression],
      resultExpressions: Seq[NamedExpression],
      child: SparkPlan): Seq[SparkPlan] = {

    // functionsWithDistinct is guaranteed to be non-empty. Even though it may contain more than one
    // DISTINCT aggregate function, all of those functions will have the same column expressions.
    // For example, it would be valid for functionsWithDistinct to be
    // [COUNT(DISTINCT foo), MAX(DISTINCT foo)], but [COUNT(DISTINCT bar), COUNT(DISTINCT foo)] is
    // disallowed because those two distinct aggregates have different column expressions.
    ...

    // 1. Create an Aggregate Operator for partial aggregations.
    val partialAggregate: SparkPlan = {
      ...
      // We will group by the original grouping expression, plus an additional expression for the
      // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, the grouping
      // expressions will be [key, value].
      createAggregate(
groupingExpressions = groupingExpressions ++ namedDistinctExpressions, // add the distinct columns to the grouping columns
        ...,
        child = child)
    }

    // 2. Create an Aggregate Operator for partial merge aggregations.
    val partialMergeAggregate: SparkPlan = {
      ...
      createAggregate(
        ...,
groupingExpressions = groupingAttributes ++ distinctAttributes, // add the distinct columns to the grouping columns
        ...,
        child = partialAggregate)
    }

    // 3. Create an Aggregate operator for partial aggregation (for distinct)
    ...
    val partialDistinctAggregate: SparkPlan = {
      ...
      createAggregate(
        groupingExpressions = groupingAttributes,
        ...,
        child = partialMergeAggregate)
    }

    // 4. Create an Aggregate Operator for the final aggregation.
    val finalAndCompleteAggregate: SparkPlan = {
      ...
      createAggregate(
        ...,
        groupingExpressions = groupingAttributes,
        ...,
        child = partialDistinctAggregate)
    }

    finalAndCompleteAggregate :: Nil
  }
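
Likewise, a minimal spark-shell sketch of the four-level plan for a single-distinct aggregation (abbreviated; exact formatting varies by Spark version):

// COUNT(DISTINCT value) GROUP BY key plans as four HashAggregates around two shuffles.
val df = Seq(("a", 1), ("a", 1), ("b", 2)).toDF("key", "value")
df.groupBy($"key").agg(countDistinct($"value")).explain()
// HashAggregate(keys=[key], functions=[count(distinct value)])               <- 4. Final
// +- Exchange hashpartitioning(key, 200)
//    +- HashAggregate(keys=[key], functions=[partial_count(distinct value)]) <- 3.
//       +- HashAggregate(keys=[key, value], functions=[])                    <- 2. PartialMerge
//          +- Exchange hashpartitioning(key, value, 200)
//             +- HashAggregate(keys=[key, value], functions=[])              <- 1. Partial
//                +- LocalTableScan [key, value]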

Combined with the RewriteDistinctAggregates rule, we can see that Spark optimizes distinct aggregation by converting it into group-by computation.

Physical plans for Aggregate

The choice and construction of the physical plan node is done by the AggUtils.createAggregate method. There are three physical implementations of aggregation:

  • HashAggregateExec
  • ObjectHashAggregateExec
  • SortAggregateExec

HashAggregateExec

HashAggregateExec aggregates and emits data through TungstenAggregationIterator, which internally performs hash-based aggregation with a hash table, UnsafeFixedWidthAggregationMap (hereafter "hashmap"). A sketch of the fallback loop follows this list.

  • If the data is small enough to fit entirely in the hashmap, the hashmap's contents are emitted directly;
  • If the data is large, then whenever the hashmap can no longer allocate memory (it is full), its contents are sorted and spilled to disk, and the emptied hashmap is reused to aggregate new input. This repeats until the input is exhausted, after which the spilled sorted runs go through sort-based aggregation and are emitted. In other words, when the data is too large to fit in memory, hash aggregation effectively degrades into sort aggregation.
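
A minimal, self-contained sketch of that hash-then-sort fallback (illustrative only: Spark's real implementation works on UnsafeRow, tracks memory through the TaskMemoryManager, and merges spilled runs with an external sorter; maxEntries and the full re-sort below are simplifications):

import scala.collection.mutable

// Hash-based aggregation that falls back to sort-based aggregation when the map is
// "full" (maxEntries stands in for failing to allocate execution memory).
def hashAggWithFallback[K: Ordering, V](
    input: Iterator[(K, V)],
    merge: (V, V) => V,
    maxEntries: Int): Iterator[(K, V)] = {
  val spills = mutable.ArrayBuffer.empty[Seq[(K, V)]]
  var map = mutable.HashMap.empty[K, V]
  for ((k, v) <- input) {
    map.get(k) match {
      case Some(old) => map(k) = merge(old, v) // key already present: aggregate in place
      case None =>
        if (map.size >= maxEntries) {
          // "Spill": sort the current contents by key, set them aside, start a fresh map.
          spills += map.toSeq.sortBy(_._1)
          map = mutable.HashMap.empty[K, V]
        }
        map(k) = v
    }
  }
  if (spills.isEmpty) {
    map.iterator // everything fit in memory: emit the hash map directly
  } else {
    // Fall back to sort-based aggregation over all sorted runs plus the last map.
    // (A real implementation streams a k-way merge instead of re-sorting everything.)
    spills += map.toSeq.sortBy(_._1)
    spills.flatten.sortBy(_._1)
      .foldLeft(List.empty[(K, V)]) {
        case ((pk, pv) :: tail, (k, v)) if pk == k => (pk, merge(pv, v)) :: tail
        case (acc, kv) => kv :: acc
      }
      .reverseIterator
  }
}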
ObjectHashAggregateExec

ObjectHashAggregateExec is similar to HashAggregateExec: both aggregate by hashing, and both fall back to sort-based aggregation when memory cannot hold the full data set. They differ, though, as the class's scaladoc explains:

/*
 * A hash-based aggregate operator that supports [[TypedImperativeAggregate]] functions that may
 * use arbitrary JVM objects as aggregation states.
 *
 * Similar to [[HashAggregateExec]], this operator also falls back to sort-based aggregation when
 * the size of the internal hash map exceeds the threshold. The differences are:
 *
 *  - It uses safe rows as aggregation buffer since it must support JVM objects as aggregation
 *    states.
 *
 *  - It tracks entry count of the hash map instead of byte size to decide when we should fall back.
 *    This is because it's hard to estimate the accurate size of arbitrary JVM objects in a
 *    lightweight way.
 *
 *  - Whenever fallen back to sort-based aggregation, this operator feeds all of the rest input rows
 *    into external sorters instead of building more hash map(s) as what [[HashAggregateExec]] does.
 *    This is because having too many JVM object aggregation states floating there can be dangerous
 *    for GC.
 *
 *  - CodeGen is not supported yet.
 */

HashAggregateExec uses UnsafeFixedWidthAggregationMap internally for hash aggregation. The aggregation buffer types it supports (that is, the types of the intermediate state during aggregation) must be fixed-width and mutable, namely:

NullType,
BooleanType,
ByteType,
ShortType,
IntegerType,
LongType,
FloatType,
DoubleType,
DateType,
TimestampType,
DecimalType

ObjectHashAggregateExec supports aggregation whose buffer is an arbitrary Java object. It uses ObjectAggregationMap internally for hash aggregation; this class actually holds an object of type java.util.LinkedHashMap[UnsafeRow, InternalRow], storing for each key the corresponding aggregation buffer (here, possibly an arbitrary Java object), i.e. the intermediate state of the aggregate value.
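
For instance, collect_list is a TypedImperativeAggregate whose buffer is a JVM collection, so it plans as ObjectHashAggregate; a minimal spark-shell sketch (the plan in the comments is abbreviated and its exact formatting varies by Spark version):

import org.apache.spark.sql.functions.collect_list

// collect_list keeps a JVM ArrayBuffer as its aggregation state, so HashAggregateExec
// cannot be used; with the flag enabled (the default), ObjectHashAggregateExec is chosen.
spark.conf.set("spark.sql.execution.useObjectHashAggregateExec", "true")
val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")
df.groupBy($"key").agg(collect_list($"value")).explain()
// ObjectHashAggregate(keys=[key], functions=[collect_list(value, ...)])
// +- Exchange hashpartitioning(key, 200)
//    +- ObjectHashAggregate(keys=[key], functions=[partial_collect_list(value, ...)])
//       +- LocalTableScan [key, value]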

SortAggregateExec

SortAggregateExec requires its input to be sorted by the grouping expressions (it also reports the same ordering as its outputOrdering):

override def requiredChildOrdering: Seq[Seq[SortOrder]] = {
    groupingExpressions.map(SortOrder(_, Ascending)) :: Nil
}

The EnsureRequirements rule therefore inserts a SortExec between SortAggregateExec and its child operator, so that when SortAggregateExec runs, the data within each partition is already sorted.

SortAggregateExec performs sort-based aggregation directly with SortBasedAggregationIterator, iterating in order over the results of the child operator (i.e. the table being aggregated). Since each partition is sorted, rows with the same grouping key are adjacent, so a single sequential pass is enough to aggregate them; see the sketch below.
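
A minimal, self-contained sketch of that idea (an assumed simplification: Spark's SortBasedAggregationIterator actually works on InternalRow and aggregation buffers, not on tuples):

// With input already sorted by the grouping key, all rows of a group are adjacent,
// so one forward pass produces one output row per group.
def sortAgg[K, V](sortedInput: Iterator[(K, V)], merge: (V, V) => V): Iterator[(K, V)] =
  new Iterator[(K, V)] {
    private val it = sortedInput.buffered
    def hasNext: Boolean = it.hasNext
    def next(): (K, V) = {
      val (key, first) = it.next()
      var buf = first
      // Keep consuming rows while they still belong to the current group.
      while (it.hasNext && it.head._1 == key) buf = merge(buf, it.next()._2)
      (key, buf)
    }
  }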

Selection strategy

Returning to the AggUtils.createAggregate method: the strategy for choosing the physical operator when converting a logical Aggregate is, in pseudocode:

if (the aggregation buffers of all aggregate functions are fixed-width, mutable types):
    HashAggregateExec
else if (spark.sql.execution.useObjectHashAggregateExec=true && all aggregate functions are TypedImperativeAggregate):
    ObjectHashAggregateExec
else:
    SortAggregateExec

TypedImperativeAggregate:

/**
 * Aggregation function which allows arbitrary user-defined java object to be used as internal
 * aggregation buffer.
 */