Overview
Preprocessing in the Optimizer
When a query computes DISTINCT aggregates over multiple distinct column sets, the Optimizer applies the RewriteDistinctAggregates rule. The rule expands the multi-distinct aggregation by inserting an Expand operator: the non-distinct aggregate columns and each distinct column set are assigned to separate groups (say N groups), each group producing one row tagged with a group id, so every input row is expanded into N rows. Two layers of Aggregate operators then process the expanded data: the first layer aggregates by the groups just described, and the second layer aggregates those partial results. The example from the RewriteDistinctAggregates scaladoc illustrates this (the snippet assumes import spark.implicits._ and org.apache.spark.sql.functions._):
val data = Seq(
  ("a", "ca1", "cb1", 10),
  ("a", "ca1", "cb2", 5),
  ("b", "ca1", "cb1", 13))
  .toDF("key", "cat1", "cat2", "value")
data.createOrReplaceTempView("data")

val agg = data.groupBy($"key")
  .agg(
    countDistinct($"cat1").as("cat1_cnt"),
    countDistinct($"cat2").as("cat2_cnt"),
    sum($"value").as("total"))
The original logical plan:
Aggregate(
  key = ['key]
  functions = [
    COUNT(DISTINCT 'cat1),
    COUNT(DISTINCT 'cat2),
    sum('value)]
  output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
  LocalTableScan [...]
The rewritten logical plan:
Aggregate(
  key = ['key]
  functions = [
    count(if (('gid = 1)) 'cat1 else null),
    count(if (('gid = 2)) 'cat2 else null),
    first(if (('gid = 0)) 'total else null) ignore nulls]
  output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
  Aggregate(
    key = ['key, 'cat1, 'cat2, 'gid]
    functions = [sum('value)]
    output = ['key, 'cat1, 'cat2, 'gid, 'total])
    Expand(
      projections = [
        ('key, null, null, 0, cast('value as bigint)),
        ('key, 'cat1, null, 1, null),
        ('key, null, 'cat2, 2, null)]
      output = ['key, 'cat1, 'cat2, 'gid, 'value])
      LocalTableScan [...]
Note how correctness is preserved: the inner Aggregate groups by ('key, 'cat1, 'cat2, 'gid), which de-duplicates each distinct column within its group, so the outer Aggregate can count the remaining non-null values per gid with a plain, non-distinct count. When there is only a single distinct column group, the rule leaves the plan unchanged.
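To see the rewrite in action, the optimized logical plan of the example above can be inspected directly (a usage sketch; the exact output format varies by Spark version):

// Print the optimized logical plan, which should contain the Expand operator
// inserted by RewriteDistinctAggregates (output abbreviated, version-dependent).
println(agg.queryExecution.optimizedPlan)
// or print logical plans alongside the physical plan:
agg.explain(true)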
Conversion to a physical plan
SparkPlanner's Aggregation strategy converts the logical plan into a physical plan. Its core code:
object Aggregation extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case PhysicalAggregation(
        groupingExpressions, aggregateExpressions, resultExpressions, child) =>
      val (functionsWithDistinct, functionsWithoutDistinct) = ...
      val aggregateOperator =
        if (functionsWithDistinct.isEmpty) {
          aggregate.AggUtils.planAggregateWithoutDistinct(...)
        } else {
          aggregate.AggUtils.planAggregateWithOneDistinct(...)
        }
      aggregateOperator
    case _ => Nil
  }
}
PhysicalAggregation destructures the logical Aggregate into grouping expressions, aggregate expressions, and result expressions; the actual construction of the physical Aggregate operators is delegated to AggUtils.
AggUtils
AggUtils distinguishes two cases: aggregation without distinct, and aggregation with exactly one distinct column group (RewriteDistinctAggregates has already rewritten multi-distinct aggregations into non-distinct ones).
Without distinct
A logical Aggregate without distinct is converted into 2 physical Aggregates:
- 1. Partial aggregation
- 2. Final aggregation
def planAggregateWithoutDistinct(
    groupingExpressions: Seq[NamedExpression],
    aggregateExpressions: Seq[AggregateExpression],
    resultExpressions: Seq[NamedExpression],
    child: SparkPlan): Seq[SparkPlan] = {
  // Check if we can use HashAggregate.
  // 1. Create an Aggregate Operator for partial aggregations.
  val groupingAttributes = groupingExpressions.map(_.toAttribute)
  val partialAggregateExpressions = aggregateExpressions.map(_.copy(mode = Partial))
  val partialAggregateAttributes = ...
  val partialResultExpressions = ...
  val partialAggregate = createAggregate(
    ...
    child = child)

  // 2. Create an Aggregate Operator for final aggregations.
  val finalAggregateExpressions = aggregateExpressions.map(_.copy(mode = Final))
  // The attributes of the final aggregation buffer, which is presented as input to the result
  // projection:
  val finalAggregateAttributes = ...
  val finalAggregate = createAggregate(
    ...
    child = partialAggregate)

  finalAggregate :: Nil
}
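For example, a plain sum with group by compiles to exactly these two physical Aggregates separated by a shuffle. A hedged illustration using the earlier data DataFrame (attribute ids, casts, and formatting differ across Spark versions):

data.groupBy($"key").agg(sum($"value")).explain()
// == Physical Plan == (abbreviated)
// *(2) HashAggregate(keys=[key], functions=[sum(value)])               <- Final
// +- Exchange hashpartitioning(key, 200)
//    +- *(1) HashAggregate(keys=[key], functions=[partial_sum(value)]) <- Partial
//       +- LocalTableScan [key, value]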
With exactly one distinct
A logical Aggregate with exactly one distinct column group is converted into 4 physical Aggregates:
- 1. Partial aggregation, grouping by the original group-by columns plus the distinct columns
- 2. PartialMerge aggregation, grouping by the original group-by columns plus the distinct columns
- 3. PartialMerge aggregation, grouping by the original group-by columns only (the distinct functions run in Partial mode here)
- 4. Final aggregation
def planAggregateWithOneDistinct(
    groupingExpressions: Seq[NamedExpression],
    functionsWithDistinct: Seq[AggregateExpression],
    functionsWithoutDistinct: Seq[AggregateExpression],
    resultExpressions: Seq[NamedExpression],
    child: SparkPlan): Seq[SparkPlan] = {
  // functionsWithDistinct is guaranteed to be non-empty. Even though it may contain more than one
  // DISTINCT aggregate function, all of those functions will have the same column expressions.
  // For example, it would be valid for functionsWithDistinct to be
  // [COUNT(DISTINCT foo), MAX(DISTINCT foo)], but [COUNT(DISTINCT bar), COUNT(DISTINCT foo)] is
  // disallowed because those two distinct aggregates have different column expressions.
  ...

  // 1. Create an Aggregate Operator for partial aggregations.
  val partialAggregate: SparkPlan = {
    ...
    // We will group by the original grouping expression, plus an additional expression for the
    // DISTINCT column. For example, for AVG(DISTINCT value) GROUP BY key, the grouping
    // expressions will be [key, value].
    createAggregate(
      groupingExpressions = groupingExpressions ++ namedDistinctExpressions, // add the distinct columns to the grouping columns
      ...,
      child = child)
  }

  // 2. Create an Aggregate Operator for partial merge aggregations.
  val partialMergeAggregate: SparkPlan = {
    ...
    createAggregate(
      ...,
      groupingExpressions = groupingAttributes ++ distinctAttributes, // add the distinct columns to the grouping columns
      ...,
      child = partialAggregate)
  }

  // 3. Create an Aggregate operator for partial aggregation (for distinct)
  ...
  val partialDistinctAggregate: SparkPlan = {
    ...
    createAggregate(
      groupingExpressions = groupingAttributes,
      ...,
      child = partialMergeAggregate)
  }

  // 4. Create an Aggregate Operator for the final aggregation.
  val finalAndCompleteAggregate: SparkPlan = {
    ...
    createAggregate(
      ...,
      groupingExpressions = groupingAttributes,
      ...,
      child = partialDistinctAggregate)
  }

  finalAndCompleteAggregate :: Nil
}
Together with the RewriteDistinctAggregates rule, this shows that Spark optimizes distinct aggregation by rewriting it into group-by computations.
Aggregate physical plans
The choice and construction of the physical plan node is handled by the AggUtils.createAggregate method. There are three physical implementations of aggregation:
HashAggregateExec
ObjectHashAggregateExec
SortAggregateExec
HashAggregateExec
HashAggregateExec aggregates and emits data through TungstenAggregationIterator, which internally performs hash-based aggregation with the hash-table implementation UnsafeFixedWidthAggregationMap (hereafter "hashmap").
- If the data is small enough to fit entirely in the hashmap, the hashmap contents are emitted directly.
- If the data is large and the hashmap cannot acquire more memory (it is full), its entries are sorted and spilled to disk; the emptied hashmap is then reused to aggregate new input. These steps repeat until the input is exhausted, after which the spilled sorted runs are merged with a sort-based aggregation and emitted. In other words, when the data does not fit in memory, hash aggregation effectively degrades into sort aggregation, as the sketch below illustrates.
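The following is a minimal, self-contained sketch of this hash-then-spill-then-merge idea, not Spark's actual code: the entry-count trigger stands in for Spark's memory-acquisition failure, and a simple in-memory merge replaces the external sorter.

import scala.collection.mutable

// Hypothetical sketch: sum values by key with a bounded hash map, spilling
// sorted runs when "memory" (here: an entry limit) is exhausted, then
// finishing with sort-based aggregation over the sorted runs.
object HashAggFallbackSketch {
  def sumByKey(rows: Iterator[(String, Long)], maxEntries: Int): List[(String, Long)] = {
    val spills = mutable.ArrayBuffer.empty[Seq[(String, Long)]]
    var map = mutable.HashMap.empty[String, Long]

    for ((k, v) <- rows) {
      if (!map.contains(k) && map.size >= maxEntries) {
        spills += map.toSeq.sortBy(_._1)          // "memory full": sort and spill partial aggregates
        map = mutable.HashMap.empty[String, Long] // reuse an emptied map for new input
      }
      map(k) = map.getOrElse(k, 0L) + v
    }
    spills += map.toSeq.sortBy(_._1)

    // Sort-based aggregation over the spilled runs: after sorting, equal keys
    // are adjacent, so one linear pass merges the partial sums.
    spills.flatten.sortBy(_._1).foldLeft(List.empty[(String, Long)]) {
      case ((pk, pv) :: tail, (k, v)) if pk == k => (pk, pv + v) :: tail
      case (acc, kv) => kv :: acc
    }.reverse
  }

  def main(args: Array[String]): Unit = {
    val data = Iterator(("a", 1L), ("b", 2L), ("a", 3L), ("c", 4L), ("b", 5L))
    println(sumByKey(data, maxEntries = 2)) // List((a,4), (b,7), (c,4))
  }
}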
ObjectHashAggregateExec
ObjectHashAggregateExec is similar to HashAggregateExec: both are hash-based and both fall back to sort-based aggregation when memory cannot hold all the data. They differ in several ways, per the class's scaladoc:
/*
* A hash-based aggregate operator that supports [[TypedImperativeAggregate]] functions that may
* use arbitrary JVM objects as aggregation states.
*
* Similar to [[HashAggregateExec]], this operator also falls back to sort-based aggregation when
* the size of the internal hash map exceeds the threshold. The differences are:
*
* - It uses safe rows as aggregation buffer since it must support JVM objects as aggregation
* states.
*
* - It tracks entry count of the hash map instead of byte size to decide when we should fall back.
* This is because it's hard to estimate the accurate size of arbitrary JVM objects in a
* lightweight way.
*
* - Whenever fallen back to sort-based aggregation, this operator feeds all of the rest input rows
* into external sorters instead of building more hash map(s) as what [[HashAggregateExec]] does.
* This is because having too many JVM object aggregation states floating there can be dangerous
* for GC.
*
* - CodeGen is not supported yet.
*/
HashAggregateExec performs hash aggregation with UnsafeFixedWidthAggregationMap, whose aggregation buffers (i.e. the intermediate state of the aggregation) may only contain fixed-width, mutable types, such as:
NullType,
BooleanType,
ByteType,
ShortType,
IntegerType,
LongType,
FloatType,
DoubleType,
DateType,
TimestampType,
DecimalType
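The actual gate lives in UnsafeFixedWidthAggregationMap.supportsAggregationBufferSchema (Java); a condensed Scala paraphrase:

import org.apache.spark.sql.catalyst.expressions.UnsafeRow
import org.apache.spark.sql.types.StructType

// Every buffer field must be a mutable fixed-width type from UnsafeRow's
// point of view; a single non-mutable field disqualifies hash aggregation.
def supportsAggregationBufferSchema(schema: StructType): Boolean =
  schema.fields.forall(f => UnsafeRow.isMutable(f.dataType))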
ObjectHashAggregateExec, in contrast, supports aggregation buffers that are arbitrary Java objects. It performs hash aggregation with ObjectAggregationMap, which in fact holds an object of type java.util.LinkedHashMap[UnsafeRow, InternalRow] storing, for each key, the corresponding aggregation buffer (possibly an arbitrary Java object), i.e. the intermediate aggregation state.
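As a concrete illustration (hedged; plan text varies by Spark version): collect_list is implemented as a TypedImperativeAggregate whose state is a growing buffer of values, so a query using it typically plans an ObjectHashAggregateExec when the flag is enabled:

spark.conf.set("spark.sql.execution.useObjectHashAggregateExec", "true")
Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")
  .groupBy($"key").agg(collect_list($"value")).explain()
// == Physical Plan == (abbreviated)
// ObjectHashAggregate(keys=[key], functions=[collect_list(value, ...)])
// +- Exchange hashpartitioning(key, 200)
//    +- ObjectHashAggregate(keys=[key], functions=[partial_collect_list(value, ...)])
//       +- LocalTableScan [key, value]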
SortAggregateExec
SortAggregateExec requires its child's output to be ordered by the grouping columns:
override def requiredChildOrdering: Seq[Seq[SortOrder]] = {
  groupingExpressions.map(SortOrder(_, Ascending)) :: Nil
}
The EnsureRequirements rule therefore inserts a SortExec between SortAggregateExec and its child, so that each partition's data is already sorted when SortAggregateExec executes.
SortAggregateExec performs sort-based aggregation directly with SortBasedAggregationIterator, iterating in order over the output of the child operator (the table being aggregated). Because each partition is sorted, rows with the same grouping key are adjacent, so a single sequential pass is enough to aggregate them.
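A hedged example: max over a string column needs a StringType buffer field, which is not a mutable fixed-width type, and Max is not a TypedImperativeAggregate, so the planner falls back to SortAggregateExec, with the SortExec inserted by EnsureRequirements visible below it (illustrative, version-dependent output):

Seq(("a", "x"), ("a", "y"), ("b", "z")).toDF("key", "name")
  .groupBy($"key").agg(max($"name")).explain()
// == Physical Plan == (abbreviated)
// SortAggregate(key=[key], functions=[max(name)])
// +- *(2) Sort [key ASC NULLS FIRST], false, 0
//    +- Exchange hashpartitioning(key, 200)
//       +- SortAggregate(key=[key], functions=[partial_max(name)])
//          +- *(1) Sort [key ASC NULLS FIRST], false, 0
//             +- LocalTableScan [key, name]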
Selection strategy
Returning to the AggUtils.createAggregate method, the selection strategy for converting a logical Aggregate into a physical plan is:
if (the aggregation buffers of all aggregate functions contain only fixed-width, mutable types):
  HashAggregateExec
else if (spark.sql.execution.useObjectHashAggregateExec=true && all aggregate functions are TypedImperativeAggregate):
  ObjectHashAggregateExec
else:
  SortAggregateExec
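For reference, a condensed paraphrase of the dispatch inside AggUtils.createAggregate (arguments elided; consult the source of your Spark version for the exact code):

private def createAggregate(..., child: SparkPlan): SparkPlan = {
  // Hash aggregation requires every buffer field to be mutable fixed-width.
  val useHash = HashAggregateExec.supportsAggregate(
    aggregateExpressions.flatMap(_.aggregateFunction.aggBufferAttributes))
  if (useHash) {
    HashAggregateExec(...)
  } else {
    // Object hash aggregation requires the flag plus TypedImperativeAggregate functions.
    val objectHashEnabled = child.sqlContext.conf.useObjectHashAggregation
    val useObjectHash = ObjectHashAggregateExec.supportsAggregate(aggregateExpressions)
    if (objectHashEnabled && useObjectHash) {
      ObjectHashAggregateExec(...)
    } else {
      SortAggregateExec(...) // last resort: always applicable
    }
  }
}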
TypedImperativeAggregate:
/**
* Aggregation function which allows arbitrary user-defined java object to be used as internal
* aggregation buffer.
*/
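In user code, the closest public analogue is a typed Aggregator whose buffer is a plain JVM object; in Spark 3.x, registering it via functions.udaf wraps it in an expression that extends TypedImperativeAggregate, making it eligible for ObjectHashAggregateExec. A hedged sketch (the name DistinctItems is hypothetical):

import org.apache.spark.sql.{Encoder, Encoders, functions}
import org.apache.spark.sql.expressions.Aggregator

// The aggregation buffer is an ordinary Scala Set, not a fixed-width row.
object DistinctItems extends Aggregator[String, Set[String], Int] {
  def zero: Set[String] = Set.empty
  def reduce(b: Set[String], a: String): Set[String] = b + a
  def merge(b1: Set[String], b2: Set[String]): Set[String] = b1 ++ b2
  def finish(b: Set[String]): Int = b.size
  def bufferEncoder: Encoder[Set[String]] = Encoders.kryo[Set[String]]
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}

// usage (assumes a SparkSession named spark):
// spark.udf.register("distinct_items", functions.udaf(DistinctItems))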