Overview
If the same Exchange or Subquery appears more than once in a query, Spark can reuse that part of the plan instead of computing it twice.
With AE (Adaptive Execution) disabled, reuse is implemented by two dedicated rules that rewrite the physical plan.
With AE enabled, reuse is implemented during stage creation and scheduling, through query stages that wrap a special ReusedExchange node.
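The switches involved can be toggled per session; a minimal sketch (assumes a SparkSession named spark; the two reuse flags are internal configs that default to true, while the default of spark.sql.adaptive.enabled varies by Spark version):

spark.conf.set("spark.sql.adaptive.enabled", "false")       // AE on/off
spark.conf.set("spark.sql.exchange.reuse.enabled", "true")  // ReuseExchange rule
spark.conf.set("spark.sql.execution.reuseSubquery", "true") // ReuseSubquery rule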
The ReuseExchange and ReuseSubquery Rules
With AE disabled, exchange and subquery reuse is implemented by the ReuseExchange and ReuseSubquery rules.
Both rules run as preparation rules, before the sparkPlan is turned into the executedPlan.
The ReuseExchange rule replaces any Exchange that has already appeared with a ReusedExchangeExec, and the ReuseSubquery rule replaces the plan of any ExecSubqueryExpression that has already appeared with a ReusedSubqueryExec.
In both ReusedExchangeExec and ReusedSubqueryExec, every execute method simply delegates to the corresponding execute method of the child (the original node), so the work is performed only once. The two rules are excerpted below.
// ReuseExchange: `exchanges` is a mutable.HashMap[StructType, ArrayBuffer[Exchange]]
// keyed by schema, so sameResult is only checked against candidates whose
// schema already matches.
def reuse: PartialFunction[Exchange, SparkPlan] = {
  case exchange: Exchange =>
    val sameSchema = exchanges.getOrElseUpdate(exchange.schema, ArrayBuffer[Exchange]())
    val samePlan = sameSchema.find { e =>
      exchange.sameResult(e)
    }
    if (samePlan.isDefined) {
      // Keep the output of this exchange; the following plans require it to
      // resolve attributes.
      // samePlan.get is the exchange that appeared earlier.
      ReusedExchangeExec(exchange.output, samePlan.get)
    } else {
      sameSchema += exchange
      exchange
    }
}
// ReuseSubquery: the same schema-keyed lookup, applied to subquery expressions.
val subqueries = mutable.HashMap[StructType, ArrayBuffer[BaseSubqueryExec]]()
plan transformAllExpressions {
  case sub: ExecSubqueryExpression =>
    val sameSchema =
      subqueries.getOrElseUpdate(sub.plan.schema, ArrayBuffer[BaseSubqueryExec]())
    val sameResult = sameSchema.find(_.sameResult(sub.plan))
    if (sameResult.isDefined) {
      // Wrap the previously seen subquery plan so it is executed only once.
      sub.withNewPlan(ReusedSubqueryExec(sameResult.get))
    } else {
      sameSchema += sub.plan
      sub
    }
}
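A quick way to see the rule fire is a self-union, where both branches produce an identical Exchange; a minimal sketch (assumes a SparkSession named spark, with AE disabled so the rule path above is taken):

import org.apache.spark.sql.execution.exchange.ReusedExchangeExec

spark.conf.set("spark.sql.adaptive.enabled", "false")
val agg  = spark.range(100).groupBy("id").count()
val plan = agg.union(agg).queryExecution.executedPlan
// The Exchange in the second branch should have been replaced.
assert(plan.collect { case r: ReusedExchangeExec => r }.nonEmpty)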
Reuse in AE
With AE enabled, exchange reuse is implemented at scheduling time, as stages are created and submitted.
Take the following query as an example:
SELECT user.id, name, count(salary) as c1
FROM user join sal
ON user.id = sal.id
GROUP BY user.id, name
UNION ALL
SELECT user.id, name, count(salary) as c1
FROM user join sal
ON user.id = sal.id
GROUP BY user.id, name
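Plans like the ones below can be printed directly; a minimal sketch (assumes the query text above is stored in a variable named query and that the default.user and default.sal tables exist):

spark.sql(query).explain(true) // prints the optimized logical plan and the physical plan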
The original plan is as follows:
== Optimized Logical Plan ==
Union
:- Aggregate [id#277L, name#278], [id#277L, name#278, count(salary#282) AS c1#279L]
: +- Project [id#277L, name#278, salary#282]
: +- Join Inner, (id#277L = id#281L)
: :- Filter isnotnull(id#277L)
: : +- Relation default.user[id#277L,name#278] parquet
: +- Filter isnotnull(id#281L)
: +- Relation default.sal[id#281L,salary#282] parquet
+- Aggregate [id#277L, name#278], [id#277L, name#278, count(salary#282) AS c1#287L]
+- Project [id#277L, name#278, salary#282]
+- Join Inner, (id#277L = id#281L)
:- Filter isnotnull(id#277L)
: +- Relation default.user[id#277L,name#278] parquet
+- Filter isnotnull(id#281L)
+- Relation default.sal[id#281L,salary#282] parquet
== Physical Plan ==
Union
:- HashAggregate(keys=[id#277L, name#278], functions=[count(salary#282)], output=[id#277L, name#278, c1#279L])
: +- Exchange hashpartitioning(id#277L, name#278, 5), ENSURE_REQUIREMENTS, [id=#186]
: +- HashAggregate(keys=[id#277L, name#278], functions=[partial_count(salary#282)], output=[id#277L, name#278, count#292L])
: +- Project [id#277L, name#278, salary#282]
: +- BroadcastHashJoin [id#277L], [id#281L], Inner, BuildLeft, false
: :- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]),false), [id=#181]
: : +- Project [id#277L, name#278]
: : +- Filter isnotnull(id#277L)
: : +- FileScan parquet default.user[id#277L,name#278] Batched: true, DataFilters: [isnotnull(id#277L)], Format: Parquet...
: +- Project [id#281L, salary#282]
: +- Filter isnotnull(id#281L)
: +- FileScan parquet default.sal[id#281L,salary#282] Batched: true, DataFilters: [isnotnull(id#281L)], Format: Parquet...
+- HashAggregate(keys=[id#277L, name#278], functions=[count(salary#282)], output=[id#277L, name#278, c1#287L])
+- Exchange hashpartitioning(id#277L, name#278, 5), ENSURE_REQUIREMENTS, [id=#193]
+- HashAggregate(keys=[id#277L, name#278], functions=[partial_count(salary#282)], output=[id#277L, name#278, count#294L])
+- Project [id#277L, name#278, salary#282]
+- BroadcastHashJoin [id#277L], [id#281L], Inner, BuildLeft, false
:- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]),false), [id=#188]
: +- Project [id#277L, name#278]
: +- Filter isnotnull(id#277L)
: +- FileScan parquet default.user[id#277L,name#278] Batched: true, DataFilters: [isnotnull(id#277L)], Format: Parquet...
+- Project [id#281L, salary#282]
+- Filter isnotnull(id#281L)
+- FileScan parquet default.sal[id#281L,salary#282] Batched: true, DataFilters: [isnotnull(id#281L)], Format: Parquet...
AE's first round of stage creation and submission
Union
:- HashAggregate(keys=[id#277L, name#278], functions=[count(salary#282)], output=[id#277L, name#278, c1#279L])
: +- Exchange hashpartitioning(id#277L, name#278, 5), ENSURE_REQUIREMENTS, [id=#229]
: +- HashAggregate(keys=[id#277L, name#278], functions=[partial_count(salary#282)], output=[id#277L, name#278, count#292L])
: +- Project [id#277L, name#278, salary#282]
: +- BroadcastHashJoin [id#277L], [id#281L], Inner, BuildLeft, false
: :- BroadcastQueryStage 0
: : +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]),false), [id=#224]
: : +- *(1) Project [id#277L, name#278]
: : +- *(1) Filter isnotnull(id#277L)
: : +- *(1) ColumnarToRow
: : +- FileScan parquet default.user[id#277L,name#278] Batched: true, DataFilters: [isnotnull(id#277L)], Format: Parquet...
: +- Project [id#281L, salary#282]
: +- Filter isnotnull(id#281L)
: +- FileScan parquet default.sal[id#281L,salary#282] Batched: true, DataFilters: [isnotnull(id#281L)], Format: Parquet...
+- HashAggregate(keys=[id#277L, name#278], functions=[count(salary#282)], output=[id#277L, name#278, c1#287L])
+- Exchange hashpartitioning(id#277L, name#278, 5), ENSURE_REQUIREMENTS, [id=#255]
+- HashAggregate(keys=[id#277L, name#278], functions=[partial_count(salary#282)], output=[id#277L, name#278, count#294L])
+- Project [id#277L, name#278, salary#282]
+- BroadcastHashJoin [id#277L], [id#281L], Inner, BuildLeft, false
:- BroadcastQueryStage 1
: +- ReusedExchange [id#277L, name#278], BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]),false), [id=#224]
+- Project [id#281L, salary#282]
+- Filter isnotnull(id#281L)
+- FileScan parquet default.sal[id#281L,salary#282] Batched: true, DataFilters: [isnotnull(id#281L)], Format: Parquet....
At this point AE submits BroadcastQueryStage 0 and BroadcastQueryStage 1 for execution.
Executing a BroadcastQueryStage means executing val relationFuture: Future[broadcast.Broadcast[Any]] inside BroadcastExchangeExec, which runs the child SparkPlan, builds the resulting relation, and broadcasts it. Note that BroadcastQueryStage 1 wraps a ReusedExchange that points at the exchange of stage 0 ([id=#224]), so no second broadcast is built.
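The mechanism boils down to a lazily started Future; a minimal illustrative sketch of the idea (not the actual Spark source; BroadcastLikeExchange is a made-up name):

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Stand-in for BroadcastExchangeExec: the broadcast work lives in a lazy val,
// so every consumer that reaches this node instance, including a reused
// reference, shares one materialization.
class BroadcastLikeExchange(buildAndBroadcast: () => AnyRef) {
  lazy val relationFuture: Future[AnyRef] = Future { buildAndBroadcast() }
}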
AE's second round of stage creation and submission
Union
:- HashAggregate(keys=[id#277L, name#278], functions=[count(salary#282)], output=[id#277L, name#278, c1#279L])
: +- ShuffleQueryStage 2
: +- Exchange hashpartitioning(id#277L, name#278, 5), ENSURE_REQUIREMENTS, [id=#325]
: +- *(2) HashAggregate(keys=[id#277L, name#278], functions=[partial_count(salary#282)], output=[id#277L, name#278, count#292L])
: +- *(2) Project [id#277L, name#278, salary#282]
: +- *(2) BroadcastHashJoin [id#277L], [id#281L], Inner, BuildLeft, false
: :- BroadcastQueryStage 0
: : +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]),false), [id=#224]
: : +- *(1) Project [id#277L, name#278]
: : +- *(1) Filter isnotnull(id#277L)
: : +- *(1) ColumnarToRow
: : +- FileScan parquet default.user[id#277L,name#278] Batched: true, DataFilters: [isnotnull(id#277L)], Format: Parquet...
: +- *(2) Project [id#281L, salary#282]
: +- *(2) Filter isnotnull(id#281L)
: +- *(2) ColumnarToRow
: +- FileScan parquet default.sal[id#281L,salary#282] Batched: true, DataFilters: [isnotnull(id#281L)], Format: Parquet...
+- HashAggregate(keys=[id#277L, name#278], functions=[count(salary#282)], output=[id#277L, name#278, c1#287L])
+- ShuffleQueryStage 3
+- ReusedExchange [id#277L, name#278, count#294L], Exchange hashpartitioning(id#277L, name#278, 5), ENSURE_REQUIREMENTS, [id=#325]
At this point AE submits ShuffleQueryStage 2 and ShuffleQueryStage 3 for execution.
A ShuffleQueryStage exposes its result through val mapOutputStatisticsFuture: Future[MapOutputStatistics], a Future that the scheduler waits on. Because ShuffleQueryStage 3 wraps a ReusedExchange that points at the exchange of stage 2 ([id=#325]), both stages correspond to the same Future object, which guarantees that a QueryStage is never submitted for execution twice.
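The same single-Future idea can be sketched for the shuffle case (illustrative only; ShuffleLikeExchange and ReusedShuffleStage are made-up names):

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Stand-in for ShuffleExchangeExec: map-stage submission is a lazy val,
// so it happens at most once per exchange instance.
class ShuffleLikeExchange(submitMapStage: () => AnyRef) {
  lazy val mapOutputStatisticsFuture: Future[AnyRef] = Future { submitMapStage() }
}

// A reused stage owns no Future of its own; it delegates to the original
// exchange, so materializing either stage touches the same computation.
class ReusedShuffleStage(original: ShuffleLikeExchange) {
  def materialize(): Future[AnyRef] = original.mapOutputStatisticsFuture
}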