Understanding the diagram above through the execution plan
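explain extended walks through all four stages of Catalyst planning: parsing, analysis, logical optimization, and physical planning. To follow along, any Hive table matching the schema the analyzer reports below (key: int, value: string) will do; the original DDL for aa is not shown in the source, so the statement here is a minimal sketch, not the author's actual setup:

-- Minimal sketch (assumed, not from the source): with Hive support
-- enabled, a plain CREATE TABLE yields a text table backed by
-- LazySimpleSerDe, the SerDe named in the CatalogRelation nodes below.
CREATE TABLE aa (key INT, value STRING);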
spark-sql (default)> explain extended
> select
> a.key*(4+5),
> b.value
> from
> aa a join aa b
> on a.key=b.key and a.key>10;
plan
== Parsed Logical Plan ==
'Project [unresolvedalias(('a.key * (4 + 5)), None), 'b.value]
+- 'Join Inner, (('a.key = 'b.key) && ('a.key > 10))
   :- 'SubqueryAlias a
   :  +- 'UnresolvedRelation `aa`
   +- 'SubqueryAlias b
      +- 'UnresolvedRelation `aa`
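The parsed plan is a direct translation of the SQL text. The leading single quote ('Project, 'Join, 'UnresolvedRelation) marks nodes that are still unresolved: aa has not been looked up in the catalog, and a.key*(4+5) sits inside an unresolvedalias because its type and output name are not yet known. Both join inputs point at the same UnresolvedRelation `aa`, distinguished only by the aliases a and b.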
== Analyzed Logical Plan ==
(key * (4 + 5)): int, value: string
Project [(key#37 * (4 + 5)) AS (key * (4 + 5))#41, value#40]
+- Join Inner, ((key#37 = key#39) && (key#37 > 10))
   :- SubqueryAlias a
   :  +- SubqueryAlias aa
   :     +- CatalogRelation `default`.`aa`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#37, value#38]
   +- SubqueryAlias b
      +- SubqueryAlias aa
         +- CatalogRelation `default`.`aa`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#39, value#40]
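Analysis resolves the plan against the catalog: each UnresolvedRelation becomes a CatalogRelation for the Hive table `default`.`aa`, and every attribute gets a unique expression ID, which is how two scans of the same table stay distinct (key#37/value#38 for a, key#39/value#40 for b). The analyzer also derives the output schema printed at the top of this stage: (key * (4 + 5)): int, value: string.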
== Optimized Logical Plan ==
Project [(key#37 * 9) AS (key * (4 + 5))#41, value#40]
+- Join Inner, (key#37 = key#39)
   :- Project [key#37]
   :  +- Filter (isnotnull(key#37) && (key#37 > 10))
   :     +- CatalogRelation `default`.`aa`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#37, value#38]
   +- Filter ((key#39 > 10) && isnotnull(key#39))
      +- CatalogRelation `default`.`aa`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#39, value#40]
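Three optimizer rules are visible here. Constant folding evaluates 4 + 5 once, so the projection becomes key#37 * 9. Predicate pushdown moves a.key > 10 below the join, and because the join equates key#37 with key#39, the same filter is inferred for b's side too; both sides also gain an isnotnull check, since an inner equi-join can never match nulls, and the join condition itself shrinks to key#37 = key#39. Finally, column pruning inserts Project [key#37] on a's side: a.value is never used, so it need not be read at all.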
== Physical Plan ==
*Project [(key#37 * 9) AS (key * (4 + 5))#41, value#40]
+- *SortMergeJoin [key#37], [key#39], Inner
   :- *Sort [key#37 ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(key#37, 200)
   :     +- *Filter (isnotnull(key#37) && (key#37 > 10))
   :        +- HiveTableScan [key#37], CatalogRelation `default`.`aa`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#37, value#38]
   +- *Sort [key#39 ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(key#39, 200)
         +- *Filter ((key#39 > 10) && isnotnull(key#39))
            +- HiveTableScan [key#39, value#40], CatalogRelation `default`.`aa`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#39, value#40]
Time taken: 1.218 seconds, Fetched 1 row(s)
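The physical plan turns the logical join into a SortMergeJoin: each side is repartitioned by its join key (Exchange hashpartitioning) and sorted, then the two sorted streams are merged. The 200 in hashpartitioning(key, 200) is the default value of spark.sql.shuffle.partitions, and the * prefix on Project, SortMergeJoin, Sort, and Filter marks operators fused by whole-stage code generation. A hedged sketch of poking at both from the same spark-sql prompt; the partition count 8 is an arbitrary illustration, not from the source:

-- 200 in Exchange hashpartitioning(key#37, 200) is just the default of
-- spark.sql.shuffle.partitions; lowering it shrinks the shuffle for a
-- small test table (8 is an arbitrary choice).
SET spark.sql.shuffle.partitions=8;

-- Operators marked with * are compiled together by whole-stage codegen;
-- EXPLAIN CODEGEN prints the Java source Spark generates for them.
EXPLAIN CODEGEN
SELECT a.key*(4+5), b.value
FROM aa a JOIN aa b
ON a.key = b.key AND a.key > 10;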