The Architecture of Spark SQL
A Worked Example
Running EXPLAIN EXTENDED on a join query makes Catalyst's pipeline visible: the output below shows all four stages in order — the parsed (unresolved) logical plan, the analyzed logical plan, the optimized logical plan, and the physical plan.
spark-sql> explain extended select * from emp e inner join dept d on e.deptno = d.deptno where e.deptno > 10;
20/02/04 20:16:31 INFO CodeGenerator: Code generated in 22.286318 ms
== Parsed Logical Plan ==
'Project [*]
+- 'Filter ('e.deptno > 10)
+- 'Join Inner, ('e.deptno = 'd.deptno)
:- 'SubqueryAlias `e`
: +- 'UnresolvedRelation `emp`
+- 'SubqueryAlias `d`
+- 'UnresolvedRelation `dept`
== Analyzed Logical Plan ==
empno: int, ename: string, position: string, managerid: int, hiredate: string, salary: double, allowance: double, deptno: int, deptno: int, ename: string, dname: string, city: int
Project [empno
+- Filter (deptno
+- Join Inner, (deptno
:- SubqueryAlias `e`
: +- SubqueryAlias `h_demo`.`emp`
: +- HiveTableRelation `h_demo`.`emp`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [empno
+- SubqueryAlias `d`
+- SubqueryAlias `h_demo`.`dept`
+- HiveTableRelation `h_demo`.`dept`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [deptno
== Optimized Logical Plan ==
Join Inner, (deptno
:- Filter (isnotnull(deptno
: +- HiveTableRelation `h_demo`.`emp`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [empno
+- Filter ((deptno
+- HiveTableRelation `h_demo`.`dept`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [deptno
== Physical Plan ==
*(2) BroadcastHashJoin [deptno
:- *(2) Filter (isnotnull(deptno
: +- Scan hive h_demo.emp [empno
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
+- *(1) Filter ((deptno
+- Scan hive h_demo.dept [deptno
Time taken: 0.352 seconds, Fetched 1 row(s)
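Compare the analyzed and optimized logical plans: the single Filter (deptno > 10) that sat above the Join has been pushed below it onto both inputs, and an isnotnull(deptno) predicate has been inferred, since null keys can never satisfy an inner equi-join. The transformation can be sketched in plain Python (the plan representation here is invented for illustration; it is not Catalyst's actual tree API):

```python
# Toy sketch of predicate pushdown on an inner equi-join: a filter on
# the join key applies to *both* inputs before the join, and
# isnotnull(key) is inferred because null keys never match.
from dataclasses import dataclass

@dataclass
class Relation:
    name: str
    filters: tuple = ()        # predicates applied during the scan

@dataclass
class Join:
    left: Relation
    right: Relation
    key: str

def push_down_key_filter(join: Join, predicate: str) -> Join:
    """Rewrite Filter(pred, Join(l, r)) into Join(Filter(l), Filter(r))."""
    pushed = ("isnotnull(" + join.key + ")", predicate)
    return Join(Relation(join.left.name, join.left.filters + pushed),
                Relation(join.right.name, join.right.filters + pushed),
                join.key)

plan = push_down_key_filter(
    Join(Relation("emp"), Relation("dept"), "deptno"),
    "deptno > 10")
# Both scans now carry isnotnull(deptno) and deptno > 10, mirroring the
# Filter nodes above each HiveTableRelation in the optimized plan.
```

The payoff is that both tables are reduced before the join instead of after it, exactly as the two Filter-over-HiveTableRelation subtrees in the optimized plan show.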
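In the physical plan, the planner picks a BroadcastHashJoin: dept is small enough to broadcast (the BroadcastExchange node), so every task builds or receives a hash table of dept rows and probes it while scanning emp, avoiding a shuffle of the large side. A minimal single-process sketch of the idea (sample rows are made up; this is not Spark's implementation):

```python
# Toy sketch of a broadcast hash join: hash the small (build) side once,
# then stream the large (probe) side and look up matches locally.
def broadcast_hash_join(build_rows, probe_rows, build_key, probe_key):
    # Build phase: hash the small relation on its join key.
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    # Probe phase: stream the large relation and emit joined rows.
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            yield {**row, **match}

# Hypothetical sample data in the shape of the dept and emp tables.
dept = [{"deptno": 20, "dname": "RESEARCH"},
        {"deptno": 30, "dname": "SALES"}]
emp = [{"empno": 7369, "ename": "SMITH", "deptno": 20},
       {"empno": 7499, "ename": "ALLEN", "deptno": 30},
       {"empno": 7839, "ename": "KING",  "deptno": 10}]

joined = list(broadcast_hash_join(dept, emp, "deptno", "deptno"))
```

In a distributed run, the build-phase hash table is what gets broadcast to every executor; each emp partition then runs only the probe phase. Spark chooses this strategy when one side's estimated size is under the broadcast threshold (spark.sql.autoBroadcastJoinThreshold).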