The Architecture of Spark SQL
Example Analysis

Running explain extended on a join query prints the four plans that Catalyst produces in turn: the parsed (unresolved) logical plan, the analyzed logical plan, the optimized logical plan, and the physical plan. The transcript below was captured with truncated lines; an ellipsis marks where a column list or expression was cut off.
spark-sql> explain extended select * from emp e inner join dept d on e.deptno = d.deptno where e.deptno > 10;
20/02/04 20:16:31 INFO CodeGenerator: Code generated in 22.286318 ms
== Parsed Logical Plan ==
'Project [*]
+- 'Filter ('e.deptno > 10)
   +- 'Join Inner, ('e.deptno = 'd.deptno)
      :- 'SubqueryAlias `e`
      :  +- 'UnresolvedRelation `emp`
      +- 'SubqueryAlias `d`
         +- 'UnresolvedRelation `dept`
== Analyzed Logical Plan ==
empno: int, ename: string, position: string, managerid: int, hiredate: string, salary: double, allowance: double, deptno: int, deptno: int, ename: string, dname: string, city: int
Project [empno, ...]
+- Filter (deptno > 10)
   +- Join Inner, (deptno = deptno)
      :- SubqueryAlias `e`
      :  +- SubqueryAlias `h_demo`.`emp`
      :     +- HiveTableRelation `h_demo`.`emp`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [empno, ...]
      +- SubqueryAlias `d`
         +- SubqueryAlias `h_demo`.`dept`
            +- HiveTableRelation `h_demo`.`dept`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [deptno, ...]
== Optimized Logical Plan ==
Join Inner, (deptno = deptno)
:- Filter (isnotnull(deptno) && (deptno > 10))
:  +- HiveTableRelation `h_demo`.`emp`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [empno, ...]
+- Filter ((deptno > 10) && isnotnull(deptno))
   +- HiveTableRelation `h_demo`.`dept`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [deptno, ...]
== Physical Plan ==
*(2) BroadcastHashJoin [deptno], [deptno], ...
:- *(2) Filter (isnotnull(deptno) && (deptno > 10))
:  +- Scan hive h_demo.emp [empno, ...]
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
   +- *(1) Filter ((deptno > 10) && isnotnull(deptno))
      +- Scan hive h_demo.dept [deptno, ...]
Time taken: 0.352 seconds, Fetched 1 row( s)
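The physical plan chose a BroadcastHashJoin because Spark estimated the dept table to be smaller than spark.sql.autoBroadcastJoinThreshold (10 MB by default), so it can be shipped whole to every executor via the BroadcastExchange. A quick way to experiment with this from the same spark-sql shell (the specific table sizes here are whatever your h_demo database contains):

```sql
-- Show the current broadcast threshold (default 10485760 bytes = 10 MB).
SET spark.sql.autoBroadcastJoinThreshold;

-- Setting it to -1 disables broadcast joins; re-running the explain
-- should then produce a shuffle-based join instead.
SET spark.sql.autoBroadcastJoinThreshold=-1;
```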
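The key rewrite between the analyzed and optimized plans above is predicate pushdown: Filter(Join(emp, dept)) becomes Join(Filter(emp), Filter(dept)), so each Hive table is filtered before the join instead of after it. The following is a minimal toy sketch of that rule over a hand-rolled plan tree; the class names (Relation, Filter, Join) and the helper functions are invented for illustration and are not Spark's actual Catalyst APIs.

```python
from dataclasses import dataclass

# Toy logical-plan nodes (illustrative only, not Spark classes).
@dataclass
class Relation:
    name: str
    columns: tuple

@dataclass
class Filter:
    condition: str   # e.g. "deptno > 10"; assumed to start with a column name
    child: object

@dataclass
class Join:
    left: object
    right: object
    on: str          # join column name

def schema(plan):
    """Output columns of a plan node."""
    if isinstance(plan, Relation):
        return set(plan.columns)
    if isinstance(plan, Filter):
        return schema(plan.child)
    if isinstance(plan, Join):
        return schema(plan.left) | schema(plan.right)
    raise TypeError(plan)

def push_down_filters(plan):
    """Rewrite Filter(Join(l, r)) into Join(Filter(l), Filter(r)) wherever
    the filtered column exists on that side of the join."""
    if isinstance(plan, Filter) and isinstance(plan.child, Join):
        j = plan.child
        col = plan.condition.split()[0]
        left = Filter(plan.condition, j.left) if col in schema(j.left) else j.left
        right = Filter(plan.condition, j.right) if col in schema(j.right) else j.right
        return Join(push_down_filters(left), push_down_filters(right), j.on)
    if isinstance(plan, Filter):
        return Filter(plan.condition, push_down_filters(plan.child))
    if isinstance(plan, Join):
        return Join(push_down_filters(plan.left), push_down_filters(plan.right), plan.on)
    return plan

# Mirror the example query: filter on deptno above an emp/dept join.
emp = Relation("emp", ("empno", "ename", "deptno"))
dept = Relation("dept", ("deptno", "dname"))
plan = Filter("deptno > 10", Join(emp, dept, "deptno"))
optimized = push_down_filters(plan)
# optimized: Join(Filter("deptno > 10", emp), Filter("deptno > 10", dept), "deptno")
```

Because deptno appears on both sides of the join key, the predicate lands on both inputs, matching the two Filter nodes sitting directly on the HiveTableRelations in the optimized plan.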