Spark Tuning: Execution Plans

1. Spark SQL execution plans

spark.sql("select a.uid,count(1) from a join u on a.uid=u.uid where a.actorlevel>10 group by a.uid order by a.uid").explain()

(1) explain(): view the physical plan

Called with no arguments, explain() prints only the final physical plan that will actually run:

== Physical Plan ==
*(4) Sort [uid#84 ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(uid#84 ASC NULLS FIRST, 200)
   +- *(3) HashAggregate(keys=[uid#84], functions=[count(1)])
      +- Exchange hashpartitioning(uid#84, 200)
         +- *(2) HashAggregate(keys=[uid#84], functions=[partial_count(1)])
            +- *(2) Project [uid#84]
               +- *(2) BroadcastHashJoin [cast(uid#84 as bigint)], [uid#23L], Inner, BuildRight
                  :- *(2) Project [uid#84]
                  :  +- *(2) Filter ((isnotnull(actorlevel#50) && (cast(actorlevel#50 as int) > 10)) && isnotnull(uid#84))
                  :     +- *(2) FileScan json [actorlevel#50,uid#84] Batched: false, Format: JSON, Location: InMemoryFileIndex[file:/D:/workspace/sparktest/datas/actorinfo], PartitionFilters: [], PushedFilters: [IsNotNull(actorlevel), IsNotNull(uid)], ReadSchema: struct<actorlevel:string,uid:string>
                  +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]))
                     +- *(1) Project [uid#23L]
                        +- *(1) Filter isnotnull(uid#23L)
                           +- *(1) FileScan json [uid#23L] Batched: false, Format: JSON, Location: InMemoryFileIndex[file:/D:/workspace/sparktest/datas/yhtx_1022], PartitionFilters: [], PushedFilters: [IsNotNull(uid)], ReadSchema: struct<uid:bigint>
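The join above is planned as a BroadcastHashJoin with BuildRight: the small right side (u) is broadcast to every task and hashed by its join key, and the large left side is streamed against that hash table, so no shuffle of the left side is needed. The sketch below is plain Python, not Spark internals, and the sample rows are hypothetical; it only illustrates the build/probe mechanics.

```python
# Conceptual sketch of a broadcast hash join (BuildRight, inner).
# Not Spark code: sample data and dict-based rows are illustrative only.

def broadcast_hash_join(left_rows, right_rows, key):
    # Build phase: hash the broadcast (right) side by the join key.
    table = {}
    for row in right_rows:
        table.setdefault(row[key], []).append(row)
    # Probe phase: stream the left side and emit matching pairs.
    out = []
    for row in left_rows:
        for match in table.get(row[key], []):
            out.append({**row, **match})
    return out

# Hypothetical rows mirroring the two JSON sources in the plan.
a = [{"uid": 1, "actorlevel": 12}, {"uid": 2, "actorlevel": 15}]
u = [{"uid": 1}, {"uid": 3}]
print(broadcast_hash_join(a, u, "uid"))  # [{'uid': 1, 'actorlevel': 12}]
```

Note that in the real plan the left key is cast(uid as bigint) first, because the two sources store uid with different types; the analyzer inserted that cast (see phase 2 below).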

(2) explain(true): view the full execution plan, which Catalyst produces in four phases

Passing true prints each intermediate plan: parsed, analyzed, optimized, and physical.

 --1: Parsing

The SQL string is parsed into an unresolved logical plan. Tables and expressions are still marked Unresolved because nothing has been checked against the catalog yet:

== Parsed Logical Plan ==
'Sort ['a.uid ASC NULLS FIRST], true
+- 'Aggregate ['a.uid], ['a.uid, unresolvedalias('count(1), None)]
   +- 'Filter ('a.actorlevel > 10)
      +- 'Join Inner, ('a.uid = 'u.uid)
         :- 'UnresolvedRelation `a`
         +- 'UnresolvedRelation `u`

 --2: Analysis

The analyzer resolves relations and columns against the catalog, binds them to concrete attribute IDs (uid#84, uid#23L), and reconciles types; note the cast(uid#84 as bigint) it inserted on the join key because the two sources store uid as string and bigint respectively:

== Analyzed Logical Plan ==
uid: string, count(1): bigint
Project [uid#84, count(1)#181L]
+- Sort [uid#84 ASC NULLS FIRST], true
   +- Aggregate [uid#84], [uid#84, count(1) AS count(1)#181L]
      +- Filter (cast(actorlevel#50 as int) > 10)
         +- Join Inner, (cast(uid#84 as bigint) = uid#23L)
            :- SubqueryAlias a
            :  +- Relation[actorid#48,actorinfoflag#49,actorlevel#50,actvdate#51,adchannel#52,adsubchannel#53,areaid#54,channel#55,city#56,clienLogDate#57,deviceid#58,edition#59,gameid#60,guid#61,imei#62,language#63,lastgamedate#64,logdate#65,logindates#66,logtype#67L,lvlupdate#68,lvluptimes#69L,money#70,netsp#71,... 13 more fields] json
            +- SubqueryAlias u
               +- Relation[actorid#6L,actorname#7,approach#8,areaid#9L,channel#10L,deviceid#11,edition#12L,gameid#13L,imei#14,item#15L,itemcount#16L,itemname#17,logdate#18,logtype#19L,opersys#20L,subchannel#21,type#22L,uid#23L] json
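The key work the analyzer did above can be sketched in a few lines of plain Python: look each table up in a (hypothetical) catalog and, when the join-key types disagree, wrap one side in a cast. This is a simplification of the real type-coercion rules, for illustration only.

```python
# Minimal sketch of analyzer-style join-key resolution.
# The catalog contents here are hypothetical, matching the two sources above.
catalog = {"a": {"uid": "string"}, "u": {"uid": "bigint"}}

def resolve_join_key(left_table, right_table, key):
    lt = catalog[left_table][key]
    rt = catalog[right_table][key]
    if lt == rt:
        return f"{left_table}.{key} = {right_table}.{key}"
    # Simplified coercion: cast the string side to the other side's type,
    # as the analyzed plan above does with cast(uid#84 as bigint).
    return f"cast({left_table}.{key} as {rt}) = {right_table}.{key}"

print(resolve_join_key("a", "u", "uid"))
# cast(a.uid as bigint) = u.uid
```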

 --3: Optimization

Catalyst rewrites the analyzed plan with rule-based optimizations. Compare with phase 2: the actorlevel filter moved from above the join down to the scan (predicate pushdown), isnotnull filters were added for the inner-join keys, and each relation is projected down to only the columns actually needed (column pruning):

== Optimized Logical Plan ==
Sort [uid#84 ASC NULLS FIRST], true
+- Aggregate [uid#84], [uid#84, count(1) AS count(1)#181L]
   +- Project [uid#84]
      +- Join Inner, (cast(uid#84 as bigint) = uid#23L)
         :- Project [uid#84]
         :  +- Filter ((isnotnull(actorlevel#50) && (cast(actorlevel#50 as int) > 10)) && isnotnull(uid#84))
         :     +- Relation[actorid#48,actorinfoflag#49,actorlevel#50,actvdate#51,adchannel#52,adsubchannel#53,areaid#54,channel#55,city#56,clienLogDate#57,deviceid#58,edition#59,gameid#60,guid#61,imei#62,language#63,lastgamedate#64,logdate#65,logindates#66,logtype#67L,lvlupdate#68,lvluptimes#69L,money#70,netsp#71,... 13 more fields] json
         +- Project [uid#23L]
            +- Filter isnotnull(uid#23L)
               +- Relation[actorid#6L,actorname#7,approach#8,areaid#9L,channel#10L,deviceid#11,edition#12L,gameid#13L,imei#14,item#15L,itemcount#16L,itemname#17,logdate#18,logtype#19L,opersys#20L,subchannel#21,type#22L,uid#23L] json
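The effect of predicate pushdown plus column pruning can be shown with a plain-Python sketch: rather than joining full rows and filtering afterwards, each source is filtered and trimmed to the needed column before the join ever sees it. The sample rows are hypothetical.

```python
# Illustrative only: what the optimizer's rewrite buys on the "a" side.
rows = [
    {"uid": "1", "actorlevel": "12", "city": "x"},
    {"uid": "2", "actorlevel": "5",  "city": "y"},
]

# Optimized order: filter and prune at the scan, before the join.
pushed = [
    {"uid": r["uid"]}                    # column pruning: keep only uid
    for r in rows
    if r["actorlevel"] is not None
    and int(r["actorlevel"]) > 10        # predicate pushdown: filter at scan
    and r["uid"] is not None             # null filter implied by inner join
]
print(pushed)  # [{'uid': '1'}]
```

Only the surviving, single-column rows are fed into the join, which is why the FileScan nodes in the physical plan carry PushedFilters and a two-column ReadSchema.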

--4: Physical planning

The optimized logical plan is translated into executable operators: the join becomes a BroadcastHashJoin (the build side is small enough to broadcast), and the aggregation is split into a partial HashAggregate before the shuffle and a final HashAggregate after it. The 200 in each Exchange is the default spark.sql.shuffle.partitions:

== Physical Plan ==
*(4) Sort [uid#84 ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(uid#84 ASC NULLS FIRST, 200)
   +- *(3) HashAggregate(keys=[uid#84], functions=[count(1)], output=[uid#84, count(1)#181L])
      +- Exchange hashpartitioning(uid#84, 200)
         +- *(2) HashAggregate(keys=[uid#84], functions=[partial_count(1)], output=[uid#84, count#184L])
            +- *(2) Project [uid#84]
               +- *(2) BroadcastHashJoin [cast(uid#84 as bigint)], [uid#23L], Inner, BuildRight
                  :- *(2) Project [uid#84]
                  :  +- *(2) Filter ((isnotnull(actorlevel#50) && (cast(actorlevel#50 as int) > 10)) && isnotnull(uid#84))
                  :     +- *(2) FileScan json [actorlevel#50,uid#84] Batched: false, Format: JSON, Location: InMemoryFileIndex[file:/D:/workspace/sparktest/datas/actorinfo], PartitionFilters: [], PushedFilters: [IsNotNull(actorlevel), IsNotNull(uid)], ReadSchema: struct<actorlevel:string,uid:string>
                  +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]))
                     +- *(1) Project [uid#23L]
                        +- *(1) Filter isnotnull(uid#23L)
                           +- *(1) FileScan json [uid#23L] Batched: false, Format: JSON, Location: InMemoryFileIndex[file:/D:/workspace/sparktest/datas/yhtx_1022], PartitionFilters: [], PushedFilters: [IsNotNull(uid)], ReadSchema: struct<uid:bigint>
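The two-phase aggregation in this plan is worth spelling out: stage *(2) computes partial_count(1) per partition, the hashpartitioning Exchange shuffles those partial results by uid, and stage *(3) merges them into the final count per key. A minimal plain-Python sketch with hypothetical partitions:

```python
# Conceptual sketch of partial + final hash aggregation (not Spark code).
from collections import Counter

partitions = [["u1", "u2", "u1"], ["u2", "u2"]]  # hypothetical uid lists

# Phase 1: partial_count(1) within each partition, before any shuffle.
partials = [Counter(p) for p in partitions]

# Exchange hashpartitioning(uid): partials for the same uid reach the same
# reducer; here we simply merge them all to get the final counts.
final = Counter()
for p in partials:
    final.update(p)
print(dict(final))  # {'u1': 2, 'u2': 3}
```

Aggregating partially before the shuffle means only one row per (partition, key) crosses the network, instead of one row per input record.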
 
