关于sparkSQL子查询的相关问题

sparkSQL支持的子查询:

spark.sql("select u1 from (select u1,u2,u3 from tableTemp)").show()

spark.sql("select u1 from tableTemp1 where u1 > (select max(pre) from tableTemp2)").show()

sparkSQL不支持的子查询:

  1. exists和in子查询(新版本已支持)
比如:
Select * From feeData Where FeeID Not in (Select ID From FeeInfo)
FeeInfo 表最好有索引
优化语句:Select * From feeData as d Left Join FeeInfo f on d.FeeID =f.ID Where
f.Id is null

原因参考Although Apache Spark SQL currently does not support IN or EXISTS subqueries, you can efficiently implement the semantics by rewriting queries to use LEFT SEMI JOIN.

in子查询不支持,in后直接跟值做查询条件还是支持的:

spark.sql("select * from tableTemp where pre_prob in (0.9,1,2)").show()

但还应注意in与括号之间的距离,因为sparkSQL内置了in(expr2,expr3,…)函数,返回的是布尔值:

> SELECT 1 in(1, 2, 3);
 true
> SELECT 1 in(2, 3, 4);
 false
> SELECT named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 1), named_struct('a', 1, 'b', 3));
 false
> SELECT named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 2), named_struct('a', 1, 'b', 3));
 true

补充
在新版本的hive以及spark中均支持了exists和in作为子查询,参考:
spark官网描述http://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-where.html
hive官网描述https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries


  1. 子查询作为字段
比如:
val result = spark.sql("select
	ssdggs,
	ssdggsid,
	ssdsmc,
	ssds,
	count(*) sl,
	round(count(*) / (select count(*) from dyycjl dy where dy.xgsbm=sb.ssgdgsid) * 100 , 2) zb
	from sbxx sb group by sb.ssdggs,sb.ssdggsid,sb.ssdsmc,sb.ssds")
改为:
val tempTable1 = spark.sql("select
	ssdggs,
	ssdggsid,
	ssdsmc,
	ssds,
	count(*) sl
	from sbxx sb group by sb.ssdggs,sb.ssdggsid,sb.ssdsmc,sb.ssds")
tempTable1.createTempView("tempTable1")

val tempTable2 = spark.sql("select xgsbm,count(*) num from dyycjl group by xgsbm")
tempTable2.createTempView("tempTable2")

spark.sql("select t1.*,round(t1.sl / num *100 , 2) zb from tempTable1 t1 left join tempTable2 t2 on t1.xgsbm=t2.ssgdgsid")
评论 7
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值