Hive interview question: how to implement IN and NOT IN in Hive

https://blog.csdn.net/qq_42246689/article/details/84702253

https://blog.csdn.net/zhangge360/article/details/84865567

 

Implementing IN:

Hive offers several ways to implement IN. A few common ones:

  1. LEFT SEMI JOIN
  2. LEFT OUTER JOIN plus an IS NOT NULL filter
  3. INNER JOIN

Implementing NOT IN:

LEFT OUTER JOIN plus an IS NULL filter

Example:

Consider the following two tables.

skim table

userID  itemID  time
001     342     2015-05-08
002     382     2015-05-09
002     458     2015-05-09
004     325     2015-05-09

buy table

userID  itemID  time
001     342     2015-05-07
002     382     2015-05-08
003     458     2015-05-09
004     325     2015-05-09

IN implementation:

To find the rows of skim that also appear in buy, which is what an IN query expresses, any of the following Hive SQL statements works:

    -- LEFT OUTER JOIN, keeping only the skim rows that found a match in buy
    select skim.userId, skim.itemId
    from skim left outer join buy
    on skim.userId = buy.userId and skim.itemId = buy.itemId
    where buy.userId is not null;

    -- LEFT SEMI JOIN
    select skim.userId, skim.itemId
    from skim left semi join buy
    on skim.userId = buy.userId and skim.itemId = buy.itemId;

    -- INNER JOIN
    select skim.userId, skim.itemId
    from skim join buy
    on skim.userId = buy.userId and skim.itemId = buy.itemId;

For the two tables shown above, the result is:

userID  itemID
001     342
002     382
004     325
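The agreement between the join-based rewrites can be checked locally. The sketch below is not Hive: it assumes SQLite (through Python's `sqlite3` module) as a stand-in engine, loads the skim and buy rows from the tables above, and compares the LEFT OUTER JOIN and INNER JOIN rewrites against a correlated EXISTS query, used here as the reference meaning of IN. LEFT SEMI JOIN is Hive syntax that SQLite does not accept, so it is left out.

```python
import sqlite3

# SQLite stands in for Hive here (an assumption so the sketch runs locally).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE skim (userId TEXT, itemId TEXT, time TEXT);
    CREATE TABLE buy  (userId TEXT, itemId TEXT, time TEXT);
    INSERT INTO skim VALUES ('001','342','2015-05-08'), ('002','382','2015-05-09'),
                            ('002','458','2015-05-09'), ('004','325','2015-05-09');
    INSERT INTO buy  VALUES ('001','342','2015-05-07'), ('002','382','2015-05-08'),
                            ('003','458','2015-05-09'), ('004','325','2015-05-09');
""")

queries = {
    # Rewrite 1: outer join, then drop the skim rows without a partner in buy.
    "outer": """
        SELECT skim.userId, skim.itemId
        FROM skim LEFT OUTER JOIN buy
          ON skim.userId = buy.userId AND skim.itemId = buy.itemId
        WHERE buy.userId IS NOT NULL
    """,
    # Rewrite 2: plain inner join.
    "inner": """
        SELECT skim.userId, skim.itemId
        FROM skim JOIN buy
          ON skim.userId = buy.userId AND skim.itemId = buy.itemId
    """,
    # Reference: correlated EXISTS on both key columns.
    "exists": """
        SELECT skim.userId, skim.itemId
        FROM skim
        WHERE EXISTS (SELECT 1 FROM buy
                      WHERE buy.userId = skim.userId
                        AND buy.itemId = skim.itemId)
    """,
}

# Sort each result so the comparison does not depend on row order.
in_results = {name: sorted(conn.execute(sql).fetchall())
              for name, sql in queries.items()}

print(in_results["outer"] == in_results["inner"] == in_results["exists"])
```

The three queries agree on this data because buy contains no duplicate (userId, itemId) pairs; with duplicates, the two join rewrites would repeat the matching skim rows.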


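NOT IN implementation:

To find the rows of skim that do not appear in buy, which is what a NOT IN query expresses, use the following Hive SQL:

    -- LEFT OUTER JOIN, keeping only the skim rows that found no match in buy
    select skim.userId, skim.itemId
    from skim left outer join buy
    on skim.userId = buy.userId and skim.itemId = buy.itemId
    where buy.userId is null;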

For the two tables shown above, the result is:

userID  itemID
002     458
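The anti-join can be checked the same way. This sketch again assumes SQLite (via Python's `sqlite3`) as a stand-in for Hive, reloads the skim and buy rows from the tables above, and compares the LEFT OUTER JOIN plus IS NULL rewrite with a NOT EXISTS query.

```python
import sqlite3

# SQLite stands in for Hive here (an assumption so the sketch runs locally).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE skim (userId TEXT, itemId TEXT, time TEXT);
    CREATE TABLE buy  (userId TEXT, itemId TEXT, time TEXT);
    INSERT INTO skim VALUES ('001','342','2015-05-08'), ('002','382','2015-05-09'),
                            ('002','458','2015-05-09'), ('004','325','2015-05-09');
    INSERT INTO buy  VALUES ('001','342','2015-05-07'), ('002','382','2015-05-08'),
                            ('003','458','2015-05-09'), ('004','325','2015-05-09');
""")

# Anti-join: keep the skim rows for which the outer join found no partner.
anti_join_rows = sorted(conn.execute("""
    SELECT skim.userId, skim.itemId
    FROM skim LEFT OUTER JOIN buy
      ON skim.userId = buy.userId AND skim.itemId = buy.itemId
    WHERE buy.userId IS NULL
""").fetchall())

# Reference: correlated NOT EXISTS on both key columns.
not_exists_rows = sorted(conn.execute("""
    SELECT skim.userId, skim.itemId
    FROM skim
    WHERE NOT EXISTS (SELECT 1 FROM buy
                      WHERE buy.userId = skim.userId
                        AND buy.itemId = skim.itemId)
""").fetchall())

print(anti_join_rows == not_exists_rows)
```

Strictly speaking, the anti-join reproduces NOT EXISTS. In standard SQL, NOT IN additionally returns no rows at all when the subquery yields a NULL, so the two forms can differ when the key columns of buy contain NULLs.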


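Hive versions before 0.13 do not support subqueries in the WHERE clause, so the IN and EXISTS clauses that are common in SQL have to be rewritten. The rewrite is fairly simple. Consider the following SQL query:

    SELECT a.key, a.value
    FROM a
    WHERE a.key IN
      (SELECT b.key
       FROM b);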

It can be rewritten as:

    -- keep only the rows of a that found a match in b
    SELECT a.key, a.value
    FROM a LEFT OUTER JOIN b ON (a.key = b.key)
    WHERE b.key IS NOT NULL;
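One caveat of the outer-join rewrite: when b holds the same key more than once, the join repeats the matching row of a, whereas IN returns it once. The sketch below illustrates this with made-up rows (the data is hypothetical) and assumes SQLite via Python's `sqlite3` as a stand-in for Hive. Adding DISTINCT is shown as one possible workaround, not as the only one.

```python
import sqlite3

# SQLite stands in for Hive here; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (key INTEGER, value TEXT);
    CREATE TABLE b (key INTEGER);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO b VALUES (1), (1), (2);   -- key 1 appears twice in b
""")

# Original IN query: each matching row of a is returned once.
in_rows = conn.execute("""
    SELECT a.key, a.value
    FROM a
    WHERE a.key IN (SELECT b.key FROM b)
    ORDER BY a.key
""").fetchall()

# Outer-join rewrite: the row with key 1 is returned once per match in b.
join_rows = conn.execute("""
    SELECT a.key, a.value
    FROM a LEFT OUTER JOIN b ON (a.key = b.key)
    WHERE b.key IS NOT NULL
    ORDER BY a.key
""").fetchall()

# DISTINCT collapses the repeats (it would also collapse true duplicates in a).
distinct_rows = conn.execute("""
    SELECT DISTINCT a.key, a.value
    FROM a LEFT OUTER JOIN b ON (a.key = b.key)
    WHERE b.key IS NOT NULL
    ORDER BY a.key
""").fetchall()

print(len(in_rows), len(join_rows), len(distinct_rows))  # prints: 2 3 2
```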

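A more efficient implementation rewrites the query with LEFT SEMI JOIN:

    SELECT a.key, a.value
    FROM a LEFT SEMI JOIN b ON (a.key = b.key);

LEFT SEMI JOIN is available in Hive 0.5.0 and later. Columns of the right-hand table may only be referenced in the ON clause, not in the SELECT or WHERE clause. For an explanation of Hive's LEFT SEMI JOIN, see https://blog.csdn.net/happyrocking/article/details/79885071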

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

NOT EXISTS example

    select a, b
    from table1 t1
    where not exists (select 1
                      from table2 t2
                      where t1.a = t2.a
                        and t1.b = t2.b)

can be rewritten as

    -- take both output columns from t1: t2's columns are NULL in the rows that survive
    select t1.a, t1.b
    from table1 t1
    left join table2 t2
    on (t1.a = t2.a and t1.b = t2.b)
    where t2.a is null
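A short check of the NOT EXISTS rewrite, under the same assumption as before (SQLite via Python's `sqlite3` standing in for Hive) and with hypothetical rows for table1 and table2. It also shows why the select list has to draw its columns from t1: in the rows that survive the IS NULL filter, every t2 column is NULL.

```python
import sqlite3

# SQLite stands in for Hive here; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER, b INTEGER);
    CREATE TABLE table2 (a INTEGER, b INTEGER);
    INSERT INTO table1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO table2 VALUES (1, 10), (2, 99);
""")

# Original form: rows of table1 with no (a, b) partner in table2.
not_exists_rows = conn.execute("""
    SELECT t1.a, t1.b
    FROM table1 t1
    WHERE NOT EXISTS (SELECT 1
                      FROM table2 t2
                      WHERE t1.a = t2.a AND t1.b = t2.b)
    ORDER BY t1.a
""").fetchall()

# Rewritten form; t2.b is selected only to show that it is always NULL here.
anti_join_rows = conn.execute("""
    SELECT t1.a, t1.b, t2.b AS t2_b
    FROM table1 t1
    LEFT JOIN table2 t2
      ON (t1.a = t2.a AND t1.b = t2.b)
    WHERE t2.a IS NULL
    ORDER BY t1.a
""").fetchall()

same_rows = [(a, b) for a, b, _ in anti_join_rows] == not_exists_rows
t2_all_null = all(t2_b is None for _, _, t2_b in anti_join_rows)
print(same_rows, t2_all_null)
```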