关于Hive 在子查询中是否支持 IN 和 NOT IN 子句的测试

亲测有效,在hive-2.1.1版本,存储为hadoop-2.7.3下,hive支持 in或not in 中包含查询子句的语法,此外,在有些版本或hive thrift包不支持的话,也可能通过left join实现

1、创建表

假设有一个用户浏览商品表scan_info,和一个用户购买商品表buy_info。如下

hive> create table scan_info (userId string,itemId string,time string) row format delimited fields terminated by '\t';

OK

Time taken: 0.784 seconds

hive> create table buy_info (userId string,itemId string,time string) row format delimited fields terminated by '\t';

OK

Time taken: 0.133 seconds

2、准备数据并导入数据

hive> load data local inpath '/home/wo/yy_work/scan_info.txt' overwrite into table scan_info;

Loading data to table test_hive.scan_info

OK

Time taken: 2.568 seconds

hive> load data local inpath '/home/wo/yy_work/buy_info.txt' overwrite into table buy_info;

Loading data to table test_hive.buy_info

OK

Time taken: 0.605 second

3、验证查看数据

hive> select * from scan_info;

OK

001 342 2018-12-14

002 382 2018-12-15

003 458 2018-12-15

004 468 2018-12-15

hive> select * from buy_info;

OK

001 342 2018-12-13

002 382 2018-12-14

003 458 2018-12-15

005 325 2018-12-15

4、左外连接查询


select * from scan_info left outer join buy_info on scan_info.userId = buy_info.userId and scan_info.itemId = buy_info.itemId 

Total MapReduce CPU Time Spent: 2 seconds 690 msec

OK

001 342 2018-12-14 001 342 2018-12-13

002 382 2018-12-15 002 382 2018-12-14

003 458 2018-12-15 003 458 2018-12-15

004 468 2018-12-15 NULL NULL NULL

Time taken: 23.802 seconds, Fetched: 4 row(s)

5、如果要查询在scan_info表中并且也在buy_info表中的信息,需要用in查询,hive sql如下:

select scan_info.userId , scan_info.itemId from scan_info left outer join buy_info

on scan_info.userId = buy_info.userId and scan_info.itemId = buy_info.itemId

where buy_info.userId is not null;

001 342

002 382

003 458

Time taken: 22.313 seconds, Fetched: 3 row(s)

6、如果要查询在scan_info表中并且不也在buy_info表中的信息,需要用not in查询,hive sql如下:

select scan_info.userId, scan_info.itemId from scan_info left outer join buy_info

on scan_info.userId=buy_info.userId and scan_info.itemId=buy_info.itemId

where buy_info.userId is null;

OK

004 468

Time taken: 20.133 seconds, Fetched: 1 row(s)

7、not in 子查询

select scan_info.userId, scan_info.itemId from scan_info where scan_info.userId not in (

select distinct buy_info.userId from buy_info )

OK

004 468

Time taken: 104.018 seconds, Fetched: 1 row(s)

8、in子查询

select scan_info.userId, scan_info.itemId from scan_info where scan_info.userId in (

select distinct buy_info.userId from buy_info )

OK

001 342

002 382

003 458

Time taken: 44.224 seconds, Fetched: 3 row(s)

参考:https://blog.csdn.net/yzhhughyzh/article/details/45604291

          https://blog.csdn.net/chenyuexing0822/article/details/78525383         

 

  • 0
    点赞
  • 8
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值