hive四种常见的join

1.左连接 left join
spark-sql> with test1 as (
         > select 1 as user_id,'xiaoming' as name
         > union all 
         > select 2 as user_id,'xiaolan' as name
         > union all 
         > select 3 as user_id,'xiaoxin' as name
         > ),
         > 
         > test2 as (
         > select 1 as user_id,19 as age
         > union all 
         > select 2 as user_id,20 as age
         > union all 
         > select 4 as user_id,21 as age
         > )
         > 
         > select 
         >   t1.user_id,
         >   t1.name,
         >   t2.user_id,
         >   t2.age
         > from 
         > (select 
         >   user_id,
         >   name
         > from test1) t1
         > 
         > left join
         > (select 
         >   user_id,
         >   age
         > from test2) t2
         > on t1.user_id=t2.user_id;
1	xiaoming	1	19
2	xiaolan		2	20
3	xiaoxin		NULL	NULL
Time taken: 0.936 seconds, Fetched 3 row(s)

以左表为主,左表数据全部返回,右表数据只返回和左表通过关联条件能关联上的数据

2.右连接 right join
spark-sql> with test1 as (
         > select 1 as user_id,'xiaoming' as name
         > union all 
         > select 2 as user_id,'xiaolan' as name
         > union all 
         > select 3 as user_id,'xiaoxin' as name
         > ),
         > 
         > test2 as (
         > select 1 as user_id,19 as age
         > union all 
         > select 2 as user_id,20 as age
         > union all 
         > select 4 as user_id,21 as age
         > )
         > 
         > select 
         >   t1.user_id,
         >   t1.name,
         >   t2.user_id,
         >   t2.age
         > from 
         > (select 
         >   user_id,
         >   name
         > from test1) t1
         > 
         > right join
         > (select 
         >   user_id,
         >   age
         > from test2) t2
         > on t1.user_id=t2.user_id;
1       xiaoming        1       19                                              
2		xiaolan			2		20
NULL	NULL			4		21
Time taken: 0.936 seconds, Fetched 3 row(s)

以右表为主,右表数据全部返回,左表只返回和右表通过关联条件能关联上的数据

3.内连接 inner join
spark-sql> with test1 as (
         > select 1 as user_id,'xiaoming' as name
         > union all 
         > select 2 as user_id,'xiaolan' as name
         > union all 
         > select 3 as user_id,'xiaoxin' as name
         > ),
         > 
         > test2 as (
         > select 1 as user_id,19 as age
         > union all 
         > select 2 as user_id,20 as age
         > union all 
         > select 4 as user_id,21 as age
         > )
         > 
         > select 
         >   t1.user_id,
         >   t1.name,
         >   t2.user_id,
         >   t2.age
         > from 
         > (select 
         >   user_id,
         >   name
         > from test1) t1
         > 
         > inner join
         > (select 
         >   user_id,
         >   age
         > from test2) t2
         > on t1.user_id=t2.user_id;
1       xiaoming        1       19                                              
2		xiaolan			2		20
Time taken: 1.108 seconds, Fetched 2 row(s)

通过关联条件只取两个表数据的交集,关联不上的全部剔除了

4.全连接(满连接) full join
spark-sql> with test1 as (
         > select 1 as user_id,'xiaoming' as name
         > union all 
         > select 2 as user_id,'xiaolan' as name
         > union all 
         > select 3 as user_id,'xiaoxin' as name
         > ),
         > 
         > test2 as (
         > select 1 as user_id,19 as age
         > union all 
         > select 2 as user_id,20 as age
         > union all 
         > select 4 as user_id,21 as age
         > )
         > 
         > select 
         >   t1.user_id,
         >   t1.name,
         >   t2.user_id,
         >   t2.age
         > from 
         > (select 
         >   user_id,
         >   name
         > from test1) t1
         > 
         > full join
         > (select 
         >   user_id,
         >   age
         > from test2) t2
         > on t1.user_id=t2.user_id;
1       xiaoming        1       19                                              
3		xiaoxin			NULL	NULL
NULL	NULL			4		21
2		xiaolan			2		20

通过关联条件取两个表数据的并集,关联上的数据返回,关联不上的数据,另一部分返回NULL

总结:
如果只做数据拼接,一般用left join;如果只取几个表共同的部分,用inner join;如果所有的关联数据都要保留用full join

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值