hive过滤条件on和过滤条件where的区别

过滤条件on:指的是join后面on的具体内容,例如: on t1.age>=20
过滤条件where:指的是where条件后面的内容,例如:where t1.age>=20

我们来看具体结果,到底有什么不同?

案例1:过滤条件on
spark-sql> with test1 as (
         > select 1 as user_id,'xiaoming' as name
         > union all 
         > select 2 as user_id,'xiaolan' as name
         > union all 
         > select 3 as user_id,'xiaoxin' as name),
         > 
         > test2 as (
         > select 1 as user_id,19 as age
         > union all 
         > select 2 as user_id,20 as age
         > union all 
         > select 3 as user_id,21 as age)
         > 
         > select 
         >   t1.user_id,
         >   t1.name,
         >   t2.user_id,
         >   t2.age
         > from 
         > (select 
         >   user_id,
         >   name
         > from test1) t1
         > 
         > left join
         > (select 
         >   user_id,
         >   age
         > from test2) t2
         > on t1.user_id=t2.user_id and t1.user_id<=2;
1       xiaoming        1       19                                              
2		xiaolan			2		20
3		xiaoxin			NULL	NULL
Time taken: 6.106 seconds, Fetched 3 row(s)

注:从结果能看出来,on t1.user_id<=2的条件只是让user_id<=2的数据不参与join操作,这样右表数据就关联不上了,但是左边中的数据并不会被过滤掉

案例2:过滤条件where
spark-sql> with test1 as (
         > select 1 as user_id,'xiaoming' as name
         > union all 
         > select 2 as user_id,'xiaolan' as name
         > union all 
         > select 3 as user_id,'xiaoxin' as name),
         > 
         > test2 as (
         > select 1 as user_id,19 as age
         > union all 
         > select 2 as user_id,20 as age
         > union all 
         > select 3 as user_id,21 as age)
         > 
         > select 
         >   t1.user_id,
         >   t1.name,
         >   t2.user_id,
         >   t2.age
         > from 
         > (select 
         >   user_id,
         >   name
         > from test1) t1
         > 
         > left join
         > (select 
         >   user_id,
         >   age
         > from test2) t2
         > on t1.user_id=t2.user_id 
         > where t1.user_id<=2;
1	xiaoming	1	19
2	xiaolan		2	20
Time taken: 0.279 seconds, Fetched 2 row(s)

注:从结果能看出来,where t1.user_id<=2的条件让user_id<=2的数据直接过滤掉了,但是有一个点没有考虑优化的是左表中user_id<=2会参与join操作后才会被过滤掉

总结:
1.过滤条件on起到的作用是让user_id<=2的数据不参与join操作,并不能将我们不需要的数据过滤掉
2.过滤条件where起到了过滤的作用,把我们不需要的数据直接过滤掉
3.也可以这么理解,过滤条件on起到了局部过滤作用,过滤条件where起到了全局过滤作用

  • 0
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值