上一篇hive 常见join 从原理上讲解了他们的区别,这篇文章通过例子来区别。
0 Data Prepare
create table test.student as
select 1 as id, 'jack' as name
union all
select 2 as id, 'jack' as name
union all
select 3 as id, 'jerry' as name
union all
select 4 as id, 'tom' as name;
create table test.score as
select 'jack' as name, 'English' as class, 80 as score
union all
select 'jerry' as name, 'English' as class, 90 as score
union all
select 'jerry' as name, 'Chinese' as class, 85 as score;
1 map join
select /*+mapjoin(sc)*/ stu.*
from test.student stu
left join test.score sc
on stu.name = sc.name;
map join 是实现join的一种方式,并不改变结果。所有右侧表里有两项相同key jerry的话,在关联表里会出现两项jerry.
2 left semi join
select stu.*
from test.student stu
left semi join test.score sc
on stu.name = sc.name;
left semi join 实现的是in的逻辑,右侧表里相同的key jerry会进行去重。所以结果里只有一项jerry结果。