本篇博文主要是针对Hive中join的部分,自己跑2张demo表,直观查看一下各种join的查询结果,便于理解。
目录
2. left join 和 left outer join
2.2. left outer join ... where XX is null
3. right join 和 right outer join
3.2 right outer join ... where XX is null
4.2 full outer join ... where A.XX is null or B.XX is null
首先建2张demo表,
表tmp1的字段A,B;
hive> select * from tmp1;
a b
1 7
2 8
3 9
4 7
表tmp2的字段C,D;
hive> select * from tmp2;
c d
1 10
4 12
5 11
1. join 和 inner join
hive> select * from tmp1 join tmp2 on tmp1.A=tmp2.C;
a b c d
1 7 1 10
4 7 4 12
hive> select * from tmp1 inner join tmp2 on tmp1.A=tmp2.C;
a b c d
1 7 1 10
4 7 4 12
结论:join和inner join完全等价,都是求两张表的交集。
2. left join 和 left outer join
hive> select * from tmp1 left join tmp2 on tmp1.A=tmp2.C;
a b c d
1 7 1 10
2 8 NULL NULL
3 9 NULL NULL
4 7 4 12
hive> select * from tmp1 left outer join tmp2 on tmp1.A=tmp2.C;
a b c d
1 7 1 10
2 8 NULL NULL
3 9 NULL NULL
4 7 4 12
结论:left join和left outer join完全等价。
2.2. left outer join ... where XX is null
hive> select * from tmp1 left outer join tmp2 on tmp1.A=tmp2.C where tmp2.C is null;
a b c d
2 8 NULL NULL
3 9 NULL NULL
结论:left outer join...where XX is null,实现功能为去重,在tmp1表中,去除了on的字段相等的,tmp2表中的记录。
求两张表的差集。
3. right join 和 right outer join
hive> select * from tmp1 right join tmp2 on tmp1.A=tmp2.C;
a b c d
1 7 1 10
4 7 4 12
NULL NULL 5 11
hive> select * from tmp1 right outer join tmp2 on tmp1.A=tmp2.C;
a b c d
1 7 1 10
4 7 4 12
NULL NULL 5 11
结论:right join和right outer join完全等价。
3.2 right outer join ... where XX is null
hive> select * from tmp1 right outer join tmp2 on tmp1.A=tmp2.C where tmp1.A is null;
a b c d
NULL NULL 5 11
结论:与left outer join where XX is null类似,求两张表的差集。
4. full outer join
hive> select * from tmp1 full outer join tmp2 on tmp1.A=tmp2.C;
a b c d
1 7 1 10
2 8 NULL NULL
3 9 NULL NULL
4 7 4 12
NULL NULL 5 11
结论:full outer join是求两张表的并集
4.2 full outer join ... where A.XX is null or B.XX is null
hive> select * from tmp1 full outer join tmp2 on tmp1.A=tmp2.C where tmp1.A is null or tmp2.C is null;
a b c d
2 8 NULL NULL
3 9 NULL NULL
NULL NULL 5 11
结论:取并集中,A或者B的字段为null的记录
总结:
(1)inner join和join等价,求两表交集,关联字段匹配且同时存在的记录;
(2)A left join B和A left outer join B等价,返回记录数与表A一致,关联字段不匹配的B表部分用null补全;
(2.2)A left outer join B where B.XX is null,求A表中剔除了B表中关联字段匹配的部分,求两表差集
(3)A right join B和A right outer join B等价,返回记录数与表B一致,关联字段不匹配的A表部分用null补全;
(3.2)A right outer join B where A.XX is null,求B表中剔除了A表中关联字段匹配的部分,求两表差集
(4)full outer join,求两表并集,关联字段不匹配的部门用null补全;
(5)full outer join ... where A.XX is null or B.XX is null,求两表并集中,关联字段任一表为null的部分。
发
1514

被折叠的 条评论
为什么被折叠?



