Hive中常见的join方式

Hive中除了支持和传统数据库中一样的内关联、左关联、右关联、全关联,还支持LEFT SEMI JOIN和CROSS JOIN,但这两种JOIN类型也可以用前面的代替。

如何实现join?

两张表的数据:

在这里插入图片描述

1)内关联(JOIN) 只返回能关联上的结果。

SELECT a.id, a.name,b.age FROM student a INNER JOIN studentssb b ON (a.id = b.id);

INNER可以省略

0: jdbc:hive2://hadoop1012:10000> SELECT a.id, a.name,b.age FROM student a  INNER  JOIN studentssb b  ON (a.id = b.id);
+-------+------------+--------+
| a.id  |   a.name   | b.age  |
+-------+------------+--------+
| 1017  | student17  | 14     |
| 1018  | student18  | 18     |
| 1019  | student19  | 24     |
| 1001  | student1   | 12     |
| 1002  | student2   | 14     |
| 1003  | student3   | 18     |
| 1004  | student4   | 14     |
| 1005  | student5   | 18     |
| 1006  | student6   | 24     |
| 1007  | student7   | 15     |
| 1008  | student8   | 17     |
| 1009  | student9   | 18     |
| 1010  | student10  | 14     |
| 1011  | student11  | 12     |
| 1013  | student13  | 14     |
| 1014  | student14  | 15     |
| 1015  | student15  | 17     |
| 1016  | student16  | 18     |
+-------+------------+--------+

2) 左外关联(LEFT [OUTER] JOIN) 以LEFT [OUTER] JOIN关键字前面的表作为主表,和其他表进行关联,返回记录和主表的记录数一致,关联不上的字段置为NULL。是否指定OUTER关键字,貌似对查询结果无影响。

select a.id,a.name,b.age from student a left join studentssb b ON (a.id=b.id);

0: jdbc:hive2://hadoop1012:10000> select a.id,a.name,b.age from student a left join studentssb b ON (a.id=b.id);
+-------+------------+--------+
| a.id  |   a.name   | b.age  |
+-------+------------+--------+
| 1001  | student1   | 12     |
| 1002  | student2   | 14     |
| 1003  | student3   | 18     |
| 1004  | student4   | NULL   |
| 1005  | student5   | NULL   |
| 1006  | student6   | NULL   |
| 1007  | student7   | NULL   |
| 1008  | student8   | NULL   |
| 1009  | student9   | NULL   |
| 1010  | student10  | NULL   |
| 1011  | student11  | NULL   |
| 1012  | student12  | NULL   |
| 1013  | student13  | NULL   |
| 1014  | student14  | NULL   |
| 1015  | student15  | NULL   |
| 1016  | student16  | NULL   |
+-------+------------+--------+

3)右外关联(RIGHT [OUTER] JOIN) 和左外关联相反,以RIGTH [OUTER] JOIN关键词后面的表作为主表,和前面的表做关联,返回记录数和主表一致,关联不上的字段为NULL。是否指定OUTER关键字,貌似对查询结果无影响。

SELECT a.id, a.name, b.age FROM student a RIGHT OUTER JOIN studentssb b ON (a.id = b.id);

0: jdbc:hive2://hadoop1012:10000> SELECT a.id, a.name, b.age  FROM student a  RIGHT OUTER JOIN studentssb b  ON (a.id = b.id);
+-------+------------+--------+
| a.id  |   a.name   | b.age  |
+-------+------------+--------+
| 1001  | student1   | 12     |
| 1002  | student2   | 14     |
| 1003  | student3   | 18     |
| 1004  | student4   | 14     |
| 1005  | student5   | 18     |
| 1006  | student6   | 24     |
| 1007  | student7   | 15     |
| 1008  | student8   | 17     |
| 1009  | student9   | 18     |
| 1010  | student10  | 14     |
| 1011  | student11  | 12     |
| 1013  | student13  | 14     |
| 1014  | student14  | 15     |
| 1015  | student15  | 17     |
| 1016  | student16  | 18     |
| 1017  | student17  | 14     |
| 1018  | student18  | 18     |
| 1019  | student19  | 24     |
| NULL  | NULL       | 15     |
| NULL  | NULL       | 17     |
+-------+------------+--------+

4) 全外关联(FULL [OUTER] JOIN) 以两个表的记录为基准,返回两个表的记录去重之和,关联不上的字段为NULL。是否指定OUTER关键字,貌似对查询结果无影响。

注意:FULL JOIN时候,Hive不会使用MapJoin来优化。

SELECT a.id, a.name, b.age FROM student a FULL OUTER JOIN studentssb b ON (a.id = b.id);

0: jdbc:hive2://hadoop1012:10000> SELECT a.id, a.name, b.age  FROM student a  FULL OUTER JOIN studentssb b  ON (a.id = b.id);
+-------+------------+--------+
| a.id  |   a.name   | b.age  |
+-------+------------+--------+
| 1001  | student1   | 12     |
| 1002  | student2   | 14     |
| 1003  | student3   | 18     |
| 1004  | student4   | 14     |
| 1005  | student5   | 18     |
| 1006  | student6   | 24     |
| 1007  | student7   | 15     |
| 1008  | student8   | 17     |
| 1009  | student9   | 18     |
| 1010  | student10  | 14     |
| 1011  | student11  | 12     |
| 1012  | student12  | NULL   |
| 1013  | student13  | 14     |
| 1014  | student14  | 15     |
| 1015  | student15  | 17     |
| 1016  | student16  | 18     |
| 1017  | student17  | 14     |
| 1018  | student18  | 18     |
| 1019  | student19  | 24     |
| NULL  | NULL       | 15     |
| NULL  | NULL       | 17     |
| 1022  | student22  | NULL   |
+-------+------------+--------+

5) LEFT SEMI JOIN 以LEFT SEMI JOIN关键字前面的表为主表,返回主表的KEY也在副表中的记录。

SELECT a.id, a.name FROM student a LEFT SEMI JOIN studentssb b ON (a.id = b.id);

0: jdbc:hive2://hadoop1012:10000> SELECT a.id, a.name  FROM student a  LEFT SEMI JOIN studentssb b  ON (a.id = b.id);
+-------+------------+
| a.id  |   a.name   |
+-------+------------+
| 1017  | student17  |
| 1018  | student18  |
| 1019  | student19  |
| 1001  | student1   |
| 1002  | student2   |
| 1003  | student3   |
| 1004  | student4   |
| 1005  | student5   |
| 1006  | student6   |
| 1007  | student7   |
| 1008  | student8   |
| 1009  | student9   |
| 1010  | student10  |
| 1011  | student11  |
| 1013  | student13  |
| 1014  | student14  |
| 1015  | student15  |
| 1016  | student16  |
+-------+------------+

6) 笛卡尔积关联(CROSS JOIN) 返回两个表的笛卡尔积结果,不需要指定关联键。

SELECT a.id, a.name, b.age FROM student a CROSS JOIN studentssb b;

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值