实验准备数据
首先要准备两个文档,用来下面做join操作
>>a.txt
1,a
2,b
3,c
4,d
7,y
8,u
>>b.txt
2,bb
3,cc
7,yy
9,pp
建表:
create table a(id int,name string)
row format delimited fields terminated by ','
create table b(id int,name string)
row format delimited fields terminated by ',';
导入数据:
load data local inpath '/home/hadoop/a.txt' into table a;
load data local inpath '/home/hadoop/b.txt' into table b;
a表和b表:
id | name |
1 | a |
2 | b |
3 | c |
4 | d |
7 | y |
8 | u |
id | name |
2 | bb |
3 | cc |
7 | yy |
9 | pp |
实验:
注:Hive中的join操作的关键字必须在on中指定,不能在where中指定,不然会先做笛卡尔积再过滤,详解请往下面看
内关联(inner join):
join关键字默认为内连接,返回两张表中都有的信息。例如下面的例子,用a表和b表的id关联的,所以join输出的结果就是两个表中id相同的内容
select * from a inner join b on a.id=b.id;
+-------+---------+-------+---------+--+
| a.id | a.name | b.id | b.name |
+-------+---------+-------+---------+--+
| 2 | b | 2 | bb |
| 3 | c | 3 | cc |
| 7 | y | 7 | yy |
+-------+---------+-------+---------+--+
左关联(left join):
以前面的表作为主表和其他表进行关联,返回的记录数和主表的记录数相同,关联不上的字段用NULL;
left [outer] join对结果无影响;
select * from a left join b on a.id=b.id;
+-------+---------+-------+---------+--+
| a.id | a.name | b.id | b.name |
+-------+---------+-------+---------+--+
| 1 | a | NULL | NULL |
| 2 | b | 2 | bb |
| 3 | c | 3 | cc |
| 4 | d | NULL | NULL |
| 7 | y | 7 | yy |
| 8 | u | NULL | NULL |
+-------+---------+-------+---------+--+
右关联(right join):
right join与left相反,以后面的表为主表,和前面的表做关联,返回的记录数和主表一致,关联不上的字段为NULL;
select * from a right join b on a.id=b.id;
全关联(full join):
返回两个表记录的并集,关联不上的字段为NULL;
select * from a full outer join b on a.id=b.id;
+-------+---------+-------+---------+--+
| a.id | a.name | b.id | b.name |
+-------+---------+-------+---------+--+
| 1 | a | NULL | NULL |
| 2 | b | 2 | bb |
| 3 | c | 3 | cc |
| 4 | d | NULL | NULL |
| 7 | y | 7 | yy |
| 8 | u | NULL | NULL |
| NULL | NULL | 9 | pp |
+-------+---------+-------+---------+--+
left semi join:
left semi join以关键字前面的表为主表,两个表对on的条件字段做交集,返回前面表的记录。相当于内关联只返回主表
select * from a left semi join b on a.id = b.id;
+-------+---------+--+
| a.id | a.name |
+-------+---------+--+
| 2 | b |
| 3 | c |
| 7 | y |
+-------+---------+--+