Hive: Using JOIN

sunghosts

Reposted from: https://www.cnblogs.com/jnba/p/10673747.html

The joins commonly used in Hive are: inner join, left join, right join, full join, left semi join, cross join, and multiple (multi-table) joins.

Create two tables in Hive for testing:

hive> select * from rdb_a;
OK
1       lucy
2       jack
3       tony
 
hive> select * from rdb_b;
OK
1       12
2       22
4       32
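
The original post shows only the table contents, not how the tables were built. A minimal sketch of the setup follows. The column names (id, name, age) are taken from the queries in this post; the column types are an assumption.

```sql
-- Assumed schema: column names come from the queries in this post,
-- the int/string types are a guess.
create table if not exists rdb_a (id int, name string);
create table if not exists rdb_b (id int, age int);

-- INSERT ... VALUES requires Hive 0.14 or later.
insert into table rdb_a values (1, 'lucy'), (2, 'jack'), (3, 'tony');
insert into table rdb_b values (1, 12), (2, 22), (4, 32);
```

Note that id 3 exists only in rdb_a and id 4 exists only in rdb_b, which is what makes the differences between the join types visible.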

I. Basic join usage

1. Inner join ([inner] join): returns only the rows that match in both tables

select a.id,a.name,b.age from rdb_a a inner join rdb_b b on a.id=b.id;
 
Total MapReduce CPU Time Spent: 2 seconds 560 msec
OK
1       lucy    12
2       jack    22
Time taken: 47.419 seconds, Fetched: 2 row(s)

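2. Left join (left [outer] join): the left table drives the result; every left-table row is returned, and right-table columns are NULL where there is no match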

select a.id,a.name,b.age from rdb_a a left join rdb_b b on a.id=b.id;
 
Total MapReduce CPU Time Spent: 1 seconds 240 msec
OK
1       lucy    12
2       jack    22
3       tony    NULL
Time taken: 33.42 seconds, Fetched: 3 row(s)

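3. Right join (right [outer] join): the right table drives the result; every right-table row is returned, and left-table columns are NULL where there is no match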

select a.id,a.name,b.age from rdb_a a right join rdb_b b on a.id=b.id;
 
Total MapReduce CPU Time Spent: 2 seconds 130 msec
OK
1       lucy    12
2       jack    22
NULL    NULL    32
Time taken: 32.7 seconds, Fetched: 3 row(s)
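
A right join can also be written as a left join with the two tables swapped. The sketch below uses the same tables and columns as above and should return the same three rows, though row order is not guaranteed.

```sql
-- Same result set as the right join above: rdb_b is now the driving (left) table.
select a.id, a.name, b.age
from rdb_b b
left join rdb_a a on a.id = b.id;
```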

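4. Full join (full [outer] join): both tables drive the result; matched rows are returned once, unmatched rows from either table are also returned, and columns with no match are NULL.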

select a.id,a.name,b.age from rdb_a a full join rdb_b b on a.id=b.id;
 
Total MapReduce CPU Time Spent: 5 seconds 540 msec
OK
1       lucy    12
2       jack    22
3       tony    NULL
NULL    NULL    32
Time taken: 42.938 seconds, Fetched: 4 row(s)
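
In the output above, the row that exists only in rdb_b has a NULL id, because the query selects a.id. If the key should be shown no matter which side supplied it, one common approach is to wrap both keys in coalesce. This is a sketch, not part of the original post:

```sql
-- coalesce returns its first non-NULL argument, so the id is filled in
-- from rdb_b when the row has no match in rdb_a.
select coalesce(a.id, b.id) as id, a.name, b.age
from rdb_a a
full join rdb_b b on a.id = b.id;
```

With the test data, the last row would then show id 4 instead of NULL, while name stays NULL.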

5. left semi join: the table before the LEFT SEMI JOIN keyword is the main table; returns the main-table rows whose key also exists in the secondary table.

select a.id,a.name from rdb_a a left semi join rdb_b b on a.id=b.id;
 
Total MapReduce CPU Time Spent: 3 seconds 300 msec
OK
1       lucy
2       jack
Time taken: 31.105 seconds, Fetched: 2 row(s)
 
This is equivalent to: select a.id,a.name from rdb_a a where a.id in (select b.id from rdb_b b);
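
The same filter can also be expressed with a correlated EXISTS subquery, which Hive supports in the WHERE clause from version 0.13 onward. A minimal sketch against the same test tables:

```sql
-- Keeps each rdb_a row that has at least one rdb_b row with the same id.
select a.id, a.name
from rdb_a a
where exists (select 1 from rdb_b b where b.id = a.id);

-- Not allowed with left semi join: the right-hand table may appear only in
-- the ON clause, so selecting b.age here would be rejected.
-- select a.id, b.age from rdb_a a left semi join rdb_b b on a.id = b.id;
```

Unlike an inner join, a semi join returns each left-table row at most once, even if the right table contains several rows with the same key.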

6. Cartesian product join (cross join): returns the Cartesian product of the two tables; no join key needs to be specified

select a.id,a.name,b.age from rdb_a a cross join rdb_b b;
 
Total MapReduce CPU Time Spent: 1 seconds 260 msec
OK
1       lucy    12
1       lucy    22
1       lucy    32
2       jack    12
2       jack    22
2       jack    32
3       tony    12
3       tony    22
3       tony    32
Time taken: 24.727 seconds, Fetched: 9 row(s)
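
Because a cross join pairs every row of one table with every row of the other, the result grows as the product of the two row counts (3 x 3 = 9 here). Depending on the Hive version and configuration, strict mode may reject Cartesian products outright. The sketch below assumes the cluster is set up that way, and adds a filter to narrow the result:

```sql
-- Assumption: only needed if the cluster rejects Cartesian products.
-- Newer Hive releases control this with hive.strict.checks.cartesian.product instead.
set hive.mapred.mode=nonstrict;

-- With the test data this keeps the rdb_b rows with age 22 and 32,
-- each paired with all three rdb_a rows.
select a.id, a.name, b.age
from rdb_a a
cross join rdb_b b
where b.age > 20;
```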