Hive mysql 内连接,Hive的隐式连接总是内连接吗?

The join documentation for Hive encourages the use of implicit joins, i.e.

SELECT *

FROM table1 t1, table2 t2, table3 t3

WHERE t1.id = t2.id AND t2.id = t3.id AND t1.zipcode = '02535';

Is this equivalent to

SELECT t1.*, t2.*, t3.*

FROM table1 t1

INNER JOIN table2 t2 ON

t1.id = t2.id

INNER JOIN table3 t3 ON

t2.id = t3.id

WHERE t1.zipcode = '02535'

, or will the above return additional records?

解决方案

Not always. Your queries are equivalent. But without WHERE t1.id = t2.id AND t2.id = t3.id it will be CROSS JOIN.

Update:

This is interesting question and I decided to add some demo. Let's create two tables:

A(c1 int, c2 string) and B(c1 int, c2 string).

Load data:

insert into table A

select 1, 'row one' union all

select 2, 'row two';

insert into table B

select 1, 'row one' union all

select 3, 'row three';

Check data:

hive> select * from A;

OK

1 row one

2 row two

Time taken: 1.29 seconds, Fetched: 2 row(s)

hive> select * from B;

OK

1 row one

3 row three

Time taken: 0.091 seconds, Fetched: 2 row(s)

Check cross join (implicit join without where transformed to cross):

hive> select a.c1, a.c2, b.c1, b.c2 from a,b;

Warning: Map Join MAPJOIN[14][bigTable=a] in task 'Stage-3:MAPRED' is a cross product

Warning: Map Join MAPJOIN[22][bigTable=b] in task 'Stage-4:MAPRED' is a cross product

Warning: Shuffle Join JOIN[4][tables = [a, b]] in Stage 'Stage-1:MAPRED' is a cross product

OK

1 row one 1 row one

2 row two 1 row one

1 row one 3 row three

2 row two 3 row three

Time taken: 54.804 seconds, Fetched: 4 row(s)

Check inner join (implicit join with where works as INNER):

hive> select a.c1, a.c2, b.c1, b.c2 from a,b where a.c1=b.c1;

OK

1 row one 1 row one

Time taken: 38.413 seconds, Fetched: 1 row(s)

Try to perform left join by adding OR b.c1 is null to the where:

hive> select a.c1, a.c2, b.c1, b.c2 from a,b where (a.c1=b.c1) OR (b.c1 is null);

OK

1 row one 1 row one

Time taken: 57.317 seconds, Fetched: 1 row(s)

As you can see we got inner join again. or b.c1 is null is ignored

Now left join without where and ON clause (transformed to CROSS):

select a.c1, a.c2, b.c1, b.c2 from a left join b;

OK

1 row one 1 row one

1 row one 3 row three

2 row two 1 row one

2 row two 3 row three

Time taken: 37.104 seconds, Fetched: 4 row(s)

As you can see we got cross again.

Try left join with where clause and without ON (works as INNER):

select a.c1, a.c2, b.c1, b.c2 from a left join b where a.c1=b.c1;

OK

1 row one 1 row one

Time taken: 40.617 seconds, Fetched: 1 row(s)

We got INNER join

Try left join with where clause and without ON+ try to allow nulls:

select a.c1, a.c2, b.c1, b.c2 from a left join b where a.c1=b.c1 or b.c1 is null;

OK

1 row one 1 row one

Time taken: 53.873 seconds, Fetched: 1 row(s)

Again got INNER. or b.c1 is null is ignored.

Left join with on clause:

hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1;

OK

1 row one 1 row one

2 row two NULL NULL

Time taken: 48.626 seconds, Fetched: 2 row(s)

Yes, it is true left join.

Left join with on + where (got INNER):

hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1 where a.c1=b.c1;

OK

1 row one 1 row one

Time taken: 49.54 seconds, Fetched: 1 row(s)

We got INNER because WHERE does not allow NULLS.

Left join with where + allow nulls:

hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1 where a.c1=b.c1 or b.c1 is null;

OK

1 row one 1 row one

2 row two NULL NULL

Time taken: 55.951 seconds, Fetched: 2 row(s)

Yes, it is left join.

Conclusion:

Implicit join works as INNNER (with where) or CROSS if without WHERE

clause.

Left join can work as CROSS if without ON and without WHERE, can also work as INNER if WHERE clause does not allows nulls

for right

table.

Better use ANSI syntax because it is self-explaining and it is easy to understand what do you expect it to work like. Implicit joins or left joins working as INNER or CROSS are difficult to understand and very prone to error.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值