hive full join多表多关联键联合查询

2 篇文章 0 订阅

业务需求中往往需要将多表全量数据合并,数据行合并时用union、union all,列合并时则用full join。示例如下:

create table tmp.tmp_fulljoin_tablea (key1 string,key2 string,tcola string);
create table tmp.tmp_fulljoin_tableb (key1 string,key2 string,tcolb string);
create table tmp.tmp_fulljoin_tablec (key1 string,key2 string,tcolc string);

insert into table tmp.tmp_fulljoin_tablea
select '111','aaa','111aaa'
union all 
select '222','bbb','222bbb'
union all 
select '333',null,'333ccc';

insert into table tmp.tmp_fulljoin_tableb
select '111','aaa','111aaa'
union all 
select '444','ddd','444ddd'
union all 
select null,'ccc','333ccc';

insert into table tmp.tmp_fulljoin_tablec
select '111','aaa','111aaa'
union all 
select '444','ddd','444ddd'
union all 
select '333','ccc','333ccc';

两表full join 

a表与b表通过key1进行全关联全字段

select *
from tmp.tmp_fulljoin_tablea a   
full join tmp.tmp_fulljoin_tableb b 
on a.key1 = b.key1;

实际应用

select COALESCE(a.key1,b.key1) key1,
    COALESCE(a.key2,b.key2) key2,
    tcola,tcolb
from tmp.tmp_fulljoin_tablea a   
full join tmp.tmp_fulljoin_tableb b 
on a.key1 = b.key1;

 

多表full join 

a、b、c表通过key1进行全关联全字段

select *
from tmp.tmp_fulljoin_tablea a   
full join tmp.tmp_fulljoin_tableb b 
on a.key1 = b.key1
full join tmp.tmp_fulljoin_tablec c
on COALESCE(a.key1,b.key1) = c.key1;

 注意不要将c表关联条件写成on a.key1 = c.key1;否则达不到多表关联效果

 实际应用

select COALESCE(a.key1,b.key1,c.key1) key1,
    COALESCE(a.key2,b.key2,c.key2) key2,
    tcola,tcolb,tcolc
from tmp.tmp_fulljoin_tablea a   
full join tmp.tmp_fulljoin_tableb b 
on a.key1 = b.key1
full join tmp.tmp_fulljoin_tablec c
on COALESCE(a.key1,b.key1) = c.key1;

多表多关联键full join 

select COALESCE(a.key1,b.key1,c.key1) key1,
    COALESCE(a.key2,b.key2,c.key2) key2,
    tcola,tcolb,tcolc
from tmp.tmp_fulljoin_tablea a   
full join tmp.tmp_fulljoin_tableb b 
on a.key1 = b.key1 and a.key2 = b.key2
full join tmp.tmp_fulljoin_tablec c
on COALESCE(a.key1,b.key1) = c.key1 and COALESCE(a.key2,b.key2) = c.key2

 结合执行计划可以看出先是a表和b表关联,然后得出的结果再与c表关联,使用coalesce也不会出现c表的333eee和a表的333、b表的ccc错表关联。

 

多表full join优化

当数据量过大时可以尝试将所有表的关联键落表,然后left join。根据实际环境执行表现选择最佳方法。

create table as tmp_key as  
select key1,key2 from tmp.tmp_fulljoin_tablea
union 
select key1,key2 from tmp.tmp_fulljoin_tableb
union 
select key1,key2 from tmp.tmp_fulljoin_tablec;


select k.*,
    tcola,tcolb,tcolc
from tmp_key k
left join tmp.tmp_fulljoin_tablea a 
on k.key1=a.key1 and k.key2=a.key2
left join tmp.tmp_fulljoin_tableb b
on k.key1=b.key1 and k.key2=b.key2
left join tmp.tmp_fulljoin_tablec c
on k.key1=c.key1 and k.key2=c.key2
--注意key值有null的情况

  • 4
    点赞
  • 9
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值