0- 描述
描述:表f是事实表,表d是匹配表,在hive中如何将匹配表中的值关联到事实表中?
表d相当于拉链过的变化维,但日期范围可能是不全的。
表f:
date_id p_id
2017 C
2018 B
2019 A
2013 C
表d:
d_start d_end p_id p_value
2016 2018 A 1
2016 2018 B 2
2008 2009 C 4
2010 2015 C 3
1- 问题一
描述:范围匹配
输出结果如下所示:
date_id p_id p_value
2017 C null
2018 B 2
2019 A null
2013 C 3
参考答案:
此处给出两种解法,其一:
select
f.date_id,
f.p_id,
A.p_value
from f
left join
(
select
date_id,
p_id,
p_value
from
(
select
f.date_id,
f.p_id,
d.p_value
from f
left join d on f.p_id = d.p_id
where f.date_id >= d.d_start and f.date_id <= d.d_end
)A
)A
ON f.date_id = A.date_id;
其二:
select
date_id,
p_id,
flag as p_value
from (
select
f.date_id,
f.p_id,
d.d_start,
d.d_end,
d.p_value,
if(f.date_id between d.d_start and d.d_end,d.p_value,null) flag,
max(d.d_end) over(partition by date_id) max_end
from f
left join d
on f.p_id = d.p_id
) tmp
where d_end = max_end;