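A quick note first: the Hive database I use is Haizhi BDP (海致BDP), a mature Hive-based product, so it differs somewhat from a stock Hive database. A teammate tells me the syntax is almost identical to Hive's, though.
The job splits into two steps.
Step 1: split the comma-separated ids in one row into multiple rows
Use lateral view explode() to turn one row into multiple rows (one row per id)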
select fi.id,fi.film_star,s_film_star
from film_info fi
lateral view explode(split(film_star,',')) t as s_film_star
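To see what the query above produces, here is a minimal sketch that runs against a single inline row instead of film_info. The sample values (id 1, ids 10, 20, 30) are made up for illustration. It assumes stock Apache Hive 0.13 or later, which accepts a select without a from clause and a subquery in the from clause; I have not checked whether BDP accepts it.

```sql
-- Hypothetical one-row input: film 1 stars actors 10, 20 and 30.
select t.id, t.film_star, s.s_film_star
from (select 1 as id, '10,20,30' as film_star) t
-- split() turns the string into an array; explode() emits one row per element.
lateral view explode(split(t.film_star, ',')) s as s_film_star;
-- Each output row repeats id and film_star and carries one of '10', '20', '30'.
```

Note that s_film_star comes out as a string, because split() returns an array of strings.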
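Step 2: join the exploded id rows to the dictionary table to fetch the matching values, then squeeze those values back into one row
1. The Hive environment I use does not support subqueries (stock Hive does accept them in the from clause), so the exploded rows are parked in a temp table first
2. Join that exploded table to the dictionary table to get the values you need
3. Use concat_ws() and collect_set() with group by to squeeze them back into one row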
TEMP actor_l =
select fi.id,fi.film_star,s_film_star
from film_info fi
lateral view explode(split(film_star,',')) t as s_film_star
select
max(actor_l.id) as film_id,
max(actor_l.film_star) as film_star,
concat_ws(',', collect_set(actor.actor_name)) as actor_name
from actor_l -- the temp table defined above
left join actor on actor.id = actor_l.s_film_star -- the dictionary table that holds the values we need
group by actor_l.id
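For readers on stock Apache Hive rather than BDP, the same two steps can probably be written as one statement, with a common table expression standing in for the TEMP table. This is only a sketch under assumptions: it needs Hive 0.13 or later for the with clause, it reuses the table and column names from above, and I have not run it on BDP.

```sql
-- Step 1 as a CTE: one row per (film, actor id).
with actor_l as (
  select fi.id, fi.film_star, s_film_star
  from film_info fi
  lateral view explode(split(fi.film_star, ',')) t as s_film_star
)
-- Step 2: look up each actor name, then collapse back to one row per film.
select
  actor_l.id as film_id,
  max(actor_l.film_star) as film_star,
  concat_ws(',', collect_set(actor.actor_name)) as actor_name
from actor_l
-- s_film_star is a string; if actor.id is numeric, Hive converts implicitly here.
left join actor on actor.id = actor_l.s_film_star
group by actor_l.id;
```

Two things to keep in mind: collect_set() removes duplicate names and does not guarantee that the names come back in the same order as the ids in film_star. If duplicates should be kept, collect_list() is the usual alternative.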
After that, just use film_id to join the result back to wherever you need it.
All done!