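Background
Example: devices A, B, and C belong to the same user,
devices C and D belong to the same user,
and devices D and E belong to the same user.

Because these groups overlap (C links the first two, D links the last two), A, B, C, D, and E can all be treated as one user.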
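The goal described above is a connected-components problem: groups that share a device collapse into one user. The following is a minimal Python sketch of that target result, using union-find rather than the Hive SQL approach of this article. The function name `merge_device_groups` is hypothetical and only serves the illustration.

```python
from collections import defaultdict


def merge_device_groups(groups):
    """Merge device groups that share at least one device (union-find)."""
    parent = {}

    def find(x):
        # Register unseen devices as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Every device in a group is linked to the first device of that group.
    for group in groups:
        for device in group:
            union(group[0], device)

    # Collect devices under their root; each root stands for one user.
    users = defaultdict(set)
    for device in list(parent):
        users[find(device)].add(device)
    return sorted(sorted(devices) for devices in users.values())


groups = [["A", "B", "C"], ["C", "D"], ["D", "E"]]
print(merge_device_groups(groups))  # [['A', 'B', 'C', 'D', 'E']]
```

This is handy as a reference answer when checking the output of the SQL steps on small samples.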
1. Data initialization
create table test_id_mapping (
    id_list array<string> comment "device IDs"
);
-- Three sample groups; devices in the same array belong to the same user.
insert overwrite table test_id_mapping
select array("A","B","C") union all
select array("C","D") union all
select array("D","E");
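PS. The SQL below uses two custom functions that are not built into Hive:
combine_unique: a UDAF that returns the union (deduplicated) of multiple arrays
combine: a UDF that merges multiple arrays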
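To make the two functions concrete, here is a small Python model of their behavior. It is based only on the one-line descriptions above, so the details are assumptions: in particular, that `combine` keeps duplicates and preserves order, and that the order of the `combine_unique` result does not matter (it is sorted here only to make the output stable).

```python
def combine(*arrays):
    """Model of the `combine` UDF: concatenate the given arrays in order.

    Assumption: duplicates are kept; the real UDF may behave differently.
    """
    merged = []
    for array in arrays:
        merged.extend(array)
    return merged


def combine_unique(arrays):
    """Model of the `combine_unique` UDAF: union of all arrays in one group.

    Assumption: the result is sorted here only for stable output.
    """
    seen = set()
    for array in arrays:
        seen.update(array)
    return sorted(seen)


print(combine(["A", "B"], ["B", "C"]))                  # ['A', 'B', 'B', 'C']
print(combine_unique([["A", "B", "C"], ["C", "D"]]))    # ['A', 'B', 'C', 'D']
```

The practical difference is that `combine` works on the columns of a single row, while `combine_unique` aggregates across all rows of a `group by` group.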
2. Initial aggregation
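create table test_id_mapping_1 as
select
    item,
    -- union of every id_list that contains this device
    combine_unique(id_list) as union_id_list,
    -- number of source rows that contain this device
    sum(value) as value
from (
    -- one output row per (device, original id_list) pair
    select
        item, id_list, 1 as value
    from test_id_mapping
    lateral view explode(id_list) item as item
) t1
group by item;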
hive> select * from test_id_mapping_1;
A ["A","B","C"] 1
B ["A","B","C"] 1
C ["A","B","C","D"] 2
D ["C","D","E"] 2
E ["D","E"] 1
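The aggregation above can be traced by hand with a short Python simulation. This is a sketch of the logic only, not of how Hive executes the query; the name `initial_aggregation` is hypothetical, and `combine_unique` is assumed to produce a deduplicated union (sorted here for readability).

```python
from collections import defaultdict


def initial_aggregation(rows):
    """Simulate the explode + group by step on a list of id_list arrays."""
    union_by_item = defaultdict(set)
    count_by_item = defaultdict(int)
    for id_list in rows:
        # lateral view explode: one (item, id_list) pair per array element
        for item in id_list:
            union_by_item[item].update(id_list)  # combine_unique(id_list)
            count_by_item[item] += 1             # sum(value) with value = 1
    return [
        (item, sorted(union_by_item[item]), count_by_item[item])
        for item in sorted(union_by_item)
    ]


rows = [["A", "B", "C"], ["C", "D"], ["D", "E"]]
for item, union_id_list, value in initial_aggregation(rows):
    print(item, union_id_list, value)
```

After this single pass, C and D each see only their direct neighbors, and no device sees all five IDs yet, which is why further rounds of aggregation are needed.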

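This article walks through an ID-Mapping workflow in Hive SQL: data initialization, initial aggregation, the setup and iteration steps of the iterative aggregation, and the final aggregated output. The SQL relies on two custom functions, `combine_unique` and `combine`.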