目录
客群圈选 - 属性明细表模型
客群属性明细表,建模时,比较常用的有2种模型:
宽表模型
- 主要适用于前导流程为离线ETL,模型相对固定
- 可以基于宽表模型构建物化视图,加速圈选查询效率
- 另外宽表模型,对于基于UID查用户明细,应对高并发场景也比较友好
- 宽表模型示意如下:
纵表模型(高表)
- 适用于前导为实时ETL,模型变动较快,如当日有新增属性标签,
- 纵表模型避免了做schema change相对灵活
- 缺点:难以命中物化视图,建表时也需要考虑数据倾斜相关难点
- 纵表模型示意如下:
bitmap_union + 物化视图
例子:
在明细表中找出有过click购物车和view收藏页行为的user列表
DDL
create table t1(
event_day date,
event_time datetime,
uid int,
action string,
page string,
product_code string,
from_days int)
DUPLICATE KEY(`event_day`,`event_time`, `uid`)
PARTITION BY RANGE(event_day)
(
PARTITION p20210401 VALUES LESS THAN ('2021-04-01'),
PARTITION p20210501 VALUES LESS THAN ('2021-05-01')
)
DISTRIBUTED BY HASH(`uid`) BUCKETS 3
;
数据样例
mysql> select * from t1;
+-------------+----------------------------+--------------+--------+-----------+---------------+--------------+
| event_day | event_time | uid | action | page | product_code | from_days |
+-------------+----------------------------+--------------+--------+-----------+---------------+--------------+
| 2021-04-03 | 2021-04-03 10:01:30 | 274649163 | click | 购物车 | MDS | 1 |
| 2021-04-03 | 2021-04-03 10:04:30 | 274649163 | view | 收藏页 | MDS | 4 |
| 2021-04-03 | 2021-04-03 10:03:30 | 274649164 | click | 购物车 | MDS | 2 |
| 2021-04-03 | 2021-04-03 10:06:30 | 274649165 | click | 购物车 | MMS | 8 |
| 2021-04-03 | 2021-04-03 10:08:30 | 274649164 | view | 收藏页 | MDS | 3 |
| 2021-04-03 | 2021-04-03 10:09:30 | 274649165 | view | 购物车 | MDS | 10 |
+------------+---------------------------+---------------+-------+------------+--------------+---------------+
6 rows in set (0.01 sec)
物化视图
CREATE MATERIALIZED VIEW user_profile_view AS
SELECT event_day ,action, page , bitmap_union(to_bitmap(uid)) b_uid
FROM t1
GROUP BY event_day , action, page;
客群圈选
WITH tbl_c AS (
SELECT bitmap_union(to_bitmap(uid)) b_uid
FROM t1
WHERE event_day = '2021-04-03'
AND action='click' and page= '购物车'
GROUP BY action, page
) , tbl_v AS(
SELECT bitmap_union(to_bitmap(uid)) b_uid
FROM t1
WHERE event_day = '2021-04-03'
AND action='view' and page='收藏页'
GROUP BY action, page
) ,tbl_u AS(
SELECT b_uid from tbl_c
UNION ALL
SELECT b_uid from tbl_v
) SELECT
bitmap_count(bitmap_intersect(b_uid)) uid_ct,
bitmap_to_string(bitmap_intersect(b_uid)) uid_list
FROM tbl_u
NOTE
v1.18 提供bitmap to array、unnest方便行列转换