标签
PostgreSQL , 佣金分配 , 树状 , 藤状 , 递归查询 , 传销
背景
早在十年前,PostgreSQL 8点几的版本就支持了递归查询,递归查询的应用非常的广泛,如下:
《PostgreSQL 递归妙用案例 - 分组数据去重与打散》
《PostgreSQL Oracle 兼容性之 - INDEX SKIP SCAN (递归查询变态优化) 非驱动列索引扫描优化》
《PostgrSQL 递归SQL的几个应用 - 极客与正常人的思维》
《快速入门PostgreSQL应用开发与管理 - 4 高级SQL用法》
《PostgreSQL 递归查询CASE - 树型路径分组输出》
《用PostgreSQL找回618秒逝去的青春 - 递归收敛优化》
《distinct xx和count(distinct xx)的变态递归优化方法 - 索引收敛(skip scan)扫描》
《时序数据合并场景加速分析和实现 - 复合索引,窗口分组查询加速,变态递归加速》
《PostgreSQL 使用递归SQL 找出数据库对象之间的依赖关系》
《PostgreSQL Oracle 兼容性之 - WITH 递归 ( connect by )》
《递归优化CASE - group by & distinct tuning case : use WITH RECURSIVE and min() function》
《PostgreSQL 树状数据存储与查询(非递归) - Use ltree extension deal tree-like data type》
《PostgreSQL Oracle 兼容性之 - connect by》
本文要介绍一个分佣场景:
虽然都说要远离“传销”,但是“传销”的分佣模式是被很多场景所使用的,例如通过发展下线得到更多的佣金。
A赚的钱,必须要分给他的直线上游,同时可能还要分给他的顶级上游。
如果将层级关系使用关系数据库存储,那么实际上分佣最关键的是通过关系找到某个用户的上游,以及最上游。可以通过递归查询得到。
以1亿用户为例,最大50个层级的关系网。
设计1
1、表结构设计
create table tbl (
uid int8 primary key, -- 用户ID
pid int8 -- 直接上游ID,如果一个用户是ROOT用户,则PID为 null
);
create index idx_tbl_1 on tbl (pid);
2、创建一个函数,按规则返回它的上游
create or replace function gen_pid(int8) returns int8 as $$
-- 生成它的上游ID,200万以内的ID为根ID。其他都取比它小200万对应的那个ID,形成一颗50级的树。
select case when $1<=2000000 then null else $1-2000000 end;
$$ language sql strict;
3、写入1亿数据,形成深度为50的树。
insert into tbl select id, gen_pid(id) from generate_series(1,100000000) t(id) on conflict do nothing;
递归查询
使用递归查询语法:
当一个用户获得一笔收入时,需要将他的收入,分配一部分佣金给他的直接上级,以及总的上级。输入UID,查找根、直接上级
with recursive tmp as (select * from tbl where uid=94499137
union all
select tbl.* from tbl join tmp on (tmp.pid=tbl.uid))
select uid,pid from tmp where pid is null or uid=94499137;
uid | pid
----------+----------
94499137 | 92499137 -- 直接上级
499137 | -- 根的PID=NULL
(2 rows)
不加限制,则返回的是以输入UID为末端,层层往上,输出整颗树的数据。
postgres=# with recursive tmp as (select * from tbl where uid=94499137
union all
select tbl.* from tbl join tmp on (tmp.pid=tbl.uid))
select uid,pid from tmp;
uid | pid
----------+----------
94499137 | 92499137
92499137 | 90499137
90499137 | 88499137
88499137 | 86499137
86499137 | 84499137
84499137 | 82499137
82499137 | 80499137
80499137 | 78499137
78499137 | 76499137
76499137 | 74499137
74499137 | 72499137
72499137 | 70499137
70499137 | 68499137
68499137 | 66499137
66499137 | 64499137
64499137 | 62499137
62499137 | 60499137
60499137 | 58499137
58499137 | 56499137
56499137 | 54499137
54499137 | 52499137
52499137 | 50499137
50499137 | 48499137
48499137 | 46499137
46499137 | 44499137
44499137 | 42499137
42499137 | 40499137
40499137 | 38499137
38499137 | 36499137
36499137 | 34499137
34499137 | 32499137
32499137 | 30499137
30499137 | 28499137
28499137 | 26499137
26499137 | 24499137
24499137 | 22499137
22499137 | 20499137
20499137 | 18499137
18499137 | 16499137
16499137 | 14499137
14499137 | 12499137
12499137 | 10499137
10499137 | 8499137
8499137 | 6499137
6499137 | 4499137
4499137 | 2499137
2499137 | 499137
499137 |
(48 rows)
性能压测
随机输入用户ID,返回它的直接上级,以及他的总上级ID。
vi test.sql
\set uid random(1,100000000)
with recursive tmp as (select * from tbl where uid=:uid
union all
select tbl.* from tbl join tmp on (tmp.pid=tbl.uid))
select uid,pid from tmp where pid is null or uid=:uid;
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 56 -j 56 -T 120
性能如下
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 8933930
latency average = 0.752 ms
latency stddev = 0.406 ms
tps = 74448.004668 (including connections establishing)
tps = 74458.612227 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set uid random(1,100000000)
0.751 with recursive tmp as (select * from tbl where uid=:uid
7.4万QPS
设计2
增加一个字段,表示这个节点是否享有分佣金的权利。(1表示有,0表示没有)。
同时,只有第一个享有分佣金权利的上级节点,以及根节点,享有分佣金的机会。 那么结构如何设计、SQL如何写呢?
1、结构
create table tbl(
uid int8 primary key, -- 用户ID
pid int8, -- 上级用户ID
status int2 -- 是否享有分佣权利(0表示没有,1表示有)
);
2、生成测试数据
insert into tbl select id, gen_pid(id), floor(random()*1.99) from generate_series(1,100000000) t(id) on conflict do nothing;
3、查询SQL
with recursive tmp as (
select uid,pid,status,'0' as p_status from tbl where uid=94499137
union all
select tbl.uid,tbl.pid,tbl.status,tbl.status||tmp.p_status as p_status from tbl join tmp on (tmp.pid=tbl.uid)
)
select uid,pid,status,p_status from tmp
where pid is null -- 根节点 如果根也要判断是否有分金权限则 (pid is null and status=1)
or
(uid <> 94499137 and status=1 and substring(p_status,'\d(.*)') !~ '1');
uid | pid | status | p_status
----------+----------+--------+--------------------------------------------------
88499137 | 86499137 | 1 | 1000
499137 | | 0 | 010000001001000110010101010001101000001001011000
(2 rows)
4、全部路径查出来的例子
postgres=# with recursive tmp as (
select uid,pid,status,'0' as p_status from tbl where uid=94499137
union all
select tbl.uid,tbl.pid,tbl.status,tbl.status||tmp.p_status as p_status from tbl join tmp on (tmp.pid=tbl.uid)
)
select uid,pid,status,p_status from tmp;
uid | pid | status | p_status
----------+----------+--------+--------------------------------------------------
94499137 | 92499137 | 0 | 0
92499137 | 90499137 | 0 | 00
90499137 | 88499137 | 0 | 000
88499137 | 86499137 | 1 | 1000 -- 符合条件,输出第一个拥有分金权利的上级。
86499137 | 84499137 | 1 | 11000
84499137 | 82499137 | 0 | 011000
82499137 | 80499137 | 1 | 1011000
80499137 | 78499137 | 0 | 01011000
78499137 | 76499137 | 0 | 001011000
76499137 | 74499137 | 1 | 1001011000
74499137 | 72499137 | 0 | 01001011000
72499137 | 70499137 | 0 | 001001011000
70499137 | 68499137 | 0 | 0001001011000
68499137 | 66499137 | 0 | 00001001011000
66499137 | 64499137 | 0 | 000001001011000
64499137 | 62499137 | 1 | 1000001001011000
62499137 | 60499137 | 0 | 01000001001011000
60499137 | 58499137 | 1 | 101000001001011000
58499137 | 56499137 | 1 | 1101000001001011000
56499137 | 54499137 | 0 | 01101000001001011000
54499137 | 52499137 | 0 | 001101000001001011000
52499137 | 50499137 | 0 | 0001101000001001011000
50499137 | 48499137 | 1 | 10001101000001001011000
48499137 | 46499137 | 0 | 010001101000001001011000
46499137 | 44499137 | 1 | 1010001101000001001011000
44499137 | 42499137 | 0 | 01010001101000001001011000
42499137 | 40499137 | 1 | 101010001101000001001011000
40499137 | 38499137 | 0 | 0101010001101000001001011000
38499137 | 36499137 | 1 | 10101010001101000001001011000
36499137 | 34499137 | 0 | 010101010001101000001001011000
34499137 | 32499137 | 0 | 0010101010001101000001001011000
32499137 | 30499137 | 1 | 10010101010001101000001001011000
30499137 | 28499137 | 1 | 110010101010001101000001001011000
28499137 | 26499137 | 0 | 0110010101010001101000001001011000
26499137 | 24499137 | 0 | 00110010101010001101000001001011000
24499137 | 22499137 | 0 | 000110010101010001101000001001011000
22499137 | 20499137 | 1 | 1000110010101010001101000001001011000
20499137 | 18499137 | 0 | 01000110010101010001101000001001011000
18499137 | 16499137 | 0 | 001000110010101010001101000001001011000
16499137 | 14499137 | 1 | 1001000110010101010001101000001001011000
14499137 | 12499137 | 0 | 01001000110010101010001101000001001011000
12499137 | 10499137 | 0 | 001001000110010101010001101000001001011000
10499137 | 8499137 | 0 | 0001001000110010101010001101000001001011000
8499137 | 6499137 | 0 | 00001001000110010101010001101000001001011000
6499137 | 4499137 | 0 | 000001001000110010101010001101000001001011000
4499137 | 2499137 | 0 | 0000001001000110010101010001101000001001011000
2499137 | 499137 | 1 | 10000001001000110010101010001101000001001011000
499137 | | 0 | 010000001001000110010101010001101000001001011000
(48 rows)
5、压测
随机输入用户ID,返回它的拥有分金权利的第一个上级,以及他的总上级ID。
vi test.sql
\set uid random(1,100000000)
with recursive tmp as (
select uid,pid,status,'0' as p_status from tbl where uid=:uid
union all
select tbl.uid,tbl.pid,tbl.status,tbl.status||tmp.p_status as p_status from tbl join tmp on (tmp.pid=tbl.uid)
)
select uid,pid,status,p_status from tmp
where pid is null
or
(uid <> :uid and status=1 and substring(p_status,'\d(.*)') !~ '1');
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 56 -j 56 -T 120
性能如下
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 9917080
latency average = 0.678 ms
latency stddev = 0.372 ms
tps = 82640.572124 (including connections establishing)
tps = 82652.202185 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set uid random(1,100000000)
0.676 with recursive tmp as (
QPS: 82640
测试场景3
一个分支节点可能有多个子节点,或者说一个老板有多个直接下属,但是数据结构上没有闭环。
按照设计2,再新增一些子节点,挂到某些节点下面,满足这种场景的测试需求。
1、加子节点数据
insert into tbl select id, random()*1000000+1, floor(random()*1.99) from generate_series(100000001,110000000) t(id) on conflict do nothing;
insert into tbl select id, 98000000+random()*1000000+1, floor(random()*1.99) from generate_series(110000001,120000000) t(id) on conflict do nothing;
insert into tbl select id, 96000000+random()*1000000+1, floor(random()*1.99) from generate_series(120000001,140000000) t(id) on conflict do nothing;
查看一个老板有多个下属的情况
postgres=# select * from tbl where pid=100;
uid | pid | status
-----------+-----+--------
102744852 | 100 | 0
2000100 | 100 | 0
102528366 | 100 | 0
101494941 | 100 | 1
103618372 | 100 | 1
102937592 | 100 | 0
108272885 | 100 | 0
106060898 | 100 | 0
105693863 | 100 | 1
108090257 | 100 | 1
109855280 | 100 | 0
109603022 | 100 | 1
(12 rows)
2、压测
随机输入用户ID,返回它的拥有分金权利的第一个上级,以及他的总上级ID。
vi test.sql
\set uid random(1,140000000)
with recursive tmp as (
select uid,pid,status,'0' as p_status from tbl where uid=:uid
union all
select tbl.uid,tbl.pid,tbl.status,tbl.status||tmp.p_status as p_status from tbl join tmp on (tmp.pid=tbl.uid)
)
select uid,pid,status,p_status from tmp
where pid is null
or
(uid <> :uid and status=1 and substring(p_status,'\d(.*)') !~ '1');
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 56 -j 56 -T 120
性能如下
transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 13363407
latency average = 0.503 ms
latency stddev = 0.383 ms
tps = 111342.309347 (including connections establishing)
tps = 111359.640535 (excluding connections establishing)
statement latencies in milliseconds:
0.002 \set uid random(1,140000000)
0.502 with recursive tmp as (
QPS: 111342
设计3
使用物化视图存储每个节点的所有上级节点。
略
小结
本文介绍的分佣场景,与“传销”模式类似。虽然都说要远离“传销”,但是“传销”的分佣模式是被很多场景所使用的,例如通过发展下线得到更多的佣金。
A赚的钱,必须要分给他的直线上游,同时可能还要分给他的顶级上游。
如果将层级关系使用关系数据库存储,那么实际上分佣最关键的是通过关系找到某个用户的上游,以及最上游。可以通过递归查询得到。
以1亿用户为例,最大50个层级的关系网,这样一个量级下面,查询速度可以达到7.4万QPS,瓶颈响应时间0.75毫秒。
相比传统的,直接将层级全部存储到一行中,递归有几个好处:
1、维护方便,更换关系时,只需要修改几条相关的记录。
2、查询性能可以满足。
而直接将所有层级都放到每一条记录中,最严重的问题就是更改关系时,需要动用的记录数可能非常非常多。动一个人的记录,就需要修改所有涉及到这个人的路径的所有记录,更新量是非常巨大的。