概述
对于分区表的大数据统计分析,由于数据量巨大,往往需要采用并行。但是数据库并行的效率相比分进程分表统计还是有比较大的差距。本文通过巧用dblink,实现分进程分分区统计数据。
例子
kingbase=# \d t751
分区表 "public.t751"
栏位 | 类型 | 校对规则 | 可空的 | 预设
------+-----------------------------+----------+----------+------
id | bigint | | not null |
code | text | | |
c1 | timestamp without time zone | | |
c2 | text | | |
c3 | numeric | | |
c4 | integer | | |
c5 | integer | | |
c6 | integer | | |
v2 | numeric | | |
v3 | timestamp without time zone | | |
分区键值: HASH (c4, c5, c6)
分区的数量:1000(可以使用 \d+ 来列出它们)
表空间:"nvmtbs01"
kingbase=# select pg_size_pretty(sum(pg_relation_size(relid))) from pg_partition_tree('t751');
pg_size_pretty
----------------
20 GB
并行方式访问
kingbase=# explain analyze select c4, c5, c6, sum(v2) v2 from t751 group by c4, c5, c6;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=5656234.57..14706860.46 rows=8000000 width=44) (actual time=15764.781..19110.336 rows=11781 loops=1)
Group Key: t751_872.c4, t751_872.c5, t751_872.c6
-> Gather Merge (cost=5656234.57..13806860.46 rows=64000000 width=44) (actual time=15764.142..19103.239 rows=11902 loops=1)
Workers Planned: 8
Workers Launched: 7
-> Partial GroupAggregate (cost=5655234.43..5911484.32 rows=8000000 width=44) (actual time=15478.852..18479.480 rows=1488 loops=8)
Group Key: t751_872.c4, t751_872.c5, t751_872.c6
-> Sort (cost=5655234.43..5686484.41 rows=12499991 width=22) (actual time=15478.239..17066.239 rows=12500000 loops=8)
Sort Key: t751_872.c4, t751_872.c5, t751_872.c6
Sort Method: quicksort Memory: 1408511kB
Worker 0: Sort Method: quicksort Memory: 1371267kB
Worker 1: Sort Method: quicksort Memory: 1364497kB
Worker 2: Sort Method: quicksort Memory: 1374769kB
Worker 3: Sort Method: quicksort Memory: 1355381kB
Worker 4: Sort Method: quicksort Memory: 1352989kB
Worker 5: Sort Method: quicksort Memory: 1343974kB
Worker 6: Sort Method: quicksort Memory: 1386845kB
-> Parallel Append (cost=0.00..3156369.51 rows=12499991 width=22) (actual time=0.034..5045.596 rows=12500000 loops=8)
-> Parallel Seq Scan on t751_872 (cost=0.00..5849.56 rows=81756 width=22) (actual time=0.025..60.070 rows=196215 loops=1)
-> Parallel Seq Scan on t751_470 (cost=0.00..5753.10 rows=80410 width=22) (actual time=0.068..73.100 rows=192984 loops=1)
...
-> Parallel Seq Scan on t751_416 (cost=0.00..556.80 rows=10380 width=22) (actual time=0.009..5.736 rows=17646 loops=1)
-> Parallel Seq Scan on t751_885 (cost=0.00..315.77 rows=5877 width=22) (actual time=0.010..2.460 rows=9991 loops=1)
Planning Time: 26.450 ms
Execution Time: 19180.545 ms
注意:这里实际采用的是groupagg , 如果采用hashagg,执行效率会高很多
使用dblink方式
创建函数
create or replace function f_t_part(para int default 0)
returns table
(
c4 int,
c5 int,
c6 int,
v2 numeric
)
language plpgsql
immutable
parallel safe
as
$$
declare
vtabnam text;
atabnam text[];
begin
perform dblink_disconnect(conn)
from unnest(dblink_get_connections()) conn;
perform dblink_connect('conn_' || sn, 'dbname=' || current_database())
from generate_series(1, para) sn;
if para <= 0 then
for vtabnam in select relid from pg_partition_tree('t751')
loop
for c4,c5,c6,v2 in execute format('select c4, c5, c6, sum(v2) v2 from only %s group by c4, c5, c6 ;',vtabnam)
loop
return next ;
end loop;
end loop;
else
for atabnam in select string_to_array(string_agg(relid, ','), ',') relid
from pg_partition_tree('t751') with ordinality
, lateral ( select ordinality / para sn)
where isleaf
group by sn
loop
perform dblink_is_busy(conn) = 0 from unnest(dblink_get_connections()) conn;
perform conn,tabnam, dblink_send_query(conn,format('select c4, c5, c6, sum(v2) v2 from only %s group by c4, c5, c6 ;',tabnam)) sq
from (select unnest(dblink_get_connections()) conn, unnest(atabnam) tabnam) v
where tabnam is not null;
for c4,c5,c6,v2 in
select tab.* from unnest(dblink_get_connections()) conn , dblink_get_result(conn) as tab(c4 int, c5 int, c6 int, v2 numeric)
loop
return next ;
end loop;
end loop;
end if;
perform dblink_disconnect(conn) from unnest(dblink_get_connections()) conn;
return;
end;
$$;
执行效率
ingbase=# explain analyze select * from ft751_01();
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Function Scan on ft751_01 (cost=0.25..10.25 rows=1000 width=44) (actual time=32738.962..32739.288 rows=11781 loops=1)
Planning Time: 0.055 ms
Execution Time: 32741.655 ms
(3 行记录)
kingbase=# explain analyze ^Jselect * from ft751_01(16);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Function Scan on ft751_01 (cost=0.25..10.25 rows=1000 width=44) (actual time=5397.644..5398.189 rows=11781 loops=1)
Planning Time: 0.044 ms
Execution Time: 5398.887 ms
结论
实际验证,二者性能上差距并不大。用dblink 方式主要是可以更灵活的使用并行数量。这里的目的主要是向大家提供灵活使用并行的一种方法。