前言
一、PG分区表-传统分区表
pg10版本之前pg分区表一般通过继承加触发器方式实现,这种分区方式不能算是内置分区表,而且步骤非常繁琐。
pg10版本一个重量级的新特性是支持内置分区表。pg10支持范围分区和列表分区。
1.分区表的意义
分区表的优势主要体现在降低大表管理成本和某些场景的性能提升。
分区表主要有以下优势:
(1)当查询或更新一个分区上的大部分数据时,对分区进行索引扫描代价很大,
然而,在分区上使用顺序扫描能提升性能。
(2)当需要删除一个分区数据时,通过drop table删除一个分区,远比delete删除数据高效,
特别适用于日志数据场景。
(3)由于一个表只能存储在一个表空间上,使用分区表后,可以将分区放到不同表的表空间上。
可以将分区放到廉价的存储设备上,也可以将系统常访问的分区存储在高速存储上。
2.传统分区表
传统分区表是通过继承和触发器方式实现的。
需要定义父表、定义子表、定义子表约束、创建子表索引、创建分区插入、删除、修改函数和触发器等。
继承是传统分区表的重要组成部分。
2.1.继承表
首先定义一张父表,之后创建子表并继承父表。
创建一张日志模型表tbl_log父表:
create table tbl_log
(
id int4,
create_date date,
log_type text
);
创建一张子表tbl_log_sql用于存储sql日志:
create table tbl_log_sql(sql text) inherits (tbl_log);
查看tbl_log_sql表结构
\d tbl_log_sql
postgres=# \d tbl_log_sql
Table "public.tbl_log_sql"
Column | Type | Collation | Nullable | Default
-------------+---------+-----------+----------+---------
id | integer | | |
create_date | date | | |
log_type | text | | |
sql | text | | |
Inherits: tbl_log
父表和子表都可以插入数据
父表:
insert into tbl_log values (1,'2017-08-26',null);
子表:
insert into tbl_log_sql values (1,'2017-08-27',null,'select 2');
查询父表:
不加only,显示与父表与子表字段相同的数据
postgres=# select * from tbl_log;
id | create_date | log_type
----+-------------+----------
1 | 2017-08-26 |
1 | 2017-08-27 |
(2 rows)
加only只显示父表数据
postgres=# select * from only tbl_log;
id | create_date | log_type
----+-------------+----------
1 | 2017-08-26 |
(1 row)
确定数据来源于哪些表,可通过以下SQL查看表的OID:
select tableoid,* from tbl_log;
postgres=# select tableoid,* from tbl_log;
tableoid | id | create_date | log_type
----------+----+-------------+----------
16410 | 1 | 2017-08-26 |
16415 | 1 | 2017-08-27 |
(2 rows)
tableoid是表的隐藏字段,表示表的OID,可通过pg_class系统表关联找到表名。
select p.relname,c.*
from tbl_log c,pg_class p
where c.tableoid = p.oid;
postgres=# select p.relname,c.*
postgres-# from tbl_log c,pg_class p
postgres-# where c.tableoid = p.oid;
relname | id | create_date | log_type
-------------+----+-------------+----------
tbl_log | 1 | 2017-08-26 |
tbl_log_sql | 1 | 2017-08-27 |
(2 rows)
如果父表名称前没有加only,则会对父表和所有子表进行DML操作。
删除父表数据:
delete from tbl_log;
select * from tbl_log;
postgres=# delete from tbl_log;
DELETE 2
postgres=# select * from tbl_log;
id | create_date | log_type
----+-------------+----------
(0 rows)
2.2.创建分区表
(1)创建父表,如果父表上定义了约束,子表会继承;父表不应该写入数据。
(2)通过inherits方式创建继承表,也称之为子表或分区。
(3)给所有子表创建约束。
(4)给所有子表创建索引,由于继承操作不会继承父表上的索引,因此索引需要手工创建。
(5)在父表上定义insert、delete、update触发器,将sql分发到对应分区,
这步可选,因为应用可以根据分区规则定位到对应分区进行DML操作。
(6)启用constraint_exclusion参数,如果这个参数设置成off,则父表上sql性能会降低。
(1)创建父表
create table log_ins
(
id serial,
user_id int4,
create_time timestamp(0) without time zone
);
(2)创建13张子表
create table log_ins_history(check(create_time < '2021-01-01')) inherits(log_ins);
create table log_ins_202101(check(create_time >= '2021-01-01' and create_time < '2021-02-01')) inherits(log_ins);
create table log_ins_202102(check(create_time >= '2021-02-01' and create_time < '2021-03-01')) inherits(log_ins);
create table log_ins_202103(check(create_time >= '2021-03-01' and create_time < '2021-04-01')) inherits(log_ins);
create table log_ins_202104(check(create_time >= '2021-04-01' and create_time < '2021-05-01')) inherits(log_ins);
create table log_ins_202105(check(create_time >= '2021-05-01' and create_time < '2021-06-01')) inherits(log_ins);
create table log_ins_202106(check(create_time >= '2021-06-01' and create_time < '2021-07-01')) inherits(log_ins);
create table log_ins_202107(check(create_time >= '2021-07-01' and create_time < '2021-08-01')) inherits(log_ins);
create table log_ins_202108(check(create_time >= '2021-08-01' and create_time < '2021-09-01')) inherits(log_ins);
create table log_ins_202109(check(create_time >= '2021-09-01' and create_time < '2021-10-01')) inherits(log_ins);
create table log_ins_202110(check(create_time >= '2021-10-01' and create_time < '2021-11-01')) inherits(log_ins);
create table log_ins_202111(check(create_time >= '2021-11-01' and create_time < '2021-12-01')) inherits(log_ins);
create table log_ins_202112(check(create_time >= '2021-12-01' and create_time < '2022-01-01')) inherits(log_ins);
(3)给子表创建索引
create index idx_his_ctime on log_ins_history using btree (create_time);
create index idx_log_ins_202101_ctime on log_ins_202101 using btree (create_time);
create index idx_log_ins_202102_ctime on log_ins_202102 using btree (create_time);
create index idx_log_ins_202103_ctime on log_ins_202103 using btree (create_time);
create index idx_log_ins_202104_ctime on log_ins_202104 using btree (create_time);
create index idx_log_ins_202105_ctime on log_ins_202105 using btree (create_time);
create index idx_log_ins_202106_ctime on log_ins_202106 using btree (create_time);
create index idx_log_ins_202107_ctime on log_ins_202107 using btree (create_time);
create index idx_log_ins_202108_ctime on log_ins_202108 using btree (create_time);
create index idx_log_ins_202109_ctime on log_ins_202109 using btree (create_time);
create index idx_log_ins_202110_ctime on log_ins_202110 using btree (create_time);
create index idx_log_ins_202111_ctime on log_ins_202111 using btree (create_time);
create index idx_log_ins_202112_ctime on log_ins_202112 using btree (create_time);
(4)创建触发器函数,设置数据插入父表时的路由规则。
create or replace function log_ins_insert_trigger()
returns trigger
language plpgsql
as $function$
begin
if (NEW.create_time < '2021-01-01') then
insert into log_ins_history values (NEW.*);
elsif (NEW.create_time >= '2021-01-01' and NEW.create_time < '2021-02-01') then
insert into log_ins_202101 values (NEW.*);
elsif (NEW.create_time >= '2021-02-01' and NEW.create_time < '2021-03-01') then
insert into log_ins_202102 values (NEW.*);
elsif (NEW.create_time >= '2021-03-01' and NEW.create_time < '2021-04-01') then
insert into log_ins_202103 values (NEW.*);
elsif (NEW.create_time >= '2021-04-01' and NEW.create_time < '2021-05-01') then
insert into log_ins_202104 values (NEW.*);
elsif (NEW.create_time >= '2021-05-01' and NEW.create_time < '2021-06-01') then
insert into log_ins_202105 values (NEW.*);
elsif (NEW.create_time >= '2021-06-01' and NEW.create_time < '2021-07-01') then
insert into log_ins_202106 values (NEW.*);
elsif (NEW.create_time >= '2021-07-01' and NEW.create_time < '2021-08-01') then
insert into log_ins_202107 values (NEW.*);
elsif (NEW.create_time >= '2021-08-01' and NEW.create_time < '2021-09-01') then
insert into log_ins_202108 values (NEW.*);
elsif (NEW.create_time >= '2021-09-01' and NEW.create_time < '2021-10-01') then
insert into log_ins_202109 values (NEW.*);
elsif (NEW.create_time >= '2021-10-01' and NEW.create_time < '2021-11-01') then
insert into log_ins_202110 values (NEW.*);
elsif (NEW.create_time >= '2021-11-01' and NEW.create_time < '2021-12-01') then
insert into log_ins_202111 values (NEW.*);
elsif (NEW.create_time >= '2021-12-01' and NEW.create_time < '2022-01-01') then
insert into log_ins_202112 values (NEW.*);
else
raise exception 'create_time out of range. Fix the log_ins_insert_trigger() function!';
end if;
return null;
end;
$function$;
new.*:是指要插入的数据行。
(5)在父表上定义插入触发器:
create trigger log_ins_insert_trigger before insert on log_ins for each row
execute procedure log_ins_insert_trigger();
2.3.使用分区表
往父表log_ins插入数据,并验证数据是否插入对应分区。
insert into log_ins (user_id,create_time)
select round(100000000*random()),generate_series('2020-12-01'::date,'2021-12-01'::date,'1 minute');
round(100000000*random()):随机生成8位整数。
generate_series:生成时间数据。
查询数据如下:
select * from log_ins limit 2;
postgres=# select * from log_ins limit 2;
id | user_id | create_time
----+----------+---------------------
1 | 37324797 | 2020-12-01 00:00:00
2 | 9736688 | 2020-12-01 00:01:00
(2 rows)
查看父表数据,发现父表里没有数据:
postgres=# select count(*) from only log_ins;
count
-------
0
(1 row)
postgres=# select count(*) from log_ins;
count
--------
525601
(1 row)
查看子表数据:
select min(create_time),max(create_time) from log_ins_202101;
postgres=# select min(create_time),max(create_time) from log_ins_202101;
min | max
---------------------+---------------------
2021-01-01 00:00:00 | 2021-01-31 23:59:00
(1 row)
查看子表大小:
postgres=# \dt+ log_ins*
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+-----------------+-------+----------+-------------+---------------+------------+-------------
public | log_ins | table | postgres | permanent | heap | 0 bytes |
public | log_ins_202101 | table | postgres | permanent | heap | 1968 kB |
public | log_ins_202102 | table | postgres | permanent | heap | 1776 kB |
public | log_ins_202103 | table | postgres | permanent | heap | 1968 kB |
public | log_ins_202104 | table | postgres | permanent | heap | 1904 kB |
public | log_ins_202105 | table | postgres | permanent | heap | 1968 kB |
public | log_ins_202106 | table | postgres | permanent | heap | 1904 kB |
public | log_ins_202107 | table | postgres | permanent | heap | 1968 kB |
public | log_ins_202108 | table | postgres | permanent | heap | 1968 kB |
public | log_ins_202109 | table | postgres | permanent | heap | 1904 kB |
public | log_ins_202110 | table | postgres | permanent | heap | 1968 kB |
public | log_ins_202111 | table | postgres | permanent | heap | 1904 kB |
public | log_ins_202112 | table | postgres | permanent | heap | 8192 bytes |
public | log_ins_history | table | postgres | permanent | heap | 1968 kB |
(14 rows)
结论:数据都已经插入到子表里。
2.4.查询父表还是子表
查询父表:
explain analyze select * from log_ins
where create_time > '2021-01-01' and create_time < '2021-01-02';
postgres=# explain analyze select * from log_ins
postgres-# where create_time > '2021-01-01' and create_time < '2021-01-02';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..61.95 rows=1387 width=16) (actual time=0.029..0.406 rows=1439 loops=1)
-> Seq Scan on log_ins log_ins_1 (cost=0.00..0.00 rows=1 width=16) (actual time=0.005..0.005 rows=0 loops=1)
Filter: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Index Scan using idx_log_ins_202101_ctime on log_ins_202101 log_ins_2 (cost=0.29..55.01 rows=1386 width=16)
(actual time=0.024..0.283 rows=1439 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
Planning Time: 0.519 ms
Execution Time: 0.492 ms
(7 rows)
结论:在log_ins_202101上进行了索引扫描,执行时间为0.492 ms
查询子表:
explain analyze select * from log_ins_202101
where create_time > '2021-01-01' and create_time < '2021-01-02';
postgres=# explain analyze select * from log_ins_202101
postgres-# where create_time > '2021-01-01' and create_time < '2021-01-02';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_log_ins_202101_ctime on log_ins_202101 (cost=0.29..55.01 rows=1386 width=16)
(actual time=0.021..0.275 rows=1439 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
Planning Time: 0.136 ms
Execution Time: 0.358 ms
(4 rows)
结论:直接查询子表只需要0.358 ms。
结论:对于传统分区表分区方式,不建议应用访问父表,而是直接访问子表。
2.5.constraint_exclusion参数
constraint_exclusion:参数用来控制优化器是否根据表上的约束来优化查询。
on:所有表都通过约束优化查询。
off:所有表都不通过约束优化查询。
partition:只对继承表和union all子查询通过检索约束来优化查询。
就是说:设置成on或partition,查询父表时优化器会根据子表上的约束判断检索哪些子表。
而不需要扫描所有子表,从而提升查询性能。
查询默认设置参数:
postgres=# show constraint_exclusion;
constraint_exclusion
----------------------
partition
(1 row)
测试:
set constraint_exclusion=off;
postgres=# set constraint_exclusion=off;
SET
postgres=# show constraint_exclusion;
constraint_exclusion
----------------------
off
(1 row)
查询父表:
explain analyze select * from log_ins
where create_time > '2021-01-01' and create_time < '2021-01-02';
postgres=# explain analyze select * from log_ins
postgres-# where create_time > '2021-01-01' and create_time < '2021-01-02';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..168.26 rows=1407 width=16) (actual time=0.048..0.587 rows=1439 loops=1)
-> Seq Scan on log_ins log_ins_1 (cost=0.00..0.00 rows=1 width=16) (actual time=0.003..0.003 rows=0 loops=1)
Filter: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Index Scan using idx_his_ctime on log_ins_history log_ins_2 (cost=0.29..8.31 rows=1 width=16)
(actual time=0.017..0.017 rows=0 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Index Scan using idx_log_ins_202101_ctime on log_ins_202101 log_ins_3 (cost=0.29..55.01 rows=1386 width=16)
(actual time=0.027..0.295 rows=1439 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Index Scan using idx_log_ins_202102_ctime on log_ins_202102 log_ins_4 (cost=0.29..8.31 rows=1 width=16)
(actual time=0.017..0.017 rows=0 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Index Scan using idx_log_ins_202103_ctime on log_ins_202103 log_ins_5 (cost=0.29..8.31 rows=1 width=16)
(actual time=0.010..0.010 rows=0 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Index Scan using idx_log_ins_202104_ctime on log_ins_202104 log_ins_6 (cost=0.29..8.31 rows=1 width=16)
(actual time=0.015..0.015 rows=0 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Index Scan using idx_log_ins_202105_ctime on log_ins_202105 log_ins_7 (cost=0.29..8.31 rows=1 width=16)
(actual time=0.008..0.009 rows=0 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Index Scan using idx_log_ins_202106_ctime on log_ins_202106 log_ins_8 (cost=0.29..8.31 rows=1 width=16)
(actual time=0.016..0.016 rows=0 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Index Scan using idx_log_ins_202107_ctime on log_ins_202107 log_ins_9 (cost=0.29..8.31 rows=1 width=16)
(actual time=0.008..0.008 rows=0 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Index Scan using idx_log_ins_202108_ctime on log_ins_202108 log_ins_10 (cost=0.29..8.31 rows=1 width=16)
(actual time=0.017..0.017 rows=0 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Index Scan using idx_log_ins_202109_ctime on log_ins_202109 log_ins_11 (cost=0.29..8.31 rows=1 width=16)
(actual time=0.016..0.016 rows=0 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Index Scan using idx_log_ins_202110_ctime on log_ins_202110 log_ins_12 (cost=0.29..8.31 rows=1 width=16)
(actual time=0.012..0.012 rows=0 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Index Scan using idx_log_ins_202111_ctime on log_ins_202111 log_ins_13 (cost=0.29..8.31 rows=1 width=16)
(actual time=0.012..0.012 rows=0 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Bitmap Heap Scan on log_ins_202112 log_ins_14 (cost=4.24..14.81 rows=9 width=16)
(actual time=0.006..0.006 rows=0 loops=1)
Recheck Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
-> Bitmap Index Scan on idx_log_ins_202112_ctime (cost=0.00..4.24 rows=9 width=0)
(actual time=0.004..0.004 rows=0 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone)
AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
Planning Time: 1.000 ms
Execution Time: 0.778 ms
(33 rows)
结论:查询父表时扫描了所有分区,执行时间上升到了0.685
结论:建议constraint_exclusion参数设置成partition,不建议设置成on,
因为优化器通过检查约束来优化查询的方式本身就带来一定开销,
如果所有表都启用这个特性,将加重优化器的负担。
2.6.添加分区
方法一、降低对生产系统的影响:
(1)创建分区
create table log_ins_202201 (like log_ins including all);
(2)添加约束
alter table log_ins_202201 add constraint log_ins_202201_create_time_check
check (create_time >= '2022-01-01' and create_time < '2022-02-01');
(3)刷新触发器函数log_ins_insert_trigger()
create or replace function log_ins_insert_trigger()
returns trigger
language plpgsql
as $function$
begin
if (NEW.create_time < '2021-01-01') then
insert into log_ins_history values (NEW.*);
elsif (NEW.create_time >= '2021-01-01' and NEW.create_time < '2021-02-01') then
insert into log_ins_202101 values (NEW.*);
elsif (NEW.create_time >= '2021-02-01' and NEW.create_time < '2021-03-01') then
insert into log_ins_202102 values (NEW.*);
elsif (NEW.create_time >= '2021-03-01' and NEW.create_time < '2021-04-01') then
insert into log_ins_202103 values (NEW.*);
elsif (NEW.create_time >= '2021-04-01' and NEW.create_time < '2021-05-01') then
insert into log_ins_202104 values (NEW.*);
elsif (NEW.create_time >= '2021-05-01' and NEW.create_time < '2021-06-01') then
insert into log_ins_202105 values (NEW.*);
elsif (NEW.create_time >= '2021-06-01' and NEW.create_time < '2021-07-01') then
insert into log_ins_202106 values (NEW.*);
elsif (NEW.create_time >= '2021-07-01' and NEW.create_time < '2021-08-01') then
insert into log_ins_202107 values (NEW.*);
elsif (NEW.create_time >= '2021-08-01' and NEW.create_time < '2021-09-01') then
insert into log_ins_202108 values (NEW.*);
elsif (NEW.create_time >= '2021-09-01' and NEW.create_time < '2021-10-01') then
insert into log_ins_202109 values (NEW.*);
elsif (NEW.create_time >= '2021-10-01' and NEW.create_time < '2021-11-01') then
insert into log_ins_202110 values (NEW.*);
elsif (NEW.create_time >= '2021-11-01' and NEW.create_time < '2021-12-01') then
insert into log_ins_202111 values (NEW.*);
elsif (NEW.create_time >= '2021-12-01' and NEW.create_time < '2022-01-01') then
insert into log_ins_202112 values (NEW.*);
elsif (NEW.create_time >= '2022-01-01' and NEW.create_time < '2022-02-01') then
insert into log_ins_202201 values (NEW.*);
else
raise exception 'create_time out of range. Fix the log_ins_insert_trigger() function!';
end if;
return null;
end;
$function$;
(4)创建相关索引
create index idx_log_ins_202201_ctime on log_ins_202201 using btree (create_time);
(5)所有步骤完成后,将新区log_ins_202201继承到父表log_ins
alter table log_ins_202201 inherits log_ins;
方法二、正常步骤:
(1)创建子表
create table log_ins_202202
(check(create_time >= '2022-02-01' and create_time >= '2022-03-01')) inherits (log_ins);
(2)创建相关索引
create index idx_log_ins_202202_ctime on log_ins_202202 using btree (create_time);
(3)刷新触发器函数log_ins_insert_trigger()
create or replace function log_ins_insert_trigger()
returns trigger
language plpgsql
as $function$
begin
if (NEW.create_time < '2021-01-01') then
insert into log_ins_history values (NEW.*);
elsif (NEW.create_time >= '2021-01-01' and NEW.create_time < '2021-02-01') then
insert into log_ins_202101 values (NEW.*);
elsif (NEW.create_time >= '2021-02-01' and NEW.create_time < '2021-03-01') then
insert into log_ins_202102 values (NEW.*);
elsif (NEW.create_time >= '2021-03-01' and NEW.create_time < '2021-04-01') then
insert into log_ins_202103 values (NEW.*);
elsif (NEW.create_time >= '2021-04-01' and NEW.create_time < '2021-05-01') then
insert into log_ins_202104 values (NEW.*);
elsif (NEW.create_time >= '2021-05-01' and NEW.create_time < '2021-06-01') then
insert into log_ins_202105 values (NEW.*);
elsif (NEW.create_time >= '2021-06-01' and NEW.create_time < '2021-07-01') then
insert into log_ins_202106 values (NEW.*);
elsif (NEW.create_time >= '2021-07-01' and NEW.create_time < '2021-08-01') then
insert into log_ins_202107 values (NEW.*);
elsif (NEW.create_time >= '2021-08-01' and NEW.create_time < '2021-09-01') then
insert into log_ins_202108 values (NEW.*);
elsif (NEW.create_time >= '2021-09-01' and NEW.create_time < '2021-10-01') then
insert into log_ins_202109 values (NEW.*);
elsif (NEW.create_time >= '2021-10-01' and NEW.create_time < '2021-11-01') then
insert into log_ins_202110 values (NEW.*);
elsif (NEW.create_time >= '2021-11-01' and NEW.create_time < '2021-12-01') then
insert into log_ins_202111 values (NEW.*);
elsif (NEW.create_time >= '2021-12-01' and NEW.create_time < '2022-01-01') then
insert into log_ins_202112 values (NEW.*);
elsif (NEW.create_time >= '2022-01-01' and NEW.create_time < '2022-02-01') then
insert into log_ins_202201 values (NEW.*);
elsif (NEW.create_time >= '2022-02-01' and NEW.create_time < '2022-03-01') then
insert into log_ins_202202 values (NEW.*);
else
raise exception 'create_time out of range. Fix the log_ins_insert_trigger() function!';
end if;
return null;
end;
$function$;
postgres=# \d+ log_ins
Table "public.log_ins"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-------------+--------------------------------+-----------+----------+-------------------------------------+---------+-------------+--------------+-------------
id | integer | | not null | nextval('log_ins_id_seq'::regclass) | plain | | |
user_id | integer | | | | plain | | |
create_time | timestamp(0) without time zone | | | | plain | | |
Triggers:
log_ins_insert_trigger BEFORE INSERT ON log_ins FOR EACH ROW EXECUTE FUNCTION log_ins_insert_trigger()
Child tables: log_ins_202101,
log_ins_202102,
log_ins_202103,
log_ins_202104,
log_ins_202105,
log_ins_202106,
log_ins_202107,
log_ins_202108,
log_ins_202109,
log_ins_202110,
log_ins_202111,
log_ins_202112,
log_ins_202201,
log_ins_202202,
log_ins_history
Access method: heap
2.7.删除分区
传统分区表删除分区通常有两种方法:
第一种方法,直接删除分区:
drop table log_ins_202202;
postgres=# \d+ log_ins;
Table "public.log_ins"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-------------+--------------------------------+-----------+----------+-------------------------------------+---------+-------------+--------------+-------------
id | integer | | not null | nextval('log_ins_id_seq'::regclass) | plain | | |
user_id | integer | | | | plain | | |
create_time | timestamp(0) without time zone | | | | plain | | |
Triggers:
log_ins_insert_trigger BEFORE INSERT ON log_ins FOR EACH ROW EXECUTE FUNCTION log_ins_insert_trigger()
Child tables: log_ins_202101,
log_ins_202102,
log_ins_202103,
log_ins_202104,
log_ins_202105,
log_ins_202106,
log_ins_202107,
log_ins_202108,
log_ins_202109,
log_ins_202110,
log_ins_202111,
log_ins_202112,
log_ins_202201,
log_ins_history
Access method: heap
第二种方法,先将分区的继承关系去掉,再删除表:
alter table log_ins_202201 no inherit log_ins;
postgres=# \d+ log_ins;
.........
.........
Triggers:
log_ins_insert_trigger BEFORE INSERT ON log_ins FOR EACH ROW EXECUTE FUNCTION log_ins_insert_trigger()
Child tables: log_ins_202101,
log_ins_202102,
log_ins_202103,
log_ins_202104,
log_ins_202105,
log_ins_202106,
log_ins_202107,
log_ins_202108,
log_ins_202109,
log_ins_202110,
log_ins_202111,
log_ins_202112,
log_ins_history
Access method: heap
select * from log_ins_202201 limit 3;
postgres=# select * from log_ins_202201 limit 3;
id | user_id | create_time
----+---------+-------------
(0 rows)
drop table log_ins_202201;
2.8.分区表的相关查询
如何查看分区表的定义、分区表分区信息呢?
第一种方法:
\d log_ins
postgres=# \d log_ins
Table "public.log_ins"
Column | Type | Collation | Nullable | Default
-------------+--------------------------------+-----------+----------+-------------------------------------
id | integer | | not null | nextval('log_ins_id_seq'::regclass)
user_id | integer | | |
create_time | timestamp(0) without time zone | | |
Triggers:
log_ins_insert_trigger BEFORE INSERT ON log_ins FOR EACH ROW EXECUTE FUNCTION log_ins_insert_trigger()
Number of child tables: 13 (Use \d+ to list them.)
以上显示了有13个分区,并且创建了触发器。
如果想列出分区名称可通过\d+ log_ins
Triggers:
log_ins_insert_trigger BEFORE INSERT ON log_ins FOR EACH ROW EXECUTE FUNCTION log_ins_insert_trigger()
Child tables: log_ins_202101,
log_ins_202102,
log_ins_202103,
log_ins_202104,
log_ins_202105,
log_ins_202106,
log_ins_202107,
log_ins_202108,
log_ins_202109,
log_ins_202110,
log_ins_202111,
log_ins_202112,
log_ins_history
Access method: heap
第二种方法,通过SQL命令:
select
nmsp_parent.nspname as parent_schema,
parent.relname as parent,
nmsp_child.nspname as child_schema,
child.relname as child_schema
from
pg_inherits join pg_class parent
on pg_inherits.inhparent = parent.oid join pg_class child
on pg_inherits.inhrelid = child.oid join pg_namespace nmsp_parent
on nmsp_parent.oid = parent.relnamespace join pg_namespace nmsp_child
on nmsp_child.oid = child.relnamespace
where
parent.relname = 'log_ins';
postgres=# select
postgres-# nmsp_parent.nspname as parent_schema,
postgres-# parent.relname as parent,
postgres-# nmsp_child.nspname as child_schema,
postgres-# child.relname as child_schema
postgres-# from
postgres-# pg_inherits join pg_class parent
postgres-# on pg_inherits.inhparent = parent.oid join pg_class child
postgres-# on pg_inherits.inhrelid = child.oid join pg_namespace nmsp_parent
postgres-# on nmsp_parent.oid = parent.relnamespace join pg_namespace nmsp_child
postgres-# on nmsp_child.oid = child.relnamespace
postgres-# where
postgres-# parent.relname = 'log_ins';
parent_schema | parent | child_schema | child_schema
---------------+---------+--------------+-----------------
public | log_ins | public | log_ins_history
public | log_ins | public | log_ins_202101
public | log_ins | public | log_ins_202102
public | log_ins | public | log_ins_202103
public | log_ins | public | log_ins_202104
public | log_ins | public | log_ins_202105
public | log_ins | public | log_ins_202106
public | log_ins | public | log_ins_202107
public | log_ins | public | log_ins_202108
public | log_ins | public | log_ins_202109
public | log_ins | public | log_ins_202110
public | log_ins | public | log_ins_202111
public | log_ins | public | log_ins_202112
(13 rows)
pg_inherits系统表记录了子表和父表之间的继承关系,查询列出指定分区表的分区。
查看一个库中有哪些分区表,并显示这些分区表的分区数量:
select
nspname,
relname,
count(*) as partition_num
from
pg_class c,
pg_namespace n,
pg_inherits i
where
c.oid = i.inhparent
and c.relnamespace = n.oid
and c.relhassubclass
and c.relkind in ('r','p')
group by 1,2
order by partition_num desc;
postgres=# select
postgres-# nspname,
postgres-# relname,
postgres-# count(*) as partition_num
postgres-# from
postgres-# pg_class c,
postgres-# pg_namespace n,
postgres-# pg_inherits i
postgres-# where
postgres-# c.oid = i.inhparent
postgres-# and c.relnamespace = n.oid
postgres-# and c.relhassubclass
postgres-# and c.relkind in ('r','p')
postgres-# group by 1,2
postgres-# order by partition_num desc;
nspname | relname | partition_num
---------+---------+---------------
public | log_ins | 13
public | tbl_log | 1
(2 rows)
log_ins分区表有13个分区
tbl_log分区表有1个分区
2.9.性能测试
基于分区表的分区键、非分区键查询和普通表性能有何差异?
将create_time字段作为传统分区表log_ins的分区键;
user_id字段作为分区表的非分区键。
创建一张普通表log,表结构和log_ins完全一致:
create table log
(
id serial,
user_id int4,
create_time timestamp(0) without time zone
);
insert into log (user_id,create_time)
select round(100000000*random()),generate_series('2020-12-01'::date,'2021-12-01'::date,'1 minute');
查看两张表数量:
select count(*) from log_ins;
select count(*) from log;
postgres=# select count(*) from log_ins;
count
--------
525601
(1 row)
postgres=# select count(*) from log;
count
--------
525601
(1 row)
普通表log创建索引:
create index idx_log_userid on log using btree(user_id);
create index idx_log_create_time on log using btree(create_time);
在分区表log_ins父表和所有子表的user_id上创建索引:
父表:
create index log_ins_userid on log_ins_history using btree (user_id);
子表:
create index idx_his_userid on log_ins_history using btree (user_id);
create index idx_log_ins_202101_userid on log_ins_202101 using btree (user_id);
create index idx_log_ins_202102_userid on log_ins_202102 using btree (user_id);
create index idx_log_ins_202103_userid on log_ins_202103 using btree (user_id);
create index idx_log_ins_202104_userid on log_ins_202104 using btree (user_id);
create index idx_log_ins_202105_userid on log_ins_202105 using btree (user_id);
create index idx_log_ins_202106_userid on log_ins_202106 using btree (user_id);
create index idx_log_ins_202107_userid on log_ins_202107 using btree (user_id);
create index idx_log_ins_202108_userid on log_ins_202108 using btree (user_id);
create index idx_log_ins_202109_userid on log_ins_202109 using btree (user_id);
create index idx_log_ins_202110_userid on log_ins_202110 using btree (user_id);
create index idx_log_ins_202111_userid on log_ins_202111 using btree (user_id);
create index idx_log_ins_202112_userid on log_ins_202112 using btree (user_id);
user_id对于分区表log_ins来说,这是非分区键。
场景一、根据user_id检索,普通表和分区表性能差异如何?
select * from log where user_id = ?;
select * from log_ins where user_id = ?;
查找一个在表log和log_ins都存在的user_id;
select * from log,log_ins where log.user_id = log_ins.user_id limit 1;
postgres=# select * from log,log_ins where log.user_id = log_ins.user_id limit 1;
id | user_id | create_time | id | user_id | create_time
-------+----------+---------------------+--------+----------+---------------------
47269 | 16283799 | 2021-01-02 19:48:00 | 525676 | 16283799 | 2020-12-01 01:14:00
(1 row)
普通表log上的执行计划:
explain select * from log where user_id = '16283799';
postgres=# explain select * from log where user_id = '16283799';
QUERY PLAN
---------------------------------------------------------------------------
Index Scan using idx_log_userid on log (cost=0.42..8.44 rows=1 width=16)
Index Cond: (user_id = 16283799)
(2 rows)
explain select * from log_ins where user_id = '16283799';
postgres=# explain select * from log_ins where user_id = '16283799';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..100.77 rows=14 width=16)
-> Seq Scan on log_ins log_ins_1 (cost=0.00..0.00 rows=1 width=16)
Filter: (user_id = 16283799)
-> Index Scan using idx_his_userid on log_ins_history log_ins_2 (cost=0.29..8.31 rows=1 width=16)
Index Cond: (user_id = 16283799)
-> Index Scan using idx_log_ins_202101_userid on log_ins_202101 log_ins_3 (cost=0.29..8.31 rows=1 width=16)
Index Cond: (user_id = 16283799)
-> Index Scan using idx_log_ins_202102_userid on log_ins_202102 log_ins_4 (cost=0.29..8.31 rows=1 width=16)
Index Cond: (user_id = 16283799)
-> Index Scan using idx_log_ins_202103_userid on log_ins_202103 log_ins_5 (cost=0.29..8.31 rows=1 width=16)
Index Cond: (user_id = 16283799)
-> Index Scan using idx_log_ins_202104_userid on log_ins_202104 log_ins_6 (cost=0.29..8.31 rows=1 width=16)
Index Cond: (user_id = 16283799)
-> Index Scan using idx_log_ins_202105_userid on log_ins_202105 log_ins_7 (cost=0.29..8.31 rows=1 width=16)
Index Cond: (user_id = 16283799)
-> Index Scan using idx_log_ins_202106_userid on log_ins_202106 log_ins_8 (cost=0.29..8.31 rows=1 width=16)
Index Cond: (user_id = 16283799)
-> Index Scan using idx_log_ins_202107_userid on log_ins_202107 log_ins_9 (cost=0.29..8.31 rows=1 width=16)
Index Cond: (user_id = 16283799)
-> Index Scan using idx_log_ins_202108_userid on log_ins_202108 log_ins_10 (cost=0.29..8.31 rows=1 width=16)
Index Cond: (user_id = 16283799)
-> Index Scan using idx_log_ins_202109_userid on log_ins_202109 log_ins_11 (cost=0.29..8.31 rows=1 width=16)
Index Cond: (user_id = 16283799)
-> Index Scan using idx_log_ins_202110_userid on log_ins_202110 log_ins_12 (cost=0.29..8.31 rows=1 width=16)
Index Cond: (user_id = 16283799)
-> Index Scan using idx_log_ins_202111_userid on log_ins_202111 log_ins_13 (cost=0.29..8.31 rows=1 width=16)
Index Cond: (user_id = 16283799)
-> Seq Scan on log_ins_202112 log_ins_14 (cost=0.00..1.01 rows=1 width=16)
Filter: (user_id = 16283799)
(29 rows)
执行sql三次花费0.427 ms,0.645 ms,0.430 ms:
select * from log where user_id = '16283799';
执行sql三次花费1.290 ms,1.147 ms,1.021 ms:
select * from log_ins where user_id = '16283799';
场景二、根据create_time检索
create_time字段是分区表log_ins分区键。
select * from log where create_time > '2021-01-01' and create_time < '2021-01-02';
select * from log_ins where create_time > '2021-01-01' and create_time < '2021-01-02';
执行sql三次花费1.734 ms,1.630 ms,1.884 ms:
select * from log where create_time > '2021-01-01' and create_time < '2021-01-02';
执行sql三次花费2.567 ms,1.997 ms,2.152 ms:
select * from log_ins where create_time > '2021-01-01' and create_time < '2021-01-02';
执行sql三次花费1.690 ms,1.742 ms,1.766 ms:
select * from log_ins_202101 where create_time > '2021-01-01' and create_time < '2021-01-02';
PG14普通表、传统分区表性能对比
查询场景 普通表:log执行时间 分区表:查询log_ins父表执行时间 分区表:查询log_ins子表执行时间
根据非分区键user_id查询 0.427 ms 1.021 ms 不支持
根据分区键create_time范围查询 1.630 ms 1.997 ms 1.690 ms
结论:
1.根据非分区键user_id查询相比普通表性能差距较大。
2.根据分区键create_time范围查询相比普通表性能有小幅降低,
而查询分区表子表性能比普通表略有提升。
2.10.传统分区表注意事项
(1)当往父表上插入数据时,需要先在父表上创建路由函数和触发器。
(2)分区表上的索引、约束需要使用单独的命令创建。
(3)父表和子表允许单独定义主键,因此父表和子表可能存在重复的主键记录,
目前不支持在分区表上定义全局主键。
(4)update时不建议更新分区键数据,特别会使数据从一个分区移动到另一个分区的场景,
可通过更新触发器实现,但会带来管理上的成本。