文章目录
前言
一、PG分区表-内置分区表
PostgreSQL10一个重量级新特性是支持内置分区表,目前支持范围分区和列表分区。
1.创建分区表
创建分区表的主要语法包含两部分:创建主表和创建分区。
创建主表:
create table table_name (...)
[ partition by { range | list } ({ column_name | ( expression )})]
创建主表时须指定分区方式,可选的分区的方式为RANGE范围分区或LIST列表分区
创建分区:
create table table_name
partition of parent_table [()] for values partition_bound_spec
partition_bound_spec:范围分区,指定每个分区分区键的取值范围;
partition_bound_spec:列表分区,需指定每个分区的分区键值。
pg10创建内置分区表主要分为以下几个步骤:
(1)创建父表,指定分区键和分区策略。
(2)创建分区,创建分区时须指定分区表的父表和分区键的取值范围,
注意分区键的范围不要有重叠,否则会报错。
(3)在分区上创建相应索引,分区键上的索引是必须的,非分区键的索引可
根据实际应用场景选择是否创建。
范围分区表:
create table log_par
(
id serial,
user_id int4,
create_time timestamp(0) without time zone
)partition by range (create_time);
以上分区策略为范围分区。
创建分区,并设置分区的分区键取值范围:
create table log_par_history partition of log_par for values from (UNBOUNDED) to ('2021-01-01');
postgres=# create table log_par_history partition of log_par for values from (UNBOUNDED) to ('2021-01-01');
ERROR: cannot use column reference in partition bound expression
LINE 1: ...par_history partition of log_par for values from (UNBOUNDED)...
UNBOUNDED
Not sure where the book gets it from, but according to the manual, that should be maxvalue:
The special values MINVALUE and MAXVALUE may be used when
creating a range partition to indicate that there is no lower or upper bound on the column's value
不知道这本书是从哪里得到的,但根据手册,应该是最大值:
创建范围分区时可以使用特殊值 MINVALUE 和 MAXVALUE 来指示列值没有下限或上限
CREATE TABLE logs_gt_2011 PARTITION OF logs
FOR VALUES FROM ('2012-01-01') TO (maxvalue);
Share
Improve this answer
Follow
'
create table log_par_history partition of log_par for values from (MINVALUE) to ('2021-01-01');
create table log_par_202101 partition of log_par for values from ('2021-01-01') to ('2021-02-01');
create table log_par_202102 partition of log_par for values from ('2021-02-01') to ('2021-03-01');
create table log_par_202103 partition of log_par for values from ('2021-03-01') to ('2021-04-01');
create table log_par_202104 partition of log_par for values from ('2021-04-01') to ('2021-05-01');
create table log_par_202105 partition of log_par for values from ('2021-05-01') to ('2021-06-01');
create table log_par_202106 partition of log_par for values from ('2021-06-01') to ('2021-07-01');
create table log_par_202107 partition of log_par for values from ('2021-07-01') to ('2021-08-01');
create table log_par_202108 partition of log_par for values from ('2021-08-01') to ('2021-09-01');
create table log_par_202109 partition of log_par for values from ('2021-09-01') to ('2021-10-01');
create table log_par_202110 partition of log_par for values from ('2021-10-01') to ('2021-11-01');
create table log_par_202111 partition of log_par for values from ('2021-11-01') to ('2021-12-01');
create table log_par_202112 partition of log_par for values from ('2021-12-01') to ('2022-01-01');
注意分区的分区键范围不要有重叠,定义分区键范围实质上给分区创建了约束。
给所有分区的分区键创建索引:
create index idx_log_par_his_ctime on log_par_history using btree (create_time);
create index idx_log_par_202101_ctime on log_par_202101 using btree (create_time);
create index idx_log_par_202102_ctime on log_par_202102 using btree (create_time);
create index idx_log_par_202103_ctime on log_par_202103 using btree (create_time);
create index idx_log_par_202104_ctime on log_par_202104 using btree (create_time);
create index idx_log_par_202105_ctime on log_par_202105 using btree (create_time);
create index idx_log_par_202106_ctime on log_par_202106 using btree (create_time);
create index idx_log_par_202107_ctime on log_par_202107 using btree (create_time);
create index idx_log_par_202108_ctime on log_par_202108 using btree (create_time);
create index idx_log_par_202109_ctime on log_par_202109 using btree (create_time);
create index idx_log_par_202110_ctime on log_par_202110 using btree (create_time);
create index idx_log_par_202111_ctime on log_par_202111 using btree (create_time);
create index idx_log_par_202112_ctime on log_par_202112 using btree (create_time);
postgres=# \d log_par;
Partitioned table "public.log_par"
Column | Type | Collation | Nullable | Default
-------------+--------------------------------+-----------+----------+-------------------------------------
id | integer | | not null | nextval('log_par_id_seq'::regclass)
user_id | integer | | |
create_time | timestamp(0) without time zone | | |
Partition key: RANGE (create_time)
Number of partitions: 13 (Use \d+ to list them.)
postgres=# \d+ log_par;
Partitioned table "public.log_par"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-------------+--------------------------------+-----------+----------+-------------------------------------+---------+-------------+--------------+-------------
id | integer | | not null | nextval('log_par_id_seq'::regclass) | plain | | |
user_id | integer | | | | plain | | |
create_time | timestamp(0) without time zone | | | | plain |
| |
Partition key: RANGE (create_time)
Partitions: log_par_202101 FOR VALUES FROM ('2021-01-01 00:00:00') TO ('2021-02-01 00:00:00'),
log_par_202102 FOR VALUES FROM ('2021-02-01 00:00:00') TO ('2021-03-01 00:00:00'),
log_par_202103 FOR VALUES FROM ('2021-03-01 00:00:00') TO ('2021-04-01 00:00:00'),
log_par_202104 FOR VALUES FROM ('2021-04-01 00:00:00') TO ('2021-05-01 00:00:00'),
log_par_202105 FOR VALUES FROM ('2021-05-01 00:00:00') TO ('2021-06-01 00:00:00'),
log_par_202106 FOR VALUES FROM ('2021-06-01 00:00:00') TO ('2021-07-01 00:00:00'),
log_par_202107 FOR VALUES FROM ('2021-07-01 00:00:00') TO ('2021-08-01 00:00:00'),
log_par_202108 FOR VALUES FROM ('2021-08-01 00:00:00') TO ('2021-09-01 00:00:00'),
log_par_202109 FOR VALUES FROM ('2021-09-01 00:00:00') TO ('2021-10-01 00:00:00'),
log_par_202110 FOR VALUES FROM ('2021-10-01 00:00:00') TO ('2021-11-01 00:00:00'),
log_par_202111 FOR VALUES FROM ('2021-11-01 00:00:00') TO ('2021-12-01 00:00:00'),
log_par_202112 FOR VALUES FROM ('2021-12-01 00:00:00') TO ('2022-01-01 00:00:00'),
log_par_history FOR VALUES FROM (MINVALUE) TO ('2021-01-01 00:00:00')
2.使用分区表
向分区表插入数据:
insert into log_par (user_id,create_time)
select round(100000000*random()),generate_series('2020-12-01'::date,'2021-12-01'::date,'1 minute');
查询数据:
select count(*) from log_par;
select count(*) from only log_par;
postgres=# select count(*) from log_par;
count
--------
525601
(1 row)
postgres=# select count(*) from only log_par;
count
-------
0
(1 row)
postgres=# \dt+ log_par*
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+-----------------+-------------------+----------+-------------+---------------+------------+-------------
public | log_par | partitioned table | postgres | permanent | | 0 bytes |
public | log_par_202101 | table | postgres | permanent | heap | 1968 kB |
public | log_par_202102 | table | postgres | permanent | heap | 1776 kB |
public | log_par_202103 | table | postgres | permanent | heap | 1968 kB |
public | log_par_202104 | table | postgres | permanent | heap | 1904 kB |
public | log_par_202105 | table | postgres | permanent | heap | 1968 kB |
public | log_par_202106 | table | postgres | permanent | heap | 1904 kB |
public | log_par_202107 | table | postgres | permanent | heap | 1968 kB |
public | log_par_202108 | table | postgres | permanent | heap | 1968 kB |
public | log_par_202109 | table | postgres | permanent | heap | 1904 kB |
public | log_par_202110 | table | postgres | permanent | heap | 1968 kB |
public | log_par_202111 | table | postgres | permanent | heap | 1904 kB |
public | log_par_202112 | table | postgres | permanent | heap | 8192 bytes |
public | log_par_history | table | postgres | permanent | heap | 1968 kB |
(14 rows)
查询表所有分区:
select
nmsp_parent.nspname as parent_schema,
parent.relname as parent,
nmsp_child.nspname as child_schema,
child.relname as child_schema
from
pg_inherits join pg_class parent
on pg_inherits.inhparent = parent.oid join pg_class child
on pg_inherits.inhrelid = child.oid join pg_namespace nmsp_parent
on nmsp_parent.oid = parent.relnamespace join pg_namespace nmsp_child
on nmsp_child.oid = child.relnamespace
where
parent.relname = 'log_par';
3.内置分区表探索
postgres=# select
postgres-# nmsp_parent.nspname as parent_schema,
postgres-# parent.relname as parent,
postgres-# nmsp_child.nspname as child_schema,
postgres-# child.relname as child_schema
postgres-# from
postgres-# pg_inherits join pg_class parent
postgres-# on pg_inherits.inhparent = parent.oid join pg_class child
postgres-# on pg_inherits.inhrelid = child.oid join pg_namespace nmsp_parent
postgres-# on nmsp_parent.oid = parent.relnamespace join pg_namespace nmsp_child
postgres-# on nmsp_child.oid = child.relnamespace
postgres-# where
postgres-# parent.relname = 'log_par';
parent_schema | parent | child_schema | child_schema
---------------+---------+--------------+-----------------
public | log_par | public | log_par_history
public | log_par | public | log_par_202101
public | log_par | public | log_par_202102
public | log_par | public | log_par_202103
public | log_par | public | log_par_202104
public | log_par | public | log_par_202105
public | log_par | public | log_par_202106
public | log_par | public | log_par_202107
public | log_par | public | log_par_202108
public | log_par | public | log_par_202109
public | log_par | public | log_par_202110
public | log_par | public | log_par_202111
public | log_par | public | log_par_202112
(13 rows)
4.添加分区
create table log_par_202201 partition of log_par for values from ('2022-01-01') to ('2022-02-01');
create table log_par_202202 partition of log_par for values from ('2022-02-01') to ('2022-03-01');
给分区创建索引
create index idx_log_ins_202201_ctime on log_par_202201 using btree (create_time);
create index idx_log_ins_202202_ctime on log_par_202202 using btree (create_time);
5.删除分区
第一种方法:
drop table log_par_202202;
另一种推荐的方法是解绑分区:
alter table log_par detach partition log_par_202201;
如果后续需要恢复这个分区,通过连接分区方式恢复分区即可:
alter table log_par attach partition log_par_202201 for values from ('2022-01-01') to ('2022-02-01');
6.性能测试
explain analyze select * from log_par where create_time > '2021-01-01' and create_time < '2021-01-02';
postgres=# explain analyze select * from log_par where create_time > '2021-01-01' and create_time < '2021-01-02';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_log_par_202101_ctime on log_par_202101 log_par (cost=0.29..60.57 rows=1464 width=16) (actual time=0.019..0.276 rows=1439 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
Planning Time: 0.393 ms
Execution Time: 0.365 ms
(4 rows)
基于分区表的分区键、非分区键查询和普通表性能有何差异?
在分区表log_par所有子表的user_id上创建索引:
create index idx_log_par_his_userid on log_par_history using btree (user_id);
create index idx_log_par_202101_userid on log_par_202101 using btree (user_id);
create index idx_log_par_202102_userid on log_par_202102 using btree (user_id);
create index idx_log_par_202103_userid on log_par_202103 using btree (user_id);
create index idx_log_par_202104_userid on log_par_202104 using btree (user_id);
create index idx_log_par_202105_userid on log_par_202105 using btree (user_id);
create index idx_log_par_202106_userid on log_par_202106 using btree (user_id);
create index idx_log_par_202107_userid on log_par_202107 using btree (user_id);
create index idx_log_par_202108_userid on log_par_202108 using btree (user_id);
create index idx_log_par_202109_userid on log_par_202109 using btree (user_id);
create index idx_log_par_202110_userid on log_par_202110 using btree (user_id);
create index idx_log_par_202111_userid on log_par_202111 using btree (user_id);
create index idx_log_par_202112_userid on log_par_202112 using btree (user_id);
场景一、根据user_id检索,对于分区表log_par而言这是非分区键
select * from log where user_id = ?;
select * from log_par where user_id = ?;
查找相同的user_id
select a.* from log a,log_par b where a.user_id = b.user_id limit 1;
postgres=# select a.* from log a,log_par b where a.user_id = b.user_id limit 1;
id | user_id | create_time
--------+----------+---------------------
132279 | 81924442 | 2021-03-02 20:38:00
(1 row)
普通表log,根据user_id检索:
explain select * from log where user_id = 81924442;
postgres=# explain select * from log where user_id = 81924442;
QUERY PLAN
---------------------------------------------------------------------------
Index Scan using idx_log_userid on log (cost=0.42..8.44 rows=1 width=16)
Index Cond: (user_id = 81924442)
(2 rows)
分区表log_par,根据user_id检索:
explain select * from log_par where user_id = 81924442;
postgres=# explain select * from log_par where user_id = 81924442;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Gather (cost=1000.00..7763.01 rows=30 width=16)
Workers Planned: 2
-> Parallel Append (cost=0.00..6760.01 rows=20 width=16)
-> Parallel Seq Scan on log_par_history log_par_1 (cost=0.00..570.24 rows=1 width=16)
Filter: (user_id = 81924442)
-> Parallel Seq Scan on log_par_202101 log_par_2 (cost=0.00..570.24 rows=1 width=16)
Filter: (user_id = 81924442)
-> Parallel Seq Scan on log_par_202103 log_par_4 (cost=0.00..570.24 rows=1 width=16)
Filter: (user_id = 81924442)
-> Parallel Seq Scan on log_par_202105 log_par_6 (cost=0.00..570.24 rows=1 width=16)
Filter: (user_id = 81924442)
-> Parallel Seq Scan on log_par_202107 log_par_8 (cost=0.00..570.24 rows=1 width=16)
Filter: (user_id = 81924442)
-> Parallel Seq Scan on log_par_202108 log_par_9 (cost=0.00..570.24 rows=1 width=16)
Filter: (user_id = 81924442)
-> Parallel Seq Scan on log_par_202110 log_par_11 (cost=0.00..570.24 rows=1 width=16)
Filter: (user_id = 81924442)
-> Parallel Seq Scan on log_par_202104 log_par_5 (cost=0.00..551.65 rows=1 width=16)
Filter: (user_id = 81924442)
-> Parallel Seq Scan on log_par_202106 log_par_7 (cost=0.00..551.65 rows=1 width=16)
Filter: (user_id = 81924442)
-> Parallel Seq Scan on log_par_202109 log_par_10 (cost=0.00..551.65 rows=1 width=16)
Filter: (user_id = 81924442)
-> Parallel Seq Scan on log_par_202111 log_par_12 (cost=0.00..551.65 rows=1 width=16)
Filter: (user_id = 81924442)
-> Parallel Seq Scan on log_par_202102 log_par_3 (cost=0.00..514.47 rows=1 width=16)
Filter: (user_id = 81924442)
-> Parallel Seq Scan on log_par_202112 log_par_13 (cost=0.00..23.60 rows=5 width=16)
Filter: (user_id = 81924442)
-> Parallel Seq Scan on log_par_202201 log_par_14 (cost=0.00..23.60 rows=5 width=16)
Filter: (user_id = 81924442)
(31 rows)
log表,sql执行三次,最短时间0.430 ms:
select * from log where user_id = 81924442;
postgres=# \timing
Timing is on.
postgres=# select * from log where user_id = 81924442;
id | user_id | create_time
--------+----------+---------------------
132279 | 81924442 | 2021-03-02 20:38:00
(1 row)
Time: 0.705 ms
postgres=# select * from log where user_id = 81924442;
id | user_id | create_time
--------+----------+---------------------
132279 | 81924442 | 2021-03-02 20:38:00
(1 row)
Time: 0.533 ms
postgres=# select * from log where user_id = 81924442;
id | user_id | create_time
--------+----------+---------------------
132279 | 81924442 | 2021-03-02 20:38:00
(1 row)
Time: 0.430 ms
log_par表,sql执行三次,最短时间57.879 ms:
select * from log_par where user_id = 81924442;
postgres=# select * from log_par where user_id = 81924442;
id | user_id | create_time
----+----------+---------------------
7 | 81924442 | 2020-12-01 00:06:00
(1 row)
Time: 59.518 ms
postgres=# select * from log_par where user_id = 81924442;
id | user_id | create_time
----+----------+---------------------
7 | 81924442 | 2020-12-01 00:06:00
(1 row)
Time: 61.964 ms
postgres=# select * from log_par where user_id = 81924442;
id | user_id | create_time
----+----------+---------------------
7 | 81924442 | 2020-12-01 00:06:00
(1 row)
Time: 57.879 ms
场景二、根据create_time检索,create_time字段分区表log_par分区键。
select * from log where create_time > '2021-01-01' and create_time < '2021-01-02';
select * from log_par where create_time > '2021-01-01' and create_time < '2021-01-02';
log表,sql执行三次,1.722 ms,1.647 ms,1.863 ms,最短时间1.647 ms:
select * from log where create_time > '2021-01-01' and create_time < '2021-01-02';
log_par表,log_par执行三次,1.668 ms,1.840 ms,1.798 ms,最短时间1.668 ms:
select * from log_par where create_time > '2021-01-01' and create_time < '2021-01-02';
log_par_202101表,log_par_202101执行三次,1.776 ms,1.629 ms,1.565 ms,最短时间1.565 ms:
select * from log_par_202101 where create_time > '2021-01-01' and create_time < '2021-01-02';
PG14普通表、内置分区表性能对比
查询场景 普通表:log执行时间 分区表:查询log_par父表执行时间 分区表:查询log_ins子表执行时间
根据非分区键user_id查询 0.430 ms 57.879 ms 不支持
根据分区键create_time范围查询 1.647 ms 1.668 ms 1.565 ms
结论:
(1)内置分区表根据非分区键查询相比普通表性能差距较大,
因为这种场景分区表的执行计划会扫描所有分区。
(2)内置分区表根据分区键查询相比普通表性能有小幅降低,
而查询分区表子表性能比普通表略有提升。
7.constraint_exclusion参数
set constraint_exclusion = off;
explain analyze select * from log_par where create_time > '2021-01-01' and create_time < '2021-01-02';
postgres=# show constraint_exclusion;
constraint_exclusion
----------------------
off
(1 row)
Time: 0.281 ms
postgres=# explain analyze select * from log_par where create_time > '2021-01-01' and create_time < '2021-01-02';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_log_par_202101_ctime on log_par_202101 log_par (cost=0.29..60.57 rows=1464 width=16) (actual time
=0.023..0.284 rows=1439 loops=1)
Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
Planning Time: 0.221 ms
Execution Time: 0.415 ms
(4 rows)
Time: 1.004 ms
8.PG14更新分区数据
PG10之前内置分区表update操作目前不支持更新记录跨分区的情况,
也就是说只允许分区内的更新:
PG14可以支持更新记录跨分区的情况:
select * from log_par_202101 limit 1;
postgres=# select * from log_par_202101 limit 1;
id | user_id | create_time
-------+----------+---------------------
44641 | 91332064 | 2021-01-01 00:00:00
(1 row)
update log_par set create_time = '2021-02-02 01:01:01' where user_id = 91332064;
postgres=# update log_par set create_time = '2021-02-02 01:01:01' where user_id = 91332064;
UPDATE 1
Time: 41.417 ms
9.内置分区注意事项
(1)当往父表上插入数据时,数据会自动根据分区键路由规则插入到分区中,目前仅支持范围分区和列表分区。
(2)分区表上的索引、约束需使用单独的命令创建,目前没有办法一次性自动在所有分区上创建索引、约束。
(3)内置分区表不支持定义(全局)主键,在分区表的分区上创建主键是可以的。
(4)内置分区表的内部实现了继承。
(5)update语句的新记录违反当前分区键的约束则会报错,
(pg10,update语句的新记录目前不支持跨分区的情况;pg14,update语句的新记录目前支持跨分区的情况)。