14.PG分区表-传统分区表

前言

一、PG分区表-传统分区表

pg10版本之前pg分区表一般通过继承加触发器方式实现,这种分区方式不能算是内置分区表,而且步骤非常繁琐。
pg10版本一个重量级的新特性是支持内置分区表。pg10支持范围分区和列表分区。

1.分区表的意义

分区表的优势主要体现在降低大表管理成本和某些场景的性能提升。

分区表主要有以下优势:
(1)当查询或更新一个分区上的大部分数据时,对分区进行索引扫描代价很大,
然而,在分区上使用顺序扫描能提升性能。

(2)当需要删除一个分区数据时,通过drop table删除一个分区,远比delete删除数据高效,
特别适用于日志数据场景。

(3)由于一个表只能存储在一个表空间上,使用分区表后,可以将分区放到不同表的表空间上。
可以将分区放到廉价的存储设备上,也可以将系统常访问的分区存储在高速存储上。

2.传统分区表

传统分区表是通过继承和触发器方式实现的。
需要定义父表、定义子表、定义子表约束、创建子表索引、创建分区插入、删除、修改函数和触发器等。
继承是传统分区表的重要组成部分。

2.1.继承表

首先定义一张父表,之后创建子表并继承父表。
创建一张日志模型表tbl_log父表:
create table tbl_log
(
id int4,
create_date date,
log_type text
);

创建一张子表tbl_log_sql用于存储sql日志:
create table tbl_log_sql(sql text) inherits (tbl_log);

查看tbl_log_sql表结构
\d tbl_log_sql
postgres=# \d tbl_log_sql
               Table "public.tbl_log_sql"
   Column    |  Type   | Collation | Nullable | Default
-------------+---------+-----------+----------+---------
 id          | integer |           |          |
 create_date | date    |           |          |
 log_type    | text    |           |          |
 sql         | text    |           |          |
Inherits: tbl_log

父表和子表都可以插入数据
父表:
insert into tbl_log values (1,'2017-08-26',null);

子表:
insert into tbl_log_sql values (1,'2017-08-27',null,'select 2');

查询父表:
不加only,显示与父表与子表字段相同的数据
postgres=# select * from tbl_log;
 id | create_date | log_type
----+-------------+----------
  1 | 2017-08-26  |
  1 | 2017-08-27  |
(2 rows)

加only只显示父表数据
postgres=# select * from only tbl_log;
 id | create_date | log_type
----+-------------+----------
  1 | 2017-08-26  |
(1 row)

确定数据来源于哪些表,可通过以下SQL查看表的OID:
select tableoid,* from tbl_log;
postgres=# select tableoid,* from tbl_log;
 tableoid | id | create_date | log_type
----------+----+-------------+----------
    16410 |  1 | 2017-08-26  |
    16415 |  1 | 2017-08-27  |
(2 rows)

tableoid是表的隐藏字段,表示表的OID,可通过pg_class系统表关联找到表名。
select p.relname,c.*
from tbl_log c,pg_class p
where c.tableoid = p.oid;

postgres=# select p.relname,c.*
postgres-# from tbl_log c,pg_class p
postgres-# where c.tableoid = p.oid;
   relname   | id | create_date | log_type
-------------+----+-------------+----------
 tbl_log     |  1 | 2017-08-26  |
 tbl_log_sql |  1 | 2017-08-27  |
(2 rows)

如果父表名称前没有加only,则会对父表和所有子表进行DML操作。
删除父表数据:
delete from tbl_log;
select * from tbl_log;

postgres=# delete from tbl_log;
DELETE 2
postgres=# select * from tbl_log;
 id | create_date | log_type
----+-------------+----------
(0 rows)

2.2.创建分区表

(1)创建父表,如果父表上定义了约束,子表会继承;父表不应该写入数据。
(2)通过inherits方式创建继承表,也称之为子表或分区。
(3)给所有子表创建约束。
(4)给所有子表创建索引,由于继承操作不会继承父表上的索引,因此索引需要手工创建。
(5)在父表上定义insert、delete、update触发器,将sql分发到对应分区,
这步可选,因为应用可以根据分区规则定位到对应分区进行DML操作。
(6)启用constraint_exclusion参数,如果这个参数设置成off,则父表上sql性能会降低。

(1)创建父表
create table log_ins
(
id serial,
user_id int4,
create_time timestamp(0) without time zone
);

(2)创建13张子表
create table log_ins_history(check(create_time < '2021-01-01')) inherits(log_ins);
create table log_ins_202101(check(create_time >= '2021-01-01' and create_time < '2021-02-01')) inherits(log_ins);
create table log_ins_202102(check(create_time >= '2021-02-01' and create_time < '2021-03-01')) inherits(log_ins);
create table log_ins_202103(check(create_time >= '2021-03-01' and create_time < '2021-04-01')) inherits(log_ins);
create table log_ins_202104(check(create_time >= '2021-04-01' and create_time < '2021-05-01')) inherits(log_ins);
create table log_ins_202105(check(create_time >= '2021-05-01' and create_time < '2021-06-01')) inherits(log_ins);
create table log_ins_202106(check(create_time >= '2021-06-01' and create_time < '2021-07-01')) inherits(log_ins);
create table log_ins_202107(check(create_time >= '2021-07-01' and create_time < '2021-08-01')) inherits(log_ins);
create table log_ins_202108(check(create_time >= '2021-08-01' and create_time < '2021-09-01')) inherits(log_ins);
create table log_ins_202109(check(create_time >= '2021-09-01' and create_time < '2021-10-01')) inherits(log_ins);
create table log_ins_202110(check(create_time >= '2021-10-01' and create_time < '2021-11-01')) inherits(log_ins);
create table log_ins_202111(check(create_time >= '2021-11-01' and create_time < '2021-12-01')) inherits(log_ins);
create table log_ins_202112(check(create_time >= '2021-12-01' and create_time < '2022-01-01')) inherits(log_ins);

(3)给子表创建索引
create index idx_his_ctime on log_ins_history using btree (create_time);
create index idx_log_ins_202101_ctime on log_ins_202101 using btree (create_time);
create index idx_log_ins_202102_ctime on log_ins_202102 using btree (create_time);
create index idx_log_ins_202103_ctime on log_ins_202103 using btree (create_time);
create index idx_log_ins_202104_ctime on log_ins_202104 using btree (create_time);
create index idx_log_ins_202105_ctime on log_ins_202105 using btree (create_time);
create index idx_log_ins_202106_ctime on log_ins_202106 using btree (create_time);
create index idx_log_ins_202107_ctime on log_ins_202107 using btree (create_time);
create index idx_log_ins_202108_ctime on log_ins_202108 using btree (create_time);
create index idx_log_ins_202109_ctime on log_ins_202109 using btree (create_time);
create index idx_log_ins_202110_ctime on log_ins_202110 using btree (create_time);
create index idx_log_ins_202111_ctime on log_ins_202111 using btree (create_time);
create index idx_log_ins_202112_ctime on log_ins_202112 using btree (create_time);

(4)创建触发器函数,设置数据插入父表时的路由规则。
create or replace function log_ins_insert_trigger()
	returns trigger
	language plpgsql
as $function$
begin
	if (NEW.create_time < '2021-01-01') then
		insert into log_ins_history values (NEW.*);
	elsif (NEW.create_time >= '2021-01-01' and NEW.create_time < '2021-02-01') then
		insert into log_ins_202101 values (NEW.*);
	elsif (NEW.create_time >= '2021-02-01' and NEW.create_time < '2021-03-01') then
		insert into log_ins_202102 values (NEW.*);
	elsif (NEW.create_time >= '2021-03-01' and NEW.create_time < '2021-04-01') then
		insert into log_ins_202103 values (NEW.*);
    elsif (NEW.create_time >= '2021-04-01' and NEW.create_time < '2021-05-01') then
		insert into log_ins_202104 values (NEW.*);
    elsif (NEW.create_time >= '2021-05-01' and NEW.create_time < '2021-06-01') then
		insert into log_ins_202105 values (NEW.*);
    elsif (NEW.create_time >= '2021-06-01' and NEW.create_time < '2021-07-01') then
		insert into log_ins_202106 values (NEW.*);
    elsif (NEW.create_time >= '2021-07-01' and NEW.create_time < '2021-08-01') then
		insert into log_ins_202107 values (NEW.*);
    elsif (NEW.create_time >= '2021-08-01' and NEW.create_time < '2021-09-01') then
		insert into log_ins_202108 values (NEW.*);
    elsif (NEW.create_time >= '2021-09-01' and NEW.create_time < '2021-10-01') then
		insert into log_ins_202109 values (NEW.*);
    elsif (NEW.create_time >= '2021-10-01' and NEW.create_time < '2021-11-01') then
		insert into log_ins_202110 values (NEW.*);
    elsif (NEW.create_time >= '2021-11-01' and NEW.create_time < '2021-12-01') then
		insert into log_ins_202111 values (NEW.*);
    elsif (NEW.create_time >= '2021-12-01' and NEW.create_time < '2022-01-01') then
		insert into log_ins_202112 values (NEW.*);
    else 
	    raise exception 'create_time out of range. Fix the log_ins_insert_trigger() function!';
	end if;
	return null;
end;
$function$;

new.*:是指要插入的数据行。

(5)在父表上定义插入触发器:
create trigger log_ins_insert_trigger before insert on log_ins for each row 
execute procedure log_ins_insert_trigger();

2.3.使用分区表

往父表log_ins插入数据,并验证数据是否插入对应分区。

insert into log_ins (user_id,create_time)
select round(100000000*random()),generate_series('2020-12-01'::date,'2021-12-01'::date,'1 minute');

round(100000000*random()):随机生成8位整数。
generate_series:生成时间数据。

查询数据如下:
select * from log_ins limit 2;

postgres=# select * from log_ins limit 2;
 id | user_id  |     create_time
----+----------+---------------------
  1 | 37324797 | 2020-12-01 00:00:00
  2 |  9736688 | 2020-12-01 00:01:00
(2 rows)

查看父表数据,发现父表里没有数据:
postgres=# select count(*) from only log_ins;
 count
-------
     0
(1 row)

postgres=# select count(*) from log_ins;
 count
--------
 525601
(1 row)

查看子表数据:
select min(create_time),max(create_time) from log_ins_202101;
postgres=# select min(create_time),max(create_time) from log_ins_202101;
         min         |         max
---------------------+---------------------
 2021-01-01 00:00:00 | 2021-01-31 23:59:00
(1 row)

查看子表大小:
postgres=# \dt+ log_ins*
                                          List of relations
 Schema |      Name       | Type  |  Owner   | Persistence | Access method |    Size    | Description
--------+-----------------+-------+----------+-------------+---------------+------------+-------------
 public | log_ins         | table | postgres | permanent   | heap          | 0 bytes    |
 public | log_ins_202101  | table | postgres | permanent   | heap          | 1968 kB    |
 public | log_ins_202102  | table | postgres | permanent   | heap          | 1776 kB    |
 public | log_ins_202103  | table | postgres | permanent   | heap          | 1968 kB    |
 public | log_ins_202104  | table | postgres | permanent   | heap          | 1904 kB    |
 public | log_ins_202105  | table | postgres | permanent   | heap          | 1968 kB    |
 public | log_ins_202106  | table | postgres | permanent   | heap          | 1904 kB    |
 public | log_ins_202107  | table | postgres | permanent   | heap          | 1968 kB    |
 public | log_ins_202108  | table | postgres | permanent   | heap          | 1968 kB    |
 public | log_ins_202109  | table | postgres | permanent   | heap          | 1904 kB    |
 public | log_ins_202110  | table | postgres | permanent   | heap          | 1968 kB    |
 public | log_ins_202111  | table | postgres | permanent   | heap          | 1904 kB    |
 public | log_ins_202112  | table | postgres | permanent   | heap          | 8192 bytes |
 public | log_ins_history | table | postgres | permanent   | heap          | 1968 kB    |
(14 rows)

结论:数据都已经插入到子表里。

2.4.查询父表还是子表

查询父表:
explain analyze select * from log_ins 
where create_time > '2021-01-01' and create_time < '2021-01-02'; 

postgres=# explain analyze select * from log_ins
postgres-# where create_time > '2021-01-01' and create_time < '2021-01-02';

                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..61.95 rows=1387 width=16) (actual time=0.029..0.406 rows=1439 loops=1)
   ->  Seq Scan on log_ins log_ins_1  (cost=0.00..0.00 rows=1 width=16) (actual time=0.005..0.005 rows=0 loops=1)
         Filter: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_log_ins_202101_ctime on log_ins_202101 log_ins_2  (cost=0.29..55.01 rows=1386 width=16) 
   (actual time=0.024..0.283 rows=1439 loops=1)
         Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
 Planning Time: 0.519 ms
 Execution Time: 0.492 ms
(7 rows)

结论:在log_ins_202101上进行了索引扫描,执行时间为0.492 ms

查询子表:
explain analyze select * from log_ins_202101 
where create_time > '2021-01-01' and create_time < '2021-01-02'; 

postgres=# explain analyze select * from log_ins_202101
postgres-# where create_time > '2021-01-01' and create_time < '2021-01-02';
                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_log_ins_202101_ctime on log_ins_202101  (cost=0.29..55.01 rows=1386 width=16) 
 (actual time=0.021..0.275 rows=1439 loops=1)
   Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
   AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
 Planning Time: 0.136 ms
 Execution Time: 0.358 ms
(4 rows)

结论:直接查询子表只需要0.358 ms。
结论:对于传统分区表分区方式,不建议应用访问父表,而是直接访问子表。

2.5.constraint_exclusion参数

constraint_exclusion:参数用来控制优化器是否根据表上的约束来优化查询。
on:所有表都通过约束优化查询。
off:所有表都不通过约束优化查询。
partition:只对继承表和union all子查询通过检索约束来优化查询。

就是说:设置成on或partition,查询父表时优化器会根据子表上的约束判断检索哪些子表。
而不需要扫描所有子表,从而提升查询性能。

查询默认设置参数:
postgres=# show constraint_exclusion;
 constraint_exclusion
----------------------
 partition
(1 row)

测试:
set constraint_exclusion=off;

postgres=# set constraint_exclusion=off;
SET
postgres=# show constraint_exclusion;
 constraint_exclusion
----------------------
 off
(1 row)


查询父表:
explain analyze select * from log_ins 
where create_time > '2021-01-01' and create_time < '2021-01-02'; 

postgres=# explain analyze select * from log_ins
postgres-# where create_time > '2021-01-01' and create_time < '2021-01-02';

                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..168.26 rows=1407 width=16) (actual time=0.048..0.587 rows=1439 loops=1)
   ->  Seq Scan on log_ins log_ins_1  (cost=0.00..0.00 rows=1 width=16) (actual time=0.003..0.003 rows=0 loops=1)
         Filter: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_his_ctime on log_ins_history log_ins_2  (cost=0.29..8.31 rows=1 width=16) 
   (actual time=0.017..0.017 rows=0 loops=1)
         Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_log_ins_202101_ctime on log_ins_202101 log_ins_3  (cost=0.29..55.01 rows=1386 width=16) 
   (actual time=0.027..0.295 rows=1439 loops=1)
         Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_log_ins_202102_ctime on log_ins_202102 log_ins_4  (cost=0.29..8.31 rows=1 width=16) 
   (actual time=0.017..0.017 rows=0 loops=1)
         Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_log_ins_202103_ctime on log_ins_202103 log_ins_5  (cost=0.29..8.31 rows=1 width=16) 
   (actual time=0.010..0.010 rows=0 loops=1)
         Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_log_ins_202104_ctime on log_ins_202104 log_ins_6  (cost=0.29..8.31 rows=1 width=16) 
   (actual time=0.015..0.015 rows=0 loops=1)
         Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_log_ins_202105_ctime on log_ins_202105 log_ins_7  (cost=0.29..8.31 rows=1 width=16) 
   (actual time=0.008..0.009 rows=0 loops=1)
         Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_log_ins_202106_ctime on log_ins_202106 log_ins_8  (cost=0.29..8.31 rows=1 width=16) 
   (actual time=0.016..0.016 rows=0 loops=1)
         Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_log_ins_202107_ctime on log_ins_202107 log_ins_9  (cost=0.29..8.31 rows=1 width=16) 
   (actual time=0.008..0.008 rows=0 loops=1)
         Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_log_ins_202108_ctime on log_ins_202108 log_ins_10  (cost=0.29..8.31 rows=1 width=16) 
   (actual time=0.017..0.017 rows=0 loops=1)
         Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_log_ins_202109_ctime on log_ins_202109 log_ins_11  (cost=0.29..8.31 rows=1 width=16) 
   (actual time=0.016..0.016 rows=0 loops=1)
         Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_log_ins_202110_ctime on log_ins_202110 log_ins_12  (cost=0.29..8.31 rows=1 width=16) 
   (actual time=0.012..0.012 rows=0 loops=1)
         Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Index Scan using idx_log_ins_202111_ctime on log_ins_202111 log_ins_13  (cost=0.29..8.31 rows=1 width=16) 
   (actual time=0.012..0.012 rows=0 loops=1)
         Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Bitmap Heap Scan on log_ins_202112 log_ins_14  (cost=4.24..14.81 rows=9 width=16) 
   (actual time=0.006..0.006 rows=0 loops=1)
         Recheck Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
		 AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
   ->  Bitmap Index Scan on idx_log_ins_202112_ctime  (cost=0.00..4.24 rows=9 width=0) 
		 (actual time=0.004..0.004 rows=0 loops=1)
         Index Cond: ((create_time > '2021-01-01 00:00:00'::timestamp without time zone) 
	     AND (create_time < '2021-01-02 00:00:00'::timestamp without time zone))
 Planning Time: 1.000 ms
 Execution Time: 0.778 ms
(33 rows)

结论:查询父表时扫描了所有分区,执行时间上升到了0.685
结论:建议constraint_exclusion参数设置成partition,不建议设置成on,
因为优化器通过检查约束来优化查询的方式本身就带来一定开销,
如果所有表都启用这个特性,将加重优化器的负担。

2.6.添加分区

方法一、降低对生产系统的影响:
(1)创建分区
create table log_ins_202201 (like log_ins including all);

(2)添加约束
alter table log_ins_202201 add constraint log_ins_202201_create_time_check
check (create_time >= '2022-01-01' and create_time < '2022-02-01');

(3)刷新触发器函数log_ins_insert_trigger()
create or replace function log_ins_insert_trigger()
	returns trigger
	language plpgsql
as $function$
begin
	if (NEW.create_time < '2021-01-01') then
		insert into log_ins_history values (NEW.*);
	elsif (NEW.create_time >= '2021-01-01' and NEW.create_time < '2021-02-01') then
		insert into log_ins_202101 values (NEW.*);
	elsif (NEW.create_time >= '2021-02-01' and NEW.create_time < '2021-03-01') then
		insert into log_ins_202102 values (NEW.*);
	elsif (NEW.create_time >= '2021-03-01' and NEW.create_time < '2021-04-01') then
		insert into log_ins_202103 values (NEW.*);
    elsif (NEW.create_time >= '2021-04-01' and NEW.create_time < '2021-05-01') then
		insert into log_ins_202104 values (NEW.*);
    elsif (NEW.create_time >= '2021-05-01' and NEW.create_time < '2021-06-01') then
		insert into log_ins_202105 values (NEW.*);
    elsif (NEW.create_time >= '2021-06-01' and NEW.create_time < '2021-07-01') then
		insert into log_ins_202106 values (NEW.*);
    elsif (NEW.create_time >= '2021-07-01' and NEW.create_time < '2021-08-01') then
		insert into log_ins_202107 values (NEW.*);
    elsif (NEW.create_time >= '2021-08-01' and NEW.create_time < '2021-09-01') then
		insert into log_ins_202108 values (NEW.*);
    elsif (NEW.create_time >= '2021-09-01' and NEW.create_time < '2021-10-01') then
		insert into log_ins_202109 values (NEW.*);
    elsif (NEW.create_time >= '2021-10-01' and NEW.create_time < '2021-11-01') then
		insert into log_ins_202110 values (NEW.*);
    elsif (NEW.create_time >= '2021-11-01' and NEW.create_time < '2021-12-01') then
		insert into log_ins_202111 values (NEW.*);
    elsif (NEW.create_time >= '2021-12-01' and NEW.create_time < '2022-01-01') then
		insert into log_ins_202112 values (NEW.*);
	elsif (NEW.create_time >= '2022-01-01' and NEW.create_time < '2022-02-01') then
		insert into log_ins_202201 values (NEW.*);
    else 
	    raise exception 'create_time out of range. Fix the log_ins_insert_trigger() function!';
	end if;
	return null;
end;
$function$;

(4)创建相关索引
create index idx_log_ins_202201_ctime on log_ins_202201 using btree (create_time);

(5)所有步骤完成后,将新区log_ins_202201继承到父表log_ins
alter table log_ins_202201 inherits log_ins;

方法二、正常步骤:
(1)创建子表
create table log_ins_202202
(check(create_time >= '2022-02-01' and create_time >= '2022-03-01')) inherits (log_ins);

(2)创建相关索引
create index idx_log_ins_202202_ctime on log_ins_202202 using btree (create_time);

(3)刷新触发器函数log_ins_insert_trigger()
create or replace function log_ins_insert_trigger()
	returns trigger
	language plpgsql
as $function$
begin
	if (NEW.create_time < '2021-01-01') then
		insert into log_ins_history values (NEW.*);
	elsif (NEW.create_time >= '2021-01-01' and NEW.create_time < '2021-02-01') then
		insert into log_ins_202101 values (NEW.*);
	elsif (NEW.create_time >= '2021-02-01' and NEW.create_time < '2021-03-01') then
		insert into log_ins_202102 values (NEW.*);
	elsif (NEW.create_time >= '2021-03-01' and NEW.create_time < '2021-04-01') then
		insert into log_ins_202103 values (NEW.*);
    elsif (NEW.create_time >= '2021-04-01' and NEW.create_time < '2021-05-01') then
		insert into log_ins_202104 values (NEW.*);
    elsif (NEW.create_time >= '2021-05-01' and NEW.create_time < '2021-06-01') then
		insert into log_ins_202105 values (NEW.*);
    elsif (NEW.create_time >= '2021-06-01' and NEW.create_time < '2021-07-01') then
		insert into log_ins_202106 values (NEW.*);
    elsif (NEW.create_time >= '2021-07-01' and NEW.create_time < '2021-08-01') then
		insert into log_ins_202107 values (NEW.*);
    elsif (NEW.create_time >= '2021-08-01' and NEW.create_time < '2021-09-01') then
		insert into log_ins_202108 values (NEW.*);
    elsif (NEW.create_time >= '2021-09-01' and NEW.create_time < '2021-10-01') then
		insert into log_ins_202109 values (NEW.*);
    elsif (NEW.create_time >= '2021-10-01' and NEW.create_time < '2021-11-01') then
		insert into log_ins_202110 values (NEW.*);
    elsif (NEW.create_time >= '2021-11-01' and NEW.create_time < '2021-12-01') then
		insert into log_ins_202111 values (NEW.*);
    elsif (NEW.create_time >= '2021-12-01' and NEW.create_time < '2022-01-01') then
		insert into log_ins_202112 values (NEW.*);
	elsif (NEW.create_time >= '2022-01-01' and NEW.create_time < '2022-02-01') then
		insert into log_ins_202201 values (NEW.*);
	elsif (NEW.create_time >= '2022-02-01' and NEW.create_time < '2022-03-01') then
		insert into log_ins_202202 values (NEW.*);
    else 
	    raise exception 'create_time out of range. Fix the log_ins_insert_trigger() function!';
	end if;
	return null;
end;
$function$;


postgres=# \d+ log_ins
                                                                     Table "public.log_ins"
   Column    |              Type              | Collation | Nullable |               Default               | Storage | Compression | Stats target | Description
-------------+--------------------------------+-----------+----------+-------------------------------------+---------+-------------+--------------+-------------
 id          | integer                        |           | not null | nextval('log_ins_id_seq'::regclass) | plain   |             |              |
 user_id     | integer                        |           |          |                                     | plain   |             |              |
 create_time | timestamp(0) without time zone |           |          |                                     | plain   |             |              |
Triggers:
    log_ins_insert_trigger BEFORE INSERT ON log_ins FOR EACH ROW EXECUTE FUNCTION log_ins_insert_trigger()
Child tables: log_ins_202101,
              log_ins_202102,
              log_ins_202103,
              log_ins_202104,
              log_ins_202105,
              log_ins_202106,
              log_ins_202107,
              log_ins_202108,
              log_ins_202109,
              log_ins_202110,
              log_ins_202111,
              log_ins_202112,
              log_ins_202201,
              log_ins_202202,
              log_ins_history
Access method: heap

2.7.删除分区

传统分区表删除分区通常有两种方法:
第一种方法,直接删除分区:
drop table log_ins_202202;
postgres=# \d+ log_ins;
                                                                     Table "public.log_ins"
   Column    |              Type              | Collation | Nullable |               Default               | Storage | Compression | Stats target | Description
-------------+--------------------------------+-----------+----------+-------------------------------------+---------+-------------+--------------+-------------
 id          | integer                        |           | not null | nextval('log_ins_id_seq'::regclass) | plain   |             |              |
 user_id     | integer                        |           |          |                                     | plain   |             |              |
 create_time | timestamp(0) without time zone |           |          |                                     | plain   |             |              |
Triggers:
    log_ins_insert_trigger BEFORE INSERT ON log_ins FOR EACH ROW EXECUTE FUNCTION log_ins_insert_trigger()
Child tables: log_ins_202101,
              log_ins_202102,
              log_ins_202103,
              log_ins_202104,
              log_ins_202105,
              log_ins_202106,
              log_ins_202107,
              log_ins_202108,
              log_ins_202109,
              log_ins_202110,
              log_ins_202111,
              log_ins_202112,
              log_ins_202201,
              log_ins_history
Access method: heap


第二种方法,先将分区的继承关系去掉,再删除表:
alter table log_ins_202201 no inherit log_ins;

postgres=# \d+ log_ins;
.........
.........
Triggers:
    log_ins_insert_trigger BEFORE INSERT ON log_ins FOR EACH ROW EXECUTE FUNCTION log_ins_insert_trigger()
Child tables: log_ins_202101,
              log_ins_202102,
              log_ins_202103,
              log_ins_202104,
              log_ins_202105,
              log_ins_202106,
              log_ins_202107,
              log_ins_202108,
              log_ins_202109,
              log_ins_202110,
              log_ins_202111,
              log_ins_202112,
              log_ins_history
Access method: heap

select * from log_ins_202201 limit 3;
postgres=# select * from log_ins_202201 limit 3;
 id | user_id | create_time
----+---------+-------------
(0 rows)

drop table log_ins_202201;

2.8.分区表的相关查询

如何查看分区表的定义、分区表分区信息呢?
第一种方法:

\d log_ins
postgres=# \d log_ins
                                          Table "public.log_ins"
   Column    |              Type              | Collation | Nullable |               Default
-------------+--------------------------------+-----------+----------+-------------------------------------
 id          | integer                        |           | not null | nextval('log_ins_id_seq'::regclass)
 user_id     | integer                        |           |          |
 create_time | timestamp(0) without time zone |           |          |
Triggers:
    log_ins_insert_trigger BEFORE INSERT ON log_ins FOR EACH ROW EXECUTE FUNCTION log_ins_insert_trigger()
Number of child tables: 13 (Use \d+ to list them.)

以上显示了有13个分区,并且创建了触发器。

如果想列出分区名称可通过\d+ log_ins
Triggers:
    log_ins_insert_trigger BEFORE INSERT ON log_ins FOR EACH ROW EXECUTE FUNCTION log_ins_insert_trigger()
Child tables: log_ins_202101,
              log_ins_202102,
              log_ins_202103,
              log_ins_202104,
              log_ins_202105,
              log_ins_202106,
              log_ins_202107,
              log_ins_202108,
              log_ins_202109,
              log_ins_202110,
              log_ins_202111,
              log_ins_202112,
              log_ins_history
Access method: heap

第二种方法,通过SQL命令:
select 
	nmsp_parent.nspname as parent_schema,
	parent.relname as parent,
	nmsp_child.nspname as child_schema,
	child.relname as child_schema
from 
	pg_inherits join pg_class parent
    on pg_inherits.inhparent = parent.oid join pg_class child
	on pg_inherits.inhrelid = child.oid join pg_namespace nmsp_parent
	on nmsp_parent.oid = parent.relnamespace join pg_namespace nmsp_child
	on nmsp_child.oid = child.relnamespace
where 
	parent.relname = 'log_ins';

postgres=# select
postgres-# nmsp_parent.nspname as parent_schema,
postgres-# parent.relname as parent,
postgres-# nmsp_child.nspname as child_schema,
postgres-# child.relname as child_schema
postgres-# from
postgres-# pg_inherits join pg_class parent
postgres-#     on pg_inherits.inhparent = parent.oid join pg_class child
postgres-# on pg_inherits.inhrelid = child.oid join pg_namespace nmsp_parent
postgres-# on nmsp_parent.oid = parent.relnamespace join pg_namespace nmsp_child
postgres-# on nmsp_child.oid = child.relnamespace
postgres-# where
postgres-# parent.relname = 'log_ins';
 parent_schema | parent  | child_schema |  child_schema
---------------+---------+--------------+-----------------
 public        | log_ins | public       | log_ins_history
 public        | log_ins | public       | log_ins_202101
 public        | log_ins | public       | log_ins_202102
 public        | log_ins | public       | log_ins_202103
 public        | log_ins | public       | log_ins_202104
 public        | log_ins | public       | log_ins_202105
 public        | log_ins | public       | log_ins_202106
 public        | log_ins | public       | log_ins_202107
 public        | log_ins | public       | log_ins_202108
 public        | log_ins | public       | log_ins_202109
 public        | log_ins | public       | log_ins_202110
 public        | log_ins | public       | log_ins_202111
 public        | log_ins | public       | log_ins_202112
(13 rows)

pg_inherits系统表记录了子表和父表之间的继承关系,查询列出指定分区表的分区。

查看一个库中有哪些分区表,并显示这些分区表的分区数量:
select
	nspname,
	relname,
	count(*) as partition_num
from
	pg_class c,
	pg_namespace n,
	pg_inherits i
where
	c.oid = i.inhparent
	and c.relnamespace = n.oid
	and c.relhassubclass
	and c.relkind in ('r','p')
group by 1,2 
order by partition_num desc;

postgres=# select
postgres-# nspname,
postgres-# relname,
postgres-# count(*) as partition_num
postgres-# from
postgres-# pg_class c,
postgres-# pg_namespace n,
postgres-# pg_inherits i
postgres-# where
postgres-# c.oid = i.inhparent
postgres-# and c.relnamespace = n.oid
postgres-# and c.relhassubclass
postgres-# and c.relkind in ('r','p')
postgres-# group by 1,2
postgres-# order by partition_num desc;
 nspname | relname | partition_num
---------+---------+---------------
 public  | log_ins |            13
 public  | tbl_log |             1
(2 rows)

log_ins分区表有13个分区
tbl_log分区表有1个分区

2.9.性能测试

基于分区表的分区键、非分区键查询和普通表性能有何差异?

将create_time字段作为传统分区表log_ins的分区键;
user_id字段作为分区表的非分区键。

创建一张普通表log,表结构和log_ins完全一致:
create table log 
(
id serial,
user_id int4,
create_time timestamp(0) without time zone
);

insert into log (user_id,create_time)
select round(100000000*random()),generate_series('2020-12-01'::date,'2021-12-01'::date,'1 minute');

查看两张表数量:
select count(*) from log_ins;
select count(*) from log;

postgres=# select count(*) from log_ins;
 count
--------
 525601
(1 row)

postgres=# select count(*) from log;
 count
--------
 525601
(1 row)

普通表log创建索引:
create index idx_log_userid on log using btree(user_id);
create index idx_log_create_time on log using btree(create_time);

在分区表log_ins父表和所有子表的user_id上创建索引:

父表:
create index log_ins_userid on log_ins_history using btree (user_id);

子表:
create index idx_his_userid on log_ins_history using btree (user_id);
create index idx_log_ins_202101_userid on log_ins_202101 using btree (user_id);
create index idx_log_ins_202102_userid on log_ins_202102 using btree (user_id);
create index idx_log_ins_202103_userid on log_ins_202103 using btree (user_id);
create index idx_log_ins_202104_userid on log_ins_202104 using btree (user_id);
create index idx_log_ins_202105_userid on log_ins_202105 using btree (user_id);
create index idx_log_ins_202106_userid on log_ins_202106 using btree (user_id);
create index idx_log_ins_202107_userid on log_ins_202107 using btree (user_id);
create index idx_log_ins_202108_userid on log_ins_202108 using btree (user_id);
create index idx_log_ins_202109_userid on log_ins_202109 using btree (user_id);
create index idx_log_ins_202110_userid on log_ins_202110 using btree (user_id);
create index idx_log_ins_202111_userid on log_ins_202111 using btree (user_id);
create index idx_log_ins_202112_userid on log_ins_202112 using btree (user_id);

user_id对于分区表log_ins来说,这是非分区键。

场景一、根据user_id检索,普通表和分区表性能差异如何?

select * from log where user_id = ?;
select * from log_ins where user_id = ?;

查找一个在表log和log_ins都存在的user_id;
select * from log,log_ins where log.user_id = log_ins.user_id limit 1;
postgres=# select * from log,log_ins where log.user_id = log_ins.user_id limit 1;
  id   | user_id  |     create_time     |   id   | user_id  |     create_time
-------+----------+---------------------+--------+----------+---------------------
 47269 | 16283799 | 2021-01-02 19:48:00 | 525676 | 16283799 | 2020-12-01 01:14:00
(1 row)

普通表log上的执行计划:
explain select * from log where user_id = '16283799';
postgres=# explain select * from log where user_id = '16283799';
                                QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using idx_log_userid on log  (cost=0.42..8.44 rows=1 width=16)
   Index Cond: (user_id = 16283799)
(2 rows)

explain select * from log_ins where user_id = '16283799';
postgres=# explain select * from log_ins where user_id = '16283799';
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..100.77 rows=14 width=16)
   ->  Seq Scan on log_ins log_ins_1  (cost=0.00..0.00 rows=1 width=16)
         Filter: (user_id = 16283799)
   ->  Index Scan using idx_his_userid on log_ins_history log_ins_2  (cost=0.29..8.31 rows=1 width=16)
         Index Cond: (user_id = 16283799)
   ->  Index Scan using idx_log_ins_202101_userid on log_ins_202101 log_ins_3  (cost=0.29..8.31 rows=1 width=16)
         Index Cond: (user_id = 16283799)
   ->  Index Scan using idx_log_ins_202102_userid on log_ins_202102 log_ins_4  (cost=0.29..8.31 rows=1 width=16)
         Index Cond: (user_id = 16283799)
   ->  Index Scan using idx_log_ins_202103_userid on log_ins_202103 log_ins_5  (cost=0.29..8.31 rows=1 width=16)
         Index Cond: (user_id = 16283799)
   ->  Index Scan using idx_log_ins_202104_userid on log_ins_202104 log_ins_6  (cost=0.29..8.31 rows=1 width=16)
         Index Cond: (user_id = 16283799)
   ->  Index Scan using idx_log_ins_202105_userid on log_ins_202105 log_ins_7  (cost=0.29..8.31 rows=1 width=16)
         Index Cond: (user_id = 16283799)
   ->  Index Scan using idx_log_ins_202106_userid on log_ins_202106 log_ins_8  (cost=0.29..8.31 rows=1 width=16)
         Index Cond: (user_id = 16283799)
   ->  Index Scan using idx_log_ins_202107_userid on log_ins_202107 log_ins_9  (cost=0.29..8.31 rows=1 width=16)
         Index Cond: (user_id = 16283799)
   ->  Index Scan using idx_log_ins_202108_userid on log_ins_202108 log_ins_10  (cost=0.29..8.31 rows=1 width=16)
         Index Cond: (user_id = 16283799)
   ->  Index Scan using idx_log_ins_202109_userid on log_ins_202109 log_ins_11  (cost=0.29..8.31 rows=1 width=16)
         Index Cond: (user_id = 16283799)
   ->  Index Scan using idx_log_ins_202110_userid on log_ins_202110 log_ins_12  (cost=0.29..8.31 rows=1 width=16)
         Index Cond: (user_id = 16283799)
   ->  Index Scan using idx_log_ins_202111_userid on log_ins_202111 log_ins_13  (cost=0.29..8.31 rows=1 width=16)
         Index Cond: (user_id = 16283799)
   ->  Seq Scan on log_ins_202112 log_ins_14  (cost=0.00..1.01 rows=1 width=16)
         Filter: (user_id = 16283799)
(29 rows)


执行sql三次花费0.427 ms,0.645 ms,0.430 ms:
select * from log where user_id = '16283799';

执行sql三次花费1.290 ms,1.147 ms,1.021 ms:
select * from log_ins where user_id = '16283799';

场景二、根据create_time检索
create_time字段是分区表log_ins分区键。
select * from log where create_time > '2021-01-01' and create_time < '2021-01-02';
select * from log_ins where create_time > '2021-01-01' and create_time < '2021-01-02';

执行sql三次花费1.734 ms,1.630 ms,1.884 ms:
select * from log where create_time > '2021-01-01' and create_time < '2021-01-02';

执行sql三次花费2.567 ms,1.997 ms,2.152 ms:
select * from log_ins where create_time > '2021-01-01' and create_time < '2021-01-02';

执行sql三次花费1.690 ms,1.742 ms,1.766 ms:
select * from log_ins_202101 where create_time > '2021-01-01' and create_time < '2021-01-02';

PG14普通表、传统分区表性能对比
查询场景                         普通表:log执行时间     分区表:查询log_ins父表执行时间    分区表:查询log_ins子表执行时间 
根据非分区键user_id查询          0.427 ms               1.021 ms                          不支持
根据分区键create_time范围查询    1.630 ms               1.997 ms                          1.690 ms

结论:
1.根据非分区键user_id查询相比普通表性能差距较大。
2.根据分区键create_time范围查询相比普通表性能有小幅降低,
而查询分区表子表性能比普通表略有提升。

2.10.传统分区表注意事项

(1)当往父表上插入数据时,需要先在父表上创建路由函数和触发器。
(2)分区表上的索引、约束需要使用单独的命令创建。
(3)父表和子表允许单独定义主键,因此父表和子表可能存在重复的主键记录,
目前不支持在分区表上定义全局主键。
(4)update时不建议更新分区键数据,特别会使数据从一个分区移动到另一个分区的场景,
可通过更新触发器实现,但会带来管理上的成本。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值