通过存储过程实现DB按月分区
简介
分表的优点
分表其实主要解决的是数据量大的问题。随着单表数据量的增加,对于数据的查询和更新,即使在数据库底层有一定的优化,但是随着量变必定会引起质变,导致性能急剧下降。这时可以通过分表的方法,将单表数据按一定规则水平拆分到多个表中,减小单表的数据量,提升系统性能。
实现方式
本实践采用在数据库postgresql中,结合TRIGGER+PROCEDURE(使用PL/pgSQL编写)的方式在数据插入式实现自动按月建表及数据入表,通过此种方式可以自行扩展应用到按年、月、周、日等粒度进行分表。
示例
准备数据表
CREATE TABLE public.health_history_tbl (
id bigserial NOT NULL,
height int4 NULL,
weight int4 NULL,
usr_id varchar NULL,
record_time timestamp NULL,
CONSTRAINT health_history_tbl_pk PRIMARY KEY (id)
);
postgres=# \d health_history_tbl
Table "public.health_history_tbl"
Column | Type | Modifiers
-------------+-----------------------------+-----------------------------------------------------------------
id | bigint | not null default nextval('health_history_tbl_id_seq'::regclass)
height | integer |
weight | integer |
usr_id | character varying |
record_time | timestamp without time zone |
Indexes:
"health_history_tbl_pk" PRIMARY KEY, btree (id)
创建存储过程(PROCEDURE/FUNCTION)
实例
CREATE OR REPLACE FUNCTION health_history_tbl_partition_function()
RETURNS TRIGGER
AS $$
DECLARE
year_month text;
partition_name text;
chkPrtnExts_sql text;
exists_tbl text;
BEGIN
year_month = to_char( NEW.record_time, 'YYYY_MM' );
partition_name = 'health_history_' || year_month;
chkPrtnExts_sql = 'SELECT to_regclass(''' || partition_name || ''')';
EXECUTE chkPrtnExts_sql INTO exists_tbl;
IF exists_tbl IS NULL THEN
EXECUTE 'CREATE TABLE ' || partition_name || '() INHERITS (health_history_tbl);';
EXECUTE 'CREATE INDEX health_history_'
|| year_month
|| '_usr_id_idx'
|| ' on '
|| partition_name
|| ' USING btree (usr_id);';
END IF;
EXECUTE 'INSERT INTO '
|| partition_name
|| ' values ($1, $2, $3, $4, $5)' USING NEW.id, NEW.height, NEW.weight, NEW.usr_id, NEW.record_time;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
查看函数
\sf health_history_tbl_partition_function()
语法
CREATE [OR REPLACE] FUNCTION function_name (params)
RETURNS return_datatype
AS $$
DECLARE
<declaration>
[...]
BEGIN
<function_body>
[...]
RETURN {variable_name | variable_type}
END;
$$
LANGUAGE plpgsql;
-
function_name:创建函数的名称。
-
params:函数的参数。
-
RETURNS:指定要从函数返回的数据类型。可以是基础数据类型,复合或域类型,或者也可以引用表列的类型。
-
function_body:函数体可执行部分。
-
plpgsql:指定实现该函数的语言的名称。
创建触发器(TRIGGER)
实例
CREATE TRIGGER health_history_tbl_partition_trigger BEFORE INSERT
ON health_history_tbl
FOR EACH ROW
EXECUTE PROCEDURE health_history_tbl_partition_function()
查看触发器
postgres=# \d health_history_tbl
Table "public.health_history_tbl"
Column | Type | Modifiers
-------------+-----------------------------+-----------------------------------------------------------------
id | bigint | not null default nextval('health_history_tbl_id_seq'::regclass)
height | integer |
weight | integer |
usr_id | character varying |
record_time | timestamp without time zone |
Indexes:
"health_history_tbl_pk" PRIMARY KEY, btree (id)
Triggers:
health_history_tbl_partition_trigger BEFORE INSERT ON health_history_tbl FOR EACH ROW EXECUTE PROCEDURE health_history_tbl_partition_function()
语法
CREATE [ CONSTRAINT ] TRIGGER name { BEFORE | AFTER | INSTEAD OF } { event [ OR ... ] }
ON table_name
[ FROM referenced_table_name ]
[ NOT DEFERRABLE | [ DEFERRABLE ] [ INITIALLY IMMEDIATE | INITIALLY DEFERRED ] ]
[ REFERENCING { { OLD | NEW } TABLE [ AS ] transition_relation_name } [ ... ] ]
[ FOR [ EACH ] { ROW | STATEMENT } ]
[ WHEN ( condition ) ]
EXECUTE { FUNCTION | PROCEDURE } function_name ( arguments )
- name:触发器的名称。这必须与同一个表上的任何其他触发器相区别。名称不能是模式限定的,该触发器会继承它所在表的模式。
- BEFORE|AFTER|INSTEAD OF:决定该函数是要在事件之前、之后被调用还是会取代该事件。
- event:INSERT、UPDATE、DELETE 或者 TRUNCATE 之一,这指定了将要引发该触发器的事件。多个事件可以用 OR 指定,要求传递关系的时候除外。对于 UPDATE 事件,可以指定一个列的列表:
UPDATE OF column_name1 [, column_name2 ... ]
- table_name:要使用该触发器的表、视图或外部表的名称(可能是模式限定的)。
- referenced_table_name:约束引用的另一个表的名称(可能是模式限定的)。这个选项被用于外键约束并且不推荐用于一般的目的。这只能为约束触发器指定。
- NOT DEFERRABLE | [ DEFERRABLE ] [ INITIALLY IMMEDIATE|INITIALLY DEFERRED]:该触发器的默认时机。这只能为约束触发器指定。
- REFERENCING:这个关键词紧接在一个或者两个关系名的声明之前,这些关系提供对触发语句的传递关系的访问。
- OLD TABLE|NEW TABLE:这个子句指示接下来的关系名是用于前映像传递关系还是后映像传递关系。
- transition_relation_name:在该触发器中这个传递关系要使用的(未限定)名称。
- FOR EACH ROW|FOR EACH STATEMENT:这指定该触发器函数是应该为该触发器事件影响的每一行被引发一次,还是只为每个 SQL 语句被引发一次。如果都没有被指定,FOR EACH STATEMENT 会是默认值。约束触发器只能被指定为 FOR EACH ROW。
- condition:一个决定该触发器函数是否将被实际执行的布尔表达式。如果指定了 WHEN,只有condition 返回 true 时才会调用该函数。在 FOR EACH ROW 触发器中,WHEN 条件可以分别写 OLD.column_name 或者 NEW.column_name 来引用列的新旧行值。当然,INSERT 触发器不能引用 OLD 并且 DELETE 触发器不能引用 NEW。 INSTEAD OF 触发器不支持 WHEN 条件。当前,WHEN 表达式不能包含子查询。
- FUNCTION_NAME:一个用户提供的函数,它被声明为不用参数并且返回类型 trigger,当触发器引发时会执行该函数。在 CREATE TRIGGER 的语法中,关键词 FUNCTION 和 PROCEDURE 是等效的,但是任何情况下被引用的函数必须是一个函数而不是过程。这里,关键词 PROCEDURE 的使用是有历史原因的并且已经被废弃。
- arguments:一个可选的逗号分隔的参数列表,它在该触发器被执行时会被提供给该函数。参数是字符串常量。简单的名称和数字常量也可以被写在这里,但是它们将全部被转换成字符串。请检查该触发器函数的实现语言的描述来找出在函数内部如何访问这些参数,这可能与普通函数参数不同。
插入数据
postgres=# \d+ health_history_tbl
Table "public.health_history_tbl"
Column | Type | Modifiers | Storage | Stats target | Description
-------------+-----------------------------+-----------------------------------------------------------------+----------+--------------+-------------
id | bigint | not null default nextval('health_history_tbl_id_seq'::regclass) | plain | |
height | integer | | plain | |
weight | integer | | plain | |
usr_id | character varying | | extended | |
record_time | timestamp without time zone | | plain | |
Indexes:
"health_history_tbl_pk" PRIMARY KEY, btree (id)
Triggers:
health_history_tbl_partition_trigger BEFORE INSERT ON health_history_tbl FOR EACH ROW EXECUTE PROCEDURE health_history_tbl_partition_function()
postgres=# INSERT INTO public.health_history_tbl (height,weight,usr_id,record_time)
postgres-# VALUES (170, 60, '101', '2023-01-01 00:00:00'),
postgres-# (170, 61, '101', '2023-02-01 00:00:00'),
postgres-# (170, 60, '101', '2023-03-01 00:00:00');
INSERT 0 0
postgres=# \d+ health_history_tbl
Table "public.health_history_tbl"
Column | Type | Modifiers | Storage | Stats target | Description
-------------+-----------------------------+-----------------------------------------------------------------+----------+--------------+-------------
id | bigint | not null default nextval('health_history_tbl_id_seq'::regclass) | plain | |
height | integer | | plain | |
weight | integer | | plain | |
usr_id | character varying | | extended | |
record_time | timestamp without time zone | | plain | |
Indexes:
"health_history_tbl_pk" PRIMARY KEY, btree (id)
Triggers:
health_history_tbl_partition_trigger BEFORE INSERT ON health_history_tbl FOR EACH ROW EXECUTE PROCEDURE health_history_tbl_partition_function()
Child tables: health_history_2023_01,
health_history_2023_02,
health_history_2023_03
postgres=# select * from health_history_2023_01;
id | height | weight | usr_id | record_time
----+--------+--------+--------+---------------------
20 | 170 | 60 | 101 | 2023-01-01 00:00:00
(1 row)
postgres=# select * from health_history_2023_02;
id | height | weight | usr_id | record_time
----+--------+--------+--------+---------------------
21 | 170 | 61 | 101 | 2023-02-01 00:00:00
(1 row)
postgres=# select * from health_history_2023_03;
id | height | weight | usr_id | record_time
----+--------+--------+--------+---------------------
22 | 170 | 60 | 101 | 2023-03-01 00:00:00
(1 row)
postgres=# select * from health_history_tbl;
id | height | weight | usr_id | record_time
----+--------+--------+--------+---------------------
20 | 170 | 60 | 101 | 2023-01-01 00:00:00
21 | 170 | 61 | 101 | 2023-02-01 00:00:00
22 | 170 | 60 | 101 | 2023-03-01 00:00:00
(3 rows)
参考文档