PostgreSQL12的分区增强
翻译自:Partitioning enhancements in PostgreSQL 12
在PostgreSQL12版本中,声明性分区得到了一些关注,具有一些非常方便的特性。在分区选择上(特别是从大集合的中的几个区进行选择时),引用完整性和自省方面已经有一些显著的改进。
在这篇文章中,我们会首先介绍引用完整性方面的改进。这会为稍后的其他一些解释提供一些样例数据去使用 。这个功能太酷了,所以无论如何要先讲这个。
这个例子来自PostgreSQL 12-计算列的产生 这篇文章,其中提供了一种通过计算你想知道的关于日期的所有信息来构建媒体日历,下面就是代码的简短版本:
CREATE TABLE public.media_calendar (
gregorian date NOT NULL PRIMARY KEY,
month_int integer GENERATED ALWAYS AS (date_part('month'::text, gregorian)) STORED,
day_int integer GENERATED ALWAYS AS (date_part('day'::text, gregorian)) STORED,
year_int integer GENERATED ALWAYS AS (date_part('year'::text, gregorian)) STORED,
quarter_int integer GENERATED ALWAYS AS (date_part('quarter'::text, gregorian)) STORED,
dow_int integer GENERATED ALWAYS AS (date_part('dow'::text, gregorian)) STORED,
doy_int integer GENERATED ALWAYS AS (date_part('doy'::text, gregorian)) STORED
...snip...
);
INSERT INTO public.media_calendar (gregorian)
SELECT '1900-01-01'::date + x
-- Starting with 1900-01-01, fill the table with 200 years of data.
FROM generate_series(0,365*200) x;
现在,我们要给例子增加时间维度,将日期和时间关联在一起,用于精确计算到秒的200年日历。
CREATE TABLE time_dim (
time_of_day time without time zone not null primary key,
hour_of_day integer GENERATED ALWAYS AS (date_part('hour', time_of_day)) stored,
minute_of_day integer GENERATED ALWAYS AS (date_part('minute', time_of_day)) stored,
second_of_day integer GENERATED ALWAYS AS (date_part('second', time_of_day)) stored,
morning boolean GENERATED ALWAYS AS (date_part('hour',time_of_day)<12) stored,
afternoon boolean GENERATED ALWAYS AS (date_part('hour',time_of_day)>=12 AND date_part('hour',time_of_day)<18) stored,
evening boolean GENERATED ALWAYS AS (date_part('hour',time_of_day) >= 18) stored
);
INSERT INTO time_dim (time_of_day )
SELECT '00:00:00'::time + (x || ' seconds')::interval
FROM generate_series (0,24*60*60-1) x; -- start with midnight, add seconds in a day;
现在我们本应该有86400行时间维度的数据,73001行在我们的媒体日历中。当然了,当我们决定把这些联系在一起时,一个笛卡尔连接产生超过60亿数据(6,307,286,400)。好消息是这张表不会再增长,除非凯撒大帝决定给一年增加更多的天数,或者欧盟决定给一天增加更多的秒数。所以,它是一个很好的候选分区,有一个很容易计算的键。
CREATE TABLE hours_to_days (
day date not null references media_calendar(gregorian),
time_of_day time without time zone not null references time_dim(time_of_day),
full_date timestamp without time zone GENERATED ALWAYS AS (day + time_of_day) stored,
PRIMARY KEY (day,time_of_day)
) PARTITION BY RANGE (day);
CREATE INDEX idx_natural_time ON hours_to_days(full_date);
你只看到了一个新的特征,这是在PostgreSQL11(不是打字错误,我说的是11)里被创建的。你可能有一个父->子的外键关联着一个分区表。
好的,我们被允许这样做,现在我们来继续PostgreSQL12的分区内容。
CREATE TABLE hours_to_days_ancient PARTITION OF hours_to_days
FOR VALUES FROM (minvalue) TO ('1990-01-01');
CREATE TABLE hours_to_days_sep PARTITION OF hours_to_days
FOR VALUES FROM ('2040-01-01') TO (maxvalue);
CREATE TABLE hours_to_days_1990 PARTITION OF hours_to_days
FOR VALUES FROM ('1990-01-01') TO ('2000-01-01');
CREATE TABLE hours_to_days_2000 PARTITION OF hours_to_days
FOR VALUES FROM ('2000-01-01') TO ('2010-01-01');
CREATE TABLE hours_to_days_2010 PARTITION OF hours_to_days
FOR VALUES FROM ('2010-01-01') TO ('2020-01-01');
CREATE TABLE hours_to_days_2020 PARTITION OF hours_to_days
FOR VALUES FROM ('2020-01-01') TO ('2030-01-01');
CREATE TABLE hours_to_days_2030 PARTITION OF hours_to_days
FOR VALUES FROM ('2030-01-01') TO ('2040-01-01');
注意分区没有必要去区分范围,日期总量或其他的一些规则。唯一的要求是所有的日期被包括在一个(仅有的一个)分区里。
INSERT INTO hours_to_days (day, time_of_day)
SELECT gregorian, time_of_day
FROM media_calendar
CROSS JOIN time_dim;
现在,喝点咖啡吧,因为有63亿行数据等着被处理。
现在,我们终于等到了PostgreSQL12增强版的第一版。在上一个PostgreSQL版本中,你可能还在用外键关联着分区表和分区字表。
CREATE TABLE sale (
id bigserial primary key,
transaction_date date not null default now()::date,
transaction_time time without time zone not null default date_trunc('seconds', now()::time),
FOREIGN KEY (transaction_date, transaction_time) REFERENCES hours_to_days(day,time_of_day)
);
哇哦!真不错,“哇”是因为人们对这代码感到兴奋。这意味着你可以有一个有维度的分区模型!你能够进行物理维度上的分区或者对其他类型的数据,没有关联完整性的数据进行分区。
现在,让我们看看我们刚才创建的分区表吧,你问 :怎么看?嗯,当然了,我们用PostgreSQL 12新的指导工具。这些工具是:
pg_partition_tree, pg_partition_ancestors, pg_partition_root
让我们来探索刚刚创建的分区表吧。
当我们看父分区表时,结果是平淡无奇的:
\d hours_to_days
Partitioned table "public.hours_to_days"
Column | Type | Collation | Nullable | Default
-------------+-----------------------------+-----------+----------+------------------------------------------------
day | date | | not null |
time_of_day | time without time zone | | not null |
full_date | timestamp without time zone | | | generated always as (day + time_of_day) stored
Partition key: RANGE (day)
Indexes:
"hours_to_days_pkey" PRIMARY KEY, btree (day, time_of_day)
"idx_natural_time" btree (full_date)
Foreign-key constraints:
"hours_to_days_day_fkey" FOREIGN KEY (day) REFERENCES media_calendar(gregorian)
"hours_to_days_time_of_day_fkey" FOREIGN KEY (time_of_day) REFERENCES time_dim(time_of_day)
Number of partitions: 7 (Use \d+ to list them.)
我们看到一些分区的信息,但是却没有我们想知道的信息。我们使用更多的提示信息:
\dS+ hours_to_days --<-- note the Splus
Partitioned table "public.hours_to_days"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------+-----------------------------+-----------+----------+------------------------------------------------+---------+--------------+-------------
day | date | | not null | | plain | |
time_of_day | time without time zone | | not null | | plain | |
full_date | timestamp without time zone | | | generated always as (day + time_of_day) stored | plain | |
Partition key: RANGE (day)
Indexes:
"hours_to_days_pkey" PRIMARY KEY, btree (day, time_of_day)
"idx_natural_time" btree (full_date)
Foreign-key constraints:
"hours_to_days_day_fkey" FOREIGN KEY (day) REFERENCES media_calendar(gregorian)
"hours_to_days_time_of_day_fkey" FOREIGN KEY (time_of_day) REFERENCES time_dim(time_of_day)
Partitions: hours_to_days_1990 FOR VALUES FROM ('1990-01-01') TO ('2000-01-01'),
hours_to_days_2000 FOR VALUES FROM ('2000-01-01') TO ('2010-01-01'),
hours_to_days_2010 FOR VALUES FROM ('2010-01-01') TO ('2020-01-01'),
hours_to_days_2020 FOR VALUES FROM ('2020-01-01') TO ('2030-01-01'),
hours_to_days_2030 FOR VALUES FROM ('2030-01-01') TO ('2040-01-01'),
hours_to_days_ancient FOR VALUES FROM (MINVALUE) TO ('1990-01-01'),
hours_to_days_sep FOR VALUES FROM ('2040-01-01') TO (MAXVALUE)
好了,现在我们看到了一系列的分区表。在这篇有趣又有点简短的文章中,我并没有给出子分区的例子。但是,请相信我说的如果子分区存在,这个方法不会列出它们。
SELECT * FROM pg_partition_tree('hours_to_days');
relid | parentrelid | isleaf | level
-----------------------+---------------+--------+-------
hours_to_days | | f | 0
hours_to_days_ancient | hours_to_days | t | 1
hours_to_days_sep | hours_to_days | t | 1
hours_to_days_1990 | hours_to_days | t | 1
hours_to_days_2000 | hours_to_days | t | 1
hours_to_days_2010 | hours_to_days | t | 1
hours_to_days_2020 | hours_to_days | t | 1
hours_to_days_2030 | hours_to_days | t | 1
在这我们能看到任何子分区和分区等级。我们用level
来标识节点的等级,0
是根节点,用parentrelid
展示节点的所属关系。用这些基础的信息,我们能很容易地创建一个关系树。
我们还有另外一些更简单的方法去得到根节点。
SELECT * FROM pg_partition_root('hours_to_days_sep');
pg_partition_root
-------------------
hours_to_days
(1 row)
还有其他一些相似的方法。这展示了继承树从分支逆向地朝向根节点。
SELECT * FROM pg_partition_ancestors('hours_to_days_sep');
relid
-------------------
hours_to_days_sep
hours_to_days
(2 rows)
如果我们使用psql客户端,还有一个元命令来看分区和索引。
\dP
List of partitioned relations
Schema | Name | Owner | Type | Table
--------+---------------------+---------+-------------------+----------------
public | hours_to_days | kroybal | partitioned table |
public | media_calendar | kroybal | partitioned table |
public | hours_to_days_pkey | kroybal | partitioned index | hours_to_days
public | idx_natural_time | kroybal | partitioned index | hours_to_days
public | media_calendar_pkey | kroybal | partitioned index | media_calendar
(5 rows)
跟随许多其他命令的脚步,
ALTER TABLE ... ATTACH PARTITION
像这样的命令,已经淘汰了排他锁。这意味这你可以创建新的分区表,把它们加进分区并设置运行时间,而且不用一个维护窗口。不幸的是,相反面就不行了。修改表…解除分区仍然依赖排他锁,所以,简而言之,动态解除分区仍然需要锁。
还有几项改进,这些改进不需要进行扩展:
-
COPY 命令已经减少了一点开销,允许更快的加载。
-
父级表的表空间规范现在由子表继承。
-
Pg_catalog.pg_indexs 现在能展示子分区的索引信息。
这就完成了新的增强功能。请继续关注PostgreSQL 12中将出现的其他功能的文章。