近期同事在讨论如何在PostgreSQL中一张大表,添加一个带有not null属性的,且具有缺省值的字段,并且要求在秒级完成。
因为此,有了以下的实验记录:
首先我们是在PostgreSQL 10下做的实验:
postgres=# select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
(1 row)
开启计时:
postgres=# \timing
Timing is on.
建表,并查询表信息,插入数据:
postgres=# create table add_c_d_in_ms(id int, a1 text, a2 text, a3 text, a4 text, a5 text, a6 text, a7 text, a8 text not null default 'wangshuo');
CREATE TABLE
Time: 72.243 ms
postgres=# select oid,relname,relnatts from pg_class where relname='add_c_d_in_ms';
oid | relname | relnatts
-------+---------------+----------
16384 | add_c_d_in_ms | 9
(1 row)
Time: 1.504 ms
postgres=# insert into add_c_d_in_ms select generate_series(1,10000000),'wangshuo','highgo'||random(),'huang','wang',generate_series(1,10000000)::text,random()*100000000000,'shuo','ms';
INSERT 0 10000000
Time: 81261.870 ms (01:21.262)
我们看一下列信息:
postgres=# select * from pg_attribute where attrelid = 16384;
attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atth
asdef | attidentity | attisdropped | attislocal | attinhcount | attcollation | attacl | attoptions | attfdwoptions
----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----
------+-------------+--------------+------------+-------------+--------------+--------+------------+---------------
16384 | tableoid | 26 | 0 | 4 | -7 | 0 | -1 | -1 | t | p | i | t | f
| | f | t | 0 | 0 | | |
16384 | cmax | 29 | 0 | 4 | -6 | 0 | -1 | -1 | t | p | i | t | f
| | f | t | 0 | 0 | | |
16384 | xmax | 28 | 0 | 4 | -5 | 0 | -1 | -1 | t | p | i | t | f
| | f | t | 0 | 0 | | |
16384 | cmin | 29 | 0 | 4 | -4 | 0 | -1 | -1 | t | p | i | t | f
| | f | t | 0 | 0 | | |
16384 | xmin | 28 | 0 | 4 | -3 | 0 | -1 | -1 | t | p | i | t | f
| | f | t | 0 | 0 | | |
16384 | ctid | 27 | 0 | 6 | -1 | 0 | -1 | -1 | f | p | s | t | f
| | f | t | 0 | 0 | | |
16384 | id | 23 | -1 | 4 | 1 | 0 | -1 | -1 | t | p | i | f | f
| | f | t | 0 | 0 | | |
16384 | a1 | 25 | -1 | -1 | 2 | 0 | -1 | -1 | f | x | i | f | f
| | f | t | 0 | 100 | | |
16384 | a2 | 25 | -1 | -1 | 3 | 0 | -1 | -1 | f | x | i | f | f
| | f | t | 0 | 100 | | |
16384 | a3 | 25 | -1 | -1 | 4 | 0 | -1 | -1 | f | x | i | f | f
| | f | t | 0 | 100 | | |
16384 | a4 | 25 | -1 | -1 | 5 | 0 | -1 | -1 | f | x | i | f | f
| | f | t | 0 | 100 | | |
16384 | a5 | 25 | -1 | -1 | 6 | 0 | -1 | -1 | f | x | i | f | f
| | f | t | 0 | 100 | | |
16384 | a6 | 25 | -1 | -1 | 7 | 0 | -1 | -1 | f | x | i | f | f
| | f | t | 0 | 100 | | |
16384 | a7 | 25 | -1 | -1 | 8 | 0 | -1 | -1 | f | x | i | f | f
| | f | t | 0 | 100 | | |
16384 | a8 | 25 | -1 | -1 | 9 | 0 | -1 | -1 | f | x | i | t | t
| | f | t | 0 | 100 | | |
(15 rows)
Time: 1.574 ms
我们再来查询一条数据看一下:
postgres=# select * from add_c_d_in_ms where id=1;
id | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8
----+----------+-------------------------+-------+------+----+------------------+------+----
1 | wangshuo | highgo0.460023149382323 | huang | wang | 1 | 25913513777.7776 | shuo | ms
(1 row)
Time: 806.036 ms
然后,我们看一下正常PostgreSQL加一个字段所花费的时间:
postgres=# alter table add_c_d_in_ms add a9 text not null default 'test';
ALTER TABLE
Time: 36803.610 ms (00:36.804)
明显看到时间花费相当长,其实PostgreSQL在这里将数据完全重写了,主要原因就是就是添加的字段带有not null属性。
我们来看下一新家字段的列属性:
postgres=# select * from pg_attribute where attrelid = 16384 and attname='a9';
attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | attha
sdef | attidentity | attisdropped | attislocal | attinhcount | attcollation | attacl | attoptions | attfdwoptions
----------+---------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+------
-----+-------------+--------------+------------+-------------+--------------+--------+------------+---------------
16384 | a9 | 25 | -1 | -1 | 10 | 0 | -1 | -1 | f | x | i | t | t
| | f | t | 0 | 100 | | |
(1 row)
Time: 0.670 ms
这时候我们插入数据看一下:
postgres=# insert into add_c_d_in_ms (id) values (1);
INSERT 0 1
Time: 14.658 ms
postgres=# select * from add_c_d_in_ms where id=1;
id | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 | a9
----+----------+-------------------------+-------+------+----+------------------+------+----------+------
1 | wangshuo | highgo0.460023149382323 | huang | wang | 1 | 25913513777.7776 | shuo | ms | test
1 | | | | | | | | wangshuo | test
(2 rows)
Time: 850.982 ms
下面见证奇迹的时刻来了,如何快速添加这么一个字段:
首先,在这里我们涉及三张系统表,pg_class(表属性)、pg_attribute(列属性)、pg_attrdef(缺省值信息),接下来依次看一下三张表的信息:
#pg_class:oid表系统序列号,relname表名,relnatts列个数(主要修改属性)
postgres=# select oid,relname,relnatts from pg_class where relname='add_c_d_in_ms';
oid | relname | relnatts
-------+---------------+----------
16384 | add_c_d_in_ms | 10
(1 row)
Time: 0.418 ms
#pg_attribute 这里还没有修改,和前面一致,在此就查看了
#pg_attrdef 缺省值信息,这里只有原来的a9带有缺省值
postgres=# select * from pg_attrdef ;
adrelid | adnum | adbin
| adsrc
---------+-------+-------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------+------------------
16384 | 9 | {CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull false :location 123 :constvalue 12 [ 48
0 0 0 119 97 110 103 115 104 117 111 ]} | 'wangshuo'::text
16384 | 10 | {CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull false :location 55 :constvalue 8 [ 32 0
0 0 116 101 115 116 ]} | 'test'::text
(2 rows)
Time: 0.363 ms
下面进行改造:
#添加字段属性
postgres=# insert into pg_attribute select attrelid,'new_n_d',atttypid,attstattarget,attlen,attnum+1,attndims,attcacheoff,atttypmod,attbyval,attstorage,attalign,'t',atthasdef,attidentity,attisdropped,attislocal,attinhcount,attcollation,attacl,attoptions,attfdwoptions from pg_attribute where attrelid=16384 and attname='a9';
INSERT 0 1
Time: 25.008 ms
#修改pg_class字段个数
postgres=# update pg_class set relnatts=relnatts+1 where relname='add_c_d_in_ms';
UPDATE 1
Time: 43.979 ms
#添加缺省值
postgres=# insert into pg_attrdef select adrelid,adnum+1,adbin,adsrc from pg_attrdef where adrelid=16384 and adnum=10;
INSERT 16399 1
Time: 15.774 ms
再次查看三个系统表系统信息:
#pg_class
postgres=# select oid,relname,relnatts from pg_class where relname='add_c_d_in_ms';
oid | relname | relnatts
-------+---------------+----------
16384 | add_c_d_in_ms | 11
(1 row)
#pg_attribute
postgres=# select * from pg_attribute where attrelid = 16384 and attname='new_n_d';
attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | attha
sdef | attidentity | attisdropped | attislocal | attinhcount | attcollation | attacl | attoptions | attfdwoptions
----------+---------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+------
-----+-------------+--------------+------------+-------------+--------------+--------+------------+---------------
16384 | new_n_d | 25 | -1 | -1 | 11 | 0 | -1 | -1 | f | x | i | t | t
| | f | t | 0 | 100 | | |
(1 row)
Time: 0.516 ms
#pg_attrdef
postgres=# select * from pg_attrdef;
adrelid | adnum | adbin
| adsrc
---------+-------+-------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------+------------------
16384 | 9 | {CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull false :location 123 :constvalue 12 [ 48
0 0 0 119 97 110 103 115 104 117 111 ]} | 'wangshuo'::text
16384 | 10 | {CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull false :location 55 :constvalue 8 [ 32 0
0 0 116 101 115 116 ]} | 'test'::text
16384 | 11 | {CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull false :location 55 :constvalue 8 [ 32 0
0 0 116 101 115 116 ]} | 'test'::text
(3 rows)
插入值实验:
postgres=# insert into add_c_d_in_ms (id) values (1);
INSERT 0 1
Time: 8.407 ms
postgres=# select * from add_c_d_in_ms where id=1;
id | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 | a9 | new_n_d
----+----------+-------------------------+-------+------+----+------------------+------+----------+------+---------
1 | wangshuo | highgo0.460023149382323 | huang | wang | 1 | 25913513777.7776 | shuo | ms | test |
1 | | | | | | | | wangshuo | test |
1 | | | | | | | | wangshuo | test | test
(3 rows)
Time: 322.143 ms
问题:
#正常添加字段可以
postgres=# alter table add_c_d_in_ms add a10 text;
ALTER TABLE
#如果添加not null属性的字段,则会检测其他字段属性,将会报错
postgres=# alter table add_c_d_in_ms add a11 text not null default 'aaa';
2018-01-11 00:21:55.587 EST [4217] ERROR: column "new_n_d" contains null values
2018-01-11 00:21:55.587 EST [4217] STATEMENT: alter table add_c_d_in_ms add a11 text not null default 'aaa';
ERROR: column "new_n_d" contains null values
另一种解决方法,添加字段,然后添加check约束:
postgres=# alter table add_c_d_in_ms_new add a9 text default 'abc';
ALTER TABLE
Time: 549.182 ms
postgres=# alter table add_c_d_in_ms_new add constraint ck_tbl_check_a check (a9 is not null);
ALTER TABLE
Time: 46.200 ms
postgres=# insert into add_c_d_in_ms_new (a1) values (1);
INSERT 0 1
Time: 30.716 ms
postgres=# select * from add_c_d_in_ms_new where a1=1::text;
id | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 | a9
----+----+----+----+----+----+----+----+----------+-----
| 1 | | | | | | | wangshuo | abc
(1 row)
Time: 0.500 ms
postgres=# insert into add_c_d_in_ms_new (a1,a9) values (1,null);
2018-01-11 01:07:56.456 EST [4217] ERROR: new row for relation "add_c_d_in_ms_new" violates check constraint "ck_tbl_check_a"
2018-01-11 01:07:56.456 EST [4217] DETAIL: Failing row contains (null, 1, null, null, null, null, null, null, wangshuo, null).
2018-01-11 01:07:56.456 EST [4217] STATEMENT: insert into add_c_d_in_ms_new (a1,a9) values (1,null);
ERROR: new row for relation "add_c_d_in_ms_new" violates check constraint "ck_tbl_check_a"
DETAIL: Failing row contains (null, 1, null, null, null, null, null, null, wangshuo, null).
Time: 0.383 ms
我的博客即将搬运同步至腾讯云+社区,邀请大家一同入驻:https://cloud.tencent.com/developer/support-plan?invite_code=25z8r3po8aqsw