PostgreSQL Daily Maintenance - reindex

最新推荐文章于 2024-05-10 10:13:33 发布

DB_su

最新推荐文章于 2024-05-10 10:13:33 发布

阅读量792

点赞数

分类专栏： postgresql 文章标签： postgresql

postgresql 专栏收录该内容

23 篇文章 0 订阅

订阅专栏

Reindex日志维护
随着DML的发生, 索引会出现碎片. 持续膨胀. 降低效率. 虽然PostgreSQL有vacuum机制, 但是索引不像heap表, vacuum后的dead tuple占用的空间可以被马上回收复用, 以b-tree索引为例, 一个b-tree page只有当所有的item全部变成不可用后这个page才能被复用. 所以索引膨胀的概率比表大很多.
例一 :

digoal=# truncate tbl;
TRUNCATE TABLE
Time: 1070.523 ms
digoal=# insert into tbl select generate_series(1,1000000),'test';
INSERT 0 1000000
Time: 4534.320 ms
digoal=# analyze tbl;
ANALYZE
Time: 58.228 ms
digoal=# select pg_relation_size('tbl');
 pg_relation_size 
------------------
         44285952
(1 row)
Time: 0.371 ms
digoal=# select pg_relation_size('idx_tbl_1');
 pg_relation_size 
------------------
         22487040
(1 row)
Time: 0.304 ms
digoal=# update tbl set id=id+1000000;
UPDATE 1000000
Time: 5786.052 ms
digoal=# select pg_relation_size('tbl');
 pg_relation_size 
------------------
         88563712
(1 row)
Time: 0.281 ms
digoal=# select pg_relation_size('idx_tbl_1');
 pg_relation_size 
------------------
         44941312
(1 row)
Time: 0.308 ms

第一次全量更新后, 表和索引都膨胀了1倍.

digoal=# vacuum tbl;
VACUUM
Time: 17.052 ms

vacuum后表的垃圾会回收, 但是索引的不会被回收.
所以第二次全量更新, 表不会再膨胀了, 但是索引继续膨胀.

digoal=# update tbl set id=id-1000000;
UPDATE 1000000
Time: 6063.763 ms
digoal=# select pg_relation_size('tbl');
 pg_relation_size 
------------------
         88563712
(1 row)
Time: 0.814 ms
digoal=# select pg_relation_size('idx_tbl_1');
 pg_relation_size 
------------------
         65585152
(1 row)
Time: 0.353 ms

第三次vacuum与第二次类似.

digoal=# vacuum tbl;
VACUUM
Time: 2111.606 ms

全量更新后, 表未膨胀, 索引膨胀.

digoal=# update tbl set id=id+1000000;
UPDATE 1000000
Time: 7829.064 ms
digoal=# select pg_relation_size('tbl');
 pg_relation_size 
------------------
         88563712
(1 row)
Time: 0.310 ms
digoal=# select pg_relation_size('idx_tbl_1');
 pg_relation_size 
------------------
         88907776
(1 row)
Time: 0.397 ms

例二 :
创建表, 索引, 插入500万测试数据 :

digoal=# create table tbl(id int primary key, info int);
CREATE TABLE
digoal=# insert into tbl select generate_series(1,5000000),1;
INSERT 0 5000000
digoal=# create index idx_tbl_1 on tbl(info);
CREATE INDEX
digoal=# vacuum analyze tbl;
VACUUM

记录当前的表, 索引的大小.

digoal=# select pg_relation_size('tbl');
 pg_relation_size 
------------------
        181239808
(1 row)
digoal=# select pg_relation_size('tbl_pkey');
 pg_relation_size 
------------------
        112328704
(1 row)
digoal=# select pg_relation_size('idx_tbl_1');
 pg_relation_size 
------------------
        112336896
(1 row)

使用pgbench对这个表做更新操作 :

pg93@db-172-16-3-33-> vi update.sql
\setrandom id 1 5000000
update tbl set info=trunc(5000000*random()) where id=:id;

pg93@db-172-16-3-33-> pgbench -M prepared -r -n -c 8 -j 2 -f ./update.sql -T 60
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 2
duration: 60 s
number of transactions actually processed: 361126
tps = 5994.525604 (including connections establishing)
tps = 5995.828525 (excluding connections establishing)
statement latencies in milliseconds:
        0.001356        \setrandom id 1 5000000
        1.331439        update tbl set info=trunc(5000000*random()) where id=:id;

第一批更新后对表做vacuum, 回收dead tuple占用的空间 :

digoal=# vacuum verbose analyze tbl;
INFO:  vacuuming "public.tbl"
INFO:  scanned index "tbl_pkey" to remove 361126 row versions
DETAIL:  CPU 0.08s/1.54u sec elapsed 1.74 sec.
INFO:  scanned index "idx_tbl_1" to remove 361126 row versions
DETAIL:  CPU 0.20s/1.88u sec elapsed 4.00 sec.
INFO:  "tbl": removed 361126 row versions in 23350 pages
DETAIL:  CPU 0.14s/0.34u sec elapsed 1.13 sec.
INFO:  index "tbl_pkey" now contains 5000000 row versions in 13761 pages
DETAIL:  361082 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_tbl_1" now contains 5000000 row versions in 14972 pages
DETAIL:  361126 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "tbl": found 300838 removable, 5000000 nonremovable row versions in 23459 out of 23459 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.62s/4.34u sec elapsed 7.77 sec.
INFO:  analyzing "public.tbl"
INFO:  "tbl": scanned 23459 of 23459 pages, containing 5000000 live rows and 0 dead rows; 30000 rows in sample, 5000000 estimated total rows
VACUUM

查看当前表的大小和索引的大小, 比创建表时大了一些. tbl_pkey没变大多少是因为HOT机制造成的.
HOT的详细介绍可以参考src/backend/access/heap/README.HOT

digoal=# select pg_relation_size('tbl');
 pg_relation_size 
------------------
        192176128
(1 row)
digoal=# select pg_relation_size('tbl_pkey');
 pg_relation_size 
------------------
        112730112
(1 row)
digoal=# select pg_relation_size('idx_tbl_1');
 pg_relation_size 
------------------
        122650624
(1 row)

再次使用pgbench对测试表其进行更新 :

pg93@db-172-16-3-33-> pgbench -M prepared -r -n -c 8 -j 2 -f ./update.sql -T 60
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 2
duration: 60 s
number of transactions actually processed: 417661
tps = 6960.793225 (including connections establishing)
tps = 6962.199893 (excluding connections establishing)
statement latencies in milliseconds:
        0.001263        \setrandom id 1 5000000
        1.146337        update tbl set info=trunc(5000000*random()) where id=:id;

再次执行vacuum,

digoal=# vacuum verbose analyze tbl;
INFO:  vacuuming "public.tbl"
INFO:  scanned index "tbl_pkey" to remove 417660 row versions
DETAIL:  CPU 0.09s/1.64u sec elapsed 1.88 sec.
INFO:  scanned index "idx_tbl_1" to remove 417660 row versions
DETAIL:  CPU 0.24s/1.96u sec elapsed 3.89 sec.
INFO:  "tbl": removed 417660 row versions in 23635 pages
DETAIL:  CPU 0.41s/1.28u sec elapsed 5.38 sec.
INFO:  index "tbl_pkey" now contains 5000000 row versions in 14191 pages
DETAIL:  417335 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_tbl_1" now contains 5000000 row versions in 16433 pages
DETAIL:  417660 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "tbl": found 361383 removable, 5000000 nonremovable row versions in 23727 out of 23727 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 23313 unused item pointers.
0 pages are entirely empty.
CPU 0.93s/5.76u sec elapsed 12.46 sec.
INFO:  analyzing "public.tbl"
INFO:  "tbl": scanned 23727 of 23727 pages, containing 5000000 live rows and 0 dead rows; 30000 rows in sample, 5000000 estimated total rows
VACUUM

当前表的膨胀几乎停止, 因为第一次vacuum后回收了dead tuple的空间.

digoal=# select pg_relation_size('tbl');
 pg_relation_size 
------------------
        194371584
(1 row)

digoal=# select pg_relation_size('tbl_pkey');
 pg_relation_size 
------------------
        116252672
(1 row)

被更新列的索引继续膨胀, 因为索引页的复用需要这个页上的tuple itempoint完全失效后才可以.

digoal=# select pg_relation_size('idx_tbl_1');
 pg_relation_size 
------------------
        134619136
(1 row)

所以要经常给索引瘦身, 同时又不能影响数据库的DML操作.
可以使用如下方法, 重建例一中的索引 :

digoal=# \d tbl
      Table "public.tbl"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | 
 info   | text    | 
Indexes:
    "idx_tbl_1" btree (id)

创建索引的同时不影响表的 DML操作.

digoal=# create index concurrently  idx_tbl_2 on tbl(id);
CREATE INDEX
Time: 2599.077 ms

创建好后删除索引1.

digoal=# drop index idx_tbl_1;
DROP INDEX
Time: 21.757 ms

新建的索引又瘦下去了.

digoal=# select pg_relation_size('idx_tbl_2');
 pg_relation_size 
------------------
         22487040
(1 row)
Time: 0.450 ms

对于primary key或者unique key也可以使用此方法.

digoal=# create unique index concurrently user_info_username_key_1 on user_info(username);
CREATE INDEX
digoal=# begin;
BEGIN
digoal=# alter table user_info drop constraint user_info_username_key;
ALTER TABLE
digoal=# alter table user_info add constraint user_info_username_key unique using index user_info_username_key_1;
NOTICE:  ALTER TABLE / ADD CONSTRAINT USING INDEX will rename index "user_info_username_key_1" to "user_info_username_key"
ALTER TABLE
digoal=# end;
COMMIT
digoal=# create unique index concurrently user_info_pkey_1 on user_info(id);
CREATE INDEX
digoal=# begin;
BEGIN
digoal=# alter table user_info drop constraint user_info_pkey;
ALTER TABLE
digoal=# alter table user_info add constraint user_info_pkey primary key using index user_info_pkey_1;
NOTICE:  ALTER TABLE / ADD CONSTRAINT USING INDEX will rename index "user_info_pkey_1" to "user_info_pkey"
ALTER TABLE
digoal=# end;
COMMIT

DB_su

关注

0
点赞
踩
2

收藏

觉得还不错? 一键收藏
1
评论
PostgreSQL Daily Maintenance - reindex

Reindex日志维护随着DML的发生, 索引会出现碎片. 持续膨胀. 降低效率. 虽然PostgreSQL有vacuum机制, 但是索引不像heap表, vacuum后的dead tuple占用的空间可以被马上回收复用, 以b-tree索引为例, 一个b-tree page只有当所有的item全部变成不可用后这个page才能被复用. 所以索引膨胀的概率比表大很多. 例一 : digoal
复制链接

扫一扫