VACUUM and VACUUM FULL in Greenplum

wh62592855

Article link: https://blog.csdn.net/wanghai__/article/details/6196652

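VACUUM makes the holes left by DELETE and UPDATE operations available for reuse, but it does not release the space. VACUUM FULL, by contrast, does release it. The test below demonstrates this.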

[gpadmin1@hadoop5 ~]$ psql
psql (8.2.13)
Type "help" for help.

template1=# create table ttt2(id int);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE

template1=# insert into ttt2 select generate_series(1,1000000);
INSERT 0 1000000
template1=# select pg_relation_size('ttt2')/1024/1024;
?column?
----------
34
(1 row)

template1=# select oid,relname,relfilenode from pg_class where relname='ttt2';
oid | relname | relfilenode
-------+---------+-------------
55309 | ttt2 | 55309
(1 row)

This GP cluster has two nodes in total, so the file size seen on a single node will not be 34 MB.

[gpadmin1@hadoop5 1]$ ll -h | grep 55309
-rw------- 1 gpadmin1 gpadmin1 18M Feb 20 21:06 55309
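
As a rough sanity check on the per-node figure, the arithmetic can be sketched in Python. This assumes the rows are spread evenly over the two nodes, which is plausible for a hash-distributed sequence of integers but was not verified in the session; the variable names are illustrative only.

```python
# Rough check: total relation size vs. the data file seen on one node.
# Assumption (not verified in the session): rows are distributed evenly.
total_mb = 34   # pg_relation_size('ttt2')/1024/1024 from the session
nodes = 2       # nodes in this cluster, as stated in the text

per_node_mb = total_mb / nodes
print(f"expected size per node: about {per_node_mb:.0f} MB")
```

The estimate of about 17 MB is close to the 18M reported by `ll -h`. A small gap is expected, since the SQL expression truncates to whole megabytes and `ll -h` rounds its own figure.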

template1=# delete from ttt2 where id<500000;
DELETE 499999

template1=# vacuum ttt2;
VACUUM
template1=# select pg_relation_size('ttt2')/1024/1024;
?column?
----------
34
(1 row)

The size has not changed.

[gpadmin1@hadoop5 1]$ ll -h | grep 55309
-rw------- 1 gpadmin1 gpadmin1 18M Feb 20 21:07 55309

template1=# insert into ttt2 select generate_series(1,200000);
INSERT 0 200000
template1=# select pg_relation_size('ttt2')/1024/1024;
?column?
----------
34
(1 row)

As shown here, inserting another 200,000 rows did not change the size: the space previously occupied by the 499,999 deleted rows was reused.

[gpadmin1@hadoop5 1]$ ll -h | grep 55309
-rw------- 1 gpadmin1 gpadmin1 18M Feb 20 21:07 55309
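
The reuse behaviour can be illustrated with a toy model in Python. This is only a sketch of the observed behaviour, not of how Greenplum actually stores rows: the `ToyHeap` class, its slot states and the scaled-down row counts are all made up for illustration.

```python
# Toy model of a table file: each slot is "live", "dead", or "free".
# Hypothetical illustration only; not Greenplum's real storage layout.

class ToyHeap:
    def __init__(self):
        self.slots = []

    def insert(self, n):
        """Fill free slots first, then extend the file for the rest."""
        for i, state in enumerate(self.slots):
            if n == 0:
                break
            if state == "free":
                self.slots[i] = "live"
                n -= 1
        self.slots.extend(["live"] * n)

    def delete(self, n):
        """Mark the first n live slots dead; the file does not shrink."""
        for i, state in enumerate(self.slots):
            if n == 0:
                break
            if state == "live":
                self.slots[i] = "dead"
                n -= 1

    def vacuum(self):
        """Plain VACUUM: dead slots become reusable, size is unchanged."""
        self.slots = ["free" if s == "dead" else s for s in self.slots]

    def size(self):
        return len(self.slots)


# With VACUUM between the delete and the second insert (as in the session).
vacuumed = ToyHeap()
vacuumed.insert(1000)   # scaled-down stand-in for the 1,000,000 rows
vacuumed.delete(500)
vacuumed.vacuum()
vacuumed.insert(200)

# Same steps without VACUUM: dead slots are never marked reusable.
not_vacuumed = ToyHeap()
not_vacuumed.insert(1000)
not_vacuumed.delete(500)
not_vacuumed.insert(200)

print(vacuumed.size(), not_vacuumed.size())  # 1000 1200
```

In this model the vacuumed table stays at its original size after the second insert, matching the unchanged 34 MB in the session, while the table that skipped VACUUM has to grow.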

template1=# vacuum full ttt2;
NOTICE: 'VACUUM FULL' is not safe for large tables and has been known to yield unpredictable runtimes.
HINT: Use 'VACUUM' instead.
VACUUM
template1=# select pg_relation_size('ttt2')/1024/1024;
?column?
----------
24
(1 row)

After VACUUM FULL, the unused space has been released.

[gpadmin1@hadoop5 1]$ ll -h | grep 55309
-rw------- 1 gpadmin1 gpadmin1 13M Feb 20 21:08 55309
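
The 24 MB figure can be cross-checked against the row counts with a little arithmetic. The sketch below assumes every row occupies the same amount of space, so that the compacted size scales linearly with the number of live rows; that is a simplification, and the variable names are illustrative.

```python
# Estimate the table size after VACUUM FULL from the numbers in the session.
# Assumption: all rows take the same space, so size scales with row count.
rows_initial = 1_000_000
size_initial_mb = 34          # reported before the delete (truncated to whole MB)

rows_deleted = 499_999        # "DELETE 499999"
rows_reinserted = 200_000     # "INSERT 0 200000"
rows_live = rows_initial - rows_deleted + rows_reinserted

bytes_per_row = size_initial_mb * 1024 * 1024 / rows_initial
estimated_mb = rows_live * bytes_per_row / 1024 / 1024

print(rows_live)                 # 700001
print(round(bytes_per_row, 1))   # 35.7
print(round(estimated_mb, 1))    # 23.8
```

Because the initial 34 MB was itself truncated (the true value lies somewhere between 34 and 35 MB), an estimate of roughly 23.8 to 24.5 MB is consistent with the 24 MB reported after VACUUM FULL.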

Below is how the GP documentation describes the two commands:

Plain VACUUM (without FULL) simply reclaims space and makes it available for re-use. This form of the command can operate in parallel with normal reading and writing of the table, as an exclusive lock is not obtained. VACUUM FULL does more extensive processing, including moving of tuples across blocks to try to compact the table to the minimum number of disk blocks. This form is much slower and requires an exclusive lock on each table while it is being processed.

#################

A passage I came across today in another article (the key points were highlighted in red in the original):

Note that VACUUM does not shrink a table when it runs, unless there is a large run of space at the end of a table, and nobody is accessing the table when we try to shrink it. To properly shrink a table, you need VACUUM FULL. That locks up the whole table for a long time, and should be avoided, if possible. VACUUM FULL will literally rewrite every row of the table, and completely rebuild all indexes. That process is faster in 9.0 than it used to be, though it's still a long time for larger tables.
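
The distinction in the quoted passage, that plain VACUUM can only give back a run of empty space at the end of the table while VACUUM FULL repacks the rows, can be sketched with a toy page model. The functions `plain_vacuum` and `vacuum_full`, the page size of 4 and the two sample layouts are hypothetical and only mimic the described behaviour; they say nothing about the real implementation.

```python
# Toy model: a table is a list of pages, each page a list of slots
# (True = live row, False = reclaimable space). Illustration only.

def plain_vacuum(pages):
    """Drop only the run of completely empty pages at the end of the table."""
    pages = [list(p) for p in pages]
    while pages and not any(pages[-1]):
        pages.pop()
    return pages

def vacuum_full(pages, page_size):
    """Repack all live rows into as few pages as possible."""
    live = sum(sum(p) for p in pages)
    full_pages, rest = divmod(live, page_size)
    packed = [[True] * page_size for _ in range(full_pages)]
    if rest:
        packed.append([True] * rest + [False] * (page_size - rest))
    return packed

PAGE_SIZE = 4
# Rows removed from the front of the table: the free space is not at the end.
front_deleted = [[False] * 4, [False] * 4, [True] * 4, [True] * 4]
# Rows removed from the back of the table: the free space is at the end.
back_deleted = [[True] * 4, [True] * 4, [False] * 4, [False] * 4]

print(len(plain_vacuum(front_deleted)))            # 4
print(len(plain_vacuum(back_deleted)))             # 2
print(len(vacuum_full(front_deleted, PAGE_SIZE)))  # 2
```

In this model plain VACUUM shrinks the table only when the empty pages sit at the end, whereas VACUUM FULL shrinks it in both layouts because it moves the live rows.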