我们知道恢复数据是dba工作的一部分,做这部分工作一般都是输入恢复命令后就在那里无聊的等着任务的完,特别是大容量的数据恢复起来真的等到晕,那么我们如何提高数据恢复的性能呢,下面说说那些参数有影响
我们知道,导入数据的过程其实就是io的负载,特别考验硬盘的写能力,还有就是数据写入磁盘的方法,另外一个就是cpu的处理能力,再一个就是内存的大量使用,影响的参数大概有这些 shared_buffers work_mem maintenance_work_mem checkpoint_segments autovacuum fsync full_page_writes 现在我们来测试恢复一个2.5G的sql_ascii编码数据文件,恢复后大约是6G的数据 硬件环境及OS CPU:intel i7 3770K 内存:kingston DDRIII 1600 8G*2 硬盘:SATAIII SSD 120G*2 OS: centos 6.4 kernel 3.2.41 实验环境准备,将postgresql的参数设置成原来默认的参数,修改后重启postgresql服务 [postgres@pgsqldb-master data]$ vi postgresql.conf [postgres@pgsqldb-master data]$ cd .. [postgres@pgsqldb-master pgdata]$ pg_ctl restart -D ./data/ -l ./data/logfile.txt & 2&>1 [1] 6977 [postgres@pgsqldb-master pgdata]$ waiting for server to shut down.... done server stopped server starting [1]+ Done pg_ctl restart -D ./data/ -l ./data/logfile.txt [postgres@pgsqldb-master pgdata]$ [postgres@pgsqldb-master pgdata]$ psql psql (9.2.3) Type "help" for help. postgres=# select current_setting('shared_buffers') AS shared_buffers ,current_setting('work_mem') as work_mem,current_setting('maintenance_work_mem') as maintenance_work_men,current_setting('checkpoint_segments') as checkpoint_segments,current_setting('fsync') as fsync,current_setting('autovacuum') as autovacuum; shared_buffers | work_mem | maintenance_work_men | checkpoint_segments | fsync | autovacuum ----------------+----------+----------------------+---------------------+-------+------------ 32MB | 1MB | 16MB | 3 | on | on (1 row) postgres=# postgres=#CREATE TABLE res(kssj timestamp NOT NULL,jssj timestamp,bz text); CREATE TABLE postgres=# [postgres@pgsqldb-master pgdata]$ cd /pgxlog [postgres@pgsqldb-master pgxlog]$vi res.sql 下面写一个sql恢复脚本本,取名res.sql,下面是文件的内容 \c postgres DROP DATABASE cysoft_database; CREATE DATABASE cysoft_database ENCODING 'sql_ascii' TEMPLATE template0; INSERT INTO res (kssj) VALUES(CURRENT_TIMESTAMP); \c cysoft_database \i 16.dump \c postgres UPDATE res SET jssj=CURRENT_TIMESTAMP, bz='shared_buffers='||CURRENT_SETTING('shared_buffers') || ',work_mem='||CURRENT_SETTING('work_mem') || ',maintenance_work_mem='||CURRENT_SETTING('maintenance_work_mem') || ',checkpoint_segments='||CURRENT_SETTING('checkpoint_segments') || ',fsync='||CURRENT_SETTING('fsync') WHERE jssj IS NULL; 按ESC键再输入:qw退出
一、不做任何参数恢复数据
[postgres@pgsqldb-master pgxlog]$ psql -f res.sql ..... [postgres@pgsqldb-master pgxlog]$ psql psql (9.2.3) Type "help" for help. postgres=# select *,jssj-kssj as hs from res ; kssj | jssj | bz | hs ----------------------------+----------------------------+---------------------------------------------------------------------------------------------------------+----------------- 2013-03-28 22:43:28.376343 | 2013-03-28 22:49:48.341472 | shared_buffers=32MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=3,autovacuum=on,fsync=on | 00:06:19.965129 (1 row) postgres=# 查看一下导入数据时系统有什么警告的信息 [postgres@pgsqldb-master pgxlog]$ cat /pgdata/data/logfile.txt STATEMENT: CREATE PROCEDURAL LANGUAGE plpgsql; LOG: checkpoints are occurring too frequently (2 seconds apart) HINT: Consider increasing the configuration parameter "checkpoint_segments". LOG: checkpoints are occurring too frequently (1 second apart) ...... ...... HINT: Consider increasing the configuration parameter "checkpoint_segments". LOG: sending cancel to blocking autovacuum PID 7179 DETAIL: Process 7163 waits for AccessExclusiveLock on relation 544596186 of database 544595470. STATEMENT: ALTER TABLE ONLY bom_yl_detail ADD CONSTRAINT bom_yl_detail_auto_bh_key UNIQUE (auto_bh); ERROR: canceling autovacuum task CONTEXT: automatic analyze of table "cysoft_database.public.bom_yl_detail" LOG: sending cancel to blocking autovacuum PID 7179 ..... ..... CONTEXT: automatic analyze of table "cysoft_database.public.cl" ERROR: shared buffer hash table corrupted STATEMENT: CREATE INDEX kq_month_report_yf_idx ON kq_month_report USING btree (yf); 从上面的信息可以看出,checkpoint_segments设置太少了,另外我们恢复数据时不希望autovacuum进程开启的,vacuum和analyze一般都是导入完成后再执行,现在我们调整这两个参数 关闭autovacuum进程 如果不关闭的话,系统在你导入数据的过程中会启用vacuum进程来清理数据,vacuum进程会导致内存占用,一般出现这样的错误ERROR: canceling autovacuum task。最严重的还会导致服务重启,导致导入失败 禁用方法:打开postgresql.conf找到#autovacuum = on项目,修改成autovacuum = off ,注意前面的”#“要去掉,这个参数修改后要重启postgresql服务才能生效 增大checkpoint_segments参数 在导入数据的过程,我们不希望系统老是checkpoint,所以需要增大checkpoint_segment的值。 修改方法:打开postgresql.conf找到#checkpoint_segment = 3项目,修改成checkpoint_segment = 30 ,注意前面的”#“要去掉,这个参数修改后要重启postgresql服务才能生效 [postgres@pgsqldb-master pgxlog]cd /pgdata/data [postgres@pgsqldb-master data]vi postgresql.conf #修改完成后按esc键,再输入:wq退出 [postgres@pgsqldb-master data]cd .. [postgres@pgsqldb-master pgdata]$ pg_ctl restart -D ./data/ -l ./data/logfile.txt & 2&>1 [1] 7439 [postgres@pgsqldb-master pgdata]$ waiting for server to shut down.... done server stopped server starting [1]+ Done pg_ctl restart -D ./data/ -l ./data/logfile.txt [postgres@pgsqldb-master pgdata]$ [postgres@pgsqldb-master pgdata]$ psql psql (9.2.3) Type "help" for help. postgres=# select current_setting('shared_buffers') AS shared_buffers ,current_setting('work_mem') as work_mem,current_setting('maintenance_work_mem') as maintenance_work_men,current_setting('checkpoint_segments') as checkpoint_segments,current_setting('fsync') as fsync,current_setting('autovacuum') as autovacuum; shared_buffers | work_mem | maintenance_work_men | checkpoint_segments | fsync | autovacuum ----------------+----------+----------------------+---------------------+-------+------------ 32MB | 1MB | 16MB | 30 | on | off (1 row) postgres=#
二、增大checkpoint_segments,autovacuum = off 恢复数据测试
postgres=#\q [postgres@pgsqldb-master pgdata]$ cd /pgdata/ [postgres@pgsqldb-master pgdata]$ cd /pgxlog/ [postgres@pgsqldb-master pgxlog]$ psql -f res.sql ..... ..... [postgres@pgsqldb-master pgxlog]$ psql psql (9.2.3) Type "help" for help. postgres=# select *,jssj-kssj as hs from res; kssj | jssj | bz | hs ----------------------------+----------------------------+-----------------------------------------------------------------------------------------------------------+----------------- 2013-03-28 22:43:28.376343 | 2013-03-28 22:49:48.341472 | shared_buffers=32MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=3,autovacuum=on,fsync=on | 00:06:19.965129 2013-03-28 23:08:54.526256 | 2013-03-28 23:15:04.926526 | shared_buffers=32MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=30,autovacuum=off,fsync=on | 00:06:10.40027 (2 rows) postgres=# postgres=# \q [postgres@pgsqldb-master pgxlog]$ 从上面的结果来看,有提升,但效果不明显,再在我们再做下面的修改 增大shared_buffers参数,就是增大postgresql的专用内存数 修改方法:打开postgresql.conf找到#shared_buffers = 30MB项目,修改成shared_bufferd = 512MB ,注意前面的”#“要去掉,这个参数修改后要重启postgresql服务才能生效 [postgres@pgsqldb-master pgxlog]$ cd /pgdata/data [postgres@pgsqldb-master data]$ vi postgresql.conf #修改完成后按esc键,再输入:wq退出 [postgres@pgsqldb-master data]$ cd .. [postgres@pgsqldb-master pgdata]$ pg_ctl restart -D ./data/ -l ./data/logfile.txt & 2&>1 [1] 7966 [postgres@pgsqldb-master pgdata]$ waiting for server to shut down.... done server stopped server starting [1]+ Done pg_ctl restart -D ./data/ -l ./data/logfile.txt [postgres@pgsqldb-master pgdata]$ [postgres@pgsqldb-master pgdata]$ psql psql (9.2.3) Type "help" for help. postgres=# select current_setting('shared_buffers') AS shared_buffers ,current_setting('work_mem') as work_mem,current_setting('maintenance_work_mem') as maintenance_work_men,current_setting('checkpoint_segments') as checkpoint_segments,current_setting('fsync') as fsync,current_setting('autovacuum') as autovacuum; shared_buffers | work_mem | maintenance_work_men | checkpoint_segments | fsync | autovacuum ----------------+----------+----------------------+---------------------+-------+------------ 512MB | 1MB | 16MB | 30 | on | off (1 row) postgres=#
三、调整shared_buffer参数后测试
postgres=# \q [postgres@pgsqldb-master pgdata]$ cd /pgxlog [postgres@pgsqldb-master pgxlog]$ psql -f res.sql ...... ...... [postgres@pgsqldb-master pgxlog]$ psql psql (9.2.3) Type "help" for help. postgres=# select *,jssj-kssj as hs from res order by kssj; kssj | jssj | bz | hs ----------------------------+----------------------------+------------------------------------------------------------------------------------------------------------+----------------- 2013-03-28 22:43:28.376343 | 2013-03-28 22:49:48.341472 | shared_buffers=32MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=3,autovacuum=on,fsync=on | 00:06:19.965129 2013-03-28 23:08:54.526256 | 2013-03-28 23:15:04.926526 | shared_buffers=32MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=30,autovacuum=off,fsync=on | 00:06:10.40027 2013-03-29 09:23:28.152466 | 2013-03-29 09:29:23.295784 | shared_buffers=512MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=30,autovacuum=off,fsync=on | 00:05:55.143318 (3 rows) postgres=# 我们可以看到又提升了一点点,下面我们再增大导入数据,建立索引时影响比较大的参数work_men,maintenance_work_mem,这两个是会话可以调整的,我们修改一下res.sql文件即可 \c postgres DROP DATABASE cysoft_database; CREATE DATABASE cysoft_database ENCODING 'sql_ascii' TEMPLATE template0; INSERT INTO res (kssj) VALUES(CURRENT_TIMESTAMP); \c cysoft_database
set work_mem = '30MB';
set maintenance_work_mem = '256MB';
\i 16.dump
\c postgres
set work_mem = '30MB';
set maintenance_work_mem = '256MB';
UPDATE res SET jssj=CURRENT_TIMESTAMP, bz='shared_buffers='||CURRENT_SETTING('shared_buffers') || ',work_mem='||CURRENT_SETTING('work_mem') || ',maintenance_work_mem='||CURRENT_SETTING('maintenance_work_mem') || ',checkpoint_segments='||CURRENT_SETTING('checkpoint_segments') || ',fsync='||CURRENT_SETTING('fsync') WHERE jssj IS NULL; 按ESC键再输入:qw退出
四、调整work_mem,maintenance_work_memr参数后测试
[postgres@pgsqldb-master pgdata]$ cd /pgxlog [postgres@pgsqldb-master pgxlog]$ psql -f res.sql ...... ...... [postgres@pgsqldb-master pgxlog]$ psql psql (9.2.3) Type "help" for help. postgres=# select *,jssj-kssj as hs from res order by kssj; kssj | jssj | bz | hs ----------------------------+----------------------------+-------------------------------------------------------------------------------------------------------------+----------------- 2013-03-28 22:43:28.376343 | 2013-03-28 22:49:48.341472 | shared_buffers=32MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=3,autovacuum=on,fsync=on | 00:06:19.965129 2013-03-28 23:08:54.526256 | 2013-03-28 23:15:04.926526 | shared_buffers=32MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=30,autovacuum=off,fsync=on | 00:06:10.40027 2013-03-29 09:23:28.152466 | 2013-03-29 09:29:23.295784 | shared_buffers=512MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=30,autovacuum=off,fsync=on | 00:05:55.143318 2013-03-29 10:05:33.708843 | 2013-03-29 10:10:14.501172 | shared_buffers=512MB,work_mem=30MB,maintenance_work_mem=256MB,checkpoint_segments=30,autovacuum=off,fsync=on | 00:04:40.792329 (4 rows) 我们可以看到增加这两个参数值后提升的性能是非常明显的,原因很简单,恢复数据时需要做大量的CREATE INDEX,ALTER TABLE ADD FOREIGN KEY,所以提高这两个参数值性能明显的提高了 下面我们再来调整fsync参数,关闭磁盘同步,注意,如果在线上系统不要关闭这个参数来提高性能,否则可能导致你的数据系统受到破坏。 关闭这个参数的系统一般都是用于那些无关紧要的应用,数据丢失也不怕的,否肯不建议打开,不管你的硬件保障有多好 修改方法:打开postgresql.conf找到#fsync = on B项目,修改成fsync = off ,注意前面的”#“要去掉,这个参数修改后要重启postgresql服务才能生效 [postgres@pgsqldb-master pgxlog]$ cd /pgdata/data [postgres@pgsqldb-master data]$ vi postgresql.conf #修改完成后按esc键,再输入:wq退出 [postgres@pgsqldb-master data]$ cd .. [postgres@pgsqldb-master pgdata]$ pg_ctl restart -D ./data/ -l ./data/logfile.txt & 2&>1 [1] 3632 [postgres@pgsqldb-master pgdata]$ waiting for server to shut down.......... done server stopped server starting [1]+ Done pg_ctl restart -D ./data/ -l ./data/logfile.txt [postgres@pgsqldb-master pgdata]$ [postgres@pgsqldb-master pgdata]$ psql psql (9.2.3) Type "help" for help. postgres=# select current_setting('shared_buffers') AS shared_buffers ,current_setting('work_mem') as work_mem,current_setting('maintenance_work_mem') as maintenance_work_men,current_setting('checkpoint_segments') as checkpoint_segments,current_setting('fsync') as fsync,current_setting('autovacuum') as autovacuum; shared_buffers | work_mem | maintenance_work_men | checkpoint_segments | fsync | autovacuum ----------------+----------+----------------------+---------------------+-------+------------ 512MB | 1MB | 16MB | 30 | off | off (1 row) postgres=#
五、调整参数fsync=off后测试
[postgres@pgsqldb-master pgdata]$ cd /pgxlog [postgres@pgsqldb-master pgxlog]$ psql -f res.sql ...... ...... [postgres@pgsqldb-master pgxlog]$ psql psql (9.2.3) Type "help" for help. postgres=# select *,jssj-kssj as hs from res order by kssj; kssj | jssj | bz | hs ----------------------------+----------------------------+--------------------------------------------------------------------------------------------------------------+----------------- 2013-03-28 22:43:28.376343 | 2013-03-28 22:49:48.341472 | shared_buffers=32MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=3,autovacuum=on,fsync=on | 00:06:19.965129 2013-03-28 23:08:54.526256 | 2013-03-28 23:15:04.926526 | shared_buffers=32MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=30,autovacuum=off,fsync=on | 00:06:10.40027 2013-03-29 09:23:28.152466 | 2013-03-29 09:29:23.295784 | shared_buffers=512MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=30,autovacuum=off,fsync=on | 00:05:55.143318 2013-03-29 10:05:33.708843 | 2013-03-29 10:10:14.501172 | shared_buffers=512MB,work_mem=30MB,maintenance_work_mem=256MB,checkpoint_segments=30,autovacuum=off,fsync=on | 00:04:40.792329 2013-03-29 11:45:35.545366 | 2013-03-29 11:49:27.979684 | shared_buffers=512MB,work_mem=30MB,maintenance_work_mem=256MB,checkpoint_segments=30,autovacuum=off,fsync=off | 00:03:52.434318 (5 rows) postgres=#\q [postgres@pgsqldb-master pgxlog]#sync
#将数据刷到硬盘上,一定要执行上面的语句
我们可以看到关闭fsync后提升的性能是非常明显的,原因很简单,不用磁盘同步,整块的写入提高了io的性能 看文档调整full_page_writes参数可以减少IO量,减少wal的页写入量,注意,如果在线上系统不要关闭这个参数来提高性能,否则可能导致你的数据系统受到破坏。它的危害类似于fsync 修改方法:打开postgresql.conf找到#full_page_writes = on B项目,修改成full_page_writes = off ,注意前面的”#“要去掉,这个参数修改后要重启postgresql服务才能生效 [email][postgres@pgsqldb-master[/email] pgxlog]$ cd /pgdata/data [email][postgres@pgsqldb-master[/email] data]$ vi postgresql.conf #修改完成后按esc键,再输入:wq退出 [email][postgres@pgsqldb-master[/email] data]$ cd .. [email][postgres@pgsqldb-master[/email] pgdata]$ pg_ctl restart -D ./data/ -l ./data/logfile.txt & 2&>1 [1] 3815 [email][postgres@pgsqldb-master[/email] pgdata]$ waiting for server to shut down.... done server stopped server starting [1]+ Done pg_ctl restart -D ./data/ -l ./data/logfile.txt [email][postgres@pgsqldb-master[/email] pgdata]$ [email][postgres@pgsqldb-master[/email] pgdata]$ psql psql (9.2.3) Type "help" for help. postgres=# select current_setting('shared_buffers') AS shared_buffers ,current_setting('work_mem') as work_mem,current_setting('maintenance_work_mem') as maintenance_work_men,current_setting('checkpoint_segments') as checkpoint_segments,current_setting('fsync') as fsync,current_setting('autovacuum') as autovacuum, current_setting('full_page_writes') as full_page_writes ; shared_buffers | work_mem | maintenance_work_men | checkpoint_segments | fsync | autovacuum | full_page_writes ----------------+----------+----------------------+---------------------+-------+------------+------------------ 512MB | 1MB | 16MB | 30 | off | off | off (1 row) postgres=#
六、调整参数full_page_writes=off后测试
[postgres@pgsqldb-master pgdata]$ cd /pgxlog [postgres@pgsqldb-master pgxlog]$ psql -f res.sql ...... ...... [postgres@pgsqldb-master pgxlog]$ psql psql (9.2.3) Type "help" for help. postgres=# select *,jssj-kssj as hs from res order by kssj; kssj | jssj | bz | hs ----------------------------+----------------------------+----------------------------------------------------------------------------------------------------------------------- -------------+----------------- 2013-03-28 22:43:28.376343 | 2013-03-28 22:49:48.341472 | shared_buffers=32MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=3,autovacuum=on,fsync=on | 00:06:19.965129 2013-03-28 23:08:54.526256 | 2013-03-28 23:15:04.926526 | shared_buffers=32MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=30,autovacuum=off,fsync=on | 00:06:10.40027 2013-03-29 09:23:28.152466 | 2013-03-29 09:29:23.295784 | shared_buffers=512MB,work_mem=1MB,maintenance_work_mem=16MB,checkpoint_segments=30,autovacuum=off,fsync=on | 00:05:55.143318 2013-03-29 10:05:33.708843 | 2013-03-29 10:10:14.501172 | shared_buffers=512MB,work_mem=30MB,maintenance_work_mem=256MB,checkpoint_segments=30,autovacuum=off,fsync=on | 00:04:40.792329 2013-03-29 11:45:35.545366 | 2013-03-29 11:49:27.979684 | shared_buffers=512MB,work_mem=30MB,maintenance_work_mem=256MB,checkpoint_segments=30,autovacuum=off,fsync=off | 00:03:52.434318 2013-03-29 15:03:17.025641 | 2013-03-29 15:07:09.867797 | shared_buffers=512MB,work_mem=30MB,maintenance_work_mem=256MB,checkpoint_segments=30,autovacuum=off,fsync=off,full_pag e_writes=off | 00:03:52.842156 (6 rows) postgres=#
几乎没有变法,看来是fsync起了优先的作用
总结,恢复数据的参数设置大概如下,修改前后相差 6:19 - 3:52 = 2:27 前面差不多有40%性能的提升
shared_buffers = 512MB
work_mem =30MB
maintenance_work_mem = 256MB
checkpoint_segments = 30
autovacuum =off
fsync = off
full_page_writes=off/on
七、利用pg_restore的多进程大幅提高数据恢复的速度
从CPU的角度来谈谈如何搞升级数据恢复的性能,恢复数据的过程中最慢其实就是建立索引的开销,建立索引时占用系统资源比较大主要是内存,io,cpu的使用,内存我们可以加大,硬盘我们可以将xlog,data,index分开,就是cpu有多个核心无法充分利用,这主要是指建立一个索引时无法利用多个核心,但我们可以利用多个核心同时建立不同的索引,在postgresql8.4采用pg_restore就提供的多进程支持功能,但数据备份时必需使用-Fc的格式才能支持多进程并发恢复数据,使用pg_restore恢复数据时只要加上-j x ,x代表要使用多少进程,一般我们有多少个核心,x就是多少,如我的cpu是8个核心的话就是-j 8最合适不过,多了反而会造成cpu争用
[postgres@pgsqldb-master pgxlog]$pg_dump cysoft_database -Fc > pg_restore.dump #备份一个pg_restore恢复格式的数据文件 [postgres@pgsqldb-master pgxlog]$ vi pg_restor_test.sh # 下面我们建立一个恢复测试脚本 内容如下 #!/bin/sh date; dropdb cysoft_database; createdb cysoft_database -T template0 -E sql_ascii; pg_restore -j 8 -d cysoft_database pg_restore.dump date; 最后按"ESC"键退出 [postgres@pgsqldb-master pgxlog]$chmod +x pg_restor_test.sh [postgres@pgsqldb-master pgxlog]$./pg_restor_test.sh > log.txt 2>&1 & [postgres@pgsqldb-master pgxlog]$ cat log.txt 2013年 03月 30日 星期六 19:54:04 CST 2013年 03月 30日 星期六 19:54:59 CST [postgres@pgsqldb-master pgxlog]$ [postgres@pgsqldb-master bin]$ psql cysoft_database psql (9.2.3) Type "help" for help. cysoft_database=# cysoft_database=# select sum(pg_relation_size(relname::text))/1024/1024 as tablesize from pg_class where relkind='r' and relname not like 'pg_%' and relname not like 'sql%'; tablesize ----------------------- 2626.5468750000000000 (1 row) cysoft_database=# select sum(pg_relation_size(relname::text))/1024/1024 as indexsize from pg_class where relkind='i' and relname not like 'pg_%'; indexsize ----------------------- 1156.0625000000000000 (1 row)
真的完美,指定8进程恢复,用时55秒,只有上面单进程最快的3:52的零头 ,这样的硬件能做到1分钟4G的数据量恢复已经非常不错了,而且还是指定的locale,如果是--no-locale的话还可以提高一部的性能,这个会有专门一个帖子对比
http://bbs.pgsqldb.com/client/post_show.php?zt_auto_bh=57394