PostgreSQL 硬件性能调整

最新推荐文章于 2023-01-10 17:42:01 发布

wgr1001

最新推荐文章于 2023-01-10 17:42:01 发布

阅读量1.3k

点赞数

分类专栏： other

other 专栏收录该内容

6 篇文章 0 订阅

订阅专栏

POSTGRESQL is an object-relational database developed on the Internet by a group of developers spread across the globe. It is an open-source alternative to commercial databases like Oracle and Informix.

POSTGRESQL was originally developed at the University of California at Berkeley. In 1996, a group began development of the database on the Internet. They use email to share ideas and file servers to share code. POSTGRESQL is now comparable to commercial databases in terms of features, performance, and reliability. It has transactions, views, stored procedures, and referential integrity constraints. It supports a large number of programming interfaces, including ODBC, Java (JDBC), TCL/TK, PHP, Perl, and Python. POSTGRESQL continues to improve at a tremendous pace thanks to a talented pool of Internet developers.

译者信息

开源中国七里香
翻译于 1个月前

0人顶此译文

其它翻译版本：1(点击译者名切换)
BreakingBad

POSTGRESQL 是一个对象关系型数据库，由来自全球一组网络开发者开发。它是一个可代替如Oracle、Informix商业数据库的开源版本。

POSTGRESQL 最初由加州大学伯克利分校开发。1996年，一个小组开始在互联网上开发该数据库。他们使用email分享想法，用文件服务器分享代码。POSTGRESQL现在在功能方面、性能方面以及可靠性上可与商业数据库比肩。它支持事务、视图、存储过程和参考完整性约束。它也支持大量的编程接口，包括ODBC、Java(JDBC)、TCL/TK、PHP、Perl以及Python。得益于互联网开发者人才库，POSTGRESQL 还有广阔的增长空间。

译者信息

BreakingBad
翻译于 1个月前

0人顶此译文

其它翻译版本：1(点击译者名切换)
开源中国七里香

POSTGRESQL 由一群分布在世界各地的开发者，在互联网上合作开发的一款对象关系数据库。是替代 Oracle 和 Informix 这种商业数据库的开源软件。

POSTGRESQL 最初由加州大学伯克利分校开发。从1996年开始，由一群人通过互联网合作继续开发。他们利用电子邮件沟通，用文件服务器共享代码。现在的 POSTGRESQL 无论是功能，性能还是稳定性，都可以跟商业数据库媲美。它支持事务，视图，存储过程和引用型的完整性约束（referential integrity constraints）。提供大量程序接口，包括 ODBC， Java (JDBC)，TCL/TK，PHP，Perl 和 Python。托互联网上这帮天才的福， POSTGRESQL 将大步向前走。

Performance Concepts

There are two aspects of database performance tuning. One is improving the database's use of the CPU, memory, and disk drives in the computer. The second is optimizing the queries sent to the database. This article talks about the hardware aspects of performance tuning. The optimization of queries is done using SQL commands likeCREATE INDEX, VACUUM, VACUUM FULL, ANALYZE, CLUSTER, and EXPLAIN. These are discussed in my book, PostgreSQL: Introduction and Concepts at http://momjian.us/main/writings/pgsql/aw_pgsql_book/.

To understand hardware performance issues, it is important to understand what is happening inside the computer. For simplicity, a computer can be thought of as a central processing unit (CPU) surrounded by storage. On the same chip with the CPU are several CPUregisters which store intermediate results and various pointers and counters. Surrounding this is the CPU cache which holds the most recently accessed information. Beyond the CPU cache is a large amount of random-access main memory (RAM) which holds executing programs and data. Beyond this main memory are disk drives, which store even larger amounts of information. Disk drives are the only permanent storage area, so anything to be kept when the computer is turned off must be placed there. In summary, here are the storage areas surrounding the CPU:

$\includegraphics[height=0.25\textheight]{caches}$

Storage Area	Measured in
CPU registers	bytes
CPU cache	kilobytes
RAM	megabytes
disk drives	gigabytes

You can see that storage areas increase in size as they get farther from the CPU. Ideally, a huge amount of permanent memory could be placed right next to the CPU, but this would be too slow and expensive. In practice, the most frequently used information is stored next to the CPU, and less frequently accessed information is stored farther away and brought to the CPU as needed.

译者信息

MagicBLS
翻译于 1个月前

0人顶此译文

性能概念

数据库性能优化有两个方面。一方面是提高数据库对电脑CPU,内存和硬盘的使用。另一方面是最优化传递到数据库的查询。这篇文章讨论的是在硬件方面优化数据库性能。通过使用例如：CREATE INDEX,VACUUM,VACUUM FULL,ANALYZE,CLUSTER和EXPLAIN这些数据库SQL命令，插叙查询的最优化已经完成了。这些在我写的《PostgreSQL:Introduction and Concepts》（http://momjian.us/main/writings/pgsql/aw_pgsql_book/）这本书中已经讨论过了。

为了理解硬件性能的问题，就必须理解在电脑的内部发生了什么。简单的说，一台电脑可以被视为一个被存储器包围的中央处理单元（CPU)。在和CPU同一小片上的是不同的寄存器，它们保存了中间运算结果和各种指针以及计数器。包围这些的是CPU cache，其中有最新的访问信息。越过CPU cache是大量的随机存取存储器（RAM），它保存了正在运行的程序以及数据。在RAM的外围就是硬盘了，它保存了更加多的信息。硬盘是唯一可以永久存储信息的区域。，所以电脑关机后，所有被保存下来的信息都在这里。归纳起来，这些是包围CPU的存储区域：

$\includegraphics[height=0.25\textheight]{caches}$

存储区域容量

CPU寄存器几字节

CPU高速缓存几千字节

RAM 几兆字节

硬盘几千兆字节

你可以看到储存大小随着离CPU距离的增加而增加。理论上，大容量的永久存储可以被安置在CPU的旁边，但是这将变的很慢而且很昂贵。实际当中，最常用的信息被放在CPU的旁边，而不怎么用的信息就放得离CPU远远的。在CPU需要的时候再拿给CPU。

Keeping Information Near the CPU

Moving information between various storage areas happens automatically. Compilers determine which information should be stored in registers. CPU chip logic keeps recently used information in the CPUcache. The operating system controls which information is stored in RAM and shuttles it back and forth from the disk drive.

CPU registers and the CPU cache cannot be effectively tuned by the database administrator. Effective database tuning involves increasing the amount of useful information in RAM, thus preventing disk access where possible.

You might think this is easy to do, but it is not. A computer's RAMcontains many things:

executing programs
program data and stack
POSTGRESQL shared buffer cache
kernel disk buffer cache
kernel

Proper tuning involves keeping as much database information in RAMas possible while not adversely affecting other areas of the operating system.

译者信息

BreakingBad
翻译于 1个月前

0人顶此译文

缩短数据与 CPU 的距离

数据在各种存储区域的转移是自动执行的。编译器决定哪些数据存在寄存器里头。CPU 决定哪些数据存在缓存里面。操作系统负责内存和硬盘之间的数据交换。

数据库管理员对 CPU 的寄存器和缓存无能为力。要提高数据库的性能，只能通过增加内存中的有用数据量，从而减少磁盘访问来获得。

看似简单，其实不然，内存中的数据包含很多东西：

正在执行中的程序
程序的数据和堆栈
POSTGRESQL 共享缓存
内核磁盘缓存
内核

理想的性能调整，既要增加内存中的数据库数据占有量，又不能对系统造成负面影响。

POSTGRESQL Shared Buffer Cache

$\includegraphics[height=0.35\textheight]{buffer_stack}$

POSTGRESQL does not directly change information on disk. Instead, it requests data be read into the POSTGRESQL shared buffer cache. POSTGRESQL backends then read/write these blocks, and finally flush them back to disk.

Backends that need to access tables first look for needed blocks in this cache. If they are already there, they can continue processing right away. If not, an operating system request is made to load the blocks. The blocks are loaded either from the kernel disk buffer cache, or from disk. These can be expensive operations.

The default POSTGRESQL configuration allocates 1000 shared buffers. Each buffer is 8 kilobytes. Increasing the number of buffers makes it more likely backends will find the information they need in the cache, thus avoiding an expensive operating system request. The change can be made with a postmaster command-line flag or by changing the value of shared_buffers in postgresql.conf.

译者信息

BreakingBad
翻译于 1个月前

0人顶此译文

POSTGRESQL 共享缓存

$\includegraphics[height=0.35\textheight]{buffer_stack}$

POSTGRESQL 没有直接访问磁盘，而是访问 POSTGRESQL 的缓存。然后再由 POSTGRESQL 的后台程序读写这些数据块，最后写到磁盘上。

后台首先在表中，查找缓存是否已经存在这些数据。有，就继续处理。没有，则由操作系统从内核磁盘缓存，或者直接从磁盘加载这些数据。无论哪一种，代价都很高。

POSTGRESQL 默认分配 1000 个缓存。每个缓存有 8k 字节。增加缓存的数量，能增加后台访问缓存的频率，减少代价较高的系统请求。缓存的数量，可以通过 postmaster 命令行的参数，或者配置文件 postgresql.conf 中的 shared_buffers 的值来设置。

How Big Is Too Big?

$\includegraphics[width=0.8\paperwidth]{memory}$

You may think, ``I will just give all my RAM to the POSTGRESQL shared buffer cache.'' However, if you do that, there will be no room for the kernel or for any programs to run. The proper size for the POSTGRESQL shared buffer cache is the largest useful size that does not adversely affect other activity.

To understand adverse activity, you need to understand how UNIXoperating systems manage memory. If there is enough memory to hold all programs and data, little memory management is required. However, if everything doesn't fit in RAM, the kernel starts forcing memory pages to a disk area called swap. It moves pages that have not been used recently. This operation is called a swap pageout. Pageouts are not a problem because they happen during periods of inactivity. What is bad is when these pages have to be brought back in from swap, meaning an old page that was moved out to swap has to be moved back into RAM. This is called a swap pagein. This is bad because while the page is moved from swap, the program is suspended until the pagein completes.

Pagein activity is shown by system analysis tools like vmstatand sar and indicates there is not enough memory available to function efficiently. Do not confuse swap pageins with ordinary pageins, which may include pages read from the filesystem as part of normal system operation. If you can't find swap pageins, many pageouts is a good indicator you are are also doing swap pageins.

译者信息

BreakingBad
翻译于 1个月前

0人顶此译文

多大才算太大?

$\includegraphics[width=0.8\paperwidth]{memory}$

你可能在想， “那我把所有的内存都分配给 POSTGRESQL 的缓冲区好了”。如果你这么做，那系统内核以及其他程序就没有内存可用了。理想的 POSTGRESQL 共享缓冲区大小，是在没有对系统产生不利影响的情况下，越大越好。

要理解什么是不利影响，首先要明白 UNIX 是如何管理内存的。要是内存容量足够大，能容下所有的程序和数据。那我们也就用不着管理内存了。问题是，内存的容量有限，所以，需要内核将内存中的数据分页，存入磁盘，这就是传说的的数据交换。原理是，将当前用不上的数据移到磁盘中。这个操作叫做交换区页面移入（swap pageout）。页面移入交换区不难，只要在程序非活跃期执行就可以。问题在于，页面重新从交换区移出来的时候。也就是，移到交换区的旧页面，又重新移回内存。这个操叫交换区移出（ swap pagein）。说它是个问题，是因为，当页面移入内存的时候，程序需要终止执行，直到移入操作完成。

系统的页面移入活跃情况，可以通过像 vmstatand sar 这种系统分析工具来查看，是否有足够的内存，维持系统的正常运作。不要把交换区页面移出，跟常规的页面移出搞混了。常规的页面移出，将页面数据从文件系统中读出来，当作是系统操作的一部分。如果你看不出，是否有交换区页面移出操作。但是交换区页面移入的操作非常活跃，这也说明，有大量的页面移出的操作正在进行。

Effects of Cache Size

You may wonder why cache size is so important. First, imagine the POSTGRESQL shared buffer cache is large enough to hold an entire table. Repeated sequential scans of the table will require no disk access because all the data is already in the cache. Now imagine the cache is one block smaller than the table. A sequential scan of the table will load all table blocks into the cache until the last one. When that block is needed, the oldest block is removed, which in this case is the first block of the table. When another sequential scan happens, the first block is no longer in the cache, and to load it in, the oldest block is removed, which in this case is now the second block in the table. This pushing out of the next needed block continues to the end of the table. This is an extreme example, but you can see that a decrease of one block can change the efficiency of the cache from 100% to 0%. It shows that finding the right cache size can dramatically affect performance.

译者信息

MagicBLS
翻译于 1个月前

0人顶此译文

高速缓存（cache)容量的影响

或许你会想为什么高速缓存的大小如此重要。首先，试想一下PostgreSQL共享缓存大到可以放下整张表。重复连续扫描这张表就不需要硬盘的参与，因为数据已经在cache里了。现在假设cache比表小一个单元。一次连续的扫描将会把所有单元载入cache直到最后一个单元。当需要最后一个单元时，最初的单元被移除。当另一次连续扫描开始的时候，最初的单元已经不再cache里了，为了载入它，最开始的单元会被移除，也就是第一次扫描时的第二个单元会被移除。这将持续进行到单元结束。这个例子很极端，但是你可以看到减少一个单元就将会把cache的效率从100%变为0%。这表明找到合适的cache容量会戏剧性的改变性能。

Proper Sizing of Shared Buffer Cache

Ideally, the POSTGRESQL shared buffer cache will be:

Large enough to hold most commonly-accessed tables
Small enough to avoid swap pagein activity

Keep in mind that the postmaster allocates all shared memory when it starts. This area stays the same size even if no one is accessing the database. Some operating systems pageout unreferenced shared memory, while others lock shared memory into RAM. Locked shared memory is preferred. The POSTGRESQL administrators guide has information about kernel configuration for various operating systems, http://developer.postgresql.org/docs/postgres/kernel-resources.html

译者信息

MagicBLS
翻译于 1个月前

0人顶此译文

合适容量的共享缓存

理论上，POSTGERSQL共享缓存将是：

它应该足够大来应付通常的表访问操作。
它应该足够小来避免 swap pagein 的发生。

记住数据库管理器运行时分配所有的共享存储。这一区域即使在没有访问数据库的请求时也保持一样大小。一些操作系统pageout未指定的共享存储，而另一些LOCK共享存储到RAM中。LOCK贡献存储更好一点。P OSTGERSQL的管理员指导手册里有关于不同操作系统核心配置的信息， http://developer.postgresql.org/docs/postgres/kernel-resources.html。

Sort Memory Batch Size

Another tuning parameter is the amount of memory used for sort batches. When sorting large tables or results, POSTGRESQL will sort them in parts, placing intermediate results in temporary files. These files are then merged and resorted until all rows are sorted. Increasing the batch size creates fewer temporary files and often allows faster sorting. However, if the sort batches are too large, they cause pageins because parts of the sort batch get paged out to swap during sorting. In these cases, it is much faster to use smaller sort batches and more temporary files, so again, swap pageins determine when too much memory has been allocated. Keep in mind this parameter is used for every backend performing a sort, either for ORDER BY, CREATE INDEX, or for a merge join. Several simultaneous sorts will use several times this amount of memory.

This value can be changed using a postmaster command-line flag or by changing the value of sort_mem in postgresql.conf.

译者信息

MagicBLS
翻译于 1个月前

1人顶此译文

其它翻译版本：1(点击译者名切换)
BreakingBad

内存排序队列大小

另一个优化参数是排序队列的内存总量。当对大表或一个记录集排序时，PostgerSQL会将他们分块排序，将中间结果存储在临时文件里。这些文件将在所有队列处理完毕后合并使用。增加每一队列的大小就会产生更少的临时文件同时加快处理速度。然后，如果队列过大，部分内存队列就会在处理过程中pageout到虚拟内存从而导致pagein的发生。因此，用小点的队列产生更多的临时文件会快很多，但又一次，当太多内存被分配时，虚拟内存pagein产生。记住这个参数是后端使用在处理批次上的，而不是 ORDER BY, CREATE INDEX或合并。多少同时的排序就会用多少倍这么多的内存。

这个值可以通过数据库管理器命令行标志改变或通过改变在postgresql.conf中sort_mem的值来改变。

译者信息

BreakingBad
翻译于 1个月前

0人顶此译文

其它翻译版本：1(点击译者名切换)
MagicBLS

批量排序的内存规模

另一项能调节性能的参数是，用做批量排序的内存容量。当对大量数据排序时， POSTGRESQL 会将他们拆分成许多小的数据块进行排序。然后将中间结果存在临时文件里面。这些文件最终被合并，重新排序，直到所有的数据行的排序完毕。增加批量处理的内存规模，能减少临时文件的数量。从而提高排序速度。不过，如果批量处理的规模设置太大，会导致交换区的分页操作变得更频繁。这种情况下，使用大量临时文件的小规模批量排序速度比较快。所以，由交换区分页活跃程度，决定内存是不是被过量分配。记住，这个参数是给后台执行排序用的。如： ORDER BY， CREATE INDEX，或者数据合并。有几个并行排序任务，就需要几倍这样的内存容量。

这个参数的值，可以通过 postmaster 命令行参数，或者配置文件 postgresql.conf 中的 sort_mem 来设置。

Cache Size and Sort Size

Both cache size and sort size affect memory usage, so you cannot maximize one without affecting the other. Keep in mind that cache size is allocated on postmaster startup, while sort size varies depending on the number of sorts being performed. Generally, cache size is more significant than sort size. However, certain queries that use ORDER BY, CREATE INDEX, or merge joins may see speedups with larger sort batch sizes.

Also, many operating systems limit how much shared memory can be allocated. Increasing this limit requires operating system-specific knowledge to either recompile or reconfigure the kernel. More information can be found in the POSTGRESQL administrators guide, http://developer.postgresql.org/docs/postgres/kernel-resources.html.

译者信息

BreakingBad
翻译于 1个月前

0人顶此译文

缓存规模和排序规模

缓存规模和排序规模都会影响内存的使用。你不可能增加一个的规模，而不影响另外一个。记住，缓存的规模是在 postmaster 启动的时候，就设好的。而排序的规模择由排序的数量决定。一般情况下，缓存规模要大过排序的规模。不过，某些用到 ORDER BY， CREATE INDEX 或数据合并的查询，可以通过加大排序规模来提速。

此外，许多操作系统对共享内存的分配有限制。修改这一限制，就意味着，要重新编译或者配置内核。也就是说，你要对操作系统这方面相当熟练才行。更多信息，参考 POSTGRESQL 管理员操作手册，http://developer.postgresql.org/docs/postgres/kernel-resources.html.

As a start for tuning, use 15% of RAM for cache size, and 2-4% for sort size if you have just a few big sessions, and much smaller if you have lots of small sessions. You can try increasing it to see if performance improves and if no swapping occurs. If the shared buffers are oversized, you waste overhead maintaining too many buffers, and it takes RAM that could be used by other processes and as additional kernel disk buffer cache.

A valuable server parameter is effective_cache_size. This parameter is used by the optimizer to estimate the size of the kernel's disk buffer cache. In kernels with a unified buffer cache, this value should be set to the average amount of unused RAM in the kernel because such kernels use unused RAM to cache recently accessed disk pages. On kernels with a fixed-sized disk buffer cache, this should be set to the size of your kernel buffer cache, typically 10% of RAM.

译者信息

Ley
翻译于 1个月前

0人顶此译文

在调整的开始，使用15%的RAM作为缓存大小，如果有几个大的事物就用2-4%的内存做排序大小，如果你有很多小事物的话就使用更小的内存。你可以尝试提高它来看看性能是否提升，swapping交换是否发生。如果共享缓存过大，你就花费太多时间来维护大量的缓存，而且它会浪费掉本可以被其他进程使用的RAM，无法作为额外的内核磁盘的缓存。

有价值的服务器参数是effective_cache_size。这个参数被优化器用来估计内核磁盘缓存的大小。在使用统一缓存的内核里，这个值应该设为内核未使用RAM的平均值，因为这样内核就可以使用未使用的RAM来缓存最近访问的磁盘页。在有固定磁盘缓存的内核里，这个值应该设为内核缓存的大小，一般为RAM的10%。