POSTGRESQL is an object-relational database developed on the Internet by a group of developers spread across the globe. It is an open-source alternative to commercial databases like Oracle and Informix. POSTGRESQL was originally developed at the University of California at Berkeley. In 1996, a group began development of the database on the Internet. They use email to share ideas and file servers to share code. POSTGRESQL is now comparable to commercial databases in terms of features, performance, and reliability. It has transactions, views, stored procedures, and referential integrity constraints. It supports a large number of programming interfaces, including ODBC, Java (JDBC), TCL/TK, PHP, Perl, and Python. POSTGRESQL continues to improve at a tremendous pace thanks to a talented pool of Internet developers. | 译者信息
POSTGRESQL 是一个对象关系型数据库,由来自全球一组网络开发者开发。它是一个可代替如Oracle、Informix商业数据库的开源版本。 POSTGRESQL 最初由加州大学伯克利分校开发。1996年,一个小组开始在互联网上开发该数据库。他们使用email分享想法,用文件服务器分享代码。POSTGRESQL现在在功能方面、性能方面以及可靠性上可与商业数据库比肩。它支持事务、视图、存储过程和参考完整性约束。它也支持大量的编程接口,包括ODBC、Java(JDBC)、TCL/TK、PHP、Perl以及Python。得益于互联网开发者人才库,POSTGRESQL 还有广阔的增长空间。 | 译者信息
POSTGRESQL 由一群分布在世界各地的开发者,在互联网上合作开发的一款对象关系数据库。是替代 Oracle 和 Informix 这种商业数据库的开源软件。 POSTGRESQL 最初由加州大学伯克利分校开发。从1996年开始, 由一群人通过互联网合作继续开发。他们利用电子邮件沟通, 用文件服务器共享代码。现在的 POSTGRESQL 无论是功能, 性能还是稳定性, 都可以跟商业数据库媲美。它支持事务, 视图, 存储过程和引用型的完整性约束(referential integrity constraints)。提供大量程序接口,包括 ODBC, Java (JDBC),TCL/TK,PHP,Perl 和 Python。托互联网上这帮天才的福, POSTGRESQL 将大步向前走。 | |||||||||
Performance ConceptsThere are two aspects of database performance tuning. One is improving the database's use of the CPU, memory, and disk drives in the computer. The second is optimizing the queries sent to the database. This article talks about the hardware aspects of performance tuning. The optimization of queries is done using SQL commands likeCREATE INDEX, VACUUM, VACUUM FULL, ANALYZE, CLUSTER, and EXPLAIN. These are discussed in my book, PostgreSQL: Introduction and Concepts at http://momjian.us/main/writings/pgsql/aw_pgsql_book/. To understand hardware performance issues, it is important to understand what is happening inside the computer. For simplicity, a computer can be thought of as a central processing unit (CPU) surrounded by storage. On the same chip with the CPU are several CPUregisters which store intermediate results and various pointers and counters. Surrounding this is the CPU cache which holds the most recently accessed information. Beyond the CPU cache is a large amount of random-access main memory (RAM) which holds executing programs and data. Beyond this main memory are disk drives, which store even larger amounts of information. Disk drives are the only permanent storage area, so anything to be kept when the computer is turned off must be placed there. In summary, here are the storage areas surrounding the CPU:
You can see that storage areas increase in size as they get farther from the CPU. Ideally, a huge amount of permanent memory could be placed right next to the CPU, but this would be too slow and expensive. In practice, the most frequently used information is stored next to the CPU, and less frequently accessed information is stored farther away and brought to the CPU as needed. | 译者信息
性能概念数据库性能优化有两个方面。一方面是提高数据库对电脑CPU,内存和硬盘的使用。另一方面是最优化传递到数据库的查询。这篇文章讨论的是在硬件方面优化数据库性能。通过使用例如:CREATE INDEX,VACUUM,VACUUM FULL,ANALYZE,CLUSTER和EXPLAIN这些数据库SQL命令,插叙查询的最优化已经完成了。这些在我写的《PostgreSQL:Introduction and Concepts》(http://momjian.us/main/writings/pgsql/aw_pgsql_book/)这本书中已经讨论过了。 为了理解硬件性能的问题,就必须理解在电脑的内部发生了什么。简单的说,一台电脑可以被视为一个被存储器包围的中央处理单元(CPU)。在和CPU同一小片上的是不同的寄存器,它们保存了中间运算结果和各种指针以及计数器。包围这些的是CPU cache,其中有最新的访问信息。越过CPU cache是大量的随机存取存储器(RAM),它保存了正在运行的程序以及数据。在RAM的外围就是硬盘了,它保存了更加多的信息。硬盘是唯一可以永久存储信息的区域。,所以电脑关机后,所有被保存下来的信息都在这里。归纳起来,这些是包围CPU的存储区域:
存储区域 容量 CPU寄存器 几字节 CPU高速缓存 几千字节 RAM 几兆字节 硬盘 几千兆字节 你可以看到储存大小随着离CPU距离的增加而增加。理论上,大容量的永久存储可以被安置在CPU的旁边,但是这将变的很慢而且很昂贵。实际当中,最常用的信息被放在CPU的旁边,而不怎么用的信息就放得离CPU远远的。在CPU需要的时候再拿给CPU。 | ||||||||||
Keeping Information Near the CPUMoving information between various storage areas happens automatically. Compilers determine which information should be stored in registers. CPU chip logic keeps recently used information in the CPUcache. The operating system controls which information is stored in RAM and shuttles it back and forth from the disk drive. CPU registers and the CPU cache cannot be effectively tuned by the database administrator. Effective database tuning involves increasing the amount of useful information in RAM, thus preventing disk access where possible. You might think this is easy to do, but it is not. A computer's RAMcontains many things:
Proper tuning involves keeping as much database information in RAMas possible while not adversely affecting other areas of the operating system. | 译者信息
缩短数据与 CPU 的距离数据在各种存储区域的转移是自动执行的。编译器决定哪些数据存在寄存器里头。CPU 决定哪些数据存在缓存里面。 操作系统负责内存和硬盘之间的数据交换。 数据库管理员对 CPU 的寄存器和缓存无能为力。要提高数据库的性能,只能通过增加内存中的有用数据量, 从而减少磁盘访问来获得。 看似简单, 其实不然, 内存中的数据包含很多东西:
理想的性能调整, 既要增加内存中的数据库数据占有量,又不能对系统造成负面影响。 | ||||||||||
POSTGRESQL Shared Buffer CachePOSTGRESQL does not directly change information on disk. Instead, it requests data be read into the POSTGRESQL shared buffer cache. POSTGRESQL backends then read/write these blocks, and finally flush them back to disk. Backends that need to access tables first look for needed blocks in this cache. If they are already there, they can continue processing right away. If not, an operating system request is made to load the blocks. The blocks are loaded either from the kernel disk buffer cache, or from disk. These can be expensive operations. The default POSTGRESQL configuration allocates 1000 shared buffers. Each buffer is 8 kilobytes. Increasing the number of buffers makes it more likely backends will find the information they need in the cache, thus avoiding an expensive operating system request. The change can be made with a postmaster command-line flag or by changing the value of shared_buffers in postgresql.conf. | 译者信息
POSTGRESQL 共享缓存
POSTGRESQL 没有直接访问磁盘,而是访问 POSTGRESQL 的缓存。然后再由 POSTGRESQL 的后台程序读写这些数据块, 最后写到磁盘上。 后台首先在表中,查找缓存是否已经存在这些数据。 有, 就继续处理。没有, 则由操作系统从内核磁盘缓存, 或者直接从磁盘加载这些数据。无论哪一种,代价都很高。 POSTGRESQL 默认分配 1000 个缓存。每个缓存有 8k 字节。增加缓存的数量,能增加后台访问缓存的频率,减少代价较高的系统请求。缓存的数量,可以通过 postmaster 命令行的参数, 或者配置文件 postgresql.conf 中的 shared_buffers 的值来设置。 | ||||||||||
How Big Is Too Big? | 译者信息
多大才算太大? | ||||||||||
Effects of Cache SizeYou may wonder why cache size is so important. First, imagine the POSTGRESQL shared buffer cache is large enough to hold an entire table. Repeated sequential scans of the table will require no disk access because all the data is already in the cache. Now imagine the cache is one block smaller than the table. A sequential scan of the table will load all table blocks into the cache until the last one. When that block is needed, the oldest block is removed, which in this case is the first block of the table. When another sequential scan happens, the first block is no longer in the cache, and to load it in, the oldest block is removed, which in this case is now the second block in the table. This pushing out of the next needed block continues to the end of the table. This is an extreme example, but you can see that a decrease of one block can change the efficiency of the cache from 100% to 0%. It shows that finding the right cache size can dramatically affect performance. | 译者信息
高速缓存(cache)容量的影响或许你会想为什么高速缓存的大小如此重要。首先,试想一下PostgreSQL共享缓存大到可以放下整张表。重复连续扫描这张表就不需要硬盘的参与,因为数据已经在cache里了。现在假设cache比表小一个单元。一次连续的扫描将会把所有单元载入cache直到最后一个单元。当需要最后一个单元时,最初的单元被移除。当另一次连续扫描开始的时候,最初的单元已经不再cache里了,为了载入它,最开始的单元会被移除,也就是第一次扫描时的第二个单元会被移除。这将持续进行到单元结束。这个例子很极端,但是你可以看到减少一个单元就将会把cache的效率从100%变为0%。这表明找到合适的cache容量会戏剧性的改变性能。 | ||||||||||
Proper Sizing of Shared Buffer CacheIdeally, the POSTGRESQL shared buffer cache will be:
Keep in mind that the postmaster allocates all shared memory when it starts. This area stays the same size even if no one is accessing the database. Some operating systems pageout unreferenced shared memory, while others lock shared memory into RAM. Locked shared memory is preferred. The POSTGRESQL administrators guide has information about kernel configuration for various operating systems, http://developer.postgresql.org/docs/postgres/kernel-resources.html | 译者信息
合适容量的共享缓存理论上,POSTGERSQL共享缓存将是:
记住数据库管理器运行时分配所有的共享存储。这一区域即使在没有访问数据库的请求时也保持一样大小。一些操作系统pageout未指定的共享存储,而另一些LOCK共享存储到RAM中。LOCK贡献存储更好一点。P OSTGERSQL的管理员指导手册里有关于不同操作系统核心配置的信息, http://developer.postgresql.org/docs/postgres/kernel-resources.html。 | ||||||||||
Sort Memory Batch Size | 译者信息
内存排序队列大小另一个优化参数是排序队列的内存总量。当对大表或一个记录集排序时,PostgerSQL会将他们分块排序,将中间结果存储在临时文件里。这些文件将在所有队列处理完毕后合并使用。增加每一队列的大小就会产生更少的临时文件同时加快处理速度。然后,如果队列过大,部分内存队列就会在处理过程中pageout到虚拟内存从而导致pagein的发生。因此,用小点的队列产生更多的临时文件会快很多,但又一次,当太多内存被分配时,虚拟内存pagein产生。记住这个参数是后端使用在处理批次上的,而不是 ORDER BY, CREATE INDEX或合并。多少同时的排序就会用多少倍这么多的内存。 这个值可以通过数据库管理器命令行标志改变或通过改变在postgresql.conf中sort_mem的值来改变。 | 译者信息
批量排序的内存规模 | |||||||||
Cache Size and Sort Size | 译者信息
缓存规模和排序规模 | ||||||||||
As a start for tuning, use 15% of RAM for cache size, and 2-4% for sort size if you have just a few big sessions, and much smaller if you have lots of small sessions. You can try increasing it to see if performance improves and if no swapping occurs. If the shared buffers are oversized, you waste overhead maintaining too many buffers, and it takes RAM that could be used by other processes and as additional kernel disk buffer cache. A valuable server parameter is effective_cache_size. This parameter is used by the optimizer to estimate the size of the kernel's disk buffer cache. In kernels with a unified buffer cache, this value should be set to the average amount of unused RAM in the kernel because such kernels use unused RAM to cache recently accessed disk pages. On kernels with a fixed-sized disk buffer cache, this should be set to the size of your kernel buffer cache, typically 10% of RAM. | 译者信息
在调整的开始,使用15%的RAM作为缓存大小,如果有几个大的事物就用2-4%的内存做排序大小,如果你有很多小事物的话就使用更小的内存。你可以尝试提高它来看看性能是否提升,swapping交换是否发生。如果共享缓存过大,你就花费太多时间来维护大量的缓存,而且它会浪费掉本可以被其他进程使用的RAM,无法作为额外的内核磁盘的缓存。 有价值的服务器参数是effective_cache_size。这个参数被优化器用来估计内核磁盘缓存的大小。在使用统一缓存的内核里,这个值应该设为内核未使用RAM的平均值,因为这样内核就可以使用未使用的RAM来缓存最近访问的磁盘页。在有固定磁盘缓存的内核里,这个值应该设为内核缓存的大小,一般为RAM的10%。 |
PostgreSQL 硬件性能调整
最新推荐文章于 2023-01-10 17:42:01 发布