理解 postgresql.conf 的work_mem 参数配置-CSDN博客

今天我们着重来了解 postgresql.conf 中的 work_mem 参数

官方文档描述如下：

指定在写入临时文件之前内部排序操作和散列表使用的内存量。该值默认为1MB。请注意，对于复杂查询，可能会同时运行多个
排序和散列操作；在开始向临时文件中写数据之前，每一个操作将会被允许使用尽可能多的内存。而且，多个运行会话(session)
可以同时进行这样的操作。因此总的内存使用是work_mem的很多倍，在对 work_mem 做参数调整时，要考虑上述事实。排序操
做用于 ORDER BY, DISTINCT 和mergejoin。散列表用于hash join, 基于散列的聚集操作，基于散列的 IN 子查询。
进行这样的操作

上面的描述非常恰当，让我们来看一些具体的例子：
正如上面所介绍的， work_mem 是用于限制操作的内存使用的，这关系到每个操作的排序和哈希。

从排序开始，让我们来看它是如何工作的。

建立测试表

`CREATE TABLE test (id serial primary key, random_text text);`

`--- 使用脚本生成测试数据`

$perl -e '@c=("a".."z", "A".."Z", 0..9); print join("", map{$c[rand@c]}10..20+rand(40))."n" for 1..1000000' > \ /home/whatcat/test/random_strings

`-- 复制数据到数据库的表中`

`copy test (random_text) from '/home/postgres/test/random_strings'`

查看导入数据的情况

`select * from test limit 5;`

idrandom_text1pktMlgRik7d4zhaOqv5dmP3CsjzE2T5uGSsXl8tgN9dh1c0E7ZE3G7TIhV5TpKHMNjpitcbsrga9rPM5mk6GVeEgyXoxmUACaIh2D84hds5sxUGAFM9YVy7SLbLN8WO96XKiZQI9IZ5ZwNHzmi8C5x5xyhPE6v5nUek5jSQYlGr8ZQ6WW1E8D6osRsXCr7AQYtBAbD44yDCxYcazTaDx1bbEe7VASl3E51iPNEgWvP1BkbBmT9BT5kLC4k2Wa7EdP6H8ts7yOgM4qXMld1fajvxDX4KQSpTc89NZUAklMSJsfXVbNfjOPVtkQEtQ110xlTBIF49zJv5SsGygSpGKQDZMGoJ4q3GqvTsVaJBTwK0Z

(10 rows)

Time: 0.620 ms

因为测试表具有 serial 数据类型的 id 字段，所以对于我们使用排序操作去限制输出前 10， 100， 1000 或者更多记录，
同时 random_text 字段没有索引，将会执行非常大的排序。
开始第一个测试

`show work_mem`

`work_mem`

4MB (1 row)

`explain analyze select * from test where id <= 10 order by random_text asc;`

                                                      QUERY PLAN

Sort (cost=60559.43..61430.65 rows=348488 width=36) (actual time=0.037..0.037 rows=10 loops=1) Sort Key: random_text Sort Method: quicksort Memory: 25kB -> Bitmap Heap Scan on test (cost=6361.21..18949.31 rows=348488 width=36) ( actual time=0.018..0.019 rows=10 loops=1)

    Recheck Cond: (id <= 10)
    Heap Blocks: exact=1
    ->  Bitmap Index Scan on test_pkey  (cost=0.00..6274.09 rows=348488 wid

th=0) (actual time=0.012..0.012 rows=10 loops=1)

          Index Cond: (id <= 10)

Planning time: 0.084 ms Execution time: 0.067 ms (10 rows)

Time: 0.503 ms

从上述执行计划我们可以看到，这个查询使用的内存是 25KB， work_mem 的限制大小为 4MB 。

所以，让我们来看当数据足增大， work_mem 依旧是 4MB. 每一个查询都运行三次，来看下面查询数量下的
实验情况。
rows Sort method sort mem use
10 q 0.503 ms 25 KB
100 q 0.670 ms 33 KB
1000 q 1.189 ms 113 KB
10000 q 10.857 ms 1266 KB
100000 em 163.738 ms 4432 KB
1000000 em 1668.335 ms
44464 KB

我们可以看到在测试数据为 100K 条记录时， PostgreSQL 选择从内存中执行 quicksort 切换到外部合并的方法（参看查询计划）。有意思的是--处理每一条记录的时间并没有显著增加，但是这可能是因为我们处理的数据集依旧很小的原因，如果我们
的机器具有相当多的内存, 因此内核会缓存大量的临时文件数据。

不过，为什么 PostgreSQL 会放弃内存的排序 qsort 去选择disk的外排序，这个原因很简单，当work_mem 不够用时，就会去使
用磁盘，所以意味着它已经被填满了。所以，按“磁盘排序意味着或多或少使用了整个 work_mem 加上 4432KB 的磁盘。

如何纯内存的工作呢？

`set work_mem = '8MB';`

`explain analyze select * from test where id <= 1000000 order by random_text asc;`

                                                  QUERY PLAN

Sort (cost=147735.34..150235.34 rows=1000000 width=35) (actual time=852.386..1 419.290 rows=1000000 loops=1) Sort Key: random_text Sort Method: external merge Disk: 44384kB -> Seq Scan on test (cost=0.00..20732.00 rows=1000000 width=35) (actual tim e=0.018..203.833 rows=1000000 loops=1)

    Filter: (id <= 1000000)

Planning time: 0.513 ms Execution time: 1512.669 ms (7 rows)

Time: 1514.233 ms

从上面的查询计划我们可以看到，实际的内存使用情况。
对于hash操作，哈希算子不会溢出到磁盘上（最起码目前没有提及），而是通过增加 “batches” 的数量实现。

hash 操作的实验

`explain analyze select * from test a join test b using (random_text) where a.id < 10;`

                                                      QUERY PLAN

Hash Join (cost=8.66..21990.75 rows=8 width=39) (actual time=0.093..321.716 ro ws=9 loops=1) Hash Cond: (b.random_text = a.random_text) -> Seq Scan on test b (cost=0.00..18232.00 rows=1000000 width=35) (actual t ime=0.023..143.559 rows=1000000 loops=1) -> Hash (cost=8.56..8.56 rows=8 width=35) (actual time=0.016..0.016 rows=9 loops=1)

    Buckets: 1024  Batches: 1  Memory Usage: 9kB
    ->  Index Scan using test_pkey on test a  (cost=0.42..8.56 rows=8 width

=35) (actual time=0.005..0.008 rows=9 loops=1)

          Index Cond: (id < 10)

Planning time: 33.548 ms Execution time: 321.775 ms (9 rows)

Time: 396.186 ms

数据量增大

`explain analyze select * from test a join test b using (random_text) where a.id < 100000;`

                                                            QUERY PLAN

Hash Join (cost=5604.65..44961.38 rows=98174 width=39) (actual time=66.299..61 1.290 rows=99999 loops=1) Hash Cond: (b.random_text = a.random_text) -> Seq Scan on test b (cost=0.00..18232.00 rows=1000000 width=35) (actual t ime=0.013..150.562 rows=1000000 loops=1) -> Hash (cost=3610.47..3610.47 rows=98174 width=35) (actual time=65.906..65 .906 rows=99999 loops=1)

    Buckets: 65536  Batches: 2  Memory Usage: 3880kB
    ->  Index Scan using test_pkey on test a  (cost=0.42..3610.47 rows=9817

4 width=35) (actual time=0.018..33.366 rows=99999 loops=1)

          Index Cond: (id < 100000)

Planning time: 0.168 ms Execution time: 620.401 ms (9 rows)

Time: 620.979 ms

或者如下数据量

`explain analyze select * from test a join test b using (random_text) where a.id < 1000000;`

                                                      QUERY PLAN

Hash Join (cost=38545.00..96466.00 rows=1000000 width=39) (actual time=445.161 ..2030.208 rows=999999 loops=1) Hash Cond: (a.random_text = b.random_text) -> Seq Scan on test a (cost=0.00..20732.00 rows=1000000 width=35) (actual t ime=0.013..245.331 rows=999999 loops=1)

    Filter: (id < 1000000)
    Rows Removed by Filter: 1

-> Hash (cost=18232.00..18232.00 rows=1000000 width=35) (actual time=444.69 2..444.692 rows=1000000 loops=1)

    Buckets: 65536  Batches: 32  Memory Usage: 2586kB
    ->  Seq Scan on test b  (cost=0.00..18232.00 rows=1000000 width=35) (ac

tual time=0.004..153.395 rows=1000000 loops=1) Planning time: 0.251 ms Execution time: 2116.081 ms (10 rows)

Time: 2116.709 ms

那么将 work_mem 调高是否会有帮助呢？这个想法是通过调高 work_mem 使得hash使用更少的 "batches",散列更大，从而
使得哈希操作更快。为什么是这个想法呢？接下来这么做

`set work_mem = '100MB';`

`explain analyze select * from test a join test b using (random_text) where a.id < 1000000;`

                                                      QUERY PLAN

Hash Join (cost=30732.00..65214.00 rows=1000000 width=39) (actual time=495.487 ..1359.246 rows=999999 loops=1) Hash Cond: (a.random_text = b.random_text) -> Seq Scan on test a (cost=0.00..20732.00 rows=1000000 width=35) (actual t ime=0.014..197.752 rows=999999 loops=1)

    Filter: (id < 1000000)
    Rows Removed by Filter: 1

-> Hash (cost=18232.00..18232.00 rows=1000000 width=35) (actual time=492.33 5..492.335 rows=1000000 loops=1)

    Buckets: 1048576  Batches: 1  Memory Usage: 74107kB
    ->  Seq Scan on test b  (cost=0.00..18232.00 rows=1000000 width=35) (ac

tual time=0.006..154.439 rows=1000000 loops=1) Planning time: 0.202 ms Execution time: 1452.360 ms (10 rows)

Time: 1452.967 ms

我们可以看到 hash 使用了的内存为，速度上变化不大。

我们将 work_mem 调整到很低的参数时，是否会获得一定的性能差异显示呢？

`set work_mem='1MB';`

`explain analyze select * from test a join test b using (random_text) where a.id < 1000000;`

                                                      QUERY PLAN

Hash Join (cost=38545.00..96466.00 rows=1000000 width=39) (actual time=467.893 ..1960.566 rows=999999 loops=1) Hash Cond: (a.random_text = b.random_text) -> Seq Scan on test a (cost=0.00..20732.00 rows=1000000 width=35) (actual t ime=0.032..255.996 rows=999999 loops=1)

    Filter: (id < 1000000)
    Rows Removed by Filter: 1

-> Hash (cost=18232.00..18232.00 rows=1000000 width=35) (actual time=467.16 4..467.164 rows=1000000 loops=1)

    Buckets: 16384  Batches: 128  Memory Usage: 651kB
    ->  Seq Scan on test b  (cost=0.00..18232.00 rows=1000000 width=35) (ac

tual time=0.004..162.964 rows=1000000 loops=1) Planning time: 0.193 ms Execution time: 2046.817 ms (10 rows)

Time: 2047.374 ms

坦率的讲，上述实验三个结果并没有办法很好的解释，因为调整work_mem 可能会使得时间减少，但是work_mem 减少，耗时
变化也没有太明显。

正如之前的实验显示，随着work_mem 参数的增加，对于排序有提升 (在内存中完成，没有溢出到磁盘)。正是由于排序的提升，
对于依靠排序操作的 (聚集操作，去重操作， merge join) 都会加快。
如果我们将 work_mem 设置成 1GB，也会如上述结论所预测的那样吗？

事实上，这个问题是对于执行程序使用的work_mem，是有上限设置的，而不是一味的增加。
例如，我们的执行操作就像上面的实验所说的，我们启动101个排序或者hash 操作，那么我们实际需要的内存是 101*work_mem,
对于单数据库连接。

为了说明一些观点，假设你的查询语句需要 4-10 个 sort/hash 操作，你的最大连接限制参数 max_connections = 100, 那么这就意
味着，查询在100个连接下，内存消耗为 100 * work_mem。

正如，我们上面得到的结论，增加 work_mem 加快查询，但是这么大的并发量，很容易超出实际的物理内存，引起 OOM(out of
memory) 问题。在PostgreSQL 官方论坛的邮件中，一些用户将work_mem设置为 512MB 或者更大而引起OOM问题。简单的说就是你运行的复杂查询不要引起 OOM 问题.

简单的说，最好能为每一个 session 的查询设置自己的 work_mem.最好在 postgresql.conf 的值设置的很低（1-10MB）, 然后根据
那些查询会使用的 work_mem 超过设置的值，在对该查询的 session 设置对应的值。

`set work_mem ='100MB'`

`reset work_mem;`

这里就有一个想淡然的问题，我们怎么知道一个查询到底需要多少内存呢？
这非常简单-- 只需要修改 logging 的参数-- 设置 log_temp_files = 0 （所有的 log 都是临时文件），每隔一段时间
检查一次，检查最慢的查询，它们的执行计划，如果他们是依赖磁盘排序的。

对于 work_mem 的值。有一个不成文的规定就是 work_mem 限制在1GB。即使，你的物理内存足够大，你设置为 10GB ， PostgreSQL 也不会使用那么多的。

`explain analyze select * from lima where mike < 9000000 order by tango;`

sort mothod: quick memory : 1304617 KB

这里比较推荐一个网址，用来解析查询计划 explain.depesz.com

接下来介绍使用 GIN 索引情况下， work_mem 的使用。

在启用FASTUPDATE，在一系列的GIN插入操作。这会引起内存使用增长，当增长后大于work_mem, 系统会清楚暂挂
条目列表。为了避免影响我们监控响应时间，我们启动后台自动清理。可以通过增加 work_mem 或使用 autovacuum 更具
有倾略性的来清理。但是需要提一点， work_mem的增大也意味着发生清理时，耗时的增加。

GIN 索引的使用并不是很常见。当对含有 GIN 索引表执行插入是，可以将数据导入到 tsvector 上有gin 索引的表执行全局搜索。
通过增加 work_mem 可以加速响应效率。

接下来举一个具体的例子
省测绘数据的perl 代码

`!/usr/bin/perl`

use strict; use warnings; use autodie;

open my $fh, '<', '/usr/share/dict/american-english-insane'; my @dict; while (my $1 = <$fh>){

$1 =~s/\s+//;;
push @dict, $1 is $1 =~/\s/;

close $fh;

for my $size (100, 1_000, 10_000, 100_1000, 1_000_000, 10_000_000){

open my $out '>','/home/whatcat/test/' . $size . '.list';

for my $i ( 1.. $size){
    my $count = int(20 + rand 150);
    my @words = map {$dict[rand @dict]} 1..$count;
    print $out join(' ', @words) . "\n";
}
close $out;

} exit;

接下来，建表，创建索引，导入数据，删除表
设置的 work_mem 为
1MB
10MB
100MB
1GB

Script, in case you're interested:

`!/bin/bash`

for data_input_file in /home/whatcat/test/100.list /home/whatcat/test/1000.list /home/whatcat/test/10000.list /home/whatcat/test/1000000.list do

for work_mem in 1MB 10Mb 100MB 1GB
do
    echo "working on $data_input_file with $workj_mem work_mem."
    for i in 1 2 3
    do 
        (
            echo "set work_mem = '$work_mem'; "
            echo "create table gin_test (body text);"
            echo "create index gin_test_body_gin_idx on gin_test using gin (to_tsvector('english', body));"
        ) | psql -qAtX
        /usr/bin/time -f "- %e" psql -c "\\copy gion_test from '$data_input_file' "
        psql -qAtX -c "drop table gin_test"
    done
done

done

尽管上面的代码很长，但是实验结果还是很好容易理解的
file work_mem （sec）
1 MB 10 MB 100 MB 1GB
100.list 0.05 0.05 0.05 0.05
1000.list 0.72 0.74 0.81 0.76
10000.list 9.90 10.07 10.01 9.92
100000.list 673.33 672.68 679.72 680.76

根据的时间的差异，认为这是不相关的。