pg9.4 vs pg12: large-table joins

thompsonGuo1

Article link: https://blog.csdn.net/weiquanaishiyao/article/details/106662053

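1. Environment

1. Data was synced from the ECM production finance database and the TC database into both a pg9.4 instance and a pg12 instance, so that the tables hold the same number of rows on each.

2. Instance configuration

PostgreSQL version	CPU cores	Memory	Max IOPS	Storage type
9.4	8	16 GB	8000	Local SSD
12	8	16 GB	26800	ESSD cloud disk

3. Most ECM exports are large extracts over a time range, so the test uses invoices as the example, with one month of data.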

2. Preparation

Rename the original table

alter table f_invoice
    rename to f_invoice_2;

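Create the partitioned table

pg12 has a convenient feature (available since PostgreSQL 11): an index created on the partitioned parent table is automatically created on every partition. This was verified here for both btree and gin.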

create table ins_dw_prd12.f_invoice
(
    invoice_id          bigint                   not null,
    source_type         smallint                 not null,
    cause_type          smallint                 not null,
    invoice_type        smallint                 not null,
    bill_type           smallint                 not null,
    outer_order_id      varchar(64),
    order_id            bigint,
    shop_id             integer,
    platform_id         smallint,
    pay_time            timestamp,
    invoice_title       varchar(64)              not null,
    invoice_status      smallint                 not null,
    buyer_nick          varchar(64),
    seller_taxer_name   varchar(64),
    seller_taxer_code   varchar(32),
    invoice_amount      numeric                  not null,
    invoice_num         varchar(128),
    invoice_code        varchar(128),
    red_invoice_num     varchar(128),
    invoice_date        timestamp,
    pdf_url             varchar(128),
    pic_url             varchar(128),
    is_urgent           smallint                 not null,
    is_auto             smallint                 not null,
    sms_type            smallint,
    sms_notice_status   smallint                 not null,
    is_mix_split        smallint                 not null,
    invoice_flow        varchar(32),
    blue_invoice_id     bigint,
    old_blue_invoice_id bigint,
    red_notice_num      varchar(56),
    down_time           timestamp,
    return_time         timestamp,
    invoice_remark      varchar(256),
    remark              varchar(256),
    reject_reason       smallint,
    reject_desc         varchar(256),
    is_lost             smallint,
    is_apply_writeoff   smallint,
    is_writeoff_result  smallint,
    err_msg             varchar(256),
    is_lock             smallint,
    is_pre_reopen       smallint,
    is_ad               smallint,
    create_time         timestamp                not null,
    update_time         timestamp with time zone not null,
    gl_date             timestamp,
    ims_customer_code   varchar(32),
    check_code          varchar(64),
    red_reason_type     smallint,
    receiver_email      varchar(128),
    primary key (invoice_id,create_time)
) partition by range (create_time);

-- Create the partitions
create table f_invoice_p2018 partition of f_invoice for values from (minvalue )to ('2019-01-01');
create table f_invoice_p201901 partition of f_invoice for values from ('2019-01-01' )to ('2019-02-01');
create table f_invoice_p201902 partition of f_invoice for values from ('2019-02-01' )to ('2019-03-01');
create table f_invoice_p201903 partition of f_invoice for values from ('2019-03-01' )to ('2019-04-01');
create table f_invoice_p201904 partition of f_invoice for values from ('2019-04-01' )to ('2019-05-01');
create table f_invoice_p201905 partition of f_invoice for values from ('2019-05-01' )to ('2019-06-01');
create table f_invoice_p201906 partition of f_invoice for values from ('2019-06-01' )to ('2019-07-01');
create table f_invoice_p201907 partition of f_invoice for values from ('2019-07-01' )to ('2019-08-01');
create table f_invoice_p201908 partition of f_invoice for values from ('2019-08-01' )to ('2019-09-01');
create table f_invoice_p201909 partition of f_invoice for values from ('2019-09-01' )to ('2019-10-01');
create table f_invoice_p201910 partition of f_invoice for values from ('2019-10-01' )to ('2019-11-01');
create table f_invoice_p201911 partition of f_invoice for values from ('2019-11-01' )to ('2019-12-01');
create table f_invoice_p201912 partition of f_invoice for values from ('2019-12-01' )to ('2020-01-01');
create table f_invoice_p202001 partition of f_invoice for values from ('2020-01-01' )to ('2020-02-01');
create table f_invoice_p202002 partition of f_invoice for values from ('2020-02-01' )to ('2020-03-01');
create table f_invoice_p202003 partition of f_invoice for values from ('2020-03-01' )to ('2020-04-01');
create table f_invoice_p202004 partition of f_invoice for values from ('2020-04-01' )to ('2020-05-01');
create table f_invoice_p202005 partition of f_invoice for values from ('2020-05-01' )to ('2020-06-01');
create table f_invoice_p202006 partition of f_invoice for values from ('2020-06-01' )to ('2020-07-01');
CREATE TABLE f_invoice_default PARTITION OF f_invoice DEFAULT;

-- Create indexes (a GIN index on scalar columns like these needs the btree_gin extension)
create index idx_f_invoice_gin2
    on ins_dw_prd12.f_invoice using gin (source_type, invoice_type, invoice_status, invoice_title, invoice_date,
                               seller_taxer_code, shop_id, create_time) ;
create index idx_f_invoice_invoice_date2
    on ins_dw_prd12.f_invoice (invoice_date);
create index idx_f_invoice_seller_taxer_code2
    on ins_dw_prd12.f_invoice (seller_taxer_code);
create index idx_invoice_createtime_btree2
    on ins_dw_prd12.f_invoice (create_time);
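
The automatic index propagation described earlier can be checked from the system catalogs. The query below is a minimal sketch that is not part of the original test; it assumes the parent table is ins_dw_prd12.f_invoice and lists every index that exists on each of its partitions:

```sql
-- List each partition of f_invoice together with the indexes defined on it.
SELECT c.relname                        AS partition_name,
       i.indexrelid::regclass::text     AS index_name
FROM pg_inherits inh
JOIN pg_class c ON c.oid = inh.inhrelid        -- the partition
JOIN pg_index i ON i.indrelid = c.oid          -- its indexes
WHERE inh.inhparent = 'ins_dw_prd12.f_invoice'::regclass
ORDER BY partition_name, index_name;
```

If the propagation worked, each partition should show one index per index created on the parent, plus the index backing the primary key.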

Copy the data

-- Copy rows from the original table
insert into f_invoice select * from f_invoice_2;
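
After a bulk load like this it is usually worth refreshing planner statistics before comparing plans. Autovacuum analyzes the individual partitions, but it does not process the partitioned parent table itself, so a manual ANALYZE is a reasonable precaution. This step does not appear in the original write-up; it is a suggested sketch only:

```sql
-- Collect statistics for the partitioned table and all of its partitions.
ANALYZE ins_dw_prd12.f_invoice;

-- Refresh the join partner too, so both sides of the join have current estimates.
ANALYZE ins_dw_prd12.f_invoice_item;
```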

3. pg12 vs pg9.4

3.1 Plain (non-partitioned) tables

pg12

explain(analyse, timing)
SELECT count(*)
FROM (select *
      from ins_dw_prd12.f_invoice_2 fi
      where fi.seller_taxer_code in ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
        and fi.create_time >= '2019-08-01 00:00:00'
        and fi.create_time <= '2019-09-01 00:00:00') m
         INNER JOIN (select *
                     from ins_dw_prd12.f_invoice_item
                     where invoice_id in (SELECT fi.invoice_id
                                          FROM ins_dw_prd12.f_invoice_2 fi
                                          WHERE fi.seller_taxer_code in
                                                ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
                                            and fi.create_time >= '2019-08-01 00:00:00'
                                            and fi.create_time <= '2019-09-01 00:00:00')) fit
                    ON fit.invoice_id = m.invoice_id;

Execution plan:

Finalize Aggregate  (cost=322631.88..322631.89 rows=1 width=8) (actual time=6265.586..6265.586 rows=1 loops=1)
  ->  Gather  (cost=322631.46..322631.87 rows=4 width=8) (actual time=6264.473..6282.713 rows=5 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        ->  Partial Aggregate  (cost=321631.46..321631.47 rows=1 width=8) (actual time=6259.589..6259.589 rows=1 loops=5)
              ->  Nested Loop  (cost=154762.41..321630.97 rows=194 width=0) (actual time=4100.919..6254.834 rows=38654 loops=5)
                    ->  Parallel Hash Join  (cost=154761.84..308914.48 rows=173 width=16) (actual time=4099.216..4355.187 rows=35911 loops=5)
                          Hash Cond: (fi.invoice_id = fi_1.invoice_id)
                          ->  Parallel Index Scan using idx_invoice_createtime_btree on f_invoice_2 fi  (cost=0.57..153991.56 rows=61577 width=8) (actual time=0.061..236.040 rows=35911 loops=5)
                                Index Cond: ((create_time >= '2019-08-01 00:00:00'::timestamp without time zone) AND (create_time <= '2019-09-01 00:00:00'::timestamp without time zone))
                                Filter: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
                                Rows Removed by Filter: 447251
                          ->  Parallel Hash  (cost=153991.56..153991.56 rows=61577 width=8) (actual time=4098.823..4098.823 rows=35911 loops=5)
                                Buckets: 262144  Batches: 1  Memory Usage: 9152kB
                                ->  Parallel Index Scan using idx_invoice_createtime_btree on f_invoice_2 fi_1  (cost=0.57..153991.56 rows=61577 width=8) (actual time=1.356..4083.857 rows=35911 loops=5)
                                      Index Cond: ((create_time >= '2019-08-01 00:00:00'::timestamp without time zone) AND (create_time <= '2019-09-01 00:00:00'::timestamp without time zone))
                                      Filter: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
                                      Rows Removed by Filter: 447251
                    ->  Index Only Scan using f_invoice_item_invoice_id_idx on f_invoice_item  (cost=0.57..71.17 rows=234 width=8) (actual time=0.052..0.052 rows=1 loops=179556)
                          Index Cond: (invoice_id = fi_1.invoice_id)
                          Heap Fetches: 193269
Planning Time: 0.596 ms
Execution Time: 6282.776 ms

pg9.4

explain(analyse, timing)
SELECT count(*)
FROM (select *
      from ins_dw_prd12.f_invoice fi
      where fi.seller_taxer_code in ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
        and fi.create_time >= '2019-08-01 00:00:00'
        and fi.create_time <= '2019-09-01 00:00:00') m
         INNER JOIN (select *
                     from ins_dw_prd12.f_invoice_item
                     where invoice_id in (SELECT fi.invoice_id
                                          FROM ins_dw_prd12.f_invoice fi
                                          WHERE fi.seller_taxer_code in
                                                ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
                                            and fi.create_time >= '2019-08-01 00:00:00'
                                            and fi.create_time <= '2019-09-01 00:00:00')) fit
                    ON fit.invoice_id = m.invoice_id;

Execution plan:

Aggregate  (cost=1583349.05..1583349.06 rows=1 width=0) (actual time=8706.357..8706.357 rows=1 loops=1)
  ->  Nested Loop  (cost=801491.83..1582945.16 rows=161555 width=0) (actual time=6551.816..8689.740 rows=193269 loops=1)
        ->  Hash Join  (cost=801491.27..1573243.48 rows=641 width=16) (actual time=6551.575..6953.476 rows=179556 loops=1)
              Hash Cond: (fi.invoice_id = fi_1.invoice_id)
              ->  Bitmap Heap Scan on f_invoice fi  (cost=15206.25..782402.63 rows=236591 width=8) (actual time=276.081..385.549 rows=179556 loops=1)
                    Recheck Cond: (((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[])) AND (create_time >= '2019-08-01 00:00:00'::timestamp without time zone) AND (create_time <= '2019-09-01 00:00:00'::timestamp without time zone))
                    Heap Blocks: exact=49524
                    ->  Bitmap Index Scan on f_invoice_seller_taxer_code_create_time_idx  (cost=0.00..15147.10 rows=236591 width=0) (actual time=263.542..263.542 rows=179556 loops=1)
                          Index Cond: (((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[])) AND (create_time >= '2019-08-01 00:00:00'::timestamp without time zone) AND (create_time <= '2019-09-01 00:00:00'::timestamp without time zone))
              ->  Hash  (cost=782402.63..782402.63 rows=236591 width=8) (actual time=6274.711..6274.711 rows=179556 loops=1)
                    Buckets: 16384  Batches: 32 (originally 4)  Memory Usage: 4097kB
                    ->  Bitmap Heap Scan on f_invoice fi_1  (cost=15206.25..782402.63 rows=236591 width=8) (actual time=54.600..6113.136 rows=179556 loops=1)
                          Recheck Cond: (((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[])) AND (create_time >= '2019-08-01 00:00:00'::timestamp without time zone) AND (create_time <= '2019-09-01 00:00:00'::timestamp without time zone))
                          Heap Blocks: exact=49524
                          ->  Bitmap Index Scan on f_invoice_seller_taxer_code_create_time_idx  (cost=0.00..15147.10 rows=236591 width=0) (actual time=43.760..43.760 rows=179556 loops=1)
                                Index Cond: (((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[])) AND (create_time >= '2019-08-01 00:00:00'::timestamp without time zone) AND (create_time <= '2019-09-01 00:00:00'::timestamp without time zone))
        ->  Index Only Scan using f_invoice_item_invoice_id_idx on f_invoice_item  (cost=0.57..12.62 rows=252 width=8) (actual time=0.009..0.009 rows=1 loops=179556)
              Index Cond: (invoice_id = fi_1.invoice_id)
              Heap Fetches: 162
Planning time: 0.871 ms
Execution time: 8706.466 ms

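pg12 took 6282.776 ms and pg9.4 took 8706.466 ms. pg12 is therefore about 1.39x as fast (the "38%" improvement), which corresponds to roughly 28% less elapsed time.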
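
The two ways of expressing the improvement are easy to confuse, so here is the arithmetic as a small query. It only uses the two execution times reported in the plans; the column names are illustrative:

```sql
-- speedup_factor: how many times faster pg12 is than pg9.4
-- elapsed_time_reduction_pct: how much shorter the pg12 run is, in percent
SELECT round(8706.466 / 6282.776, 2)             AS speedup_factor,
       round((1 - 6282.776 / 8706.466) * 100, 1) AS elapsed_time_reduction_pct;
```

This evaluates to a speedup factor of about 1.39 and an elapsed-time reduction of about 27.8%.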

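Analysis:

Parallel query was introduced in PostgreSQL 9.6. The test case uses the same data volume on both sides, so the improvement is attributed mainly to parallel index scans in pg12. Two caveats are visible in the material itself: the plans do not use the same index (pg9.4 uses the composite index f_invoice_seller_taxer_code_create_time_idx, while pg12 scans idx_invoice_createtime_btree and filters on seller_taxer_code), and the two instances differ in storage type and IOPS, so parallelism may not account for the whole difference. `Workers Planned: 4` in the plan means four parallel workers were used. The pg12 instance has only 8 cores and its parallel-worker limit was 4; in stock PostgreSQL 12 the defaults are max_parallel_workers_per_gather = 2 and max_parallel_workers = 8, so this value is presumably an instance-level setting. More cores and a higher limit should improve performance further.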
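
To see which limits actually apply on a given instance, the relevant settings can be inspected and, for experiments, raised per session. This is a sketch rather than the configuration used in the test; the value 8 is only an illustration:

```sql
-- Inspect the current limits.
SHOW max_worker_processes;             -- total background workers for the server
SHOW max_parallel_workers;             -- workers available to parallel queries overall
SHOW max_parallel_workers_per_gather;  -- cap on workers for a single Gather node

-- Raise the per-query cap for the current session only, then re-run EXPLAIN (ANALYZE).
SET max_parallel_workers_per_gather = 8;
```

The planner may still choose fewer workers than the cap, because the number it plans also depends on the size of the table or index being scanned.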

3.2 Introducing a partitioned table

The preparation step already copied the data into f_invoice and created the partitions.

The same SQL:

explain(analyse, timing)
SELECT count(*)
FROM (select *
      from ins_dw_prd12.f_invoice fi
      where fi.seller_taxer_code in ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
        and fi.create_time >= '2019-08-01 00:00:00'
        and fi.create_time <= '2019-09-01 00:00:00') m
         INNER JOIN (select *
                     from ins_dw_prd12.f_invoice_item
                     where invoice_id in (SELECT fi.invoice_id
                                          FROM ins_dw_prd12.f_invoice fi
                                          WHERE fi.seller_taxer_code in
                                                ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
                                            and fi.create_time >= '2019-08-01 00:00:00'
                                            and fi.create_time <= '2019-09-01 00:00:00')) fit
                    ON fit.invoice_id = m.invoice_id;

Execution plan:

Finalize Aggregate  (cost=2131432.05..2131432.06 rows=1 width=8) (actual time=7250.107..7250.107 rows=1 loops=1)
  ->  Gather  (cost=2131431.83..2131432.04 rows=2 width=8) (actual time=7250.098..7262.130 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=2130431.83..2130431.84 rows=1 width=8) (actual time=7241.645..7241.645 rows=1 loops=3)
              ->  Nested Loop  (cost=99443.16..2111203.69 rows=7691256 width=0) (actual time=6911.989..7235.295 rows=64423 loops=3)
                    Join Filter: (fi_2.invoice_id = f_invoice_item.invoice_id)
                    ->  Hash Join  (cost=99442.60..197345.88 rows=37876 width=16) (actual time=6911.949..7017.101 rows=59852 loops=3)
                          Hash Cond: (fi.invoice_id = fi_2.invoice_id)
                          ->  Parallel Append  (cost=0.43..96862.13 rows=75752 width=8) (actual time=1.115..62.620 rows=59852 loops=3)
                                ->  Parallel Index Scan using f_invoice_p201908_seller_taxer_code_idx on f_invoice_p201908 fi  (cost=0.43..96480.72 rows=75751 width=8) (actual time=1.112..54.402 rows=59852 loops=3)
                                      Index Cond: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
                                      Filter: ((create_time >= '2019-08-01 00:00:00'::timestamp without time zone) AND (create_time <= '2019-09-01 00:00:00'::timestamp without time zone))
                                ->  Parallel Index Scan using f_invoice_p201909_create_time_idx on f_invoice_p201909 fi_1  (cost=0.43..2.65 rows=1 width=8) (actual time=0.006..0.007 rows=0 loops=1)
                                      Index Cond: ((create_time >= '2019-08-01 00:00:00'::timestamp without time zone) AND (create_time <= '2019-09-01 00:00:00'::timestamp without time zone))
                                      Filter: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
                          ->  Hash  (cost=99439.67..99439.67 rows=200 width=8) (actual time=6910.770..6910.770 rows=179556 loops=3)
                                Buckets: 131072 (originally 1024)  Batches: 4 (originally 1)  Memory Usage: 3073kB
                                ->  HashAggregate  (cost=99437.67..99439.67 rows=200 width=8) (actual time=6835.596..6874.903 rows=179556 loops=3)
                                      Group Key: fi_2.invoice_id
                                      ->  Append  (cost=0.43..98983.16 rows=181803 width=8) (actual time=0.038..6746.552 rows=179556 loops=3)
                                            ->  Index Scan using f_invoice_p201908_seller_taxer_code_idx on f_invoice_p201908 fi_2  (cost=0.43..98071.49 rows=181802 width=8) (actual time=0.038..6718.573 rows=179556 loops=3)
                                                  Index Cond: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
                                                  Filter: ((create_time >= '2019-08-01 00:00:00'::timestamp without time zone) AND (create_time <= '2019-09-01 00:00:00'::timestamp without time zone))
                                            ->  Index Scan using f_invoice_p201909_create_time_idx on f_invoice_p201909 fi_3  (cost=0.43..2.65 rows=1 width=8) (actual time=0.016..0.017 rows=0 loops=3)
                                                  Index Cond: ((create_time >= '2019-08-01 00:00:00'::timestamp without time zone) AND (create_time <= '2019-09-01 00:00:00'::timestamp without time zone))
                                                  Filter: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
                    ->  Index Only Scan using f_invoice_item_invoice_id_idx on f_invoice_item  (cost=0.57..47.60 rows=234 width=8) (actual time=0.003..0.003 rows=1 loops=179556)
                          Index Cond: (invoice_id = fi.invoice_id)
                          Heap Fetches: 193269
Planning Time: 12.368 ms
Execution Time: 7264.656 ms

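`Workers Planned: 2`: only two workers this time. The plan clearly scans only the indexes of f_invoice_p201908 and f_invoice_p201909. Because the selected range is 08-01 to 09-01 with an inclusive upper bound, f_invoice_p201909 is touched but costs almost nothing; nearly all of the time goes to the index scan on f_invoice_p201908: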

Index Scan using f_invoice_p201908_seller_taxer_code_idx on f_invoice_p201908 fi_2  (cost=0.43..98071.49 rows=181802 width=8) (actual time=0.038..6718.573 rows=179556 loops=3)
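
The second partition appears in the plan because the predicate uses `<=` on 2019-09-01 00:00:00, while partition bounds are half-open (the FROM value is inclusive, the TO value is exclusive). A sketch of the same filter with an exclusive upper bound, which should let the planner prune down to f_invoice_p201908 alone, is shown below. It was not run as part of the original test, and it returns a different count if any row has create_time exactly equal to 2019-09-01 00:00:00:

```sql
-- Plain EXPLAIN: shows the plan (and which partitions survive pruning) without executing.
EXPLAIN
SELECT count(*)
FROM ins_dw_prd12.f_invoice fi
WHERE fi.seller_taxer_code IN ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
  AND fi.create_time >= '2019-08-01 00:00:00'
  AND fi.create_time <  '2019-09-01 00:00:00';  -- exclusive upper bound matches the partition boundary
```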

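In theory, the partitioned table could be made faster by adding CPU cores and raising max_parallel_workers.

Later runs with other months showed no increase in the number of workers (on pg12 the partitioned table performs about the same as the plain table).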
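
One knob that may be worth trying when the planner keeps choosing two workers is the per-table parallel_workers storage parameter, which overrides the size-based worker estimate for that table. The following is an untested sketch; the value 4 and the choice of partition are illustrative, and max_parallel_workers_per_gather must be at least as large for it to have any effect:

```sql
-- Ask the planner to consider up to 4 workers when scanning this partition.
ALTER TABLE f_invoice_p201908 SET (parallel_workers = 4);

-- ... re-run the EXPLAIN (ANALYZE) from this section and compare "Workers Planned" ...

-- Revert to the default size-based estimate afterwards.
ALTER TABLE f_invoice_p201908 RESET (parallel_workers);
```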

4. A strange finding

4.1 Background

1. Row counts:

Table	Rows
f_invoice	87346130
f_invoice_item	97535867

2. Indexes:

Table: f_invoice_item

CREATE INDEX f_invoice_item_order_item_id_idx ON ins_dw_prd12.f_invoice_item USING btree (order_item_id)
CREATE INDEX f_invoice_item_invoice_id_idx ON ins_dw_prd12.f_invoice_item USING btree (invoice_id) WITH (fillfactor='100')

Table: f_invoice

CREATE INDEX idx_f_invoice_gin ON ins_dw_prd12.f_invoice USING gin (source_type, invoice_type, invoice_status, invoice_title, invoice_date, seller_taxer_code, shop_id, create_time)
CREATE INDEX idx_f_invoice_invoice_date ON ins_dw_prd12.f_invoice USING btree (invoice_date) WITH (fillfactor='100')
CREATE INDEX idx_f_invoice_seller_taxer_code ON ins_dw_prd12.f_invoice USING btree (seller_taxer_code) WITH (fillfactor='100')
CREATE INDEX idx_invoice_createtime_btree ON ins_dw_prd12.f_invoice USING btree (create_time) WITH (fillfactor='100')

4.2 The oddity

Plain table on pg9.4 and pg12 (the plan shown is from pg12, since it contains parallel nodes)

explain(analyse, timing)
SELECT count(*)
FROM (select *
      from ins_dw_prd12.f_invoice_2 fi
      where fi.seller_taxer_code in ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
        and fi.create_time >= '2019-09-01 00:00:00'
        and fi.create_time <= '2019-10-01 00:00:00') m
         INNER JOIN ins_dw_prd12.f_invoice_item fit
                    ON fit.invoice_id = m.invoice_id;

Execution plan:

Finalize Aggregate  (cost=2709869.86..2709869.87 rows=1 width=8) (actual time=83504.746..83504.746 rows=1 loops=1)
  ->  Gather  (cost=2709869.03..2709869.84 rows=8 width=8) (actual time=83488.900..83509.668 rows=9 loops=1)
        Workers Planned: 8
        Workers Launched: 8
        ->  Partial Aggregate  (cost=2708869.03..2708869.04 rows=1 width=8) (actual time=83478.769..83478.769 rows=1 loops=9)
              ->  Parallel Hash Join  (cost=163813.42..2708776.96 rows=36830 width=0) (actual time=58876.039..83475.920 rows=27042 loops=9)
                    Hash Cond: (fit.invoice_id = fi.invoice_id)
                    ->  Parallel Seq Scan on f_invoice_item fit  (cost=0.00..2512888.52 rows=12219052 width=8) (actual time=0.122..80316.204 rows=10854295 loops=9)
                    ->  Parallel Hash  (cost=163154.14..163154.14 rows=52743 width=8) (actual time=252.113..252.113 rows=24461 loops=9)
                          Buckets: 524288  Batches: 1  Memory Usage: 12864kB
                          ->  Parallel Index Scan using idx_invoice_createtime_btree on f_invoice_2 fi  (cost=0.57..163154.14 rows=52743 width=8) (actual time=0.225..239.003 rows=24461 loops=9)
                                Index Cond: ((create_time >= '2019-09-01 00:00:00'::timestamp without time zone) AND (create_time <= '2019-10-01 00:00:00'::timestamp without time zone))
                                Filter: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
                                Rows Removed by Filter: 235679
Planning Time: 0.298 ms
Execution Time: 83509.737 ms

pg12 partitioned table

explain(analyse, timing)
SELECT count(*)
FROM (select *
      from ins_dw_prd12.f_invoice fi
      where fi.seller_taxer_code in ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
        and fi.create_time >= '2019-09-01 00:00:00'
        and fi.create_time <= '2019-10-01 00:00:00') m
         INNER JOIN ins_dw_prd12.f_invoice_item fit
                    ON fit.invoice_id = m.invoice_id;

Execution plan:

Finalize Aggregate  (cost=4539009.89..4539009.90 rows=1 width=8) (actual time=4089.118..4089.118 rows=1 loops=1)
  ->  Gather  (cost=4539009.67..4539009.88 rows=2 width=8) (actual time=4088.801..4101.957 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=4538009.67..4538009.68 rows=1 width=8) (actual time=4085.523..4085.524 rows=1 loops=3)
              ->  Nested Loop  (cost=1.00..4483735.31 rows=21709746 width=0) (actual time=2.101..4076.649 rows=81125 loops=3)
                    ->  Parallel Append  (cost=0.43..106404.25 rows=92916 width=8) (actual time=1.383..2078.226 rows=73384 loops=3)
                          ->  Parallel Index Scan using f_invoice_p201909_seller_taxer_code_idx on f_invoice_p201909 fi  (cost=0.43..105937.02 rows=92915 width=8) (actual time=1.380..2066.570 rows=73384 loops=3)
                                Index Cond: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
                                Filter: ((create_time >= '2019-09-01 00:00:00'::timestamp without time zone) AND (create_time <= '2019-10-01 00:00:00'::timestamp without time zone))
                          ->  Parallel Index Scan using f_invoice_p201910_create_time_idx on f_invoice_p201910 fi_1  (cost=0.43..2.65 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=1)
                                Index Cond: ((create_time >= '2019-09-01 00:00:00'::timestamp without time zone) AND (create_time <= '2019-10-01 00:00:00'::timestamp without time zone))
                                Filter: ((seller_taxer_code)::text = ANY ('{91320200704046760T,91340100149067617J,91320214MA1YGE8F94}'::text[]))
                    ->  Index Only Scan using f_invoice_item_invoice_id_idx on f_invoice_item fit  (cost=0.57..44.77 rows=234 width=8) (actual time=0.026..0.027 rows=1 loops=220151)
                          Index Cond: (invoice_id = fi.invoice_id)
                          Heap Fetches: 243376
Planning Time: 6.910 ms
Execution Time: 4102.010 ms

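Analysis

In theory, when the f_invoice subquery is joined to the large table f_invoice_item, which has an index on the join column (invoice_id), f_invoice should be the driving table and f_invoice_item should be probed through that index. In practice the planner chose a full scan (Parallel Seq Scan) of f_invoice_item and a hash join.

With the partitioned table, the indexes of f_invoice_p201909 and f_invoice_p201910 are scanned in parallel and combined with Parallel Append, and the result is then joined to f_invoice_item with a Nested Loop. Here f_invoice_item is read with an Index Only Scan, which works from the index and visits the heap only for pages not marked all-visible (the plan still reports Heap Fetches: 243376).

Performance comparison: pg12 partitioned table 4102.010 ms, pg12 plain table 83509.737 ms, a difference of roughly 20x.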
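
A common way to investigate a plan choice like this is to discourage the sequential scan for a single transaction and compare the resulting plan and timing. The sketch below was not part of the original test. Note that in both plans the planner expects about 234 f_invoice_item rows per invoice_id (rows=234) while the actual figure is 1, which may be part of why the index-driven nested loop looked expensive for the plain table.

```sql
BEGIN;

-- Affects only this transaction; makes sequential scans look very expensive to the planner.
SET LOCAL enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM (SELECT *
      FROM ins_dw_prd12.f_invoice_2 fi
      WHERE fi.seller_taxer_code IN ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94')
        AND fi.create_time >= '2019-09-01 00:00:00'
        AND fi.create_time <= '2019-10-01 00:00:00') m
         INNER JOIN ins_dw_prd12.f_invoice_item fit
                    ON fit.invoice_id = m.invoice_id;

-- Discard the setting change.
ROLLBACK;
```

If the forced plan turns out to be much faster, the problem lies in the cost estimates rather than in the available access paths, and refreshing statistics on f_invoice_item is a sensible next step.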

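5. Overall conclusion

Performance can be improved by upgrading to pg12 on an 8-core instance and relying on pg12's built-in parallel query; the expected gain is 30-50%. With a 16-core instance, a higher parallel-worker setting, and partitioned tables, the gain would probably be larger still.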
