postgresql大表join优化
一、背景
1、数据量:
| 表名 | 数据量 |
|---|---|
| f_invoice | 87346130 |
| f_invoice_item | 97535867 |
2、索引:
表:f_invoice_item
CREATE INDEX f_invoice_item_order_item_id_idx ON ins_dw_prd12.f_invoice_item USING btree (order_item_id) CREATE INDEX f_invoice_item_invoice_id_idx ON ins_dw_prd12.f_invoice_item USING btree (invoice_id) WITH (fillfactor='100')
表:f_invoice
CREATE INDEX idx_f_invoice_gin ON ins_dw_prd12.f_invoice USING gin (source_type, invoice_type, invoice_status, invoice_title, invoice_date, seller_taxer_code, shop_id, create_time) CREATE INDEX idx_f_invoice_invoice_date ON ins_dw_prd12.f_invoice USING btree (invoice_date) WITH (fillfactor='100') CREATE INDEX idx_f_invoice_seller_taxer_code ON ins_dw_prd12.f_invoice USING btree (seller_taxer_code) WITH (fillfactor='100') CREATE INDEX idx_invoice_createtime_btree ON ins_dw_prd12.f_invoice USING btree (create_time) WITH (fillfactor='100')
二 、优化前
sql:
explain(analyse, timing)
SELECT count(*)
from (SELECT fi.invoice_id
FROM ins_dw_prd12.f_invoice fi
WHERE (fi.seller_taxer_code in ('91320200704046760T', '91340100149067617J', '91320214MA1YGE8F94') and
fi.create_time >= '2020-01-01 00:00:00' and fi.create_time <= '2020-01-31 00:00:00')) AS mm
INNER JOIN ins_dw_prd12.f_invoice_item fit ON fit.invoice_id = mm.invoice_id
inner join ins_dw_prd12.f_invoice m on m.invoice_id = mm

本文探讨了在PostgreSQL中遇到的大表JOIN操作的性能问题。通过分析优化前的SQL执行计划,发现并行顺序扫描了表f_invoice_item,尽管存在索引f_invoice_item_invoice_id_idx,但并未使用。通过调整SQL和优化执行计划,最终将执行时间从85秒降低到2.4秒,实现了近40倍的性能提升。
最低0.47元/天 解锁文章
977

被折叠的 条评论
为什么被折叠?



