问题现象
业务昨晚对sql进行了更新,之前没有DATE_CREATED字段(是分区键),跑的很快。发版后加了分区字段,本身是为了减少分区数量的访问,但是加了以后UPDATE执行却变慢了。
before:
update TABLE_RECORD
set IS_DELETED = '1', DATE_UPDATED = LOCALTIMESTAMP(0)
WHERE APPL_NO = $1
AND IS_DELETED = '0'
after:
update TABLE_RECORD
set IS_DELETED = '1', DATE_UPDATED = LOCALTIMESTAMP(0)
WHERE APPL_NO = $1
AND IS_DELETED = '0'
AND DATE_CREATED > now() - interval '31' day
AND DATE_CREATED < now()
发版前访问时间是毫秒级,发版后访问时间是10s,sql运行频繁,业务不可接受。
问题分析
执行计划看上去是正确的
表结构如下
# \d+ TABLE_RECORD
Partitioned table "public.TABLE_RECORD"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------------------------+-----------------------------+-----------+----------+---------------------------------------------------+----------+--------------+--------------------------
id_TABLE_RECORD | character varying(32) | | not null | nextval('seq_TABLE_RECORD'::regclass) | extended | |
appl_no | character varying(100) | | | | extended | |
r_appl_no | character varying(100) | | | | extended | |
...
created_by | character varying(100) | | not null | 'sys'::character varying | extended | |
date_created | timestamp without time zone | | not null | now() | plain | |
updated_by | character varying(100) | | not null | 'sys'::character varying | extended | |
date_updated | timestamp without time zone | | not null | now() | plain | |
Partition key: RANGE (date_created)
Indexes:
"date_TABLE_RECORD" btree (date_created)
"idx_dateupdated" btree (date_updated)
"idx_applnodeleted" btree (appl_no, is_deleted)
"nk_TABLE_RECORD" btree (appl_no)
Partitions: TABLE_RECORD_202211 FOR VALUES FROM ('2022-11-01 00:00:00') TO ('2022-12-01 00:00:00'),
...
TABLE_RECORD_202303 FOR VALUES FROM ('2023-03-01 00:00:00') TO ('2023-04-01 00:00:00'),
TABLE_RECORD_202304 FOR VALUES FROM ('2023-04-01 00:00:00') TO ('2023-05-01 00:00:00'),
TABLE_RECORD_202305 FOR VALUES FROM ('2023-05-01 00:00:00') TO ('2023-06-01 00:00:00'),
TABLE_RECORD_202306 FOR VALUES FROM ('2023-06-01 00:00:00') TO ('2023-07-01 00:00:00'),
...
TABLE_RECORD_202512 FOR VALUES FROM ('2025-12-01 00:00:00') TO ('2026-01-01 00:00:00'),
TABLE_RECORD_other DEFAULT
这个sql会访问近2个月的分区,这2个分区都是有数据的,上面的update只会更新一条数据。
刚开始分析问题非常困惑,因为我们explain的时候,执行计划是没有问题的
explain分区扫描信息:
-> Index Scan using TABLE_RECORD_202302_date_created_idx on TABLE_RECORD_202302 TABLE_RECORD_4 (cost=0.44..5.47 rows=1 width=485)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using TABLE_RECORD_202303_date_created_idx on TABLE_RECORD_202303 TABLE_RECORD_5 (cost=0.44..5.47 rows=1 width=482)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using TABLE_RECORD_202304_date_created_idx on TABLE_RECORD_202304 TABLE_RECORD_6 (cost=0.44..5.47 rows=1 width=481)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using idx_applnodeleted_25 on TABLE_RECORD_202305 TABLE_RECORD_7 (cost=0.43..30.49 rows=1 width=483)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
-> Index Scan using idx_applnodeleted_14 on TABLE_RECORD_202306 TABLE_RECORD_8 (cost=0.56..45.11 rows=18 width=485)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
-> Index Scan using idx_applnodeleted_38 on TABLE_RECORD_202307 TABLE_RECORD_9 (cost=0.14..5.17 rows=1 width=3502)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
-> Index Scan using idx_applnodeleted_1 on TABLE_RECORD_202308 TABLE_RECORD_10 (cost=0.14..5.17 rows=1 width=3502)
分区数据分布:
select count(*),tableoid::regclass from TABLE_RECORD group by 2;
count | tableoid
-------+---------------------------------
56558 | TABLE_RECORD_202303
4436 | TABLE_RECORD_202211
6929 | TABLE_RECORD_202306
945 | TABLE_RECORD_202305
1413 | TABLE_RECORD_202304
5499 | TABLE_RECORD_202212
1486 | TABLE_RECORD_202301
4722 | TABLE_RECORD_202302
执行计划看上去对不同的分区访问了不同的索引
date_TABLE_RECORD:分区键上的索引
idx_applnodeleted:appl_no, is_deleted组合索引
实际上sql可以通过DATE_CREATED近31天数据字段裁剪分区。但是如果在用这个字段的索引,那就没有任何过滤性了。而appl_no, is_deleted在分区中过滤性较好,真正的执行计划应该选择 idx_applnodeleted组合索引。
上面的explain执行计划虽然不是真正的执行计划,但是能看到5、6月份的走的索引是没有问题的,走的是appl_no, is_deleted组合索引。
查看真实的执行计划会执行sql,所以把update改成select,查看真实的执行计划
# explain (analyze,buffers,timing,verbose) select count(*) from TABLE_RECORD WHERE APPL_NO = 'LZLMATH20230132302302' AND IS_DELETED = '0' AND DATE_CREATED > now() - interval '31' day AND DATE_CREATED < now() ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=266.09..266.10 rows=1 width=8) (actual time=0.565..0.566 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=48
-> Append (cost=0.14..265.95 rows=56 width=0) (actual time=0.388..0.558 rows=1 loops=1)
Buffers: shared hit=48
Subplans Removed: 37
-> Index Scan using idx_applnodeleted_25 on public.TABLE_RECORD_202305 TABLE_RECORD_1 (cost=0.43..30.39 rows=2 width=0) (actual time=0.059..0.059 rows=0 loops=1)
Index Cond: (((TABLE_RECORD_1.appl_no)::text = 'LZLMATH20230132302302'::text) AND ((TABLE_RECORD_1.is_deleted)::text = '0'::text))
Filter: ((TABLE_RECORD_1.date_created < now()) AND (TABLE_RECORD_1.date_created > (now() - '31 days'::interval day)))
Buffers: shared hit=3
-> Index Scan using idx_applnodeleted_14 on public.TABLE_RECORD_202306 TABLE_RECORD_2 (cost=0.56..42.52 rows=17 width=0) (actual time=0.328..0.498 rows=1 loops=1)
Index Cond: (((TABLE_RECORD_2.appl_no)::text = 'LZLMATH20230132302302'::text) AND ((TABLE_RECORD_2.is_deleted)::text = '0'::text))
Filter: ((TABLE_RECORD_2.date_created < now()) AND (TABLE_RECORD_2.date_created > (now() - '31 days'::interval day)))
Buffers: shared hit=45
Planning:
Buffers: shared hit=5867
Planning Time: 17.195 ms
Execution Time: 0.654 ms
(18 rows)
select只访问了5月和6月两个分区,说明分区裁剪没问题;两个分区都是idx_applnodeleted索引,所以索引选择也没问题。
直接执行select语句,执行时间是ms级。
# select count(*) from TABLE_RECORD WHERE APPL_NO = 'LZLMATH20230132302302' AND IS_DELETED = '0' AND DATE_CREATED > now() - interval '31' day AND DATE_CREATED < now() ;
count
-------
1
(1 row)
Time: 4.946 ms
分析到这里,看上去执行计划正常,执行时间正常
业务sql仍然缓慢
但是,postgresql日志中仍然能抓到慢sql,update执行了10s
2023-06-29 11:06:45.077 CST,"lzldbopr","lzldb",116286,"30.88.78.90:51871",649cdebf.1c63e,7,"UPDATE",2023-06-29 09:30:39 CST,759/12440291,4002354803,LOG,00000,"duration: 10287.105 ms " plan:
Query Text: update TABLE_RECORD
set IS_DELETED = '1', DATE_UPDATED = LOCALTIMESTAMP(0)
WHERE APPL_NO = $1
AND IS_DELETED = '0'
AND DATE_CREATED > now() - interval '31' day
AND DATE_CREATED < now()
Update on TABLE_RECORD (cost=0.14..203.79 rows=39 width=2960)
Update on TABLE_RECORD_202211 TABLE_RECORD_1
...
-> Index Scan using TABLE_RECORD_202304_date_created_idx on TABLE_RECORD_202304 TABLE_RECORD_6 (cost=0.44..5.47 rows=1 width=481)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using TABLE_RECORD_202305_date_created_idx on TABLE_RECORD_202305 TABLE_RECORD_7 (cost=0.44..5.47 rows=1 width=483)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using TABLE_RECORD_202306_date_created_idx on TABLE_RECORD_202306 TABLE_RECORD_8 (cost=0.44..5.47 rows=1 width=485)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using idx_applnodeleted_38 on TABLE_RECORD_202307 TABLE_RECORD_9 (cost=0.14..5.17 rows=1 width=3502)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
...
5、6月分区还是用的date_created分区键上的索引,执行计划只估算了1行,实际上这2个分区实际上都有百万行数据。
分析到这里就非常困惑,优化器本身可以走到更好的索引,explain也看到走到了那个索引,但是业务sql就是没有走正确的索引。
更新统计信息
因为是pgsql执行计划的问题,首先都会想到收集统计信息
问题发生后,收集了分区表父表和子表的统计信息,并且担心会话缓存了执行计划(plan_cache_mode=auto),将统计信息收集时间之前的会话全部kill。
日志仍然显示sql执行10s,说明不是统计信息的问题。
做到这里问题仍然没有解决,似乎已经黔驴技穷了。
问题根因
前面分析执行计划的时候,dba explain的执行计划与应用的执行计划不一致,不过我们都是用postgres超级用户执行的。切换为应用用户再次explain执行,执行计划结果与日志中的执行计划一致!
因为之前出现过原生分区表权限问题,导致执行计划异常的情况。所以立即检查分区表分区的权限
父表权限
# \dp+ TABLE_RECORD
Access privileges
Schema | Name | Type | Access privileges | Column privileges | Policies
--------+--------------------------+-------------------+-------------------------------------+-------------------+----------
public | TABLE_RECORD | partitioned table | lzldbdata=arwdDxt/lzldbdata +| |
| | | r_lzldbdata_qry=r/lzldbdata +| |
| | | r_lzldbdata_dml=arwd/lzldbdata +| |
(1 row)
子表权限
# \dp+ TABLE_RECORD_202505
Access privileges
Schema | Name | Type | Access privileges | Column privileges | Policies
--------+---------------------------------+-------+------------------------------------+-------------------+----------
public | TABLE_RECORD_202505 | table | lzldbdata=arwdDxt/lzldbdata +| |
分区权限少了r_lzldbdata_dml这个角色的,这个角色是授权给业务用户的。
立即执行授权,问题解决
grant select,update,delete,insert on TABLE_RECORD_202305 to r_lzldbdata_dml;
grant select,update,delete,insert on TABLE_RECORD_202306 to r_lzldbdata_dml;
再次用opr用户explain,执行计划正确,5、6月走了正常的索引
\c - lzldbopr
-> Index Scan using TABLE_RECORD_202303_date_created_idx on TABLE_RECORD_202303 TABLE_RECORD_5 (cost=0.44..5.47 rows=1 width=482)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using TABLE_RECORD_202304_date_created_idx on TABLE_RECORD_202304 TABLE_RECORD_6 (cost=0.44..5.47 rows=1 width=481)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using idx_applnodeleted_25 on TABLE_RECORD_202305 TABLE_RECORD_7 (cost=0.43..30.39 rows=1 width=483)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
-> Index Scan using idx_applnodeleted_14 on TABLE_RECORD_202306 TABLE_RECORD_8 (cost=0.56..42.57 rows=17 width=485)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
-> Index Scan using idx_applnodeleted_38 on TABLE_RECORD_202307 TABLE_RECORD_9 (cost=0.14..5.17 rows=1 width=3502)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
-> Index Scan using idx_applnodeleted_1 on TABLE_RECORD_202308 TABLE_RECORD_10 (cost=0.14..5.17 rows=1 width=3502)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
pg日志上也未再观察到慢update语句。
测试(没复现)
初始化建表脚本
--切换非超级用户
\c - lzldbdata
--create table
CREATE TABLE PUBLIC.LZLPARTITION
(
APPL_NO varchar(100) NULL,
IS_DELETED varchar(8) NULL,
DATE_CREATED timestamp NOT NULL DEFAULT now(),
DATE_UPDATED timestamp NOT NULL DEFAULT now()
) PARTITION BY RANGE(DATE_CREATED);
--indexes
create index DATE_LZLPARTITION on PUBLIC.LZLPARTITION (DATE_CREATED);
create index NK_LZLPARTITION on PUBLIC.LZLPARTITION (APPL_NO);
--privs
GRANT SELECT ON TABLE public.LZLPARTITION TO r_lzldbdata_qry;
GRANT SELECT,INSERT,UPDATE,DELETE ON TABLE public.LZLPARTITION TO r_lzldbdata_dml;
--partition
create table LZLPARTITION_202301 partition of LZLPARTITION for values from ('2023-01-01 00:00:00') to ('2023-02-01 00:00:00');
create table LZLPARTITION_202302 partition of LZLPARTITION for values from ('2023-02-01 00:00:00') to ('2023-03-01 00:00:00');
create table LZLPARTITION_202303 partition of LZLPARTITION for values from ('2023-03-01 00:00:00') to ('2023-04-01 00:00:00');
create table LZLPARTITION_202304 partition of LZLPARTITION for values from ('2023-04-01 00:00:00') to ('2023-05-01 00:00:00');
create table LZLPARTITION_202305 partition of LZLPARTITION for values from ('2023-05-01 00:00:00') to ('2023-06-01 00:00:00');
create table LZLPARTITION_202306 partition of LZLPARTITION for values from ('2023-06-01 00:00:00') to ('2023-07-01 00:00:00');
生成数据:
insert
into
public.LZLPARTITION
select
n + 10,
'N',
to_char(to_date('2023-01-01', 'YYYY-MM-DD') + ('' || n || ' minute') ::interval, 'YYYY-MM-DD')::"date",
now()
from
generate_series(0, 300000) n
数据分布:
select count(*),tableoid::regclass from lzlpartition group by 2;
count | tableoid
-------+---------------------
44640 | lzlpartition_202301
40320 | lzlpartition_202302
44640 | lzlpartition_202303
43200 | lzlpartition_202304
44640 | lzlpartition_202305
43200 | lzlpartition_202306
39361 | lzlpartition_202307
权限未继承:
# \dp+ lzlpartition
Access privileges
Schema | Name | Type | Access privileges | Column privileges | Policies
--------+--------------+-------------------+-------------------------------------+-------------------+----------
public | lzlpartition | partitioned table | lzldbdata=arwdDxt/lzldbdata +| |
| | | r_lzldbdata_qry=r/lzldbdata +| |
| | | r_lzldbdata_dml=arwd/lzldbdata | |
# \dp+ lzlpartition_202306
Access privileges
Schema | Name | Type | Access privileges | Column privileges | Policies
--------+---------------------+-------+-------------------+-------------------+----------
public | lzlpartition_202306 | table | | |
执行计划(正确):
explain select count(*) from lzlpartition WHERE APPL_NO = '217450' AND IS_DELETED = 'N' AND DATE_CREATED > now() - interval '31' day AND DATE_CREATED < now();
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=36.76..36.77 rows=1 width=8)
-> Append (cost=0.29..36.74 rows=7 width=0)
Subplans Removed: 5
-> Index Scan using lzlpartition_202305_appl_no_idx on lzlpartition_202305 lzlpartition_1 (cost=0.15..5.19 rows=1 width=0)
Index Cond: ((appl_no)::text = '217450'::text)
Filter: (((is_deleted)::text = '0'::text) AND (date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
-> Index Scan using lzlpartition_202306_appl_no_idx on lzlpartition_202306 lzlpartition_2 (cost=0.15..5.19 rows=1 width=0)
Index Cond: ((appl_no)::text = '217450'::text)
Filter: (((is_deleted)::text = '0'::text) AND (date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
权限仍然没有继承,其实测了其他pg版本也是这样的,看上去是一个通用行为。
但是就算这样也没有复现出来,测试结果走了正确的索引,没有像生产环境那样走错误的索引。
问题总结
因为我们收集了统计信息并kill了会话,不应该是缓存执行计划的问题。执行了grant后,分区执行计划立即正确(甚至grant一个分区,对一个分区),所以比较确认是分区权限导致分区执行计划异常的。分析和处理问题的过程总结如下
- 切换应用用户查看执行计划。直接用超级用户登陆查看执行计划是通用做法,但是查看到的执行计划还是不对的。
- 分区表子表的权限问题。根因是postgresql分区表子表上的权限与父表不一致,导致执行计划异常,也就是说权限问题影响了pg的执行计划。
- 该问题不好复现,非常非常偶发。
- 权限导致执行计划异常这个问题非常隐蔽,不好定位。
想深入讨论2个问题
- 权限问题不应该影响执行计划。为什么权限会影响执行计划?
- 子表权限与父表不一致。为什么子表没有全部继承父表权限?
已提交BUG看看官方咋说