PostgreSQL CPU 100%问题排查

前 言

一般来说, CPU 问题是由 低效SQL引发, 而造成低效SQL 的原因有:

  1. SQL 写法不够优化

  2. 表的数据量发生了变化, 但统计信息未及时更新

  3. 数据库本身的bug

分析解决

为了进一步调查这个问题, 请参考如下建议:

请参考文档 【1】其中提到的排查高 CPU 利用率的方法。比如:
本机 PostgreSQL 视图和目录(例如 pg_stat_statements、pg_stat_activity 和 pg_stat_user_tables)查看数据库级详细信息。有关更多信息,请参阅有关监控数据库活动和 pg_stat_statements 的 PostgreSQL 文档。

PostgreSQL 提供各种日志记录参数来记录长时间运行的查询、autovacuum、锁定等待以及连接和断开连接请求。有关更多信息,请参阅如何使用 Amazon RDS for PostgreSQL 启用查询日志记录?

确定原因后,可以使用以下方法进一步降低 CPU 使用:

如果有机会进行调整,请使用 EXPLAIN 和 EXPLAIN ANALYZE 来识别警告。有关更多信息,请参阅有关 EXPLAIN 的 PostgreSQL 文档。

有用的查询

以下是一些有用的查询, 请在您的环境中运行, 看是否有慢查询等

1. 查看当前活跃的DB session 正在运行的SQL语句(运行时间超过10秒)

SELECT now() - query_start as "runtime", usename,application_name, client_hostname, datname,  state, query
	    FROM  pg_stat_activity
	    WHERE now() - query_start > '10 seconds'::interval
	       and state!='idle'
	   ORDER BY runtime DESC;

2. 按 total_time 列出查询,并查看哪个查询在数据库中花费的时间最多

SELECT round(total_time*1000)/1000 AS total_time,query
	FROM pg_stat_statements
	ORDER BY total_time DESC limit 5;

3. 查看哪些table未及时做vacuum,以及未及时收集统计信息

SELECT relname, n_live_tup, n_dead_tup, trunc(100*n_dead_tup/(n_live_tup+1))::float "ratio%",
	to_char(last_autovacuum, 'YYYY-MM-DD HH24:MI:SS') as autovacuum_date,
	to_char(last_autoanalyze, 'YYYY-MM-DD HH24:MI:SS') as autoanalyze_date
	FROM pg_stat_all_tables
	ORDER BY last_autovacuum;

4. 查看有没有被锁的session

select pid,
	usename,
	pg_blocking_pids(pid) as blocked_by,
	query as blocked_query
	from pg_stat_activity
	where cardinality(pg_blocking_pids(pid)) > 0;

5. 推荐创建如下session

pg_stat_tables
CREATE OR REPLACE VIEW pg_stat_tables
	AS
	WITH s AS (
	SELECT *, cast((n_tup_ins + n_tup_upd + n_tup_del) AS numeric) AS total
	       FROM pg_stat_user_tables
	)
	SELECT s.schemaname,       s.relname,       s.relid,
	       s.seq_scan,         s.idx_scan,
	       CASE WHEN s.seq_scan + s.idx_scan = 0 THEN 'NaN'::double precision
	            ELSE round(100 * s.idx_scan/(s.seq_scan+s.idx_scan),2)  END AS idx_scan_ratio,

	       s.seq_tup_read,       s.idx_tup_fetch,

	       sio.heap_blks_read,       sio.heap_blks_hit,
	       CASE WHEN sio.heap_blks_read = 0 THEN 0.00
	            ELSE round(100*sio.heap_blks_hit/(sio.heap_blks_read+sio.heap_blks_hit),2)  END AS hit_ratio,

	       n_tup_ins,       n_tup_upd,       n_tup_del,
	       CASE WHEN s.total = 0 THEN 0.00
	            ELSE round((100*cast(s.n_tup_ins AS numeric)/s.total) ,2) END AS ins_ratio,
	       CASE WHEN s.total = 0 THEN 0.00
	            ELSE round((100*cast(s.n_tup_upd AS numeric)/s.total) ,2) END AS upd_ratio,
	       CASE WHEN s.total = 0 THEN 0.00
	            ELSE round((100*cast(s.n_tup_del AS numeric)/s.total) ,2) END AS del_ratio,

	       s.n_tup_hot_upd,
	       CASE WHEN s.n_tup_upd = 0 THEN 'NaN'::double precision
	            ELSE round(100*cast(cast(n_tup_hot_upd as numeric)/n_tup_upd as numeric), 2) END AS hot_upd_ratio,

	       pg_size_pretty(pg_relation_size(sio.relid)) AS "table_size",
	       pg_size_pretty(pg_total_relation_size(sio.relid)) AS "total_size",

	       s.last_vacuum,       s.last_autovacuum,
	       s.vacuum_count,      s.autovacuum_count,
	       s.last_analyze,      s.last_autoanalyze,
	       s.analyze_count,     s.autoanalyze_count
	FROM s, pg_statio_user_tables AS sio WHERE s.relid = sio.relid ORDER BY relname;
pg_stat_indexes
AS
	SELECT s.schemaname,       s.relname,       s.indexrelname,       s.relid,
	       s.idx_scan,       s.idx_tup_read,       s.idx_tup_fetch,
	       sio.idx_blks_read,       sio.idx_blks_hit,
	       CASE WHEN sio.idx_blks_read  + sio.idx_blks_hit = 0 THEN 'NaN'::double precision
	       ELSE round(100 * sio.idx_blks_hit/(sio.idx_blks_read + sio.idx_blks_hit), 2) END AS idx_hit_ratio,
	       pg_size_pretty(pg_relation_size(s.indexrelid)) AS "index_size"
	FROM pg_stat_user_indexes AS s, pg_statio_user_indexes AS sio
	WHERE s.relid = sio.relid ORDER BY relname;
pg_stat_users
CREATE OR REPLACE VIEW pg_stat_users
	AS
	SELECT datname,       usename,       pid,       backend_start, 
	       (current_timestamp - backend_start)::interval(3) AS "login_time"
	FROM pg_stat_activity;
pg_stat_queries
	CREATE OR REPLACE VIEW pg_stat_queries 
	AS
	SELECT datname,       usename,       pid,
	       (current_timestamp - xact_start)::interval(3) AS duration, 
	       waiting,       query
	FROM pg_stat_activity WHERE pid != pg_backend_pid();
pg_stat_long_trx
CREATE OR REPLACE VIEW pg_stat_long_trx 
	AS
	SELECT pid,        waiting,
	    (current_timestamp - xact_start)::interval(3) AS duration, query
	FROM pg_stat_activity
	WHERE pid <> pg_backend_pid();
pg_stat_waiting_locks
CREATE OR REPLACE VIEW pg_stat_waiting_locks
	AS
	SELECT l.locktype,       c.relname,       l.pid,       l.mode,
	       substring(a.query, 1, 6) AS query,
	       (current_timestamp - xact_start)::interval(3) AS duration
	FROM pg_locks AS l
	  LEFT OUTER JOIN pg_stat_activity AS a ON l.pid = a.pid
	  LEFT OUTER JOIN pg_class AS c ON l.relation = c.oid 
	WHERE  NOT l.granted ORDER BY l.pid;

4. 请启用慢日志查询

请依据文档【2】 启用如下两个参数, 以将慢查询记录到日志中, 方便进行分析:

可以修改以下参数,使日志记录的更详细,并将日志保存至 cloudwatch 中,这样将有利于您更好的发现问题。

log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
log_autovacuum_min_duration = 0
log_statement = 1
log_min_duration_statement = 200

[1] 如何排查 Amazon RDS 或 Amazon Aurora PostgreSQL 的高 CPU 利用率? - https://aws.amazon.com/cn/premiumsupport/knowledge-center/rds-aurora-postgresql-high-cpu/?nc1=h_ls

[2] How do I enable query logging using Amazon RDS for PostgreSQL? - https://aws.amazon.com/premiumsupport/knowledge-center/rds-postgresql-query-logging/

  • 1
    点赞
  • 18
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
要查看PostgreSQLCPU和内存使用情况,可以使用以下方法: 1. 查看CPU占用:使用top命令查看CPU占用情况。可以通过执行命令"ps aux | head -1; ps aux | grep -v PID | sort -rn -k 3 | head"来查看每个进程的CPU占用情况,并找到与PostgreSQL相关的进程。 2. 查询执行的SQL:通过执行命令"su - postgres psql -c 'SELECT procpid, START, now()-START AS lap, current_query FROM (SELECT backendid, pg_stat_get_backend_pid(S.backendid) AS procpid, pg_stat_get_backend_activity_start(S.backendid) AS START, pg_stat_get_backend_activity(S.backendid) AS current_query FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS S) AS S WHERE current_query <> ''<IDLE>'' ORDER BY lap DESC;'"来查找执行SQL的进程。 3. 查看执行计划:在定位到具体的SQL之后,可以使用"EXPLAIN SQL"命令来查看执行计划。 关于内存计算(In-Memory Computing),它是指CPU直接从内存而非硬盘上读取数据,并在内存中对数据进行计算和分析。引入内存计算技术可以消除磁盘IO的消耗,并利用内存随机访问的特性来实现更高效的算法。在数据库中引入内存计算技术可以提供更快的查询和处理速度,同时降低了数据访问的延迟。<span class="em">1</span><span class="em">2</span> #### 引用[.reference_title] - *1* [PostgreSQL 数据库运维问题 查看数据库进程 查看CPU占用过高的SQL](https://blog.csdn.net/qq_35260875/article/details/115829218)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v93^chatsearchT3_1"}}] [.reference_item style="max-width: 50%"] - *2* [基于PostgreSQL的内存计算引擎,来自Lenovo的设计开发经验](https://download.csdn.net/download/weixin_38636655/15465769)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v93^chatsearchT3_1"}}] [.reference_item style="max-width: 50%"] [ .reference_list ]
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值