Postgresql进程卡住无法退出原因和解决方法

前言

Postgresql进程卡住无法退出怎么办?例如以下场景:

  • kill 进程:无效
  • kill -2/-12 进程:无效
  • pg_terminate_backend(pid):无效

1 模拟卡住

-- pg14下测试通过

cd `pg_config --libdir`/postgresql

cat << EOF > loop.c
#include "postgres.h"
#include "fmgr.h"
#include <unistd.h>
PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(loop);
Datum loop(PG_FUNCTION_ARGS)
{
    /* an endless loop */
    while(1) 
    {
        // 注意这里没有加:CHECK_FOR_INTERRUPTS();
        sleep(2); 
    }
}
EOF

gcc -I /data01/bin/pg9000/include/postgresql/server -fPIC -shared -o loop.so loop.c

-- psql
CREATE FUNCTION loop() RETURNS void LANGUAGE c AS 'loop';

-- 卡住
postgres=# 
postgres=# select loop();

2 尝试解决失败的方法

psql执行ctrl+c失败

postgres=# select loop();
^CCancel request sent
^CCancel request sent

kill 失败/kill -2失败/kill -12失败

$ kill 11699
$ kill -2 11699
$ kill -12 11699

pg_terminate_backend失败

postgres=# select pg_cancel_backend(11699);
 pg_cancel_backend 
-------------------
 t
(1 row)

postgres=# select pg_terminate_backend(11699);
 pg_terminate_backend 
----------------------
 t
(1 row)

3 进程在干什么

stracp -p

$ strace -p 11699
Process 11699 attached
restart_syscall(<... resuming interrupted call ...>) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [URG], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], SA_RESTORER|SA_RESTART|SA_NOCLDSTOP, 0x7fbbe4fcc630}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [URG], NULL, 8) = 0
nanosleep({2, 0}, 
0x7fff60c0bb80)       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [URG], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], SA_RESTORER|SA_RESTART|SA_NOCLDSTOP, 0x7fbbe4fcc630}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [URG], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff60c0bb80)       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [URG], 8) = 0

gstack

$ gstack 11699
#0  0x00007fbbe45a68d0 in __nanosleep_nocancel () from /lib64/libc.so.6
#1  0x00007fbbe45a6784 in sleep () from /lib64/libc.so.6
#2  0x00007fbbe585e145 in loop () from /data01/bin/pg9000/lib/postgresql/loop.so
#3  0x0000000000728b0f in ExecInterpExpr (state=0x2bf0100, econtext=0x2befe00, isnull=0x7fff60c0c037) at execExprInterp.c:725
#4  0x000000000072a965 in ExecInterpExprStillValid (state=0x2bf0100, econtext=0x2befe00, isNull=0x7fff60c0c037) at execExprInterp.c:1824
#5  0x000000000077eec1 in ExecEvalExprSwitchContext (state=0x2bf0100, econtext=0x2befe00, isNull=0x7fff60c0c037) at ../../../src/include/executor/executor.h:339
#6  0x000000000077ef2a in ExecProject (projInfo=0x2bf00f8) at ../../../src/include/executor/executor.h:373
#7  0x000000000077f118 in ExecResult (pstate=0x2befce8) at nodeResult.c:136
#8  0x000000000073eefa in ExecProcNodeFirst (node=0x2befce8) at execProcnode.c:463
#9  0x00000000007338aa in ExecProcNode (node=0x2befce8) at ../../../src/include/executor/executor.h:257
#10 0x0000000000736130 in ExecutePlan (estate=0x2befab0, planstate=0x2befce8, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x2bee790, execute_once=true) at execMain.c:1551
#11 0x0000000000733f15 in standard_ExecutorRun (queryDesc=0x2b2c850, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:361
#12 0x00007fbbe589ac5e in pgss_ExecutorRun (queryDesc=0x2b2c850, direction=ForwardScanDirection, count=0, execute_once=true) at pg_stat_statements.c:1003
#13 0x0000000000733d27 in ExecutorRun (queryDesc=0x2b2c850, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:303
#14 0x000000000097cd6d in PortalRunSelect (portal=0x2ba3f90, forward=true, count=0, dest=0x2bee790) at pquery.c:921
#15 0x000000000097ca2c in PortalRun (portal=0x2ba3f90, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2bee790, altdest=0x2bee790, qc=0x7fff60c0c4e0) at pquery.c:765
#16 0x00000000009766d3 in exec_simple_query (query_string=0x2b064f0 "select loop();") at postgres.c:1213
#17 0x000000000097abf1 in PostgresMain (argc=1, argv=0x7fff60c0c770, dbname=0x2b32c00 "postgres", username=0x2b32bd8 "mingjiegao") at postgres.c:4494
#18 0x00000000008b6de6 in BackendRun (port=0x2b2b720) at postmaster.c:4530
#19 0x00000000008b6765 in BackendStartup (port=0x2b2b720) at postmaster.c:4252
#20 0x00000000008b2bdd in ServerLoop () at postmaster.c:1745
#21 0x00000000008b24af in PostmasterMain (argc=1, argv=0x2b000e0) at postmaster.c:1417
#22 0x00000000007b4d2b in main (argc=1, argv=0x2b000e0) at main.c:209

4 原因&解决方案

4.1 原因

原因是PG当前堆栈没有进入信号相应处理函数,一般就是进入死等堆栈了:

  • 一般能cacnel的堆栈:超时后检查中断,然后继续sleep
while
  sleep(timeout)
  CHECK_FOR_INTERRUPTS();
  • 死等堆栈
while
  sleep()

4.2 解决方案

注意:千万不要kill -9,SIGKILL没有信号处理函数,OS会直接停掉进程;PG父进程发现子进程异常退出,会停掉所有进程,释放共享内存,在重新申请共享内存,拉起所有进程。效果就等于异常重启,启动时肯定会需要时间redo,可能造成几分钟的停止服务。(除非后果可以接受,否则不要kill -9)

执行kill -2或kill -12后或者执行pg_terminate_backend后,主动调用信号处理函数,让PG正常退出。

$ gdb attach 11699
(gdb) p ProcessInterrupts()
[Inferior 1 (process 11699) exited with code 01]
The program being debugged exited while in a function called from GDB.
Evaluation of the expression containing the function
(ProcessInterrupts) will be abandoned.
(gdb) q

服务端效果

postgres=# 
postgres=# select loop();
^CCancel request sent
^CCancel request sent

(GDB调用后)

FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# \q

服务端日志:server没有其他报错、server没有重启

11699 [local] mingjiegao postgres 2022-08-19 09:41:43 UTC 42723STATEMENT:  CREATE FUNCTION loop() RETURNS void LANGUAGE c AS 'loop';
11699 [local] mingjiegao postgres 2022-08-19 09:42:24 UTC 00000LOG:  statement: select loop();
13083 [local] mingjiegao postgres 2022-08-19 09:45:39 UTC 00000LOG:  statement: select pg_cancel_backend(11699);
13083 [local] mingjiegao postgres 2022-08-19 09:45:47 UTC 00000LOG:  statement: select pg_terminate_backend(11699);
11699 [local] mingjiegao postgres 2022-08-19 09:52:44 UTC 57P01FATAL:  terminating connection due to administrator command
11699 [local] mingjiegao postgres 2022-08-19 09:52:44 UTC 57P01STATEMENT:  select loop();
  • 4
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

高铭杰

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值