一、背景介绍
鉴于安全考虑,数据库的superuser一般来说是只掌握在DBA同学手中。
最近多次碰到这样的场景,业务同学晚上发版修改表结构,吐槽表被系统进程锁住了,排查后发现是autovacuum worker阻塞的。经常需要爬起来支持,同时也阻塞了发版效率。
那能不能将autovacuum worker的terminate权限给nonsuper user呢?可以方便业务账号直接处理被autovacuum worker阻塞的情况。
业余时间研究了下,将pg_terminate_backend()做了一点小改动,支持了nonsuper user kill autovacuum worker的能力
二、代码分析
先来看下pg_terminate_backend ()函数原生的逻辑
/*
* Signal to terminate a backend process. This is allowed if you are a member
* of the role whose process is being terminated.
*
* Note that only superusers can signal superuser-owned processes.
*/
Datum
pg_terminate_backend(PG_FUNCTION_ARGS)
{
int r = pg_signal_backend(PG_GETARG_INT32(0), SIGTERM);
if (r == SIGNAL_BACKEND_NOSUPERUSER)
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
(errmsg("must be a superuser to terminate superuser process"))));
if (r == SIGNAL_BACKEND_NOPERMISSION)
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
(errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend"))));
PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
}
主要是调用pg_signal_backend(PG_GETARG_INT32(0), SIGTERM)向进程发送SIGTERM信号,即信号15。
在pg_signal_backend函数中:
1、 首先调用BackendPidGetProc函数获取Backendpid list,这里是从allProc中选取backend pid list,循环匹配我们传入的需要terminate的pid,匹配不到则返回:PID %d is not a PostgreSQL server process;
2、 判断我们要terminate的pid的proc->roleid是否是superuser,并且我们操作的用户是否有superuser权限。不满足条件的话返回:must be a superuser to terminate superuser process;
3、 这里先判断当前执行terminate的用户,是否是proc->roleId的member; 然后判断用户是否有pg_signal_backend权限;这里不满足条件的话返回:
must be a member of the role whose process is being terminated or member of pg_signal_backend
pg原生的设计,nonsuperuser用户terminate autovacuum worker时,会走第3处逻辑。那我们可以在这里进行修改,实现nonsuper user kill autovacuum worker的功能。
4、 调用kill接口向pid发送信号
三、方案设计
我们修改代码的原则是应该尽量保留原生的逻辑,可以考虑加入一个控制参数,默认关闭状态,走原生逻辑;打开参数后,开启新功能。
新增一个bool类型的控制参数enable_nonsuper_kill_autovacuumworker,默认值为off,可以通过alter system的方式修改,重载配置文件生效。
postgres=> select * from pg_settings where name like '%enable_nonsuper_kill_autovacuumworker%';
-[ RECORD 1 ]---+--------------------------------------------------------------------------
name | enable_nonsuper_kill_autovacuumworker
setting | off
unit |
category | Preset Options
short_desc | Whether nonsuperuser could kill autovacuum worker process, default false.
extra_desc |
context | sighup
vartype | bool
source | default
min_val |
max_val |
enumvals |
boot_val | off
reset_val | off
sourcefile |
sourceline |
pending_restart | f
postgres=>
修改pg_signal_backend函数的逻辑如下:
static int
pg_signal_backend(int pid, int sig)
{
PGPROC *proc = BackendPidGetProc(pid);
/*
* BackendPidGetProc returns NULL if the pid isn't valid; but by the time
* we reach kill(), a process for which we get a valid proc here might
* have terminated on its own. There's no way to acquire a lock on an
* arbitrary process to prevent that. But since so far all the callers of
* this mechanism involve some request for ending the process anyway, that
* it might end on its own first is not a problem.
*/
if (proc == NULL)
{
/*
* This is just a warning so a loop-through-resultset will not abort
* if one backend terminated on its own during the run.
*/
ereport(WARNING,
(errmsg("PID %d is not a PostgreSQL server process", pid)));
return SIGNAL_BACKEND_ERROR;
}
/* Only allow superusers to signal superuser-owned backends. */
if (superuser_arg(proc->roleId) && !superuser())
return SIGNAL_BACKEND_NOSUPERUSER;
/* Users can signal backends they have role membership in. */
if (!has_privs_of_role(GetUserId(), proc->roleId) &&
!has_privs_of_role(GetUserId(), DEFAULT_ROLE_SIGNAL_BACKENDID))
{ /* Modify by Nickxyang at 2021-11-28 */
/* 这里先判断enable_nonsuper_kill_autovacuumworker 参数是否开启,未开启则走原生逻辑return SIGNAL_BACKEND_NOPERMISSION; */
if (!enable_nonsuper_kill_autovacuumworker)
return SIGNAL_BACKEND_NOPERMISSION;
/* 这里获取进程对应的结构体信息 */
LocalPgBackendStatus *local_beentry;
PgBackendStatus *beentry;
local_beentry = pgstat_fetch_stat_local_beentry(proc->backendId);
/* 如果获取到的信息为空,说明进程已经不存在?也就不用terminate了,直接return SIGNAL_BACKEND_NOPERMISSION就行 */
if (!local_beentry)
return SIGNAL_BACKEND_NOPERMISSION;
beentry = &local_beentry->backendStatus;
/* 如果获取的进程类型不是B_AUTOVAC_WORKER,那就return SIGNAL_BACKEND_NOPERMISSION
* 也就是说,只有terminate autovacuum worker进程,才会顺利走到下边kill发送信号的逻辑,
* 如果是其它系统进程一律报错无权限
*/
if (beentry->st_backendType != B_AUTOVAC_WORKER)
{
return SIGNAL_BACKEND_NOPERMISSION;
}
/* End at 2021-11-28 */
}
/*
* Can the process we just validated above end, followed by the pid being
* recycled for a new process, before reaching here? Then we'd be trying
* to kill the wrong thing. Seems near impossible when sequential pid
* assignment and wraparound is used. Perhaps it could happen on a system
* where pid re-use is randomized. That race condition possibility seems
* too unlikely to worry about.
*/
/* If we have setsid(), signal the backend's whole process group */
#ifdef HAVE_SETSID
if (kill(-pid, sig))
#else
if (kill(pid, sig))
#endif
{
/* Again, just a warning to allow loops */
ereport(WARNING,
(errmsg("could not send signal to process %d: %m", pid)));
return SIGNAL_BACKEND_ERROR;
}
return SIGNAL_BACKEND_SUCCESS;
}
同时还要修改下pg_stat_progress_vacuum视图的逻辑,因为原生逻辑里nonsuper user无权查看正在进行的vacuum的全部信息,只能看到pid,datid,dataname字段;
修改这里的逻辑,当enable_nonsuper_kill_autovacuumworker参数打开时,可以看到pg_stat_progress_vacuum所有字段信息,根据relid即表oid可以方便terminate对应的 worker进程;参数关闭时保持原有逻辑只显示默认字段。
Datum
pg_stat_get_progress_info(PG_FUNCTION_ARGS)
{
#define PG_STAT_GET_PROGRESS_COLS PGSTAT_NUM_PROGRESS_PARAM + 3
int num_backends = pgstat_fetch_stat_numbackends();
int curr_backend;
char *cmd = text_to_cstring(PG_GETARG_TEXT_PP(0));
ProgressCommandType cmdtype;
TupleDesc tupdesc;
Tuplestorestate *tupstore;
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
MemoryContext per_query_ctx;
MemoryContext oldcontext;
/* check to see if caller supports us returning a tuplestore */
if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("set-valued function called in context that cannot accept a set")));
if (!(rsinfo->allowedModes & SFRM_Materialize))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("materialize mode required, but it is not allowed in this context")));
/* Build a tuple descriptor for our result type */
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
elog(ERROR, "return type must be a row type");
/* Translate command name into command type code. */
if (pg_strcasecmp(cmd, "VACUUM") == 0)
cmdtype = PROGRESS_COMMAND_VACUUM;
else if (pg_strcasecmp(cmd, "ANALYZE") == 0)
cmdtype = PROGRESS_COMMAND_ANALYZE;
else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
cmdtype = PROGRESS_COMMAND_CLUSTER;
else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
cmdtype = PROGRESS_COMMAND_BASEBACKUP;
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid command name: \"%s\"", cmd)));
per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
oldcontext = MemoryContextSwitchTo(per_query_ctx);
tupstore = tuplestore_begin_heap(true, false, work_mem);
rsinfo->returnMode = SFRM_Materialize;
rsinfo->setResult = tupstore;
rsinfo->setDesc = tupdesc;
MemoryContextSwitchTo(oldcontext);
/* 1-based index */
for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
{
LocalPgBackendStatus *local_beentry;
PgBackendStatus *beentry;
Datum values[PG_STAT_GET_PROGRESS_COLS];
bool nulls[PG_STAT_GET_PROGRESS_COLS];
int i;
MemSet(values, 0, sizeof(values));
MemSet(nulls, 0, sizeof(nulls));
local_beentry = pgstat_fetch_stat_local_beentry(curr_backend);
if (!local_beentry)
continue;
beentry = &local_beentry->backendStatus;
/*
* Report values for only those backends which are running the given
* command.
*/
if (!beentry || beentry->st_progress_command != cmdtype)
continue;
/* Value available to all callers */
values[0] = Int32GetDatum(beentry->st_procpid);
values[1] = ObjectIdGetDatum(beentry->st_databaseid);
/* show rest of the values including relid only to role members */
if (HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
{
values[2] = ObjectIdGetDatum(beentry->st_progress_command_target);
for (i = 0; i < PGSTAT_NUM_PROGRESS_PARAM; i++)
values[i + 3] = Int64GetDatum(beentry->st_progress_param[i]);
}
/* Modify by Nickxyang at 2021-11-28 */
/* 当系统进程执行vauum 或者 analyze时,当前用户为nonsuperuser
* 并且 enable_nonsuper_kill_autovacuumworker参数开启时,显示完整的信息
*/
else if (!HAS_PGSTAT_PERMISSIONS(beentry->st_userid) &&
(beentry->st_progress_command == (PROGRESS_COMMAND_VACUUM || PROGRESS_COMMAND_ANALYZE)) &&
enable_nonsuper_kill_autovacuumworker)
{
values[2] = ObjectIdGetDatum(beentry->st_progress_command_target);
for (i = 0; i < PGSTAT_NUM_PROGRESS_PARAM; i++)
values[i + 3] = Int64GetDatum(beentry->st_progress_param[i]);
}
/* End at 2021-11-28 */
else
{
nulls[2] = true;
for (i = 0; i < PGSTAT_NUM_PROGRESS_PARAM; i++)
nulls[i + 3] = true;
}
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
}
/* clean up and return the tuplestore */
tuplestore_donestoring(tupstore);
return (Datum) 0;
}
四、实现效果
可以看到,默认nonsuper user是无法terminate autovacuum worker进程的。
postgres=> \du pguser
List of roles
Role name | Attributes | Member of
-----------+------------+-----------
pguser | | {}
postgres=> select relid::regclass as tablename,* from pg_stat_progress_vacuum ;
tablename | pid | datid | datname | relid | phase | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples
-----------+-------+-------+----------+-------+---------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
| 28477 | 13578 | postgres | | | | | | | |
(1 row)
postgres=> select pg_terminate_backend(28477);
ERROR: must be a member of the role whose process is being terminated or member of pg_signal_backend
postgres=>
使用superuser打开参数
postgres=# show enable_nonsuper_kill_autovacuumworker;alter system set enable_nonsuper_kill_autovacuumworker to on;select pg_reload_conf();
enable_nonsuper_kill_autovacuumworker
---------------------------------------
off
(1 row)
ALTER SYSTEM
pg_reload_conf
----------------
t
(1 row)
postgres=#
参数打开后,成功terminate autovacuum worker
postgres=> show enable_nonsuper_kill_autovacuumworker;
enable_nonsuper_kill_autovacuumworker
---------------------------------------
on
(1 row)
postgres=> select relid::regclass as tablename,* from pg_stat_progress_vacuum ;
tablename | pid | datid | datname | relid | phase | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples
-----------+-------+-------+----------+-------+---------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
tbl_1 | 28477 | 13578 | postgres | 16404 | scanning heap | 833334 | 19655 | 19654 | 0 | 291 | 120
(1 row)
postgres=> select pg_terminate_backend(28477);
pg_terminate_backend
----------------------
t
(1 row)
postgres=> select relid::regclass as tablename,* from pg_stat_progress_vacuum ;
tablename | pid | datid | datname | relid | phase | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples
-----------+-------+-------+----------+-------+---------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
| | | | | | | | | | |
(0 row)
同时验证了下terminate其他系统进程是失败的
postgres=# select pid,usename,backend_type from pg_stat_activity;
pid | usename | backend_type
-------+----------+------------------------------
10138 | | autovacuum launcher
10140 | postgres | logical replication launcher
19027 | pguser | client backend
25883 | pguser | client backend
25898 | postgres | client backend
6076 | postgres | client backend
10136 | | background writer
10135 | | checkpointer
10137 | | walwriter
(9 rows)
postgres=>\c postgres pguser
You are now connected to database "postgres" as user "pguser".
postgres=> select pg_terminate_backend(10138);
ERROR: must be a member of the role whose process is being terminated or member of pg_signal_backend
postgres=> select pg_terminate_backend(10140);
ERROR: must be a member of the role whose process is being terminated or member of pg_signal_backend
postgres=> select pg_terminate_backend(10135);
WARNING: PID 10135 is not a PostgreSQL server process
pg_terminate_backend
----------------------
f
(1 row)
postgres=>
五、小结
从测试结果来看,修改的方案是可行的。
同时有同学应该已经注意到了,原生的逻辑terminate时的报错提示:ERROR: must be a member of the role whose process is being terminated or member of pg_signal_backend.
那是否将pg_signal_backend赋予nonsuperuser就可以terminate autovacuum worker呢?
是这样的,但是如果赋予pg_signal_backend权限,那么可以terminate 所有proc->roleId为0的系统进程,比如autovacuum launcher, logical replication launcher等。这个风险相对较高,要严格把控这个权限。