pg_terminate_backend()运维改造--nonsuperuser支持kill autovacuum worker

一、背景介绍

鉴于安全考虑,数据库的superuser一般来说是只掌握在DBA同学手中。
最近多次碰到这样的场景,业务同学晚上发版修改表结构,吐槽表被系统进程锁住了,排查后发现是autovacuum worker阻塞的。经常需要爬起来支持,同时也阻塞了发版效率。

那能不能将autovacuum worker的terminate权限给nonsuper user呢?可以方便业务账号直接处理被autovacuum worker阻塞的情况。

业余时间研究了下,将pg_terminate_backend()做了一点小改动,支持了nonsuper user kill autovacuum worker的能力

二、代码分析

先来看下pg_terminate_backend ()函数原生的逻辑

/*
 * Signal to terminate a backend process.  This is allowed if you are a member
 * of the role whose process is being terminated.
 *
 * Note that only superusers can signal superuser-owned processes.
 */
Datum
pg_terminate_backend(PG_FUNCTION_ARGS)
{
	int			r = pg_signal_backend(PG_GETARG_INT32(0), SIGTERM);

	if (r == SIGNAL_BACKEND_NOSUPERUSER)
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 (errmsg("must be a superuser to terminate superuser process"))));

	if (r == SIGNAL_BACKEND_NOPERMISSION)
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 (errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend"))));

	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
}

主要是调用pg_signal_backend(PG_GETARG_INT32(0), SIGTERM)向进程发送SIGTERM信号,即信号15。

在pg_signal_backend函数中:
1、 首先调用BackendPidGetProc函数获取Backendpid list,这里是从allProc中选取backend pid list,循环匹配我们传入的需要terminate的pid,匹配不到则返回:PID %d is not a PostgreSQL server process;
2、 判断我们要terminate的pid的proc->roleid是否是superuser,并且我们操作的用户是否有superuser权限。不满足条件的话返回:must be a superuser to terminate superuser process;
3、 这里先判断当前执行terminate的用户,是否是proc->roleId的member; 然后判断用户是否有pg_signal_backend权限;这里不满足条件的话返回:
must be a member of the role whose process is being terminated or member of pg_signal_backend

pg原生的设计,nonsuperuser用户terminate autovacuum worker时,会走第3处逻辑。那我们可以在这里进行修改,实现nonsuper user kill autovacuum worker的功能。

4、 调用kill接口向pid发送信号

三、方案设计

我们修改代码的原则是应该尽量保留原生的逻辑,可以考虑加入一个控制参数,默认关闭状态,走原生逻辑;打开参数后,开启新功能。

新增一个bool类型的控制参数enable_nonsuper_kill_autovacuumworker,默认值为off,可以通过alter system的方式修改,重载配置文件生效。

postgres=> select * from pg_settings where name like '%enable_nonsuper_kill_autovacuumworker%';
-[ RECORD 1 ]---+--------------------------------------------------------------------------
name            | enable_nonsuper_kill_autovacuumworker
setting         | off
unit            | 
category        | Preset Options
short_desc      | Whether nonsuperuser could kill autovacuum worker process, default false.
extra_desc      | 
context         | sighup
vartype         | bool
source          | default
min_val         | 
max_val         | 
enumvals        | 
boot_val        | off
reset_val       | off
sourcefile      | 
sourceline      | 
pending_restart | f

postgres=>

修改pg_signal_backend函数的逻辑如下:

static int
pg_signal_backend(int pid, int sig)
{
        PGPROC     *proc = BackendPidGetProc(pid);

        /*
         * BackendPidGetProc returns NULL if the pid isn't valid; but by the time
         * we reach kill(), a process for which we get a valid proc here might
         * have terminated on its own.  There's no way to acquire a lock on an
         * arbitrary process to prevent that. But since so far all the callers of
         * this mechanism involve some request for ending the process anyway, that
         * it might end on its own first is not a problem.
         */
        if (proc == NULL)
        {
                /*
                 * This is just a warning so a loop-through-resultset will not abort
                 * if one backend terminated on its own during the run.
                 */
                ereport(WARNING,
                                (errmsg("PID %d is not a PostgreSQL server process", pid)));
                return SIGNAL_BACKEND_ERROR;
        }

        /* Only allow superusers to signal superuser-owned backends. */
        if (superuser_arg(proc->roleId) && !superuser())
                return SIGNAL_BACKEND_NOSUPERUSER;

        /* Users can signal backends they have role membership in. */
        if (!has_privs_of_role(GetUserId(), proc->roleId) &&
                !has_privs_of_role(GetUserId(), DEFAULT_ROLE_SIGNAL_BACKENDID))
        {       /* Modify by Nickxyang at 2021-11-28 */
                /* 这里先判断enable_nonsuper_kill_autovacuumworker 参数是否开启,未开启则走原生逻辑return SIGNAL_BACKEND_NOPERMISSION; */
                if (!enable_nonsuper_kill_autovacuumworker)
                        return SIGNAL_BACKEND_NOPERMISSION;
                /* 这里获取进程对应的结构体信息 */
                LocalPgBackendStatus *local_beentry;
                PgBackendStatus *beentry;
                local_beentry = pgstat_fetch_stat_local_beentry(proc->backendId);
                /* 如果获取到的信息为空,说明进程已经不存在?也就不用terminate了,直接return SIGNAL_BACKEND_NOPERMISSION就行 */
                if (!local_beentry)
                        return SIGNAL_BACKEND_NOPERMISSION;
                
                beentry = &local_beentry->backendStatus;
                /* 如果获取的进程类型不是B_AUTOVAC_WORKER,那就return SIGNAL_BACKEND_NOPERMISSION
                 * 也就是说,只有terminate autovacuum worker进程,才会顺利走到下边kill发送信号的逻辑,
                 * 如果是其它系统进程一律报错无权限
                 */
                if (beentry->st_backendType != B_AUTOVAC_WORKER)
                {
                        return SIGNAL_BACKEND_NOPERMISSION;
                }
                /* End at 2021-11-28 */
        }
        /*
         * Can the process we just validated above end, followed by the pid being
         * recycled for a new process, before reaching here?  Then we'd be trying
         * to kill the wrong thing.  Seems near impossible when sequential pid
         * assignment and wraparound is used.  Perhaps it could happen on a system
         * where pid re-use is randomized.  That race condition possibility seems
         * too unlikely to worry about.
         */

        /* If we have setsid(), signal the backend's whole process group */
#ifdef HAVE_SETSID
        if (kill(-pid, sig))
#else
        if (kill(pid, sig))
#endif
        {
                /* Again, just a warning to allow loops */
                ereport(WARNING,
                                (errmsg("could not send signal to process %d: %m", pid)));
                return SIGNAL_BACKEND_ERROR;
        }
        return SIGNAL_BACKEND_SUCCESS;
}

同时还要修改下pg_stat_progress_vacuum视图的逻辑,因为原生逻辑里nonsuper user无权查看正在进行的vacuum的全部信息,只能看到pid,datid,dataname字段;

修改这里的逻辑,当enable_nonsuper_kill_autovacuumworker参数打开时,可以看到pg_stat_progress_vacuum所有字段信息,根据relid即表oid可以方便terminate对应的 worker进程;参数关闭时保持原有逻辑只显示默认字段。

Datum
pg_stat_get_progress_info(PG_FUNCTION_ARGS)
{
#define PG_STAT_GET_PROGRESS_COLS       PGSTAT_NUM_PROGRESS_PARAM + 3
        int                     num_backends = pgstat_fetch_stat_numbackends();
        int                     curr_backend;
        char       *cmd = text_to_cstring(PG_GETARG_TEXT_PP(0));
        ProgressCommandType cmdtype;
        TupleDesc       tupdesc;
        Tuplestorestate *tupstore;
        ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
        MemoryContext per_query_ctx;
        MemoryContext oldcontext;

        /* check to see if caller supports us returning a tuplestore */
        if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
                ereport(ERROR,
                                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                                 errmsg("set-valued function called in context that cannot accept a set")));
        if (!(rsinfo->allowedModes & SFRM_Materialize))
                ereport(ERROR,
                                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                                 errmsg("materialize mode required, but it is not allowed in this context")));

        /* Build a tuple descriptor for our result type */
        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
                elog(ERROR, "return type must be a row type");

        /* Translate command name into command type code. */
        if (pg_strcasecmp(cmd, "VACUUM") == 0)
                cmdtype = PROGRESS_COMMAND_VACUUM;
        else if (pg_strcasecmp(cmd, "ANALYZE") == 0)
                cmdtype = PROGRESS_COMMAND_ANALYZE;
        else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
                cmdtype = PROGRESS_COMMAND_CLUSTER;
        else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
                cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
        else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
                cmdtype = PROGRESS_COMMAND_BASEBACKUP;
        else
                ereport(ERROR,
                                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                                 errmsg("invalid command name: \"%s\"", cmd)));

        per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
        oldcontext = MemoryContextSwitchTo(per_query_ctx);

        tupstore = tuplestore_begin_heap(true, false, work_mem);
        rsinfo->returnMode = SFRM_Materialize;
        rsinfo->setResult = tupstore;
        rsinfo->setDesc = tupdesc;
        MemoryContextSwitchTo(oldcontext);

        /* 1-based index */
        for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
        {
                LocalPgBackendStatus *local_beentry;
                PgBackendStatus *beentry;
                Datum           values[PG_STAT_GET_PROGRESS_COLS];
                bool            nulls[PG_STAT_GET_PROGRESS_COLS];
                int                     i;

                MemSet(values, 0, sizeof(values));
                MemSet(nulls, 0, sizeof(nulls));

                local_beentry = pgstat_fetch_stat_local_beentry(curr_backend);

                if (!local_beentry)
                        continue;

                beentry = &local_beentry->backendStatus;

                /*
                 * Report values for only those backends which are running the given
                 * command.
                 */
                if (!beentry || beentry->st_progress_command != cmdtype)
                        continue;

                /* Value available to all callers */
                values[0] = Int32GetDatum(beentry->st_procpid);
                values[1] = ObjectIdGetDatum(beentry->st_databaseid);

                /* show rest of the values including relid only to role members */
                if (HAS_PGSTAT_PERMISSIONS(beentry->st_userid))
                {
                        values[2] = ObjectIdGetDatum(beentry->st_progress_command_target);
                        for (i = 0; i < PGSTAT_NUM_PROGRESS_PARAM; i++)
                                values[i + 3] = Int64GetDatum(beentry->st_progress_param[i]);
                }
                /* Modify by Nickxyang at 2021-11-28 */
                /* 当系统进程执行vauum 或者 analyze时,当前用户为nonsuperuser   
                 * 并且 enable_nonsuper_kill_autovacuumworker参数开启时,显示完整的信息
                 */
                else if (!HAS_PGSTAT_PERMISSIONS(beentry->st_userid) && 
                        (beentry->st_progress_command == (PROGRESS_COMMAND_VACUUM || PROGRESS_COMMAND_ANALYZE)) && 
                        enable_nonsuper_kill_autovacuumworker)
                {

                        values[2] = ObjectIdGetDatum(beentry->st_progress_command_target);
                        for (i = 0; i < PGSTAT_NUM_PROGRESS_PARAM; i++)
                                values[i + 3] = Int64GetDatum(beentry->st_progress_param[i]);

                }
                /* End at 2021-11-28 */
                else
                {
                        nulls[2] = true;
                        for (i = 0; i < PGSTAT_NUM_PROGRESS_PARAM; i++)
                                nulls[i + 3] = true;
                }

                tuplestore_putvalues(tupstore, tupdesc, values, nulls);
        }

        /* clean up and return the tuplestore */
        tuplestore_donestoring(tupstore);

        return (Datum) 0;
}

四、实现效果

可以看到,默认nonsuper user是无法terminate autovacuum worker进程的。

postgres=> \du pguser
           List of roles
 Role name | Attributes | Member of 
-----------+------------+-----------
 pguser    |            | {}

postgres=>  select relid::regclass as tablename,* from pg_stat_progress_vacuum ;
 tablename |  pid  | datid | datname  | relid |     phase     | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples 
-----------+-------+-------+----------+-------+---------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
           | 28477 | 13578 | postgres |       |               |                 |                   |                    |                    |                 |             
(1 row)
postgres=> select pg_terminate_backend(28477);
ERROR:  must be a member of the role whose process is being terminated or member of pg_signal_backend
postgres=>

使用superuser打开参数

postgres=#  show enable_nonsuper_kill_autovacuumworker;alter system set enable_nonsuper_kill_autovacuumworker to on;select pg_reload_conf();
 enable_nonsuper_kill_autovacuumworker 
---------------------------------------
 off
(1 row)

ALTER SYSTEM
 pg_reload_conf 
----------------
 t
(1 row)

postgres=#

参数打开后,成功terminate autovacuum worker

postgres=>  show enable_nonsuper_kill_autovacuumworker;
 enable_nonsuper_kill_autovacuumworker 
---------------------------------------
 on
(1 row)
postgres=>  select relid::regclass as tablename,* from pg_stat_progress_vacuum ;
 tablename |  pid  | datid | datname  | relid |     phase     | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples 
-----------+-------+-------+----------+-------+---------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
 tbl_1     | 28477 | 13578 | postgres | 16404 | scanning heap |          833334 |             19655 |              19654 |                  0 |             291 |             120
(1 row)

postgres=> select pg_terminate_backend(28477);
 pg_terminate_backend 
----------------------
 t
(1 row)
postgres=> select relid::regclass as tablename,* from pg_stat_progress_vacuum ;
 tablename |  pid  | datid | datname  | relid |     phase     | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples 
-----------+-------+-------+----------+-------+---------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
           |       |       |          |       |               |                 |                   |                    |                    |                 |             
(0 row)

同时验证了下terminate其他系统进程是失败的

postgres=# select pid,usename,backend_type from pg_stat_activity;
  pid  | usename  |         backend_type         
-------+----------+------------------------------
 10138 |          | autovacuum launcher
 10140 | postgres | logical replication launcher
 19027 | pguser   | client backend
 25883 | pguser   | client backend
 25898 | postgres | client backend
  6076 | postgres | client backend
 10136 |          | background writer
 10135 |          | checkpointer
 10137 |          | walwriter
(9 rows)
postgres=>\c postgres pguser
You are now connected to database "postgres" as user "pguser".
postgres=> select pg_terminate_backend(10138);
ERROR:  must be a member of the role whose process is being terminated or member of pg_signal_backend
postgres=> select pg_terminate_backend(10140);
ERROR:  must be a member of the role whose process is being terminated or member of pg_signal_backend
postgres=> select pg_terminate_backend(10135);
WARNING:  PID 10135 is not a PostgreSQL server process
 pg_terminate_backend 
----------------------
 f
(1 row)

postgres=>

五、小结

从测试结果来看,修改的方案是可行的。

同时有同学应该已经注意到了,原生的逻辑terminate时的报错提示:ERROR: must be a member of the role whose process is being terminated or member of pg_signal_backend.

那是否将pg_signal_backend赋予nonsuperuser就可以terminate autovacuum worker呢?

是这样的,但是如果赋予pg_signal_backend权限,那么可以terminate 所有proc->roleId为0的系统进程,比如autovacuum launcher, logical replication launcher等。这个风险相对较高,要严格把控这个权限。

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值