一、 问题背景
之前重启数据库时遇到一个问题,本来主从间有超过2小时的延迟,重启db后监控突然提示延迟为0,告警恢复。一段时间后,监控又显示延迟逐渐增加,直至逐渐回到2小时。这是为什么?
二、 问题分析
1. pg_stat_replication视图定义
告警的延迟是从pg_stat_replication视图获取的lag数据,因此先看看它的定义
CREATE VIEW pg_stat_replication AS
SELECT
S.pid,
S.usesysid,
U.rolname AS usename,
S.application_name,
S.client_addr,
S.client_hostname,
S.client_port,
S.backend_start,
S.backend_xmin,
W.state,
W.sent_lsn,
W.write_lsn,
W.flush_lsn,
W.replay_lsn,
W.write_lag,
W.flush_lag,
W.replay_lag,
W.sync_priority,
W.sync_state,
W.reply_time
FROM pg_stat_get_activity(NULL) AS S
JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
可以看到主从相关的信息基本都来自 pg_stat_get_wal_senders函数
2. pg_stat_get_wal_senders 函数
由注释可以看到,其用于获取当前数据库中 walsender进程的信息,包含pid,xlog位置等。
这个函数整体比较长,但跟我们问题相关的其实很简单——每次调用时它会从共享内存中循环获取各个walsender进程的信息。因为是从内存获取的,数据库重启后这些信息会被重置,并且连接数和复制进度等统计信息在重启后需要重新积累和记录。因此,这些值会从初始状态开始,随着时间的推移逐渐增长,反映出实际的连接和复制情况。
/*
* Returns activity of walsenders, including pids and xlog locations sent to
* standby servers.
*/
Datum
pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
{
#define PG_STAT_GET_WAL_SENDERS_COLS 12
...
/*
* Get the currently active synchronous standbys. This could be out of
* date before we're done, but we'll use the data anyway.
*/
num_standbys = SyncRepGetCandidateStandbys(&sync_standbys);
for (i = 0; i < max_wal_senders; i++)
{
WalSnd *walsnd = &WalSndCtl->walsnds[i];
XLogRecPtr sentPtr;
XLogRecPtr write;
XLogRecPtr flush;
XLogRecPtr apply;
TimeOffset writeLag;
TimeOffset flushLag;
TimeOffset applyLag;
int priority;
int pid;
WalSndState state;
TimestampTz replyTime;
bool is_sync_standby;
Datum values[PG_STAT_GET_WAL_SENDERS_COLS];
bool nulls[PG_STAT_GET_WAL_SENDERS_COLS];
int j;
/* Collect data from shared memory */
SpinLockAcquire(&walsnd->mutex);
if (walsnd->pid == 0)
{
SpinLockRelease(&walsnd->mutex);
continue;
}
pid = walsnd->pid;
sentPtr = walsnd->sentPtr;
state = walsnd->state;
write = walsnd->write;
flush = walsnd->flush;
apply = walsnd->apply;
writeLag = walsnd->writeLag;
flushLag = walsnd->flushLag;
applyLag = walsnd->applyLag;
priority = walsnd->sync_standby_priority;
replyTime = walsnd->replyTime;
SpinLockRelease(&walsnd->mutex);
...
3. WalSnd结构体
关于获取的具体信息,可以从WalSnd结构体的定义中看到
/*
* Each walsender has a WalSnd struct in shared memory.
*
* This struct is protected by its 'mutex' spinlock field, except that some
* members are only written by the walsender process itself, and thus that
* process is free to read those members without holding spinlock. pid and
* needreload always require the spinlock to be held for all accesses.
*/
typedef struct WalSnd
{
pid_t pid; /* this walsender's PID, or 0 if not active */
WalSndState state; /* this walsender's state */
XLogRecPtr sentPtr; /* WAL has been sent up to this point */
bool needreload; /* does currently-open file need to be
* reloaded? */
/*
* The xlog locations that have been written, flushed, and applied by
* standby-side. These may be invalid if the standby-side has not offered
* values yet.
*/
XLogRecPtr write;
XLogRecPtr flush;
XLogRecPtr apply;
/* Measured lag times, or -1 for unknown/none. */
TimeOffset writeLag;
TimeOffset flushLag;
TimeOffset applyLag;
/*
* The priority order of the standby managed by this WALSender, as listed
* in synchronous_standby_names, or 0 if not-listed.
*/
int sync_standby_priority;
/* Protects shared variables shown above. */
slock_t mutex;
/*
* Pointer to the walsender's latch. Used by backends to wake up this
* walsender when it has work to do. NULL if the walsender isn't active.
*/
Latch *latch;
/*
* Timestamp of the last message received from standby.
*/
TimestampTz replyTime;
} WalSnd;