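The previous part covered the synchronous replication code used in primary/standby hot standby. This part looks at its core components, the walsender and walreceiver processes, starting with walsender. As the names suggest, the walsender process sends WAL (write-ahead log) and the walreceiver process receives it: the primary ships the WAL it generates to the standby, and the standby's startup process replays (redoes) it, keeping the two servers consistent. The walkthrough below uses the walsender and walreceiver code of PG 9.3 as its example.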
Structures and variables used by the walsender process
src/include/replication/walsender_private.h defines the structures the walsender process uses:
/*
* Each walsender has a WalSnd struct in shared memory.
*/
typedef struct WalSnd
{
pid_t pid; /* this walsender's process id, or 0 */
WalSndState state; /* this walsender's state */
XLogRecPtr sentPtr; /* WAL has been sent up to this point */
bool needreload; /* does currently-open file need to be
* reloaded? */
/*
* The xlog locations that have been written, flushed, and applied by
* standby-side. These may be invalid if the standby-side has not offered
* values yet.
*/
XLogRecPtr write;
XLogRecPtr flush;
XLogRecPtr apply;
/* Protects shared variables shown above. */
slock_t mutex;
/*
* Latch used by backends to wake up this walsender when it has work to
* do.
*/
Latch latch;
/*
* The priority order of the standby managed by this WALSender, as listed
* in synchronous_standby_names, or 0 if not-listed. Protected by
* SyncRepLock.
*/
int sync_standby_priority;
} WalSnd;
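Every walsender process has its own WalSnd entry in shared memory holding the variables above. WalSndState records the walsender's current state and is defined as follows: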
typedef enum WalSndState
{
WALSNDSTATE_STARTUP = 0,
WALSNDSTATE_BACKUP,
WALSNDSTATE_CATCHUP,
WALSNDSTATE_STREAMING
} WalSndState;
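WALSNDSTATE_STARTUP is the initial state right after the walsender starts; WALSNDSTATE_BACKUP means it is sending a base backup; WALSNDSTATE_CATCHUP means the standby is behind and the walsender is catching it up; WALSNDSTATE_STREAMING means it has caught up and is streaming WAL as it is generated. In WalSnd, sentPtr is how far the primary has sent WAL, while write/flush/apply are the positions reported back by the standby: how far it has written the received WAL, flushed it to disk, and replayed (applied) it. As the source comment notes, these may be invalid until the standby has sent its first report.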
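To make the states and positions concrete, here is a minimal standalone sketch in C. It assumes XLogRecPtr is a plain 64-bit integer (as it is from 9.3 on) and uses the hypothetical helpers walsnd_state_name and walsnd_lag_bytes. The state strings match what the state column of pg_stat_replication shows, and the lag is simply the distance between sentPtr and a position reported by the standby.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* From PG 9.3 on, XLogRecPtr is a plain 64-bit WAL byte position. */
typedef uint64_t XLogRecPtr;

typedef enum WalSndState
{
    WALSNDSTATE_STARTUP = 0,
    WALSNDSTATE_BACKUP,
    WALSNDSTATE_CATCHUP,
    WALSNDSTATE_STREAMING
} WalSndState;

/* Hypothetical helper: readable name for a walsender state. */
static const char *
walsnd_state_name(WalSndState state)
{
    switch (state)
    {
        case WALSNDSTATE_STARTUP:   return "startup";
        case WALSNDSTATE_BACKUP:    return "backup";
        case WALSNDSTATE_CATCHUP:   return "catchup";
        case WALSNDSTATE_STREAMING: return "streaming";
    }
    return "unknown";
}

/*
 * Hypothetical helper: bytes sent but not yet confirmed by the standby.
 * A reported value of 0 (InvalidXLogRecPtr) means "not reported yet";
 * in that case this sketch treats everything sent as outstanding.
 */
static uint64_t
walsnd_lag_bytes(XLogRecPtr sent, XLogRecPtr reported)
{
    return (reported >= sent) ? 0 : sent - reported;
}
```

Computing the lag once each against write, flush and apply gives three different views of how far behind a standby is. In 9.3, pg_stat_replication exposes the underlying positions as sent_location, write_location, flush_location and replay_location.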
/* There is one WalSndCtl struct for the whole database cluster */
typedef struct
{
/*
* Synchronous replication queue with one queue per request type.
* Protected by SyncRepLock.
*/
SHM_QUEUE SyncRepQueue[NUM_SYNC_REP_WAIT_MODE];
/*
* Current location of the head of the queue. All waiters should have a
* waitLSN that follows this value. Protected by SyncRepLock.
*/
XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE];
/*
* Are any sync standbys defined? Waiting backends can't reload the
* config file safely, so checkpointer updates this value as needed.
* Protected by SyncRepLock.
*/
bool sync_standbys_defined;
WalSnd walsnds[1]; /* VARIABLE LENGTH ARRAY */
} WalSndCtlData;
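There is exactly one WalSndCtlData for the whole database cluster (instance). It holds the global synchronous-replication state (the wait queues, their head LSNs, and whether any synchronous standbys are configured) together with the walsnds array of per-walsender WalSnd entries. walsnds is declared with length 1 but is really a variable-length array with one entry per allowed walsender (max_wal_senders).
Maximum number of bytes sent in one WAL message: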
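The following sketch shows how a struct like WalSndCtlData with a trailing variable-length array is sized and allocated. It is an illustration only: calloc stands in for shared memory, WalSnd is cut down to two fields, the function names are hypothetical, and a C99 flexible array member replaces the walsnds[1] idiom used in the source. The size formula follows the same idea as the server's shared-memory size calculation: the header up to walsnds, plus one WalSnd per max_wal_senders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t XLogRecPtr;

/* Cut-down WalSnd: only two fields kept for illustration. */
typedef struct WalSnd
{
    int        pid;       /* 0 means the slot is free */
    XLogRecPtr sentPtr;
} WalSnd;

typedef struct
{
    int    sync_standbys_defined;  /* a bool in the real struct */
    WalSnd walsnds[];              /* flexible array: max_wal_senders entries */
} WalSndCtlData;

/* Header bytes up to walsnds, plus one WalSnd per allowed walsender. */
static size_t
walsnd_ctl_size(int max_wal_senders)
{
    return offsetof(WalSndCtlData, walsnds)
        + (size_t) max_wal_senders * sizeof(WalSnd);
}

/* Stand-in for shared memory: a zeroed heap block, so every slot starts free. */
static WalSndCtlData *
walsnd_ctl_create(int max_wal_senders)
{
    return calloc(1, walsnd_ctl_size(max_wal_senders));
}
```

Zero-filling matters here: a pid of 0 is what marks a slot as free, so a freshly allocated array has every slot available.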
/*
* Maximum data payload in a WAL data message. Must be >= XLOG_BLCKSZ.
*
* We don't have a good idea of what a good value would be; there's some
* overhead per message in both walsender and walreceiver, but on the other
* hand sending large batches makes walsender less responsive to signals
* because signals are checked only between messages. 128kB (with
* default 8k blocks) seems like a reasonable guess for now.
*/
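#define MAX_SEND_SIZE (XLOG_BLCKSZ * 16)
Flags recording whether the current process is a walsender, and whether it is a cascading walsender (a standby that forwards the WAL it has received to its own standbys):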
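The way MAX_SEND_SIZE bounds each message can be sketched as below. This is a simplified model of the end-point calculation in the 9.3 walsender's XLogSend routine. It assumes a single current timeline (the historic-timeline case is ignored) and uses a hypothetical helper, next_send_end. The rule is: send at most MAX_SEND_SIZE bytes; if that reaches the end of the available WAL, stop there and report caught up; otherwise round the end down to a WAL page boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

#define XLOG_BLCKSZ   8192                 /* default WAL page size */
#define MAX_SEND_SIZE (XLOG_BLCKSZ * 16)   /* 128kB with default pages */

/*
 * Hypothetical helper: where should the next WAL message end?
 * sent_ptr  - how far WAL has already been sent
 * flush_ptr - end of WAL that is safe to send
 */
static XLogRecPtr
next_send_end(XLogRecPtr sent_ptr, XLogRecPtr flush_ptr, bool *caught_up)
{
    XLogRecPtr endptr = sent_ptr + MAX_SEND_SIZE;

    if (flush_ptr <= endptr)
    {
        /* Everything available fits in one message. */
        endptr = flush_ptr;
        *caught_up = true;
    }
    else
    {
        /* More WAL remains: stop on a page boundary. */
        endptr -= endptr % XLOG_BLCKSZ;
        *caught_up = false;
    }
    return endptr;
}
```

Capping the chunk keeps each message bounded, so the walsender returns to its loop (and checks for signals) between messages, which is the trade-off the source comment describes.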
bool am_walsender = false; /* Am I a walsender process ? */
bool am_cascading_walsender = false; /* Am I cascading WAL to
* another standby ? */
The maximum number of walsender processes and the send timeout:
int max_wal_senders = 0; /* the maximum number of concurrent walsenders */
int wal_sender_timeout = 60 * 1000; /* maximum time to send one WAL data message */
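wal_sender_timeout is in milliseconds, and the 9.3 documentation describes it as terminating replication connections that have been inactive for longer than that. The following is a minimal sketch of that kind of check. It assumes a millisecond timestamp type and uses the hypothetical helper walsnd_timed_out: the connection is treated as dead once nothing has been heard from the standby for wal_sender_timeout ms, and a value of 0 disables the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Milliseconds since some epoch; a stand-in for TimestampTz. */
typedef int64_t TimestampMs;

static int wal_sender_timeout = 60 * 1000;   /* ms, same default as above */

/* Hypothetical helper: has the standby been silent for too long? */
static bool
walsnd_timed_out(TimestampMs last_reply, TimestampMs now)
{
    if (wal_sender_timeout <= 0)
        return false;                        /* 0 disables the timeout */
    return now >= last_reply + wal_sender_timeout;
}
```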
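Analysis of the walsender functions
Communication between the primary and the standby follows a complete replication protocol. Before normal streaming replication can begin, the two sides go through a series of checks to establish the relationship.
Initialization function InitWalSender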
void
InitWalSender(void)
{
am_cascading_walsender = RecoveryInProgress();
/* Create a per-walsender data structure in shared memory */
InitWalSenderSlot();
/* Set up resource owner */
CurrentResourceOwner = ResourceOwnerCreate(NULL, "walsender top-level resource owner");
/*
* Let postmaster know that we're a WAL sender. Once we've declared us as
* a WAL sender process, postmaster will let us outlive the bgwriter and
* kill us last in the shutdown sequence, so we get a chance to stream all
* remaining WAL at shutdown, including the shutdown checkpoint. Note that
* there's no going back, and we mustn't write any WAL records after this.
*/
MarkPostmasterChildWalSender();
SendPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE);
}
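The function first records whether this is a cascading walsender: if the server is still in recovery (RecoveryInProgress() returns true), it is a standby forwarding WAL. It then calls InitWalSenderSlot to claim a free WalSnd entry in the shared-memory array, which is sized by max_wal_senders; if every slot is already taken, the connection is rejected. After creating a resource owner, it tells the postmaster that it is a walsender (MarkPostmasterChildWalSender) and sends the PMSIGNAL_ADVANCE_STATE_MACHINE signal so the postmaster updates its state. As the source comment explains, the postmaster then lets the walsender outlive the bgwriter and terminates it last at shutdown, so that all remaining WAL, including the shutdown checkpoint, can still be streamed. From then on, the primary handles each message it receives from the standby in the corresponding processing function.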
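The slot-claiming step can be sketched as follows. This is a single-threaded illustration: claim_walsnd_slot and release_walsnd_slot are hypothetical, a static array stands in for WalSndCtl->walsnds, and MAX_WAL_SENDERS is an assumed value. The real InitWalSenderSlot holds each slot's spinlock (the mutex field) while checking and claiming it, and raises a FATAL error instead of returning NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

typedef enum WalSndState
{
    WALSNDSTATE_STARTUP = 0,
    WALSNDSTATE_BACKUP,
    WALSNDSTATE_CATCHUP,
    WALSNDSTATE_STREAMING
} WalSndState;

typedef struct WalSnd
{
    int         pid;       /* 0 = free slot */
    WalSndState state;
    XLogRecPtr  sentPtr;
} WalSnd;

#define MAX_WAL_SENDERS 2                  /* assumed max_wal_senders value */

static WalSnd walsnds[MAX_WAL_SENDERS];    /* stand-in for WalSndCtl->walsnds */

/* Claim the first free slot; NULL means all max_wal_senders are in use. */
static WalSnd *
claim_walsnd_slot(int my_pid)
{
    for (int i = 0; i < MAX_WAL_SENDERS; i++)
    {
        WalSnd *walsnd = &walsnds[i];

        if (walsnd->pid != 0)
            continue;                      /* slot taken by another walsender */
        walsnd->pid = my_pid;
        walsnd->state = WALSNDSTATE_STARTUP;
        walsnd->sentPtr = 0;               /* InvalidXLogRecPtr */
        return walsnd;
    }
    return NULL;
}

/* Free the slot again when the walsender exits. */
static void
release_walsnd_slot(WalSnd *walsnd)
{
    walsnd->pid = 0;
}
```

Releasing the slot on exit is what lets a later standby connection reuse it.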
The replication protocol commands supported in PG 9.3 are mainly:
T_IdentifySystemCmd,
T_BaseBackupCmd,
T_StartReplicationCmd,
T_TimeLineHistoryCmd,
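These correspond to the IDENTIFY_SYSTEM, BASE_BACKUP, START_REPLICATION and TIMELINE_HISTORY commands: identify the system, take a base backup, start streaming replication, and fetch a timeline history file. When a backend running as a walsender (am_walsender is true) receives a query message from the standby, the main backend loop passes it to exec_replication_command instead of the normal SQL path, entering the replication command handling flow.
Replication command handler exec_replication_command
This function handles each received command by calling a different function for each command type: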
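The server recognizes these commands with a generated grammar. Purely to illustrate the command set, the sketch below classifies a command string by its first keyword using plain string comparison. classify_repl_command and the ReplCommandKind enum are hypothetical, the match is case-sensitive, and unlike the real parser this sketch does not validate any arguments.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum
{
    CMD_UNKNOWN = 0,
    CMD_IDENTIFY_SYSTEM,
    CMD_BASE_BACKUP,
    CMD_START_REPLICATION,
    CMD_TIMELINE_HISTORY
} ReplCommandKind;

/* Hypothetical helper: map the first word of a command to its kind. */
static ReplCommandKind
classify_repl_command(const char *cmd)
{
    static const struct { const char *keyword; ReplCommandKind kind; } table[] = {
        {"IDENTIFY_SYSTEM",   CMD_IDENTIFY_SYSTEM},
        {"BASE_BACKUP",       CMD_BASE_BACKUP},
        {"START_REPLICATION", CMD_START_REPLICATION},
        {"TIMELINE_HISTORY",  CMD_TIMELINE_HISTORY},
    };
    size_t len;

    while (isspace((unsigned char) *cmd))
        cmd++;                              /* skip leading whitespace */
    len = strcspn(cmd, " \t\r\n;");         /* length of the first word */

    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strlen(table[i].keyword) == len &&
            strncmp(cmd, table[i].keyword, len) == 0)
            return table[i].kind;
    return CMD_UNKNOWN;
}
```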
/*
* Execute an incoming replication command.
*/
void
exec_replication_command(const char *cmd_string)
{
int parse_rc;
Node *cmd_node;
MemoryContext cmd_context;
MemoryContext old_context;
elog(DEBUG1, "received replication command: %s", cmd_string);
CHECK_FOR_INTERRUPTS();
cmd_context = AllocSetContextCreate(CurrentMemoryContext,
"Replication command context",
ALLOCSET_DEFAULT_MINSIZE,
ALLOCSET_DEFAULT_INITSIZE,
ALLOCSET_DEFAULT_MAXSIZE);
old_context = MemoryContextSwitchTo(cmd_context);
replication_scanner_init(cmd_string);
parse_rc = replication_yyparse();
if (parse_rc != 0)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
(errmsg_internal("replication command parser returned %d",
parse_rc))));
cmd_node = replication_parse_result;
switch (cmd_node->type)
{
case T_IdentifySystemCmd:
IdentifySystem();
break;
case T_StartReplicationCmd:
StartReplication((StartReplicationCmd *) cmd_node);
break;
case T_BaseBackupCmd:
SendBaseBackup((BaseBackupCmd *) cmd_node);
break;
case T_TimeLineHistoryCmd:
SendTimeLineHistory((TimeLineHistoryCmd *) cmd_node);
break;
default:
elog(ERROR, "unrecognized replication command node tag: %u",
cmd_node->type);
}
/* done */
MemoryContextSwitchTo(old_context);
MemoryContextDelete(cmd_context);
/* Send CommandComplete message */
EndCommand("SELECT", DestRemote);
}
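The function parses the incoming command string with a dedicated replication grammar (replication_scanner_init and replication_yyparse, generated from the scanner and the repl_gram.y Bison grammar in src/backend/replication). It then dispatches on the node type to the matching handler. All allocations made while handling the command live in a temporary memory context, which is deleted in one step at the end, and a CommandComplete message is sent.
1) T_IdentifySystemCmd: identify-system command
After the standby starts, it sends this command to ask the primary for its identity: the system identifier (SystemID), the current timeline ID, and the current WAL (xlog) position. The standby uses the system identifier to verify that both servers belong to the same cluster.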
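The memory-context pattern in exec_replication_command (create a context, switch to it, allocate freely, delete it once at the end) is essentially an arena. The sketch below is a toy stand-in, not PostgreSQL's MemoryContext API. Arena, arena_alloc, arena_delete and run_command are hypothetical names, and each allocation is a separately malloc'd block chained into a list so a single call can free them all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Block header; the union with max_align_t keeps the payload aligned. */
typedef union ArenaBlock
{
    union ArenaBlock *next;
    max_align_t       align;
} ArenaBlock;

typedef struct Arena
{
    ArenaBlock *blocks;   /* newest block first */
    int         nallocs;  /* number of live allocations */
} Arena;

static void *
arena_alloc(Arena *arena, size_t size)
{
    ArenaBlock *blk = malloc(sizeof(ArenaBlock) + size);

    if (blk == NULL)
        return NULL;
    blk->next = arena->blocks;
    arena->blocks = blk;
    arena->nallocs++;
    return blk + 1;                     /* payload follows the header */
}

/* Free every allocation at once, like MemoryContextDelete(). */
static void
arena_delete(Arena *arena)
{
    ArenaBlock *blk = arena->blocks;

    while (blk != NULL)
    {
        ArenaBlock *next = blk->next;

        free(blk);
        blk = next;
    }
    arena->blocks = NULL;
    arena->nallocs = 0;
}

/* Same shape as the handler: per-command arena, one delete at the end. */
static int
run_command(const char *cmd_string)
{
    Arena cmd_context = {NULL, 0};
    char *copy = arena_alloc(&cmd_context, strlen(cmd_string) + 1);
    char *node = arena_alloc(&cmd_context, 64);     /* e.g. a parse node */
    int   used;

    if (copy == NULL || node == NULL)
    {
        arena_delete(&cmd_context);
        return -1;
    }
    strcpy(copy, cmd_string);
    strcpy(node, "parsed");
    used = cmd_context.nallocs;
    arena_delete(&cmd_context);         /* no per-object free needed */
    return used;
}
```

The design choice is the same as in the handler: code deep inside the parser can allocate without tracking ownership, because everything is released together when the command finishes.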
/*
* Reply with a result set with one row, three columns. First col is
* system ID, second is timeline ID, and third is current xlog location.
*/
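The three IDENTIFY_SYSTEM columns are sent as text. The following is a minimal sketch of how they can be formatted, using the hypothetical names build_identify_row and IdentifySystemRow. The system identifier is an unsigned decimal, the timeline is an unsigned integer, and the WAL position uses PostgreSQL's usual X/X form (high 32 bits and low 32 bits in hexadecimal).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

/* Print an LSN as high/low 32-bit halves in hex, e.g. "0/3000000". */
static void
format_lsn(XLogRecPtr ptr, char *buf, size_t len)
{
    snprintf(buf, len, "%X/%X", (unsigned int) (ptr >> 32), (unsigned int) ptr);
}

/* Hypothetical row type: the three text columns of the reply. */
typedef struct
{
    char systemid[32];
    char timeline[16];
    char xlogpos[32];
} IdentifySystemRow;

static void
build_identify_row(uint64_t sysid, TimeLineID tli, XLogRecPtr pos,
                   IdentifySystemRow *row)
{
    snprintf(row->systemid, sizeof(row->systemid), "%" PRIu64, sysid);
    snprintf(row->timeline, sizeof(row->timeline), "%" PRIu32, tli);
    format_lsn(pos, row->xlogpos, sizeof(row->xlogpos));
}
```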
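2) T_BaseBackupCmd: base backup command
This command is used by pg_basebackup. It is handled by SendBaseBackup, which calls perform_base_backup; the walsender then sends the data files and, when requested, the WAL.
3) T_TimeLineHistoryCmd: timeline history command
Sends a timeline history file: its file name and its contents.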
/*
* Reply with a result set with one row, and two columns. The first col is
* the name of the history file, 2nd is the contents.
*/
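History files are named after their timeline ID: eight uppercase hexadecimal digits followed by .history, the same pattern as the TLHistoryFileName macro. In 9.3 these files live in the pg_xlog directory. The following is a minimal sketch, with tl_history_file_name as a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t TimeLineID;

/* Hypothetical helper: build "%08X.history" for a timeline ID. */
static void
tl_history_file_name(char *buf, size_t len, TimeLineID tli)
{
    snprintf(buf, len, "%08X.history", (unsigned int) tli);
}
```

Timeline 1 has no history file, because it has no parent timeline. History files start at timeline 2, which is why a standby only asks for them after a timeline switch.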
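4) T_StartReplicationCmd: start replication command
Once the primary and standby have completed these checks, the standby sends this command to the primary to start normal streaming replication.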
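In 9.3 the command takes the form START_REPLICATION XXX/XXX [TIMELINE tli]. The sketch below parses that form with sscanf as an illustration only, since the server uses its replication grammar. parse_start_replication is a hypothetical helper, and a missing TIMELINE is reported as 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

/* Hypothetical helper: parse "START_REPLICATION X/X [TIMELINE n]". */
static bool
parse_start_replication(const char *cmd, XLogRecPtr *startpoint, TimeLineID *tli)
{
    unsigned int hi, lo, t;
    int          n = -1;              /* offset just past the LSN, if reached */

    if (sscanf(cmd, " START_REPLICATION %X/%X%n", &hi, &lo, &n) != 2)
        return false;
    *startpoint = ((XLogRecPtr) hi << 32) | lo;   /* rebuild the 64-bit LSN */
    *tli = 0;                                     /* 0 = not specified */
    if (n >= 0 && sscanf(cmd + n, " TIMELINE %u", &t) == 1)
        *tli = t;
    return true;
}
```

Parsing the LSN is the inverse of the X/X formatting used in the IDENTIFY_SYSTEM reply, so the standby can pass back a position it received from the primary unchanged.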