The WalSender process is responsible for sending WAL from the Primary DataNode to the Standby DataNode.
【WalSender】Entry function of the walsender process for replication commands: bool exec_replication_command(const char *cmd_string)
src/backend/replication/walsender.c
/*
* Execute an incoming replication command.
*
* Returns true if the cmd_string was recognized as WalSender command, false
* if not.
*/
bool
exec_replication_command(const char *cmd_string)
{
    ......
        case T_StartReplicationCmd:
            {
                StartReplicationCmd *cmd = (StartReplicationCmd *) cmd_node;

                PreventTransactionChain(true, "START_REPLICATION");

                if (cmd->kind == REPLICATION_KIND_PHYSICAL)
                    StartReplication(cmd);
                else
                    StartLogicalReplication(cmd);
                break;
            }
    ......
}
/*
* Handle START_REPLICATION command.
*
* At the moment, this never returns, but an ereport(ERROR) will take us back
* to the main loop.
*/
static void
StartReplication(StartReplicationCmd *cmd)
Logical replication is handled by StartLogicalReplication(StartReplicationCmd *cmd):
/*
* Load previously initiated logical slot and prepare for sending data (via
* WalSndLoop).
*/
static void
StartLogicalReplication(StartReplicationCmd *cmd)
{// #lizard forgives
    StringInfoData buf;

    /* make sure that our requirements are still fulfilled */
    CheckLogicalDecodingRequirements();

    Assert(!MyReplicationSlot);

    ReplicationSlotAcquire(cmd->slotname, true);

    /*
     * Force a disconnect, so that the decoding code doesn't need to care
     * about an eventual switch from running in recovery, to running in a
     * normal environment. Client code is expected to handle reconnects.
     */
    if (am_cascading_walsender && !RecoveryInProgress())
    {
        ereport(LOG,
                (errmsg("terminating walsender process after promotion")));
        got_STOPPING = true;
    }

    WalSndSetState(WALSNDSTATE_CATCHUP);

    /* Send a CopyBothResponse message, and start streaming */
    pq_beginmessage(&buf, 'W');
    pq_sendbyte(&buf, 0);
    pq_sendint(&buf, 0, 2);
    pq_endmessage(&buf);
    pq_flush();

    /*
     * Initialize position to the last ack'ed one, then the xlog records begin
     * to be shipped from that position.
     */
    logical_decoding_ctx = CreateDecodingContext(cmd->startpoint, cmd->options,
                                                 logical_read_xlog_page,
                                                 WalSndPrepareWrite,
                                                 WalSndWriteData,
                                                 WalSndUpdateProgress);
#ifdef __STORAGE_SCALABLE__
    /*
     * For shard logical decoding, we not only interest in the data belongs to
     * target table, but also belongs to target shards. So we have to get shard
     * info, and use it as filter.
     */
    if (MyReplicationSlot->pgoutput)
    {
        /* get decode shard info for this relation */
        if (OidIsValid(MyReplicationSlot->subid) && OidIsValid(MyReplicationSlot->relid))
    ......
    /* Main loop of walsender */
    WalSndLoop(XLogSendLogical);
    ......
/* Main loop of walsender process that streams the WAL over Copy messages. */
static void
WalSndLoop(WalSndSendDataCallback send_data)
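During streaming replication, three processes work together: walsender, walreceiver and startup. The walsender process runs on the primary and sends WAL records to the standby. The walreceiver and startup processes run on the standby. walreceiver receives the WAL records sent by the primary and writes them to the XLOG files on disk, and the startup process then replays that WAL. Together, the three processes carry out the whole primary-to-standby streaming replication. This post focuses mainly on the interaction between the walreceiver process and the startup process. First, here are the call stacks of the three processes, which make it easier to locate the relevant code:
walsender sends WAL records. Call chain: PostgresMain() -> exec_replication_command() -> StartReplication() -> WalSndLoop() -> XLogSendPhysical()
walreceiver receives WAL records. Call chain: sigusr1_handler() -> StartWalReceiver() -> AuxiliaryProcessMain() -> WalReceiverMain() -> walrcv_receive()
startup applies WAL. Call chain: PostmasterMain() -> StartupDataBase() -> AuxiliaryProcessMain() -> StartupProcessMain() -> StartupXLOG()
When streaming replication starts up, the three processes start in order from the standby side to the primary side: startup -> walreceiver -> walsender. Note, however, that the startup process does not signal the postmaster to launch walreceiver as soon as it starts. It first evaluates a series of conditions and only then decides whether to ask the postmaster to start walreceiver. The WAL that startup replays can come from three sources: the archive, the pg_wal directory, or the primary via streaming replication. In practice, if archiving is not in use, WAL is read from pg_wal first; otherwise the archive is tried first (archive mode). If neither has the WAL that startup needs, it can only be streamed from the primary. Startup then signals the postmaster (via SendPostmasterSignal(PMSIGNAL_START_WALRECEIVER); see my earlier post for this step) to start a walreceiver process that fetches WAL from the primary.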
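While streaming replication is running, walsender drives the WAL data flow: walsender -> walreceiver -> startup. XLOG generated by backends on the primary as they execute the workload follows this path. The primary's walsender sends it over the network, walreceiver receives it and flushes it to disk, and finally the standby's startup process applies it.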
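Main flow of the startup process
The main steps by which the startup process enters standby mode and applies WAL:
1. Read the pg_control file to find the redo point. Read recovery.conf and, if standby_mode=on is set, enter standby mode.
2. For Hot Standby, initialize clog, subtrans, the transaction environment and so on. Initialize the redo resource managers, such as Heap, Heap2, Database and XLOG.
3. Read a WAL record. If the record is not available yet, call XLogPageRead -> WaitForWALToBecomeAvailable -> RequestXLogStreaming to wake walreceiver so that it fetches WAL records from walsender.
4. Redo the WAL record: based on record->xl_rmid, call the corresponding resource manager's redo routine. For example, heap_redo handles XLOG_HEAP_INSERT by adding the tuple described by the record to the buffer page. Some redo operations (such as records generated by vacuum) must check for query conflicts in Hot Standby mode. If certain tuples are to be removed while a running query might still read them, replaying the removal would break transaction isolation. ResolveRecoveryConflictWithSnapshot detects such conflicts; when one occurs, the backend running the conflicting query is signalled and the query is cancelled.
5. Check consistency. Once a consistent state is reached, a Hot Standby can accept read-only user queries. Update the apply position and timeline in XLogCtlData in shared memory. If recovering to a target (a point in time, a timeline or a transaction ID), check whether that target has been reached.
6. Go back to step 3 and read the next WAL record.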
The WalReceiver process is responsible for receiving, on the Standby DataNode, the WAL sent from the Primary DataNode.
【WalReceiver】Entry function of the WalReceiver process: void WalReceiverMain(void)
src/backend/replication/walreceiver.c
/* Main entry point for walreceiver process */
void
WalReceiverMain(void)
{
    ......
        /*
         * Get any missing history files. We do this always, even when we're
         * not interested in that timeline, so that if we're promoted to
         * become the master later on, we don't select the same timeline that
         * was already used in the current master. This isn't bullet-proof -
         * you'll need some external software to manage your cluster if you
         * need to ensure that a unique timeline id is chosen in every case,
         * but let's avoid the confusion of timeline id collisions where we
         * can.
         */
        WalRcvFetchTimeLineHistoryFiles(startpointTLI, primaryTLI);

        /*
         * Start streaming.
         *
         * We'll try to start at the requested starting point and timeline,
         * even if it's different from the server's latest timeline. In case
         * we've already reached the end of the old timeline, the server will
         * finish the streaming immediately, and we will go back to await
         * orders from the startup process. If recovery_target_timeline is
         * 'latest', the startup process will scan pg_wal and find the new
         * history file, bump recovery target timeline, and ask us to restart
         * on the new timeline.
         */
        options.logical = false;
        options.startpoint = startpoint;
        options.slotname = slotname[0] != '\0' ? slotname : NULL;
        options.proto.physical.startpointTLI = startpointTLI;
        ThisTimeLineID = startpointTLI;
        if (walrcv_startstreaming(wrconn, &options))
        ......