TBase Source Code (Part 7)

driftingman

Last modified: 2022-12-11 18:27:49

Tags: distributed database, postgresql

First published: 2022-08-10 22:58:43

Article link: https://blog.csdn.net/driftingman/article/details/126275852

The WalSender process sends the WAL of the Primary DataNode to the Standby DataNode.

[WalSender] Entry function of the WalSender process: bool exec_replication_command(const char *cmd_string)

src/backend/replication/walsender.c

/*
 * Execute an incoming replication command.
 *
 * Returns true if the cmd_string was recognized as WalSender command, false
 * if not.
 */
bool
exec_replication_command(const char *cmd_string)
{
    ......

        case T_StartReplicationCmd:
            {
                StartReplicationCmd *cmd = (StartReplicationCmd *) cmd_node;

                PreventTransactionChain(true, "START_REPLICATION");

                if (cmd->kind == REPLICATION_KIND_PHYSICAL)
                    StartReplication(cmd);
                else
                    StartLogicalReplication(cmd);
                break;
            }

    ......
}

/*
 * Handle START_REPLICATION command.
 *
 * At the moment, this never returns, but an ereport(ERROR) will take us back
 * to the main loop.
 */
static void
StartReplication(StartReplicationCmd *cmd)

StartLogicalReplication(StartReplicationCmd *cmd)

/*
 * Load previously initiated logical slot and prepare for sending data (via
 * WalSndLoop).
 */
static void
StartLogicalReplication(StartReplicationCmd *cmd)
{// #lizard forgives
    StringInfoData buf;

    /* make sure that our requirements are still fulfilled */
    CheckLogicalDecodingRequirements();

    Assert(!MyReplicationSlot);

    ReplicationSlotAcquire(cmd->slotname, true);

    /*
     * Force a disconnect, so that the decoding code doesn't need to care
     * about an eventual switch from running in recovery, to running in a
     * normal environment. Client code is expected to handle reconnects.
     */
    if (am_cascading_walsender && !RecoveryInProgress())
    {
        ereport(LOG,
                (errmsg("terminating walsender process after promotion")));
        got_STOPPING = true;
    }

    WalSndSetState(WALSNDSTATE_CATCHUP);

    /* Send a CopyBothResponse message, and start streaming */
    pq_beginmessage(&buf, 'W');
    pq_sendbyte(&buf, 0);
    pq_sendint(&buf, 0, 2);
    pq_endmessage(&buf);
    pq_flush();

    /*
     * Initialize position to the last ack'ed one, then the xlog records begin
     * to be shipped from that position.
     */
    logical_decoding_ctx = CreateDecodingContext(cmd->startpoint, cmd->options,
                                                 logical_read_xlog_page,
                                                 WalSndPrepareWrite,
                                                 WalSndWriteData,
                                                 WalSndUpdateProgress);

#ifdef __STORAGE_SCALABLE__
    /*
     * For shard logical decoding, we not only interest in the data belongs to
     * target table, but also belongs to target shards. So we have to get shard
     * info, and use it as filter.
     */
    if (MyReplicationSlot->pgoutput)
    {
        /* get decode shard info for this relation */

        if (OidIsValid(MyReplicationSlot->subid) && OidIsValid(MyReplicationSlot->relid))

    ......

    /* Main loop of walsender */
    WalSndLoop(XLogSendLogical);

    ......

/* Main loop of walsender process that streams the WAL over Copy messages. */
static void
WalSndLoop(WalSndSendDataCallback send_data)
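
The pq_beginmessage / pq_sendbyte / pq_sendint / pq_endmessage sequence in StartLogicalReplication above emits a CopyBothResponse message. As a rough illustration of what ends up on the wire, the sketch below builds the same eight bytes by hand, following the PostgreSQL v3 frontend/backend protocol layout (type byte 'W', a 4-byte big-endian length that counts itself but not the type byte, a 1-byte overall copy format, and a 2-byte column count). The helper name build_copy_both_response is hypothetical and is not part of TBase or PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a TBase/PostgreSQL function): write a
 * CopyBothResponse with overall format 0 and zero columns into out.
 * Returns the number of bytes written (8), or 0 if the buffer is too small.
 */
size_t
build_copy_both_response(uint8_t *out, size_t cap)
{
    /* The length field covers itself (4), the format byte (1) and the int16 (2). */
    const uint32_t len = 4 + 1 + 2;
    const size_t total = 1 + len;   /* plus the leading type byte */

    assert(out != NULL);
    if (cap < total)
        return 0;

    out[0] = 'W';                    /* pq_beginmessage(&buf, 'W') */
    out[1] = (uint8_t) (len >> 24);  /* length, big-endian */
    out[2] = (uint8_t) (len >> 16);
    out[3] = (uint8_t) (len >> 8);
    out[4] = (uint8_t) len;
    out[5] = 0;                      /* pq_sendbyte(&buf, 0): overall copy format */
    out[6] = 0;                      /* pq_sendint(&buf, 0, 2): column count, high byte */
    out[7] = 0;                      /* column count, low byte */
    return total;
}
```

In the real server the length is filled in by pq_endmessage, so the code never computes it explicitly; the sketch only makes the resulting layout visible.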

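Streaming replication involves three cooperating processes: walsender, walreceiver, and startup. The walsender runs on the primary and sends WAL records to the standby. The walreceiver and startup processes run on the standby: the walreceiver receives the WAL records sent by the primary and writes them to the XLOG files on disk, and the startup process then replays that WAL. Together the three processes carry out the whole primary/standby streaming replication. This post focuses mainly on the interaction between the WalReceiver process and the startup process. The call stacks of the three processes come first, because they make it easier to locate the code worth reading: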

The walsender process sends WAL records. Call order: PostgresMain()->exec_replication_command()->StartReplication()->WalSndLoop()->XLogSendPhysical()
The walreceiver process receives WAL records. Call order: sigusr1_handler()->StartWalReceiver()->AuxiliaryProcessMain()->WalReceiverMain()->walrcv_receive()
The startup process applies WAL. Call order: PostmasterMain()->StartupDataBase()->AuxiliaryProcessMain()->StartupProcessMain()->StartupXLOG()

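When streaming replication starts up, the three processes start in order from standby to primary: startup —> walreceiver —> walsender. Note, however, that the startup process does not signal the postmaster to launch the walreceiver as soon as it starts. It first evaluates a series of conditions and only then decides whether to ask the postmaster to start the walreceiver. The WAL files that the startup process replays can come from three sources: the archive, the pg_wal directory, or the primary node via streaming replication. In practice, if archiving is not in use, the startup process first reads from pg_wal; otherwise (Archive Mode) it prefers the archive. If neither source has the WAL it needs, the only remaining option is to stream it from the primary. In that case the startup process signals the postmaster (through SendPostmasterSignal(PMSIGNAL_START_WALRECEIVER); see the earlier posts in this series for that part) and asks it to start the walreceiver, which then fetches WAL from the Primary node.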
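
The source-selection order described above can be sketched as a small decision function. This is a deliberately simplified model of the paragraph, not the logic of WaitForWALToBecomeAvailable itself; the enum, the function names, and the boolean inputs are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three WAL sources described in the text. */
typedef enum WalSource
{
    WAL_FROM_ARCHIVE,
    WAL_FROM_PG_WAL,
    WAL_FROM_STREAM
} WalSource;

/*
 * Pick where the startup process would look for the next segment.
 *   archive_configured: restoring from an archive is enabled
 *   in_archive / in_pg_wal: the needed segment is present in that location
 */
WalSource
choose_wal_source(bool archive_configured, bool in_archive, bool in_pg_wal)
{
    /* With archiving in use, the archive is preferred. */
    if (archive_configured && in_archive)
        return WAL_FROM_ARCHIVE;

    /* Otherwise fall back to the local pg_wal directory. */
    if (in_pg_wal)
        return WAL_FROM_PG_WAL;

    /* Nothing available locally: the WAL has to be streamed from the primary. */
    return WAL_FROM_STREAM;
}

/* Only the streaming case requires asking the postmaster for a walreceiver. */
bool
needs_walreceiver(WalSource src)
{
    return src == WAL_FROM_STREAM;
}
```

In the real code the decision is a retry loop that revisits these sources over time rather than a single pure function, so this sketch only captures the priority order.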

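While streaming replication is running, the WAL flows in the opposite direction and the walsender takes the lead: walsender —> walreceiver —> startup. The XLOG generated by backends executing workload on the primary is sent over the network by the walsender, received and written to disk by the walreceiver, and finally applied by the startup process on the standby.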

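Main flow of the startup process
How the startup process enters standby mode and applies WAL:

1. Read the pg_control file and find the redo location. Read recovery.conf and enter standby mode if standby_mode=on is configured.
2. For Hot Standby, initialize clog, subtrans, the transaction environment, and so on. Initialize the redo resource managers, such as Heap, Heap2, Database, and XLOG.
3. Read a WAL record. If the record is not available yet, call XLogPageRead->WaitForWALToBecomeAvailable->RequestXLogStreaming to wake up the walreceiver so that it fetches WAL records from the walsender.
4. Redo the WAL record: record->xl_rmid selects the resource manager whose redo routine is called. For example, heap_redo handles XLOG_HEAP_INSERT by using the information in the record to insert the tuple into the buffer page. Some redo operations (records produced by vacuum) must also check for query conflicts under Hot Standby: if some tuples are about to be removed while a running query may still read them, removing them would break that query's transaction isolation. ResolveRecoveryConflictWithSnapshot detects such conflicts, and when one occurs the process running the query is killed.
5. Check consistency. Once a consistent state is reached, Hot Standby can accept read-only queries. Update the apply position and timeline in XLogCtlData in shared memory. When recovering to a target point in time, timeline, or transaction id, check whether the target has been reached.
6. Go back to step 3 and read the next WAL record.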
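
The read-then-dispatch loop in the steps above (read a record, look up the resource manager by record->xl_rmid, call its redo routine, repeat) can be sketched with a table of function pointers. Everything here is a toy stand-in: DemoRecord, the two demo resource managers, and demo_replay are hypothetical names, and the "page" is just an integer, so the sketch shows only the dispatch pattern and none of the real redo logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy WAL record: only the resource manager id and a payload. */
typedef struct DemoRecord
{
    uint8_t xl_rmid;    /* which resource manager owns this record */
    int     payload;    /* stand-in for the record body */
} DemoRecord;

typedef void (*demo_redo_fn) (const DemoRecord *rec, int *page);

/* Hypothetical resource manager ids, standing in for XLOG, Heap, ... */
enum { DEMO_RM_XLOG = 0, DEMO_RM_HEAP = 1, DEMO_RM_MAX = 2 };

static void
demo_xlog_redo(const DemoRecord *rec, int *page)
{
    (void) rec;         /* nothing to apply to the toy page */
    (void) page;
}

static void
demo_heap_redo(const DemoRecord *rec, int *page)
{
    *page += rec->payload;  /* pretend this adds a tuple to the page */
}

/* Redo routines indexed by xl_rmid, like a resource manager table. */
static const demo_redo_fn demo_rmgr_table[DEMO_RM_MAX] = {
    demo_xlog_redo,
    demo_heap_redo
};

/* Apply records in order; stop at the first unknown resource manager id. */
int
demo_replay(const DemoRecord *recs, size_t n, int *page)
{
    int applied = 0;

    assert(recs != NULL && page != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (recs[i].xl_rmid >= DEMO_RM_MAX)
            break;
        demo_rmgr_table[recs[i].xl_rmid] (&recs[i], page);
        applied++;
    }
    return applied;
}
```

Indexing a table by xl_rmid keeps the replay loop independent of any particular record type, which is why new resource managers can be added without touching the loop.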

The WalReceiver process runs on the Standby DataNode and receives the WAL sent by the Primary DataNode.

[WalReceiver] Entry function of the WalReceiver process: void WalReceiverMain(void)

src/backend/replication/walreceiver.c

/* Main entry point for walreceiver process */
void
WalReceiverMain(void)
{
    ......

    /*
     * Get any missing history files. We do this always, even when we're
     * not interested in that timeline, so that if we're promoted to
     * become the master later on, we don't select the same timeline that
     * was already used in the current master. This isn't bullet-proof -
     * you'll need some external software to manage your cluster if you
     * need to ensure that a unique timeline id is chosen in every case,
     * but let's avoid the confusion of timeline id collisions where we
     * can.
     */
    WalRcvFetchTimeLineHistoryFiles(startpointTLI, primaryTLI);

    /*
     * Start streaming.
     *
     * We'll try to start at the requested starting point and timeline,
     * even if it's different from the server's latest timeline. In case
     * we've already reached the end of the old timeline, the server will
     * finish the streaming immediately, and we will go back to await
     * orders from the startup process. If recovery_target_timeline is
     * 'latest', the startup process will scan pg_wal and find the new
     * history file, bump recovery target timeline, and ask us to restart
     * on the new timeline.
     */
    options.logical = false;
    options.startpoint = startpoint;
    options.slotname = slotname[0] != '\0' ? slotname : NULL;
    options.proto.physical.startpointTLI = startpointTLI;
    ThisTimeLineID = startpointTLI;
    if (walrcv_startstreaming(wrconn, &options))

    ......
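
The options filled in above (slot name, start point, start timeline) end up in the START_REPLICATION command that the walreceiver sends to the walsender. The sketch below shows how such a command string could be assembled for physical replication. It follows the command format used by upstream PostgreSQL 10's libpq-based walreceiver; that TBase keeps the same format is an assumption, and build_start_replication is a hypothetical helper, not a function from either code base.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a physical START_REPLICATION command.
 *   slotname:   NULL when no replication slot is used (never an empty string,
 *               matching the slotname[0] != '\0' check above)
 *   startpoint: 64-bit WAL position, printed as high/low 32-bit halves in hex
 *   tli:        timeline to start streaming on
 * Returns the command length, or -1 if the buffer is too small.
 */
int
build_start_replication(char *out, size_t cap, const char *slotname,
                        uint64_t startpoint, uint32_t tli)
{
    unsigned hi = (unsigned) (startpoint >> 32);
    unsigned lo = (unsigned) (startpoint & 0xFFFFFFFFu);
    int n;

    assert(out != NULL);
    assert(slotname == NULL || strlen(slotname) > 0);

    if (slotname != NULL)
        n = snprintf(out, cap, "START_REPLICATION SLOT \"%s\" %X/%X TIMELINE %u",
                     slotname, hi, lo, (unsigned) tli);
    else
        n = snprintf(out, cap, "START_REPLICATION %X/%X TIMELINE %u",
                     hi, lo, (unsigned) tli);

    /* snprintf reports the length it needed; treat truncation as failure. */
    return (n >= 0 && (size_t) n < cap) ? n : -1;
}
```

On the walsender side, a command of this shape is what exec_replication_command parses into a StartReplicationCmd before dispatching to StartReplication.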