TBase Source Code (Part 7)

The WalSender process sends the WAL of the primary DataNode to the standby DataNode.

[WalSender] The walsender process's entry function: bool exec_replication_command(const char *cmd_string)

src/backend/replication/walsender.c

/*
 * Execute an incoming replication command.
 *
 * Returns true if the cmd_string was recognized as WalSender command, false
 * if not.
 */
bool
exec_replication_command(const char *cmd_string)

{

......

        case T_StartReplicationCmd:
            {
                StartReplicationCmd *cmd = (StartReplicationCmd *) cmd_node;

                PreventTransactionChain(true, "START_REPLICATION");

                if (cmd->kind == REPLICATION_KIND_PHYSICAL)
                    StartReplication(cmd);
                else
                    StartLogicalReplication(cmd);
                break;
            }

......

}
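
To make the branch above concrete, here is a minimal, self-contained sketch of the same two-way dispatch. It is not TBase code: the real walsender runs cmd_string through a grammar-based parser and then switches on the parsed node, whereas the hypothetical route_command below only does toy string matching, and the Route enum is invented for illustration.

```c
#include <string.h>

/* Hypothetical result type; the real code switches on a parsed node instead. */
typedef enum { ROUTE_NONE, ROUTE_PHYSICAL, ROUTE_LOGICAL } Route;

/*
 * Toy classifier (an assumption for illustration, not the TBase parser):
 * anything that is not START_REPLICATION is "not ours"; a LOGICAL keyword
 * selects the logical path; everything else is physical replication.
 */
Route route_command(const char *cmd_string)
{
    static const char prefix[] = "START_REPLICATION";

    if (strncmp(cmd_string, prefix, sizeof(prefix) - 1) != 0)
        return ROUTE_NONE;
    if (strstr(cmd_string, " LOGICAL ") != NULL)
        return ROUTE_LOGICAL;   /* would call StartLogicalReplication() */
    return ROUTE_PHYSICAL;      /* would call StartReplication() */
}
```

Physical replication is the default here because the PHYSICAL keyword is optional in the command, while LOGICAL must be spelled out.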

/*
 * Handle START_REPLICATION command.
 *
 * At the moment, this never returns, but an ereport(ERROR) will take us back
 * to the main loop.
 */
static void
StartReplication(StartReplicationCmd *cmd)

/*
 * Load previously initiated logical slot and prepare for sending data (via
 * WalSndLoop).
 */

static void
StartLogicalReplication(StartReplicationCmd *cmd)
{// #lizard forgives
    StringInfoData buf;

    /* make sure that our requirements are still fulfilled */
    CheckLogicalDecodingRequirements();

    Assert(!MyReplicationSlot);

    ReplicationSlotAcquire(cmd->slotname, true);

    /*
     * Force a disconnect, so that the decoding code doesn't need to care
     * about an eventual switch from running in recovery, to running in a
     * normal environment. Client code is expected to handle reconnects.
     */
    if (am_cascading_walsender && !RecoveryInProgress())
    {
        ereport(LOG,
                (errmsg("terminating walsender process after promotion")));
        got_STOPPING = true;
    }

    WalSndSetState(WALSNDSTATE_CATCHUP);

    /* Send a CopyBothResponse message, and start streaming */
    pq_beginmessage(&buf, 'W');
    pq_sendbyte(&buf, 0);
    pq_sendint(&buf, 0, 2);
    pq_endmessage(&buf);
    pq_flush();

    /*
     * Initialize position to the last ack'ed one, then the xlog records begin
     * to be shipped from that position.
     */
    logical_decoding_ctx = CreateDecodingContext(cmd->startpoint, cmd->options,
                                                 logical_read_xlog_page,
                                                 WalSndPrepareWrite,
                                                 WalSndWriteData,
                                                 WalSndUpdateProgress);

#ifdef __STORAGE_SCALABLE__
    /*
     * For shard logical decoding, we not only interest in the data belongs to
     * target table, but also belongs to target shards. So we have to get shard
     * info, and use it as filter.
     */
    if (MyReplicationSlot->pgoutput)
    {
        /* get decode shard info for this relation */
        if (OidIsValid(MyReplicationSlot->subid) && OidIsValid(MyReplicationSlot->relid))

......

    /* Main loop of walsender */
    WalSndLoop(XLogSendLogical);


......

/* Main loop of walsender process that streams the WAL over Copy messages. */
static void
WalSndLoop(WalSndSendDataCallback send_data)
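
The pq_beginmessage(&buf, 'W') sequence in StartLogicalReplication produces a CopyBothResponse with no per-column format codes. As a sketch of what ends up on the wire, assuming the standard PostgreSQL frontend/backend message layout (type byte, 4-byte big-endian length that counts itself, one format byte, 2-byte column count), the hypothetical helper below builds the same 8 bytes by hand. The function name is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of TBase: writes a CopyBothResponse with
 * overall format 0 and zero columns into out. Returns the number of bytes
 * written, or 0 if the buffer is too small.
 */
size_t build_copy_both_response(uint8_t *out, size_t cap)
{
    const uint32_t len = 4 + 1 + 2;   /* length field + format byte + column count */

    if (cap < 1 + (size_t) len)
        return 0;

    out[0] = 'W';                     /* message type: CopyBothResponse */
    out[1] = (uint8_t) (len >> 24);   /* length, big-endian, excludes the type byte */
    out[2] = (uint8_t) (len >> 16);
    out[3] = (uint8_t) (len >> 8);
    out[4] = (uint8_t) len;
    out[5] = 0;                       /* pq_sendbyte(&buf, 0): overall copy format */
    out[6] = 0;                       /* pq_sendint(&buf, 0, 2): number of columns */
    out[7] = 0;
    return 1 + (size_t) len;
}
```

After this message both sides are in copy-both mode, which is what lets the walsender stream WAL while the receiver sends feedback over the same connection.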

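Three processes cooperate during streaming replication: walsender, walreceiver, and startup. The walsender runs on the primary and sends WAL records to the standby. The walreceiver and startup processes run on the standby: the walreceiver receives the WAL records sent by the primary and writes them to the XLOG files on disk, and the startup process then replays that WAL. Together the three carry out the whole primary-to-standby streaming replication flow. This post focuses on the interaction between the walreceiver and startup processes. The call stacks of the three processes come first, because they make it easier to locate the code worth reading: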

The walsender process sends WAL records. Call order: PostgresMain()->exec_replication_command()->StartReplication()->WalSndLoop()->XLogSendPhysical()
The walreceiver process receives WAL records. Call order: sigusr1_handler()->StartWalReceiver()->AuxiliaryProcessMain()->WalReceiverMain()->walrcv_receive()
The startup process applies WAL. Call order: PostmasterMain()->StartupDataBase()->AuxiliaryProcessMain()->StartupProcessMain()->StartupXLOG()

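When streaming replication starts up, the three processes start in order from standby to primary: startup -> walreceiver -> walsender. Note, however, that the startup process does not signal the postmaster to launch the walreceiver as soon as it starts; it first evaluates a series of conditions and only then decides whether to ask. The WAL that the startup process replays can come from three sources: the archive, the pg_wal directory, or streaming from the primary node. In practice, without archiving it reads from pg_wal first; in Archive Mode it prefers the archive. If neither has the WAL that startup needs, the only remaining source is streaming from the primary, so startup signals the postmaster (via SendPostmasterSignal(PMSIGNAL_START_WALRECEIVER); see the earlier post for this part) and asks it to start the walreceiver, which then fetches WAL from the primary node.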
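
The source-selection order described above can be sketched as a small pure function. This is only an illustration of the decision as the paragraph states it, not the logic of WaitForWALToBecomeAvailable itself; the enum, the function name, and the three boolean inputs are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names; they only mirror the three sources named in the text. */
typedef enum { SRC_ARCHIVE, SRC_PG_WAL, SRC_STREAM } WalSource;

/*
 * Pick where the startup process reads the next WAL from:
 * the archive if archiving is configured and has the segment,
 * otherwise pg_wal if the segment is there,
 * otherwise streaming, which is the case that needs a walreceiver.
 */
WalSource choose_wal_source(bool archive_configured, bool in_archive, bool in_pg_wal)
{
    if (archive_configured && in_archive)
        return SRC_ARCHIVE;
    if (in_pg_wal)
        return SRC_PG_WAL;
    return SRC_STREAM;
}
```

Only the SRC_STREAM result corresponds to the point where startup would send PMSIGNAL_START_WALRECEIVER to the postmaster.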

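While streaming replication is running, WAL flows in the opposite direction, with the walsender driving: walsender -> walreceiver -> startup. XLOG generated by backends doing business work on the primary is sent over the network by the walsender, received and flushed to disk by the walreceiver, and finally applied by the standby's startup process.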

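Main flow of the startup process
How the startup process enters standby mode and applies WAL:

1. Read the pg_control file and find the redo point; read recovery.conf, and enter standby mode if standby_mode=on is configured.
2. For a Hot Standby, initialize clog, subtrans, the transaction environment, and so on. Initialize the redo resource managers, such as Heap, Heap2, Database, and XLOG.
3. Read a WAL record. If the record is not available, call XLogPageRead->WaitForWALToBecomeAvailable->RequestXLogStreaming to wake the walreceiver so that it fetches WAL records from the walsender.
4. Redo the WAL record that was read, using record->xl_rmid to dispatch to the matching redo resource manager. For example, heap_redo handles XLOG_HEAP_INSERT by using the record's contents to insert a tuple into the buffer page. Some redo operations (records produced by vacuum) must also check for query conflicts under Hot Standby: if certain tuples are to be removed while a running query might still read them, replaying the removal would break transaction isolation. ResolveRecoveryConflictWithSnapshot detects such conflicts, and when one occurs the process running that query is killed.
5. Check consistency; once the standby is consistent, Hot Standby can accept read-only user queries. Update the apply position and timeline in XLogCtlData in shared memory. If recovering to a target time, timeline, or transaction ID, check whether the target has been reached.
6. Go back to step 3 and read the next WAL record.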
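
Steps 3 to 6 amount to a loop that reads a record and dispatches it through a table indexed by xl_rmid. The sketch below shows that shape with a two-entry table. Everything in it is a simplified assumption: DemoRecord, the demo_* functions, and the counter array stand in for the real record format, resource managers, and buffer pages.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for a WAL record. */
typedef struct
{
    uint8_t xl_rmid;   /* which resource manager owns this record */
    int     payload;   /* stand-in for the record body */
} DemoRecord;

typedef void (*RedoFn)(const DemoRecord *rec, int *state);

enum { DEMO_RM_XLOG = 0, DEMO_RM_HEAP = 1, DEMO_RM_COUNT };

/* Each demo redo routine just accumulates into its own slot of state. */
static void demo_xlog_redo(const DemoRecord *rec, int *state)
{
    state[DEMO_RM_XLOG] += rec->payload;
}

static void demo_heap_redo(const DemoRecord *rec, int *state)
{
    state[DEMO_RM_HEAP] += rec->payload;
}

/* Dispatch table indexed by xl_rmid, one redo callback per resource manager. */
static const RedoFn demo_rmgr_table[DEMO_RM_COUNT] = {
    demo_xlog_redo,
    demo_heap_redo
};

/* Replay records in order; stop at the first unknown rmid. Returns the count applied. */
size_t demo_replay(const DemoRecord *recs, size_t n, int *state)
{
    size_t applied = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (recs[i].xl_rmid >= DEMO_RM_COUNT)
            break;
        demo_rmgr_table[recs[i].xl_rmid](&recs[i], state);
        applied++;
    }
    return applied;
}
```

The table keeps the loop itself unaware of record contents, which is why a new record type only needs a new redo callback rather than a change to the replay loop.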

The WalReceiver process runs on the standby DataNode and receives the WAL sent by the primary DataNode.

[WalReceiver] The walreceiver process's entry function: void WalReceiverMain(void)

src/backend/replication/walreceiver.c

/* Main entry point for walreceiver process */
void
WalReceiverMain(void)

{

......

        /*
         * Get any missing history files. We do this always, even when we're
         * not interested in that timeline, so that if we're promoted to
         * become the master later on, we don't select the same timeline that
         * was already used in the current master. This isn't bullet-proof -
         * you'll need some external software to manage your cluster if you
         * need to ensure that a unique timeline id is chosen in every case,
         * but let's avoid the confusion of timeline id collisions where we
         * can.
         */
        WalRcvFetchTimeLineHistoryFiles(startpointTLI, primaryTLI);

        /*
         * Start streaming.
         *
         * We'll try to start at the requested starting point and timeline,
         * even if it's different from the server's latest timeline. In case
         * we've already reached the end of the old timeline, the server will
         * finish the streaming immediately, and we will go back to await
         * orders from the startup process. If recovery_target_timeline is
         * 'latest', the startup process will scan pg_wal and find the new
         * history file, bump recovery target timeline, and ask us to restart
         * on the new timeline.
         */
        options.logical = false;
        options.startpoint = startpoint;
        options.slotname = slotname[0] != '\0' ? slotname : NULL;
        options.proto.physical.startpointTLI = startpointTLI;
        ThisTimeLineID = startpointTLI;
        if (walrcv_startstreaming(wrconn, &options))

......
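
The options filled in above end up as a START_REPLICATION command sent to the primary, which is the command the walsender handles at the top of this post. The sketch below formats the physical variant, assuming the command shape used by the stock PostgreSQL libpq walreceiver (quoted slot name, LSN printed as two hexadecimal halves, then TIMELINE); TBase's build may differ, and format_start_replication is a hypothetical name.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a physical START_REPLICATION command.
 * startpoint is a 64-bit WAL position, printed as high/low 32-bit halves in hex.
 * slotname may be NULL, matching the options.slotname assignment above.
 * Returns what snprintf returns.
 */
int format_start_replication(char *out, size_t cap, const char *slotname,
                             uint64_t startpoint, uint32_t tli)
{
    uint32_t hi = (uint32_t) (startpoint >> 32);
    uint32_t lo = (uint32_t) startpoint;

    if (slotname != NULL)
        return snprintf(out, cap,
                        "START_REPLICATION SLOT \"%s\" %" PRIX32 "/%" PRIX32
                        " TIMELINE %" PRIu32,
                        slotname, hi, lo, tli);

    return snprintf(out, cap,
                    "START_REPLICATION %" PRIX32 "/%" PRIX32 " TIMELINE %" PRIu32,
                    hi, lo, tli);
}
```

With no slot, a start point of 0x3000000 on timeline 1 yields START_REPLICATION 0/3000000 TIMELINE 1.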
