[PostgreSQL Source Code Analysis, Part 5] Primary/Standby Hot Standby Code Analysis: The Walsender Process (1)

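The previous article covered the synchronous replication code used in primary/standby hot standby. This one turns to the core of that mechanism, the walsender and walreceiver processes, starting with walsender. As the name suggests, walsender is the process that sends WAL and walreceiver is the process that receives it. The primary sends the WAL it generates to the standby, and the standby's startup process replays (redoes) it so that the two machines stay consistent. The walkthrough below uses the walsender and walreceiver code in PG 9.3 to trace the hot standby code flow.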

Walsender structures and variables

src/include/replication/walsender_private.h defines the structures that the walsender process uses. They are shown below.
/*
 * Each walsender has a WalSnd struct in shared memory.
 */
typedef struct WalSnd
{
	pid_t		pid;			/* this walsender's process id, or 0 */
	WalSndState state;			/* this walsender's state */
	XLogRecPtr	sentPtr;		/* WAL has been sent up to this point */
	bool		needreload;		/* does currently-open file need to be
								 * reloaded? */

	/*
	 * The xlog locations that have been written, flushed, and applied by
	 * standby-side. These may be invalid if the standby-side has not offered
	 * values yet.
	 */
	XLogRecPtr	write;
	XLogRecPtr	flush;
	XLogRecPtr	apply;

	/* Protects shared variables shown above. */
	slock_t		mutex;

	/*
	 * Latch used by backends to wake up this walsender when it has work to
	 * do.
	 */
	Latch		latch;

	/*
	 * The priority order of the standby managed by this WALSender, as listed
	 * in synchronous_standby_names, or 0 if not-listed. Protected by
	 * SyncRepLock.
	 */
	int			sync_standby_priority;
} WalSnd;

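Every walsender process has one of these structures, which holds its per-process state. WalSndState records the state of the walsender process and is defined as follows: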
typedef enum WalSndState
{
	WALSNDSTATE_STARTUP = 0,
	WALSNDSTATE_BACKUP,
	WALSNDSTATE_CATCHUP,
	WALSNDSTATE_STREAMING
} WalSndState;

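WALSNDSTATE_STARTUP is the startup state, WALSNDSTATE_BACKUP the base backup state, WALSNDSTATE_CATCHUP the catch-up state, and WALSNDSTATE_STREAMING the streaming replication state. In WalSnd, sentPtr is the position up to which the primary has sent WAL, while write/flush/apply are the XLOG positions that the standby has reported back as written, flushed, and applied. They may be invalid until the standby has sent its first reply.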

/* There is one WalSndCtl struct for the whole database cluster */
typedef struct
{
	/*
	 * Synchronous replication queue with one queue per request type.
	 * Protected by SyncRepLock.
	 */
	SHM_QUEUE	SyncRepQueue[NUM_SYNC_REP_WAIT_MODE];

	/*
	 * Current location of the head of the queue. All waiters should have a
	 * waitLSN that follows this value. Protected by SyncRepLock.
	 */
	XLogRecPtr	lsn[NUM_SYNC_REP_WAIT_MODE];

	/*
	 * Are any sync standbys defined?  Waiting backends can't reload the
	 * config file safely, so checkpointer updates this value as needed.
	 * Protected by SyncRepLock.
	 */
	bool		sync_standbys_defined;

	WalSnd		walsnds[1];		/* VARIABLE LENGTH ARRAY */
} WalSndCtlData;

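There is a single instance of this structure for the whole cluster. It holds the global synchronous replication state together with the array of per-walsender WalSnd structures.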
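The `walsnds[1]` member is the classic "variable-length tail" idiom: the array is declared with one element, and the real element count is decided when the memory is allocated. A minimal sketch of how such a block could be sized is given below. The `Demo*` structures and `demo_walsnd_shmem_size` are hypothetical simplifications, not PostgreSQL's own definitions, and the real sizing code may differ in detail (for example in overflow checking).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-sender slot. */
typedef struct DemoWalSnd
{
	int			pid;
	uint64_t	sentPtr;
} DemoWalSnd;

/* Hypothetical, simplified control block with a variable-length tail. */
typedef struct DemoWalSndCtl
{
	int			sync_standbys_defined;
	DemoWalSnd	walsnds[1];		/* really max_wal_senders entries */
} DemoWalSndCtl;

/* Bytes needed for the header plus max_wal_senders slots. */
size_t
demo_walsnd_shmem_size(int max_wal_senders)
{
	assert(max_wal_senders >= 0);

	return offsetof(DemoWalSndCtl, walsnds) +
		(size_t) max_wal_senders * sizeof(DemoWalSnd);
}
```

Using `offsetof` rather than `sizeof(DemoWalSndCtl)` avoids counting the placeholder element twice.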

Maximum number of xlog bytes sent in a single message
/*
 * Maximum data payload in a WAL data message.  Must be >= XLOG_BLCKSZ.
 *
 * We don't have a good idea of what a good value would be; there's some
 * overhead per message in both walsender and walreceiver, but on the other
 * hand sending large batches makes walsender less responsive to signals
 * because signals are checked only between messages.  128kB (with
 * default 8k blocks) seems like a reasonable guess for now.
 */
#define MAX_SEND_SIZE (XLOG_BLCKSZ * 16)
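
As a worked example of the arithmetic: with the default 8 kB block size the cap is 8192 * 16 = 131072 bytes (128 kB). The sketch below shows how a pending range of WAL could be cut into messages no larger than that cap. The `DEMO_*` macros and `demo_next_send_len` are hypothetical, and the sketch ignores any page-alignment or segment-boundary handling that the real sender performs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default block size of 8 kB, as mentioned in the comment above. */
#define DEMO_XLOG_BLCKSZ	8192
#define DEMO_MAX_SEND_SIZE	(DEMO_XLOG_BLCKSZ * 16)

/*
 * Payload size of the next message when WAL has been sent up to sent_ptr
 * and is available up to send_upto.
 */
uint64_t
demo_next_send_len(uint64_t sent_ptr, uint64_t send_upto)
{
	assert(sent_ptr <= send_upto);

	uint64_t	pending = send_upto - sent_ptr;

	return pending > DEMO_MAX_SEND_SIZE ? DEMO_MAX_SEND_SIZE : pending;
}
```

A backlog of 1,000,000 bytes would therefore go out as several 131072-byte messages followed by a shorter final one, with signals checked between messages.
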
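Flags that record whether this process is a walsender, and whether it is a cascading walsender (a standby that forwards the WAL it receives to its own standby)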
bool		am_walsender = false;		/* Am I a walsender process ? */
bool		am_cascading_walsender = false;		/* Am I cascading WAL to
							* another standby ? */
Maximum number of walsender processes and the send timeout
int			max_wal_senders = 0;	/* the maximum number of concurrent walsenders */
int			wal_sender_timeout = 60 * 1000;		/* maximum time to send one WAL data message */

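Walsender function analysis

The primary and the standby talk to each other through a complete protocol. Before they reach the normal streaming replication state, they go through a series of checks to establish the connection properly.

The initialization function InitWalSender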

void
InitWalSender(void)
{
	am_cascading_walsender = RecoveryInProgress();

	/* Create a per-walsender data structure in shared memory */
	InitWalSenderSlot();

	/* Set up resource owner */
	CurrentResourceOwner = ResourceOwnerCreate(NULL, "walsender top-level resource owner");

	/*
	 * Let postmaster know that we're a WAL sender. Once we've declared us as
	 * a WAL sender process, postmaster will let us outlive the bgwriter and
	 * kill us last in the shutdown sequence, so we get a chance to stream all
	 * remaining WAL at shutdown, including the shutdown checkpoint. Note that
	 * there's no going back, and we mustn't write any WAL records after this.
	 */
	MarkPostmasterChildWalSender();
	SendPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE);
}
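This function first calls InitWalSenderSlot to set up this process's WalSnd structure: it reserves and initializes a slot in the shared-memory array, which is sized according to the configured maximum number of walsender processes. At the end it sends the PMSIGNAL_ADVANCE_STATE_MACHINE signal to the postmaster so that the postmaster updates its view of the process state. From then on, whenever the primary receives a message from the standby, it enters the corresponding handler.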
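
The following is a rough sketch of how a per-walsender slot might be claimed from a fixed-size array, assuming (as the WalSnd comment "process id, or 0" suggests) that a slot whose pid is 0 is free. `DemoSlot` and `demo_claim_slot` are hypothetical names. The sketch is single-threaded, whereas real shared-memory code would need to take a lock around the check and the update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified slot: pid == 0 marks it as unused. */
typedef struct DemoSlot
{
	int			pid;
	uint64_t	sentPtr;
} DemoSlot;

/*
 * Claim the first free slot for my_pid and reset its state.
 * Returns NULL when all nslots slots are already taken.
 */
DemoSlot *
demo_claim_slot(DemoSlot *slots, int nslots, int my_pid)
{
	assert(slots != NULL && my_pid != 0);

	for (int i = 0; i < nslots; i++)
	{
		if (slots[i].pid != 0)
			continue;			/* in use by another sender */

		slots[i].pid = my_pid;
		slots[i].sentPtr = 0;
		return &slots[i];
	}
	return NULL;
}
```

When every slot is taken the caller gets NULL, which corresponds to the situation where more standbys try to connect than max_wal_senders allows.
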

The streaming replication protocol commands supported in PG 9.3 are mainly
	T_IdentifySystemCmd,
	T_BaseBackupCmd,
	T_StartReplicationCmd,
	T_TimeLineHistoryCmd,
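These are, respectively, the system identification command, the base backup command, the start-replication command, and the timeline history command. When the primary receives a message from the standby, the backend's message handling code checks whether the process is a walsender and, if so, calls exec_replication_command to enter the replication command path.

The replication command handler exec_replication_command

This function processes the command it receives and calls a different function for each command type.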
/*
 * Execute an incoming replication command.
 */
void
exec_replication_command(const char *cmd_string)
{
	int			parse_rc;
	Node	   *cmd_node;
	MemoryContext cmd_context;
	MemoryContext old_context;

	elog(DEBUG1, "received replication command: %s", cmd_string);

	CHECK_FOR_INTERRUPTS();

	cmd_context = AllocSetContextCreate(CurrentMemoryContext,
										"Replication command context",
										ALLOCSET_DEFAULT_MINSIZE,
										ALLOCSET_DEFAULT_INITSIZE,
										ALLOCSET_DEFAULT_MAXSIZE);
	old_context = MemoryContextSwitchTo(cmd_context);

	replication_scanner_init(cmd_string);
	parse_rc = replication_yyparse();
	if (parse_rc != 0)
		ereport(ERROR,
				(errcode(ERRCODE_SYNTAX_ERROR),
				 (errmsg_internal("replication command parser returned %d",
								  parse_rc))));

	cmd_node = replication_parse_result;

	switch (cmd_node->type)
	{
		case T_IdentifySystemCmd:
			IdentifySystem();
			break;

		case T_StartReplicationCmd:
			StartReplication((StartReplicationCmd *) cmd_node);
			break;

		case T_BaseBackupCmd:
			SendBaseBackup((BaseBackupCmd *) cmd_node);
			break;

		case T_TimeLineHistoryCmd:
			SendTimeLineHistory((TimeLineHistoryCmd *) cmd_node);
			break;

		default:
			elog(ERROR, "unrecognized replication command node tag: %u",
				 cmd_node->type);
	}

	/* done */
	MemoryContextSwitchTo(old_context);
	MemoryContextDelete(cmd_context);

	/* Send CommandComplete message */
	EndCommand("SELECT", DestRemote);
}
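Once a command arrives, this function dispatches to the matching handler. The command string itself is parsed with a grammar parser (replication_yyparse).

1) T_IdentifySystemCmd: system identification command

After the standby starts, it sends this command to the primary to request information about it: the system ID, the timeline ID, and the current XLOG location.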
	/*
	 * Reply with a result set with one row, three columns. First col is
	 * system ID, second is timeline ID, and third is current xlog location.
	 */
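
The third column is a WAL location sent as text. PostgreSQL conventionally writes such a location as two hexadecimal numbers separated by a slash (high 32 bits, then low 32 bits). Below is a minimal sketch of that formatting, assuming a 64-bit position; `demo_format_lsn` is a hypothetical helper, not a function from the PostgreSQL source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write lsn into buf as "HIGH/LOW" in uppercase hex.
 * Returns the number of characters the full text needs, as snprintf does.
 */
int
demo_format_lsn(uint64_t lsn, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);

	return snprintf(buf, buflen, "%X/%X",
					(unsigned int) (uint32_t) (lsn >> 32),
					(unsigned int) (uint32_t) lsn);
}
```

For example, the position 0x0000000103000060 is rendered as 1/3000060.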

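2) T_BaseBackupCmd: base backup command

This command is used by the pg_basebackup tool. The handler calls perform_base_backup, and the walsender process sends both the data files and the WAL.

3) T_TimeLineHistoryCmd: timeline history command

Sends the name of the timeline history file and its contents.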
	/*
	 * Reply with a result set with one row, and two columns. The first col is
	 * the name of the history file, 2nd is the contents.
	 */

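4) T_StartReplicationCmd: start-replication command

Once the primary and the standby have established a proper connection, the standby sends this command to the primary to request normal streaming replication.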