In MongoDB, replica set members stay consistent by syncing and replaying the oplog. MongoDB uses a pull model: a node actively fetches the oplog from its sync source, and in turn it must promptly let the other members know the latest point it has synced to.
As shown in the figure above, two Secondaries pull the oplog from the Primary; whenever a Secondary's optime changes, it issues replSetUpdatePosition to tell its sync source (here the Primary) about the new position.
Inside mongod there is a dedicated thread named SyncSourceFeedback, which reports the node's current progress to its sync source. A Primary does not need it, since it does not sync data from any other node; neither do nodes that hold no data, such as Arbiters. Two classes are dedicated to this task, SyncSourceFeedback and Reporter, and their calling relationship is shown in the figure below:
SyncSourceFeedback
SyncSourceFeedback is responsible for:
- deciding whether the node needs to report its position to its sync source;
- handling role changes, e.g. once a node steps up from secondary to primary it no longer needs to report its position;
- handling sync source changes, e.g. the node originally synced from node A and later switches to node B;
- invoking the Reporter to report the position; the actual reporting work is delegated to the Reporter.
void SyncSourceFeedback::run(executor::TaskExecutor* executor,
                             BackgroundSync* bgsync,
                             ReplicationCoordinator* replCoord) {
    Client::initThread("SyncSourceFeedback");

    HostAndPort syncTarget;

    // keepAliveInterval indicates how frequently to forward progress in the absence of updates.
    Milliseconds keepAliveInterval(0);

    while (true) {  // breaks once _shutdownSignaled is true
        // Check the node's state and decide whether a position report is needed.
        // Block until the position changes, shutdown is signaled, or the keep-alive
        // interval elapses while the node is in a state that should report.
        // (In the full source, keepAliveInterval is derived from the replica set
        // config before this point; that part is elided here.)
        {
            stdx::unique_lock<stdx::mutex> lock(_mtx);
            while (!_positionChanged && !_shutdownSignaled) {
                if (_cond.wait_for(lock, keepAliveInterval.toSystemDuration()) ==
                    stdx::cv_status::timeout) {
                    MemberState state = replCoord->getMemberState();
                    if (!(state.primary() || state.startup())) {
                        break;
                    }
                }
            }

            // Exit the thread if shutdown was requested.
            if (_shutdownSignaled) {
                break;
            }

            _positionChanged = false;
        }

        {
            stdx::lock_guard<stdx::mutex> lock(_mtx);
            MemberState state = replCoord->getMemberState();
            if (state.primary() || state.startup()) {
                continue;
            }
        }

        // Check whether the sync source has changed.
        const HostAndPort target = bgsync->getSyncTarget();
        if (target.empty()) {
            if (syncTarget != target) {
                syncTarget = target;
            }
            // Loop back around again; the keepalive functionality will cause us to retry
            continue;
        }

        if (syncTarget != target) {
            LOG(1) << "setting syncSourceFeedback to " << target;
            syncTarget = target;
        }

        // Construct a Reporter for the current sync source.
        Reporter reporter(executor,
                          makePrepareReplSetUpdatePositionCommandFn(replCoord, syncTarget, bgsync),
                          syncTarget,
                          keepAliveInterval,
                          syncSourceFeedbackNetworkTimeoutSecs);
        {
            stdx::lock_guard<stdx::mutex> lock(_mtx);
            if (_shutdownSignaled) {
                break;
            }
            _reporter = &reporter;
        }

        // Report the position upstream.
        auto status = _updateUpstream(&reporter);
    }
}
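The loop above blocks until _positionChanged is set or shutdown is signaled. The notifying side is SyncSourceFeedback::forwardSlaveProgress(), sketched below as a rough paraphrase of the upstream source (names and details vary between MongoDB versions): whenever the node's optime advances, the replication machinery calls it to set the flag, wake the feedback thread, and, if a Reporter is already attached, trigger another report immediately.
// Rough paraphrase of SyncSourceFeedback::forwardSlaveProgress(); exact code
// differs between MongoDB versions.
void SyncSourceFeedback::forwardSlaveProgress() {
    stdx::lock_guard<stdx::mutex> lock(_mtx);
    // Mark that there is new progress and wake the feedback thread blocked in run().
    _positionChanged = true;
    _cond.notify_all();
    if (_reporter) {
        // A Reporter is already attached to the current sync source, so ask it to
        // schedule another replSetUpdatePosition right away.
        auto triggerStatus = _reporter->trigger();
        if (!triggerStatus.isOK()) {
            warning() << "unable to forward slave progress to " << _reporter->getTarget()
                      << ": " << triggerStatus;
        }
    }
}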
Status SyncSourceFeedback::_updateUpstream(Reporter* reporter) {
    auto syncTarget = reporter->getTarget();

    auto triggerStatus = reporter->trigger();
    if (!triggerStatus.isOK()) {
        warning() << "unable to schedule reporter to update replication progress on " << syncTarget
                  << ": " << triggerStatus;
        return triggerStatus;
    }

    auto status = reporter->join();

    if (!status.isOK()) {
        log() << "SyncSourceFeedback error sending update to " << syncTarget << ": " << status;
    }

    // Sync source blacklisting will be done in BackgroundSync and SyncSourceResolver.
    return status;
}
Reporter
Reporter mainly relies on executor::TaskExecutor to issue the command request, run the callback, and handle the response.
The command itself is built by TopologyCoordinator::prepareReplSetUpdatePositionCommand; Reporter::trigger() then kicks off a command and Reporter::join() waits for it to finish.
Status Reporter::join() {
    stdx::unique_lock<stdx::mutex> lk(_mutex);
    _condition.wait(lk, [this]() { return !_isActive_inlock(); });
    return _status;
}
Status Reporter::trigger() {
    // _mutex guards _status, the callback handles, and the keep-alive state below.
    stdx::lock_guard<stdx::mutex> lk(_mutex);

    if (_keepAliveTimeoutWhen != Date_t()) {
        // Reset keep alive expiration to signal handler that it was canceled internally.
        invariant(_prepareAndSendCommandCallbackHandle.isValid());
        _keepAliveTimeoutWhen = Date_t();
        _executor->cancel(_prepareAndSendCommandCallbackHandle);
        return Status::OK();
    } else if (_isActive_inlock()) {
        _isWaitingToSendReporter = true;
        return Status::OK();
    }

    // Schedule a task on the executor that prepares and sends the command.
    auto scheduleResult =
        _executor->scheduleWork([=](const executor::TaskExecutor::CallbackArgs& args) {
            _prepareAndSendCommandCallback(args, true);
        });

    _status = scheduleResult.getStatus();
    _prepareAndSendCommandCallbackHandle = scheduleResult.getValue();

    return _status;
}
void Reporter::_prepareAndSendCommandCallback(const executor::TaskExecutor::CallbackArgs& args,
                                              bool fromTrigger) {
    // Must call without holding the lock.
    auto prepareResult = _prepareCommand();

    _sendCommand_inlock(prepareResult.getValue(), _updatePositionTimeout);
    if (!_status.isOK()) {
        _onShutdown_inlock();
        return;
    }

    invariant(_remoteCommandCallbackHandle.isValid());
    _prepareAndSendCommandCallbackHandle = executor::TaskExecutor::CallbackHandle();

    _keepAliveTimeoutWhen = Date_t();
}
void Reporter::_sendCommand_inlock(BSONObj commandRequest, Milliseconds netTimeout) {
    LOG(2) << "Reporter sending slave oplog progress to upstream updater " << _target << ": "
           << commandRequest;

    auto scheduleResult = _executor->scheduleRemoteCommand(
        executor::RemoteCommandRequest(_target, "admin", commandRequest, nullptr, netTimeout),
        [this](const executor::TaskExecutor::RemoteCommandCallbackArgs& rcbd) {
            _processResponseCallback(rcbd);
        });

    _status = scheduleResult.getStatus();
    _remoteCommandCallbackHandle = scheduleResult.getValue();
}
As the code above shows, essentially all of the work is driven through executor::TaskExecutor* const _executor, and the actual call is ultimately made via executor::TaskExecutor::scheduleRemoteCommand.
What replSetUpdatePosition carries
The command body is prepared by TopologyCoordinator::prepareReplSetUpdatePositionCommand, which mainly packages the applied optime, durable optime, and other per-member information kept in std::vector<MemberData> _memberData and sends it to the sync source.
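For illustration only, the following hypothetical sketch builds a command body of roughly that shape with MongoDB's BSON() macros. The field names follow the replSetUpdatePosition wire format of this era; the values (timestamps, term, memberId, cfgver) are invented, and the real command appears in the log excerpt below.
// Hypothetical illustration: one "optimes" entry per replica-set member, carrying
// that member's durable and applied optimes plus its member id and config version.
BSONObj cmd = BSON("replSetUpdatePosition"
                   << 1
                   << "optimes"
                   << BSON_ARRAY(BSON("durableOpTime"
                                      << BSON("ts" << Timestamp(1514764800, 1) << "t" << 3LL)
                                      << "appliedOpTime"
                                      << BSON("ts" << Timestamp(1514764800, 2) << "t" << 3LL)
                                      << "memberId" << 0
                                      << "cfgver" << 1)));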
Below is this command as printed in the log: