Ratis Source Code Analysis (Reposted)

Reposted from: https://zhuanlan.zhihu.com/p/476876447

Reference: https://juejin.cn/post/6907151199141625870

Reference: https://raft.github.io/raft.pdf

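Introduction

Apache Ratis is an open-source Java implementation of the Multi-Raft consensus protocol. The project home page is Apache Ratis, and the code repository is at ASF Git Repos - ratis.git/summary.

Overall Architecture Diagram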

Client

1. The client can set a different RetryPolicy for each request type (Read, Write, Watch, etc.). These policies are managed by the following data structure:
EnumMap<RaftProtos.RaftClientRequestProto.TypeCase, RetryPolicy> map

2. Every client request carries a clientId and a callId, which together uniquely identify the request. RaftServer uses these two fields to support exactly-once request semantics.

3. The client caches the leader of every Raft group. If a cached leader is not accessed for 60 seconds, the cache entry expires automatically:

Cache<RaftGroupId, RaftPeerId> LEADER_CACHE = CacheBuilder.newBuilder()
      .expireAfterAccess(60, TimeUnit.SECONDS)
      .maximumSize(1024).build();
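The following is a minimal, dependency-free sketch of a leader cache with the same two limits: a 60 s idle expiry and a bounded size. It makes the expire-after-access behaviour concrete. It is not Ratis code. The class name ExpiringLeaderCache, the String keys, and the injectable clock are illustrative assumptions, and the class stands in for Guava's CacheBuilder rather than reproducing it.

```java
import java.util.LinkedHashMap;
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the Guava leader cache: expire-after-access plus a size bound. */
final class ExpiringLeaderCache {
  private static final class Entry {
    final String leaderId;
    long lastAccessNanos;
    Entry(String leaderId, long now) { this.leaderId = leaderId; this.lastAccessNanos = now; }
  }

  private final long ttlNanos;
  private final int maxSize;
  private final LongSupplier clock; // System::nanoTime in production, a fake clock in tests
  // accessOrder = true: iteration starts at the least recently accessed group
  private final LinkedHashMap<String, Entry> map = new LinkedHashMap<>(16, 0.75f, true);

  ExpiringLeaderCache(long ttlNanos, int maxSize, LongSupplier clock) {
    this.ttlNanos = ttlNanos;
    this.maxSize = maxSize;
    this.clock = clock;
  }

  synchronized void put(String groupId, String leaderId) {
    map.put(groupId, new Entry(leaderId, clock.getAsLong()));
    if (map.size() > maxSize) {
      map.remove(map.keySet().iterator().next()); // evict the least recently accessed group
    }
  }

  synchronized Optional<String> get(String groupId) {
    Entry e = map.get(groupId);
    if (e == null) {
      return Optional.empty();
    }
    long now = clock.getAsLong();
    if (now - e.lastAccessNanos >= ttlNanos) {
      map.remove(groupId); // idle for too long: treat the leader as unknown
      return Optional.empty();
    }
    e.lastAccessNanos = now; // every successful read extends the entry's lifetime
    return Optional.of(e.leaderId);
  }

  public static void main(String[] args) {
    ExpiringLeaderCache cache =
        new ExpiringLeaderCache(TimeUnit.SECONDS.toNanos(60), 1024, System::nanoTime);
    cache.put("group-1", "peer-A");
    System.out.println(cache.get("group-1")); // Optional[peer-A]
  }
}
```

The sketch injects the clock so that expiry can be tested without sleeping.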

RetryCache

RetryCache is a cache for client requests: it keeps the replies to recently served requests. Its data structure is:

private final Cache<ClientInvocationId, CacheEntry> cache;

Each cache entry has an expiry time, which users can configure. RaftServerImpl consults the cache as shown below. Note that this cache only optimizes client retries; it does not strictly guarantee exactly-once semantics.

final RetryCacheImpl.CacheQueryResult queryResult = retryCache.queryCache(ClientInvocationId.valueOf(request));
final CacheEntry cacheEntry = queryResult.getEntry();
if (queryResult.isRetry()) {
  // if the previous attempt is still pending or it succeeded, return its
  // future
  replyFuture = cacheEntry.getReplyFuture();
} else {
  // otherwise this is a new request: process it normally
}
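The query-then-reuse pattern above can be sketched without Ratis types. The record InvocationId (a clientId/callId pair) and the class SimpleRetryCache are hypothetical, and the sketch ignores entry expiry. Following the comment in the code above, it reuses only pending or successful attempts. Failed attempts are evicted so that the client can retry.

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical retry cache: one reply future per (clientId, callId). Entries never expire here. */
final class SimpleRetryCache {
  private final ConcurrentHashMap<InvocationId, CompletableFuture<String>> cache =
      new ConcurrentHashMap<>();

  CompletableFuture<String> submit(InvocationId id, Supplier<CompletableFuture<String>> request) {
    CompletableFuture<String> placeholder = new CompletableFuture<>();
    CompletableFuture<String> previous = cache.putIfAbsent(id, placeholder);
    if (previous != null) {
      return previous; // a retry: reuse the pending or completed attempt
    }
    CompletableFuture<String> attempt;
    try {
      attempt = request.get(); // first time this invocation is seen: execute it
    } catch (RuntimeException e) {
      attempt = CompletableFuture.failedFuture(e);
    }
    attempt.whenComplete((reply, err) -> {
      if (err != null) {
        cache.remove(id, placeholder); // failed attempts are not cached
        placeholder.completeExceptionally(err);
      } else {
        placeholder.complete(reply);
      }
    });
    return placeholder;
  }

  public static void main(String[] args) {
    SimpleRetryCache cache = new SimpleRetryCache();
    AtomicInteger executions = new AtomicInteger();
    InvocationId id = new InvocationId(UUID.randomUUID(), 1L);
    Supplier<CompletableFuture<String>> request =
        () -> CompletableFuture.completedFuture("reply-" + executions.incrementAndGet());
    System.out.println(cache.submit(id, request).join()); // reply-1
    System.out.println(cache.submit(id, request).join()); // reply-1 (served from the cache)
  }
}

/** Hypothetical key mirroring the idea of ClientInvocationId. */
record InvocationId(UUID clientId, long callId) {}
```

The placeholder is registered with putIfAbsent before the request runs. Because of this, two concurrent copies of the same call cannot both execute.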

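Raft Peer Role Management

In the Raft protocol, every node plays one of three roles: Leader, Candidate, or Follower. Ratis implements and manages the role state machine and its transitions in the class org.apache.ratis.server.impl.RoleInfo. The corresponding methods are as follows: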

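1. Follower role

1. In RaftServerImpl, the method changeToFollower moves the role state machine into the Follower role. It shuts down the state of the previous role (the Leader's heartbeat daemon thread or the Candidate's election thread) and then starts the Follower state machine. The Follower state is implemented in FollowerState.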
private synchronized boolean changeToFollower(long newTerm, boolean force, Object reason) {
  final RaftPeerRole old = role.getCurrentRole();
  final boolean metadataUpdated = state.updateCurrentTerm(newTerm);

  if (old != RaftPeerRole.FOLLOWER || force) {
    setRole(RaftPeerRole.FOLLOWER, reason);
    if (old == RaftPeerRole.LEADER) {
      role.shutdownLeaderState(false);
    } else if (old == RaftPeerRole.CANDIDATE) {
      role.shutdownLeaderElection();
    } else if (old == RaftPeerRole.FOLLOWER) {
      role.shutdownFollowerState();
    }
    role.startFollowerState(this, reason);
  }
  return metadataUpdated;
}

2. Election timeout: minRpcTimeoutMs and maxRpcTimeoutMs can be configured, and Ratis uses a random value in the range [minRpcTimeoutMs, maxRpcTimeoutMs] as the election timeout:

TimeDuration getRandomElectionTimeout() {
  final int min = properties().minRpcTimeoutMs();
  final int millis = min + ThreadLocalRandom.current().nextInt(properties().maxRpcTimeoutMs() - min + 1);
  return TimeDuration.valueOf(millis, TimeUnit.MILLISECONDS);
}

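3. The Follower records the time of the last RPC it received. If more than an election timeout has elapsed and the election conditions are met, it starts an election by calling RaftServerImpl.changeToCandidate to enter the Candidate role: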

while (isRunning && server.getInfo().isFollower()) {
  final TimeDuration electionTimeout = server.getRandomElectionTimeout();
  final TimeDuration extraSleep = electionTimeout.sleep();
  final boolean isFollower = server.getInfo().isFollower();

  synchronized (server) {
    if (outstandingOp.get() == 0
        && isRunning
        && lastRpcTime.elapsedTime().compareTo(electionTimeout) >= 0
        && !lostMajorityHeartbeatsRecently()) {
      server.changeToCandidate(false);
      break;
    }
  }
}
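Below is a stripped-down version of the follower's timer logic. It draws a random timeout in [min, max] and reports when no leader RPC has arrived for that long. FollowerTimer and its method names are hypothetical. The sketch omits the outstandingOp and lostMajorityHeartbeatsRecently() checks of the real loop. Note that nextInt(max - min + 1) returns a value in [0, max - min], which is why the interval is inclusive on both ends.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

/** Hypothetical follower election timer, without any Ratis types. */
final class FollowerTimer {
  private final int minMs;
  private final int maxMs;
  private volatile long lastRpcNanos = System.nanoTime();

  FollowerTimer(int minMs, int maxMs) {
    if (minMs <= 0 || minMs > maxMs) {
      throw new IllegalArgumentException("need 0 < minMs <= maxMs");
    }
    this.minMs = minMs;
    this.maxMs = maxMs;
  }

  /** Uniform in [minMs, maxMs]: nextInt(bound) returns 0..bound-1. */
  long randomElectionTimeoutMs() {
    return minMs + ThreadLocalRandom.current().nextInt(maxMs - minMs + 1);
  }

  /** Called on every valid AppendEntries/heartbeat from the current leader. */
  void onRpcFromLeader() {
    lastRpcNanos = System.nanoTime();
  }

  /** True once no leader RPC has been received for at least timeoutMs. */
  boolean electionTimedOut(long timeoutMs) {
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - lastRpcNanos);
    return elapsedMs >= timeoutMs;
  }

  public static void main(String[] args) throws InterruptedException {
    FollowerTimer timer = new FollowerTimer(150, 300);
    while (true) {
      long timeout = timer.randomElectionTimeoutMs(); // a fresh random timeout every round
      Thread.sleep(timeout);                          // no leader sends heartbeats in this demo
      if (timer.electionTimedOut(timeout)) {
        System.out.println("no heartbeat for " + timeout + " ms: change to candidate");
        break;
      }
    }
  }
}
```

Re-drawing the timeout on every round, as the loop above also does, makes it unlikely that several followers time out at the same moment.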

2. Candidate role

1. In RaftServerImpl, the method changeToCandidate moves the role state machine into the Candidate role and starts the election. The Candidate state is implemented in the LeaderElection class.

  synchronized void changeToCandidate(boolean forceStartLeaderElection) {
    Preconditions.assertTrue(getInfo().isFollower());
    role.shutdownFollowerState();
    setRole(RaftPeerRole.CANDIDATE, "changeToCandidate");
    if (state.shouldNotifyExtendedNoLeader()) {
      stateMachine.followerEvent().notifyExtendedNoLeader(getRoleInfoProto());
    }
    // start election
    role.startLeaderElection(this, forceStartLeaderElection);
  }

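2. Leader election runs in a background daemon, in LeaderElection.run(). It implements the leader election procedure with two optimizations: Pre-Vote and priority-based election. Users can disable both through configuration.

  • Pre-Vote prevents frequent elections under network partitions. Before the real election, the candidate runs a Pre-Vote round. Only if the Pre-Vote succeeds does it increment its term and start the real election.
  • Priority election prevents multiple candidates from splitting the vote. Every Raft peer is assigned a priority. When candidates with different priorities run at the same time, the higher-priority candidate forces the lower-priority ones back to Follower.

The election proceeds as follows. First a Pre-Vote round is run; if it succeeds, the real election follows; once elected, the server enters the Leader role. To pass a round, the candidate must win votes from a majority and also from every peer whose priority is higher than its own.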

if (skipPreVote || askForVotes(Phase.PRE_VOTE)) {
  if (askForVotes(Phase.ELECTION)) {
    server.changeToLeader();
  }
}

Here, askForVotes sends a RequestVote RPC to every peer in the Raft group and collects the election result:

final TermIndex lastEntry = server.getState().getLastEntry();
final Executor voteExecutor = new Executor(this, others.size());
try {
  final int submitted = submitRequests(phase, electionTerm, lastEntry, others, voteExecutor);
  r = waitForResults(phase, electionTerm, submitted, conf, voteExecutor);
} finally {
  voteExecutor.shutdown();
}

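submitRequests sends the RPC requests and uses a fixed-size thread pool for the network work. waitForResults waits for replies until the next election timeout. It then returns the result of this round according to whether the candidate won (majority plus priority). If it won, RaftServerImpl.changeToLeader is called to enter the Leader role.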
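The win condition described above, a majority plus every higher-priority peer, can be expressed as a small pure function. ElectionTally is a hypothetical helper in which peers are plain String ids and priorities are a Map. The candidate is assumed to vote for itself. The real waitForResults also handles rejections, timeouts and term changes, which are left out here.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical tally of one election round: majority AND all higher-priority peers. */
final class ElectionTally {
  /**
   * @param priorities priority of every voting peer in the configuration, candidate included
   * @param candidate  id of the candidate running this round
   * @param granted    ids of the other peers that granted their vote
   */
  static boolean wins(Map<String, Integer> priorities, String candidate, Set<String> granted) {
    int votes = 1; // the candidate votes for itself
    for (String peer : granted) {
      if (!peer.equals(candidate) && priorities.containsKey(peer)) {
        votes++;
      }
    }
    boolean majority = votes > priorities.size() / 2;
    int own = priorities.get(candidate);
    boolean higherPrioritiesAgree = priorities.entrySet().stream()
        .filter(e -> e.getValue() > own)
        .allMatch(e -> granted.contains(e.getKey()));
    return majority && higherPrioritiesAgree;
  }

  public static void main(String[] args) {
    Map<String, Integer> priorities = Map.of("s1", 0, "s2", 0, "s3", 1);
    System.out.println(wins(priorities, "s1", Set.of("s2"))); // false: s3 has higher priority
    System.out.println(wins(priorities, "s1", Set.of("s3"))); // true
  }
}
```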

3. Leader role

After winning the election, RaftServerImpl.changeToLeader switches the role state machine to the Leader role. The Leader state is implemented in LeaderState:

synchronized void changeToLeader() {
  Preconditions.assertTrue(getInfo().isCandidate());
  role.shutdownLeaderElection();
  setRole(RaftPeerRole.LEADER, "changeToLeader");
  state.becomeLeader();

  // start sending AppendEntries RPC to followers
  final LogEntryProto e = role.startLeaderState(this);
  getState().setRaftConf(e);
}

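After becoming leader, the server first appends and replicates a placeholder entry in the new term. As the code comment below notes, Ratis uses a configuration entry for this, which plays the role of the no-op entry in the Raft paper. Committing it also commits every entry from previous terms, and it identifies the last committed index and the current configuration. start() returns this entry to changeToLeader, which sets it as the current configuration.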

LogEntryProto start() {
  // In the beginning of the new term, replicate a conf entry in order
  // to finally commit entries in the previous term.
  // Also this message can help identify the last committed index and the conf.
  final LogEntryProto placeHolder = LogProtoUtils.toLogEntryProto(
    server.getRaftConf(), server.getState().getCurrentTerm(), raftLog.getNextIndex());
  CodeInjectionForTesting.execute(APPEND_PLACEHOLDER,
                                  server.getId().toString(), null);
  raftLog.append(placeHolder);
  processor.start();
  senders.forEach(LogAppender::start);
  return placeHolder;
}
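Why the placeholder matters follows from the commit rule in the Raft paper (Section 5.4.2). A leader only advances its commit index to an entry of its own term that is stored on a majority; entries from earlier terms are then committed indirectly. The sketch below restates that rule with hypothetical names and plain arrays, where termAt[i] is the term of log index i and index 0 is unused. It is not taken from Ratis.

```java
import java.util.Arrays;

/** Hypothetical implementation of Raft's leader commit rule (paper, Section 5.4.2). */
final class CommitRule {
  /**
   * @param termAt     termAt[i] = term of the entry at log index i (index 0 unused)
   * @param matchIndex highest index known to be stored on each voting server, leader included
   */
  static long newCommitIndex(long[] termAt, long[] matchIndex, long currentTerm, long oldCommit) {
    long[] sorted = matchIndex.clone();
    Arrays.sort(sorted);
    // a majority of servers have matchIndex >= this value
    long majorityIndex = sorted[(sorted.length - 1) / 2];
    // only current-term entries are committed by counting replicas; since they sit at the
    // tail of the leader's log, checking majorityIndex alone is enough
    if (majorityIndex > oldCommit && termAt[(int) majorityIndex] == currentTerm) {
      return majorityIndex;
    }
    return oldCommit;
  }

  public static void main(String[] args) {
    long[] terms = {0, 1, 1, 2, 2};             // entries 1..4 from older terms
    long[] match = {4, 4, 1};                   // leader (term 3) and two followers
    System.out.println(newCommitIndex(terms, match, 3, 2)); // 2: index 4 is from term 2

    long[] withPlaceholder = {0, 1, 1, 2, 2, 3}; // placeholder appended at index 5 in term 3
    long[] match2 = {5, 5, 1};
    System.out.println(newCommitIndex(withPlaceholder, match2, 3, 2)); // 5: commits 3..5 too
  }
}
```

Without a current-term entry, the first case would stay stuck at commit index 2 until a client happened to write something. The placeholder removes that dependency.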

Next, a background LogAppender thread (LogAppenderDefault.run()) is started for every follower. It sends log entries and snapshots to that follower so that the follower's state catches up with the leader's.
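The catch-up work of a LogAppender rests on Raft's consistency check. The leader keeps a nextIndex per follower and backs it off until the follower's log matches its own. The sketch below shows only this basic back-off from the Raft paper, using hypothetical names and arrays indexed from 1. It does not model snapshot installation, batching, or any faster back-off Ratis may use.

```java
/** Hypothetical illustration of the leader's nextIndex back-off for one follower. */
final class AppenderSketch {
  /** Follower side: accept iff its log has an entry at prevIndex with prevTerm (index 0 = empty prefix). */
  static boolean followerAccepts(long[] followerTerms, int prevIndex, long prevTerm) {
    if (prevIndex == 0) {
      return true;
    }
    return prevIndex < followerTerms.length && followerTerms[prevIndex] == prevTerm;
  }

  /** Leader side: returns the matchIndex; entries after it would then be sent to the follower. */
  static int findMatchIndex(long[] leaderTerms, long[] followerTerms) {
    int nextIndex = leaderTerms.length; // start optimistic: one past the leader's last index
    while (true) {
      int prevIndex = nextIndex - 1;
      long prevTerm = prevIndex == 0 ? 0 : leaderTerms[prevIndex];
      if (followerAccepts(followerTerms, prevIndex, prevTerm)) {
        return prevIndex;
      }
      nextIndex--; // rejected: step back one entry and try again
    }
  }

  public static void main(String[] args) {
    long[] leader = {0, 1, 1, 2, 3, 3};  // index 0 unused
    long[] follower = {0, 1, 1, 2, 2};   // diverged at index 4
    System.out.println(findMatchIndex(leader, follower)); // 3: resend entries 4 and 5
  }
}
```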

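Raft Log Management

Apache Ratis implements two log strategies: an in-memory log and a segmented on-disk log. The in-memory log is not recommended and exists mostly for testing, so this section focuses on SegmentedRaftLog.

The log layer mainly implements two primitives, append and get. RaftServer hands entries to the log layer via append. The StateMachineUpdater thread and the leader's LogAppender threads read the latest entries via get.

1. Overall structure

SegmentedRaftLog stores the log on local disk as segment files, and each segment file holds multiple log entries. A segment file is limited to 8MB, and so is a single log entry.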

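When the entries in a segment reach the 8MB limit, the segment file is closed and a new segment file is opened for further writes. A closed segment file cannot be modified; it can only be truncated when it conflicts with the leader's log.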
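The following toy model of the roll-over rule assumes that a segment counts as "full" when the next entry would push it past the size limit. The real isSegmentFull check may take more than size into account. SegmentRoller and its fields are hypothetical. No file IO happens: the model only tracks index ranges and sizes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of segment rolling: fill the open segment, close it when the next entry would not fit. */
final class SegmentRoller {
  static final long DEFAULT_MAX_SEGMENT_BYTES = 8L * 1024 * 1024; // the 8MB limit described above

  static final class Segment {
    final long startIndex;
    long endIndex;       // inclusive; startIndex - 1 while empty
    long sizeBytes;
    boolean isOpen = true;

    Segment(long startIndex) {
      this.startIndex = startIndex;
      this.endIndex = startIndex - 1;
    }
  }

  private final List<Segment> segments = new ArrayList<>();
  private final long maxBytes;

  SegmentRoller(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  void append(long index, long entryBytes) {
    if (entryBytes > maxBytes) {
      throw new IllegalArgumentException("entry of " + entryBytes + " bytes exceeds the segment limit");
    }
    Segment current = segments.isEmpty() ? null : segments.get(segments.size() - 1);
    if (current == null) {
      current = openNew(index);                        // no segment yet: start the first one
    } else if (current.sizeBytes + entryBytes > maxBytes) {
      current.isOpen = false;                          // roll: the full segment becomes immutable
      current = openNew(index);
    }
    current.endIndex = index;
    current.sizeBytes += entryBytes;
  }

  private Segment openNew(long startIndex) {
    Segment s = new Segment(startIndex);
    segments.add(s);
    return s;
  }

  List<Segment> segments() {
    return segments;
  }

  public static void main(String[] args) {
    SegmentRoller log = new SegmentRoller(DEFAULT_MAX_SEGMENT_BYTES);
    for (long i = 1; i <= 5; i++) {
      log.append(i, 3L * 1024 * 1024); // 3MB entries: two fit in one 8MB segment
    }
    for (Segment s : log.segments()) {
      System.out.println("[" + s.startIndex + ", " + s.endIndex + "] open=" + s.isOpen);
    }
  }
}
```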

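Ratis provides two caches to improve performance:

  • A cache of the currently open segment, held in memory as a LogSegment. The current strategy simply keeps the whole open segment cached in memory until it is full and rolled.
  • A cache of Raft entries, which keeps entries read from disk in memory.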

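2. Appending new entries

The leader appends new entries to the RaftLog in the method appendImpl. The method first acquires the log's writeLock and then:

  • before appending an entry whose content is a TransactionContext, lets the TransactionContext run its preAppendTransaction hook for any extra logic;
  • submits an asynchronous appendEntry task through RaftLogSequentialOps.Runner, which guarantees that such tasks run sequentially;
  • waits for the asynchronous task to complete and returns the index to the upper-layer StateMachine.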
private long appendImpl(long term, TransactionContext operation) throws StateMachineException {
  checkLogState();
  try(AutoCloseableLock writeLock = writeLock()) {
    final long nextIndex = getNextIndex();

    // This is called here to guarantee strict serialization of callback executions in case
    // the SM wants to attach a logic depending on ordered execution in the log commit order.
    operation = operation.preAppendTransaction();
    // build the log entry after calling the StateMachine
    final LogEntryProto e = operation.initLogEntry(term, nextIndex);

    appendEntry(e);
    return nextIndex;
  }
}
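The combination of "assign the index under the write lock" and "let a runner execute the writes one by one" can be imitated with a lock plus a single-threaded executor, whose tasks run in submission order. This is a sketch under those assumptions, not how RaftLogSequentialOps.Runner is written. SequentialLog is hypothetical, and an in-memory list stands in for the disk write.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical log whose appends get consecutive indices and are persisted in index order. */
final class SequentialLog implements AutoCloseable {
  private final ReentrantLock writeLock = new ReentrantLock();
  private final ExecutorService writer = Executors.newSingleThreadExecutor(); // FIFO, one task at a time
  private final List<String> persisted = Collections.synchronizedList(new ArrayList<>());
  private long nextIndex = 0;

  CompletableFuture<Long> append(String entry) {
    writeLock.lock();
    try {
      final long index = nextIndex++;
      // submitting while still holding the lock keeps submission order equal to index order
      return CompletableFuture.supplyAsync(() -> {
        persisted.add(entry); // stand-in for writing the entry to the open segment
        return index;
      }, writer);
    } finally {
      writeLock.unlock();
    }
  }

  List<String> persisted() {
    return persisted;
  }

  @Override
  public void close() {
    writer.shutdown();
  }

  public static void main(String[] args) {
    try (SequentialLog log = new SequentialLog()) {
      CompletableFuture<Long> a = log.append("x=1");
      CompletableFuture<Long> b = log.append("x=2");
      System.out.println(a.join() + " " + b.join() + " " + log.persisted()); // 0 1 [x=1, x=2]
    }
  }
}
```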

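The asynchronous appendEntry write works as follows:

  • The entry is handed to the log worker (fileLogWorker.writeLogEntry), which writes it to the open segment. If there is no open segment, or the current one is full, a new segment is started first.
  • The Log Entry Cache is then updated directly so that the newly written entry is cached. These two steps must not be swapped, otherwise the cache can become inconsistent (see the comment in the code).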
protected CompletableFuture<Long> appendEntryImpl(LogEntryProto entry) {
  try(AutoCloseableLock writeLock = writeLock()) {
    validateLogEntry(entry);
    // find the in-memory buffer of the currently open segment
    final LogSegment currentOpenSegment = cache.getOpenSegment();
    if (currentOpenSegment == null) {
      // no open segment yet: create one starting at this entry's index
      cache.addOpenSegment(entry.getIndex());
      fileLogWorker.startLogSegment(entry.getIndex()); // ask the IO worker to create the new segment file
    } else if (isSegmentFull(currentOpenSegment, entry)) {
      // the current segment is full: flush and close it, then start a new segment
      cache.rollOpenSegment(true);
      fileLogWorker.rollLogSegment(currentOpenSegment);
    }

    // If the entry has state machine data, then the entry should be inserted
    // to statemachine first and then to the cache. Not following the order
    // will leave a spurious entry in the cache.
    CompletableFuture<Long> writeFuture =
      fileLogWorker.writeLogEntry(entry).getFuture();
    cache.appendEntry(entry, LogSegment.Op.WRITE_CACHE_WITHOUT_STATE_MACHINE_CACHE);

    return writeFuture;
  } 
}

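3. Reading the entry at an index

StateMachineUpdater calls get to obtain committed entries and apply them to the upper-layer StateMachine. The leader's LogAppender threads also call get, to read new entries that need to be replicated to the followers.

Reads go to the Log Entry Cache first; the entry is read from disk only on a cache miss: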

public LogEntryProto get(long index) throws RaftLogIOException {
  checkLogState();
  final LogSegment segment;
  final LogRecord record;
  try (AutoCloseableLock readLock = readLock()) {
    segment = cache.getSegment(index);
    if (segment == null) {
      return null;
    }
    record = segment.getLogRecord(index);
    if (record == null) {
      return null;
    }
    final LogEntryProto entry = segment.getEntryFromCache(record.getTermIndex());
    if (entry != null) {
      return entry;
    }
  }
  // the entry is not in the segment's cache. Load the cache without holding the lock.
  return segment.loadCache(record);
}
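The read path above resembles a read-through LRU cache: it checks the cache under the read lock and, on a miss, loads from disk without holding the lock. Here is a minimal sketch of that idea. EntryCache, its capacity-based LRU eviction, and the diskReader function are hypothetical, and Ratis' own eviction policy may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Hypothetical read-through LRU cache for log entries keyed by index. */
final class EntryCache<V> {
  private final LinkedHashMap<Long, V> lru;
  private final LongFunction<V> diskReader; // stand-in for reading a record from the segment file

  EntryCache(int capacity, LongFunction<V> diskReader) {
    this.diskReader = diskReader;
    // accessOrder = true plus removeEldestEntry gives least-recently-used eviction
    this.lru = new LinkedHashMap<Long, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Long, V> eldest) {
        return size() > capacity;
      }
    };
  }

  V get(long index) {
    synchronized (this) {
      V cached = lru.get(index);
      if (cached != null) {
        return cached; // cache hit
      }
    }
    // cache miss: do the slow read without holding the lock
    V loaded = diskReader.apply(index);
    if (loaded != null) {
      synchronized (this) {
        lru.put(index, loaded);
      }
    }
    return loaded;
  }

  public static void main(String[] args) {
    EntryCache<String> cache = new EntryCache<>(2, i -> {
      System.out.println("disk read " + i);
      return "entry-" + i;
    });
    cache.get(1); // prints "disk read 1"
    cache.get(1); // served from the cache, no disk read
  }
}
```

Loading outside the lock lets other readers proceed during a slow disk read. The cost is that two concurrent misses on the same index may both read it from disk.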

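Apply and Snapshot Management: StateMachineUpdater

This dedicated thread continuously applies newly committed log entries to the upper-layer StateMachine. Depending on the state of the log, StateMachineUpdater also takes snapshots, and after a successful snapshot it purges the log entries that are no longer needed. Its work loop can be simplified as follows: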

for(; state != State.STOP; ) {
  waitForCommit();
  final MemoizedSupplier<List<CompletableFuture<Message>>> futures = applyLog();
  checkAndTakeSnapshot(futures);
}
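The waitForCommit/applyLog loop is essentially a consumer waiting on a monotonically increasing commit index. Below is a minimal monitor-based sketch. Its names are hypothetical: UpdaterSketch, notifyCommitted, and a LongConsumer that applies entry i. Unlike a real updater it has no snapshot step. It also drains the entries that are already committed before exiting on stop, which is a design choice of this sketch.

```java
import java.util.function.LongConsumer;

/** Hypothetical apply loop: waits for the commit index to move, then applies entries in order. */
final class UpdaterSketch implements Runnable {
  private final LongConsumer applier; // applies log entry i to the state machine
  private long committedIndex = 0;    // advanced by the Raft layer (guarded by this)
  private long appliedIndex = 0;      // only touched by the updater thread
  private boolean stopped = false;    // guarded by this

  UpdaterSketch(LongConsumer applier) {
    this.applier = applier;
  }

  synchronized void notifyCommitted(long index) {
    if (index > committedIndex) {
      committedIndex = index;
      notifyAll(); // wake the updater
    }
  }

  synchronized void stop() {
    stopped = true;
    notifyAll();
  }

  /** Blocks until there is something to apply; returns -1 once stopped and drained. */
  private synchronized long waitForCommit() throws InterruptedException {
    while (!stopped && committedIndex <= appliedIndex) {
      wait();
    }
    return committedIndex > appliedIndex ? committedIndex : -1;
  }

  @Override
  public void run() {
    try {
      for (long committed; (committed = waitForCommit()) >= 0; ) {
        while (appliedIndex < committed) {
          applier.accept(appliedIndex + 1); // strictly in log order, outside the lock
          appliedIndex++;
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    UpdaterSketch updater = new UpdaterSketch(i -> System.out.println("apply " + i));
    Thread t = new Thread(updater, "updater");
    t.start();
    updater.notifyCommitted(3); // entries 1..3 become committed
    updater.stop();
    t.join();
  }
}
```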

1. Apply Log

applyLog hands the committed log entries to the upper-layer StateMachine:

private MemoizedSupplier<List<CompletableFuture<Message>>> applyLog() throws RaftLogIOException {
  final MemoizedSupplier<List<CompletableFuture<Message>>> futures = MemoizedSupplier.valueOf(ArrayList::new);
  final long committed = raftLog.getLastCommittedIndex();
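  for(long applied; (applied = getLastAppliedIndex()) < committed && state == State.RUNNING && !shouldStop(); ) {
    final long nextIndex = applied + 1;
    final LogEntryProto next = raftLog.get(nextIndex);
    if (next == null) {
      break; // the entry is not in the log (e.g. a snapshot has to be loaded first)
    }
    final CompletableFuture<Message> f = server.applyLogToStateMachine(next);
    if (f != null) {
      futures.get().add(f); // collect the futures so that snapshotting can wait for them
    }
    appliedIndex.incrementAndGet(debugIndexChange);
  }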
  return futures;
}

2. Snapshot

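The snapshot mechanism starts when the number of log entries since the last snapshot exceeds 400,000 (the default auto-trigger threshold), or when the user triggers it manually. It calls the upper-layer StateMachine's takeSnapshot(). Once the state machine has created and persisted the snapshot, the last log index covered by the snapshot is returned to the underlying RaftLog. The RaftLog can then run a purge task to delete the entries included in the snapshot.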

private void takeSnapshot() {
  final long i;
  i = stateMachine.takeSnapshot();
  takeSnapshotTimerContext.stop();
  server.getSnapshotRequestHandler().completeTakingSnapshot(i);
  stateMachine.getStateMachineStorage().cleanupOldSnapshots(snapshotRetentionPolicy);

  snapshotIndex.updateIncreasingly(i, infoIndexChange);
  final long purgeIndex = i;
  raftLog.purge(purgeIndex);  
}
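The trigger-then-purge bookkeeping reduces to a few lines. SnapshotPolicy is a hypothetical helper. It assumes that log indices start at 0, that the threshold counts entries applied since the last snapshot, and that everything up to the snapshot index may be purged immediately. A real implementation may retain some extra entries.

```java
/** Hypothetical snapshot trigger and purge bookkeeping. */
final class SnapshotPolicy {
  private final long autoTriggerThreshold; // e.g. 400_000 entries, as described above
  private long snapshotIndex = -1;         // last log index covered by a snapshot
  private long purgedUpTo = -1;            // log entries with index <= this were deleted

  SnapshotPolicy(long autoTriggerThreshold) {
    this.autoTriggerThreshold = autoTriggerThreshold;
  }

  /** Checked by the apply loop after each batch of applied entries. */
  boolean shouldTakeSnapshot(long appliedIndex) {
    return appliedIndex - snapshotIndex >= autoTriggerThreshold;
  }

  /** Called with the index returned by StateMachine.takeSnapshot(). */
  void onSnapshotTaken(long lastIncludedIndex) {
    if (lastIncludedIndex <= snapshotIndex) {
      return; // only move forward, never back
    }
    snapshotIndex = lastIncludedIndex;
    purgedUpTo = lastIncludedIndex; // these entries are now covered by the snapshot
  }

  long snapshotIndex() {
    return snapshotIndex;
  }

  long purgedUpTo() {
    return purgedUpTo;
  }

  public static void main(String[] args) {
    SnapshotPolicy policy = new SnapshotPolicy(400_000);
    System.out.println(policy.shouldTakeSnapshot(399_998)); // false
    System.out.println(policy.shouldTakeSnapshot(399_999)); // true: 400,000 entries (0..399,999)
  }
}
```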

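Building an Upper-Layer Application: StateMachine

To build your own application on top of Ratis, for example a KV storage service, you need to implement the basic operations of the StateMachine interface, listed below.

1. DataApi

By default, both an operation and its data are written to the RaftLog as log entries. For data-intensive applications this can copy the data several times, so DataApi is exposed to manage operations and data separately.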

interface DataApi {
  DataApi DEFAULT = new DataApi() {};
  // every method has a default implementation in Ratis; the bodies are omitted here
  default CompletableFuture<ByteString> read(LogEntryProto entry) { ... }
  default CompletableFuture<?> write(LogEntryProto entry) { ... }
  default CompletableFuture<DataStream> stream(RaftClientRequest request) { ... }
  default CompletableFuture<?> link(DataStream stream, LogEntryProto entry) { ... }
  default CompletableFuture<Void> flush(long logIndex) { ... }
  default CompletableFuture<Void> truncate(long logIndex) { ... }
}

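FileStore provides a concrete example; its state machine stores a filename -> file content mapping. For a write, the plain Raft approach first commits the content to the log, then reads it back from the log into memory at apply time, and finally writes it to the corresponding file. This copies the data several times and lowers IO efficiency. With DataApi, the data can be written to the file directly before commit, and then a log entry without the data is committed. The upper-layer logic of the StateMachine is responsible for the consistency and durability of the data written ahead of time.

Issue: https://issues.apache.org/jira/browse/RATIS-122?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=16235110#comment-16235110

The FileStore example overrides startTransaction so that, before the log entry is committed, a write operation is turned into a write + commit operation. It also overrides write, so that the file content is written to the file when the log entry is written. Finally, the write is committed at apply time.

startTransaction();              // turn the write request into a write + commit transaction

writeLog() {
   stateMachine.write();         // DataApi.write: write the file content to the file first
   writeRaftLog();               // then append the log entry (without the file content)
}

apply() {
   commitStateMachineWrite();    // at applyTransaction time: commit the data written earlier
}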
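The following in-memory stand-in for FileStore illustrates the write-early, commit-at-apply flow. TwoPhaseFileStore is hypothetical. Data is staged by log index when the entry is written (the DataApi.write step), becomes visible only when the entry is applied, and is discarded if the log is truncated. The real FileStore writes to actual files, which this sketch does not model.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical two-phase store: stage data at log-write time, publish it at apply time. */
final class TwoPhaseFileStore {
  private final TreeMap<Long, Map.Entry<String, byte[]>> staged = new TreeMap<>(); // logIndex -> (file, data)
  private final Map<String, byte[]> committed = new HashMap<>();

  /** Like DataApi.write: the bulky data goes here; the Raft entry itself only carries metadata. */
  synchronized void write(long logIndex, String file, byte[] data) {
    staged.put(logIndex, Map.entry(file, data.clone()));
  }

  /** Like applyTransaction: the committed entry says "publish what was staged at this index". */
  synchronized void apply(long logIndex) {
    Map.Entry<String, byte[]> e = staged.remove(logIndex);
    if (e == null) {
      throw new IllegalStateException("nothing staged at index " + logIndex);
    }
    committed.put(e.getKey(), e.getValue());
  }

  /** Like DataApi.truncate: drop staged data for entries removed by a log conflict. */
  synchronized void truncate(long fromIndex) {
    staged.tailMap(fromIndex, true).clear();
  }

  synchronized byte[] read(String file) {
    byte[] data = committed.get(file);
    return data == null ? null : data.clone();
  }

  public static void main(String[] args) {
    TwoPhaseFileStore store = new TwoPhaseFileStore();
    store.write(1, "a.txt", "hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(store.read("a.txt") == null); // true: written but not yet applied
    store.apply(1);
    System.out.println(new String(store.read("a.txt"), StandardCharsets.UTF_8)); // hello
  }
}
```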

2. EventApi

EventApi provides hook methods that notify the upper-layer StateMachine when the underlying Raft state changes (for example, when the leader changes):

interface EventApi {
  EventApi DEFAULT = new EventApi() {};
  default void notifyLeaderChanged(RaftGroupMemberId groupMemberId, RaftPeerId newLeaderId) {}
  default void notifyTermIndexUpdated(long term, long index) {}
  default void notifyConfigurationChanged(long term, long index, RaftConfigurationProto newRaftConfiguration) {}
  default void notifyGroupRemove() {}
  default void notifyLogFailed(Throwable cause, LogEntryProto failedEntry) {}
}

3. LeaderEventApi

LeaderEventApi provides hook methods that notify the upper-layer SM of special events while the current peer is the leader:

interface LeaderEventApi {
  LeaderEventApi DEFAULT = new LeaderEventApi() {};
  default void notifyFollowerSlowness(RoleInfoProto roleInfoProto) {}
  default void notifyNotLeader(Collection<TransactionContext> pendingEntries){}
}

4. FollowerEventApi

FollowerEventApi provides hook methods that notify the upper-layer SM of special events while the current peer is a follower:

interface FollowerEventApi {
  FollowerEventApi DEFAULT = new FollowerEventApi() {};
  default void notifyExtendedNoLeader(RoleInfoProto roleInfoProto) {}
  default CompletableFuture<TermIndex> notifyInstallSnapshotFromLeader(
    RoleInfoProto roleInfoProto, TermIndex firstTermIndexInLog) { ... } // default body omitted
}

5. Lifecycle interface

void initialize(RaftServer raftServer, RaftGroupId raftGroupId, RaftStorage storage);
LifeCycle.State getLifeCycleState();
void pause();
void reinitialize() throws IOException;

6. Snapshot interface

SnapshotInfo getLatestSnapshot();
void cleanupOldSnapshots(SnapshotRetentionPolicy snapshotRetentionPolicy) throws IOException;
long takeSnapshot() throws IOException;

7. State machine query interface

CompletableFuture<Message> query(Message request);
CompletableFuture<Message> queryStale(Message request, long minIndex);

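8. State machine update interface

Note that applyTransaction is the interface through which committed log entries are delivered to the state machine. It is always called in log order (linearizable semantics). The upper-layer StateMachine, however, decides how each transaction is executed, so it can run work asynchronously or in parallel where appropriate to improve efficiency.

// converts a client request into a TransactionContext
TransactionContext startTransaction(RaftClientRequest request) throws IOException;
// extra logic to run before the entry is appended to the log
TransactionContext preAppendTransaction(TransactionContext trx) throws IOException;
// tells the state machine that the transaction failed
TransactionContext cancelTransaction(TransactionContext trx) throws IOException;
TransactionContext applyTransactionSerial(TransactionContext trx) throws InvalidProtocolBufferException;
// applies committed transactions in log order; the SM decides how each trx is executed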
CompletableFuture<Message> applyTransaction(TransactionContext trx);
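One way a state machine can use this freedom is to accept transactions in log order but execute them asynchronously. Operations on the same key are chained so they keep log order, while operations on different keys run in parallel. This is only one possible design, not something Ratis prescribes. KeyedApplier, its key/value model, and the pool size are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical applier: called in log order, executes per key in order, across keys in parallel. */
final class KeyedApplier implements AutoCloseable {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final Map<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>(); // last op per key
  private final Map<String, String> store = new ConcurrentHashMap<>();

  /** Must be invoked in log order, like applyTransaction; returns without waiting for execution. */
  CompletableFuture<String> apply(String key, String suffix) {
    CompletableFuture<String> result = new CompletableFuture<>();
    tails.compute(key, (k, tail) -> {
      CompletableFuture<Void> previous = tail == null ? CompletableFuture.completedFuture(null) : tail;
      // run after the previous operation on the same key has finished
      return previous.thenRunAsync(() -> result.complete(store.merge(k, suffix, String::concat)), pool);
    });
    return result;
  }

  @Override
  public void close() {
    pool.shutdown();
  }

  public static void main(String[] args) {
    try (KeyedApplier applier = new KeyedApplier()) {
      applier.apply("x", "a");
      applier.apply("y", "1");
      System.out.println(applier.apply("x", "b").join()); // ab
    }
  }
}
```

For brevity the tails map is never cleaned up. A long-running implementation would remove a key's tail once it completes and no newer operation has replaced it.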

9. Example: CounterStateMachine

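CounterStateMachine implements a very simple application state machine: it manages an integer counter and accepts two operations, Get and Increment.

Get: override the query interface

public CompletableFuture<Message> query(Message request) {
  String msg = request.getContent().toString(Charset.defaultCharset());
  if (!"GET".equals(msg)) {
    // reject unknown commands instead of failing an assertion
    return CompletableFuture.completedFuture(Message.valueOf("Invalid Command"));
  }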
  return CompletableFuture.completedFuture(
    Message.valueOf(counter.toString()));
}

Increment: override the applyTransaction interface

public CompletableFuture<Message> applyTransaction(TransactionContext trx) {
  final RaftProtos.LogEntryProto entry = trx.getLogEntry();

  //update the last applied term and index; this is done even for an invalid
  //command, because the entry still occupies a log index
  final long index = entry.getIndex();
  updateLastAppliedTermIndex(entry.getTerm(), index);

  //check if the command is valid
  String logData = entry.getStateMachineLogEntry().getLogData()
    .toString(Charset.defaultCharset());
  if (!"INCREMENT".equals(logData)) {
    return CompletableFuture.completedFuture(Message.valueOf("Invalid Command"));
  }

  //actual execution of the command: increment the counter
  counter.incrementAndGet();

  //return the new value of the counter to the client
  final CompletableFuture<Message> f =
    CompletableFuture.completedFuture(Message.valueOf(counter.toString()));

  return f;
}

Snapshot interface:

public long takeSnapshot() throws IOException {
  //get the last applied index
  final TermIndex last = getLastAppliedTermIndex();

  //create a file with a proper name to store the snapshot
  final File snapshotFile =
    storage.getSnapshotFile(last.getTerm(), last.getIndex());

  //serialize the counter object and write it into the snapshot file
  try (ObjectOutputStream out = new ObjectOutputStream(
    new BufferedOutputStream(new FileOutputStream(snapshotFile)))) {
    out.writeObject(counter);
  } catch (IOException ioe) {
    LOG.warn("Failed to write snapshot file \"" + snapshotFile
             + "\", last applied index=" + last);
    //rethrow: returning last.getIndex() here would let the log be purged
    //although no snapshot covers those entries
    throw ioe;
  }

  //return the index of the stored snapshot (which is the last applied one)
  return last.getIndex();
}

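Multi-Raft Implementation

Membership Change Within a Single Group

Ratis allows several peers to be changed in one operation. It uses the two-phase membership change through a joint Config(Old, New), as described in the Raft paper. The entry point is: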

public RaftClientReply setConfiguration(SetConfigurationRequest request) throws IOException {
  return waitForReply(request, setConfigurationAsync(request));
}

setConfigurationAsync can be simplified as follows. First, it initializes the data structures required by the new peers, such as their RPC addresses and LogAppender threads:

final RaftConfigurationImpl current = getRaftConf();
getRaftServer().addRaftPeers(peersInNewConf);
// add staging state into the leaderState
pending = leaderState.startSetConfiguration(request);

Collection<RaftPeer> newPeers = configurationStagingState.getNewPeers();
// set the staging state
this.stagingState = configurationStagingState;

if (newPeers.isEmpty()) {
  applyOldNewConf();
} else {
  // update the LeaderState's sender list
  addAndStartSenders(newPeers);
}

Once the new peers are fully initialized, a Config(Old, New) entry is appended to the log and waits to be applied:

final ServerState state = server.getState();
final RaftConfigurationImpl current = state.getRaftConf();
final RaftConfigurationImpl oldNewConf= stagingState.generateOldNewConf(current, state.getLog().getNextIndex());
// apply the (old, new) configuration to log, and use it as the current conf
long index = state.getLog().append(state.getCurrentTerm(), oldNewConf);
updateConfiguration(index, oldNewConf);

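When applyLog later reaches this entry, the latest Config is written to the persistent metadata file, the StateMachine is notified, and the new Config is put into use: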

if (next.hasConfigurationEntry()) {
  // the reply should have already been set. only need to record
  // the new conf in the metadata file and notify the StateMachine.
  state.writeRaftConfiguration(next);
  stateMachine.event().notifyConfigurationChanged(next.getTerm(), next.getIndex(), next.getConfigurationEntry());
}

Note that during the transition, Config(Old, New) changes the condition for winning an election: a peer must obtain a majority in both the old and the new configuration.

boolean hasMajority(Collection<RaftPeerId> others, RaftPeerId selfId) {
  Preconditions.assertTrue(!others.contains(selfId));
  return conf.hasMajority(others, selfId) &&
    (oldConf == null || oldConf.hasMajority(others, selfId));
}
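The dual-majority rule can be checked in isolation. The helper below is a hypothetical restatement of hasMajority that uses plain String peer ids, with the voter's own id included in votes. It is not the RaftConfigurationImpl code.

```java
import java.util.Collection;
import java.util.Set;

/** Hypothetical joint-consensus majority check for Config(Old, New). */
final class JointMajority {
  /** More than half of the members of conf are among the voters. */
  static boolean hasMajority(Set<String> conf, Collection<String> votes) {
    long count = conf.stream().filter(votes::contains).count();
    return count > conf.size() / 2;
  }

  /** During the transition a candidate needs a majority in BOTH configurations. */
  static boolean hasJointMajority(Set<String> newConf, Set<String> oldConf, Collection<String> votes) {
    return hasMajority(newConf, votes) && (oldConf == null || hasMajority(oldConf, votes));
  }

  public static void main(String[] args) {
    Set<String> oldConf = Set.of("a", "b", "c");
    Set<String> newConf = Set.of("c", "d", "e");
    System.out.println(hasJointMajority(newConf, oldConf, Set.of("a", "b", "c"))); // false: only c in new
    System.out.println(hasJointMajority(newConf, oldConf, Set.of("a", "c", "d"))); // true
  }
}
```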

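Managing Multiple Groups

The RaftServer interface defines everything a Raft server has to provide, and its implementation class is RaftServerImpl. This class implements a Raft-based server: RoleInfo represents the server's role, StateMachine the application's data state machine, and RaftLog the server's Raft log.

RaftServerProxy also implements the RaftServer interface and is the entry point of the Multi-Raft implementation. It maintains a map from RaftGroupId to the RaftServerImpl this server runs for that group, stored as a CompletableFuture<RaftServerImpl> (see addNew below). This map records which groups the server is a member of. Each group membership is a separate RaftServerImpl with its own RaftLog and StateMachine.

The factory method build() eventually returns a RaftServerProxy instance: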

public RaftServer build() throws IOException {
  return newRaftServer(
    serverId,
    group,
    Objects.requireNonNull(stateMachineRegistry , "Neither 'stateMachine' nor 'setStateMachineRegistry' " +
                           "is initialized."),
    Objects.requireNonNull(properties, "The 'properties' field is not initialized."),
    parameters);
}

RaftServerProxy receives the corresponding client requests. The entry point for group management is:

public CompletableFuture<RaftClientReply> groupManagementAsync(GroupManagementRequest request) {
  final RaftGroupId groupId = request.getRaftGroupId();
  final GroupManagementRequest.Add add = request.getAdd();
  if (add != null) {
    return groupAddAsync(request, add.getGroup());
  }
  final GroupManagementRequest.Remove remove = request.getRemove();
  if (remove != null) {
    return groupRemoveAsync(request, remove.getGroupId(),
                            remove.isDeleteDirectory(), remove.isRenameDirectory());
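  }
  // neither an add nor a remove request
  return CompletableFuture.failedFuture(
      new UnsupportedOperationException("Unsupported group management request: " + request));
}

Take Add Group as an example. The RaftServerProxy that receives the request creates a new RaftServerImpl and starts it in the Follower role. Presumably, then, adding a new group requires sending an AddGroup request to every peer of that group, and the same holds for removing a group.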

impls.addNew(newGroup)
  .thenApplyAsync(newImpl -> {
    final boolean started = newImpl.start();
    return newImpl.newSuccessReply(request);
  });
    
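synchronized CompletableFuture<RaftServerImpl> addNew(RaftGroup group) {
  final RaftGroupId groupId = group.getGroupId();
  if (map.containsKey(groupId)) {
    // this server is already a member of the group: do not overwrite it
    return CompletableFuture.failedFuture(
        new IllegalStateException(groupId + " already exists"));
  }
  final CompletableFuture<RaftServerImpl> newImpl = newRaftServerImpl(group);
  map.put(groupId, newImpl);
  return newImpl;
}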
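The per-server bookkeeping behind addNew and group removal can be modelled as a concurrent map from group id to a future member. GroupRegistry, its generic parameters, and the factory function that stands in for newRaftServerImpl are hypothetical. A real server would also start, stop and clean up each member.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical registry of the groups one server belongs to: one member per group id. */
final class GroupRegistry<G, M> {
  private final Map<G, CompletableFuture<M>> members = new ConcurrentHashMap<>();
  private final Function<G, M> factory; // stand-in for creating a RaftServerImpl for the group

  GroupRegistry(Function<G, M> factory) {
    this.factory = factory;
  }

  CompletableFuture<M> addNew(G groupId) {
    CompletableFuture<M> created = new CompletableFuture<>();
    if (members.putIfAbsent(groupId, created) != null) {
      return CompletableFuture.failedFuture(new IllegalStateException(groupId + " already exists"));
    }
    CompletableFuture.supplyAsync(() -> factory.apply(groupId)).whenComplete((member, err) -> {
      if (err != null) {
        members.remove(groupId, created); // creation failed: allow a later retry
        created.completeExceptionally(err);
      } else {
        created.complete(member);
      }
    });
    return created;
  }

  CompletableFuture<M> remove(G groupId) {
    CompletableFuture<M> removed = members.remove(groupId);
    return removed != null
        ? removed
        : CompletableFuture.failedFuture(new IllegalStateException(groupId + " not found"));
  }

  int size() {
    return members.size();
  }

  public static void main(String[] args) {
    GroupRegistry<String, String> registry = new GroupRegistry<>(g -> "member-of-" + g);
    System.out.println(registry.addNew("group-1").join());                    // member-of-group-1
    System.out.println(registry.addNew("group-1").isCompletedExceptionally()); // true
  }
}
```

Using putIfAbsent makes the existence check and the insertion a single atomic step. This plays the same role as the synchronized method above.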