Ratis Source Code Analysis (Repost)

Latest recommended article published on 2024-08-07 10:28:51

周杰伦jc

Tags: distributed systems, java

Original link: https://zhuanlan.zhihu.com/p/476876447

Reposted from: https://zhuanlan.zhihu.com/p/476876447

Reference: https://juejin.cn/post/6907151199141625870

Reference: https://raft.github.io/raft.pdf

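Introduction

Apache Ratis is an open-source Java implementation of the Multi-Raft consensus protocol. Its website is Apache Ratis, and the code repository is at ASF Git Repos - ratis.git/summary.

Overall architecture diagram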

Client

1. The client can set a different RetryPolicy for each request type (Read, Write, Watch, and so on). The policies are held in the following data structure:

EnumMap<RaftProtos.RaftClientRequestProto.TypeCase, RetryPolicy> map

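2. Every client request carries a clientId and a callId, which together uniquely identify the request. RaftServer uses these two fields to implement exactly-once request semantics.

3. The client caches the Leader of each Raft Group. If a cached Leader has not been accessed for 60 seconds, the cache entry expires automatically.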

Cache<RaftGroupId, RaftPeerId> LEADER_CACHE = CacheBuilder.newBuilder()
      .expireAfterAccess(60, TimeUnit.SECONDS)
      .maximumSize(1024).build();
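
The per-request-type policy lookup from item 1 can be illustrated with a small, self-contained sketch. It does not use the Ratis API: `RequestType`, the one-method `RetryPolicy` interface, `retryUpTo`, and the fallback default policy are all hypothetical stand-ins chosen for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

final class RetryPolicyMapDemo {
  /** Hypothetical stand-in for RaftProtos.RaftClientRequestProto.TypeCase. */
  enum RequestType { READ, WRITE, WATCH }

  /** Hypothetical, simplified policy: may the client try again after this many attempts? */
  interface RetryPolicy {
    boolean shouldRetry(int attemptCount);
  }

  /** Allows another attempt while fewer than maxAttempts attempts have been made. */
  static RetryPolicy retryUpTo(int maxAttempts) {
    return attemptCount -> attemptCount < maxAttempts;
  }

  private final Map<RequestType, RetryPolicy> map = new EnumMap<>(RequestType.class);
  private final RetryPolicy defaultPolicy;

  RetryPolicyMapDemo(RetryPolicy defaultPolicy) {
    this.defaultPolicy = defaultPolicy;
  }

  RetryPolicyMapDemo set(RequestType type, RetryPolicy policy) {
    map.put(type, policy);
    return this;
  }

  /** Looks up the policy for the request type, falling back to the default (an assumption). */
  boolean shouldRetry(RequestType type, int attemptCount) {
    return map.getOrDefault(type, defaultPolicy).shouldRetry(attemptCount);
  }

  public static void main(String[] args) {
    RetryPolicyMapDemo policies = new RetryPolicyMapDemo(retryUpTo(3))
        .set(RequestType.WRITE, retryUpTo(1))
        .set(RequestType.WATCH, retryUpTo(10));
    System.out.println("WRITE after 1 attempt: " + policies.shouldRetry(RequestType.WRITE, 1));
    System.out.println("READ after 2 attempts: " + policies.shouldRetry(RequestType.READ, 2));
    System.out.println("WATCH after 5 attempts: " + policies.shouldRetry(RequestType.WATCH, 5));
  }
}
```

An EnumMap is a natural fit here because the key set is a small, fixed enum, so lookups are array-indexed.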

RetryCache

RetryCache is a cache for client requests: it keeps the replies of recently received requests. The cache data structure is:

private final Cache<ClientInvocationId, CacheEntry> cache;

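Each cache entry has an expiry time, which is user-configurable. The cache is used in RaftServerImpl. Note that this cache only optimizes client retries and does not strictly guarantee exactly-once semantics: once an entry has expired, a late retry of the same request is no longer recognized as a retry.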

final RetryCacheImpl.CacheQueryResult queryResult = retryCache.queryCache(ClientInvocationId.valueOf(request));
final CacheEntry cacheEntry = queryResult.getEntry();
if (queryResult.isRetry()) {
  // if the previous attempt is still pending or it succeeded, return its
  // future
  replyFuture = cacheEntry.getReplyFuture();
} else {
  do_request...
}
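
The retry path above can be sketched without Ratis as a map from invocation id to reply future. This is only a minimal illustration under assumptions: `InvocationId` is a hypothetical stand-in for ClientInvocationId, replies are plain strings, and entry expiry is left out entirely.

```java
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class RetryCacheDemo {
  /** Hypothetical stand-in for ClientInvocationId: (clientId, callId). */
  static final class InvocationId {
    final String clientId;
    final long callId;

    InvocationId(String clientId, long callId) {
      this.clientId = clientId;
      this.callId = callId;
    }

    @Override public boolean equals(Object o) {
      if (!(o instanceof InvocationId)) {
        return false;
      }
      InvocationId that = (InvocationId) o;
      return callId == that.callId && clientId.equals(that.clientId);
    }

    @Override public int hashCode() {
      return Objects.hash(clientId, callId);
    }
  }

  // No expiry in this sketch; the real cache expires entries after a configurable time.
  private final ConcurrentHashMap<InvocationId, CompletableFuture<String>> cache =
      new ConcurrentHashMap<>();

  /** Runs the request only for the first attempt; a retry gets the first attempt's future. */
  CompletableFuture<String> submit(InvocationId id, Supplier<CompletableFuture<String>> execute) {
    return cache.computeIfAbsent(id, k -> execute.get());
  }

  public static void main(String[] args) {
    AtomicInteger executions = new AtomicInteger();
    RetryCacheDemo cache = new RetryCacheDemo();
    Supplier<CompletableFuture<String>> request = () -> {
      executions.incrementAndGet();
      return CompletableFuture.completedFuture("reply-42");
    };
    CompletableFuture<String> first = cache.submit(new InvocationId("client-1", 42L), request);
    CompletableFuture<String> retry = cache.submit(new InvocationId("client-1", 42L), request);
    System.out.println("same future: " + (first == retry));
    System.out.println("executions: " + executions.get());
  }
}
```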

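Raft Peer Role Management

In the Raft protocol, each node is in one of three roles: Leader, Candidate, or Follower. Ratis implements and manages the transitions of this role state machine in the class org.apache.ratis.server.impl.RoleInfo. The corresponding methods are described below.

1. Follower role

In RaftServerImpl, the method changeToFollower moves the role state machine into the Follower role. It shuts down the state of the previous role (the Leader's heartbeat daemon threads or the Candidate's election thread) and starts the Follower state. The Follower state itself is implemented in FollowerState.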

private synchronized boolean changeToFollower(long newTerm, boolean force, Object reason) {
  final RaftPeerRole old = role.getCurrentRole();
  final boolean metadataUpdated = state.updateCurrentTerm(newTerm);

  if (old != RaftPeerRole.FOLLOWER || force) {
    setRole(RaftPeerRole.FOLLOWER, reason);
    if (old == RaftPeerRole.LEADER) {
      role.shutdownLeaderState(false);
    } else if (old == RaftPeerRole.CANDIDATE) {
      role.shutdownLeaderElection();
    } else if (old == RaftPeerRole.FOLLOWER) {
      role.shutdownFollowerState();
    }
    role.startFollowerState(this, reason);
  }
  return metadataUpdated;
}

2. Election timeout: minRpcTimeoutMs and maxRpcTimeoutMs can be set in the configuration. Ratis picks a random value in the range [minRpcTimeoutMs, maxRpcTimeoutMs] as the election timeout.

TimeDuration getRandomElectionTimeout() {
  final int min = properties().minRpcTimeoutMs();
  final int millis = min + ThreadLocalRandom.current().nextInt(properties().maxRpcTimeoutMs() - min + 1);
  return TimeDuration.valueOf(millis, TimeUnit.MILLISECONDS);
}

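3. A Follower records the time of the last RPC it received. If the election timeout has elapsed since then and the election conditions are met, it starts an election by calling RaftServerImpl's changeToCandidate method and enters the Candidate role.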

while (isRunning && server.getInfo().isFollower()) {
  final TimeDuration electionTimeout = server.getRandomElectionTimeout();
  final TimeDuration extraSleep = electionTimeout.sleep();
  final boolean isFollower = server.getInfo().isFollower();

  synchronized (server) {
    if (outstandingOp.get() == 0
        && isRunning
        && lastRpcTime.elapsedTime().compareTo(electionTimeout) >= 0
        && !lostMajorityHeartbeatsRecently()) {
      server.changeToCandidate(false);
      break;
    }
  }
}

2. Candidate role

1. In RaftServerImpl, the method changeToCandidate moves the role state machine into the Candidate role and starts the election. The Candidate state is implemented in the LeaderElection class.

  synchronized void changeToCandidate(boolean forceStartLeaderElection) {
    Preconditions.assertTrue(getInfo().isFollower());
    role.shutdownFollowerState();
    setRole(RaftPeerRole.CANDIDATE, "changeToCandidate");
    if (state.shouldNotifyExtendedNoLeader()) {
      stateMachine.followerEvent().notifyExtendedNoLeader(getRoleInfoProto());
    }
    // start election
    role.startLeaderElection(this, forceStartLeaderElection);
  }

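2. Leader election runs in a background daemon; the code is in run() of the LeaderElection class. LeaderElection implements the leader election flow with two optimizations: Pre-Vote and priority-based election. Both mechanisms can be disabled through configuration.

Pre-Vote: prevents frequent elections under network partitions. Before the real election, a Pre-Vote round is held, and only a Candidate that passes it increments its Term and starts the real election.
Priority-based election: prevents several Candidates from splitting the vote. Every Raft Peer is assigned a priority. When Candidates with different priorities appear at the same time, the higher-priority Candidate forces the lower-priority ones back to Follower.

The election proceeds as follows: a Pre-Vote round comes first, followed by the real Election if it succeeds; once elected, the state machine enters the Leader role. In each round, the Candidate must receive votes from a majority and also from every Peer whose priority is higher than its own. Only then does it pass the election.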

if (skipPreVote || askForVotes(Phase.PRE_VOTE)) {
  if (askForVotes(Phase.ELECTION)) {
    server.changeToLeader();
  }
}
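
The pass condition described above (majority plus every higher-priority peer) can be written down as a small standalone check. This is a sketch under assumptions, not the Ratis implementation: peers are plain strings, `priorities` maps every peer of the group (including the candidate) to its priority, and `granted` holds the peers that granted their vote.

```java
import java.util.Map;
import java.util.Set;

final class PriorityElectionDemo {
  /**
   * Returns true if the candidate has a majority of votes (counting its own)
   * and every peer with a strictly higher priority has granted its vote.
   */
  static boolean passes(String self, Map<String, Integer> priorities, Set<String> granted) {
    int votes = 1; // the candidate votes for itself
    for (String peer : granted) {
      if (!peer.equals(self) && priorities.containsKey(peer)) {
        votes++;
      }
    }
    boolean majority = votes > priorities.size() / 2;

    int selfPriority = priorities.get(self);
    boolean higherPriorityPeersAgree = priorities.entrySet().stream()
        .filter(e -> e.getValue() > selfPriority)
        .allMatch(e -> granted.contains(e.getKey()));

    return majority && higherPriorityPeersAgree;
  }

  public static void main(String[] args) {
    Map<String, Integer> priorities = Map.of("a", 0, "b", 0, "c", 1);
    // "a" has a majority with b's vote, but the higher-priority peer "c" has not voted.
    System.out.println(passes("a", priorities, Set.of("b")));
    // With c's vote as well, both conditions hold.
    System.out.println(passes("a", priorities, Set.of("b", "c")));
  }
}
```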

Here, the askForVotes method sends a Request Vote RPC to every Peer in the Raft Group and collects the final election result.

final TermIndex lastEntry = server.getState().getLastEntry();
final Executor voteExecutor = new Executor(this, others.size());
try {
  final int submitted = submitRequests(phase, electionTerm, lastEntry, others, voteExecutor);
  r = waitForResults(phase, electionTerm, submitted, conf, voteExecutor);
} finally {
  voteExecutor.shutdown();
}

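Here, submitRequests sends the RPC requests, using a fixed-size thread pool for the network tasks. waitForResults waits for the RPC replies until the next election timeout and returns the result of this round depending on whether the election succeeded (Majority + Priority). On success, RaftServerImpl's changeToLeader method is called to enter the Leader role.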

3. Leader role

After winning the election, RaftServerImpl's changeToLeader method switches the role state machine to the Leader role. The Leader state is implemented in LeaderState.

synchronized void changeToLeader() {
  Preconditions.assertTrue(getInfo().isCandidate());
  role.shutdownLeaderElection();
  setRole(RaftPeerRole.LEADER, "changeToLeader");
  state.becomeLeader();

  // start sending AppendEntries RPC to followers
  final LogEntryProto e = role.startLeaderState(this);
  getState().setRaftConf(e);
}

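After becoming the new Leader, the server first appends a placeholder entry that acts as a no-op (in the code below it is a configuration entry). Committing it ensures that all entries from previous terms are replicated to the Raft Peers and committed. It also makes it possible to determine the Term and Index precisely and report them to the server's StateMachine.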

LogEntryProto start() {
  // In the beginning of the new term, replicate a conf entry in order
  // to finally commit entries in the previous term.
  // Also this message can help identify the last committed index and the conf.
  final LogEntryProto placeHolder = LogProtoUtils.toLogEntryProto(
    server.getRaftConf(), server.getState().getCurrentTerm(), raftLog.getNextIndex());
  CodeInjectionForTesting.execute(APPEND_PLACEHOLDER,
                                  server.getId().toString(), null);
  raftLog.append(placeHolder);
  processor.start();
  senders.forEach(LogAppender::start);
  return placeHolder;
}

Next, a LogAppender background thread (LogAppenderDefault.run()) is started for each Follower. It sends log entries and snapshots to that Follower so that the Follower's state catches up with the Leader's.

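Raft Log Management

Apache Ratis implements two log strategies: an in-memory log and a segmented on-disk log. The in-memory log is not recommended for real use and exists mostly for testing, so this section focuses on the SegmentedRaftLog implementation.

The Raft Log layer mainly offers two primitives, append and get. RaftServer hands entries to the log layer through append, and the worker threads (StateMachineUpdater and the Leader's LogAppender) fetch the latest entries through get.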

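1. Overall structure

SegmentedRaftLog stores the log in files on the local disk as segment files, each of which contains multiple log entries. A single segment file is capped at 8MB, and a single log entry is also capped at 8MB.

When the entries in a segment grow to the 8MB segment limit, that segment file is closed and a new segment file is opened for subsequent writes. A closed segment file can no longer be modified; it can only be truncated when it conflicts with the Leader's log.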
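
The roll-over rule can be sketched as a toy, in-memory model. Everything here is an assumption made for illustration: segments only track index ranges and byte counts, nothing is written to disk, and `SegmentRollDemo` is not a Ratis class.

```java
import java.util.ArrayList;
import java.util.List;

final class SegmentRollDemo {
  /** Toy segment: tracks the index range and the accumulated size only. */
  static final class Segment {
    final long startIndex;
    long endIndex;
    long bytes;
    boolean open = true;

    Segment(long startIndex) {
      this.startIndex = startIndex;
      this.endIndex = startIndex - 1; // empty until the first append
    }
  }

  private final long maxSegmentBytes;
  private final List<Segment> segments = new ArrayList<>();
  private long nextIndex = 0;

  SegmentRollDemo(long maxSegmentBytes) {
    this.maxSegmentBytes = maxSegmentBytes;
  }

  /** Appends an entry of the given size, rolling to a new segment when the open one is full. */
  long append(long entryBytes) {
    if (entryBytes > maxSegmentBytes) {
      throw new IllegalArgumentException("entry is larger than a whole segment");
    }
    Segment current = segments.isEmpty() ? null : segments.get(segments.size() - 1);
    if (current == null || current.bytes + entryBytes > maxSegmentBytes) {
      if (current != null) {
        current.open = false; // a closed segment is never appended to again
      }
      current = new Segment(nextIndex);
      segments.add(current);
    }
    current.bytes += entryBytes;
    current.endIndex = nextIndex;
    return nextIndex++;
  }

  List<Segment> segments() {
    return segments;
  }

  public static void main(String[] args) {
    SegmentRollDemo log = new SegmentRollDemo(8L * 1024 * 1024);
    for (int i = 0; i < 4; i++) {
      log.append(3L * 1024 * 1024); // four 3MB entries
    }
    for (Segment s : log.segments()) {
      System.out.println("[" + s.startIndex + ", " + s.endIndex + "] open=" + s.open);
    }
  }
}
```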

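Ratis provides two caches as performance optimizations:

An in-memory cache for the currently open segment file, represented by the in-memory class SegmentFile (LogSegment in the code below). The current strategy simply keeps the whole SegmentFile in memory and flushes it to disk once it is full.
A cache for Raft entries, which keeps log entries that have been read from disk.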

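2. Appending new log entries

The Leader appends new entries to the RaftLog in the method appendImpl. The method first acquires the log's writeLock and then:

Before appending an entry whose content is a TransactionContext, lets the TransactionContext run the preAppendTransaction hook for any extra logic.
Submits an asynchronous appendEntry task to RaftLogSequentialOps.Runner; the Runner guarantees that tasks execute sequentially.
Waits for the asynchronous task to finish and returns the index to the upper-layer StateMachine.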

private long appendImpl(long term, TransactionContext operation) throws StateMachineException {
  checkLogState();
  try(AutoCloseableLock writeLock = writeLock()) {
    final long nextIndex = getNextIndex();

    // This is called here to guarantee strict serialization of callback executions in case
    // the SM wants to attach a logic depending on ordered execution in the log commit order.
    operation = operation.preAppendTransaction();
    // build the log entry after calling the StateMachine
    final LogEntryProto e = operation.initLogEntry(term, nextIndex);

    appendEntry(e);
    return nextIndex;
  }
}

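The asynchronous appendEntry write works as follows:

Submit the entry to the in-memory cache of the open segment.
Update the log entry cache directly so that the entry just written is cached. Note that these two steps must not be swapped; otherwise the state becomes inconsistent.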

protected CompletableFuture<Long> appendEntryImpl(LogEntryProto entry) {
  try(AutoCloseableLock writeLock = writeLock()) {
    validateLogEntry(entry);
    // Find the in-memory cache of the currently open segment
    final LogSegment currentOpenSegment = cache.getOpenSegment();
    if (currentOpenSegment == null) {
      // No open segment yet: create a new segment
      cache.addOpenSegment(entry.getIndex());
      fileLogWorker.startLogSegment(entry.getIndex()); // submit a "create segment file" task to the IO worker
    } else if (isSegmentFull(currentOpenSegment, entry)) {
      // The current segment is full: flush it to disk and roll over to a new segment
      cache.rollOpenSegment(true);
      fileLogWorker.rollLogSegment(currentOpenSegment);
    }

    // If the entry has state machine data, then the entry should be inserted
    // to statemachine first and then to the cache. Not following the order
    // will leave a spurious entry in the cache.
    CompletableFuture<Long> writeFuture =
      fileLogWorker.writeLogEntry(entry).getFuture();
    cache.appendEntry(entry, LogSegment.Op.WRITE_CACHE_WITHOUT_STATE_MACHINE_CACHE);

    return writeFuture;
  } 
}

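3. Reading the entry at an index

StateMachineUpdater calls the get interface to fetch committed entries and apply them to the upper-layer StateMachine. The Leader's LogAppender threads use it to read newly appended entries that need to be replicated to all Followers.

A read first looks in the log entry cache and goes to disk only on a cache miss.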

public LogEntryProto get(long index) throws RaftLogIOException {
  checkLogState();
  final LogSegment segment;
  final LogRecord record;
  try (AutoCloseableLock readLock = readLock()) {
    segment = cache.getSegment(index);
    if (segment == null) {
      return null;
    }
    record = segment.getLogRecord(index);
    if (record == null) {
      return null;
    }
    final LogEntryProto entry = segment.getEntryFromCache(record.getTermIndex());
    if (entry != null) {
      return entry;
    }
  }
  // the entry is not in the segment's cache. Load the cache without holding the lock.
  return segment.loadCache(record);
}

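Apply and Snapshot Management: StateMachineUpdater

This dedicated thread keeps following the committed log and applies committed entries to the upper-layer StateMachine. StateMachineUpdater also takes snapshots depending on the state of the log and, after a successful snapshot, purges log entries that are no longer needed. Its work loop can be simplified as follows: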

for(; state != State.STOP; ) {
  waitForCommit();
  final MemoizedSupplier<List<CompletableFuture<Message>>> futures = applyLog();
  checkAndTakeSnapshot(futures);
}

1. Apply Log

The applyLog method hands committed log entries to the upper-layer StateMachine.

private MemoizedSupplier<List<CompletableFuture<Message>>> applyLog() throws RaftLogIOException {
  final MemoizedSupplier<List<CompletableFuture<Message>>> futures = MemoizedSupplier.valueOf(ArrayList::new);
  final long committed = raftLog.getLastCommittedIndex();
  for(long applied; (applied = getLastAppliedIndex()) < committed && state == State.RUNNING && !shouldStop(); ) {
    final long nextIndex = applied + 1;
    final LogEntryProto next = raftLog.get(nextIndex);
    final CompletableFuture<Message> f = server.applyLogToStateMachine(next);
    final long incremented = appliedIndex.incrementAndGet(debugIndexChange);
  }
  return futures;
}
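
The relationship between the committed index and the applied index in this loop can be shown with a toy model. It is only a sketch under assumptions: the log is an in-memory list of strings, indices start at 0, and `ApplyLoopDemo` and its methods are hypothetical names rather than Ratis APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ApplyLoopDemo {
  private final List<String> log = new ArrayList<>(); // the entry with index i is stored at position i
  private long committedIndex = -1;
  private long appliedIndex = -1;

  long append(String entry) {
    log.add(entry);
    return log.size() - 1;
  }

  /** Advances the commit index; it never moves backwards or past the end of the log. */
  void commitUpTo(long index) {
    committedIndex = Math.min(Math.max(committedIndex, index), log.size() - 1);
  }

  /** Applies every committed but not yet applied entry, strictly in index order. */
  int applyCommitted(Consumer<String> stateMachine) {
    int count = 0;
    while (appliedIndex < committedIndex) {
      long nextIndex = appliedIndex + 1;
      stateMachine.accept(log.get((int) nextIndex));
      appliedIndex = nextIndex;
      count++;
    }
    return count;
  }

  long getLastAppliedIndex() {
    return appliedIndex;
  }

  public static void main(String[] args) {
    ApplyLoopDemo updater = new ApplyLoopDemo();
    updater.append("x=1");
    updater.append("x=2");
    updater.append("x=3");
    updater.commitUpTo(1); // only the first two entries are committed

    List<String> applied = new ArrayList<>();
    System.out.println("applied now: " + updater.applyCommitted(applied::add));
    System.out.println("state machine saw: " + applied);
    System.out.println("last applied index: " + updater.getLastAppliedIndex());
  }
}
```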

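2. Snapshot

The snapshot mechanism starts when the log exceeds 400000 entries or when the user triggers it manually. It calls takeSnapshot() on the upper-layer StateMachine. Once the SM has completed and persisted the snapshot, it returns the log index covered by the snapshot to the underlying RaftLog, which can then trigger a purge task to delete the entries included in the snapshot.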

private void takeSnapshot() {
  final long i;
  i = stateMachine.takeSnapshot();
  takeSnapshotTimerContext.stop();
  server.getSnapshotRequestHandler().completeTakingSnapshot(i);
  stateMachine.getStateMachineStorage().cleanupOldSnapshots(snapshotRetentionPolicy);

  snapshotIndex.updateIncreasingly(i, infoIndexChange);
  final long purgeIndex = i;
  raftLog.purge(purgeIndex);  
}
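
A minimal sketch of the trigger check and the purge step follows. It makes several assumptions: the threshold is interpreted as the number of entries applied since the last snapshot, the log is a sorted in-memory map, and the names `shouldTakeSnapshot` and `purge` are hypothetical helpers rather than Ratis methods.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

final class SnapshotPurgeDemo {
  /** The entry count quoted in the text above. */
  static final long AUTO_TRIGGER_THRESHOLD = 400_000L;

  /** Assumption: the threshold counts entries applied since the last snapshot. */
  static boolean shouldTakeSnapshot(long lastAppliedIndex, long lastSnapshotIndex,
                                    long threshold, boolean manuallyRequested) {
    return manuallyRequested || lastAppliedIndex - lastSnapshotIndex >= threshold;
  }

  /** Removes all entries with index <= snapshotIndex and returns how many were removed. */
  static int purge(NavigableMap<Long, String> log, long snapshotIndex) {
    NavigableMap<Long, String> covered = log.headMap(snapshotIndex, true);
    int removed = covered.size();
    covered.clear(); // the head map is a view, so this deletes from the log itself
    return removed;
  }

  public static void main(String[] args) {
    System.out.println(shouldTakeSnapshot(400_000L, 0L, AUTO_TRIGGER_THRESHOLD, false));
    System.out.println(shouldTakeSnapshot(10L, 0L, AUTO_TRIGGER_THRESHOLD, true));

    NavigableMap<Long, String> log = new TreeMap<>();
    for (long i = 0; i < 10; i++) {
      log.put(i, "entry-" + i);
    }
    System.out.println("purged: " + purge(log, 4L));
    System.out.println("first remaining index: " + log.firstKey());
  }
}
```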

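Building Applications on Top: StateMachine

To build your own application on top of Ratis, for example a KV storage service, you need to implement the basic operation interfaces of StateMachine, listed below.

1. DataApi

By default, an operation and its data are written to the RaftLog together as a log entry. For a data-intensive application this can cause the data to be copied several times, so DataApi is exposed to let the operation and the data be managed separately.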

interface DataApi {
  DataApi DEFAULT = new DataApi() {};
  default CompletableFuture<ByteString> read(LogEntryProto entry);
  default CompletableFuture<?> write(LogEntryProto entry);
  default CompletableFuture<DataStream> stream(RaftClientRequest request);
  default CompletableFuture<?> link(DataStream stream, LogEntryProto entry);
  default CompletableFuture<Void> flush(long logIndex);
  default CompletableFuture<Void> truncate(long logIndex);
}

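FileStore gives a concrete example. This state machine stores a mapping from file name to file content. For a write, the original Raft approach is to commit the content to the log first, read it from the log into memory at apply time, and finally write it to the corresponding file. The data is copied several times along the way, which lowers IO efficiency. With DataApi, the data can be written directly to the file before the commit, and an empty log entry is then committed. The upper-layer logic of the StateMachine is responsible for the consistency and durability of the data written to the file ahead of time.

Issue: https://issues.apache.org/jira/browse/RATIS-122?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=16235110#comment-16235110

In the FileStore example, startTransaction is overridden so that, before the log entry is committed, the write operation is turned into a writecommit operation. The write operation is then overridden so that the file content is written to the file at the time the log entry is appended. Finally, the operation is committed at apply time.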

startTransaction();

WriteLog() {
   StateMachine.write();
   writeRaftLog();
}

apply() {
   commitStateMachineWrite();
}
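
The pseudocode above can be turned into a toy model that shows the two phases. This is a sketch under loud assumptions: "files" and the "Raft log" are in-memory collections, the log entry carries only the file name, and `WriteThenCommitDemo` and its methods are hypothetical and unrelated to the actual FileStore classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WriteThenCommitDemo {
  private final Map<String, String> uncommittedFiles = new HashMap<>(); // written, not yet visible
  private final Map<String, String> committedFiles = new HashMap<>();   // visible to readers
  private final List<String> raftLog = new ArrayList<>();               // holds file names only

  /** Append phase: the data goes to storage first, then a small entry without data is logged. */
  long write(String fileName, String content) {
    uncommittedFiles.put(fileName, content); // corresponds to StateMachine.write()
    raftLog.add(fileName);                   // corresponds to writeRaftLog()
    return raftLog.size() - 1;
  }

  /** Apply phase: the earlier write becomes visible; no data is copied out of the log. */
  void apply(long index) {
    String fileName = raftLog.get((int) index);
    String content = uncommittedFiles.remove(fileName);
    if (content != null) {
      committedFiles.put(fileName, content); // corresponds to commitStateMachineWrite()
    }
  }

  String read(String fileName) {
    return committedFiles.get(fileName);
  }

  public static void main(String[] args) {
    WriteThenCommitDemo store = new WriteThenCommitDemo();
    long index = store.write("a.txt", "hello");
    System.out.println("before apply: " + store.read("a.txt"));
    store.apply(index);
    System.out.println("after apply: " + store.read("a.txt"));
  }
}
```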

2. EventApi

EventApi is a set of hooks through which the underlying Raft layer notifies the upper-layer StateMachine of state changes, for example a Leader change.

interface EventApi {
  EventApi DEFAULT = new EventApi() {};
  default void notifyLeaderChanged(RaftGroupMemberId groupMemberId, RaftPeerId newLeaderId) {}
  default void notifyTermIndexUpdated(long term, long index) {}
  default void notifyConfigurationChanged(long term, long index, RaftConfigurationProto newRaftConfiguration) {}
  default void notifyGroupRemove() {}
  default void notifyLogFailed(Throwable cause, LogEntryProto failedEntry) {}
}

3. LeaderEventApi

LeaderEventApi is a set of hooks that notify the upper-layer SM of special events while the current Peer is the Leader.

interface LeaderEventApi {
  LeaderEventApi DEFAULT = new LeaderEventApi() {};
  default void notifyFollowerSlowness(RoleInfoProto roleInfoProto) {}
  default void notifyNotLeader(Collection<TransactionContext> pendingEntries){}
}

4. FollowerEventApi

FollowerEventApi is a set of hooks that notify the upper-layer SM of special events while the current Peer is a Follower.

interface FollowerEventApi {
  FollowerEventApi DEFAULT = new FollowerEventApi() {};
  default void notifyExtendedNoLeader(RoleInfoProto roleInfoProto) {}
  default CompletableFuture<TermIndex> notifyInstallSnapshotFromLeader(
    RoleInfoProto roleInfoProto, TermIndex firstTermIndexInLog) {}
}

5. Lifecycle interface

void initialize(RaftServer raftServer, RaftGroupId raftGroupId, RaftStorage storage);
LifeCycle.State getLifeCycleState();
void pause();
void reinitialize() throws IOException;

6. Snapshot interface

SnapshotInfo getLatestSnapshot();
void cleanupOldSnapshots(SnapshotRetentionPolicy snapshotRetentionPolicy) throws IOException;
long takeSnapshot() throws IOException;

7. State machine query interface

CompletableFuture<Message> query(Message request);
CompletableFuture<Message> queryStale(Message request, long minIndex);

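8. State machine update interface

Note that applyTransaction is the interface through which the RaftLog hands over committed entries. It is always called in log order (linearizable semantics), but the upper-layer StateMachine decides how each transaction is executed, so it can use asynchrony and parallelism where appropriate to improve throughput.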

// Convert the user's request into a TransactionContext
TransactionContext startTransaction(RaftClientRequest request) throws IOException;
// Extra logic that can run before the log entry is appended
TransactionContext preAppendTransaction(TransactionContext trx) throws IOException;
// Notify the user that the transaction has failed
TransactionContext cancelTransaction(TransactionContext trx) throws IOException;
TransactionContext applyTransactionSerial(TransactionContext trx) throws InvalidProtocolBufferException;
// Called in log order for each transaction; the SM decides how the trx is actually executed
CompletableFuture<Message> applyTransaction(TransactionContext trx);

9. Example: CounterStateMachine

CounterStateMachine implements a very simple application state machine: it manages an Integer counter and accepts two operations, Get and Increment.

Get operation: overrides the query interface

public CompletableFuture<Message> query(Message request) {
  String msg = request.getContent().toString(Charset.defaultCharset());
  assertEquals(msg, "GET");
  return CompletableFuture.completedFuture(
    Message.valueOf(counter.toString()));
}

Increment operation: overrides the applyTransaction interface

public CompletableFuture<Message> applyTransaction(TransactionContext trx) {
  final RaftProtos.LogEntryProto entry = trx.getLogEntry();

  //check if the command is valid
  String logData = entry.getStateMachineLogEntry().getLogData()
    .toString(Charset.defaultCharset());
  assertEquals(logData, "INCREMENT");
  //update the last applied term and index
  final long index = entry.getIndex();
  updateLastAppliedTermIndex(entry.getTerm(), index);

  //actual execution of the command: increment the counter
  counter.incrementAndGet();

  //return the new value of the counter to the client
  final CompletableFuture<Message> f =
    CompletableFuture.completedFuture(Message.valueOf(counter.toString()));

  return f;
}

Snapshot interface:

public long takeSnapshot() {
  //get the last applied index
  final TermIndex last = getLastAppliedTermIndex();

  //create a file with a proper name to store the snapshot
  final File snapshotFile =
    storage.getSnapshotFile(last.getTerm(), last.getIndex());

  //serialize the counter object and write it into the snapshot file
  try (ObjectOutputStream out = new ObjectOutputStream(
    new BufferedOutputStream(new FileOutputStream(snapshotFile)))) {
    out.writeObject(counter);
  } catch (IOException ioe) {
    LOG.warn("Failed to write snapshot file \"" + snapshotFile
             + "\", last applied index=" + last);
  }

  //return the index of the stored snapshot (which is the last applied one)
  return last.getIndex();
}

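Multi-Raft Implementation

Membership change within a single group

Ratis allows several Peers to be changed at once, using the two-phase membership change Config(Old, New) from the Raft paper. The entry point is: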

public RaftClientReply setConfiguration(SetConfigurationRequest request) throws IOException {
  return waitForReply(request, setConfigurationAsync(request));
}

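The corresponding setConfigurationAsync implementation can be simplified as follows. It first initializes the data structures needed for the new Peers, such as their RPC addresses and LogAppender threads.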

final RaftConfigurationImpl current = getRaftConf();
getRaftServer().addRaftPeers(peersInNewConf);
// add staging state into the leaderState
pending = leaderState.startSetConfiguration(request);

Collection<RaftPeer> newPeers = configurationStagingState.getNewPeers();
// set the staging state
this.stagingState = configurationStagingState;

if (newPeers.isEmpty()) {
  applyOldNewConf();
} else {
  // update the LeaderState's sender list
  addAndStartSenders(newPeers);
}

Once initialization is complete, a Config(Old, New) log entry is appended and waits to be applied.

final ServerState state = server.getState();
final RaftConfigurationImpl current = state.getRaftConf();
final RaftConfigurationImpl oldNewConf= stagingState.generateOldNewConf(current, state.getLog().getNextIndex());
// apply the (old, new) configuration to log, and use it as the current conf
long index = state.getLog().append(state.getCurrentTerm(), oldNewConf);
updateConfiguration(index, oldNewConf);

It is eventually picked up when the log is applied: the latest Config is written to the persistent metadata file and takes effect.

if (next.hasConfigurationEntry()) {
  // the reply should have already been set. only need to record
  // the new conf in the metadata file and notify the StateMachine.
  state.writeRaftConfiguration(next);
  stateMachine.event().notifyConfigurationChanged(next.getTerm(), next.getIndex(), next.getConfigurationEntry());
}

Note that during the transition phase, Config(Old, New) changes the condition for winning an election: a Peer must obtain a majority in both the Old and the New Config to win.

boolean hasMajority(Collection<RaftPeerId> others, RaftPeerId selfId) {
  Preconditions.assertTrue(!others.contains(selfId));
  return conf.hasMajority(others, selfId) &&
    (oldConf == null || oldConf.hasMajority(others, selfId));
}
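
A standalone illustration of the joint-majority rule follows. It is a sketch under assumptions: peers are plain strings, each configuration is a set of peer ids, and `others` is the set of peers (excluding the candidate) that granted their vote. The method names mirror the snippet above but are not the Ratis implementation.

```java
import java.util.Set;

final class JointMajorityDemo {
  /** Majority within a single configuration; the candidate counts only if it is a member. */
  static boolean hasMajority(Set<String> conf, Set<String> others, String selfId) {
    long votes = conf.stream()
        .filter(peer -> peer.equals(selfId) || others.contains(peer))
        .count();
    return votes > conf.size() / 2;
  }

  /** During the transition, a majority is required in the new and in the old configuration. */
  static boolean hasJointMajority(Set<String> newConf, Set<String> oldConf,
                                  Set<String> others, String selfId) {
    return hasMajority(newConf, others, selfId)
        && (oldConf == null || hasMajority(oldConf, others, selfId));
  }

  public static void main(String[] args) {
    Set<String> oldConf = Set.of("a", "b", "c");
    Set<String> newConf = Set.of("a", "d", "e");
    // Votes from d and e form a majority of the new conf only.
    System.out.println(hasJointMajority(newConf, oldConf, Set.of("d", "e"), "a"));
    // Votes from b and d form a majority in both confs (together with a itself).
    System.out.println(hasJointMajority(newConf, oldConf, Set.of("b", "d"), "a"));
  }
}
```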

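Managing multiple groups

The RaftServer interface defines everything a Raft server has to provide, and RaftServerImpl is the corresponding implementation class. This class implements a Raft-based server in which RoleInfo represents the server's role, StateMachine represents the data state machine of the upper-layer application, and RaftLog represents the server's Raft log.

RaftServerProxy also implements the RaftServer interface and is the entry point of the Multi-Raft implementation. It maintains a map keyed by RaftGroupId (in the addNew code shown later, the values are CompletableFuture<RaftServerImpl>) to track the RaftGroups and their members on this server. Each member of each group is a RaftServerImpl with its own RaftLog and StateMachine.

The factory method build() ultimately returns a RaftServerProxy instance.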

public RaftServer build() throws IOException {
  return newRaftServer(
    serverId,
    group,
    Objects.requireNonNull(stateMachineRegistry , "Neither 'stateMachine' nor 'setStateMachineRegistry' " +
                           "is initialized."),
    Objects.requireNonNull(properties, "The 'properties' field is not initialized."),
    parameters);
}

RaftServerProxy accepts the corresponding client requests. The entry point for group management changes is:

public CompletableFuture<RaftClientReply> groupManagementAsync(GroupManagementRequest request) {
  final RaftGroupId groupId = request.getRaftGroupId();
  final GroupManagementRequest.Add add = request.getAdd();
  if (add != null) {
    return groupAddAsync(request, add.getGroup());
  }
  final GroupManagementRequest.Remove remove = request.getRemove();
  if (remove != null) {
    return groupRemoveAsync(request, remove.getGroupId(),
                            remove.isDeleteDirectory(), remove.isRenameDirectory());
  }
}

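Taking Add Group as an example, a RaftServerProxy that receives this request creates a new RaftServerImpl and starts it as a Follower. This suggests that, to add a new Group, an AddGroup request has to be sent to every Peer involved. The same presumably holds for removing a group.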

impls.addNew(newGroup)
  .thenApplyAsync(newImpl -> {
    final boolean started = newImpl.start();
    return newImpl.newSuccessReply(request);
  });
    
synchronized CompletableFuture<RaftServerImpl> addNew(RaftGroup group) {
  final RaftGroupId groupId = group.getGroupId();
  final CompletableFuture<RaftServerImpl> newImpl = newRaftServerImpl(group);
  final CompletableFuture<RaftServerImpl> previous = map.put(groupId, newImpl);
  return newImpl;
}
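
The group map pattern can be modelled in a few lines of plain Java. This is a sketch under assumptions: `GroupServer` is a hypothetical stand-in for RaftServerImpl, group ids are strings, and rejecting a duplicate group with an exception is a choice made here for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

final class GroupMapDemo {
  /** Hypothetical stand-in for RaftServerImpl: one instance per group on this server. */
  static final class GroupServer {
    final String groupId;
    private volatile boolean running;

    GroupServer(String groupId) {
      this.groupId = groupId;
    }

    boolean start() {
      running = true;
      return true;
    }

    void close() {
      running = false;
    }

    boolean isRunning() {
      return running;
    }
  }

  private final Map<String, CompletableFuture<GroupServer>> map = new HashMap<>();

  /** Registers a new group; the per-group server is created asynchronously. */
  synchronized CompletableFuture<GroupServer> addNew(String groupId) {
    if (map.containsKey(groupId)) {
      throw new IllegalStateException("Group already exists: " + groupId);
    }
    CompletableFuture<GroupServer> newImpl =
        CompletableFuture.supplyAsync(() -> new GroupServer(groupId));
    map.put(groupId, newImpl);
    return newImpl;
  }

  /** Unregisters a group and stops its server once it has been created. */
  synchronized CompletableFuture<GroupServer> remove(String groupId) {
    CompletableFuture<GroupServer> impl = map.remove(groupId);
    if (impl == null) {
      throw new IllegalStateException("Group not found: " + groupId);
    }
    return impl.thenApply(server -> {
      server.close();
      return server;
    });
  }

  synchronized int size() {
    return map.size();
  }

  /** Mirrors the add-group flow: create the per-group server, then start it. */
  CompletableFuture<Boolean> groupAdd(String groupId) {
    return addNew(groupId).thenApply(GroupServer::start);
  }

  public static void main(String[] args) {
    GroupMapDemo proxy = new GroupMapDemo();
    System.out.println("g1 started: " + proxy.groupAdd("g1").join());
    System.out.println("g2 started: " + proxy.groupAdd("g2").join());
    System.out.println("groups: " + proxy.size());
    System.out.println("g1 running after remove: " + proxy.remove("g1").join().isRunning());
  }
}
```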