1. Introduction
In 深度解析RocketMq源码-高可用存储组件(一) raft协议详解-CSDN博客 we walked through the raft protocol's leader election, log replication, and consistency check in detail. This article analyzes, at the source-code level, how the DLedger framework implements the raft protocol. The code is based on dledger-release-0.3.1.2.
2. Overall Architecture of DLedger - DLedgerServer
As the code below shows, DLedgerServer implements the DLedgerProtocolHandler and DLedgerClientProtocolHandler interfaces, which together define the core methods of the raft protocol.
2.1 The raft protocol interfaces
public interface DLedgerClientProtocolHandler {
    // Handle an append-entry request from a client
    CompletableFuture<AppendEntryResponse> handleAppend(AppendEntryRequest request) throws Exception;
    // Fetch log entries
    CompletableFuture<GetEntriesResponse> handleGet(GetEntriesRequest request) throws Exception;
    // Handle metadata queries
    CompletableFuture<MetadataResponse> handleMetadata(MetadataRequest request) throws Exception;
    // Handle leadership transfer
    CompletableFuture<LeadershipTransferResponse> handleLeadershipTransfer(LeadershipTransferRequest leadershipTransferRequest) throws Exception;
}

public interface DLedgerProtocolHandler extends DLedgerClientProtocolHandler {
    // Handle vote requests
    CompletableFuture<VoteResponse> handleVote(VoteRequest request) throws Exception;
    // Handle heartbeats
    CompletableFuture<HeartBeatResponse> handleHeartBeat(HeartBeatRequest request) throws Exception;
    // Handle pull-entries requests
    CompletableFuture<PullEntriesResponse> handlePull(PullEntriesRequest request) throws Exception;
    // Handle push-entries requests
    CompletableFuture<PushEntryResponse> handlePush(PushEntryRequest request) throws Exception;
}
2.2 The components of DLedgerServer
DLedgerServer is composed of the node's metadata, an election component, a remote-communication component, a log storage component, and more. Below we look at each component by function.
public class DLedgerServer extends AbstractDLedgerServer {
    private static Logger logger = LoggerFactory.getLogger(DLedgerServer.class);
    // Metadata describing this member's state; every node maintains its own
    private MemberState memberState;
    // DLedger configuration
    private DLedgerConfig dLedgerConfig;
    // Log storage component. It works almost exactly like RocketMQ's CommitLog:
    // entries are first written into a MappedByteBuffer mapped onto a file,
    // and a background thread later flushes them to disk
    private DLedgerStore dLedgerStore;
    // RPC component; both raft election and log replication depend on it
    private DLedgerRpcService dLedgerRpcService;
    private final RpcServiceMode rpcServiceMode;
    // Component that pushes log entries to followers
    private DLedgerEntryPusher dLedgerEntryPusher;
    // Leader election component
    private DLedgerLeaderElector dLedgerLeaderElector;
    private ScheduledExecutorService executorService;
    // State machine caller. In raft the leader replicates a sequence of commands
    // to its followers; a command only takes effect on a node once it has been
    // applied to that node's state machine
    private Optional<StateMachineCaller> fsmCaller;
}
2.3 Node metadata component - MemberState
In the raft protocol every node keeps some metadata about itself, such as its node id, current term, role, and the index of the last log entry it stores. All of this lives in MemberState.
// This class maintains a member's state
public class MemberState {
    public static final String TERM_PERSIST_FILE = "currterm";
    public static final String TERM_PERSIST_KEY_TERM = "currTerm";
    public static final String TERM_PERSIST_KEY_VOTE_FOR = "voteLeader";
    public static Logger logger = LoggerFactory.getLogger(MemberState.class);
    // DLedger node configuration, e.g. how often logs are persisted and whether they are stored on disk
    public final DLedgerConfig dLedgerConfig;
    private final ReentrantLock defaultLock = new ReentrantLock();
    // The group this node belongs to. Used in multi-group raft: to increase throughput,
    // one raft deployment can host several groups, and each group contains multiple node ids
    private final String group;
    // This node's own id
    private final String selfId;
    // Addresses of all nodes in the cluster
    private final String peers;
    // This node's role, CANDIDATE by default
    private volatile Role role = CANDIDATE;
    // Id of the leader for the current term
    private volatile String leaderId;
    // The current term, starting from 0 and increasing monotonically
    private volatile long currTerm = 0;
    // The node this member voted for in the current term
    private volatile String currVoteFor;
    // Index of the last log entry on this node
    private volatile long ledgerEndIndex = -1;
    // Term of the last log entry on this node
    private volatile long ledgerEndTerm = -1;
    // The largest term known within the group
    private long knownMaxTermInGroup = -1;
    // key: node id, value: node address
    private Map<String, String> peerMap = new HashMap<>();
    // key: node id, value: whether the node is alive
    private Map<String, Boolean> peersLiveTable = new ConcurrentHashMap<>();
    private volatile String transferee;
    private volatile long termToTakeLeadership = -1;
}
// Advance to the next term. When a follower's election timer expires it becomes
// a candidate, increments its term, and starts an election
public synchronized long nextTerm() {
    // Only a candidate may advance the term
    PreConditions.check(role == CANDIDATE, DLedgerResponseCode.ILLEGAL_MEMBER_STATE, "%s != %s", role, CANDIDATE);
    if (knownMaxTermInGroup > currTerm) {
        currTerm = knownMaxTermInGroup;
    } else {
        // Increment our own term
        ++currTerm;
    }
    currVoteFor = null;
    // Persist the new term
    persistTerm();
    return currTerm;
}

// After winning an election, the node sets its role to leader for the current term
public synchronized void changeToLeader(long term) {
    PreConditions.check(currTerm == term, DLedgerResponseCode.ILLEGAL_MEMBER_STATE, "%d != %d", currTerm, term);
    this.role = LEADER;
    this.leaderId = selfId;
    peersLiveTable.clear();
}

// If another node is elected leader, this node changes its role to follower
public synchronized void changeToFollower(long term, String leaderId) {
    PreConditions.check(currTerm == term, DLedgerResponseCode.ILLEGAL_MEMBER_STATE, "%d != %d", currTerm, term);
    this.role = FOLLOWER;
    this.leaderId = leaderId;
    transferee = null;
}

// Become a candidate in the given term
public synchronized void changeToCandidate(long term) {
    assert term >= currTerm;
    PreConditions.check(term >= currTerm, DLedgerResponseCode.ILLEGAL_MEMBER_STATE, "should %d >= %d", term, currTerm);
    if (term > knownMaxTermInGroup) {
        knownMaxTermInGroup = term;
    }
    //the currTerm should be promoted in handleVote thread
    this.role = CANDIDATE;
    this.leaderId = null;
    transferee = null;
}
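To make the transition rules concrete, here is a toy, self-contained condensation of the logic above (plain Java, not DLedger code): a node starts as a candidate, bumps its term following the nextTerm rule, and becomes leader once we pretend it has won a majority of votes.

public class RoleTransitionDemo {
    enum Role { CANDIDATE, LEADER, FOLLOWER }

    static Role role = Role.CANDIDATE;
    static long currTerm = 0;
    static long knownMaxTermInGroup = -1;

    // Mirrors MemberState#nextTerm: jump to the highest term seen in the group,
    // otherwise just increment our own term
    static long nextTerm() {
        if (knownMaxTermInGroup > currTerm) {
            currTerm = knownMaxTermInGroup;
        } else {
            ++currTerm;
        }
        return currTerm;
    }

    public static void main(String[] args) {
        long term = nextTerm();      // election timer fired: term 0 -> 1
        boolean wonElection = true;  // pretend a majority voted for us
        if (wonElection) {
            role = Role.LEADER;      // mirrors changeToLeader(term)
        }
        System.out.println("role=" + role + ", term=" + term);
    }
}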
2.4 Persistence component - DLedgerMmapFileStore
DLedgerMmapFileStore is implemented the same way as RocketMQ's CommitLog: data is stored in a dataFileList made up of multiple MappedFiles, and an index file is built from each entry's index and offset. A MappedFile maps a disk file into memory via mmap, so writes first land in the mapped buffer in memory and are flushed to the disk file afterwards.
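As a minimal, self-contained illustration of this mmap write/flush pattern (plain JDK code, not DLedger's own classes):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class MmapWriteDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("demo-data", "rw");
             FileChannel ch = raf.getChannel()) {
            // Map a 1 MB region of the file into memory, like a MappedFile does
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024 * 1024);
            // Writing only touches memory at this point
            buf.put("hello dledger".getBytes(StandardCharsets.UTF_8));
            // force() flushes the dirty pages to disk, which is what
            // a flush service does periodically in the real store
            buf.force();
        }
    }
}

With that pattern in mind, the fields of DLedgerMmapFileStore are: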
public class DLedgerMmapFileStore extends DLedgerStore {
    public static final String CHECK_POINT_FILE = "checkpoint";
    public static final String END_INDEX_KEY = "endIndex";
    public static final String COMMITTED_INDEX_KEY = "committedIndex";
    // Magic number
    public static final int MAGIC_1 = 1;
    public static final int CURRENT_MAGIC = MAGIC_1;
    // Size in bytes of one unit in the index file
    public static final int INDEX_UNIT_SIZE = 32;
    private static Logger logger = LoggerFactory.getLogger(DLedgerMmapFileStore.class);
    // Hooks invoked when an entry is appended
    public List<AppendHook> appendHooks = new ArrayList<>();
    // Index of the first entry appended to the commit log
    private long ledgerBeginIndex = -1;
    // Index of the last entry appended to the commit log
    private long ledgerEndIndex = -1;
    // Index of the last committed entry
    private long committedIndex = -1;
    // Physical offset of the last committed entry
    private long committedPos = -1;
    // Term of the last entry stored in the commit log
    private long ledgerEndTerm;
    // DLedger configuration
    private DLedgerConfig dLedgerConfig;
    // Node metadata
    private MemberState memberState;
    // The commit log (data files)
    private MmapFileList dataFileList;
    // The index files
    private MmapFileList indexFileList;
    private ThreadLocal<ByteBuffer> localEntryBuffer;
    private ThreadLocal<ByteBuffer> localIndexBuffer;
    // Service that flushes data to disk
    private FlushDataService flushDataService;
    // Service that cleans up expired files
    private CleanSpaceService cleanSpaceService;
    private volatile boolean isDiskFull = false;
}
One concept is worth spelling out here: raft commits in two phases. In the first phase the leader appends an entry to its commit log and advances ledgerEndIndex (the index of the last written entry); we call this the pre-write. The leader then sends replication requests to the followers, and each follower acks after pre-writing the entry to its own disk file. Once more than half of the cluster holds the entry, the leader commits it and applies it to the state machine, advancing committedIndex and applyIndex.
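To pin down the quorum rule, here is a hypothetical, self-contained sketch of this bookkeeping (the real logic lives in DLedgerEntryPusher; all names below are illustrative, not DLedger code):

public class TwoPhaseDemo {
    // Phase 1: index of the last pre-written entry (on the leader's disk only)
    static long ledgerEndIndex = -1;
    // Phase 2: index of the last entry acknowledged by a majority
    static long committedIndex = -1;

    // The leader pre-writes the entry and advances ledgerEndIndex
    static void preWriteAsLeader(long index) {
        ledgerEndIndex = index;
    }

    // Called as acks arrive; commits once more than half of the
    // cluster (leader included) holds the entry
    static void onAck(long index, int ackNums, int clusterSize) {
        if (ackNums > clusterSize / 2) {
            committedIndex = Math.max(committedIndex, index);
        }
    }

    public static void main(String[] args) {
        preWriteAsLeader(0);
        onAck(0, 2, 3); // 3-node cluster: leader + 1 follower = 2 acks, quorum reached
        System.out.println("committedIndex=" + committedIndex); // prints 0
    }
}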
2.5 Network communication component - DLedgerRpcService
2.5.1 Structure
DLedgerRpcService is in fact built on NettyRemotingServer for its network communication:
public class DLedgerRpcNettyService extends DLedgerRpcService {
    private static Logger logger = LoggerFactory.getLogger(DLedgerRpcNettyService.class);
    // Netty server
    private final NettyRemotingServer remotingServer;
    // Netty client
    private final NettyRemotingClient remotingClient;
    // The DLedger server this RPC service belongs to
    private AbstractDLedgerServer dLedger;
    // Network calls return futures as their results; those futures are completed on this executor
    private ExecutorService futureExecutor = Executors.newFixedThreadPool(4, new NamedThreadFactory("FutureExecutor"));
    // Executor for sending vote requests
    private ExecutorService voteInvokeExecutor = Executors.newCachedThreadPool(new NamedThreadFactory("voteInvokeExecutor"));
    // Executor for sending heartbeats
    private ExecutorService heartBeatInvokeExecutor = Executors.newCachedThreadPool(new NamedThreadFactory("heartBeatInvokeExecutor"));
}
public DLedgerRpcNettyService(AbstractDLedgerServer dLedger, NettyServerConfig nettyServerConfig, NettyClientConfig nettyClientConfig, ChannelEventListener channelEventListener) {
    this.dLedger = dLedger;
    // Register the Netty processor; every incoming request is dispatched through it
    NettyRequestProcessor protocolProcessor = new NettyRequestProcessor() {
        @Override
        public RemotingCommand processRequest(ChannelHandlerContext ctx, RemotingCommand request) throws Exception {
            return DLedgerRpcNettyService.this.processRequest(ctx, request);
        }

        @Override
        public boolean rejectRequest() {
            return false;
        }
    };
    //register remoting server(We will only listen to one port. Limit in the configuration file)
    // Get this node's own address; the port comes from configuration
    String address = this.dLedger.getListenAddress();
    if (nettyServerConfig == null) {
        nettyServerConfig = new NettyServerConfig();
    }
    // Listen on the configured port
    nettyServerConfig.setListenPort(Integer.parseInt(address.split(":")[1]));
    // Build the NettyRemotingServer and register processors for the different request codes
    this.remotingServer = registerRemotingServer(nettyServerConfig, channelEventListener, protocolProcessor);
    //start the remoting client
    if (nettyClientConfig == null) {
        nettyClientConfig = new NettyClientConfig();
    }
    // Build the network client
    this.remotingClient = new NettyRemotingClient(nettyClientConfig, null);
}
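For reference, wiring all of this together from user code looks roughly like the following. This is a minimal sketch assuming DLedgerConfig's setGroup/setSelfId/setPeers/setStoreBaseDir setters and DLedgerServer#startup() as found in dledger 0.3.x; verify against your version.

// Minimal sketch of starting one node of a 3-node DLedger group
DLedgerConfig config = new DLedgerConfig();
config.setGroup("example-group");
config.setSelfId("n0");
// peers format: id1-host1:port1;id2-host2:port2;...
config.setPeers("n0-127.0.0.1:20911;n1-127.0.0.1:20912;n2-127.0.0.1:20913");
config.setStoreBaseDir("/tmp/dledger-n0");

DLedgerServer server = new DLedgerServer(config);
server.startup(); // boots the store, the rpc service, the elector and the entry pusher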
Processors are then registered to handle the different request codes:
// As noted earlier, the real request-handling logic lives in the registered NettyRequestProcessors
private void registerProcessor(NettyRemotingServer remotingServer, NettyRequestProcessor protocolProcessor) {
    // Fetch node metadata (MemberState)
    remotingServer.registerProcessor(DLedgerRequestCode.METADATA.getCode(), protocolProcessor, null);
    // Append entries
    remotingServer.registerProcessor(DLedgerRequestCode.APPEND.getCode(), protocolProcessor, null);
    // Get entries
    remotingServer.registerProcessor(DLedgerRequestCode.GET.getCode(), protocolProcessor, null);
    // Pull entries
    remotingServer.registerProcessor(DLedgerRequestCode.PULL.getCode(), protocolProcessor, null);
    // Push entries
    remotingServer.registerProcessor(DLedgerRequestCode.PUSH.getCode(), protocolProcessor, null);
    // Vote
    remotingServer.registerProcessor(DLedgerRequestCode.VOTE.getCode(), protocolProcessor, null);
    // Heartbeat
    remotingServer.registerProcessor(DLedgerRequestCode.HEART_BEAT.getCode(), protocolProcessor, null);
    // Leadership transfer
    remotingServer.registerProcessor(DLedgerRequestCode.LEADERSHIP_TRANSFER.getCode(), protocolProcessor, null);
}
2.5.2 Sending requests
DLedger wraps each kind of outbound request in its own method:
// Send a heartbeat request
public CompletableFuture<HeartBeatResponse> heartBeat(HeartBeatRequest request) {}
// Send a vote request
public CompletableFuture<VoteResponse> vote(VoteRequest request) {}
// Send a get-entries request
public CompletableFuture<GetEntriesResponse> get(GetEntriesRequest request) throws Exception {}
// Send an append-entry request
public CompletableFuture<AppendEntryResponse> append(AppendEntryRequest request) throws Exception {}
// Send a pull-entries request
public CompletableFuture<PullEntriesResponse> pull(PullEntriesRequest request) throws Exception {}
// Send a push-entry request
public CompletableFuture<PushEntryResponse> push(PushEntryRequest request) throws Exception {}
// Send a leadership-transfer request
public CompletableFuture<LeadershipTransferResponse> leadershipTransfer(
    LeadershipTransferRequest request) throws Exception {}
2.5.3 Receiving requests and responding
The RPC service is really just a wrapper around Netty networking; the actual logic lives in other components, e.g. election in DLedgerLeaderElector and log replication in DLedgerEntryPusher.
// Dispatches each kind of request to the matching DLedgerServer component
public RemotingCommand processRequest(ChannelHandlerContext ctx, RemotingCommand request) throws Exception {}
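The elided body is essentially a switch on DLedgerRequestCode that delegates to the handler methods from section 2.1 and completes the response asynchronously on futureExecutor. A simplified sketch of its shape (error handling omitted; writeResponse and the exact deserialization should be treated as illustrative):

public RemotingCommand processRequest(ChannelHandlerContext ctx, RemotingCommand request) throws Exception {
    DLedgerRequestCode requestCode = DLedgerRequestCode.valueOf(request.getCode());
    switch (requestCode) {
        case VOTE: {
            VoteRequest voteRequest = JSON.parseObject(request.getBody(), VoteRequest.class);
            CompletableFuture<VoteResponse> future = dLedger.handleVote(voteRequest);
            future.whenCompleteAsync((x, y) -> writeResponse(x, y, request, ctx), futureExecutor);
            break;
        }
        case APPEND: {
            AppendEntryRequest appendRequest = JSON.parseObject(request.getBody(), AppendEntryRequest.class);
            CompletableFuture<AppendEntryResponse> future = dLedger.handleAppend(appendRequest);
            future.whenCompleteAsync((x, y) -> writeResponse(x, y, request, ctx), futureExecutor);
            break;
        }
        // HEART_BEAT, PULL, PUSH, GET, METADATA and LEADERSHIP_TRANSFER follow the same pattern
        default:
            logger.error("Unknown request code {}", request.getCode());
            break;
    }
    return null; // responses are written back asynchronously
}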
2.6 Election component - DLedgerLeaderElector
This is the component DLedger uses for leader election; it is covered in detail in 深度解析RocketMq源码-高可用存储组件(三)Dledger框架选举流程-CSDN博客.
2.7 Log replication component - DLedgerEntryPusher
This is the component DLedger uses for log replication; it is covered in detail in 深度解析RocketMq源码-高可用存储组件(四)Dledger框架日志同步流程-CSDN博客.
2.8 State machine - StateMachineCaller
StateMachineCaller runs a service thread that keeps pulling apply tasks off an internal queue and dispatching them by type:
@Override
public void run() {
    while (!this.isStopped()) {
        try {
            final ApplyTask task = this.taskQueue.poll(5, TimeUnit.SECONDS);
            if (task != null) {
                switch (task.type) {
                    case COMMITTED:
                        // Apply committed entries to the state machine
                        doCommitted(task.committedIndex);
                        break;
                    case SNAPSHOT_SAVE:
                        // Hook method, implemented by the client of the framework
                        doSnapshotSave(task.cb);
                        break;
                    case SNAPSHOT_LOAD:
                        // Load a snapshot; also a hook implemented by the client
                        doSnapshotLoad(task.cb);
                        break;
                }
            }
        } catch (final InterruptedException e) {
            logger.error("Error happen in {} when pull task from task queue", getServiceName(), e);
        } catch (Throwable e) {
            logger.error("Apply task exception", e);
        }
    }
}
private void doCommitted(final long committedIndex) {
    // Index of the last entry already applied to the state machine
    final long lastAppliedIndex = this.lastAppliedIndex.get();
    if (lastAppliedIndex >= committedIndex) {
        return;
    }
    final CommittedEntryIterator iter = new CommittedEntryIterator(this.dLedgerStore, committedIndex, this.applyingIndex, lastAppliedIndex, this.completeEntryCallback);
    // Walk every entry that has been committed but not yet applied to the state machine
    while (iter.hasNext()) {
        // Invoke the state machine's apply callback
        this.statemachine.onApply(iter);
    }
    final long lastIndex = iter.getIndex();
    this.lastAppliedIndex.set(lastIndex);
    // Record the term and index last applied to the state machine
    final DLedgerEntry dLedgerEntry = this.dLedgerStore.get(lastIndex);
    if (dLedgerEntry != null) {
        this.lastAppliedTerm = dLedgerEntry.getTerm();
    }
    // Check response timeout.
    if (iter.getCompleteAckNums() == 0) {
        if (this.entryPusher != null) {
            this.entryPusher.checkResponseFuturesTimeout(this.lastAppliedIndex.get() + 1);
        }
    }
}
As you can see, once an entry is committed the framework simply calls the state machine's apply method; the concrete apply logic is left to the client that embeds the framework.
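As a rough sketch of what such a client implementation might look like: the onApply(CommittedEntryIterator) signature is taken from the call in doCommitted above, while the class itself and the counter semantics are purely illustrative.

// Hypothetical client state machine: it applies each committed entry by adding
// the number in the entry body to an in-memory counter. Only onApply is shown;
// the snapshot callbacks of the StateMachine interface are omitted for brevity.
public class CounterStateMachine implements StateMachine {
    private final AtomicLong value = new AtomicLong(0);

    @Override
    public void onApply(CommittedEntryIterator iter) {
        // The caller (doCommitted above) hands us every committed-but-unapplied entry in order
        while (iter.hasNext()) {
            DLedgerEntry entry = iter.next();
            long delta = Long.parseLong(new String(entry.getBody()));
            value.addAndGet(delta); // applying the command mutates this node's state
        }
    }
    // ... snapshot save/load hooks omitted
}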
3. Methods of DLedgerServer
3.1 Handling votes
// Handles a vote request; the real work is delegated to dLedgerLeaderElector.handleVote
public CompletableFuture<VoteResponse> handleVote(VoteRequest request) throws Exception {
    try {
        PreConditions.check(memberState.getSelfId().equals(request.getRemoteId()), DLedgerResponseCode.UNKNOWN_MEMBER, "%s != %s", request.getRemoteId(), memberState.getSelfId());
        PreConditions.check(memberState.getGroup().equals(request.getGroup()), DLedgerResponseCode.UNKNOWN_GROUP, "%s != %s", request.getGroup(), memberState.getGroup());
        return dLedgerLeaderElector.handleVote(request, false);
    } catch (DLedgerException e) {
        logger.error("[{}][HandleVote] failed", memberState.getSelfId(), e);
        VoteResponse response = new VoteResponse();
        response.copyBaseInfo(request);
        response.setCode(e.getCode().getCode());
        response.setLeaderId(memberState.getLeaderId());
        return CompletableFuture.completedFuture(response);
    }
}
3.2 Handling appends
// Handles an append: dLedgerStore appends the entry as leader, then dLedgerEntryPusher
// waits until more than half of the followers have acknowledged the replication
public CompletableFuture<AppendEntryResponse> handleAppend(AppendEntryRequest request) throws IOException {
    DLedgerEntry dLedgerEntry = new DLedgerEntry();
    dLedgerEntry.setBody(request.getBody());
    DLedgerEntry resEntry = dLedgerStore.appendAsLeader(dLedgerEntry);
    return dLedgerEntryPusher.waitAck(resEntry, false);
}
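From the client side, appends typically go through the DLedgerClient class that ships with the framework. A minimal sketch, assuming the 0.3.x constructor and append(byte[]) signature (verify against your version):

// Minimal client sketch: append one entry to the group
DLedgerClient client = new DLedgerClient("example-group",
    "n0-127.0.0.1:20911;n1-127.0.0.1:20912;n2-127.0.0.1:20913");
client.startup();
// The call returns once waitAck's future completes, i.e. once a majority
// of the cluster has stored the entry, so a success code means it is committed
AppendEntryResponse response = client.append("hello dledger".getBytes());
System.out.println("code=" + response.getCode() + ", index=" + response.getIndex());
client.shutdown();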
4. Summary
This article covered the overall composition of the DLedger framework and the role of each component. Later articles will walk through DLedger's voting and log replication flows in detail.