jraft Source Code Study Notes

lljksven

Last modified: 2023-02-09 19:39:53

Tags: java

First published: 2023-01-09 14:43:35

Article link: https://blog.csdn.net/lljksven/article/details/128614352

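Documentation

Consistent-read optimization

Lease Read

Following the documentation and comparing against the latest GitHub example, build a demo

Startup source code analysis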

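new RaftGroupService
Create the NodeImpl; the node initializes its timers:
JRaft-VoteTimer: vote RepeatedTimer (presumably fires when a vote round times out, then starts pre-vote again)
JRaft-ElectionTimer: election RepeatedTimer (presumably fires when the leader's heartbeat times out, then starts pre-vote)
JRaft-StepDownTimer: step-down timer
JRaft-SnapshotTimer: snapshot RepeatedTimer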

this.applyDisruptor.handleEventsWith(new LogEntryAndClosureHandler());

Check the data consistency of the log: logManager.checkConsistency
this.logManager.appendEntries(entries, bootstrapDone);

ReplicatorGroupImpl: replication

During initialization the node steps down, which restarts the election timer.
After a heartbeat timeout:
NodeImpl.handleElectionTimeout -> .preVote (this.currTerm + 1, lastLogId.getIndex(), lastLogId.getTerm())
-> OnPreVoteRpcDone.run -> .handlePreVoteResponse -> Ballot.grant (if a majority approves the pre-vote) -> NodeImpl.electSelf
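
The majority check behind Ballot.grant can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not JRaft's actual Ballot class: the name SimpleBallot and its methods are hypothetical, peers are plain strings, and joint-consensus (old configuration) handling is left out.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for a ballot; not the real JRaft class.
final class SimpleBallot {
    private final Set<String> pending; // peers that have not granted yet
    private int quorum;                // grants still needed for a majority

    SimpleBallot(List<String> peers) {
        this.pending = new HashSet<>(peers);
        this.quorum = peers.size() / 2 + 1; // majority of the configured peers
    }

    // Record a grant from a peer; duplicate or unknown peers are ignored.
    void grant(String peerId) {
        if (pending.remove(peerId)) {
            quorum--;
        }
    }

    // True once a majority of peers has granted.
    boolean isGranted() {
        return quorum <= 0;
    }
}
```

With three peers the quorum is 2, so the node's own grant plus one more peer is enough to move from pre-vote to electSelf.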

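Formal election
.electSelf (become candidate, currTerm++) -> .handleRequestVoteResponse -> Ballot.grant (if the response term is greater than the local term, step down and restart the election timer; if a majority grants the vote) -> .becomeLeader
-> electSelf also starts the vote timer; if the vote times out, by default the node steps down and goes back to heartbeat detection
.becomeLeader -> become leader -> configure the replicatorGroup (add followers and learners) -> reset the ballotBox -> start the stepDownTimer
-> .handleStepDownTimeout -> .checkDeadNodes (no step-down while holding the read lock; under the write lock the node steps down so a new election can happen) -> .checkDeadNodes0: the number of alive nodes must be at least (number of peers / 2 + 1)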
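
The alive-count condition at the end of that chain can be illustrated with a small helper. This is only a sketch under assumptions: DeadNodeCheck, hasAliveMajority, and the timestamp map are hypothetical names, and the real checkDeadNodes0 works on JRaft's own configuration and replicator types.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper; not JRaft's NodeImpl.checkDeadNodes0.
final class DeadNodeCheck {
    // The leader counts itself as alive; a follower counts as alive if its
    // last successful RPC is no older than timeoutMs.
    static boolean hasAliveMajority(List<String> peers, String self,
                                    Map<String, Long> lastRpcMs,
                                    long nowMs, long timeoutMs) {
        int alive = 0;
        for (String peer : peers) {
            if (peer.equals(self)) {
                alive++;
                continue;
            }
            Long last = lastRpcMs.get(peer);
            if (last != null && nowMs - last <= timeoutMs) {
                alive++;
            }
        }
        // Same threshold as in the notes: peers / 2 + 1.
        return alive >= peers.size() / 2 + 1;
    }
}
```

When this returns false the leader can no longer reach a majority, which is the situation in which it steps down.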

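On the server side, non-leader nodes handle vote, log replication, snapshot, and similar requests through the RaftServerService interface, which is implemented by NodeImpl
NodeImpl.handlePreVoteRequest
If the requesting node's term is lower than the local term, or the local node's leader is still within its first election timeout window, a rejection is returned
If the remote term is greater than the local term, the vote is granted; if the remote term equals the local term, the vote is granted when the remote index is greater than or equal to the local index

NodeImpl.handleRequestVoteRequest
If the remote term is greater than the local term, the local node steps down and restarts the heartbeat-monitoring timer
If the term and index meet the requirements and the current votedId is empty, the local node steps down, restarts the heartbeat-monitoring timer, and saves the vote
If the saved vote matches the remote peerId and the remote and local terms are equal, a grant is returned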

.handleAppendEntriesRequest

Other APIs

stampedLock
Batch Disruptor Ring Buffer MPSC
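Replication pipeline: pipelined replication
RocksDB
There are two kinds of network partition: asymmetric (handled with the local timestamp of the latest communication from the leader) and symmetric.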

Ballot: a vote
quorum: the required number of voters
step down: give way, resign from office
Disruptor: literally "interrupter"
probe: to explore, to sound out

https://www.jianshu.com/c/263f8f0fa8de
https://www.cnblogs.com/luozhiyun/category/1560442.html
https://cloud.tencent.com/developer/column/80143
https://www.yuque.com/huarou/gd4szw/yt6z35
http://t.zoukankan.com/luozhiyun-p-12005975.html

Basics

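1 Strong leader: based on the number of peers in the startup configuration, a node becomes leader only when n/2 + 1 nodes agree (Ballot#init, Ballot#isGranted)
groupId is the group ID; serverId is the current node's ip:port; peer is the same as serverId; lastLogTerm is the term of the latest log entry; lastLogIndex is the index of the latest log entry, where the index increases continuously and is not reset when the term changes; votedId is the peerId the current node voted for
remote.lastLogId >= local.lastLogId means: (remote.lastLogTerm > local.lastLogTerm) || (remote.lastLogTerm == local.lastLogTerm && remote.lastLogIndex >= local.lastLogIndex)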
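
The lastLogId comparison above translates directly into a comparator: term first, index as the tie-breaker. The class below is a minimal sketch; SimpleLogId and isAtLeastAsUpToDateAs are hypothetical names chosen for illustration, not JRaft's LogId API.

```java
// Hypothetical value type illustrating the lastLogId ordering from the notes.
final class SimpleLogId implements Comparable<SimpleLogId> {
    final long index;
    final long term;

    SimpleLogId(long index, long term) {
        this.index = index;
        this.term = term;
    }

    // Order by term first; only when the terms are equal does the index decide.
    @Override
    public int compareTo(SimpleLogId other) {
        int byTerm = Long.compare(this.term, other.term);
        return byTerm != 0 ? byTerm : Long.compare(this.index, other.index);
    }

    // remote.lastLogId >= local.lastLogId in the sense used by the vote checks.
    boolean isAtLeastAsUpToDateAs(SimpleLogId local) {
        return compareTo(local) >= 0;
    }
}
```

Note that a log with a higher term wins even if its index is smaller, so (index 5, term 2) is considered newer than (index 10, term 1).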

Election

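0 SOFAJRaft election mechanism analysis
1 Pre-vote: the initiator sends preVote#RequestVoteRequest carrying groupId, serverId, peer, lastLogIndex, lastLogTerm; the term in the request is term+1, but the node's own term attribute stays unchanged
The receiver handles it via RequestVoteRequestProcessor->nodeImpl#handlePreVoteRequest; it accepts the pre-vote when launch.term>=receive.term&&launch.lastLogId>=receive.lastLogId. When launch.term<receive.term&&receive.isLeader(), the receiver refreshes and reactivates launch in the leader's replicator list. The receiver rejects the pre-vote while it is still within (last leader heartbeat + timeout)
Callback OnPreVoteRpcDone->handlePreVoteResponse: if the number of nodes accepting the pre-vote reaches the quorum, the formal vote starts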
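
The receiver-side pre-vote rules can be condensed into one decision function. This is a sketch under stated assumptions rather than the real handlePreVoteRequest: PreVoteDecision, shouldGrant, and all parameter names are hypothetical, and the leader lease is modeled simply as "a heartbeat arrived less than one election timeout ago".

```java
// Hypothetical decision helper; it never mutates any term, which mirrors the
// point that a pre-vote leaves the node's own term unchanged.
final class PreVoteDecision {
    static boolean shouldGrant(long reqTerm, long reqLastLogTerm, long reqLastLogIndex,
                               long localTerm, long localLastLogTerm, long localLastLogIndex,
                               long lastLeaderHeartbeatMs, long nowMs, long electionTimeoutMs) {
        // Still hearing from a leader: reject so a partitioned node cannot depose it.
        if (nowMs - lastLeaderHeartbeatMs < electionTimeoutMs) {
            return false;
        }
        // Stale requester.
        if (reqTerm < localTerm) {
            return false;
        }
        // launch.lastLogId >= receive.lastLogId: term first, then index.
        return reqLastLogTerm > localLastLogTerm
            || (reqLastLogTerm == localLastLogTerm && reqLastLogIndex >= localLastLogIndex);
    }
}
```

The initiator would pass its term+1 as reqTerm; only after a quorum of such grants does it actually increment its term in the formal vote.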

2 Formal vote: the initiator sends electSelf#RequestVoteRequest, its state becomes candidate, term++, and the request again carries groupId, serverId, peer, lastLogIndex, lastLogTerm;
The receiver handles it via RequestVoteRequestProcessor->nodeImpl#handleRequestVoteRequest; when launch.term>receive.term, receive steps down and updates its term to match. When launch.term==receive.term&&launch.lastLogId>=receive.lastLogId&&receive.voteId.isNull, it grants the vote and updates the receive node's voteId
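
Unlike the pre-vote, the formal vote changes the receiver's state. The sketch below shows that state change under simplifying assumptions: VoterState and its fields are hypothetical names, stepping down is reduced to adopting the higher term and clearing the vote, and persistence of the vote is omitted.

```java
// Hypothetical receiver state for a formal vote; not JRaft's NodeImpl.
final class VoterState {
    long currTerm;
    String votedId; // null means no vote cast in currTerm
    final long lastLogTerm;
    final long lastLogIndex;

    VoterState(long currTerm, long lastLogTerm, long lastLogIndex) {
        this.currTerm = currTerm;
        this.lastLogTerm = lastLogTerm;
        this.lastLogIndex = lastLogIndex;
    }

    boolean handleRequestVote(String candidateId, long reqTerm,
                              long reqLastLogTerm, long reqLastLogIndex) {
        if (reqTerm < currTerm) {
            return false; // stale candidate
        }
        if (reqTerm > currTerm) {
            // "Step down": adopt the higher term and forget the old vote.
            currTerm = reqTerm;
            votedId = null;
        }
        boolean logOk = reqLastLogTerm > lastLogTerm
            || (reqLastLogTerm == lastLogTerm && reqLastLogIndex >= lastLogIndex);
        if (logOk && votedId == null) {
            votedId = candidateId; // at most one vote per term
        }
        // Granted only if the recorded vote belongs to this candidate.
        return candidateId.equals(votedId);
    }
}
```

A repeated request from the same candidate in the same term is granted again, while a second candidate in that term is refused.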

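3 (Symmetric network partition) Suppose nodes A, B, C with A as leader. When B loses connectivity to both A and C, B's election timer keeps firing and it starts votes. If it went straight to a formal vote with term+1 and kept retrying, the result would be a heavily inflated term that, once the network recovers, kicks out the leader. Pre-vote addresses this scenario: the term does not rise and the leader does not switch.
4 (Asymmetric network partition) Suppose nodes A, B, C with A as leader. B cannot reach A but can reach C. B's pre-vote must not be allowed to kick out leader A; otherwise, when the lastLogId of A, B, and C is identical, leadership would switch back and forth in a loop. For this reason receive rejects pre-votes while it is still within (last leader heartbeat + timeout)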

Log replication

1 Replicator#sendEmptyEntries(false, null): probe; the callback is Replicator#onRpcReturned -> onAppendEntriesReturned -> BallotBox#commitAt

2 Replicator#sendEmptyEntries(true, heartBeatClosure): heartbeat; the heartBeatClosure callback is Replicator#onHeartbeatReturned -> Replicator#startHeartbeatTimer -> Replicator#onTimeout -> ThreadId#setError -> Replicator#OnError -> Replicator#sendHeartbeat (multi-threaded). In addition, nodeImpl#readLeader (consistent read, ReadOnlySafe) also sends heartbeats, with ReadIndexHeartbeatResponseClosure as the callback

3 Time wheel HashedWheelTimer: it is implemented as an array whose elements are doubly linked lists, and each tick is driven by Thread.sleep()
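
To make the time-wheel structure concrete, here is a minimal single-threaded sketch: an array of linked-list buckets, a tick counter, and a loop that sleeps one tick at a time. TinyWheelTimer and its methods are hypothetical names; it is not the JRaft HashedWheelTimer and it leaves out the worker thread, cancellation, and all concurrency handling a real timer needs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical, single-threaded illustration of a hashed time wheel.
final class TinyWheelTimer {
    private static final class Timeout {
        final Runnable task;
        long remainingRounds; // full wheel revolutions left before firing

        Timeout(Runnable task, long remainingRounds) {
            this.task = task;
            this.remainingRounds = remainingRounds;
        }
    }

    private final LinkedList<Timeout>[] wheel; // array of doubly linked lists
    private final long tickMs;
    private long tick; // index of the next tick to process

    @SuppressWarnings({"unchecked", "rawtypes"})
    TinyWheelTimer(int wheelSize, long tickMs) {
        this.wheel = new LinkedList[wheelSize];
        for (int i = 0; i < wheelSize; i++) {
            wheel[i] = new LinkedList<>();
        }
        this.tickMs = tickMs;
    }

    // Schedule a task to fire on the delayTicks-th upcoming tick (minimum 1).
    void schedule(Runnable task, long delayTicks) {
        long target = tick + Math.max(1, delayTicks) - 1;
        long rounds = (target - tick) / wheel.length;
        wheel[(int) (target % wheel.length)].add(new Timeout(task, rounds));
    }

    // Process one tick: fire due entries in the current bucket, age the rest.
    void advance() {
        LinkedList<Timeout> bucket = wheel[(int) (tick % wheel.length)];
        List<Runnable> due = new ArrayList<>();
        for (Iterator<Timeout> it = bucket.iterator(); it.hasNext();) {
            Timeout t = it.next();
            if (t.remainingRounds <= 0) {
                it.remove();
                due.add(t.task);
            } else {
                t.remainingRounds--;
            }
        }
        tick++;
        // Run after iteration so a task may safely schedule new timeouts.
        for (Runnable r : due) {
            r.run();
        }
    }

    // Drive the wheel with Thread.sleep, one sleep per tick.
    void run(int ticks) throws InterruptedException {
        for (int i = 0; i < ticks; i++) {
            Thread.sleep(tickMs);
            advance();
        }
    }
}
```

Scheduling and firing cost is independent of the total number of timeouts, at the price of a resolution no finer than one tick; that trade-off is the usual reason to pick a time wheel for large numbers of coarse timers.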