MasterElection: voting
Earlier we saw that MasterElection is the task used for elections. Let's look at its run method:
@Override
public void run() {
    try {
        // Do nothing until the peer set is ready
        if (!peers.isReady()) {
            return;
        }
        // Get the local node
        RaftPeer local = peers.local();
        // Count the leader timeout down by one tick (500 ms)
        local.leaderDueMs -= GlobalExecutor.TICK_PERIOD_MS;
        // The leader timeout has not expired yet
        if (local.leaderDueMs > 0) {
            return;
        }
        // Reset the leader timeout and the heartbeat timeout before soliciting votes
        local.resetLeaderDue();
        local.resetHeartbeatDue();
        // Start the vote
        sendVote();
    } catch (Exception e) {
        Loggers.RAFT.warn("[RAFT] error while master election {}", e);
    }
}
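The task runs every 500 ms, and each run subtracts 500 ms from leaderDueMs and then checks whether it has expired. While a leader exists, it sends heartbeats periodically, and a follower resets its leaderDueMs every time it receives one. Only when there is no leader does leaderDueMs keep shrinking; once it drops to 0 or below on some node (the code returns early only while leaderDueMs > 0), that node starts a vote.
Let's look at the sendVote method: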
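The countdown described above can be sketched in isolation. This is a minimal illustration, not the Nacos implementation: the class ElectionTimerSketch, its methods, and the reset value passed in by the caller are all hypothetical, and only the 500 ms tick and the `> 0` check are taken from the code shown earlier.

```java
// Hypothetical stand-in for the leaderDueMs countdown; not a Nacos class.
class ElectionTimerSketch {
    // Tick period taken from the article (GlobalExecutor.TICK_PERIOD_MS = 500 ms)
    static final long TICK_PERIOD_MS = 500L;

    long leaderDueMs;

    ElectionTimerSketch(long initialDueMs) {
        this.leaderDueMs = initialDueMs;
    }

    /**
     * Runs one tick. Returns true when the timeout has expired and a vote
     * should start; in that case the timer is reset to resetValueMs
     * (an assumed value supplied by the caller).
     */
    boolean tick(long resetValueMs) {
        leaderDueMs -= TICK_PERIOD_MS;
        if (leaderDueMs > 0) {
            return false; // not expired yet
        }
        leaderDueMs = resetValueMs;
        return true;
    }

    /** A heartbeat from the leader pushes the election timeout back. */
    void onHeartbeat(long resetValueMs) {
        leaderDueMs = resetValueMs;
    }

    public static void main(String[] args) {
        ElectionTimerSketch timer = new ElectionTimerSketch(1500L);
        for (int i = 1; i <= 3; i++) {
            System.out.println("tick " + i + " -> start vote: " + timer.tick(1500L));
        }
    }
}
```

As long as onHeartbeat is called more often than the timeout elapses, tick never returns true, which is why followers of a healthy leader never start an election.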
private void sendVote() {
    RaftPeer local = peers.get(NetUtils.localServer());
    Loggers.RAFT.info("leader timeout, start voting,leader: {}, term: {}", JacksonUtils.toJson(getLeader()),
            local.term);
    // Reset voteFor to null on every peer
    peers.reset();
    // Increment the term
    local.term.incrementAndGet();
    // Vote for itself
    local.voteFor = local.ip;
    // Become a candidate
    local.state = RaftPeer.State.CANDIDATE;
    Map<String, String> params = new HashMap<>(1);
    params.put("vote", JacksonUtils.toJson(local));
    // Send the vote request to every node except itself and handle the responses asynchronously
    for (final String server : peers.allServersWithoutMySelf()) {
        // /v1/ns/raft/vote
        final String url = buildUrl(server, API_VOTE);
        try {
            HttpClient.asyncHttpPost(url, null, params, new AsyncCompletionHandler<Integer>() {
                @Override
                public Integer onCompleted(Response response) throws Exception {
                    if (response.getStatusCode() != HttpURLConnection.HTTP_OK) {
                        Loggers.RAFT
                                .error("NACOS-RAFT vote failed: {}, url: {}", response.getResponseBody(), url);
                        return 1;
                    }
                    // Parse the peer information returned by the other node
                    RaftPeer peer = JacksonUtils.toObj(response.getResponseBody(), RaftPeer.class);
                    Loggers.RAFT.info("received approve from peer: {}", JacksonUtils.toJson(peer));
                    // Decide who the leader is
                    peers.decideLeader(peer);
                    return 0;
                }
            });
        } catch (Exception e) {
            Loggers.RAFT.warn("error while sending vote to server: {}", server);
        }
    }
}
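Since this node is the one starting the vote, it first votes for itself and marks itself as a CANDIDATE. It then sends a vote request to every other node in the cluster and decides who the leader is as the responses come in.
Let's look at the /v1/ns/raft/vote endpoint, which is handled by the vote method of RaftController: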
@PostMapping("/vote")
public JsonNode vote(HttpServletRequest request, HttpServletResponse response) throws Exception {
    RaftPeer peer = raftCore.receivedVote(JacksonUtils.toObj(WebUtils.required(request, "vote"), RaftPeer.class));
    return JacksonUtils.transferToJsonNode(peer);
}
It returns the receiving node's own peer, which records whom that node voted for. raftCore.receivedVote:
public synchronized RaftPeer receivedVote(RaftPeer remote) {
    // The requester is not a known peer
    if (!peers.contains(remote)) {
        throw new IllegalStateException("can not find peer: " + remote.ip);
    }
    // The local node
    RaftPeer local = peers.get(NetUtils.localServer());
    // If the requester's term is not greater than the local term, reject the request;
    // vote for itself only if the local node has not voted yet
    if (remote.term.get() <= local.term.get()) {
        String msg = "received illegitimate vote" + ", voter-term:" + remote.term + ", votee-term:" + local.term;
        Loggers.RAFT.info(msg);
        if (StringUtils.isEmpty(local.voteFor)) {
            local.voteFor = local.ip;
        }
        return local;
    }
    // Reset the leader timeout so the election countdown starts over
    local.resetLeaderDue();
    // Become a follower
    local.state = RaftPeer.State.FOLLOWER;
    // Vote for the requester
    local.voteFor = remote.ip;
    // Adopt the requester's term
    local.term.set(remote.term.get());
    Loggers.RAFT.info("vote {} as leader, term: {}", remote.ip, remote.term);
    // Return the local peer to tell the requester "I voted for you"
    return local;
}
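The node first looks up its own peer. If the requester's term is not greater than its own (the check is <=), it rejects the request and, if it has not voted yet, votes for itself. Otherwise it becomes a FOLLOWER, sets voteFor to the requester, adopts the requester's term, and returns its own peer.
Back on the initiator, let's see how a vote response is handled in peers.decideLeader(peer):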
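The voting rule can be reduced to a few lines. The sketch below is a simplified model under stated assumptions: VoteRuleSketch and its Peer class are hypothetical, terms are plain long values rather than AtomicLong, and state changes and timeout resets are left out so that only the term comparison remains.

```java
// Hypothetical, simplified model of the rule in receivedVote; not Nacos code.
class VoteRuleSketch {

    static final class Peer {
        final String ip;
        long term;
        String voteFor; // null means "has not voted yet"

        Peer(String ip, long term) {
            this.ip = ip;
            this.term = term;
        }
    }

    /** Applies the rule to the local peer and returns it, mirroring the shape of the original method. */
    static Peer receivedVote(Peer local, Peer remote) {
        // A requester whose term is not strictly greater is rejected
        if (remote.term <= local.term) {
            if (local.voteFor == null || local.voteFor.isEmpty()) {
                local.voteFor = local.ip;
            }
            return local;
        }
        // Otherwise vote for the requester and adopt its term
        local.voteFor = remote.ip;
        local.term = remote.term;
        return local;
    }

    public static void main(String[] args) {
        Peer local = new Peer("10.0.0.1", 3);
        Peer remote = new Peer("10.0.0.2", 4);
        System.out.println("voted for: " + receivedVote(local, remote).voteFor);
    }
}
```

Because the candidate increments its own term before sending the request, a node that has not yet timed out in the same round normally sees a strictly greater term and grants the vote.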
public RaftPeer decideLeader(RaftPeer candidate) {
    // Store the responding peer first
    peers.put(candidate.ip, candidate);
    SortedBag ips = new TreeBag();
    int maxApproveCount = 0;
    String maxApprovePeer = null;
    for (RaftPeer peer : peers.values()) {
        if (StringUtils.isEmpty(peer.voteFor)) {
            continue;
        }
        // Record whom this peer voted for
        ips.add(peer.voteFor);
        // Track the peer with the most votes so far
        if (ips.getCount(peer.voteFor) > maxApproveCount) {
            maxApproveCount = ips.getCount(peer.voteFor);
            maxApprovePeer = peer.voteFor;
        }
    }
    // The top vote count has reached the majority
    if (maxApproveCount >= majorityCount()) {
        // Mark the corresponding node as leader
        RaftPeer peer = peers.get(maxApprovePeer);
        peer.state = RaftPeer.State.LEADER;
        // Publish a notification only if the leader has changed
        if (!Objects.equals(leader, peer)) {
            leader = peer;
            // Election-finished event
            ApplicationUtils.publishEvent(new LeaderElectFinishedEvent(this, leader, local()));
            Loggers.RAFT.info("{} has become the LEADER", leader.ip);
        }
    }
    return leader;
}
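The method first tallies the votes to find the highest vote count and the ip it belongs to. It then checks whether that count has reached the majority (maxApproveCount >= majorityCount()). If so, the vote has succeeded and the corresponding peer is marked as leader. Finally it compares the previous leader with the new one; if they differ, the leader changed in this round, and a LeaderElectFinishedEvent is published.
RaftListener listens for this event: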
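The tally can be illustrated without the Commons Collections TreeBag. The following is a sketch only: MajorityTallySketch is a hypothetical class, a HashMap stands in for the bag, and majorityCount is assumed here to be clusterSize / 2 + 1, which the article does not show.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical tally helper; a HashMap replaces the TreeBag used in the original.
class MajorityTallySketch {

    // Assumption: a majority is more than half of the cluster
    static int majorityCount(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    /** Returns the ip that has reached a majority of votes, or null if nobody has yet. */
    static String decide(Collection<String> votes, int clusterSize) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String voteFor : votes) {
            if (voteFor == null || voteFor.isEmpty()) {
                continue; // this peer has not voted yet
            }
            int count = counts.merge(voteFor, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = voteFor;
            }
        }
        return bestCount >= majorityCount(clusterSize) ? best : null;
    }

    public static void main(String[] args) {
        // Three-node cluster: two votes for A, one peer has not answered yet
        System.out.println(decide(Arrays.asList("A", "A", null), 3));
    }
}
```

Since decideLeader runs once per response, the tally is recomputed as votes trickle in, and a leader is chosen as soon as one ip crosses the threshold.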
@Override
public void onApplicationEvent(ApplicationEvent event) {
    if (event instanceof BaseRaftEvent) {
        BaseRaftEvent raftEvent = (BaseRaftEvent) event;
        RaftPeer local = raftEvent.getLocal();
        String json = JacksonUtils.toJson(local);
        Map map = JacksonUtils.toObj(json, HashMap.class);
        Member self = memberManager.getSelf();
        self.setExtendVal(GROUP, map);
        memberManager.update(self);
    }
}
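This simply updates the extended attributes (extendInfo) of the local node's entry in the serverList.
Consider one question: what happens if the leader elected at this point is not the local node itself (think about when that can occur)? In practice this round of voting simply ends, and the state is corrected later. Because the cluster as a whole still has no leader, the other nodes will also start elections once their own leaderDueMs expires, until some node's election result names itself as leader. That node's heartbeat task then kicks in and notifies the other nodes.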
HeartBeat: the heartbeat task
The first thing to know is that only the leader actually sends heartbeats.
@Override
public void run() {
    try {
        if (!peers.isReady()) {
            return;
        }
        RaftPeer local = peers.local();
        // Count the heartbeat timeout down by one tick (500 ms)
        local.heartbeatDueMs -= GlobalExecutor.TICK_PERIOD_MS;
        // Not due yet, return
        if (local.heartbeatDueMs > 0) {
            return;
        }
        // Reset the heartbeat timeout
        local.resetHeartbeatDue();
        sendBeat();
    } catch (Exception e) {
        Loggers.RAFT.warn("[RAFT] error while sending beat {}", e);
    }
}
When a heartbeat is due, it is sent:
private void sendBeat() throws IOException, InterruptedException {
    RaftPeer local = peers.local();
    // Return if running in standalone mode or if the local node is not the leader
    if (ApplicationUtils.getStandaloneMode() || local.state != RaftPeer.State.LEADER) {
        return;
    }
    if (Loggers.RAFT.isDebugEnabled()) {
        Loggers.RAFT.debug("[RAFT] send beat with {} keys.", datums.size());
    }
    // Reset the leader timeout
    local.resetLeaderDue();
    // build data
    ObjectNode packet = JacksonUtils.createEmptyJsonNode();
    // Put the leader's peer information into the packet
    packet.replace("peer", JacksonUtils.transferToJsonNode(local));
    ArrayNode array = JacksonUtils.createEmptyArrayNode();
    if (switchDomain.isSendBeatOnly()) {
        Loggers.RAFT.info("[SEND-BEAT-ONLY] {}", String.valueOf(switchDomain.isSendBeatOnly()));
    }
    if (!switchDomain.isSendBeatOnly()) {
        // Also include the datum keys
        for (Datum datum : datums.values()) {
            ObjectNode element = JacksonUtils.createEmptyJsonNode();
            // Add the abbreviated key according to its type
            if (KeyBuilder.matchServiceMetaKey(datum.key)) {
                element.put("key", KeyBuilder.briefServiceMetaKey(datum.key));
            } else if (KeyBuilder.matchInstanceListKey(datum.key)) {
                element.put("key", KeyBuilder.briefInstanceListkey(datum.key));
            }
            element.put("timestamp", datum.timestamp.get());
            array.add(element);
        }
    }
    // Attach the datum keys and timestamps
    packet.replace("datums", array);
    // broadcast
    Map<String, String> params = new HashMap<String, String>(1);
    params.put("beat", JacksonUtils.toJson(packet));
    String content = JacksonUtils.toJson(params);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    GZIPOutputStream gzip = new GZIPOutputStream(out);
    gzip.write(content.getBytes(StandardCharsets.UTF_8));
    gzip.close();
    byte[] compressedBytes = out.toByteArray();
    String compressedContent = new String(compressedBytes, StandardCharsets.UTF_8);
    if (Loggers.RAFT.isDebugEnabled()) {
        Loggers.RAFT.debug("raw beat data size: {}, size of compressed data: {}", content.length(),
                compressedContent.length());
    }
    // Send to every node except itself
    for (final String server : peers.allServersWithoutMySelf()) {
        try {
            // /v1/ns/raft/beat
            final String url = buildUrl(server, API_BEAT);
            if (Loggers.RAFT.isDebugEnabled()) {
                Loggers.RAFT.debug("send beat to server " + server);
            }
            HttpClient.asyncHttpPostLarge(url, null, compressedBytes, new AsyncCompletionHandler<Integer>() {
                @Override
                public Integer onCompleted(Response response) throws Exception {
                    if (response.getStatusCode() != HttpURLConnection.HTTP_OK) {
                        Loggers.RAFT.error("NACOS-RAFT beat failed: {}, peer: {}", response.getResponseBody(),
                                server);
                        MetricsMonitor.getLeaderSendBeatFailedException().increment();
                        return 1;
                    }
                    // On a successful response, update the stored RaftPeer for that node
                    peers.update(JacksonUtils.toObj(response.getResponseBody(), RaftPeer.class));
                    if (Loggers.RAFT.isDebugEnabled()) {
                        Loggers.RAFT.debug("receive beat response from: {}", url);
                    }
                    return 0;
                }

                @Override
                public void onThrowable(Throwable t) {
                    Loggers.RAFT.error("NACOS-RAFT error while sending heart-beat to peer: {} {}", server, t);
                    MetricsMonitor.getLeaderSendBeatFailedException().increment();
                }
            });
        } catch (Exception e) {
            Loggers.RAFT.error("error while sending heart-beat to peer: {} {}", server, e);
            MetricsMonitor.getLeaderSendBeatFailedException().increment();
        }
    }
}
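First, only the leader sends heartbeats, and the packet it sends carries the leader's own peer information. In addition, when SendBeatOnly is false, the packet also carries the key and timestamp of every datum (the keys only, not the data itself).
The leader then sends a /v1/ns/raft/beat request to every node other than itself and, on receiving a response, updates the corresponding follower's peer information: peers.update(JacksonUtils.toObj(response.getResponseBody(), RaftPeer.class))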
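The gzip step applied to the heartbeat payload can be tried on its own. This is a minimal sketch using only the JDK: BeatCompressionSketch and its method names are hypothetical, and the decompress side is an assumption about what a receiver would need to do, since the receiving code is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper showing the gzip round trip of a JSON payload; not Nacos code.
class BeatCompressionSketch {

    /** Compresses a UTF-8 string the same way sendBeat does. */
    static byte[] compress(String content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed at this point, so the trailer has been written
        return out.toByteArray();
    }

    /** Reverses compress(); assumed to resemble what the receiving side does. */
    static String decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String payload = "{\"beat\":\"{}\"}";
        byte[] packed = compress(payload);
        System.out.println("round trip ok: " + payload.equals(decompress(packed)));
    }
}
```

One detail worth noting: decoding gzip output as UTF-8, as the debug log in sendBeat does, generally produces a string whose length differs from the byte count, so compressedBytes.length is the more reliable measure of the compressed size.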
public RaftPeer update(