Before the socket connection is established
leader.lead
The leader first loads its data from local storage:
zk.loadData();
It then creates and starts a LearnerCnxAcceptor thread:
cnxAcceptor = new LearnerCnxAcceptor();
cnxAcceptor.start();
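This thread accepts socket connections from all the other servers and blocks while waiting for them to connect.
follower.followLeader
The follower requests a connection to the leader.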
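The blocking accept described above can be illustrated with a minimal sketch. This is not the ZooKeeper code: AcceptorSketch, connectAsLearner, and awaitDone are hypothetical names, the port is chosen by the OS, and the sketch only counts connections, whereas the real acceptor would hand each socket to a handler.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the acceptor thread; not the ZooKeeper class.
final class AcceptorSketch extends Thread {
    private final ServerSocket serverSocket;
    private final int expected;
    final AtomicInteger accepted = new AtomicInteger();

    AcceptorSketch(int expected) {
        this.expected = expected;
        try {
            // Port 0 lets the OS pick a free port; bind to loopback only.
            this.serverSocket = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    @Override
    public void run() {
        try {
            while (accepted.get() < expected) {
                // accept() blocks until a learner connects.
                try (Socket learner = serverSocket.accept()) {
                    accepted.incrementAndGet();
                }
            }
        } catch (IOException e) {
            // The socket failed or was closed; stop accepting.
        } finally {
            try {
                serverSocket.close();
            } catch (IOException ignored) {
                // Nothing more to do on close failure.
            }
        }
    }

    // Follower side: open a connection to the leader's port, then close it.
    static void connectAsLearner(int port) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            // Connected; a real follower would now start the handshake.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Waits for the acceptor thread to finish; returns true if it has exited.
    boolean awaitDone(long millis) {
        try {
            join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !isAlive();
    }
}
```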
After the socket connection is established
follower
The follower then registers with the leader by calling registerWithLeader:
QuorumPacket qp = new QuorumPacket();
qp.setType(pktType);
qp.setZxid(ZxidUtils.makeZxid(self.getAcceptedEpoch(), 0));
/*
* Add sid to payload
*/
LearnerInfo li = new LearnerInfo(self.getMyId(), 0x10000, self.getQuorumVerifier().getVersion());
ByteArrayOutputStream bsid = new ByteArrayOutputStream();
BinaryOutputArchive boa = BinaryOutputArchive.getArchive(bsid);
boa.writeRecord(li, "LearnerInfo");
qp.setData(bsid.toByteArray());
writePacket(qp, true);
This sends the leader a packet carrying the follower's accepted epoch (encoded in the zxid) and its myid (sid).
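The way an epoch is carried inside a zxid can be sketched as follows. This is a standalone re-implementation written for illustration (ZxidSketch is a hypothetical name), assuming the usual layout of epoch in the high 32 bits and counter in the low 32 bits.

```java
// Hypothetical helper mirroring the epoch/counter layout of a zxid.
final class ZxidSketch {
    // Pack the epoch into the high 32 bits and the counter into the low 32 bits.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32L) | (counter & 0xffffffffL);
    }

    // Recover the epoch from the high 32 bits.
    static long getEpochFromZxid(long zxid) {
        return zxid >> 32L;
    }

    // Recover the counter from the low 32 bits.
    static long getCounterFromZxid(long zxid) {
        return zxid & 0xffffffffL;
    }
}
```

With this layout, the registration packet's zxid for accepted epoch 5 is makeZxid(5, 0), that is 0x500000000.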
leader
ia.readRecord(qp, "packet");
if (this.getVersion() < 0x10000) {
    // we are going to have to extrapolate the epoch information
    long epoch = ZxidUtils.getEpochFromZxid(zxid);
    ss = new StateSummary(epoch, zxid);
    // fake the message
    learnerMaster.waitForEpochAck(this.getSid(), ss);
} else {
    byte[] ver = new byte[4];
    ByteBuffer.wrap(ver).putInt(0x10000);
    QuorumPacket newEpochPacket = new QuorumPacket(Leader.LEADERINFO, newLeaderZxid, ver, null);
    oa.writeRecord(newEpochPacket, "packet");
    messageTracker.trackSent(Leader.LEADERINFO);
    bufferedOutput.flush();
    QuorumPacket ackEpochPacket = new QuorumPacket();
    ia.readRecord(ackEpochPacket, "packet");
    messageTracker.trackReceived(ackEpochPacket.getType());
    if (ackEpochPacket.getType() != Leader.ACKEPOCH) {
        LOG.error("{} is not ACKEPOCH", ackEpochPacket.toString());
        return;
    }
    ByteBuffer bbepoch = ByteBuffer.wrap(ackEpochPacket.getData());
    ss = new StateSummary(bbepoch.getInt(), ackEpochPacket.getZxid());
    learnerMaster.waitForEpochAck(this.getSid(), ss);
}
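The leader reads the packet, extracts the sid and the follower's old epoch from it, and uses them to build the new epoch. It sends the new epoch in a LEADERINFO packet and then waits for the follower's reply in ia.readRecord(ackEpochPacket, "packet").
follower
readPacket(qp) reads the leader's packet; the new epoch is taken from the packet's zxid and the leader's protocol version from its data.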
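How a new epoch could be derived from the old epochs reported by the servers can be sketched as below. This is a simplified, single-threaded illustration: EpochProposalSketch and offer are hypothetical names, and the blocking wait that a real leader would perform until a quorum has connected is replaced here by a return value of -1.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: propose an epoch one higher than any accepted epoch seen.
final class EpochProposalSketch {
    private final int half;
    private final Set<Long> connecting = new HashSet<>();
    private long epoch = -1;

    EpochProposalSketch(int ensembleSize) {
        this.half = ensembleSize / 2;
    }

    // Records one server's last accepted epoch.
    // Returns the proposed epoch once more than half have reported, else -1.
    long offer(long sid, long lastAcceptedEpoch) {
        if (lastAcceptedEpoch >= epoch) {
            epoch = lastAcceptedEpoch + 1;
        }
        connecting.add(sid);
        return connecting.size() > half ? epoch : -1;
    }
}
```

In a three-server ensemble, reports of epochs 4 and 6 from two servers are enough for a quorum and yield a proposed epoch of 7.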
if (qp.getType() == Leader.LEADERINFO) {
    // we are connected to a 1.0 server so accept the new epoch and read the next packet
    leaderProtocolVersion = ByteBuffer.wrap(qp.getData()).getInt();
    byte[] epochBytes = new byte[4];
    final ByteBuffer wrappedEpochBytes = ByteBuffer.wrap(epochBytes);
    if (newEpoch > self.getAcceptedEpoch()) {
        wrappedEpochBytes.putInt((int) self.getCurrentEpoch());
        self.setAcceptedEpoch(newEpoch);
    } else if (newEpoch == self.getAcceptedEpoch()) {
        // since we have already acked an epoch equal to the leaders, we cannot ack
        // again, but we still need to send our lastZxid to the leader so that we can
        // sync with it if it does assume leadership of the epoch.
        // the -1 indicates that this reply should not count as an ack for the new epoch
        wrappedEpochBytes.putInt(-1);
    } else {
        throw new IOException("Leaders epoch, "
            + newEpoch
            + " is less than accepted epoch, "
            + self.getAcceptedEpoch());
    }
    QuorumPacket ackNewEpoch = new QuorumPacket(Leader.ACKEPOCH, lastLoggedZxid, epochBytes, null);
    writePacket(ackNewEpoch, true);
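This sends the leader an ACKEPOCH packet carrying the follower's own current epoch (or -1 if it has already acknowledged this epoch) and its last logged zxid.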
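The 4-byte epoch payload of the ACKEPOCH packet can be illustrated with a small round trip through ByteBuffer. AckEpochSketch, encode, and decode are hypothetical names; the branch logic follows the excerpt above, with an IllegalStateException standing in for the IOException.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the ACKEPOCH payload: a 4-byte big-endian int.
final class AckEpochSketch {
    // Builds the payload the follower would send for the given epochs.
    static byte[] encode(long newEpoch, long acceptedEpoch, long currentEpoch) {
        byte[] epochBytes = new byte[4];
        ByteBuffer buf = ByteBuffer.wrap(epochBytes);
        if (newEpoch > acceptedEpoch) {
            // First ack for this epoch: report our current epoch.
            buf.putInt((int) currentEpoch);
        } else if (newEpoch == acceptedEpoch) {
            // Already acked this epoch: -1 means "do not count this as an ack".
            buf.putInt(-1);
        } else {
            throw new IllegalStateException(
                "Leaders epoch, " + newEpoch + " is less than accepted epoch, " + acceptedEpoch);
        }
        return epochBytes;
    }

    // Reads the epoch back the way the leader side does with getInt().
    static int decode(byte[] data) {
        return ByteBuffer.wrap(data).getInt();
    }
}
```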
leader
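The leader reads the packet and records the follower's acknowledgment; it proceeds once more than half of the servers have acknowledged: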
public boolean containsQuorum(Set<Long> ackSet) {
    return (ackSet.size() > half);
}
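A self-contained version of this majority check might look like the sketch below. MajoritySketch is a hypothetical name, and it assumes that half is the number of voting members divided by two using integer division.

```java
import java.util.Set;

// Hypothetical sketch of a simple-majority quorum check.
final class MajoritySketch {
    private final int half;

    MajoritySketch(int votingMembers) {
        // Integer division: 5 members give half = 2, 4 members give half = 2.
        this.half = votingMembers / 2;
    }

    // A quorum needs strictly more than half of the voting members.
    boolean containsQuorum(Set<Long> ackSet) {
        return ackSet.size() > half;
    }
}
```

Because the comparison is strict, an even-sized ensemble of 4 still needs 3 acknowledgments.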
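The leader then chooses the data synchronization mode in syncFollower:
If the follower's zxid equals the leader's, it chooses DIFF mode with nothing to send, because the follower is already in sync.
If the follower's zxid is greater than the leader's, it chooses TRUNC mode and sends the zxid of the leader's latest committedLog entry so that the follower rolls back to it.
If the follower's zxid is less than the leader's, it also chooses DIFF mode and sends the missing committedLog entries as proposals with COMMIT packets so that the follower catches up.
If the gap is too large, it chooses SNAP mode and sends a snapshot directly.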
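The decision above can be summarized in a simplified sketch. SyncModeSketch and choose are hypothetical names, and "gap too large" is assumed here to mean that the follower's zxid is older than the oldest entry in the committed log. The real syncFollower handles more cases (for example, reading older transactions from the on-disk log), which this sketch ignores.

```java
// Hypothetical, simplified sketch of choosing a sync mode for a follower.
final class SyncModeSketch {
    enum Mode { DIFF, TRUNC, SNAP }

    static Mode choose(long peerLastZxid, long minCommittedLog,
                       long maxCommittedLog, long lastProcessedZxid) {
        if (peerLastZxid == lastProcessedZxid) {
            // Already in sync: an empty DIFF is enough.
            return Mode.DIFF;
        }
        if (peerLastZxid > maxCommittedLog) {
            // Follower is ahead of the leader: roll back to maxCommittedLog.
            return Mode.TRUNC;
        }
        if (peerLastZxid >= minCommittedLog) {
            // Follower is behind but within the committed log: send the missing entries.
            return Mode.DIFF;
        }
        // Follower is too far behind: send a full snapshot.
        return Mode.SNAP;
    }
}
```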
follower
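The follower starts data synchronization in syncWithLeader:
For DIFF, no snapshot or rollback is needed; the follower only applies whatever proposals follow.
For SNAP, it performs a full synchronization directly from the snapshot.
For TRUNC, it rolls back.
Once the synchronization mode has been handled, it sends an ACK to the leader.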
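The follower-side handling of the three modes can be pictured with an in-memory stand-in for the transaction log. This is only an illustration: FollowerSyncSketch and its methods are hypothetical, and the log is reduced to a sorted set of zxids rather than real transactions or a data tree.

```java
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical sketch: the follower's log is modeled as a sorted set of zxids.
final class FollowerSyncSketch {
    final TreeSet<Long> log = new TreeSet<>();

    // DIFF: nothing to load or roll back; later proposals are appended one by one.
    void onDiff() {
        // Intentionally empty.
    }

    // Each proposal that follows a DIFF extends the log.
    void onProposal(long zxid) {
        log.add(zxid);
    }

    // TRUNC: drop every entry newer than the zxid named by the leader.
    void onTrunc(long truncZxid) {
        log.tailSet(truncZxid, false).clear();
    }

    // SNAP: discard local state and replace it with the leader's snapshot.
    void onSnap(Collection<Long> snapshot) {
        log.clear();
        log.addAll(snapshot);
    }
}
```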
leader
After receiving ACKs from more than half of the servers, the leader starts data synchronization and sends the data packets.