- QuorumPeer
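This class is the entry point for ZooKeeper Leader election. It creates the election algorithm, restores ZooKeeper data, and starts the Leader election.
- ZooKeeper server states: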
public enum ServerState {
LOOKING, FOLLOWING, LEADING, OBSERVING;
}
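1. LOOKING: the server has not yet found a Leader. Leader election runs only while the server is in this state.
2. FOLLOWING: the server's current role is Follower.
3. LEADING: the server's current role is Leader.
4. OBSERVING: the server's current role is Observer.
The run() method that drives Leader election: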
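To make the meaning of the four states concrete, here is a minimal sketch of how a server could move between them. `ToyPeer`, `onElectionFinished`, and `onLeaderLost` are hypothetical names invented for this illustration and are not part of ZooKeeper; the real transitions are spread across QuorumPeer and the election classes.

```java
// Copy of the four states from the enum above.
enum ServerState { LOOKING, FOLLOWING, LEADING, OBSERVING }

// Hypothetical toy model of one server; not ZooKeeper code.
final class ToyPeer {
    private final long myId;
    private final boolean observer; // true for a non-voting learner
    private ServerState state = ServerState.LOOKING; // every server starts by looking

    ToyPeer(long myId, boolean observer) {
        this.myId = myId;
        this.observer = observer;
    }

    ServerState getState() {
        return state;
    }

    // An election can only finish while the server is LOOKING.
    void onElectionFinished(long leaderId) {
        if (state != ServerState.LOOKING) {
            throw new IllegalStateException("election finished while " + state);
        }
        if (leaderId == myId) {
            state = ServerState.LEADING;
        } else if (observer) {
            state = ServerState.OBSERVING;
        } else {
            state = ServerState.FOLLOWING;
        }
    }

    // Losing the Leader (or losing leadership) sends the server back to LOOKING.
    void onLeaderLost() {
        state = ServerState.LOOKING;
    }
}
```

A peer created with `new ToyPeer(1L, false)` becomes LEADING after `onElectionFinished(1L)` and FOLLOWING after `onElectionFinished(2L)`; an observer ends up OBSERVING in the second case.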
@Override
public void run() {
updateThreadName();
LOG.debug("Starting quorum peer");
try {
jmxQuorumBean = new QuorumBean(this);
MBeanRegistry.getInstance().register(jmxQuorumBean, null);
for(QuorumServer s: getView().values()){
ZKMBeanInfo p;
if (getId() == s.id) {
p = jmxLocalPeerBean = new LocalPeerBean(this);
try {
MBeanRegistry.getInstance().register(p, jmxQuorumBean);
} catch (Exception e) {
LOG.warn("Failed to register with JMX", e);
jmxLocalPeerBean = null;
}
} else {
RemotePeerBean rBean = new RemotePeerBean(this, s);
try {
MBeanRegistry.getInstance().register(rBean, jmxQuorumBean);
jmxRemotePeerBean.put(s.id, rBean);
} catch (Exception e) {
LOG.warn("Failed to register with JMX", e);
}
}
}
} catch (Exception e) {
LOG.warn("Failed to register with JMX", e);
jmxQuorumBean = null;
}
try {
/*
* Main loop
*/
while (running) {
switch (getPeerState()) {
case LOOKING:
LOG.info("LOOKING");
ServerMetrics.getMetrics().LOOKING_COUNT.add(1);
if (Boolean.getBoolean("readonlymode.enabled")) {
LOG.info("Attempting to start ReadOnlyZooKeeperServer");
// Create read-only server but don't start it immediately
final ReadOnlyZooKeeperServer roZk =
new ReadOnlyZooKeeperServer(logFactory, this, this.zkDb);
// Instead of starting roZk immediately, wait some grace
// period before we decide we're partitioned.
//
// Thread is used here because otherwise it would require
// changes in each of election strategy classes which is
// unnecessary code coupling.
Thread roZkMgr = new Thread() {
public void run() {
try {
// lower-bound grace period to 2 secs
sleep(Math.max(2000, tickTime));
if (ServerState.LOOKING.equals(getPeerState())) {
roZk.startup();
}
} catch (InterruptedException e) {
LOG.info("Interrupted while attempting to start ReadOnlyZooKeeperServer, not started");
} catch (Exception e) {
LOG.error("FAILED to start ReadOnlyZooKeeperServer", e);
}
}
};
try {
roZkMgr.start();
reconfigFlagClear();
if (shuttingDownLE) {
shuttingDownLE = false;
startLeaderElection();
}
setCurrentVote(makeLEStrategy().lookForLeader());
} catch (Exception e) {
LOG.warn("Unexpected exception", e);
setPeerState(ServerState.LOOKING);
} finally {
// If the thread is in the grace period, interrupt
// to come out of waiting.
roZkMgr.interrupt();
roZk.shutdown();
}
} else {
try {
reconfigFlagClear();
if (shuttingDownLE) {
shuttingDownLE = false;
startLeaderElection();
}
setCurrentVote(makeLEStrategy().lookForLeader());
} catch (Exception e) {
LOG.warn("Unexpected exception", e);
setPeerState(ServerState.LOOKING);
}
}
break;
case OBSERVING:
try {
LOG.info("OBSERVING");
setObserver(makeObserver(logFactory));
observer.observeLeader();
} catch (Exception e) {
LOG.warn("Unexpected exception",e );
} finally {
observer.shutdown();
setObserver(null);
updateServerState();
// Add delay jitter before we switch to LOOKING
// state to reduce the load of ObserverMaster
if (isRunning()) {
Observer.waitForObserverElectionDelay();
}
}
break;
case FOLLOWING:
try {
LOG.info("FOLLOWING");
setFollower(makeFollower(logFactory));
follower.followLeader();
} catch (Exception e) {
LOG.warn("Unexpected exception",e);
} finally {
follower.shutdown();
setFollower(null);
updateServerState();
}
break;
case LEADING:
LOG.info("LEADING");
try {
setLeader(makeLeader(logFactory));
leader.lead();
setLeader(null);
} catch (Exception e) {
LOG.warn("Unexpected exception",e);
} finally {
if (leader != null) {
leader.shutdown("Forcing shutdown");
setLeader(null);
}
updateServerState();
}
break;
}
start_fle = Time.currentElapsedTime();
}
} finally {
LOG.warn("QuorumPeer main thread exited");
MBeanRegistry instance = MBeanRegistry.getInstance();
instance.unregister(jmxQuorumBean);
instance.unregister(jmxLocalPeerBean);
for (RemotePeerBean remotePeerBean : jmxRemotePeerBean.values()) {
instance.unregister(remotePeerBean);
}
jmxQuorumBean = null;
jmxLocalPeerBean = null;
jmxRemotePeerBean = null;
}
}
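The steps of this method, in detail:
1. Register the JMX beans related to Leader election.
2. Loop on the server state:
   - LOOKING:
      1. Record metrics (increment LOOKING_COUNT).
      2. Check whether read-only mode is enabled (the readonlymode.enabled system property).
      3. If it is enabled, create a read-only ZooKeeper server (ReadOnlyZooKeeperServer) without starting it.
      4. Start a thread that waits for a grace period of max(2000, tickTime) milliseconds. If the server is still LOOKING when the wait ends, the thread starts the read-only server so that clients can still be served reads.
      5. Run the Leader election and save the resulting vote. When the election returns, whether normally or with an exception, the finally block interrupts the waiting thread and shuts down the read-only server.
      6. If read-only mode is not enabled, run the Leader election directly and save the resulting vote.
   - OBSERVING:
      1. A Leader has been elected and this server's role is Observer.
      2. Create an Observer instance and cache it.
      3. Call observeLeader() on the Observer instance.
   - FOLLOWING:
      1. A Leader has been elected and this server's role is Follower.
      2. Create a Follower instance and cache it.
      3. Call followLeader() on the Follower instance.
   - LEADING:
      1. A Leader has been elected and this server is the Leader.
      2. Create a Leader instance and cache it.
      3. Call lead() on the Leader instance to start the Leader's work.
3. If the connection to the Leader is lost, the current role's service is shut down and updateServerState() normally moves the server back to LOOKING. A new round of Leader election may follow; if it was only a brief network interruption, the server continues to serve as a Follower once it receives messages again.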
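The grace-period thread used in the LOOKING branch is a general pattern: run a blocking task, and start a fallback only if the task is still running after a timeout. The sketch below isolates that pattern. `GracePeriodDemo` and `electWithFallback` are hypothetical names for this illustration, and unlike the real run() method the sketch does not shut the fallback down afterwards.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper illustrating the grace-period pattern; not ZooKeeper code.
final class GracePeriodDemo {

    // Runs `election` on the calling thread. If it is still running after
    // `graceMillis`, `fallback` is started on a helper thread.
    // Returns true if the fallback was started.
    static boolean electWithFallback(long graceMillis, Runnable election, Runnable fallback) {
        AtomicBoolean stillLooking = new AtomicBoolean(true);
        AtomicBoolean fallbackStarted = new AtomicBoolean(false);

        Thread timer = new Thread(() -> {
            try {
                Thread.sleep(graceMillis);
                // Only fall back if the election has not finished in the meantime.
                if (stillLooking.get()) {
                    fallbackStarted.set(true);
                    fallback.run();
                }
            } catch (InterruptedException e) {
                // The election finished during the grace period; nothing to start.
            }
        });

        timer.start();
        try {
            election.run();
        } finally {
            stillLooking.set(false);
            timer.interrupt(); // wake the timer if it is still waiting
            try {
                timer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return fallbackStarted.get();
    }
}
```

Using a separate thread keeps the timeout logic out of the election code itself, which is the same reason given in the comment inside run(): otherwise every election strategy class would need to know about read-only mode.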
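Setting the quorum verifier
The quorum verifier (QuorumVerifier) is set in two situations:
1. When the ZooKeeper server starts.
2. When the reconfig command is invoked to reload the configuration and start the service.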
- setQuorumVerifier(QuorumVerifier qv, boolean writeToDisk)
public QuorumVerifier setQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){
synchronized (QV_LOCK) {
if ((quorumVerifier != null) && (quorumVerifier.getVersion() >= qv.getVersion())) {
// this is normal. For example - server found out about new config through FastLeaderElection gossiping
// and then got the same config in UPTODATE message so its already known
LOG.debug(getId() + " setQuorumVerifier called with known or old config " + qv.getVersion() +
". Current version: " + quorumVerifier.getVersion());
return quorumVerifier;
}
QuorumVerifier prevQV = quorumVerifier;
quorumVerifier = qv;
if (lastSeenQuorumVerifier == null || (qv.getVersion() > lastSeenQuorumVerifier.getVersion()))
lastSeenQuorumVerifier = qv;
if (writeToDisk) {
// some tests initialize QuorumPeer without a static config file
if (configFilename != null) {
try {
String dynamicConfigFilename = makeDynamicConfigFilename(
qv.getVersion());
QuorumPeerConfig.writeDynamicConfig(
dynamicConfigFilename, qv, false);
QuorumPeerConfig.editStaticConfig(configFilename,
dynamicConfigFilename,
needEraseClientInfoFromStaticConfig());
} catch (IOException e) {
LOG.error("Error closing file: ", e.getMessage());
}
} else {
LOG.info("writeToDisk == true but configFilename == null");
}
}
if (qv.getVersion() == lastSeenQuorumVerifier.getVersion()) {
QuorumPeerConfig.deleteFile(getNextDynamicConfigFilename());
}
QuorumServer qs = qv.getAllMembers().get(getId());
if (qs != null) {
setAddrs(qs.addr, qs.electionAddr, qs.clientAddr);
}
updateObserverMasterList();
return prevQV;
}
}
private void updateObserverMasterList() {
if (observerMasterPort <= 0) {
return; // observer masters not enabled
}
observerMasters.clear();
StringBuilder sb = new StringBuilder();
for (QuorumServer server : quorumVerifier.getVotingMembers().values()) {
InetSocketAddress addr = new InetSocketAddress(server.addr.getAddress(), observerMasterPort);
observerMasters.add(new QuorumServer(server.id, addr));
sb.append(addr).append(",");
}
LOG.info("Updated learner master list to be {}", sb.toString());
Collections.shuffle(observerMasters);
// Reset the internal index of the observerMaster when
// the observerMaster List is refreshed
nextObserverMaster = 0;
}
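The method works as follows:
1. Block until the QV_LOCK object lock is acquired.
2. If a quorumVerifier is already cached and its version is greater than or equal to the version of qv, log this and return the cached verifier unchanged. Otherwise replace quorumVerifier with qv.
3. If lastSeenQuorumVerifier is null or its version is lower than that of qv, set lastSeenQuorumVerifier to qv as well.
4. If writeToDisk is true and a static config file name is known, write the dynamic config file and update the static config file to reference it.
5. If qv now has the same version as lastSeenQuorumVerifier, delete the pending "next" dynamic config file.
6. Reset this server's addresses from its entry in the new configuration: the address used between Leader and Followers, the election address, and the client address.
7. Refresh the observerMaster list and reset the observerMaster internal index.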
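The version checks described above can be sketched in isolation. This is a simplified model under stated assumptions: `Config` and `ConfigHolder` are hypothetical stand-ins for QuorumVerifier and QuorumPeer, and the disk writes, address updates, and observerMaster refresh are left out.

```java
// Hypothetical stand-in for a versioned configuration; not ZooKeeper code.
final class Config {
    final long version;

    Config(long version) {
        this.version = version;
    }
}

// Hypothetical holder that mimics only the version-guarded update.
final class ConfigHolder {
    private final Object lock = new Object(); // plays the role of QV_LOCK
    private Config committed;                 // plays the role of quorumVerifier
    private Config lastSeen;                  // plays the role of lastSeenQuorumVerifier

    // Returns the previous committed config when `next` is accepted,
    // or the current committed config when `next` is not newer.
    Config setCommitted(Config next) {
        synchronized (lock) {
            if (committed != null && committed.version >= next.version) {
                return committed; // known or old config: nothing changes
            }
            Config previous = committed;
            committed = next;
            if (lastSeen == null || next.version > lastSeen.version) {
                lastSeen = next;
            }
            return previous;
        }
    }

    Config getCommitted() {
        synchronized (lock) {
            return committed;
        }
    }

    Config getLastSeen() {
        synchronized (lock) {
            return lastSeen;
        }
    }
}
```

Note the two different return values: a caller gets the old config back after a successful update, but the current config when the update is rejected, so the return value alone does not say whether the update took effect.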
- setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk)
public void setLastSeenQuorumVerifier(QuorumVerifier qv, boolean writeToDisk){
// If qcm is non-null, we may call qcm.connectOne(), which will take the lock on qcm
// and then take QV_LOCK. Take the locks in the same order to ensure that we don't
// deadlock against other callers of connectOne(). If qcmRef gets set in another
// thread while we're inside the synchronized block, that does no harm; if we didn't
// take a lock on qcm (because it was null when we sampled it), we won't call
// connectOne() on it. (Use of an AtomicReference is enough to guarantee visibility
// of updates that provably happen in another thread before entering this method.)
QuorumCnxManager qcm = qcmRef.get();
Object outerLockObject = (qcm != null) ? qcm : QV_LOCK;
synchronized (outerLockObject) {
synchronized (QV_LOCK) {
if (lastSeenQuorumVerifier != null && lastSeenQuorumVerifier.getVersion() > qv.getVersion()) {
LOG.error("setLastSeenQuorumVerifier called with stale config " + qv.getVersion() +
". Current version: " + quorumVerifier.getVersion());
}
// assuming that a version uniquely identifies a configuration, so if
// version is the same, nothing to do here.
if (lastSeenQuorumVerifier != null &&
lastSeenQuorumVerifier.getVersion() == qv.getVersion()) {
return;
}
lastSeenQuorumVerifier = qv;
if (qcm != null) {
connectNewPeers(qcm);
}
if (writeToDisk) {
try {
String fileName = getNextDynamicConfigFilename();
if (fileName != null) {
QuorumPeerConfig.writeDynamicConfig(fileName, qv, true);
}
} catch (IOException e) {
LOG.error("Error writing next dynamic config file to disk: ", e.getMessage());
}
}
}
}
}
private void connectNewPeers(QuorumCnxManager qcm){
if (quorumVerifier != null && lastSeenQuorumVerifier != null) {
Map<Long, QuorumServer> committedView = quorumVerifier.getAllMembers();
for (Entry<Long, QuorumServer> e : lastSeenQuorumVerifier.getAllMembers().entrySet()) {
if (e.getKey() != getId() && !committedView.containsKey(e.getKey()))
qcm.connectOne(e.getKey());
}
}
}
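The steps above, analyzed:
1. If a QuorumCnxManager object (qcm) is cached, the method first takes the lock on qcm and then takes QV_LOCK. The reason is that the later call to QuorumCnxManager's connectOne method locks qcm; acquiring the locks in the same order everywhere avoids deadlock between threads. If qcm is null, only QV_LOCK is taken.
2. Check whether lastSeenQuorumVerifier exists and compare versions. An older version of qv is logged as an error, and an identical version returns immediately. Otherwise lastSeenQuorumVerifier is set to qv.
3. If qcm is cached, connect to the servers that appear in the new configuration but not in the committed one, excluding this server itself.
4. Depending on writeToDisk, write the next dynamic config file to disk.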
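The lock-ordering rule in step 1 can be shown with plain Java monitors. In this sketch `LockOrderDemo`, `manager`, and `connectOne` are hypothetical stand-ins for QuorumPeer, the QuorumCnxManager instance, and its connectOne method; the only point is that both code paths take the manager lock before the QV lock.

```java
// Hypothetical illustration of consistent lock ordering; not ZooKeeper code.
final class LockOrderDemo {
    private final Object qvLock = new Object(); // plays the role of QV_LOCK
    private final Object manager;               // stand-in for qcm, may be null
    private long lastSeenVersion = -1;
    private int connectCalls = 0;

    LockOrderDemo(Object manager) {
        this.manager = manager;
    }

    // Path A: locks the manager first, then qvLock. Only called when manager != null.
    private void connectOne() {
        synchronized (manager) {
            synchronized (qvLock) {
                connectCalls++;
            }
        }
    }

    // Path B: must use the same order as path A. Java monitors are reentrant,
    // so re-acquiring a lock this thread already holds is safe.
    boolean setLastSeen(long version) {
        Object outer = (manager != null) ? manager : qvLock;
        synchronized (outer) {
            synchronized (qvLock) {
                if (lastSeenVersion == version) {
                    return false; // same version: nothing to do
                }
                lastSeenVersion = version;
                if (manager != null) {
                    connectOne();
                }
                return true;
            }
        }
    }

    int getConnectCalls() {
        synchronized (qvLock) {
            return connectCalls;
        }
    }
}
```

If setLastSeen took qvLock first and then called connectOne, another thread already inside connectOne could hold the manager lock while waiting for qvLock, and the two threads would block each other forever.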