ZooKeeper Leader Election Source Code Analysis

qq_23596677

Original link: https://blog.csdn.net/long290046464/article/details/81408624


ZooKeeper Fast Leader Election in Detail

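Before walking through the flow, here are the roles involved in an election, along with the key classes and variables (source version referenced: 3.4.9):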

Roles (server states):

1. LOOKING: looking for a leader (election in progress)

2. OBSERVING: observer

3. FOLLOWING: follower

4. LEADING: leader

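Vote fields:

1. logicalclock (electionEpoch): the local election round counter; it is incremented each time a new election round starts (each call to lookForLeader())

2. epoch (peerEpoch): the leader epoch; it increases each time an election finishes and a leader is finally settled (it forms the high 32 bits of the full zxid)

3. zxid: the transaction ID; it increases on every data change (the counter occupies the low 32 bits of the full zxid, which is 64 bits in total)

4. sid: the serverId of the server this vote belongs to

5. leader: the proposed leader (the serverId, i.e. sid, of the server being proposed)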
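The relationship between a 64-bit zxid and its epoch can be sketched as follows. This is a minimal illustration, not ZooKeeper's own code: the class `ZxidSketch` and its method names are hypothetical, and the sketch only assumes the layout described above (epoch in the high 32 bits, counter in the low 32 bits).

```java
// Hypothetical helper, for illustration only (not part of ZooKeeper).
final class ZxidSketch {
    // High 32 bits: the epoch.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits: the per-epoch transaction counter.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Combine an epoch and a counter into one 64-bit zxid.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    public static void main(String[] args) {
        long zxid = makeZxid(3, 7);
        // prints: zxid=0x300000007 epoch=3 counter=7
        System.out.println("zxid=0x" + Long.toHexString(zxid)
                + " epoch=" + epochOf(zxid)
                + " counter=" + counterOf(zxid));
    }
}
```

Because the epoch sits in the high bits, a zxid from a later epoch always compares greater than any zxid from an earlier one.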

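Vote comparison rules:

1. The vote with the larger epoch wins; if the epochs are equal, go to step 2

2. The vote with the larger zxid wins; if the zxids are equal, go to step 3

3. The vote with the larger sid wins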

The source code for the comparison rules is as follows:

    /**
     * Check if a pair (server id, zxid) succeeds our
     * current vote.
     *
     * @param id    Server identifier
     * @param zxid  Last zxid observed by the issuer of this vote
     */
    protected boolean totalOrderPredicate(long newId, long newZxid, long newEpoch, long curId, long curZxid, long curEpoch) {
        LOG.debug("id: " + newId + ", proposed id: " + curId + ", zxid: 0x" +
                Long.toHexString(newZxid) + ", proposed zxid: 0x" + Long.toHexString(curZxid));
        if(self.getQuorumVerifier().getWeight(newId) == 0){
            return false;
        }

        /*
         * We return true if one of the following three cases hold:
         * 1- New epoch is higher
         * 2- New epoch is the same as current epoch, but new zxid is higher
         * 3- New epoch is the same as current epoch, new zxid is the same
         *  as current zxid, but server id is higher.
         */

        return ((newEpoch > curEpoch) ||
                ((newEpoch == curEpoch) &&
                ((newZxid > curZxid) || ((newZxid == curZxid) && (newId > curId)))));
    }
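To see the ordering in action, here is a small self-contained sketch that applies the same three-step comparison to a few sample votes. It leaves out the weight check and the logging, and the class name `VoteOrderSketch`, the method name `beats`, and the sample values are made up for illustration.

```java
// Illustrative sketch only; it mirrors the return expression of totalOrderPredicate.
final class VoteOrderSketch {
    // Returns true if the new vote (newId, newZxid, newEpoch) beats the current one.
    static boolean beats(long newId, long newZxid, long newEpoch,
                         long curId, long curZxid, long curEpoch) {
        return (newEpoch > curEpoch)
                || ((newEpoch == curEpoch)
                    && ((newZxid > curZxid)
                        || ((newZxid == curZxid) && (newId > curId))));
    }

    public static void main(String[] args) {
        // Higher epoch wins even with a smaller zxid and sid.
        System.out.println(beats(1, 0x10, 2, 3, 0x20, 1)); // true
        // Same epoch: the higher zxid wins.
        System.out.println(beats(1, 0x20, 1, 3, 0x10, 1)); // true
        // Same epoch and zxid: the higher sid wins.
        System.out.println(beats(3, 0x10, 1, 1, 0x10, 1)); // true
        // Identical votes: the new vote does not replace the current one.
        System.out.println(beats(1, 0x10, 1, 1, 0x10, 1)); // false
    }
}
```

The last case matters in practice: since the comparison is strict, a server does not rebroadcast when it receives a vote equal to its own proposal.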

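The following first walks through the overall election flow. For now, do not worry about how vote data is exchanged between servers; just treat it as available. How vote data is exchanged during an election is covered later.

1. Increment logicalclock, propose yourself as leader, and broadcast that vote.

2. Enter the voting loop for this round.

3. Take one vote notification from the recvqueue queue. If nothing arrives before the timeout, check whether to resend your own vote or to reconnect to the other servers; otherwise go to step 4.

4. Check the sender's state carried in the notification:

LOOKING state:

(a) If the sender's logicalclock is greater than the local logicalclock, update the local logicalclock, clear the local vote tally recvset, compare the leader in the received vote with yourself as the candidate, take the larger of the two as the new vote, and broadcast it. Then continue with (d). Otherwise go to (b).

(b) If the sender's logicalclock is smaller than the local logicalclock, ignore the vote and go to the next iteration of the loop. Otherwise go to (c).

(c) If the two logicalclock values are equal, compare the leader currently proposed locally with the leader in the received vote. If the received vote is larger, adopt it as the new vote and broadcast it. Then continue with (d).

(d) Store the sender's vote in the local tally recvset, then check whether the currently proposed leader has a majority of the votes (more than half of the servers). If it does, keep polling recvqueue, waiting up to finalizeWait each time, to see whether anyone proposes a better leader. If so, put that notification back into recvqueue and start the next iteration of the loop; otherwise settle your role and end the election.

OBSERVING state: observers have no vote, so ignore the notification and go to the next iteration of the loop.

FOLLOWING/LEADING state:

(a) If the sender's logicalclock equals the local logicalclock, store the sender's vote in the local tally recvset, then check (via ooePredicate) whether that vote has a majority in recvset and whether the proposed leader really is the leader. If so, settle your role and end the election; otherwise go to (b).

(b) Store the sender's vote in outofelection, the local tally of servers that are not taking part in the election, then check whether that vote has a majority in outofelection and whether the proposed leader really is the leader. If so, update logicalclock, settle your role, and end the election; otherwise go to the next iteration of the loop.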
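The majority check in the LOOKING branch can be sketched as below. This is a simplified stand-in under stated assumptions: every server has equal weight, a quorum is a strict majority of a known ensemble size, and a vote is reduced to just the sid of the proposed leader. `MajoritySketch` and `hasQuorum` are hypothetical names; the real code calls `termPredicate` with full `Vote` objects and a quorum verifier.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only (not ZooKeeper code).
final class MajoritySketch {
    // recvset maps a voter's sid to the sid of the leader that voter proposes.
    static boolean hasQuorum(Map<Long, Long> recvset, long proposedLeader, int ensembleSize) {
        int supporters = 0;
        for (long votedFor : recvset.values()) {
            if (votedFor == proposedLeader) {
                supporters++;
            }
        }
        // Strict majority: more than half of all servers in the ensemble.
        return supporters > ensembleSize / 2;
    }

    public static void main(String[] args) {
        Map<Long, Long> recvset = new HashMap<>();
        recvset.put(1L, 3L);
        recvset.put(2L, 3L);
        System.out.println(hasQuorum(recvset, 3L, 5)); // false: 2 of 5
        recvset.put(3L, 3L);
        System.out.println(hasQuorum(recvset, 3L, 5)); // true: 3 of 5
    }
}
```

Keying the tally by voter sid means a server that changes its mind overwrites its earlier vote instead of being counted twice.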

The source code for the election flow is as follows:

    /**
     * Starts a new round of leader election. Whenever our QuorumPeer
     * changes its state to LOOKING, this method is invoked, and it
     * sends notifications to all other peers.
     */
    public Vote lookForLeader() throws InterruptedException {
        try {
            self.jmxLeaderElectionBean = new LeaderElectionBean();
            MBeanRegistry.getInstance().register(
                    self.jmxLeaderElectionBean, self.jmxLocalPeerBean);
        } catch (Exception e) {
            LOG.warn("Failed to register with JMX", e);
            self.jmxLeaderElectionBean = null;
        }
        if (self.start_fle == 0) {
           self.start_fle = System.currentTimeMillis();
        }
        try {
            // Votes tallied locally for this round
            HashMap<Long, Vote> recvset = new HashMap<Long, Vote>();

            // Votes from servers in FOLLOWING or LEADING state, i.e. not LOOKING
            HashMap<Long, Vote> outofelection = new HashMap<Long, Vote>();

            int notTimeout = finalizeWait;

            // Propose ourselves as leader
            synchronized(this){
                logicalclock++;
                updateProposal(getInitId(), getInitLastLoggedZxid(), getPeerEpoch());
            }

            LOG.info("New election. My id =  " + self.getId() +
                    ", proposed zxid=0x" + Long.toHexString(proposedZxid));
            sendNotifications();

            /*
             * Loop in which we exchange notifications until we find a leader
             */

            while ((self.getPeerState() == ServerState.LOOKING) &&
                    (!stop)){
                /*
                 * Remove next notification from queue, times out after 2 times
                 * the termination time
                 */
                Notification n = recvqueue.poll(notTimeout,
                        TimeUnit.MILLISECONDS);

                /*
                 * Sends more notifications if haven't received enough.
                 * Otherwise processes new notification.
                 */
                if(n == null){
                    if(manager.haveDelivered()){
                        sendNotifications();
                    } else {
                        manager.connectAll();
                    }

                    /*
                     * Exponential backoff
                     */
                    int tmpTimeOut = notTimeout*2;
                    notTimeout = (tmpTimeOut < maxNotificationInterval?
                            tmpTimeOut : maxNotificationInterval);
                    LOG.info("Notification time out: " + notTimeout);
                }
                else if(self.getVotingView().containsKey(n.sid)) {
                    /*
                     * Only proceed if the vote comes from a replica in the
                     * voting view.
                     */
                    switch (n.state) {
                    case LOOKING:
                        // If notification > current, replace and send messages out
                        if (n.electionEpoch > logicalclock) {
                            logicalclock = n.electionEpoch;
                            recvset.clear();
                            if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
                                    getInitId(), getInitLastLoggedZxid(), getPeerEpoch())) {
                                updateProposal(n.leader, n.zxid, n.peerEpoch);
                            } else {
                                updateProposal(getInitId(),
                                        getInitLastLoggedZxid(),
                                        getPeerEpoch());
                            }
                            sendNotifications();
                        } else if (n.electionEpoch < logicalclock) {
                            if(LOG.isDebugEnabled()){
                                LOG.debug("Notification election epoch is smaller than logicalclock. n.electionEpoch = 0x"
                                        + Long.toHexString(n.electionEpoch)
                                        + ", logicalclock=0x" + Long.toHexString(logicalclock));
                            }
                            break;
                        } else if (totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
                                proposedLeader, proposedZxid, proposedEpoch)) {
                            updateProposal(n.leader, n.zxid, n.peerEpoch);
                            sendNotifications();
                        }

                        if(LOG.isDebugEnabled()){
                            LOG.debug("Adding vote: from=" + n.sid +
                                    ", proposed leader=" + n.leader +
                                    ", proposed zxid=0x" + Long.toHexString(n.zxid) +
                                    ", proposed election epoch=0x" + Long.toHexString(n.electionEpoch));
                        }

                        // Cache the sender's vote for the final tally
                        recvset.put(n.sid, new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch));

                        if (termPredicate(recvset,
                                new Vote(proposedLeader, proposedZxid,
                                        logicalclock, proposedEpoch))) {

                            // Verify if there is any change in the proposed leader
                            while((n = recvqueue.poll(finalizeWait,
                                    TimeUnit.MILLISECONDS)) != null){
                                if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
                                        proposedLeader, proposedZxid, proposedEpoch)){
                                    recvqueue.put(n);
                                    break;
                                }
                            }

                            /*
                             * This predicate is true once we don't read any new
                             * relevant message from the reception queue
                             */
                            if (n == null) {
                                self.setPeerState((proposedLeader == self.getId()) ?
                                        ServerState.LEADING: learningState());

                                Vote endVote = new Vote(proposedLeader,
                                                        proposedZxid,
                                                        logicalclock,
                                                        proposedEpoch);
                                leaveInstance(endVote);
                                return endVote;
                            }
                        }
                        break;
                    case OBSERVING:
                        LOG.debug("Notification from observer: " + n.sid);
                        break;
                    case FOLLOWING:
                    case LEADING:
                        /*
                         * Consider all notifications from the same epoch
                         * together.
                         */
                        if(n.electionEpoch == logicalclock){
                            recvset.put(n.sid, new Vote(n.leader,
                                                          n.zxid,
                                                          n.electionEpoch,
                                                          n.peerEpoch));

                            if(ooePredicate(recvset, outofelection, n)) {
                                self.setPeerState((n.leader == self.getId()) ?
                                        ServerState.LEADING: learningState());

                                Vote endVote = new Vote(n.leader,
                                        n.zxid,
                                        n.electionEpoch,
                                        n.peerEpoch);
                                leaveInstance(endVote);
                                return endVote;
                            }
                        }

                        /*
                         * Before joining an established ensemble, verify
                         * a majority is following the same leader.
                         */
                        outofelection.put(n.sid, new Vote(n.version,
                                                            n.leader,
                                                            n.zxid,
                                                            n.electionEpoch,
                                                            n.peerEpoch,
                                                            n.state));

                        if(ooePredicate(outofelection, outofelection, n)) {
                            synchronized(this){
                                logicalclock = n.electionEpoch;
                                self.setPeerState((n.leader == self.getId()) ?
                                        ServerState.LEADING: learningState());
                            }
                            Vote endVote = new Vote(n.leader,
                                                    n.zxid,
                                                    n.electionEpoch,
                                                    n.peerEpoch);
                            leaveInstance(endVote);
                            return endVote;
                        }
                        break;
                    default:
                        LOG.warn("Notification state unrecognized: {} (n.state), {} (n.sid)",
                                n.state, n.sid);
                        break;
                    }
                } else {
                    LOG.warn("Ignoring notification from non-cluster member " + n.sid);
                }
            }
            return null;
        } finally {
            try {
                if(self.jmxLeaderElectionBean != null){
                    MBeanRegistry.getInstance().unregister(
                            self.jmxLeaderElectionBean);
                }
            } catch (Exception e) {
                LOG.warn("Failed to unregister with JMX", e);
            }
            self.jmxLeaderElectionBean = null;
        }
    }
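The timeout handling in the `n == null` branch above is an exponential backoff with a cap. The sketch below isolates just that arithmetic. The starting value of 200 ms and the cap of 60000 ms are assumptions standing in for `finalizeWait` and `maxNotificationInterval`; check the 3.4.9 source for the actual constants. `BackoffSketch` and `nextTimeout` are hypothetical names.

```java
// Illustrative sketch only; the two constants are assumed values.
final class BackoffSketch {
    static final int FINALIZE_WAIT = 200;               // assumed initial timeout (ms)
    static final int MAX_NOTIFICATION_INTERVAL = 60000; // assumed cap (ms)

    // Double the timeout, but never exceed the cap.
    static int nextTimeout(int notTimeout) {
        int tmpTimeOut = notTimeout * 2;
        return (tmpTimeOut < MAX_NOTIFICATION_INTERVAL) ? tmpTimeOut : MAX_NOTIFICATION_INTERVAL;
    }

    public static void main(String[] args) {
        int t = FINALIZE_WAIT;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            sb.append(t).append(' ');
            t = nextTimeout(t);
        }
        // prints: 200 400 800 1600 3200 6400 12800 25600 51200 60000
        System.out.println(sb.toString().trim());
    }
}
```

With these assumed values, a server that hears nothing waits longer between retries each time, so an isolated server does not flood the network with repeated notifications.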

(Figure: election flow chart)

The sections above covered the fast election flow. So how is data exchanged during an election? The rest of this article explains that:

The ZooKeeper startup script zkServer.cmd contains the following lines:

    set ZOOMAIN=org.apache.zookeeper.server.quorum.QuorumPeerMain
    echo on
    call %JAVA% "-Dzookeeper.log.dir=%ZOO_LOG_DIR%" "-Dzookeeper.root.logger=%ZOO_LOG4J_PROP%" -cp "%CLASSPATH%" %ZOOMAIN% "%ZOOCFG%" %*

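From this we can see that the startup class is org.apache.zookeeper.server.quorum.QuorumPeerMain. Tracing the code from there shows that the election flow is the lookForLeader() method of the FastLeaderElection class, and that the actual network interaction happens in the QuorumCnxManager class. The class relationships are shown in the following two diagrams:

(Figure: network interaction class diagram)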

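Details:

The QuorumCnxManager class is where the network interaction actually happens; it is responsible for collecting and sending votes over the network. As the class diagram shows, it has an inner class named Listener, which ensures that each pair of servers has exactly one connection and starts two threads for sending and receiving vote messages: SendWorker and RecvWorker.

The FastLeaderElection class also has two inner classes responsible for sending and receiving votes: WorkerSender and WorkerReceiver.

Sending path: when the election method lookForLeader() sends a vote, it puts the vote into the sendqueue queue of the FastLeaderElection class. WorkerSender (FastLeaderElection) moves the messages in sendqueue into queueSendMap in the QuorumCnxManager class. SendWorker (QuorumCnxManager) then sends the votes in queueSendMap out over the network.

Receiving path: RecvWorker (QuorumCnxManager) receives votes from the network and puts them into the recvQueue queue of the QuorumCnxManager class. WorkerReceiver (FastLeaderElection) takes data from recvQueue in QuorumCnxManager and puts it into the recvqueue queue of the FastLeaderElection class.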
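The two-stage hand-off on the sending path can be sketched with plain blocking queues. This is a toy model, not ZooKeeper's code: `PipelineSketch`, the string messages, the single queue standing in for `queueSendMap` (which is per peer in the real class), and the `STOP` marker are all assumptions for illustration. The real classes also deal with sockets and reconnection, which are left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch only (not ZooKeeper code).
final class PipelineSketch {
    private static final String STOP = "__STOP__"; // hypothetical shutdown marker

    // Starts a thread that moves items from 'from' to 'to' until STOP is seen.
    // STOP itself is forwarded so the next stage can shut down too.
    private static Thread stage(String name, BlockingQueue<String> from, BlockingQueue<String> to) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String m = from.take();
                    to.put(m);
                    if (m.equals(STOP)) {
                        return;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.start();
        return t;
    }

    // Returns the messages in the order they reach the simulated network.
    static List<String> send(List<String> votes) {
        BlockingQueue<String> sendqueue = new LinkedBlockingQueue<>();    // FastLeaderElection side
        BlockingQueue<String> queueSendMap = new LinkedBlockingQueue<>(); // QuorumCnxManager side, one peer
        BlockingQueue<String> network = new LinkedBlockingQueue<>();      // stands in for the socket

        Thread workerSender = stage("WorkerSender", sendqueue, queueSendMap);
        Thread sendWorker = stage("SendWorker", queueSendMap, network);
        try {
            for (String v : votes) {
                sendqueue.put(v);
            }
            sendqueue.put(STOP);
            workerSender.join();
            sendWorker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending", e);
        }

        List<String> delivered = new ArrayList<>();
        for (String m : network) {
            if (!m.equals(STOP)) {
                delivered.add(m);
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        // prints: [vote(leader=3), vote(leader=5)]
        System.out.println(send(Arrays.asList("vote(leader=3)", "vote(leader=5)")));
    }
}
```

The receiving path has the same shape in the opposite direction. Splitting each direction into two stages keeps the election logic independent of socket handling: lookForLeader() only ever touches in-memory queues.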

I made a copy of the 3.4.9 source code and added some comments of my own: https://github.com/learnertogether/zookeeper-3.4.9.git