RocketMQ DLedger: automatic master/slave failover



         

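Starting with version 4.5, RocketMQ provides automatic failover: when the master of a master/slave cluster fails, a new master is elected from the slaves and takes over without manual intervention.

RocketMQ implements automatic failover with DLedger, a commit log storage library based on the Raft protocol. Its two main parts are master election and log replication.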

         

               

*********************

Master election

Node states: leader, candidate, follower

public class MemberState {

   ...

    public static enum Role {
        UNKNOWN,
        CANDIDATE,
        LEADER,
        FOLLOWER;

        private Role() {
        }
    }
}

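leader: accepts client requests, writes log entries locally and replicates them to the followers. It also sends periodic heartbeats to the followers to maintain its leader status.

candidate: the intermediate state a node enters after the master fails. Only a node in the candidate state sends vote requests. Once the election completes, the node becomes either leader or follower.

follower: replicates the leader's log entries. On each leader heartbeat it resets its countdown timer, stays a follower, and returns a heartbeat response to the leader.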

             

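Master election triggers:

Initial cluster startup: every node starts in the candidate state, so a master has to be elected.

Master failure or network failure: more than half of the followers stop receiving heartbeats, and the expiry of a follower's countdown timer triggers a master election.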
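The timer-expiry trigger can be sketched as a single comparison. This is a minimal illustration, not DLedger's own code: the class and method names are hypothetical, and the threshold of maxHeartBeatLeak heartbeat intervals is an assumption borrowed from the leader-side check shown later in this article.

```java
// Hypothetical sketch: decides whether a follower's heartbeat timer has expired.
final class FollowerTimeoutSketch {

    /**
     * Returns true when more than maxHeartBeatLeak heartbeat intervals have
     * passed since the last leader heartbeat (assumed threshold).
     */
    static boolean shouldBecomeCandidate(long lastLeaderHeartBeatMs, long nowMs,
                                         long heartBeatTimeIntervalMs, int maxHeartBeatLeak) {
        long elapsed = nowMs - lastLeaderHeartBeatMs;
        return elapsed > (long) maxHeartBeatLeak * heartBeatTimeIntervalMs;
    }
}
```

With a 2000 ms interval and a leak of 3, a follower in this sketch tolerates up to 6000 ms of silence before it starts an election.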

           

****************

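Master election process

When a follower's countdown timer expires, it becomes a candidate and sends a vote request to itself and to every other node (it casts an accepting vote for itself).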

                 

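When another node receives the vote request, it rejects the vote if any of the following conditions holds:

reject_already_voted: the current node has already voted

reject_already_has_leader: a leader has already been elected in the cluster

reject_expired_term: the requester's term is lower than the current node's term

reject_term_not_ready: the requester's term is higher than the current node's term

reject_term_small_than_ledger: the requester's term is lower than the current node's log term (ledgerEndTerm)

reject_expired_ledger: the requester's log term (ledgerEndTerm) is lower than the current node's log term (ledgerEndTerm)

reject_small_ledger_end_index: the requester's log term (ledgerEndTerm) equals the current node's, but its log index (ledgerEndIndex) is lower than the current node's

Otherwise, the current node votes for the requester to become master (accepted)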
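The rejection rules above can be written as one decision function. The sketch below is an illustration only: the Voter type, the method signature and the order of the checks are assumptions, and the result names simply follow the labels used in this article rather than DLedger's actual enum.

```java
// Hypothetical sketch of the voter-side decision; not DLedger's own code.
final class VoteDecisionSketch {

    enum Result {
        ACCEPTED, REJECT_ALREADY_VOTED, REJECT_ALREADY_HAS_LEADER, REJECT_EXPIRED_TERM,
        REJECT_TERM_NOT_READY, REJECT_TERM_SMALL_THAN_LEDGER, REJECT_EXPIRED_LEDGER,
        REJECT_SMALL_LEDGER_END_INDEX
    }

    /** Assumed snapshot of the voting node's local state. */
    static final class Voter {
        final long currTerm;
        final long ledgerEndTerm;
        final long ledgerEndIndex;
        final String votedFor;  // null if no vote was cast in currTerm
        final String leaderId;  // null if no leader is known

        Voter(long currTerm, long ledgerEndTerm, long ledgerEndIndex, String votedFor, String leaderId) {
            this.currTerm = currTerm;
            this.ledgerEndTerm = ledgerEndTerm;
            this.ledgerEndIndex = ledgerEndIndex;
            this.votedFor = votedFor;
            this.leaderId = leaderId;
        }
    }

    static Result decide(Voter self, String candidateId, long term, long ledgerEndTerm, long ledgerEndIndex) {
        // The candidate's log must be at least as up to date as the voter's.
        if (ledgerEndTerm < self.ledgerEndTerm) {
            return Result.REJECT_EXPIRED_LEDGER;
        }
        if (ledgerEndTerm == self.ledgerEndTerm && ledgerEndIndex < self.ledgerEndIndex) {
            return Result.REJECT_SMALL_LEDGER_END_INDEX;
        }
        // The vote term must match the voter's current term.
        if (term < self.currTerm) {
            return Result.REJECT_EXPIRED_TERM;
        }
        if (term > self.currTerm) {
            return Result.REJECT_TERM_NOT_READY;
        }
        // Same term: reject if a leader exists or a vote went to someone else.
        if (self.leaderId != null) {
            return Result.REJECT_ALREADY_HAS_LEADER;
        }
        if (self.votedFor != null && !self.votedFor.equals(candidateId)) {
            return Result.REJECT_ALREADY_VOTED;
        }
        if (term < self.ledgerEndTerm) {
            return Result.REJECT_TERM_SMALL_THAN_LEDGER;
        }
        return Result.ACCEPTED;
    }
}
```

Checking the log position before the term is a design choice of this sketch; what matters for the election is that a node with an older log can never collect a majority.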

                

Vote results:

DLedgerLeaderElector.maintainAsCandidate() method

        final AtomicInteger allNum = new AtomicInteger(0);            // total number of responses
        final AtomicInteger validNum = new AtomicInteger(0);          // valid votes
        final AtomicInteger acceptedNum = new AtomicInteger(0);       // accepting votes
        final AtomicInteger notReadyTermNum = new AtomicInteger(0);   // not-ready votes: the requester's term is higher than the remote node's term, so the remote node returns reject_term_not_ready
        final AtomicInteger biggerLedgerNum = new AtomicInteger(0);   // the requester's ledgerEndTerm is lower than the remote node's, or the terms are equal and the requester's ledgerEndIndex is lower
        final AtomicBoolean alreadyHasLeader = new AtomicBoolean(false); // the cluster already has a leader



        if (knownMaxTermInGroup.get() > term) {
            parseResult = VoteResponse.ParseResult.WAIT_TO_VOTE_NEXT;
            nextTimeToRequestVote = getNextTimeToRequestVote();
            changeRoleToCandidate(knownMaxTermInGroup.get());
        } else if (alreadyHasLeader.get()) {
            parseResult = VoteResponse.ParseResult.WAIT_TO_VOTE_NEXT;
            nextTimeToRequestVote = getNextTimeToRequestVote() + heartBeatTimeIntervalMs * maxHeartBeatLeak;
        } else if (!memberState.isQuorum(validNum.get())) {
            parseResult = VoteResponse.ParseResult.WAIT_TO_REVOTE;
            nextTimeToRequestVote = getNextTimeToRequestVote();
        } else if (memberState.isQuorum(acceptedNum.get())) {
            parseResult = VoteResponse.ParseResult.PASSED;
        } else if (memberState.isQuorum(acceptedNum.get() + notReadyTermNum.get())) {
            parseResult = VoteResponse.ParseResult.REVOTE_IMMEDIATELY;
        } else if (memberState.isQuorum(acceptedNum.get() + biggerLedgerNum.get())) {
            parseResult = VoteResponse.ParseResult.WAIT_TO_REVOTE;
            nextTimeToRequestVote = getNextTimeToRequestVote();
        } else {
            parseResult = VoteResponse.ParseResult.WAIT_TO_VOTE_NEXT;
            nextTimeToRequestVote = getNextTimeToRequestVote();
        }
        lastParseResult = parseResult;
        logger.info("[{}] [PARSE_VOTE_RESULT] cost={} term={} memberNum={} allNum={} acceptedNum={} notReadyTermNum={} biggerLedgerNum={} alreadyHasLeader={} maxTerm={} result={}",
            memberState.getSelfId(), lastVoteCost, term, memberState.peerSize(), allNum, acceptedNum, notReadyTermNum, biggerLedgerNum, alreadyHasLeader, knownMaxTermInGroup.get(), parseResult);

        if (parseResult == VoteResponse.ParseResult.PASSED) {
            logger.info("[{}] [VOTE_RESULT] has been elected to be the leader in term {}", memberState.getSelfId(), term);
            changeRoleToLeader(term);
        }

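Election succeeds (PASSED): the accepting votes (acceptedNum) exceed half of the nodes

Revote immediately (REVOTE_IMMEDIATELY): acceptedNum + notReadyTermNum exceeds half of the nodes

Revote in the same term (WAIT_TO_REVOTE): the valid votes (validNum) do not exceed half of the nodes, or acceptedNum + biggerLedgerNum exceeds half of the nodes

Revote with an incremented term (WAIT_TO_VOTE_NEXT): the requester's term is lower than the largest term in the cluster, the cluster already has a leader (in which case the node becomes a follower once it receives the leader's heartbeat), and all remaining cases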
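To make the order of these checks easier to experiment with, the branch chain from the excerpt can be restated as a pure function. This is a sketch under assumptions: the class name and signature are hypothetical, and isQuorum is assumed to mean "at least peerSize / 2 + 1", which is not shown in the excerpt.

```java
// Hypothetical restatement of the vote-result branches as a pure function.
final class VoteResultSketch {

    enum ParseResult { WAIT_TO_REVOTE, REVOTE_IMMEDIATELY, PASSED, WAIT_TO_VOTE_NEXT }

    /** Assumed quorum rule: strictly more than half of the peers. */
    static boolean isQuorum(int num, int peerSize) {
        return num >= peerSize / 2 + 1;
    }

    static ParseResult parse(int peerSize, long term, long knownMaxTermInGroup,
                             boolean alreadyHasLeader, int validNum, int acceptedNum,
                             int notReadyTermNum, int biggerLedgerNum) {
        if (knownMaxTermInGroup > term) {
            return ParseResult.WAIT_TO_VOTE_NEXT;   // someone is already in a higher term
        } else if (alreadyHasLeader) {
            return ParseResult.WAIT_TO_VOTE_NEXT;   // wait for the leader's heartbeat
        } else if (!isQuorum(validNum, peerSize)) {
            return ParseResult.WAIT_TO_REVOTE;      // too few valid responses
        } else if (isQuorum(acceptedNum, peerSize)) {
            return ParseResult.PASSED;              // majority accepted
        } else if (isQuorum(acceptedNum + notReadyTermNum, peerSize)) {
            return ParseResult.REVOTE_IMMEDIATELY;  // peers only need to catch up on the term
        } else if (isQuorum(acceptedNum + biggerLedgerNum, peerSize)) {
            return ParseResult.WAIT_TO_REVOTE;      // some peers have a newer log
        } else {
            return ParseResult.WAIT_TO_VOTE_NEXT;
        }
    }
}
```

In a three-node group, for example, one self-vote plus one reject_term_not_ready response yields REVOTE_IMMEDIATELY, while one self-vote plus one accepting vote yields PASSED.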

              

****************

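Node state changes

candidate state

Becomes leader: receives accepted votes from more than half of the nodes during the election

Becomes follower: the cluster already has a leader and the candidate receives a heartbeat from it

Stays candidate: in every other case, the election has to be repeated

leader state: sends heartbeats and, depending on the heartbeat responses, either stays leader or becomes a candidate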

    private void maintainAsLeader() throws Exception {
        if (DLedgerUtils.elapsed(lastSendHeartBeatTime) > heartBeatTimeIntervalMs) {
                                                // the heartbeat interval has elapsed: send heartbeats
            long term;
            String leaderId;
            synchronized (memberState) {
                if (!memberState.isLeader()) {  // not the leader any more: return
                    //stop sending
                    return;
                }
                term = memberState.currTerm();
                leaderId = memberState.getLeaderId();
                lastSendHeartBeatTime = System.currentTimeMillis();
            }
            sendHeartbeats(term, leaderId);     // the leader sends heartbeats
        }
    }


    private void sendHeartbeats(long term, String leaderId) throws Exception {  // the leader sends heartbeats
        final AtomicInteger allNum = new AtomicInteger(1);       // number of nodes accounted for (starts at 1 for the leader itself)
        final AtomicInteger succNum = new AtomicInteger(1);      // number of nodes that answered SUCCESS (the leader counts itself)
        final AtomicInteger notReadyNum = new AtomicInteger(0);  // number of nodes whose term is lower than the heartbeat's term
        final AtomicLong maxTerm = new AtomicLong(-1);           // largest term reported by any node
        final AtomicBoolean inconsistLeader = new AtomicBoolean(false);  // whether a node reported a different leader
        final CountDownLatch beatLatch = new CountDownLatch(1);
        long startHeartbeatTimeMs = System.currentTimeMillis();
        for (String id : memberState.getPeerMap().keySet()) {
            if (memberState.getSelfId().equals(id)) {
                continue;
            }
            HeartBeatRequest heartBeatRequest = new HeartBeatRequest();
            heartBeatRequest.setGroup(memberState.getGroup());
            heartBeatRequest.setLocalId(memberState.getSelfId());  // id of the current node
            heartBeatRequest.setRemoteId(id);                      // id of the remote node that receives the heartbeat
            heartBeatRequest.setLeaderId(leaderId);                // leaderId known to the current node
            heartBeatRequest.setTerm(term);                        // term of the current node
            CompletableFuture<HeartBeatResponse> future = dLedgerRpcService.heartBeat(heartBeatRequest);
                                                 // future holding the heartbeat response

            future.whenComplete((HeartBeatResponse x, Throwable ex) -> {
                try {

                    if (ex != null) {
                        throw ex;
                    }
                    switch (DLedgerResponseCode.valueOf(x.getCode())) {
                        case SUCCESS:                  // count the nodes that answered SUCCESS
                            succNum.incrementAndGet();
                            break;
                        case EXPIRED_TERM:             // the sender's term is lower than the receiver's term
                            maxTerm.set(x.getTerm());  // record the larger term
                            break;
                        case INCONSISTENT_LEADER:      // the receiver knows a different leader
                            inconsistLeader.compareAndSet(false, true);
                            break;
                        case TERM_NOT_READY:           // the sender's term is higher than the receiver's term
                            notReadyNum.incrementAndGet();
                            break;
                        default:
                            break;
                    }
                    if (memberState.isQuorum(succNum.get())
                        || memberState.isQuorum(succNum.get() + notReadyNum.get())) {
                        // a quorum answered SUCCESS, or succNum + notReadyNum reaches a quorum
                        // (in which case heartbeats are resent immediately): count the latch down

                        beatLatch.countDown();
                    }
                } catch (Throwable t) {
                    logger.error("Parse heartbeat response failed", t);
                } finally {
                    allNum.incrementAndGet();
                    if (allNum.get() == memberState.peerSize()) {
                        // every node in the group has been accounted for: count the latch down

                        beatLatch.countDown();
                    }
                }
            });
        }

        beatLatch.await(heartBeatTimeIntervalMs, TimeUnit.MILLISECONDS);
                  // wait at most one heartbeat interval; the wait ends early as soon as
                  // memberState.isQuorum(succNum.get()),
                  // memberState.isQuorum(succNum.get() + notReadyNum.get()),
                  // or allNum.get() == memberState.peerSize() holds

        if (memberState.isQuorum(succNum.get())) {
                  // a quorum answered SUCCESS: record the time of the successful round and stay leader

            lastSuccHeartBeatTime = System.currentTimeMillis();
        } else {
            logger.info("[{}] Parse heartbeat responses in cost={} term={} allNum={} succNum={} notReadyNum={} inconsistLeader={} maxTerm={} peerSize={} lastSuccHeartBeatTime={}",
                memberState.getSelfId(), DLedgerUtils.elapsed(startHeartbeatTimeMs), term, allNum.get(), succNum.get(), notReadyNum.get(), inconsistLeader.get(), maxTerm.get(), memberState.peerSize(), new Timestamp(lastSuccHeartBeatTime));
            if (memberState.isQuorum(succNum.get() + notReadyNum.get())) {
                lastSendHeartBeatTime = -1;
            } else if (maxTerm.get() > term) {
                changeRoleToCandidate(maxTerm.get());
            } else if (inconsistLeader.get()) {
                changeRoleToCandidate(term);
            } else if (DLedgerUtils.elapsed(lastSuccHeartBeatTime) > maxHeartBeatLeak * heartBeatTimeIntervalMs) {
                changeRoleToCandidate(term);
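            }  // the leader steps down to candidate when a node reported a term higher than its own,
               // when a node reported a different leader,
               // or when the last successful heartbeat round is older than maxHeartBeatLeak heartbeat intervals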
        }
    }
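The tail of sendHeartbeats reads more easily as a decision table. The sketch below restates it as a pure function; the class name, the Action labels and the assumed quorum rule (at least peerSize / 2 + 1) are illustrative and not part of DLedger.

```java
// Hypothetical restatement of the leader's reaction to one heartbeat round.
final class LeaderHeartbeatSketch {

    enum Action { STAY_LEADER, RESEND_IMMEDIATELY, STEP_DOWN_TO_MAX_TERM, STEP_DOWN_SAME_TERM, KEEP_WAITING }

    /** Assumed quorum rule: strictly more than half of the peers. */
    static boolean isQuorum(int num, int peerSize) {
        return num >= peerSize / 2 + 1;
    }

    static Action decide(int peerSize, long term, int succNum, int notReadyNum, long maxTerm,
                         boolean inconsistLeader, long sinceLastSuccessMs,
                         long heartBeatTimeIntervalMs, int maxHeartBeatLeak) {
        if (isQuorum(succNum, peerSize)) {
            return Action.STAY_LEADER;            // lastSuccHeartBeatTime is refreshed
        } else if (isQuorum(succNum + notReadyNum, peerSize)) {
            return Action.RESEND_IMMEDIATELY;     // lastSendHeartBeatTime = -1
        } else if (maxTerm > term) {
            return Action.STEP_DOWN_TO_MAX_TERM;  // changeRoleToCandidate(maxTerm)
        } else if (inconsistLeader) {
            return Action.STEP_DOWN_SAME_TERM;    // changeRoleToCandidate(term)
        } else if (sinceLastSuccessMs > maxHeartBeatLeak * heartBeatTimeIntervalMs) {
            return Action.STEP_DOWN_SAME_TERM;    // too long without a successful round
        }
        return Action.KEEP_WAITING;               // no branch matched; try again next interval
    }
}
```

Note that a single failed round does not demote the leader by itself: unless a higher term or a different leader is reported, the leader in this sketch keeps its role until maxHeartBeatLeak intervals have passed without a successful round.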


    public CompletableFuture<HeartBeatResponse> handleHeartBeat(HeartBeatRequest request) throws Exception {
                                                // handle a heartbeat sent by the leader

        if (!memberState.isPeerMember(request.getLeaderId())) {
                                                // the leaderId is not a member of this group: return UNKNOWN_MEMBER

            logger.warn("[BUG] [HandleHeartBeat] remoteId={} is an unknown member", request.getLeaderId());
            return CompletableFuture.completedFuture(new HeartBeatResponse().term(memberState.currTerm()).code(DLedgerResponseCode.UNKNOWN_MEMBER.getCode()));
        }

        if (memberState.getSelfId().equals(request.getLeaderId())) {
                                                // the request names this node as leader (a leader never sends heartbeats to itself): return UNEXPECTED_MEMBER

            logger.warn("[BUG] [HandleHeartBeat] selfId={} but remoteId={}", memberState.getSelfId(), request.getLeaderId());
            return CompletableFuture.completedFuture(new HeartBeatResponse().term(memberState.currTerm()).code(DLedgerResponseCode.UNEXPECTED_MEMBER.getCode()));
        }

        if (request.getTerm() < memberState.currTerm()) {
                                                // the sender's term is lower than this node's term: return EXPIRED_TERM

            return CompletableFuture.completedFuture(new HeartBeatResponse().term(memberState.currTerm()).code(DLedgerResponseCode.EXPIRED_TERM.getCode()));
        } else if (request.getTerm() == memberState.currTerm()) {
                                                // the sender's term equals this node's term

            if (request.getLeaderId().equals(memberState.getLeaderId())) {
                                                // same leaderId: record the heartbeat time and return SUCCESS

                lastLeaderHeartBeatTime = System.currentTimeMillis();
                return CompletableFuture.completedFuture(new HeartBeatResponse());
            }
        }

        //abnormal case
        //hold the lock to get the latest term and leaderId
        synchronized (memberState) {            // abnormal case: take the memberState lock and re-check against the latest term and leaderId
            if (request.getTerm() < memberState.currTerm()) {  // the sender's term is lower than this node's term: return EXPIRED_TERM
                return CompletableFuture.completedFuture(new HeartBeatResponse().term(memberState.currTerm()).code(DLedgerResponseCode.EXPIRED_TERM.getCode()));
            } else if (request.getTerm() == memberState.currTerm()) {  // the sender's term equals this node's term
                if (memberState.getLeaderId() == null) {
                           // this node has no leaderId (it is a candidate): become a follower and return SUCCESS

                    changeRoleToFollower(request.getTerm(), request.getLeaderId());
                    return CompletableFuture.completedFuture(new HeartBeatRespo