ZKFC: Principles and Source Code Walkthrough

Principles

Overview

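NameNode active/standby failover is implemented by three cooperating components: ZKFailoverController, HealthMonitor, and ActiveStandbyElector.
HealthMonitor watches the health of the NameNode (NN). It runs a thread that sends RPC requests and infers the NN state from the responses; whenever the state changes, it notifies ZKFC through a callback.
ActiveStandbyElector handles coordination through ZooKeeper (ZK): it watches the znodes on the ZK cluster and runs the election.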

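Two znode paths on the ZK cluster are used for Hadoop HA failover.
The first, /hadoop-ha/{dfs.nameservices}/ActiveStandbyElectorLock, holds an ephemeral node, which serves as the lock.
The second, /hadoop-ha/{dfs.nameservices}/ActiveBreadCrumb, holds a persistent node, which stores the address of the active NameNode (ANN).
Normally, when the active quits the election gracefully, the persistent node is deleted along with the ephemeral node.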

[zk: localhost:2181(CONNECTED) 10] ls /hadoop-ha/xxxxx
[ActiveBreadCrumb, ActiveStandbyElectorLock]
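
To make the layout concrete, here is a minimal sketch that builds both paths from a nameservice ID. It assumes the default parent znode /hadoop-ha; the class and method names are hypothetical helpers for illustration, not Hadoop APIs.

```java
// Hypothetical helper, not part of Hadoop: builds the two znode paths used for HA.
final class HaZnodePaths {
    static final String DEFAULT_PARENT = "/hadoop-ha";            // assumed default parent znode
    static final String LOCK_NODE = "ActiveStandbyElectorLock";   // ephemeral lock node
    static final String BREADCRUMB_NODE = "ActiveBreadCrumb";     // persistent record of the active

    private HaZnodePaths() {}

    /** Path of the ephemeral lock node for the given nameservice. */
    static String lockPath(String nameservice) {
        return DEFAULT_PARENT + "/" + nameservice + "/" + LOCK_NODE;
    }

    /** Path of the persistent breadcrumb node for the given nameservice. */
    static String breadcrumbPath(String nameservice) {
        return DEFAULT_PARENT + "/" + nameservice + "/" + BREADCRUMB_NODE;
    }

    public static void main(String[] args) {
        System.out.println(lockPath("xxxxx"));
        System.out.println(breadcrumbPath("xxxxx"));
    }
}
```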

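When the ANN's HealthMonitor (HM) detects that the NN is unhealthy, it notifies ZKFC through the callback, and ZKFC calls the ActiveStandbyElector (ASE) to quit the election, which deletes the lock znode. Alternatively, if the ZKFC process itself becomes unavailable and stops sending heartbeats to the ZK cluster for long enough, the ZK cluster deletes the ephemeral lock node.
The standby NameNodes (SNNs) learn through their ASE watchers that the active lock node has been deleted. Each healthy SNN then tries to create the ephemeral node at the ActiveStandbyElectorLock path, that is, to grab the lock. The SNN that succeeds checks whether a node still exists at the breadcrumb path. If one does, the previous ANN did not exit cleanly and has to be fenced: the SNN first calls transitionToStandby on the old ANN, and if that fails it falls back to logging in over ssh to kill the process or to running a shell script.
If fencing fails, the SNN gives up the lock, quits the election, and waits for a while so that other SNNs get a chance to grab the lock. If fencing succeeds, it calls becomeActive, which in turn calls transitionToActive to turn the NN into the ANN.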
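
The takeover sequence described above can be sketched with a toy in-memory model. Nothing here talks to ZooKeeper or a NameNode: the two fields stand in for the lock and breadcrumb znodes, and TakeoverSketch and Fencer are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of the takeover sequence; no real ZooKeeper or RPC involved.
final class TakeoverSketch {
    String lockOwner;    // stands in for the ephemeral lock znode
    String breadcrumb;   // stands in for the persistent breadcrumb znode
    final List<String> log = new ArrayList<>();

    /** Hypothetical fencing hook: returns true if the old active was fenced. */
    interface Fencer {
        boolean fence(String oldActive);
    }

    /** Returns true if the candidate ends up active. */
    boolean tryTakeover(String candidate, Fencer fencer) {
        if (lockOwner != null) {
            log.add(candidate + ": lock held by " + lockOwner + ", staying standby");
            return false;
        }
        lockOwner = candidate;  // "create the ephemeral node"
        if (breadcrumb != null && !breadcrumb.equals(candidate)) {
            // A leftover breadcrumb means the old active did not exit cleanly.
            if (!fencer.fence(breadcrumb)) {
                lockOwner = null;  // give the lock up so another standby can try
                log.add(candidate + ": fencing failed, quitting election");
                return false;
            }
            log.add(candidate + ": fenced " + breadcrumb);
        }
        breadcrumb = candidate;  // record the new active
        log.add(candidate + ": became active");
        return true;
    }
}
```

In this sketch a failed fence releases the lock immediately; the waiting period before rejoining is left out.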

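The ZKFC service is shown in the figure below.
[Figure: ZKFC components]
The ZKFC state machine is shown in the figure below.
[Figure: ZKFC state machine]
The official Hadoop issue describes the design of ZKFC in detail: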
https://issues.apache.org/jira/browse/HDFS-2185

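Workflow

1. After initialization, HealthMonitor starts an internal thread that periodically calls methods of the NameNode's HAServiceProtocol RPC interface to check the NameNode's health.

2. When HealthMonitor detects a change in the NameNode's health state, it invokes the callbacks that ZKFailoverController registered.

3. If ZKFailoverController decides that a failover is needed, it first uses ActiveStandbyElector to run an automatic active/standby election.

4. ActiveStandbyElector interacts with ZooKeeper to carry out the election.

5. When the election finishes, ActiveStandbyElector calls back into ZKFailoverController to tell it whether the local NameNode should become the active or the standby NameNode.

6. ZKFailoverController calls methods of the NameNode's HAServiceProtocol RPC interface to move the NameNode to the Active or Standby state.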
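
The six steps boil down to a chain of callbacks. The sketch below wires that chain together with stand-in types; every class here is hypothetical (the real classes are ZKFailoverController, HealthMonitor, and ActiveStandbyElector), and the fake elector simply reports whether the lock is free instead of talking to ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;

// Toy wiring of the six steps; every type here is hypothetical, not a Hadoop class.
final class FailoverFlowSketch {
    enum Health { HEALTHY, UNHEALTHY }

    /** Step 5: what the elector tells the controller. */
    interface ElectionListener {
        void becomeActive();
        void becomeStandby();
    }

    /** Steps 3-4: a stand-in elector; the real one negotiates through ZooKeeper. */
    static final class FakeElector {
        private final boolean lockFree;

        FakeElector(boolean lockFree) { this.lockFree = lockFree; }

        void join(ElectionListener listener) {
            if (lockFree) listener.becomeActive();
            else listener.becomeStandby();
        }
    }

    /** The controller: reacts to health changes and to election results. */
    static final class Controller implements ElectionListener {
        final List<String> actions = new ArrayList<>();  // records what the controller did
        private final FakeElector elector;

        Controller(FakeElector elector) { this.elector = elector; }

        // Step 2: invoked by the health monitor when the state changes.
        void onHealthChanged(Health health) {
            if (health == Health.HEALTHY) elector.join(this);
            else actions.add("quitElection");
        }

        // Step 6: in the real system these would be RPCs to the NameNode.
        @Override public void becomeActive() { actions.add("transitionToActive"); }
        @Override public void becomeStandby() { actions.add("transitionToStandby"); }
    }
}
```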

HealthMonitor

HealthMonitor monitors the local NN over RPC, tracking both its health state (HealthMonitor.State) and its service state (HAServiceStatus). When either changes, it notifies ZKFC through a callback.

// The five HealthMonitor states
/**
 * The health monitor is still starting up.
 */
INITIALIZING,

/**
 * The service is not responding to health check RPCs.
 */
SERVICE_NOT_RESPONDING,

/**
 * The service is connected and healthy.
 */
SERVICE_HEALTHY,

/**
 * The service is running but unhealthy.
 */
SERVICE_UNHEALTHY,

/**
 * The health monitor itself failed unrecoverably and can
 * no longer provide accurate information.
 */
HEALTH_MONITOR_FAILED;

// The four service states (the HAServiceState values carried in HAServiceStatus)
INITIALIZING("initializing"),
ACTIVE("active"),
STANDBY("standby"),
STOPPING("stopping");
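
The "notify only when the state changes" behaviour can be sketched as follows. This is a simplified illustration, not the HealthMonitor source: StateChangeNotifier is a hypothetical class, and it uses java.util.function.Consumer in place of the HealthMonitor.Callback interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: listeners are notified only when the observed state changes.
final class StateChangeNotifier {
    enum State {
        INITIALIZING, SERVICE_NOT_RESPONDING, SERVICE_HEALTHY,
        SERVICE_UNHEALTHY, HEALTH_MONITOR_FAILED
    }

    private State current = State.INITIALIZING;
    private final List<Consumer<State>> callbacks = new ArrayList<>();

    synchronized void addCallback(Consumer<State> callback) {
        callbacks.add(callback);
    }

    /** Record a newly observed state; repeated observations of the same state are ignored. */
    synchronized void enterState(State next) {
        if (next == current) {
            return;
        }
        current = next;
        for (Consumer<State> callback : callbacks) {
            callback.accept(next);
        }
    }

    synchronized State getState() {
        return current;
    }
}
```

Deduplicating here keeps the controller from re-running its election logic on every periodic health check.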

ActiveStandbyElector

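ActiveStandbyElector manages and watches the znodes on ZK and interacts with ZKFC. When joinElection is called, ASE tries to create the lock node on ZK (that is, to acquire the lock). If the node is created, becomeActive is called and the NN becomes the ANN. If creation fails, becomeStandby is called and the NN becomes an SNN; ZKFC keeps monitoring the NN's health, and ASE registers a watcher on the active lock.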

/**
 * To participate in election, the app will call joinElection. The result will
 * be notified by a callback on either the becomeActive or becomeStandby app
 * interfaces.
 */
public synchronized void joinElection(byte[] data)

/**
 * Any service instance can drop out of the election by calling quitElection. 
 * <br/>
 */
public synchronized void quitElection(boolean needFence)
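
The join/quit contract can be illustrated with an in-memory lock. This is only a sketch of the idea: LockElectionSketch is a hypothetical class, a plain field stands in for the ephemeral lock znode, and there are no watchers, retries, or fencing.

```java
// In-memory stand-in for the ephemeral lock znode; not the real ActiveStandbyElector.
final class LockElectionSketch {
    interface Callback {
        void becomeActive();
        void becomeStandby();
    }

    private String lockHolder;  // null means the lock node does not exist

    /** Try to take the lock; the outcome is reported through the callback. */
    synchronized void joinElection(String candidate, Callback callback) {
        if (lockHolder == null) {
            lockHolder = candidate;  // "create the ephemeral node"
        }
        if (lockHolder.equals(candidate)) {
            callback.becomeActive();
        } else {
            callback.becomeStandby();
        }
    }

    /** Release the lock if this candidate holds it. */
    synchronized void quitElection(String candidate) {
        if (candidate.equals(lockHolder)) {
            lockHolder = null;
        }
    }

    synchronized String holder() {
        return lockHolder;
    }
}
```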

ZKFC

When ZKFC is created it initializes HealthMonitor and ActiveStandbyElector. Its job is to coordinate the two and, based on the events they deliver, carry out the HA failover.

Fencing

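Suppose the ZKFC on the active node is killed. ZK stops receiving heartbeats from the ANN's ZKFC, and the SNN's ZKFC is notified. After the SNN's ZKFC successfully creates the znode on ZK, it asks the previous ANN to call transitionToStandby(). If that has no effect, it falls back to other methods (such as killing the old ANN), and then calls transitionToActive() on its own NN to make it the active node.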
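
The "graceful first, then escalate" order can be sketched as a chain of attempts. The code below is illustrative only: FencingChainSketch is a hypothetical class, the predicates are placeholders for real fencing actions, and the labels sshfence and shell merely echo the ssh and shell-script methods mentioned earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of the fencing order; the predicates stand in for real actions.
final class FencingChainSketch {
    private FencingChainSketch() {}

    /** Returns the name of the step that succeeded, or null if every step failed. */
    static String fence(String target,
                        Predicate<String> gracefulStandby,
                        Map<String, Predicate<String>> methods) {
        // First ask the old active to step down on its own.
        if (gracefulStandby.test(target)) {
            return "transitionToStandby";
        }
        // Otherwise try each fallback in insertion order until one works.
        for (Map.Entry<String, Predicate<String>> method : methods.entrySet()) {
            if (method.getValue().test(target)) {
                return method.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Predicate<String>> methods = new LinkedHashMap<>();
        methods.put("sshfence", t -> false);  // pretend the host is unreachable over ssh
        methods.put("shell", t -> true);      // pretend the script succeeds
        System.out.println(fence("nn1", t -> false, methods));
    }
}
```

A null result corresponds to the case where the standby must give up the lock rather than become active.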

Source Code

DFSZKFailoverController is an ordinary Java program with a main method.
main constructs a DFSZKFailoverController and calls its run method.
run delegates to doRun, which contains the important calls:

private int doRun(String[] args)
    throws Exception {
    try {
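        // Initialize the ZK connection and the elector
        initZK();
        // Format ZK (abridged: in the full source this runs only when -formatZK is passed)
        formatZK(force, interactive);
        // Initialize the RPC server
        initRPC();
        // Initialize the HealthMonitor
        initHM();
        // Start the RPC server
        startRPC();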
        
        mainLoop();
    } finally {
        rpcServer.stopAndJoin();

        elector.quitElection(true);
        healthMonitor.shutdown();
        healthMonitor.join();
    }
    return 0;
}

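initZK initializes ZK: it reads and parses the ZK connection settings (quorum address, ACLs, authentication info, and so on) and creates the ActiveStandbyElector.

// Create the ActiveStandbyElector, passing in the callbacks (becomeActive, becomeStandby, and so on)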
elector = new ActiveStandbyElector(zkQuorum,
                                   zkTimeout, getParentZnode(), zkAcls, zkAuths,
                                   new ElectorCallbacks(), maxRetryNum);


// The ActiveStandbyElector constructor
public ActiveStandbyElector(String zookeeperHostPorts,
                            int zookeeperSessionTimeout, String parentZnodeName, List<ACL> acl,
                            List<ZKAuthInfo> authInfo, ActiveStandbyElectorCallback app,
                            int maxRetryNum, boolean failFast) throws IOException,
HadoopIllegalArgumentException, KeeperException {
    ...
        if (failFast) {
            createConnection();
        } else {
            reEstablishSession();
        }
}

// Create the connection to ZK
private void createConnection() throws IOException, KeeperException {
    if (zkClient != null) {
        try {
            zkClient.close();
        } catch (InterruptedException e) {
            throw new IOException("Interrupted while closing ZK",
                                  e);
        }
        zkClient = null;
        watcher = null;
    }
    zkClient = connectToZooKeeper();
    if (LOG.isDebugEnabled()) {
        LOG.debug("Created new connection for " + this);
    }
}


// Connect to ZK and set up the watcher that listens for znode events
protected synchronized ZooKeeper connectToZooKeeper() throws IOException,
KeeperException {
    watcher = new WatcherWithClientRef();
    ZooKeeper zk = createZooKeeper();
    watcher.setZooKeeperRef(zk);
    watcher.waitForZKConnectionEvent(zkSessionTimeout);
    ...
}

formatZK() formats ZK: it creates the znode under which the NN state is later written.
initRPC() initializes the ZKFCRpcServer.
initHM() starts the HealthMonitor:

private void initHM() {
    // 1. Create the HealthMonitor
    healthMonitor = new HealthMonitor(conf, localTarget);
    // 2. Register the health-state callback
    healthMonitor.addCallback(new HealthCallbacks());
    // 3. Register the service-state callback
    healthMonitor.addServiceStateCallback(new ServiceStateCallBacks());
    // 4. Start the monitor thread
    healthMonitor.start();
}

// 2. The health-state callback
class HealthCallbacks implements HealthMonitor.Callback {
    @Override
    public void enteredState(HealthMonitor.State newState) {
        // Record the latest health state
        setLastHealthState(newState);
        // 2.1 Re-evaluate whether to take part in the election
        recheckElectability();
    }
}

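// 2.1 Re-evaluate whether to take part in the election. The HealthMonitor callback
// invokes recheckElectability, which acts on the most recently observed health state:
//   - healthy: call joinElection and try to create the znode on ZK;
//   - initializing: do not join the election yet;
//   - unhealthy: quit the election (if the NN is active, its znode on ZK is deleted).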
private void recheckElectability() {
    // Maintain lock ordering of elector -> ZKFC
    synchronized (elector) {
        synchronized (this) {
            boolean healthy = lastHealthState == State.SERVICE_HEALTHY;
            switch (lastHealthState) {
                case SERVICE_HEALTHY:
                    // 2.1.1 Join the election
                    elector.joinElection(targetToData(localTarget));
                    if (quitElectionOnBadState) {
                        quitElectionOnBadState = false;
                    }
                    break;
                case SERVICE_UNHEALTHY:
                    // 2.1.2 Quit the election
                    elector.quitElection(true);
                    serviceState = HAServiceState.INITIALIZING;
                    break;
            }
        }
    }
}

// 2.1.1 joinElection delegates to joinElectionInternal
private void joinElectionInternal() {
    ...
        createRetryCount = 0;
    wantToBeInElection = true;
    createLockNodeAsync();
}
// createLockNodeAsync, called from joinElectionInternal, uses the ZK client to create the ephemeral znode
private void createLockNodeAsync() {
    zkClient.create(zkLockFilePath, appData, zkAcl, CreateMode.EPHEMERAL,
                    this, zkClient);
}

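// 2.1.2 quitElection leaves the election; the ephemeral node on ZK is removed as well (abridged)
public synchronized void quitElection(boolean needFence) {
    if (!needFence && state == State.ACTIVE) {
        // Graceful Active -> Standby transition: delete our own persistent
        // breadcrumb znode so that no one fences us
        tryDeleteOwnBreadCrumbNode();
    }
    ...
}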

// 3. The service-state callback
class ServiceStateCallBacks implements HealthMonitor.ServiceStateCallback {
    @Override
    public void reportServiceStatus(HAServiceStatus status) {
        // Check the service state that was just reported
        verifyChangedServiceState(status.getState());
    }
}

// 3.1 Compare the reported service state with the expected one
void verifyChangedServiceState(HAServiceState changedState) {
    synchronized (elector) {
        synchronized (this) {
            if (serviceState == HAServiceState.INITIALIZING) {
                if (quitElectionOnBadState) {
                    LOG.debug("rechecking for electability from bad state");
                    recheckElectability();
                }
                return;
            }
            if (changedState == serviceState) {
                serviceStateMismatchCount = 0;
                return;
            }
            if (serviceStateMismatchCount == 0) {
                // recheck one more time. As this might be due to parallel transition.
                serviceStateMismatchCount++;
                return;
            }
            // quit the election as the expected state and reported state
            // mismatches.
            LOG.error("Local service " + localTarget
                      + " has changed the serviceState to " + changedState
                      + ". Expected was " + serviceState
                      + ". Quitting election marking fencing necessary.");
            delayJoiningUntilNanotime = System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(1000);
            elector.quitElection(true);
            quitElectionOnBadState = true;
            serviceStateMismatchCount = 0;
            serviceState = HAServiceState.INITIALIZING;
        }
    }
}
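
The core of verifyChangedServiceState is that a single mismatch is tolerated, since it may be caused by a transition running in parallel, while a second consecutive mismatch triggers quitting the election. A minimal distillation of just that rule, using a hypothetical class and plain strings for the states:

```java
// Hypothetical distillation of the "tolerate one mismatch" rule; not Hadoop code.
final class MismatchToleranceSketch {
    private int mismatchCount = 0;

    /** Returns true when the caller should quit the election. */
    synchronized boolean report(String expected, String reported) {
        if (expected.equals(reported)) {
            mismatchCount = 0;   // states agree again: reset the counter
            return false;
        }
        if (mismatchCount == 0) {
            mismatchCount++;     // first mismatch: wait for one more report
            return false;
        }
        mismatchCount = 0;       // second consecutive mismatch: give up
        return true;
    }
}
```

The early return for the INITIALIZING case and the delayed rejoin are left out of this sketch.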


// 4. The monitor thread
public void run() {
    while (shouldRun) {
        try {
            // The MonitorDaemon thread runs two methods.
            // Keep retrying until the HAServiceProtocol proxy is connected to the HA service
            loopUntilConnected();
            // 4.1 Run the health checks
            doHealthChecks();
        } catch (InterruptedException ie) {
            Preconditions.checkState(!shouldRun,
                                     "Interrupted but still supposed to run");
        }
    }
}

// 4.1 Run the health checks
private void doHealthChecks() throws InterruptedException {
    while (shouldRun) {
        HAServiceStatus status = null;
        boolean healthy = false;
        try {
            // Send RPCs and use the responses to judge the NN's health
            status = proxy.getServiceStatus();
            proxy.monitorHealth();
            healthy = true;
        } ...
            if (healthy) {
                // enterState is called with the state that matches the outcome
                enterState(State.SERVICE_HEALTHY);
            }

        Thread.sleep(checkIntervalMillis);
    }
}

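startRPC() starts the ZKFCRpcServer.

When doRun exits, its finally block stops the RPC server, quits the election, and shuts down the HealthMonitor: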

rpcServer.stopAndJoin();
elector.quitElection(true);
healthMonitor.shutdown();
healthMonitor.join();