These are notes from a code study of HDFS HA (NameNode high availability with automatic failover).
- Both the primary NN and the standby NN start up in standby state.
- Format ZKFC with "hdfs zkfc -formatZK", which creates the parent znode in ZooKeeper.
- Start a DFSZKFailoverController on every node that runs a NameNode instance.
- The ZKFC that holds the znode "/hadoop-ha/ActiveStandbyElectorLock" becomes active and calls transitionToActive on the NN it belongs to.
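The startup race can be sketched as a toy model, assuming a simplified in-memory ZooKeeper. FakeZooKeeper, NameNode and Zkfc are hypothetical stand-ins, not Hadoop classes; the sketch only shows that both NNs start in standby and that the ZKFC which creates the lock znode first is the one that promotes its NN.

```python
# Toy model of the startup sequence; FakeZooKeeper, NameNode and Zkfc are
# hypothetical stand-ins, not Hadoop or ZooKeeper classes.

LOCK = "/hadoop-ha/ActiveStandbyElectorLock"


class FakeZooKeeper:
    """Stores ephemeral znodes as path -> owning session."""

    def __init__(self):
        self.nodes = {}

    def create_ephemeral(self, path, session):
        # Like ZooKeeper's create(): fails when the znode already exists.
        if path in self.nodes:
            return False
        self.nodes[path] = session
        return True


class NameNode:
    def __init__(self, name):
        self.name = name
        self.state = "STANDBY"  # every NN starts up in standby state

    def transition_to_active(self):
        self.state = "ACTIVE"


class Zkfc:
    def __init__(self, zk, nn):
        self.zk, self.nn = zk, nn

    def start(self):
        # The ZKFC that creates the lock znode first promotes its local NN;
        # the other ZKFC leaves its NN in standby.
        if self.zk.create_ephemeral(LOCK, self.nn.name):
            self.nn.transition_to_active()


zk = FakeZooKeeper()
nn1, nn2 = NameNode("nn1"), NameNode("nn2")
for zkfc in (Zkfc(zk, nn1), Zkfc(zk, nn2)):
    zkfc.start()
print(nn1.state, nn2.state)  # ACTIVE STANDBY
```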
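Conclusions:
1. In an HA cluster, the NN cannot start up in active state by itself; only the ZKFC can move it into active state.
2. Two NNs can be active at the same time when split brain occurs (for example, the old active pauses long enough for its ZooKeeper session to expire and resumes after another NN has been promoted), so a fencing method (configured through dfs.ha.fencing.methods, e.g. sshfence) is needed to cut off the old active.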
                                 ZooKeeper cluster
                                         |
                                         | via ActiveStandbyElector
    HAServiceProtocol via HealthMonitor  |
Namenode ---------------------- ZKFailoverController -------------- ZKFCRpcServer (ZKFCProtocol) -------------- HAAdmin
Event sources:
1. HealthMonitor events, triggered by NN health changes such as a full disk.
2. ActiveStandbyElector events, triggered by znode state changes.
                     <---- ActiveStandbyElectorCallback (becomeActive, becomeStandby, fenceOldActive, etc.)
ZKFailoverController ------------------------------------------------------------ ActiveStandbyElector
                     ----> quitElection, joinElection, etc.

                     <---- HealthCallbacks, ServiceStateCallBacks
ZKFailoverController ---------------------------------------- HealthMonitor
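How the ZKFC turns HealthMonitor events into elector calls can be sketched as below. This is a simplified reading of ZKFailoverController's recheckElectability(); FakeElector and ZkfcHealthHandler are hypothetical names, and the real code also handles delays after cedeActive, which are left out here.

```python
# Simplified model of the ZKFC's reaction to HealthMonitor state changes.
# FakeElector and ZkfcHealthHandler are hypothetical; the elector only records calls.

class FakeElector:
    def __init__(self):
        self.calls = []

    def join_election(self):
        self.calls.append("joinElection")

    def quit_election(self, need_fence):
        self.calls.append(f"quitElection(needFence={need_fence})")


class ZkfcHealthHandler:
    def __init__(self, elector):
        self.elector = elector
        self.fatal = False

    def entered_state(self, health_state):
        # Invoked by the HealthMonitor (HealthCallbacks in the notes above).
        if health_state == "SERVICE_HEALTHY":
            self.elector.join_election()
        elif health_state == "INITIALIZING":
            # Not ready yet: stay out of the election, no fencing needed.
            self.elector.quit_election(need_fence=False)
        elif health_state in ("SERVICE_UNHEALTHY", "SERVICE_NOT_RESPONDING"):
            # The local NN may still believe it is active, so whoever takes
            # over must fence it.
            self.elector.quit_election(need_fence=True)
        elif health_state == "HEALTH_MONITOR_FAILED":
            self.fatal = True  # the real ZKFC treats this as a fatal error


elector = FakeElector()
handler = ZkfcHealthHandler(elector)
for s in ("INITIALIZING", "SERVICE_HEALTHY", "SERVICE_NOT_RESPONDING"):
    handler.entered_state(s)
print(elector.calls)
# ['quitElection(needFence=False)', 'joinElection', 'quitElection(needFence=True)']
```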
ZKFailoverController:
serviceState: INITIALIZING, ACTIVE, STANDBY, STOPPING
ActiveStandbyElector:
state: INIT, ACTIVE, STANDBY, NEUTRAL
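quitElection(needFence): closes the connection to the ZooKeeper cluster, which releases the ephemeral "lock" znode if this elector holds it; if the elector is active and needFence is false, it first deletes its own breadcrumb znode so that the next active does not fence it
joinElection():
1. Create a connection to the ZooKeeper cluster if one does not exist yet.
2. Try to create the "lock" znode. If the creation succeeds, enter active state; if the "lock" znode already exists, enter standby state.
3. Register a watcher on the "lock" znode through exists(). The watcher is triggered when the "lock" znode changes (for example, when it is deleted), and the elector then runs the election again.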
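The three joinElection() steps, plus the failover that happens when the active side quits, can be sketched with an in-memory model. FakeZooKeeper and Elector are hypothetical; the fake only models ephemeral znodes, session close and one-shot watches registered through exists(), and it drops a closed session's own watches the way a closed real client stops receiving events.

```python
# Hypothetical in-memory sketch of ActiveStandbyElector.joinElection().
# FakeZooKeeper is not a real client: it models ephemeral znodes,
# session close and one-shot watches registered via exists().

LOCK = "/hadoop-ha/ActiveStandbyElectorLock"


class FakeZooKeeper:
    def __init__(self):
        self.nodes = {}    # path -> owning session
        self.watches = {}  # path -> list of (session, watcher)

    def create_ephemeral(self, path, session):
        if path in self.nodes:
            return "NODEEXISTS"
        self.nodes[path] = session
        return "OK"

    def exists(self, path, session, watcher):
        # Registers a one-shot watch, like ZooKeeper's exists(path, watcher).
        self.watches.setdefault(path, []).append((session, watcher))
        return path in self.nodes

    def close_session(self, session):
        # A closed session loses its own watches...
        for path in self.watches:
            self.watches[path] = [w for w in self.watches[path] if w[0] != session]
        # ...and its ephemeral znodes; each deletion fires remaining watches once.
        for path in [p for p, s in self.nodes.items() if s == session]:
            del self.nodes[path]
            for _, watcher in self.watches.pop(path, []):
                watcher("NodeDeleted", path)


class Elector:
    def __init__(self, zk, session):
        self.zk, self.session = zk, session
        self.state = "INIT"

    def join_election(self):
        # Step 2: OK means ACTIVE, NODEEXISTS means STANDBY.
        result = self.zk.create_ephemeral(LOCK, self.session)
        self.state = "ACTIVE" if result == "OK" else "STANDBY"
        # Step 3: watch the lock so its deletion triggers another attempt.
        self.zk.exists(LOCK, self.session, self.process_watch_event)

    def process_watch_event(self, event, path):
        if event == "NodeDeleted" and path == LOCK:
            self.join_election()

    def quit_election(self):
        self.zk.close_session(self.session)
        self.state = "INIT"


zk = FakeZooKeeper()
a, b = Elector(zk, "zkfc-a"), Elector(zk, "zkfc-b")
a.join_election()
b.join_election()
print(a.state, b.state)  # ACTIVE STANDBY
a.quit_election()        # a's session closes and its ephemeral lock disappears
print(a.state, b.state)  # INIT ACTIVE
```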
HealthMonitor:
state: INITIALIZING, SERVICE_NOT_RESPONDING, SERVICE_HEALTHY, SERVICE_UNHEALTHY, HEALTH_MONITOR_FAILED
lastServiceState: INITIALIZING, ACTIVE, STANDBY, STOPPING
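One HealthMonitor polling round maps the outcome of the two HAServiceProtocol calls onto the states above. The sketch below is a simplified model under assumed Python names (get_service_status, monitor_health, do_health_check); HEALTH_MONITOR_FAILED is not produced here because it describes the monitor thread itself dying, not a check result.

```python
# Simplified sketch of one HealthMonitor check round; all names are
# hypothetical Python stand-ins for the HAServiceProtocol calls.

class HealthCheckFailedException(Exception):
    """Raised when the NN answers but reports itself unhealthy."""


def do_health_check(proxy, previous_service_state="INITIALIZING"):
    """Return (health_state, last_service_state) for one polling round."""
    service_state = previous_service_state
    try:
        service_state = proxy.get_service_status()
        proxy.monitor_health()
    except HealthCheckFailedException:
        # The NN is reachable but unhealthy, e.g. a resource check failed.
        return "SERVICE_UNHEALTHY", service_state
    except Exception:
        # Timeout, connection refused, ...: the NN did not respond.
        return "SERVICE_NOT_RESPONDING", service_state
    return "SERVICE_HEALTHY", service_state


class HealthyNN:
    def get_service_status(self):
        return "STANDBY"

    def monitor_health(self):
        pass


class DiskFullNN(HealthyNN):
    def monitor_health(self):
        raise HealthCheckFailedException("low disk space")


class DeadNN(HealthyNN):
    def get_service_status(self):
        raise ConnectionError("connection refused")


print(do_health_check(HealthyNN()))   # ('SERVICE_HEALTHY', 'STANDBY')
print(do_health_check(DiskFullNN()))  # ('SERVICE_UNHEALTHY', 'STANDBY')
print(do_health_check(DeadNN()))      # ('SERVICE_NOT_RESPONDING', 'INITIALIZING')
```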
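Znodes (created under the parent znode, by default "/hadoop-ha/<nameservice>"; shortened to "/hadoop-ha/" below):
/hadoop-ha/ActiveStandbyElectorLock  ephemeral; created by the active (primary) ZKFC and deleted automatically when its session with the ZooKeeper cluster closes or expires
/hadoop-ha/ActiveBreadCrumb  persistent; its data (appData) records the current active NN, so the next active can find and fence the previous one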
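The breadcrumb is what makes fencing possible after split brain. The sketch below models, under simplifying assumptions, what the elector does right after winning the lock: read the breadcrumb, fence the previous active if the breadcrumb names another NN, then overwrite it. become_active and the plain dict standing in for ZooKeeper are hypothetical; Hadoop's ZKFC also tries a graceful transitionToStandby before running the configured fencing methods, which the single fence callable here glosses over.

```python
# Hypothetical sketch of breadcrumb-based fencing after winning the lock.
# A dict stands in for ZooKeeper; fence_old_active(data) -> bool stands in
# for the graceful attempt plus the configured fencing methods.

BREADCRUMB = "/hadoop-ha/ActiveBreadCrumb"


def become_active(znodes, my_data, fence_old_active):
    old_data = znodes.get(BREADCRUMB)
    if old_data is not None and old_data != my_data:
        # The previous active may still think it is active (split brain),
        # so it has to be fenced before this NN is promoted.
        if not fence_old_active(old_data):
            return False  # fencing failed: do not become active
    znodes[BREADCRUMB] = my_data  # persistent: survives session loss
    return True


fenced = []


def fence(data):
    fenced.append(data)
    return True


znodes = {BREADCRUMB: "nn1"}  # nn1 was active before it lost its session
print(become_active(znodes, "nn2", fence))  # True
print(fenced, znodes[BREADCRUMB])           # ['nn1'] nn2
```

If the old active quit cleanly with needFence false, its breadcrumb is already gone and the fence callable is never invoked.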
HAServiceProtocol:
monitorHealth
getServiceStatus
transitionToActive
transitionToStandby
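A fake NN implementing these four calls also illustrates conclusion 1. As I read the NameNode code, each state change carries a request source, and with automatic failover enabled a request made by a user (rather than by the ZKFC) is rejected; the source strings below mirror that enum, but FakeNameNode and its methods are hypothetical.

```python
# Hypothetical model of the HAServiceProtocol calls on a NameNode, including
# the rejection of user-initiated state changes when automatic HA is enabled.

class HealthCheckFailedException(Exception):
    pass


class AccessControlException(Exception):
    pass


class FakeNameNode:
    def __init__(self, auto_ha_enabled=True):
        self.state = "STANDBY"  # an HA NN always starts in standby
        self.auto_ha_enabled = auto_ha_enabled
        self.healthy = True

    def monitor_health(self):
        if not self.healthy:
            raise HealthCheckFailedException("resources low")

    def get_service_status(self):
        return self.state

    def _check_ha_state_change(self, source):
        # Only the ZKFC may drive state changes once automatic HA is on.
        if source == "REQUEST_BY_USER" and self.auto_ha_enabled:
            raise AccessControlException("manual HA control disallowed")

    def transition_to_active(self, source):
        self._check_ha_state_change(source)
        self.state = "ACTIVE"

    def transition_to_standby(self, source):
        self._check_ha_state_change(source)
        self.state = "STANDBY"


nn = FakeNameNode()
try:
    nn.transition_to_active("REQUEST_BY_USER")
except AccessControlException as e:
    print("rejected:", e)  # rejected: manual HA control disallowed
nn.transition_to_active("REQUEST_BY_ZKFC")
print(nn.get_service_status())  # ACTIVE
```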
ZKFCProtocol: used by HAAdmin. For a failover (fromNN, toNN), HAAdmin calls gracefulFailover on the ZKFC of toNN; that ZKFC calls cedeActive on the ZKFC of fromNN, which moves fromNN to standby and makes it quit the election for a given period, and then toNN tries to become active through the normal election path.
cedeActive
gracefulFailover
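The graceful failover sequence can be sketched with toy objects. Cluster and Zkfc are hypothetical; cede_active and graceful_failover only mirror the ZKFCProtocol method names, and the real cedeActive quits the election only for a given number of milliseconds before rejoining, which is not modeled here.

```python
# Hypothetical sketch of graceful failover (fromNN -> toNN) between two ZKFCs.

class Cluster:
    def __init__(self):
        self.lock_holder = None  # which ZKFC holds the election lock


class Zkfc:
    def __init__(self, name, cluster):
        self.name, self.cluster = name, cluster
        self.nn_state = "STANDBY"
        self.in_election = True

    def try_become_active(self):
        # Normal election path: win only if in the election and the lock is free.
        if self.in_election and self.cluster.lock_holder is None:
            self.cluster.lock_holder = self.name
            self.nn_state = "ACTIVE"

    def cede_active(self):
        # Called remotely by the target ZKFC: move the local NN to standby
        # and leave the election, which releases the lock.
        self.nn_state = "STANDBY"
        self.in_election = False
        if self.cluster.lock_holder == self.name:
            self.cluster.lock_holder = None

    def graceful_failover(self, from_zkfc):
        # Runs on the toNN's ZKFC after HAAdmin asks it to take over.
        from_zkfc.cede_active()
        self.try_become_active()
        return self.nn_state == "ACTIVE"


cluster = Cluster()
nn1_zkfc, nn2_zkfc = Zkfc("nn1", cluster), Zkfc("nn2", cluster)
nn1_zkfc.try_become_active()
nn2_zkfc.try_become_active()
print(nn1_zkfc.nn_state, nn2_zkfc.nn_state)  # ACTIVE STANDBY
print(nn2_zkfc.graceful_failover(nn1_zkfc))  # True
print(nn1_zkfc.nn_state, nn2_zkfc.nn_state)  # STANDBY ACTIVE
```

With automatic failover enabled, a command such as "hdfs haadmin -failover nn1 nn2" is what starts this sequence.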
ZooKeeper:
two kinds of notification: watchers and callbacks
watcher: registered on the ZooKeeper server (e.g. through exists()) and triggered only once; it has to be registered again to keep watching
callback: delivers the result of an asynchronous call, notifying the ZooKeeper client of how that call ended
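The difference between the two can be shown with a toy server. ToyZooKeeper and exists_async are hypothetical and not the ZooKeeper client API; the sketch only demonstrates that the callback reports the result of the call itself, while the watcher fires on the next change and is then gone.

```python
# Toy model (not the ZooKeeper client API) contrasting one-shot watchers
# with result callbacks.

class ToyZooKeeper:
    def __init__(self):
        self.nodes = {}
        self.watchers = {}  # path -> list of one-shot watchers

    def exists_async(self, path, watcher, callback):
        if watcher is not None:
            self.watchers.setdefault(path, []).append(watcher)
        # The callback only reports the outcome of this exists() call.
        callback("OK" if path in self.nodes else "NONODE", path)

    def create(self, path, data):
        self.nodes[path] = data
        self._trigger(path, "NodeCreated")

    def delete(self, path):
        del self.nodes[path]
        self._trigger(path, "NodeDeleted")

    def _trigger(self, path, event):
        # pop(): each watcher is removed as it fires, so it fires only once.
        for watcher in self.watchers.pop(path, []):
            watcher(event, path)


events, results = [], []
zk = ToyZooKeeper()
zk.exists_async("/lock", lambda e, p: events.append(e), lambda rc, p: results.append(rc))
zk.create("/lock", b"nn1")  # fires the watcher
zk.delete("/lock")          # no watcher left, nothing fires
print(results, events)      # ['NONODE'] ['NodeCreated']
```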