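I had a bit of spare time these past couple of days, so I went through the Kafka setup our project uses. I kept feeling that I didn't understand the ZooKeeper part well enough, so I read through a fair number of introductions to ZooKeeper, and none of them satisfied me.
The leader election algorithm in particular is something nobody else can cover exhaustively; you have to dig into it yourself. So I went through the source code and wrote down my own understanding.
1. Entry point: org.apache.zookeeper.server.quorum.QuorumPeerMain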
    public static void main(String[] args) {
        QuorumPeerMain main = new QuorumPeerMain();
        try {
            main.initializeAndRun(args);
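1.1 Reading the configuration. Note a few of the defaults:
    protected int tickTime = ZooKeeperServer.DEFAULT_TICK_TIME; // 3000
    // maximum number of concurrent connections per client IP
    protected int maxClientCnxns = 60;
    // election algorithm: 3 is FastLeaderElection
    protected int electionAlg = 3;
    protected int electionPort = 2182;
    // default: participant (a voting member, not an observer)
    protected LearnerType peerType = LearnerType.PARTICIPANT;
I remember once spending ages trying to work out how to configure a server as a follower or an observer. The source shows that setting peerType is enough; it accepts observer or participant (case-insensitive). Reading the source does pay off.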
    if (key.equals("peerType")) {
        if (value.toLowerCase().equals("observer")) {
            peerType = LearnerType.OBSERVER;
        } else if (value.toLowerCase().equals("participant")) {
            peerType = LearnerType.PARTICIPANT;
        } else
        {
            throw new ConfigException("Unrecognised peertype: " + value);
        }
    }
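A ZooKeeper ensemble should have at least 3 servers, and preferably an odd number, because a decision needs a strict majority of n/2+1 votes (integer division). With 2 servers both must agree, so no failure can be tolerated, and an even-sized ensemble tolerates no more failures than the odd-sized one just below it.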
    if (servers.size() == 2) {
        LOG.warn("No server failure will be tolerated. " +
            "You need at least 3 servers.");
    } else if (servers.size() % 2 == 0) {
        LOG.warn("Non-optimial configuration, consider an odd number of servers.");
    }
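To make the arithmetic behind those two warnings concrete, here is a minimal sketch. The class and method names (QuorumSizeSketch, quorumSize, toleratedFailures) are my own illustration and do not come from the ZooKeeper source; the only thing taken from it is the n/2+1 majority rule.

```java
public class QuorumSizeSketch {
    // Smallest number of votes that forms a strict majority of n servers
    // (integer division, matching the n/2+1 rule).
    public static int quorumSize(int n) {
        return n / 2 + 1;
    }

    // Number of servers that can be down while a majority is still reachable.
    public static int toleratedFailures(int n) {
        return n - quorumSize(n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.println(n + " servers: quorum=" + quorumSize(n)
                    + ", tolerated failures=" + toleratedFailures(n));
        }
    }
}
```

Under this rule 2 servers tolerate 0 failures, while 3 and 4 servers both tolerate 1, which is why the code warns about both cases.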
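You don't have to configure groups in ZooKeeper, but once you do, every server must belong to a group.
As for what groups are actually for, I suspect many people don't know. What most people know is the n/2+1 rule. My point is that groups are directly related to that rule.
In ZooKeeper, whether a decision takes effect is determined by a QuorumVerifier (package org.apache.zookeeper.server.quorum.flexible). It has two implementations. One of them is QuorumMaj, the familiar n/2+1 rule: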
    public QuorumMaj(int n){
        this.half = n/2;
    }
    public boolean containsQuorum(HashSet<Long> set){
        return (set.size() > half);
    }
Besides that, ZooKeeper has a second decision rule, which I'll call the group decision: QuorumHierarchical.
    private void computeGroupWeight(){
        ....
        if(!groupWeight.containsKey(gid))
            groupWeight.put(gid, serverWeight.get(sid));
        else {
            long totalWeight = serverWeight.get(sid) + groupWeight.get(gid);
            groupWeight.put(gid, totalWeight);
        }
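If you configure groups, this is the path that gets taken. First, it sums the weights of the servers assigned to each group to get that group's total weight; a server with no configured weight defaults to 1.
Then, from the servers that currently support the decision, it computes the supporting weight per group and checks whether that weight exceeds half of the group's total weight. Each group that passes adds 1 to a counter. If the counter ends up greater than half the number of groups (groups with a total weight of 0 are not counted), the decision passes.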
    for(long gid : expansion.keySet()) {
        LOG.debug("Group info: " + expansion.get(gid) + ", " + gid + ", " + groupWeight.get(gid));
        if(expansion.get(gid) > (groupWeight.get(gid) / 2) )
            majGroupCounter++;
    }
    LOG.debug("Majority group counter: " + majGroupCounter + ", " + numGroups);
    if((majGroupCounter > (numGroups / 2))){
        LOG.debug("Positive set size: " + set.size());
        return true;
    }
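Note: observers are not among the servers that take part in decisions.
A server's myid is on the first line of the myid file, and that line must contain nothing else. Later lines may hold other text, but ZooKeeper ignores them.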
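The group rule is easier to follow on a small example. The sketch below re-implements the two steps described above (sum the weight per group, then count the groups in which the supporting weight exceeds half) as a standalone class. It is my own simplification, not ZooKeeper's QuorumHierarchical: the names HierarchicalQuorumSketch, containsQuorum and threeGroupsOfThree are hypothetical, and I assume every voting server appears in the group map and that a missing weight means 1.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class HierarchicalQuorumSketch {
    // serverGroup maps sid -> gid; serverWeight maps sid -> weight
    // (assumption: a server without an entry has weight 1).
    public static boolean containsQuorum(Map<Long, Long> serverGroup,
                                         Map<Long, Long> serverWeight,
                                         Set<Long> ackSet) {
        Map<Long, Long> groupWeight = new HashMap<>(); // total weight per group
        Map<Long, Long> expansion = new HashMap<>();   // supporting weight per group
        for (Map.Entry<Long, Long> e : serverGroup.entrySet()) {
            long sid = e.getKey();
            long gid = e.getValue();
            long w = serverWeight.getOrDefault(sid, 1L);
            groupWeight.merge(gid, w, Long::sum);
            if (ackSet.contains(sid)) {
                expansion.merge(gid, w, Long::sum);
            }
        }
        int numGroups = 0;
        int majGroupCounter = 0;
        for (Map.Entry<Long, Long> e : groupWeight.entrySet()) {
            long total = e.getValue();
            if (total == 0) {
                continue; // groups with zero weight are not counted
            }
            numGroups++;
            long supporting = expansion.getOrDefault(e.getKey(), 0L);
            if (supporting > total / 2) {
                majGroupCounter++;
            }
        }
        return majGroupCounter > numGroups / 2;
    }

    // Example layout: servers 1-3 in group 1, 4-6 in group 2, 7-9 in group 3.
    public static Map<Long, Long> threeGroupsOfThree() {
        Map<Long, Long> groups = new HashMap<>();
        for (long sid = 1; sid <= 9; sid++) {
            groups.put(sid, (sid - 1) / 3 + 1);
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<Long, Long> groups = threeGroupsOfThree();
        Map<Long, Long> weights = new HashMap<>(); // empty: every server weighs 1
        Set<Long> fourServers = new HashSet<>(Arrays.asList(1L, 2L, 4L, 5L));
        Set<Long> fiveServers = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L, 7L));
        System.out.println(containsQuorum(groups, weights, fourServers));
        System.out.println(containsQuorum(groups, weights, fiveServers));
    }
}
```

With this layout, servers 1, 2, 4 and 5 form a quorum even though they are only 4 of 9, because they hold a majority in two of the three groups. Servers 1, 2, 3, 4 and 7 are a plain majority of 9 but do not form a quorum, because they hold a majority in only one group.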
    try {
        myIdString = br.readLine();
    } finally {
        br.close();
    }
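Here is a small sketch of why only the first line matters and why it must hold nothing but the id. It reads the first line the same way as the excerpt above and then parses it with Long.parseLong, which rejects anything that is not a bare number. The parsing step and the names MyIdSketch and parseMyId are my assumptions for illustration, not a copy of the ZooKeeper code, and the file contents are passed in as a string so the example runs without a real myid file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class MyIdSketch {
    // Parses the server id from the text of a myid file.
    // Only the first line is read; everything after it is ignored.
    public static long parseMyId(String fileContents) {
        String myIdString;
        try {
            BufferedReader br = new BufferedReader(new StringReader(fileContents));
            try {
                myIdString = br.readLine();
            } finally {
                br.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (myIdString == null) {
            throw new IllegalArgumentException("myid file is empty");
        }
        // Long.parseLong does not trim or skip trailing text, so the
        // first line has to be the number and nothing else.
        return Long.parseLong(myIdString);
    }

    public static void main(String[] args) {
        System.out.println(parseMyId("3\nnotes on later lines are ignored\n"));
    }
}
```

A first line such as "3 extra" makes Long.parseLong throw a NumberFormatException in this sketch, while text on the second and later lines never reaches the parser.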
To be continued.