zookeeper有好几种节点类型;
PERSISTENT(0, false, false, false, false),
PERSISTENT_SEQUENTIAL(2, false, true, false, false),
EPHEMERAL(1, true, false, false, false),
EPHEMERAL_SEQUENTIAL(3, true, true, false, false),
CONTAINER(4, false, false, true, false),
PERSISTENT_WITH_TTL(5, false, false, false, true),
PERSISTENT_SEQUENTIAL_WITH_TTL(6, false, true, false, true);
现在来探究zookeeper如何实现顺序节点。
通过调试,我们找到关键代码:
zookeeper的预处理器PrepRequestProcessor方法pRequest2TxnCreate(...)
/**
* zk请求事务处理
*/
private void pRequest2TxnCreate(int type, Request request, Record record, boolean deserialize) throws IOException, KeeperException {
if (deserialize) {
ByteBufferInputStream.byteBuffer2Record(request.request, record);
}
int flags;
String path;
List<ACL> acl;
byte[] data;
long ttl;
if (type == OpCode.createTTL) {
CreateTTLRequest createTtlRequest = (CreateTTLRequest)record;
flags = createTtlRequest.getFlags();
path = createTtlRequest.getPath();
acl = createTtlRequest.getAcl();
data = createTtlRequest.getData();
ttl = createTtlRequest.getTtl();
} else {
CreateRequest createRequest = (CreateRequest)record;
flags = createRequest.getFlags();
path = createRequest.getPath();
acl = createRequest.getAcl();
data = createRequest.getData();
ttl = -1;
}
//flag to 节点类型的转化
CreateMode createMode = CreateMode.fromFlag(flags);
validateCreateRequest(path, createMode, request, ttl);
String parentPath = validatePathForCreate(path, request.sessionId);
List<ACL> listACL = fixupACL(path, request.authInfo, acl);
ChangeRecord parentRecord = getRecordForPath(parentPath);
//ACL检查
checkACL(zks, parentRecord.acl, ZooDefs.Perms.CREATE, request.authInfo);
int parentCVersion = parentRecord.stat.getCversion();
if (createMode.isSequential()) {
//判断是否是顺序节点,如果是顺序节点,则在路径的后半段拼接上父节点的Cversion属性
path = path + String.format(Locale.ENGLISH, "%010d", parentCVersion);
}
validatePath(path, request.sessionId);
try {
if (getRecordForPath(path) != null) {
throw new KeeperException.NodeExistsException(path);
}
} catch (KeeperException.NoNodeException e) {
// ignore this one
}
boolean ephemeralParent = EphemeralType.get(parentRecord.stat.getEphemeralOwner()) == EphemeralType.NORMAL;
if (ephemeralParent) {
throw new KeeperException.NoChildrenForEphemeralsException(path);
}
//newCversion进行自增
int newCversion = parentRecord.stat.getCversion()+1;
if (type == OpCode.createContainer) {
request.setTxn(new CreateContainerTxn(path, data, listACL, newCversion));
} else if (type == OpCode.createTTL) {
request.setTxn(new CreateTTLTxn(path, data, listACL, newCversion, ttl));
} else {
request.setTxn(new CreateTxn(path, data, listACL, createMode.isEphemeral(),
newCversion));
}
StatPersisted s = new StatPersisted();
if (createMode.isEphemeral()) {
s.setEphemeralOwner(request.sessionId);
}
parentRecord = parentRecord.duplicate(request.getHdr().getZxid());
parentRecord.childCount++;
parentRecord.stat.setCversion(newCversion);
addChangeRecord(parentRecord);
addChangeRecord(new ChangeRecord(request.getHdr().getZxid(), path, s, 0, listACL));
}
可以看到关键代码:
if (createMode.isSequential()) {
//判断是否是顺序节点,如果是顺序节点,则在路径的后半段拼接上父节点的Cversion属性
path = path + String.format(Locale.ENGLISH, "%010d", parentCVersion);
}
如果是顺序节点,则在客户端执行的路径后面加上10位的顺序数字串,例如0000000005,代表第5位。顺序的控制通过父节点的Cversino属性自增来进行控制和保存。
但是,在这段代码里面,我们没有看到zookeeper对节点或者别的对象进行加锁操作,那么,zookeeper是如何防止并发的呢,如何能保证两个同时到来的且相同路径名称请求不会出现相同的顺序。
我们接着看,上面PrepRequestProcessor只是对于zk客户端请求的预处理,还不是最终创建节点的代码,我们接着看最终创建节点的代码:
public void createNode(final String path, byte data[], List<ACL> acl,
long ephemeralOwner, int parentCVersion, long zxid, long time, Stat outputStat)
throws KeeperException.NoNodeException,
KeeperException.NodeExistsException {
int lastSlash = path.lastIndexOf('/');
//获取父节点名称
String parentName = path.substring(0, lastSlash);
String childName = path.substring(lastSlash + 1);
StatPersisted stat = new StatPersisted();
stat.setCtime(time);
stat.setMtime(time);
stat.setCzxid(zxid);
stat.setMzxid(zxid);
stat.setPzxid(zxid);
stat.setVersion(0);
stat.setAversion(0);
stat.setEphemeralOwner(ephemeralOwner);
DataNode parent = nodes.get(parentName);
if (parent == null) {
throw new KeeperException.NoNodeException();
}
synchronized (parent) {
Set<String> children = parent.getChildren();
//如果父节点下面存在相同名称的子节点,则返回异常
if (children.contains(childName)) {
throw new KeeperException.NodeExistsException();
}
if (parentCVersion == -1) {
parentCVersion = parent.stat.getCversion();
parentCVersion++;
}
parent.stat.setCversion(parentCVersion);
parent.stat.setPzxid(zxid);
Long longval = aclCache.convertAcls(acl);
DataNode child = new DataNode(data, longval, stat);
parent.addChild(childName);
nodes.put(path, child);
EphemeralType ephemeralType = EphemeralType.get(ephemeralOwner);
if (ephemeralType == EphemeralType.CONTAINER) {
containers.add(path);
} else if (ephemeralType == EphemeralType.TTL) {
ttls.add(path);
} else if (ephemeralOwner != 0) {
HashSet<String> list = ephemerals.get(ephemeralOwner);
if (list == null) {
list = new HashSet<String>();
ephemerals.put(ephemeralOwner, list);
}
synchronized (list) {
list.add(path);
}
}
if (outputStat != null) {
child.copyStat(outputStat);
}
}
// now check if its one of the zookeeper node child
if (parentName.startsWith(quotaZookeeper)) {
// now check if its the limit node
if (Quotas.limitNode.equals(childName)) {
// this is the limit node
// get the parent and add it to the trie
pTrie.addPath(parentName.substring(quotaZookeeper.length()));
}
if (Quotas.statNode.equals(childName)) {
updateQuotaForPath(parentName
.substring(quotaZookeeper.length()));
}
}
// also check to update the quotas for this node
String lastPrefix = getMaxPrefixWithQuota(path);
if(lastPrefix != null) {
// ok we have some match and need to update
updateCount(lastPrefix, 1);
updateBytes(lastPrefix, data == null ? 0 : data.length);
}
dataWatches.triggerWatch(path, Event.EventType.NodeCreated);
childWatches.triggerWatch(parentName.equals("") ? "/" : parentName,
Event.EventType.NodeChildrenChanged);
}
在最终创建节点的代码上,可以看到zookeeper如何对于相同路径名的节点进行防重处理:
1.获取父节点的名称,例如路径是:/12346789000000001,那么这里的父节点名称是:“”(这里还要分持久节点和临时节点,临时节点的根节点是zookeeper,不再是“”)。
2.对父节点进行synchronized加锁处理,对节点名称进行防重判断,如果名称重复,则抛出异常。
通过阅读最终创建节点源码,我们得知,zookeeper对于顺序节点的名称防重也不是放到最后来处理。
紧接我们回到PrepRequestProcessor的启动方法run():
@Override
public void run() {
try {
while (true) {
Request request = submittedRequests.take();
long traceMask = ZooTrace.CLIENT_REQUEST_TRACE_MASK;
if (request.type == OpCode.ping) {
traceMask = ZooTrace.CLIENT_PING_TRACE_MASK;
}
if (LOG.isTraceEnabled()) {
ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
}
if (Request.requestOfDeath == request) {
break;
}
pRequest(request);
}
} catch (RequestProcessorException e) {
if (e.getCause() instanceof XidRolloverException) {
LOG.info(e.getCause().getMessage());
}
handleException(this.getName(), e);
} catch (Exception e) {
handleException(this.getName(), e);
}
LOG.info("PrepRequestProcessor exited loop!");
}
并且通过调试发现,该PrepRequestProcessor线程属于守护线程,而并非是根据请求单独生成的专属处理器。
那么到这里就容易理解了,采用守护线程来处理请求队列的方式,是目前大多框架常用的I/O多路复用和NIO的手段,同时接受多个客户端的请求,待客户端请求就绪之后(例如完成登录校验等),即把请求句柄加入到请求处理队列,并且唤醒处理器,处理器遍历队列,使用单线程的方式进行单个处理。由于处理上不存在对于不同的父节点而进行阻塞时间过长等因素,这样的处理能同时保证效率和并发的两个考量。
而在zookeeper处理顺序节点的问题上,假如两个顺序节点请求同时到来,由于两个请求加入到队列中存在先后顺序问题,故在处理上也存在先后的顺序问题,不存在同时获取到相同顺序的情况。
另外,zookeeper实现分布式锁也是类似的方式,通过在同一个节点下创建顺序子节点,子节点最小的获取到分布式锁。