ZooKeeper in Depth (Part 4): How Watches Are Implemented

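Version    Date         Notes
1.0        2020.4.8     First published
1.1        2020.4.18    Reworked the summary to make it more detailed and easier to follow
1.2        2020.8.10    Removed some less important code to shorten the article; improved the wording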

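1. Preface

Intended audience: readers with a Java background who are interested in how ZooKeeper (zk) is implemented at the source level.

Anyone who has used ZooKeeper knows that watch is a very handy mechanism. Today we look at how it is implemented.

Before we start, a quick reminder of what a watch is.

zk provides publish/subscribe for distributed data. In this classic model, a one-to-many subscription lets multiple subscribers listen to the same subject, and all of them are notified when the subject's state changes. Concretely, zk lets a client register a watch with the server; when a specified event on the server triggers that watch, the server sends an event notification to that client.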

2. Implementation walkthrough

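Before digging into the implementation, consider how you would build a watch mechanism yourself.

The simplest approach is for the client to store the node's current version and poll the node's state; when the version changes, the client fires the watch. But surely there is something better than polling?

Polling puts considerable load on the server. An alternative is something like a webhook: the server stores an address agreed on with the client and notifies the client when the watched data node changes.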
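To make the contrast with polling concrete, here is a minimal in-process sketch of the push idea in plain Java, with no ZooKeeper dependency. The names `PushNotifySketch`, `subscribe`, and `update` are hypothetical and exist only for this illustration; a real system would notify over a network connection rather than through a local callback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for "the server remembers whom to notify".
final class PushNotifySketch {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // The "client" hands over a callback once instead of polling.
    void subscribe(String path, Consumer<String> callback) {
        subscribers.computeIfAbsent(path, k -> new ArrayList<>()).add(callback);
    }

    // Every write pushes the changed path to the subscribers of that path.
    void update(String path, String value) {
        data.put(path, value);
        List<Consumer<String>> targets = subscribers.get(path);
        if (targets == null) {
            return;
        }
        for (Consumer<String> callback : targets) {
            callback.accept(path);
        }
    }

    public static void main(String[] args) {
        PushNotifySketch server = new PushNotifySketch();
        List<String> seen = new ArrayList<>();
        server.subscribe("/config", seen::add);
        server.update("/config", "v1"); // notifies the subscriber
        server.update("/other", "x");   // nobody subscribed, nothing happens
        System.out.println(seen);       // [/config]
    }
}
```

Note that this sketch keeps notifying on every update; it is only meant to show the direction of the call, not any particular delivery policy.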

At this point the idea is already close to zk's own watch implementation. Following this line of thought, let's start the analysis.


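When using the client, getData, getChildren, and exists can all register a watcher with zk. They all work the same way, so we take exists as the example: the client is notified when the watched node is created or deleted (an exists watch also fires when the node's data is set).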

2.1 Client-side implementation

We start with the client, using the native API that we call as the entry point.

2.1.1 ZooKeeper.exists

/**

* The asynchronous version of exists.

*

* @see #exists(String, Watcher)

*/

public void exists(final String path, Watcher watcher,

StatCallback cb, Object ctx)

{

final String clientPath = path;

PathUtils.validatePath(clientPath);

// the watch contains the un-chroot path

WatchRegistration wcb = null; //1.

if (watcher != null) {

wcb = new ExistsWatchRegistration(watcher, clientPath);

}

final String serverPath = prependChroot(clientPath);

RequestHeader h = new RequestHeader();

h.setType(ZooDefs.OpCode.exists);

ExistsRequest request = new ExistsRequest();

request.setPath(serverPath);

request.setWatch(watcher != null);//3.

SetDataResponse response = new SetDataResponse();

cnxn.queuePacket(h, new ReplyHeader(), request, response, cb,

clientPath, serverPath, ctx, wcb); //2.

}

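Two places are related to the watch. First, the watcher becomes a field of a WatchRegistration (comment 1), which ends up inside a Packet, the smallest unit of communication between client and server (comment 2).

Second, request.setWatch(watcher != null) sets a boolean flag on the request (comment 3).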
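The split between what goes on the wire and what stays in the client can be sketched as follows. This is a simplified illustration, not ZooKeeper's actual serialization: `ExistsRequestSketch` is a hypothetical class, and the layout (length-prefixed UTF-8 path followed by one boolean byte) is an assumption chosen for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical request: only path and a flag are serialized.
final class ExistsRequestSketch {
    final String path;           // sent to the server
    final boolean watch;         // sent to the server: a flag, not the watcher
    final Runnable localWatcher; // never serialized; stays in the client

    ExistsRequestSketch(String path, Runnable localWatcher) {
        this.path = path;
        this.localWatcher = localWatcher;
        this.watch = localWatcher != null;
    }

    // Assumed layout: 4-byte length, UTF-8 path bytes, 1-byte boolean.
    byte[] serialize() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            byte[] encodedPath = path.getBytes(StandardCharsets.UTF_8);
            out.writeInt(encodedPath.length);
            out.write(encodedPath);
            out.writeBoolean(watch);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ExistsRequestSketch withWatch = new ExistsRequestSketch("/app", () -> { });
        ExistsRequestSketch noWatch = new ExistsRequestSketch("/app", null);
        // Both requests have the same size: the watcher object adds nothing.
        System.out.println(withWatch.serialize().length); // 9
        System.out.println(noWatch.serialize().length);   // 9
    }
}
```

However complex the watcher's logic is, the request grows by a single flag.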

2.1.2 ClientCnxn.queuePacket

The last statement in the code above is cnxn.queuePacket(h, new ReplyHeader(), request, response, cb, clientPath, serverPath, ctx, wcb); so let's follow it.

public Packet queuePacket(RequestHeader h, ReplyHeader r, Record request,

Record response, AsyncCallback cb, String clientPath,

String serverPath, Object ctx, WatchRegistration watchRegistration) {

return queuePacket(h, r, request, response, cb, clientPath, serverPath,

ctx, watchRegistration, null);

}

public Packet queuePacket(RequestHeader h, ReplyHeader r, Record request,

Record response, AsyncCallback cb, String clientPath,

String serverPath, Object ctx, WatchRegistration watchRegistration,

WatchDeregistration watchDeregistration) {

Packet packet = null;

// Note that we do not generate the Xid for the packet yet. It is

// generated later at send-time, by an implementation of ClientCnxnSocket::doIO(),

// where the packet is actually sent.

packet = new Packet(h, r, request, response, watchRegistration);

packet.cb = cb;

packet.ctx = ctx;

packet.clientPath = clientPath;

packet.serverPath = serverPath;

packet.watchDeregistration = watchDeregistration;

// The synchronized block here is for two purpose:

// 1. synchronize with the final cleanup() in SendThread.run() to avoid race

// 2. synchronized against each packet. So if a closeSession packet is added,

// later packet will be notified.

synchronized (state) {

if (!state.isAlive() || closing) {

conLossPacket(packet);

} else {

// If the client is asking to close the session then

// mark as closing

if (h.getType() == OpCode.closeSession) {

closing = true;

}

outgoingQueue.add(packet);

}

}

sendThread.getClientCnxnSocket().packetAdded();

return packet;

}

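This looks like a lot of code, but it does one thing: it assembles a Packet and adds it to the outgoing queue. The queue is consumed by the SendThread inside ClientCnxn (see SendThread.run). That method has many conditional branches and is not particularly clean, so it is not reproduced here.

As the code below shows, the zk client maintains an outgoing queue and a queue of packets awaiting replies; both hold Packet objects.

/**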

* These are the packets that have been sent and are waiting for a response.

*/

private final LinkedList<Packet> pendingQueue = new LinkedList<Packet>();

/**

* These are the packets that need to be sent.

*/

private final LinkedBlockingDeque<Packet> outgoingQueue = new LinkedBlockingDeque<Packet>();
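The interplay of the two queues can be sketched in a few lines. This is a single-threaded simplification under stated assumptions: `TwoQueueSketch`, `sendOne`, and `onReply` are hypothetical names, there is no real socket, and the only behavior modeled is "a packet moves from outgoing to pending when sent, and replies must arrive in request order".

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, single-threaded model of the outgoing/pending queues.
final class TwoQueueSketch {
    static final class Packet {
        final String path;
        int xid;          // assigned at send time, not at queue time
        boolean finished;

        Packet(String path) {
            this.path = path;
        }
    }

    private final Deque<Packet> outgoingQueue = new ArrayDeque<>();
    private final Deque<Packet> pendingQueue = new ArrayDeque<>();
    private int nextXid = 1;

    Packet queuePacket(String path) {
        Packet p = new Packet(path);
        outgoingQueue.addLast(p);
        return p;
    }

    // "Sending" moves the head of the outgoing queue to the pending queue.
    boolean sendOne() {
        Packet p = outgoingQueue.pollFirst();
        if (p == null) {
            return false;
        }
        p.xid = nextXid++;
        pendingQueue.addLast(p);
        return true;
    }

    // Replies are matched against the oldest pending packet.
    Packet onReply(int replyXid) {
        Packet p = pendingQueue.pollFirst();
        if (p == null) {
            throw new IllegalStateException("Nothing in the queue, but got " + replyXid);
        }
        if (p.xid != replyXid) {
            throw new IllegalStateException(
                    "Xid out of order. Got " + replyXid + " expected " + p.xid);
        }
        p.finished = true;
        return p;
    }

    public static void main(String[] args) {
        TwoQueueSketch cnxn = new TwoQueueSketch();
        Packet a = cnxn.queuePacket("/a");
        Packet b = cnxn.queuePacket("/b");
        cnxn.sendOne();
        cnxn.sendOne();
        System.out.println(cnxn.onReply(1) == a); // true
        System.out.println(cnxn.onReply(2) == b); // true
    }
}
```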

Next, look at SendThread.readResponse, which handles each reply and takes the matching Packet off the client's pending queue:

void readResponse(ByteBuffer incomingBuffer) throws IOException {

ByteBufferInputStream bbis = new ByteBufferInputStream(

incomingBuffer);

BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);

ReplyHeader replyHdr = new ReplyHeader();

replyHdr.deserialize(bbia, "header");

if (replyHdr.getXid() == -2) {

// -2 is the xid for pings

if (LOG.isDebugEnabled()) {

LOG.debug("Got ping response for sessionid: 0x"

+ Long.toHexString(sessionId)

+ " after "

+ ((System.nanoTime() - lastPingSentNs) / 1000000)

+ "ms");

}

return;

}

if (replyHdr.getXid() == -4) {

// -4 is the xid for AuthPacket

if(replyHdr.getErr() == KeeperException.Code.AUTHFAILED.intValue()) {

state = States.AUTH_FAILED;

eventThread.queueEvent( new WatchedEvent(Watcher.Event.EventType.None,

Watcher.Event.KeeperState.AuthFailed, null) );

eventThread.queueEventOfDeath();

}

if (LOG.isDebugEnabled()) {

LOG.debug("Got auth sessionid:0x"

+ Long.toHexString(sessionId));

}

return;

}

if (replyHdr.getXid() == -1) {

// -1 means notification

if (LOG.isDebugEnabled()) {

LOG.debug("Got notification sessionid:0x"

+ Long.toHexString(sessionId));

}

WatcherEvent event = new WatcherEvent();

event.deserialize(bbia, "response");

// convert from a server path to a client path

if (chrootPath != null) {

String serverPath = event.getPath();

if(serverPath.compareTo(chrootPath)==0)

event.setPath("/");

else if (serverPath.length() > chrootPath.length())

event.setPath(serverPath.substring(chrootPath.length()));

else {

LOG.warn("Got server path " + event.getPath()

+ " which is too short for chroot path "

+ chrootPath);

}

}

WatchedEvent we = new WatchedEvent(event);

if (LOG.isDebugEnabled()) {

LOG.debug("Got " + we + " for sessionid 0x"

+ Long.toHexString(sessionId));

}

eventThread.queueEvent( we );

return;

}

// If SASL authentication is currently in progress, construct and

// send a response packet immediately, rather than queuing a

// response as with other packets.

if (tunnelAuthInProgress()) {

GetSASLRequest request = new GetSASLRequest();

request.deserialize(bbia,"token");

zooKeeperSaslClient.respondToServer(request.getToken(),

ClientCnxn.this);

return;

}

Packet packet;

synchronized (pendingQueue) {

if (pendingQueue.size() == 0) {

throw new IOException("Nothing in the queue, but got "

+ replyHdr.getXid());

}

packet = pendingQueue.remove();

}

/*

* Since requests are processed in order, we better get a response

* to the first request!

*/

try {

if (packet.requestHeader.getXid() != replyHdr.getXid()) {

packet.replyHeader.setErr(

KeeperException.Code.CONNECTIONLOSS.intValue());

throw new IOException("Xid out of order. Got Xid "

+ replyHdr.getXid() + " with err " +

+ replyHdr.getErr() +

" expected Xid "

+ packet.requestHeader.getXid()

+ " for a packet with details: "

+ packet );

}

packet.replyHeader.setXid(replyHdr.getXid());

packet.replyHeader.setErr(replyHdr.getErr());

packet.replyHeader.setZxid(replyHdr.getZxid());

if (replyHdr.getZxid() > 0) {

lastZxid = replyHdr.getZxid();

}

if (packet.response != null && replyHdr.getErr() == 0) {

packet.response.deserialize(bbia, "response");

}

if (LOG.isDebugEnabled()) {

LOG.debug("Reading reply sessionid:0x"

+ Long.toHexString(sessionId) + ", packet:: " + packet);

}

} finally {

finishPacket(packet);

}

}
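Two details of readResponse are easy to lose in the long listing: the special xid values that route a reply, and the conversion of a server path back to a client path under a chroot. The sketch below isolates both. `ReplyDispatchSketch`, `classify`, and `toClientPath` are hypothetical names; the xid constants and the path rules are taken from the excerpt above, and the SASL branch is left out.

```java
// Hypothetical helper that mirrors the branching in the readResponse excerpt.
final class ReplyDispatchSketch {

    // Route a reply by its xid; ordinary replies have non-negative request xids.
    static String classify(int xid) {
        switch (xid) {
            case -2:
                return "ping";
            case -4:
                return "auth";
            case -1:
                return "notification";
            default:
                return "response";
        }
    }

    // Strip the chroot prefix so the watcher sees the path it registered with.
    static String toClientPath(String chrootPath, String serverPath) {
        if (chrootPath == null) {
            return serverPath;
        }
        if (serverPath.equals(chrootPath)) {
            return "/";
        }
        if (serverPath.length() > chrootPath.length()) {
            return serverPath.substring(chrootPath.length());
        }
        // The excerpt only logs a warning here and keeps the path unchanged.
        return serverPath;
    }

    public static void main(String[] args) {
        System.out.println(classify(-1));                       // notification
        System.out.println(classify(17));                       // response
        System.out.println(toClientPath("/app", "/app/node"));  // /node
        System.out.println(toClientPath("/app", "/app"));       // /
    }
}
```

The important point for watches is the -1 branch: a notification never touches the pending queue, it goes straight to the event thread.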

Inside the synchronized (pendingQueue) block, the Packet is taken off the queue, and it is finally handed to finishPacket.

protected void finishPacket(Packet p) {

int err = p.replyHeader.getErr();

if (p.watchRegistration != null) {

p.watchRegistration.register(err);

}

// Add all the removed watch events to the event queue, so that the

// clients will be notified with 'Data/Child WatchRemoved' event type.

if (p.watchDeregistration != null) {

Map<EventType, Set<Watcher>> materializedWatchers = null;

try {

materializedWatchers = p.watchDeregistration.unregister(err);

for (Entry<EventType, Set<Watcher>> entry : materializedWatchers

.entrySet()) {

Set<Watcher> watchers = entry.getValue();

if (watchers.size() > 0) {

queueEvent(p.watchDeregistration.getClientPath(), err,

watchers, entry.getKey());

// ignore connectionloss when removing from local

// session

p.replyHeader.setErr(Code.OK.intValue());

}

}

} catch (KeeperException.NoWatcherException nwe) {

p.replyHeader.setErr(nwe.code().intValue());

} catch (KeeperException ke) {

p.replyHeader.setErr(ke.code().intValue());

}

}

if (p.cb == null) {

synchronized (p) {

p.finished = true;

p.notifyAll();

}

} else {

p.finished = true;

eventThread.queuePacket(p);

}

}

Two pieces of logic in this method are worth analyzing:

p.watchRegistration.register

queueEvent

2.1.3 watchRegistration

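Note that the WatchRegistration is not serialized and sent with the Packet. This avoids sending unnecessary data, since the request is already flagged as a watch. So what is the WatchRegistration for? Fortunately the register logic is simple:

/**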

* Register the watcher with the set of watches on path.

* @param rc the result code of the operation that attempted to

* add the watch on the path.

*/

public void register(int rc) {

if (shouldAddWatch(rc)) {

Map<String, Set<Watcher>> watches = getWatches(rc);

synchronized(watches) {

Set<Watcher> watchers = watches.get(clientPath);

if (watchers == null) {

watchers = new HashSet<Watcher>();

watches.put(clientPath, watchers);

}

watchers.add(watcher);

}

}

}
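register depends on two hooks, shouldAddWatch(rc) and getWatches(rc), that the excerpt does not show. The sketch below illustrates how they might behave for an exists call, based on my reading of the client source: the watch is kept when the call succeeded or the node does not exist, and the result code selects which map it lands in. Treat the class name `ExistsRegistrationSketch`, the use of `Runnable` as the watcher type, and the exact rule as assumptions rather than a copy of the real code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of registering an exists watch after the reply arrives.
final class ExistsRegistrationSketch {
    static final int OK = 0;
    static final int NONODE = -101;        // KeeperException.Code.NONODE
    static final int CONNECTIONLOSS = -4;  // KeeperException.Code.CONNECTIONLOSS

    final Map<String, Set<Runnable>> dataWatches = new HashMap<>();
    final Map<String, Set<Runnable>> existWatches = new HashMap<>();

    // Returns true if the watcher was stored on the client side.
    boolean register(int rc, String clientPath, Runnable watcher) {
        if (rc != OK && rc != NONODE) {
            return false; // the request failed, so the server holds no watch either
        }
        // Node exists: behave like a data watch. Node missing: wait for creation.
        Map<String, Set<Runnable>> watches = (rc == OK) ? dataWatches : existWatches;
        synchronized (watches) {
            watches.computeIfAbsent(clientPath, k -> new HashSet<>()).add(watcher);
        }
        return true;
    }

    public static void main(String[] args) {
        ExistsRegistrationSketch client = new ExistsRegistrationSketch();
        Runnable watcher = () -> { };
        System.out.println(client.register(OK, "/present", watcher));          // true
        System.out.println(client.register(NONODE, "/absent", watcher));       // true
        System.out.println(client.register(CONNECTIONLOSS, "/lost", watcher)); // false
        System.out.println(client.dataWatches.keySet());  // [/present]
        System.out.println(client.existWatches.keySet()); // [/absent]
    }
}
```

The design point is that the client only records a watch after it knows the server accepted the request, which keeps the two sides consistent.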

2.1.4 queueEvent

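There is not much code in register. The client maintains a path-to-watchers map, and by now most readers can guess the implementation: when a notification arrives, the client looks up the watchers by path. Next, queueEvent:

void queueEvent(String clientPath, int err,

Set<Watcher> materializedWatchers, EventType eventType) {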

KeeperState sessionState = KeeperState.SyncConnected;

if (KeeperException.Code.SESSIONEXPIRED.intValue() == err

|| KeeperException.Code.CONNECTIONLOSS.intValue() == err) {

sessionState = Event.KeeperState.Disconnected;

}

WatchedEvent event = new WatchedEvent(eventType, sessionState,

clientPath);

eventThread.queueEvent(event, materializedWatchers);

}

The logic is simple: determine the state, build the event, and hand it to eventThread for notification.

private void queueEvent(WatchedEvent event,

Set<Watcher> materializedWatchers) {

if (event.getType() == EventType.None

&& sessionState == event.getState()) {

return;

}

sessionState = event.getState();

final Set<Watcher> watchers;

if (materializedWatchers == null) {

// materialize the watchers based on the event

watchers = watcher.materialize(event.getState(),

event.getType(), event.getPath());

} else {

watchers = new HashSet<Watcher>();

watchers.addAll(materializedWatchers);

}

WatcherSetEventPair pair = new WatcherSetEventPair(watchers, event);

// queue the pair (watch set & event) for later processing

waitingEvents.add(pair);

}

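The source of ClientWatchManager.materialize is not shown here. All we need to know is that in this logic a watch is removed once it has been triggered, and that the state is kept in ZKWatchManager:

static class ZKWatchManager implements ClientWatchManager {

private final Map<String, Set<Watcher>> dataWatches =

new HashMap<String, Set<Watcher>>();

private final Map<String, Set<Watcher>> existWatches =

new HashMap<String, Set<Watcher>>();

private final Map<String, Set<Watcher>> childWatches =

new HashMap<String, Set<Watcher>>();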

//......

}
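Since materialize is not listed, here is a simplified sketch of the one-shot removal it performs. It is written from my understanding of which maps each event type drains and ignores session-state events entirely; `MaterializeSketch`, `drain`, and the use of `Runnable` as the watcher type are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified client-side watch manager.
final class MaterializeSketch {
    enum EventType { NodeCreated, NodeDeleted, NodeDataChanged, NodeChildrenChanged }

    final Map<String, Set<Runnable>> dataWatches = new HashMap<>();
    final Map<String, Set<Runnable>> existWatches = new HashMap<>();
    final Map<String, Set<Runnable>> childWatches = new HashMap<>();

    // Remove the watchers for a path and collect them; this is what makes them one-shot.
    private static void drain(Map<String, Set<Runnable>> from, String path, Set<Runnable> into) {
        Set<Runnable> removed;
        synchronized (from) {
            removed = from.remove(path);
        }
        if (removed != null) {
            into.addAll(removed);
        }
    }

    Set<Runnable> materialize(EventType type, String path) {
        Set<Runnable> result = new HashSet<>();
        switch (type) {
            case NodeCreated:
            case NodeDataChanged:
                drain(dataWatches, path, result);
                drain(existWatches, path, result);
                break;
            case NodeChildrenChanged:
                drain(childWatches, path, result);
                break;
            case NodeDeleted:
                drain(dataWatches, path, result);
                drain(existWatches, path, result);
                drain(childWatches, path, result);
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        MaterializeSketch manager = new MaterializeSketch();
        Set<Runnable> registered = new HashSet<>();
        registered.add(() -> { });
        manager.existWatches.put("/node", registered);
        System.out.println(manager.materialize(EventType.NodeCreated, "/node").size()); // 1
        System.out.println(manager.materialize(EventType.NodeCreated, "/node").size()); // 0
    }
}
```

The second call returns nothing because the first call already removed the watcher from the map.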

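Back to the last thing eventThread.queueEvent does: it enqueues the watcher set and event as a pair. So let's look at the core method of this thread:

@Override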

@SuppressFBWarnings("JLM_JSR166_UTILCONCURRENT_MONITORENTER")

public void run() {

try {

isRunning = true;

while (true) {

Object event = waitingEvents.take();

if (event == eventOfDeath) {

wasKilled = true;

} else {

processEvent(event);

}

if (wasKilled)

synchronized (waitingEvents) {

if (waitingEvents.isEmpty()) {

isRunning = false;

break;

}

}

}

} catch (InterruptedException e) {

LOG.error("Event thread exiting due to interruption", e);

}

LOG.info("EventThread shut down for session: 0x{}",

Long.toHexString(getSessionId()));

}

The familiar recipe again: an endless loop that consumes elements from a queue. Now processEvent:

private void processEvent(Object event) {

try {

if (event instanceof WatcherSetEventPair) {

// each watcher will process the event

WatcherSetEventPair pair = (WatcherSetEventPair) event;

for (Watcher watcher : pair.watchers) {

try {

watcher.process(pair.event);

} catch (Throwable t) {

LOG.error("Error while calling watcher ", t);

}

}

} else if (event instanceof LocalCallback) {

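// The remaining branches are not relevant to this article and are omitted.

}

} catch (Throwable t) {

LOG.error("Caught unexpected throwable", t);

}

}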

Once process is called, the logic we wrote ourselves runs.
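The event thread pattern (one consumer thread, a blocking queue, a sentinel "event of death", and a try/catch around each callback) can be reproduced in isolation. The sketch below is a simplification under stated assumptions: `EventLoopSketch` and its method names are hypothetical, events are plain `Runnable`s, and the loop exits as soon as it sees the sentinel instead of re-checking the queue as the original does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical single-threaded dispatcher modeled on the EventThread excerpt.
final class EventLoopSketch {
    // Sentinel object; compared by identity, never executed.
    private static final Runnable EVENT_OF_DEATH = new Runnable() {
        @Override
        public void run() { }
    };

    private final BlockingQueue<Runnable> waitingEvents = new LinkedBlockingQueue<>();
    private final Thread thread = new Thread(this::runLoop, "event-loop-sketch");

    void start() {
        thread.start();
    }

    void queueEvent(Runnable watcherCallback) {
        waitingEvents.add(watcherCallback);
    }

    // Enqueue the sentinel behind all earlier events and wait for the loop to end.
    void shutdownAndJoin() {
        waitingEvents.add(EVENT_OF_DEATH);
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void runLoop() {
        try {
            while (true) {
                Runnable event = waitingEvents.take();
                if (event == EVENT_OF_DEATH) {
                    break;
                }
                try {
                    event.run(); // callbacks run one at a time, in queue order
                } catch (Throwable t) {
                    // A failing watcher must not kill the loop.
                    System.err.println("Error while calling watcher: " + t);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        EventLoopSketch loop = new EventLoopSketch();
        List<String> order = Collections.synchronizedList(new ArrayList<String>());
        loop.start();
        loop.queueEvent(() -> order.add("first"));
        loop.queueEvent(() -> { throw new RuntimeException("bad watcher"); });
        loop.queueEvent(() -> order.add("second"));
        loop.shutdownAndJoin();
        System.out.println(order); // [first, second]
    }
}
```

An exception in one callback is contained, but a callback that blocks would still delay every event queued behind it.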

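Having read the client code, it is worth asking: is it really a good idea to write a high-level mechanism (watch) explicitly into low-level communication code (such as finishPacket)? How would you do it?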

2.2 Server-side implementation

Having covered the client side, we now turn to the server-side watch implementation.

We go straight to where the zk server handles requests, the relevant part of FinalRequestProcessor.processRequest:

case OpCode.exists: {

lastOp = "EXIS";

// TODO we need to figure out the security requirement for this!

ExistsRequest existsRequest = new ExistsRequest();

ByteBufferInputStream.byteBuffer2Record(request.request,

existsRequest);

String path = existsRequest.getPath();

if (path.indexOf('\0') != -1) {

throw new KeeperException.BadArgumentsException();

}

Stat stat = zks.getZKDatabase().statNode(path, existsRequest

.getWatch() ? cnxn : null);

rsp = new ExistsResponse(stat);

break;

}

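As you can see, if the request asks for a watch, the ServerCnxn is passed down. ServerCnxn represents the connection between the client and the server, so when a data event occurs the server can use that connection to trigger the client's watch.

Next, DataTree.statNode:

public Stat statNode(String path, Watcher watcher)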

throws KeeperException.NoNodeException {

Stat stat = new Stat();

DataNode n = nodes.get(path);

if (watcher != null) {

dataWatches.addWatch(path, watcher);

}

if (n == null) {

throw new KeeperException.NoNodeException();

}

synchronized (n) {

n.copyStat(stat);

return stat;

}

}
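One detail in statNode is worth spelling out: the watch is added before the null check, so a watch is left behind even when the node does not exist yet. That is what allows an exists watch to fire on creation. The sketch below models only that ordering; `StatNodeSketch` is hypothetical, it returns -1 where the real method throws NoNodeException, and `createNode` stands in for the trigger that happens on node creation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical model of "register first, then check whether the node exists".
final class StatNodeSketch {
    private final Map<String, byte[]> nodes = new HashMap<>();
    private final Map<String, Set<Consumer<String>>> dataWatches = new HashMap<>();

    // Returns the data length, or -1 if the node is missing.
    int statNode(String path, Consumer<String> watcher) {
        if (watcher != null) {
            // Registered regardless of whether the node exists.
            dataWatches.computeIfAbsent(path, k -> new HashSet<>()).add(watcher);
        }
        byte[] data = nodes.get(path);
        return data == null ? -1 : data.length;
    }

    void createNode(String path, byte[] data) {
        nodes.put(path, data);
        Set<Consumer<String>> watchers = dataWatches.remove(path);
        if (watchers != null) {
            for (Consumer<String> w : watchers) {
                w.accept("NodeCreated");
            }
        }
    }

    public static void main(String[] args) {
        StatNodeSketch tree = new StatNodeSketch();
        StringBuilder events = new StringBuilder();
        System.out.println(tree.statNode("/job", events::append)); // -1
        tree.createNode("/job", new byte[] {1, 2, 3});
        System.out.println(events);                                // NodeCreated
        System.out.println(tree.statNode("/job", null));           // 3
    }
}
```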

When watcher != null, a watcher is added to the server's dataWatches. Next, the core server-side watch class, WatchManager:

/**

* This class manages watches. It allows watches to be associated with a string

* and removes watchers and their watches in addition to managing triggers.

*/

class WatchManager {

private static final Logger LOG = LoggerFactory.getLogger(WatchManager.class);

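private final HashMap<String, HashSet<Watcher>> watchTable =

new HashMap<String, HashSet<Watcher>>();

private final HashMap<Watcher, HashSet<String>> watch2Paths =

new HashMap<Watcher, HashSet<String>>();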

synchronized int size(){

int result = 0;

for(Set<Watcher> watches : watchTable.values()) {

result += watches.size();

}

return result;

}

synchronized void addWatch(String path, Watcher watcher) {

HashSet<Watcher> list = watchTable.get(path);

if (list == null) {

// don't waste memory if there are few watches on a node

// rehash when the 4th entry is added, doubling size thereafter

// seems like a good compromise

list = new HashSet<Watcher>(4);

watchTable.put(path, list);

}

list.add(watcher);

HashSet<String> paths = watch2Paths.get(watcher);

if (paths == null) {

// cnxns typically have many watches, so use default cap here

paths = new HashSet<String>();

watch2Paths.put(watcher, paths);

}

paths.add(path);

}

synchronized void removeWatcher(Watcher watcher) {

HashSet<String> paths = watch2Paths.remove(watcher);

if (paths == null) {

return;

}

for (String p : paths) {

HashSet<Watcher> list = watchTable.get(p);

if (list != null) {

list.remove(watcher);

if (list.size() == 0) {

watchTable.remove(p);

}

}

}

}

Set<Watcher> triggerWatch(String path, EventType type) {

return triggerWatch(path, type, null);

}

Set<Watcher> triggerWatch(String path, EventType type, Set<Watcher> supress) {

WatchedEvent e = new WatchedEvent(type,

KeeperState.SyncConnected, path);

HashSet<Watcher> watchers;

synchronized (this) {

watchers = watchTable.remove(path);

if (watchers == null || watchers.isEmpty()) {

if (LOG.isTraceEnabled()) {

ZooTrace.logTraceMessage(LOG,

ZooTrace.EVENT_DELIVERY_TRACE_MASK,

"No watchers for " + path);

}

return null;

}

for (Watcher w : watchers) {

HashSet<String> paths = watch2Paths.get(w);

if (paths != null) {

paths.remove(path);

}

}

}

for (Watcher w : watchers) {

if (supress != null && supress.contains(w)) {

continue;

}

w.process(e);

}

return watchers;

}

/**

* Brief description of this object.

*/

@Override

public synchronized String toString() {

StringBuilder sb = new StringBuilder();

sb.append(watch2Paths.size()).append(" connections watching ")

.append(watchTable.size()).append(" paths\n");

int total = 0;

for (HashSet<String> paths : watch2Paths.values()) {

total += paths.size();

}

sb.append("Total watches:").append(total);

return sb.toString();

}

/**

* String representation of watches. Warning, may be large!

* @param byPath iff true output watches by paths, otw output

* watches by connection

* @return string representation of watches

*/

synchronized void dumpWatches(PrintWriter pwriter, boolean byPath) {

if (byPath) {

for (Entry<String, HashSet<Watcher>> e : watchTable.entrySet()) {

pwriter.println(e.getKey());

for (Watcher w : e.getValue()) {

pwriter.print("\t0x");

pwriter.print(Long.toHexString(((ServerCnxn)w).getSessionId()));

pwriter.print("\n");

}

}

} else {

for (Entry<Watcher, HashSet<String>> e : watch2Paths.entrySet()) {

pwriter.print("0x");

pwriter.println(Long.toHexString(((ServerCnxn)e.getKey()).getSessionId()));

for (String path : e.getValue()) {

pwriter.print("\t");

pwriter.println(path);

}

}

}

}

/**

* Checks the specified watcher exists for the given path

*

* @param path

* znode path

* @param watcher

* watcher object reference

* @return true if the watcher exists, false otherwise

*/

synchronized boolean containsWatcher(String path, Watcher watcher) {

HashSet<String> paths = watch2Paths.get(watcher);

if (paths == null || !paths.contains(path)) {

return false;

}

return true;

}

/**

* Removes the specified watcher for the given path

*

* @param path

* znode path

* @param watcher

* watcher object reference

* @return true if the watcher successfully removed, false otherwise

*/

synchronized boolean removeWatcher(String path, Watcher watcher) {

HashSet<String> paths = watch2Paths.get(watcher);

if (paths == null || !paths.remove(path)) {

return false;

}

HashSet<Watcher> list = watchTable.get(path);

if (list == null || !list.remove(watcher)) {

return false;

}

if (list.size() == 0) {

watchTable.remove(path);

}

return true;

}

/**

* Returns a watch report.

*

* @return watch report

* @see WatchesReport

*/

synchronized WatchesReport getWatches() {

Map<Long, Set<String>> id2paths = new HashMap<Long, Set<String>>();

for (Entry<Watcher, HashSet<String>> e: watch2Paths.entrySet()) {

Long id = ((ServerCnxn) e.getKey()).getSessionId();

HashSet<String> paths = new HashSet<String>(e.getValue());

id2paths.put(id, paths);

}

return new WatchesReport(id2paths);

}

/**

* Returns a watch report by path.

*

* @return watch report

* @see WatchesPathReport

*/

synchronized WatchesPathReport getWatchesByPath() {

Map<String, Set<Long>> path2ids = new HashMap<String, Set<Long>>();

for (Entry<String, HashSet<Watcher>> e : watchTable.entrySet()) {

Set<Long> ids = new HashSet<Long>(e.getValue().size());

path2ids.put(e.getKey(), ids);

for (Watcher watcher : e.getValue()) {

ids.add(((ServerCnxn) watcher).getSessionId());

}

}

return new WatchesPathReport(path2ids);

}

/**

* Returns a watch summary.

*

* @return watch summary

* @see WatchesSummary

*/

synchronized WatchesSummary getWatchesSummary() {

int totalWatches = 0;

for (HashSet<String> paths : watch2Paths.values()) {

totalWatches += paths.size();

}

return new WatchesSummary (watch2Paths.size(), watchTable.size(),

totalWatches);

}

}

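The class is easy to follow. Start with its two core fields:

watchTable: path -> watchers

watch2Paths: watcher -> paths

addWatch adds entries to both maps. Triggering looks up the watchers for a path, removes them from memory, and then calls process on each one; at that point the ServerCnxn sends a Packet to the client.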
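Stripped of logging and reporting, the two-map bookkeeping fits in a short sketch. This is a condensed illustration, not the real class: `ServerWatchTableSketch` and its nested `Watcher` interface are hypothetical, the event type is a plain string, and an empty set is returned where the original returns null.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical condensed version of the server-side watch bookkeeping.
final class ServerWatchTableSketch {
    interface Watcher {
        void process(String eventType, String path);
    }

    private final Map<String, Set<Watcher>> watchTable = new HashMap<>();  // path -> watchers
    private final Map<Watcher, Set<String>> watch2Paths = new HashMap<>(); // watcher -> paths

    synchronized void addWatch(String path, Watcher watcher) {
        watchTable.computeIfAbsent(path, k -> new HashSet<>()).add(watcher);
        watch2Paths.computeIfAbsent(watcher, k -> new HashSet<>()).add(path);
    }

    // Called when a connection goes away: drop everything it was watching.
    synchronized void removeWatcher(Watcher watcher) {
        Set<String> paths = watch2Paths.remove(watcher);
        if (paths == null) {
            return;
        }
        for (String p : paths) {
            Set<Watcher> watchers = watchTable.get(p);
            if (watchers != null) {
                watchers.remove(watcher);
                if (watchers.isEmpty()) {
                    watchTable.remove(p);
                }
            }
        }
    }

    // One-shot: the watchers for the path are removed before they are notified.
    Set<Watcher> triggerWatch(String path, String eventType) {
        Set<Watcher> watchers;
        synchronized (this) {
            watchers = watchTable.remove(path);
            if (watchers == null || watchers.isEmpty()) {
                return Collections.emptySet();
            }
            for (Watcher w : watchers) {
                Set<String> paths = watch2Paths.get(w);
                if (paths != null) {
                    paths.remove(path);
                }
            }
        }
        for (Watcher w : watchers) {
            w.process(eventType, path); // notify outside the lock
        }
        return watchers;
    }

    synchronized int watchedPathCount() {
        return watchTable.size();
    }

    public static void main(String[] args) {
        ServerWatchTableSketch table = new ServerWatchTableSketch();
        int[] fired = {0};
        Watcher connection = (type, path) -> fired[0]++;
        table.addWatch("/a", connection);
        table.addWatch("/b", connection);
        table.triggerWatch("/a", "NodeDataChanged");
        table.triggerWatch("/a", "NodeDataChanged"); // already removed, no callback
        System.out.println(fired[0]);                 // 1
        System.out.println(table.watchedPathCount()); // 1
    }
}
```

The reverse map exists so that closing a connection can clean up all of its watches without scanning every path.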

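So when does triggering happen? That is simple too: the code in DataTree triggers the watchers whenever it operates on the corresponding data. Take DataTree.setData as an example:

public Stat setData(String path, byte data[], int version, long zxid,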

long time) throws KeeperException.NoNodeException {

Stat s = new Stat();

DataNode n = nodes.get(path);

if (n == null) {

throw new KeeperException.NoNodeException();

}

byte lastdata[] = null;

synchronized (n) {

lastdata = n.data;

n.data = data;

n.stat.setMtime(time);

n.stat.setMzxid(zxid);

n.stat.setVersion(version);

n.copyStat(s);

}

// now update if the path is in a quota subtree.

String lastPrefix = getMaxPrefixWithQuota(path);

if(lastPrefix != null) {

this.updateBytes(lastPrefix, (data == null ? 0 : data.length)

- (lastdata == null ? 0 : lastdata.length));

}

// the watch is triggered here

dataWatches.triggerWatch(path, EventType.NodeDataChanged);

return s;

}

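That clarifies how watches work in zk. Along the way we have also seen several properties of watchers:

One-shot: on both the client and the server, zk removes a watcher once it has been triggered. Developers therefore have to re-register repeatedly, but the benefit is clear: it reduces server load, because a frequently updated node does not keep triggering the watcher.

Serial execution on the client: client callbacks run serially and synchronously, so they are ordered. Developers should also take care that the logic of one watcher does not hold up all the other callbacks of the client.

Lightweight: whether a client request sets a watch is decided by a single boolean. Likewise, the server's watch notification, WatchedEvent, has only three fields:

notification state (KeeperState)

event type (EventType)

node path

This lightweight design keeps both network overhead and server memory overhead low.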
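The one-shot property has a practical consequence: a watcher that wants to keep following a node must register itself again from inside its own callback. The sketch below shows that pattern against a tiny in-process stand-in. `RewatchSketch` and its `getData`, `setData`, and `watchForever` methods are hypothetical and only mimic the shape of the real client calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-process store with one-shot watches.
final class RewatchSketch {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Consumer<String>>> oneShotWatches = new HashMap<>();

    // Read a value and optionally leave a one-shot watch behind.
    String getData(String path, Consumer<String> watcher) {
        if (watcher != null) {
            oneShotWatches.computeIfAbsent(path, k -> new ArrayList<>()).add(watcher);
        }
        return data.get(path);
    }

    void setData(String path, String value) {
        data.put(path, value);
        // Remove first, then notify: each watcher fires at most once.
        List<Consumer<String>> fired = oneShotWatches.remove(path);
        if (fired != null) {
            for (Consumer<String> w : fired) {
                w.accept(path);
            }
        }
    }

    // Re-registers itself on every notification and records the value it reads.
    void watchForever(String path, List<String> seenValues) {
        String current = getData(path, p -> watchForever(p, seenValues));
        if (current != null) {
            seenValues.add(current);
        }
    }

    public static void main(String[] args) {
        RewatchSketch store = new RewatchSketch();

        List<String> once = new ArrayList<>();
        store.getData("/once", once::add);
        store.setData("/once", "v1");
        store.setData("/once", "v2"); // watch already consumed
        System.out.println(once);      // [/once]

        List<String> seen = new ArrayList<>();
        store.watchForever("/cfg", seen);
        store.setData("/cfg", "v1");
        store.setData("/cfg", "v2");
        System.out.println(seen);      // [v1, v2]
    }
}
```

In this in-process sketch no update can slip between the notification and the re-registration. Against a real server there is latency between the two, so a client that re-registers sees the latest state but is not guaranteed to observe every intermediate change.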

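3. Summary

In this article we looked at how watch is implemented. In brief:

When the client sends a request, it keeps the watcher on the client side: the watcher travels with the Packet through the pending (awaiting-reply) queue and is registered in the client's watch manager once the reply arrives.

When a request flagged as a watch reaches the server, the server stores the watcher (which carries the client's connection) in memory, in a map.

When the watched data changes accordingly, the server looks up the registered watchers in the map and obtains the corresponding client connection.

Using that connection, the server sends a notification to the client.

The client receives the notification, looks up the watchers registered for that path, queues the event for the event thread, and the watch callback runs.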
