深入分析 Watcher 机制的实现原理
ZooKeeper 的 Watcher 机制,总的来说可以分为三个过程:
-
客户端注册 Watcher、
-
服务器处理 Watcher
-
客户端回调 Watcher
客户端注册 watcher 有 3 种方式,getData、exists、getChildren;以如下代码为例来分析整个触发机制的原理
![image-20200531002042353](https://xiepanpan123.oss-cn-beijing.aliyuncs.com/image-20200531002042353.png)
客户端注册 Watcher
在创建一个 ZooKeeper 客户端对象实例时,我们通过new Watcher()向构造方法中传入一个默认的 Watcher, 这
个 Watcher 将作为整个 ZooKeeper 会话期间的默认Watcher,会一直被保存在客户端 ZKWatchManager 的defaultWatcher 中
public ZooKeeper(
String connectString,
int sessionTimeout,
Watcher watcher,
long sessionId,
byte[] sessionPasswd,
boolean canBeReadOnly,
HostProvider hostProvider,
ZKClientConfig clientConfig) throws IOException {
LOG.info(
"Initiating client connection, connectString={} "
+ "sessionTimeout={} watcher={} sessionId=0x{} sessionPasswd={}",
connectString,
sessionTimeout,
watcher,
Long.toHexString(sessionId),
(sessionPasswd == null ? "<null>" : "<hidden>"));
this.clientConfig = clientConfig != null ? clientConfig : new ZKClientConfig();
ConnectStringParser connectStringParser = new ConnectStringParser(connectString);
this.hostProvider = hostProvider;
//初始化了 ClientCnxn,并且调用 cnxn.start()方法
cnxn = new ClientCnxn(
connectStringParser.getChrootPath(),
hostProvider,
sessionTimeout,
this.clientConfig,
watcher,
getClientCnxnSocket(),
sessionId,
sessionPasswd,
canBeReadOnly);
cnxn.seenRwServerBefore = true; // since user has provided sessionId
cnxn.start();
}
ClientCnxn:是 Zookeeper 客户端和 Zookeeper 服务器端进行通信和事件通知处理的主要类,它内部包含两个类,
- SendThread :负责客户端和服务器端的数据通信, 也包括事件信息的传输
- EventThread : 主要在客户端回调注册的 Watchers 进行通知处理
客户端 通过 exists 注册监听
public Stat exists(final String path, Watcher watcher) throws KeeperException, InterruptedException {
final String clientPath = path;
PathUtils.validatePath(clientPath);
// the watch contains the un-chroot path
WatchRegistration wcb = null;
if (watcher != null) {
wcb = new ExistsWatchRegistration(watcher, clientPath);
}
final String serverPath = prependChroot(clientPath);
RequestHeader h = new RequestHeader();
h.setType(ZooDefs.OpCode.exists);
ExistsRequest request = new ExistsRequest();
//是否注册监听
request.setPath(serverPath);
request.setWatch(watcher != null);
//设置服务端响应的接收类
SetDataResponse response = new SetDataResponse();
//将封装的 RequestHeader、ExistsRequest、SetDataResponse、WatchRegistration 添加到发送队列
ReplyHeader r = cnxn.submitRequest(h, request, response, wcb);
if (r.getErr() != 0) {
if (r.getErr() == KeeperException.Code.NONODE.intValue()) {
return null;
}
throw KeeperException.create(KeeperException.Code.get(r.getErr()), clientPath);
}
//返回 exists 得到的结果(Stat 信息)
return response.getStat().getCzxid() == -1 ? null : response.getStat();
}
进入cnxn.submitRequest 方法
public ReplyHeader submitRequest(
RequestHeader h,
Record request,
Record response,
WatchRegistration watchRegistration,
WatchDeregistration watchDeregistration) throws InterruptedException {
ReplyHeader r = new ReplyHeader();
//将消息添加到队列,并构造一个 Packet 传输对象
Packet packet = queuePacket(
h,
r,
request,
response,
null,
null,
null,
null,
watchRegistration,
watchDeregistration);
synchronized (packet) {
if (requestTimeout > 0) {
// Wait for request completion with timeout
waitForPacketFinish(r, packet);
} else {
// Wait for request completion infinitely
while (!packet.finished) {
//在数据包没有处理完成之前,一直阻塞
packet.wait();
}
}
}
if (r.getErr() == Code.REQUESTTIMEOUT.intValue()) {
sendThread.cleanAndNotifyState();
}
return r;
}
queuePacket方法:
public Packet queuePacket(
RequestHeader h,
ReplyHeader r,
Record request,
Record response,
AsyncCallback cb,
String clientPath,
String serverPath,
Object ctx,
WatchRegistration watchRegistration,
WatchDeregistration watchDeregistration) {
//将相关传输对象转化成 Packet
Packet packet = null;
// Note that we do not generate the Xid for the packet yet. It is
// generated later at send-time, by an implementation of ClientCnxnSocket::doIO(),
// where the packet is actually sent.
packet = new Packet(h, r, request, response, watchRegistration);
packet.cb = cb;
packet.ctx = ctx;
packet.clientPath = clientPath;
packet.serverPath = serverPath;
packet.watchDeregistration = watchDeregistration;
// The synchronized block here is for two purpose:
// 1. synchronize with the final cleanup() in SendThread.run() to avoid race
// 2. synchronized against each packet. So if a closeSession packet is added,
// later packet will be notified.
synchronized (state) {
if (!state.isAlive() || closing) {
conLossPacket(packet);
} else {
// If the client is asking to close the session then
// mark as closing
if (h.getType() == OpCode.closeSession) {
closing = true;
}
//添加到 outgoingQueue
outgoingQueue.add(packet);
}
}
//触发selector的执行 此处是多路复用机制,唤醒 Selector,告诉他有数据包添加过来了
sendThread.getClientCnxnSocket().packetAdded();
return packet;
}
在 ZooKeeper 中,Packet 是一个最小的通信协议单元,即数据包。Pakcet 用于进行客户端与服务端之间的网络传
输,任何需要传输的对象都需要包装成一个 Packet 对象。在 ClientCnxn 中 WatchRegistration 也 会被封 装到
Pakcet 中,然后由 SendThread 线程调用 queuePacket方法把 Packet 放入发送队列中等待客户端发送,这又是
一个异步过程,分布式系统采用异步通信是一个非常常见的手段
在初始化连接的时候,zookeeper 初始化了两个线程并且启动了。接下来我们来分析 SendThread 的发送过程,因
为是一个线程,所以启动的时候会调用 SendThread.run 方法
@Override
public void run() {
clientCnxnSocket.introduce(this, sessionId, outgoingQueue);
clientCnxnSocket.updateNow();
clientCnxnSocket.updateLastSendAndHeard();
int to;
long lastPingRwServer = Time.currentElapsedTime();
final int MAX_SEND_PING_INTERVAL = 10000; //10 seconds
InetSocketAddress serverAddress = null;
while (state.isAlive()) {
try {
//如果没有连接 发起连接
if (!clientCnxnSocket.isConnected()) {
// don't re-establish connection if we are closing
if (closing) {
break;
}
if (rwServerAddress != null) {
serverAddress = rwServerAddress;
rwServerAddress = null;
} else {
serverAddress = hostProvider.next(1000);
}
onConnecting(serverAddress);
//发起连接
startConnect(serverAddress);
clientCnxnSocket.updateLastSendAndHeard();
}
if (state.isConnected()) {
// determine whether we need to send an AuthFailed event.
//如果是连接状态 则处理sasl的认证授权
if (zooKeeperSaslClient != null) {
boolean sendAuthEvent = false;
if (zooKeeperSaslClient.getSaslState() == ZooKeeperSaslClient.SaslState.INITIAL) {
try {
zooKeeperSaslClient.initialize(ClientCnxn.this);
} catch (SaslException e) {
LOG.error("SASL authentication with Zookeeper Quorum member failed.", e);
changeZkState(States.AUTH_FAILED);
sendAuthEvent = true;
}
}
KeeperState authState = zooKeeperSaslClient.getKeeperState();
if (authState != null) {
if (authState == KeeperState.AuthFailed) {
// An authentication error occurred during authentication with the Zookeeper Server.
changeZkState(States.AUTH_FAILED);
sendAuthEvent = true;
} else {
if (authState == KeeperState.SaslAuthenticated) {
sendAuthEvent = true;
}
}
}
if (sendAuthEvent) {
eventThread.queueEvent(new WatchedEvent(Watcher.Event.EventType.None, authState, null));
if (state == States.AUTH_FAILED) {
eventThread.queueEventOfDeath();
}
}
}
to = readTimeout - clientCnxnSocket.getIdleRecv();
} else {
to = connectTimeout - clientCnxnSocket.getIdleRecv();
}
//to 表示客户端距离 timeout还剩下多少时间 准备发起ping连接
//超时
if (to <= 0) {
String warnInfo = String.format(
"Client session timed out, have not heard from server in %dms for session id 0x%s",
clientCnxnSocket.getIdleRecv(),
Long.toHexString(sessionId));
LOG.warn(warnInfo);
throw new SessionTimeoutException(warnInfo);
}
if (state.isConnected()) {
//1000(1 second) is to prevent race condition missing to send the second ping
//also make sure not to send too many pings when readTimeout is small
//计算下一次ping请求的时间
int timeToNextPing = readTimeout / 2
- clientCnxnSocket.getIdleSend()
- ((clientCnxnSocket.getIdleSend() > 1000) ? 1000 : 0);
//send a ping request either time is due or no packet sent out within MAX_SEND_PING_INTERVAL
if (timeToNextPing <= 0 || clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL) {
//发送ping请求
sendPing();
clientCnxnSocket.updateLastSend();
} else {
if (timeToNextPing < to) {
to = timeToNextPing;
}
}
}
// If we are in read-only mode, seek for read/write server
if (state == States.CONNECTEDREADONLY) {
long now = Time.currentElapsedTime();
int idlePingRwServer = (int) (now - lastPingRwServer);
if (idlePingRwServer >= pingRwTimeout) {
lastPingRwServer = now;
idlePingRwServer = 0;
pingRwTimeout = Math.min(2 * pingRwTimeout, maxPingRwTimeout);
pingRwServer();
}
to = Math.min(to, pingRwTimeout - idlePingRwServer);
}
//调用 clientCnxnSocket,发起传输 其中 pendingQueue 是一个用来存放已经发送、等待回应的 Packet 队列,
//clientCnxnSocket 默 认 使 用
//ClientCnxnSocketNIO(ps:还记得在哪里初始化吗?实例化 zookeeper 的时候)
clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);
} catch (Throwable e) {
if (closing) {
// closing so this is expected
LOG.warn(
"An exception was thrown while closing send thread for session 0x{}.",
Long.toHexString(getSessionId()),
e);
break;
} else {
LOG.warn(
"Session 0x{} for sever {}, Closing socket connection. "
+ "Attempting reconnect except it is a SessionExpiredException.",
Long.toHexString(getSessionId()),
serverAddress,
e);
// At this point, there might still be new packets appended to outgoingQueue.
// they will be handled in next connection or cleared up if closed.
cleanAndNotifyState();
}
}
}
synchronized (state) {
// When it comes to this point, it guarantees that later queued
// packet to outgoingQueue will be notified of death.
cleanup();
}
clientCnxnSocket.close();
if (state.isAlive()) {
eventThread.queueEvent(new WatchedEvent(Event.EventType.None, Event.KeeperState.Disconnected, null));
}
eventThread.queueEvent(new WatchedEvent(Event.EventType.None, Event.KeeperState.Closed, null));
ZooTrace.logTraceMessage(
LOG,
ZooTrace.getTextTraceLevel(),
"SendThread exited loop for session: 0x" + Long.toHexString(getSessionId()));
}
ClientCnxnSocketNetty的doTransport方法
@Override
void doTransport(
int waitTimeOut,
Queue<Packet> pendingQueue,
ClientCnxn cnxn) throws IOException, InterruptedException {
try {
if (!firstConnect.await(waitTimeOut, TimeUnit.MILLISECONDS)) {
return;
}
Packet head = null;
if (needSasl.get()) {
if (!waitSasl.tryAcquire(waitTimeOut, TimeUnit.MILLISECONDS)) {
return;
}
} else {
//从outgoingQueue中取出数据包保存到实例中
head = outgoingQueue.poll(waitTimeOut, TimeUnit.MILLISECONDS);
}
// check if being waken up on closing.
if (!sendThread.getZkState().isAlive()) {
// adding back the packet to notify of failure in conLossPacket().
addBack(head);
return;
}
// channel disconnection happened
if (disconnected.get()) {
//异常流程,channel 关闭了,讲当前的 packet 添加到 addBack 中
addBack(head);
throw new EndOfStreamException("channel for sessionid 0x" + Long.toHexString(sessionId) + " is lost");
}
//关键
if (head != null) {
//如果当前存在需要发送的数据包,则调用 doWrite 方法,pendingQueue 表示处于已经发送过等待响应的 packet 队列
doWrite(pendingQueue, head, cnxn);
}
} finally {
updateNow();
}
}
进入doWrite方法
private void doWrite(Queue<Packet> pendingQueue, Packet p, ClientCnxn cnxn) {
updateNow();
boolean anyPacketsSent = false;
while (true) {
if (p != WakeupPacket.getInstance()) {
判断请求头以及判断当前请求类型不是 ping 或者 auth 操作
if ((p.requestHeader != null)
&& (p.requestHeader.getType() != ZooDefs.OpCode.ping)
&& (p.requestHeader.getType() != ZooDefs.OpCode.auth)) {
设置 xid,这个 xid 用来区分请求类型
p.requestHeader.setXid(cnxn.getXid());
synchronized (pendingQueue) {
//将当前的 packet 添加到 pendingQueue 队列中
pendingQueue.add(p);
}
}
//将数据包发送出去
sendPktOnly(p);
anyPacketsSent = true;
}
if (outgoingQueue.isEmpty()) {
break;
}
p = outgoingQueue.remove();
}
// TODO: maybe we should flush in the loop above every N packets/bytes?
// But, how do we determine the right value for N ...
if (anyPacketsSent) {
channel.flush();
}
}
进入sendPktOnly sendPkt方法
private ChannelFuture sendPkt(Packet p, boolean doFlush) {
// Assuming the packet will be sent out successfully. Because if it fails,
// the channel will close and clean up queues.
// //序列化请求数据
p.createBB();
更 新 最 后 一 次 发 送updateLastSend
updateLastSend();
final ByteBuf writeBuffer = Unpooled.wrappedBuffer(p.bb);
// 通过 nio channel 发送字节缓存到服务端
final ChannelFuture result = doFlush ? channel.writeAndFlush(writeBuffer) : channel.write(writeBuffer);
result.addListener(onSendPktDoneListener);
return result;
}
p.createBB()方法
public void createBB() {
try {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
boa.writeInt(-1, "len"); // We'll fill this in later
if (requestHeader != null) {
//序列化 header 头(requestHeader)
requestHeader.serialize(boa, "header");
}
if (request instanceof ConnectRequest) {
request.serialize(boa, "connect");
// append "am-I-allowed-to-be-readonly" flag
boa.writeBool(readOnly, "readOnly");
} else if (request != null) {
//序列化 request(request)
request.serialize(boa, "request");
}
baos.close();
this.bb = ByteBuffer.wrap(baos.toByteArray());
this.bb.putInt(this.bb.capacity() - 4);
this.bb.rewind();
} catch (IOException e) {
LOG.warn("Unexpected exception", e);
}
}
从 createBB 方法中,我们看到在底层实际的网络传输序列化中,zookeeper 只会讲 requestHeader 和 request 两个
属性进行序列化,即只有这两个会被序列化到底层字节数组中去进行网络传输,不会将 watchRegistration 相关的信
息进行网络传输。
总结
用户调用 exists 注册监听以后,会做几个事情
-
讲请求数据封装为 packet,添加到 outgoingQueue
-
SendThread 这个线程会执行数据发送操作,主要是将outgoingQueue 队列中的数据发送到服务端
-
通过 clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this); 其中 ClientCnxnSocket 只 zookeeper客户端和服务端的连接通信的封装,有两个具体的实现类 ClientCnxnSocketNetty 和 ClientCnxnSocketNIO;具体使用哪一个类来实现发送,是在初始化过程是在实例化 Zookeeper 的时候设置的,代码如下
cnxn = new ClientCnxn(
connectStringParser.getChrootPath(),
hostProvider,
sessionTimeout,
this.clientConfig,
watcher,
getClientCnxnSocket(),
sessionId,
sessionPasswd,
canBeReadOnly);
cnxn.seenRwServerBefore = true; // since user has provided sessionId
cnxn.start();
从配置文件 zoo.conf 配置 nio还是netty
private ClientCnxnSocket getClientCnxnSocket() throws IOException {
String clientCnxnSocketName = getClientConfig().getProperty(ZKClientConfig.ZOOKEEPER_CLIENT_CNXN_SOCKET);
if (clientCnxnSocketName == null) {
clientCnxnSocketName = ClientCnxnSocketNIO.class.getName();
}
try {
Constructor<?> clientCxnConstructor = Class.forName(clientCnxnSocketName)
.getDeclaredConstructor(ZKClientConfig.class);
ClientCnxnSocket clientCxnSocket = (ClientCnxnSocket) clientCxnConstructor.newInstance(getClientConfig());
return clientCxnSocket;
} catch (Exception e) {
throw new IOException("Couldn't instantiate " + clientCnxnSocketName, e);
}
}
- . 基于第 3 步,最终会在 ClientCnxnSocketNetty 方法中
执行 sendPkt 将请求的数据包发送到服务端
具体:
发送数据包:
![image-20200531004504090](https://xiepanpan123.oss-cn-beijing.aliyuncs.com/image-20200531004504090.png)