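Anyone who has used ZooKeeper knows that the client keeps a long-lived connection to the server, and that this connection carries both the heartbeats and all commands (remote communication). So what does the client's thread model look like?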
1. Thread Model
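The thread model the ZooKeeper client uses to communicate with the server consists mainly of three queues and two threads.
The three queues are:
- Outgoing queue (OutgoingQueue)
Holds the messages waiting to be sent; implemented with java.util.concurrent.LinkedBlockingDeque.
- Pending queue (PendingQueue)
Holds messages that have been sent and are awaiting a response; implemented with java.util.LinkedList.
- Event queue (EventQueue)
Holds the various event messages; implemented with java.util.concurrent.LinkedBlockingQueue.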
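The two threads are:
- Send thread
Maintains message traffic between the client and the server, along with the timeout and reconnect mechanism. One ZooKeeper state causes this thread to exit: when connecting, the client sends a connect request to the server, and the server uses the information in the request to decide whether the session has expired. If the server judges the session to be expired, the thread exits.
- Event thread
Handles the various events; for the event types it handles, see the method org.apache.zookeeper.ClientCnxn.EventThread.processEvent(Object event).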
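Main flow
- Synchronous request
① Once the ZooKeeper client has been instantiated, it starts the send thread and the event thread.
② When the client submits any command, it first wraps it in a data packet (Packet) and appends it to the tail of the outgoing queue, then loops checking whether the message has been processed and blocks until it has.
③ The send thread takes the message at the head of the outgoing queue, sends it through the network module, and then appends it to the tail of the pending queue.
④ When the server returns a result, the send thread removes the element at the head of the pending queue, processes the result, marks the message as finished, and notifies the waiting thread.
⑤ The client obtains the result and carries on with its business logic.
- Asynchronous request
① Once the ZooKeeper client has been instantiated, it starts the send thread and the event thread.
② When the client submits any command, it first wraps it in a data packet (Packet) together with a callback, appends the message to the tail of the outgoing queue, and carries on with other work.
③ The send thread takes a message from the outgoing queue, sends it through the network module, and then appends it to the tail of the pending queue.
④ When the server returns a result, the send thread removes the element at the head of the pending queue, processes the result, marks the message as finished, and appends it to the tail of the event queue.
⑤ The event thread takes the message at the head of the event queue and invokes the callback to handle the response.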
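The asynchronous flow above can be modelled in a few lines of plain Java. This is only a sketch under stated assumptions: ThreadModelSketch, Packet, submitAsync and the "ok:" reply are all made up for illustration, and the network round trip is replaced by a string concatenation. It is not how ClientCnxn is actually written; it only shows a packet moving from the outgoing queue to the pending queue and on to the event queue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustrative model only; none of these names are ZooKeeper's real classes.
class ThreadModelSketch {
    static class Packet {
        final String request;
        final Consumer<String> callback;
        String response;
        Packet(String request, Consumer<String> callback) {
            this.request = request;
            this.callback = callback;
        }
    }

    private final LinkedBlockingDeque<Packet> outgoingQueue = new LinkedBlockingDeque<>();
    // In this sketch only the send thread touches the pending queue, so no locking is needed.
    private final LinkedList<Packet> pendingQueue = new LinkedList<>();
    private final LinkedBlockingQueue<Packet> eventQueue = new LinkedBlockingQueue<>();

    void start() {
        Thread send = new Thread(this::sendLoop, "send-thread");
        Thread event = new Thread(this::eventLoop, "event-thread");
        send.setDaemon(true);
        event.setDaemon(true);
        send.start();
        event.start();
    }

    // Step 2: wrap the request in a packet, append it to the outgoing queue, return at once.
    void submitAsync(String request, Consumer<String> callback) {
        outgoingQueue.add(new Packet(request, callback));
    }

    private void sendLoop() {
        try {
            while (true) {
                Packet p = outgoingQueue.take();          // step 3: head of the outgoing queue
                pendingQueue.add(p);                      // step 3: tail of the pending queue
                String reply = "ok:" + p.request;         // hypothetical stand-in for network I/O
                Packet answered = pendingQueue.remove();  // step 4: head of the pending queue
                answered.response = reply;
                eventQueue.add(answered);                 // step 4: tail of the event queue
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void eventLoop() {
        try {
            while (true) {
                Packet p = eventQueue.take();             // step 5: head of the event queue
                p.callback.accept(p.response);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Submits two requests and collects the replies delivered by the event thread.
    static List<String> demo() {
        ThreadModelSketch client = new ThreadModelSketch();
        client.start();
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(2);
        for (String req : new String[] {"create /a", "create /b"}) {
            client.submitAsync(req, reply -> { results.add(reply); done.countDown(); });
        }
        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for callbacks");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(results);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because there is a single send thread and a single event thread, callbacks run in submission order. A synchronous request follows the same path through the first two queues, except that the caller blocks instead of supplying a callback.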
2. Source Code Analysis
Class: org.apache.zookeeper.ZooKeeper
public ZooKeeper(String connectString, int sessionTimeout, Watcher watcher) // Method 1
throws IOException
{
this(connectString, sessionTimeout, watcher, false);
}
public ZooKeeper(String connectString, int sessionTimeout, Watcher watcher, boolean canBeReadOnly) throws IOException // Method 2
{
LOG.info("Initiating client connection, connectString=" + connectString
+ " sessionTimeout=" + sessionTimeout + " watcher=" + watcher);
watchManager.defaultWatcher = watcher;
ConnectStringParser connectStringParser = new ConnectStringParser(
connectString);
HostProvider hostProvider = new StaticHostProvider(
connectStringParser.getServerAddresses());
cnxn = new ClientCnxn(connectStringParser.getChrootPath(),
hostProvider, sessionTimeout, this, watchManager,
getClientCnxnSocket(), canBeReadOnly);
cnxn.start();
}
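Method 1 is an overloaded constructor: it calls Method 2 and supplies default values for the remaining parameters.
In Method 2, cnxn.start() is called immediately after cnxn is instantiated, which starts the send thread and the event thread.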
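- 2.1 Threads
Every time we create a ZooKeeper instance, two threads are created: SendThread and EventThread. So when using the ZK client, we should try to create a single ZooKeeper instance and reuse it. Creating and destroying ZooKeeper instances in bulk not only creates and destroys threads over and over, it also creates a large number of sessions on the server.
Class: org.apache.zookeeper.ClientCnxn.SendThread: dedicated to I/O.
Its run method can be viewed abstractly as the following flow: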
loop:
- try:
- - !isConnected()
- - - connect()
- - doTransport()
- catch:
- - cleanup()
close()
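First check whether a connection exists: if not, call connect to establish one; if so, use it directly. Then call doTransport to communicate. If an exception occurs along the way, cleanup() is called. Finally, the connection is closed.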
public void run() {
while (state.isAlive()) { // this != CLOSED && this != AUTH_FAILED; the state was set to connecting earlier
try {
//if not connected yet, start the connection procedure
if (!clientCnxnSocket.isConnected()) { //clientCnxnSocket is a ClientCnxnSocketNIO instance by default
//if this is not the first connection attempt, sleep for a random interval of up to 1s
if(!isFirstConnect){
try {
Thread.sleep(r.nextInt(1000));
} catch (InterruptedException e) {
LOG.warn("Unexpected exception", e);
}
}
// don't re-establish connection if we are closing
if (closing || !state.isAlive()) {
break;
}
startConnect();// start connecting
clientCnxnSocket.updateLastSendAndHeard(); //update the times the socket last sent and last heard a message
}
if (state.isConnected()) {
// determine whether we need to send an AuthFailed event.
if (zooKeeperSaslClient != null) {
......
}
// time remaining until the read timeout
to = readTimeout - clientCnxnSocket.getIdleRecv();
} else {
// not connected yet: recompute the time remaining for connecting
to = connectTimeout - clientCnxnSocket.getIdleRecv();
}
// timed out
if (to <= 0) {
}
// check whether a Ping heartbeat needs to be sent
if (state.isConnected()) {
sendPing();
}
// If we are in read-only mode, seek for read/write server
if (state == States.CONNECTEDREADONLY) {
}
// The most important step. Do real IO
clientCnxnSocket.doTransport(to, pendingQueue, outgoingQueue, ClientCnxn.this);
} catch (Throwable e) {
}
}
cleanup();
...
}
}
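As the while loop shows, the thread does not exit as long as state.isAlive() returns true. state.isAlive() returns true whenever the connection state is neither closed nor authentication-failed; see the org.apache.zookeeper.ZooKeeper.States class for the method.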
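Three timeouts in this method deserve attention:
- Session timeout (SessionTimeout): on the first connection, the session timeout is the value passed in as a configuration parameter.
- Connect timeout: on the first connection, it is the session timeout divided by the number of servers parsed from the client's connect string (sessionTimeout / hostProvider.size()). Once the connection is established and the server has returned the negotiated timeout negotiatedSessionTimeout, it becomes connectTimeout = negotiatedSessionTimeout / hostProvider.size().
- Read timeout: on the first connection, it is 2/3 of the session timeout, computed as sessionTimeout * 2 / 3. Once the connection is established and the server has returned negotiatedSessionTimeout, it is computed as negotiatedSessionTimeout * 2 / 3.
- Session timeout check: while connecting, the timeout fires when connect timeout - idle time (current time - time the last message was received) <= 0. Once the connection is established and in use, it fires when read timeout - idle time <= 0, which raises a session timeout exception.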
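To make the arithmetic concrete, here is a small sketch of the formulas listed above. TimeoutSketch and its method names are invented for this illustration and are not part of ZooKeeper; the 30000 ms session timeout and the three-server ensemble are assumed values.

```java
// Illustrative arithmetic only; the class and method names are not ZooKeeper's API.
class TimeoutSketch {
    // Connect timeout: session timeout divided by the number of servers in the connect string.
    static int connectTimeout(int sessionTimeoutMs, int serverCount) {
        return sessionTimeoutMs / serverCount;
    }

    // Read timeout: two thirds of the session timeout.
    static int readTimeout(int sessionTimeoutMs) {
        return sessionTimeoutMs * 2 / 3;
    }

    // Time left before the timeout fires; a value <= 0 means the timeout has been reached.
    static int remaining(boolean connected, int sessionTimeoutMs, int serverCount, int idleRecvMs) {
        int limit = connected
                ? readTimeout(sessionTimeoutMs)
                : connectTimeout(sessionTimeoutMs, serverCount);
        return limit - idleRecvMs;
    }

    public static void main(String[] args) {
        int sessionTimeout = 30000; // assumed value, in milliseconds
        System.out.println(connectTimeout(sessionTimeout, 3));         // 10000
        System.out.println(readTimeout(sessionTimeout));               // 20000
        System.out.println(remaining(true, sessionTimeout, 3, 21000)); // -1000, i.e. timed out
    }
}
```

The same formulas apply after negotiation, with negotiatedSessionTimeout in place of sessionTimeout.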
Class: org.apache.zookeeper.ClientCnxnSocketNIO
void registerAndConnect(SocketChannel sock, InetSocketAddress addr) throws IOException {
sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
boolean immediateConnect = sock.connect(addr); //true means the connection is already established; on false, watch the state of sockKey to handle the connect-completed event.
if (immediateConnect) {
sendThread.primeConnection();
}
}
void doTransport(int waitTimeOut, List<Packet> pendingQueue, ClientCnxn cnxn)
throws IOException, InterruptedException {
selector.select(waitTimeOut);
Set<SelectionKey> selected;
synchronized (this) {
selected = selector.selectedKeys();
}
// Everything below and until we get back to the select is
// non blocking, so time is effectively a constant. That is
// Why we just have to do this once, here
updateNow();
for (SelectionKey k : selected) {
SocketChannel sc = ((SocketChannel) k.channel());
if ((k.readyOps() & SelectionKey.OP_CONNECT) != 0) {
if (sc.finishConnect()) { //once the connection is established, send the connect packet
updateLastSendAndHeard();
updateSocketAddresses();
sendThread.primeConnection();
}
} else if ((k.readyOps() & (SelectionKey.OP_READ | SelectionKey.OP_WRITE)) != 0) {
doIO(pendingQueue, cnxn);
}
}
if (sendThread.getZkState().isConnected()) {
if (findSendablePacket(outgoingQueue,
sendThread.tunnelAuthInProgress()) != null) {
enableWrite();
}
}
selected.clear();
}
void doIO(List<Packet> pendingQueue, ClientCnxn cnxn) throws InterruptedException, IOException {
SocketChannel sock = (SocketChannel) sockKey.channel();
if (sock == null) {
throw new IOException("Socket is null!");
}
if (sockKey.isReadable()) {
int rc = sock.read(incomingBuffer);
if (rc < 0) {
throw new EndOfStreamException(
"Unable to read additional data from server sessionid 0x"
+ Long.toHexString(sessionId)
+ ", likely server has closed socket");
}
if (!incomingBuffer.hasRemaining()) {
incomingBuffer.flip();
if (incomingBuffer == lenBuffer) {
recvCount++;
readLength();
} else if (!initialized) {
readConnectResult();
enableRead();
if (findSendablePacket(outgoingQueue,
sendThread.tunnelAuthInProgress()) != null) {
// Since SASL authentication has completed (if client is configured to do so),
// outgoing packets waiting in the outgoingQueue can now be sent.
enableWrite();
}
lenBuffer.clear();
incomingBuffer = lenBuffer;
updateLastHeard();
initialized = true;
} else {
sendThread.readResponse(incomingBuffer);
lenBuffer.clear();
incomingBuffer = lenBuffer;
updateLastHeard();
}
}
}
......
}
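This I/O code is rather interesting. When reading from the stream, it first checks incomingBuffer == lenBuffer, where lenBuffer is a protected final member of ClientCnxnSocket. When the condition holds, it reads the length of the incoming message and allocates a new array of that length for incomingBuffer. The next read then fills in a message of exactly that length, after which sendThread.readResponse(incomingBuffer) parses the data, lenBuffer is cleared, and lenBuffer is assigned back to incomingBuffer. When data has been read for the first time and initialized is still false, the packet just read is the one the server sends back after the connection is established; its contents determine the negotiated session timeout, the session ID, and other settings for the client connection.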
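The buffer swap described here is a common way to read length-prefixed frames. The sketch below is a simplified, standalone illustration of that idea, not ZooKeeper's code: FrameReaderSketch, feed and frame are hypothetical names, the payload is assumed to be a UTF-8 string, and the socket is replaced by ByteBuffers handed in by the caller.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a 4-byte length prefix followed by a body, read with two alternating buffers.
class FrameReaderSketch {
    private final ByteBuffer lenBuffer = ByteBuffer.allocate(4);
    private ByteBuffer incomingBuffer = lenBuffer;
    final List<String> messages = new ArrayList<>();

    // Accepts bytes as they arrive; a chunk may hold a partial frame or several frames.
    void feed(ByteBuffer src) {
        while (true) {
            while (src.hasRemaining() && incomingBuffer.hasRemaining()) {
                incomingBuffer.put(src.get());
            }
            if (incomingBuffer.hasRemaining()) {
                return; // current buffer not full yet: wait for more bytes
            }
            incomingBuffer.flip();
            if (incomingBuffer == lenBuffer) {
                // The length prefix is complete: allocate a body buffer of exactly that size.
                incomingBuffer = ByteBuffer.allocate(incomingBuffer.getInt());
            } else {
                // The body is complete: decode it, then switch back to the length buffer.
                byte[] body = new byte[incomingBuffer.remaining()];
                incomingBuffer.get(body);
                messages.add(new String(body, StandardCharsets.UTF_8));
                lenBuffer.clear();
                incomingBuffer = lenBuffer;
            }
        }
    }

    // Helper that builds one frame: length prefix plus UTF-8 body.
    static ByteBuffer frame(String s) {
        byte[] body = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer b = ByteBuffer.allocate(4 + body.length);
        b.putInt(body.length).put(body);
        b.flip();
        return b;
    }

    // Feeds two frames split at an arbitrary point and returns the decoded messages.
    static List<String> demo() {
        ByteBuffer all = ByteBuffer.allocate(64);
        all.put(frame("hello")).put(frame("zk"));
        all.flip();
        byte[] bytes = new byte[all.remaining()];
        all.get(bytes);
        FrameReaderSketch reader = new FrameReaderSketch();
        reader.feed(ByteBuffer.wrap(bytes, 0, 6));                // length prefix plus "he"
        reader.feed(ByteBuffer.wrap(bytes, 6, bytes.length - 6)); // the rest
        return reader.messages;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [hello, zk]
    }
}
```

Comparing buffer references (incomingBuffer == lenBuffer) is what tells the reader which of the two states it is in, so no separate state flag is needed.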
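registerAndConnect is called from startConnect(), which the send thread invokes in its run method. Once the client has connected to the server, sendThread.primeConnection is called first to send the connect request packet.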
- 2.2 Creating a node
Class: org.apache.zookeeper.ZooKeeper
public String create(final String path, byte data[], List<ACL> acl, CreateMode createMode) throws KeeperException, InterruptedException
{
final String clientPath = path;
PathUtils.validatePath(clientPath, createMode.isSequential());
EphemeralType.validateTTL(createMode, -1);
final String serverPath = prependChroot(clientPath);
RequestHeader h = new RequestHeader();
h.setType(createMode.isContainer() ? ZooDefs.OpCode.createContainer : ZooDefs.OpCode.create);
CreateRequest request = new CreateRequest();
CreateResponse response = new CreateResponse();
request.setData(data);
request.setFlags(createMode.toFlag());
request.setPath(serverPath);
if (acl != null && acl.size() == 0) {
throw new KeeperException.InvalidACLException();
}
request.setAcl(acl);
ReplyHeader r = cnxn.submitRequest(h, request, response, null);
if (r.getErr() != 0) {
throw KeeperException.create(KeeperException.Code.get(r.getErr()),
clientPath);
}
if (cnxn.chrootPath == null) {
return response.getPath();
} else {
return response.getPath().substring(cnxn.chrootPath.length());
}
}
public void create(final String path, byte data[], List<ACL> acl, CreateMode createMode, StringCallback cb, Object ctx)
{
final String clientPath = path;
PathUtils.validatePath(clientPath, createMode.isSequential());
EphemeralType.validateTTL(createMode, -1);
final String serverPath = prependChroot(clientPath);
RequestHeader h = new RequestHeader();
h.setType(createMode.isContainer() ? ZooDefs.OpCode.createContainer : ZooDefs.OpCode.create);
CreateRequest request = new CreateRequest();
CreateResponse response = new CreateResponse();
ReplyHeader r = new ReplyHeader();
request.setData(data);
request.setFlags(createMode.toFlag());
request.setPath(serverPath);
request.setAcl(acl);
cnxn.queuePacket(h, r, request, response, cb, clientPath,
serverPath, ctx, null);
}
Class: org.apache.zookeeper.ClientCnxn
public ReplyHeader submitRequest(RequestHeader h, Record request, Record response, WatchRegistration watchRegistration)
throws InterruptedException {
return submitRequest(h, request, response, watchRegistration, null);
}
public ReplyHeader submitRequest(RequestHeader h, Record request, Record response, WatchRegistration watchRegistration,
WatchDeregistration watchDeregistration)
throws InterruptedException {
ReplyHeader r = new ReplyHeader();
Packet packet = queuePacket(h, r, request, response, null, null, null,
null, watchRegistration, watchDeregistration);
synchronized (packet) {
while (!packet.finished) {
packet.wait();
}
}
return r;
}
public Packet queuePacket(RequestHeader h, ReplyHeader r, Record request, Record response, AsyncCallback cb, String clientPath,
String serverPath, Object ctx, WatchRegistration watchRegistration) {
return queuePacket(h, r, request, response, cb, clientPath, serverPath,
ctx, watchRegistration, null);
}
public Packet queuePacket(RequestHeader h, ReplyHeader r, Record request, Record response, AsyncCallback cb, String clientPath,
String serverPath, Object ctx, WatchRegistration watchRegistration,
WatchDeregistration watchDeregistration) {
Packet packet = null;
// Note that we do not generate the Xid for the packet yet. It is
// generated later at send-time, by an implementation of ClientCnxnSocket::doIO(),
// where the packet is actually sent.
packet = new Packet(h, r, request, response, watchRegistration);
packet.cb = cb;
packet.ctx = ctx;
packet.clientPath = clientPath;
packet.serverPath = serverPath;
packet.watchDeregistration = watchDeregistration;
// The synchronized block here is for two purpose:
// 1. synchronize with the final cleanup() in SendThread.run() to avoid race
// 2. synchronized against each packet. So if a closeSession packet is added,
// later packet will be notified.
synchronized (state) {
if (!state.isAlive() || closing) {
conLossPacket(packet);
} else {
// If the client is asking to close the session then
// mark as closing
if (h.getType() == OpCode.closeSession) {
closing = true;
}
outgoingQueue.add(packet);
}
}
sendThread.getClientCnxnSocket().packetAdded();
return packet;
}
private void finishPacket(Packet p) {
......
if (p.cb == null) {
synchronized (p) {
p.finished = true;
p.notifyAll();
}
} else {
p.finished = true;
eventThread.queuePacket(p);
}
}
Class: org.apache.zookeeper.ClientCnxn.SendThread
void readResponse(ByteBuffer incomingBuffer) throws IOException {
ByteBufferInputStream bbis = new ByteBufferInputStream(
incomingBuffer);
BinaryInputArchive bbia = BinaryInputArchive.getArchive(bbis);
ReplyHeader replyHdr = new ReplyHeader();
replyHdr.deserialize(bbia, "header");
......
Packet packet;
synchronized (pendingQueue) {
if (pendingQueue.size() == 0) {
throw new IOException("Nothing in the queue, but got "
+ replyHdr.getXid());
}
packet = pendingQueue.remove();
}
/*
* Since requests are processed in order, we better get a response
* to the first request!
*/
try {
if (packet.requestHeader.getXid() != replyHdr.getXid()) {
packet.replyHeader.setErr(
KeeperException.Code.CONNECTIONLOSS.intValue());
throw new IOException("Xid out of order. Got Xid "
+ replyHdr.getXid() + " with err " +
+ replyHdr.getErr() +
" expected Xid "
+ packet.requestHeader.getXid()
+ " for a packet with details: "
+ packet );
}
packet.replyHeader.setXid(replyHdr.getXid());
packet.replyHeader.setErr(replyHdr.getErr());
packet.replyHeader.setZxid(replyHdr.getZxid());
if (replyHdr.getZxid() > 0) {
lastZxid = replyHdr.getZxid();
}
if (packet.response != null && replyHdr.getErr() == 0) {
packet.response.deserialize(bbia, "response");
}
if (LOG.isDebugEnabled()) {
LOG.debug("Reading reply sessionid:0x"
+ Long.toHexString(sessionId) + ", packet:: " + packet);
}
} finally {
finishPacket(packet);
}
}
Class: org.apache.zookeeper.ClientCnxn.EventThread
private void processEvent(Object event) {
try {
......
Packet p = (Packet) event;
int rc = 0;
String clientPath = p.clientPath;
if (p.replyHeader.getErr() != 0) {
rc = p.replyHeader.getErr();
}
if (p.cb == null) {
LOG.warn("Somehow a null cb got to EventThread!");
} else if (p.response instanceof ExistsResponse
|| p.response instanceof SetDataResponse
|| p.response instanceof SetACLResponse) {
StatCallback cb = (StatCallback) p.cb;
if (rc == 0) {
if (p.response instanceof ExistsResponse) {
cb.processResult(rc, clientPath, p.ctx,
((ExistsResponse) p.response)
.getStat());
} else if (p.response instanceof SetDataResponse) {
cb.processResult(rc, clientPath, p.ctx,
((SetDataResponse) p.response)
.getStat());
} else if (p.response instanceof SetACLResponse) {
cb.processResult(rc, clientPath, p.ctx,
((SetACLResponse) p.response)
.getStat());
}
} else {
cb.processResult(rc, clientPath, p.ctx, null);
}
}
......
}
} catch (Throwable t) {
LOG.error("Caught unexpected throwable", t);
}
}
}
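As we can see, when the ZooKeeper client submits a request, the synchronous create calls cnxn.submitRequest(h, request, response, null), while the asynchronous create calls cnxn.queuePacket directly.
For synchronous requests, submitRequest is in fact a wrapper around queuePacket that makes it behave synchronously.
When an operation runs, queuePacket first wraps the request in a Packet and adds it to the outgoing queue. cnxn.submitRequest then synchronizes on the Packet object and loops on packet.finished to check whether the message has been processed; if not, it blocks in packet.wait() until it has. The I/O thread sends the message and, on receiving the server's response, calls readResponse, which deserializes the result and finally calls finishPacket. In finishPacket, if p.cb is null the request is synchronous, so notifyAll is called on the packet to wake the thread waiting for the response; if p.cb is not null, the packet is added to the eventQueue and the event thread calls processEvent to handle it.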
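The wait/notifyAll handshake between submitRequest and finishPacket can be reduced to the sketch below. It is a minimal illustration, not the real implementation: SyncOverAsyncSketch and its Packet are made-up stand-ins, the reply is a plain string, and a short-lived thread plays the role of the I/O thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only; simplified stand-ins for Packet, finishPacket and submitRequest.
class SyncOverAsyncSketch {
    static class Packet {
        final Runnable cb;  // null marks a synchronous request
        boolean finished;   // for synchronous packets, read and written under the packet's monitor
        String reply;
        Packet(Runnable cb) { this.cb = cb; }
    }

    static final BlockingQueue<Packet> eventQueue = new LinkedBlockingQueue<>();

    // Called by the I/O thread once the reply has been decoded.
    static void finishPacket(Packet p, String reply) {
        p.reply = reply;
        if (p.cb == null) {
            synchronized (p) {
                p.finished = true;
                p.notifyAll();      // wake the caller blocked in submitRequest
            }
        } else {
            p.finished = true;
            eventQueue.add(p);      // hand over to the event thread for the callback
        }
    }

    // Blocks until the hypothetical I/O thread has finished the packet.
    static String submitRequest(String request) {
        Packet p = new Packet(null);
        new Thread(() -> finishPacket(p, "ok:" + request), "io-thread").start();
        synchronized (p) {
            while (!p.finished) {   // loop guards against spurious wakeups
                try {
                    p.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
        }
        return p.reply;
    }

    public static void main(String[] args) {
        System.out.println(submitRequest("getData /a")); // ok:getData /a
    }
}
```

Checking the finished flag inside the synchronized block, before calling wait(), is what makes the handshake safe even if the I/O thread finishes the packet before the caller starts waiting.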