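In the earlier source-code walkthroughs we covered the startup of the standalone server and of the client, both of which are fairly simple. This post looks at how the ZooKeeper (zk) client and server connect to each other. We again start from the test class used in the previous post, this time focusing on how the client and server keep a stable connection and how they send and receive requests.
The previous post covered the client's outgoing queue: the client starts a thread that builds request packets and sends them to the server through SendThread. On the server side, handling starts when the accept thread hands a connection over to the queue of one of the selector threads. As mentioned before, this NIO network model is a multi-master, multi-worker reactor pattern, which copes well with a large number of concurrent client requests. We start from the point where the server begins handling a request and follow the request-handling path from there.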
public void run() {
    try {
        while (!stopped) { // loop until the thread is stopped
            try {
                select(); // first check the selector for ready read/write I/O events and handle them
                processAcceptedConnections(); // drain the accept queue: register each new client connection's SelectionKey for reads
                processInterestOpsUpdateRequests(); // drain the update queue: reset each SelectionKey's read/write interest ops
            } catch (RuntimeException e) {
                LOG.warn("Ignoring unexpected runtime exception", e);
            } catch (Exception e) {
                LOG.warn("Ignoring unexpected exception", e);
            }
        }
        // Close connections still pending on the selector. Any others
        // with in-flight work, let drain out of the work queue.
        for (SelectionKey key : selector.keys()) { // walk all keys and close their connections
            NIOServerCnxn cnxn = (NIOServerCnxn) key.attachment();
            if (cnxn.isSelectable()) {
                cnxn.close(ServerCnxn.DisconnectReason.SERVER_SHUTDOWN);
            }
            cleanupSelectionKey(key);
        }
        SocketChannel accepted;
        while ((accepted = acceptedQueue.poll()) != null) { // close SocketChannels that were accepted but not yet registered
            fastCloseSock(accepted);
        }
        updateQueue.clear(); // clear the update queue
    } finally {
        closeSelector();
        // This will wake up the accept thread and the other selector
        // threads, and tell the worker thread pool to begin shutdown.
        NIOServerCnxnFactory.this.stop();
        LOG.info("selector thread exitted run method");
    }
}
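The code above is the run loop of one of the selector threads; its main job is to watch for events on its selector. Each iteration first handles any ready I/O events and then handles newly accepted connections. The connection-handling method is shown below.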
private void processAcceptedConnections() {
    SocketChannel accepted;
    while (!stopped && (accepted = acceptedQueue.poll()) != null) {
        SelectionKey key = null;
        try {
            key = accepted.register(selector, SelectionKey.OP_READ); // register the accepted channel for read events and keep its key
            NIOServerCnxn cnxn = createConnection(accepted, key, this); // create a NIOServerCnxn to use as the attachment; it also identifies the client connection, much like a session
            LOG.info("*^^* client connected: {}", cnxn);
            key.attach(cnxn);
            addCnxn(cnxn); // refresh the connection's expiry time and record it in the per-IP connection set
        } catch (IOException e) {
            // register, createConnection
            cleanupSelectionKey(key);
            fastCloseSock(accepted);
        }
    }
}
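The connection-handling method registers the SocketChannel for read events on the selector, so that later packets from the client can be received. It then creates an important object, NIOServerCnxn, which stores the state of the client connection: the zkServer, the SocketChannel, the SelectionKey, and the selector thread instance. This object is attached to the SelectionKey so it can be retrieved later.
In addition, addCnxn records the connection in the per-IP set that backs the per-client connection limit (60 by default) and refreshes the connection's expiry time. That expiry time drives the very common connection-timeout check: if a connection times out, the expirer thread shown below releases it.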
private void addCnxn(NIOServerCnxn cnxn) throws IOException {
    InetAddress addr = cnxn.getSocketAddress();
    if (addr == null) {
        throw new IOException("Socket of " + cnxn + " has been closed");
    }
    Set<NIOServerCnxn> set = ipMap.get(addr);
    if (set == null) {
        // in general we will see 1 connection from each
        // host, setting the initial cap to 2 allows us
        // to minimize mem usage in the common case
        // of 1 entry -- we need to set the initial cap
        // to 2 to avoid rehash when the first entry is added
        // Construct a ConcurrentHashSet using a ConcurrentHashMap
        set = Collections.newSetFromMap(new ConcurrentHashMap<NIOServerCnxn, Boolean>(2));
        // Put the new set in the map, but only if another thread
        // hasn't beaten us to it
        Set<NIOServerCnxn> existingSet = ipMap.putIfAbsent(addr, set);
        if (existingSet != null) {
            set = existingSet;
        }
    }
    set.add(cnxn); // record the connection under its IP; this set is what the per-IP connection limit counts
    cnxns.add(cnxn);
    touchCnxn(cnxn); // refresh the connection's entry in the expiry queue
}
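The get / putIfAbsent pattern in addCnxn can be tried in isolation. The following is a minimal sketch, not ZooKeeper code: `PerIpTracker`, `count` and `atLimit` are hypothetical names, addresses and connections are plain strings, and the shape of the limit check is an assumption made for illustration.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the factory's ipMap; addresses and connections are plain strings.
class PerIpTracker {
    private final ConcurrentHashMap<String, Set<String>> ipMap = new ConcurrentHashMap<>();

    // Same get / putIfAbsent pattern as addCnxn: create the set lazily and,
    // if another thread registered one first, use that thread's set instead.
    void add(String addr, String cnxnId) {
        Set<String> set = ipMap.get(addr);
        if (set == null) {
            set = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>(2));
            Set<String> existing = ipMap.putIfAbsent(addr, set);
            if (existing != null) {
                set = existing;
            }
        }
        set.add(cnxnId);
    }

    int count(String addr) {
        Set<String> set = ipMap.get(addr);
        return set == null ? 0 : set.size();
    }

    // Assumed form of the limit check; a limit of 0 or less means "unlimited" in this sketch.
    boolean atLimit(String addr, int maxClientCnxns) {
        return maxClientCnxns > 0 && count(addr) >= maxClientCnxns;
    }

    public static void main(String[] args) {
        PerIpTracker tracker = new PerIpTracker();
        tracker.add("10.0.0.1", "cnxn-1");
        tracker.add("10.0.0.1", "cnxn-2");
        tracker.add("10.0.0.2", "cnxn-3");
        System.out.println(tracker.count("10.0.0.1"));      // 2
        System.out.println(tracker.atLimit("10.0.0.1", 2)); // true
        System.out.println(tracker.atLimit("10.0.0.2", 2)); // false
    }
}
```

Because the set is only published through putIfAbsent, two threads adding the first connection for the same address end up sharing one set rather than overwriting each other's.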
private class ConnectionExpirerThread extends ZooKeeperThread {
    ConnectionExpirerThread() {
        super("ConnnectionExpirer");
    }
    public void run() {
        try {
            while (!stopped) {
                long waitTime = cnxnExpiryQueue.getWaitTime(); // time left until the next round of expiry checks
                if (waitTime > 0) {
                    Thread.sleep(waitTime);
                    continue;
                }
                // each poll returns one set of connections and advances the next check time,
                // so connections whose time is up are removed at regular intervals
                for (NIOServerCnxn conn : cnxnExpiryQueue.poll()) {
                    // In short, an active connection keeps pushing its expiry deadline forward so that it is not closed.
                    // In code this is a map whose keys are times and whose values are sets of connections.
                    // After close, the connection's stale field is set to true and the zk DataTree removes the watches of this connection.
                    ServerMetrics.getMetrics().SESSIONLESS_CONNECTIONS_EXPIRED.add(1); // connection-expired metric
                    conn.close(ServerCnxn.DisconnectReason.CONNECTION_EXPIRED); // close the connection
                    LOG.info("*^^* connection expired: {}", conn.toString());
                }
            }
        } catch (InterruptedException e) {
            LOG.info("ConnnectionExpirerThread interrupted");
        }
    }
}
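The "map from time to set of connections" idea behind cnxnExpiryQueue can be sketched as follows. This is a simplified, single-threaded illustration, not the ZooKeeper class: `BucketedExpiryQueue`, `touch`, `waitTime` and the rounding rule are assumptions, and the current time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, non-thread-safe sketch of a time-bucketed expiry queue.
class BucketedExpiryQueue<E> {
    private final long interval;                                 // width of one time bucket
    private long nextExpiry;                                     // bucket the next poll will inspect
    private final Map<E, Long> elemToBucket = new HashMap<>();   // element -> its current bucket
    private final Map<Long, Set<E>> buckets = new HashMap<>();   // bucket time -> elements expiring then

    BucketedExpiryQueue(long interval, long now) {
        this.interval = interval;
        this.nextExpiry = roundUp(now);
    }

    // Round a time up to the next bucket boundary (assumed rounding rule).
    private long roundUp(long time) {
        return (time / interval + 1) * interval;
    }

    // Called whenever the element shows activity: move it to a later bucket.
    void touch(E elem, long timeout, long now) {
        long bucket = roundUp(now + timeout);
        Long old = elemToBucket.put(elem, bucket);
        if (old != null && old != bucket) {
            Set<E> previous = buckets.get(old);
            if (previous != null) {
                previous.remove(elem);
            }
        }
        buckets.computeIfAbsent(bucket, k -> new HashSet<>()).add(elem);
    }

    // How long the expirer thread may sleep before the next bucket is due.
    long waitTime(long now) {
        return Math.max(0, nextExpiry - now);
    }

    // Return the elements of the due bucket (if any) and advance to the next bucket.
    Set<E> poll(long now) {
        if (now < nextExpiry) {
            return Collections.emptySet();
        }
        Set<E> expired = buckets.remove(nextExpiry);
        nextExpiry += interval;
        if (expired == null) {
            return Collections.emptySet();
        }
        for (E e : expired) {
            elemToBucket.remove(e);
        }
        return expired;
    }

    public static void main(String[] args) {
        BucketedExpiryQueue<String> q = new BucketedExpiryQueue<>(10, 0);
        q.touch("idle", 5, 0);
        q.touch("active", 5, 0);
        q.touch("active", 5, 8);           // activity at t=8 pushes "active" into the next bucket
        System.out.println(q.waitTime(8)); // 2
        System.out.println(q.poll(10));    // [idle]
        System.out.println(q.poll(20));    // [active]
    }
}
```

Grouping deadlines into buckets means the expirer thread wakes up once per interval and closes a whole set at a time, instead of tracking an individual timer per connection.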
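Once the connection is registered and the SocketChannel has its read interest set, the loop goes back into select() to handle I/O events. The method iterates over the selector's selectedKeys, checks which SelectionKeys are readable or writable, and processes them.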
private void select() {
    try {
        selector.select();
        Set<SelectionKey> selected = selector.selectedKeys();
        ArrayList<SelectionKey> selectedList = new ArrayList<SelectionKey>(selected);
        Collections.shuffle(selectedList);
        Iterator<SelectionKey> selectedKeys = selectedList.iterator();
        while (!stopped && selectedKeys.hasNext()) {
            SelectionKey key = selectedKeys.next();
            selected.remove(key);
            if (!key.isValid()) {
                cleanupSelectionKey(key);
                continue;
            }
            if (key.isReadable() || key.isWritable()) {
                LOG.info("*^^* handling I/O: {}", key.attachment());
                handleIO(key);
            } else {
                LOG.warn("Unexpected ops in select {}", key.readyOps());
            }
        }
    } catch (IOException e) {
        LOG.warn("Ignoring IOException while selecting", e);
    }
}
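handleIO wraps the connection in an IOWorkRequest and hands it to the I/O worker pool, workerPool. The I/O worker threads were already initialized when the zk server started; see the earlier post.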
private void handleIO(SelectionKey key) {
    IOWorkRequest workRequest = new IOWorkRequest(this, key);
    NIOServerCnxn cnxn = (NIOServerCnxn) key.attachment(); // fetch the attachment that was set when the connection was accepted
    // Stop selecting this key while processing on its
    // connection
    cnxn.disableSelectable();
    key.interestOps(0); // while this key is being processed, stop listening for other events on it
    touchCnxn(cnxn); // refresh the connection's expiry time
    workerPool.schedule(workRequest);
}
The task that actually does the work is invoked from the workRequest object, and it ultimately calls the doIO method of NIOServerCnxn.
void doIO(SelectionKey k) throws InterruptedException {
    try {
        if (!isSocketOpen()) {
            LOG.warn("trying to do i/o on a null socket for session: 0x{}", Long.toHexString(sessionId));
            return;
        }
        if (k.isReadable()) {
            int rc = sock.read(incomingBuffer);
            if (rc < 0) {
                handleFailedRead();
            }
            if (incomingBuffer.remaining() == 0) {
                boolean isPayload;
                if (incomingBuffer == lenBuffer) { // start of next request
                    incomingBuffer.flip();
                    isPayload = readLength(k);
                    incomingBuffer.clear();
                } else {
                    // continuation
                    isPayload = true;
                }
                if (isPayload) { // not the case for 4letterword
                    readPayload();
                } else {
                    // four letter words take care
                    // need not do anything else
                    return;
                }
            }
        }
        if (k.isWritable()) {
            handleWrite(k);
            if (!initialized && !getReadInterest() && !getWriteInterest()) {
                throw new CloseRequestException("responded to info probe", DisconnectReason.INFO_PROBE);
            }
        }
    // .. exception handling omitted
}
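The switch between lenBuffer and a payload buffer in doIO is a length-prefixed framing scheme: read four bytes of length, then read exactly that many bytes of payload. The sketch below illustrates that scheme without a socket. `FrameReader`, `feed`, `encode` and `MAX_FRAME` are hypothetical names, payloads are treated as UTF-8 text purely for the demo, and the four-letter-word check that readLength performs is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of length-prefixed framing with a lenBuffer / incomingBuffer switch.
class FrameReader {
    private static final int MAX_FRAME = 1 << 20;           // assumed upper bound for this sketch
    private final ByteBuffer lenBuffer = ByteBuffer.allocate(4);
    private ByteBuffer incomingBuffer = lenBuffer;           // starts out expecting a length
    private final List<String> frames = new ArrayList<>();

    // Stand-in for sock.read(incomingBuffer): copy as many bytes as both buffers allow.
    private static void copy(ByteBuffer src, ByteBuffer dst) {
        int n = Math.min(src.remaining(), dst.remaining());
        for (int i = 0; i < n; i++) {
            dst.put(src.get());
        }
    }

    // Feed an arbitrary chunk of bytes; complete frames are collected in order.
    void feed(byte[] chunk) {
        ByteBuffer src = ByteBuffer.wrap(chunk);
        while (true) {
            copy(src, incomingBuffer);
            if (incomingBuffer.hasRemaining()) {
                return;                                      // buffer not full yet, wait for more bytes
            }
            if (incomingBuffer == lenBuffer) {               // a full length header has arrived
                lenBuffer.flip();
                int len = lenBuffer.getInt();
                lenBuffer.clear();
                if (len < 0 || len > MAX_FRAME) {
                    throw new IllegalArgumentException("bad frame length " + len);
                }
                incomingBuffer = ByteBuffer.allocate(len);   // switch to reading the payload
            } else {                                         // a full payload has arrived
                frames.add(new String(incomingBuffer.array(), StandardCharsets.UTF_8));
                incomingBuffer = lenBuffer;                  // next bytes start a new request
            }
        }
    }

    void feedInChunks(byte[] data, int chunkSize) {
        for (int off = 0; off < data.length; off += chunkSize) {
            feed(Arrays.copyOfRange(data, off, Math.min(data.length, off + chunkSize)));
        }
    }

    List<String> frames() {
        return frames;
    }

    // Encode one frame: 4-byte big-endian length followed by the payload bytes.
    static byte[] encode(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + body.length).putInt(body.length).put(body).array();
    }

    public static void main(String[] args) {
        byte[] a = encode("ping");
        byte[] b = encode("hello");
        byte[] all = ByteBuffer.allocate(a.length + b.length).put(a).put(b).array();
        FrameReader reader = new FrameReader();
        reader.feedInChunks(all, 3);                         // chunk boundaries do not line up with frames
        System.out.println(reader.frames());                 // [ping, hello]
    }
}
```

Keeping the partially filled buffer as state between calls is what lets a request span several read events, which is the "continuation" branch in doIO.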
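When the SelectionKey is readable, the request packet is read into incomingBuffer and then deserialized. The first request from a client is handled differently from later ones: on the first connection a session id is created, a new request with opcode createSession is built, and that request is finally submitted to the producer-consumer queue of requestThrottler.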
long createSession(ServerCnxn cnxn, byte[] passwd, int timeout) {
    if (passwd == null) {
        // Possible since it's just deserialized from a packet on the wire.
        passwd = new byte[0];
    }
    long sessionId = sessionTracker.createSession(timeout);
    Random r = new Random(sessionId ^ superSecret);
    r.nextBytes(passwd);
    ByteBuffer to = ByteBuffer.allocate(4);
    to.putInt(timeout);
    cnxn.setSessionId(sessionId);
    Request si = new Request(cnxn, sessionId, 0, OpCode.createSession, to, null);
    submitRequest(si);
    return sessionId;
}
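createSession fills the password from a java.util.Random seeded with sessionId ^ superSecret. Since java.util.Random is deterministic for a given seed, the same bytes can be regenerated later from the session id alone. The sketch below illustrates that property; `SessionPasswords`, `generate`, `check`, the 16-byte length and the secret value are assumptions made for the example, not the server's code.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: derive a session password from the session id and a server-side secret.
class SessionPasswords {
    private final long superSecret;   // placeholder value supplied by the caller

    SessionPasswords(long superSecret) {
        this.superSecret = superSecret;
    }

    // Same seed, same bytes: java.util.Random is deterministic for a given seed.
    byte[] generate(long sessionId) {
        Random r = new Random(sessionId ^ superSecret);
        byte[] passwd = new byte[16];  // assumed length for this sketch
        r.nextBytes(passwd);
        return passwd;
    }

    // Assumed validation: regenerate the password and compare it with what the client sent.
    boolean check(long sessionId, byte[] passwd) {
        return sessionId != 0 && Arrays.equals(passwd, generate(sessionId));
    }

    public static void main(String[] args) {
        SessionPasswords passwords = new SessionPasswords(0x1234L);
        long sessionId = 42L;
        byte[] passwd = passwords.generate(sessionId);
        System.out.println(passwords.check(sessionId, passwd)); // true
        passwd[0] ^= 1;                                         // tamper with one bit
        System.out.println(passwords.check(sessionId, passwd)); // false
    }
}
```

With this approach nothing has to be stored per session to validate a password, because the check only needs the session id and the secret.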
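The thread in requestThrottler keeps consuming the queue that requests are added to. When a request arrives, it is processed through a chain of request processors. That chain was set up when the zk server was initialized; as a reminder: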
protected void setupRequestProcessors() {
    RequestProcessor finalProcessor = new FinalRequestProcessor(this); // last processor in the chain
    RequestProcessor syncProcessor = new SyncRequestProcessor(this, finalProcessor); // middle processor, chained to the final one
    ((SyncRequestProcessor) syncProcessor).start(); // start the middle processor's thread
    firstProcessor = new PrepRequestProcessor(this, syncProcessor); // first processor, chained to the middle one
    ((PrepRequestProcessor) firstProcessor).start();
}
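A request passes through these three processors in order. PrepRequestProcessor and SyncRequestProcessor each run as a thread (both are started in the method above) and consume a blocking queue in producer-consumer style; FinalRequestProcessor is not started as a thread there and is invoked by the processor before it.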
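The chain described above can be sketched as two queue-backed threads followed by a synchronous final stage. This is an illustration only: `Stage`, `QueueStage`, `FinalStage`, `runAll` and the `SHUTDOWN` marker are hypothetical names, requests are plain strings, and each stage just appends its tag so that the order of processing is visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a three-stage processor chain built back to front.
class ProcessorChainSketch {
    static final String SHUTDOWN = "__shutdown__";   // marker that tells a stage to stop

    interface Stage {
        void submit(String request);
    }

    // A stage that runs as its own thread and consumes a blocking queue.
    static class QueueStage extends Thread implements Stage {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        private final String tag;
        private final Stage next;

        QueueStage(String tag, Stage next) {
            super(tag);
            this.tag = tag;
            this.next = next;
        }

        @Override
        public void submit(String request) {
            queue.add(request);                      // producer side: just enqueue
        }

        @Override
        public void run() {
            try {
                while (true) {
                    String request = queue.take();   // consumer side: block until work arrives
                    if (request.equals(SHUTDOWN)) {
                        next.submit(SHUTDOWN);
                        return;
                    }
                    next.submit(request + ">" + tag);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // The last stage has no thread of its own; it runs on the caller's thread.
    static class FinalStage implements Stage {
        final List<String> results = new CopyOnWriteArrayList<>();

        @Override
        public void submit(String request) {
            if (!request.equals(SHUTDOWN)) {
                results.add(request + ">final");
            }
        }
    }

    static List<String> runAll(List<String> requests) {
        FinalStage fin = new FinalStage();
        QueueStage sync = new QueueStage("sync", fin);
        sync.start();
        QueueStage prep = new QueueStage("prep", sync);
        prep.start();
        for (String request : requests) {
            prep.submit(request);
        }
        prep.submit(SHUTDOWN);
        try {
            prep.join();
            sync.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(fin.results);
    }

    public static void main(String[] args) {
        System.out.println(runAll(Arrays.asList("create", "getData")));
        // [create>prep>sync>final, getData>prep>sync>final]
    }
}
```

Building the chain from the last stage to the first, as setupRequestProcessors does, means every stage already has its successor when it is constructed.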