ZooKeeper Source Code Analysis: The Complete Network Communication Flow (Part 1)

2021SC@SDUSC

Preface

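Next we step into the source code and analyze, one step at a time, how the client and the server establish network communication through ClientCnxn/ServerCnxn. The material is split into three chapters. This chapter covers how the client sends a request to the server; the next two cover how the server responds to the request and what the client does once it receives the server's response.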

Network communication flow in ZooKeeper

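① The ZooKeeper constructor creates the ClientCnxn that links the client to the server, so that requests issued by the client travel to the server over this connection. The createConnection method returns the ClientCnxn.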

        // Create the client connection and initialize SendThread and EventThread
        cnxn = createConnection(
            connectStringParser.getChrootPath(),
            hostProvider,
            sessionTimeout,
            this,
            watchManager,
            getClientCnxnSocket(),
            canBeReadOnly);
        // Start SendThread and EventThread
        cnxn.start();

The code that creates the client connection is as follows:

        protected ClientCnxn createConnection(
            // chroot path of the client
            String chrootPath,
            // supplies the server addresses
            HostProvider hostProvider,
            // session timeout
            int sessionTimeout,
            ZooKeeper zooKeeper,
            // client-side watch manager
            ClientWatchManager watcher,
            // socket used by the client connection
            ClientCnxnSocket clientCnxnSocket,
            boolean canBeReadOnly) throws IOException {
            return new ClientCnxn(
                chrootPath,
                hostProvider,
                sessionTimeout,
                this,
                watchManager,
                clientCnxnSocket,
                canBeReadOnly);
        }

② SendThread is an inner class of ClientCnxn and one of the threads in the ZooKeeper client. Its core is the run() method.

(1) In run(), if the client socket is not yet connected, SendThread's startConnect() method is called to connect asynchronously.

        public void run() {
            clientCnxnSocket.introduce(this, sessionId, outgoingQueue);
            clientCnxnSocket.updateNow();
            clientCnxnSocket.updateLastSendAndHeard();
            int to;
            long lastPingRwServer = Time.currentElapsedTime();
            final int MAX_SEND_PING_INTERVAL = 10000; //10 seconds
            InetSocketAddress serverAddress = null;

            while (state.isAlive()) {
                try {
                    // if the client socket is not connected yet
                    if (!clientCnxnSocket.isConnected()) {
                        // don't re-establish connection if we are closing
                        if (closing) {
                            break;
                        }
                        if (rwServerAddress != null) {
                            serverAddress = rwServerAddress;
                            rwServerAddress = null;
                        } else {
                            serverAddress = hostProvider.next(1000);
                        }
                        onConnecting(serverAddress);
                        // connect asynchronously
                        startConnect(serverAddress);
                        // Update now to start the connection timer right after we make a connection attempt
                        clientCnxnSocket.updateNow();
                        clientCnxnSocket.updateLastSendAndHeard();
                    }
                    // if the client is already connected to the server
                    if (state.isConnected()) {
                        // determine whether we need to send an AuthFailed event.
                        if (zooKeeperSaslClient != null) {
                            boolean sendAuthEvent = false;
                            if (zooKeeperSaslClient.getSaslState() == ZooKeeperSaslClient.SaslState.INITIAL) {
                                try {
                                    zooKeeperSaslClient.initialize(ClientCnxn.this);
                                } catch (SaslException e) {
                                    LOG.error("SASL authentication with Zookeeper Quorum member failed.", e);
                                    changeZkState(States.AUTH_FAILED);
                                    sendAuthEvent = true;
                                }
                            }
                            KeeperState authState = zooKeeperSaslClient.getKeeperState();
                            if (authState != null) {
                                if (authState == KeeperState.AuthFailed) {
                                    // An authentication error occurred during authentication with the Zookeeper Server.
                                    changeZkState(States.AUTH_FAILED);
                                    sendAuthEvent = true;
                                } else {
                                    if (authState == KeeperState.SaslAuthenticated) {
                                        sendAuthEvent = true;
                                    }
                                }
                            }

                            if (sendAuthEvent) {
                                eventThread.queueEvent(new WatchedEvent(Watcher.Event.EventType.None, authState, null));
                                if (state == States.AUTH_FAILED) {
                                    eventThread.queueEventOfDeath();
                                }
                            }
                        }
                        // time left before the read timeout expires
                        to = readTimeout - clientCnxnSocket.getIdleRecv();
                    } else {
                        // time left before the connect timeout expires
                        to = connectTimeout - clientCnxnSocket.getIdleRecv();
                    }
                    // if the session has timed out (this includes connect timeouts)
                    if (to <= 0) {
                        String warnInfo = String.format(
                            "Client session timed out, have not heard from server in %dms for session id 0x%s",
                            clientCnxnSocket.getIdleRecv(),
                            Long.toHexString(sessionId));
                        LOG.warn(warnInfo);
                        throw new SessionTimeoutException(warnInfo);
                    }
                    // if nothing has been sent for a while, send a ping
                    if (state.isConnected()) {
                        //1000(1 second) is to prevent race condition missing to send the second ping
                        //also make sure not to send too many pings when readTimeout is small
                        int timeToNextPing = readTimeout / 2
                                             - clientCnxnSocket.getIdleSend()
                                             - ((clientCnxnSocket.getIdleSend() > 1000) ? 1000 : 0);
                        //send a ping request either time is due or no packet sent out within MAX_SEND_PING_INTERVAL
                        if (timeToNextPing <= 0 || clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL) {
                            sendPing();
                            clientCnxnSocket.updateLastSend();
                        } else {
                            if (timeToNextPing < to) {
                                to = timeToNextPing;
                            }
                        }
                    }

                    // in read-only mode, look for an R/W server; if one is found, drop the current connection and reconnect to it
                    if (state == States.CONNECTEDREADONLY) {
                        long now = Time.currentElapsedTime();
                        int idlePingRwServer = (int) (now - lastPingRwServer);
                        if (idlePingRwServer >= pingRwTimeout) {
                            lastPingRwServer = now;
                            idlePingRwServer = 0;
                            pingRwTimeout = Math.min(2 * pingRwTimeout, maxPingRwTimeout);
                            // synchronously test whether the next server is R/W; if so, RWServerFoundException is thrown
                            pingRwServer();
                        }
                        to = Math.min(to, pingRwTimeout - idlePingRwServer);
                    }
                    // handle I/O
                    clientCnxnSocket.doTransport(to, pendingQueue, ClientCnxn.this);
                } catch (Throwable e) {
                    if (closing) {
                        // closing so this is expected
                        LOG.warn(
                            "An exception was thrown while closing send thread for session 0x{}.",
                            Long.toHexString(getSessionId()),
                            e);
                        break;
                    } else {
                        LOG.warn(
                            "Session 0x{} for server {}, Closing socket connection. "
                                + "Attempting reconnect except it is a SessionExpiredException.",
                            Long.toHexString(getSessionId()),
                            serverAddress,
                            e);

                        // At this point, there might still be new packets appended to outgoingQueue.
                        // they will be handled in next connection or cleared up if closed.
                        cleanAndNotifyState();
                    }
                }
            }

            synchronized (state) {
                // the loop has exited: clean up the connection state
                cleanup();
            }
            clientCnxnSocket.close();
            if (state.isAlive()) {
                eventThread.queueEvent(new WatchedEvent(Event.EventType.None, Event.KeeperState.Disconnected, null));
            }
            eventThread.queueEvent(new WatchedEvent(Event.EventType.None, Event.KeeperState.Closed, null));
            ZooTrace.logTraceMessage(
                LOG,
                ZooTrace.getTextTraceLevel(),
                "SendThread exited loop for session: 0x" + Long.toHexString(getSessionId()));
        }
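
The ping decision in the loop above reduces to a little arithmetic. The following is a minimal standalone sketch of just that calculation. The class name PingTiming, the helper methods and the readTimeout value of 20000 ms are assumptions made for illustration; they are not part of ClientCnxn.

```java
// Standalone sketch of the ping timing used in SendThread.run().
// PingTiming and its methods are hypothetical names, not ZooKeeper APIs.
class PingTiming {
    // Same value as MAX_SEND_PING_INTERVAL in the loop above.
    static final int MAX_SEND_PING_INTERVAL = 10_000;

    /** Milliseconds until the next ping is due; a value <= 0 means "ping now". */
    static int timeToNextPing(int readTimeout, int idleSend) {
        // The extra 1000 ms is subtracted only after more than one second of idle time.
        return readTimeout / 2 - idleSend - (idleSend > 1000 ? 1000 : 0);
    }

    /** Mirrors the condition that guards sendPing() in the loop. */
    static boolean shouldPing(int readTimeout, int idleSend) {
        return timeToNextPing(readTimeout, idleSend) <= 0
            || idleSend > MAX_SEND_PING_INTERVAL;
    }

    public static void main(String[] args) {
        int readTimeout = 20_000; // assumed value, for illustration only
        for (int idleSend : new int[] {0, 5_000, 9_000, 12_000}) {
            System.out.printf("idleSend=%d timeToNextPing=%d ping=%b%n",
                idleSend, timeToNextPing(readTimeout, idleSend), shouldPing(readTimeout, idleSend));
        }
    }
}
```

With these assumed numbers a ping becomes due once roughly 9 seconds pass without anything being sent. For a large readTimeout the second half of the condition takes over and caps the interval at MAX_SEND_PING_INTERVAL.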

(2) startConnect() calls clientCnxnSocket.connect(addr) to connect asynchronously. The default implementation is based on NIO.

        private void startConnect(InetSocketAddress addr) throws IOException {
            // initialize and create the connection
            saslLoginFailed = false;
            if (!isFirstConnect) {// not the first attempt: sleep for a random interval of under one second before reconnecting
                try {
                    Thread.sleep(ThreadLocalRandom.current().nextLong(1000));
                } catch (InterruptedException e) {
                    LOG.warn("Unexpected exception", e);
                }
            }
            // change the state to CONNECTING
            changeZkState(States.CONNECTING);
            // host and port
            String hostPort = addr.getHostString() + ":" + addr.getPort();
            MDC.put("myid", hostPort);
            setName(getName().replaceAll("\\(.*\\)", "(" + hostPort + ")"));
            if (clientConfig.isSaslClientEnabled()) {
                try {
                    if (zooKeeperSaslClient != null) {
                        zooKeeperSaslClient.shutdown();
                    }
                    zooKeeperSaslClient = new ZooKeeperSaslClient(SaslServerPrincipal.getServerPrincipal(addr, clientConfig), clientConfig);
                } catch (LoginException e) {
                    LOG.warn(
                        "SASL configuration failed. "
                            + "Will continue connection to Zookeeper server without "
                            + "SASL authentication, if Zookeeper server allows it.", e);
                    eventThread.queueEvent(new WatchedEvent(Watcher.Event.EventType.None, Watcher.Event.KeeperState.AuthFailed, null));
                    saslLoginFailed = true;
                }
            }
            logStartConnect(addr);
            // start the asynchronous connection
            clientCnxnSocket.connect(addr);
        }

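(3) connect() is the method that actually creates the connection. The analysis here follows the default NIO implementation, so the concrete connect() lives in ClientCnxnSocketNIO. connect() in turn calls registerAndConnect(sock, addr) to register for the connect event and attempt the connection.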

    void connect(InetSocketAddress addr) throws IOException {
        // create the client SocketChannel
        SocketChannel sock = createSock();
        try {
            // register for the connect event
            registerAndConnect(sock, addr);
        } catch (UnresolvedAddressException | UnsupportedAddressTypeException | SecurityException | IOException e) {
            LOG.error("Unable to open socket to {}", addr);
            sock.close();
            throw e;
        }
        // the session is not initialized yet
        initialized = false;
        // reset the two read buffers in preparation for the next read
        lenBuffer.clear();
        incomingBuffer = lenBuffer;
    }

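(4) If the connection is established immediately, sendThread.primeConnection() is called to initialize the connection and begin session creation. Otherwise the OP_CONNECT event is handled later, in doTransport().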

    void registerAndConnect(SocketChannel sock, InetSocketAddress addr) throws IOException {
        sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
        // attempt to connect
        boolean immediateConnect = sock.connect(addr);
        // if the connection is established immediately, call primeConnection() to start creating the session
        if (immediateConnect) {
            sendThread.primeConnection();
        }
    }
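
The register-then-connect pattern above can be tried in isolation with plain java.nio. The sketch below is a minimal illustration under stated assumptions: the class NonBlockingConnectSketch and both of its methods are hypothetical names, the local listener exists only so the example has something to connect to, and none of this is ZooKeeper code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical standalone sketch of a non-blocking connect; not ZooKeeper code.
class NonBlockingConnectSketch {
    /** Returns true once the connection is established; timeoutMs must be > 0. */
    static boolean connect(InetSocketAddress addr, long timeoutMs) throws IOException {
        try (Selector selector = Selector.open();
             SocketChannel sock = SocketChannel.open()) {
            sock.configureBlocking(false);
            // Register interest in the connect event before connecting.
            SelectionKey key = sock.register(selector, SelectionKey.OP_CONNECT);
            if (sock.connect(addr)) {
                return true; // connected immediately, no need to wait
            }
            // Otherwise wait for OP_CONNECT and complete the handshake.
            selector.select(timeoutMs);
            return key.isConnectable() && sock.finishConnect();
        }
    }

    /** Opens a listener on an ephemeral loopback port and connects to it. */
    static boolean connectToLocalListener() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            return connect((InetSocketAddress) server.getLocalAddress(), 2000);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("connected: " + connectToLocalListener());
    }
}
```

Both outcomes of connect() are handled because a non-blocking connect may or may not complete on the spot; this is the same split that registerAndConnect() and the OP_CONNECT branch of doTransport() cover between them.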

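(5) primeConnection() mainly prepares session creation: it builds the ConnectRequest and places it, together with any watch-reset and auth packets, at the front of the send queue, outgoingQueue.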

        void primeConnection() throws IOException {
            LOG.info(
                "Socket connection established, initiating session, client: {}, server: {}",
                clientCnxnSocket.getLocalSocketAddress(),
                clientCnxnSocket.getRemoteSocketAddress());
            isFirstConnect = false;
            // the session id sent to the server is 0 unless an R/W server has been seen before
            long sessId = (seenRwServerBefore) ? sessionId : 0;
            // build the connect request
            ConnectRequest conReq = new ConnectRequest(0, lastZxid, sessionTimeout, sessId, sessionPasswd);
            // We add backwards since we are pushing into the front
            // Only send if there's a pending watch
            // TODO: here we have the only remaining use of zooKeeper in
            // this class. It's to be eliminated!
            if (!clientConfig.getBoolean(ZKClientConfig.DISABLE_AUTO_WATCH_RESET)) {
                List<String> dataWatches = zooKeeper.getDataWatches();
                List<String> existWatches = zooKeeper.getExistWatches();
                List<String> childWatches = zooKeeper.getChildWatches();
                List<String> persistentWatches = zooKeeper.getPersistentWatches();
                List<String> persistentRecursiveWatches = zooKeeper.getPersistentRecursiveWatches();
                if (!dataWatches.isEmpty() || !existWatches.isEmpty() || !childWatches.isEmpty()
                        || !persistentWatches.isEmpty() || !persistentRecursiveWatches.isEmpty()) {
                    Iterator<String> dataWatchesIter = prependChroot(dataWatches).iterator();
                    Iterator<String> existWatchesIter = prependChroot(existWatches).iterator();
                    Iterator<String> childWatchesIter = prependChroot(childWatches).iterator();
                    Iterator<String> persistentWatchesIter = prependChroot(persistentWatches).iterator();
                    Iterator<String> persistentRecursiveWatchesIter = prependChroot(persistentRecursiveWatches).iterator();
                    long setWatchesLastZxid = lastZxid;

                    while (dataWatchesIter.hasNext() || existWatchesIter.hasNext() || childWatchesIter.hasNext()
                            || persistentWatchesIter.hasNext() || persistentRecursiveWatchesIter.hasNext()) {
                        List<String> dataWatchesBatch = new ArrayList<String>();
                        List<String> existWatchesBatch = new ArrayList<String>();
                        List<String> childWatchesBatch = new ArrayList<String>();
                        List<String> persistentWatchesBatch = new ArrayList<String>();
                        List<String> persistentRecursiveWatchesBatch = new ArrayList<String>();
                        int batchLength = 0;

                        // Note, we may exceed our max length by a bit when we add the last
                        // watch in the batch. This isn't ideal, but it makes the code simpler.
                        while (batchLength < SET_WATCHES_MAX_LENGTH) {
                            final String watch;
                            if (dataWatchesIter.hasNext()) {
                                watch = dataWatchesIter.next();
                                dataWatchesBatch.add(watch);
                            } else if (existWatchesIter.hasNext()) {
                                watch = existWatchesIter.next();
                                existWatchesBatch.add(watch);
                            } else if (childWatchesIter.hasNext()) {
                                watch = childWatchesIter.next();
                                childWatchesBatch.add(watch);
                            }  else if (persistentWatchesIter.hasNext()) {
                                watch = persistentWatchesIter.next();
                                persistentWatchesBatch.add(watch);
                            } else if (persistentRecursiveWatchesIter.hasNext()) {
                                watch = persistentRecursiveWatchesIter.next();
                                persistentRecursiveWatchesBatch.add(watch);
                            } else {
                                break;
                            }
                            batchLength += watch.length();
                        }

                        Record record;
                        int opcode;
                        if (persistentWatchesBatch.isEmpty() && persistentRecursiveWatchesBatch.isEmpty()) {
                            // maintain compatibility with older servers - if no persistent/recursive watchers
                            // are used, use the old version of SetWatches
                            record = new SetWatches(setWatchesLastZxid, dataWatchesBatch, existWatchesBatch, childWatchesBatch);
                            opcode = OpCode.setWatches;
                        } else {
                            record = new SetWatches2(setWatchesLastZxid, dataWatchesBatch, existWatchesBatch,
                                    childWatchesBatch, persistentWatchesBatch, persistentRecursiveWatchesBatch);
                            opcode = OpCode.setWatches2;
                        }
                        RequestHeader header = new RequestHeader(ClientCnxn.SET_WATCHES_XID, opcode);
                        Packet packet = new Packet(header, new ReplyHeader(), record, null, null);
                        outgoingQueue.addFirst(packet);
                    }
                }
            }

            for (AuthData id : authInfo) {
                outgoingQueue.addFirst(
                    new Packet(
                        new RequestHeader(ClientCnxn.AUTHPACKET_XID, OpCode.auth),
                        null,
                        new AuthPacket(0, id.scheme, id.data),
                        null,
                        null));
            }
            // wrap it in a network-layer Packet and add it to the send queue; for a ConnectRequest the requestHeader is null
            outgoingQueue.addFirst(new Packet(null, null, conReq, null, null, readOnly));
            // connectionPrimed() makes sure both read and write events are being watched
            clientCnxnSocket.connectionPrimed();
            LOG.debug("Session establishment request sent on {}", clientCnxnSocket.getRemoteSocketAddress());
        }
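
The "we add backwards" comment in primeConnection() is easier to follow with a toy queue. The sketch below uses strings in place of Packet objects; the class name PrimeQueueOrderSketch and the packet labels are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical sketch: strings stand in for ZooKeeper Packet objects.
class PrimeQueueOrderSketch {
    /** Pushes the priming packets to the front in the same order as primeConnection(). */
    static List<String> prime(List<String> alreadyQueued) {
        LinkedBlockingDeque<String> outgoingQueue = new LinkedBlockingDeque<>(alreadyQueued);
        outgoingQueue.addFirst("setWatches");     // pushed first, ends up third
        outgoingQueue.addFirst("auth");           // pushed second, ends up second
        outgoingQueue.addFirst("connectRequest"); // pushed last, ends up first
        return new ArrayList<>(outgoingQueue);
    }

    public static void main(String[] args) {
        // Requests queued by the application before the connection was ready.
        System.out.println(prime(Arrays.asList("getData", "create")));
        // prints [connectRequest, auth, setWatches, getData, create]
    }
}
```

Because each addFirst() lands in front of everything already queued, the ConnectRequest is the first packet on the wire even when application requests were queued earlier.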

③ Once the client's requests are in the send queue, the SendThread calls doTransport() to send the queued requests to the server.

    void doTransport(
        int waitTimeOut,
        Queue<Packet> pendingQueue,
        ClientCnxn cnxn) throws IOException, InterruptedException {
        selector.select(waitTimeOut);
        Set<SelectionKey> selected;
        synchronized (this) {
            selected = selector.selectedKeys();
        }
        // Everything below and until we get back to the select is
        // non blocking, so time is effectively a constant. That is
        // Why we just have to do this once, here
        updateNow();
        for (SelectionKey k : selected) {
            SocketChannel sc = ((SocketChannel) k.channel());
            // if the connection was not established immediately, handle the OP_CONNECT event here
            if ((k.readyOps() & SelectionKey.OP_CONNECT) != 0) {
                if (sc.finishConnect()) {
                    updateLastSendAndHeard();
                    updateSocketAddresses();
                    sendThread.primeConnection();
                }
                // if the channel is readable or writable, handle the I/O
            } else if ((k.readyOps() & (SelectionKey.OP_READ | SelectionKey.OP_WRITE)) != 0) {
                doIO(pendingQueue, cnxn);
            }
        }
        if (sendThread.getZkState().isConnected()) {// if the connection is already established
            // if outgoingQueue holds a sendable packet, register interest in writes
            if (findSendablePacket(outgoingQueue, sendThread.tunnelAuthInProgress()) != null) {
                enableWrite();
            }
        }
        // clear the selected keys
        selected.clear();
    }
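
The bodies of enableWrite() and disableWrite() are not shown in this article. Assuming they simply toggle the OP_WRITE bit in the selection key's interest set, which is the usual NIO idiom, the bit manipulation can be sketched as follows. InterestOpsSketch and its methods are hypothetical names that work on a plain int rather than a live SelectionKey.

```java
import java.nio.channels.SelectionKey;

// Hypothetical sketch of toggling write interest; operates on a plain int mask.
class InterestOpsSketch {
    /** Adds OP_WRITE to the interest set and leaves the other bits alone. */
    static int enableWrite(int ops) {
        return ops | SelectionKey.OP_WRITE;
    }

    /** Removes OP_WRITE from the interest set and leaves the other bits alone. */
    static int disableWrite(int ops) {
        return ops & ~SelectionKey.OP_WRITE;
    }

    static boolean wantsWrite(int ops) {
        return (ops & SelectionKey.OP_WRITE) != 0;
    }

    public static void main(String[] args) {
        int ops = SelectionKey.OP_READ;      // start with read interest only
        ops = enableWrite(ops);              // a sendable packet was found
        System.out.println(wantsWrite(ops)); // true
        ops = disableWrite(ops);             // the queue drained
        System.out.println(wantsWrite(ops)); // false
    }
}
```

Keeping OP_WRITE off while there is nothing to send matters because a socket is writable almost all the time; leaving the bit set would make select() return immediately on every pass.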

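④ Suppose doTransport() has registered write interest through enableWrite(); the requests in the send queue can then be sent to the server. SendThread executes the doIO() method in ClientCnxnSocketNIO, whose write branch is shown below.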

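        // if the channel is writable
        if (sockKey.isWritable()) {
            // the send thread picks a request packet from the send queue
            Packet p = findSendablePacket(outgoingQueue, sendThread.tunnelAuthInProgress());
            // if there is a packet to send
            if (p != null) {
                // update the last-send time
                updateLastSend();
                // if the packet has not been serialized yet, assign its xid and serialize it into its buffer
                if (p.bb == null) {
                    if ((p.requestHeader != null)
                        && (p.requestHeader.getType() != OpCode.ping)
                        && (p.requestHeader.getType() != OpCode.auth)) {
                        p.requestHeader.setXid(cnxn.getXid());
                    }
                    // serialize
                    p.createBB();
                }
                // write the data
                sock.write(p.bb);
                // nothing remaining means the packet was written completely, i.e. sent
                if (!p.bb.hasRemaining()) {
                    // increment the count of sent packets
                    sentCount.getAndIncrement();
                    // remove the packet from the send queue
                    outgoingQueue.removeFirstOccurrence(p);
                    // an ordinary request goes onto the pending queue so the server's reply can be matched to it;
                    // other packets are discarded once sent
                    if (p.requestHeader != null
                        && p.requestHeader.getType() != OpCode.ping
                        && p.requestHeader.getType() != OpCode.auth) {
                        synchronized (pendingQueue) {
                            pendingQueue.add(p);
                        }
                    }
                }
            }
            // if the send queue is empty, stop watching for writes
            if (outgoingQueue.isEmpty()) {
                disableWrite();
            } else if (!initialized && p != null && !p.bb.hasRemaining()) {
                // the connect request has been written completely but the session is not initialized yet:
                // hold further writes until the connect response arrives
                disableWrite();
            } else {
                enableWrite();
            }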
        }
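
The check on p.bb.hasRemaining() exists because a non-blocking write may accept only part of the buffer. The sketch below illustrates that with a length-prefixed frame and a simulated socket. It is a simplified stand-in for what createBB() and sock.write() do, assuming a 4-byte length prefix; FramingSketch, frame(), writeSome() and writesNeeded() are hypothetical names, and the real serialization of header and request is not reproduced here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: length-prefixed framing plus partial writes.
class FramingSketch {
    /** Builds [4-byte big-endian length][payload], positioned at 0 for writing out. */
    static ByteBuffer frame(byte[] payload) {
        ByteBuffer bb = ByteBuffer.allocate(4 + payload.length);
        bb.putInt(payload.length);
        bb.put(payload);
        bb.flip();
        return bb;
    }

    /** Simulates a socket that accepts at most maxBytes per write call. */
    static int writeSome(ByteBuffer src, ByteBuffer sink, int maxBytes) {
        int n = Math.min(maxBytes, src.remaining());
        for (int i = 0; i < n; i++) {
            sink.put(src.get());
        }
        return n;
    }

    /** Counts the write calls needed until the frame has no bytes remaining. */
    static int writesNeeded(byte[] payload, int maxBytesPerWrite) {
        ByteBuffer bb = frame(payload);
        ByteBuffer sink = ByteBuffer.allocate(bb.remaining());
        int writes = 0;
        while (bb.hasRemaining()) { // same completion test as in doIO()
            writeSome(bb, sink, maxBytesPerWrite);
            writes++;
        }
        return writes;
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(writesNeeded(payload, 4)); // 9 bytes at 4 per call: prints 3
    }
}
```

In doIO() the loop is spread across passes of the selector rather than written inline: a packet that still has bytes remaining stays in outgoingQueue, write interest stays enabled, and the next writable event continues from the buffer's current position.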

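⑤ The first request packet is the ConnectRequest. The Packet built for it has no header, so it is not added to the pending queue once it has been sent. SendThread still has to listen for the server's reply to confirm that the connection is established and to initialize the session. How the server responds to this request is covered in the next chapter.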

Summary

The network communication flow above, in short: ClientCnxn establishes the TCP connection and sends the ConnectRequest packet to the server.
