Zookeeper3.7源码剖析(源码导入 启动流程 网络通讯 业务链处理)

Zookeeper3.7源码剖析

1 Zookeeper源码导入

在这里插入图片描述
Zookeeper是一个高可用的分布式数据管理和协调框架,并且能够很好的保证分布式环境中数据的一致性。在越来越多的分布式系。在越来越多的分布式系统(Hadoop、HBase、Kafka)中,Zookeeper都作为核心组件使用。
在这里插入图片描述我们当前课程主要是研究Zookeeper源码,需要将Zookeeper工程导入到IDEA中,老版的zk是通过ant进行编译的,但最新的zk(3.7)源码中已经没了 build.xml ,而多了 pom.xml ,也就是说构建方式由原先的Ant变成了Maven,源码下下来后,直接编译、运行是跑不起来的,有一些配置需要调整。

1.1 工程导入

Zookeeper各个版本源码下载地址,我们可以在该仓库下选择
不同的版本,我们选择最新版本,当前最新版本为3.7,如下图:
在这里插入图片描述
找到项目下载地址,我们选择 https 地址,并复制该地址,通过该地址把项目导入到 IDEA 中。
在这里插入图片描述
点击IDEA的 VSC>Checkout from Version Controller>GitHub ,操作如下图:
在这里插入图片描述
克隆项目到本地:
在这里插入图片描述
项目导入本地后,效果如下:
在这里插入图片描述
项目运行的时候,缺一个版本对象,创建 org.apache.zookeeper.version.Info ,代码如下:

public interface Info { 
public static final int MAJOR=3; 
public static final int MINOR=4; 
public static final int MICRO=6; 
public static final String QUALIFIER=null; 
public static final int REVISION=-1; 
public static final String REVISION_HASH = "1"; 
public static final String BUILD_DATE="2020-12-03 09:29:06"; 
}

1.2 Zookeeper源码错误解决

在 zookeeper-server 中找到 org.apache.zookeeper.server.quorum.QuorumPeerMain 并启动该类,启动前做如下配置:
在这里插入图片描述
启动的时候会会报很多错误,比如缺包、缺对象,如下几幅图:
在这里插入图片描述
在这里插入图片描述
在这里插入图片描述
为了解决上面的错误,我们需要手动引入一些包, pom.xml 引入如下依赖:

<!--引入依赖--> 
<dependency> 
<groupId>io.dropwizard.metrics</groupId> 
<artifactId>metrics-core</artifactId> 
<version>3.1.0</version> 
</dependency> 
<dependency> 
<groupId>org.xerial.snappy</groupId> 
<artifactId>snappy-java</artifactId> 
<version>1.1.7.3</version> 
</dependency> 
<dependency> 
<groupId>org.eclipse.jetty</groupId> 
<artifactId>jetty-server</artifactId> 
</dependency> 
<dependency> 
<groupId>org.eclipse.jetty</groupId> 
<artifactId>jetty-servlet</artifactId> 
</dependency> 
<dependency> 
<groupId>commons-cli</groupId> 
<artifactId>commons-cli</artifactId> 
</dependency>

1.3 Zookeeper命令

我们要想学习Zookeeper,需要先学会使用Zookeeper,它有很多丰富的命令,借助这些命令可以深入理解Zookeeper,我们启动源码中的客户端就可以使用Zookeeper相关命令。
启动客户端 org.apache.zookeeper.ZooKeeperMain ,如下图

在这里插入图片描述
启动后,日志如下:
在这里插入图片描述

1)节点列表: ls /

ls / [dubbo, zookeeper] 
ls /dubbo
[com.service.CarService]

2)查看节点状态: stat /dubbo

stat /dubbo 
cZxid = 0x3 
ctime = Thu Dec 03 09:19:29 CST 2020 
mZxid = 0x3 
mtime = Thu Dec 03 09:19:29 CST 2020 
pZxid = 0x4 
cversion = 1 
dataVersion = 0 
aclVersion = 0 
ephemeralOwner = 0x0 
dataLength = 13 
numChildren = 1

节点信息参数说明如下:

keyvalue
cZxid = 0x3节点被创建时的事物的ID
ctime = Thu Dec 03 09:19:29 CST 2020创建时间
mZxid = 0x31节点最后一次被修改时的事物的ID
mtime = Sat Mar 16 15:38:34 CST 2019最后 一次修改时间
pZxid = 0x31子节点列表最近一次被修改的事物ID
cversion = 0子节点版本号
dataVersion = 0节点被创建时的事物的ID
aclVersion = 0ACL版本号
ephemeralOwner = 0x0创建临时节点的事物ID,持久节点事物为0 dataLength = 22 数据长度,每个节点都可保存数据
numChildren = 0子节点的个数

3)创建节点: create /dubbo/code java

create /dubbo/code java 
Created /dubbo/code

其中code表示节点,java表示节点下的内容。

4)查看节点数据: get /dubbo/code

get /dubbo/code 
java

5)删除节点: delete /dubbo/code || deleteall /dubbo/code
删除没有子节点的节点: delete /dubbo/code
删除所有子节点: deleteall /dubbo/code

6)历史操作命令: history

history 
1 - ls /dubbo 
2 - ls /dubbo/code 
3 - get /dubbo/code 
4 - get /dubbo/code 
5 - create /dubbo/code java 
6 - get /dubbo/code 
7 - get /dubbo/code 
8 - delete /dubbo/code 
9 - get /dubbo/code 
10 - listquota path 
11 - history

1.4 Zookeeper分析工具

Zookeeper安装比较方便,在安装一个集群以后,查看数据却比较麻烦,下面介绍Zookeeper的数据查看工具——ZooInspector。
下载地址

下载压缩包后,解压后,我们需要运行 zookeeper-dev-ZooInspector.jar :
在这里插入图片描述
输入账号密码,就可以连接Zookeeper了,如下图:
在这里插入图片描述
连接后,Zookeeper信息如下:
在这里插入图片描述
节点操作:增加节点、修改节点、删除节点
在这里插入图片描述

2 ZK服务启动流程源码剖析

ZooKeeper 可以以 standalone 、分布式的方式部署, standalone 模式下只有一台机器作为服务器, ZooKeeper 会丧失高可用特性,分布式是使用多个机器,每台机器上部署一个 ZooKeeper 服务器,即使有服务器宕机,只要少于半数, ZooKeeper 集群依然可以正常对外提供服务,集群状态下 Zookeeper 是具备高可用特性。
我们接下来对 ZooKeeper 以 standalone 模式启动以及集群模式做一下源码分析。

2.1 ZK单机/集群启动流程

在这里插入图片描述
如上图,上图是 Zookeeper 单机/集群启动流程,每个细节所做的事情都在上图有说明,我们接下来按照流程图对源码进行分析。

2.2 ZK启动入口分析

启动入口类:QuorumPeerMain
该类是 zookeeper 单机/集群的启动入口类,是用来加载配置、启动 QuorumPeer (选举相关)线程、创建ServerCnxnFactory 等,我们可以把代码切换到该类的主方法( main )中,从该类的主方法开始分析, main 方法代码分析如下:
在这里插入图片描述
上面main方法虽然只是做了初始化配置,但调用了 initializeAndRun() 方法,initializeAndRun() 方法中会根据配置来决定启动单机Zookeeper还是集群Zookeeper,源码如下:
在这里插入图片描述
如果启动单机版,会调用 ZooKeeperServerMain.main(args); ,如果启动集群版,会调用QuorumPeerMain.runFromConfig(config); ,我们接下来对单机版启动做源码详细剖析,集群版在后
面章节中讲解选举机制时详细讲解。

2.3 ZK单机启动源码剖析

针对ZK单机启动源码方法调用链,我们已经提前做了一个方法调用关系图,我们讲解ZK单机启动源码,将和该图进行一一匹对,如下图:
在这里插入图片描述
1)单机启动入口
按照上面的源码分析,我们找到 ZooKeeperServerMain.main(args) 方法,该方法调用了ZooKeeperServerMain 的 initializeAndRun 方法,在 initializeAndRun 方法中执行初始化操作,并运行Zookeeper服务,main方法如下:
在这里插入图片描述
2)配置文件解析
initializeAndRun() 方法会注册JMX,同时解析 zoo.cfg 配置文件,并调用 runFromConfig() 方法启动Zookeeper服务,源码如下
在这里插入图片描述
3)单机启动主流程
runFromConfig 方法是单机版启动的主要方法,该方法会做如下几件事:

1:初始化各类运行指标,比如一次提交数据最大花费多长时间、批量同步数据大小等。 
2:初始化权限操作,例如IP权限、Digest权限。 
3:创建事务日志操作对象,Zookeeper中每次增加节点、修改数据、删除数据都是一次事务操作,都会记录日志。 
4:定义Jvm监控变量和常量,例如警告时间、告警阀值次数、提示阀值次数等。 
5:创建ZookeeperServer,这里只是创建,并不在ZooKeeperServerMain类中启动。 
6:启动Zookeeper的控制台管理对象AdminServer,该对象采用Jetty启动。 
7:创建ServerCnxnFactory,该对象其实是Zookeeper网络通信对象,默认使用了 
NIOServerCnxnFactory。 
8:在ServerCnxnFactory中启动ZookeeperServer服务。 
9:创建并启动ContainerManager,该对象通过Timer定时执行,清理过期的容器节点和TTL节点,执行周期为分钟。 
10:防止主线程结束,阻塞主线程

方法源码如下:
在这里插入图片描述
4)网络通信对象创建
上面方法在创建网络通信对象的时候调用了 ServerCnxnFactory.createFactory() ,该方法其实是根据系统配置创建Zookeeper通信组件,可选的有 NIOServerCnxnFactory(默认) 和 NettyServerCnxnFactory ,该方法源码如下:
在这里插入图片描述
5)单机启动
cnxnFactory.startup(zkServer); 方法其实就是启动了 ZookeeperServer ,它调用NIOServerCnxnFactory 的 startup 方法,该方法中会调用 ZookeeperServer 的 startup 方法启动服务, ZooKeeperServerMain 运行到 shutdownLatch.await(); 主线程会阻塞住,源码如下
在这里插入图片描述
启动后,日志如下
在这里插入图片描述

3 ZK网络通信源码剖析

Zookeeper 作为一个服务器,自然要与客户端进行网络通信,如何高效的与客户端进行通信,让网络 IO 不成为 ZooKeeper 的瓶颈是 ZooKeeper 急需解决的问题, ZooKeeper 中使用 ServerCnxnFactory 管理与客户端的连接,其有两个实现,一个是 NIOServerCnxnFactory ,使用Java原生 NIO 实现;一个是NettyServerCnxnFactory ,使用netty实现;使用 ServerCnxn 代表一个客户端与服务端的连接。
从单机版启动中可以发现 Zookeeper 默认通信组件为 NIOServerCnxnFactory ,他们和ServerCnxnFactory 的关系如下图:
在这里插入图片描述

3.1 NIOServerCnxnFactory工作流程

一般使用Java NIO的思路为使用1个线程组监听 OP_ACCEPT 事件,**负责处理客户端的连接;**使用1个线程组监听客户端连接的 OP_READ 和 OP_WRITE 事件,处理IO事件(netty也是这种实现方式). 但ZooKeeper并不是如此划分线程功能的, NIOServerCnxnFactory 启动时会启动四类线程:

1:accept thread:该线程接收来自客户端的连接,并将其分配给selector thread(启动一个线程)2:selector thread:该线程执行select(),由于在处理大量连接时,select()会成为性能瓶颈,因此启动多个selector thread,使用系统属性zookeeper.nio.numSelectorThreads配置该类线程数,默认 个数为 核心数/2。 
3:worker thread:该线程执行基本的套接字读写,使用系统属性zookeeper.nio.numWorkerThreads 配置该类线程数,默认为核心数∗2核心数∗2.如果该类线程数为0,则另外启动一线程进行IO处理,见下文 

worker thread介绍。 
4:connection expiration thread:若连接上的session已过期,则关闭该连接。

这四个线程在 NIOServerCnxnFactory 类上有说明,如下图:
在这里插入图片描述
ZooKeeper 中对线程需要处理的工作做了更细的拆分,解决了有大量客户端连接的情况
下, selector.select() 会成为性能瓶颈,将 selector.select() 拆分出来,交由 selector thread 处理。

3.2 NIOServerCnxnFactory源码

NIOServerCnxnFactory的源码分析我们将按照上面所介绍的4个线程实现相关分析,并实现数据操作,在程序中获取指定数据。

3.2.1 AcceptThread剖析

为了让大家更容易理解AcceptThread,我们把它的结构和方法调用关系画了一个详细的流程图,如下图:
在这里插入图片描述
在 NIOServerCnxnFactory 类中有一个AcceptThread 线程,为什么说它是一个线程?我们看下它的继承关系: AcceptThread> AbstractSelectThread > ZooKeeperThread > Thread ,该线程接收来自客户端的连接,并将其分配给selector thread(启动一个线程)。
该线程执行流程: run 执行 selector.select() ,并调用 doAccept() 接收客户端连接,因此我们可以
着重关注 doAccept() 方法,该类源码如下:
在这里插入图片描述
doAccept() 方法用于处理客户端链接,当客户端链接 Zookeeper 的时候,首先会调用该方法,调用该方法执行过程如下:

1:和当前服务建立链接。 
2:获取远程客户端计算机地址信息。 
3:判断当前链接是否超出最大限制。 
4:调整为非阻塞模式。 
5:轮询获取一个SelectorThread,将当前链接分配给该SelectorThread。 
6:将当前请求添加到该SelectorThread的acceptedQueue中,并唤醒该SelectorThread。 

在这里插入图片描述
上面代码中 addAcceptedConnection 方法如下:
在这里插入图片描述
我们把项目中的分布式案例服务启动,可以看到如下日志打印:

AcceptThread----------链接服务的IP:127.0.0.1
3.2.2 SelectorThread剖析

在这里插入图片描述
该线程的主要作用是从Socket读取数据,并封装成 workRequest ,并将 workRequest 交给workerPool 工作线程池处理,同时将acceptedQueue中未处理的链接取出,并未每个链接绑定OP_READ 读事件,并封装对应的上下文对象 NIOServerCnxn 。 SelectorThread 的run方法如下:
在这里插入图片描述
run() 方法中会调用 select() ,而 select() 中的核心调用地方是 handleIO() ,我们看名字其实就知道这里是处理客户端请求的数据,但客户端请求数据并非在 SelectorThread 线程中处理,我们接着看 handleIO() 方法。
在这里插入图片描述
handleIO() 方法会封装当前 SelectorThread 为 IOWorkRequest ,并将 IOWorkRequest 交给workerPool 来调度,而 workerPool 调度才是读数据的开始,源码如下:
在这里插入图片描述

3.2.3 WorkerThread剖析

WorkerThread相比上面的线程而言,调用关系颇为复杂,设计到了多个对象方法调用,主要用于处理IO,但并未对数据做出处理,数据处理将有业务链对象RequestProcessor处理,调用关系图如下:
在这里插入图片描述
ZooKeeper 中通过 WorkerService 管理一组 worker thread 线程,前面我们在看 SelectorThread 的时候,能够看到 workerPool 的schedule方法被执行,如下图:
在这里插入图片描述
我们跟踪 workerPool.schedule(workRequest); 可以发现它调用了WorkerService.schedule(workRequest) > WorkerService.schedule(WorkRequest, long) ,该方法创建了一个新的线程 ScheduledWorkRequest ,并启动了该线程,源码如下:
在这里插入图片描述
ScheduledWorkRequest 实现了 Runnable 接口,并在 run() 方法中调用了 IOWorkRequest 中的doWork 方法,在该方法中会调用 doIO 执行IO数据处理,源码如下:
在这里插入图片描述
IOWorkRequest 的 doWork 源码如下:
在这里插入图片描述
接下来的调用链路比较复杂,我们把核心步骤列出,在能直接看到数据读取的地方详细分析源码。上面方法调用链路: NIOServerCnxn.doIO()>readPayload()>readRequest() >ZookeeperServer.processPacket() ,最后一方法是获取核心数据的地方,我们可以修改下代码读取数据
读取数据:
在这里插入图片描述
添加测试代码如下:

//==========测试 Start=========== 
//定义接收输入流对象(输出流) 
ByteArrayOutputStream os=new ByteArrayOutputStream();
//将网络输入流读取到输出流中 
        byte[]buffer=new byte[1024];
        int len=0;
        while((len=bais.read(buffer))!=-1){
        os.write(buffer,0,len);
        }
        String result=new String(os.toByteArray(),"UTF-8");
        System.out.println("processPacket---------------读到的数据:"+result);
//==========测试 End=========== 

我们启动客户端创建一个demo节点,并添加数据为 abcdefg

create /demo abcdefg

控制台数据如下:
在这里插入图片描述
测试完成后,不要忘了将该测试注释掉。我们可以执行其他增删改查操作,可以输出
RequestHeader.type 查看操作类型,操作类型代码在 ZooDefs 中有标识,常用的操作类型如下:

int create = 1; 
int delete = 2; 
int exists = 3; 
int getData = 4; 
int setData = 5; 
int getACL = 6; 
int setACL = 7; 
int getChildren = 8; 
int sync = 9; 
int ping = 11;
2.3.4 ConnectionExpirerThread剖析

后台启动 ConnectionExpirerThread 清理线程清理过期的 session ,线程中无限循环,执行工作如下:
在这里插入图片描述

2.3 ZK通信优劣总结

Zookeeper在通信方面默认使用了NIO,并支持扩展Netty实现网络数据传输。相比传统IO,NIO在网络数据传输方面有很多明显优势:

1:传统IO在处理数据传输请求时,针对每个传输请求生成一个线程,如果IO异常,那么线程阻塞,在IO恢复后唤醒处理线程。在同时处理大量连接时,会实例化大量的线程对象。每个线程的实例化和回收都需要消耗资源,jvm需要为其分配TLAB(缓冲区),然后初始TLAB,最后绑定线程,线程结束时又需要回收TLAB,这些都 需要CPU资源。 
2:NIO使用selector来轮询IO流,内部使用poll或者epoll,以事件驱动形式来相应IO事件的处理。同一时间只需实例化很少的线程对象,通过对线程的复用来提高CPU资源的使用效率。 
3:CPU轮流为每个线程分配时间片的形式,间接的实现单物理核处理多线程。当线程越多时,每个线程分配 到的时间片越短,或者循环分配的周期越长,CPU很多时间都耗费在了线程的切换上。线程切换包含线程上个线程数据的同步(TLAB同步),同步变量同步至主存,下个线程数据的加载等等,他们都是很耗费CPU资源的。 
4:在同时处理大量连接,但活跃连接不多时,NIO的事件响应模式相比于传统IO有着极大的性能提升。NIO 还提供了FileChannel,以zero-copy的形式传输数据,相较于传统的IO,数据不需要拷贝至用户空间, 可直接由物理硬件(磁盘等)通过内核缓冲区后直接传递至网关,极大的提高了性能。 
5:NIO提供了MappedByteBuffer,其将文件直接映射到内存(这里的内存指的是虚拟内存,并不是物理内存),能极大的提高IO吞吐能力

ZK在使用NIO通信虽然大幅提升了数据传输能力,但也存在一些代码诟病问题:

1:Zookeeper通信源码部分学习成本高,需要掌握NIO和多线程 
2:多线程使用频率高,消耗资源多,但性能得到提升 
3:Zookeeper数据处理调用链路复杂,多处存在内部类,代码结构不清晰,写法比较经典

4 RequestProcessor处理请求源码剖析

zookeeper 的业务处理流程就像工作流一样,其实就是一个单链表;在 zookeeper 启动的时候,会确立各个节点的角色特性,即 leader 、 follower 和 observer ,每个角色确立后,就会初始化它的工作责任链;

4.1 RequestProcessor结构

客户端请求过来,每次执行不同事务操作的时候,Zookeeper也提供了一套业务处理流程RequestProcessor , RequestProcessor 的处理流程如下图:
在这里插入图片描述
我们来看一下 RequestProcessor 初始化流程, ZooKeeperServer.setupRequestProcessors() 方法源码如下:
在这里插入图片描述
它的创建步骤:

1:创建finalProcessor。 
2:创建syncProcessor,并将finalProcessor作为它的下一个业务链。 
3:启动syncProcessor。 
4:创建firstProcessor(PrepRequestProcessor),将syncProcessor作为firstProcessor的下一个业务链。 
5:启动firstProcessor。

syncProcessor 创建时,将 finalProcessor 作为参数传递进来源码如下:
在这里插入图片描述
firstProcessor 创建时,将 syncProcessor 作为参数传递进来源码如下:
在这里插入图片描述
PrepRequestProcessor/SyncRequestProcessor 关系图:
在这里插入图片描述
PrepRequestProcessor 和 SyncRequestProcessor 的结构一样,都是实现了 Thread 的一个线程,所以在这里初始化时便启动了这两个线程。

4.2 PrepRequestProcessor剖析

PrepRequestProcessor 是请求处理器的第1个处理器,我们把之前的请求业务处理衔接起来,一步一步分析。
ZooKeeperServer.processPacket()>submitRequest()>enqueueRequest()>RequestThrottler.submitRequest() ,我们来看下 RequestThrottler.submitRequest() 源码,它将当前请求添加到submittedRequests 队列中了,源码如下:
在这里插入图片描述
而 RequestThrottler 继承了 ZooKeeperCriticalThread > ZooKeeperThread > Thread ,也就是说当前 RequestThrottler 是个线程,我们看看它的 run 方法做了什么事,源码如下:
在这里插入图片描述
RequestThrottler 调用了 ZooKeeperServer.submitRequestNow() 方法,而该方法又调用了firstProcessor 的方法,源码如下:
在这里插入图片描述
ZooKeeperServer.submitRequestNow() 方法调用了 firstProcessor.processRequest() 方法,而这里的 firstProcessor 就是初始化业务处理链中的 PrepRequestProcessor ,也就是说三个RequestProecessor 中最先调用的是 PrepRequestProcessor 。
PrepRequestProcessor.processRequest() 方法将当前请求添加到了队列 submittedRequests 中,源码如下:
在这里插入图片描述
上面方法中并未从 submittedRequests 队列中获取请求,如何执行请求的呢,因为PrepRequestProcessor 是一个线程,因此会在 run 中执行,我们查看 run 方法源码的时候发现它调用了 pRequest() 方法, pRequest() 方法源码如下:
在这里插入图片描述
首先先执行 pRequestHelper() 方法,该方法是 PrepRequestProcessor 处理核心业务流程,主要是一些过滤操作,操作完成后,会将请求交给下一个业务链,也就是SyncRequestProcessor.processRequest() 方法处理请求。
我们来看一下 PrepRequestProcessor.pRequestHelper() 方法做了哪些事,源码如下:
在这里插入图片描述
从上面源码可以看出 PrepRequestProcessor.pRequestHelper() 方法判断了客户端操作类型,但无论哪种操作类型几乎都调用了 pRequest2Txn() 方法,我们来看看源码:

    /**
     * This method will be called inside the ProcessRequestThread, which is a
     * singleton, so there will be a single thread calling this code.
     *
     * @param type
     * @param zxid
     * @param request
     * @param record
     */
    protected void pRequest2Txn(int type, long zxid, Request request, Record record, boolean deserialize) throws KeeperException, IOException, RequestProcessorException {
        if (request.getHdr() == null) {
            request.setHdr(new TxnHeader(request.sessionId, request.cxid, zxid,
                    Time.currentWallTime(), type));
        }

        PrecalculatedDigest precalculatedDigest;
        switch (type) {
        case OpCode.create:
        case OpCode.create2:
        case OpCode.createTTL:
        case OpCode.createContainer: {
            pRequest2TxnCreate(type, request, record, deserialize);
            break;
        }
        case OpCode.deleteContainer: {
            String path = new String(request.request.array(), UTF_8);
            String parentPath = getParentPathAndValidate(path);
            ChangeRecord nodeRecord = getRecordForPath(path);
            if (nodeRecord.childCount > 0) {
                throw new KeeperException.NotEmptyException(path);
            }
            if (EphemeralType.get(nodeRecord.stat.getEphemeralOwner()) == EphemeralType.NORMAL) {
                throw new KeeperException.BadVersionException(path);
            }
            //修改快照数据记录
            ChangeRecord parentRecord = getRecordForPath(parentPath);
            //事务信息记录
            request.setTxn(new DeleteTxn(path));
            parentRecord = parentRecord.duplicate(request.getHdr().getZxid());
            parentRecord.childCount--;
            //状态数据记录
            parentRecord.stat.setPzxid(request.getHdr().getZxid());
            parentRecord.precalculatedDigest = precalculateDigest(
                    DigestOpCode.UPDATE, parentPath, parentRecord.data, parentRecord.stat);
            addChangeRecord(parentRecord);

            nodeRecord = new ChangeRecord(request.getHdr().getZxid(), path, null, -1, null);
            nodeRecord.precalculatedDigest = precalculateDigest(DigestOpCode.REMOVE, path);
            setTxnDigest(request, nodeRecord.precalculatedDigest);
            addChangeRecord(nodeRecord);
            break;
        }
        case OpCode.delete:
            zks.sessionTracker.checkSession(request.sessionId, request.getOwner());
            DeleteRequest deleteRequest = (DeleteRequest) record;
            if (deserialize) {
                ByteBufferInputStream.byteBuffer2Record(request.request, deleteRequest);
            }
            String path = deleteRequest.getPath();
            String parentPath = getParentPathAndValidate(path);
            ChangeRecord parentRecord = getRecordForPath(parentPath);
            //权限检查
            zks.checkACL(request.cnxn, parentRecord.acl, ZooDefs.Perms.DELETE, request.authInfo, path, null);
            ChangeRecord nodeRecord = getRecordForPath(path);
            checkAndIncVersion(nodeRecord.stat.getVersion(), deleteRequest.getVersion(), path);
            if (nodeRecord.childCount > 0) {
                throw new KeeperException.NotEmptyException(path);
            }
            request.setTxn(new DeleteTxn(path));
            parentRecord = parentRecord.duplicate(request.getHdr().getZxid());
            parentRecord.childCount--;
            parentRecord.stat.setPzxid(request.getHdr().getZxid());
            parentRecord.precalculatedDigest = precalculateDigest(
                    DigestOpCode.UPDATE, parentPath, parentRecord.data, parentRecord.stat);
            addChangeRecord(parentRecord);

            nodeRecord = new ChangeRecord(request.getHdr().getZxid(), path, null, -1, null);
            nodeRecord.precalculatedDigest = precalculateDigest(DigestOpCode.REMOVE, path);
            setTxnDigest(request, nodeRecord.precalculatedDigest);
            addChangeRecord(nodeRecord);
            break;
        case OpCode.setData:
            zks.sessionTracker.checkSession(request.sessionId, request.getOwner());
            SetDataRequest setDataRequest = (SetDataRequest) record;
            if (deserialize) {
                ByteBufferInputStream.byteBuffer2Record(request.request, setDataRequest);
            }
            path = setDataRequest.getPath();
            validatePath(path, request.sessionId);
            nodeRecord = getRecordForPath(path);
            //权限检查
            zks.checkACL(request.cnxn, nodeRecord.acl, ZooDefs.Perms.WRITE, request.authInfo, path, null);
            int newVersion = checkAndIncVersion(nodeRecord.stat.getVersion(), setDataRequest.getVersion(), path);
            request.setTxn(new SetDataTxn(path, setDataRequest.getData(), newVersion));
            nodeRecord = nodeRecord.duplicate(request.getHdr().getZxid());
            nodeRecord.stat.setVersion(newVersion);
            nodeRecord.stat.setMtime(request.getHdr().getTime());
            nodeRecord.stat.setMzxid(zxid);
            nodeRecord.data = setDataRequest.getData();
            nodeRecord.precalculatedDigest = precalculateDigest(
                    DigestOpCode.UPDATE, path, nodeRecord.data, nodeRecord.stat);
            setTxnDigest(request, nodeRecord.precalculatedDigest);
            addChangeRecord(nodeRecord);
            break;
        case OpCode.reconfig:
            if (!zks.isReconfigEnabled()) {
                LOG.error("Reconfig operation requested but reconfig feature is disabled.");
                throw new KeeperException.ReconfigDisabledException();
            }

            //权限检查
            if (ZooKeeperServer.skipACL) {
                LOG.warn("skipACL is set, reconfig operation will skip ACL checks!");
            }

            zks.sessionTracker.checkSession(request.sessionId, request.getOwner());
            LeaderZooKeeperServer lzks;
            try {
                lzks = (LeaderZooKeeperServer) zks;
            } catch (ClassCastException e) {
                // standalone mode - reconfiguration currently not supported
                throw new KeeperException.UnimplementedException();
            }
            QuorumVerifier lastSeenQV = lzks.self.getLastSeenQuorumVerifier();
            // check that there's no reconfig in progress
            if (lastSeenQV.getVersion() != lzks.self.getQuorumVerifier().getVersion()) {
                throw new KeeperException.ReconfigInProgress();
            }
            ReconfigRequest reconfigRequest = (ReconfigRequest) record;
            long configId = reconfigRequest.getCurConfigId();

            if (configId != -1 && configId != lzks.self.getLastSeenQuorumVerifier().getVersion()) {
                String msg = "Reconfiguration from version "
                             + configId
                             + " failed -- last seen version is "
                             + lzks.self.getLastSeenQuorumVerifier().getVersion();
                throw new KeeperException.BadVersionException(msg);
            }

            String newMembers = reconfigRequest.getNewMembers();

            if (newMembers != null) { //non-incremental membership change
                LOG.info("Non-incremental reconfig");

                // Input may be delimited by either commas or newlines so convert to common newline separated format
                newMembers = newMembers.replaceAll(",", "\n");

                try {
                    Properties props = new Properties();
                    props.load(new StringReader(newMembers));
                    request.qv = QuorumPeerConfig.parseDynamicConfig(props, lzks.self.getElectionType(), true, false);
                    request.qv.setVersion(request.getHdr().getZxid());
                } catch (IOException | ConfigException e) {
                    throw new KeeperException.BadArgumentsException(e.getMessage());
                }
            } else { //incremental change - must be a majority quorum system
                LOG.info("Incremental reconfig");

                List<String> joiningServers = null;
                String joiningServersString = reconfigRequest.getJoiningServers();
                if (joiningServersString != null) {
                    joiningServers = StringUtils.split(joiningServersString, ",");
                }

                List<String> leavingServers = null;
                String leavingServersString = reconfigRequest.getLeavingServers();
                if (leavingServersString != null) {
                    leavingServers = StringUtils.split(leavingServersString, ",");
                }

                if (!(lastSeenQV instanceof QuorumMaj)) {
                    String msg = "Incremental reconfiguration requested but last configuration seen has a non-majority quorum system";
                    LOG.warn(msg);
                    throw new KeeperException.BadArgumentsException(msg);
                }
                Map<Long, QuorumServer> nextServers = new HashMap<Long, QuorumServer>(lastSeenQV.getAllMembers());
                try {
                    if (leavingServers != null) {
                        for (String leaving : leavingServers) {
                            long sid = Long.parseLong(leaving);
                            nextServers.remove(sid);
                        }
                    }
                    if (joiningServers != null) {
                        for (String joiner : joiningServers) {
                            // joiner should have the following format: server.x = server_spec;client_spec
                            String[] parts = StringUtils.split(joiner, "=").toArray(new String[0]);
                            if (parts.length != 2) {
                                throw new KeeperException.BadArgumentsException("Wrong format of server string");
                            }
                            // extract server id x from first part of joiner: server.x
                            Long sid = Long.parseLong(parts[0].substring(parts[0].lastIndexOf('.') + 1));
                            QuorumServer qs = new QuorumServer(sid, parts[1]);
                            if (qs.clientAddr == null || qs.electionAddr == null || qs.addr == null) {
                                throw new KeeperException.BadArgumentsException("Wrong format of server string - each server should have 3 ports specified");
                            }

                            // check duplication of addresses and ports
                            for (QuorumServer nqs : nextServers.values()) {
                                if (qs.id == nqs.id) {
                                    continue;
                                }
                                qs.checkAddressDuplicate(nqs);
                            }

                            nextServers.remove(qs.id);
                            nextServers.put(qs.id, qs);
                        }
                    }
                } catch (ConfigException e) {
                    throw new KeeperException.BadArgumentsException("Reconfiguration failed");
                }
                request.qv = new QuorumMaj(nextServers);
                request.qv.setVersion(request.getHdr().getZxid());
            }
            if (QuorumPeerConfig.isStandaloneEnabled() && request.qv.getVotingMembers().size() < 2) {
                String msg = "Reconfig failed - new configuration must include at least 2 followers";
                LOG.warn(msg);
                throw new KeeperException.BadArgumentsException(msg);
            } else if (request.qv.getVotingMembers().size() < 1) {
                String msg = "Reconfig failed - new configuration must include at least 1 follower";
                LOG.warn(msg);
                throw new KeeperException.BadArgumentsException(msg);
            }

            if (!lzks.getLeader().isQuorumSynced(request.qv)) {
                String msg2 = "Reconfig failed - there must be a connected and synced quorum in new configuration";
                LOG.warn(msg2);
                throw new KeeperException.NewConfigNoQuorum();
            }

            //修改快照设置
            nodeRecord = getRecordForPath(ZooDefs.CONFIG_NODE);
            //权限检查
            zks.checkACL(request.cnxn, nodeRecord.acl, ZooDefs.Perms.WRITE, request.authInfo, null, null);
            SetDataTxn setDataTxn = new SetDataTxn(ZooDefs.CONFIG_NODE, request.qv.toString().getBytes(), -1);
            //事务数据记录
            request.setTxn(setDataTxn);
            nodeRecord = nodeRecord.duplicate(request.getHdr().getZxid());
            //版本信息、修改时间、修改的事务id记录
            nodeRecord.stat.setVersion(-1);
            nodeRecord.stat.setMtime(request.getHdr().getTime());
            nodeRecord.stat.setMzxid(zxid);
            nodeRecord.data = setDataTxn.getData();
            // Reconfig is currently a noop from digest computation
            // perspective since config node is not covered by the digests.
            nodeRecord.precalculatedDigest = precalculateDigest(
                    DigestOpCode.NOOP, ZooDefs.CONFIG_NODE, nodeRecord.data, nodeRecord.stat);
            setTxnDigest(request, nodeRecord.precalculatedDigest);
            addChangeRecord(nodeRecord);

            break;
        case OpCode.setACL:
            zks.sessionTracker.checkSession(request.sessionId, request.getOwner());
            SetACLRequest setAclRequest = (SetACLRequest) record;
            if (deserialize) {
                ByteBufferInputStream.byteBuffer2Record(request.request, setAclRequest);
            }
            path = setAclRequest.getPath();
            validatePath(path, request.sessionId);
            List<ACL> listACL = fixupACL(path, request.authInfo, setAclRequest.getAcl());
            nodeRecord = getRecordForPath(path);
            zks.checkACL(request.cnxn, nodeRecord.acl, ZooDefs.Perms.ADMIN, request.authInfo, path, listACL);
            newVersion = checkAndIncVersion(nodeRecord.stat.getAversion(), setAclRequest.getVersion(), path);
            request.setTxn(new SetACLTxn(path, listACL, newVersion));
            nodeRecord = nodeRecord.duplicate(request.getHdr().getZxid());
            nodeRecord.stat.setAversion(newVersion);
            nodeRecord.precalculatedDigest = precalculateDigest(
                    DigestOpCode.UPDATE, path, nodeRecord.data, nodeRecord.stat);
            setTxnDigest(request, nodeRecord.precalculatedDigest);
            addChangeRecord(nodeRecord);
            break;
        case OpCode.createSession:
            request.request.rewind();
            int to = request.request.getInt();
            request.setTxn(new CreateSessionTxn(to));
            request.request.rewind();
            // only add the global session tracker but not to ZKDb
            zks.sessionTracker.trackSession(request.sessionId, to);
            zks.setOwner(request.sessionId, request.getOwner());
            break;
        case OpCode.closeSession:
            // We don't want to do this check since the session expiration thread
            // queues up this operation without being the session owner.
            // this request is the last of the session so it should be ok
            //zks.sessionTracker.checkSession(request.sessionId, request.getOwner());
            long startTime = Time.currentElapsedTime();
            synchronized (zks.outstandingChanges) {
                // need to move getEphemerals into zks.outstandingChanges
                // synchronized block, otherwise there will be a race
                // condition with the on flying deleteNode txn, and we'll
                // delete the node again here, which is not correct
                Set<String> es = zks.getZKDatabase().getEphemerals(request.sessionId);
                for (ChangeRecord c : zks.outstandingChanges) {
                    if (c.stat == null) {
                        // Doing a delete
                        es.remove(c.path);
                    } else if (c.stat.getEphemeralOwner() == request.sessionId) {
                        es.add(c.path);
                    }
                }
                for (String path2Delete : es) {
                    if (digestEnabled) {
                        parentPath = getParentPathAndValidate(path2Delete);
                        parentRecord = getRecordForPath(parentPath);
                        parentRecord = parentRecord.duplicate(request.getHdr().getZxid());
                        parentRecord.stat.setPzxid(request.getHdr().getZxid());
                        parentRecord.precalculatedDigest = precalculateDigest(
                                DigestOpCode.UPDATE, parentPath, parentRecord.data, parentRecord.stat);
                        addChangeRecord(parentRecord);
                    }
                    nodeRecord = new ChangeRecord(
                            request.getHdr().getZxid(), path2Delete, null, 0, null);
                    nodeRecord.precalculatedDigest = precalculateDigest(
                            DigestOpCode.REMOVE, path2Delete);
                    addChangeRecord(nodeRecord);
                }
                if (ZooKeeperServer.isCloseSessionTxnEnabled()) {
                    request.setTxn(new CloseSessionTxn(new ArrayList<String>(es)));
                }
                zks.sessionTracker.setSessionClosing(request.sessionId);
            }
            ServerMetrics.getMetrics().CLOSE_SESSION_PREP_TIME.add(Time.currentElapsedTime() - startTime);
            break;
        case OpCode.check:
            zks.sessionTracker.checkSession(request.sessionId, request.getOwner());
            CheckVersionRequest checkVersionRequest = (CheckVersionRequest) record;
            if (deserialize) {
                ByteBufferInputStream.byteBuffer2Record(request.request, checkVersionRequest);
            }
            path = checkVersionRequest.getPath();
            validatePath(path, request.sessionId);
            nodeRecord = getRecordForPath(path);
            zks.checkACL(request.cnxn, nodeRecord.acl, ZooDefs.Perms.READ, request.authInfo, path, null);
            request.setTxn(new CheckVersionTxn(
                path,
                checkAndIncVersion(nodeRecord.stat.getVersion(), checkVersionRequest.getVersion(), path)));
            break;
        default:
            LOG.warn("unknown type {}", type);
            break;
        }

        // If the txn is not going to mutate anything, like createSession,
        // we just set the current tree digest in it
        if (request.getTxnDigest() == null && digestEnabled) {
            setTxnDigest(request);
        }
    }

从上面代码可以看出 pRequest2Txn() 方法主要做了权限校验、快照记录、事务信息记录相关的事,还并未涉及数据处理,也就是说 PrepRequestProcessor 其实是做了操作前权限校验、快照记录、事务信息记录相关的事。
我们DEBUG调试一次,看看业务处理流程是否和我们上面所分析的一致。
添加节点

create /zkdemo itheima

DEBUG测试如下:
客户端请求先经过 ZooKeeperServer.submitRequestNow() 方法,并调用firstProcessor.processRequest() 方法,而 firstProcessor = PrepRequestProcessor ,如下图:
在这里插入图片描述
进入 PrepRequestProcessor.pRequest() 方法,执行完 pRequestHelper() 方法后,开始执行下一个业务链的方法,而下一个业务链 nextProcessor = SyncRequestProcessor ,如下测试图:
在这里插入图片描述

4.3 SyncRequestProcessor剖析

分析了 PrepRequestProcessor 处理器后,接着来分析 SyncRequestProcessor ,该处理器主要是将请求数据高效率存入磁盘,并且请求在写入磁盘之前是不会被转发到下个处理器的。
我们先看请求被添加到队列的方法:
在这里插入图片描述
同样 SyncRequestProcessor 是一个线程,执行队列中的请求也在线程中触发,我们看它的run方法,
源码如下:

  @Override
    public void run() {
        try {
            // we do this in an attempt to ensure that not all of the servers
            // in the ensemble take a snapshot at the same time
            //避免所有的server都同时进行snapshot,重置是否需要snapshot判断相关的统计
            resetSnapshotStats();
            lastFlushTime = Time.currentElapsedTime();
            while (true) {
                ServerMetrics.getMetrics().SYNC_PROCESSOR_QUEUE_SIZE.add(queuedRequests.size());

                long pollTime = Math.min(zks.getMaxWriteQueuePollTime(), getRemainingDelay());
                //获取一个需要处理的请求
                Request si = queuedRequests.poll(pollTime, TimeUnit.MILLISECONDS);
                if (si == null) {
                    /* We timed out looking for more writes to batch, go ahead and flush immediately */
                    flush();
                    //阻塞方法获取一个请求
                    si = queuedRequests.take();
                }

                if (si == REQUEST_OF_DEATH) {
                    break;
                }

                long startProcessTime = Time.currentElapsedTime();
                ServerMetrics.getMetrics().SYNC_PROCESSOR_QUEUE_TIME.add(startProcessTime - si.syncQueueStartTime);

                // 跟踪写入日志的记录数量
                if (!si.isThrottled() && zks.getZKDatabase().append(si)) {
                    //shouldSnapshot():保存快照条件:根据logCount日志数量和logSize日志大小与snapCount快照的比较,具有随机性。
                    if (shouldSnapshot()) {
                        //重置是否需要snapshot判断相关的统计
                        resetSnapshotStats();
                        //重置自上次rollLog以来的txn数量
                        zks.getZKDatabase().rollLog();
                        // take a snapshot
                        if (!snapThreadMutex.tryAcquire()) {
                            LOG.warn("Too busy to snap, skipping");
                        } else {
                            new ZooKeeperThread("Snapshot Thread") {
                                //创建线程保存快照数据
                                public void run() {
                                    try {
                                        //保存快照数据
                                        zks.takeSnapshot();
                                    } catch (Exception e) {
                                        LOG.warn("Unexpected exception", e);
                                    } finally {
                                        snapThreadMutex.release();
                                    }
                                }
                            }.start();
                        }
                    }
                } else if (toFlush.isEmpty()) {
                    // optimization for read heavy workloads
                    // iff this is a read or a throttled request(which doesn't need to be written to the disk),
                    // and there are no pending flushes (writes), then just pass this to the next processor
                    if (nextProcessor != null) {
                        nextProcessor.processRequest(si);
                        if (nextProcessor instanceof Flushable) {
                            ((Flushable) nextProcessor).flush();
                        }
                    }
                    continue;
                }
                //将当前请求添加到toFlush队列中,toFlush队列是已经写入并等待刷新到磁盘的事务
                toFlush.add(si);
                if (shouldFlush()) {
                    //提交数据
                    flush();
                }
                ServerMetrics.getMetrics().SYNC_PROCESS_TIME.add(Time.currentElapsedTime() - startProcessTime);
            }
        } catch (Throwable t) {
            handleException(this.getName(), t);
        }
        LOG.info("SyncRequestProcessor exited!");
    }

run 方法会从 queuedRequests 队列中获取一个请求,如果获取不到就会阻塞等待直到获取到一个请求对象,程序才会继续往下执行,接下来会调用 Snapshot Thread 线程实现将客户端发送的数据以快照的方式写入磁盘,最终调用 flush() 方法实现数据提交, flush() 方法源码如下
在这里插入图片描述
flush() 方法实现了数据提交,并且会将请求交给下一个业务链,下一个业务链为FinalRequestProcessor 。

4.4 FinalRequestProcessor剖析

前面分析了 SyncReqeustProcessor ,接着分析请求处理链中最后的一个处理器FinalRequestProcessor ,该业务处理对象主要用于返回Response

  public void processRequest(Request request) {
        LOG.debug("Processing request:: {}", request);

        if (LOG.isTraceEnabled()) {
            long traceMask = ZooTrace.CLIENT_REQUEST_TRACE_MASK;
            if (request.type == OpCode.ping) {
                traceMask = ZooTrace.SERVER_PING_TRACE_MASK;
            }
            ZooTrace.logRequest(LOG, traceMask, 'E', request, "");
        }
        ProcessTxnResult rc = null;
        if (!request.isThrottled()) {
          rc = applyRequest(request);
        }
        if (request.cnxn == null) {
            return;
        }
        ServerCnxn cnxn = request.cnxn;

        long lastZxid = zks.getZKDatabase().getDataTreeLastProcessedZxid();

        String lastOp = "NA";
        // Notify ZooKeeperServer that the request has finished so that it can
        // update any request accounting/throttling limits
        zks.decInProcess();
        zks.requestFinished(request);
        Code err = Code.OK;
        Record rsp = null;
        String path = null;
        int responseSize = 0;
        try {
            if (request.getHdr() != null && request.getHdr().getType() == OpCode.error) {
                AuditHelper.addAuditLog(request, rc, true);
                /*
                 * When local session upgrading is disabled, leader will
                 * reject the ephemeral node creation due to session expire.
                 * However, if this is the follower that issue the request,
                 * it will have the correct error code, so we should use that
                 * and report to user
                 */
                if (request.getException() != null) {
                    throw request.getException();
                } else {
                    throw KeeperException.create(KeeperException.Code.get(((ErrorTxn) request.getTxn()).getErr()));
                }
            }

            KeeperException ke = request.getException();
            if (ke instanceof SessionMovedException) {
                throw ke;
            }
            if (ke != null && request.type != OpCode.multi) {
                throw ke;
            }

            LOG.debug("{}", request);

            if (request.isStale()) {
                ServerMetrics.getMetrics().STALE_REPLIES.add(1);
            }

            if (request.isThrottled()) {
              throw KeeperException.create(Code.THROTTLEDOP);
            }

            AuditHelper.addAuditLog(request, rc);

            switch (request.type) {
            case OpCode.ping: {
                lastOp = "PING";
                //设置响应状态
                updateStats(request, lastOp, lastZxid);
                //响应数据
                responseSize = cnxn.sendResponse(new ReplyHeader(ClientCnxn.PING_XID, lastZxid, 0), null, "response");
                return;
            }
            case OpCode.createSession: {
                lastOp = "SESS";
                //设置响应状态
                updateStats(request, lastOp, lastZxid);
                //响应数据并结束会话
                zks.finishSessionInit(request.cnxn, true);
                return;
            }
            case OpCode.multi: {
                // 多重操作
                lastOp = "MULT";
                rsp = new MultiResponse();

                // 遍历多重操作结果
                for (ProcessTxnResult subTxnResult : rc.multiResult) {

                    OpResult subResult;

                    switch (subTxnResult.type) {
                    case OpCode.check:
                        // 检查
                        subResult = new CheckResult();
                        break;
                    case OpCode.create:
                        // 创建
                        subResult = new CreateResult(subTxnResult.path);
                        break;
                    case OpCode.create2:
                    case OpCode.createTTL:
                    case OpCode.createContainer:
                        subResult = new CreateResult(subTxnResult.path, subTxnResult.stat);
                        break;
                    case OpCode.delete:
                    case OpCode.deleteContainer:
                        // 删除
                        subResult = new DeleteResult();
                        break;
                    case OpCode.setData:
                        // 设置数据
                        subResult = new SetDataResult(subTxnResult.stat);
                        break;
                    case OpCode.error:
                        // 错误
                        subResult = new ErrorResult(subTxnResult.err);
                        if (subTxnResult.err == Code.SESSIONMOVED.intValue()) {
                            throw new SessionMovedException();
                        }
                        break;
                    default:
                        throw new IOException("Invalid type of op");
                    }
                    // 添加至响应结果集中
                    ((MultiResponse) rsp).add(subResult);
                }

                break;
            }
            case OpCode.multiRead: {
                //多重读取操作,同上
                lastOp = "MLTR";
                MultiOperationRecord multiReadRecord = new MultiOperationRecord();
                ByteBufferInputStream.byteBuffer2Record(request.request, multiReadRecord);
                rsp = new MultiResponse();
                OpResult subResult;
                for (Op readOp : multiReadRecord) {
                    try {
                        Record rec;
                        switch (readOp.getType()) {
                        case OpCode.getChildren:
                            rec = handleGetChildrenRequest(readOp.toRequestRecord(), cnxn, request.authInfo);
                            subResult = new GetChildrenResult(((GetChildrenResponse) rec).getChildren());
                            break;
                        case OpCode.getData:
                            rec = handleGetDataRequest(readOp.toRequestRecord(), cnxn, request.authInfo);
                            GetDataResponse gdr = (GetDataResponse) rec;
                            subResult = new GetDataResult(gdr.getData(), gdr.getStat());
                            break;
                        default:
                            throw new IOException("Invalid type of readOp");
                        }
                    } catch (KeeperException e) {
                        subResult = new ErrorResult(e.code().intValue());
                    }
                    // 添加至响应结果集中
                    ((MultiResponse) rsp).add(subResult);
                }
                break;
            }
            case OpCode.create: {
                lastOp = "CREA";
                rsp = new CreateResponse(rc.path);
                err = Code.get(rc.err);
                requestPathMetricsCollector.registerRequest(request.type, rc.path);
                break;
            }
            case OpCode.create2:
            case OpCode.createTTL:
            case OpCode.createContainer: {
                lastOp = "CREA";
                rsp = new Create2Response(rc.path, rc.stat);
                err = Code.get(rc.err);
                requestPathMetricsCollector.registerRequest(request.type, rc.path);
                break;
            }
            case OpCode.delete:
            case OpCode.deleteContainer: {
                lastOp = "DELE";
                err = Code.get(rc.err);
                requestPathMetricsCollector.registerRequest(request.type, rc.path);
                break;
            }
            case OpCode.setData: {
                lastOp = "SETD";
                rsp = new SetDataResponse(rc.stat);
                err = Code.get(rc.err);
                requestPathMetricsCollector.registerRequest(request.type, rc.path);
                break;
            }
            case OpCode.reconfig: {
                lastOp = "RECO";
                rsp = new GetDataResponse(
                    ((QuorumZooKeeperServer) zks).self.getQuorumVerifier().toString().getBytes(UTF_8),
                    rc.stat);
                err = Code.get(rc.err);
                break;
            }
            case OpCode.setACL: {
                lastOp = "SETA";
                rsp = new SetACLResponse(rc.stat);
                err = Code.get(rc.err);
                requestPathMetricsCollector.registerRequest(request.type, rc.path);
                break;
            }
            case OpCode.closeSession: {
                lastOp = "CLOS";
                err = Code.get(rc.err);
                break;
            }
            case OpCode.sync: {
                lastOp = "SYNC";
                SyncRequest syncRequest = new SyncRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request, syncRequest);
                rsp = new SyncResponse(syncRequest.getPath());
                requestPathMetricsCollector.registerRequest(request.type, syncRequest.getPath());
                break;
            }
            case OpCode.check: {
                lastOp = "CHEC";
                rsp = new SetDataResponse(rc.stat);
                err = Code.get(rc.err);
                break;
            }
            case OpCode.exists: {
                lastOp = "EXIS";
                // TODO we need to figure out the security requirement for this!
                ExistsRequest existsRequest = new ExistsRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request, existsRequest);
                path = existsRequest.getPath();
                if (path.indexOf('\0') != -1) {
                    throw new KeeperException.BadArgumentsException();
                }
                Stat stat = zks.getZKDatabase().statNode(path, existsRequest.getWatch() ? cnxn : null);
                rsp = new ExistsResponse(stat);
                requestPathMetricsCollector.registerRequest(request.type, path);
                break;
            }
            case OpCode.getData: {
                lastOp = "GETD";
                GetDataRequest getDataRequest = new GetDataRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request, getDataRequest);
                path = getDataRequest.getPath();
                rsp = handleGetDataRequest(getDataRequest, cnxn, request.authInfo);
                requestPathMetricsCollector.registerRequest(request.type, path);
                break;
            }
            case OpCode.setWatches: {
                lastOp = "SETW";
                SetWatches setWatches = new SetWatches();
                // TODO we really should not need this
                request.request.rewind();
                ByteBufferInputStream.byteBuffer2Record(request.request, setWatches);
                long relativeZxid = setWatches.getRelativeZxid();
                zks.getZKDatabase()
                   .setWatches(
                       relativeZxid,
                       setWatches.getDataWatches(),
                       setWatches.getExistWatches(),
                       setWatches.getChildWatches(),
                       Collections.emptyList(),
                       Collections.emptyList(),
                       cnxn);
                break;
            }
            case OpCode.setWatches2: {
                lastOp = "STW2";
                SetWatches2 setWatches = new SetWatches2();
                // TODO we really should not need this
                request.request.rewind();
                ByteBufferInputStream.byteBuffer2Record(request.request, setWatches);
                long relativeZxid = setWatches.getRelativeZxid();
                zks.getZKDatabase().setWatches(relativeZxid,
                        setWatches.getDataWatches(),
                        setWatches.getExistWatches(),
                        setWatches.getChildWatches(),
                        setWatches.getPersistentWatches(),
                        setWatches.getPersistentRecursiveWatches(),
                        cnxn);
                break;
            }
            case OpCode.addWatch: {
                lastOp = "ADDW";
                AddWatchRequest addWatcherRequest = new AddWatchRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request,
                        addWatcherRequest);
                zks.getZKDatabase().addWatch(addWatcherRequest.getPath(), cnxn, addWatcherRequest.getMode());
                rsp = new ErrorResponse(0);
                break;
            }
            case OpCode.getACL: {
                lastOp = "GETA";
                GetACLRequest getACLRequest = new GetACLRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request, getACLRequest);
                path = getACLRequest.getPath();
                DataNode n = zks.getZKDatabase().getNode(path);
                if (n == null) {
                    throw new KeeperException.NoNodeException();
                }
                zks.checkACL(
                    request.cnxn,
                    zks.getZKDatabase().aclForNode(n),
                    ZooDefs.Perms.READ | ZooDefs.Perms.ADMIN, request.authInfo, path,
                    null);

                Stat stat = new Stat();
                List<ACL> acl = zks.getZKDatabase().getACL(path, stat);
                requestPathMetricsCollector.registerRequest(request.type, getACLRequest.getPath());

                try {
                    zks.checkACL(
                        request.cnxn,
                        zks.getZKDatabase().aclForNode(n),
                        ZooDefs.Perms.ADMIN,
                        request.authInfo,
                        path,
                        null);
                    rsp = new GetACLResponse(acl, stat);
                } catch (KeeperException.NoAuthException e) {
                    List<ACL> acl1 = new ArrayList<ACL>(acl.size());
                    for (ACL a : acl) {
                        if ("digest".equals(a.getId().getScheme())) {
                            Id id = a.getId();
                            Id id1 = new Id(id.getScheme(), id.getId().replaceAll(":.*", ":x"));
                            acl1.add(new ACL(a.getPerms(), id1));
                        } else {
                            acl1.add(a);
                        }
                    }
                    rsp = new GetACLResponse(acl1, stat);
                }
                break;
            }
            case OpCode.getChildren: {
                lastOp = "GETC";
                GetChildrenRequest getChildrenRequest = new GetChildrenRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request, getChildrenRequest);
                path = getChildrenRequest.getPath();
                rsp = handleGetChildrenRequest(getChildrenRequest, cnxn, request.authInfo);
                requestPathMetricsCollector.registerRequest(request.type, path);
                break;
            }
            case OpCode.getAllChildrenNumber: {
                lastOp = "GETACN";
                GetAllChildrenNumberRequest getAllChildrenNumberRequest = new GetAllChildrenNumberRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request, getAllChildrenNumberRequest);
                path = getAllChildrenNumberRequest.getPath();
                DataNode n = zks.getZKDatabase().getNode(path);
                if (n == null) {
                    throw new KeeperException.NoNodeException();
                }
                zks.checkACL(
                    request.cnxn,
                    zks.getZKDatabase().aclForNode(n),
                    ZooDefs.Perms.READ,
                    request.authInfo,
                    path,
                    null);
                int number = zks.getZKDatabase().getAllChildrenNumber(path);
                rsp = new GetAllChildrenNumberResponse(number);
                break;
            }
            case OpCode.getChildren2: {
                lastOp = "GETC";
                GetChildren2Request getChildren2Request = new GetChildren2Request();
                ByteBufferInputStream.byteBuffer2Record(request.request, getChildren2Request);
                Stat stat = new Stat();
                path = getChildren2Request.getPath();
                DataNode n = zks.getZKDatabase().getNode(path);
                if (n == null) {
                    throw new KeeperException.NoNodeException();
                }
                zks.checkACL(
                    request.cnxn,
                    zks.getZKDatabase().aclForNode(n),
                    ZooDefs.Perms.READ,
                    request.authInfo, path,
                    null);
                List<String> children = zks.getZKDatabase()
                                           .getChildren(path, stat, getChildren2Request.getWatch() ? cnxn : null);
                rsp = new GetChildren2Response(children, stat);
                requestPathMetricsCollector.registerRequest(request.type, path);
                break;
            }
            case OpCode.checkWatches: {
                lastOp = "CHKW";
                CheckWatchesRequest checkWatches = new CheckWatchesRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request, checkWatches);
                WatcherType type = WatcherType.fromInt(checkWatches.getType());
                path = checkWatches.getPath();
                boolean containsWatcher = zks.getZKDatabase().containsWatcher(path, type, cnxn);
                if (!containsWatcher) {
                    String msg = String.format(Locale.ENGLISH, "%s (type: %s)", path, type);
                    throw new KeeperException.NoWatcherException(msg);
                }
                requestPathMetricsCollector.registerRequest(request.type, checkWatches.getPath());
                break;
            }
            case OpCode.removeWatches: {
                lastOp = "REMW";
                RemoveWatchesRequest removeWatches = new RemoveWatchesRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request, removeWatches);
                WatcherType type = WatcherType.fromInt(removeWatches.getType());
                path = removeWatches.getPath();
                boolean removed = zks.getZKDatabase().removeWatch(path, type, cnxn);
                if (!removed) {
                    String msg = String.format(Locale.ENGLISH, "%s (type: %s)", path, type);
                    throw new KeeperException.NoWatcherException(msg);
                }
                requestPathMetricsCollector.registerRequest(request.type, removeWatches.getPath());
                break;
            }
            case OpCode.whoAmI: {
                lastOp = "HOMI";
                rsp = new WhoAmIResponse(AuthUtil.getClientInfos(request.authInfo));
                break;
             }
            case OpCode.getEphemerals: {
                lastOp = "GETE";
                GetEphemeralsRequest getEphemerals = new GetEphemeralsRequest();
                ByteBufferInputStream.byteBuffer2Record(request.request, getEphemerals);
                String prefixPath = getEphemerals.getPrefixPath();
                Set<String> allEphems = zks.getZKDatabase().getDataTree().getEphemerals(request.sessionId);
                List<String> ephemerals = new ArrayList<>();
                if (prefixPath == null || prefixPath.trim().isEmpty() || "/".equals(prefixPath.trim())) {
                    ephemerals.addAll(allEphems);
                } else {
                    for (String p : allEphems) {
                        if (p.startsWith(prefixPath)) {
                            ephemerals.add(p);
                        }
                    }
                }
                rsp = new GetEphemeralsResponse(ephemerals);
                break;
            }
            }
        } catch (SessionMovedException e) {
            // session moved is a connection level error, we need to tear
            // down the connection otw ZOOKEEPER-710 might happen
            // ie client on slow follower starts to renew session, fails
            // before this completes, then tries the fast follower (leader)
            // and is successful, however the initial renew is then
            // successfully fwd/processed by the leader and as a result
            // the client and leader disagree on where the client is most
            // recently attached (and therefore invalid SESSION MOVED generated)
            cnxn.sendCloseSession();
            return;
        } catch (KeeperException e) {
            err = e.code();
        } catch (Exception e) {
            // log at error level as we are returning a marshalling
            // error to the user
            LOG.error("Failed to process {}", request, e);
            StringBuilder sb = new StringBuilder();
            ByteBuffer bb = request.request;
            bb.rewind();
            while (bb.hasRemaining()) {
                sb.append(Integer.toHexString(bb.get() & 0xff));
            }
            LOG.error("Dumping request buffer: 0x{}", sb.toString());
            err = Code.MARSHALLINGERROR;
        }

        ReplyHeader hdr = new ReplyHeader(request.cxid, lastZxid, err.intValue());

        updateStats(request, lastOp, lastZxid);

        try {
            if (path == null || rsp == null) {
                responseSize = cnxn.sendResponse(hdr, rsp, "response");
            } else {
                int opCode = request.type;
                Stat stat = null;
                // Serialized read and get children responses could be cached by the connection
                // object. Cache entries are identified by their path and last modified zxid,
                // so these values are passed along with the response.
                switch (opCode) {
                    case OpCode.getData : {
                        GetDataResponse getDataResponse = (GetDataResponse) rsp;
                        stat = getDataResponse.getStat();
                        responseSize = cnxn.sendResponse(hdr, rsp, "response", path, stat, opCode);
                        break;
                    }
                    case OpCode.getChildren2 : {
                        GetChildren2Response getChildren2Response = (GetChildren2Response) rsp;
                        stat = getChildren2Response.getStat();
                        responseSize = cnxn.sendResponse(hdr, rsp, "response", path, stat, opCode);
                        break;
                    }
                    default:
                        responseSize = cnxn.sendResponse(hdr, rsp, "response");
                }
            }

            if (request.type == OpCode.closeSession) {
                cnxn.sendCloseSession();
            }
        } catch (IOException e) {
            LOG.error("FIXMSG", e);
        } finally {
            ServerMetrics.getMetrics().RESPONSE_BYTES.add(responseSize);
        }
    }

4.5 ZK业务链处理优劣总结

Zookeeper业务链处理,思想遵循了AOP思想,但并未采用相关技术,为了提升效率,仍然大幅使用到了多线程。正因为有了业务链路处理先后顺序,使得Zookeeper业务处理流程更清晰更容易理解,但大量混入了多线程,也使得学习成本增加。

  • 0
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值