1) How atomicity of operations on node data is guaranteed
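As mentioned in an earlier post, every data node (znode) carries its own attribute information, including three important version fields:
dataVersion | version of this znode's data content; incremented by 1 each time the content is modified; starts at 0 |
cversion | version for changes to this znode's set of children (a child created or deleted); changes to a child's content are not counted |
aclVersion | version for changes to this znode's ACL |
When a client operation on a node specifies a version number, ZK first checks internally whether it equals the node's current version. Note that a version of -1 means the client does not want a version comparison. The version number is in effect the familiar optimistic lock.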
[zk: 192.168.207.128:2181(CONNECTED) 4] create /zookeeper/test content1
Created /zookeeper/test
[zk: 192.168.207.128:2181(CONNECTED) 5] get /zookeeper/test
content1
cZxid = 0x2
ctime = Wed Mar 04 11:55:26 CST 2019
mZxid = 0x2
mtime = Wed Mar 04 11:55:26 CST 2019
pZxid = 0x2
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 8
numChildren = 0
[zk: 192.168.207.128:2181(CONNECTED) 4] set /zookeeper/test contest2 1
WATCHER::
WatchedEvent state:SyncConnected type:NodeDataChanged path:/zookeeper/test
......
[zk: 192.168.207.128:2181(CONNECTED) 5] set /zookeeper/test contest2 -1
cZxid = 0x2
......
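To make the optimistic-lock comparison concrete, here is a minimal in-memory sketch in Java. It is a toy model, not ZooKeeper's implementation: the class VersionedNode and its methods are hypothetical names, and a real server reports a version mismatch as a BadVersion error instead of returning false.

```java
import java.nio.charset.StandardCharsets;

/** Toy stand-in for a znode (hypothetical class, not ZooKeeper code). */
class VersionedNode {
    private byte[] data;
    private int dataVersion = 0; // starts at 0, +1 on every successful write

    VersionedNode(String initial) {
        this.data = initial.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Writes newData only if expectedVersion equals the current version,
     * or if expectedVersion is -1 (no comparison requested).
     * Returns true when the write was applied.
     */
    synchronized boolean setData(String newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != dataVersion) {
            return false; // stale version: the write is rejected
        }
        data = newData.getBytes(StandardCharsets.UTF_8);
        dataVersion++;
        return true;
    }

    synchronized int getDataVersion() {
        return dataVersion;
    }

    synchronized String getData() {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        VersionedNode node = new VersionedNode("content1");
        System.out.println(node.setData("content2", 1));  // version is 0, so rejected
        System.out.println(node.setData("content2", 0));  // matches, applied
        System.out.println(node.setData("content3", -1)); // -1 skips the check
        System.out.println(node.getDataVersion());
    }
}
```

Two clients that both read version 0 and both try to write with version 0 cannot both succeed: the first write bumps the version to 1, so the second is rejected and has to re-read before retrying.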
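2) How the Watcher mechanism works, with a walkthrough of the relevant source
The notification states and event types of a Watcher can be read directly from the source of the Watcher interface, which defines both: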
public interface Watcher {
public interface Event {
public enum KeeperState {
@Deprecated
Unknown (-1),
Disconnected (0), // disconnected
@Deprecated
NoSyncConnected (1),
SyncConnected (3), // connected
Expired (-112); // session expired
......
}
public enum EventType {
None (-1), // meaning depends on the state, e.g. connection established or connection lost
NodeCreated (1), // node created
NodeDeleted (2), // node deleted
NodeDataChanged (3), // node data changed
NodeChildrenChanged (4); // the node's list of children changed
......
}
}
abstract public void process(WatchedEvent event); // callback invoked when an event is delivered
}
Watcher registration methods and registration flow
With the native ZK client, a watcher can be registered in four ways:
new ZooKeeper() | public ZooKeeper(String connectString, int sessionTimeout, Watcher watcher) |
zk.exists(...) | sync: public Stat exists(final String path, Watcher watcher); async: public void exists(String path, boolean watch, StatCallback cb, Object ctx) |
zk.getChildren(...) | sync: public List<String> getChildren(final String path, Watcher watcher); async: public void getChildren(final String path, Watcher watcher, ChildrenCallback cb, Object ctx) |
zk.getData(...) | sync: public byte[] getData(final String path, Watcher watcher, Stat stat); async: public void getData(final String path, Watcher watcher, DataCallback cb, Object ctx) |
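Taking exists as the example, here is a brief explanation with the relevant code; the other methods register in the same way. In the client's exists method, the path and the Watcher registered on it are wrapped in a WatchRegistration object, which is in turn wrapped with other information in a Packet object (Packet is the unit of client/server communication; every request is converted into a Packet). Under synchronization, the Packet is appended to the LinkedList<Packet> outgoingQueue to wait for SendThread to send it. SendThread keeps looping while zooKeeper.state.isAlive() holds, and queuePacket also wakes the selector, so a newly added Packet is normally noticed promptly and sent to the server. Note, however, that although the WatchRegistration is stored in the Packet, it is not serialized into the underlying byte array and therefore never crosses the network; only the request header and the request record (which carries a boolean watch flag) are serialized.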
After the request has been sent, readResponse() in SendThread receives the server's response, and finishPacket finally hands the Watcher over to ZKWatchManager.
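Part of the code follows. It also shows why requests are said to be strictly ordered: every client request is first placed in a queue, and enqueueing is done under a lock, so requests issued by the same client are executed strictly in the order they were issued.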
public class ZooKeeper {
...
// The ZooKeeper client holds a ZKWatchManager instance that manages its Watchers
private final ZKWatchManager watchManager = new ZKWatchManager();
private static class ZKWatchManager implements ClientWatchManager {
private final Map<String, Set<Watcher>> dataWatches = new HashMap<String,Set<Watcher>>();
private final Map<String, Set<Watcher>> existWatches = new HashMap<String, Set<Watcher>>();
private final Map<String, Set<Watcher>> childWatches = new HashMap<String, Set<Watcher>>();
...
}
// Creating a ZooKeeper object starts the ClientCnxn's SendThread and EventThread
public ZooKeeper(String connectString, int sessionTimeout, Watcher watcher,
long sessionId, byte[] sessionPasswd)
throws IOException{
...
watchManager.defaultWatcher = watcher;
cnxn = new ClientCnxn(connectString, sessionTimeout, this, watchManager,
sessionId, sessionPasswd);
cnxn.start(); // start its SendThread and EventThread
}
public Stat exists(final String path, Watcher watcher)
throws KeeperException, InterruptedException {
...
// The WatchRegistration wraps the watcher object and is eventually stored in the Packet object.
WatchRegistration wcb = null;
if (watcher != null) {
wcb = new ExistsWatchRegistration(watcher, clientPath);
}
final String serverPath = prependChroot(clientPath);
RequestHeader h = new RequestHeader();
h.setType(ZooDefs.OpCode.exists);
ExistsRequest request = new ExistsRequest();
request.setPath(serverPath);
request.setWatch(watcher != null);
SetDataResponse response = new SetDataResponse();
ReplyHeader r = cnxn.submitRequest(h, request, response, wcb); // submit the request
...
}
}
public class ClientCnxn {
...
public void start() {
sendThread.start();
eventThread.start();
}
public ReplyHeader submitRequest(RequestHeader h, Record request,
Record response, WatchRegistration watchRegistration)
throws InterruptedException {
ReplyHeader r = new ReplyHeader();
Packet packet = queuePacket(h, r, request, response, null, null, null,
null, watchRegistration);
synchronized (packet) {
while (!packet.finished) {
packet.wait();
}
}
return r;
}
// Build the Packet, enqueue it, and let SendThread send it
Packet queuePacket(RequestHeader h, ReplyHeader r, Record request,
Record response, AsyncCallback cb, String clientPath,
String serverPath, Object ctx, WatchRegistration watchRegistration)
{
Packet packet = null;
// Enqueueing is synchronized because LinkedList itself is not thread-safe
synchronized (outgoingQueue) {
if (h.getType() != OpCode.ping && h.getType() != OpCode.auth) {
h.setXid(getXid());
}
// The Packet wraps the WatchRegistration object
packet = new Packet(h, r, request, response, null,
watchRegistration);
packet.cb = cb;
packet.ctx = ctx;
packet.clientPath = clientPath;
packet.serverPath = serverPath;
if (!zooKeeper.state.isAlive()) {
conLossPacket(packet);
} else {
outgoingQueue.add(packet);
}
}
synchronized (sendThread) {
selector.wakeup();
}
return packet;
}
// As shown here, although the WatchRegistration is stored in the Packet, it is never serialized into the underlying byte array, so it is not sent over the network
Packet(RequestHeader header, ReplyHeader replyHeader, Record record,
Record response, ByteBuffer bb,
WatchRegistration watchRegistration) {
this.header = header;
this.replyHeader = replyHeader;
this.request = record;
this.response = response;
if (bb != null) {
this.bb = bb;
} else {
try {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
BinaryOutputArchive boa = BinaryOutputArchive
.getArchive(baos);
boa.writeInt(-1, "len"); // We'll fill this in later
header.serialize(boa, "header");
if (record != null) {
record.serialize(boa, "request");
}
baos.close();
this.bb = ByteBuffer.wrap(baos.toByteArray());
this.bb.putInt(this.bb.capacity() - 4);
this.bb.rewind();
} catch (IOException e) {
LOG.warn("Ignoring unexpected exception", e);
}
}
this.watchRegistration = watchRegistration;
}
class SendThread extends Thread {
...
@Override
public void run() {
try {
if (sockKey == null) {
startConnect(); // register with the Selector and connect to the ZK server
}
}
...
while (zooKeeper.state.isAlive()) {
...
doIO(); // handle network send and receive I/O events
...
}
}
private void startConnect() throws IOException {
...
SocketChannel sock;
sock = SocketChannel.open();
sock.configureBlocking(false);
sock.socket().setSoLinger(false, -1);
sock.socket().setTcpNoDelay(true);
setName(getName().replaceAll("\\(.*\\)",
"(" + addr.getHostName() + ":" + addr.getPort() + ")"));
sockKey = sock.register(selector, SelectionKey.OP_CONNECT); // register with the selector
if (sock.connect(addr)) {
primeConnection(sockKey);
}
...
}
boolean doIO() throws InterruptedException, IOException {
boolean packetReceived = false;
SocketChannel sock = (SocketChannel) sockKey.channel();
if (sock == null) {
throw new IOException("Socket is null!");
}
if (sockKey.isReadable()) { // read data
int rc = sock.read(incomingBuffer);
...
readResponse();
...
}
if (sockKey.isWritable()) {
synchronized (outgoingQueue) {
if (!outgoingQueue.isEmpty()) { // the queue is not empty, so write
ByteBuffer pbb = outgoingQueue.getFirst().bb;
sock.write(pbb);
if (!pbb.hasRemaining()) {
sentCount++;
Packet p = outgoingQueue.removeFirst();
if (p.header != null
&& p.header.getType() != OpCode.ping
&& p.header.getType() != OpCode.auth) {
pendingQueue.add(p);
}
}
}
}
}
...
}
void readResponse() throws IOException {
...
} finally {
finishPacket(packet);
}
}
private void finishPacket(Packet p) {
if (p.watchRegistration != null) {
p.watchRegistration.register(p.replyHeader.getErr());
}
...
}
}
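The Packet constructor above uses a common framing trick: write a placeholder length, serialize the payload, then go back and overwrite the placeholder. The sketch below reproduces that trick in a self-contained form and shows that the watcher never enters the byte buffer. It is only an illustration under assumptions: PacketSketch, its field layout, and the use of DataOutputStream in place of ZooKeeper's BinaryOutputArchive are all stand-ins, not the real wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified Packet: only header and request are serialized. */
class PacketSketch {
    final ByteBuffer bb;         // bytes that would go over the wire
    final Runnable localWatcher; // kept on the client only, never serialized

    PacketSketch(int xid, int type, String path, Runnable localWatcher) {
        // The server only learns *that* a watch is wanted, via a boolean flag.
        this.bb = frame(xid, type, path, localWatcher != null);
        this.localWatcher = localWatcher;
    }

    static ByteBuffer frame(int xid, int type, String path, boolean watch) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            out.writeInt(-1);       // length placeholder, filled in below
            out.writeInt(xid);      // simplified "header"
            out.writeInt(type);
            byte[] p = path.getBytes(StandardCharsets.UTF_8);
            out.writeInt(p.length); // simplified "request": path ...
            out.write(p);
            out.writeBoolean(watch); // ... plus the watch flag
            out.flush();
            ByteBuffer buf = ByteBuffer.wrap(baos.toByteArray());
            buf.putInt(buf.capacity() - 4); // overwrite the placeholder
            buf.rewind();                   // ready to be written from position 0
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PacketSketch p = new PacketSketch(1, 3, "/zookeeper/test",
                () -> System.out.println("watch fired"));
        System.out.println("frame bytes: " + p.bb.capacity());
        System.out.println("length field: " + p.bb.getInt(0));
    }
}
```

For the 15-byte path used here the frame is 32 bytes and the length field holds 28, that is, everything after the 4-byte length prefix.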
Server-side Watcher handling flow
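The above raises two questions. First, if the Watcher is not transmitted, how is the watch registered? Second, why is the Watcher handed to ZKWatchManager? On the server, "registering a Watcher" does not mean registering the Watcher object itself. While processing a request, the server checks whether the request asks for a watch; if it does, the server binds the current connection object (the object that represents the client-server connection: an NIOServerCnxn by default, or a NettyServerCnxn when the Netty connection factory is configured) to the node path and stores the pair in the server-side WatchManager. The Watcher object itself stays on the client, which is why the client keeps it in ZKWatchManager: that is where the callback is looked up when the notification comes back.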
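When a client's transaction triggers an event on a node, the server uses its WatchManager to find the connections registered on that node (they are found and removed, so a watch is one-shot). It wraps the node path, notification state, and event type in a WatchedEvent, and converts that to a WatcherEvent just before transmission. Note that WatchedEvent is only a plain wrapper class, whereas WatcherEvent supports serialization; that is why the conversion step exists.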
public class FinalRequestProcessor implements RequestProcessor {
...
public void processRequest(Request request) {
...
// represents the connection between the client and the server
ServerCnxn cnxn = request.cnxn;
...
try {
...
switch (request.type) {
...
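// if this is a watch registration request (setWatches is what the client sends to re-register its watches, e.g. after a reconnect)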
case OpCode.setWatches: {
...
// these end up being added to the WatchManager
zks.getZKDatabase().setWatches(relativeZxid,
setWatches.getDataWatches(),
setWatches.getExistWatches(),
setWatches.getChildWatches(), cnxn);
break;
}
}
...
try {
// send the response
cnxn.sendResponse(hdr, rsp, "response");
...
} catch (IOException e) {
...
}
}
}
public class WatchManager {
private static final Logger LOG = Logger.getLogger(WatchManager.class);
// key: node path; value: the set of Watchers
private final HashMap<String, HashSet<Watcher>> watchTable =
new HashMap<String, HashSet<Watcher>>();
// key: Watcher; value: the set of node paths
private final HashMap<Watcher, HashSet<String>> watch2Paths =
new HashMap<Watcher, HashSet<String>>();
public synchronized void addWatch(String path, Watcher watcher) {
HashSet<Watcher> list = watchTable.get(path);
if (list == null) {
list = new HashSet<Watcher>(4);
watchTable.put(path, list);
}
list.add(watcher);
HashSet<String> paths = watch2Paths.get(watcher);
if (paths == null) {
paths = new HashSet<String>();
watch2Paths.put(watcher, paths);
}
paths.add(path);
}
public Set<Watcher> triggerWatch(String path, EventType type, Set<Watcher> supress) {
// wrap the node path, notification state, and event type in a WatchedEvent
WatchedEvent e = new WatchedEvent(type,
KeeperState.SyncConnected, path);
HashSet<Watcher> watchers;
synchronized (this) {
// fetch and remove (watches are one-shot)
watchers = watchTable.remove(path);
if (watchers == null || watchers.isEmpty()) {
if (LOG.isTraceEnabled()) {
ZooTrace.logTraceMessage(LOG,
ZooTrace.EVENT_DELIVERY_TRACE_MASK,
"No watchers for " + path);
}
return null;
}
for (Watcher w : watchers) {
HashSet<String> paths = watch2Paths.get(w);
if (paths != null) {
paths.remove(path);
}
}
}
for (Watcher w : watchers) {
if (supress != null && supress.contains(w)) {
continue;
}
w.process(e); // invoke the watcher's process method
}
return watchers;
}
}
// NIOServerCnxn implements the Watcher interface
public class NIOServerCnxn implements Watcher, ServerCnxn {
...
synchronized public void process(WatchedEvent event) {
ReplyHeader h = new ReplyHeader(-1, -1L, 0);
if (LOG.isTraceEnabled()) {
ZooTrace.logTraceMessage(LOG, ZooTrace.EVENT_DELIVERY_TRACE_MASK,
"Deliver event " + event + " to 0x"
+ Long.toHexString(this.sessionId)
+ " through " + this);
}
WatcherEvent e = event.getWrapper();
sendResponse(h, e, "notification"); // the WatchedEvent has been wrapped in a WatcherEvent transport object, which is sent here
}
}
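The one-shot behaviour of the server-side table can be checked with a small runnable model. This is a sketch under assumptions, not the real WatchManager: OneShotWatchTable is a hypothetical name, and plain strings stand in for the connection objects.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a path <-> connection table with one-shot triggers. */
class OneShotWatchTable {
    // node path -> ids of the connections watching it
    private final Map<String, Set<String>> watchTable = new HashMap<>();
    // connection id -> node paths it watches
    private final Map<String, Set<String>> watch2Paths = new HashMap<>();

    synchronized void addWatch(String path, String conn) {
        watchTable.computeIfAbsent(path, k -> new HashSet<>()).add(conn);
        watch2Paths.computeIfAbsent(conn, k -> new HashSet<>()).add(path);
    }

    /** Returns the connections to notify and removes them from both maps. */
    synchronized Set<String> triggerWatch(String path) {
        Set<String> conns = watchTable.remove(path);
        if (conns == null) {
            return Collections.emptySet();
        }
        for (String conn : conns) {
            Set<String> paths = watch2Paths.get(conn);
            if (paths != null) {
                paths.remove(path);
            }
        }
        return conns;
    }

    public static void main(String[] args) {
        OneShotWatchTable table = new OneShotWatchTable();
        table.addWatch("/zookeeper/test", "conn-1");
        System.out.println(table.triggerWatch("/zookeeper/test")); // conn-1 is notified
        System.out.println(table.triggerWatch("/zookeeper/test")); // nobody is left
    }
}
```

A second change to the same node notifies nobody unless the client has registered again in the meantime, which is why client code usually re-registers inside its callback.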
Client-side Watcher handling flow
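Once the server has sent the event object to the client over the connection, it is read, as mentioned earlier, in readResponse() on the client's SendThread. The deserialized event is converted back into a WatchedEvent and handed to the EventThread through queueEvent(), and the EventThread consumes it. Based on the event type, the matching Watchers are fetched and removed from the corresponding collection in ZKWatchManager, and their process() callbacks are then invoked. This shows that a registration is one-shot on the client side as well.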
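The hand-off between the receiving thread and the event thread is a producer/consumer queue. The sketch below models it with a LinkedBlockingQueue and a single consumer thread; ClientWatchSketch, its methods, and the use of Consumer<String> as a watcher are hypothetical simplifications, not the ClientCnxn code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical model of client-side watch dispatch. */
class ClientWatchSketch {
    // path -> locally stored watchers (the role ZKWatchManager plays)
    private final Map<String, Set<Consumer<String>>> dataWatches = new HashMap<>();
    private final BlockingQueue<Runnable> waitingEvents = new LinkedBlockingQueue<>();

    ClientWatchSketch() {
        // Single consumer: callbacks run one at a time, in arrival order.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    waitingEvents.take().run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "event-thread-sketch");
        consumer.setDaemon(true);
        consumer.start();
    }

    synchronized void register(String path, Consumer<String> watcher) {
        dataWatches.computeIfAbsent(path, k -> new HashSet<>()).add(watcher);
    }

    /** Producer side: called when a notification for path arrives. */
    void queueEvent(String path) {
        Set<Consumer<String>> watchers;
        synchronized (this) {
            watchers = dataWatches.remove(path); // one-shot: removed on delivery
        }
        if (watchers == null) {
            return;
        }
        for (Consumer<String> w : watchers) {
            waitingEvents.add(() -> w.accept(path));
        }
    }

    /** Blocks until everything queued so far has been processed. */
    void flush() {
        CountDownLatch done = new CountDownLatch(1);
        waitingEvents.add(done::countDown);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Registers one watcher, delivers two events, returns the callback count. */
    static int demo() {
        ClientWatchSketch client = new ClientWatchSketch();
        AtomicInteger calls = new AtomicInteger();
        client.register("/zookeeper/test", p -> calls.incrementAndGet());
        client.queueEvent("/zookeeper/test"); // watcher fires
        client.queueEvent("/zookeeper/test"); // no watcher left for this path
        client.flush();
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running callbacks on a dedicated thread keeps a slow watcher from blocking network reads, at the cost that callbacks are delivered sequentially.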
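Summary of the key points, plus some personal observations
- Every client request is wrapped in a Packet object that holds the header, the node path, the watcher, and other information, but the watcher is not serialized or sent to the server.
- The client's strict ordering comes from synchronizing on the queue when Packets are enqueued. But why not use a concurrent queue here?
- Put another way, "registering a watch" means the server's WatchManager maintains a mapping between nodes and connection objects.
- Although a watch is one-shot, that does not mean the server disconnects from the client after removing it; only the mapping above is removed.
- Note the difference between WatcherEvent and WatchedEvent.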
3) Basic configuration (separate post)
4) Leader election (separate post)
5) Data and storage (separate post)