Zookeeper的视图结构和标准的Unix文件系统非常类似,但没有引入传统文件系统中目录和文件等相关概念,而是使用了其特有的"数据节点"概念,我们称之为ZNode(对应java中的DataNode),每个ZNode上都可以保存数据,同时还挂载子节点,是层次化的,用树表示,zookeeper的数据结构如下图
通过get可以得到节点的信息,如下图
在Java中对应如下代码:定义一个parent指向父节点,子节点用Set<String>children表示
public class DataNode implements Record {
/** the parent of this datanode */
DataNode parent;
/** the data for this datanode */
byte data[];
/**
* the acl map long for this datanode. the datatree has the map
*/
Long acl;
/**
* the stat for this node that is persisted to disk.
*/
public StatPersisted stat;
/**
* the list of children for this node. note that the list of children string
* does not contain the parent path -- just the last part of the path. This
* should be synchronized on except deserializing (for speed up issues).
*/
private Set<String> children = null;
/**
* default constructor for the datanode
*/
DataNode() {
// default constructor
}
/*
*省略部分代码
*/
}
StatePersisted可以看到额外的信息:
public class StatPersisted implements Record {
private long czxid;
private long mzxid;
private long ctime;
private long mtime;
private int version;
private int cversion;
private int aversion;
private long ephemeralOwner;
private long pzxid;
public StatPersisted() {
}
/*
* 省略部分代码
*/
}
StatePersisted属性解释如下表(来源:从Paxos到Zookeeper):
ZKDatabase
/**
* This class maintains the in memory database of zookeeper
* server states that includes the sessions, datatree and the
* committed logs. It is booted up after reading the logs
* and snapshots from the disk.
*/
public class ZKDatabase {
private static final Logger LOG = LoggerFactory.getLogger(ZKDatabase.class);
/**
* make sure on a clear you take care of
* all these members.
*/
protected DataTree dataTree;
protected FileTxnSnapLog snapLog;
protected long minCommittedLog, maxCommittedLog;
protected LinkedList<Proposal> committedLog = new LinkedList<Proposal>();
}
ZKDatabase 这个类在内存中维护了zookeeper的一些状态信息,譬如会话、datatree和已提交日志信息。启动的时候读取磁盘上的日志和快照,把相关信息load到内存里面。
ZKDatabase 里面有多个重要元素组成:DataTree
DataTree
/**
* This class maintains the tree data structure. It doesn't have any networking
* or client connection code in it so that it can be tested in a stand alone
* way.
* <p>
* The tree maintains two parallel data structures: a hashtable that maps from
* full paths to DataNodes and a tree of DataNodes. All accesses to a path is
* through the hashtable. The tree is traversed only when serializing to disk.
*/
public class DataTree {
private static final Logger LOG = LoggerFactory.getLogger(DataTree.class);
/**
* This hashtable provides a fast lookup to the datanodes. The tree is the
* source of truth and is where all the locking occurs
*/
private final ConcurrentHashMap<String, DataNode> nodes =
new ConcurrentHashMap<String, DataNode>();
private final WatchManager dataWatches = new WatchManager();
private final WatchManager childWatches = new WatchManager();
/** the root of zookeeper tree */
private static final String rootZookeeper = "/";
/** the zookeeper nodes that acts as the management and status node **/
private static final String procZookeeper = Quotas.procZookeeper;
/** this will be the string thats stored as a child of root */
private static final String procChildZookeeper = procZookeeper.substring(1);
/**
* the zookeeper quota node that acts as the quota management node for
* zookeeper
*/
private static final String quotaZookeeper = Quotas.quotaZookeeper;
/** this will be the string thats stored as a child of /zookeeper */
private static final String quotaChildZookeeper = quotaZookeeper
.substring(procZookeeper.length() + 1);
/**
* the path trie that keeps track fo the quota nodes in this datatree
*/
private final PathTrie pTrie = new PathTrie();
/**
* This hashtable lists the paths of the ephemeral nodes of a session.
*/
private final Map<Long, HashSet<String>> ephemerals =
new ConcurrentHashMap<Long, HashSet<String>>();
//......
}