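ZooKeeper's data model is a hierarchical namespace, similar to a file system. Each node in the ZooKeeper tree is called a znode, and every znode is identified by an absolute, slash-separated path that starts with "/".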
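To make the path rules concrete, here is a minimal sketch that checks a znode path and lists its ancestors. The class ZnodePaths and its methods are hypothetical, and the validation is a simplified subset: ZooKeeper's own path check is stricter (for example, it also rejects "." and ".." segments).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating znode path rules (a simplified subset of
// ZooKeeper's real validation, which is stricter).
class ZnodePaths {
    /** Returns true if path is an absolute, "/"-separated znode path. */
    static boolean isValid(String path) {
        if (path == null || !path.startsWith("/")) return false; // must be absolute
        if (path.equals("/")) return true;                       // the root znode
        if (path.endsWith("/")) return false;                    // no trailing slash
        return !path.contains("//");                             // no empty segments
    }

    /** Lists the ancestors of a valid path, from the root down to the path itself. */
    static List<String> lineage(String path) {
        List<String> result = new ArrayList<>();
        result.add("/");
        int idx = path.indexOf('/', 1);
        while (idx != -1) {
            result.add(path.substring(0, idx));
            idx = path.indexOf('/', idx + 1);
        }
        if (!path.equals("/")) result.add(path);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(isValid("/app/config/db")); // true
        System.out.println(isValid("app/config"));     // false: not absolute
        System.out.println(lineage("/app/config/db")); // [/, /app, /app/config, /app/config/db]
    }
}
```

Each entry returned by lineage is itself a znode path, which is exactly the parent chain a DataNode's parent references walk back up.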
Data structures and how the data is organized:
ZKDatabase
```java
/**
 * This class maintains the in memory database of zookeeper
 * server states that includes the sessions, datatree and the
 * committed logs. It is booted up after reading the logs
 * and snapshots from the disk.
 */
public class ZKDatabase {
    private static final Logger LOG = LoggerFactory.getLogger(ZKDatabase.class);
    /**
     * make sure on a clear you take care of
     * all these members.
     */
    protected DataTree dataTree;
    protected FileTxnSnapLog snapLog;
    protected long minCommittedLog, maxCommittedLog;
    protected LinkedList<Proposal> committedLog = new LinkedList<Proposal>();
    // ... (remaining members and methods omitted)
}
```
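The ZKDatabase class holds ZooKeeper's server state in memory: the sessions, the DataTree, and the log of committed proposals (committedLog). At startup it reads the transaction logs and snapshots from disk and loads that state into memory.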
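The startup sequence can be illustrated with a toy model: start from the latest snapshot, then replay only the logged transactions newer than it. RecoverySketch, Txn and restore are hypothetical names, the state is reduced to a path-to-string map, and details such as sessions, ACLs and ZooKeeper's fuzzy snapshots are ignored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of ZKDatabase-style recovery (hypothetical names, not ZooKeeper's API):
// load the latest snapshot, then replay log entries newer than the snapshot.
class RecoverySketch {
    /** A hypothetical logged transaction that sets a path's data at a given zxid. */
    static final class Txn {
        final long zxid;
        final String path;
        final String data;

        Txn(long zxid, String path, String data) {
            this.zxid = zxid;
            this.path = path;
            this.data = data;
        }
    }

    /** Rebuilds state from a snapshot taken at snapshotZxid plus a zxid-sorted log. */
    static Map<String, String> restore(Map<String, String> snapshot, long snapshotZxid, List<Txn> log) {
        Map<String, String> state = new HashMap<>(snapshot); // leave the snapshot untouched
        for (Txn t : log) {
            if (t.zxid <= snapshotZxid) continue;            // already reflected in the snapshot
            state.put(t.path, t.data);                       // apply newer changes in log order
        }
        return state;
    }

    public static void main(String[] args) {
        Map<String, String> snapshot = new HashMap<>();
        snapshot.put("/app", "v1");
        List<Txn> log = Arrays.asList(
                new Txn(5, "/app", "v1"),
                new Txn(7, "/app", "v2"),
                new Txn(8, "/app/db", "x"));
        Map<String, String> state = restore(snapshot, 5, log);
        System.out.println(state.get("/app"));    // v2
        System.out.println(state.get("/app/db")); // x
    }
}
```

Because each transaction carries a zxid, the replay step only needs to compare numbers to decide what the snapshot already contains.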
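ZKDatabase is made up of several important members (dataTree, snapLog, committedLog and its minCommittedLog/maxCommittedLog bounds); the most central one is the DataTree.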
DataTree
```java
/**
 * This class maintains the tree data structure. It doesn't have any networking
 * or client connection code in it so that it can be tested in a stand alone
 * way.
 * <p>
 * The tree maintains two parallel data structures: a hashtable that maps from
 * full paths to DataNodes and a tree of DataNodes. All accesses to a path is
 * through the hashtable. The tree is traversed only when serializing to disk.
 */
public class DataTree {
    private static final Logger LOG = LoggerFactory.getLogger(DataTree.class);
    /**
     * This hashtable provides a fast lookup to the datanodes. The tree is the
     * source of truth and is where all the locking occurs
     */
    private final ConcurrentHashMap<String, DataNode> nodes =
        new ConcurrentHashMap<String, DataNode>();
    // ... (remaining members and methods omitted)
}
```
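DataTree maintains the tree-shaped data structure. It keeps a ConcurrentHashMap from paths to DataNodes (znodes): the key is the node's full path and the value is the DataNode. Every lookup by path goes through this map, while the tree of parent/child links is traversed only when serializing to disk.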
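A minimal sketch of these two parallel structures, assuming simplified nodes that carry only data and child names (MiniDataTree and its methods are hypothetical, not ZooKeeper's API): lookups by path hit the map directly, while the parent and children links record the tree shape.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified DataTree: a path -> node map for direct lookup,
// plus parent/children links that record the tree shape.
class MiniDataTree {
    static final class Node {
        final Node parent;
        final Set<String> children = ConcurrentHashMap.newKeySet(); // child names, not full paths
        volatile byte[] data;

        Node(Node parent, byte[] data) {
            this.parent = parent;
            this.data = data;
        }
    }

    private final ConcurrentHashMap<String, Node> nodes = new ConcurrentHashMap<>();

    MiniDataTree() {
        nodes.put("/", new Node(null, new byte[0])); // the root always exists
    }

    /** Creates a node; like ZooKeeper, the parent must already exist. */
    synchronized void create(String path, byte[] data) {
        int slash = path.lastIndexOf('/');
        String parentPath = slash == 0 ? "/" : path.substring(0, slash);
        Node parent = nodes.get(parentPath);
        if (parent == null) throw new IllegalStateException("no parent node: " + parentPath);
        if (nodes.containsKey(path)) throw new IllegalStateException("node exists: " + path);
        nodes.put(path, new Node(parent, data));
        parent.children.add(path.substring(slash + 1));
    }

    /** Reads go straight through the map; no tree walk is needed. */
    byte[] getData(String path) {
        Node n = nodes.get(path);
        return n == null ? null : n.data;
    }

    Set<String> getChildren(String path) {
        Node n = nodes.get(path);
        return n == null ? null : n.children;
    }

    public static void main(String[] args) {
        MiniDataTree tree = new MiniDataTree();
        tree.create("/app", "a".getBytes(StandardCharsets.UTF_8));
        tree.create("/app/db", "d".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(tree.getData("/app/db"), StandardCharsets.UTF_8)); // d
        System.out.println(tree.getChildren("/app"));                                    // [db]
    }
}
```

create is synchronized so that the parent and existence checks and the two updates happen as one step; reads stay lock-free because they only touch the concurrent map.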
DataNode (znode)
```java
/**
 * This class contains the data for a node in the data tree.
 * <p>
 * A data node contains a reference to its parent, a byte array as its data, an
 * array of ACLs, a stat object, and a set of its children's paths.
 *
 */
public class DataNode implements Record {
    /** the parent of this datanode */
    DataNode parent;
    /** the data for this datanode */
    byte data[];
    /**
     * the acl map long for this datanode. the datatree has the map
     */
    Long acl;
    /**
     * the stat for this node that is persisted to disk.
     */
    public StatPersisted stat;
    // ... (remaining members and methods omitted)
}
```
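DataNode parent: the parent node; these parent references link every node back up to the root.
byte data[]: the data stored in this node.
Long acl: used for access control. It is not the ACL list itself but a key into the ACL map held by the DataTree.
StatPersisted stat: the node's metadata (stat) that is persisted to disk.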
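The acl field being a Long suggests ACL interning: each distinct ACL list is stored once in a table owned by the tree, and every node keeps only its key. A minimal sketch of that idea, with ACL entries modeled as illustrative "scheme:id:perms" strings; AclTable, intern and lookup are hypothetical names, and the reference counting a real implementation needs to free unused entries is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical ACL table: each distinct ACL list is stored once, and nodes keep
// only a long key into it. Reference counting (to free unused entries) is omitted.
class AclTable {
    private final Map<Long, List<String>> byId = new HashMap<>();
    private final Map<List<String>, Long> idOf = new HashMap<>();
    private long nextId = 1;

    /** Returns the key for this ACL list, allocating a new key only for unseen lists. */
    synchronized long intern(List<String> acl) {
        List<String> copy = Collections.unmodifiableList(new ArrayList<>(acl)); // immutable map key
        Long existing = idOf.get(copy);
        if (existing != null) return existing;
        long id = nextId++;
        byId.put(id, copy);
        idOf.put(copy, id);
        return id;
    }

    /** Resolves a node's acl key back to its ACL list (null if unknown). */
    synchronized List<String> lookup(long id) {
        return byId.get(id);
    }

    public static void main(String[] args) {
        AclTable table = new AclTable();
        long a = table.intern(Arrays.asList("world:anyone:r"));
        long b = table.intern(Arrays.asList("world:anyone:r"));
        long c = table.intern(Arrays.asList("ip:10.0.0.1:rw"));
        System.out.println(a == b); // true: identical lists share one key
        System.out.println(a == c); // false
    }
}
```

Since many znodes typically share the same few ACLs, storing a single long per node instead of a full list keeps each DataNode small.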
Key znode concepts
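- Watches: a ZooKeeper client can set a watch on a znode. When the znode changes, a standard watch fires once and is then removed, and the client receives a notification; to keep watching, the client has to set the watch again.
- Data access: a client always reads a znode's data in full, and a write replaces the znode's data entirely. Each znode also has an ACL that controls who may read, write, or otherwise operate on it. ZooKeeper is not designed to be a store for large data or large objects; it is typically used to store and manage coordination data such as configuration and status. By default a znode holds at most about 1 MB of data (the limit is governed by the jute.maxbuffer setting), so large objects are better kept in a system such as Redis or HDFS.
- Ephemeral nodes: an ephemeral node lives only as long as the session that created it. When that session ends (it is closed, or the client stays disconnected long enough for it to expire), the node is deleted automatically. Ephemeral nodes cannot have children.
- Sequential nodes: a client can create nodes under a path whose names get a monotonically increasing sequence number appended. The counter is a signed 4-byte integer kept by the parent node, so it overflows once it goes past 2147483647.
- Zxid: the ZooKeeper Transaction Id. Every change to znode state receives a stamp in the form of a zxid (a counter, not a wall-clock timestamp). Zxids are unique and totally ordered: if zxid1 is smaller than zxid2, then zxid1 happened before zxid2.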
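Two numeric details from this list can be checked directly. Per the ZooKeeper documentation, the sequence suffix is formatted as a 10-digit zero-padded number (%010d), and a zxid is a 64-bit value whose high 32 bits hold the leader epoch and whose low 32 bits hold a counter within that epoch. The sketch below uses hypothetical method names and also shows what the suffix looks like after the signed counter overflows.

```java
// Hypothetical helpers for two numeric znode details: sequential-node suffixes
// and the epoch/counter layout of a zxid.
class ZnodeNumbers {
    /** Appends the counter as a 10-digit, zero-padded number, as ZooKeeper does. */
    static String sequentialName(String requestedPath, int counter) {
        return requestedPath + String.format("%010d", counter);
    }

    /** High 32 bits: leader epoch. */
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** Low 32 bits: counter within the epoch. */
    static long counterOf(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }

    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        System.out.println(sequentialName("/app/lock-", 1));          // /app/lock-0000000001
        int overflowed = Integer.MAX_VALUE;
        overflowed++;                                                  // the signed 4-byte counter wraps
        System.out.println(sequentialName("/app/lock-", overflowed)); // /app/lock--2147483648
        long z1 = makeZxid(1, 100);
        long z2 = makeZxid(2, 1);
        System.out.println(z1 < z2); // true: a newer epoch orders after any older counter
    }
}
```

The zero padding keeps sequential names sortable as plain strings until the counter wraps, which is why the overflow point matters.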