ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group services.
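In other words, ZooKeeper is an efficient coordination service for distributed applications. It provides the basic services such applications need, for example naming, configuration management, distributed synchronization, and group management.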
ZooKeeper has a hierarchical name space, much like a distributed file system. The only difference is that each node in the namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory.
Like a distributed file system, ZooKeeper has a hierarchical namespace. Unlike a file system, each node in ZooKeeper (a ZNode) can hold its own data in addition to having child nodes.
Analyzing the ZooKeeper source code starts with understanding its underlying data structures.
Serialization and Deserialization
ZooKeeper uses Jute as its serialization component, which lives in the zookeeper-jute module of the project. The package.html file in the zookeeper-jute module describes Hadoop Record I/O as follows.
Software systems of any significant complexity require mechanisms for data interchange with the outside world. These interchanges typically involve the marshaling and unmarshaling of logical units of data to and from data streams (files, network connections, memory buffers etc.).
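That is, any software system of significant complexity needs a mechanism for exchanging data with the outside world, and these exchanges usually involve encoding logical units of data into data streams and decoding them back.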
Applications usually have some code for serializing and deserializing the data types that they manipulate embedded in them. The work of serialization has several features that make automatic code generation for it worthwhile.
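That is, applications usually embed code for encoding and decoding the data types they work with, and serialization has several characteristics that make generating this code automatically worthwhile.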
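ZooKeeper originally adopted Jute as its serialization tool and has kept it to this day. It has not moved to the more widely used Avro, Thrift, or Protobuf, presumably because of version compatibility concerns and because serialization has not become a performance bottleneck. The zookeeper.jute file in the zookeeper-jute module defines ZooKeeper's data structures, the protocols used in the program, and some transaction definitions. We will refer back to it when we analyze the specific logic.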
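To make the idea of a record that knows how to encode and decode itself more concrete, here is a minimal sketch in Java. It is not Jute's actual API: the SimpleRecord interface, the SampleRecord class, and the RecordCodec helper are all hypothetical names invented for illustration, and plain java.io streams stand in for Jute's own archive types. It only shows the kind of repetitive field-by-field code that a generator could plausibly emit from a record definition.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, simplified contract: a record writes and reads its own fields.
interface SimpleRecord {
    void serialize(DataOutputStream out) throws IOException;
    void deserialize(DataInputStream in) throws IOException;
}

// Hypothetical record with one long field and one byte-buffer field.
// The method bodies are mechanical, which is why such code suits generation.
class SampleRecord implements SimpleRecord {
    long version;
    byte[] payload = new byte[0];

    @Override
    public void serialize(DataOutputStream out) throws IOException {
        out.writeLong(version);        // fixed-width field
        out.writeInt(payload.length);  // length prefix for the buffer
        out.write(payload);
    }

    @Override
    public void deserialize(DataInputStream in) throws IOException {
        version = in.readLong();
        int len = in.readInt();
        if (len < 0) {
            throw new IOException("negative buffer length: " + len);
        }
        payload = new byte[len];
        in.readFully(payload);         // fields are read in the order they were written
    }
}

// Hypothetical helper that round-trips a record through a byte array.
class RecordCodec {
    static byte[] toBytes(SimpleRecord record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            record.serialize(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static void fromBytes(byte[] bytes, SimpleRecord target) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            target.deserialize(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling RecordCodec.toBytes on a populated SampleRecord and then RecordCodec.fromBytes on a fresh instance should reproduce the same field values, since both sides agree on field order and widths.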
DataTree
ZooKeeper's data model is a tree structure made up of multiple nodes.
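As a rough illustration of such a tree, the sketch below models nodes that carry both data and children, indexed by their full path. SketchNode and SketchTree are hypothetical names, and this is a deliberately simplified model rather than ZooKeeper's actual DataTree: it leaves out things like ACLs, watches, and versioning, and it is not thread-safe.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical node: holds its own data and the names of its children.
class SketchNode {
    byte[] data;
    final Set<String> children = new TreeSet<>();

    SketchNode(byte[] data) {
        this.data = data;
    }
}

// Hypothetical tree: a flat map from full path to node, with a root at "/".
class SketchTree {
    private final Map<String, SketchNode> nodes = new HashMap<>();

    SketchTree() {
        nodes.put("/", new SketchNode(new byte[0]));
    }

    void create(String path, byte[] data) {
        if (path == null || path.length() < 2 || !path.startsWith("/") || path.endsWith("/")) {
            throw new IllegalArgumentException("invalid path: " + path);
        }
        int lastSlash = path.lastIndexOf('/');
        String parentPath = lastSlash == 0 ? "/" : path.substring(0, lastSlash);
        String childName = path.substring(lastSlash + 1);

        SketchNode parent = nodes.get(parentPath);
        if (parent == null) {
            throw new IllegalStateException("parent does not exist: " + parentPath);
        }
        if (nodes.containsKey(path)) {
            throw new IllegalStateException("node already exists: " + path);
        }
        nodes.put(path, new SketchNode(data));
        parent.children.add(childName); // the parent tracks child names only
    }

    byte[] getData(String path) {
        return lookup(path).data;
    }

    Set<String> getChildren(String path) {
        return Collections.unmodifiableSet(lookup(path).children);
    }

    private SketchNode lookup(String path) {
        SketchNode node = nodes.get(path);
        if (node == null) {
            throw new IllegalStateException("no such node: " + path);
        }
        return node;
    }
}
```

Keying the map by full path makes a lookup a single hash access instead of a walk from the root, while the per-node child set still supports listing a node's children. Whether this trade-off matches a given ZooKeeper version is best confirmed against the source itself.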