ZooKeeper

Definition:

ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and a key component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization, and group services.

Features:
  • Fast
  • Sequential consistency
  • Atomicity
  • Single system image
  • Reliability
  • Timeliness

For quick reference, the configuration comes first; to understand how ZooKeeper works internally, read the later sections.

Standalone mode
Configuration

Once you’ve downloaded a stable ZooKeeper release, unpack it and cd to the root.

To start ZooKeeper you need a configuration file. Here is a sample; create it in conf/zoo.cfg:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
Parameter explanations:

tickTime

the basic time unit in milliseconds used by ZooKeeper. It is used for heartbeats, and the minimum session timeout is twice the tickTime.

dataDir

the location to store the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.

clientPort

the port to listen for client connections

Now that you created the configuration file, you can start ZooKeeper:

Start ZooKeeper:
bin/zkServer.sh start
Connect to ZooKeeper:
$ bin/zkCli.sh -server 127.0.0.1:2181
Check ZooKeeper's status:
bin/zkServer.sh status
Replicated (cluster) mode

The required conf/zoo.cfg file for replicated mode is similar to the one used in standalone mode, but with a few differences. Here is an example:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

Explanation of the new parameters:

The new entry, initLimit, is a timeout ZooKeeper uses to limit how long the servers in the quorum have to connect to a leader. The entry syncLimit limits how far out of date a server can be from the leader.

With both of these timeouts, you specify the unit of time using tickTime. In this example, the timeout for initLimit is 5 ticks at 2000 milliseconds a tick, or 10 seconds.

The entries of the form server.X list the servers that make up the ZooKeeper service. When the server starts up, it knows which server it is by looking for the file myid in the data directory. That file contains the server number, in ASCII; for example, the myid file on zoo1 would contain just the character 1.

Finally, note the two port numbers after each server name: "2888" and "3888". Peers use the former port to connect to other peers. Such a connection is necessary so that peers can communicate, for example, to agree upon the order of updates. More specifically, a ZooKeeper server uses this port to connect followers to the leader. When a new leader arises, a follower opens a TCP connection to the leader using this port. Because the default leader election also uses TCP, we currently require another port for leader election. This is the second port in the server entry.
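
To see the ensemble from a client's point of view, here is a minimal Java sketch, assuming the official org.apache.zookeeper client library and the three hostnames from the sample config above; the 4000 ms session timeout is an arbitrary illustrative choice. The client picks one server from the connect string and transparently fails over to another if its connection breaks:

import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class EnsembleClient {
    public static void main(String[] args) throws IOException, InterruptedException {
        CountDownLatch connected = new CountDownLatch(1);
        // The connection string lists every server; the client chooses one of them.
        ZooKeeper zk = new ZooKeeper(
                "zoo1:2181,zoo2:2181,zoo3:2181",
                4000,                                  // session timeout in ms
                (WatchedEvent event) -> {              // default watcher: session events
                    if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                        connected.countDown();
                    }
                });
        connected.await();                             // block until the session is up
        System.out.println("connected, session 0x" + Long.toHexString(zk.getSessionId()));
        zk.close();
    }
}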

Most of the content below comes from the official documentation; the commentary is my own understanding, so corrections are welcome!

Concepts:
Prerequisites:

The problem the Paxos algorithm solves is how a distributed system reaches agreement on a value (a decision). A typical scenario: in a distributed database system, if all nodes start in the same initial state and execute the same sequence of operations, they end up in the same consistent state. To guarantee that every node executes the same command sequence, a "consensus algorithm" must run on every instruction so that all nodes see the same instructions. A general-purpose consensus algorithm applies in many scenarios and is a central problem in distributed computing, so research on consensus algorithms has continued since the 1980s. Node communication follows one of two models: shared memory and message passing. Paxos is a consensus algorithm based on the message-passing model.
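
To make the message-passing idea concrete, here is a toy, single-process Java sketch of one round of single-decree Paxos (prepare/promise, then accept/accepted), with three in-memory acceptors standing in for networked nodes. It is illustrative only: real Paxos needs networking, durable acceptor state, and retries, and ZooKeeper itself does not run Paxos but a related atomic broadcast protocol called ZAB.

import java.util.List;

public class PaxosSketch {

    // Phase-1 reply: a promise, possibly carrying a previously accepted value.
    static class Promise {
        final boolean ok;
        final long acceptedBallot;   // -1 if nothing accepted yet
        final String acceptedValue;  // null if nothing accepted yet
        Promise(boolean ok, long acceptedBallot, String acceptedValue) {
            this.ok = ok;
            this.acceptedBallot = acceptedBallot;
            this.acceptedValue = acceptedValue;
        }
    }

    static class Acceptor {
        long promised = -1;          // highest ballot this acceptor has promised
        long acceptedBallot = -1;
        String acceptedValue = null;

        // Phase 1: promise to ignore lower ballots; report any value already accepted.
        Promise prepare(long ballot) {
            if (ballot <= promised) return new Promise(false, -1, null);
            promised = ballot;
            return new Promise(true, acceptedBallot, acceptedValue);
        }

        // Phase 2: accept the value unless a higher ballot was promised meanwhile.
        boolean accept(long ballot, String value) {
            if (ballot < promised) return false;
            promised = ballot;
            acceptedBallot = ballot;
            acceptedValue = value;
            return true;
        }
    }

    public static void main(String[] args) {
        List<Acceptor> acceptors = List.of(new Acceptor(), new Acceptor(), new Acceptor());
        int quorum = acceptors.size() / 2 + 1;
        long ballot = 1;

        // Phase 1: collect promises from a majority of acceptors.
        int promises = 0;
        long highestSeen = -1;
        String forcedValue = null;
        for (Acceptor a : acceptors) {
            Promise p = a.prepare(ballot);
            if (!p.ok) continue;
            promises++;
            // If some acceptor already accepted a value, we must re-propose it.
            if (p.acceptedBallot > highestSeen) {
                highestSeen = p.acceptedBallot;
                forcedValue = p.acceptedValue;
            }
        }
        if (promises < quorum) {
            System.out.println("no quorum in phase 1; retry with a higher ballot");
            return;
        }
        String value = (forcedValue != null) ? forcedValue : "value-A";

        // Phase 2: a value is chosen once a majority accepts it.
        int accepts = 0;
        for (Acceptor a : acceptors) {
            if (a.accept(ballot, value)) accepts++;
        }
        System.out.println(accepts >= quorum ? "chosen: " + value : "not chosen");
    }
}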

ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group services - in a simple interface so you don't have to write them from scratch. You can use it off-the-shelf to implement consensus, group management, leader election, and presence protocols. And you can build on it for your own, specific needs.

ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher level services for synchronization, configuration maintenance, and groups and naming. It is designed to be easy to program to, and uses a data model styled after the familiar directory tree structure of file systems. It runs in Java and has bindings for both Java and C.

Coordination services are notoriously hard to get right. They are especially prone to errors such as race conditions and deadlock. The motivation behind ZooKeeper is to relieve distributed applications of the responsibility of implementing coordination services from scratch.

Design Goals

ZooKeeper is simple. ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace which is organized similarly to a standard file system. The name space consists of data registers - called znodes, in ZooKeeper parlance - and these are similar to files and directories. Unlike a typical file system, which is designed for storage, ZooKeeper data is kept in-memory, which means ZooKeeper can achieve high throughput and low latency numbers.

The ZooKeeper implementation puts a premium on high performance, highly available, strictly ordered access. The performance aspects of ZooKeeper means it can be used in large, distributed systems. The reliability aspects keep it from being a single point of failure. The strict ordering means that sophisticated synchronization primitives can be implemented at the client.

ZooKeeper is replicated. Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of hosts called an ensemble.

[Figure: ZooKeeper Service]

The servers that make up the ZooKeeper service must all know about each other. They maintain an in-memory image of state, along with a transaction logs and snapshots in a persistent store. As long as a majority of the servers are available, the ZooKeeper service will be available.

Clients connect to a single ZooKeeper server. The client maintains a TCP connection through which it sends requests, gets responses, gets watch events, and sends heart beats. If the TCP connection to the server breaks, the client will connect to a different server.

ZooKeeper is ordered. ZooKeeper stamps each update with a number that reflects the order of all ZooKeeper transactions. Subsequent operations can use the order to implement higher-level abstractions, such as synchronization primitives.

ZooKeeper is fast. It is especially fast in "read-dominant" workloads. ZooKeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.

Data model and the hierarchical namespace

The name space provided by ZooKeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash (/). Every node in ZooKeeper's name space is identified by a path.

[Figure: ZooKeeper's Hierarchical Namespace]

Nodes and ephemeral nodes

Unlike standard file systems, each node in a ZooKeeper namespace can have data associated with it as well as children. It is like having a file-system that allows a file to also be a directory. (ZooKeeper was designed to store coordination data: status information, configuration, location information, etc., so the data stored at each node is usually small, in the byte to kilobyte range.) We use the term znode to make it clear that we are talking about ZooKeeper data nodes.

Znodes maintain a stat structure that includes version numbers for data changes, ACL changes, and timestamps, to allow cache validations and coordinated updates. Each time a znode's data changes, the version number increases. For instance, whenever a client retrieves data it also receives the version of the data.
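
As a sketch of how the version in the stat enables optimistic, compare-and-set style updates, assuming a local server on 127.0.0.1:2181 and the Java client library; the znode path /demo/config is a made-up example:

import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class VersionedUpdate {
    public static void main(String[] args) throws Exception {
        CountDownLatch up = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 4000,
                e -> { if (e.getState() == Watcher.Event.KeeperState.SyncConnected) up.countDown(); });
        up.await();

        if (zk.exists("/demo", false) == null)
            zk.create("/demo", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        if (zk.exists("/demo/config", false) == null)
            zk.create("/demo/config", "v1".getBytes(StandardCharsets.UTF_8),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Reading data also fills in the stat, which carries the data version.
        Stat stat = new Stat();
        byte[] data = zk.getData("/demo/config", false, stat);
        System.out.println("data=" + new String(data, StandardCharsets.UTF_8)
                + ", version=" + stat.getVersion());

        // The write succeeds only if the version is still current; a concurrent
        // writer would bump the version and make this throw BadVersionException.
        zk.setData("/demo/config", "v2".getBytes(StandardCharsets.UTF_8), stat.getVersion());
        zk.close();
    }
}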

The data stored at each znode in a namespace is read and written atomically. Reads get all the data bytes associated with a znode and a write replaces all the data. Each node has an Access Control List (ACL) that restricts who can do what.

ZooKeeper also has the notion of ephemeral nodes. These znodes exist as long as the session that created the znode is active. When the session ends the znode is deleted. Ephemeral nodes are useful when you want to implement [tbd].
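
A minimal sketch of presence/group membership built on ephemeral nodes, assuming a local server; /demo/workers is a made-up path. EPHEMERAL_SEQUENTIAL additionally appends a monotonically increasing suffix, which the standard leader-election recipe builds on:

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class EphemeralDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch up = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 4000,
                e -> { if (e.getState() == Watcher.Event.KeeperState.SyncConnected) up.countDown(); });
        up.await();

        if (zk.exists("/demo", false) == null)
            zk.create("/demo", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        if (zk.exists("/demo/workers", false) == null)
            zk.create("/demo/workers", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // The ephemeral node exists only while this session is alive.
        String me = zk.create("/demo/workers/worker-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        System.out.println("registered as " + me);

        zk.close();  // ending the session deletes the ephemeral node automatically
    }
}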

Conditional updates and watches

ZooKeeper supports the concept of watches. Clients can set a watch on a znode. A watch will be triggered and removed when the znode changes. When a watch is triggered the client receives a packet saying that the znode has changed. And if the connection between the client and one of the ZooKeeper servers is broken, the client will receive a local notification. These can be used to [tbd].
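
A sketch of a one-shot data watch, assuming a local server and the made-up znode /demo/config from the earlier sketch. Note that a watch fires at most once; to keep watching, the handler must re-register:

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class WatchDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch up = new CountDownLatch(1);
        CountDownLatch changed = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 4000,
                e -> { if (e.getState() == Watcher.Event.KeeperState.SyncConnected) up.countDown(); });
        up.await();

        // Attach a watcher while reading; it fires when /demo/config next changes.
        zk.getData("/demo/config", event -> {
            if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
                System.out.println("changed: " + event.getPath());
                changed.countDown();
            }
        }, new Stat());

        zk.setData("/demo/config", "v3".getBytes(), -1);  // -1 skips the version check
        changed.await();
        zk.close();
    }
}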

Guarantees

ZooKeeper is very fast and very simple. Since its goal, though, is to be a basis for the construction of more complicated services, such as synchronization, it provides a set of guarantees. These are:

Sequential Consistency - Updates from a client will be applied in the order that they were sent.

Atomicity - Updates either succeed or fail. No partial results.

Single System Image - A client will see the same view of the service regardless of the server that it connects to.

Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.

Timeliness - The clients view of the system is guaranteed to be up-to-date within a certain time bound.

For more information on these, and how they can be used, see [tbd]

Simple API
One of the design goals of ZooKeeper is to provide a very simple programming interface. As a result, it supports only these operations:
create

creates a node at a location in the tree

delete

deletes a node

exists

tests if a node exists at a location

get data

reads the data from a node

set data

writes data to a node

get children

retrieves a list of children of a node

sync

waits for data to be propagated

For a more in-depth discussion on these, and how they can be used to implement higher level operations, please refer to [tbd]
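
A compact tour of these operations through the Java binding; this is a sketch assuming a local server, the subtree under /demo-api is made up, and note that sync only has an asynchronous form in the Java API:

import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ApiTour {
    public static void main(String[] args) throws Exception {
        CountDownLatch up = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 4000,
                e -> { if (e.getState() == Watcher.Event.KeeperState.SyncConnected) up.countDown(); });
        up.await();

        // create + exists
        if (zk.exists("/demo-api", false) == null)
            zk.create("/demo-api", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/demo-api/child", "hello".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // get data + set data (version-checked)
        Stat stat = new Stat();
        byte[] data = zk.getData("/demo-api/child", false, stat);
        System.out.println("read: " + new String(data));
        zk.setData("/demo-api/child", "world".getBytes(), stat.getVersion());

        // get children
        List<String> children = zk.getChildren("/demo-api", false);
        System.out.println("children: " + children);

        // sync: ask the server we are connected to to catch up with the leader
        CountDownLatch synced = new CountDownLatch(1);
        zk.sync("/demo-api", (rc, path, ctx) -> synced.countDown(), null);
        synced.await();

        // delete (version -1 skips the version check)
        zk.delete("/demo-api/child", -1);
        zk.delete("/demo-api", -1);
        zk.close();
    }
}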

Implementation

ZooKeeper Components shows the high-level components of the ZooKeeper service. With the exception of the request processor, each of the servers that make up the ZooKeeper service replicates its own copy of each of the components.

[Figure: ZooKeeper Components]

The replicated database is an in-memory database containing the entire data tree. Updates are logged to disk for recoverability, and writes are serialized to disk before they are applied to the in-memory database.
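
Because every write hits the transaction log before it is applied, log-device latency bounds write latency. A commonly used option, not shown in the sample config above, is dataLogDir, which moves the transaction log onto its own device so log appends don't compete with snapshot writes; the path below is only an example:

tickTime=2000
dataDir=/var/lib/zookeeper
# write the transaction log to a dedicated device (example path)
dataLogDir=/var/lib/zookeeper-txnlog
clientPort=2181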

Every ZooKeeper server services clients. Clients connect to exactly one server to submit requests. Read requests are serviced from the local replica of each server database. Requests that change the state of the service, write requests, are processed by an agreement protocol.

As part of the agreement protocol all write requests from clients are forwarded to a single server, called the leader. The rest of the ZooKeeper servers, called followers, receive message proposals from the leader and agree upon message delivery. The messaging layer takes care of replacing leaders on failures and syncing followers with leaders.

ZooKeeper uses a custom atomic messaging protocol. Since the messaging layer is atomic, ZooKeeper can guarantee that the local replicas never diverge. When the leader receives a write request, it calculates what the state of the system is when the write is to be applied and transforms this into a transaction that captures this new state.

Uses
The programming interface to ZooKeeper is deliberately simple. With it, however, you can implement higher order operations, such as synchronization primitives, group membership, ownership, etc. Some distributed applications have used it to: [tbd: add uses from white paper and video presentation.] For more information, see [tbd]

This article explains things quite clearly, so I recommend it: link
