ZooKeeper Reference Notes


ZooKeeper: enables coordination for distributed systems

Coordinating distributed processes is similar to multithreaded programming, except that the processes share nothing. It becomes easier with a component that provides a shared store, such as ZooKeeper.


ZooKeeper manages data related to coordination; it is not meant for bulk storage.



Master-Slave Key Problems:

1. Master Crashes

2. Slave Crashes

3. Communication Failure (master and slave cannot exchange messages)


Master Failures

A backup master that takes over must recover the state of the system as of the time the old primary master crashed.

Split-Brain Problem: more than one master runs independently, caused by a false suspicion that the master has failed.


Question: if there are 5 servers at the beginning and 3 become separated from the other 2, what happens? Will there be two separate masters (is this what split-brain actually means?), or does only the group holding the majority of 3 continue?
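
For the 5-server question, a minimal sketch of the majority rule may help. It assumes majority quorums (ZooKeeper's default scheme); the function names are made up for illustration. Under that assumption only the side with 3 servers can keep serving, so the 2-server side cannot elect a second leader.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a majority quorum."""
    return ensemble_size // 2 + 1


def can_serve(partition_size: int, ensemble_size: int) -> bool:
    """A partition can elect a leader and accept writes only with a majority."""
    return partition_size >= majority(ensemble_size)


# 5 servers split into groups of 3 and 2
for side in (3, 2):
    state = "available" if can_serve(side, 5) else "unavailable"
    print(f"{side} of 5 servers: {state}")
```

Two disjoint groups can never both hold a majority, which is why this rule prevents split-brain inside the ensemble itself.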


Worker Failures:

The master must detect worker failures. If the computation has side effects, a recovery procedure might be necessary to clean up the state.


Communication Failure:

Reassigning a task may cause two workers to execute the same task, which, depending on the application, may be unacceptable.

ephemeral state.

We cannot tell whether a node has crashed or is just slow.


Summary of Master-Worker Tasks

1. Master Election

2. Crash Detection

3. Group Membership Management: figure out which workers are available (the group members are the workers)

4. Metadata Management: the master and the workers must be able to store assignments and execution statuses in a reliable manner



ZooKeeper Basics

ZooKeeper does not expose primitives directly; instead it exposes a file-system-like API comprising a small set of calls.

A recipe denotes an implementation of a primitive built on top of this API.


API

create /path data

delete /path

exists /path

setData /path data

getData /path

getChildren /path


Modes

Persistent

Ephemeral: deleted if the client that created it crashes or simply closes its connection. (Cannot have children.)

Sequential: a monotonically increasing counter is appended to the znode name, e.g. tasks-1, tasks-2, ...


All modes: persistent, ephemeral, persistent-sequential, ephemeral-sequential
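
To make the sequential mode concrete, here is a toy in-memory sketch. `SequentialParent` is a hypothetical stand-in, not a ZooKeeper client API. It assumes the 10-digit zero-padded suffix described in the ZooKeeper documentation, and that the counter belongs to the parent znode rather than to the name prefix.

```python
from typing import List


class SequentialParent:
    """Toy parent znode that hands out sequence suffixes (illustration only)."""

    def __init__(self) -> None:
        self._next = 0                  # counter is kept per parent
        self.children: List[str] = []

    def create_sequential(self, prefix: str) -> str:
        # Append a zero-padded, monotonically increasing counter to the prefix.
        name = f"{prefix}{self._next:010d}"
        self._next += 1
        self.children.append(name)
        return name


tasks = SequentialParent()
print(tasks.create_sequential("task-"))
print(tasks.create_sequential("task-"))
```

Because the server appends the suffix, concurrent clients get unique, ordered names without coordinating among themselves.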


Watches and Notifications

Clients register watches to receive notifications instead of POLLING.

【IMPORTANT】One-Shot Notification. Example:

A wants to watch /tasks, but /tasks changes before A has successfully set the watch, so A may miss that notification. If the change matters to A, A should read the state of /tasks in the same call that sets the watch, and set a new watch each time a notification arrives.
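
The one-shot behaviour can be sketched with a toy in-memory model. `WatchedParent` and `follow` are hypothetical names for illustration and are not part of any ZooKeeper client library. The point is that the read and the watch registration happen in one call, and the callback has to re-register itself.

```python
from typing import Callable, List, Optional


class WatchedParent:
    """Toy znode with children and one-shot child watches (illustration only)."""

    def __init__(self) -> None:
        self._children: List[str] = []
        self._watchers: List[Callable[[], None]] = []

    def get_children(self, watch: Optional[Callable[[], None]] = None) -> List[str]:
        # Read and register in one step, so no change can slip in between.
        if watch is not None:
            self._watchers.append(watch)
        return list(self._children)

    def add_child(self, name: str) -> None:
        self._children.append(name)
        # One-shot: detach every watcher before firing it.
        fired, self._watchers = self._watchers, []
        for callback in fired:
            callback()


def follow(parent: WatchedParent, seen: List[List[str]]) -> None:
    """Record every child list, re-registering the watch after each event."""

    def on_change() -> None:
        seen.append(parent.get_children(watch=on_change))

    seen.append(parent.get_children(watch=on_change))
```

A watcher that does not re-register, such as a plain counter callback, would be invoked only for the first change.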


【IMPORTANT】Notification Guarantee: a notification is delivered to a client before that client observes any later change to the same znode.


Question: What if the Notification is lost?


Versions

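Each znode carries a version number that increases on every update. setData and delete take an expected version, and the call fails if it does not match the znode's current version, so an update based on stale data is rejected.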
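
A minimal sketch of a version-checked update, using a toy in-memory znode. `ToyZnode` and `BadVersionError` are illustrative names, not a client API. It assumes the ZooKeeper convention that passing -1 as the expected version skips the check.

```python
class BadVersionError(Exception):
    """Raised when the expected version does not match the current one."""


class ToyZnode:
    """In-memory znode with a data version (illustration only)."""

    def __init__(self, data: bytes = b"") -> None:
        self.data = data
        self.version = 0

    def set_data(self, data: bytes, expected_version: int) -> int:
        # -1 means "any version"; otherwise the versions must match exactly.
        if expected_version not in (-1, self.version):
            raise BadVersionError(
                f"expected {expected_version}, current {self.version}"
            )
        self.data = data
        self.version += 1
        return self.version
```

Two clients that both read version 0 and then both try to write with expected version 0 cannot both succeed; the loser has to re-read and retry.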



ZooKeeper Architecture

Client Library: responsible for talking to servers


ZooKeeper Quorums

In quorum mode, ZooKeeper replicates its data tree across all servers in the ensemble.

Quorum: the minimum number of legislators required to be present for a vote. (How is the quorum used? Does it avoid the delay of storing data on all servers before replying to the client? Does it solve the partition problem? It solves the split-brain problem, by the pigeonhole principle.)

After the network is partitioned into two parts, whichever part can still form a quorum contains at least one server that holds the latest update, because any two quorums overlap.


There are quorum schemes other than majority quorums.
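
The pigeonhole argument can be checked by brute force for a small ensemble. This is a sketch with made-up helper names: it enumerates every majority quorum of 5 servers and confirms that any two of them share at least one server, which is what lets a new quorum see the latest acknowledged update.

```python
from itertools import combinations
from typing import List, Sequence, Set


def majority_quorums(servers: Sequence[int]) -> List[Set[int]]:
    """All subsets of the ensemble that contain a strict majority."""
    need = len(servers) // 2 + 1
    return [
        set(group)
        for size in range(need, len(servers) + 1)
        for group in combinations(servers, size)
    ]


def all_pairs_intersect(quorums: List[Set[int]]) -> bool:
    """True if every pair of quorums shares at least one server."""
    return all(a & b for a, b in combinations(quorums, 2))


print(all_pairs_intersect(majority_quorums(range(1, 6))))
```

Groups of only 2 out of 5 servers do not have this property, since two such groups can be disjoint.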


ZooKeeper Session

When a session ends for any reason, the ephemeral znodes created during that session disappear. (What reasons are possible?)

The client moves its session to a different server if it has not heard from its current server for some time. The ZooKeeper client library handles this move transparently.

Sessions offer ordering guarantees: requests within a session are executed in FIFO order.



ZooKeeper Lifecycle

Case: Waiting on CONNECTING During Network Partitions


ZooKeeper Quorums 

1. Servers need to know each other: each ZooKeeper server is configured with the list of servers in the ensemble. Once a quorum of them is up, the service is available.

2. The client specifies the host:port pairs it tries to connect to (we can limit the hosts a client can reach to achieve simple location-based balancing).


Question: can a third-party framework or application be extended to use ZooKeeper for its coordination work?


Implementing a Primitive: Locks with ZooKeeper

To acquire a lock, create an ephemeral znode: /lock

Other clients watch /lock for changes and try to create it again once it is deleted.
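
The lock recipe can be sketched against a toy in-memory service. `ToyEnsemble`, `NodeExistsError`, and `try_lock` are hypothetical names, and sessions are plain strings; a real client would use the library's create and exists calls instead.

```python
from typing import Callable, Dict, List


class NodeExistsError(Exception):
    """Raised when the znode to create already exists."""


class ToyEnsemble:
    """In-memory stand-in for a ZooKeeper service (illustration only)."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}  # path -> session that created it
        self._watches: Dict[str, List[Callable[[], None]]] = {}

    def create_ephemeral(self, path: str, session: str) -> None:
        if path in self._owner:
            raise NodeExistsError(path)
        self._owner[path] = session

    def watch_delete(self, path: str, callback: Callable[[], None]) -> None:
        self._watches.setdefault(path, []).append(callback)

    def close_session(self, session: str) -> None:
        # Ephemeral znodes vanish with the session that created them.
        for path in [p for p, s in self._owner.items() if s == session]:
            del self._owner[path]
            for callback in self._watches.pop(path, []):
                callback()


def try_lock(zk: ToyEnsemble, session: str, holders: List[str]) -> None:
    """Try to create /lock; on failure, wait for its deletion and retry."""
    try:
        zk.create_ephemeral("/lock", session)
        holders.append(session)
    except NodeExistsError:
        zk.watch_delete("/lock", lambda: try_lock(zk, session, holders))
```

Because /lock is ephemeral, a holder that crashes releases the lock when its session ends. With many waiters, every one of them is notified at once, which is a known cost of this simple form of the recipe.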


Implementation of a Master-Worker Example

Roles

Master / Worker / Client


Master

The master role is effectively a contended resource, so it maps onto the lock recipe: create an ephemeral znode to represent the master.

create ephemeral znode: /master


stat /master true

Set a watch on the /master znode.


Workers, Tasks, Assignments

The parent znodes are persistent:

create /workers ""

create /tasks ""

create /assign ""


ls /workers true

ls /tasks true

Watch for changes to the children of /workers (and likewise /tasks).


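The Worker Role

create -e /workers/worker1.example.com "worker1.example.com:2224"

The watcher set by ls /workers true now receives a notification.


create /assign/worker1.example.com ""

Create a parent znode under which the worker receives task assignments. It must be persistent, because ephemeral znodes cannot have children.

ls /assign/worker1.example.com true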


The Client Role

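create -s /tasks/task- "cmd"

created /tasks/task-000000

-s marks the znode as sequential (an auto-incremented suffix is appended).


ls /tasks/task-000000 true

Watch for the task's completion.


Once the task znode has been created, the master is notified of the event and looks up the current tasks and workers to make an assignment.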

ls /tasks

=> [task-000000]

ls /workers

=> [worker1.example.com]

create /assign/worker1.example.com/task-000000 ""


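The worker is now notified of the newly assigned task and executes it.

When the worker finishes the task, it creates a status znode for the task:

create /tasks/task-000000/status "done"

The client is then notified that the task has completed.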
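
The master's assignment step can be sketched as a pure function over the two child lists. `plan_assignments` is a hypothetical helper, and the round-robin policy is an assumption made for illustration; the notes do not specify how the master chooses a worker.

```python
from typing import List, Set


def plan_assignments(tasks: List[str], workers: List[str],
                     assigned: Set[str]) -> List[str]:
    """Return the /assign paths the master would create for pending tasks."""
    if not workers:
        return []  # nobody to assign to; wait for a /workers notification
    pending = [t for t in sorted(tasks) if t not in assigned]
    # Assumed policy: spread pending tasks over workers in round-robin order.
    return [
        f"/assign/{workers[i % len(workers)]}/{task}"
        for i, task in enumerate(pending)
    ]


print(plan_assignments(["task-000000"], ["worker1.example.com"], set()))
```

The master would run this each time its watch on /tasks or /workers fires, then create one znode per returned path.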



IMPORTANT: ZooKeeper Internals

In an ensemble, one leader orders all write requests; followers receive the leader's proposals and vote on them; observers do not participate in the decision process and only learn what has been decided.


Requests, Transactions, and Identifiers

Requests: reads and writes are handled separately

Any ZooKeeper server processes read requests (exists, getData, getChildren) locally.

Write requests (create, delete, setData) are forwarded to the leader.


Question: if a client has a session with Server A, issues a write request that completes, and then issues a read request to Server A, is Server A not guaranteed to return the latest data?



Transaction: the state update that results from executing one write request

the leader checks the version number when it turns the request into a transaction

idempotent: applying a transaction multiple times yields the same result

zxid: ZooKeeper transaction ID
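
A small sketch of both ideas, with made-up names. It assumes the usual zxid layout of a 64-bit number whose high 32 bits are the leader epoch and whose low 32 bits are a counter, and it models a setData transaction as "set this data and this resulting version", which is what makes re-applying it harmless.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


def make_zxid(epoch: int, counter: int) -> int:
    """Pack epoch (high 32 bits) and counter (low 32 bits) into one zxid."""
    return (epoch << 32) | (counter & 0xFFFFFFFF)


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF


@dataclass(frozen=True)
class SetDataTxn:
    zxid: int
    path: str
    data: bytes
    version: int  # resulting version, already decided by the leader


def apply_txn(tree: Dict[str, Tuple[bytes, int]], txn: SetDataTxn) -> None:
    # Overwrites with fixed values, so applying it twice changes nothing.
    tree[txn.path] = (txn.data, txn.version)
```

A request such as "increment the version" would not be idempotent; the transaction records the outcome instead of the operation.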


Leader Election

Leader election notification: carries the sender's current vote, which consists of a server identifier (sid) and the zxid of the most recent transaction that server has executed.

p158
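
A simplified sketch of how votes are compared: the vote with the higher zxid wins, and ties are broken by the higher sid. `Vote`, `better`, and `elect` are illustrative names; the real election also tracks election rounds and exchanges notifications until a quorum agrees, which this sketch leaves out.

```python
from typing import Iterable, NamedTuple


class Vote(NamedTuple):
    sid: int   # server identifier of the proposed leader
    zxid: int  # most recent transaction that server has executed


def better(candidate: Vote, current: Vote) -> bool:
    """True if a server should replace its current vote with the candidate."""
    return (candidate.zxid, candidate.sid) > (current.zxid, current.sid)


def elect(initial_votes: Iterable[Vote]) -> Vote:
    """Each server starts by voting for itself; all converge on the best vote."""
    winner = None
    for vote in initial_votes:
        if winner is None or better(vote, winner):
            winner = vote
    if winner is None:
        raise ValueError("no votes")
    return winner


print(elect([Vote(1, 10), Vote(2, 12), Vote(3, 12)]))
```

Preferring the highest zxid means the elected leader already holds the most recent transactions seen by the voters.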


Zab: ZooKeeper Atomic Broadcast protocol



References

<<ZooKeeper>>

http://blog.cloudera.com/blog/2013/02/how-to-use-apache-zookeeper-to-build-distributed-apps-and-why/

http://zookeeper.apache.org/doc/trunk/recipes.html

http://www.ibm.com/developerworks/library/bd-zookeeper/

http://highscalability.com/blog/2008/7/15/zookeeper-a-reliable-scalable-distributed-coordination-syste.html

https://engineering.pinterest.com/blog/zookeeper-resilience-pinterest


kafka

http://kafka.apache.org/documentation.html


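Paxos algorithm [background for ZooKeeper's internal protocol; ZooKeeper itself uses Zab, which is related to but distinct from Paxos]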

http://baike.baidu.com/link?url=GZCf2VymTzeZDfGCRs2QgPpjLm6xJEX5TzvtWEQOULy77yEqO0nc6Gy6JOxJjIaBSPTCqv9E1fEKUx420AyX9q

http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf

http://www.ux.uis.no/~meling/papers/2013-paxostutorial-opodis.pdf

https://www.youtube.com/watch?v=JEpsBg0AO6o [very good explanation; may require a VPN from mainland China]

https://distributedthoughts.wordpress.com/2013/09/22/understanding-paxos-part-1/
