Service Registry (Part 1): ZooKeeper

1. Concepts

From the official site
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

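Explanation
In other words, ZooKeeper is a single, centralized coordination service that distributed applications share for maintaining configuration, naming, distributed synchronization, and group services.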

2. Workflow

(Figure: workflow diagram; image not available.)

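3. The ZAB Protocol

ZAB is a data consistency protocol for distributed transactions.

  1. ZAB stands for ZooKeeper Atomic Broadcast.
  2. The protocol has two main parts: crash recovery and atomic broadcast.
  3. A single server, the Leader, accepts transaction (write) requests and turns each one into a proposal.
  4. The Leader broadcasts the data to all Followers to keep them in sync.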
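
To make points 3 and 4 concrete, here is a minimal sketch of the broadcast step. It is a toy model, not ZooKeeper code: ZabBroadcastSketch, Follower, and broadcast are hypothetical names, and it assumes the usual ZAB rule that a proposal is committed once a strict majority of the ensemble (Leader included) has acknowledged it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the ZAB broadcast phase; all names here are illustrative. */
class ZabBroadcastSketch {

    /** A follower that is either reachable or not. */
    static final class Follower {
        final boolean reachable;
        final List<String> log = new ArrayList<>();

        Follower(boolean reachable) {
            this.reachable = reachable;
        }

        /** Logs the proposal and returns true (an ACK) when reachable. */
        boolean propose(String txn) {
            if (!reachable) {
                return false;
            }
            log.add(txn);
            return true;
        }
    }

    /**
     * Leader side: send the proposal to every follower and report whether
     * a strict majority of the ensemble acknowledged it.
     */
    static boolean broadcast(String txn, List<Follower> followers) {
        int ensembleSize = followers.size() + 1; // followers plus the leader
        int acks = 1;                            // the leader counts its own ACK
        for (Follower f : followers) {
            if (f.propose(txn)) {
                acks++;
            }
        }
        return acks > ensembleSize / 2;
    }

    public static void main(String[] args) {
        List<Follower> followers = List.of(new Follower(true), new Follower(false));
        // Two of three servers acknowledge, so the proposal can commit.
        System.out.println(broadcast("set /config/db", followers));
    }
}
```

With three servers, one unreachable Follower does not block a write, while two unreachable Followers do.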

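4. Data Storage

4.1 Data Nodes

The part of ZooKeeper I was most curious about is data storage. Internally it stores data in a tree. Each node of the tree is called a ZNode, each ZNode is uniquely identified by its path, and each ZNode can hold a small amount of data (about 1 MB by default, adjustable through configuration).

Unique identification by path
For example, the create command: create /config/db 'db'
The full path /config/db is the node's unique identifier, and 'db' is the data stored in the node.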

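4.2 Node Types

  • Ephemeral node (EPHEMERAL): tied to the client session that creates it and deleted automatically when that session ends; an ephemeral node cannot have children. (EPHEMERAL_SEQUENTIAL is the ephemeral sequential variant: same behavior, plus a sequence number appended to the name.)
  • Persistent node (PERSISTENT): exists permanently once created, until it is explicitly deleted. (PERSISTENT_SEQUENTIAL is the persistent sequential variant: same behavior, plus a sequence number appended to the name.)

One question worth noting here: how do you choose between ephemeral and persistent, and between sequential and non-sequential?
My own understanding is as follows.
Ephemeral vs. persistent depends on what the business system and the servers need. If the data changes rarely, the data is important, and the ZooKeeper servers are stable, persistent is a reasonable choice (in normal cases ephemeral is the usual pick).

Sequential vs. non-sequential: if the data is involved in locking, or in frequent creates and updates, use sequential nodes, because the sequence number drives the ordering logic of a distributed lock or of transaction processing.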
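
As an illustration of how the sequence number feeds lock ordering, here is a minimal sketch of the decision a lock client makes after listing the children of a lock directory. It assumes the common lock recipe (lowest sequence number holds the lock, everyone else watches the node just ahead of it) and the 10-digit zero-padded suffix that ZooKeeper appends to sequential node names. SequentialLockSketch and its methods are hypothetical names; no ZooKeeper client calls are made.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Pure-logic sketch of ordering sequential lock nodes; names are illustrative. */
class SequentialLockSketch {

    /** Reads the 10-digit sequence suffix of a sequential node name. */
    static long sequenceOf(String nodeName) {
        return Long.parseLong(nodeName.substring(nodeName.length() - 10));
    }

    /** Returns a copy of the children sorted by sequence number. */
    static List<String> sortBySequence(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(SequentialLockSketch::sequenceOf));
        return sorted;
    }

    /** The node with the lowest sequence number holds the lock. */
    static boolean holdsLock(String me, List<String> children) {
        List<String> sorted = sortBySequence(children);
        return !sorted.isEmpty() && sorted.get(0).equals(me);
    }

    /** A waiting node watches its immediate predecessor, if it has one. */
    static Optional<String> nodeToWatch(String me, List<String> children) {
        List<String> sorted = sortBySequence(children);
        int index = sorted.indexOf(me);
        return index > 0 ? Optional.of(sorted.get(index - 1)) : Optional.empty();
    }
}
```

Watching only the predecessor, rather than the whole directory, means a release wakes one waiter instead of all of them.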

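4.3 Node Access Control (ACL)

An ACL entry has three parts: [scheme]:[id]:[permissions].
Each node has its own ACL configuration, and child nodes are not affected by it!

Scheme values

  • world: anyone (the id is anyone)
  • auth: no id is required, or a specific user can be given
  • digest: authentication by username and password (the password must be hashed: SHA-1 first, then Base64)
  • ip: authentication by IP address or IP range

id values
The id identifies who the entry applies to. How it is parsed depends on the scheme: a username, a username and password, or an IP address.

Permission values

  • create: create child nodes
  • delete: delete child nodes
  • write: write data to the znode
  • read: read data from the znode
  • admin: set ACL permissions

The letters cdwra are the usual shorthand for create, delete, write, read, and admin respectively.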
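
The cdwra shorthand maps naturally onto a bitmask. The sketch below parses such a string into an int and checks whether a given permission is present. The bit values mirror the constants in org.apache.zookeeper.ZooDefs.Perms; the class AclPermsSketch and its methods are hypothetical and only illustrate the idea.

```java
/** Sketch of turning a "cdwra" string into a permission bitmask. */
class AclPermsSketch {
    // Bit values chosen to mirror org.apache.zookeeper.ZooDefs.Perms.
    static final int READ = 1 << 0;
    static final int WRITE = 1 << 1;
    static final int CREATE = 1 << 2;
    static final int DELETE = 1 << 3;
    static final int ADMIN = 1 << 4;

    /** Parses letters such as "cdwra" or "rw" into a bitmask. */
    static int parse(String letters) {
        int perms = 0;
        for (char c : letters.toCharArray()) {
            switch (c) {
                case 'r': perms |= READ; break;
                case 'w': perms |= WRITE; break;
                case 'c': perms |= CREATE; break;
                case 'd': perms |= DELETE; break;
                case 'a': perms |= ADMIN; break;
                default:
                    throw new IllegalArgumentException("Unknown permission letter: " + c);
            }
        }
        return perms;
    }

    /** True when every bit in wanted is set in perms. */
    static boolean allows(int perms, int wanted) {
        return (perms & wanted) == wanted;
    }
}
```

For example, parse("cdwra") sets all five bits, while allows(parse("rw"), CREATE) is false.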

# Create the node
create /zookeeper/test 'test'

# Set permissions (world)
setAcl /zookeeper/test world:anyone:cdwra

# Set permissions (auth)
# The users must be authenticated first
addauth digest user1:123456
addauth digest user2:123456
# Grant permissions
setAcl /zookeeper/test auth:user1:cdwra
# Grant permissions to all authenticated users
setAcl /zookeeper/test auth::cdwra

# Set permissions (digest)
setAcl /zookeeper/test digest:user1:<password-digest>:cdwra
## The password digest can be generated with the following command
echo -n <user>:<password> | openssl dgst -binary -sha1 | openssl base64

# Set permissions (ip)
setAcl /zookeeper/test ip:192.168.0.1:cdwra
setAcl /zookeeper/test ip:192.168.0.1/16:cdwra
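
The openssl pipeline for the digest scheme can also be reproduced with the JDK alone. The sketch below computes SHA-1 over the string user:password and Base64-encodes the result, then prefixes the username so the value can be used as the id part of a digest ACL. DigestIdSketch and digestId are hypothetical names, and UTF-8 is assumed for the input bytes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Sketch of building the id for a digest ACL with JDK classes only. */
class DigestIdSketch {

    /** Returns "user:" followed by base64(sha1("user:password")). */
    static String digestId(String user, String password) {
        try {
            byte[] sha1 = MessageDigest.getInstance("SHA-1")
                    .digest((user + ":" + password).getBytes(StandardCharsets.UTF_8));
            return user + ":" + Base64.getEncoder().encodeToString(sha1);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The printed value is what goes between "digest:" and ":cdwra".
        System.out.println(digestId("user1", "123456"));
    }
}
```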

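4.4 Data Objects

ZooKeeper version: 3.4.14

ZooKeeper keeps its in-memory data in a DataTree, and the DataTree in turn stores the nodes in a ConcurrentHashMap.
The key is a String: the path that uniquely identifies the ZNode.
The value is a DataNode, the smallest unit of data storage.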

/**
 * The class is large; the remaining members are omitted.
 */
public class DataTree {
    private static final Logger LOG = LoggerFactory.getLogger(DataTree.class);

    /**
     * This hashtable provides a fast lookup to the datanodes. The tree is the
     * source of truth and is where all the locking occurs
     */
    private final ConcurrentHashMap<String, DataNode> nodes =
        new ConcurrentHashMap<String, DataNode>();

    private final WatchManager dataWatches = new WatchManager();

    private final WatchManager childWatches = new WatchManager();

    /** the root of zookeeper tree */
    private static final String rootZookeeper = "/";

    /** the zookeeper nodes that acts as the management and status node **/
    private static final String procZookeeper = Quotas.procZookeeper;

    /** this will be the string thats stored as a child of root */
    private static final String procChildZookeeper = procZookeeper.substring(1);

    /**
     * the zookeeper quota node that acts as the quota management node for
     * zookeeper
     */
    private static final String quotaZookeeper = Quotas.quotaZookeeper;

    /** this will be the string thats stored as a child of /zookeeper */
    private static final String quotaChildZookeeper = quotaZookeeper
            .substring(procZookeeper.length() + 1);

    /**
     * the path trie that keeps track fo the quota nodes in this datatree
     */
    private final PathTrie pTrie = new PathTrie();

    /**
     * This hashtable lists the paths of the ephemeral nodes of a session.
     */
    private final Map<Long, HashSet<String>> ephemerals =
        new ConcurrentHashMap<Long, HashSet<String>>();

    private final ReferenceCountedACLCache aclCache = new ReferenceCountedACLCache();

	...
}
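
To show the idea behind the nodes map in isolation, here is a much-reduced sketch: a map keyed by full path, where each node also records its parent and the names of its children. It is not the DataTree implementation; PathTreeSketch, Node, and the method names are hypothetical, and ACLs, stats, watches, and quotas are all left out.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Toy path-keyed tree; names and behavior are illustrative only. */
class PathTreeSketch {

    static final class Node {
        final Node parent;
        final byte[] data;
        final Set<String> children = new HashSet<>();

        Node(Node parent, byte[] data) {
            this.parent = parent;
            this.data = data;
        }
    }

    /** Full path -> node, giving direct lookup without walking the tree. */
    private final Map<String, Node> nodes = new ConcurrentHashMap<>();

    PathTreeSketch() {
        nodes.put("/", new Node(null, new byte[0]));
    }

    /** Creates a node; the parent must already exist. */
    synchronized void create(String path, byte[] data) {
        if (!path.startsWith("/") || path.length() < 2 || path.endsWith("/")) {
            throw new IllegalArgumentException("Invalid path: " + path);
        }
        int lastSlash = path.lastIndexOf('/');
        String parentPath = lastSlash == 0 ? "/" : path.substring(0, lastSlash);
        String childName = path.substring(lastSlash + 1);
        Node parent = nodes.get(parentPath);
        if (parent == null) {
            throw new IllegalArgumentException("Parent does not exist: " + parentPath);
        }
        if (nodes.containsKey(path)) {
            throw new IllegalArgumentException("Node already exists: " + path);
        }
        parent.children.add(childName); // only the last path segment is stored
        nodes.put(path, new Node(parent, data));
    }

    synchronized byte[] getData(String path) {
        Node node = nodes.get(path);
        if (node == null) {
            throw new IllegalArgumentException("No such node: " + path);
        }
        return node.data;
    }

    synchronized Set<String> getChildren(String path) {
        Node node = nodes.get(path);
        if (node == null) {
            throw new IllegalArgumentException("No such node: " + path);
        }
        return Collections.unmodifiableSet(new HashSet<>(node.children));
    }
}
```

Creating /config and then /config/db leaves two map entries, with "db" recorded in the children set of /config.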

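The DataNode source is shown below. Its fields are:
DataNode parent: the parent node object
byte data[]: the stored data
Long acl: the node's access control, stored as a Long key into an ACL map held by the DataTree; each node has its own, independent of its parent and children (see 4.3)
StatPersisted stat: the node state that is persisted to disk (StatPersisted is worth reading on your own; it has around ten fields that record the node's current state, such as transaction ids, timestamps, and version counters)
Set<String> children: the set of child node keys (only the last segment of each child's path)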

package org.apache.zookeeper.server;

import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.Collections;

import org.apache.jute.InputArchive;
import org.apache.jute.OutputArchive;
import org.apache.jute.Record;
import org.apache.zookeeper.data.Stat;
import org.apache.zookeeper.data.StatPersisted;

/**
 * This class contains the data for a node in the data tree.
 * <p>
 * A data node contains a reference to its parent, a byte array as its data, an
 * array of ACLs, a stat object, and a set of its children's paths.
 * 
 */
@SuppressFBWarnings("EI_EXPOSE_REP2")
public class DataNode implements Record {
    /** the parent of this datanode */
    DataNode parent;

    /** the data for this datanode */
    byte data[];

    /**
     * the acl map long for this datanode. the datatree has the map
     */
    Long acl;

    /**
     * the stat for this node that is persisted to disk.
     */
    public StatPersisted stat;

    /**
     * the list of children for this node. note that the list of children string
     * does not contain the parent path -- just the last part of the path. This
     * should be synchronized on except deserializing (for speed up issues).
     */
    private Set<String> children = null;

    private static final Set<String> EMPTY_SET = Collections.emptySet();

    /**
     * default constructor for the datanode
     */
    DataNode() {
        // default constructor
    }

    /**
     * create a DataNode with parent, data, acls and stat
     * 
     * @param parent
     *            the parent of this DataNode
     * @param data
     *            the data to be set
     * @param acl
     *            the acls for this node
     * @param stat
     *            the stat for this node.
     */
    public DataNode(DataNode parent, byte data[], Long acl, StatPersisted stat) {
        this.parent = parent;
        this.data = data;
        this.acl = acl;
        this.stat = stat;
    }

    /**
     * Method that inserts a child into the children set
     * 
     * @param child
     *            to be inserted
     * @return true if this set did not already contain the specified element
     */
    public synchronized boolean addChild(String child) {
        if (children == null) {
            // let's be conservative on the typical number of children
            children = new HashSet<String>(8);
        }
        return children.add(child);
    }

    /**
     * Method that removes a child from the children set
     * 
     * @param child
     * @return true if this set contained the specified element
     */
    public synchronized boolean removeChild(String child) {
        if (children == null) {
            return false;
        }
        return children.remove(child);
    }

    /**
     * convenience method for setting the children for this datanode
     * 
     * @param children
     */
    public synchronized void setChildren(HashSet<String> children) {
        this.children = children;
    }

    /**
     * convenience methods to get the children
     * 
     * @return the children of this datanode
     */
    public synchronized Set<String> getChildren() {
        if (children == null) {
            return EMPTY_SET;
        }

        return Collections.unmodifiableSet(children);
    }

    synchronized public void copyStat(Stat to) {
        to.setAversion(stat.getAversion());
        to.setCtime(stat.getCtime());
        to.setCzxid(stat.getCzxid());
        to.setMtime(stat.getMtime());
        to.setMzxid(stat.getMzxid());
        to.setPzxid(stat.getPzxid());
        to.setVersion(stat.getVersion());
        to.setEphemeralOwner(stat.getEphemeralOwner());
        to.setDataLength(data == null ? 0 : data.length);
        int numChildren = 0;
        if (this.children != null) {
            numChildren = children.size();
        }
        // when we do the Cversion we need to translate from the count of the creates
        // to the count of the changes (v3 semantics)
        // for every create there is a delete except for the children still present
        to.setCversion(stat.getCversion()*2 - numChildren);
        to.setNumChildren(numChildren);
    }

    synchronized public void deserialize(InputArchive archive, String tag)
            throws IOException {
        archive.startRecord("node");
        data = archive.readBuffer("data");
        acl = archive.readLong("acl");
        stat = new StatPersisted();
        stat.deserialize(archive, "statpersisted");
        archive.endRecord("node");
    }

    synchronized public void serialize(OutputArchive archive, String tag)
            throws IOException {
        archive.startRecord(this, "node");
        archive.writeBuffer(data, "data");
        archive.writeLong(acl, "acl");
        stat.serialize(archive, "statpersisted");
        archive.endRecord(this, "node");
    }
}

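5. Watchers

On the client side, watchers are held in ZKWatchManager: per-path watchers live in the dataWatches, existWatches, and childWatches maps, and the session-wide default lives in defaultWatcher, as the source below shows.
The materialize method collects all the watchers that should be notified for an event on a given path.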

/**
     * Manage watchers & handle events generated by the ClientCnxn object.
     *
     * We are implementing this as a nested class of ZooKeeper so that
     * the public methods will not be exposed as part of the ZooKeeper client
     * API.
     */
    private static class ZKWatchManager implements ClientWatchManager {
        private final Map<String, Set<Watcher>> dataWatches =
            new HashMap<String, Set<Watcher>>();
        private final Map<String, Set<Watcher>> existWatches =
            new HashMap<String, Set<Watcher>>();
        private final Map<String, Set<Watcher>> childWatches =
            new HashMap<String, Set<Watcher>>();

        private volatile Watcher defaultWatcher;

        final private void addTo(Set<Watcher> from, Set<Watcher> to) {
            if (from != null) {
                to.addAll(from);
            }
        }

        /* (non-Javadoc)
         * @see org.apache.zookeeper.ClientWatchManager#materialize(Event.KeeperState, 
         *                                                        Event.EventType, java.lang.String)
         */
        @Override
        public Set<Watcher> materialize(Watcher.Event.KeeperState state,
                                        Watcher.Event.EventType type,
                                        String clientPath)
        {
            Set<Watcher> result = new HashSet<Watcher>();

            switch (type) {
            case None:
                result.add(defaultWatcher);
                boolean clear = ClientCnxn.getDisableAutoResetWatch() &&
                        state != Watcher.Event.KeeperState.SyncConnected;

                synchronized(dataWatches) {
                    for(Set<Watcher> ws: dataWatches.values()) {
                        result.addAll(ws);
                    }
                    if (clear) {
                        dataWatches.clear();
                    }
                }

                synchronized(existWatches) {
                    for(Set<Watcher> ws: existWatches.values()) {
                        result.addAll(ws);
                    }
                    if (clear) {
                        existWatches.clear();
                    }
                }

                synchronized(childWatches) {
                    for(Set<Watcher> ws: childWatches.values()) {
                        result.addAll(ws);
                    }
                    if (clear) {
                        childWatches.clear();
                    }
                }

                return result;
            case NodeDataChanged:
            case NodeCreated:
                synchronized (dataWatches) {
                    addTo(dataWatches.remove(clientPath), result);
                }
                synchronized (existWatches) {
                    addTo(existWatches.remove(clientPath), result);
                }
                break;
            case NodeChildrenChanged:
                synchronized (childWatches) {
                    addTo(childWatches.remove(clientPath), result);
                }
                break;
            case NodeDeleted:
                synchronized (dataWatches) {
                    addTo(dataWatches.remove(clientPath), result);
                }
                // XXX This shouldn't be needed, but just in case
                synchronized (existWatches) {
                    Set<Watcher> list = existWatches.remove(clientPath);
                    if (list != null) {
                        addTo(list, result);
                        LOG.warn("We are triggering an exists watch for delete! Shouldn't happen!");
                    }
                }
                synchronized (childWatches) {
                    addTo(childWatches.remove(clientPath), result);
                }
                break;
            default:
                String msg = "Unhandled watch event type " + type
                    + " with state " + state + " on path " + clientPath;
                LOG.error(msg);
                throw new RuntimeException(msg);
            }

            return result;
        }
    }

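A watcher can be registered through three methods that the ZooKeeper client provides: getData, exists, and getChildren. The registration logic is much the same in each, so let's look at getData.

  1. Wrap the watcher with new DataWatchRegistration(watcher, clientPath).
  2. Send the request to the server with cnxn.submitRequest(h, request, response, wcb).
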
public byte[] getData(final String path, Watcher watcher, Stat stat)
        throws KeeperException, InterruptedException
     {
        final String clientPath = path;
        PathUtils.validatePath(clientPath);

        // the watch contains the un-chroot path
        WatchRegistration wcb = null;
        if (watcher != null) {
            wcb = new DataWatchRegistration(watcher, clientPath);
        }

        final String serverPath = prependChroot(clientPath);

        RequestHeader h = new RequestHeader();
        h.setType(ZooDefs.OpCode.getData);
        GetDataRequest request = new GetDataRequest();
        request.setPath(serverPath);
        request.setWatch(watcher != null);
        GetDataResponse response = new GetDataResponse();
        ReplyHeader r = cnxn.submitRequest(h, request, response, wcb);
        if (r.getErr() != 0) {
            throw KeeperException.create(KeeperException.Code.get(r.getErr()),
                    clientPath);
        }
        if (stat != null) {
            DataTree.copyStat(response.getStat(), stat);
        }
        return response.getData();
    }

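What happens after a Watcher is registered (triggering and callbacks, for example) is left for interested readers to explore.

Some Watcher properties worth noting:

  • A Watcher is one-shot: once triggered it is removed and must be registered again.
  • Watcher execution is ordered: multiple Watchers run one after another, in sequence.
  • When a Watcher fires, the server actively pushes the change notification to the client.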
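
The one-shot behavior in the first point follows from the way materialize calls remove(clientPath) on the watch maps. The sketch below isolates that pattern with JDK types only: watchers for a path are removed from the map at the moment they are collected, so a second event on the same path finds nothing to notify. OneShotWatchSketch and its methods are hypothetical names, and a Consumer<String> stands in for the real Watcher interface.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Toy one-shot watch registry; names are illustrative only. */
class OneShotWatchSketch {

    private final Map<String, Set<Consumer<String>>> dataWatches = new HashMap<>();

    /** Registers a watcher for one path. */
    synchronized void register(String path, Consumer<String> watcher) {
        dataWatches.computeIfAbsent(path, p -> new HashSet<>()).add(watcher);
    }

    /** Removes and returns the watchers for the path, so each fires at most once. */
    synchronized Set<Consumer<String>> materialize(String path) {
        Set<Consumer<String>> found = dataWatches.remove(path);
        return found != null ? found : new HashSet<>();
    }

    /** Delivers an event for the path and returns how many watchers were notified. */
    int trigger(String path) {
        Set<Consumer<String>> toNotify = materialize(path);
        for (Consumer<String> watcher : toNotify) {
            watcher.accept(path);
        }
        return toNotify.size();
    }
}
```

To keep receiving events, a callback has to call register again, which is the same obligation a real ZooKeeper client has.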

6. Questions

The rest of this post looks at a few related points in question form.

Question 1: How is the Leader chosen?

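ZooKeeper's internal election is Paxos-like rather than Paxos itself: it runs its own algorithm (FastLeaderElection by default) as part of ZAB. If Paxos is unfamiliar, see the previous post, which explains the Paxos and Fast Paxos algorithms through a story.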
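
As a rough illustration of how candidate votes are ranked during an election, the sketch below compares two votes by election epoch, then by last seen transaction id (zxid), then by server id, which is the ordering used by ZooKeeper's FastLeaderElection. It is a simplification: the real election also exchanges notifications until a quorum agrees on the same vote. VoteOrderSketch, Vote, beats, and best are hypothetical names.

```java
import java.util.List;

/** Sketch of ranking election votes; names are illustrative only. */
class VoteOrderSketch {

    static final class Vote {
        final long epoch;
        final long zxid;
        final long serverId;

        Vote(long epoch, long zxid, long serverId) {
            this.epoch = epoch;
            this.zxid = zxid;
            this.serverId = serverId;
        }
    }

    /** True when candidate should replace current as the preferred vote. */
    static boolean beats(Vote candidate, Vote current) {
        if (candidate.epoch != current.epoch) {
            return candidate.epoch > current.epoch;
        }
        if (candidate.zxid != current.zxid) {
            return candidate.zxid > current.zxid; // more up-to-date data wins
        }
        return candidate.serverId > current.serverId; // final tie-breaker
    }

    /** Picks the preferred vote from a non-empty list. */
    static Vote best(List<Vote> votes) {
        Vote winner = votes.get(0);
        for (Vote v : votes) {
            if (beats(v, winner)) {
                winner = v;
            }
        }
        return winner;
    }
}
```

Preferring the highest zxid means the server with the most complete history becomes Leader, so committed data is not lost across an election.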

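Question 2: What happens when the Leader goes down?

This is where ZAB's crash recovery comes in. When the Leader server crashes, hits a network fault, or fails in some similar way, the ensemble enters crash recovery mode: it stops reading data and synchronizing data, and holds a Leader election.

Once a Leader has been elected, recovery resumes, and the ensemble then enters message broadcast mode.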

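Question 3: What is the Observer role for?

The Observer role exists to let ZooKeeper scale out. It is similar to the Follower role mentioned above, except that Followers take part in Leader election while Observers do not.

Because an Observer does not take part in elections and only needs to receive synchronized data, it can also be deployed across regions.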


That's about it for now; I'll add more when I have time.
