Curator
- Problems with developing against the native Java API
- The session connection is asynchronous and has to be handled by hand, e.g. with a CountDownLatch (see the sketch below)
- Watches have to be re-registered after every trigger, otherwise they stop firing
- Overall development complexity is fairly high
- No multi-level node create/delete in one call; you have to recurse yourself
- Curator is a framework that fixes these problems of developing distributed applications against the native Java API (its recipes include, most famously, distributed locks).
- For details, see the official documentation: https://curator.apache.org/index.html
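To see why the first point matters, here is a minimal sketch of session setup against the native ZooKeeper API; the address and timeout are placeholders, and the point is that connecting is asynchronous and must be synchronized by hand:
CountDownLatch connectedSignal = new CountDownLatch(1);
ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 5000, event -> {
    //the connect notification arrives on the event thread; signal the waiting main thread
    if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
        connectedSignal.countDown();
    }
});
connectedSignal.await(); //block until the session is actually established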
Curator Basics and Application Scenarios
Basic client operations
Session creation
public static CuratorFramework newClient(String connectString, RetryPolicy retryPolicy)
{
return newClient(connectString, DEFAULT_SESSION_TIMEOUT_MS, DEFAULT_CONNECTION_TIMEOUT_MS, retryPolicy);
}
public static CuratorFramework newClient(String connectString, int sessionTimeoutMs, int connectionTimeoutMs, RetryPolicy retryPolicy)
{
return builder().
connectString(connectString).
sessionTimeoutMs(sessionTimeoutMs).
connectionTimeoutMs(connectionTimeoutMs).
retryPolicy(retryPolicy).
build();
}
The parameters are as follows:
Parameter | Description
---|---
connectString | ZooKeeper server addresses, comma-separated, e.g. host1:port1,host2:port2
sessionTimeoutMs | Session timeout; DEFAULT_SESSION_TIMEOUT_MS = 60 * 1000 ms
connectionTimeoutMs | Connection timeout; DEFAULT_CONNECTION_TIMEOUT_MS = 15 * 1000 ms
retryPolicy | Retry policy (see the sketch below)
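Besides ExponentialBackoffRetry, Curator ships several RetryPolicy implementations in org.apache.curator.retry; a quick sketch of common choices (the numbers are illustrative):
//retry up to 3 times, sleeping 1000 ms between attempts
RetryPolicy nTimes = new RetryNTimes(3, 1000);
//retry exactly once, after sleeping 1000 ms
RetryPolicy oneTime = new RetryOneTime(1000);
//exponential backoff: base sleep 1000 ms, at most 3 retries
RetryPolicy backoff = new ExponentialBackoffRetry(1000, 3);
//keep retrying forever, sleeping 1000 ms between attempts
RetryPolicy forever = new RetryForever(1000);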
Creating the client object
public static void main(String[] args) {
String zkServerAddress = "127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184";
ExponentialBackoffRetry retryPolicy = new ExponentialBackoffRetry(1000,3,5000);
CuratorFramework zkClient = CuratorFrameworkFactory.builder()
.connectString(zkServerAddress)
.sessionTimeoutMs(5000)
.connectionTimeoutMs(5000)
.retryPolicy(retryPolicy)
.build();
    zkClient.getConnectionStateListenable().addListener((client, newState) -> {
        if (newState == ConnectionState.CONNECTED) {
            log.info("connected successfully");
        }
    });
    //Important: start() must be called to establish the session
    zkClient.start();
try {
Thread.sleep(Integer.MAX_VALUE);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
Creating nodes: create()
forPath
public static void main(String[] args) {
CuratorFramework zkClient = getZkClient();
try {
        Stat stat = zkClient.checkExists().forPath("/createNode");
        if (stat == null) {
            //zkClient.create().forPath("/createNode");
            zkClient.create().forPath("/createNodeWithData", "chenyin".getBytes());
        } else {
            log.info("node already exists");
        }
} catch (Exception e) {
e.printStackTrace();
}
}
creatingParentsIfNeeded (creating a node two levels deep)
public static void main(String[] args) {
CuratorFramework zkClient = getZkClient();
try {
        //If the parent nodes don't exist when creating a node, they are created recursively; note that parents created this way are always persistent nodes
zkClient.create().creatingParentsIfNeeded().forPath("/createNode1/createNode2");
} catch (Exception e) {
e.printStackTrace();
}
}
/**
 * Create a node of the given type; the default is a persistent node
 */
public T withMode(CreateMode mode);
The CreateMode enum values are:
PERSISTENT (persistent node)
PERSISTENT_SEQUENTIAL (persistent sequential node)
EPHEMERAL (ephemeral node)
EPHEMERAL_SEQUENTIAL (ephemeral sequential node)
//The client asks the server to create a node; the server creates it, but crashes while sending the response. The client treats the request as failed (the server restarts within the session window) even though the node was created. After Curator's automatic retry, a sequential node would then be created twice.
//Protection mode guards against such orphaned ("zombie") nodes
zkClient.create()
        //a unique ID is embedded in the node name at creation; on retry, Curator uses it to check whether the create already succeeded and skips creating again if so
        .withProtection()
        .withMode(CreateMode.EPHEMERAL_SEQUENTIAL)
        .forPath("/curator-node", "some-data".getBytes());
The created node name carries the UUID, so the node cannot be created twice: if a retry happens because of a network failure, Curator checks whether a node with that UUID already exists and, if it does, does not create it again.
Reading nodes: getData()
- Reading node data: forPath() returns a byte[] that you convert to whatever type you need
public static void main(String[] args) {
CuratorFramework zkClient = getZkClient();
try {
byte[] bytes = zkClient.getData().forPath("/createNodeWithData");
System.out.println(new String(bytes, StandardCharsets.UTF_8));
} catch (Exception e) {
e.printStackTrace();
}
    //get the children of the current node
    List<String> strings = zkClient.getChildren().forPath("/createNodeWithData");
}
- Reading the node's Stat (metadata)
public static void main(String[] args) {
CuratorFramework zkClient = getZkClient();
try {
Stat stat = new Stat();
byte[] bytes = zkClient.getData().storingStatIn(stat).forPath("/createNodeWithData");
System.out.println(new String(bytes, StandardCharsets.UTF_8));
System.out.println(JSON.toJSONString(stat));
} catch (Exception e) {
e.printStackTrace();
}
}
//run the callback asynchronously on a custom thread pool
ExecutorService executorService = Executors.newSingleThreadExecutor();
zkClient.getData().inBackground((client, event) -> {
    log.info("xxxxxx");
}, executorService).forPath("/zk-node");
Updating nodes: setData()
- Update the node with default data
zkClient.setData().forPath(path)
- Update the node with the given data
zkClient.setData().forPath(path, byte[] data)
- CAS update: version is the dataVersion field from the node's Stat and is incremented on every update, so this is equivalent to "update node set data = newData where version = oldVersion" in a database, i.e. an optimistic-lock update
zkClient.setData().withVersion(version).forPath(path, byte[] data)
- Example:
- Create a node named updateNode and read its version
- Run two updates, both using the version read at creation time, and observe the result
public static void main(String[] args) {
CuratorFramework zkClient = getZkClient();
try {
String path = "/updateNode";
String data = "data";
zkClient.create().forPath(path, data.getBytes());
Stat stat = new Stat();
zkClient.getData().storingStatIn(stat).forPath(path);
System.out.println("第一次创建节点,节点版本号:"+stat.getVersion());
//第一次更新 此时version为0
zkClient.setData().withVersion(stat.getVersion()).forPath(path);
//第二次更新 此时version为1 再用0来更新 报错
zkClient.setData().withVersion(stat.getVersion()).forPath(path);
} catch (Exception e) {
e.printStackTrace();
}
}
//running the second update throws:
org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /updateNode
Deleting nodes: delete()
- Delete a leaf node
zkClient.delete().forPath(path)
- Recursively delete a node together with its children
zkClient.delete().deletingChildrenIfNeeded().forPath(path)
- Guaranteed delete (with retry on failure): a delete issued by the zk client can fail; with guaranteed(), the client records the failed delete request and keeps retrying for as long as the session is valid, until the delete succeeds (the options can be chained; see the sketch below)
zkClient.delete().guaranteed().forPath(path)
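These delete options can be chained; a minimal sketch combining them (the path and expectedVersion are placeholders):
zkClient.delete()
        .guaranteed()                  //keep retrying in the background if the delete fails
        .deletingChildrenIfNeeded()    //recursively delete children first
        .withVersion(expectedVersion)  //optimistic check against the node's dataVersion
        .forPath("/someNode");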
Curator Application Scenario (1): Distributed Counter
curator-recipes overview
The curator-recipes module packages common ZooKeeper usage patterns. A well-organized codebase reveals its features from the package names alone; the recipes package layout is summarized below.
A quick overview of the packages and what they do:
Package | Purpose
---|---
atomic | Distributed counters (DistributedAtomicLong) providing atomic increments in a distributed environment
barriers | Distributed barrier (DistributedBarrier): blocks processes across the cluster until a given condition is met
cache | Watch utilities: NodeCache (watches a node's data), PathChildrenCache (watches the data of a node's children), TreeCache (watches both the node's own data and its children's)
leader | Leader election
locks | Distributed locks
nodes | PersistentNode: a node that is kept alive (re-created) even if the client's connection or session to zk drops
queue | Distributed queues (including the priority queue DistributedPriorityQueue, the delay queue DistributedDelayQueue, etc.)
shared | SharedCount, a distributed counter
Distributed, thread-safe atomic increments
Below is Curator's ZooKeeper-based distributed counter.
The recipes package provides DistributedAtomicInteger, DistributedAtomicLong and other distributed atomic counters.
Basic usage
public class Zookeeper {
static CountDownLatch countDownLatch = new CountDownLatch(10);
public static void main(String[] args) throws Exception {
CuratorFramework zkClient = getZkClient();
        //counter path and retry policy
DistributedAtomicInteger distributedAtomicInteger = new DistributedAtomicInteger(zkClient, "/counter", new ExponentialBackoffRetry(1000, 3));
        //increment from 10 threads, 100 times each
for (int i = 0; i < 10; i++) {
new Thread(() -> {
for (int j = 0; j < 100; j++) {
try {
                    //increment via add()
AtomicValue<Integer> result = distributedAtomicInteger.add(1);
} catch (Exception e) {
e.printStackTrace();
}
}
countDownLatch.countDown();
}).start();
}
try {
countDownLatch.await();
} catch (InterruptedException e) {
e.printStackTrace();
}
        //check the result
        System.out.println("final value after concurrent increments: " + distributedAtomicInteger.get().postValue());
}
private static CuratorFramework getZkClient() {
String zkServerAddress = "127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184";
ExponentialBackoffRetry retryPolicy = new ExponentialBackoffRetry(1000, 3, 5000);
CuratorFramework zkClient = CuratorFrameworkFactory.builder()
.connectString(zkServerAddress)
.sessionTimeoutMs(5000)
.connectionTimeoutMs(5000)
.retryPolicy(retryPolicy)
.build();
zkClient.start();
return zkClient;
}
}
In theory the final value should be 10 * 100 = 1000, but run it a few times and you'll find the actual value isn't always 1000. Why? Is this thing fake?
Look at the add() method called in AtomicValue<Integer> result = distributedAtomicInteger.add(1):
/**
* Add delta to the current value and return the new value information. Remember to always
* check {@link AtomicValue#succeeded()}.
*
* @param delta amount to add
* @return value info
* @throws Exception ZooKeeper errors
*/
@Override
public AtomicValue<Integer> add(Integer delta) throws Exception
{
return worker(delta);
}
Note the sentence "Remember to always check {@link AtomicValue#succeeded()}": the increment is not guaranteed to succeed. The retry policy passed in when the counter was constructed controls how Curator retries when concurrent increments collide; if the retries are exhausted and the update still fails, the result reports failure.
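Given that, a safer increment always inspects the returned AtomicValue; a minimal sketch reusing the counter from the example above:
AtomicValue<Integer> result = distributedAtomicInteger.add(1);
if (result.succeeded()) {
    System.out.println("before: " + result.preValue() + ", after: " + result.postValue());
} else {
    //all optimistic tries (and lock promotion, if configured) failed;
    //the caller decides whether to retry or give up
}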
How it works
add() delegates to worker(delta); its source:
private AtomicValue<Integer> worker(final Integer addAmount) throws Exception
{
Preconditions.checkNotNull(addAmount, "addAmount cannot be null");
MakeValue makeValue = new MakeValue()
{
@Override
public byte[] makeFrom(byte[] previous)
{
int previousValue = (previous != null) ? bytesToValue(previous) : 0;
int newValue = previousValue + addAmount;
return valueToBytes(newValue);
}
};
AtomicValue<byte[]> result = value.trySet(makeValue);
return new AtomicInteger(result);
}
Following the call into trySet():
AtomicValue<byte[]> trySet(MakeValue makeValue) throws Exception
{
    MutableAtomicValue<byte[]> result = new MutableAtomicValue<byte[]>(null, null, false);
    //first try an optimistic (CAS-style) update
    tryOptimistic(result, makeValue);
    //if the optimistic update failed and a lock is configured, take the distributed lock and update under it
    if ( !result.succeeded() && (mutex != null) )
    {
        tryWithMutex(result, makeValue);
    }
    return result;
}
Optimistic update
The method name gives the idea away. First look at tryOptimistic(result, makeValue); the makeValue object carries the logic that computes the new value from the current one.
private void tryOptimistic(MutableAtomicValue<byte[]> result, MakeValue makeValue) throws Exception
{
    long startMs = System.currentTimeMillis();
    //retry count
    int retryCount = 0;
    //whether the update has succeeded
    boolean done = false;
    while ( !done )
    {
        //bump the optimistic-try statistics
        result.stats.incrementOptimisticTries();
        //if the update succeeds, finish
        if ( tryOnce(result, makeValue) )
        {
            result.succeeded = true;
            done = true;
        }
        else
        {
            //otherwise ask the retry policy whether another attempt is allowed
            //if not, give up
            if ( !retryPolicy.allowRetry(retryCount++, System.currentTimeMillis() - startMs, RetryLoop.getDefaultRetrySleeper()) )
            {
                done = true;
            }
        }
    }
    result.stats.setOptimisticTimeMs(System.currentTimeMillis() - startMs);
}
The actual update happens in tryOnce():
private boolean tryOnce(MutableAtomicValue<byte[]> result, MakeValue makeValue) throws Exception
{
    Stat stat = new Stat();
    //whether the node needs creating: false if it exists (its Stat is stored into stat), true if it doesn't
    boolean createIt = getCurrentValue(result, stat);
    boolean success = false;
    try
    {
        //compute the expected new value
        byte[] newValue = makeValue.makeFrom(result.preValue);
        if ( createIt )
        {
            //node doesn't exist: create it
            client.create().creatingParentContainersIfNeeded().forPath(path, newValue);
        }
        else
        {
            //node exists: optimistic update using the dataVersion from its Stat
            client.setData().withVersion(stat.getVersion()).forPath(path, newValue);
        }
        result.postValue = Arrays.copyOf(newValue, newValue.length);
        success = true;
    }
    catch ( KeeperException.NodeExistsException e )
    {
        // do Retry
    }
    ...
    return success;
}
Update under the distributed lock
If the optimistic update fails, Curator takes a distributed lock and updates the value under it; tryWithMutex(result, makeValue) follows:
//the distributed lock
private final InterProcessMutex mutex;
private void tryWithMutex(MutableAtomicValue<byte[]> result, MakeValue makeValue) throws Exception
{
    long startMs = System.currentTimeMillis();
    int retryCount = 0;
    //lock acquired
    if ( mutex.acquire(promotedToLock.getMaxLockTime(), promotedToLock.getMaxLockTimeUnit()) )
    {
        try
        {
            boolean done = false;
            while ( !done )
            {
                //bump the promoted-try statistics
                result.stats.incrementPromotedTries();
                //try the optimistic update (now under the lock)
                if ( tryOnce(result, makeValue) )
                {
                    result.succeeded = true;
                    done = true;
                }
                else
                {
                    if ( !promotedToLock.getRetryPolicy().allowRetry(retryCount++, System.currentTimeMillis() - startMs, RetryLoop.getDefaultRetrySleeper()) )
                    {
                        done = true;
                    }
                }
            }
        }
        finally
        {
            mutex.release();
        }
    }
    result.stats.setPromotedTimeMs(System.currentTimeMillis() - startMs);
}
As you can see, the locked path only differs from the optimistic path in acquiring the distributed lock first; everything else is identical.
The overall flow is simple enough that no diagram is needed:
- First try an optimistic update
- If the counter node at the zkPath does not exist, create it with the new value
- If it exists, update it optimistically using the dataVersion from its Stat
- If the update succeeds, return
- If it fails, ask the retry policy whether another attempt is allowed; if so, repeat steps 2-3
- If the whole optimistic phase (steps 1-5) fails, take the lock and run the same optimistic update inside the critical section (see the sketch below)
- If taking the lock fails, report failure
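Note that mutex in trySet() is only non-null when the counter is constructed with a PromotedToLock; otherwise the lock-promotion branch is skipped entirely. A sketch of enabling it (paths and timeouts are illustrative):
PromotedToLock promotedToLock = PromotedToLock.builder()
        .lockPath("/counter_lock")                         //path used by the internal InterProcessMutex
        .retryPolicy(new ExponentialBackoffRetry(1000, 3)) //retry policy used while holding the lock
        .timeout(100, TimeUnit.MILLISECONDS)               //max time to wait for the lock
        .build();
DistributedAtomicInteger counter = new DistributedAtomicInteger(
        zkClient, "/counter", new ExponentialBackoffRetry(1000, 3), promotedToLock);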
Curator Application Scenario (2): The Watch Mechanism
Native watches: usingWatcher
First look at the return type of org.apache.curator.framework.CuratorFramework#getData.
The structure of GetDataBuilder:
public interface GetDataBuilder extends
    Watchable<BackgroundPathable<byte[]>>,          //watch support
    BackgroundPathable<byte[]>,                     //async support
    Statable<WatchPathable<byte[]>>,                //Stat storage support
    Decompressible<GetDataWatchBackgroundStatable>  //data decompression
{
}
The org.apache.curator.framework.api.Watchable interface contains:
public interface Watchable<T>
{
public T watched();
    //register a watcher
public T usingWatcher(Watcher watcher);
public T usingWatcher(CuratorWatcher watcher);
}
We can register a watcher on a node with usingWatcher():
public class Watcher {
public static void main(String[] args) throws Exception {
CuratorFramework zkClient = getZkClient();
String path = "/watchNode";
byte[] initData = "initData".getBytes();
//先创建一个用于事件监听的测试节点
zkClient.create().forPath(path, initData);
//设置监听器
zkClient.getData().usingWatcher(new org.apache.zookeeper.Watcher() {
@Override
public void process(WatchedEvent watchedEvent) {
System.out.println("监听到节点事件:" + JSON.toJSONString(watchedEvent));
}
}).forPath(path);
//第一次更新
zkClient.setData().forPath(path, "1".getBytes());
//第二次更新
zkClient.setData().forPath(path, "2".getBytes());
//Sleep等待监听事件触发
Thread.sleep(Integer.MAX_VALUE);
}
private static CuratorFramework getZkClient() {
String zkServerAddress = "127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184";
ExponentialBackoffRetry retryPolicy = new ExponentialBackoffRetry(1000, 3, 5000);
CuratorFramework zkClient = CuratorFrameworkFactory.builder()
.connectString(zkServerAddress)
.sessionTimeoutMs(5000)
.connectionTimeoutMs(5000)
.retryPolicy(retryPolicy)
.build();
zkClient.start();
return zkClient;
}
}
Although the code updates the node twice, the watcher fires only once, with type NodeDataChanged. This is the limitation of native watch events: they trigger exactly once and then have to be re-registered.
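The classic native-style workaround is to re-register the watcher from inside its own callback; a minimal sketch using CuratorWatcher, reusing zkClient and path from the example above:
CuratorWatcher selfRenewing = new CuratorWatcher() {
    @Override
    public void process(WatchedEvent event) throws Exception {
        System.out.println("node event received: " + event.getType());
        //re-register so the next change is observed as well
        zkClient.getData().usingWatcher(this).forPath(path);
    }
};
zkClient.getData().usingWatcher(selfRenewing).forPath(path);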
Curator-Cache
To spare developers the chore of re-registering watchers, org.apache.curator.framework.recipes.cache provides a higher-level wrapper over watch events. The three main classes are:
Class | Purpose
---|---
NodeCache | Watches creation, deletion and updates of the node itself
PathChildrenCache | Watches creation, deletion and updates of the node's direct children
TreeCache | Treats the given path as the root of a subtree and watches operations on the node and all of its descendants, i.e. tree-wide watching
NodeCache
- Usage
public class Watcher {
public static void main(String[] args) throws Exception {
CuratorFramework zkClient = getZkClient();
String path = "/nodeCache";
byte[] initData = "initData".getBytes();
//创建节点用于测试
zkClient.create().forPath(path, initData);
//new NodeCache()需要传入的是zkClient和path(要监控的路径)
NodeCache nodeCache = new NodeCache(zkClient, path);
//调用start方法开始监听
nodeCache.start();
//添加NodeCacheListener监听器
nodeCache.getListenable().addListener(new NodeCacheListener() {
@Override
public void nodeChanged() throws Exception {
System.out.println("监听到事件变化,当前数据:"+new String(nodeCache.getCurrentData().getData()));
}
});
//第一次更新
zkClient.setData().forPath(path, "first update".getBytes());
Thread.sleep(1000);
//第二次更新
zkClient.setData().forPath(path, "second update".getBytes());
Thread.sleep(1000);
//第三次更新
zkClient.setData().forPath(path, "third update".getBytes());
Thread.sleep(Integer.MAX_VALUE);
}
private static CuratorFramework getZkClient() {
String zkServerAddress = "127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184";
ExponentialBackoffRetry retryPolicy = new ExponentialBackoffRetry(1000, 3, 5000);
CuratorFramework zkClient = CuratorFrameworkFactory.builder()
.connectString(zkServerAddress)
.sessionTimeoutMs(5000)
.connectionTimeoutMs(5000)
.retryPolicy(retryPolicy)
.build();
zkClient.start();
return zkClient;
}
}
The console output below shows that the watch does not go stale; it keeps firing:
node changed, current data: first update
node changed, current data: second update
node changed, current data: third update
Now let's analyze how NodeCache works. The core classes are:
org.apache.curator.framework.recipes.cache.NodeCache
//the listener type NodeCache accepts
org.apache.curator.framework.recipes.cache.NodeCacheListener
//the internal object NodeCache uses to store node data
org.apache.curator.framework.recipes.cache.ChildData
- Start with NodeCache itself and its fields:
public class NodeCache implements Closeable
{
    //zk client
    private final CuratorFramework client;
    //watched path
    private final String path;
    //whether data compression is enabled
    private final boolean dataIsCompressed;
    //local copy of the watched zk node; AtomicReference guarantees atomic updates
    private final AtomicReference<ChildData> data = new AtomicReference<ChildData>(null);
    //NodeCache state; AtomicReference guarantees atomic updates
    private final AtomicReference<State> state = new AtomicReference<State>(State.LATENT);
    //registered listeners
    private final ListenerContainer<NodeCacheListener> listeners = new ListenerContainer<NodeCacheListener>();
    //connection flag
    private final AtomicBoolean isConnected = new AtomicBoolean(true);
    //connection-state listener on the zk client
    private ConnectionStateListener connectionStateListener = new ConnectionStateListener()
    {
        @Override
        public void stateChanged(CuratorFramework client, ConnectionState newState)
        {
            if ( (newState == ConnectionState.CONNECTED) || (newState == ConnectionState.RECONNECTED) )
            {
                if ( isConnected.compareAndSet(false, true) )
                {
                    try
                    {
                        //on (re)connect, re-register the watch on the node (usingWatcher)
                        reset();
                    }
                    catch ( Exception e )
                    {
                        ThreadUtils.checkInterrupted(e);
                        log.error("Trying to reset after reconnection", e);
                    }
                }
            }
            else
            {
                isConnected.set(false);
            }
        }
    };
    //the watcher passed to usingWatcher
    private Watcher watcher = new Watcher()
    {
        @Override
        public void process(WatchedEvent event)
        {
            try
            {
                reset();
            }
            catch(Exception e)
            {
                ThreadUtils.checkInterrupted(e);
                handleException(e);
            }
        }
    };
    //NodeCache lifecycle states
    private enum State
    {
        //initial
        LATENT,
        //after start()
        STARTED,
        //after close()
        CLOSED
    }
    //async background callback; on completion it re-registers the watch
    private final BackgroundCallback backgroundCallback = new BackgroundCallback()
    {
        @Override
        public void processResult(CuratorFramework client, CuratorEvent event) throws Exception
        {
            processBackgroundResult(event);
        }
    };
}
The key fields are:
AtomicReference<ChildData> data: the local backup copy of the watched zk node
Watcher watcher: the watcher that gets re-registered every time it fires
- Now follow the flow from the entry point, the start() method:
public void start(boolean buildInitial) throws Exception
{
    //state is an AtomicReference; the atomic CAS guarantees the NodeCache instance can only be started once
    Preconditions.checkState(state.compareAndSet(State.LATENT, State.STARTED), "Cannot be started more than once");
    //register a connection listener on the zkClient; it calls reset() when a connection event arrives
    client.getConnectionStateListenable().addListener(connectionStateListener);
    //if buildInitial is true, internalRebuild() caches the zk node data locally during startup
    if ( buildInitial )
    {
        client.checkExists().creatingParentContainersIfNeeded().forPath(path);
        internalRebuild();
    }
    //register the usingWatcher watch
    reset();
}
- First, internalRebuild(), which initializes the local node data:
private void internalRebuild() throws Exception
{
    try
    {
        Stat stat = new Stat();
        //if compression is enabled, decompress first; otherwise read directly
        byte[] bytes = dataIsCompressed ? client.getData().decompressed().storingStatIn(stat).forPath(path) : client.getData().storingStatIn(stat).forPath(path);
        //wrap the node into a local ChildData and store it
        data.set(new ChildData(path, stat, bytes));
    }
    catch ( KeeperException.NoNodeException e )
    {
        data.set(null);
    }
}
- reset() shows up everywhere; here is its source:
private void reset() throws Exception
{
    //only if the NodeCache is STARTED and the zk connection is up
    if ( (state.get() == State.STARTED) && isConnected.get() )
    {
        //register the watcher on the node and, using zk's async mode, handle the result in backgroundCallback
        client.checkExists().creatingParentContainersIfNeeded().usingWatcher(watcher).inBackground(backgroundCallback).forPath(path);
    }
}
- The async callback backgroundCallback delegates to processBackgroundResult(), which handles the result after a node change; the main reason for going async is to avoid blocking the thread. The code:
private void processBackgroundResult(CuratorEvent event) throws Exception
{
    switch ( event.getType() )
    {
        //a getData() call that completed successfully
        case GET_DATA:
        {
            if ( event.getResultCode() == KeeperException.Code.OK.intValue() )
            {
                ChildData childData = new ChildData(path, event.getStat(), event.getData());
                //cache the data and fire NodeCacheListener.nodeChanged()
                setNewData(childData);
            }
            break;
        }
        //EXISTS corresponds to a checkExists() call
        case EXISTS:
        {
            //node doesn't exist: clear the cache
            if ( event.getResultCode() == KeeperException.Code.NONODE.intValue() )
            {
                setNewData(null);
            }
            //node exists and the call succeeded
            else if ( event.getResultCode() == KeeperException.Code.OK.intValue() )
            {
                if ( dataIsCompressed )
                {
                    //compression enabled: decompress, then re-register the watcher and the async callback
                    client.getData().decompressed().usingWatcher(watcher).inBackground(backgroundCallback).forPath(path);
                }
                else
                {
                    //re-register the watcher and the async callback
                    client.getData().usingWatcher(watcher).inBackground(backgroundCallback).forPath(path);
                }
            }
            break;
        }
    }
}
- setNewData(), which updates the cached node data:
private void setNewData(ChildData newData) throws InterruptedException
{
    //atomically swap the locally cached node data
    ChildData previousData = data.getAndSet(newData);
    if ( !Objects.equal(previousData, newData) )
    {
        //fire nodeChanged() on every registered listener
        listeners.forEach
        (
            new Function<NodeCacheListener, Void>()
            {
                @Override
                public Void apply(NodeCacheListener listener)
                {
                    try
                    {
                        listener.nodeChanged();
                    }
                    catch ( Exception e )
                    {
                        ThreadUtils.checkInterrupted(e);
                        log.error("Calling listener", e);
                    }
                    return null;
                }
            }
        );
        //test hook, can be ignored
        if ( rebuildTestExchanger != null )
        {
            try
            {
                rebuildTestExchanger.exchange(new Object());
            }
            catch ( InterruptedException e )
            {
                Thread.currentThread().interrupt();
            }
        }
    }
}
- The key to how NodeCache avoids manual re-registration is processBackgroundResult(): it works like a state machine (recursively), repeatedly re-registering the watcher and the async callback, refreshing the node data cached in the NodeCache, and firing the listeners' nodeChanged() events
- The overall flow:
- Call start()
- A connection listener is registered on the zk client
- If the build-initial flag is true, the zk node data is read and cached locally
- reset() registers a watcher on the node via usingWatcher() together with the async callback backgroundCallback(), which keeps refreshing the local node cache and re-registering the watcher
PathChildrenCache
- Usage
public class Watcher {
public static void main(String[] args) throws Exception {
CuratorFramework zkClient = getZkClient();
String path = "/pathChildrenCache";
byte[] initData = "initData".getBytes();
//创建节点用于测试
zkClient.create().forPath(path, initData);
PathChildrenCache pathChildrenCache = new PathChildrenCache(zkClient, path, true);
//调用start方法开始监听 ,设置启动模式为同步加载节点数据
pathChildrenCache.start(PathChildrenCache.StartMode.BUILD_INITIAL_CACHE);
//添加监听器
pathChildrenCache.getListenable().addListener(new PathChildrenCacheListener() {
@Override
public void childEvent(CuratorFramework client, PathChildrenCacheEvent event) throws Exception {
System.out.println("节点数据变化,类型:" + event.getType() + ",路径:" + event.getData().getPath());
}
});
String childNodePath = path + "/child";
//创建子节点
zkClient.create().forPath(childNodePath, "111".getBytes());
Thread.sleep(1000);
//更新子节点
zkClient.setData().forPath(childNodePath, "222".getBytes());
Thread.sleep(1000);
//删除子节点
zkClient.delete().forPath(childNodePath);
Thread.sleep(Integer.MAX_VALUE);
}
private static CuratorFramework getZkClient() {
String zkServerAddress = "127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184";
ExponentialBackoffRetry retryPolicy = new ExponentialBackoffRetry(1000, 3, 5000);
CuratorFramework zkClient = CuratorFrameworkFactory.builder()
.connectString(zkServerAddress)
.sessionTimeoutMs(5000)
.connectionTimeoutMs(5000)
.retryPolicy(retryPolicy)
.build();
zkClient.start();
return zkClient;
}
}
- The output shows the child create (CHILD_ADDED), delete (CHILD_REMOVED) and update (CHILD_UPDATED) events were all observed
- Two things to watch out for with PathChildrenCache:
- It cannot watch the node at the monitored path itself (changes to the path node are not reported)
- It only watches the direct children under the path (it sees changes to /path/node1 but not to /path/node1/node2)
- PathChildrenCache.start() accepts one of three start modes:
- NORMAL: the initial cache is empty
- BUILD_INITIAL_CACHE: before start() returns, the data of every child is fetched and cached
- POST_INITIALIZED_EVENT: the cache is initialized asynchronously in the background, and an INITIALIZED event is published when done (see the sketch below)
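With POST_INITIALIZED_EVENT, the listener can tell the initial replay apart from real changes by checking for the INITIALIZED event type; a sketch reusing the pathChildrenCache from the example above:
pathChildrenCache.start(PathChildrenCache.StartMode.POST_INITIALIZED_EVENT);
pathChildrenCache.getListenable().addListener((client, event) -> {
    if (event.getType() == PathChildrenCacheEvent.Type.INITIALIZED) {
        //fired once, after the background load of the existing children has finished
        System.out.println("initialized, cached children: " + pathChildrenCache.getCurrentData().size());
    }
});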
TreeCache
public class Watcher {
public static void main(String[] args) throws Exception {
CuratorFramework zkClient = getZkClient();
String path = "/treeCache";
byte[] initData = "initData".getBytes();
//创建节点用于测试
zkClient.create().forPath(path, initData);
TreeCache treeCache = new TreeCache(zkClient, path);
//调用start方法开始监听
treeCache.start();
//添加TreeCacheListener监听器
treeCache.getListenable().addListener(new TreeCacheListener() {
@Override
public void childEvent(CuratorFramework client, TreeCacheEvent event) throws Exception {
System.out.println("监听到节点数据变化,类型:"+event.getType()+",路径:"+event.getData().getPath());
}
});
Thread.sleep(1000);
//更新父节点数据
zkClient.setData().forPath(path, "222".getBytes());
Thread.sleep(1000);
String childNodePath = path + "/child";
//创建子节点
zkClient.create().forPath(childNodePath, "111".getBytes());
Thread.sleep(1000);
//更新子节点
zkClient.setData().forPath(childNodePath, "222".getBytes());
Thread.sleep(1000);
//删除子节点
zkClient.delete().forPath(childNodePath);
Thread.sleep(Integer.MAX_VALUE);
}
private static CuratorFramework getZkClient() {
String zkServerAddress = "127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184";
ExponentialBackoffRetry retryPolicy = new ExponentialBackoffRetry(1000, 3, 5000);
CuratorFramework zkClient = CuratorFrameworkFactory.builder()
.connectString(zkServerAddress)
.sessionTimeoutMs(5000)
.connectionTimeoutMs(5000)
.retryPolicy(retryPolicy)
.build();
zkClient.start();
return zkClient;
}
}
[Figure 1: console output showing the parent node's data-change event, followed by a NullPointerException]
[Figure 2: console output showing the child node's create, update and delete events]
Figure 1 shows that the parent node's data change was observed, but a NullPointerException was thrown along with it. Unlike NodeCache and PathChildrenCache, TreeCache's start() takes no flag for pre-populating the local cache, so right after start() the local node cache is not yet initialized; calling event.getData() in the listener at that moment reads from that local cache and blows up. The correct approach is to branch on the event type inside the listener instead of calling getData() blindly (see the sketch below).
Figure 2 shows that the child node's changes were observed as well.
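A defensive TreeCacheListener therefore switches on the event type and null-checks getData(); a minimal sketch reusing the treeCache from above:
treeCache.getListenable().addListener((client, event) -> {
    switch (event.getType()) {
        case NODE_ADDED:
        case NODE_UPDATED:
        case NODE_REMOVED:
            //only node events are guaranteed to carry data
            if (event.getData() != null) {
                System.out.println(event.getType() + " at " + event.getData().getPath());
            }
            break;
        default:
            //INITIALIZED and connection events carry no node data
            break;
    }
});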
Curator Application Scenario (3): LeaderLatch and LeaderSelector, Usage and Internals
LeaderLatch
How it works
Pick a root path, e.g. "/leader_latch". Multiple machines concurrently create ephemeral sequential nodes under it, such as "/leader_latch/node_1", "/leader_latch/node_2", "/leader_latch/node_3". The zk client whose node has the smallest sequence number (node_1 here) becomes the leader; every client that did not win watches the deletion of the node immediately before its own and re-contends for leadership once that node is deleted.
Key classes and methods
LeaderLatch
org.apache.curator.framework.recipes.leader.LeaderLatch
- Key methods:
//start contending for leadership
void start()
//relinquish leadership
void close()
//block the calling thread waiting to acquire leadership; may time out and return false (usage sketch below)
boolean await(long, java.util.concurrent.TimeUnit)
//whether this instance currently holds leadership
boolean hasLeadership()
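A minimal sketch of the await() style, reusing the getZkClient() helper shown later in this section (the path and id are placeholders):
LeaderLatch latch = new LeaderLatch(getZkClient(), "/leader_latch", "client-1");
latch.start();
//block for up to 10 seconds trying to become leader; false means we timed out
if (latch.await(10, TimeUnit.SECONDS)) {
    try {
        //leader-only work goes here
    } finally {
        latch.close(); //relinquish leadership
    }
}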
LeaderLatchListener
org.apache.curator.framework.recipes.leader.LeaderLatchListener
- Key methods:
//called when leadership is acquired
void isLeader()
//called when leadership is lost
void notLeader()
Usage
First, the LeaderLatch constructors:
public LeaderLatch(CuratorFramework client, String latchPath)
{
this(client, latchPath, "", CloseMode.SILENT);
}
public LeaderLatch(CuratorFramework client, String latchPath, String id)
{
this(client, latchPath, id, CloseMode.SILENT);
}
public LeaderLatch(CuratorFramework client, String latchPath, String id, CloseMode closeMode)
{
this.client = Preconditions.checkNotNull(client, "client cannot be null");
this.latchPath = PathUtils.validatePath(latchPath);
this.id = Preconditions.checkNotNull(id, "id cannot be null");
this.closeMode = Preconditions.checkNotNull(closeMode, "closeMode cannot be null");
}
The parameters are as follows:
Parameter | Description
---|---
client | zk client instance
latchPath | root path for the leader election
id | client id, used to label the client (its number or name)
closeMode | latch close strategy: SILENT does not fire listener callbacks on close, NOTIFY_LEADER does; the default is not to fire them
- To register a listener, call addListener():
leaderLatch.addListener(new LeaderLatchListener() {
@Override
public void isLeader() {
}
@Override
public void notLeader() {
}
});
- Below is a test of mine simulating 10 clients contending for leadership; once a client becomes leader it calls close() to relinquish leadership and drop out of the contest:
public class LeaderLatchTest {
    static int CLIENT_COUNT = 10;
    static String LOCK_PATH = "/leader_latch";
    public static void main(String[] args) throws Exception {
        List<CuratorFramework> clientsList = Lists.newArrayListWithCapacity(CLIENT_COUNT);
        List<LeaderLatch> leaderLatchList = Lists.newArrayListWithCapacity(CLIENT_COUNT);
        //create 10 zk clients to simulate a leader election
        for (int i = 0; i < CLIENT_COUNT; i++) {
            CuratorFramework client = getZkClient();
            clientsList.add(client);
            LeaderLatch leaderLatch = new LeaderLatch(client, LOCK_PATH, "CLIENT_" + i);
            leaderLatchList.add(leaderLatch);
            //start() must be called to enter the contest
            leaderLatch.start();
        }
        //find out which client is currently the leader
        checkLeader(leaderLatchList);
    }
    private static void checkLeader(List<LeaderLatch> leaderLatchList) throws Exception {
        //leader election takes time; wait 10 seconds
        Thread.sleep(10000);
        for (int i = 0; i < leaderLatchList.size(); i++) {
            LeaderLatch leaderLatch = leaderLatchList.get(i);
            //hasLeadership() tells whether this latch holds leadership
            if (leaderLatch.hasLeadership()) {
                System.out.println("current leader: " + leaderLatch.getId());
                //relinquish leadership so a new election takes place
                leaderLatch.close();
                checkLeader(leaderLatchList);
            }
        }
    }
private static CuratorFramework getZkClient() {
String zkServerAddress = "127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184";
ExponentialBackoffRetry retryPolicy = new ExponentialBackoffRetry(1000, 3, 5000);
CuratorFramework zkClient = CuratorFrameworkFactory.builder()
.connectString(zkServerAddress)
.sessionTimeoutMs(5000)
.connectionTimeoutMs(5000)
.retryPolicy(retryPolicy)
.build();
zkClient.start();
return zkClient;
}
}
close() must be called by hand to relinquish leadership.
The console prints each of the 10 clients becoming leader in turn. If you inspect the configured latchPath on the zk server, the content looks like the figure below: every entry carries a sequence suffix. These are the ephemeral sequential nodes each client created under the latchPath while contending, all named in the form xxxxxx-latch-n, where n is the sequence number.
[Figure: children of the latchPath, each ephemeral node named with a -latch-<sequence> suffix]
Source walkthrough
- Start from the start() method again:
public void start() throws Exception
{
    //state is an AtomicReference; the atomic CAS guarantees start() can only run once
    Preconditions.checkState(state.compareAndSet(State.LATENT, State.STARTED), "Cannot be started more than once");
    startTask.set(AfterConnectionEstablished.execute(client, new Runnable()
    {
        @Override
        public void run()
        {
            try
            {
                //once the zk connection is established, initialize via internalStart()
                internalStart();
            }
            finally
            {
                startTask.set(null);
            }
        }
    }));
}
//internalStart() follows; note the synchronized keyword
private synchronized void internalStart()
{
    if ( state.get() == State.STARTED )
    {
        //register a connection listener on the zk client; it also calls reset() after a reconnect event
        client.getConnectionStateListenable().addListener(listener);
        try
        {
            //initialize
            reset();
        }
        catch ( Exception e )
        {
            ThreadUtils.checkInterrupted(e);
            log.error("An error occurred checking resetting leadership.", e);
        }
    }
}
//reset() follows
void reset() throws Exception
{
    //mark ourselves as not (yet) leader
    setLeadership(false);
    setNode(null);
    BackgroundCallback callback = new BackgroundCallback()
    {
        @Override
        public void processResult(CuratorFramework client, CuratorEvent event) throws Exception
        {
            if ( debugResetWaitLatch != null )
            {
                debugResetWaitLatch.await();
                debugResetWaitLatch = null;
            }
            if ( event.getResultCode() == KeeperException.Code.OK.intValue() )
            {
                setNode(event.getName());
                if ( state.get() == State.CLOSED )
                {
                    setNode(null);
                }
                else
                {
                    //inspect the children under latchPath
                    getChildren();
                }
            }
            else
            {
                log.error("getChildren() failed. rc = " + event.getResultCode());
            }
        }
    };
    //create an ephemeral sequential node under latchPath whose content is the id, with the async callback above
    client.create().creatingParentContainersIfNeeded().withProtection().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).inBackground(callback).forPath(ZKPaths.makePath(latchPath, LOCK_NAME), LeaderSelector.getIdBytes(id));
}
After the ephemeral sequential node is created, the BackgroundCallback fires and calls getChildren():
private void getChildren() throws Exception
{
    BackgroundCallback callback = new BackgroundCallback()
    {
        @Override
        public void processResult(CuratorFramework client, CuratorEvent event) throws Exception
        {
            if ( event.getResultCode() == KeeperException.Code.OK.intValue() )
            {
                checkLeadership(event.getChildren());
            }
        }
    };
    //list the children under latchPath; on success the async callback fires
    client.getChildren().inBackground(callback).forPath(ZKPaths.makePath(latchPath, null));
}
Finally, once the children of latchPath have been fetched, we reach checkLeadership(), the core method; eyes open now:
private void checkLeadership(List<String> children) throws Exception
{
    final String localOurPath = ourPath.get();
    //sort children by sequence number
    List<String> sortedChildren = LockInternals.getSortedChildren(LOCK_NAME, sorter, children);
    int ourIndex = (localOurPath != null) ? sortedChildren.indexOf(ZKPaths.getNodeFromPath(localOurPath)) : -1;
    if ( ourIndex < 0 )
    {
        log.error("Can't find our node. Resetting. Index: " + ourIndex);
        reset();
    }
    else if ( ourIndex == 0 )
    {
        //our node has the smallest sequence number: we won, become leader
        setLeadership(true);
    }
    else
    {
        //we lost: watch the node just before ours (the next smaller sequence number)
        String watchPath = sortedChildren.get(ourIndex - 1);
        Watcher watcher = new Watcher()
        {
            @Override
            public void process(WatchedEvent event)
            {
                //when the previous node is deleted, go through getChildren() again to re-check leadership
                if ( (state.get() == State.STARTED) && (event.getType() == Event.EventType.NodeDeleted) && (localOurPath != null) )
                {
                    try
                    {
                        getChildren();
                    }
                    catch ( Exception ex )
                    {
                        ThreadUtils.checkInterrupted(ex);
                        log.error("An error occurred checking the leadership.", ex);
                    }
                }
            }
        };
        BackgroundCallback callback = new BackgroundCallback()
        {
            @Override
            public void processResult(CuratorFramework client, CuratorEvent event) throws Exception
            {
                if ( event.getResultCode() == KeeperException.Code.NONODE.intValue() )
                {
                    // previous node is gone - reset
                    reset();
                }
            }
        };
        //watch the previous node for deletion and re-contend in the async callback
        client.getData().usingWatcher(watcher).inBackground(callback).forPath(ZKPaths.makePath(latchPath, watchPath));
    }
}
Core flow
- Each zk client creates an ephemeral sequential node under the same path, with an async callback on completion
- In the callback it checks whether its own node has the smallest sequence number
- If so, it has won leadership; if not, it watches the deletion of the previous (smaller-numbered) node and re-contends once that node is deleted
LeaderSelector
How it works
- Leadership is contended for with Curator's InterProcessMutex distributed lock; whoever acquires the lock becomes the leader
Key classes and methods
- LeaderSelector
- Key methods:
//start contending for leadership
void start()
//after acquiring and releasing leadership, automatically rejoin the contention queue
void autoRequeue()
- LeaderSelectorListener
- LeaderSelectorListener is the listener called back once a LeaderSelector client becomes leader; put the leader-only business logic in the takeLeadership() callback
//callback invoked after winning leadership
void takeLeadership()
- LeaderSelectorListenerAdapter
- LeaderSelectorListenerAdapter is an abstract class implementing LeaderSelectorListener that encapsulates the handling of suspended or lost zk connections (it throws CancelLeadershipException to cancel leadership); it is generally recommended that listeners extend this class
Usage
First, the LeaderSelector constructors:
public LeaderSelector(CuratorFramework client, String leaderPath, LeaderSelectorListener listener)
{
this(client, leaderPath, new CloseableExecutorService(Executors.newSingleThreadExecutor(defaultThreadFactory), true), listener);
}
public LeaderSelector(CuratorFramework client, String leaderPath, ExecutorService executorService, LeaderSelectorListener listener)
{
this(client, leaderPath, new CloseableExecutorService(executorService), listener);
}
The parameters are as follows:
Parameter | Description
---|---
client | zk client instance
leaderPath | root path for the leader election
executorService | thread pool used for the election
listener | listener called back once this node becomes leader
- Below is a test adapted from the official LeaderSelector demo: it simulates 10 clients running an election; the client that becomes leader has its takeLeadership() callback invoked, releases leadership once the callback returns, and, thanks to autoRequeue(), rejoins the contest afterwards
public class LeaderSelectorTest {
    static int CLIENT_COUNT = 10;
    static String LOCK_PATH = "/leader_selector";
    public static void main(String[] args) throws Exception {
        List<CuratorFramework> clientsList = Lists.newArrayListWithCapacity(CLIENT_COUNT);
        //start 10 zk clients; a leader election happens every few seconds
        for (int i = 0; i < CLIENT_COUNT; i++) {
            CuratorFramework client = getZkClient();
            clientsList.add(client);
            ExampleClient exampleClient = new ExampleClient(client, LOCK_PATH, "client_" + i);
            exampleClient.start();
        }
        //sleep so the election rounds can be observed
        Thread.sleep(Integer.MAX_VALUE);
}
static class ExampleClient extends LeaderSelectorListenerAdapter implements Closeable {
private final String name;
private final LeaderSelector leaderSelector;
private final AtomicInteger leaderCount = new AtomicInteger();
public ExampleClient(CuratorFramework client, String path, String name) {
this.name = name;
leaderSelector = new LeaderSelector(client, path, this);
            // autoRequeue() re-enters this client into the leadership contest after it relinquishes leadership
leaderSelector.autoRequeue();
}
public void start() throws IOException {
leaderSelector.start();
}
@Override
public void close() throws IOException {
leaderSelector.close();
}
@Override
public void takeLeadership(CuratorFramework client) throws Exception {
            // after winning leadership, sleep for a while, then relinquish it
final int waitSeconds = (int) (5 * Math.random()) + 1;
System.out.println(name + " is now the leader. Waiting " + waitSeconds + " seconds...");
System.out.println(name + " has been leader " + leaderCount.getAndIncrement() + " time(s) before.");
try {
Thread.sleep(TimeUnit.SECONDS.toMillis(waitSeconds));
} catch (InterruptedException e) {
System.err.println(name + " was interrupted.");
Thread.currentThread().interrupt();
} finally {
System.out.println(name + " relinquishing leadership.\n");
}
}
}
private static CuratorFramework getZkClient() {
String zkServerAddress = "127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184";
ExponentialBackoffRetry retryPolicy = new ExponentialBackoffRetry(1000, 3, 5000);
CuratorFramework zkClient = CuratorFrameworkFactory.builder()
.connectString(zkServerAddress)
.sessionTimeoutMs(5000)
.connectionTimeoutMs(5000)
.retryPolicy(retryPolicy)
.build();
zkClient.start();
return zkClient;
}
}
takeLeadership() releases leadership automatically when it returns; to keep re-contending, autoRequeue() must be called.
Source walkthrough
Again start from start(). Skipping the thin wrappers start() and requeue(), go straight to internalRequeue(); the work happens asynchronously, so the method can return immediately
private synchronized boolean internalRequeue()
{
    //not already queued, and start() has been called
    if ( !isQueued && (state.get() == State.STARTED) )
    {
        isQueued = true;
        Future<Void> task = executorService.submit(new Callable<Void>()
        {
            @Override
            public Void call() throws Exception
            {
                try
                {
                    //contend for leadership
                    doWorkLoop();
                }
                finally
                {
                    clearIsQueued();
                    //if auto-requeue is enabled, re-enter the contest
                    if ( autoRequeue.get() )
                    {
                        internalRequeue();
                    }
                }
                return null;
            }
        });
        ourTask.set(task);
        return true;
    }
    return false;
}
The flow then reaches doWork(), called from doWorkLoop(); this is the core method
void doWork() throws Exception
{
    hasLeadership = false;
    try
    {
        //grab Curator's InterProcessMutex distributed lock
        mutex.acquire();
        hasLeadership = true;
        try
        {
            //test hooks, can be ignored
            if ( debugLeadershipLatch != null )
            {
                debugLeadershipLatch.countDown();
            }
            if ( debugLeadershipWaitLatch != null )
            {
                debugLeadershipWaitLatch.await();
            }
            //lock acquired: invoke the listener's takeLeadership()
            listener.takeLeadership(client);
        }
        catch ( InterruptedException e )
        {
            Thread.currentThread().interrupt();
            throw e;
        }
        catch ( Throwable e )
        {
            ThreadUtils.checkInterrupted(e);
        }
        finally
        {
            clearIsQueued();
        }
    }
    catch ( InterruptedException e )
    {
        Thread.currentThread().interrupt();
        throw e;
    }
    finally
    {
        //after the listener returns, relinquish leadership by releasing the lock
        if ( hasLeadership )
        {
            hasLeadership = false;
            try
            {
                mutex.release();
            }
            catch ( Exception e )
            {
                ThreadUtils.checkInterrupted(e);
                log.error("The leader threw an exception", e);
                // ignore errors - this is just a safety
            }
        }
    }
}
- The flow is fairly simple: leader election is built on Curator's InterProcessMutex distributed lock, and the lock-acquisition mechanics inside InterProcessMutex are very similar to LeaderLatch's
- LeaderSelector is also more flexible than LeaderLatch: leadership is released automatically once takeLeadership() returns, and autoRequeue() re-enters the contest automatically
Differences
- A LeaderLatch only gives up leadership when close() is called, at which point the other nodes resume contending, and it does not rejoin by itself; a LeaderSelector releases leadership automatically once takeLeadership() returns, and autoRequeue() lets it contend for leadership again
- The implementations differ: LeaderSelector is built on the InterProcessMutex distributed lock
Curator Application Scenario (4): The InterProcessMutex Distributed Lock, Usage and Internals
API
- InterProcessMutex has two constructors
public InterProcessMutex(CuratorFramework client, String path)
{
this(client, path, new StandardLockInternalsDriver());
}
public InterProcessMutex(CuratorFramework client, String path, LockInternalsDriver driver)
{
this(client, path, LOCK_NAME, 1, driver);
}
The parameters are as follows:
Parameter | Description
---|---
client | Curator zk client instance
path | lock path; clients contending for the same lock must use the same path
driver | custom LockInternalsDriver for alternative lock implementations
The main methods:
//acquire the lock; blocks until acquired; reentrant
public void acquire() throws Exception
//acquire with a timeout; returns false on timeout
public boolean acquire(long time, TimeUnit unit) throws Exception
//release the lock
public void release() throws Exception
Note: every acquire() must be paired with a release(); a sketch of the idiom follows.
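The idiomatic pattern pairs acquire (possibly with a timeout) with release() in a finally block; a minimal sketch:
//try to obtain the lock, giving up after 3 seconds
if (lock.acquire(3, TimeUnit.SECONDS)) {
    try {
        //critical section
    } finally {
        lock.release(); //always release exactly what was acquired
    }
} else {
    //could not obtain the lock within the timeout
}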
Usage
The example below simulates 100 threads contending for the lock; the thread that wins sleeps for one second, then releases the lock, letting the waiting threads contend again. It's straightforward:
public class InterprocessLock {
public static void main(String[] args) {
CuratorFramework zkClient = getZkClient();
String lockPath = "/lock";
InterProcessMutex lock = new InterProcessMutex(zkClient, lockPath);
        //simulate 100 threads contending for the lock
for (int i = 0; i < 100; i++) {
new Thread(new TestThread(i, lock)).start();
}
}
static class TestThread implements Runnable {
private Integer threadFlag;
private InterProcessMutex lock;
public TestThread(Integer threadFlag, InterProcessMutex lock) {
this.threadFlag = threadFlag;
this.lock = lock;
}
@Override
public void run() {
try {
lock.acquire();
System.out.println("第"+threadFlag+"线程获取到了锁");
//等到1秒后释放锁
Thread.sleep(1000);
} catch (Exception e) {
e.printStackTrace();
}finally {
try {
lock.release();
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
private static CuratorFramework getZkClient() {
String zkServerAddress = "127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184";
ExponentialBackoffRetry retryPolicy = new ExponentialBackoffRetry(1000, 3, 5000);
CuratorFramework zkClient = CuratorFrameworkFactory.builder()
.connectString(zkServerAddress)
.sessionTimeoutMs(5000)
.connectionTimeoutMs(5000)
.retryPolicy(retryPolicy)
.build();
zkClient.start();
return zkClient;
}
}
Source walkthrough
Start from the acquire() method:
public void acquire() throws Exception
{
if ( !internalLock(-1, null) )
{
throw new IOException("Lost connection while trying to acquire lock: " + basePath);
}
}
acquire() calls internalLock(); stepping into it:
private boolean internalLock(long time, TimeUnit unit) throws Exception
{
    /*
       Note on concurrency: a given lockData instance
       can be only acted on by a single thread so locking isn't necessary
    */
    Thread currentThread = Thread.currentThread();
    //if the current thread already holds the lock, bump the reentrancy count and report success
    LockData lockData = threadData.get(currentThread);
    if ( lockData != null )
    {
        // re-entering
        lockData.lockCount.incrementAndGet();
        return true;
    }
    //delegate the actual locking to LockInternals.attemptLock()
    String lockPath = internals.attemptLock(time, unit, getLockNodeBytes());
    //on success, record this thread's lock data in the map
    if ( lockPath != null )
    {
        LockData newLockData = new LockData(currentThread, lockPath);
        threadData.put(currentThread, newLockData);
        return true;
    }
    return false;
}
Stepping into LockInternals.attemptLock():
String attemptLock(long time, TimeUnit unit, byte[] lockNodeBytes) throws Exception
{
    //start time, used later for the timeout check
    final long startMillis = System.currentTimeMillis();
    //timeout converted to milliseconds
    final Long millisToWait = (unit != null) ? unit.toMillis(time) : null;
    //node data
    final byte[] localLockNodeBytes = (revocable.get() != null) ? new byte[0] : lockNodeBytes;
    //retry count
    int retryCount = 0;
    //lock path
    String ourPath = null;
    //whether we hold the lock
    boolean hasTheLock = false;
    //whether processing is finished
    boolean isDone = false;
    //loop
    while ( !isDone )
    {
        isDone = true;
        try
        {
            //create an ephemeral sequential node under path
            ourPath = driver.createsTheLock(client, path, localLockNodeBytes);
            //contend for the lock and check whether we hold it
            hasTheLock = internalLockLoop(startMillis, millisToWait, ourPath);
        }
        catch ( KeeperException.NoNodeException e )
        {
            //retry while the retry policy allows it
            if ( client.getZookeeperClient().getRetryPolicy().allowRetry(retryCount++, System.currentTimeMillis() - startMillis, RetryLoop.getDefaultRetrySleeper()) )
            {
                isDone = false;
            }
            else
            {
                throw e;
            }
        }
    }
    if ( hasTheLock )
    {
        return ourPath;
    }
    return null;
}
createsTheLock(), which creates the ephemeral sequential node, is straightforward:
public String createsTheLock(CuratorFramework client, String path, byte[] lockNodeBytes) throws Exception
{
String ourPath;
if ( lockNodeBytes != null )
{
ourPath = client.create().creatingParentContainersIfNeeded().withProtection().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath(path, lockNodeBytes);
}
else
{
ourPath = client.create().creatingParentContainersIfNeeded().withProtection().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath(path);
}
return ourPath;
}
internalLockLoop(), which determines whether we hold the lock, is the core; pay attention now:
private boolean internalLockLoop(long startMillis, Long millisToWait, String ourPath) throws Exception
{
    boolean haveTheLock = false;
    boolean doDelete = false;
    try
    {
        if ( revocable.get() != null )
        {
            client.getData().usingWatcher(revocableWatcher).forPath(ourPath);
        }
        //spin
        while ( (client.getState() == CuratorFrameworkState.STARTED) && !haveTheLock )
        {
            //list the ephemeral sequential nodes under path, sorted by sequence number ascending
            List<String> children = getSortedChildren();
            //name of the node this thread created
            String sequenceNodeName = ourPath.substring(basePath.length() + 1); // +1 to include the slash
            //we hold the lock if our index < maxLeases; maxLeases is 1 here, so only index 0 wins, i.e. exactly one thread holds the lock
            PredicateResults predicateResults = driver.getsTheLock(client, children, sequenceNodeName, maxLeases);
            if ( predicateResults.getsTheLock() )
            {
                haveTheLock = true;
            }
            else
            {
                //path of the previous node (the next smaller sequence number)
                String previousSequencePath = basePath + "/" + predicateResults.getPathToWatch();
                synchronized(this)
                {
                    try
                    {
                        // use getData() instead of exists() to avoid leaving unneeded watchers which is a type of resource leak
                        //we didn't get the lock: watch the previous node
                        client.getData().usingWatcher(watcher).forPath(previousSequencePath);
                        if ( millisToWait != null )
                        {
                            //check for timeout
                            millisToWait -= (System.currentTimeMillis() - startMillis);
                            startMillis = System.currentTimeMillis();
                            if ( millisToWait <= 0 )
                            {
                                //timed out: exit and mark our node for deletion
                                doDelete = true; // timed out - delete our node
                                break;
                            }
                            wait(millisToWait);
                        }
                        else
                        {
                            //Object.wait(): park until notified
                            wait();
                        }
                    }
                    catch ( KeeperException.NoNodeException e )
                    {
                        // it has been deleted (i.e. lock released). Try to acquire again
                    }
                }
            }
        }
    }
    catch ( Exception e )
    {
        ThreadUtils.checkInterrupted(e);
        doDelete = true;
        throw e;
    }
    finally
    {
        //if marked for deletion, delete our node
        if ( doDelete )
        {
            deleteOurPath(ourPath);
        }
    }
    return haveTheLock;
}
The logic is clear: N threads create ephemeral sequential nodes under the same path at the same time; the one with the smallest sequence number gets the lock, and the threads that miss out call wait() and sleep until they are woken up.
So where does notify() get called to wake the other waiters?
In the watcher, which fires once the previous (smaller-numbered) node has been deleted.
First, the release() method; its source:
public void release() throws Exception
{
/*
Note on concurrency: a given lockData instance
can be only acted on by a single thread so locking isn't necessary
*/
Thread currentThread = Thread.currentThread();
LockData lockData = threadData.get(currentThread);
if ( lockData == null )
{
throw new IllegalMonitorStateException("You do not own the lock: " + basePath);
}
    //if the current thread holds the lock more than once, just decrement the count and return
int newLockCount = lockData.lockCount.decrementAndGet();
if ( newLockCount > 0 )
{
return;
}
if ( newLockCount < 0 )
{
throw new IllegalMonitorStateException("Lock count has gone negative for lock: " + basePath);
}
try
{
//释放锁
internals.releaseLock(lockData.lockPath);
}
finally
{
threadData.remove(currentThread);
}
}
Ultimately release() lands in deleteOurPath(), called from releaseLock():
void releaseLock(String lockPath) throws Exception
{
revocable.set(null);
deleteOurPath(lockPath);
}
private void deleteOurPath(String ourPath) throws Exception
{
try
{
        //simply delete the node through the client
client.delete().guaranteed().forPath(ourPath);
}
catch ( KeeperException.NoNodeException e )
{
// ignore - already deleted (possibly expired session, etc.)
}
}
Deleting the node triggers the watcher that was registered during lock contention; its contents:
private final Watcher watcher = new Watcher()
{
@Override
public void process(WatchedEvent event)
{
notifyFromWatcher();
}
};
private synchronized void notifyFromWatcher()
{
notifyAll();
}
So once a node's path is deleted, the next node in line gets notified; after notifyAll(), the waiting thread re-enters the while spin and re-checks whether it now holds the lock.