This post is a Flink 1.16 update of an earlier article on the high-availability mechanism in older versions (before Flink 1.12), using the WebMonitorEndpoint in standalone mode as the example.
It focuses on the ZooKeeper-based high-availability principle.
1. Leader election
A distributed task-scheduling system usually runs as a service cluster, but to keep tasks from being executed twice, only one leader pulls tasks from the task pool. LeaderLatch and LeaderSelector are the leader-election utilities that Curator builds on top of ZooKeeper.
LeaderLatch relies on ephemeral sequential nodes: each participant creates one, and the participant whose node has the smallest sequence number becomes leader.
LeaderSelector contends for leadership using Curator's InterProcessMutex distributed lock; whoever acquires the lock is the leader.
Both are ultimately built on the distributed-lock idea. The difference is that with LeaderSelector, a node that loses leadership automatically re-enters the election: it creates another ephemeral node under the same path and waits.
Flink uses LeaderLatch as the leader-election utility for its three major components: Dispatcher, ResourceManager, and WebMonitorEndpoint.
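The ephemeral-sequential mechanism behind LeaderLatch can be sketched in plain Java. This is a simulation of the algorithm only, not Curator or ZooKeeper code: each participant registers under an ever-increasing sequence number, the smallest live number is the leader, and when the leader's node disappears the next smallest takes over.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Simulation of ZooKeeper's ephemeral-sequential election:
// each participant gets an increasing sequence number; the
// smallest surviving number is the leader.
public class ElectionSim {
    private final AtomicLong counter = new AtomicLong();
    // seq -> participant id, sorted so firstEntry() is the leader
    private final TreeMap<Long, String> nodes = new TreeMap<>();

    public synchronized long join(String id) {
        long seq = counter.getAndIncrement();
        nodes.put(seq, id);
        return seq;
    }

    public synchronized void leave(long seq) {
        nodes.remove(seq); // like the ephemeral node vanishing with the session
    }

    public synchronized String leader() {
        Map.Entry<Long, String> first = nodes.firstEntry();
        return first == null ? null : first.getValue();
    }

    public static void main(String[] args) {
        ElectionSim sim = new ElectionSim();
        long a = sim.join("node-a");
        sim.join("node-b");
        sim.join("node-c");
        System.out.println(sim.leader()); // node-a: smallest sequence number
        sim.leave(a);                     // leader's session expires
        System.out.println(sim.leader()); // node-b takes over
    }
}
```

This is also why LeaderLatch avoids the "herd effect": in the real recipe each participant only watches the node immediately before its own, so a leader change wakes a single successor rather than the whole group.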
2. Dependencies
pom.xml
<dependencies>
    <!--lombok-->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.22</version>
    </dependency>
    <!--curator-->
    <dependency>
        <groupId>org.apache.curator</groupId>
        <artifactId>curator-framework</artifactId>
        <version>5.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.curator</groupId>
        <artifactId>curator-recipes</artifactId>
        <version>5.4.0</version>
    </dependency>
    <!--log4j2-->
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>${log4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-api</artifactId>
        <version>${log4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-jul</artifactId>
        <version>${log4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-slf4j-impl</artifactId>
        <version>${log4j.version}</version>
    </dependency>
</dependencies>
log4j2.xml (optional)
<?xml version="1.0" encoding="UTF-8"?>
<!-- Log levels in priority order: OFF > FATAL > ERROR > WARN > INFO > DEBUG > TRACE > ALL -->
<!-- The status attribute on configuration controls log4j2's own internal logging; it can be omitted. Set it to trace to see log4j2's detailed internal output -->
<!-- monitorInterval: log4j2 can detect changes to this file and reconfigure itself; this sets the check interval in seconds -->
<configuration status="INFO" monitorInterval="30">
    <!-- Define all appenders first -->
    <appenders>
        <!-- Console appender -->
        <console name="Console" target="SYSTEM_OUT">
            <!-- Log line format -->
            <PatternLayout pattern="[%d{yyyy-MM-dd HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
        </console>
    </appenders>
    <!-- An appender only takes effect once a logger is defined that references it -->
    <loggers>
        <root level="ERROR">
            <appender-ref ref="Console"/>
        </root>
    </loggers>
</configuration>
3. Code
import org.apache.curator.RetryPolicy;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorFrameworkProperties {

    // Connection string
    public static final String CONNECT_ADDRESS = "localhost:2181";
    // Connection timeout
    public static final int CONNECTION_TIMEOUT_MS = 5000;
    // Session timeout
    public static final int SESSION_TIMEOUT_MS = 5000;
    // Namespace: a root prefix under which all znodes from this client are created
    public static final String NAMESPACE = "MyCuratorDemo";
    // Retry policy: base sleep of 1000 ms, at most 1 retry
    public static final RetryPolicy RETRY_POLICY = new ExponentialBackoffRetry(1000, 1);
}
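`ExponentialBackoffRetry(1000, 1)` means a base sleep of 1000 ms and at most one retry, with the sleep window growing exponentially per retry. The sketch below shows the general backoff-with-jitter idea; it is a hand-written illustration, not Curator's exact implementation, whose formula may differ across versions.

```java
import java.util.Random;

// Sketch of exponential backoff with jitter, in the spirit of
// Curator's ExponentialBackoffRetry (illustrative only; not
// Curator's exact formula).
public class BackoffSketch {

    static long sleepMs(int baseSleepMs, int retryCount, Random rnd) {
        // jitter factor drawn from [1, 2^(retryCount+1)), so the
        // upper bound roughly doubles with each retry
        int factor = Math.max(1, rnd.nextInt(1 << (retryCount + 1)));
        return (long) baseSleepMs * factor;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        for (int retry = 0; retry < 3; retry++) {
            System.out.println("retry " + retry + " sleeps " + sleepMs(1000, retry, rnd) + " ms");
        }
    }
}
```

With these parameters the first retry always sleeps about the base time, and later retries spread out randomly, which keeps a reconnecting cluster from hammering ZooKeeper in lockstep.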
import lombok.SneakyThrows;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.imps.CuratorFrameworkState;
import org.apache.curator.framework.recipes.leader.LeaderLatch;

import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LeaderLatchRunnable implements Runnable {

    private static final ExecutorService EXECUTOR_SERVICE = Executors.newCachedThreadPool();

    @SneakyThrows
    @Override
    public void run() {
        // Each CuratorFramework instance represents a distinct service node
        CuratorFramework curator = getCuratorFramework();
        curator.start();
        assert curator.getState().equals(CuratorFrameworkState.STARTED);
        // Simulate service nodes joining the cluster at random times
        int randomSleep = new Random().nextInt(1000);
        Thread.sleep(randomSleep);
        // Create the LeaderLatch instance (used for leader election):
        // - curator: the CuratorFramework instance that talks to ZooKeeper
        // - "/services/leader": the latch path; every participant creates an
        //   ephemeral sequential child under it, and the smallest sequence wins
        // - the current thread name (Thread.currentThread().getName()) serves
        //   as this service node's id
        // - LeaderLatch.CloseMode.NOTIFY_LEADER: controls what happens when
        //   the node calls close()
        LeaderLatch latch = new LeaderLatch(
                curator,
                "/services/leader",
                Thread.currentThread().getName(),
                LeaderLatch.CloseMode.NOTIFY_LEADER);
        // Register a listener (LeaderLatchListenerImpl);
        // EXECUTOR_SERVICE is the Executor that runs its callbacks
        latch.addListener(new LeaderLatchListenerImpl(latch), EXECUTOR_SERVICE);
        System.out.println(latch.getId() + " is ready!");
        // Start leader election
        latch.start();
        System.out.println(latch.getId() + " started leader election!");
    }

    private CuratorFramework getCuratorFramework() {
        // Build the CuratorFramework instance
        return CuratorFrameworkFactory.builder()
                .connectString(CuratorFrameworkProperties.CONNECT_ADDRESS)
                .retryPolicy(CuratorFrameworkProperties.RETRY_POLICY)
                .connectionTimeoutMs(CuratorFrameworkProperties.CONNECTION_TIMEOUT_MS)
                .sessionTimeoutMs(CuratorFrameworkProperties.SESSION_TIMEOUT_MS)
                .namespace(CuratorFrameworkProperties.NAMESPACE)
                .build();
    }
}
import lombok.RequiredArgsConstructor;
import lombok.SneakyThrows;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;

@RequiredArgsConstructor
public class LeaderLatchListenerImpl implements LeaderLatchListener {

    private final LeaderLatch latch;

    @SneakyThrows
    @Override
    public void isLeader() {
        System.out.println("--------------------------------" + latch.getId() +
                " elected as leader--------------------------------");
        latch.getParticipants().forEach(System.out::println);
        // Sleep 5 seconds, then close(): this node leaves the election
        // and the remaining nodes elect a new leader
        Thread.sleep(5000);
        latch.close();
    }

    @SneakyThrows
    @Override
    public void notLeader() {
        // Called after this node invokes close(); only invoked in
        // NOTIFY_LEADER mode, never in LeaderLatch.CloseMode.SILENT mode
        System.out.println("--------------------------------" + latch.getId() +
                " left; a new leader election begins--------------------------------");
    }
}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Application {

    private static final ExecutorService EXECUTOR_SERVICE = Executors.newCachedThreadPool();

    public static void main(String[] args) throws Exception {
        // Launch 7 simulated service nodes
        for (int i = 0; i < 7; i++) {
            EXECUTOR_SERVICE.execute(new LeaderLatchRunnable());
        }
        // Keep the JVM alive so the election callbacks can run
        Thread.sleep(10000000);
    }
}
Console output
pool-1-thread-1 is ready!
pool-1-thread-1 started leader election!
--------------------------------pool-1-thread-1 elected as leader--------------------------------
Participant{id='pool-1-thread-1', isLeader=true}
pool-1-thread-6 is ready!
pool-1-thread-6 started leader election!
pool-1-thread-5 is ready!
pool-1-thread-5 started leader election!
pool-1-thread-2 is ready!
pool-1-thread-2 started leader election!
pool-1-thread-3 is ready!
pool-1-thread-3 started leader election!
pool-1-thread-7 is ready!
pool-1-thread-7 started leader election!
pool-1-thread-4 is ready!
pool-1-thread-4 started leader election!
--------------------------------pool-1-thread-1 left; a new leader election begins--------------------------------
--------------------------------pool-1-thread-6 elected as leader--------------------------------
Participant{id='pool-1-thread-6', isLeader=true}
Participant{id='pool-1-thread-5', isLeader=false}
Participant{id='pool-1-thread-2', isLeader=false}
Participant{id='pool-1-thread-3', isLeader=false}
Participant{id='pool-1-thread-7', isLeader=false}
Participant{id='pool-1-thread-4', isLeader=false}
…
zkCli.sh
[zk: localhost:2181(CONNECTED) 97] ls /MyCuratorDemo/services/leader
[_c_691dbf34-67a2-4700-a314-24e4f1a914b9-latch-0000000004, _c_743eca28-ee10-4f38-ba98-8aa41cc10960-latch-0000000000, _c_7642e5f2-6c67-4113-a671-8bc4d14d9c8d-latch-0000000001, _c_774d824a-79b0-49e1-a13a-fcbba9104f5e-latch-0000000005, _c_b88258b4-4a3d-4fd6-a278-d16c76b39fd4-latch-0000000006, _c_cb797157-fbf3-4524-b9aa-5e9a2b9188f2-latch-0000000002, _c_cd3e7b12-d400-4ee1-9291-e726ea8c272a-latch-0000000003]
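Each latch node name ends in the zero-padded sequence number that ZooKeeper appended on creation, and the participant with the smallest sequence number holds leadership. A small sketch (hand-written helpers, not Curator API) that extracts the sequence suffix and picks the leader from a listing like the one above:

```java
import java.util.Comparator;
import java.util.List;

public class LatchNodes {

    // Extract the 10-digit sequence suffix that ZooKeeper appends
    // to an ephemeral sequential node name
    static long sequence(String node) {
        return Long.parseLong(node.substring(node.lastIndexOf('-') + 1));
    }

    // The node with the smallest sequence number is the leader
    static String leaderNode(List<String> nodes) {
        return nodes.stream()
                .min(Comparator.comparingLong(LatchNodes::sequence))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<String> nodes = List.of(
                "_c_691dbf34-67a2-4700-a314-24e4f1a914b9-latch-0000000004",
                "_c_743eca28-ee10-4f38-ba98-8aa41cc10960-latch-0000000000",
                "_c_cb797157-fbf3-4524-b9aa-5e9a2b9188f2-latch-0000000002");
        // Prints the latch-0000000000 node: the lowest sequence number
        System.out.println(leaderNode(nodes));
    }
}
```

In the zkCli.sh listing above, all seven children end in `latch-0000000000` through `latch-0000000006` because all seven nodes were still participating at that moment; as leaders call close() the lowest-numbered surviving node takes over, matching the console output.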