Requirements
In Flink standalone mode, avoid HDFS as the state storage and use a lighter-weight storage system instead. This article uses MinIO as the storage target for the state backend, with the whole environment built on Docker. The project covers:
1. Set up Flink with Docker; the job consumes from Kafka and writes back to Kafka, and we verify that checkpoints are written to MinIO.
2. Take a savepoint, restart the job from it, and verify that Kafka data is replayed from the recorded offsets.
From the official documentation:
Flink’s checkpointing mechanism interacts with durable storage for streams and state. In general, it requires:
A persistent (or durable) data source that can replay records for a certain amount of time. Examples for such sources are persistent messages queues (e.g., Apache Kafka, RabbitMQ, Amazon Kinesis, Google PubSub) or file systems (e.g., HDFS, S3, GFS, NFS, Ceph, …).
A persistent storage for state, typically a distributed filesystem (e.g., HDFS, S3, GFS, NFS, Ceph, …)
About MinIO
A lightweight object storage service compatible with Amazon S3; think of it as a locally hosted S3.
For a comparison of storage options, see: https://blog.csdn.net/lily_214/article/details/106606729
Environment setup
docker-compose.yml:
version: '3.1'
services:
  # Kafka management UI
  km:
    image: km
    container_name: km
    environment:
      - ZK_HOSTS=${ip}:2181/kafka
    ports:
      - "8088:9000"
    volumes:
      - /etc/hosts:/etc/hosts
    restart: always
  # ZooKeeper for Kafka
  # ------ 3.4.14 cannot persist the ZooKeeper files to the host, so the zk container must not be removed ------
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
    restart: always
  # Kafka: if the listener port and the mapped port differ, errors occur and the broker cannot be reached
  kafka:
    image: wurstmeister/kafka:2.12-2.3.1
    ports:
      - "9092:9092"
    environment:
      - TZ=Asia/Shanghai
      - KAFKA_BROKER_ID=0
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181/kafka
      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://${ip}:9092
      - KAFKA_LISTENERS=PLAINTEXT://:9092
      - KAFKA_AUTO_CREATE_TOPICS_ENABLE=true
    volumes:
      - /etc/hosts:/etc/hosts
      - /var/run/docker.sock:/var/run/docker.sock
      - ./kafka/data:/kafka
      - ./kafka/log:/opt/kafka/logs
    restart: always
  # Flink (JobManager + TaskManager)
  jobmanager:
    image: registry.cn-hangzhou.aliyuncs.com/binlin/flink:1.9.2-scala_2.12
    expose:
      - "6123"
    ports:
      - "8081:8081"
    command: jobmanager
    environment:
      - TZ=Asia/Shanghai
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
    volumes:
      - /etc/hosts:/etc/hosts
      # The container runs as the flink user, so host mounts may hit permission issues; fix them with chmod if they occur
      - ./flink/conf:/opt/flink/conf
      - ./flink/docker-entrypoint.sh:/docker-entrypoint.sh
  taskmanager:
    image: registry.cn-hangzhou.aliyuncs.com/binlin/flink:1.9.2-scala_2.12
    expose:
      - "6121"
      - "6122"
    depends_on:
      - jobmanager
    command: taskmanager
    links:
      - "jobmanager:jobmanager"
    environment:
      - TZ=Asia/Shanghai
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
    volumes:
      - /etc/hosts:/etc/hosts
      - ./flink/conf:/opt/flink/conf
      - ./flink/docker-entrypoint.sh:/docker-entrypoint.sh
  # MinIO, S3-compatible storage; the secret key must be at least 8 characters and must match s3.secret-key in flink-conf.yaml
  minio:
    image: minio/minio:latest
    entrypoint: sh
    environment:
      - MINIO_ACCESS_KEY=root
      - MINIO_SECRET_KEY=12345678
    expose:
      - "9000"
    ports:
      - "9000:9000"
    volumes:
      - ./minio:/data
    command: -c '/usr/bin/minio server /data'
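The checkpoint path configured below points at a bucket named state, which is not created automatically, so create it up front, either through the MinIO web UI on port 9000 or with the MinIO client mc. A sketch with mc (the alias name local is arbitrary; the credentials are the ones from the compose file):
mc config host add local http://${ip}:9000 root 12345678
mc mb local/state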
Configure flink-conf.yaml:
state.backend: filesystem        # or rocksdb
state.checkpoints.dir: s3://state/checkpoint
s3.endpoint: http://${ip}:9000
s3.path.style.access: true
s3.access-key: root
s3.secret-key: 12345678
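For the s3:// scheme to resolve, Flink also needs one of its bundled S3 filesystem jars (flink-s3-fs-hadoop or flink-s3-fs-presto) on the classpath; the custom image used above is assumed to already contain it. If you build your own image from the official flink:1.9.2-scala_2.12 image, a minimal sketch is to copy the jar from opt/ to lib/ before the cluster starts:
cp /opt/flink/opt/flink-s3-fs-hadoop-1.9.2.jar /opt/flink/lib/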
Flink code excerpt
package kafka;

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.flink.util.StringUtils;
import org.apache.kafka.clients.producer.ProducerRecord;

import javax.annotation.Nullable;
import java.nio.charset.Charset;

public class S3CheckpointKakfa {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // checkpoint every 15 seconds with exactly-once semantics
        env.enableCheckpointing(15000, CheckpointingMode.EXACTLY_ONCE);
        env.setRestartStrategy(RestartStrategies.noRestart());
        CheckpointConfig checkpointConfig = env.getCheckpointConfig();
        checkpointConfig.setMaxConcurrentCheckpoints(3);
        checkpointConfig.setMinPauseBetweenCheckpoints(3000);
        // keep checkpoints on cancellation so the job can be restored from them later
        checkpointConfig.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        // consume the "mytest" topic; consumerConfig()/producerConfig() are sketched after the class
        FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<String>("mytest"
                , new SimpleStringSchema(Charset.defaultCharset())
                , consumerConfig());
        SingleOutputStreamOperator<String> source = env.addSource(consumer).uid("111").setParallelism(2)
                .filter(new FilterFunction<String>() {
                    @Override
                    public boolean filter(String value) throws Exception {
                        return !StringUtils.isNullOrWhitespaceOnly(value);
                    }
                }).setParallelism(1);
        // write to the "target" topic with exactly-once (transactional) semantics
        source.addSink(new FlinkKafkaProducer<String>("target", new KafkaSerializationSchema<String>() {
            @Override
            public ProducerRecord<byte[], byte[]> serialize(String element, @Nullable Long timestamp) {
                return new ProducerRecord<>("target", element.getBytes());
            }
        }, producerConfig(), FlinkKafkaProducer.Semantic.EXACTLY_ONCE, 3)).uid("222").setParallelism(3);
        env.execute("s3 test");
    }
}
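consumerConfig() and producerConfig() are omitted from the excerpt; a minimal sketch of what they could look like follows. The bootstrap address, group id, and transaction timeout are assumptions and should be adjusted to your environment; with Semantic.EXACTLY_ONCE the producer's transaction.timeout.ms must not exceed the broker's transaction.max.timeout.ms.
// Hypothetical helpers for the excerpt above (place them inside S3CheckpointKakfa; requires java.util.Properties)
private static Properties consumerConfig() {
    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "${ip}:9092"); // assumption: the address from KAFKA_ADVERTISED_LISTENERS
    props.setProperty("group.id", "s3-checkpoint-test");  // assumption: any group id works for this test
    return props;
}

private static Properties producerConfig() {
    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "${ip}:9092"); // assumption
    // keep the transaction timeout within the broker's transaction.max.timeout.ms (15 minutes by default)
    props.setProperty("transaction.timeout.ms", "900000");
    return props;
}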
Run
Package the job, copy the jar into the jobmanager container, and run:
docker-compose exec jobmanager bin/flink run -c kafka.S3CheckpointKakfa flink-java-1.0.jar
Results
Checkpoints are created successfully and savepoints work as expected; the checkpoint files can be seen in the state bucket, for example with the check below.
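A quick way to list the checkpoint files (assuming the mc alias local from the setup step above):
mc ls --recursive local/state/checkpoint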
Take a savepoint:
docker-compose exec jobmanager bin/flink savepoint 2beed3f30424b5203872a23204589cc9 s3://state/savepoint-1
Restarting the job from the savepoint also works, and Kafka consumption resumes from the offsets recorded in the savepoint (restore command sketched below).
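A sketch of the restore command; the -s argument must be the concrete savepoint path printed by the savepoint command above (a directory created under s3://state/savepoint-1), so the placeholder below is not a real path:
docker-compose exec jobmanager bin/flink run -s s3://state/savepoint-1/<savepoint-dir> -c kafka.S3CheckpointKakfa flink-java-1.0.jar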