Using MinIO as the State Backend Storage for a Flink Standalone Cluster

Requirements

In Flink standalone mode we want to avoid HDFS for state storage and use a lighter-weight storage system instead. This article uses MinIO as the state backend storage directory, with the whole environment built on Docker. The project covers:

1. Build Flink with Docker; the job consumes from Kafka and writes back to Kafka, to verify that checkpoints can be written to MinIO.

2. Take a savepoint, then restore from it to verify that the job can replay Kafka data.

From the official documentation:

Flink’s checkpointing mechanism interacts with durable storage for streams and state. In general, it requires:

A persistent (or durable) data source that can replay records for a certain amount of time. Examples for such sources are persistent message queues (e.g., Apache Kafka, RabbitMQ, Amazon Kinesis, Google PubSub) or file systems (e.g., HDFS, S3, GFS, NFS, Ceph, …).
A persistent storage for state, typically a distributed filesystem (e.g., HDFS, S3, GFS, NFS, Ceph, …)

About MinIO

MinIO is a lightweight object storage service compatible with the Amazon S3 API; in simple terms, it is a locally hosted S3.

For a comparison of storage options, see: https://blog.csdn.net/lily_214/article/details/106606729

Environment Setup

version: '3.1'
services:
# Kafka management UI
  km:
    image: km
    container_name: km
    environment:
      - ZK_HOSTS=${ip}:2181/kafka
    ports:
      - "8088:9000"
    volumes:
      - /etc/hosts:/etc/hosts
    restart: always
# kafka
  #------ ZooKeeper 3.4.14 cannot persist its data to the host, so do not delete the zk container --------
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
    restart: always

  # if Kafka's advertised port differs from the host-mapped port, the broker becomes unreachable
  kafka:
    image: wurstmeister/kafka:2.12-2.3.1
    ports:
      - "9092:9092"
    environment:
      - TZ=Asia/Shanghai
      - KAFKA_BROKER_ID=0
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181/kafka
      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://${ip}:9092
      - KAFKA_LISTENERS=PLAINTEXT://:9092
      - KAFKA_AUTO_CREATE_TOPICS_ENABLE=true
    volumes:
      - /etc/hosts:/etc/hosts
      - /var/run/docker.sock:/var/run/docker.sock
      - ./kafka/data:/kafka
      - ./kafka/log:/opt/kafka/logs
    restart: always
  # flink
  jobmanager:
    image: registry.cn-hangzhou.aliyuncs.com/binlin/flink:1.9.2-scala_2.12
    expose:
      - "6123"
    ports:
      - "8081:8081"
    command: jobmanager
    environment:
      - TZ=Asia/Shanghai
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
    volumes:
      - /etc/hosts:/etc/hosts
      # the container runs as the flink user; host mounts can cause permission problems, fix them with chmod if they appear
      - ./flink/conf:/opt/flink/conf
      - ./flink/docker-entrypoint.sh:/docker-entrypoint.sh

  taskmanager:
    image: registry.cn-hangzhou.aliyuncs.com/binlin/flink:1.9.2-scala_2.12
    expose:
      - "6121"
      - "6122"
    depends_on:
      - jobmanager
    command: taskmanager
    links:
      - "jobmanager:jobmanager"
    environment:
      - TZ=Asia/Shanghai
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
    volumes:
      - /etc/hosts:/etc/hosts
      - ./flink/conf:/opt/flink/conf
      - ./flink/docker-entrypoint.sh:/docker-entrypoint.sh
 # MinIO, S3-compatible storage; the secret key must be at least 8 characters
  minio:
    image: minio/minio:latest
    entrypoint: sh
    environment:
      - MINIO_ACCESS_KEY=root
      - MINIO_SECRET_KEY=12345678 # must match s3.secret-key in flink-conf.yaml
    expose:
      - "9000"
    ports:
      - "9000:9000"
    volumes:
      - ./minio:/data
    command: -c '/usr/bin/minio server /data'
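
Bring up the stack with docker-compose, then create the bucket that the checkpoint path will point at; the state bucket must exist before the first checkpoint is written. A minimal sketch (the alias name local is an assumption, older mc releases use mc config host add instead of mc alias set; you can also create the bucket through the MinIO web console at http://${ip}:9000):

docker-compose up -d
docker run --rm --entrypoint sh minio/mc -c \
  "mc alias set local http://${ip}:9000 root 12345678 && mc mb local/state"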

Configure flink-conf.yaml:

state.backend: filesystem # or rocksdb
state.checkpoints.dir: s3://state/checkpoint
s3.endpoint: http://${ip}:9000
s3.path.style.access: true
s3.access-key: root
s3.secret-key: 12345678
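
One prerequisite worth noting: for the s3:// scheme to resolve, Flink needs one of its bundled S3 filesystem jars on the classpath. A sketch for the 1.9.2 image used above, assuming the jar ships under /opt/flink/opt as in the standard distribution (the hadoop variant flink-s3-fs-hadoop works as well); restart the containers afterwards so the jar is picked up:

docker-compose exec jobmanager cp /opt/flink/opt/flink-s3-fs-presto-1.9.2.jar /opt/flink/lib/
docker-compose exec taskmanager cp /opt/flink/opt/flink-s3-fs-presto-1.9.2.jar /opt/flink/lib/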

Flink code excerpt (the consumerConfig()/producerConfig() helpers are minimal sketches with placeholder Kafka settings):

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.flink.util.StringUtils;
import org.apache.kafka.clients.producer.ProducerRecord;
import javax.annotation.Nullable;
import java.nio.charset.Charset;
import java.util.Properties;

public class S3CheckpointKakfa {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // checkpoint every 15s; the checkpoint directory comes from flink-conf.yaml
        env.enableCheckpointing(15000, CheckpointingMode.EXACTLY_ONCE);
        env.setRestartStrategy(RestartStrategies.noRestart());

        CheckpointConfig checkpointConfig = env.getCheckpointConfig();
        checkpointConfig.setMaxConcurrentCheckpoints(3);
        // note: a non-zero min pause effectively limits concurrent checkpoints to 1
        checkpointConfig.setMinPauseBetweenCheckpoints(3000);
        // keep checkpoints in MinIO when the job is cancelled
        checkpointConfig.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<String>("mytest"
                , new SimpleStringSchema(Charset.defaultCharset())
                , consumerConfig());

        SingleOutputStreamOperator<String> source = env.addSource(consumer).uid("111").setParallelism(2).filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String value) throws Exception {
                return !StringUtils.isNullOrWhitespaceOnly(value);
            }
        }).setParallelism(1);

        source.addSink(new FlinkKafkaProducer<String>("target", new KafkaSerializationSchema<String>() {
            @Override
            public ProducerRecord<byte[], byte[]> serialize(String element, @Nullable Long timestamp) {
                return new ProducerRecord<>("target", element.getBytes());
            }
        }, producerConfig(), FlinkKafkaProducer.Semantic.EXACTLY_ONCE, 3)).uid("222").setParallelism(3);

        env.execute("s3 test");
    }

    // minimal sketch of a helper not shown in the original excerpt; placeholder values
    private static Properties consumerConfig() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "${ip}:9092"); // placeholder broker address
        props.setProperty("group.id", "s3-checkpoint-test");  // placeholder group id
        return props;
    }

    // minimal sketch of a helper not shown in the original excerpt; placeholder values
    private static Properties producerConfig() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "${ip}:9092"); // placeholder broker address
        // EXACTLY_ONCE uses Kafka transactions; keep this at or below the broker's transaction.max.timeout.ms (15 min default)
        props.setProperty("transaction.timeout.ms", "300000");
        return props;
    }
}
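
Because the sink writes with EXACTLY_ONCE semantics, records land in the target topic inside Kafka transactions and only become visible to transactional readers after the surrounding checkpoint completes. Any consumer used to verify the output should therefore read committed data only (standard Kafka client setting):

// on the consumer that reads the "target" topic
props.setProperty("isolation.level", "read_committed");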

Running the job

Package the job jar, copy it into the jobmanager container, and run:

docker-compose exec jobmanager bin/flink run -c kafka.S3CheckpointKakfa flink-java-1.0.jar
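
To feed the job some input, write a few test messages into the mytest topic, for example with the console producer inside the Kafka container (the script path assumes the wurstmeister image layout):

docker-compose exec kafka /opt/kafka/bin/kafka-console-producer.sh --broker-list ${ip}:9092 --topic mytest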

Results

Checkpoints are created successfully, and savepoints work as well.
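
The checkpoint files can be inspected in the MinIO console under the state bucket, or listed with mc (same hypothetical local alias as in the setup step):

docker run --rm --entrypoint sh minio/mc -c \
  "mc alias set local http://${ip}:9000 root 12345678 && mc ls -r local/state/checkpoint"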

Taking a savepoint

docker-compose exec jobmanager bin/flink savepoint 2beed3f30424b5203872a23204589cc9 s3://state/savepoint-1

Restarting the job from the savepoint (see the command sketch below), Kafka offsets are replayed as expected.
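
Restoring uses the -s flag of flink run; the concrete savepoint directory created under s3://state/savepoint-1 is printed when the savepoint command completes (the path below is illustrative):

docker-compose exec jobmanager bin/flink run -s s3://state/savepoint-1/savepoint-2beed3-xxxxxxxx -c kafka.S3CheckpointKakfa flink-java-1.0.jar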

 
