Conclusions on Two-Phase Commit in the Flink + Kafka Integration

1. How Flink + Kafka achieve exactly-once semantics

Two-phase commit: pre-commit, then commit.

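Flink uses checkpoints to record how far the data has been processed.

The JobManager coordinates the TaskManagers as they take each checkpoint, and the checkpoint is stored in the StateBackend. The default StateBackend keeps it in memory; it can be switched to a file-based backend for durable storage.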

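Execution is effectively a two-phase commit. When a checkpoint is taken, each operator "pre-commits" by snapshotting its state once it has finished its part. After the sink has finished as well and the checkpoint is confirmed complete, the "commit" is issued. If anything fails along the way, the pre-commit is discarded (much like a transaction in MySQL).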

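After a crash the job is restored from the StateBackend. It resumes from the last completed checkpoint, so only the work that had not yet been committed is redone.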
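
To make the pre-commit / commit / discard life cycle concrete, here is a minimal sketch in plain Java. It is not Flink code: `ToySink` and all of its methods are hypothetical names invented for this illustration, and the "external system" is just an in-memory list. Flink's real abstraction for this pattern is `TwoPhaseCommitSinkFunction`, which this toy only loosely mimics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoPhaseCommitDemo {

    // Hypothetical toy sink (not a Flink class) that mimics pre-commit, commit and abort.
    static class ToySink {
        private List<String> currentTxn = new ArrayList<>();                  // written since the last checkpoint
        private final Map<Long, List<String>> preCommitted = new HashMap<>(); // pre-committed, by checkpoint id
        private final List<String> external = new ArrayList<>();              // stands in for the external system

        void write(String record) {
            currentTxn.add(record);
        }

        // Phase 1: the checkpoint reaches the sink, the open transaction is pre-committed.
        void preCommit(long checkpointId) {
            preCommitted.put(checkpointId, currentTxn);
            currentTxn = new ArrayList<>();
        }

        // Phase 2: the whole checkpoint is confirmed, the data becomes visible.
        void commit(long checkpointId) {
            List<String> txn = preCommitted.remove(checkpointId);
            if (txn != null) {
                external.addAll(txn);
            }
        }

        // Failure path: everything that was not committed is dropped.
        void abort() {
            preCommitted.clear();
            currentTxn = new ArrayList<>();
        }

        List<String> visible() {
            return new ArrayList<>(external);
        }
    }

    public static void main(String[] args) {
        ToySink sink = new ToySink();
        sink.write("a,1");
        sink.preCommit(1L);
        sink.commit(1L);   // checkpoint 1 confirmed
        sink.write("b,1");
        sink.preCommit(2L);
        sink.abort();      // checkpoint 2 failed, its pre-commit is discarded
        System.out.println(sink.visible()); // prints [a,1]
    }
}
```

Only the record covered by the confirmed checkpoint is visible; the record that was pre-committed but never confirmed has to be replayed after recovery.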

2. How checkpointing works in the WordCount (WC) example

    import org.apache.flink.streaming.api.scala._

    // 1. Kafka source (FlinkKafkaUtils is a project helper, not part of Flink)
    val stream = FlinkKafkaUtils.createKafkaStream(parameters)
    // 2. WordCount logic
    val result = stream.flatMap(_.split(","))
      .map((_, 1))
      // Each subtask holds operator state (the Kafka offsets) and keyed state (the counts after keyBy)
      .keyBy(x => x._1)
      .sum(1)

    result.print()
    // 3. Sink to the database
    result.map(x => ("wc", x._1, x._2.toLong)).addSink(new RedisSink)
    FlinkKafkaUtils.env.execute(getClass.getCanonicalName)

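[Key point] Two-phase commit:
1. Phase one: each operator "pre-commits" once it has finished.
(1) In WC, once sum(1) has finished, the offset is pre-committed to the StateBackend.

2. Phase two: after the sink has finished, the "commit" is issued; if execution fails, the pre-commit is discarded.
(1) If addSink(new RedisSink) succeeds, the offset commit to the StateBackend is confirmed.
(2) If addSink(new RedisSink) fails, the job rolls back and the pre-committed offset in the StateBackend is discarded.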
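
The point of the two phases in the WC example is that the source offset and the running counts are saved and rolled back together. The following plain-Java sketch simulates that. It does not use Flink; the class, the `Snapshot` type and the in-memory `log` standing in for a Kafka partition are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordCountCheckpointSketch {

    // Hypothetical snapshot: the source offset and the keyed counts, captured together.
    static final class Snapshot {
        final int offset;
        final Map<String, Integer> counts;

        Snapshot(int offset, Map<String, Integer> counts) {
            this.offset = offset;
            this.counts = new HashMap<>(counts);
        }
    }

    int offset = 0;                                 // plays the role of the source's operator state
    Map<String, Integer> counts = new HashMap<>();  // plays the role of the keyed state behind sum(1)

    // Consume records [offset, upTo) from the log and update the counts.
    void process(List<String> log, int upTo) {
        for (; offset < upTo; offset++) {
            for (String word : log.get(offset).split(",")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    Snapshot snapshot() {
        return new Snapshot(offset, counts);
    }

    void restore(Snapshot s) {
        offset = s.offset;
        counts = new HashMap<>(s.counts);
    }

    public static void main(String[] args) {
        List<String> log = List.of("a,b", "a", "b,b", "a");
        WordCountCheckpointSketch job = new WordCountCheckpointSketch();
        job.process(log, 2);
        Snapshot completed = job.snapshot(); // completed checkpoint at offset 2
        job.process(log, 3);                 // progress that is never confirmed
        job.restore(completed);              // failure: offset and counts roll back together
        job.process(log, 4);                 // replay from offset 2
        System.out.println(job.offset + " " + job.counts);
    }
}
```

Because the replay starts from the offset stored in the same snapshot as the counts, the record at offset 2 is counted exactly once even though it was processed twice.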

3. Source code analysis

To be covered in the next post.

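4. Thoughts on integrating the Kafka consumer with Flink

4.1 With Kafka and Flink integrated, offsets are stored in two places

(1) The offsets stored in the checkpoint held by the StateBackend
(2) The offsets stored in the __consumer_offsets topic (refreshed continuously), which can be inspected with: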

	kafka-console-consumer.sh --bootstrap-server kafka01:9092,kafka02:9092,kafka03:9092 \
	--topic __consumer_offsets \
	--formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" \
	| grep --color=auto topicName

4.2 Official documentation for the Flink Kafka connector

 https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/kafka.html

Two key passages:

Kafka Consumers Offset Committing Behaviour Configuration
The Flink Kafka Consumer allows configuring the behaviour of how offsets are committed back to Kafka brokers.
Note that the Flink Kafka Consumer does not rely on the committed offsets for fault tolerance guarantees.
The committed offsets are only a means to expose the consumer’s progress for monitoring purposes.

The way to configure offset commit behaviour is different, depending on whether checkpointing is enabled for the job.

(1) Checkpointing disabled

    Checkpointing disabled: if checkpointing is disabled, the Flink Kafka Consumer relies on the automatic periodic offset
    committing capability of the internally used Kafka clients. Therefore, to disable or enable offset committing,
    simply set the enable.auto.commit / auto.commit.interval.ms keys to appropriate values in the provided Properties configuration.

In plain terms: with checkpointing disabled, Flink relies on the Kafka client's own periodic auto-commit to commit offsets. It is switched on or off, and its interval set, through two consumer properties:

enable.auto.commit       true/false (default true)
auto.commit.interval.ms  5000 (default: commit automatically every 5 seconds, which is why the consumer's entries in __consumer_offsets keep refreshing)
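
As a small illustration of those two keys, the sketch below builds the `Properties` object that would be handed to the Kafka consumer. It only uses `java.util.Properties`; the `build` helper and the group id `wc-group` are made up for this example, and the broker list is the one from the command shown earlier.

```java
import java.util.Properties;

class KafkaAutoCommitProps {

    // Hypothetical helper: consumer properties for a job that runs WITHOUT checkpointing.
    static Properties build(String servers, String groupId, boolean autoCommit, long intervalMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", servers);
        props.setProperty("group.id", groupId);
        // These two keys only take effect when checkpointing is disabled.
        props.setProperty("enable.auto.commit", Boolean.toString(autoCommit));
        props.setProperty("auto.commit.interval.ms", Long.toString(intervalMs));
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("kafka01:9092,kafka02:9092,kafka03:9092", "wc-group", true, 5000L);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Setting `enable.auto.commit` to `false` here would stop offsets from being written to __consumer_offsets at all while checkpointing is off.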

(2) Checkpointing enabled

Checkpointing enabled: if checkpointing is enabled, the Flink Kafka Consumer will commit the offsets stored in the checkpointed
states when the checkpoints are completed. This ensures that the committed offsets in Kafka brokers is consistent with the
offsets in the checkpointed states. Users can choose to disable or enable offset committing by calling the setCommitOffsetsOnCheckpoints(boolean)
method on the consumer (by default, the behaviour is true). Note that in this scenario, the automatic periodic offset committing
settings in Properties is completely ignored.

In plain terms:
a. With checkpointing enabled, the offsets are saved in the checkpoint as operator state, and once the checkpoint completes Flink commits those same offsets back to Kafka. This keeps the offsets committed on the Kafka brokers consistent with the offsets in the checkpointed state.
b. setCommitOffsetsOnCheckpoints(boolean) defaults to true. If it is set to false, the offsets are only saved to the StateBackend and are not committed to __consumer_offsets. In production the default of true is fine.
c. The "committed offsets in Kafka brokers" mentioned above are the offsets stored in __consumer_offsets (Kafka's built-in topic for consumer offsets); the "offsets in the checkpointed states" are the offsets stored as operator state in the checkpoint held by the StateBackend.
d. The periodic auto-commit settings are ignored once checkpointing is enabled. The source code shows why:

    --> getIsAutoCommitEnabled() in FlinkKafkaConsumer
    @Override
    protected boolean getIsAutoCommitEnabled() {
        return getBoolean(properties, ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true) &&
            PropertiesUtil.getLong(properties, ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, 5000) > 0;
    }

    --> open() (inherited from FlinkKafkaConsumerBase) calls getIsAutoCommitEnabled()
    @Override
    public void open(Configuration configuration) throws Exception {
        // determine the offset commit mode
        this.offsetCommitMode = OffsetCommitModes.fromConfiguration(
            getIsAutoCommitEnabled(),
            enableCommitOnCheckpoints,
            ((StreamingRuntimeContext) getRuntimeContext()).isCheckpointingEnabled());
        ...
    }

    --> OffsetCommitModes.fromConfiguration(), called from open()

    public class OffsetCommitModes {

        /**
         * Determine the offset commit mode using several configuration values.
         *
         * @param enableAutoCommit whether or not auto committing is enabled in the provided Kafka properties.
         * @param enableCommitOnCheckpoint whether or not committing on checkpoints is enabled.
         * @param enableCheckpointing whether or not checkpoint is enabled for the consumer.
         *
         * @return the offset commit mode to use, based on the configuration values.
         */
        public static OffsetCommitMode fromConfiguration(
                boolean enableAutoCommit,
                boolean enableCommitOnCheckpoint,
                boolean enableCheckpointing) {

            // First: the job calls env.enableCheckpointing(5000), so enableCheckpointing is true
            if (enableCheckpointing) {
                // if checkpointing is enabled, the mode depends only on whether committing on checkpoints is enabled
                // Second: kafkaSource.setCommitOffsetsOnCheckpoints(true) is the default, so the mode is OffsetCommitMode.ON_CHECKPOINTS
                return (enableCommitOnCheckpoint) ? OffsetCommitMode.ON_CHECKPOINTS : OffsetCommitMode.DISABLED;
            } else {
                // else, the mode depends only on whether auto committing is enabled in the provided Kafka properties
                return (enableAutoCommit) ? OffsetCommitMode.KAFKA_PERIODIC : OffsetCommitMode.DISABLED;
            }
        }
    }

    --> The OffsetCommitMode enum
    /**
     * The offset commit mode represents the behaviour of how offsets are externally committed
     * back to Kafka brokers / Zookeeper.
     *
     * <p>The exact value of this is determined at runtime in the consumer subtasks.
     */
    @Internal
    public enum OffsetCommitMode {

        /** Completely disable offset committing. */
        DISABLED,

        /** Commit offsets back to Kafka only when checkpoints are completed. */
        ON_CHECKPOINTS,

        /** Commit offsets periodically back to Kafka, using the auto commit functionality of internal Kafka clients. */
        KAFKA_PERIODIC;
    }
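
To see every case of that decision at a glance, the sketch below copies the logic of `fromConfiguration` into a standalone class and prints all eight flag combinations. The class name `OffsetCommitModeTable` and the local `Mode` enum are stand-ins so that it compiles without Flink on the classpath; the branching itself follows the source quoted above.

```java
class OffsetCommitModeTable {

    // Local copy of the three modes, so the sketch has no Flink dependency.
    enum Mode { DISABLED, ON_CHECKPOINTS, KAFKA_PERIODIC }

    // Same branching as OffsetCommitModes.fromConfiguration.
    static Mode fromConfiguration(boolean enableAutoCommit,
                                  boolean enableCommitOnCheckpoint,
                                  boolean enableCheckpointing) {
        if (enableCheckpointing) {
            // Checkpointing on: the Kafka auto-commit flag is never consulted.
            return enableCommitOnCheckpoint ? Mode.ON_CHECKPOINTS : Mode.DISABLED;
        }
        // Checkpointing off: only the Kafka auto-commit flag matters.
        return enableAutoCommit ? Mode.KAFKA_PERIODIC : Mode.DISABLED;
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false};
        System.out.println("checkpointing  commitOnCheckpoint  autoCommit  -> mode");
        for (boolean checkpointing : flags) {
            for (boolean commitOnCheckpoint : flags) {
                for (boolean autoCommit : flags) {
                    System.out.printf("%-13b  %-18b  %-10b  -> %s%n",
                        checkpointing, commitOnCheckpoint, autoCommit,
                        fromConfiguration(autoCommit, commitOnCheckpoint, checkpointing));
                }
            }
        }
    }
}
```

In the rows where checkpointing is true, flipping `autoCommit` never changes the result, which is exactly what "completely ignored" means in the documentation.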


4.3 How should this be used in production?

1. In production, simply enabling checkpointing is enough

 env.enableCheckpointing(60000);

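Analysis: once checkpointing is enabled, the periodic auto-commit settings are ignored entirely (so there is no need to set enable.auto.commit=false; the setting has no effect). In addition, commitOffsetsOnCheckpoints defaults to true, so the Kafka offsets are committed only after the sink has succeeded (to the checkpoint's operator state and to __consumer_offsets). If the sink fails, the job rolls back and the Kafka offsets are not committed.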

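2. How do you recover when the job fails?
If you pass -s with the checkpoint path, consumption resumes from the Kafka offsets in the checkpoint's operator state; without -s, it resumes from the Kafka offsets in __consumer_offsets. The two are therefore the same in the vast majority of cases (the exception being a checkpoint that succeeded while the commit to __consumer_offsets failed because of a Kafka problem). Keep in mind that without -s the rest of the job state, such as the word counts, is not restored either.

In production it is still best to pass -s: recovering from the state that matches what the sink actually wrote is the most precise option.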
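
The order of precedence described above can be written down as a tiny function. This is only a sketch under assumptions: it presumes the consumer is left at its default start position (the group's committed offsets), that `auto.offset.reset` supplies the fallback when nothing has been committed, and `StartOffsetSketch` and `resolve` are names made up here rather than Flink APIs.

```java
import java.util.OptionalLong;

class StartOffsetSketch {

    // Picks the offset a partition resumes from after a restart.
    static long resolve(OptionalLong restoredFromCheckpoint,
                        OptionalLong committedInKafka,
                        long resetFallback) {
        if (restoredFromCheckpoint.isPresent()) {
            // Started with -s: the offset stored in the checkpoint wins.
            return restoredFromCheckpoint.getAsLong();
        }
        if (committedInKafka.isPresent()) {
            // Started without -s: use the group's offset from __consumer_offsets.
            return committedInKafka.getAsLong();
        }
        // Nothing committed yet: whatever auto.offset.reset resolves to (assumed here).
        return resetFallback;
    }

    public static void main(String[] args) {
        // Checkpoint 120 completed, but the commit back to Kafka was lost, so Kafka still says 100.
        System.out.println(resolve(OptionalLong.of(120), OptionalLong.of(100), 0));
        System.out.println(resolve(OptionalLong.empty(), OptionalLong.of(100), 0));
        System.out.println(resolve(OptionalLong.empty(), OptionalLong.empty(), 0));
    }
}
```

The first call in `main` models the exceptional case mentioned above: the two stores disagree, and only a restart with -s picks up the offset that matches the checkpointed state.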

References

# Official Flink explanation of two-phase commit
https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html

# Flink two-phase commit
https://blog.csdn.net/lisenyeahyeah/article/details/90288231