Problems encountered in Flink development: Kafka source offsets
Background:
The Flink job commits offsets back to Kafka, but on every start (when not restoring state from a checkpoint or savepoint) the job always consumed from earliest/latest; it did not, as intended, resume from the offsets saved in Kafka.
KafkaSourceBuilder<T> sourceBuilder = KafkaSource.<T>builder()
        .setBootstrapServers(host)
        .setTopics(topic)
        .setGroupId(group)
        // Problematic line: always starts from the latest offset on every run
        .setStartingOffsets(OffsetsInitializer.latest())
        .setProperty("security.protocol", "SASL_PLAINTEXT")
        .setProperty("sasl.mechanism", "GSSAPI")
        .setProperty("sasl.kerberos.service.name", "kafka")
        .setProperty("sasl.jaas.config", "com.sun.security.auth.module.Krb5LoginModule required\n" +
                "\tuseKeyTab=true storeKey=true\n" +
                "\tkeyTab=\"" + kerberosKeyTab + "\"\n" +
                "\tprincipal=\"" + principal + "\";")
        .setDeserializer(serialize)
        .setProperty("max.poll.records", "1000000");
Root cause
setStartingOffsets(OffsetsInitializer.latest()) makes the source consume from the latest offset on every run, not just on the first run as intended; the offsets committed to Kafka are ignored.
Solution
Use the committed-offsets initializer instead, with the reset strategy made configurable:
setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.valueOf(offset.toUpperCase())))
KafkaSource.builder()
// Start from committed offset of the consuming group, without reset strategy
.setStartingOffsets(OffsetsInitializer.committedOffsets())
// Start from committed offset, also use EARLIEST as reset strategy if committed offset doesn't exist
.setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
// Start from the first record whose timestamp is greater than or equal to a given timestamp (milliseconds)
.setStartingOffsets(OffsetsInitializer.timestamp(1657256176000L))
// Start from earliest offset
.setStartingOffsets(OffsetsInitializer.earliest())
// Start from latest offset
.setStartingOffsets(OffsetsInitializer.latest());
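The decision logic behind committedOffsets(OffsetResetStrategy.EARLIEST) can be sketched in plain Java, without any Flink dependency (class and method names below are illustrative, not Flink's actual internals): if the consumer group has a committed offset, resume from it; otherwise fall back to the reset strategy.

```java
import java.util.Optional;

public class OffsetInitDemo {
    enum ResetStrategy { EARLIEST, LATEST }

    // Illustrative sketch of committed-offsets semantics: a committed offset
    // wins; only when none exists does the reset strategy decide.
    static long startingOffset(Optional<Long> committed, ResetStrategy reset,
                               long earliestOffset, long latestOffset) {
        if (committed.isPresent()) {
            return committed.get();          // resume where the group left off
        }
        return reset == ResetStrategy.EARLIEST ? earliestOffset : latestOffset;
    }

    public static void main(String[] args) {
        // Group already committed offset 42 -> resume from 42
        System.out.println(startingOffset(Optional.of(42L), ResetStrategy.EARLIEST, 0L, 100L));
        // No committed offset yet -> fall back to EARLIEST (0)
        System.out.println(startingOffset(Optional.empty(), ResetStrategy.EARLIEST, 0L, 100L));
    }
}
```

This is exactly why latest() caused the bug: it skips the committed-offset check entirely and always returns the latest offset.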
Temporary workaround before the fix
Restart the job from the last checkpoint every time. But this broke down whenever the job's parallelism was changed, because the job could then no longer be restored from the checkpoint: we had to keep the parallelism unchanged until the historical data was consumed, and only then reset the parallelism and restart. On top of that, the checkpoint path had to be looked up manually every time, and changes mostly happened in the middle of the night, when people get impatient, so the whole process was extremely inconvenient.
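For reference, the workaround itself, resuming a job from a checkpoint via the Flink CLI, looks roughly like this (the checkpoint path and main class are placeholders, not values from this job):

```shell
# Resume from an existing checkpoint/savepoint path with -s.
# Path and class name below are illustrative only.
flink run \
  -s hdfs:///flink/checkpoints/<job-id>/chk-1234 \
  -c com.example.MyKafkaJob \
  my-kafka-job.jar
```

With the committedOffsets fix in place, a plain restart resumes from Kafka's committed offsets and this manual path lookup is no longer needed.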