1. Requirements
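The requirement is as follows: with Kafka as the source, use a Pushgateway + Prometheus setup to track in real time the Flink job's consumed offset (current-offset) and each partition's log-end-offset, and compute the difference between the two as the consumer lag, as shown in the figure: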
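The lag arithmetic in the requirement can be sketched in plain Java. This is only an illustration, not code from the job itself: the class name LagCalculator is hypothetical, partitions are identified by their integer index, and it assumes both offsets are already known for each partition.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: per-partition lag = log-end-offset - current-offset.
final class LagCalculator {

    // Lag for one partition. Clamped at 0 because the two offsets may be
    // sampled at slightly different moments.
    static long lag(long logEndOffset, long currentOffset) {
        return Math.max(0L, logEndOffset - currentOffset);
    }

    // Lag for every partition listed in logEndOffsets.
    static Map<Integer, Long> lagByPartition(Map<Integer, Long> logEndOffsets,
                                             Map<Integer, Long> currentOffsets) {
        Map<Integer, Long> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            // Assumption: a partition with no consumed offset yet counts as offset 0.
            long current = currentOffsets.getOrDefault(e.getKey(), 0L);
            result.put(e.getKey(), lag(e.getValue(), current));
        }
        return result;
    }
}
```

For example, LagCalculator.lag(120L, 100L) yields 20, meaning the job is 20 records behind on that partition.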
2. Terminology
2.1 committed-offsets
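Each time a Kafka consumer calls consumer.poll() it receives a batch of records, and afterwards it calls a method such as consumer.commitAsync() to commit the offsets. The code looks like this: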
ConsumerRecords<byte[], byte[]> records = consumer.poll(pollTimeoutMs);
for (ConsumerRecord<byte[], byte[]> record : records) {
...
}
consumer.commitAsync();
Committed offsets are stored either in ZooKeeper (deprecated) or in Kafka's internal topic __consumer_offsets.
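What actually gets committed is, by Kafka convention, the position of the next record to read, i.e. the last processed offset plus one. The sketch below models that bookkeeping without the Kafka client; the class CommitOffsetTracker and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker, independent of the Kafka client API.
final class CommitOffsetTracker {
    // partition -> highest offset that has been fully processed
    private final Map<Integer, Long> lastProcessed = new HashMap<>();

    // Record that the message at `offset` in `partition` has been processed.
    void markProcessed(int partition, long offset) {
        lastProcessed.merge(partition, offset, Long::max);
    }

    // Offsets to commit: the NEXT offset to read, i.e. last processed + 1.
    Map<Integer, Long> offsetsToCommit() {
        Map<Integer, Long> toCommit = new HashMap<>();
        lastProcessed.forEach((partition, offset) -> toCommit.put(partition, offset + 1));
        return toCommit;
    }
}
```

After processing offsets 41 and 42 of partition 0, offsetsToCommit() maps partition 0 to 43.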
2.2 current-offsets
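The largest offset in the batch of records returned by a single poll() call. current-offsets is therefore tightly coupled to the application and cannot be queried from the Kafka broker or the Kafka client. The Flink Kafka source connector describes current-offsets as follows: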
This refers to the offset of the last element that we retrieved and emitted successfully
2.3 visible-offset (my own term)
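The offset up to which the messages in a topic partition are visible to a consumer. When the consumer's isolation level is read_uncommitted, visible-offset equals the high watermark; when it is read_committed, visible-offset equals the last stable offset.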
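The rule can be written down in a few lines. This is a plain-Java model for illustration only: the class VisibleOffset and its enum are hypothetical stand-ins that mirror Kafka's two isolation levels, not types from the Kafka client.

```java
// Hypothetical model of the "visible offset" rule described above.
final class VisibleOffset {
    enum Isolation { READ_UNCOMMITTED, READ_COMMITTED }

    // read_committed consumers see data only up to the last stable offset;
    // read_uncommitted consumers see data up to the high watermark.
    static long visibleOffset(Isolation isolation, long highWatermark, long lastStableOffset) {
        return isolation == Isolation.READ_COMMITTED ? lastStableOffset : highWatermark;
    }
}
```

Since the last stable offset never exceeds the high watermark, a read_committed consumer never sees more of the partition than a read_uncommitted one.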
2.4 log-end-offset
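The offset at which the next message will be written, i.e. one past the last message currently in the partition log.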
3. Custom Metrics
So which of the metrics above do we actually need in order to meet the requirement? Let's look at how the Flink Kafka source connector commits offsets.
3.1 Flink Kafka source connector source code analysis
The Flink Kafka source connector has three offset commit modes:
public enum OffsetCommitMode {
/** Completely disable offset committing. */
DISABLED,
/** Commit offsets back to Kafka only when checkpoints are completed. */
ON_CHECKPOINTS,
/** Commit offsets periodically back to Kafka, using the auto commit functionality of internal Kafka clients. */
KAFKA_PERIODIC;
}
3.1.2 Periodic commit
Properties properties = new Properties();
properties.setProperty("enable.auto.commit", "true");
properties.setProperty("auto.commit.interval.ms", "1000");
new FlinkKafkaConsumer<>("foo", new KafkaEventSchema(), properties);
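In this mode we can clearly only use current-offsets as the monitoring metric: committed-offsets is updated only once per commit interval, and by the time the next commit is due Flink has already processed thousands of further records.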
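A rough back-of-the-envelope estimate shows how far committed-offsets can trail behind. The sketch assumes a steady throughput, which real jobs rarely have; the class name CommitStaleness and the sample numbers are hypothetical.

```java
// Hypothetical estimate of how many processed records are not yet reflected
// in committed-offsets just before the next commit fires.
final class CommitStaleness {
    // Assumes a constant processing rate over the whole commit interval.
    static long recordsBehind(long recordsPerSecond, long commitIntervalMs) {
        return recordsPerSecond * commitIntervalMs / 1000L;
    }
}
```

At 50,000 records per second, a 1,000 ms commit interval leaves up to 50,000 records unaccounted for, and a 180,000 ms (3 minute) interval leaves up to 9,000,000. A lag computed from committed-offsets would be overstated by that amount.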
3.1.3 Commit on checkpoint
When a checkpoint is taken, FlinkKafkaConsumerBase#snapshotState is called, and pendingOffsetsToCommit stores the offsets that are to be committed.
public final void snapshotState(FunctionSnapshotContext context) throws Exception {
if (offsetCommitMode == OffsetCommitMode.ON_CHECKPOINTS) {
// the map cannot be asynchronously updated, because only one checkpoint call can happen
// on this function at a time: either snapshotState() or notifyCheckpointComplete()
// remember the current-offsets that are waiting to be committed
pendingOffsetsToCommit.put(context.getCheckpointId(), currentOffsets);
}
for (Map.Entry<KafkaTopicPartition, Long> kafkaTopicPartitionLongEntry : currentOffsets.entrySet()) {
// write each partition's current-offset into state
unionOffsetStates.add(
Tuple2.of(kafkaTopicPartitionLongEntry.getKey(), kafkaTopicPartitionLongEntry.getValue()));
}
}
Once the checkpoint has completed, the task calls notifyCheckpointComplete():
// FlinkKafkaConsumerBase.java
public final void notifyCheckpointComplete(long checkpointId) throws Exception {
...
}
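The interplay between snapshotState() and notifyCheckpointComplete() can be modelled without Flink. The following is a simplified sketch under my own assumptions, not the connector's code: the class PendingOffsets and its method names are hypothetical, partitions are plain integers, and entries for older checkpoints are dropped once a newer checkpoint is confirmed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the "pending offsets per checkpoint" bookkeeping.
final class PendingOffsets {
    // checkpointId -> offsets captured when that checkpoint was taken
    private final TreeMap<Long, Map<Integer, Long>> pending = new TreeMap<>();

    // Counterpart of snapshotState(): remember a copy of the current offsets.
    void onSnapshot(long checkpointId, Map<Integer, Long> currentOffsets) {
        pending.put(checkpointId, new HashMap<>(currentOffsets));
    }

    // Counterpart of notifyCheckpointComplete(): returns the offsets to commit,
    // or null if the checkpoint id is unknown.
    Map<Integer, Long> onCheckpointComplete(long checkpointId) {
        Map<Integer, Long> offsets = pending.get(checkpointId);
        if (offsets == null) {
            return null;
        }
        // Drop this checkpoint and every older one; they are superseded.
        pending.headMap(checkpointId, true).clear();
        return offsets;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

The point to notice is that the offsets committed are the ones captured at snapshot time, not the offsets the job has reached when the checkpoint finally completes.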
Finally, the offsets to commit are passed through KafkaFetcher#doCommitInternalOffsetsToKafka, which calls consumerThread.setOffsetsToCommit(offsetsToCommit, commitCallback); this stores them in the nextOffsetsToCommit member of KafkaConsumerThread.java, from where they are committed:
// KafkaConsumerThread.java
void setOffsetsToCommit(
...
nextOffsetsToCommit.getAndSet(Tuple2.of(offsetsToCommit, commitCallback));
...
}
public void run() {
while (running) {
...
final Tuple2<Map<TopicPartition, OffsetAndMetadata>, KafkaCommitCallback> commitOffsetsAndCallback = nextOffsetsToCommit.getAndSet(null);
...
consumer.commitAsync(commitOffsetsAndCallback.f0, new CommitCallback(commitOffsetsAndCallback.f1));
...
}
}
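The hand-off between the checkpoint thread and the consumer loop in the snippet above relies on AtomicReference.getAndSet. Here is a minimal stand-alone sketch of that pattern; the class CommitHandoff and its method names are hypothetical, and the generic payload stands in for the Tuple2 of offsets and callback.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical single-slot mailbox between a producer thread (checkpoint
// notification) and a consumer thread (the poll loop).
final class CommitHandoff<T> {
    private final AtomicReference<T> next = new AtomicReference<>();

    // Producer side: publish a new request. Returns true if an older request
    // was still waiting and has now been overwritten (it will never be committed).
    boolean offer(T request) {
        return next.getAndSet(request) != null;
    }

    // Consumer side: atomically take the waiting request and clear the slot.
    // Returns null when nothing is waiting.
    T take() {
        return next.getAndSet(null);
    }
}
```

Because the slot holds only one request, a commit that is slower than the checkpoint interval causes older requests to be skipped rather than queued, which is one more reason committed-offsets can lag behind the real position.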
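In this mode, too, we can clearly only use current-offsets as the monitoring metric, because committed-offsets are committed only after a checkpoint has completed. The checkpoint interval is often set to several minutes, during which Flink has already consumed batch after batch of records, so a lag derived from committed-offsets would be far off.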
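To sum up, the requirement and the definitions above tell us which metrics we need: current-offsets and visible-offset, with lag computed as the difference between the two. Fortunately we do not have to fetch the two offsets separately: SubscriptionState, held by KafkaConsumer, already tracks both and provides a method that computes the lag. All we need is a reference to the SubscriptionState object. For how to reach the SubscriptionState object, refer to this figure: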
3.2 Defining the HighWatermark metrics
3.2.1 Custom Flink Kafka consumer
public class CustomerJsonConsumer extends FlinkKafkaConsumer011<Row> {
private static final long serialVersionUID = -1234567890L;
// custom deserializer
private CustomerJsonDeserialization customerJsonDeserialization;
public CustomerJsonConsumer(String topic, AbsKafkaDeserialization<Row> valueDeserializer, Properties props) {
// the custom deserializer is passed in through the constructor
super(Arrays.asList(topic.split(",")), valueDeserializer, props);
this.customerJsonDeserialization = (CustomerJsonDeserialization) valueDeserializer;
}
public CustomerJsonConsumer(Pattern subscriptionPattern,
AbsKafkaDeserialization<Row> valueDeserializer, Properties props) {
super(subscriptionPattern, valueDeserializer, props);
this.customerJsonDeserialization = (CustomerJsonDeserialization) valueDeserializer;
}
// run() is the entry point when the task starts
@Override
public void run(SourceContext<Row> sourceContext) throws Exception {
// hand the runtime context to the deserializer
customerJsonDeserialization.setRuntimeContext(getRuntimeContext());
customerJsonDeserialization.initMetric();
super.run(sourceContext);
}
@Override
protected AbstractFetcher<Row, ?> createFetcher(SourceContext<Row> sourceContext,
Map<KafkaTopicPartition, Long> assignedPartitionsWithInitialOffsets,
SerializedValue<AssignerWithPeriodicWatermarks<Row>> watermarksPeriodic,
SerializedValue<AssignerWithPunctuatedWatermarks<Row>> watermarksPunctuated,
StreamingRuntimeContext runtimeContext, OffsetCommitMode offsetCommitMode,
MetricGroup consumerMetricGroup,
boolean useMetrics) throws Exception {
AbstractFetcher<Row, ?> fetcher = super.createFetcher(
sourceContext,
assignedPartitionsWithInitialOffsets,
watermarksPeriodic,
watermarksPunctuated,
runtimeContext,
offsetCommitMode,
consumerMetricGroup,
useMetrics);
// hand the fetcher to the custom deserializer; the fetcher holds the Kafka consumer client
customerJsonDeserialization.setFetcher(fetcher);
return fetcher;
}
}
3.2.2 Custom deserializer
Use reflection to dig down, layer by layer, to the SubscriptionState object:
public class CustomerJsonDeserialization extends AbsKafkaDeserialization<Row> {
private static final Logger LOG = LoggerFactory.getLogger(CustomerJsonDeserialization.class);
private static final long serialVersionUID = 2385115520960444192L;
String DT_TOPIC_GROUP = "topic";
String DT_PARTITION_GROUP = "partition";
private AbstractFetcher<Row, ?> fetcher;
public CustomerJsonDeserialization(TypeInformation<Row> typeInfo) {
super(typeInfo);
this.runtimeConverter = createConverter(this.typeInfo);
}
@Override
public Row deserialize(byte[] message) {
if(openMetric && firstMsg){
try {
// runs only once, when the first record arrives
registerPtMetric(fetcher);
} catch (Exception e) {
LOG.error("register topic partition metric error.", e);
}
firstMsg = false;
}
try {
Row row;
try {
final JsonNode root = objectMapper.readTree(message);
row = (Row) super.runtimeConverter.convert(objectMapper, root);
} catch (Throwable t) {
throw new IOException("Failed to deserialize JSON object.", t);
}
return row;
} catch (Exception e) {
// add metric of dirty data
LOG.error(e.getMessage(), e);
return null;
}
}
// the fetcher is passed in by the custom Flink Kafka consumer
public void setFetcher(AbstractFetcher<Row, ?> fetcher) {
this.fetcher = fetcher;
}
protected void registerPtMetric(AbstractFetcher<Row, ?> fetcher) throws Exception {
// Use reflection to reach the Kafka consumer held by the fetcher. The field path is:
// Flink: Fetcher -> KafkaConsumerThread -> KafkaConsumer ->
// Kafka Consumer: KafkaConsumer -> SubscriptionState -> partitionLag()
Field consumerThreadField = fetcher.getClass().getSuperclass().getDeclaredField("consumerThread");
consumerThreadField.setAccessible(true);
KafkaConsumerThread consumerThread = (KafkaConsumerThread) consumerThreadField.get(fetcher);
Field hasAssignedPartitionsField = consumerThread.getClass().getDeclaredField("hasAssignedPartitions");
hasAssignedPartitionsField.setAccessible(true);
boolean hasAssignedPartitions = (boolean) hasAssignedPartitionsField.get(consumerThread);
if(!hasAssignedPartitions){
throw new RuntimeException("no partitions have been assigned to the consumer thread yet");
}
Field consumerField = consumerThread.getClass().getDeclaredField("consumer");
consumerField.setAccessible(true);
KafkaConsumer kafkaConsumer = (KafkaConsumer) consumerField.get(consumerThread);
Field subscriptionStateField = kafkaConsumer.getClass().getDeclaredField("subscriptions");
subscriptionStateField.setAccessible(true);
SubscriptionState subscriptionState = (SubscriptionState) subscriptionStateField.get(kafkaConsumer);
Set<TopicPartition> assignedPartitions = subscriptionState.assignedPartitions();
for(TopicPartition topicPartition : assignedPartitions){
MetricGroup metricGroup = getRuntimeContext().getMetricGroup().addGroup(DT_TOPIC_GROUP, topicPartition.topic())
.addGroup(DT_PARTITION_GROUP, topicPartition.partition() + "");
metricGroup.gauge(DT_TOPIC_PARTITION_LAG_GAUGE, new KafkaTopicPartitionLagMetric(subscriptionState, topicPartition));
}
}
}
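The reflection chain above hard-codes where each field is declared (for example getSuperclass() for consumerThread), which breaks if the class hierarchy differs between connector versions. A slightly more forgiving approach is to search the whole superclass chain. The helper below is a hypothetical sketch, not part of Flink or Kafka, and the Demo classes exist only to make it self-contained.

```java
import java.lang.reflect.Field;

// Hypothetical helper: read a private field declared on the object's class
// or on any of its superclasses.
final class ReflectionUtil {
    static Object readField(Object target, String name) {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field f = c.getDeclaredField(name);
                f.setAccessible(true);
                return f.get(target);
            } catch (NoSuchFieldException e) {
                // not declared on this class; try the superclass
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read field " + name, e);
            }
        }
        throw new IllegalStateException(
                "field " + name + " not found on " + target.getClass().getName());
    }
}

// Demo-only hierarchy standing in for the fetcher classes.
class DemoBaseFetcher {
    private final String consumerThread = "thread-in-base";
}

class DemoFetcher extends DemoBaseFetcher { }

class DemoSubFetcher extends DemoFetcher { }
```

With this helper, ReflectionUtil.readField(new DemoSubFetcher(), "consumerThread") finds the field two levels up, whereas a single getSuperclass() call would miss it.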
3.2.3 Custom consumer-lag metric
public class KafkaTopicPartitionLagMetric implements Gauge<Long> {
private SubscriptionState subscriptionState;
private TopicPartition tp;
public KafkaTopicPartitionLagMetric(SubscriptionState subscriptionState, TopicPartition tp) {
this.subscriptionState = subscriptionState;
this.tp = tp;
}
@Override
public Long getValue() {
// compute the consumer lag
return subscriptionState.partitionLag(tp, IsolationLevel.READ_UNCOMMITTED);
}
}
public class SubscriptionState {
// compute the consumer lag from visible-offset and position (current-offset)
public Long partitionLag(TopicPartition tp, IsolationLevel isolationLevel) {
TopicPartitionState topicPartitionState = assignedState(tp);
if (isolationLevel == IsolationLevel.READ_COMMITTED)
return topicPartitionState.lastStableOffset == null ? null : topicPartitionState.lastStableOffset - topicPartitionState.position;
else
return topicPartitionState.highWatermark == null ? null : topicPartitionState.highWatermark - topicPartitionState.position;
}
}
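Note that partitionLag() returns null while the high watermark (or last stable offset) is still unknown, so the gauge above can report null. One way to guard against that is to map null to a sentinel before the value reaches the reporter. The wrapper below is a hypothetical sketch in plain Java; the sentinel -1 is my own choice, and a Supplier stands in for the call to subscriptionState.partitionLag(...).

```java
import java.util.function.Supplier;

// Hypothetical null-safe wrapper around a lag lookup.
final class NullSafeLag {
    // Assumption: -1 marks "lag not known yet"; pick whatever your dashboards expect.
    static final long UNKNOWN = -1L;

    private final Supplier<Long> rawLag;

    NullSafeLag(Supplier<Long> rawLag) {
        this.rawLag = rawLag;
    }

    // Returns the lag, or UNKNOWN when the underlying lookup yields null.
    long getValue() {
        Long lag = rawLag.get();
        return lag == null ? UNKNOWN : lag;
    }
}
```

Inside a Gauge<Long> the same check would go in getValue(), so that dashboards show -1 instead of a missing or malformed sample until the first fetch response arrives.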
References:
https://ververica.cn/developers/flink-kafka-source-sink-source-analysis/
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-connector-metrics
https://www.cnblogs.com/huxi2b/p/7453543.html
https://blog.csdn.net/u013256816/article/details/88985769