The committed-offsets metric shows a negative value in the Flink WebUI

1. Problem

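After submitting a Flink job, I opened the WebUI to check the offset metrics and found that the committed-offsets value was, of all things, a negative number.
The official documentation defines this metric as: for each partition, the last offset that was successfully committed to Kafka. So however you look at it, a negative value is not normal.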

2. Conclusion

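The conclusion first (and with it the fix): the value shows up because checkpointing is not enabled for the job. Enabling checkpointing (for example with StreamExecutionEnvironment#enableCheckpointing) lets the consumer commit offsets when a checkpoint completes, and that commit path is what updates the metric.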

3. Details

But why does the metric turn negative when checkpointing is not configured? To answer that we have to dig into the source code.

First, we locate this class:

org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher

/**
 * For each partition, register a new metric group to expose current offsets and committed offsets.
 * Per-partition metric groups can be scoped by user variables {@link KafkaConsumerMetricConstants#OFFSETS_BY_TOPIC_METRICS_GROUP}
 * and {@link KafkaConsumerMetricConstants#OFFSETS_BY_PARTITION_METRICS_GROUP}.
 *
 * <p>Note: this method also registers gauges for deprecated offset metrics, to maintain backwards compatibility.
 *
 * @param consumerMetricGroup The consumer metric group
 * @param partitionOffsetStates The partition offset state holders, whose values will be used to update metrics
 */
private void registerOffsetMetrics(
		MetricGroup consumerMetricGroup,
		List<KafkaTopicPartitionState<KPH>> partitionOffsetStates) {

	for (KafkaTopicPartitionState<KPH> ktp : partitionOffsetStates) {
		MetricGroup topicPartitionGroup = consumerMetricGroup
			.addGroup(OFFSETS_BY_TOPIC_METRICS_GROUP, ktp.getTopic())
			.addGroup(OFFSETS_BY_PARTITION_METRICS_GROUP, Integer.toString(ktp.getPartition()));

		topicPartitionGroup.gauge(CURRENT_OFFSETS_METRICS_GAUGE, new OffsetGauge(ktp, OffsetGaugeType.CURRENT_OFFSET));
		topicPartitionGroup.gauge(COMMITTED_OFFSETS_METRICS_GAUGE, new OffsetGauge(ktp, OffsetGaugeType.COMMITTED_OFFSET));

		legacyCurrentOffsetsMetricGroup.gauge(getLegacyOffsetsMetricsGaugeName(ktp), new OffsetGauge(ktp, OffsetGaugeType.CURRENT_OFFSET));
		legacyCommittedOffsetsMetricGroup.gauge(getLegacyOffsetsMetricsGaugeName(ktp), new OffsetGauge(ktp, OffsetGaugeType.COMMITTED_OFFSET));
	}
}

Inside registerOffsetMetrics we can see this line:

topicPartitionGroup.gauge(COMMITTED_OFFSETS_METRICS_GAUGE, new OffsetGauge(ktp, OffsetGaugeType.COMMITTED_OFFSET));

Clearly, the committed offset that the gauge reports is held by the ktp object. Stepping into it, we find that it is a KafkaTopicPartitionState:

public KafkaTopicPartitionState(KafkaTopicPartition partition, KPH kafkaPartitionHandle) {
		this.partition = partition;
		this.kafkaPartitionHandle = kafkaPartitionHandle;
		this.offset = KafkaTopicPartitionStateSentinel.OFFSET_NOT_SET;
		this.committedOffset = KafkaTopicPartitionStateSentinel.OFFSET_NOT_SET;
	}

The sentinel itself is defined in KafkaTopicPartitionStateSentinel:

/** Magic number that defines an unset offset. */
public static final long OFFSET_NOT_SET = -915623761776L;

Here we can see that committedOffset, the field behind our metric, is initialised to this negative sentinel value from the very start.
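
To make the effect concrete, here is a minimal stand-alone sketch. It does not use Flink's classes: PartitionState and committedOffsetGauge are hypothetical stand-ins for KafkaTopicPartitionState and OffsetGauge, and only the sentinel value is taken from the source quoted above. It shows that a gauge which simply reads the holder will keep reporting the sentinel until some commit path calls the setter.

```java
import java.util.function.LongSupplier;

// Simplified stand-in for the Flink classes quoted above; all names here are hypothetical.
final class SentinelGaugeSketch {

    // Same magic number as KafkaTopicPartitionStateSentinel.OFFSET_NOT_SET.
    static final long OFFSET_NOT_SET = -915623761776L;

    // Stripped-down model of KafkaTopicPartitionState: it only tracks the committed offset.
    static final class PartitionState {
        private volatile long committedOffset = OFFSET_NOT_SET;

        long getCommittedOffset() {
            return committedOffset;
        }

        void setCommittedOffset(long offset) {
            this.committedOffset = offset;
        }
    }

    // A gauge just reads the holder's current value every time it is polled.
    static LongSupplier committedOffsetGauge(PartitionState state) {
        return state::getCommittedOffset;
    }

    public static void main(String[] args) {
        PartitionState state = new PartitionState();
        LongSupplier gauge = committedOffsetGauge(state);

        // Nothing has been committed yet, so the gauge reports the sentinel.
        System.out.println("before commit: " + gauge.getAsLong());

        // Once a commit path calls the setter, the gauge reports a real offset.
        state.setCommittedOffset(42L);
        System.out.println("after commit:  " + gauge.getAsLong());
    }
}
```

The first line printed carries the same -915623761776 that appears in the WebUI; the second carries 42.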
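So now we know where the negative value in the WebUI comes from. But why is the offset we commit never written to committedOffset? To find out, we have to look at the code that commits offsets, so let's move on to this class: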

org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase

public final void notifyCheckpointComplete(long checkpointId) throws Exception {
		if (!running) {
			LOG.debug("notifyCheckpointComplete() called on closed source");
			return;
		}

		final AbstractFetcher<?, ?> fetcher = this.kafkaFetcher;
		if (fetcher == null) {
			LOG.debug("notifyCheckpointComplete() called on uninitialized source");
			return;
		}

		if (offsetCommitMode == OffsetCommitMode.ON_CHECKPOINTS) {
			// only one commit operation must be in progress
			if (LOG.isDebugEnabled()) {
				LOG.debug("Consumer subtask {} committing offsets to Kafka/ZooKeeper for checkpoint {}.",
					getRuntimeContext().getIndexOfThisSubtask(), checkpointId);
			}

			try {
				final int posInMap = pendingOffsetsToCommit.indexOf(checkpointId);
				if (posInMap == -1) {
					LOG.warn("Consumer subtask {} received confirmation for unknown checkpoint id {}",
						getRuntimeContext().getIndexOfThisSubtask(), checkpointId);
					return;
				}

				@SuppressWarnings("unchecked")
				Map<KafkaTopicPartition, Long> offsets =
					(Map<KafkaTopicPartition, Long>) pendingOffsetsToCommit.remove(posInMap);

				// remove older checkpoints in map
				for (int i = 0; i < posInMap; i++) {
					pendingOffsetsToCommit.remove(0);
				}

				if (offsets == null || offsets.size() == 0) {
					LOG.debug("Consumer subtask {} has empty checkpoint state.", getRuntimeContext().getIndexOfThisSubtask());
					return;
				}

				fetcher.commitInternalOffsetsToKafka(offsets, offsetCommitCallback);
			} catch (Exception e) {
				if (running) {
					throw e;
				}
				// else ignore exception if we are no longer running
			}
		}
	}

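The method above contains a check: only when the offset commit mode is ON_CHECKPOINTS (that is, checkpointing is in use) does it go on to call commitInternalOffsetsToKafka: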

 if (offsetCommitMode == OffsetCommitMode.ON_CHECKPOINTS)
//......
fetcher.commitInternalOffsetsToKafka(offsets, offsetCommitCallback);
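
Where does offsetCommitMode come from? The code quoted in this post does not show it. As far as I know, the connector derives it from three flags (Kafka's enable.auto.commit, the consumer's commit-on-checkpoints setting, and whether checkpointing is enabled) in a helper called OffsetCommitModes.fromConfiguration. The sketch below is a stand-alone model of that decision written from memory, so treat the exact rules as an assumption rather than a copy of the Flink source.

```java
// Stand-alone model of how the offset commit mode is chosen; not Flink's real classes.
final class CommitModeSketch {

    enum OffsetCommitMode { DISABLED, ON_CHECKPOINTS, KAFKA_PERIODIC }

    // Assumed decision logic, modelled on OffsetCommitModes.fromConfiguration.
    static OffsetCommitMode fromConfiguration(
            boolean enableAutoCommit,
            boolean enableCommitOnCheckpoint,
            boolean enableCheckpointing) {
        if (enableCheckpointing) {
            // With checkpointing on, only the commit-on-checkpoint flag matters.
            return enableCommitOnCheckpoint
                    ? OffsetCommitMode.ON_CHECKPOINTS
                    : OffsetCommitMode.DISABLED;
        }
        // Without checkpointing, only Kafka's own auto-commit setting matters.
        return enableAutoCommit
                ? OffsetCommitMode.KAFKA_PERIODIC
                : OffsetCommitMode.DISABLED;
    }

    public static void main(String[] args) {
        // No checkpointing, auto-commit on: the Kafka client commits on its own schedule.
        System.out.println(fromConfiguration(true, true, false)); // prints KAFKA_PERIODIC
        // Checkpointing on: offsets are committed when a checkpoint completes.
        System.out.println(fromConfiguration(true, true, true));  // prints ON_CHECKPOINTS
    }
}
```

Under this model a job without checkpointing can never be in ON_CHECKPOINTS mode, so the branch shown above is never taken, even if the Kafka client itself is still auto-committing offsets in the background.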

Stepping into commitInternalOffsetsToKafka, we find that it in turn calls doCommitInternalOffsetsToKafka:

public final void commitInternalOffsetsToKafka(
			Map<KafkaTopicPartition, Long> offsets,
			@Nonnull KafkaCommitCallback commitCallback) throws Exception {
		// Ignore sentinels. They might appear here if snapshot has started before actual offsets values
		// replaced sentinels
		doCommitInternalOffsetsToKafka(filterOutSentinels(offsets), commitCallback);
	}
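
Note the filterOutSentinels call: sentinel values such as OFFSET_NOT_SET are dropped before anything is sent to Kafka. Its body is not shown in this post, so the following is only a rough sketch of what such a filter might look like. It assumes that every sentinel is negative while real Kafka offsets are never negative, and it uses plain String keys as a hypothetical replacement for KafkaTopicPartition.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative model of sentinel filtering; not the actual Flink implementation.
final class SentinelFilterSketch {

    static final long OFFSET_NOT_SET = -915623761776L;

    // Assumption: sentinels are the only negative values that can show up here.
    static boolean isSentinel(long offset) {
        return offset < 0;
    }

    // Keeps only the entries that carry a real offset.
    static Map<String, Long> filterOutSentinels(Map<String, Long> offsets) {
        return offsets.entrySet().stream()
                .filter(entry -> !isSentinel(entry.getValue()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, Long> offsets = new HashMap<>();
        offsets.put("topic-0", 17L);            // partition with a real offset
        offsets.put("topic-1", OFFSET_NOT_SET); // partition that has not read anything yet
        System.out.println(filterOutSentinels(offsets)); // prints {topic-0=17}
    }
}
```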

That call lands in the doCommitInternalOffsetsToKafka method of org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher:

protected void doCommitInternalOffsetsToKafka(
			Map<KafkaTopicPartition, Long> offsets,
			@Nonnull KafkaCommitCallback commitCallback) throws Exception {

		@SuppressWarnings("unchecked")
		List<KafkaTopicPartitionState<TopicPartition>> partitions = subscribedPartitionStates();

		Map<TopicPartition, OffsetAndMetadata> offsetsToCommit = new HashMap<>(partitions.size());

		for (KafkaTopicPartitionState<TopicPartition> partition : partitions) {
			Long lastProcessedOffset = offsets.get(partition.getKafkaTopicPartition());
			if (lastProcessedOffset != null) {
				checkState(lastProcessedOffset >= 0, "Illegal offset value to commit");

				// committed offsets through the KafkaConsumer need to be 1 more than the last processed offset.
				// This does not affect Flink's checkpoints/saved state.
				long offsetToCommit = lastProcessedOffset + 1;

				offsetsToCommit.put(partition.getKafkaPartitionHandle(), new OffsetAndMetadata(offsetToCommit));
				partition.setCommittedOffset(offsetToCommit);
			}
		}

		// ... (remainder of the method omitted)
	}

The key line in there is:

partition.setCommittedOffset(offsetToCommit)

This is where committedOffset is finally overwritten with offsetToCommit, and with that the mystery is solved.

To recap:
When a checkpoint completes, Flink checks the current commit mode before it commits any offsets:

if (offsetCommitMode == OffsetCommitMode.ON_CHECKPOINTS)

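If checkpointing is not in use, the mode is not ON_CHECKPOINTS and no checkpoint ever completes, so committedOffset is never updated and keeps its initial value of -915623761776L.
If checkpointing is in use, execution eventually reaches this line: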

partition.setCommittedOffset(offsetToCommit)

This call is what updates the committedOffset metric. So when checkpointing is not configured, the value shown in the WebUI stays negative.
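
The whole chain can be condensed into a small simulation. This is a toy model under stated assumptions, not Flink code: Source is a hypothetical single-partition consumer, and notifyCheckpointComplete only mirrors the guard and the "+ 1" seen in the snippets above.

```java
// Toy end-to-end model of the commit path walked through above; all names are hypothetical.
final class CommitFlowSketch {

    static final long OFFSET_NOT_SET = -915623761776L;

    enum OffsetCommitMode { DISABLED, ON_CHECKPOINTS, KAFKA_PERIODIC }

    // Hypothetical single-partition source that tracks one processed and one committed offset.
    static final class Source {
        private final OffsetCommitMode mode;
        private long lastProcessedOffset = OFFSET_NOT_SET;
        private long committedOffset = OFFSET_NOT_SET;

        Source(OffsetCommitMode mode) {
            this.mode = mode;
        }

        void process(long offset) {
            lastProcessedOffset = offset;
        }

        // Mirrors the guard in notifyCheckpointComplete: only ON_CHECKPOINTS reaches the setter.
        void notifyCheckpointComplete() {
            if (mode == OffsetCommitMode.ON_CHECKPOINTS && lastProcessedOffset >= 0) {
                // Kafka expects the committed offset to be one past the last processed record.
                committedOffset = lastProcessedOffset + 1;
            }
        }

        // What the committed-offsets gauge would report.
        long committedOffsetMetric() {
            return committedOffset;
        }
    }

    // Processes one record at offset 99, signals one checkpoint, and returns the metric value.
    static long run(OffsetCommitMode mode) {
        Source source = new Source(mode);
        source.process(99L);
        source.notifyCheckpointComplete();
        return source.committedOffsetMetric();
    }

    public static void main(String[] args) {
        System.out.println(run(OffsetCommitMode.KAFKA_PERIODIC)); // prints -915623761776
        System.out.println(run(OffsetCommitMode.ON_CHECKPOINTS)); // prints 100
    }
}
```

In a real job without checkpointing the notification would never arrive in the first place; the sketch calls it in both modes only to show that the guard alone is enough to leave the sentinel in place.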
