Original article: mp.weixin.qq.com
01 Why Idempotency Matters
Kafka, as a distributed MQ, is heavily used in distributed systems such as message push systems and business platforms (settlement platforms, for example). Take settlement: upstream business systems push data to the settlement platform, and if one piece of data is computed and processed multiple times, the consequences can be severe.
02 What Affects Idempotency
When using Kafka we want exactly-once semantics. In a distributed system, network partitions are unavoidable: if the broker hits a network failure or a full GC while returning the ack, the ack times out and the producer retries. How do we ensure the retry does not cause duplicates or reordering? And if the producer crashes, the new producer has none of the old producer's state, so how is idempotency guaranteed then? Even if the send path is idempotent, once the consumer pulls a message and hands it to a pool of worker threads whose handling may involve asynchronous operations, the following cases arise:
- Commit first, then run the business logic: the commit succeeds but the processing fails, so the message is lost.
- Run the business logic first, then commit: the commit fails but the processing succeeded, so the message gets processed again.
- Run the business logic first, then commit: the commit succeeds but an asynchronous step later fails, so the message is lost.
This article discusses each of these problems.
03 How Kafka Guarantees Send Idempotency
To address these problems, Kafka 0.11 introduced the idempotent producer and the transactional producer. The former solves single-session idempotency; the latter solves cross-session idempotency.
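As a quick client-side illustration (not from the original article), here is a minimal sketch of enabling the idempotent producer. The bootstrap address, topic, and class name are placeholders; the transactional producer additionally sets transactional.id, which is shown in the cross-session section below.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IdempotentProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Idempotent producer (single-session): the broker dedups producer retries via PID + seq
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        // The transactional producer (cross-session) would additionally set ProducerConfig.TRANSACTIONAL_ID_CONFIG

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value")); // placeholder topic
        }
    }
}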
Single-session idempotency
To prevent the duplication and reordering caused by producer retries, Kafka introduced a PID and sequence numbers. On the producer, every RecordBatch carries a monotonically increasing seq; on the broker, every topic-partition maintains a pid→seq mapping and updates lastSeq on each commit. When a RecordBatch arrives, the broker validates it before appending: the data is saved only if the batch's baseSeq (the seq of its first record) is exactly one greater than the lastSeq the broker tracks; otherwise it is rejected (the inSequence method).
ProducerStateManager.scala
private def maybeValidateAppend(producerEpoch: Short, firstSeq: Int, offset: Long): Unit = {
  validationType match {
    case ValidationType.None =>
    case ValidationType.EpochOnly =>
      checkProducerEpoch(producerEpoch, offset)
    case ValidationType.Full =>
      checkProducerEpoch(producerEpoch, offset)
      checkSequence(producerEpoch, firstSeq, offset)
  }
}

private def checkSequence(producerEpoch: Short, appendFirstSeq: Int, offset: Long): Unit = {
  if (producerEpoch != updatedEntry.producerEpoch) {
    // A new producer epoch must start from sequence 0
    if (appendFirstSeq != 0) {
      if (updatedEntry.producerEpoch != RecordBatch.NO_PRODUCER_EPOCH) {
        throw new OutOfOrderSequenceException(s"Invalid sequence number for new epoch at offset $offset in " +
          s"partition $topicPartition: $producerEpoch (request epoch), $appendFirstSeq (seq. number)")
      } else {
        throw new UnknownProducerIdException(s"Found no record of producerId=$producerId on the broker at offset $offset" +
          s"in partition $topicPartition. It is possible that the last message with the producerId=$producerId has " +
          "been removed due to hitting the retention limit.")
      }
    }
  } else {
    val currentLastSeq = if (!updatedEntry.isEmpty)
      updatedEntry.lastSeq
    else if (producerEpoch == currentEntry.producerEpoch)
      currentEntry.lastSeq
    else
      RecordBatch.NO_SEQUENCE

    if (currentLastSeq == RecordBatch.NO_SEQUENCE && appendFirstSeq != 0) {
      // The epoch matches but the next expected sequence is unknown
      throw new UnknownProducerIdException(s"Local producer state matches expected epoch $producerEpoch " +
        s"for producerId=$producerId at offset $offset in partition $topicPartition, but the next expected " +
        "sequence number is not known.")
    } else if (!inSequence(currentLastSeq, appendFirstSeq)) {
      throw new OutOfOrderSequenceException(s"Out of order sequence number for producerId $producerId at " +
        s"offset $offset in partition $topicPartition: $appendFirstSeq (incoming seq. number), " +
        s"$currentLastSeq (current end sequence number)")
    }
  }
}

// A batch is in sequence only if its first seq is exactly lastSeq + 1 (wrapping around at Int.MaxValue)
private def inSequence(lastSeq: Int, nextSeq: Int): Boolean = {
  nextSeq == lastSeq + 1L || (nextSeq == 0 && lastSeq == Int.MaxValue)
}
Aside: what does the Kafka producer do to preserve ordering?
Suppose there are five in-flight requests: batch1, batch2, batch3, batch4, batch5. If only batch2's ack fails while 3, 4 and 5 are all persisted, batch2 will be re-sent with a later request, causing duplication and reordering. Setting max.in.flight.requests.per.connection=1 (the number of unacknowledged requests a client may have on a single connection) prevents the reordering, but lowers throughput.
Newer Kafka versions, with enable.idempotence=true, can adjust the in-flight behaviour dynamically. Normally max.in.flight.requests.per.connection stays greater than 1; when a retried request arrives, the batch is re-inserted into the queue at the position its seq dictates and is effectively limited to one in flight, so every batch ahead of it has a smaller seq and it is only sent after all of those have completed.
private void insertInSequenceOrder(Deque<ProducerBatch> deque, ProducerBatch batch) {
    // When we are re-enqueueing and have enabled idempotence, the re-enqueued batch must always have a sequence.
    if (batch.baseSequence() == RecordBatch.NO_SEQUENCE)
        throw new IllegalStateException("Trying to re-enqueue a batch which doesn't have a sequence even " +
            "though idempotency is enabled.");

    if (transactionManager.nextBatchBySequence(batch.topicPartition) == null)
        throw new IllegalStateException("We are re-enqueueing a batch which is not tracked as part of the in flight " +
            "requests. batch.topicPartition: " + batch.topicPartition + "; batch.baseSequence: " + batch.baseSequence());

    ProducerBatch firstBatchInQueue = deque.peekFirst();
    if (firstBatchInQueue != null && firstBatchInQueue.hasSequence() && firstBatchInQueue.baseSequence() < batch.baseSequence()) {
        // Pull out every queued batch whose sequence is smaller than the incoming batch's sequence.
        List<ProducerBatch> orderedBatches = new ArrayList<>();
        while (deque.peekFirst() != null && deque.peekFirst().hasSequence() && deque.peekFirst().baseSequence() < batch.baseSequence())
            orderedBatches.add(deque.pollFirst());

        log.debug("Reordered incoming batch with sequence {} for partition {}. It was placed in the queue at " +
            "position {}", batch.baseSequence(), batch.topicPartition, orderedBatches.size());

        deque.addFirst(batch);

        // Now we have to re-insert the previously queued batches in the right order.
        for (int i = orderedBatches.size() - 1; i >= 0; --i) {
            deque.addFirst(orderedBatches.get(i));
        }

        // At this point, the incoming batch has been queued in the correct place according to its sequence.
    } else {
        deque.addFirst(batch);
    }
}
Cross-session idempotency
As described above, Kafka achieves single-session idempotency through the PID and seq. But precisely because of the PID, when the application restarts the new producer has none of the old producer's state, so duplicates can still be appended.
Kafka transactions achieve cross-session idempotency through a fencing mechanism.
Kafka transactions introduce a transactionId and an epoch. Once transactional.id is set, a transactionId maps to exactly one PID, and the server records the latest epoch for it. When a new producer initializes, it sends an InitPIDRequest to the TransactionCoordinator; since the coordinator already holds metadata for that transactionId, it returns the previously assigned PID and bumps the epoch by 1. When the old producer comes back and issues requests, it is treated as an invalid (zombie) producer and an exception is thrown. Without transactions, a new producer is simply handed a brand-new PID, so there is no fencing and cross-session idempotency cannot be achieved.
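From the client side, the fencing looks roughly like the minimal sketch below; the bootstrap address, topic, transactional.id and class name are placeholders. initTransactions() is what issues the InitPIDRequest described above, and once a newer instance with the same transactional.id has initialized, calls from the older instance fail with ProducerFencedException.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.ProducerFencedException;

public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "settlement-producer-1"); // placeholder id

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        try {
            producer.initTransactions(); // sends InitPIDRequest; the coordinator reuses the PID and bumps the epoch
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("demo-topic", "key", "value")); // placeholder topic
            producer.commitTransaction();
        } catch (ProducerFencedException fenced) {
            // A newer producer with the same transactional.id has taken over; this zombie instance must stop
            producer.close();
        }
    }
}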
private def maybeValidateAppend(producerEpoch: Short, firstSeq: Int, offset: Long): Unit = {
  validationType match {
    case ValidationType.None =>
    case ValidationType.EpochOnly =>
      checkProducerEpoch(producerEpoch, offset)
    case ValidationType.Full => // with transactions enabled, full validation runs the checks below
      checkProducerEpoch(producerEpoch, offset)
      checkSequence(producerEpoch, firstSeq, offset)
  }
}

// A request carrying an epoch older than the one recorded for this PID comes from a fenced (zombie) producer
private def checkProducerEpoch(producerEpoch: Short, offset: Long): Unit = {
  if (producerEpoch < updatedEntry.producerEpoch) {
    throw new ProducerFencedException(s"Producer's epoch at offset $offset is no longer valid in " +
      s"partition $topicPartition: $producerEpoch (request epoch), ${updatedEntry.producerEpoch} (current epoch)")
  }
}
04 Consumer-side Idempotency
As noted above, after the consumer pulls messages it hands them to a pool of worker threads, and a worker's handling of a message may involve asynchronous operations, which leads to the following cases:
- Commit first, then run the business logic: the commit succeeds but the processing fails, so the message is lost.
- Run the business logic first, then commit: the commit fails but the processing succeeded, so the message gets processed again.
- Run the business logic first, then commit: the commit succeeds but an asynchronous step later fails, so the message is lost.
For this, a common approach is to have the worker run the following check as soon as it takes a message:
// Check whether this msgId has already been seen before doing any work.
if (cache.contains(msgId)) {
    // msgId is already in the cache: the message has been handled, skip it
    continue;
} else {
    lock.lock();
    try {
        cache.put(msgId, timeout); // remember the msgId, with an expiration
        commitSync();              // commit the offset before the async business handling starts
    } finally {
        lock.unlock();
    }
}
// After all downstream work finishes, remove the msgId from the cache; as long as a msgId
// is present in the cache, the message is treated as already processed.
// Note: cache entries must be given an expiration (TTL).
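For a slightly more complete picture, below is a hedged sketch of a poll loop that combines this dedup check with a TTL cache. The cache library (Guava), the class and method names (DedupConsumerSketch, handle), and the assumption that the business msgId travels as the record key are illustrative choices, not part of the original article.

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DedupConsumerSketch {
    // msgId -> marker; entries expire so the cache does not grow without bound
    private static final Cache<String, Boolean> SEEN =
            CacheBuilder.newBuilder().expireAfterWrite(30, TimeUnit.MINUTES).build();

    public static void run(KafkaConsumer<String, String> consumer, ExecutorService workers) {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                String msgId = record.key(); // assumes the business msgId is carried as the record key
                if (SEEN.getIfPresent(msgId) != null) {
                    continue; // already processed (or in flight), skip
                }
                SEEN.put(msgId, Boolean.TRUE);
                workers.submit(() -> handle(record)); // asynchronous business handling
            }
            // Commit after recording the msgIds, matching the snippet above;
            // the business handling continues asynchronously in the worker pool.
            consumer.commitSync();
        }
    }

    private static void handle(ConsumerRecord<String, String> record) {
        // business logic for one message
    }
}

Like the snippet above, this keeps the dedup state in process memory and relies on the TTL (expireAfterWrite) to bound its size.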