The "log" here refers to the files in which Kafka stores written messages.
Kafka's log cleanup policies fall into two groups: deletion policies based on time and size, and the compact policy.
This post mainly covers log cleaning under the compact policy. Compact means compaction: it can only be applied to topics whose messages all carry a key. Messages with the same key are merged, keeping only the latest message.
During compaction, messages whose payload is null (tombstones) are also removed.
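As a minimal illustration of the compact semantics just described (keep only the latest message per key, and drop null-payload tombstones), here is a hypothetical standalone sketch; `Message` and `compact` are made up for this example and are not Kafka source:

```scala
object CompactSketch {
  // payload = None models a message with a null payload (a tombstone)
  case class Message(offset: Long, key: String, payload: Option[String])

  def compact(log: Seq[Message]): Seq[Message] = {
    // toMap keeps the last value for each duplicate key, i.e. the latest offset
    val latestOffset: Map[String, Long] = log.map(m => m.key -> m.offset).toMap
    log.filter(m => latestOffset(m.key) == m.offset) // keep only the latest message per key
       .filter(_.payload.isDefined)                  // drop tombstones as well
  }
}
```

For example, compacting the sequence (0,a), (1,b), (2,a), (3,b=null) leaves only the latest non-null message for key a, at offset 2.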
Here is a diagram from the official Kafka documentation to give you a feel for it:

(figure: log compaction, from the Kafka documentation)

Log cleaning involves three states: LogCleaningInProgress, LogCleaningAborted, and LogCleaningPaused. Their meanings are easy to guess from the names; the source comment reads:

If a partition is to be cleaned, it enters the LogCleaningInProgress state.
While a partition is being cleaned, it can be requested to be aborted and paused. Then the partition first enters
the LogCleaningAborted state. Once the cleaning task is aborted, the partition enters the LogCleaningPaused state.
While a partition is in the LogCleaningPaused state, it won't be scheduled for cleaning again, until cleaning is requested to be resumed.

The LogCleanerManager class manages the state of each log being cleaned and the transitions between states:

```scala
def abortCleaning(topicAndPartition: TopicAndPartition)
def abortAndPauseCleaning(topicAndPartition: TopicAndPartition)
def resumeCleaning(topicAndPartition: TopicAndPartition)
def checkCleaningAborted(topicAndPartition: TopicAndPartition)
```

Selecting the logs to clean: because compaction rewrites the log and index files, it is IO-intensive. Kafka therefore throttles it: before each compaction round it first decides, by rule, which TopicAndPartition logs will be cleaned.
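The state transitions quoted in the source comment above can be sketched as a small state machine. This is a hypothetical illustration, not Kafka source; the real LogCleanerManager keeps these states in a map keyed by TopicAndPartition and guarded by a lock, and resumeCleaning simply removes the entry so the partition becomes schedulable again:

```scala
object CleanerStateSketch {
  sealed trait State
  case object LogCleaningInProgress extends State
  case object LogCleaningAborted extends State
  case object LogCleaningPaused extends State

  // Returns the next state, or None when the transition is not allowed.
  // Event names here are made up for the sketch.
  def next(current: State, event: String): Option[State] = (current, event) match {
    case (LogCleaningInProgress, "abortAndPause") => Some(LogCleaningAborted) // abortAndPauseCleaning() requested
    case (LogCleaningAborted, "cleanerDone")      => Some(LogCleaningPaused)  // cleaning task notices the abort
    case _                                        => None
  }
}
```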
The LogToClean class represents a log to be cleaned:

```scala
private case class LogToClean(topicPartition: TopicAndPartition, log: Log, firstDirtyOffset: Long)
  extends Ordered[LogToClean] {
  val cleanBytes = log.logSegments(-1, firstDirtyOffset).map(_.size).sum
  val dirtyBytes = log.logSegments(firstDirtyOffset,
    math.max(firstDirtyOffset, log.activeSegment.baseOffset)).map(_.size).sum
  val cleanableRatio = dirtyBytes / totalBytes.toDouble
  def totalBytes = cleanBytes + dirtyBytes
  override def compare(that: LogToClean): Int =
    math.signum(this.cleanableRatio - that.cleanableRatio).toInt
}
```

firstDirtyOffset: the starting point of this cleaning round; the offsets before it have already been cleaned and will be key-merged with the messages after it.
cleanableRatio = dirtyBytes / totalBytes.toDouble is the fraction of the log that still needs cleaning; the larger this value, the more likely the log is to be selected for cleaning.
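To make the selection rule concrete, here is a hypothetical sketch using the same ratio formula as LogToClean; the names and byte counts are invented for the example:

```scala
object RatioSketch {
  case class Candidate(name: String, cleanBytes: Long, dirtyBytes: Long) {
    def totalBytes: Long = cleanBytes + dirtyBytes
    // Same formula as LogToClean: the fraction of the log that is still dirty
    def cleanableRatio: Double = dirtyBytes / totalBytes.toDouble
  }

  // The candidate with the highest dirty ratio is the most worth cleaning
  def pick(candidates: Seq[Candidate]): Candidate = candidates.maxBy(_.cleanableRatio)
}
```

A mostly-clean log (ratio 0.1) loses to a mostly-dirty one (ratio 0.8), so IO is spent where compaction reclaims the most space.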
After each cleaning round, the position cleaned up to is updated and recorded in the cleaner-offset-checkpoint file, which serves as the basis for computing firstDirtyOffset in the next round.
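A minimal sketch of that checkpoint bookkeeping, assuming a simple in-memory map in place of the real cleaner-offset-checkpoint file (the object and method names here are made up for illustration):

```scala
object CheckpointSketch {
  // partition -> offset cleaned up to in the last round
  type Checkpoint = Map[String, Long]

  // After a round finishes, record how far this partition has been cleaned
  def update(cp: Checkpoint, partition: String, cleanedUpTo: Long): Checkpoint =
    cp + (partition -> cleanedUpTo)

  // Next round starts from the checkpointed offset; 0 for a never-cleaned partition
  def firstDirtyOffset(cp: Checkpoint, partition: String): Long =
    cp.getOrElse(partition, 0L)
}
```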