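LogManager is the entry point for all operations on Kafka log data. Building on the Log class analyzed in the previous section, it provides loading, creation, deletion, and lookup of logs. Every directory listed in log.dirs is managed by LogManager, which validates the log.dirs configuration at startup. Each log directory contains multiple topic-partition directories, each managed by one Log object, and LogManager keeps the mapping from every topic partition to its Log. The LogManager class is defined as follows: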
class LogManager(logDirs: Seq[File], // log directories (the log.dirs config); a new log is normally created in the directory that holds the fewest logs
                 initialOfflineDirs: Seq[File],
                 val topicConfigs: Map[String, LogConfig], // note that this doesn't get updated after creation
                 val initialDefaultConfig: LogConfig,
                 val cleanerConfig: CleanerConfig, // log cleaner configuration
                 recoveryThreadsPerDataDir: Int,
                 val flushCheckMs: Long,
                 val flushRecoveryOffsetCheckpointMs: Long,
                 val flushStartOffsetCheckpointMs: Long,
                 val retentionCheckMs: Long,
                 val maxPidExpirationMs: Int,
                 scheduler: Scheduler, // scheduler for periodic tasks
                 val brokerState: BrokerState, // state of the current broker
                 brokerTopicStats: BrokerTopicStats,
                 logDirFailureChannel: LogDirFailureChannel,
                 time: Time) extends Logging with KafkaMetricsGroup {
  import LogManager._
  val LockFile = ".lock"
  val InitialTaskDelayMs = 30 * 1000
  /** Lock object held while a Log is being created or deleted */
  private val logCreationOrDeletionLock = new Object
  /** Maps each topic partition to its Log object */
  private val currentLogs = new Pool[TopicPartition, Log]()
  // Future logs are put in the directory with "-future" suffix. Future log is created when user wants to move replica
  // from one log directory to another log directory on the same broker. The directory of the future log will be renamed
  // to replace the current log of the partition after the future log catches up with the current log
  private val futureLogs = new Pool[TopicPartition, Log]()
  // Each element in the queue contains the log object to be deleted and the time it is scheduled for deletion.
  /** Log objects that are waiting to be deleted */
  private val logsToBeDeleted = new LinkedBlockingQueue[(Log, Long)]()
  private val _liveLogDirs: ConcurrentLinkedQueue[File] = createAndValidateLogDirs(logDirs, initialOfflineDirs)
  @volatile private var _currentDefaultConfig = initialDefaultConfig
  @volatile private var numRecoveryThreadsPerDataDir = recoveryThreadsPerDataDir
  // This map contains all partitions whose logs are getting loaded and initialized. If log configuration
  // of these partitions get updated at the same time, the corresponding entry in this map is set to "true",
  // which triggers a config reload after initialization is finished (to get the latest config value).
  // See KAFKA-8813 for more detail on the race condition
  // Visible for testing
  private[log] val partitionsInitializing = new ConcurrentHashMap[TopicPartition, Boolean]().asScala
  /** Try to lock every log directory at the file-system level; this is a process-level lock */
  private val dirLocks = lockLogDirs(liveLogDirs)
  /**
   * For each log directory, create an OffsetCheckpointFile that manages the directory's
   * recovery-point-offset-checkpoint file, and keep the directory-to-checkpoint mapping
   */
  @volatile private var recoveryPointCheckpoints = liveLogDirs.map(dir =>
    (dir, new OffsetCheckpointFile(new File(dir, RecoveryPointCheckpointFile), logDirFailureChannel))).toMap
  @volatile private var logStartOffsetCheckpoints = liveLogDirs.map(dir =>
    (dir, new OffsetCheckpointFile(new File(dir, LogStartOffsetCheckpointFile), logDirFailureChannel))).toMap
}
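To make the "process-level lock" on each log directory concrete, here is a minimal sketch built on java.nio file locks. DirLock is a hypothetical helper written for this note; it is not Kafka's lockLogDirs or its FileLock utility, and it only assumes that the lock file is named .lock as in the LockFile constant above.

```scala
import java.io.{File, RandomAccessFile}
import java.nio.channels.{FileLock, OverlappingFileLockException}

// Hypothetical helper (not Kafka's implementation): guards one log directory
// with an exclusive OS-level lock on "<dir>/.lock".
final class DirLock(dir: File) {
  private val channel = new RandomAccessFile(new File(dir, ".lock"), "rw").getChannel
  private var lock: Option[FileLock] = None

  // Non-blocking attempt; returns true if this instance holds the lock afterwards.
  def tryLock(): Boolean = synchronized {
    if (lock.isEmpty) {
      lock =
        try {
          Option(channel.tryLock()) // tryLock returns null if another process holds the lock
        } catch {
          case _: OverlappingFileLockException => None // already locked inside this JVM
        }
    }
    lock.isDefined
  }

  // Release the lock (if held) and close the underlying channel.
  def release(): Unit = synchronized {
    lock.foreach(_.release())
    lock = None
    channel.close()
  }
}
```

A broker-like process would call tryLock() once per log directory at startup and refuse to start if any call returns false, since that suggests another process is already using the directory.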
Inside the Log class, segments is defined as:
/**
 * The LogSegments that make up this Log, stored in a skip list:
 * - the key is the segment's baseOffset
 * - the value is the LogSegment object
 */
protected val segments: ConcurrentNavigableMap[java.lang.Long, LogSegment] = new ConcurrentSkipListMap[java.lang.Long, LogSegment]
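ConcurrentSkipListMap is a good fit here for two reasons:
1. It is thread-safe, so the Kafka source does not have to guarantee thread safety for log segment operations on its own.
2. It is a Map whose keys are kept sorted. Kafka uses each log segment's base offset as the key, which makes it easy to order and compare segments by base offset and to quickly find the two segments just before and just after a given offset.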
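The ordering property is easy to see in a small experiment. The sketch below uses a hypothetical Segment case class in place of LogSegment; only the ConcurrentSkipListMap calls are real JDK API.

```scala
import java.util.concurrent.ConcurrentSkipListMap

object SegmentOrderSketch {
  // Stand-in for LogSegment (assumption): only the base offset matters here.
  final case class Segment(baseOffset: Long)

  // Build a segments map keyed by base offset, in whatever order the offsets arrive.
  def build(baseOffsets: Seq[Long]): ConcurrentSkipListMap[java.lang.Long, Segment] = {
    val segments = new ConcurrentSkipListMap[java.lang.Long, Segment]()
    baseOffsets.foreach(o => segments.put(o, Segment(o)))
    segments
  }

  def main(args: Array[String]): Unit = {
    val segments = build(Seq(200L, 0L, 100L)) // inserted out of order
    println(segments.keySet())                // keys iterate in ascending order
    println(segments.floorKey(150L))          // base offset of the segment at or before 150
    println(segments.higherKey(150L))         // base offset of the segment after 150
  }
}
```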
Next comes the initialization of LogManager.
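loadLogs loads the logs under every log directory and creates one Log object per topic-partition directory. A topic-partition directory that is marked for deletion (its name carries the "-delete" suffix) has its Log object added to LogManager#logsToBeDeleted, where the periodic kafka-delete-logs task deletes it later.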
/**
 * Recover and load all logs in the given data directories.
 * While it is being instantiated, LogManager creates a fixed-size thread pool for each log directory and loads the
 * sub-directories (plain files are skipped) concurrently. The data under each topic-partition directory ends up
 * wrapped in a Log object that is recorded in LogManager#currentLogs (or LogManager#futureLogs for "-future"
 * directories). Both fields are of type Pool[K, V], which is backed by ConcurrentHashMap, and the key is the
 * topic partition that the Log belongs to.
 */
private def loadLogs(): Unit = {
  info("Loading logs.")
  val startMs = time.milliseconds
  // thread pools created for the log directories
  val threadPools = ArrayBuffer.empty[ExecutorService]
  val offlineDirs = mutable.Set.empty[(String, IOException)]
  val jobs = mutable.Map.empty[File, Seq[Future[_]]]
  // process each log directory
  for (dir <- liveLogDirs) {
    try {
      // one thread pool per log directory, sized by numRecoveryThreadsPerDataDir
      val pool = Executors.newFixedThreadPool(numRecoveryThreadsPerDataDir)
      threadPools.append(pool)
      // look for the .kafka_cleanshutdown file; if it exists, the broker was shut down cleanly
      val cleanShutdownFile = new File(dir, Log.CleanShutdownFile)
      if (cleanShutdownFile.exists) {
        debug(s"Found clean shutdown file. Skipping recovery for all logs in data directory: ${dir.getAbsolutePath}")
      } else {
        // log recovery itself is being performed by `Log` class during initialization
        // the broker was not shut down cleanly: set its state to RecoveringFromUncleanShutdown
        brokerState.newState(RecoveringFromUncleanShutdown)
      }
      // read this directory's recovery-point-offset-checkpoint file: topic partition -> recovery point
      // (the recovery point is the offset up to which the log has been flushed to disk, not the HW)
      var recoveryPoints = Map[TopicPartition, Long]()
      try {
        recoveryPoints = this.recoveryPointCheckpoints(dir).read
      } catch {
        case e: Exception =>
          warn(s"Error occurred while reading recovery-point-offset-checkpoint file of directory $dir", e)
          warn("Resetting the recovery checkpoint to 0")
      }
      var logStartOffsets = Map[TopicPartition, Long]()
      try {
        logStartOffsets = this.logStartOffsetCheckpoints(dir).read
      } catch {
        case e: Exception =>
          warn(s"Error occurred while reading log-start-offset-checkpoint file of directory $dir", e)
      }
      // walk the children of the current log directory; only directories are processed, files are ignored
      val jobsForDir = for {
        dirContent <- Option(dir.listFiles).toList
        logDir <- dirContent if logDir.isDirectory
      } yield {
        // one Runnable task per topic-partition directory
        CoreUtils.runnable {
          try {
            loadLog(logDir, recoveryPoints, logStartOffsets)
          } catch {
            case e: IOException =>
              offlineDirs.add((dir.getAbsolutePath, e))
              error(s"Error while loading log dir ${dir.getAbsolutePath}", e)
          }
        }
      }
      // submit the tasks and keep the resulting futures in jobs; jobsForDir is a List[Runnable]
      jobs(cleanShutdownFile) = jobsForDir.map(pool.submit)
    } catch {
      case e: IOException =>
        offlineDirs.add((dir.getAbsolutePath, e))
        error(s"Error while loading log dir ${dir.getAbsolutePath}", e)
    }
  }
  // block until all submitted tasks finish, i.e. until every topic-partition directory under every log directory is loaded
  try {
    for ((cleanShutdownFile, dirJobs) <- jobs) {
      dirJobs.foreach(_.get)
      try {
        // delete the corresponding .kafka_cleanshutdown file
        cleanShutdownFile.delete()
      } catch {
        case e: IOException =>
          offlineDirs.add((cleanShutdownFile.getParent, e))
          error(s"Error while deleting the clean shutdown file $cleanShutdownFile", e)
      }
    }
    offlineDirs.foreach { case (dir, e) =>
      logDirFailureChannel.maybeAddOfflineLogDir(dir, s"Error while deleting the clean shutdown file in dir $dir", e)
    }
  } catch {
    case e: ExecutionException =>
      error(s"There was an error in one of the threads during logs loading: ${e.getCause}")
      throw e.getCause
  } finally {
    // shut down every thread pool
    threadPools.foreach(_.shutdown())
  }
  info(s"Logs loading complete in ${time.milliseconds - startMs} ms.")
}
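Stripped of checkpoints and error bookkeeping, the concurrency pattern in loadLogs is: one fixed-size pool per log directory, one task per sub-directory, wait on all futures, then shut the pools down. The following is a minimal sketch of just that pattern. ParallelLoadSketch and loadAll are hypothetical names, and "loading" a partition directory is reduced to returning its name.

```scala
import java.io.File
import java.util.concurrent.{Callable, ExecutionException, Executors, Future}

object ParallelLoadSketch {
  // Hypothetical stand-in for loadLogs/loadLog: returns the names of all loaded partition directories.
  def loadAll(logDirs: Seq[File], threadsPerDir: Int): Seq[String] = {
    // One fixed-size pool per log directory.
    val pools = logDirs.map(_ => Executors.newFixedThreadPool(threadsPerDir))
    try {
      // Submit one task per sub-directory to the pool that belongs to its parent log directory.
      val futures: Seq[Future[String]] = logDirs.zip(pools).flatMap { case (dir, pool) =>
        val children = Option(dir.listFiles).map(_.toList).getOrElse(Nil)
        children.filter(_.isDirectory).map { partitionDir =>
          pool.submit(new Callable[String] {
            def call(): String = partitionDir.getName
          })
        }
      }
      // Block until every task finishes; a failed task surfaces here as an ExecutionException.
      futures.map(_.get)
    } catch {
      case e: ExecutionException => throw e.getCause
    } finally {
      pools.foreach(_.shutdown())
    }
  }
}
```

Sizing the pool per directory rather than globally means a slow disk only delays its own directory, which is presumably why the setting is expressed per data directory.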
loadLog creates the actual Log object for a single topic-partition directory:
private def loadLog(logDir: File, recoveryPoints: Map[TopicPartition, Long], logStartOffsets: Map[TopicPartition, Long]): Unit = {
  debug(s"Loading log '${logDir.getName}'")
  // parse the directory name to get the topic partition
  val topicPartition = Log.parseTopicPartitionName(logDir)
  // look up the configuration for this topic, falling back to the current default
  val config = topicConfigs.getOrElse(topicPartition.topic, currentDefaultConfig)
  // recovery point of this topic partition (0 if the checkpoint has no entry for it)
  val logRecoveryPoint = recoveryPoints.getOrElse(topicPartition, 0L)
  val logStartOffset = logStartOffsets.getOrElse(topicPartition, 0L)
  /* DMS Add No consumed and no delete: use newLog */
  // create the Log object; each topic-partition directory maps to exactly one Log
  val log = newLog(
    /* DMS Add */
    dir = logDir,
    config = config,
    logStartOffset = logStartOffset,
    recoveryPoint = logRecoveryPoint,
    maxProducerIdExpirationMs = maxPidExpirationMs,
    producerIdExpirationCheckIntervalMs = LogManager.ProducerIdExpirationCheckIntervalMs,
    scheduler = scheduler,
    time = time,
    brokerTopicStats = brokerTopicStats,
    logDirFailureChannel = logDirFailureChannel)
  // if the directory is marked for deletion (its topic was deleted), queue the log in logsToBeDeleted; a periodic task deletes it later
  if (logDir.getName.endsWith(Log.DeleteDirSuffix)) {
    addLogToBeDeleted(log)
  } else {
    // register the topic partition -> Log mapping; one topic partition must not map to more than one directory
    val previous = {
      if (log.isFuture)
        this.futureLogs.put(topicPartition, log)
      else
        this.currentLogs.put(topicPartition, log)
    }
    if (previous != null) {
      if (log.isFuture)
        throw new IllegalStateException("Duplicate log directories found: %s, %s!".format(log.dir.getAbsolutePath, previous.dir.getAbsolutePath))
      else
        throw new IllegalStateException(s"Duplicate log directories for $topicPartition are found in both ${log.dir.getAbsolutePath} " +
          s"and ${previous.dir.getAbsolutePath}. It is likely because log directory failure happened while broker was " +
          s"replacing current replica with future replica. Recover broker from this failure by manually deleting one of the two directories " +
          s"for this partition. It is recommended to delete the partition in the log directory that is known to have failed recently.")
    }
  }
}
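Two ideas in loadLog are worth isolating: deriving the topic partition from the directory name, and using the return value of put to detect a duplicate directory. The sketch below is a simplification under stated assumptions. Its parse function only understands plain "<topic>-<partition>" names and is not Log.parseTopicPartitionName, and TopicPartition and register are hypothetical stand-ins defined locally.

```scala
import java.util.concurrent.ConcurrentHashMap

object LoadLogSketch {
  // Local stand-in for Kafka's TopicPartition (assumption).
  final case class TopicPartition(topic: String, partition: Int)

  // Simplified parser: handles only "<topic>-<partition>", not "-delete" or "-future" directories.
  // The split is on the LAST dash because topic names may themselves contain dashes.
  def parse(dirName: String): TopicPartition = {
    val idx = dirName.lastIndexOf('-')
    require(idx > 0 && idx < dirName.length - 1, s"Unexpected directory name: $dirName")
    TopicPartition(dirName.substring(0, idx), dirName.substring(idx + 1).toInt)
  }

  // topic partition -> directory path (a real Log object would be the value in LogManager)
  private val logs = new ConcurrentHashMap[TopicPartition, String]()

  // put returns the previous value for the key, so a non-null result means
  // a second directory claims the same topic partition.
  def register(dirPath: String, dirName: String): Unit = {
    val tp = parse(dirName)
    val previous = logs.put(tp, dirPath)
    if (previous != null)
      throw new IllegalStateException(s"Duplicate log directories for $tp: $dirPath, $previous")
  }
}
```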
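Log segment management covers four kinds of operations: add, delete, update, and query.
Add
The Log class defines a method for adding a log segment object: addSegment.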
def addSegment(segment: LogSegment): LogSegment = this.segments.put(segment.baseOffset, segment)
Simple enough: it just calls the Map's put method to add the given log segment object to segments.
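Delete
Kafka has several retention policies: time-based, size-based, and log-start-offset-based. So what is a retention policy? In essence, it is a rule that decides which log segments may be deleted. The deletion task itself is covered later, together with the scheduled tasks.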
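Viewed as code, a retention policy is just a predicate over segments. The sketch below illustrates the time-based case only. Segment, deletable, and byTime are hypothetical names, and the behavior (scan from the oldest segment, stop at the first one the rule rejects, always keep the active segment) is a simplifying assumption rather than a transcript of Kafka's deletion logic.

```scala
object RetentionSketch {
  // Hypothetical, trimmed-down view of a log segment.
  final case class Segment(baseOffset: Long, largestTimestampMs: Long, sizeBytes: Long)

  // `segments` is assumed to be sorted by baseOffset. Only a prefix of the log can be deleted:
  // scanning stops at the first segment the rule rejects, and the last (active) segment is kept.
  def deletable(segments: Seq[Segment])(rule: Segment => Boolean): Seq[Segment] =
    segments.dropRight(1).takeWhile(rule)

  // Time-based rule: a segment is expired once its newest record is older than retentionMs.
  def byTime(nowMs: Long, retentionMs: Long): Segment => Boolean =
    s => nowMs - s.largestTimestampMs > retentionMs
}
```

A size-based or log-start-offset-based policy would plug a different rule into the same deletable function.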
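Update
The source code never modifies a log segment object in place; a so-called modification or update is simply a replacement. For example, segments.put(1L, newSegment) adds the segment when there is no entry with key 1 and replaces the existing segment otherwise.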
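The add-versus-replace behavior comes straight from the return value of put. Here is a minimal sketch with a hypothetical Segment case class standing in for LogSegment; the addSegment here only mirrors the shape of the method in Log.

```scala
import java.util.concurrent.ConcurrentSkipListMap

object AddOrReplaceSketch {
  // Hypothetical stand-in for LogSegment; `label` only helps tell two versions apart.
  final case class Segment(baseOffset: Long, label: String)

  val segments = new ConcurrentSkipListMap[java.lang.Long, Segment]()

  // Returns the segment previously stored under the same base offset,
  // or null if this call added a brand-new entry.
  def addSegment(segment: Segment): Segment = segments.put(segment.baseOffset, segment)
}
```

A caller can therefore tell an add from a replace by checking whether the result is null.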
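Query
Queries mostly rely on methods that ConcurrentSkipListMap already provides.
segments.firstEntry: returns the first log segment;
segments.lastEntry: returns the last log segment, i.e. the Active Segment;
segments.higherEntry: returns the first log segment whose base offset is strictly greater than the given key (ceilingEntry is the variant that also matches a base offset equal to the key);
segments.floorEntry: returns the last log segment whose base offset is ≤ the given key.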
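The boundary cases are where these methods differ, so a small sketch helps. It stores plain strings instead of LogSegment objects, and segmentFor and nextSegment are hypothetical helper names; the map calls themselves are standard JDK API.

```scala
import java.util.concurrent.ConcurrentSkipListMap

object SegmentLookupSketch {
  // Three segments with base offsets 0, 100 and 200; values are just labels.
  val segments = new ConcurrentSkipListMap[java.lang.Long, String]()
  Seq(0L, 100L, 200L).foreach(o => segments.put(o, s"segment-$o"))

  // The segment containing `offset` is the one with the greatest base offset <= offset.
  def segmentFor(offset: Long): Option[String] =
    Option(segments.floorEntry(offset)).map(_.getValue)

  // The segment after the one starting at `baseOffset` (strictly greater key), if any.
  def nextSegment(baseOffset: Long): Option[String] =
    Option(segments.higherEntry(baseOffset)).map(_.getValue)
}
```

Wrapping the results in Option guards against the null that these methods return when no matching entry exists, for example when asking for the segment after the active one.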