在一个broker上所有的log都是由LogManager管理的,LogManager提供了加载Log、创建Log、删除Log和查询Log的功能。此外,LogManager还维护了若干周期性的后台任务,分别是日志刷写(log-flusher)、日志保留(log-retention)、检查点更新(recovery-point-checkpoint)以及日志清理(Cleaner)。
LogManager各个字段的功能:
logDirs:log目录集合,通过log.dirs配置。
ioThreads:完成Log加载的相关操作,每个log目录下分配指定数量的线程执行加载。
scheduler:KafkaScheduler对象,用来执行周期任务的线程池。
logs:Pool[TopicAndPartition, Log]类型,管理TopicAndPartition和Log之间的对应关系。底层使用hashMap实现。
dirLocks:FileLock集合,对每个log目录加锁。
recoveryPointCheckpoints:Map[File,OffsetCheckpoint]类型。管理每个log目录下和它的RecoveryPointCheckpoint文件的映射关系。
LogManager中的定时任务
log-retention任务:按照两个条件进行清理,一是存活时间,二是Log的大小。
/**
 * Delete eligible log segments. Runs as the "log-retention" periodic task.
 * Only logs with cleanup.policy=delete (i.e. not compacted) are considered;
 * each one is trimmed both by age and by total size.
 */
def cleanupLogs() {
  debug("Beginning log cleanup...")
  val startMs = time.milliseconds
  // Compacted logs are handled by the log cleaner, not by retention deletion.
  val total = allLogs.filterNot(_.config.compact).map { log =>
    debug("Garbage collecting '" + log.name + "'")
    // Age-based and size-based deletion are delegated to the two helpers.
    cleanupExpiredSegments(log) + cleanupSegmentsToMaintainSize(log)
  }.sum
  debug("Log cleanup completed. " + total + " files deleted in " +
    (time.milliseconds - startMs) / 1000 + " seconds")
}
cleanupExpiredSegments根据存活的时间来删除日志:
/**
 * Delete segments whose log file has not been modified within retention.ms.
 * A negative retention.ms disables age-based deletion entirely.
 * The actual segment removal is delegated to Log.deleteOldSegments, since
 * LogManager manages Logs, not individual LogSegments.
 */
private def cleanupExpiredSegments(log: Log): Int = {
  if (log.config.retentionMs < 0) {
    0
  } else {
    val now = time.milliseconds
    // A segment is expired once its last-modified time is older than retention.ms.
    log.deleteOldSegments(segment => now - segment.lastModified > log.config.retentionMs)
  }
}
/**
 * Delete, in baseOffset order, every segment that satisfies the predicate,
 * always retaining at least one segment so the log stays appendable.
 *
 * @param predicate deletion condition evaluated against each segment
 * @return the number of segments deleted
 */
def deleteOldSegments(predicate: LogSegment => Boolean): Int = {
  lock synchronized {
    val lastEntry = segments.lastEntry
    // Walk the segment skiplist from oldest to newest and stop at the first
    // segment that either fails the predicate or is the empty active segment.
    val deletable =
      if (lastEntry == null)
        Seq.empty
      else
        logSegments.takeWhile { s =>
          predicate(s) && (s.baseOffset != lastEntry.getValue.baseOffset || s.size > 0)
        }
    val numToDelete = deletable.size
    if (numToDelete > 0) {
      // If every segment qualified, roll a fresh active segment first so the
      // log is never left without a segment.
      if (segments.size == numToDelete)
        roll()
      deletable.foreach(deleteSegment)
    }
    numToDelete
  }
}
/**
 * Remove a segment from the in-memory segment map and schedule its files
 * for asynchronous deletion from disk.
 */
private def deleteSegment(segment: LogSegment) {
  info(s"Scheduling log segment ${segment.baseOffset} for log $name for deletion.")
  lock synchronized {
    // Drop the segment from the skiplist first so readers no longer see it.
    segments.remove(segment.baseOffset)
    // The log and index files themselves are deleted later by a scheduled task.
    asyncDeleteSegment(segment)
  }
}
/**
 * Rename the segment's log and index files with the ".deleted" suffix, then
 * schedule the physical deletion to run after file.delete.delay.ms.
 */
private def asyncDeleteSegment(segment: LogSegment) {
  // The rename takes effect immediately; recovery ignores ".deleted" files.
  segment.changeFileSuffixes("", Log.DeletedFileSuffix)
  // Deferred task that actually removes the files from disk.
  def performDelete() {
    info(s"Deleting segment ${segment.baseOffset} from log $name.")
    segment.delete()
  }
  scheduler.schedule("delete-file", performDelete, delay = config.fileDeleteDelayMs)
}
cleanupSegmentsToMaintainSize根据retention.bytes配置项与当前Log的大小判断是否删除LogSegment。
/**
 * Delete the oldest segments until the log's total size is back under
 * retention.bytes. A negative retention.bytes, or a log already within the
 * limit, disables size-based deletion.
 */
private def cleanupSegmentsToMaintainSize(log: Log): Int = {
  if (log.config.retentionSize < 0 || log.size < log.config.retentionSize) {
    0
  } else {
    // Bytes that must be reclaimed to get back under retention.bytes.
    var bytesToFree = log.size - log.config.retentionSize
    // Stateful predicate: accepts segments (oldest first) while whole
    // segments still fit inside the amount left to reclaim.
    def shouldDelete(segment: LogSegment): Boolean =
      if (bytesToFree >= segment.size) {
        bytesToFree -= segment.size
        true
      } else {
        false
      }
    log.deleteOldSegments(shouldDelete)
  }
}
log-flusher任务会周期性地执行flush操作,判断Log未刷新的时长是否大于flush.ms.
/**
 * Flush any log whose unflushed interval has exceeded its flush.ms setting.
 * Runs as the "log-flusher" periodic task over every managed log.
 */
private def flushDirtyLogs() = {
  debug("Checking for dirty logs to flush...")
  logs.foreach { case (topicAndPartition, log) =>
    try {
      val timeSinceLastFlush = time.milliseconds - log.lastFlushTime
      debug("Checking if flush is needed on " + topicAndPartition.topic + " flush interval " + log.config.flushMs +
        " last flushed " + log.lastFlushTime + " time since last flush: " + timeSinceLastFlush)
      // Flush once the unflushed interval has reached the configured flush.ms.
      if (timeSinceLastFlush >= log.config.flushMs)
        log.flush
    } catch {
      // Deliberately broad: a failure on one log must not stop the others.
      case e: Throwable =>
        error("Error flushing topic " + topicAndPartition.topic, e)
    }
  }
}
每个log目录下都有一个recoveryPointCheckpoints文件,记录这个log目录下每个Log的recoveryPoint值。recoveryPointCheckpoints在Broker启动时帮助Broker进行Log的恢复工作。
recovery-point-checkpoint会周期性地调用LogManager.checkpointRecoveryPointOffsets完成recoveryPointCheckpoints文件的更新
/**
 * Write checkpoint files recording the recovery point of every log,
 * one checkpoint file per configured log directory.
 *
 * Fix: the original snippet carried a stray single-quote character after the
 * opening brace, which made the method syntactically invalid.
 */
def checkpointRecoveryPointOffsets() {
  // Checkpoint each configured log directory in turn.
  this.logDirs.foreach(checkpointLogsInDir)
}
/**
 * Write the recovery-point checkpoint file for a single log directory,
 * recording the recoveryPoint of every Log hosted in that directory.
 */
private def checkpointLogsInDir(dir: File): Unit = {
  // logsByDir maps a directory path to its TopicAndPartition -> Log entries;
  // directories with no logs are simply skipped.
  this.logsByDir.get(dir.toString).foreach { logsInDir =>
    // Persist each log's recoveryPoint via the directory's OffsetCheckpoint.
    this.recoveryPointCheckpoints(dir).write(logsInDir.mapValues(_.recoveryPoint))
  }
}
recoveryPointCheckpoints文件的更新在OffsetCheckPoint中实现的,更新方式就是把log目录下所有的recoveryPoint写到tmp文件中,然后用tmp去替换。
/**
 * Persist the given recovery-point offsets atomically.
 *
 * The checkpoint is written to a temporary file first (version header, record
 * count, then one "topic partition offset" line per entry), flushed and
 * fsync'ed, and finally moved over the real checkpoint file so that readers
 * never observe a partially written checkpoint.
 */
def write(offsets: Map[TopicAndPartition, Long]) {
lock synchronized {
// write to temp file and then swap with the existing file
val fileOutputStream = new FileOutputStream(tempPath.toFile)
val writer = new BufferedWriter(new OutputStreamWriter(fileOutputStream))
try {
// Line 1: the checkpoint file format version.
writer.write(CurrentVersion.toString)
writer.newLine()
// Line 2: the number of entries that follow.
writer.write(offsets.size.toString)
writer.newLine()
// One line per entry: topic name, partition id, and the Log's recoveryPoint.
offsets.foreach { case (topicPart, offset) =>
writer.write(s"${topicPart.topic} ${topicPart.partition} $offset")
writer.newLine()
}
// Flush the buffered writer, then force the bytes to disk before the swap.
writer.flush()
fileOutputStream.getFD().sync()
} catch {
case e: FileNotFoundException =>
// A read-only filesystem makes checkpointing impossible; halt the broker.
if (FileSystems.getDefault.isReadOnly) {
fatal("Halting writes to offset checkpoint file because the underlying file system is inaccessible : ", e)
Runtime.getRuntime.halt(1)
}
throw e
} finally {
writer.close()
}
// Atomically replace the checkpoint file with the freshly written temp file.
Utils.atomicMoveWithFallback(tempPath, path)
}
}