Kafka Storage: Segment

Introduction

A message queue in Kafka is called a Topic. Because Kafka is distributed, a Topic is made up of multiple Partitions spread across different machines. To further improve read efficiency, Kafka splits each Partition into multiple Segments. This article covers in detail how a Segment appends messages, looks them up, and recovers its index.
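On disk, each Segment is a group of files in the partition's directory, all named after the Segment's base offset. Below is a minimal sketch of the naming scheme (the 20-digit zero-padded prefix matches Kafka's convention; the helper itself is illustrative, not Kafka source):

object SegmentFileName {
  // Kafka names segment files by zero-padding the base offset to 20 digits
  def filenamePrefixFromOffset(offset: Long): String = f"$offset%020d"

  def main(args: Array[String]): Unit = {
    val baseOffset = 368769L
    println(filenamePrefixFromOffset(baseOffset) + ".log")       // data file (FileRecords)
    println(filenamePrefixFromOffset(baseOffset) + ".index")     // offset index (OffsetIndex)
    println(filenamePrefixFromOffset(baseOffset) + ".timeindex") // time index (TimeIndex)
  }
}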

Appending Messages

  • Append the messages to the data file by calling FileRecords.append
  • Record the entry in the index file by calling OffsetIndex.append
class LogSegment(val log: FileRecords,
                 val index: OffsetIndex,
                 val timeIndex: TimeIndex,
                 val txnIndex: TransactionIndex,
                 val baseOffset: Long,
                 val indexIntervalBytes: Int,
                 val rollJitterMs: Long,
                 time: Time) extends Logging {

  def append(firstOffset: Long,  largestOffset: Long, largestTimestamp: Long,
                         shallowOffsetOfMaxTimestamp: Long, records: MemoryRecords): Unit = {
    if (records.sizeInBytes > 0) {
      trace("Inserting %d bytes at offset %d at position %d with largest timestamp %d at shallow offset %d"
          .format(records.sizeInBytes, firstOffset, log.sizeInBytes(), largestTimestamp, shallowOffsetOfMaxTimestamp))
      // the current end of the FileRecords file
      val physicalPosition = log.sizeInBytes()
      if (physicalPosition == 0)
        rollingBasedTimestamp = Some(largestTimestamp)
      // ensure the offset can be encoded relative to baseOffset
      require(canConvertToRelativeOffset(largestOffset), "largest offset in message set can not be safely converted to relative offset.")
      // append the records via FileRecords
      val appendedBytes = log.append(records)
      trace(s"Appended $appendedBytes to ${log.file()} at offset $firstOffset")
      // update the largest timestamp seen so far and its corresponding offset
      if (largestTimestamp > maxTimestampSoFar) {
        maxTimestampSoFar = largestTimestamp
        offsetOfMaxTimestamp = shallowOffsetOfMaxTimestamp
      }
      // a new index entry is added only when more than indexIntervalBytes of data
      // has been appended to the FileRecords since the last entry
      if(bytesSinceLastIndexEntry > indexIntervalBytes) {
        // append an index entry
        index.append(firstOffset, physicalPosition)
        timeIndex.maybeAppend(maxTimestampSoFar, offsetOfMaxTimestamp)
        bytesSinceLastIndexEntry = 0
      }
      bytesSinceLastIndexEntry += records.sizeInBytes
    }
  }

  // the index stores each offset as its delta from baseOffset, which must fit in four bytes
  def canConvertToRelativeOffset(offset: Long): Boolean = {
    (offset - baseOffset) <= Integer.MAX_VALUE
  }

}
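The offset index is sparse: an entry is written only every indexIntervalBytes of data, and each entry stores the offset as a 4-byte delta from baseOffset plus a 4-byte physical position, which is why canConvertToRelativeOffset must hold before appending. Below is a minimal sketch of this 8-byte entry layout (an illustration of the format, not Kafka's OffsetIndex code):

import java.nio.ByteBuffer

object SparseIndexEntry {
  // each entry: 4-byte relative offset + 4-byte file position = 8 bytes
  def writeEntry(buf: ByteBuffer, baseOffset: Long, offset: Long, position: Int): Unit = {
    require(offset - baseOffset <= Int.MaxValue, "offset delta does not fit in 4 bytes")
    buf.putInt((offset - baseOffset).toInt) // offset relative to baseOffset
    buf.putInt(position)                    // byte position in the .log file
  }

  def main(args: Array[String]): Unit = {
    val buf = ByteBuffer.allocate(8)
    writeEntry(buf, baseOffset = 368769L, offset = 368800L, position = 4096)
    buf.flip()
    println(s"relativeOffset=${buf.getInt()} position=${buf.getInt()}") // relativeOffset=31 position=4096
  }
}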

Locating a Message

  • Look up the closest preceding index entry by calling OffsetIndex.lookup
  • Scan the data file for the containing RecordBatch by calling FileRecords.searchForOffsetWithSize
class LogSegment  {

  private[log] def translateOffset(offset: Long, startingFilePosition: Int = 0): LogOffsetPosition = {
    // look up the closest preceding index entry
    val mapping = index.lookup(offset)
    // scan the file from that position for the RecordBatch
    log.searchForOffsetWithSize(offset, max(mapping.position, startingFilePosition))
  }
}

class FileRecords {
    public LogOffsetPosition searchForOffsetWithSize(long targetOffset, int startingPosition) {
        // iterate over the RecordBatches sequentially, starting at startingPosition
        for (FileChannelRecordBatch batch : batchesFrom(startingPosition)) {
            long offset = batch.lastOffset();
            if (offset >= targetOffset)
                // stop at the first batch whose lastOffset is >= targetOffset;
                // return that lastOffset, the batch's starting position, and its size
                return new LogOffsetPosition(offset, batch.position(), batch.sizeInBytes());
        }
        return null;
    }
}
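index.lookup does a binary search over the sorted index entries and returns the entry with the largest offset that is less than or equal to the target; searchForOffsetWithSize then scans forward from that position, as shown above. Below is a simplified sketch of the lookup step (illustrative, not Kafka's OffsetIndex implementation):

object IndexLookup {
  final case class OffsetPosition(offset: Long, position: Int)

  // binary search for the entry with the largest offset <= target
  def lookup(entries: IndexedSeq[OffsetPosition], target: Long): OffsetPosition = {
    var lo = 0
    var hi = entries.length - 1
    var best = OffsetPosition(0L, 0) // no entry <= target: scan from the start of the file
    while (lo <= hi) {
      val mid = (lo + hi) >>> 1
      if (entries(mid).offset <= target) { best = entries(mid); lo = mid + 1 }
      else hi = mid - 1
    }
    best
  }

  def main(args: Array[String]): Unit = {
    val idx = IndexedSeq(OffsetPosition(0, 0), OffsetPosition(50, 4096), OffsetPosition(97, 8192))
    println(lookup(idx, target = 75)) // OffsetPosition(50,4096): scan the log from byte 4096
  }
}

Because the index is sparse, the lookup only narrows the search; the sequential scan in searchForOffsetWithSize finds the exact batch.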

Reading Messages

The read method provides a convenient way to fetch messages: it reads them subject to the following constraints.

  • startOffset and maxOffset bound the range of message offsets
  • the physical position in the file must not exceed maxPosition
  • the size of the returned data must not exceed maxSize
  • minOneMessage controls whether at least one complete record must be returned
  def read(startOffset: Long, maxOffset: Option[Long], maxSize: Int, maxPosition: Long = size,
           minOneMessage: Boolean = false): FetchDataInfo = {
    if (maxSize < 0)
      throw new IllegalArgumentException("Invalid max size for log read (%d)".format(maxSize))

    val logSize = log.sizeInBytes
    // find the physical position of startOffset
    val startOffsetAndSize = translateOffset(startOffset)

    // if the start position is already off the end of the log, return null
    if (startOffsetAndSize == null)
      return null

    val startPosition = startOffsetAndSize.position
    val offsetMetadata = new LogOffsetMetadata(startOffset, this.baseOffset, startPosition)

    val adjustedMaxSize =
      // to guarantee one complete record, allow at least the size of the RecordBatch at startOffset
      if (minOneMessage) math.max(maxSize, startOffsetAndSize.size)
      else maxSize

    if (adjustedMaxSize == 0)
      return FetchDataInfo(offsetMetadata, MemoryRecords.EMPTY)

    val fetchSize: Int = maxOffset match {
      case None =>
        // no maxOffset was given, so only the size constraints apply
        min((maxPosition - startPosition).toInt, adjustedMaxSize)
      case Some(offset) =>
        if (offset < startOffset)
          return FetchDataInfo(offsetMetadata, MemoryRecords.EMPTY, firstEntryIncomplete = false)
        // find the physical position corresponding to maxOffset
        val mapping = translateOffset(offset, startPosition)
        val endPosition =
          if (mapping == null)
            logSize
          else
            mapping.position
        // take the minimum over all the constraints
        min(min(maxPosition, endPosition) - startPosition, adjustedMaxSize).toInt
    }

    FetchDataInfo(offsetMetadata, log.read(startPosition, fetchSize),
      firstEntryIncomplete = adjustedMaxSize < startOffsetAndSize.size)
  }
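To see how the constraints interact, here is the fetchSize arithmetic in isolation, with made-up example positions (hypothetical values, not taken from a real log):

object FetchSizeExample {
  def main(args: Array[String]): Unit = {
    val startPosition   = 1000  // physical position of startOffset
    val endPosition     = 9000L // physical position of maxOffset
    val maxPosition     = 8000L // caller-imposed physical bound
    val adjustedMaxSize = 5000L // size bound (possibly raised by minOneMessage)
    // whichever constraint is tightest wins
    val fetchSize = math.min(math.min(maxPosition, endPosition) - startPosition, adjustedMaxSize).toInt
    println(fetchSize) // 5000: here the size limit binds before either position limit
  }
}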

Index File Recovery

When an index file is corrupted, Kafka automatically rebuilds it by rescanning the data file.

  def recover(producerStateManager: ProducerStateManager, leaderEpochCache: Option[LeaderEpochCache] = None): Int = {
    // truncate the offset index
    index.truncate()
    index.resize(index.maxIndexSize)
    // truncate the time index
    timeIndex.truncate()
    timeIndex.resize(timeIndex.maxIndexSize)
    // truncate the transaction index
    txnIndex.truncate()
    // current read position: bytes of the data file validated so far
    var validBytes = 0
    var lastIndexEntry = 0
    maxTimestampSoFar = RecordBatch.NO_TIMESTAMP
    try {
      // iterate over the RecordBatches in the data file
      for (batch <- log.batches.asScala) {
        batch.ensureValid()

        // update the largest timestamp seen so far and its corresponding offset
        if (batch.maxTimestamp > maxTimestampSoFar) {
          maxTimestampSoFar = batch.maxTimestamp
          offsetOfMaxTimestamp = batch.lastOffset
        }

        // add an index entry once more than indexIntervalBytes have accumulated since the last entry
        if(validBytes - lastIndexEntry > indexIntervalBytes) {
          val startOffset = batch.baseOffset
          index.append(startOffset, validBytes)
          timeIndex.maybeAppend(maxTimestampSoFar, offsetOfMaxTimestamp)
          lastIndexEntry = validBytes
        }
        // advance validBytes past this batch
        validBytes += batch.sizeInBytes()

        if (batch.magic >= RecordBatch.MAGIC_VALUE_V2) {
          leaderEpochCache.foreach { cache =>
            if (batch.partitionLeaderEpoch > cache.latestEpoch()) // this is to avoid unnecessary warning in cache.assign()
              cache.assign(batch.partitionLeaderEpoch, batch.baseOffset)
          }
          updateProducerState(producerStateManager, batch)
        }
      }
    } catch {
      case e: CorruptRecordException =>
        logger.warn("Found invalid messages in log segment %s at byte offset %d: %s."
          .format(log.file.getAbsolutePath, validBytes, e.getMessage))
    }

    // check whether the data file has extra, invalid bytes at the end
    val truncated = log.sizeInBytes - validBytes
    if (truncated > 0)
      logger.debug(s"Truncated $truncated invalid bytes at the end of segment ${log.file.getAbsoluteFile} during recovery")
    log.truncateTo(validBytes)
    index.trimToValidSize()
    timeIndex.maybeAppend(maxTimestampSoFar, offsetOfMaxTimestamp, skipFullCheck = true)
    timeIndex.trimToValidSize()
    truncated
  }
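The rebuild mirrors the sparse-indexing rule used in append: scan the batches in file order and emit an entry whenever more than indexIntervalBytes have accumulated since the last one. Below is a self-contained sketch of just that loop (batch sizes are made-up example values):

object RebuildIndex {
  final case class Batch(baseOffset: Long, sizeInBytes: Int)

  // returns sparse (offset, physical position) index entries
  def rebuild(batches: Seq[Batch], indexIntervalBytes: Int): Seq[(Long, Int)] = {
    var validBytes = 0
    var lastIndexEntry = 0
    val entries = Seq.newBuilder[(Long, Int)]
    for (b <- batches) {
      if (validBytes - lastIndexEntry > indexIntervalBytes) {
        entries += ((b.baseOffset, validBytes))
        lastIndexEntry = validBytes
      }
      validBytes += b.sizeInBytes
    }
    entries.result()
  }

  def main(args: Array[String]): Unit = {
    val batches = Seq(Batch(0, 3000), Batch(10, 3000), Batch(20, 3000))
    println(rebuild(batches, indexIntervalBytes = 4096)) // one entry: (20, 6000)
  }
}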

Summary

A Segment consists of the data file FileRecords together with its index files (OffsetIndex, TimeIndex, TransactionIndex). Appending new messages updates both the data and the indexes, and message lookup makes full use of the index files.

Reposted from: https://my.oschina.net/u/569730/blog/1504824
