Log Analysis

A Log is an ordered sequence of LogSegments that together form one logical log. To locate a LogSegment quickly, Log manages them with a skip list (ConcurrentSkipListMap):

private val segments: ConcurrentNavigableMap[java.lang.Long, LogSegment] =
  new ConcurrentSkipListMap[java.lang.Long, LogSegment]

In a Log, each LogSegment's baseOffset is the key and the LogSegment object is the value stored in the segments skip list. For example, to read messages with offsets greater than 4205, we first use the skip list to locate the LogSegment containing that offset, then call the segment's read method to fetch the data, as sketched below.
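A minimal, self-contained sketch of this lookup (not Kafka source; a String stands in for LogSegment, and the baseOffsets are hypothetical):

import java.util.concurrent.ConcurrentSkipListMap

object SegmentLookupSketch extends App {
  val segments = new ConcurrentSkipListMap[java.lang.Long, String]()
  segments.put(0L, "00000000000000000000.log")    // covers offsets [0, 3000)
  segments.put(3000L, "00000000000000003000.log") // covers offsets [3000, 6000)
  segments.put(6000L, "00000000000000006000.log") // covers offsets [6000, ...)

  // floorEntry returns the entry with the greatest key <= 4205, i.e. the
  // segment whose baseOffset range contains the target offset
  val entry = segments.floorEntry(4205L)
  println(s"offset 4205 is in ${entry.getValue}") // prints 00000000000000003000.log
}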

 

Messages are appended to a Log sequentially, so only the last LogSegment accepts writes; all segments before it are full and no longer writable.

 

Log.activeSegment returns that last LogSegment, i.e. the last entry in the segments skip list. Once the active segment's log file reaches a configured threshold, a new activeSegment is created and subsequent messages are written to it.
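In this version of the source, activeSegment is essentially a one-liner over the skip list:

// the last entry of the segments skip list is the only writable segment
def activeSegment = segments.lastEntry.getValue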

 

1 Core Fields

dir: the on-disk directory that holds this log's files

lock: guards appends, since multiple threads may write concurrently

segments: the skip list that manages the LogSegments

recoveryPoint: the starting offset for recovery operations; everything before it has already been flushed to disk

nextOffsetMetadata: the LogOffsetMetadata used to allocate offsets to incoming messages; its messageOffset is also the current replica's LEO (log end offset); see the sketch below
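A simplified sketch of the LogOffsetMetadata shape as the methods below use it (the real class also carries default values and helper methods omitted here):

case class LogOffsetMetadata(messageOffset: Long,           // the absolute offset; the LEO when held in nextOffsetMetadata
                             segmentBaseOffset: Long,        // baseOffset of the segment containing this offset
                             relativePositionInSegment: Int) // byte position within that segment's file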

2 Key Methods

2.1 append: Appending Messages

Appends the message set to the log's active segment, rolling to a new segment if necessary. The method is mainly responsible for assigning offsets to the messages; if assignOffsets = false, it only checks that the existing offsets are valid.

def append(messages: ByteBufferMessageSet, assignOffsets: Boolean = true): LogAppendInfo = {
  // Analyze and validate the ByteBufferMessageSet, returning a LogAppendInfo that records its
  // first offset, last offset, the producer's compression codec, the append timestamp, the
  // broker-side compression codec, the shallow (outer) message count, and the validated byte count
  val appendInfo = analyzeAndValidateMessageSet(messages)

  // return directly if there are no valid messages
  if (appendInfo.shallowCount == 0)
    return appendInfo

  // trim any invalid bytes before appending to the on-disk log
  var validMessages = trimInvalidBytes(messages, appendInfo)

  try {
    // write the messages to the log file
    lock synchronized {
      // decide whether offsets need to be assigned (the default)
      if (assignOffsets) {
        // assign offsets to the message set
        val offset = new LongRef(nextOffsetMetadata.messageOffset)
        // the first assigned offset becomes firstOffset
        appendInfo.firstOffset = offset.value
        val now = time.milliseconds
        // validate the message set further: convert formats, adjust magic values, update
        // timestamps, and assign offsets to the messages
        val validateAndOffsetAssignResult = try {
          validMessages.validateMessagesAndAssignOffsets(offset, now, appendInfo.sourceCodec, appendInfo.targetCodec,
            config.compact, config.messageFormatVersion.messageFormatVersion,
            config.messageTimestampType, config.messageTimestampDifferenceMaxMs)
        } catch {
          case e: IOException => throw new KafkaException("Error in validating messages while appending to log '%s'".format(name), e)
        }
        // the validated messages
        validMessages = validateAndOffsetAssignResult.validatedMessages
        // the maximum timestamp
        appendInfo.maxTimestamp = validateAndOffsetAssignResult.maxTimestamp
        // the offset of the message carrying that timestamp
        appendInfo.offsetOfMaxTimestamp = validateAndOffsetAssignResult.offsetOfMaxTimestamp
        // update the last offset
        appendInfo.lastOffset = offset.value - 1
        // set the log append time if the topic uses LOG_APPEND_TIME
        if (config.messageTimestampType == TimestampType.LOG_APPEND_TIME)
          appendInfo.logAppendTime = now

        // if the message sizes may have changed, re-check them against the configured maximum
        if (validateAndOffsetAssignResult.messageSizeMaybeChanged) {
          for (messageAndOffset <- validMessages.shallowIterator) {
            if (MessageSet.entrySize(messageAndOffset.message) > config.maxMessageSize) {
              // we record the original message set size instead of the trimmed size
              // to be consistent with pre-compression bytesRejectedRate recording
              BrokerTopicStats.getBrokerTopicStats(topicAndPartition.topic).bytesRejectedRate.mark(messages.sizeInBytes)
              BrokerTopicStats.getBrokerAllTopicsStats.bytesRejectedRate.mark(messages.sizeInBytes)
              throw new RecordTooLargeException("Message size is %d bytes which exceeds the maximum configured message size of %d."
                .format(MessageSet.entrySize(messageAndOffset.message), config.maxMessageSize))
            }
          }
        }

      } else {
        // throw if the offsets in appendInfo are not monotonically increasing
        if (!appendInfo.offsetsMonotonic || appendInfo.firstOffset < nextOffsetMetadata.messageOffset)
          throw new IllegalArgumentException("Out of order offsets found in " + messages)
      }

      // check whether the message set exceeds the configured segment size (segment.bytes);
      // throw if it does
      if (validMessages.sizeInBytes > config.segmentSize) {
        throw new RecordBatchTooLargeException("Message set size is %d bytes which exceeds the maximum configured segment size of %d."
          .format(validMessages.sizeInBytes, config.segmentSize))
      }

      // roll to a new segment if the current one is full; otherwise keep using the activeSegment
      val segment = maybeRoll(messagesSize = validMessages.sizeInBytes,
                              maxTimestampInMessages = appendInfo.maxTimestamp)

      // append the messages to the segment
      segment.append(firstOffset = appendInfo.firstOffset, largestTimestamp = appendInfo.maxTimestamp,
        offsetOfLargestTimestamp = appendInfo.offsetOfMaxTimestamp, messages = validMessages)

      // update the log end offset (LEO), i.e. the nextOffsetMetadata field
      updateLogEndOffset(appendInfo.lastOffset + 1)

      trace("Appended message set to log %s with first offset: %d, next offset: %d, and messages: %s"
        .format(this.name, appendInfo.firstOffset, nextOffsetMetadata.messageOffset, validMessages))

      // flush to disk if the flush condition is met
      if (unflushedMessages >= config.flushInterval)
        flush()

      appendInfo
    }
  } catch {
    case e: IOException => throw new KafkaStorageException("I/O exception in append to log '%s'".format(name), e)
  }
}
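As a hedged usage sketch (the construction is simplified and the values hypothetical; `log` is an already-built Log instance), appending a small uncompressed message set and inspecting the assigned offsets might look like:

val messages = new ByteBufferMessageSet(NoCompressionCodec, new Message("hello".getBytes))
val info = log.append(messages) // assignOffsets defaults to true
println("assigned offsets %d..%d".format(info.firstOffset, info.lastOffset))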

 

2.2 analyzeAndValidateMessageSet

 

It validates two things for each shallow message: 1) the CRC checksum, and 2) the message size.

It also computes:

1 the first offset of the message set

2 the last offset of the message set

3 the number of (shallow) messages

4 the number of valid bytes

5 whether the offsets are monotonically increasing

6 whether any compression codec is in use

private def analyzeAndValidateMessageSet(messages: ByteBufferMessageSet): LogAppendInfo = {
  var shallowMessageCount = 0 // number of shallow (outer) messages
  var validBytesCount = 0 // total bytes of messages that passed validation
  var firstOffset, lastOffset = -1L // offsets of the first and last messages
  var sourceCodec: CompressionCodec = NoCompressionCodec
  var monotonic = true
  var maxTimestamp = Message.NoTimestamp
  var offsetOfMaxTimestamp = -1L
  for(messageAndOffset <- messages.shallowIterator) {
    // record the offset of the first message; at this point offsets are still producer-assigned
    if(firstOffset < 0)
      firstOffset = messageAndOffset.offset
    // check that the offsets are monotonically increasing
    if(lastOffset >= messageAndOffset.offset)
      monotonic = false
    // update the offset of the last message
    lastOffset = messageAndOffset.offset

    val m = messageAndOffset.message

    // check the message size
    val messageSize = MessageSet.entrySize(m)
    if(messageSize > config.maxMessageSize) {
      BrokerTopicStats.getBrokerTopicStats(topicAndPartition.topic).bytesRejectedRate.mark(messages.sizeInBytes)
      BrokerTopicStats.getBrokerAllTopicsStats.bytesRejectedRate.mark(messages.sizeInBytes)
      throw new RecordTooLargeException("Message size is %d bytes which exceeds the maximum configured message size of %d."
        .format(messageSize, config.maxMessageSize))
    }

    // verify the message's CRC32 checksum
    m.ensureValid()
    if (m.timestamp > maxTimestamp) {
      maxTimestamp = m.timestamp
      offsetOfMaxTimestamp = lastOffset
    }
    shallowMessageCount += 1 // count the shallow message once it passes validation
    validBytesCount += messageSize // accumulate the validated bytes

    val messageCodec = m.compressionCodec
    if(messageCodec != NoCompressionCodec)
      sourceCodec = messageCodec // record the compression codec used by the producer
  }

  // determine the compression codec the broker will use
  val targetCodec = BrokerCompressionCodec.getTargetCompressionCodec(config.compressionType, sourceCodec)

  LogAppendInfo(firstOffset, lastOffset, maxTimestamp, offsetOfMaxTimestamp, Message.NoTimestamp, sourceCodec, targetCodec, shallowMessageCount, validBytesCount, monotonic)
}
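For reference, a simplified sketch of the LogAppendInfo built above (field names and order follow the constructor call; the exact definition in the source may differ slightly):

case class LogAppendInfo(var firstOffset: Long,          // first offset in the message set
                         var lastOffset: Long,           // last offset in the message set
                         var maxTimestamp: Long,         // maximum timestamp seen
                         var offsetOfMaxTimestamp: Long, // offset of the message with that timestamp
                         var logAppendTime: Long,        // set later if LOG_APPEND_TIME is used
                         sourceCodec: CompressionCodec,  // producer-side codec
                         targetCodec: CompressionCodec,  // broker-side codec
                         shallowCount: Int,              // number of shallow messages
                         validBytes: Int,                // validated byte count
                         offsetsMonotonic: Boolean)      // whether offsets increase monotonically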

 

2.3 validateMessagesAndAssignOffsets: Updating Offsets and Further Validation

Updates the offsets of the message set and performs further validation:

for a compacted topic, every message must have a key

if a message's magic value is 0 while messageFormatVersion is 1, or its magic value is 1 while messageFormatVersion is 0, the message is converted to the target format

If no format conversion or rewriting is required, the method performs the offset assignment in place and avoids recompression.

Returns a ValidationAndOffsetAssignResult containing the validated message set, the maximum timestamp, and the offset of the shallow message that carries it.

private[kafka] def validateMessagesAndAssignOffsets(offsetCounter: LongRef,
                                                    now: Long,
                                                    sourceCodec: CompressionCodec,
                                                    targetCodec: CompressionCodec,
                                                    compactedTopic: Boolean = false,
                                                    messageFormatVersion: Byte = Message.CurrentMagicValue,
                                                    messageTimestampType: TimestampType,
                                                    messageTimestampDiffMaxMs: Long): ValidationAndOffsetAssignResult = {
  // neither the source nor the target uses compression
  if (sourceCodec == NoCompressionCodec && targetCodec == NoCompressionCodec) {
    // check whether every message's magic value matches the specified format version
    if (!isMagicValueInAllWrapperMessages(messageFormatVersion))
      // magic values are inconsistent and must be unified, which may change the total message size;
      // a new ByteBufferMessageSet is created, and offsets are assigned while CRC32 values,
      // timestamps, etc. are validated and updated
      convertNonCompressedMessages(offsetCounter, compactedTopic, now, messageTimestampType, messageTimestampDiffMaxMs,
        messageFormatVersion)
    else
      // uncompressed messages with a uniform magic value: lengths do not change, so mainly assign
      // offsets and validate and update CRC32 values, timestamps, etc. in place
      validateNonCompressedMessagesAndAssignOffsetInPlace(offsetCounter, now, compactedTopic, messageTimestampType,
        messageTimestampDiffMaxMs)
  } else { // handle compressed messages
    // The current ByteBufferMessageSet cannot be reused when:
    // 1. the current compression codec differs from the target codec and recompression is needed
    // 2. magic is 0, so the inner messages' offsets must be rewritten as absolute offsets
    // 3. magic is greater than 0 but some inner-message fields (e.g. timestamps) must change
    // 4. the message format needs to be converted

    // whether the current ByteBufferMessageSet can be reused as-is
    var inPlaceAssignment = sourceCodec == targetCodec && messageFormatVersion > Message.MagicValue_V0

    var maxTimestamp = Message.NoTimestamp
    var offsetOfMaxTimestamp = -1L
    val expectedInnerOffset = new LongRef(0)
    val validatedMessages = new mutable.ArrayBuffer[Message]

    this.internalIterator(isShallow = false, ensureMatchingMagic = true).foreach { messageAndOffset =>
      val message = messageAndOffset.message
      validateMessageKey(message, compactedTopic) // validate the message key (required for compacted topics)

      if (message.magic > Message.MagicValue_V0 && messageFormatVersion > Message.MagicValue_V0) {
        // No in place assignment situation 3
        // validate the timestamp
        validateTimestamp(message, now, messageTimestampType, messageTimestampDiffMaxMs)
        // check whether the inner offsets are the expected sequential relative offsets
        if (messageAndOffset.offset != expectedInnerOffset.getAndIncrement())
          inPlaceAssignment = false
        if (message.timestamp > maxTimestamp) {
          maxTimestamp = message.timestamp
          offsetOfMaxTimestamp = offsetCounter.value + expectedInnerOffset.value - 1
        }
      }

      if (sourceCodec != NoCompressionCodec && message.compressionCodec != NoCompressionCodec)
        throw new InvalidMessageException("Compressed outer message should not have an inner message with a " +
          s"compression attribute set: $message")

      // no in place assignment situation 4: the format needs conversion
      if (message.magic != messageFormatVersion)
        inPlaceAssignment = false
      // collect the messages that passed the checks and conversions above
      validatedMessages += message.toFormatVersion(messageFormatVersion)
    }
    // the current ByteBufferMessageSet cannot be reused
    if (!inPlaceAssignment) {
      // Cannot do in place assignment.
      val (largestTimestampOfMessageSet, offsetOfMaxTimestampInMessageSet) = {
        if (messageFormatVersion == Message.MagicValue_V0)
          (Some(Message.NoTimestamp), -1L)
        else if (messageTimestampType == TimestampType.CREATE_TIME)
          (Some(maxTimestamp), {if (targetCodec == NoCompressionCodec) offsetOfMaxTimestamp else offsetCounter.value + validatedMessages.length - 1})
        else // Log append time
          (Some(now), {if (targetCodec == NoCompressionCodec) offsetCounter.value else offsetCounter.value + validatedMessages.length - 1})
      }
      // create a new ByteBufferMessageSet, recompressing the messages
      ValidationAndOffsetAssignResult(validatedMessages = new ByteBufferMessageSet(compressionCodec = targetCodec,
                                                                                   offsetCounter = offsetCounter,
                                                                                   wrapperMessageTimestamp = largestTimestampOfMessageSet,
                                                                                   timestampType = messageTimestampType,
                                                                                   messages = validatedMessages: _*),
                                      maxTimestamp = largestTimestampOfMessageSet.get,
                                      offsetOfMaxTimestamp = offsetOfMaxTimestampInMessageSet,
                                      messageSizeMaybeChanged = true)
    } else { // reuse the current ByteBufferMessageSet, saving a recompression pass
      // update the wrapper message's offset to the offset of the last inner compressed message
      buffer.putLong(0, offsetCounter.addAndGet(validatedMessages.size) - 1)
      // validate the messages
      validatedMessages.foreach(_.ensureValid())

      var crcUpdateNeeded = true
      val timestampOffset = MessageSet.LogOverhead + Message.TimestampOffset
      val attributeOffset = MessageSet.LogOverhead + Message.AttributesOffset
      val timestamp = buffer.getLong(timestampOffset)
      val attributes = buffer.get(attributeOffset)
      // update the wrapper message's timestamp and related fields
      buffer.putLong(timestampOffset, maxTimestamp)
      if (messageTimestampType == TimestampType.CREATE_TIME && timestamp == maxTimestamp)
        // We don't need to recompute crc if the timestamp is not updated.
        crcUpdateNeeded = false
      else if (messageTimestampType == TimestampType.LOG_APPEND_TIME) {
        // Set timestamp type and timestamp
        buffer.putLong(timestampOffset, now)
        buffer.put(attributeOffset, messageTimestampType.updateAttributes(attributes))
      }

      if (crcUpdateNeeded) {
        // need to recompute the crc value
        buffer.position(MessageSet.LogOverhead)
        val wrapperMessage = new Message(buffer.slice())
        Utils.writeUnsignedInt(buffer, MessageSet.LogOverhead + Message.CrcOffset, wrapperMessage.computeChecksum)
      }
      buffer.rewind()
      // For compressed messages
      ValidationAndOffsetAssignResult(validatedMessages = this,
                                      maxTimestamp = buffer.getLong(timestampOffset),
                                      offsetOfMaxTimestamp = buffer.getLong(0),
                                      messageSizeMaybeChanged = false)
    }
  }
}

 

2.4 maybeRoll: Create a New activeSegment If Needed

private def maybeRoll(messagesSize: Int, maxTimestampInMessages: Long): LogSegment = {
  val segment = activeSegment
  val now = time.milliseconds // the current time
  // whether the segment has reached its maximum lifetime (segment.ms minus the roll jitter)
  val reachedRollMs = segment.timeWaitedForRoll(now, maxTimestampInMessages) > config.segmentMs - segment.rollJitterMs
  // roll if the segment's size plus this message set would exceed the configured maximum segment
  // size, if the segment's maximum lifetime has expired, or if either index is full
  if (segment.size > config.segmentSize - messagesSize ||
      (segment.size > 0 && reachedRollMs) ||
      segment.index.isFull || segment.timeIndex.isFull) {
    debug(s"Rolling new log segment in $name (log_size = ${segment.size}/${config.segmentSize}}, " +
        s"index_size = ${segment.index.entries}/${segment.index.maxEntries}, " +
        s"time_index_size = ${segment.timeIndex.entries}/${segment.timeIndex.maxEntries}, " +
        s"inactive_time_ms = ${segment.timeWaitedForRoll(now, maxTimestampInMessages)}/${config.segmentMs - segment.rollJitterMs}).")
    roll()
  } else {
    segment
  }
}
2.5 roll: Create the New Segment and Make It Active

def roll(): LogSegment = {
  val start = time.nanoseconds
  lock synchronized {
    val newOffset = logEndOffset
    val logFile = logFilename(dir, newOffset)
    val indexFile = indexFilename(dir, newOffset)
    val timeIndexFile = timeIndexFilename(dir, newOffset)
    for(file <- List(logFile, indexFile, timeIndexFile); if file.exists) {
      warn("Newly rolled segment file " + file.getName + " already exists; deleting it first")
      file.delete()
    }

    segments.lastEntry() match {
      case null =>
      case entry => {
        val seg = entry.getValue
        seg.onBecomeInactiveSegment()
        seg.index.trimToValidSize()
        seg.timeIndex.trimToValidSize()
        seg.log.trim()
      }
    }
    val segment = new LogSegment(dir,
                                 startOffset = newOffset,
                                 indexIntervalBytes = config.indexInterval,
                                 maxIndexSize = config.maxIndexSize,
                                 rollJitterMs = config.randomSegmentJitter,
                                 time = time,
                                 fileAlreadyExists = false,
                                 initFileSize = initFileSize,
                                 preallocate = config.preallocate)
    val prev = addSegment(segment)
    if(prev != null)
      throw new KafkaException("Trying to roll a new log segment for topic partition %s with start offset %d while it already exists.".format(name, newOffset))
    // We need to update the segment base offset and append position data of the metadata when log rolls.
    // The next offset should not change.
    updateLogEndOffset(nextOffsetMetadata.messageOffset)
    // schedule an asynchronous flush of the old segment
    scheduler.schedule("flush-log", () => flush(newOffset), delay = 0L)

    info("Rolled new log segment for '" + name + "' in %.0f ms.".format((System.nanoTime - start) / (1000.0*1000.0)))

    segment
  }
}
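To make the size condition in maybeRoll concrete, a small arithmetic sketch with hypothetical numbers (not Kafka source):

val segmentSize   = 1073741824L // config.segmentSize (segment.bytes), here 1 GiB
val currentSize   = 1073700000L // bytes already in the active segment
val incomingBytes = 51200       // validMessages.sizeInBytes
// roll when the incoming set would not fit into the remaining space:
val shouldRoll = currentSize > segmentSize - incomingBytes // true: 1073700000 > 1073690624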

 

2.6 flush: Flushing Data to Disk

def flush(offset: Long) : Unit = {
  // everything before recoveryPoint is already on disk, so there is nothing to flush
  if (offset <= this.recoveryPoint)
    return
  debug("Flushing log '" + name + " up to offset " + offset + ", last flushed: " + lastFlushTime + " current time: " +
        time.milliseconds + " unflushed = " + unflushedMessages)
  // collect the LogSegments between recoveryPoint and offset via logSegments
  for(segment <- logSegments(this.recoveryPoint, offset))
    segment.flush() // fsync the segment's data and index files to disk
  lock synchronized {
    if(offset > this.recoveryPoint) {
      this.recoveryPoint = offset // advance the recoveryPoint
      lastflushedTime.set(time.milliseconds) // update lastflushedTime
    }
  }
}

 

2.7 read: Reading Messages from a LogSegment

Uses the segments skip list to quickly locate the LogSegment that should be read.

def read(startOffset: Long, maxLength: Int, maxOffset: Option[Long] = None, minOneMessage: Boolean = false): FetchDataInfo = {
  trace("Reading %d bytes from offset %d in log %s of length %d bytes".format(maxLength, startOffset, name, size))

  // Because we don't use a lock for reading, synchronization is a little tricky: we save the
  // volatile nextOffsetMetadata into a local variable before the lookup to avoid thread-safety issues
  val currentNextOffsetMetadata = nextOffsetMetadata
  val next = currentNextOffsetMetadata.messageOffset
  // if the start offset equals the next offset there is no data; return immediately
  if(startOffset == next)
    return FetchDataInfo(currentNextOffsetMetadata, MessageSet.Empty)
  // the entry with the greatest baseOffset less than or equal to startOffset
  var entry = segments.floorEntry(startOffset)

  // attempt to read beyond the log end offset is an error
  if(startOffset > next || entry == null)
    throw new OffsetOutOfRangeException("Request for offset %d but we only have log segments in the range %d to %d.".format(startOffset, segments.firstKey, next))

  // Read from the segment whose base offset is less than the target offset; if that segment
  // contains no messages with a greater offset, keep reading from successive segments until we
  // find messages or reach the end of the log
  while(entry != null) {
    // When fetching from the active segment there is a race condition: two fetch requests may
    // arrive after a message is appended but before nextOffsetMetadata is updated, in which case
    // the second fetch could hit an offset-out-of-range error. To avoid this we cap the read at
    // the exposed position instead of the active segment's log end
    val maxPosition = {
      if (entry == segments.lastEntry) {
        // the position we are allowed to expose
        val exposedPos = nextOffsetMetadata.relativePositionInSegment.toLong
        // Check the segment again in case a new segment has just rolled out.
        if (entry != segments.lastEntry)
          // New log segment has rolled out, we can read up to the file end.
          entry.getValue.size
        else
          exposedPos
      } else {
        entry.getValue.size
      }
    }
    // read from the segment
    val fetchInfo = entry.getValue.read(startOffset, maxOffset, maxLength, maxPosition, minOneMessage)
    // if nothing was read from this segment
    if(fetchInfo == null) {
      // move on to the entry with the next-higher key
      entry = segments.higherEntry(entry.getKey)
    } else {
      return fetchInfo
    }
  }
  FetchDataInfo(nextOffsetMetadata, MessageSet.Empty)
}
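A hedged usage sketch of read (the offset and length are hypothetical; `log` is an existing Log instance): fetch up to 64 KB starting at offset 4205 and walk the returned message set:

val fetchInfo = log.read(startOffset = 4205L, maxLength = 64 * 1024)
for (messageAndOffset <- fetchInfo.messageSet)
  println("offset=%d size=%d".format(messageAndOffset.offset, messageAndOffset.message.size))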

 

 
