Kafka Source Code Reading (2): Log Operations

The Log abstraction on the Kafka broker consists of a number of LogSegments. Each LogSegment has a base offset, which is the offset of the first message in that segment. The server rolls a new LogSegment based on time or size limits.
A Log is the storage for one replica of one partition.
The log storage on different servers is not necessarily identical: even for replicas of the same partition of the same topic, the stored start offsets may differ.
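For example, a partition's directory under the broker's log directory typically contains one set of files per segment, each named after the segment's base offset zero-padded to 20 digits (the topic name below is made up, and the exact file set varies by version):

my-topic-0/
    00000000000000000000.log        <- segment data (a FileRecords file)
    00000000000000000000.index      <- offset index
    00000000000000000000.timeindex  <- time index
    00000000000368769020.log        <- next segment, base offset 368769020
    00000000000368769020.index
    00000000000368769020.timeindex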

Background maintenance threads

Log operations mainly involve the following background tasks:

      ##LogManager.scala
      // Iterate over all Logs and clean up non-compacted logs. Deletion conditions:
      // 1. the log has exceeded its retention time; 2. the log has exceeded its retention size.
      scheduler.schedule("kafka-log-retention",
                         cleanupLogs _,
                         delay = InitialTaskDelayMs,
                         period = retentionCheckMs,
                         TimeUnit.MILLISECONDS)
      info("Starting log flusher with a default period of %d ms.".format(flushCheckMs))
      // Flush to disk any Log that has exceeded the flush interval and has unflushed updates.
      // This calls force on the Java NIO FileChannel, which writes everything in the channel that has not yet been persisted out to disk.
      scheduler.schedule("kafka-log-flusher",
                         flushDirtyLogs _,
                         delay = InitialTaskDelayMs,
                         period = flushCheckMs,
                         TimeUnit.MILLISECONDS)

      // Write the current recovery points to the log directories, so a restart does not have to re-recover all of the data.
      scheduler.schedule("kafka-recovery-point-checkpoint",
                         checkpointLogRecoveryOffsets _,
                         delay = InitialTaskDelayMs,
                         period = flushRecoveryOffsetCheckpointMs,
                         TimeUnit.MILLISECONDS)
      // Write the current log start offsets to the log directories, to avoid reading logs that have already been deleted.
      scheduler.schedule("kafka-log-start-offset-checkpoint",
                         checkpointLogStartOffsets _,
                         delay = InitialTaskDelayMs,
                         period = flushStartOffsetCheckpointMs,
                         TimeUnit.MILLISECONDS)
      // Physically delete logs that have been marked for deletion.
      scheduler.schedule("kafka-delete-logs",
                         deleteLogs _,
                         delay = InitialTaskDelayMs,
                         period = defaultConfig.fileDeleteDelayMs,
                         TimeUnit.MILLISECONDS)
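The scheduler used here (KafkaScheduler) is backed by a ScheduledThreadPoolExecutor. A minimal Java sketch of the same fixed-rate scheduling pattern, with an illustrative task and illustrative intervals rather than the broker's actual configuration:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicLogTaskSketch {
    public static void main(String[] args) {
        // Same shape as scheduler.schedule(name, fun, delay, period, unit) in LogManager:
        // a named periodic task running on a shared scheduled thread pool.
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        long initialDelayMs = 30_000;    // illustrative initial delay, cf. InitialTaskDelayMs
        long retentionCheckMs = 300_000; // illustrative period, cf. log.retention.check.interval.ms
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("kafka-log-retention: cleanupLogs() would run here"),
                initialDelayMs, retentionCheckMs, TimeUnit.MILLISECONDS);
    }
}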

A flush of the Log is carried out as flushes of the individual segments:

 //Log.scala
 def flush(offset: Long) : Unit = {
    maybeHandleIOException(s"Error while flushing log for $topicPartition in dir ${dir.getParent} with offset $offset") {
      if (offset <= this.recoveryPoint)
        return
      debug("Flushing log '" + name + " up to offset " + offset + ", last flushed: " + lastFlushTime + " current time: " +
        time.milliseconds + " unflushed = " + unflushedMessages)
      for (segment <- logSegments(this.recoveryPoint, offset))
        segment.flush()

      lock synchronized {
        checkIfMemoryMappedBufferClosed()
        if (offset > this.recoveryPoint) {
          this.recoveryPoint = offset
          lastflushedTime.set(time.milliseconds)
        }
      }
    }
  }

The actual log file operations are implemented in FileRecords.java, which wraps the common operations of the Java NIO FileChannel.
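A LogSegment flush writes out both its FileRecords and its index files, and FileRecords.flush() essentially boils down to channel.force(true). A standalone NIO sketch of that final step, using a hypothetical segment file name:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ForceSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical segment file; FileRecords keeps such a channel open for a segment's .log file.
        try (FileChannel channel = FileChannel.open(Paths.get("00000000000000000000.log"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            channel.write(ByteBuffer.wrap("record-bytes".getBytes(StandardCharsets.UTF_8)));
            // force(true) pushes both file data and metadata to disk,
            // which is what the kafka-log-flusher task ultimately relies on.
            channel.force(true);
        }
    }
}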

Log append flow

Ultimately, channel.write writes the MemoryRecords into the file on disk.
MemoryRecords is the in-memory representation of records in Kafka.

##MemoryRecords.java
public class MemoryRecords extends AbstractRecords {
    // Wraps an NIO ByteBuffer.
    private final ByteBuffer buffer;
##FileRecords.java
public class FileRecords extends AbstractRecords implements Closeable {
    // The channel used to access this file.
    private final FileChannel channel;
    // When the file is opened, a java.io.File is used to initialize this instance's channel.
    public static FileRecords open(File file,
                                   boolean mutable,
                                   boolean fileAlreadyExists,
                                   int initFileSize,
                                   boolean preallocate) throws IOException {
        FileChannel channel = openChannel(file, mutable, fileAlreadyExists, initFileSize, preallocate);
        int end = (!fileAlreadyExists && preallocate) ? 0 : Integer.MAX_VALUE;
        return new FileRecords(file, channel, 0, end, false);
    }
    // Append the in-memory records to the file.
    public int append(MemoryRecords records) throws IOException {
        // The underlying implementation is channel.write(buffer).
        int written = records.writeFullyTo(channel);
        size.getAndAdd(written);
        return written;
    }

##MemoryRecords.java
    public int writeFullyTo(GatheringByteChannel channel) throws IOException {
        buffer.mark();
        int written = 0;
        while (written < sizeInBytes())
            written += channel.write(buffer);
        buffer.reset();
        return written;
    }
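A rough usage sketch of this append path, assuming the public helpers in org.apache.kafka.common.record (SimpleRecord, MemoryRecords.withRecords, FileRecords.open; overloads may differ slightly across versions), with a made-up file name:

import java.io.File;
import java.io.IOException;

import org.apache.kafka.common.record.CompressionType;
import org.apache.kafka.common.record.FileRecords;
import org.apache.kafka.common.record.MemoryRecords;
import org.apache.kafka.common.record.SimpleRecord;

public class AppendSketch {
    public static void main(String[] args) throws IOException {
        // Build an in-memory batch; MemoryRecords is just a wrapper around a ByteBuffer.
        MemoryRecords memoryRecords = MemoryRecords.withRecords(CompressionType.NONE,
                new SimpleRecord("key".getBytes(), "value".getBytes()));
        // Open a (hypothetical) segment file and append the batch;
        // internally append() does channel.write(buffer) via writeFullyTo().
        FileRecords fileRecords = FileRecords.open(new File("00000000000000000000.log"));
        int written = fileRecords.append(memoryRecords);
        fileRecords.flush(); // channel.force(true)
        fileRecords.close();
        System.out.println("appended " + written + " bytes");
    }
}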

Log read flow

FileRecords provides two ways to read the log file:
+ Use NIO's channel.read to read the contents into an NIO ByteBuffer.
+ Use NIO's fileChannel.transferTo to zero-copy the contents directly into a SocketChannel. Note that in this step the broker does not decompress the data; it sends the compressed data to the client as-is and lets the client do the decompression.

The first approach is shown below:

##FileRecords.java
    public ByteBuffer readInto(ByteBuffer buffer, int position) throws IOException {
        Utils.readFully(channel, buffer, position + this.start);
        // Flip the buffer from write mode to read mode, then return it.
        buffer.flip();
        return buffer;
    }
##Utils.java    
    public static void readFully(FileChannel channel, ByteBuffer destinationBuffer, long position) throws IOException {
        if (position < 0) {
            throw new IllegalArgumentException("The file channel position cannot be negative, but it is " + position);
        }
        long currentPosition = position;
        int bytesRead;
        do {
            bytesRead = channel.read(destinationBuffer, currentPosition);
            currentPosition += bytesRead;
        } while (bytesRead != -1 && destinationBuffer.hasRemaining());
    }
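A rough usage sketch of the first approach: read a slice of the file back into a ByteBuffer with readInto and re-wrap it as MemoryRecords (the file name and buffer sizing are illustrative):

import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.kafka.common.record.FileRecords;
import org.apache.kafka.common.record.MemoryRecords;
import org.apache.kafka.common.record.Record;

public class ReadIntoSketch {
    public static void main(String[] args) throws IOException {
        FileRecords fileRecords = FileRecords.open(new File("00000000000000000000.log"));
        // Size the buffer for the whole file; a real reader would bound this by the fetch size.
        ByteBuffer buffer = ByteBuffer.allocate(fileRecords.sizeInBytes());
        // readInto fills the buffer via channel.read and flips it for reading.
        fileRecords.readInto(buffer, 0);
        MemoryRecords records = MemoryRecords.readableRecords(buffer);
        for (Record record : records.records())
            System.out.println("offset=" + record.offset() + ", size=" + record.sizeInBytes());
        fileRecords.close();
    }
}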

The second approach is shown below:

##FileRecords.java
    @Override
    public long writeTo(GatheringByteChannel destChannel, long offset, int length) throws IOException {
        // position and count are computed from the offset and length arguments in code elided from this excerpt.
        final long bytesTransferred;
        if (destChannel instanceof TransportLayer) {
            // Write into the socket held by the transport layer.
            TransportLayer tl = (TransportLayer) destChannel;
            bytesTransferred = tl.transferFrom(channel, position, count);
        } else {
            bytesTransferred = channel.transferTo(position, count, destChannel);
        }
        return bytesTransferred;
    }

##PlaintextTransportLayer.java
public class PlaintextTransportLayer implements TransportLayer {
    // The concrete socket channel held by this instance.
    private final SocketChannel socketChannel;
    @Override
    public long transferFrom(FileChannel fileChannel, long position, long count) throws IOException {
        // NIO method: zero-copy from the FileChannel to the SocketChannel.
        return fileChannel.transferTo(position, count, socketChannel);
    }

When the operating system supports it, this transfer does not require copying the data from kernel space to user space and then from user space back to kernel space.
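A minimal NIO sketch of the zero-copy path on its own, outside the broker (the host, port, and file name are hypothetical; in Kafka this call is made through the TransportLayer shown above):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ZeroCopySendSketch {
    public static void main(String[] args) throws IOException {
        try (FileChannel fileChannel = FileChannel.open(
                     Paths.get("00000000000000000000.log"), StandardOpenOption.READ);
             SocketChannel socketChannel = SocketChannel.open(
                     new InetSocketAddress("localhost", 9999))) {
            long position = 0;
            long count = fileChannel.size();
            // transferTo lets the OS move bytes from the page cache straight to the socket buffer,
            // avoiding the kernel-space -> user-space -> kernel-space copies described above.
            while (position < count)
                position += fileChannel.transferTo(position, count - position, socketChannel);
        }
    }
}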
