leveldb: log_write

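When leveldb updates data, it first records the change in a log file through the log writer, so the data can be recovered if the in-memory state is lost. Log data is stored in 32 KB blocks, and each record carries a checksum, a length, and a type. The Writer class appends records: AddRecord computes the space left in the current block, picks the record type, and hands each fragment to EmitPhysicalRecord, which passes it to the file's Append. Append buffers the data and, when the buffer cannot hold it, writes it out through FlushBuffer and WriteUnbuffered.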

Path: /leveldb/db/log_writer.h

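When leveldb modifies data, it writes the change to the log first and only then updates the in-memory memtable. This way the data can be recovered if the in-memory state is lost.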

Log data format

The log format is defined in the official documentation; see it for the full details:

block := record* trailer?
record :=
  checksum: uint32     // crc32c of type and data[] ; little-endian
  length: uint16       // little-endian
  type: uint8          // One of FULL, FIRST, MIDDLE, LAST
  data: uint8[length]

The same constants are defined in leveldb/db/log_format.h:

enum RecordType {
  // Zero is reserved for preallocated files
  kZeroType = 0,

  kFullType = 1,

  // For fragments
  kFirstType = 2,
  kMiddleType = 3,
  kLastType = 4
};
static const int kMaxRecordType = kLastType;

static const int kBlockSize = 32768;

// Header is checksum (4 bytes), length (2 bytes), type (1 byte).
static const int kHeaderSize = 4 + 2 + 1;

Each block is 32768 bytes (32 KB). As the figure shows, every record starts with a 7-byte header: a 4-byte checksum, a 2-byte length, and a 1-byte type:

[Figure: log record format]
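To make the header layout concrete, here is a minimal sketch that decodes the 7 header bytes back into their three fields. ParsedHeader and ParseHeader are hypothetical names introduced for illustration; they are not part of leveldb, whose own reader lives in a separate class.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors kHeaderSize from log_format.h as quoted above.
constexpr int kHeaderSize = 4 + 2 + 1;

// Hypothetical helper struct (not a leveldb type).
struct ParsedHeader {
  uint32_t stored_crc;  // checksum exactly as stored in the file
  uint16_t length;      // payload length in bytes
  uint8_t type;         // record type (kFullType, kFirstType, ...)
};

// Decode a 7-byte record header. Checksum and length are little-endian.
inline ParsedHeader ParseHeader(const unsigned char* p) {
  ParsedHeader h;
  h.stored_crc = static_cast<uint32_t>(p[0]) |
                 (static_cast<uint32_t>(p[1]) << 8) |
                 (static_cast<uint32_t>(p[2]) << 16) |
                 (static_cast<uint32_t>(p[3]) << 24);
  h.length = static_cast<uint16_t>(p[4] | (p[5] << 8));
  h.type = p[6];
  return h;
}
```

Because a block is 32768 bytes and the header takes 7 of them, a single fragment carries at most 32761 payload bytes, which is why a 2-byte length field is enough.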

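The record types are listed in the table below:

Type          Purpose
kZeroType     Reserved for preallocated files
kFullType     The record contains an entire user record
kFirstType    First fragment of a user record
kMiddleType   Interior fragments of a user record
kLastType     Last fragment of a user record

kFirstType, kMiddleType, and kLastType indicate that a user record has been split into multiple fragments.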

Definition

class Writer {
 public:
  // Create a writer that will append data to "*dest".
  // "*dest" must be initially empty.
  // "*dest" must remain live while this Writer is in use.
  explicit Writer(WritableFile* dest);

  // Create a writer that will append data to "*dest".
  // "*dest" must have initial length "dest_length".
  // "*dest" must remain live while this Writer is in use.
  Writer(WritableFile* dest, uint64_t dest_length);

  Writer(const Writer&) = delete;
  Writer& operator=(const Writer&) = delete;

  ~Writer();

  Status AddRecord(const Slice& slice);

 private:
  Status EmitPhysicalRecord(RecordType type, const char* ptr, size_t length);

  WritableFile* dest_;
  int block_offset_;  // Current offset in block
  uint32_t type_crc_[kMaxRecordType + 1];
};

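Writer has two constructors and exposes a single public method, AddRecord. As the class name suggests, its only job is to write records.

block_offset_ holds the current write offset within the current block.

type_crc_ caches the precomputed crc32c of each record type byte, so the checksum of a record can be extended from it instead of hashing the type byte every time.

The WritableFile definition below shows that dest_ is an abstract file interface offering the basic file operations (Append, Close, Flush, Sync). Writer implements record writing by calling into it.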

class LEVELDB_EXPORT WritableFile {
 public:
  WritableFile() = default;

  WritableFile(const WritableFile&) = delete;
  WritableFile& operator=(const WritableFile&) = delete;

  virtual ~WritableFile();

  virtual Status Append(const Slice& data) = 0;
  virtual Status Close() = 0;
  virtual Status Flush() = 0;
  virtual Status Sync() = 0;
};

Constructors

static void InitTypeCrc(uint32_t* type_crc) {
  for (int i = 0; i <= kMaxRecordType; i++) {
    char t = static_cast<char>(i);
    type_crc[i] = crc32c::Value(&t, 1);
  }
}

Writer::Writer(WritableFile* dest) : dest_(dest), block_offset_(0) {
  InitTypeCrc(type_crc_);
}

Writer::Writer(WritableFile* dest, uint64_t dest_length)
    : dest_(dest), block_offset_(dest_length % kBlockSize) {
  InitTypeCrc(type_crc_);
}
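The difference between the two constructors is only the starting value of block_offset_: a new file starts at 0, while an existing file of length dest_length resumes wherever its last block ended. The sketch below isolates that computation; InitialBlockOffset is a hypothetical helper name, not a leveldb function.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors kBlockSize from log_format.h as quoted above.
constexpr int kBlockSize = 32768;

// Hypothetical helper: offset inside the current block at which the next
// record would be written when appending to a file of the given length.
inline int InitialBlockOffset(uint64_t dest_length) {
  return static_cast<int>(dest_length % kBlockSize);
}
```

For example, a file of 70000 bytes already contains two full blocks (65536 bytes), so writing resumes at offset 4464 of the third block.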

Adding a record

AddRecord

Status Writer::AddRecord(const Slice& slice) {
  const char* ptr = slice.data();
  size_t left = slice.size();

  // Fragment the record if necessary and emit it.  Note that if slice
  // is empty, we still want to iterate once to emit a single
  // zero-length record
  Status s;
  bool begin = true;
  do {
    const int leftover = kBlockSize - block_offset_;
    assert(leftover >= 0);
    if (leftover < kHeaderSize) {
      // Switch to a new block
      if (leftover > 0) {
        // Fill the trailer (literal below relies on kHeaderSize being 7)
        static_assert(kHeaderSize == 7, "");
        dest_->Append(Slice("\x00\x00\x00\x00\x00\x00", leftover));
      }
      block_offset_ = 0;
    }

    // Invariant: we never leave < kHeaderSize bytes in a block.
    assert(kBlockSize - block_offset_ - kHeaderSize >= 0);

    const size_t avail = kBlockSize - block_offset_ - kHeaderSize;
    const size_t fragment_length = (left < avail) ? left : avail;

    RecordType type;
    const bool end = (left == fragment_length);
    if (begin && end) {
      type = kFullType;
    } else if (begin) {
      type = kFirstType;
    } else if (end) {
      type = kLastType;
    } else {
      type = kMiddleType;
    }

    s = EmitPhysicalRecord(type, ptr, fragment_length);
    ptr += fragment_length;
    left -= fragment_length;
    begin = false;
  } while (s.ok() && left > 0);
  return s;
}

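AddRecord works as follows:

1. Compute leftover, the space remaining in the current block.

2. If leftover is smaller than the header size kHeaderSize, pad the rest of the block with zeros, reset block_offset_ to 0, and continue in the next block.

3. Compute avail, the payload space available in the block, and fragment_length, the number of bytes that actually go into this fragment.

4. Choose the type according to whether the fragment is the first and/or the last one of the record (see the table above).

5. Call EmitPhysicalRecord to write the fragment, and loop until all data has been written or a write fails.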

[Figure: AddRecord flow]
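The steps above can be replayed without touching a file. The following sketch keeps only the bookkeeping of AddRecord and returns the list of fragments it would emit. Fragment and PlanFragments are hypothetical names for illustration; the real method writes data instead of collecting a plan.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Constants and enum mirror log_format.h as quoted above.
constexpr int kBlockSize = 32768;
constexpr int kHeaderSize = 7;
enum RecordType {
  kZeroType = 0,
  kFullType = 1,
  kFirstType = 2,
  kMiddleType = 3,
  kLastType = 4
};

// Hypothetical description of one emitted fragment.
struct Fragment {
  RecordType type;
  size_t length;  // payload bytes, header excluded
};

// Plan the fragments for a record of `size` bytes, starting at
// `block_offset`, and advance `block_offset` the way Writer would.
inline std::vector<Fragment> PlanFragments(size_t size, int& block_offset) {
  std::vector<Fragment> out;
  size_t left = size;
  bool begin = true;
  do {
    const int leftover = kBlockSize - block_offset;
    if (leftover < kHeaderSize) {
      // The real writer pads the trailer with zeros here.
      block_offset = 0;
    }
    const size_t avail = kBlockSize - block_offset - kHeaderSize;
    const size_t fragment_length = (left < avail) ? left : avail;
    const bool end = (left == fragment_length);
    RecordType type;
    if (begin && end) {
      type = kFullType;
    } else if (begin) {
      type = kFirstType;
    } else if (end) {
      type = kLastType;
    } else {
      type = kMiddleType;
    }
    out.push_back({type, fragment_length});
    block_offset += kHeaderSize + static_cast<int>(fragment_length);
    left -= fragment_length;
    begin = false;
  } while (left > 0);
  return out;
}
```

Starting from an empty file, a 70000-byte record becomes a kFirstType fragment of 32761 bytes, a kMiddleType fragment of 32761 bytes, and a kLastType fragment of 4478 bytes. An empty record still produces one zero-length kFullType fragment, as the comment in AddRecord notes.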

EmitPhysicalRecord

The flow is simple: build the header, compute the CRC, then append the header and the payload to the file and flush.

Status Writer::EmitPhysicalRecord(RecordType t, const char* ptr,
                                  size_t length) {
  assert(length <= 0xffff);  // Must fit in two bytes
  assert(block_offset_ + kHeaderSize + length <= kBlockSize);

  // Format the header
  char buf[kHeaderSize];
  buf[4] = static_cast<char>(length & 0xff);
  buf[5] = static_cast<char>(length >> 8);
  buf[6] = static_cast<char>(t);

  // Compute the crc of the record type and the payload.
  uint32_t crc = crc32c::Extend(type_crc_[t], ptr, length);
  crc = crc32c::Mask(crc);  // Adjust for storage
  EncodeFixed32(buf, crc);

  // Write the header and the payload
  Status s = dest_->Append(Slice(buf, kHeaderSize));
  if (s.ok()) {
    s = dest_->Append(Slice(ptr, length));
    if (s.ok()) {
      s = dest_->Flush();
    }
  }
  block_offset_ += kHeaderSize + length;
  return s;
}
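The header construction can be tried in isolation. The sketch below is a stand-in under stated assumptions: Crc32cExtend is a slow bitwise CRC32C written for this example (leveldb uses its own optimized crc32c::Extend), and the Mask and Unmask formulas with the constant kMaskDelta are reproduced from memory of util/crc32c.h, so verify them against the source before relying on them. BuildHeader is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int kHeaderSize = 7;
// Assumption: constant as recalled from leveldb's util/crc32c.h.
constexpr uint32_t kMaskDelta = 0xa282ead8u;

// Bitwise CRC32C (Castagnoli, reflected polynomial 0x82f63b78).
// Extends the CRC `init_crc` of some prefix with `n` more bytes.
inline uint32_t Crc32cExtend(uint32_t init_crc, const char* data, size_t n) {
  uint32_t crc = init_crc ^ 0xffffffffu;
  for (size_t i = 0; i < n; i++) {
    crc ^= static_cast<unsigned char>(data[i]);
    for (int k = 0; k < 8; k++) {
      crc = (crc >> 1) ^ (0x82f63b78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xffffffffu;
}

// Rotate and offset the CRC before storing it (assumed formula).
inline uint32_t Mask(uint32_t crc) {
  return ((crc >> 15) | (crc << 17)) + kMaskDelta;
}

// Inverse of Mask.
inline uint32_t Unmask(uint32_t masked) {
  const uint32_t rot = masked - kMaskDelta;
  return (rot >> 17) | (rot << 15);
}

// Fill `buf` (kHeaderSize bytes) for a fragment of the given type.
inline void BuildHeader(unsigned char type, const char* payload,
                        size_t length, unsigned char* buf) {
  const char t = static_cast<char>(type);
  // CRC covers the type byte followed by the payload.
  uint32_t crc = Crc32cExtend(Crc32cExtend(0, &t, 1), payload, length);
  crc = Mask(crc);
  buf[0] = static_cast<unsigned char>(crc & 0xff);
  buf[1] = static_cast<unsigned char>((crc >> 8) & 0xff);
  buf[2] = static_cast<unsigned char>((crc >> 16) & 0xff);
  buf[3] = static_cast<unsigned char>((crc >> 24) & 0xff);
  buf[4] = static_cast<unsigned char>(length & 0xff);
  buf[5] = static_cast<unsigned char>(length >> 8);
  buf[6] = type;
}
```

Extending from the CRC of the type byte gives the same result as hashing the type byte and payload together, which is what makes the type_crc_ table in Writer a pure optimization.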

Append

Append, which writes the data to the file, is implemented in /util/env_posix.cc:

Status Append(const Slice& data) override {
    size_t write_size = data.size();
    const char* write_data = data.data();

    // Fit as much as possible into buffer.
    size_t copy_size = std::min(write_size, kWritableFileBufferSize - pos_);
    std::memcpy(buf_ + pos_, write_data, copy_size);
    write_data += copy_size;
    write_size -= copy_size;
    pos_ += copy_size;
    if (write_size == 0) {
        return Status::OK();
    }

    // Can't fit in buffer, so need to do at least one write.
    Status status = FlushBuffer();
    if (!status.ok()) {
        return status;
    }

    // Small writes go to buffer, large writes are written directly.
    if (write_size < kWritableFileBufferSize) {
        std::memcpy(buf_, write_data, write_size);
        pos_ = write_size;
        return Status::OK();
    }
    return WriteUnbuffered(write_data, write_size);
}

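Append copies the data into the buffer buf_. When the data does not fit, it calls FlushBuffer to write the buffered bytes to the file; whatever remains goes into the now-empty buffer if it is smaller than the buffer, and is otherwise written directly with WriteUnbuffered.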
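The three paths through Append are easier to see with a tiny buffer. The sketch below is a toy model, not leveldb code: TinyBufferedSink is a hypothetical class, the 8-byte kTinyBufferSize is chosen only to make the behaviour visible, and the "file" is an in-memory string instead of a file descriptor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Deliberately tiny buffer size for the demonstration (assumption).
constexpr size_t kTinyBufferSize = 8;

// Hypothetical in-memory model of the buffering logic in Append.
class TinyBufferedSink {
 public:
  void Append(const char* data, size_t size) {
    // Fit as much as possible into the buffer.
    const size_t copy_size = std::min(size, kTinyBufferSize - pos_);
    std::memcpy(buf_ + pos_, data, copy_size);
    data += copy_size;
    size -= copy_size;
    pos_ += copy_size;
    if (size == 0) return;

    // Buffer is full: write it out, then handle the remainder.
    FlushBuffer();
    if (size < kTinyBufferSize) {
      std::memcpy(buf_, data, size);  // small remainder is buffered
      pos_ = size;
      return;
    }
    WriteUnbuffered(data, size);  // large remainder bypasses the buffer
  }

  void FlushBuffer() {
    WriteUnbuffered(buf_, pos_);
    pos_ = 0;
  }

  const std::string& written() const { return written_; }
  size_t buffered() const { return pos_; }

 private:
  // Stands in for the ::write loop; appends to an in-memory string.
  void WriteUnbuffered(const char* data, size_t size) {
    written_.append(data, size);
  }

  char buf_[kTinyBufferSize];
  size_t pos_ = 0;
  std::string written_;
};
```

Appending "abcde" leaves everything in the buffer. A following "fghij" fills the buffer, flushes "abcdefgh", and keeps "ij" buffered. A single 20-byte append to a fresh sink flushes the first 8 bytes and writes the other 12 directly.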

FlushBuffer and WriteUnbuffered

FlushBuffer and WriteUnbuffered are implemented as follows:

Status FlushBuffer() {
    Status status = WriteUnbuffered(buf_, pos_);
    pos_ = 0;
    return status;
}

Status WriteUnbuffered(const char* data, size_t size) {
    while (size > 0) {
        ssize_t write_result = ::write(fd_, data, size);
        if (write_result < 0) {
            if (errno == EINTR) {
                continue;  // Retry
            }
            return PosixError(filename_, errno);
        }
        data += write_result;
        size -= write_result;
    }
    return Status::OK();
}