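leveldb: log_writer
Path: leveldb/db/log_writer.h
Before modifying data, leveldb first appends the change to a log file and only then updates the in-memory memtable. If the in-memory data is lost (for example, after a crash), it can be rebuilt from the log.
Log data format
The log format is defined in the official leveldb documentation: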
block := record* trailer?
record :=
  checksum: uint32     // crc32c of type and data[] ; little-endian
  length: uint16       // little-endian
  type: uint8          // One of FULL, FIRST, MIDDLE, LAST
  data: uint8[length]
The record types and sizes are also defined in leveldb/db/log_format.h:
enum RecordType {
  // Zero is reserved for preallocated files
  kZeroType = 0,

  kFullType = 1,

  // For fragments
  kFirstType = 2,
  kMiddleType = 3,
  kLastType = 4
};
static const int kMaxRecordType = kLastType;

static const int kBlockSize = 32768;

// Header is checksum (4 bytes), length (2 bytes), type (1 byte).
static const int kHeaderSize = 4 + 2 + 1;
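Each block is 32768 bytes (32 KB). Every record in a block starts with a 7-byte header (a 4-byte checksum, a 2-byte length, and a 1-byte type), followed by the data.
[Figure: log record layout, header followed by data]
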
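To make the header layout concrete, here is a minimal sketch of decoding the 7 header bytes. `RecordHeader` and `DecodeHeader` are hypothetical names introduced for illustration and are not part of leveldb. Note that the checksum stored on disk is the masked crc, as the `crc32c::Mask` call in EmitPhysicalRecord shows.

```cpp
#include <cstddef>
#include <cstdint>

// Constants copied from leveldb/db/log_format.h.
static const int kBlockSize = 32768;
static const int kHeaderSize = 4 + 2 + 1;

// Hypothetical helper type for this sketch; not part of leveldb.
struct RecordHeader {
  uint32_t masked_crc;  // bytes 0..3, little-endian
  uint16_t length;      // bytes 4..5, little-endian
  uint8_t type;         // byte 6
};

// Decodes the header at `p`. The caller must guarantee that at least
// kHeaderSize bytes are readable.
inline RecordHeader DecodeHeader(const unsigned char* p) {
  RecordHeader h;
  h.masked_crc = static_cast<uint32_t>(p[0]) |
                 (static_cast<uint32_t>(p[1]) << 8) |
                 (static_cast<uint32_t>(p[2]) << 16) |
                 (static_cast<uint32_t>(p[3]) << 24);
  h.length = static_cast<uint16_t>(p[4] | (p[5] << 8));
  h.type = p[6];
  return h;
}
```

Because the length field is only two bytes, a single fragment can carry at most 0xffff bytes, which is more than the 32761 payload bytes that fit in one block.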
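The record types are listed in the table:
| Type | Meaning |
|---|---|
| kZeroType | Reserved for preallocated files (zero-filled regions) |
| kFullType | The record contains an entire user record |
| kFirstType | The first fragment of a user record |
| kMiddleType | An interior fragment of a user record |
| kLastType | The last fragment of a user record |
kFirstType, kMiddleType, and kLastType indicate that a user record has been split into multiple fragments.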
Definition
class Writer {
 public:
  // Create a writer that will append data to "*dest".
  // "*dest" must be initially empty.
  // "*dest" must remain live while this Writer is in use.
  explicit Writer(WritableFile* dest);

  // Create a writer that will append data to "*dest".
  // "*dest" must have initial length "dest_length".
  // "*dest" must remain live while this Writer is in use.
  Writer(WritableFile* dest, uint64_t dest_length);

  Writer(const Writer&) = delete;
  Writer& operator=(const Writer&) = delete;

  ~Writer();

  Status AddRecord(const Slice& slice);

 private:
  Status EmitPhysicalRecord(RecordType type, const char* ptr, size_t length);

  WritableFile* dest_;
  int block_offset_;  // Current offset in block
  uint32_t type_crc_[kMaxRecordType + 1];
};
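Writer has two constructors and exposes a single public operation, AddRecord. As the class name suggests, its only job is to write records.
block_offset_ tracks the current write offset within the current block.
type_crc_ caches the precomputed crc32c of each record type byte, so the per-record checksum only needs to be extended over the payload.
As the definition of WritableFile below shows, dest is an abstract file interface that provides the basic file operations (Append, Close, Flush, Sync); Writer writes records by calling it.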
class LEVELDB_EXPORT WritableFile {
 public:
  WritableFile() = default;

  WritableFile(const WritableFile&) = delete;
  WritableFile& operator=(const WritableFile&) = delete;

  virtual ~WritableFile();

  virtual Status Append(const Slice& data) = 0;
  virtual Status Close() = 0;
  virtual Status Flush() = 0;
  virtual Status Sync() = 0;
};
Constructors
static void InitTypeCrc(uint32_t* type_crc) {
  for (int i = 0; i <= kMaxRecordType; i++) {
    char t = static_cast<char>(i);
    type_crc[i] = crc32c::Value(&t, 1);
  }
}

Writer::Writer(WritableFile* dest) : dest_(dest), block_offset_(0) {
  InitTypeCrc(type_crc_);
}

Writer::Writer(WritableFile* dest, uint64_t dest_length)
    : dest_(dest), block_offset_(dest_length % kBlockSize) {
  InitTypeCrc(type_crc_);
}
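The second constructor resumes writing into an existing file, so the in-block offset is simply the file length modulo the block size. A small sketch of that arithmetic, where `InitialBlockOffset` is a hypothetical free function that mirrors the initializer above:

```cpp
#include <cstdint>

static const int kBlockSize = 32768;  // from leveldb/db/log_format.h

// Hypothetical helper mirroring block_offset_(dest_length % kBlockSize).
inline int InitialBlockOffset(uint64_t dest_length) {
  return static_cast<int>(dest_length % kBlockSize);
}
```

For example, a file of 40000 bytes already contains one full block, so writing resumes at offset 40000 - 32768 = 7232 inside the second block.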
Adding records
AddRecord
Status Writer::AddRecord(const Slice& slice) {
  const char* ptr = slice.data();
  size_t left = slice.size();

  // Fragment the record if necessary and emit it.  Note that if slice
  // is empty, we still want to iterate once to emit a single
  // zero-length record
  Status s;
  bool begin = true;
  do {
    const int leftover = kBlockSize - block_offset_;
    assert(leftover >= 0);
    if (leftover < kHeaderSize) {
      // Switch to a new block
      if (leftover > 0) {
        // Fill the trailer (literal below relies on kHeaderSize being 7)
        static_assert(kHeaderSize == 7, "");
        dest_->Append(Slice("\x00\x00\x00\x00\x00\x00", leftover));
      }
      block_offset_ = 0;
    }

    // Invariant: we never leave < kHeaderSize bytes in a block.
    assert(kBlockSize - block_offset_ - kHeaderSize >= 0);

    const size_t avail = kBlockSize - block_offset_ - kHeaderSize;
    const size_t fragment_length = (left < avail) ? left : avail;

    RecordType type;
    const bool end = (left == fragment_length);
    if (begin && end) {
      type = kFullType;
    } else if (begin) {
      type = kFirstType;
    } else if (end) {
      type = kLastType;
    } else {
      type = kMiddleType;
    }

    s = EmitPhysicalRecord(type, ptr, fragment_length);
    ptr += fragment_length;
    left -= fragment_length;
    begin = false;
  } while (s.ok() && left > 0);
  return s;
}
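AddRecord works as follows:
1. Compute leftover, the space remaining in the current block.
2. If leftover is smaller than the header size kHeaderSize, pad the rest of the block with zeros, reset block_offset_ to 0, and continue in the next block.
3. Compute avail, the payload space available in the block, and fragment_length, the number of bytes that will actually be written in this fragment.
4. Choose type according to the table above: kFullType if the fragment is both the first and the last, otherwise kFirstType, kLastType, or kMiddleType.
5. Call EmitPhysicalRecord to write the fragment as a physical record, and repeat until all the data has been written or a write fails.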
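The steps above can be sketched as a small simulation that performs no I/O and only reports which fragments AddRecord would emit. `PlanFragments` is a hypothetical helper written for this post; the loop body follows the AddRecord source, with the actual write replaced by bookkeeping.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Copied from leveldb/db/log_format.h.
enum RecordType {
  kZeroType = 0,
  kFullType = 1,
  kFirstType = 2,
  kMiddleType = 3,
  kLastType = 4
};
static const int kBlockSize = 32768;
static const int kHeaderSize = 4 + 2 + 1;

// Hypothetical helper: returns the (type, payload length) of every fragment
// that would be emitted for a record of `left` bytes, starting at
// `*block_offset` in the current block. `*block_offset` is advanced the same
// way Writer::block_offset_ is.
inline std::vector<std::pair<RecordType, size_t>> PlanFragments(
    size_t left, int* block_offset) {
  std::vector<std::pair<RecordType, size_t>> fragments;
  bool begin = true;
  do {
    const int leftover = kBlockSize - *block_offset;
    if (leftover < kHeaderSize) {
      // The real writer pads the trailer with zeros before switching blocks.
      *block_offset = 0;
    }
    const size_t avail = kBlockSize - *block_offset - kHeaderSize;
    const size_t fragment_length = (left < avail) ? left : avail;
    const bool end = (left == fragment_length);
    RecordType type;
    if (begin && end) {
      type = kFullType;
    } else if (begin) {
      type = kFirstType;
    } else if (end) {
      type = kLastType;
    } else {
      type = kMiddleType;
    }
    fragments.emplace_back(type, fragment_length);
    *block_offset += kHeaderSize + static_cast<int>(fragment_length);
    left -= fragment_length;
    begin = false;
  } while (left > 0);
  return fragments;
}
```

Starting from an empty block, a 70000-byte record is split into a kFirstType fragment of 32761 bytes, a kMiddleType fragment of 32761 bytes, and a kLastType fragment of 4478 bytes, because each block holds 32768 - 7 = 32761 payload bytes.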

EmitPhysicalRecord
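The flow is simple: build the header, compute the crc over the record type and the payload, then append the header and the payload to the file and flush.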
Status Writer::EmitPhysicalRecord(RecordType t, const char* ptr,
                                  size_t length) {
  assert(length <= 0xffff);  // Must fit in two bytes
  assert(block_offset_ + kHeaderSize + length <= kBlockSize);

  // Format the header
  char buf[kHeaderSize];
  buf[4] = static_cast<char>(length & 0xff);
  buf[5] = static_cast<char>(length >> 8);
  buf[6] = static_cast<char>(t);

  // Compute the crc of the record type and the payload.
  uint32_t crc = crc32c::Extend(type_crc_[t], ptr, length);
  crc = crc32c::Mask(crc);  // Adjust for storage
  EncodeFixed32(buf, crc);

  // Write the header and the payload
  Status s = dest_->Append(Slice(buf, kHeaderSize));
  if (s.ok()) {
    s = dest_->Append(Slice(ptr, length));
    if (s.ok()) {
      s = dest_->Flush();
    }
  }
  block_offset_ += kHeaderSize + length;
  return s;
}
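To see what ends up in the file, the following sketch builds one physical record in memory. It is an illustration under assumptions, not leveldb's code: the names `Crc32cExtend`, `Crc32cValue`, `MaskCrc`, and `BuildPhysicalRecord` are hypothetical, and the bitwise CRC-32C loop stands in for leveldb's much faster `crc32c` implementation. The masking formula (rotate right by 15 bits, then add 0xa282ead8) is the one used by `crc32c::Mask`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
// Continues a checksum from `init_crc`, like crc32c::Extend.
inline uint32_t Crc32cExtend(uint32_t init_crc, const char* data, size_t n) {
  uint32_t crc = init_crc ^ 0xFFFFFFFFu;
  for (size_t i = 0; i < n; i++) {
    crc ^= static_cast<unsigned char>(data[i]);
    for (int k = 0; k < 8; k++) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

inline uint32_t Crc32cValue(const char* data, size_t n) {
  return Crc32cExtend(0, data, n);
}

// Same transform as crc32c::Mask: rotate right by 15 bits, add a constant.
inline uint32_t MaskCrc(uint32_t crc) {
  return ((crc >> 15) | (crc << 17)) + 0xa282ead8u;
}

// Hypothetical helper: returns header + payload for one physical record.
inline std::string BuildPhysicalRecord(uint8_t type,
                                       const std::string& payload) {
  assert(payload.size() <= 0xffff);  // length must fit in two bytes
  // The crc covers the type byte followed by the payload.
  const char t = static_cast<char>(type);
  uint32_t crc =
      Crc32cExtend(Crc32cValue(&t, 1), payload.data(), payload.size());
  crc = MaskCrc(crc);

  std::string out;
  for (int shift = 0; shift < 32; shift += 8) {  // little-endian crc
    out.push_back(static_cast<char>((crc >> shift) & 0xff));
  }
  out.push_back(static_cast<char>(payload.size() & 0xff));
  out.push_back(static_cast<char>(payload.size() >> 8));
  out.push_back(t);
  out += payload;
  return out;
}
```

Extending the cached crc of the type byte over the payload gives the same result as checksumming the type byte and payload in one pass, which is why Writer can precompute type_crc_ once in its constructor.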
Append
Append, which performs the actual write, is implemented in /util/env_posix.cc:
Status Append(const Slice& data) override {
  size_t write_size = data.size();
  const char* write_data = data.data();

  // Fit as much as possible into buffer.
  size_t copy_size = std::min(write_size, kWritableFileBufferSize - pos_);
  std::memcpy(buf_ + pos_, write_data, copy_size);
  write_data += copy_size;
  write_size -= copy_size;
  pos_ += copy_size;
  if (write_size == 0) {
    return Status::OK();
  }

  // Can't fit in buffer, so need to do at least one write.
  Status status = FlushBuffer();
  if (!status.ok()) {
    return status;
  }

  // Small writes go to buffer, large writes are written directly.
  if (write_size < kWritableFileBufferSize) {
    std::memcpy(buf_, write_data, write_size);
    pos_ = write_size;
    return Status::OK();
  }
  return WriteUnbuffered(write_data, write_size);
}
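Append copies the data into the buffer buf_. When the data does not fit, it calls FlushBuffer to write the buffered bytes to the file; the remainder is buffered if it is smaller than kWritableFileBufferSize, and written directly with WriteUnbuffered otherwise.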
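The buffering policy is easier to follow with a tiny buffer. The sketch below mirrors the Append logic against an in-memory sink; `BufferedSink` is a hypothetical class for illustration, the `std::string` sink stands in for the file descriptor, and the capacity is a constructor argument so it can be set to something small like 8 bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the buffered file in env_posix.cc.
class BufferedSink {
 public:
  explicit BufferedSink(size_t capacity) : buf_(capacity) {}

  void Append(const char* data, size_t size) {
    // Fit as much as possible into the buffer.
    const size_t copy_size = std::min(size, buf_.size() - pos_);
    std::copy(data, data + copy_size, buf_.begin() + pos_);
    data += copy_size;
    size -= copy_size;
    pos_ += copy_size;
    if (size == 0) return;

    // The buffer is full: write it out before handling the remainder.
    FlushBuffer();

    // Small remainders are buffered, large ones bypass the buffer.
    if (size < buf_.size()) {
      std::copy(data, data + size, buf_.begin());
      pos_ = size;
      return;
    }
    WriteUnbuffered(data, size);
  }

  void FlushBuffer() {
    WriteUnbuffered(buf_.data(), pos_);
    pos_ = 0;
  }

  const std::string& sink() const { return sink_; }
  size_t buffered() const { return pos_; }
  int write_calls() const { return write_calls_; }

 private:
  void WriteUnbuffered(const char* data, size_t size) {
    if (size == 0) return;  // nothing to write
    sink_.append(data, size);
    ++write_calls_;
  }

  std::vector<char> buf_;
  size_t pos_ = 0;
  std::string sink_;     // plays the role of the file
  int write_calls_ = 0;  // counts simulated write calls
};
```

With a capacity of 8, appending 3 bytes only fills the buffer, appending 7 more triggers one flush of 8 bytes and leaves 2 buffered, and a following 16-byte append flushes the refilled buffer and writes the remaining 10 bytes directly.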
FlushBuffer and WriteUnbuffered
FlushBuffer and WriteUnbuffered are implemented as follows:
Status FlushBuffer() {
  Status status = WriteUnbuffered(buf_, pos_);
  pos_ = 0;
  return status;
}

Status WriteUnbuffered(const char* data, size_t size) {
  while (size > 0) {
    ssize_t write_result = ::write(fd_, data, size);
    if (write_result < 0) {
      if (errno == EINTR) {
        continue;  // Retry
      }
      return PosixError(filename_, errno);
    }
    data += write_result;
    size -= write_result;
  }
  return Status::OK();
}
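WriteUnbuffered loops because write() may accept fewer bytes than requested, and may fail with EINTR when interrupted by a signal. The sketch below reproduces that loop against a fake writer so the behavior can be observed without a real file. `WriteAll` and `FlakyWriter` are hypothetical names; `FlakyWriter` is an assumption made for the demonstration, accepting at most 3 bytes per call and failing once with EINTR.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>

// Writes all `size` bytes through `write_fn`, which must behave like
// ::write: return the number of bytes accepted, or -1 with errno set.
// Returns true on success, false on any error other than EINTR.
template <typename WriteFn>
bool WriteAll(WriteFn& write_fn, const char* data, size_t size) {
  while (size > 0) {
    const long n = write_fn(data, size);
    if (n < 0) {
      if (errno == EINTR) {
        continue;  // interrupted before writing anything: retry
      }
      return false;
    }
    data += n;  // short write: advance and try again
    size -= static_cast<size_t>(n);
  }
  return true;
}

// Fake writer for the demonstration (not a real file).
struct FlakyWriter {
  std::string out;
  int calls = 0;

  long operator()(const char* p, size_t n) {
    ++calls;
    if (calls == 2) {  // simulate one interrupted call
      errno = EINTR;
      return -1;
    }
    const size_t k = n < 3 ? n : 3;  // accept at most 3 bytes per call
    out.append(p, k);
    return static_cast<long>(k);
  }
};
```

Writing 10 bytes through `FlakyWriter` takes five calls: three that accept 3 bytes each, one interrupted call that is retried, and a final call for the last byte.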
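In summary: when leveldb updates data, it first records the change in the log file through log_writer, so that in-memory data can be recovered if it is lost. The log is stored in 32 KB blocks, and each record carries a checksum, a length, and a type. The Writer class adds records: AddRecord computes the space left in the current block, chooses the record type, and writes each fragment through EmitPhysicalRecord and Append. When the buffer is full, FlushBuffer and WriteUnbuffered write the buffered data out to the file.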




