leveldb 代码阅读一

lihaihui1991

已于 2024-08-16 17:30:12 修改

阅读量224

点赞数 4

分类专栏： leveldb 文章标签： db

于 2024-08-16 11:26:02 首次发布

本文链接：https://blog.csdn.net/lihaihui1991/article/details/141255125

版权

leveldb 专栏收录该内容

10 篇文章 0 订阅

订阅专栏

1 coding.h 数据编码

1.1 uint 编码示例

函数示例

编码函数

 inline void EncodeFixed32(char* dst, uint32_t value)
 inline void EncodeFixed64(char* dst, uint64_t value)

解码函数

inline uint32_t DecodeFixed32(const char* ptr)
inline uint64_t DecodeFixed64(const char* ptr)

函数实现

编码函数

inline void EncodeFixed32(char* dst, uint32_t value) {
  uint8_t* const buffer = reinterpret_cast<uint8_t*>(dst);

  if (port::kLittleEndian) {
    // Fast path for little-endian CPUs. All major compilers optimize this to a
    // single mov (x86_64) / str (ARM) instruction.
    std::memcpy(buffer, &value, sizeof(uint32_t));
    return;
  }

  // Platform-independent code.
  // Currently, only gcc optimizes this to a single mov / str instruction.
  buffer[0] = static_cast<uint8_t>(value);
  buffer[1] = static_cast<uint8_t>(value >> 8);
  buffer[2] = static_cast<uint8_t>(value >> 16);
  buffer[3] = static_cast<uint8_t>(value >> 24);
}

uint64 同理

解码函数

inline uint32_t DecodeFixed32(const char* ptr) {
  const uint8_t* const buffer = reinterpret_cast<const uint8_t*>(ptr);

  if (port::kLittleEndian) {
    // Fast path for little-endian CPUs. All major compilers optimize this to a
    // single mov (x86_64) / ldr (ARM) instruction.
    uint32_t result;
    std::memcpy(&result, buffer, sizeof(uint32_t));
    return result;
  }

  // Platform-independent code.
  // Clang and gcc optimize this to a single mov / ldr instruction.
  return (static_cast<uint32_t>(buffer[0])) |
         (static_cast<uint32_t>(buffer[1]) << 8) |
         (static_cast<uint32_t>(buffer[2]) << 16) |
         (static_cast<uint32_t>(buffer[3]) << 24);
}

函数解读
- 区分大小端
- 在大端时 buffer[0…3]低位存放低字节到高字节

1.2 Varint 编码

Varint 编码是一种变长编码，用来减少数据的存储空间。在varint的每个字节中，如果最高位bit为1，表示后续的字节也是该数字的一部分，如果该位是0，则结束。其它的7个比特都用来表示数据。因此小于128的数字都可以用一个字节来表示。

函数示例

编码函数

char* EncodeVarint32(char* dst, uint32_t v) 
char* EncodeVarint64(char* dst, uint64_t v)

解码函数

inline const char* GetVarint32Ptr(const char* p, const char* limit,  uint32_t* value) 
const char* GetVarint64Ptr(const char* p, const char* limit, uint64_t* value)

函数示例

编译函数

char* EncodeVarint32(char* dst, uint32_t v) {
  // Operate on characters as unsigneds
  uint8_t* ptr = reinterpret_cast<uint8_t*>(dst);
  static const int B = 128;
  if (v < (1 << 7)) {
    *(ptr++) = v;
  } else if (v < (1 << 14)) {
    *(ptr++) = v | B;
    *(ptr++) = v >> 7;
  } else if (v < (1 << 21)) {
    *(ptr++) = v | B;
    *(ptr++) = (v >> 7) | B;
    *(ptr++) = v >> 14;
  } else if (v < (1 << 28)) {
    *(ptr++) = v | B;
    *(ptr++) = (v >> 7) | B;
    *(ptr++) = (v >> 14) | B;
    *(ptr++) = v >> 21;
  } else {
    *(ptr++) = v | B;
    *(ptr++) = (v >> 7) | B;
    *(ptr++) = (v >> 14) | B;
    *(ptr++) = (v >> 21) | B;
    *(ptr++) = v >> 28;
  }
  return reinterpret_cast<char*>(ptr);
}

解码函数

inline const char* GetVarint32Ptr(const char* p, const char* limit,
                                  uint32_t* value) {
  if (p < limit) {
    uint32_t result = *(reinterpret_cast<const uint8_t*>(p));
    if ((result & 128) == 0) {  //字节最高 bit 为0，说明编码结束。
      *value = result;
      return p + 1;
    }
  }
  return GetVarint32PtrFallback(p, limit, value);
}

函数解读

1000 为例
编码过程
1000 二进制为 1111101000

取右边7bit，1101000
剩下的是111
小端存储要转换位置
字节1：01101000
字节2： 00000111
每个字节位为标识1，还有后续。所以字节1最高是1，字节2最高位为0
编码结果是： 11101000 00000111

解码过程
11101000 00000111
高位1有后续，与下一个字节一同表示一个数值
1101000 00000111
字节交换位置结果是： 111 1101000

1.3 PutLengthPrefixedSlice & GetLengthPrefixedSlice

Put 函数：

void PutLengthPrefixedSlice(std::string* dst, const Slice& value) {
  PutVarint32(dst, value.size());
  dst->append(value.data(), value.size());
}

对于一个字符串，这里首先将它的长度放到 dst 中，然后再将数值放入。
在这里插入图片描述
Get 函数：

bool GetLengthPrefixedSlice(Slice* input, Slice* result) {
  uint32_t len;
  if (GetVarint32(input, &len) && input->size() >= len) {
    *result = Slice(input->data(), len);
    input->remove_prefix(len);
    return true;
  } else {
    return false;
  }
}

这段代码的意思相当于将编码后的 string 中的长度信息去除，提出出原始 string。

2. 跳跃表

跳跃表文章阅读：
https://blog.csdn.net/ict2014/article/details/17394259?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.control&dist_request_id=&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.control

3. Memtable

3.1 key分类

在说 memtable 的 get 和 add 操作前，我们先了解一下 leveldb 中的几种 key 类型。
在 leveldb 中做 search 操作时，search 的过程大概是:
memtable->immutable memtable -> sstable
这里涉及到 2 个 search 用到的 key，一个在 memtable 中用，一个在 sstable 中用的 key。
其实还有 1 个 key，那就是用户自己输入的 key，user-key。
总结起来就 3 种 key，如下：

memtable: 逻辑上称为memtable_key
sstalbe: 逻辑上称为internal_key
key: 用户提供的key，我们称之为user_key

LookupKey详细
leveldb 用一种类 LookupKey 包含了这 3 种 key，我们要用的 memtable_key 其实就是 Lookupkey

3.2 Add 操作

对 key 分类有了初步的认识后，我们就来看 MemTable 是如何将一个 user key 封装成一个 memtable_key，然后将 key value 插入到 skiplist 中的。

// Value types encoded as the last component of internal keys.
// DO NOT CHANGE THESE ENUM VALUES: they are embedded in the on-disk
// data structures.
enum ValueType { kTypeDeletion = 0x0, kTypeValue = 0x1 };