LevelDB Deep Dive: SkipList

xxb249

Categories: Open Source Software, Storage. Tags: leveldb, skip list, SkipList implementation, skip list implementation

Article link: https://blog.csdn.net/xxb249/article/details/87932932

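The previous post covered LevelDB's overall storage layout, which is useful background for reading the LevelDB source. This post looks at the structure LevelDB uses to hold data in memory.

A linked list makes insertion and deletion cheap, but lookup is slow because it has to walk the list from the head. Lookup-heavy workloads usually turn to a binary search tree, but a balanced tree has to be rebalanced on insertion and deletion, which hurts performance. The skip list was designed to avoid the drawbacks of both structures.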

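1. The Idea Behind SkipList

A skip list is laid out roughly as follows:

[Figure: skip list layout]

Properties of a skip list:

1) A skip list is organized in levels, and each level is an independent linked list. Every node except those on the bottom level has a pointer down to the level below and a pointer forward along its own level.

2) A lookup proceeds from top to bottom and from left to right, for example when searching for 15 in the figure above.

3) The higher the level, the fewer nodes it contains.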
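
The three properties above can be illustrated with a small sketch. This is not LevelDB code: `LayerNode` and `LayeredList` are hypothetical names, the list is built by hand from fixed levels rather than by random insertion, and each level gets its own head sentinel. It only shows the "move right, then move down" search order.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical node for the conceptual layout: one node per (key, level),
// linked forward within its level and down to the same key one level below.
struct LayerNode {
  int key = 0;
  LayerNode* right = nullptr;
  LayerNode* down = nullptr;
};

class LayeredList {
 public:
  // levels[0] is the bottom level. Every level must be sorted, and every
  // higher level must be a subset of the level below it.
  explicit LayeredList(const std::vector<std::vector<int>>& levels) {
    std::vector<LayerNode*> below;  // nodes of the level beneath, head first
    for (const std::vector<int>& keys : levels) {
      std::vector<LayerNode*> row;
      row.push_back(NewNode(0));  // head sentinel; its key is never compared
      for (int k : keys) row.push_back(NewNode(k));
      for (std::size_t i = 0; i + 1 < row.size(); ++i) row[i]->right = row[i + 1];
      if (!below.empty()) {
        row[0]->down = below[0];
        std::size_t j = 1;
        for (std::size_t i = 1; i < row.size(); ++i) {
          while (j < below.size() && below[j]->key != row[i]->key) ++j;
          assert(j < below.size());  // the key must exist one level down
          row[i]->down = below[j];
        }
      }
      below = row;
    }
    top_head_ = below.empty() ? nullptr : below[0];
  }

  // Top to bottom, left to right: move right while the next key is still
  // smaller than the target, otherwise drop down one level.
  bool Contains(int key) const {
    const LayerNode* x = top_head_;
    while (x != nullptr) {
      while (x->right != nullptr && x->right->key < key) x = x->right;
      if (x->right != nullptr && x->right->key == key) return true;
      x = x->down;
    }
    return false;
  }

 private:
  LayerNode* NewNode(int key) {
    nodes_.push_back(std::make_unique<LayerNode>());
    nodes_.back()->key = key;
    return nodes_.back().get();
  }

  std::vector<std::unique_ptr<LayerNode>> nodes_;  // owns every node
  LayerNode* top_head_ = nullptr;
};
```

With the levels {3, 7, 15, 21, 30}, {7, 21} and {21}, a search for 15 starts at the top head, drops to the middle level, moves right to 7, drops to the bottom level and finds 15 as the successor of 7.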

2. LevelDB's SkipList Implementation

// SkipList definition
template<typename Key, class Comparator>
class SkipList {
 private:
  struct Node;

 public:
  // Create a new SkipList object that will use "cmp" for comparing keys,
  // and will allocate memory using "*arena".  Objects allocated in the arena
  // must remain allocated for the lifetime of the skiplist object.
  explicit SkipList(Comparator cmp, Arena* arena);

  // Insert key into the list.
  // REQUIRES: nothing that compares equal to key is currently in the list.
  void Insert(const Key& key);

  // Returns true iff an entry that compares equal to key is in the list.
  // Lookup interface
  bool Contains(const Key& key) const;

  // Iteration over the contents of a skip list
  // SkipList iterator
  class Iterator {
   public:
    // Initialize an iterator over the specified list.
    // The returned iterator is not valid.
    explicit Iterator(const SkipList* list);

    // Returns true iff the iterator is positioned at a valid node.
    bool Valid() const;

    // Returns the key at the current position.
    // REQUIRES: Valid()
    const Key& key() const;

    // Advances to the next position.
    // REQUIRES: Valid()
    void Next();

    // Advances to the previous position.
    // REQUIRES: Valid()
    void Prev();

    // Advance to the first entry with a key >= target
    void Seek(const Key& target);

    // Position at the first entry in list.
    // Final state of iterator is Valid() iff list is not empty.
    void SeekToFirst();

    // Position at the last entry in list.
    // Final state of iterator is Valid() iff list is not empty.
    void SeekToLast();

   private:
    const SkipList* list_;
    Node* node_;
    // Intentionally copyable
  };

 private:
  enum { kMaxHeight = 12 }; // a skip list has at most 12 levels

  // Immutable after construction
  Comparator const compare_;
  Arena* const arena_;    // Memory pool. Arena used for allocations of nodes

  Node* const head_;

  // Modified only by Insert().  Read racily by readers, but stale
  // values are ok.
  port::AtomicPointer max_height_;   // Current number of levels, starting at 1. Height of the entire list

  inline int GetMaxHeight() const {
    return static_cast<int>(
        reinterpret_cast<intptr_t>(max_height_.NoBarrier_Load())); // initial value is 1
  }

  // Read/written only by Insert().
  Random rnd_; // LevelDB's own random number generator

  Node* NewNode(const Key& key, int height);
  int RandomHeight();
  bool Equal(const Key& a, const Key& b) const { return (compare_(a, b) == 0); } // db/memtable.cc

  // Return true if key is greater than the data stored in "n"
  bool KeyIsAfterNode(const Key& key, Node* n) const;

  // Return the earliest node that comes at or after key.
  // Return NULL if there is no such node.
  //
  // If prev is non-NULL, fills prev[level] with pointer to previous
  // node at "level" for every level in [0..max_height_-1].
  Node* FindGreaterOrEqual(const Key& key, Node** prev) const;

  // Return the latest node with a key < key.
  // Return head_ if there is no such node.
  Node* FindLessThan(const Key& key) const;

  // Return the last node in the list.
  // Return head_ if list is empty.
  Node* FindLast() const;

  // No copying allowed
  SkipList(const SkipList&);
  void operator=(const SkipList&);
};

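As the class definition shows, LevelDB's SkipList exposes a very small interface: insertion (Insert), lookup (Contains), and an iterator. The sections below go through the implementation.

2.0 Structure

1) SkipList has at most 12 levels, counted from 1.

2) SkipList is managed as linked lists whose node type is Node. Its member next_ is an array of pointers with one slot per level the node takes part in: the struct declares a single element, and NewNode allocates height - 1 additional slots, so a node of height h has h slots. Only head_, which is created with height kMaxHeight, has all 12. Each array index corresponds to a level, for example next_[0] is level 1 and next_[11] is level 12. The definition is: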

// Implementation details follow: skip list node definition
template<typename Key, class Comparator>
struct SkipList<Key,Comparator>::Node {
  explicit Node(const Key& k) : key(k) { }

  Key const key; // The key. Note that this is LevelDB's own abstraction: one entry that covers the application's key and value together

  // Accessors/mutators for links.  Wrapped in methods so we can
  // add the appropriate barriers as necessary.
  Node* Next(int n) {
    assert(n >= 0);
    // Use an 'acquire load' so that we observe a fully initialized
    // version of the returned Node.
    return reinterpret_cast<Node*>(next_[n].Acquire_Load());
  }
  void SetNext(int n, Node* x) {
    assert(n >= 0);
    // Use a 'release store' so that anybody who reads through this
    // pointer observes a fully initialized version of the inserted node.
    next_[n].Release_Store(x);
  }

  // No-barrier variants that can be safely used in a few locations.
  Node* NoBarrier_Next(int n) {
    assert(n >= 0);
    return reinterpret_cast<Node*>(next_[n].NoBarrier_Load());
  }
  void NoBarrier_SetNext(int n, Node* x) {
    assert(n >= 0);
    next_[n].NoBarrier_Store(x);
  }

 private:
  // Array of length equal to the node height.  next_[0] is lowest level link.
  // Declared with a single element, but NewNode() allocates height - 1 extra slots. The array index is the skip list level.
  port::AtomicPointer next_[1];
};

/**
 * Allocate a new node
 * @param key the key
 * @param height node height, between 1 and kMaxHeight (12)
 */
template<typename Key, class Comparator>
typename SkipList<Key,Comparator>::Node*
SkipList<Key,Comparator>::NewNode(const Key& key, int height) {
  char* mem = arena_->AllocateAligned(
      sizeof(Node) + sizeof(port::AtomicPointer) * (height - 1));
  return new (mem) Node(key); // placement new: constructs the Node in memory the arena already allocated
}
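
The allocation in NewNode can be tried out in isolation. The sketch below makes several assumptions: `ToyNode`, `NodeBytes`, `NewToyNode` and `FreeToyNode` are hypothetical names, `std::atomic` stands in for `port::AtomicPointer`, and `std::malloc` stands in for the arena. Like the original, it indexes past the declared one-element array, an idiom that mainstream compilers accept for trailing arrays but that strict standard C++ does not guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Toy stand-in for SkipList::Node.
struct ToyNode {
  explicit ToyNode(int k) : key(k) {}

  int const key;

  // Acquire load: a reader that sees the pointer also sees the fully
  // initialized node it points to.
  ToyNode* Next(int n) { return next_[n].load(std::memory_order_acquire); }

  // Release store: publishes the node to concurrent readers.
  void SetNext(int n, ToyNode* x) { next_[n].store(x, std::memory_order_release); }

  // Declared with one slot; the allocation below reserves the rest.
  std::atomic<ToyNode*> next_[1];
};

// Bytes needed for a node that takes part in `height` levels.
std::size_t NodeBytes(int height) {
  assert(height >= 1);
  return sizeof(ToyNode) + sizeof(std::atomic<ToyNode*>) * (height - 1);
}

ToyNode* NewToyNode(int key, int height) {
  void* mem = std::malloc(NodeBytes(height));  // LevelDB asks the arena instead
  assert(mem != nullptr);
  ToyNode* node = new (mem) ToyNode(key);      // placement new into raw memory
  for (int i = 0; i < height; ++i) node->SetNext(i, nullptr);
  return node;
}

void FreeToyNode(ToyNode* node) {
  node->~ToyNode();
  std::free(node);
}
```

A node of height 1 costs exactly `sizeof(ToyNode)`, and each extra level adds one pointer-sized slot, which is why short nodes (the common case) stay small.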

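2.1 Lookup: FindGreaterOrEqual

This is the core method. Both the lookup interface Contains and the insertion interface Insert are built on it, so it is worth understanding exactly what it does.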

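/**
 * Find the node that holds key
 * @param key   the key
 * @param prev  output parameter: on each level, the node just before the position of key; pass NULL for pure lookups
 * @return the first node whose key is greater than or equal to key, or NULL if there is none
 * Note: the returned node does not necessarily hold key itself, it may hold a larger key
 */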
template<typename Key, class Comparator>
typename SkipList<Key,Comparator>::Node* SkipList<Key,Comparator>::FindGreaterOrEqual(const Key& key, Node** prev)
    const {
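  Node* x = head_; // head_ is initialized in the constructor
  int level = GetMaxHeight() - 1; // index of the top level in use (heights count from 1, indices from 0)
  while (true) { // walk the nodes of the current level
    Node* next = x->Next(level);
    if (KeyIsAfterNode(key, next)) { // key is greater than next's key: keep going on this level
      // Keep searching in this list
      x = next;
    } else { // otherwise drop down one level
      if (prev != NULL) prev[level] = x; // record the predecessor on this level so Insert can splice on every level
      if (level == 0) {
        return next;
      } else {
        // Switch to next list: move down one level and keep comparing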
        level--;
      }
    }
  }
}

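Notes:

1) On the first pass through the while loop, key is not compared with the first node of the level (head_) but with the node after it. head_ acts as a sentinel that precedes every key, so only its successor needs to be compared.

2) Taking the else branch means dropping down a level. During insertion, prev records the insertion position on each level: the new node is linked in right after prev[level]. When level == 0 the search is complete and the loop exits.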
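
To see what ends up in prev, the walk can be reproduced on a toy list. This is a simplified sketch, not LevelDB's code: `SimpleNode` and `ToySkipList` are hypothetical names, keys are plain ints, there is no arena and no atomics, and `InsertWithHeight` takes the height from the caller so the resulting shape is deterministic.

```cpp
#include <cassert>
#include <memory>
#include <vector>

constexpr int kMaxHeight = 12;

struct SimpleNode {
  SimpleNode(int k, int height) : key(k), next(height, nullptr) {}
  int key;
  std::vector<SimpleNode*> next;  // next[i] is the successor on level i
};

class ToySkipList {
 public:
  ToySkipList() : head_(NewNode(0, kMaxHeight)), max_height_(1) {}

  // Same walk as FindGreaterOrEqual above: go right while the next key is
  // smaller than `key`, otherwise record the predecessor and drop one level.
  SimpleNode* FindGreaterOrEqual(int key, SimpleNode** prev) const {
    SimpleNode* x = head_;
    int level = max_height_ - 1;
    while (true) {
      SimpleNode* next = x->next[level];
      if (next != nullptr && next->key < key) {
        x = next;
      } else {
        if (prev != nullptr) prev[level] = x;
        if (level == 0) return next;
        --level;
      }
    }
  }

  // Caller-chosen height (assumption for the example; LevelDB draws it
  // from RandomHeight()).
  void InsertWithHeight(int key, int height) {
    assert(height >= 1 && height <= kMaxHeight);
    SimpleNode* prev[kMaxHeight];
    SimpleNode* x = FindGreaterOrEqual(key, prev);
    assert(x == nullptr || x->key != key);  // duplicates are not allowed
    for (int i = max_height_; i < height; ++i) prev[i] = head_;
    if (height > max_height_) max_height_ = height;
    x = NewNode(key, height);
    for (int i = 0; i < height; ++i) {
      x->next[i] = prev[i]->next[i];
      prev[i]->next[i] = x;
    }
  }

  SimpleNode* head() const { return head_; }

 private:
  SimpleNode* NewNode(int key, int height) {
    nodes_.push_back(std::make_unique<SimpleNode>(key, height));
    return nodes_.back().get();
  }

  std::vector<std::unique_ptr<SimpleNode>> nodes_;  // owns every node
  SimpleNode* head_;
  int max_height_;
};
```

If 3, 7, 15, 21 and 30 are inserted with heights 1, 2, 1, 3 and 1, a search for 15 records head_ as the predecessor on level 2 (21 is already too large there) and the node 7 on levels 1 and 0, then returns the node 15.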

2.2 Insertion: Insert

/**
 * Insert
 * @param key the key to insert
 */
template<typename Key, class Comparator>
void SkipList<Key,Comparator>::Insert(const Key& key) {
  // TODO(opt): We can use a barrier-free variant of FindGreaterOrEqual()
  // here since Insert() is externally synchronized.
  Node* prev[kMaxHeight];
  Node* x = FindGreaterOrEqual(key, prev);

  // Our data structure does not allow duplicate insertion
  assert(x == NULL || !Equal(key, x->key));

  int height = RandomHeight();
  if (height > GetMaxHeight()) { // the randomly chosen height exceeds the current list height
    for (int i = GetMaxHeight(); i < height; i++) {
      prev[i] = head_;
    }

    max_height_.NoBarrier_Store(reinterpret_cast<void*>(height));
  }

  x = NewNode(key, height); // create the new node
  for (int i = 0; i < height; i++) { // splice the new node into each of its levels
    // NoBarrier_SetNext() suffices since we will add a barrier when
    // we publish a pointer to "x" in prev[i].
    // link into the list
    x->NoBarrier_SetNext(i, prev[i]->NoBarrier_Next(i));
    prev[i]->SetNext(i, x);
  }
}

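Notes:

1) Node heights in a skip list are random, so how does Insert decide whether to add levels? LevelDB implements a simple random height generator, RandomHeight. If the height it returns is greater than the current maximum height, the list grows to that height, and head_ becomes the predecessor on each new level.

2) Create the new node and link it into each level in order, from level 0 upward.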
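
The listing above does not include RandomHeight itself. A minimal sketch of the usual approach follows, with two assumptions stated up front: `std::mt19937` replaces LevelDB's own Random class, and the branching factor of 4 (each additional level is kept with probability 1/4) is taken from memory of the LevelDB source rather than from this article.

```cpp
#include <cassert>
#include <random>

constexpr int kMaxHeight = 12;      // same cap as SkipList::kMaxHeight
constexpr unsigned kBranching = 4;  // assumed: 1-in-4 chance per extra level

// Draws a node height in [1, kMaxHeight]. Every node has at least one level,
// and each further level is added with probability 1 / kBranching.
int RandomHeight(std::mt19937& rng) {
  int height = 1;
  while (height < kMaxHeight && (rng() % kBranching) == 0) {
    ++height;
  }
  assert(height > 0 && height <= kMaxHeight);
  return height;
}
```

With these numbers roughly three quarters of the nodes have height 1, about 3/16 have height 2, and so on, which matches the property that higher levels hold fewer nodes.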

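2.3 Iterator

The Iterator is used to traverse the SkipList. Most of its operations are thin wrappers around the lookup helpers: Seek is built on FindGreaterOrEqual, while Prev and SeekToLast use the similar helpers FindLessThan and FindLast. It is not covered in more detail here.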

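3. Summary

That is LevelDB's SkipList, and it is functionally quite simple. One question remains: why does SkipList provide no method for deleting a node? In LevelDB a delete is itself an insert, except that the inserted entry is marked as deleted, so SkipList needs no delete interface.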
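
The "delete is an insert" idea can be sketched as follows. Everything here is illustrative: `Entry`, `EntryOrder` and `ToyMemTable` are hypothetical names, `std::set` stands in for the skip list, and the entry is a plain struct, whereas LevelDB packs the user key, a sequence number and a type tag into one encoded key. Entries for the same user key are ordered newest first, so a lookup only has to inspect the first match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

enum class EntryType { kDeletion, kValue };

// Simplified entry: user key, sequence number, type tag and value.
struct Entry {
  std::string user_key;
  std::uint64_t sequence;
  EntryType type;
  std::string value;
};

// User key ascending, then sequence number descending, so the newest
// entry for a key sorts first.
struct EntryOrder {
  bool operator()(const Entry& a, const Entry& b) const {
    if (a.user_key != b.user_key) return a.user_key < b.user_key;
    return a.sequence > b.sequence;
  }
};

class ToyMemTable {
 public:
  void Put(const std::string& key, const std::string& value) {
    entries_.insert(Entry{key, ++last_sequence_, EntryType::kValue, value});
  }

  // Deleting never removes anything: it inserts a tombstone entry.
  void Delete(const std::string& key) {
    entries_.insert(Entry{key, ++last_sequence_, EntryType::kDeletion, ""});
  }

  // Returns true and fills *value if the newest entry for key is a live value.
  bool Get(const std::string& key, std::string* value) const {
    // The probe sorts before every real entry for `key`, so lower_bound
    // lands on the newest entry for that key (or on a later key).
    auto it = entries_.lower_bound(Entry{key, UINT64_MAX, EntryType::kValue, ""});
    if (it == entries_.end() || it->user_key != key) return false;
    if (it->type == EntryType::kDeletion) return false;  // tombstone wins
    *value = it->value;
    return true;
  }

  std::size_t size() const { return entries_.size(); }

 private:
  std::set<Entry, EntryOrder> entries_;
  std::uint64_t last_sequence_ = 0;
};
```

After a Put followed by a Delete of the same key, the table holds two entries and Get reports the key as missing, because the tombstone is the newest entry. The container itself only ever grows, which is why an insert-only structure is sufficient.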
