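The SkipList (skip list) data structure backs both the Memtable and the Immutable Memtable; the role of these two tables is covered in the Memtable article.
The Memtable is an in-memory table that stores inserted KV data. The SkipList is what makes inserting and looking up those KV pairs fast.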
1. Introduction
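SkipList trades space for time: it builds a multi-level index over a sorted linked list to speed up lookups, which in effect gives a "binary search" on a linked list. A skip list is a dynamic data structure that supports fast insertion, deletion, and lookup, each in O(log n) expected time.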
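Complexity:
- Insertion, deletion, and lookup take O(log n) time on average (O(n) in the worst case, because node heights are chosen at random);
- Space complexity is O(n).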
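To make the "binary search on a linked list" idea concrete, here is a minimal sketch that counts pointer hops with and without an index level. `ToyNode`, `ToyList`, and `HopsToFind` are hypothetical names invented for this illustration, and the index is built deterministically (every `stride`-th node) rather than randomly as in a real skip list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy node: next[i] is the successor on level i.
struct ToyNode {
  int key = 0;
  std::vector<ToyNode*> next;
};

// A sorted list of keys 0..n-1 where every stride-th node also
// appears on level 1 (a fixed, non-random index for illustration).
class ToyList {
 public:
  ToyList(int n, int stride) : nodes_(n + 1) {
    ToyNode* head = &nodes_[0];  // sentinel head, carries no real key
    head->key = -1;
    head->next.assign(2, nullptr);
    ToyNode* last[2] = {head, head};  // tail of each level so far
    for (int i = 0; i < n; ++i) {
      ToyNode* cur = &nodes_[i + 1];
      cur->key = i;
      const int height = (i % stride == 0) ? 2 : 1;
      cur->next.assign(height, nullptr);
      for (int level = 0; level < height; ++level) {
        last[level]->next[level] = cur;
        last[level] = cur;
      }
    }
  }
  ToyList(const ToyList&) = delete;  // nodes hold pointers into nodes_
  ToyList& operator=(const ToyList&) = delete;

  // Number of pointer hops needed to stand just before the first node
  // with key >= target, starting the search at top_level.
  int HopsToFind(int target, int top_level) const {
    const ToyNode* x = &nodes_[0];
    int hops = 0;
    for (int level = top_level; level >= 0; --level) {
      while (x->next[level] != nullptr && x->next[level]->key < target) {
        x = x->next[level];
        ++hops;
      }
    }
    return hops;
  }

 private:
  std::vector<ToyNode> nodes_;
};
```

With 16 keys and a stride of 4, searching for 13 from level 0 only walks every node before it, while starting from level 1 skips ahead in steps of four and then drops down. Stacking more levels on top shrinks the walk further, which is where the logarithmic expected cost comes from.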
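2. Structure
Borrowing a diagram from the web: in LevelDB's SkipList implementation, the head node is initialized with 12 pointers, one per level (kMaxHeight = 12).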
3. Source code analysis
Node
This code relies on memory ordering. If that topic is unfamiliar, look it up separately; it is not covered here.
// Implementation details follow
template <typename Key, class Comparator>
struct SkipList<Key, Comparator>::Node {
explicit Node(const Key& k) : key(k) {}
Key const key;
<!Variants with memory barriers>
// Accessors/mutators for links. Wrapped in methods so we can
// add the appropriate barriers as necessary.
Node* Next(int n) {
assert(n >= 0);
// Use an 'acquire load' so that we observe a fully initialized
// version of the returned Node.
return next_[n].load(std::memory_order_acquire);
}
void SetNext(int n, Node* x) {
assert(n >= 0);
// Use a 'release store' so that anybody who reads through this
// pointer observes a fully initialized version of the inserted node.
next_[n].store(x, std::memory_order_release);
}
<!Variants without memory barriers; cheaper than the barrier variants above>
// No-barrier variants that can be safely used in a few locations.
Node* NoBarrier_Next(int n) {
assert(n >= 0);
return next_[n].load(std::memory_order_relaxed);
}
void NoBarrier_SetNext(int n, Node* x) {
assert(n >= 0);
next_[n].store(x, std::memory_order_relaxed);
}
private:
// Array of length equal to the node height. next_[0] is lowest level link.
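<!
*The next node of the current node on each level
*Level 2 N1 N2
*Level 1 N1 N2
*If N1 is this node, next_[x] stores N2
*
*next_[0] is the original (level 0) linked list.
>
std::atomic<Node*> next_[1]; // declared with a single slot; NewNode() allocates (height - 1) more pointers right after it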
};
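The acquire/release pairing used by Next() and SetNext() can be shown in isolation. The following is a minimal sketch, not LevelDB code: `Payload` and `Slot` are hypothetical names, and the pattern is only exercised from a single thread here. The ordering guarantee matters when Consume() runs on a different thread than Publish().

```cpp
#include <atomic>
#include <cassert>

// Hypothetical payload that must be fully initialized before readers see it.
struct Payload {
  int value = 0;
};

// Hypothetical single-pointer mailbox using the same pattern as
// Node::SetNext (release store) and Node::Next (acquire load).
class Slot {
 public:
  // Release store: every write made to *p before this call becomes
  // visible to a thread that later observes p through Consume().
  void Publish(Payload* p) { ptr_.store(p, std::memory_order_release); }

  // Acquire load: if this returns a non-null pointer, the pointee is
  // seen in the state it had when Publish() was called.
  const Payload* Consume() const {
    return ptr_.load(std::memory_order_acquire);
  }

 private:
  std::atomic<Payload*> ptr_{nullptr};
};
```

The relaxed variants (NoBarrier_Next, NoBarrier_SetNext) skip this guarantee, which is safe only where no other thread can yet reach the node, for example while a new node is still being wired up before it is published.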
SkipList
<!Memory management>
class Arena;
template <typename Key, class Comparator>
class SkipList {
private:
struct Node;
public:
// Create a new SkipList object that will use "cmp" for comparing keys,
// and will allocate memory using "*arena". Objects allocated in the arena
// must remain allocated for the lifetime of the skiplist object.
explicit SkipList(Comparator cmp, Arena* arena);
SkipList(const SkipList&) = delete;
SkipList& operator=(const SkipList&) = delete;
// Insert key into the list.
// REQUIRES: nothing that compares equal to key is currently in the list.
void Insert(const Key& key);
// Returns true iff an entry that compares equal to key is in the list.
bool Contains(const Key& key) const;
<!Iterator; the English comments are self-explanatory>
// Iteration over the contents of a skip list
class Iterator {
public:
// Initialize an iterator over the specified list.
// The returned iterator is not valid.
explicit Iterator(const SkipList* list);
// Returns true iff the iterator is positioned at a valid node.
bool Valid() const;
// Returns the key at the current position.
// REQUIRES: Valid()
const Key& key() const;
// Advances to the next position.
// REQUIRES: Valid()
void Next();
// Advances to the previous position.
// REQUIRES: Valid()
void Prev();
// Advance to the first entry with a key >= target
void Seek(const Key& target);
// Position at the first entry in list.
// Final state of iterator is Valid() iff list is not empty.
void SeekToFirst();
// Position at the last entry in list.
// Final state of iterator is Valid() iff list is not empty.
void SeekToLast();
private:
<!The SkipList this iterator is bound to>
const SkipList* list_;
<!The node this iterator currently points to>
Node* node_;
// Intentionally copyable
};
private:
<!Maximum number of levels in the skip list; the lowest level is level 0>
enum { kMaxHeight = 12 };
<!Get the number of levels currently in use>
inline int GetMaxHeight() const {
return max_height_.load(std::memory_order_relaxed);
}
<!Create a new node>
Node* NewNode(const Key& key, int height);
<!Return the random height for the value being inserted. For example, with 4,
the Node is linked into levels 0 through 3.
>
int RandomHeight();
<!Equality test>
bool Equal(const Key& a, const Key& b) const { return (compare_(a, b) == 0); }
<!Whether key sorts after node n>
// Return true if key is greater than the data stored in "n"
bool KeyIsAfterNode(const Key& key, Node* n) const;
// Return the earliest node that comes at or after key.
// Return nullptr if there is no such node.
//
// If prev is non-null, fills prev[level] with pointer to previous
// node at "level" for every level in [0..max_height_-1].
Node* FindGreaterOrEqual(const Key& key, Node** prev) const;
// Return the latest node with a key < key.
// Return head_ if there is no such node.
Node* FindLessThan(const Key& key) const;
// Return the last node in the list.
// Return head_ if list is empty.
Node* FindLast() const;
// Immutable after construction
Comparator const compare_;
Arena* const arena_; // Arena used for allocations of nodes
<!Dummy head node of the skip list; head_->Next(i) points to the first element on level i>
Node* const head_;
// Modified only by Insert(). Read racily by readers, but stale
// values are ok.
std::atomic<int> max_height_; // Height of the entire list
<!Used to generate random numbers>
// Read/written only by Insert().
Random rnd_;
};
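<!Create a new node holding key; height is the number of levels the key appears on.
sizeof(Node) already includes one next_ slot for level 0, so only (height - 1) extra pointers are appended.
The memory comes from the Arena and is aligned, which speeds up CPU access.
>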
template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node* SkipList<Key, Comparator>::NewNode(
const Key& key, int height) {
<!One actual Node, followed by (height - 1) extra pointers.>
char* const node_memory = arena_->AllocateAligned(
sizeof(Node) + sizeof(std::atomic<Node*>) * (height - 1));
<!Construct a Node object in the already allocated node_memory (placement new)>
return new (node_memory) Node(key);
}
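The over-allocation trick in NewNode() is easier to see in a standalone form. This is a rough sketch under stated assumptions: `FlexNode`, `NewFlexNode`, and `FreeFlexNode` are hypothetical names, and `std::malloc` stands in for the Arena. Like the original, it indexes past the declared one-element array, an idiom that mainstream compilers accept but that the C++ standard does not strictly sanction.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical node mirroring SkipList::Node: one declared slot,
// with the remaining (height - 1) slots living in trailing memory.
struct FlexNode {
  explicit FlexNode(int k) : key(k) {}
  int const key;

  FlexNode* Next(int level) {
    return next[level].load(std::memory_order_acquire);
  }
  void SetNext(int level, FlexNode* x) {
    next[level].store(x, std::memory_order_release);
  }

  std::atomic<FlexNode*> next[1];  // really `height` slots long
};

// Allocate one FlexNode plus (height - 1) extra pointer slots, then
// construct the node in place. std::malloc is an assumption here;
// LevelDB uses Arena::AllocateAligned instead.
FlexNode* NewFlexNode(int key, int height) {
  assert(height >= 1);
  const std::size_t bytes =
      sizeof(FlexNode) + sizeof(std::atomic<FlexNode*>) * (height - 1);
  void* mem = std::malloc(bytes);
  if (mem == nullptr) throw std::bad_alloc();
  FlexNode* node = new (mem) FlexNode(key);  // placement new
  for (int i = 0; i < height; ++i) {
    node->next[i].store(nullptr, std::memory_order_relaxed);
  }
  return node;
}

// Needed only because this sketch uses malloc; with an arena, nodes
// are released all at once when the arena is destroyed.
void FreeFlexNode(FlexNode* node) {
  node->~FlexNode();
  std::free(node);
}
```

Keeping the pointers in the same allocation as the key means one allocation per node and no extra indirection when following a link.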
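<!Iterator constructor. Next() and SeekToFirst() walk level 0; Seek(), Prev(), and SeekToLast() search from the top level down>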
template <typename Key, class Comparator>
inline SkipList<Key, Comparator>::Iterator::Iterator(const SkipList* list) {
list_ = list;
node_ = nullptr;
}
<!Whether the iterator currently points to a valid node>
template <typename Key, class Comparator>
inline bool SkipList<Key, Comparator>::Iterator::Valid() const {
return node_ != nullptr;
}
template <typename Key, class Comparator>
inline const Key& SkipList<Key, Comparator>::Iterator::key() const {
assert(Valid());
return node_->key;
}
template <typename Key, class Comparator>
inline void SkipList<Key, Comparator>::Iterator::Next() {
assert(Valid());
node_ = node_->Next(0); // stepping forward always follows the level 0 links
}
<!Move to the node before the current one>
template <typename Key, class Comparator>
inline void SkipList<Key, Comparator>::Iterator::Prev() {
// Instead of using explicit "prev" links, we just search for the
// last node that falls before key.
assert(Valid());
node_ = list_->FindLessThan(node_->key);
if (node_ == list_->head_) {
node_ = nullptr;
}
}
<!Position at the first entry greater than or equal to target>
template <typename Key, class Comparator>
inline void SkipList<Key, Comparator>::Iterator::Seek(const Key& target) {
node_ = list_->FindGreaterOrEqual(target, nullptr);
}
<!Position at the first node on level 0>
template <typename Key, class Comparator>
inline void SkipList<Key, Comparator>::Iterator::SeekToFirst() {
node_ = list_->head_->Next(0); // level 0 holds every entry; head_ is a dummy node that carries no value, only pointers
}
template <typename Key, class Comparator>
inline void SkipList<Key, Comparator>::Iterator::SeekToLast() {
node_ = list_->FindLast();
if (node_ == list_->head_) {
node_ = nullptr;
}
}
<!Generate the random height for an insertion; for example, with 4 the node is linked into levels [0...3]>
template <typename Key, class Comparator>
int SkipList<Key, Comparator>::RandomHeight() {
// Increase height with probability 1 in kBranching
static const unsigned int kBranching = 4;
int height = 1;
while (height < kMaxHeight && ((rnd_.Next() % kBranching) == 0)) {
height++;
}
assert(height > 0);
assert(height <= kMaxHeight);
return height;
}
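A quick way to get a feel for RandomHeight() is to sample it many times and look at the distribution. The sketch below is an illustration only: it swaps LevelDB's `Random` class for `std::mt19937` (an assumption), and `HeightHistogram` is a hypothetical helper.

```cpp
#include <cassert>
#include <random>
#include <vector>

constexpr int kMaxHeight = 12;
constexpr unsigned int kBranching = 4;

// Same loop as SkipList::RandomHeight(), but driven by std::mt19937
// instead of leveldb's Random (assumption made for this sketch).
int RandomHeight(std::mt19937& rng) {
  int height = 1;
  while (height < kMaxHeight && (rng() % kBranching) == 0) {
    ++height;
  }
  return height;
}

// Hypothetical helper: hist[h] counts how many samples had height h.
std::vector<int> HeightHistogram(int samples, unsigned int seed) {
  std::mt19937 rng(seed);
  std::vector<int> hist(kMaxHeight + 1, 0);
  for (int i = 0; i < samples; ++i) {
    ++hist[RandomHeight(rng)];
  }
  return hist;
}
```

Roughly three quarters of the samples should stop at height 1, about a quarter reach height 2 or more, about one sixteenth reach height 3 or more, and so on. The cap at kMaxHeight only matters for very large lists.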
<!Whether key sorts after node n>
template <typename Key, class Comparator>
bool SkipList<Key, Comparator>::KeyIsAfterNode(const Key& key, Node* n) const {
// null n is considered infinite
return (n != nullptr) && (compare_(n->key, key) < 0);
}
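<!Find the first node greater than or equal to key, starting from the top level.
1. If no such Node exists, the returned next is null.
2. If prev is not null, the last node on each level whose key is less than key
is saved into it.
>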
template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node*
SkipList<Key, Comparator>::FindGreaterOrEqual(const Key& key,
Node** prev) const {
Node* x = head_;
int level = GetMaxHeight() - 1;
while (true) {
Node* next = x->Next(level);
if (KeyIsAfterNode(key, next)) {
// Keep searching in this list
x = next;
} else {
if (prev != nullptr) prev[level] = x;
if (level == 0) {
return next;
} else {
// Switch to next list
level--;
}
}
}
}
<!Find the last node whose key is less than key, searching from the top level down.
If there is no such node, return head_.
>
template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node*
SkipList<Key, Comparator>::FindLessThan(const Key& key) const {
Node* x = head_;
int level = GetMaxHeight() - 1;
while (true) {
assert(x == head_ || compare_(x->key, key) < 0);
Node* next = x->Next(level);
if (next == nullptr || compare_(next->key, key) >= 0) {
if (level == 0) {
return x;
} else {
// Switch to next list
level--;
}
} else {
x = next;
}
}
}
<!Starting from the top level, move to the last element>
template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node* SkipList<Key, Comparator>::FindLast()
const {
Node* x = head_;
int level = GetMaxHeight() - 1;
while (true) {
Node* next = x->Next(level);
if (next == nullptr) {
if (level == 0) {
return x;
} else {
// Switch to next list
level--;
}
} else {
x = next;
}
}
}
<!SkipList constructor and member initialization>
template <typename Key, class Comparator>
SkipList<Key, Comparator>::SkipList(Comparator cmp, Arena* arena)
: compare_(cmp),
arena_(arena),
head_(NewNode(0 /* any key will do */, kMaxHeight)),
max_height_(1),
rnd_(0xdeadbeef) {
for (int i = 0; i < kMaxHeight; i++) {
head_->SetNext(i, nullptr);
}
}
<!Insert key>
template <typename Key, class Comparator>
void SkipList<Key, Comparator>::Insert(const Key& key) {
// TODO(opt): We can use a barrier-free variant of FindGreaterOrEqual()
// here since Insert() is externally synchronized.
<!Find the first node x greater than or equal to key, and record in prev[] the last node on each level whose key is less than key>
Node* prev[kMaxHeight];
Node* x = FindGreaterOrEqual(key, prev);
<!Either no such node exists, or the node found must not compare equal to the key being inserted>
// Our data structure does not allow duplicate insertion
assert(x == nullptr || !Equal(key, x->key));
<!Pick the random height for the new node>
int height = RandomHeight();
if (height > GetMaxHeight()) {
<!If the new height exceeds the current one, the predecessor on each extra level is head_>
for (int i = GetMaxHeight(); i < height; i++) {
prev[i] = head_;
}
// It is ok to mutate max_height_ without any synchronization
// with concurrent readers. A concurrent reader that observes
// the new value of max_height_ will see either the old value of
// new level pointers from head_ (nullptr), or a new value set in
// the loop below. In the former case the reader will
// immediately drop to the next level since nullptr sorts after all
// keys. In the latter case the reader will use the new node.
max_height_.store(height, std::memory_order_relaxed);
}
<!Create the new node>
x = NewNode(key, height);
<!The insertion below simply splices pointers level by level, which also explains why
prev[kMaxHeight] is needed.
>
for (int i = 0; i < height; i++) {
// NoBarrier_SetNext() suffices since we will add a barrier when
// we publish a pointer to "x" in prev[i].
x->NoBarrier_SetNext(i, prev[i]->NoBarrier_Next(i));
prev[i]->SetNext(i, x);
}
}
<!Whether the skip list contains this key>
template <typename Key, class Comparator>
bool SkipList<Key, Comparator>::Contains(const Key& key) const {
Node* x = FindGreaterOrEqual(key, nullptr);
if (x != nullptr && Equal(key, x->key)) {
return true;
} else {
return false;
}
}
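Putting the pieces together, the search and insert logic walked through above can be condensed into a small single-threaded skip list. This is a simplified sketch, not the LevelDB implementation: `MiniSkipList` and `Keys` are hypothetical names, keys are plain `int`, nodes own a `std::vector` of links instead of a trailing array, `std::mt19937` replaces `Random`, and there are no atomics or arena.

```cpp
#include <cassert>
#include <memory>
#include <random>
#include <vector>

class MiniSkipList {
 public:
  MiniSkipList()
      : head_(NewNode(0 /* any key will do */, kMaxHeight)),
        max_height_(1),
        rng_(0xdeadbeef) {}

  // REQUIRES: key is not already in the list.
  void Insert(int key) {
    Node* prev[kMaxHeight];
    Node* x = FindGreaterOrEqual(key, prev);
    assert(x == nullptr || x->key != key);
    const int height = RandomHeight();
    if (height > max_height_) {
      // Levels that did not exist yet start at the head node.
      for (int i = max_height_; i < height; ++i) prev[i] = head_;
      max_height_ = height;
    }
    x = NewNode(key, height);
    for (int i = 0; i < height; ++i) {
      x->next[i] = prev[i]->next[i];  // new node points at the successor
      prev[i]->next[i] = x;           // predecessor points at the new node
    }
  }

  bool Contains(int key) const {
    const Node* x = FindGreaterOrEqual(key, nullptr);
    return x != nullptr && x->key == key;
  }

  // All keys in ascending order, read by walking level 0.
  std::vector<int> Keys() const {
    std::vector<int> out;
    for (const Node* x = head_->next[0]; x != nullptr; x = x->next[0]) {
      out.push_back(x->key);
    }
    return out;
  }

 private:
  enum { kMaxHeight = 12 };
  struct Node {
    int key;
    std::vector<Node*> next;  // next[i] is the successor on level i
  };

  Node* NewNode(int key, int height) {
    owned_.push_back(std::make_unique<Node>(
        Node{key, std::vector<Node*>(height, nullptr)}));
    return owned_.back().get();
  }

  int RandomHeight() {
    int height = 1;
    while (height < kMaxHeight && (rng_() % 4) == 0) ++height;
    return height;
  }

  // First node with key >= target; fills prev[level] when prev != nullptr.
  Node* FindGreaterOrEqual(int target, Node** prev) const {
    Node* x = head_;
    int level = max_height_ - 1;
    while (true) {
      Node* next = x->next[level];
      if (next != nullptr && next->key < target) {
        x = next;  // keep moving right on this level
      } else {
        if (prev != nullptr) prev[level] = x;
        if (level == 0) return next;
        --level;  // drop down one level
      }
    }
  }

  std::vector<std::unique_ptr<Node>> owned_;  // declared first: NewNode uses it
  Node* head_;
  int max_height_;
  std::mt19937 rng_;
};
```

Inserting 5, 1, 9, 3, 7 in that order and then calling Keys() yields the keys in sorted order, because each insert splices the node into its sorted position on every level it occupies.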
4. Summary
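- Every lookup in SkipList (FindGreaterOrEqual, FindLessThan, FindLast) starts at the top level and works downward.
- Why is the random height based on % 4? Each level is expected to hold 1/4 as many nodes as the level below it. Put another way, a node present on one level also appears on the next level up with probability 1/4, so the % 4 test decides how many levels a node spans.
- leveldb implements SkipList as a template, which makes it more general.
- leveldb's SkipList has no delete operation because deleting an element is itself an insertion: deleting a key's value is recorded in the Memtable as a new entry that carries a deletion marker for that key. The actual removal is lazy; the KV is dropped later during Compaction.
- Why does leveldb use a SkipList for inserting and querying data, rather than a red-black tree or some other data structure?
SkipList handles range lookups efficiently, and it is not too complicated to implement.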
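The "deletion is also an insertion" point can be sketched with a toy in-memory table. Everything below is hypothetical and heavily simplified: `ToyMemtable`, `Entry`, `EntryType`, and `EntryOrder` are invented for this illustration, and `std::set` stands in for the skip list as the ordered container. Entries are sorted by user key ascending and then by sequence number descending, so the newest entry for a key comes first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <set>
#include <string>

enum class EntryType { kDeletion, kValue };

struct Entry {
  std::string user_key;
  std::uint64_t sequence;
  EntryType type;
  std::string value;
};

// User key ascending, then sequence descending (newest first).
struct EntryOrder {
  bool operator()(const Entry& a, const Entry& b) const {
    if (a.user_key != b.user_key) return a.user_key < b.user_key;
    return a.sequence > b.sequence;
  }
};

class ToyMemtable {
 public:
  void Put(const std::string& key, const std::string& value) {
    entries_.insert(Entry{key, ++sequence_, EntryType::kValue, value});
  }

  // Deleting never erases anything: it inserts a tombstone entry.
  void Delete(const std::string& key) {
    entries_.insert(Entry{key, ++sequence_, EntryType::kDeletion, ""});
  }

  // Returns true and fills *value if the newest entry for key is a value.
  bool Get(const std::string& key, std::string* value) const {
    // Probe with the largest sequence so lower_bound lands on the
    // newest entry for this key, if there is one.
    const Entry probe{key, std::numeric_limits<std::uint64_t>::max(),
                      EntryType::kValue, ""};
    auto it = entries_.lower_bound(probe);
    if (it == entries_.end() || it->user_key != key) return false;
    if (it->type == EntryType::kDeletion) return false;  // tombstone wins
    *value = it->value;
    return true;
  }

  std::size_t EntryCount() const { return entries_.size(); }

 private:
  std::set<Entry, EntryOrder> entries_;
  std::uint64_t sequence_ = 0;
};
```

After a Put followed by a Delete of the same key, the table holds two entries and Get() reports the key as missing. Physically dropping both entries is left to a later cleanup pass, which corresponds to Compaction in the text above.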