Making Linked Lists Jump: How SkipList Works and How to Implement It
Introduction
Skip lists are a data structure that can be used in place of balanced trees. Skip lists use probabilistic balancing rather than strictly enforced balancing and as a result the algorithms for insertion and deletion in skip lists are much simpler and significantly faster than equivalent algorithms for balanced trees.
– William Pugh
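In the family of data structures, every member has its own personality (its own use cases). One member stands out for combining excellent performance with a simple implementation: the skip list. Proposed by William Pugh in 1990, the skip list is a dynamic data structure that is both efficient and easy to implement.
Overview and Use Cases
The skip list is a randomized data structure whose efficiency rivals that of balanced binary trees, which makes it an alternative to them.
When we want a data structure that supports both dynamic operations (insertion and deletion) and static operations (lookup) in O(log n) time, the balanced binary search tree family comes to mind first: AVL trees, splay trees (whose bounds are amortized), red-black trees, and so on. Their average efficiency and space usage are excellent, but they are hard to implement. The main difficulty is maintaining balance: after every insertion or deletion, the tree has to be restructured so that it still satisfies its balance invariant.
This is where the skip list shines. Because it is a probabilistic data structure, it is balanced only in a probabilistic sense: its expected shape is balanced, and large deviations become less likely as the number of elements grows. As a result, its code stays simple.
Skip lists are simple to implement.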
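A skip list is built from several stacked linked lists, so let's begin with the plain linked list. Inserting into or deleting from a linked list takes O(1) time once the position is known, because only the neighboring pointers need to change.
However, because a linked list's nodes are scattered in memory and connected only by pointers, lookup takes O(n) time; there is no random access. As shown in the figure:
To find element 34, we must start at head and walk forward one node at a time, 6->7->8->10->13->15->19->23->31->34, traversing the entire list.
But if we add one level of index:
then finding 34 takes only about half as many steps as on the single list: 7->13->19->31->34. Adding more index levels on top of the base list, and indexes on top of indexes, produces a structure of several stacked linked lists. That is a skip list:
Adding index levels effectively lets us perform a binary search on a linked list.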
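To make the "about half the distance" claim concrete, here is a small sketch that counts visited nodes on the example keys above, once with a plain scan and once with a single index level. linearSteps and indexedSteps are hypothetical helpers written only for this illustration, and the index keys {7, 13, 19, 31} are assumed from the path described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Nodes visited by a plain left-to-right scan of a sorted list until the
// first key >= target (or the end of the list).
std::size_t linearSteps(const std::vector<int>& base, int target) {
    std::size_t steps = 0;
    for (int key : base) {
        ++steps;
        if (key >= target) break;
    }
    return steps;
}

// Nodes visited when one index level (a sorted subset of `base`) is scanned
// first: walk the index while keys are <= target, then drop to the base list
// and continue right after the last index key visited.
std::size_t indexedSteps(const std::vector<int>& indexLevel,
                         const std::vector<int>& base, int target) {
    assert(!base.empty());
    std::size_t steps = 0;
    bool usedIndex = false;
    int last = 0;
    for (int key : indexLevel) {
        if (key > target) break;
        ++steps;
        last = key;
        usedIndex = true;
        if (key == target) return steps;  // found on the index level
    }
    std::size_t i = 0;
    if (usedIndex) {
        // A real down pointer jumps here directly, so these positions are not counted.
        while (i < base.size() && base[i] <= last) ++i;
    }
    for (; i < base.size(); ++i) {
        ++steps;
        if (base[i] >= target) break;
    }
    return steps;
}
```

On the example keys, linearSteps(base, 34) visits 10 nodes while indexedSteps(indexLevel, base, 34) visits 5 (7, 13, 19, 31, then 34), matching the two paths above.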
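Thanks to these advantages, many well-known open-source projects use skip lists:
- Redis sorted sets (zset)
- The MemTable of LevelDB, RocksDB, and HBase
- The TermDictionary and Posting List in Apache Lucene
A major advantage of the skip list is range queries: first locate the smallest element of the range, then drop to the bottom level and keep walking right.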
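A minimal sketch of that range query, assuming the right/down node layout used in the rest of this article. Node is a hypothetical stripped-down stand-in for SkipNode (no value or level fields), and rangeQuery is a hypothetical helper, not part of the SkipList class below.

```cpp
#include <cassert>
#include <vector>

// Stripped-down stand-in for the article's SkipNode (no value or level fields).
struct Node {
    int key;
    Node* right;  // next node on the same level
    Node* down;   // same key on the level below
    explicit Node(int k) : key(k), right(nullptr), down(nullptr) {}
};

// Collects every key in [lo, hi] from a skip list whose top-left sentinel is `head`.
// Phase 1 descends like a lookup to the last bottom-level node with key < lo;
// phase 2 walks right along the bottom level, which holds every key.
std::vector<int> rangeQuery(Node* head, int lo, int hi) {
    assert(head != nullptr);
    Node* cur = head;
    while (true) {
        while (cur->right != nullptr && cur->right->key < lo) {
            cur = cur->right;
        }
        if (cur->down == nullptr) break;  // reached the bottom level
        cur = cur->down;
    }
    std::vector<int> out;
    for (Node* n = cur->right; n != nullptr && n->key <= hi; n = n->right) {
        out.push_back(n->key);
    }
    return out;
}
```

For example, on a two-level list with index keys 7, 13, 19, 31 over the keys 6 to 34 from the figures, rangeQuery(top, 10, 23) returns {10, 13, 15, 19, 23}. Only the descent benefits from the index; the bottom-level walk costs one step per reported key.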
Its data structures are defined as follows:
#include <cstdint>
#include <ctime>
#include <string>
// Random (LevelDB's generator, listed at the end of this post) must be declared before SkipList.

struct SkipNode
{
    int key;             // key (an int here)
    std::string value;   // value
    int level;           // level
    SkipNode *right;     // next node on the same level
    SkipNode *down;      // corresponding node on the level below
    SkipNode(): key(-1), value("head"), level(0), right(nullptr), down(nullptr) {}
    SkipNode(int k, std::string val, int lev): key(k), value(val), level(lev), right(nullptr), down(nullptr) {}
};

class SkipList
{
public:
    SkipList(): head(new SkipNode()), random(static_cast<uint32_t>(std::time(nullptr))), maxLevel(20) {}
    SkipList(int maxLev): head(new SkipNode()), random(static_cast<uint32_t>(std::time(nullptr))), maxLevel(maxLev) {}
    SkipList(const SkipList& list);
    SkipList& operator=(const SkipList& list);
    // insertion (returns bool to match the definitions below)
    bool insert(const int key, const std::string& value);
    // deletion
    bool remove(const int key);
    // search
    std::string find(const int key) const;
    ~SkipList();
    void print() const;
private:
    SkipNode *head;      // always points to the head of the top level
    Random random;       // random number generator
    int maxLevel;        // upper bound on the level
    int randomLevel();
    SkipNode *copy() const;
    SkipNode *findNode(int key) const;
};
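Search Algorithm
Searching a skip list is simple, and the process resembles a lookup in a balanced binary tree.
Start from the head node on the top level:
- Move right to the last node whose key is not greater than the target.
- Move down one level.
- Repeat the two steps above until the target is found or the bottom level has been passed.
The code is as follows: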
SkipNode *SkipList::findNode(int key) const
{
    SkipNode *temp = this->head;
    while (temp != nullptr) {
        // 1. keep moving right while the next key is not greater than key
        while (temp->right != nullptr && temp->right->key <= key) {
            temp = temp->right;
            if (temp->key == key) {   // found the target
                return temp;
            }
        }
        temp = temp->down;            // 2. move down one level
    }
    return nullptr;                   // not found
}

std::string SkipList::find(const int key) const
{
    SkipNode *node = findNode(key);
    if (node) {
        return node->value;
    }
    return "";                        // not found
}
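Insertion Algorithm
Insertion is the most involved of the skip list algorithms because it has to choose the new node's height at random. Here, each node is promoted one more level with probability 1/2, so the probability that a node is promoted at least 3 times (reaching level 3) is 1/2^3 = 1/8. With this scheme, each level holds, in expectation, about half as many elements as the level below it, which keeps the skip list balanced in a probabilistic sense.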
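To check the "half as many elements per level" claim empirically, here is a sketch of the same coin-flip level generator. It swaps LevelDB's Random for std::mt19937 with std::bernoulli_distribution (an assumption of this sketch, not what the article's code uses), and tailFractions is a hypothetical helper that measures how often a level of at least k is drawn.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Coin-flip level generator: start at level 0 and keep promoting with
// probability 1/2, never exceeding maxLevel. Same distribution as the
// article's randomLevel(), but built on the standard <random> library.
int randomLevel(std::mt19937& gen, int maxLevel) {
    std::bernoulli_distribution promote(0.5);
    int level = 0;
    while (level < maxLevel && promote(gen)) {
        ++level;
    }
    return level;
}

// Fraction of `samples` draws whose level is at least k, for k = 0..maxLevel.
std::vector<double> tailFractions(unsigned seed, int samples, int maxLevel) {
    assert(samples > 0 && maxLevel >= 0);
    std::mt19937 gen(seed);
    std::vector<int> atLeast(maxLevel + 1, 0);
    for (int s = 0; s < samples; ++s) {
        int lv = randomLevel(gen, maxLevel);
        for (int k = 0; k <= lv; ++k) ++atLeast[k];
    }
    std::vector<double> frac(maxLevel + 1);
    for (int k = 0; k <= maxLevel; ++k) {
        frac[k] = static_cast<double>(atLeast[k]) / samples;
    }
    return frac;
}
```

With 100000 samples, frac[1] comes out close to 0.5 and frac[3] close to 0.125, which is the 1/2^k tail described above.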
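- Generate the height (level) of the new node with the random algorithm:

    int SkipList::randomLevel()
    {
        int randomNum = 0;
        int level = -1;
        while (randomNum < 5 && level < this->maxLevel) {  // promote with probability 1/2, capped at maxLevel
            level++;
            randomNum = random.Uniform(10);  // uses LevelDB's Random algorithm
        }
        return level;
    }

- Starting from head, move to the head node at the computed height (level); if the skip list is not yet that tall, first stack new head nodes on top.
- On the current level, move right to cur, the last node whose key is smaller than that of the node being inserted (temp), and insert temp after cur.
- If a node was inserted on the level above (up), point up->down at temp so that the copies of the new node are linked vertically.
- Record the just-inserted temp as up.
- Move cur down one level and go back to the third step, until the bottom level has been processed.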
The code is as follows:
bool SkipList::insert(const int key, const std::string& value)
{
    if (findNode(key)) {             // key already exists: insertion fails
        return false;                // (checked up front so no partial tower is left behind)
    }
    int level = randomLevel();       // generate the height
    SkipNode *cur = this->head;
    int i = cur->level + 1;
    while (cur->level < level) {     // head is lower than level: stack new head nodes
        SkipNode *temp = new SkipNode(-1, "head", i);
        temp->down = cur;
        cur = temp;
        i++;
    }
    head = cur;
    while (cur->level > level) {     // head is higher than level: move down
        cur = cur->down;
    }
    SkipNode *up = nullptr;
    while (cur != nullptr) {         // until the bottom level has been processed
        while (cur->right != nullptr && cur->right->key < key) {  // move right
            cur = cur->right;
        }
        SkipNode *temp = new SkipNode(key, value, cur->level);
        if (up) up->down = temp;     // link the copy on the level above to this one
        up = temp;                   // remember the node just inserted
        temp->right = cur->right;    // splice temp in after cur
        cur->right = temp;
        cur = cur->down;             // move down one level
    }
    return true;
}
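Deletion Algorithm
When deleting, remember that the target node must be removed from every level on which it appears.
- Start from the head in the top-left corner and search for the node as in the search algorithm, until it is found.
- Unlink and delete the node on that level, then move down one level.
- On each lower level, move right to the node's predecessor again and delete it, until the bottom level has been processed.
The code is as follows: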
bool SkipList::remove(const int key)
{
    SkipNode *cur = head;
    bool flag = false;                 // whether key was found on any level
    while (cur != nullptr) {
        while (cur->right != nullptr && cur->right->key < key) {
            cur = cur->right;          // move right to the predecessor
        }
        if (cur->right && cur->right->key == key) {
            flag = true;
            SkipNode *temp = cur->right;
            cur->right = cur->right->right;  // unlink on this level
            delete temp;
        }
        cur = cur->down;               // continue on the level below
    }
    return flag;
}
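One thing the removal code leaves behind: if the deleted key was the only node on the top levels, those head nodes stay in place as empty levels. Below is a hedged sketch of a cleanup step, assuming a stripped-down node type; shrinkEmptyLevels is a hypothetical helper, not part of the article's class. Calling head = shrinkEmptyLevels(head); at the end of remove would be one way to use it.

```cpp
#include <cassert>

// Minimal stand-in for the article's SkipNode (key, level, right, down only).
struct Node {
    int key;
    int level;
    Node* right;  // next node on the same level
    Node* down;   // corresponding node on the level below
    Node(int k, int lev) : key(k), level(lev), right(nullptr), down(nullptr) {}
};

// Drops head levels that no longer hold any element, always keeping the
// bottom level. Returns the new top-level head.
Node* shrinkEmptyLevels(Node* head) {
    assert(head != nullptr);
    while (head->right == nullptr && head->down != nullptr) {
        Node* empty = head;
        head = head->down;
        delete empty;  // the level holds only its sentinel, so nothing else to free
    }
    return head;
}
```

Because every level is a subset of the one below, an empty level can only sit on top of the stack, so checking from the top down is enough.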
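Space and Time Complexity
Only the conclusions are given here, without a derivation.
A skip list supports search, insertion, and deletion in expected O(log n) time and uses expected O(n) space.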
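For intuition only, here is the standard expected-value argument behind the space bound, assuming promotion probability p = 1/2 and ignoring the maxLevel cap (which can only lower the count). A key reaches level k with probability (1/2)^k, so

$$
\mathbb{E}[\text{nodes per key}] = \sum_{k=0}^{\infty} \Pr[\text{level} \ge k] = \sum_{k=0}^{\infty} \left(\tfrac{1}{2}\right)^{k} = 2,
\qquad
\mathbb{E}[\text{total nodes}] \le 2n = O(n).
$$

Likewise, level k holds n/2^k keys in expectation, which drops below 1 once k > log_2 n, so the expected number of non-empty levels is about log_2 n. Pugh's backward analysis shows the expected number of rightward moves per level is (1 - p)/p = 1 for p = 1/2, which gives the expected O(log n) search cost.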
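For the derivation, see lecture 12 (Skip Lists) of MIT's Introduction to Algorithms open course (6.046J).
The figure above comes from the skip list author's paper, Skip lists: a probabilistic alternative to balanced trees. It suggests that, on average, skip list search, insertion, deletion, and update perform comparably to, and in some cases better than, balanced trees.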
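Complete Implementation
Below is my complete C++ implementation of SkipList, including printing, copy assignment, copy construction, and destruction. The code has been tested; the data structure definitions are in the Overview section.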
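#include <ctime>
#include <iostream>
#include <map>
#include <utility>

SkipList::SkipList(const SkipList& list)
    : head(list.copy()),
      random(static_cast<uint32_t>(std::time(nullptr))),  // the random seed is not copied from list
      maxLevel(list.maxLevel)
{
}

SkipList& SkipList::operator=(const SkipList& list)
{
    if (this == &list) return *this;
    SkipList tmp(list);           // deep-copy list first
    std::swap(head, tmp.head);    // take over the copy; tmp's destructor frees our old nodes
    maxLevel = list.maxLevel;
    return *this;
}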
int SkipList::randomLevel()
{
    int randomNum = 0;
    int level = -1;
    while (randomNum < 5 && level < this->maxLevel) {  // promote with probability 1/2, capped at maxLevel
        level++;
        randomNum = random.Uniform(10);
    }
    return level;
}
SkipNode *SkipList::copy() const
{
    std::map<int, SkipNode*> dict;   // key -> its copy on the level above
    // keep each head's level so insert() still sees the correct height
    SkipNode *newHead = new SkipNode(-1, "head", this->head->level);
    SkipNode *node = this->head;     // head of the level being copied
    SkipNode *newNode = newHead;     // head of the matching new level
    while (node) {
        SkipNode *cur = node->right;
        SkipNode *newCur = newNode;
        while (cur) {
            newCur->right = new SkipNode(cur->key, cur->value, cur->level);
            if (dict.count(cur->key))  // link the copy above down to this one
                dict[cur->key]->down = newCur->right;
            dict[cur->key] = newCur->right;
            newCur = newCur->right;
            cur = cur->right;
        }
        node = node->down;
        if (node) newNode->down = new SkipNode(-1, "head", node->level);
        newNode = newNode->down;
    }
    return newHead;
}
SkipNode *SkipList::findNode(int key) const
{
    SkipNode *temp = this->head;
    while (temp != nullptr) {
        while (temp->right != nullptr && temp->right->key <= key) {
            temp = temp->right;
            if (temp->key == key) {
                return temp;
            }
        }
        temp = temp->down;
    }
    return nullptr;
}

std::string SkipList::find(const int key) const
{
    SkipNode *node = findNode(key);
    if (node) {
        return node->value;
    }
    return "";
}
bool SkipList::insert(const int key, const std::string& value)
{
    if (SkipNode *node = findNode(key)) {  // key exists: update its value on every level
        for (; node != nullptr; node = node->down) {
            node->value = value;
        }
        return true;
    }
    int level = randomLevel();
    SkipNode *cur = this->head;
    int i = cur->level + 1;
    while (cur->level < level) {
        SkipNode *temp = new SkipNode(-1, "head", i);
        temp->down = cur;
        cur = temp;
        i++;
    }
    head = cur;
    while (cur->level > level) {
        cur = cur->down;
    }
    SkipNode *up = nullptr;
    while (cur != nullptr) {
        while (cur->right != nullptr && cur->right->key < key) {
            cur = cur->right;
        }
        SkipNode *temp = new SkipNode(key, value, cur->level);
        if (up) up->down = temp;
        up = temp;
        temp->right = cur->right;
        cur->right = temp;
        cur = cur->down;
    }
    return true;
}
bool SkipList::remove(const int key)
{
    SkipNode *cur = head;
    bool flag = false;
    while (cur != nullptr) {
        while (cur->right != nullptr && cur->right->key < key) {
            cur = cur->right;
        }
        if (cur->right && cur->right->key == key) {
            flag = true;
            SkipNode *temp = cur->right;
            cur->right = cur->right->right;
            delete temp;
        }
        cur = cur->down;
    }
    return flag;
}
void SkipList::print() const
{
    SkipNode *node = this->head;
    SkipNode *cur;
    while (node != nullptr) {          // one line per level, top level first
        cur = node;
        while (cur != nullptr) {
            std::cout << cur->key << ":" << cur->value << " -> ";
            cur = cur->right;
        }
        node = node->down;
        std::cout << "null" << std::endl;
    }
}

SkipList::~SkipList()
{
    SkipNode *node = head;
    while (node) {                     // free each level, then its head node
        SkipNode *cur = node->right;
        while (cur) {
            SkipNode *temp = cur;
            cur = cur->right;
            delete temp;
        }
        SkipNode *temp = node;
        node = node->down;
        delete temp;
    }
}
The implementation uses LevelDB's Random algorithm:
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.
#include <stdint.h>
//typedef unsigned int uint32_t;
//typedef unsigned long long uint64_t;
// A very simple random number generator. Not especially good at
// generating truly random bits, but good enough for our needs in this
// package.
class Random {
private:
    uint32_t seed_;
public:
    explicit Random(uint32_t s) : seed_(s & 0x7fffffffu) {
        // Avoid bad seeds.
        if (seed_ == 0 || seed_ == 2147483647L) {
            seed_ = 1;
        }
    }
    uint32_t Next() {
        static const uint32_t M = 2147483647L;   // 2^31-1
        static const uint64_t A = 16807;         // bits 14, 8, 7, 5, 2, 1, 0
        // We are computing
        //       seed_ = (seed_ * A) % M,    where M = 2^31-1
        //
        // seed_ must not be zero or M, or else all subsequent computed values
        // will be zero or M respectively.  For all other values, seed_ will end
        // up cycling through every number in [1,M-1]
        uint64_t product = seed_ * A;
        // Compute (product % M) using the fact that ((x << 31) % M) == x.
        seed_ = static_cast<uint32_t>((product >> 31) + (product & M));
        // The first reduction may overflow by 1 bit, so we may need to
        // repeat.  mod == M is not possible; using > allows the faster
        // sign-bit-based test.
        if (seed_ > M) {
            seed_ -= M;
        }
        return seed_;
    }
    // Returns a uniformly distributed value in the range [0..n-1]
    // REQUIRES: n > 0
    uint32_t Uniform(int n) { return (Next() % n); }
    // Randomly returns true ~"1/n" of the time, and false otherwise.
    // REQUIRES: n > 0
    bool OneIn(int n) { return (Next() % n) == 0; }
    // Skewed: pick "base" uniformly from range [0,max_log] and then
    // return "base" random bits.  The effect is to pick a number in the
    // range [0,2^max_log-1] with exponential bias towards smaller numbers.
    uint32_t Skewed(int max_log) {
        return Uniform(1 << Uniform(max_log + 1));
    }
};
References
- William Pugh. Skip lists: a probabilistic alternative to balanced trees. Communications of the ACM, 1990.
- https://blog.csdn.net/ict2014/article/details/17394259/
- https://zhuanlan.zhihu.com/p/113227225
- https://blog.csdn.net/ryo1060732496/article/details/109458405
- https://blog.csdn.net/qq_34412579/article/details/101731935
Diagrams drawn with ProcessOn.
Time was limited, so the content and code may contain flaws; corrections are welcome!
(End)