Algorithm Training Camp (4): Skip Lists

王小闹儿

Article link: https://blog.csdn.net/qq_29996285/article/details/109173361

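1. Introduction: how can a linked list be sped up?

Idea: trade space for time.

Lookups are accelerated by adding index layers on top of the list.

Time complexity: O(log n) per lookup on average.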
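As a rough sanity check on the O(log n) figure, assume an idealized index in which every level keeps each second node of the level below. This is an assumption made for illustration only; a real skip list reaches this shape only in expectation, through random level choices.

```latex
\begin{aligned}
\text{number of levels} &\approx \log_2 n, \\
\text{search cost} &\le c \cdot \log_2 n = O(\log n)
  \quad \text{(at most a constant number } c \text{ of nodes visited per level)}, \\
\text{extra index nodes} &= \frac{n}{2} + \frac{n}{4} + \frac{n}{8} + \cdots < n = O(n).
\end{aligned}
```

The last line is the "space" side of the trade: the index costs at most about one extra node per element.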

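2. Basic concepts

A skip list is a randomized data structure proposed by William Pugh in the paper "Skip lists: a probabilistic alternative to balanced trees". It stores elements in sorted order across a hierarchy of linked lists.

A skip list as it looks in practice

As the figure shows, a skip list consists of the following parts:

Head: maintains the pointers into the skip list's nodes.
Skip list node: stores the element value together with several levels.
Level: stores pointers to other elements. A pointer on a higher level skips at least as many elements as a pointer on a lower level. To make lookups efficient, a search always starts at the highest level and descends as the range of candidate values narrows.
Tail: consists entirely of NULL pointers and marks the end of the skip list.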

3. Implementing a skip list

(Postponed; to be filled in later.)
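A minimal C++ sketch of the structure described in section 2, offered as an illustration rather than as the author's implementation. The names SkipList, kMaxLevel and predecessors are hypothetical, as are the 0.5 promotion probability and the cap of 16 levels; keys are plain integers and duplicates are rejected.

```cpp
#include <random>
#include <vector>

// Hypothetical sketch: names and parameters are illustrative assumptions.
class SkipList {
public:
    SkipList() : head_(new Node(0, kMaxLevel)), level_(1), rng_(42) {}
    ~SkipList() {
        for (Node* cur = head_; cur != nullptr;) {
            Node* next = cur->forward[0];  // level 0 links every node
            delete cur;
            cur = next;
        }
    }
    SkipList(const SkipList&) = delete;
    SkipList& operator=(const SkipList&) = delete;

    // Search from the highest level in use, dropping down a level
    // whenever the next key on the current level would overshoot.
    bool contains(int key) const {
        const Node* cur = head_;
        for (int i = level_ - 1; i >= 0; --i) {
            while (cur->forward[i] && cur->forward[i]->key < key) cur = cur->forward[i];
        }
        cur = cur->forward[0];
        return cur != nullptr && cur->key == key;
    }

    bool insert(int key) {
        std::vector<Node*> update = predecessors(key);
        Node* next = update[0]->forward[0];
        if (next && next->key == key) return false;  // already present
        int lvl = randomLevel();
        if (lvl > level_) level_ = lvl;  // new levels hang off the head
        Node* node = new Node(key, lvl);
        for (int i = 0; i < lvl; ++i) {
            node->forward[i] = update[i]->forward[i];
            update[i]->forward[i] = node;
        }
        return true;
    }

    bool erase(int key) {
        std::vector<Node*> update = predecessors(key);
        Node* target = update[0]->forward[0];
        if (!target || target->key != key) return false;
        for (int i = 0; i < level_ && update[i]->forward[i] == target; ++i) {
            update[i]->forward[i] = target->forward[i];
        }
        delete target;
        while (level_ > 1 && !head_->forward[level_ - 1]) --level_;
        return true;
    }

private:
    struct Node {
        int key;
        std::vector<Node*> forward;  // forward[i] is the next node on level i
        Node(int k, int lvl) : key(k), forward(lvl, nullptr) {}
    };
    static constexpr int kMaxLevel = 16;

    // For every level, the last node whose key is smaller than `key`.
    std::vector<Node*> predecessors(int key) const {
        std::vector<Node*> update(kMaxLevel, head_);
        Node* cur = head_;
        for (int i = level_ - 1; i >= 0; --i) {
            while (cur->forward[i] && cur->forward[i]->key < key) cur = cur->forward[i];
            update[i] = cur;
        }
        return update;
    }

    // Flip a fair coin: each extra level is half as likely as the previous one.
    int randomLevel() {
        std::bernoulli_distribution coin(0.5);
        int lvl = 1;
        while (lvl < kMaxLevel && coin(rng_)) ++lvl;
        return lvl;
    }

    Node* head_;        // sentinel with kMaxLevel levels; its key is unused
    int level_;         // number of levels currently in use
    std::mt19937 rng_;  // fixed seed so runs are reproducible
};
```

The head is a sentinel node that participates in every level, and a NULL forward pointer plays the role of the tail, which matches the head and tail parts listed above.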

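4. Applications in engineering practice

4.1 LRU Cache-Linked list

4.1.1 What is a cache

A cache is an effective way to reconcile the tension between storage performance and storage capacity.

Fundamentally, caches work because programs and data exhibit locality. Programs tend to execute in a fixed order, and data tends to sit in contiguous memory and be read and written repeatedly. These properties let us cache the data that is used most often and thereby speed up reads and writes.

The size of a cache is fixed, so it should hold only the data that is accessed most often. The future cannot be known, however, so we can only predict from the past access sequence, which is why so many different cache replacement policies exist.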

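4.1.2 Implementing LRU

Assume the cache has a fixed size and starts out empty. On every memory read, first check whether the requested data is in the cache. If it is, that is a cache hit and the data is returned. If it is not, that is a cache miss: the data is read from memory and added to the cache.

When data is added to a cache that is already full, the entry whose last access is the oldest must be evicted. This way of updating the cache is called LRU (least recently used).

To implement LRU we need a data structure that is ordered by access time and also supports random access in constant time. A HashMap combined with a doubly linked list provides both.

The HashMap guarantees O(1) access to data by key.
The doubly linked list threads through every entry in order of access time.

A doubly linked list is chosen over a singly linked list so that the list can be modified at any node in the middle without traversing from the head.

As shown in the figure below, the black part is the HashMap structure and the red arrows are the forward links of the doubly linked list (the backward links are not drawn). The access order is clearly 1->3->5->6->10. After each access we only need to change the order in which the list is linked.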
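The relinking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's code: Node and RecencyList are hypothetical names, the most recently used entry is assumed to sit at the head, and the list does not own its nodes (in a full cache the HashMap would map each key to its Node pointer).

```cpp
#include <vector>

// Hypothetical node: only the key and the two links matter here.
struct Node {
    int key;
    Node* prev = nullptr;
    Node* next = nullptr;
    explicit Node(int k) : key(k) {}
};

// Hypothetical list with two sentinel nodes, so no edge cases for empty lists.
class RecencyList {
public:
    RecencyList() : head_(-1), tail_(-1) {
        head_.next = &tail_;
        tail_.prev = &head_;
    }
    RecencyList(const RecencyList&) = delete;
    RecencyList& operator=(const RecencyList&) = delete;

    // O(1): only the node's two neighbours are touched, no traversal.
    static void unlink(Node* n) {
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->prev = n->next = nullptr;
    }

    // O(1): splice the node in right after the head sentinel.
    void pushFront(Node* n) {
        n->next = head_.next;
        n->prev = &head_;
        head_.next->prev = n;
        head_.next = n;
    }

    // Called on every access to an entry that is already in the list.
    void touch(Node* n) {
        unlink(n);
        pushFront(n);
    }

    // The eviction candidate, or nullptr if the list is empty.
    Node* leastRecent() { return tail_.prev == &head_ ? nullptr : tail_.prev; }

    // Keys from most to least recently used (for inspection only).
    std::vector<int> keys() const {
        std::vector<int> out;
        for (const Node* n = head_.next; n != &tail_; n = n->next) out.push_back(n->key);
        return out;
    }

private:
    Node head_, tail_;  // sentinels; their keys are never read
};
```

With a singly linked list, unlink would first have to find the predecessor by walking from the head, which is the O(n) traversal the prev pointer avoids.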

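4.1.3 The LRU cache problem

Using the data structures you know, design and implement an LRU (least recently used) cache. It should support the following operations: get to read data and put to write data.

Read data get(key) - if the key exists in the cache, return its value (always a positive number); otherwise return -1.
Write data put(key, value) - if the key already exists, update its value; if the key does not exist, insert the key/value pair.

When the cache reaches its capacity, it should delete the least recently used entry before writing the new data, to make room for the new value.

Follow-up:

Can you perform both operations in O(1) time?

Example: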

LRUCache cache = new LRUCache( 2 /* cache capacity */ );

cache.put(1, 1);
cache.put(2, 2);
cache.get(1); // returns 1
cache.put(3, 3); // this operation evicts key 2
cache.get(2); // returns -1 (not found)
cache.put(4, 4); // this operation evicts key 1
cache.get(1); // returns -1 (not found)
cache.get(3); // returns 3
cache.get(4); // returns 4

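Source: LeetCode (力扣)
Link: https://leetcode-cn.com/problems/lru-cache
Copyright belongs to LeetCode (领扣网络). For commercial reprints, contact the official site for authorization; for non-commercial reprints, cite the source.

A well-regarded solution

Implementation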

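#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

class LRUCache {
public:
    // capacity is assumed to be positive, as in the problem statement
    LRUCache(int capacity) : _capacity(static_cast<std::size_t>(capacity)) {}
    
    int get(int key) {
        auto it = _table.find(key);
        if (it != _table.end()) {
            // Move the node to the front; splice relinks in O(1) and keeps iterators valid
            _lru.splice(_lru.begin(), _lru, it->second);
            return it->second->second;
        }
        return -1;
    }
    
    void put(int key, int value) {
        auto it = _table.find(key);
        if (it != _table.end()) {
            _lru.splice(_lru.begin(), _lru, it->second);      //Transfer elements from list to list
            it->second->second = value;
            return;
        }
        
        _lru.emplace_front(key, value);  //emplace_front: Construct and insert element at beginning
        _table[key] = _lru.begin();
        
        // Evict the least recently used entry (the back of the list) once over capacity
        if (_table.size() > _capacity) {
            _table.erase(_lru.back().first);
            _lru.pop_back();
        }
    }
private:
    // key -> position of its (key, value) pair in _lru
    std::unordered_map<int, std::list<std::pair<int, int>>::iterator> _table;
    // (key, value) pairs, most recently used first
    std::list<std::pair<int, int>> _lru;
    std::size_t _capacity;
};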

/**
 * Your LRUCache object will be instantiated and called as such:
 * LRUCache* obj = new LRUCache(capacity);
 * int param_1 = obj->get(key);
 * obj->put(key,value);
 */

4.2 The skip list implementation in Redis

4.2.1 Why use a skip list

Here is the developer's own explanation of why he chose a skip list:

There are a few reasons:

1) They are not very memory intensive. It's up to you basically. Changing parameters about the probability of a node to have a given number of levels will make then less memory intensive than btrees.

2) A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kind of balanced trees.

3) They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.

About the Append Only durability & speed, I don't think it is a good idea to optimize Redis at cost of more code and more complexity for a use case that IMHO should be rare for the Redis target (fsync() at every command). Almost no one is using this feature even with ACID SQL databases, as the performance hint is big anyway.

About threads: our experience shows that Redis is mostly I/O bound. I'm using threads to serve things from Virtual Memory. The long term solution to exploit all the cores, assuming your link is so fast that you can saturate a single core, is running multiple instances of Redis (no locks, almost fully scalable linearly with number of cores), and using the "Redis Cluster" solution that I plan to develop in the future.

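4.2.2 The skip list in Redis

To meet its own functional needs, Redis made the following changes to the skip list described in William Pugh's paper:

Duplicate score values are allowed: several different members can share the same score.
Comparisons check the member as well as the score: once scores can repeat, the score alone cannot identify an element, so the member field has to be checked too.
Every node carries a backward pointer that is one level high and is used to iterate from the tail towards the head: commands that process a sorted set in reverse order, such as ZREVRANGE or ZREVRANGEBYSCORE, rely on it.

This modified skip list is defined by the redis.h/zskiplist structure: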
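The score-then-member comparison can be sketched as below. This is an illustration in C++ rather than Redis's C code: Element, precedes and sameElement are hypothetical names, and the member is assumed to be a plain std::string compared lexicographically.

```cpp
#include <string>

// Hypothetical element type: a score plus the member it belongs to.
struct Element {
    double score;
    std::string member;
};

// Ordering used while walking the list: score first, and when the
// scores are equal, fall back to comparing the members.
inline bool precedes(const Element& a, const Element& b) {
    if (a.score != b.score) return a.score < b.score;
    return a.member < b.member;
}

// Identity check: with duplicate scores allowed, two elements are the
// same only if both the score and the member match.
inline bool sameElement(const Element& a, const Element& b) {
    return a.score == b.score && a.member == b.member;
}
```

Because the member breaks ties, elements with equal scores still have a single well-defined position in the list.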

typedef struct zskiplist {

    // head node, tail node
    struct zskiplistNode *header, *tail;

    // number of nodes
    unsigned long length;

    // highest level currently held by any node in the list
    int level;

} zskiplist;

Skip list nodes are defined by redis.h/zskiplistNode:

typedef struct zskiplistNode {

    // member object
    robj *obj;

    // score
    double score;

    // backward pointer
    struct zskiplistNode *backward;

    // levels
    struct zskiplistLevel {

        // forward pointer
        struct zskiplistNode *forward;

        // number of nodes this level's pointer spans
        unsigned int span;

    } level[];

} zskiplistNode;
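The span field is what allows rank queries without walking the bottom level: adding up the spans of the forward pointers followed on the way to a node gives its 1-based position. A minimal C++ sketch of that idea, assuming a simplified node that carries only a score (Node, link and rankOf are hypothetical names, and this is not Redis's own code):

```cpp
#include <vector>

// Hypothetical simplified node: a score plus per-level forward/span pairs.
struct Node {
    struct Level {
        Node* forward = nullptr;
        unsigned span = 0;  // how many bottom-level steps this pointer covers
    };
    double score;
    std::vector<Level> level;
    Node(double s, int height) : score(s), level(height) {}
};

// Helper for building small lists by hand.
inline void link(Node& from, int lvl, Node& to, unsigned span) {
    from.level[lvl].forward = &to;
    from.level[lvl].span = span;
}

// Returns the 1-based rank of the node with the given score, or 0 if
// no such node exists. Scores are assumed to be distinct here.
inline unsigned long rankOf(const Node* header, int topLevel, double score) {
    unsigned long rank = 0;
    const Node* x = header;
    for (int i = topLevel - 1; i >= 0; --i) {
        while (x->level[i].forward && x->level[i].forward->score <= score) {
            rank += x->level[i].span;  // account for every node skipped
            x = x->level[i].forward;
        }
        if (x != header && x->score == score) return rank;
    }
    return 0;
}
```

For a list holding scores 10, 20, 30, 40 where 20 and 40 also appear on the second level, looking up 30 follows one span-2 pointer and one span-1 pointer, so the accumulated rank is 3.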

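The following APIs operate on these two data structures; the table lists what each one does and its complexity:

Function	Purpose	Complexity
`zslCreateNode`	Create and return a new skip list node	Worst case O(1)
`zslFreeNode`	Free the given skip list node	Worst case O(1)
`zslCreate`	Create and initialize a new skip list	Worst case O(1)
`zslFree`	Free the given skip list	Worst case O(N)
`zslInsert`	Add a new node with the given `score` and `member` to the skip list	Worst case O(N), average O(log N)
`zslDeleteNode`	Delete the given skip list node	Worst case O(N)
`zslDelete`	Delete the element matching the given `member` and `score`	Worst case O(N), average O(log N)
`zslFirstInRange`	Find the first element in the skip list that falls within the given range	Worst case O(N), average O(log N)
`zslLastInRange`	Find the last element in the skip list that falls within the given range	Worst case O(N), average O(log N)
`zslDeleteRangeByScore`	Delete all nodes whose `score` lies within the given range	Worst case O(N^2)
`zslDeleteRangeByRank`	Delete all nodes within the given rank range	Worst case O(N^2)
`zslGetRank`	Return the rank of the target element in the sorted set	Worst case O(N), average O(log N)
`zslGetElementByRank`	Return the element node at the given rank	Worst case O(N), average O(log N)

In Redis, the only role of the skip list is to implement the sorted set data type.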
