Redis Source Code Analysis (2): A Hash Table Whose Rehash Never Blocks

License: CC 4.0 BY-SA

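This article takes a close look at the hash table implementation in Redis and its distinctive rehash mechanism. By reading the Redis source code, it shows how rehashing is spread across ordinary operations inside a single-threaded architecture, so that it never blocks and performance stays high.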

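Redis's architecture is quite clever: it drops the mainstream multithreaded design and, unusually, runs on a single thread. Honestly, for a key-value store I first assumed that concurrent multithreaded access was the default choice, but Redis's efficiency shows in practice that it clearly is not. The single-threaded event system deserves a post of its own; today the focus is this interesting hash table.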

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int iterators; /* number of iterators currently running */
} dict;

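This is the data structure Redis uses to store a hash table. The actual hash tables are the dictht members: ht[0] is one hash table and ht[1] is another. Keeping two tables serves one purpose: rehashing, and specifically rehashing that does not block.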
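For reference, here is a minimal sketch of what each dictht holds. It assumes the layout found in dict.h of Redis versions from the same era as the code in this post; the toy_ names and the toy_bucket_index helper are hypothetical and exist only for illustration. The sketch also assumes the table size is always a power of two, so that sizemask = size - 1 turns a hash into a bucket index with a single AND, the same `h & sizemask` pattern that dictFind uses.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins, not the real Redis definitions. */
typedef struct toy_entry {
    void *key;
    void *val;
    struct toy_entry *next;   /* chaining: entries in one bucket form a list */
} toy_entry;

typedef struct toy_dictht {
    toy_entry **table;        /* array of bucket heads */
    unsigned long size;       /* number of buckets, assumed a power of two */
    unsigned long sizemask;   /* size - 1, masks a hash into a bucket index */
    unsigned long used;       /* number of entries currently stored */
} toy_dictht;

/* Map a hash value to a bucket index in the given table. */
static unsigned long toy_bucket_index(const toy_dictht *ht, unsigned long hash) {
    /* The mask trick only works when size is a non-zero power of two. */
    assert(ht->size != 0 && (ht->size & (ht->size - 1)) == 0);
    return hash & ht->sizemask;
}
```

The same hash lands in a different bucket once the table grows (for example, a hash of 13 masks to one index with 4 buckets and to another with 8), which is why every entry has to be re-indexed during a rehash.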
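Rehashing is the most expensive operation on a hash table. Being single-threaded, Redis does not spawn another thread for it; inserts, deletes, updates, lookups, and the rehash itself all run on the same thread. So how does the rehash avoid getting in the way of everything else?
Let's pick any hash table operation, say the lookup function: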

dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    unsigned int h, idx, table;

    if (d->ht[0].size == 0) return NULL; /* We don't have a table at all */
    if (dictIsRehashing(d)) _dictRehashStep(d); /* NOTE: this is the line to look at */
    h = dictHashKey(d, key);
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while(he) {
            if (dictCompareHashKeys(d, key, he->key))
                return he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}

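If you read my previous post, you have already seen this function. Again there is no need to read the whole thing; just look at the line I marked. Its meaning is plain: is this hash table in the middle of a rehash? If so, call _dictRehashStep (the leading _ makes it pretend to be a private function). So what does that function do?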

static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}

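The dictRehash call inside is where the rehash actually happens, so let's step straight in. Note that the step is skipped while any iterator is running (iterators != 0).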

int dictRehash(dict *d, int n) {
    if (!dictIsRehashing(d)) return 0;

    while(n--) {
        dictEntry *de, *nextde;

        /* Check if we already rehashed the whole table... */
        if (d->ht[0].used == 0) {
            _dictFree(d->ht[0].table);
            d->ht[0] = d->ht[1];
            _dictReset(&d->ht[1]);
            d->rehashidx = -1;
            return 0;
        }

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT. In short: find the bucket we are due to move, empty it, and stop; each step moves just one bucket. */
        while(de) {
            unsigned int h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    return 1;
}

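The comment I added in the code above should stand out (code is still harder to read than plain language). This function finds the first bucket in the table that still needs moving, rehashes every entry in that bucket one by one, and moves them into the second table, that is, from ht[0] to ht[1] in the dict. Once ht[0] is empty, ht[1] simply takes over as ht[0] and the rehash is done.
Now that we know what this mover function does, which code paths call it?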
dictAdd
dictFind
dictGenericDelete
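Insert, delete, update, and lookup all use it (an update goes through these same paths). So amid the constant stream of online operations the rehash completes without anyone noticing: an O(n) operation becomes amortized O(1) per operation, and of course it never blocks.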
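To make the idea concrete, here is a small self-contained sketch of incremental rehashing. It is a toy under stated assumptions, not the Redis implementation: keys are plain unsigned long values used directly as their own hash, the table starts growing when the load factor reaches 1, and all toy_ names are hypothetical. What it keeps from the code above is the shape: two tables, a rehashidx cursor, one bucket moved per operation, and lookups that check both tables while a rehash is in progress.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node { unsigned long key; struct node *next; } node;
typedef struct table { node **buckets; unsigned long size, used; } table;
typedef struct toy_dict { table ht[2]; long rehashidx; } toy_dict;

static void table_init(table *t, unsigned long size) {
    t->buckets = size ? calloc(size, sizeof(node *)) : NULL;
    assert(size == 0 || t->buckets != NULL);
    t->size = size;
    t->used = 0;
}

static void toy_init(toy_dict *d, unsigned long size) {
    table_init(&d->ht[0], size);   /* size must be a power of two */
    table_init(&d->ht[1], 0);
    d->rehashidx = -1;             /* -1 means no rehash in progress */
}

/* Do one unit of rehash work: move a single bucket from ht[0] to ht[1]. */
static void toy_rehash_step(toy_dict *d) {
    if (d->rehashidx == -1) return;
    if (d->ht[0].used == 0) {      /* everything moved: promote ht[1] */
        free(d->ht[0].buckets);
        d->ht[0] = d->ht[1];
        table_init(&d->ht[1], 0);
        d->rehashidx = -1;
        return;
    }
    /* used != 0 guarantees a non-empty bucket at or after rehashidx. */
    while (d->ht[0].buckets[d->rehashidx] == NULL) d->rehashidx++;
    node *n = d->ht[0].buckets[d->rehashidx];
    while (n) {
        node *next = n->next;
        unsigned long idx = n->key & (d->ht[1].size - 1);
        n->next = d->ht[1].buckets[idx];
        d->ht[1].buckets[idx] = n;
        d->ht[0].used--;
        d->ht[1].used++;
        n = next;
    }
    d->ht[0].buckets[d->rehashidx++] = NULL;
}

static void toy_add(toy_dict *d, unsigned long key) {
    toy_rehash_step(d);
    /* Assumed growth policy: double the table once load factor reaches 1. */
    if (d->rehashidx == -1 && d->ht[0].used >= d->ht[0].size) {
        table_init(&d->ht[1], d->ht[0].size * 2);
        d->rehashidx = 0;
    }
    /* While rehashing, new entries go to ht[1] so ht[0] only shrinks. */
    table *t = (d->rehashidx != -1) ? &d->ht[1] : &d->ht[0];
    node *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->key = key;
    unsigned long idx = key & (t->size - 1);
    n->next = t->buckets[idx];
    t->buckets[idx] = n;
    t->used++;
}

/* Returns 1 if key is present, 0 otherwise. */
static int toy_find(toy_dict *d, unsigned long key) {
    if (d->ht[0].size == 0) return 0;
    toy_rehash_step(d);
    for (int i = 0; i <= 1; i++) {
        table *t = &d->ht[i];
        for (node *n = t->buckets[key & (t->size - 1)]; n; n = n->next)
            if (n->key == key) return 1;
        if (d->rehashidx == -1) break;   /* ht[1] is only live mid-rehash */
    }
    return 0;
}
```

A caller would run toy_init(&d, 4) and then mix toy_add and toy_find freely. No single call ever walks the whole table; each one pays for at most one bucket of migration on top of its own work.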
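Redis is an online service, and its data structures are designed around that fact. Spreading one big operation over many small ones to lower the amortized cost is not a rare idea: segment trees with lazy propagation, splay trees, and STL's vector, whose complexity is also stated in amortized terms. It feels a bit like cheating, but it is very practical.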
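As a quick illustration of the amortized argument, the sketch below counts how many element copies a doubling dynamic array performs over n appends. It is a simplified model, assuming the array starts with capacity 1 and doubles whenever it is full; the function name copies_for_appends is hypothetical, and no real vector implementation is being quoted.

```c
#include <assert.h>
#include <stddef.h>

/* Count the element copies caused by n appends to an array that
 * starts at capacity 1 and doubles its capacity whenever it is full. */
static size_t copies_for_appends(size_t n) {
    size_t cap = 1, len = 0, copies = 0;
    for (size_t i = 0; i < n; i++) {
        if (len == cap) {   /* full: reallocate and copy every element */
            copies += len;
            cap *= 2;
        }
        len++;              /* the append itself */
    }
    return copies;
}
```

The total number of copies stays below 2n, so even though an individual append occasionally costs O(n), the average cost per append is constant. The Redis hash table goes one step further by also bounding the cost of each individual operation.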
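Next time I'll cover Redis's event system, which on one hand makes Redis extremely efficient and on the other removes a lot of coding complexity. It is another elegant design.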