Redis rehash

xuefeng0707

Article link: https://blog.csdn.net/xuefeng0707/article/details/80413146

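Redis as a whole stores data as key-value (KV) pairs. The value can be one of several types: string, hash, list, set, and zset.

The KV store is backed by a data structure similar to a HashMap: an array plus linked lists.

Ideally, all keys are spread evenly across the slots of the array, so that every linked list stays as short as possible.

Define a load factor, loadFactor, as the number of KV pairs divided by the array length.

When loadFactor > 1, at least one slot must hold a linked list longer than 1 (by the pigeonhole principle). A lookup in a linked list takes O(n) time, where n is the list length.

Once loadFactor exceeds a certain threshold, the linked lists tend to be long and lookups slow down. The array must then be enlarged and the stored KV pairs hashed again into the new, larger array. This is rehashing.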
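The load factor check described above can be sketched in a few lines of C. This is only an illustration: the function names, the `max_load` parameter, and the choice to grow to a power of two are assumptions made here, not taken from the Redis source.

```c
#include <assert.h>
#include <stddef.h>

/* Load factor = number of stored KV pairs / number of array slots. */
double load_factor(size_t used, size_t size) {
    assert(size > 0);
    return (double)used / (double)size;
}

/* Hypothetical policy: expand once the load factor exceeds max_load. */
int needs_expand(size_t used, size_t size, double max_load) {
    return load_factor(used, size) > max_load;
}

/* Smallest power of two >= n. With a power-of-two size, a slot can be
 * picked with (hash & (size - 1)) instead of a modulo (an assumption
 * of this sketch). */
size_t next_power_of_two(size_t n) {
    size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}
```

For example, 6 pairs in 4 slots give a load factor of 1.5, so with `max_load` set to 1.0 the table would be expanded.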

Source code:

struct redisServer {
...
    redisDb *db;
...
};

typedef struct redisDb {
    dict *dict;                 /* The keyspace for this DB */
...
} redisDb;

typedef struct dict {
...
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
...
} dict;

typedef struct dictht {
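    dictEntry **table;      /* bucket array; each slot heads a chain of entries */
    unsigned long size;     /* number of slots in table */
    unsigned long sizemask; /* size - 1; ANDed with a hash to pick a slot */
    unsigned long used;     /* number of entries stored in this table */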
} dictht;

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

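Relationships:

A redisServer can hold multiple redisDb instances (16 by default), but usually only one of them, database 0, is used.

Each redisDb has one dict, which holds the KV pairs of the whole keyspace. If a value is of hash type and uses the hash-table encoding, that value is itself backed by its own dict.

Each dict has two dictht tables. Normally only ht[0] is used; ht[1] serves as the destination only while a rehash is running. When the rehash finishes, the new table takes over as ht[0] and the dict goes back to using a single table.

dictht.table is the array of the KV storage structure. Each slot holds a dictEntry, and following dictEntry->next from it forms the linked list.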
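To make the array-plus-linked-list layout concrete, here is a minimal sketch of insertion and lookup in a single table. The `entry` and `table` types are simplified stand-ins for dictEntry and dictht, and `toy_hash`, `table_insert`, and `table_find` are hypothetical names introduced for illustration; Redis uses its own hash functions and key types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for dictEntry / dictht (string keys, int values). */
typedef struct entry {
    const char *key;
    int val;
    struct entry *next;      /* next entry in the same slot's chain */
} entry;

typedef struct table {
    entry **buckets;         /* the array; each slot heads a chain */
    unsigned long size;      /* number of slots, assumed a power of two */
    unsigned long sizemask;  /* size - 1 */
    unsigned long used;      /* number of entries stored */
} table;

/* Toy string hash (djb2), used here only for illustration. */
unsigned long toy_hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Push e onto the head of the chain in its slot. */
void table_insert(table *t, entry *e) {
    assert(t->sizemask == t->size - 1);
    unsigned long idx = toy_hash(e->key) & t->sizemask;
    e->next = t->buckets[idx];
    t->buckets[idx] = e;
    t->used++;
}

/* Pick the slot from the hash, then walk its chain: O(chain length). */
entry *table_find(const table *t, const char *key) {
    unsigned long idx = toy_hash(key) & t->sizemask;
    for (entry *e = t->buckets[idx]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0) return e;
    return NULL;
}
```

Inserting at the head of the chain keeps insertion O(1), which is the same choice the dictRehash() code later in this article makes when it moves entries.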

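In general, a rehash is started when a new key is set.

Because Redis executes commands on a single thread, rehashing a large number of keys in one step would block every other request for too long. The rehash is therefore done incrementally.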
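As a rough sketch of what a single incremental step might look like, the function below moves exactly one non-empty slot from the old table to the new one and then returns. The types and the name `toy_rehash_step` are hypothetical. Entries carry a precomputed hash, the caller owns the bucket arrays (nothing is freed), and, unlike the real dictRehash(), there is no limit on how many empty slots are skipped.

```c
#include <assert.h>
#include <stddef.h>

typedef struct entry { unsigned long hash; struct entry *next; } entry;
typedef struct table { entry **buckets; unsigned long size, used; } table;
typedef struct toy_dict { table ht[2]; long rehashidx; } toy_dict;

/* Move one non-empty slot from ht[0] to ht[1].
 * Returns 1 if more slots remain, 0 if no rehash is in progress or it
 * has just finished. Table sizes are assumed to be powers of two. */
int toy_rehash_step(toy_dict *d) {
    if (d->rehashidx == -1) return 0;            /* no rehash in progress */

    if (d->ht[0].used > 0) {
        assert((unsigned long)d->rehashidx < d->ht[0].size);
        /* Slots before rehashidx are already empty, and used > 0, so a
         * non-empty slot exists at or after rehashidx. */
        while (d->ht[0].buckets[d->rehashidx] == NULL) d->rehashidx++;

        entry *e = d->ht[0].buckets[d->rehashidx];
        while (e != NULL) {
            entry *next = e->next;
            unsigned long idx = e->hash & (d->ht[1].size - 1);
            e->next = d->ht[1].buckets[idx];     /* insert at chain head */
            d->ht[1].buckets[idx] = e;
            d->ht[0].used--;
            d->ht[1].used++;
            e = next;
        }
        d->ht[0].buckets[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    if (d->ht[0].used == 0) {                    /* everything has moved */
        d->ht[0] = d->ht[1];
        d->ht[1] = (table){NULL, 0, 0};
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

Each call does a small, bounded amount of work per chain, so the cost of the full rehash is spread over many calls instead of blocking the single thread once.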

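Incremental rehash takes two forms:

1) An extra rehash step on each Redis operation

When Redis performs a lookup, insert, delete, or similar operation, it additionally moves the linked list at table[dict->rehashidx] to the new dictht, then increments rehashidx so that it points at the next slot.

2) Rehash invoked by a background timed task

Call chain of the background rehash:

serverCron() - databasesCron() - incrementallyRehash() - dictRehashMilliseconds() - dictRehash()

Rehash call frequency: serverCron() runs 10 times per second by default, as configured by server.hz.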
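The background path has to bound how long each invocation may run. A minimal sketch of such a time-boxed loop follows, assuming a callback with the same return convention as dictRehash() (1 while work remains, 0 when done). The names `rehash_fn`, `now_ms`, `rehash_for_ms`, and `countdown_step` are hypothetical, the batch size of 100 is chosen for illustration, and `clock()` measures CPU time, which is only an approximation of elapsed time.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback: rehash up to n slots, return 1 if more remain. */
typedef int (*rehash_fn)(void *dict, int n);

/* Milliseconds of CPU time used so far (approximation of elapsed time). */
long long now_ms(void) {
    return (long long)clock() * 1000 / CLOCKS_PER_SEC;
}

/* Rehash in batches of 100 slots until the work is done or roughly
 * ms milliseconds have passed. Returns the number of slots requested. */
int rehash_for_ms(void *dict, rehash_fn step, int ms) {
    assert(ms >= 0);
    long long start = now_ms();
    int rehashes = 0;
    while (step(dict, 100)) {
        rehashes += 100;
        if (now_ms() - start > ms) break;
    }
    return rehashes;
}

/* Demo stand-in for a real rehash step: dict points to an int that
 * counts the remaining batches. */
int countdown_step(void *dict, int n) {
    (void)n;
    int *remaining = dict;
    if (*remaining > 0) (*remaining)--;
    return *remaining > 0;
}
```

Checking the clock only once per batch keeps the overhead of the time check small compared with the rehash work itself.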

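rehash source code:

// n: the maximum number of slots whose chains are rehashed in this call
int dictRehash(dict *d, int n) {
    // After visiting 10*n empty slots in total, end this step and leave the rest to the next call.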
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;

    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        // Skip empty slots
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        // Found a non-empty slot: move every node of its chain to the new ht
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            unsigned int h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            // Insert at the head of the chain
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            // Decrement the old ht's entry count and increment the new ht's
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        // Clear the slot in the old array
        d->ht[0].table[d->rehashidx] = NULL;
        // Advance to the next slot of the array
        d->rehashidx++;
    }

    /* Check if we already rehashed the whole table... */
    // If the old ht is now empty, free its array, move the new ht into ht[0], and reset ht[1]; the dict keeps using ht[0]
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;
    }

    /* More to rehash... */
    return 1;
}

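While a rehash is in progress, two array-plus-linked-list structures exist side by side. Lookups, deletes, and updates therefore operate on the old ht first and, if the key is not found there, on the new ht. New entries are inserted directly into the new ht.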
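The two-table rule can be sketched as follows. This is a simplified illustration, not the Redis implementation: the types and the names `toy_find` and `toy_add` are hypothetical, keys are plain integers used directly as their own hash, and table sizes are assumed to be powers of two.

```c
#include <assert.h>
#include <stddef.h>

typedef struct entry { unsigned long key; int val; struct entry *next; } entry;
typedef struct table { entry **buckets; unsigned long size, used; } table;
typedef struct toy_dict { table ht[2]; long rehashidx; } toy_dict;

/* Search ht[0] first; consult ht[1] only while a rehash is in progress. */
entry *toy_find(toy_dict *d, unsigned long key) {
    for (int i = 0; i <= 1; i++) {
        table *t = &d->ht[i];
        if (t->size > 0) {
            for (entry *e = t->buckets[key & (t->size - 1)]; e; e = e->next)
                if (e->key == key) return e;
        }
        if (d->rehashidx == -1) break;   /* not rehashing: ht[1] is unused */
    }
    return NULL;
}

/* New entries go to ht[1] during a rehash, otherwise to ht[0]. */
void toy_add(toy_dict *d, entry *e) {
    table *t = (d->rehashidx != -1) ? &d->ht[1] : &d->ht[0];
    assert(t->size > 0);
    unsigned long idx = e->key & (t->size - 1);
    e->next = t->buckets[idx];           /* insert at chain head */
    t->buckets[idx] = e;
    t->used++;
}
```

Sending new entries only to the new table means the old table can only shrink, so the rehash is guaranteed to finish.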
