Redis Data Structures, Part 3: Dict (Dictionary)

Musclewl

Last modified: 2023-09-22 22:18:51

First published: 2023-09-22 22:17:10

Article link: https://blog.csdn.net/musclewl/article/details/132892184

Data Structure: Dict

Dict is the key-value data structure in Redis. It is made up of three parts: the hash table (dictht), the hash entry (dictEntry), and the dictionary (dict).

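Hash Entry (dictEntry)

// Hash entry (dictEntry)
typedef struct dictEntry {
    // The key, analogous to a map key; may point to any type
    void *key;
    // C union: the members share storage, so only one of them holds a valid value at a time
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    // Next entry in the same bucket's linked list
    struct dictEntry *next;
} dictEntry;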
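
To make the union concrete, here is a minimal sketch. DemoEntry and demo_set_int are hypothetical names used only for illustration, not Redis identifiers. The point is that an integer value can live inside the entry itself, so no separate allocation is needed for it.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical copy of the dictEntry layout, for illustration only. */
typedef struct DemoEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct DemoEntry *next;
} DemoEntry;

/* Store a signed integer directly in the union; nothing is allocated.
 * Writing v.s64 overwrites whatever v.val, v.u64, or v.d held before. */
static void demo_set_int(DemoEntry *e, int64_t n) {
    e->v.s64 = n;
}
```

After demo_set_int(&e, -42), reading e.v.s64 gives -42, while e.v.val no longer holds a meaningful pointer, because all four members occupy the same bytes.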

Hash Table (dictht)

// Hash table (dictht)
typedef struct dictht {
    // dictEntry *table would be a pointer to a single dictEntry;
    // dictEntry **table points to an array of dictEntry pointers,
    // one per bucket, each the head of that bucket's linked list
    dictEntry **table;
    // Number of buckets: always a power of two (2^n), minimum 4
    unsigned long size;
    // Bucket index mask, sizemask = 2^n - 1. Because size is 2^n, hash(key) & sizemask == hash(key) % size, so the & replaces a modulo by size
    unsigned long sizemask;
    // Number of dictEntry nodes stored. The dict is an array of linked lists, so used can exceed size
    unsigned long used;
} dictht;
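
The mask trick in the sizemask comment can be checked with a small sketch. The two helper names below are hypothetical; they only compare the bitmask index with the modulo index.

```c
#include <stdint.h>

/* Bucket index via bitmask. Equals hash % size only when size is a power of two. */
static unsigned long bucket_by_mask(uint64_t hash, unsigned long size) {
    return (unsigned long)(hash & (size - 1));
}

/* Bucket index via modulo, valid for any non-zero size. */
static unsigned long bucket_by_mod(uint64_t hash, unsigned long size) {
    return (unsigned long)(hash % size);
}
```

With size = 8, a hash of 13 lands in bucket 5 either way. With size = 6, which is not a power of two, the mask gives 5 but the modulo gives 1, which is why the table size must stay at 2^n.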

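Dictionary (dict)

// Dictionary (dict)
typedef struct dict {
    // Type-specific callbacks (hash function, key comparison, and so on), so different kinds of dict can share one implementation
    dictType *type;
    // Private data passed through to the dictType callbacks
    void *privdata;
    // An array of two dictht tables: ht[0] normally holds the data, ht[1] is the target table during a rehash
    dictht ht[2];
    // Rehash progress: -1 means no rehash is in progress; a value >= 0 means a rehash is running and is the next ht[0] bucket to migrate
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    // Rehash pause counter: > 0 means rehashing is paused, 0 means it may proceed, < 0 indicates a coding error
    int16_t pauserehash; /* If >0 rehashing is paused (<0 indicates coding error) */
} dict;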

Relationship Among the Three

[Figure: relationship among dict, dictht, and dictEntry (image not included)]

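Dict Expansion

As with HashMap in Java, a dict must expand when its hash table (dictht) becomes too small.

Expansion is driven by the load factor: loadFactor = used (number of dictEntry nodes) / size (length of the dictht bucket array).

Load factor >= 1 and the server is not running a child process such as bgsave or bgrewriteaof
Load factor > 5: expansion is forced

Source walkthrough:
Start with _dictExpandIfNeeded and look closely at the expansion conditions.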

static int _dictExpandIfNeeded(dict *d) {
    // 1 dictIsRehashing(d) is ((d)->rehashidx != -1): if a rehash is already in progress, return DICT_OK without expanding
    if (dictIsRehashing(d)) return DICT_OK;
    // 2 If ht[0] is uninitialized, dictExpand creates it with the default size of 4
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);
    /**
     * 3 Expansion conditions
     * 3.1 d->ht[0].used >= d->ht[0].size: loadFactor = used/size >= 1
     * 3.2 dict_can_resize: the server is not running a background child process such as bgsave or bgrewriteaof
     * 3.3 d->ht[0].used/d->ht[0].size > dict_force_resize_ratio (5): loadFactor > 5.
     *     This is integer division, so it only holds once used >= 6 * size
     * 3.4 dictTypeExpandAllowed: the dict's type does not veto the expansion
     *     (for example, because the new table would need too much memory)
     */
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio) && dictTypeExpandAllowed(d)) {
        // 4 Request used + 1; the real size is the first 2^n that is >= used + 1
        return dictExpand(d, d->ht[0].used + 1);
    }
    return DICT_OK;
}

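From this function, the concrete expansion conditions for a Redis dict are:

Load factor >= 1 && dict_can_resize (the server is not running a background child process such as bgsave or bgrewriteaof)
Load factor >= 1 && load factor > 5, even when dict_can_resize is 0

In both cases dictTypeExpandAllowed must also return true, meaning the dict's type does not veto the expansion.

When these conditions are met, the dict is expanded. The requested size is used + 1, which is rounded up to the first 2^n that is >= used + 1.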
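
The decision can be tried out in isolation with the sketch below. It is a simplified stand-in, not the Redis function: should_expand is a hypothetical name, can_resize plays the role of dict_can_resize, and the dictTypeExpandAllowed check is deliberately left out.

```c
#include <stdbool.h>

#define FORCE_RESIZE_RATIO 5 /* mirrors dict_force_resize_ratio from the excerpt */

/* Simplified expansion test. used = stored entries, size = bucket count. */
static bool should_expand(unsigned long used, unsigned long size, bool can_resize) {
    /* An uninitialized table is always created first. */
    if (size == 0) return true;
    /* used / size is integer division, exactly as in the original condition. */
    return used >= size && (can_resize || used / size > FORCE_RESIZE_RATIO);
}
```

With size = 4 and can_resize false, used = 23 does not trigger expansion because 23 / 4 is 5 in integer arithmetic. The forced path only fires from used = 24, that is, 6 * size.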

Next comes the expansion function itself, _dictExpand:

int _dictExpand(dict *d, unsigned long size, int* malloc_failed) {
    if (malloc_failed) *malloc_failed = 0;
    // 1.1 dictIsRehashing(d) is ((d)->rehashidx != -1): if a rehash is in progress, return DICT_ERR
    // 1.2 ht[0].used > size: the requested size cannot hold the entries already stored, so it is invalid; return DICT_ERR
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;
    dictht n; // The new hash table; it will be assigned to ht[1] (or to ht[0] on first initialization)
    // 1.3 Sizes must be 2^n, so _dictNextPower rounds the request up to the first 2^n >= size, minimum 4. Expansion passes used + 1, shrinking passes used
    unsigned long realsize = _dictNextPower(size);
    // 1.4 Overflow check: return DICT_ERR if rounding up wrapped around, or if the allocation size in bytes would overflow
    if (realsize < size || realsize * sizeof(dictEntry*) < realsize)
        return DICT_ERR;
    // 1.5 If the computed size equals the current size, resizing is pointless; return DICT_ERR
    if (realsize == d->ht[0].size) return DICT_ERR;
    /* Allocate the new hash table and initialize all pointers to NULL */
    n.size = realsize;
    n.sizemask = realsize-1;
    if (malloc_failed) {
        n.table = ztrycalloc(realsize*sizeof(dictEntry*));
        *malloc_failed = n.table == NULL;
        if (*malloc_failed)
            return DICT_ERR;
    } else
        // 2 Allocate the bucket array (dictEntry pointers): realsize * sizeof(dictEntry*) bytes, zero-initialized
        n.table = zcalloc(realsize*sizeof(dictEntry*));

    n.used = 0;
    // 3 First initialization: ht[0].table is NULL, so assign the new dictht n to ht[0]
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }
    // 4 Otherwise assign n to ht[1] and set rehashidx to 0; rehashidx > -1 means a rehash is in progress
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}

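The expansion steps are:

1. Sanity checks, such as whether a rehash is already in progress
2. Compute the actual target size with _dictNextPower
3. Sanity-check the computed realsize again
4. Allocate the bucket array (an array of dictEntry pointers), realsize * sizeof(dictEntry*) bytes
5. If ht[0].table is NULL, this is the first initialization and n is assigned to ht[0]. Otherwise a rehash is starting: n is assigned to ht[1] and rehashidx is set to 0.

Note: this is only the resize logic. It allocates memory for ht[1] and sets rehashidx to 0. Rehashing the existing entries and migrating them happens in the rehash logic.

The _dictNextPower function below computes the actual size: the first 2^n that is >= size.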

static unsigned long _dictNextPower(unsigned long size) {
    unsigned long i = DICT_HT_INITIAL_SIZE;
    if (size >= LONG_MAX) return LONG_MAX + 1LU;
    // Find the first 2^n that is >= size
    while(1) {
        if (i >= size)
            return i;
        i *= 2;
    }
}

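Dict Shrinking

Where there is expansion there is also shrinking: after an element is deleted, Redis checks whether the dict should shrink.

Start from the code in t_hash.c that deletes an element from a dict (the hashTypeDelete function). After dictDelete succeeds, it checks whether the dict needs to be resized.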

// t_hash.c # hashTypeDelete
// After deleting an element, check whether the hash table needs a resize
...
if (dictDelete((dict*)o->ptr, field) == C_OK) {
    deleted = 1;
    /* Always check if the dictionary needs a resize after a delete. */
    // After the delete, check whether the dict needs to be resized
    if (htNeedsResize(o->ptr)) dictResize(o->ptr);
}

A shrink is needed when the table has more than 4 buckets and loadFactor = used/size < 0.1, that is, the load factor is below 0.1.

int htNeedsResize(dict *dict) {
    long long size, used;
    // Number of buckets in the hash table (dictht)
    size = dictSlots(dict);
    // Number of hash entries (dictEntry) stored
    used = dictSize(dict);
    // Resize when size > 4 and loadFactor = used/size < 0.1
    return (size > DICT_HT_INITIAL_SIZE &&
            (used*100/size < HASHTABLE_MIN_FILL));
}
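
The shrink test is easy to evaluate by hand with a standalone sketch. needs_shrink is a hypothetical name, and MIN_FILL = 10 is an assumption about the value of HASHTABLE_MIN_FILL (a percentage), chosen to match the "load factor < 0.1" rule stated above.

```c
#include <stdbool.h>

#define MIN_SLOTS 4  /* mirrors DICT_HT_INITIAL_SIZE */
#define MIN_FILL 10  /* assumed value of HASHTABLE_MIN_FILL, in percent */

/* True when the table has more than 4 buckets and is less than 10% full.
 * used * 100 / size is integer arithmetic, as in htNeedsResize. */
static bool needs_shrink(long long size, long long used) {
    return size > MIN_SLOTS && (used * 100 / size < MIN_FILL);
}
```

For a table with 128 buckets, 12 entries give 1200 / 128 = 9, so a shrink is requested, while 13 entries give 1300 / 128 = 10 and the table is left alone. A table that is already at the minimum of 4 buckets never shrinks.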

The shrink logic is shown below. It ends up calling the same function used for expansion, dictExpand, just with a different size.

int dictResize(dict *d) {
    unsigned long minimal;
    // dict_can_resize == 0 means the server is running bgsave or bgrewriteaof: return DICT_ERR
    // Also return DICT_ERR if a rehash is in progress
    if (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;
    minimal = d->ht[0].used;
    // If used < 4, use 4 instead
    if (minimal < DICT_HT_INITIAL_SIZE)
        minimal = DICT_HT_INITIAL_SIZE;
    // Resize to minimal, which is rounded up to the first 2^n that is >= minimal
    return dictExpand(d, minimal);
}

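Summary:
Expansion and shrinking both end up calling dictExpand. Only the size passed in differs.

Expansion passes size = used + 1
Shrinking passes size = used (at least 4)
_dictNextPower then computes the actual size, realsize, as the first 2^n that is >= size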
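
As a worked example of the summary, the sketch below reproduces the rounding rule and applies it to both cases. next_power follows the logic of _dictNextPower shown earlier; expand_target and shrink_target are hypothetical helpers that only compute the size each path would end up with.

```c
#include <limits.h>

#define HT_INITIAL_SIZE 4UL /* mirrors DICT_HT_INITIAL_SIZE */

/* First power of two >= size, never smaller than 4. */
static unsigned long next_power(unsigned long size) {
    unsigned long i = HT_INITIAL_SIZE;
    if (size >= LONG_MAX) return LONG_MAX + 1LU;
    while (i < size) i *= 2;
    return i;
}

/* Expansion requests used + 1. */
static unsigned long expand_target(unsigned long used) {
    return next_power(used + 1);
}

/* Shrinking requests used, but never less than 4. */
static unsigned long shrink_target(unsigned long used) {
    return next_power(used < HT_INITIAL_SIZE ? HT_INITIAL_SIZE : used);
}
```

A full table of 8 buckets holding 8 entries expands to 16, because the request is 9. A table holding 5 entries shrinks to 8, and one holding 100 entries shrinks to 128.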

Dict Rehash

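This is the logic that rehashes the entries and migrates the data.

A dict is not rehashed in one pass. If the dict holds millions of entries, migrating them all at once would occupy the CPU for a long time and block the main thread. Rehashing is therefore done incrementally, spread over many operations, which is why it is called progressive rehash.

The steps are:

1. _dictExpand is called for an expansion or a shrink, which allocates ht[1] and sets rehashidx to 0
2. Every add, delete, update, or lookup checks whether rehashidx > -1. If it is, a rehash is in progress, and the linked list at dict.ht[0].table[rehashidx++] is rehashed and moved into ht[1].table. The rehashidx++ advances the index by one bucket so that the next operation continues the rehash from there
3. Once every entry in dict.ht[0].table has been moved to dict.ht[1].table, dict.ht[1] is assigned to dict.ht[0] and dict.ht[1] is reset to an empty hash table
4. Finally, rehashidx is set to -1, marking the end of the rehash.

Note: during a rehash, new entries are inserted only into ht[1].table, while deletes, updates, and lookups must check ht[0].table and then ht[1].table in turn, because the key may be in either table.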
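
The steps above can be sketched with a toy dictionary. This is an illustrative simplification, not the Redis implementation: MiniDict and every mini_* function are hypothetical names, keys are plain integers used directly as their own hash, duplicate keys are not checked, and allocation of the bucket arrays is assumed to succeed.

```c
#include <stdlib.h>

typedef struct Entry { unsigned long key; int val; struct Entry *next; } Entry;
typedef struct Table { Entry **buckets; unsigned long size, used; } Table;
typedef struct MiniDict { Table ht[2]; long rehashidx; } MiniDict;

static void table_init(Table *t, unsigned long size) {
    t->buckets = size ? calloc(size, sizeof(Entry *)) : NULL;
    t->size = size;
    t->used = 0;
}

static void mini_init(MiniDict *d, unsigned long size) {
    table_init(&d->ht[0], size);
    table_init(&d->ht[1], 0);
    d->rehashidx = -1;
}

/* Step 1: allocate ht[1] (new_size must be a power of two) and start the rehash. */
static void mini_start_rehash(MiniDict *d, unsigned long new_size) {
    table_init(&d->ht[1], new_size);
    d->rehashidx = 0;
}

/* Step 2: move one non-empty bucket from ht[0] to ht[1].
 * Steps 3 and 4: when ht[0] is drained, promote ht[1] and reset rehashidx. */
static void mini_rehash_step(MiniDict *d) {
    if (d->rehashidx == -1) return;
    /* Skip empty buckets; used > 0 guarantees a non-empty one lies ahead. */
    while (d->ht[0].used > 0 && d->ht[0].buckets[d->rehashidx] == NULL)
        d->rehashidx++;
    if (d->ht[0].used > 0) {
        Entry *e = d->ht[0].buckets[d->rehashidx];
        while (e) {
            Entry *next = e->next;
            unsigned long idx = e->key & (d->ht[1].size - 1);
            e->next = d->ht[1].buckets[idx];
            d->ht[1].buckets[idx] = e;
            d->ht[0].used--;
            d->ht[1].used++;
            e = next;
        }
        d->ht[0].buckets[d->rehashidx++] = NULL;
    }
    if (d->ht[0].used == 0) {
        free(d->ht[0].buckets);
        d->ht[0] = d->ht[1];
        table_init(&d->ht[1], 0);
        d->rehashidx = -1;
    }
}

/* Inserts go to ht[1] while a rehash is running, otherwise to ht[0]. */
static int mini_add(MiniDict *d, unsigned long key, int val) {
    mini_rehash_step(d);
    Table *t = (d->rehashidx != -1) ? &d->ht[1] : &d->ht[0];
    Entry *e = malloc(sizeof(*e));
    if (e == NULL) return -1;
    unsigned long idx = key & (t->size - 1);
    e->key = key;
    e->val = val;
    e->next = t->buckets[idx];
    t->buckets[idx] = e;
    t->used++;
    return 0;
}

/* Lookups check ht[0] first and, only during a rehash, ht[1] as well. */
static Entry *mini_find(MiniDict *d, unsigned long key) {
    mini_rehash_step(d);
    for (int i = 0; i < 2; i++) {
        Table *t = &d->ht[i];
        if (t->size) {
            for (Entry *e = t->buckets[key & (t->size - 1)]; e; e = e->next)
                if (e->key == key) return e;
        }
        if (d->rehashidx == -1) break;
    }
    return NULL;
}
```

In this sketch each add or lookup migrates one bucket, so a 4-bucket table is fully moved after four operations, at which point ht[0] is the new table and rehashidx is back to -1.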