Redis source code study -- data structures: dictionary design and implementation

Most recent recommended article published on 2024-07-18 17:57:16

Carson_zhong

Category column: data structures hash. Article tags: data structures

Article link: https://blog.csdn.net/dmgy614262711/article/details/107299976

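Redis's dictionary is defined in dict.h and is implemented as a hash table, so some hash table basics come first.
A hash table is usually backed by an array: the key is hashed to an array index, and the entry is stored at that index. Several keys may map to the same index, which is a collision. Redis resolves collisions by separate chaining, so each slot holds a linked list of the entries that hash to it. The ratio of stored entries to the allocated array size is the load factor. The larger the load factor, the more likely keys are to collide, and the array should be expanded; the smaller it is, the more memory is wasted, and the array can be shrunk. Resizing the array in this way is called rehash. Because the index is computed from both the hash value and the array size, the same key generally maps to a different index before and after a rehash.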
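
As a small illustration of the two quantities just described, the sketch below computes a bucket index and a load factor. The helper names `bucket_index` and `load_factor` are made up for this example and are not part of dict.h; the sketch also assumes the array size is a power of two, so that `size - 1` can be used as a bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map a hash value to a bucket index.
 * Assumes `size` is a power of two, so `size - 1` works as a bit mask. */
unsigned long bucket_index(uint64_t hash, unsigned long size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return (unsigned long)(hash & (size - 1));
}

/* Hypothetical helper: load factor = stored entries / number of buckets. */
double load_factor(unsigned long used, unsigned long size) {
    assert(size != 0);
    return (double)used / (double)size;
}
```

For example, a key whose hash is 13 lands in bucket 1 of a 4-bucket array but in bucket 5 once the array has 8 buckets, which is why entries have to be re-placed during a rehash.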

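We look at the data structures from the bottom up, starting with the building blocks of the hash table.
Below is the linked-list node of the hash table. The union lets an entry store its value flexibly: as a pointer, an unsigned or signed 64-bit integer, or a double.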

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;
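
To make the role of the union concrete, here is a minimal sketch that stores an integer directly in the entry, with no separate allocation for the value. The struct is repeated only so that the snippet compiles on its own, and the `entry_*` helpers are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Repeated from the listing above so the snippet is self-contained. */
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

/* Hypothetical helpers: each one writes or reads a single union member. */
void entry_set_s64(dictEntry *e, int64_t n) { assert(e != NULL); e->v.s64 = n; }
int64_t entry_get_s64(const dictEntry *e)   { assert(e != NULL); return e->v.s64; }
void entry_set_ptr(dictEntry *e, void *p)   { assert(e != NULL); e->v.val = p; }
void *entry_get_ptr(const dictEntry *e)     { assert(e != NULL); return e->v.val; }
```

The members share the same storage, so the caller has to remember which member was written last and read back that same member.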

Below is the hash table data structure:

typedef struct dictht {
    dictEntry **table;
    unsigned long size; // total number of buckets in the array
    unsigned long sizemask; // index mask, always size - 1 (the largest valid index)
    unsigned long used; // number of entries currently stored
} dictht;
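
For `sizemask` to address every bucket, `size` has to be a power of two. The sketch below shows one way to round a requested capacity up to such a size. The function name `next_power_of_two` and the starting size of 4 are assumptions made for this example, not values taken from the listing above.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_INITIAL_SIZE 4UL /* assumed smallest table size for this sketch */

/* Round n up to a power of two, never going below SKETCH_INITIAL_SIZE. */
unsigned long next_power_of_two(unsigned long n) {
    unsigned long size = SKETCH_INITIAL_SIZE;

    /* Guard against overflow when doubling. */
    if (n >= (unsigned long)LONG_MAX) return (unsigned long)LONG_MAX + 1UL;
    while (size < n) size *= 2;
    assert((size & (size - 1)) == 0);
    return size;
}
```

With a size of 128, for instance, the mask is 127, and `hash & 127` always yields an index between 0 and 127.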

Next is a set of function pointers that allow each dictionary to be customized:

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key); // computes the hash value (the index is hash & sizemask)
    void *(*keyDup)(void *privdata, const void *key); // copies a key
    void *(*valDup)(void *privdata, const void *obj); // copies a value
    int (*keyCompare)(void *privdata, const void *key1, const void *key2); // compares two keys; non-zero means equal
    void (*keyDestructor)(void *privdata, void *key); // frees a key
    void (*valDestructor)(void *privdata, void *obj); // frees a value
} dictType;
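
As an illustration of how such a table of function pointers might be filled in, here is a sketch of a type for NUL-terminated string keys. The struct is repeated so the snippet compiles on its own. The `str_*` functions and `string_key_type` are hypothetical names, and the djb2-style hash is only a simple stand-in, not the hash function Redis itself uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Repeated from the listing above so the snippet is self-contained. */
typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

/* Stand-in hash (djb2 style), chosen only for brevity. */
uint64_t str_hash(const void *key) {
    const unsigned char *p = key;
    uint64_t h = 5381;
    assert(key != NULL);
    while (*p) h = h * 33 + *p++;
    return h;
}

/* Copy the key so the dictionary owns its own string; NULL on failure. */
void *str_dup(void *privdata, const void *key) {
    size_t len = strlen(key) + 1;
    char *copy = malloc(len);
    (void)privdata;
    if (copy == NULL) return NULL;
    memcpy(copy, key, len);
    return copy;
}

/* Non-zero means the keys are equal. */
int str_compare(void *privdata, const void *key1, const void *key2) {
    (void)privdata;
    return strcmp(key1, key2) == 0;
}

void str_free(void *privdata, void *key) {
    (void)privdata;
    free(key);
}

/* Values are stored as-is here, so valDup and valDestructor stay NULL. */
dictType string_key_type = {
    str_hash, str_dup, NULL, str_compare, str_free, NULL
};
```

A dictionary holding a different kind of key would supply a different set of functions, while the code that walks the table stays the same.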

The dictionary data structure:

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    unsigned long iterators; /* number of iterators currently running */
} dict;

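Two dictht tables are used in order to support rehash. When no rehash is in progress, rehashidx == -1; otherwise it is the index of the bucket in ht[0] that is currently being migrated. Rehashing moves the entries from ht[0] to ht[1]. Once everything has been moved, the old ht[0] array is freed, ht[1] takes its place as ht[0], and ht[1] is reset to empty, ready for the next rehash.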
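
The migration can be sketched on a stripped-down structure. Everything below (`node`, `table`, `mini_dict`, `mini_rehash_start`, `mini_rehash_step`) is a hypothetical miniature written for this example, not the Redis implementation: keys are plain integers that serve as their own hash, and each step skips any number of empty buckets.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node { unsigned long key; struct node *next; } node;
typedef struct table { node **buckets; unsigned long size, used; } table;
typedef struct mini_dict { table ht[2]; long rehashidx; } mini_dict;

/* Begin a rehash: allocate ht[1] with new_size buckets (a power of two). */
int mini_rehash_start(mini_dict *d, unsigned long new_size) {
    assert(d->rehashidx == -1);
    d->ht[1].buckets = calloc(new_size, sizeof(node *));
    if (d->ht[1].buckets == NULL) return 0;
    d->ht[1].size = new_size;
    d->ht[1].used = 0;
    d->rehashidx = 0;
    return 1;
}

/* Move one non-empty bucket from ht[0] to ht[1].
 * Returns 1 if more work remains, 0 once the rehash is finished. */
int mini_rehash_step(mini_dict *d) {
    if (d->rehashidx == -1) return 0;

    /* Skip empty buckets. */
    while ((unsigned long)d->rehashidx < d->ht[0].size &&
           d->ht[0].buckets[d->rehashidx] == NULL)
        d->rehashidx++;

    if ((unsigned long)d->rehashidx < d->ht[0].size) {
        node *n = d->ht[0].buckets[d->rehashidx];
        while (n) {
            node *next = n->next;
            unsigned long idx = n->key & (d->ht[1].size - 1); /* new index */
            n->next = d->ht[1].buckets[idx];                  /* head insert */
            d->ht[1].buckets[idx] = n;
            d->ht[0].used--;
            d->ht[1].used++;
            n = next;
        }
        d->ht[0].buckets[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    if (d->ht[0].used == 0) {
        /* Done: free the old array, promote ht[1], and reset ht[1]. */
        free(d->ht[0].buckets);
        d->ht[0] = d->ht[1];
        d->ht[1].buckets = NULL;
        d->ht[1].size = 0;
        d->ht[1].used = 0;
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

Calling `mini_rehash_step` once per dictionary operation spreads the cost of the migration, at the price of having to consult both tables while `rehashidx` is not -1.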

Next, we walk through the code by following the main function.
Create: allocate memory, then initialize the fields; an ordinary flow.

/* Create a new hash table */
dict *dictCreate(dictType *type, void *privDataPtr)
{
    dict *d = zmalloc(sizeof(*d));

    _dictInit(d,type,privDataPtr);
    return d;
}

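Add: creating the entry and assigning its value are separate steps. If the key already exists, the add fails. Keys and values can be assigned with the methods registered when the dictionary was created. If the index collides on insertion, the new entry is inserted at the head of the list.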

/* Add an element to the target hash table */
int dictAdd(dict *d, void *key, void *val)
{
    // Create the entry; dictAddRaw returns NULL if the key already exists
    dictEntry *entry = dictAddRaw(d,key,NULL);

    if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val); // store the value in the new entry
    return DICT_OK;
}
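
The two-step add with head insertion can be sketched on a single bucket. The `entry` type and the functions `bucket_add_raw` and `bucket_add` are hypothetical and simplified (string keys, int values); they only mirror the shape of the flow described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct entry {
    const char *key;
    int val;
    struct entry *next;
} entry;

/* Step 1: create and link the entry.
 * Returns NULL if the key is already present or allocation fails. */
entry *bucket_add_raw(entry **bucket, const char *key) {
    assert(bucket != NULL && key != NULL);
    for (entry *e = *bucket; e; e = e->next)
        if (strcmp(e->key, key) == 0) return NULL; /* duplicate key */

    entry *fresh = malloc(sizeof(*fresh));
    if (fresh == NULL) return NULL;
    fresh->key = key;
    fresh->val = 0;
    fresh->next = *bucket; /* insert at the head of the chain */
    *bucket = fresh;
    return fresh;
}

/* Step 2: assign the value. Returns 0 on success, -1 on failure. */
int bucket_add(entry **bucket, const char *key, int val) {
    entry *e = bucket_add_raw(bucket, key);
    if (e == NULL) return -1;
    e->val = val;
    return 0;
}
```

Inserting at the head takes constant time and needs no tail pointer; the cost of the add is dominated by the duplicate check that walks the chain.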

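Rehash: the operation can be given a time budget so that a single call does not run for too long. How rehash itself is implemented is not covered here.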

/* Rehash for an amount of time between ms milliseconds and ms+1 milliseconds */
int dictRehashMilliseconds(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;

    while(dictRehash(d,100)) {
        rehashes += 100;
        if (timeInMilliseconds()-start > ms) break;
    }
    return rehashes;
}
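
The same pattern, a fixed-size batch of work followed by a clock check, can be written generically. In the sketch below the clock and the work are passed in as function pointers so that the loop can be exercised without real time passing. All names (`run_for_ms`, `fake_now_ms`, `countdown_batch`) are hypothetical, and the last two exist only as stand-ins for a real clock and real work.

```c
#include <assert.h>
#include <stddef.h>

typedef long long (*now_ms_fn)(void);
typedef int (*batch_fn)(void *ctx, int n); /* returns 1 if work remains */

/* Run batches of 100 units until the work is done or the budget is spent.
 * As in the listing above, a batch that finishes the work is not counted. */
int run_for_ms(batch_fn do_batch, void *ctx, now_ms_fn now_ms, int ms) {
    assert(do_batch != NULL && now_ms != NULL);
    long long start = now_ms();
    int done = 0;

    while (do_batch(ctx, 100)) {
        done += 100;
        if (now_ms() - start > ms) break;
    }
    return done;
}

/* Stand-in clock: advances by one millisecond every time it is read. */
long long fake_time = 0;
long long fake_now_ms(void) { return fake_time++; }

/* Stand-in work: ctx points to an int holding the remaining units. */
int countdown_batch(void *ctx, int n) {
    int *remaining = ctx;
    *remaining -= n;
    if (*remaining <= 0) { *remaining = 0; return 0; }
    return 1;
}
```

Because the clock is read only once per batch, the call can overrun the budget by up to the duration of one batch.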

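Find: the logic is simple. First check whether the dictionary is empty, then compute the index and walk the list to search. While a rehash is in progress an entry may live in either table, so both ht[0] and ht[1] are searched. Note that the operation also performs a rehash step. The aim is to spread the rehash across individual operations: when the table holds a large number of entries, doing the rehash in one go would take too long, so it is done in steps.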


dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    uint64_t h, idx, table;

    if (d->ht[0].used + d->ht[1].used == 0) return NULL; /* dict is empty */
    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key);
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key))
                return he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}

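Delete: the deletion logic is shown below and is simple. After computing the index, walk the list; once the entry is found, it is an ordinary linked-list removal. A rehash step is performed here as well. Note that when nofree is 0 the entry has already been freed by the time it is returned, so the return value is only good for a NULL check.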

static dictEntry *dictGenericDelete(dict *d, const void *key, int nofree) {
    uint64_t h, idx;
    dictEntry *he, *prevHe;
    int table;

    if (d->ht[0].used == 0 && d->ht[1].used == 0) return NULL;

    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key);

    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        prevHe = NULL;
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key)) {
                /* Unlink the element from the list */
                if (prevHe)
                    prevHe->next = he->next;
                else
                    d->ht[table].table[idx] = he->next;
                if (!nofree) {
                    dictFreeKey(d, he);
                    dictFreeVal(d, he);
                    zfree(he);
                }
                d->ht[table].used--;
                return he;
            }
            prevHe = he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;
    }
    return NULL; /* not found */
}
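
The unlink step can also be written without a separate `prevHe` variable by walking a pointer to the link itself. This is an alternative formulation, not the one used in the listing above, and it reports success with an int instead of returning the entry. The `item` type and both functions are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct item { int key; struct item *next; } item;

/* Helper for building a chain: push a new node at the head.
 * Returns 1 on success, 0 if allocation fails. */
int chain_push(item **head, int key) {
    assert(head != NULL);
    item *n = malloc(sizeof(*n));
    if (n == NULL) return 0;
    n->key = key;
    n->next = *head;
    *head = n;
    return 1;
}

/* Unlink and free the first node with the given key.
 * `link` always points at the pointer that leads to the current node,
 * so removing the head and removing a middle node are the same code path.
 * Returns 1 if a node was removed, 0 if the key was not found. */
int chain_delete(item **head, int key) {
    assert(head != NULL);
    for (item **link = head; *link; link = &(*link)->next) {
        if ((*link)->key == key) {
            item *victim = *link;
            *link = victim->next;
            free(victim);
            return 1;
        }
    }
    return 0;
}
```

Compared with tracking a previous node, this removes the special case for the head of the chain at the cost of one extra level of indirection.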

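Summary:
1. Redis's dict is implemented as a hash table. The dictType structure makes it easy to customize the operations of each particular dict; in effect it uses function pointers in C to get the kind of polymorphism that C++ provides through virtual functions.
2. Rehash work is spread across individual operations, and a rehash with a bounded running time is also provided.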
