Redis Source Code Analysis Notes 4 - Redis Data Types - The Dictionary

bobkentblog

Category: store system. Tags: source code, redis, dictionary source implementation, hash table, rehash

Article link: https://blog.csdn.net/bobkentblog/article/details/46053313

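Introduction
This section covers the dictionary, the third of Redis's main data types.
Outline of this article:

Where Redis uses dictionaries;
The definition of the dictionary class;
The meaning of each attribute of the dictionary class;
The main API of the dictionary class and what it does;
The code that implements the dictionary class.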

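Where Redis uses dictionaries
The book "Redis Design and Implementation" (《redis的设计与实现》) already describes very clearly where Redis uses dictionaries. For readers who do not have the book at hand, here is that subsection:
[Figure: the corresponding subsection of the book]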

Definition of the dictionary
Let's look at the code in src/dict.h:

/*
 * Dictionary
 */
typedef struct dict {

    // Type-specific functions
    dictType *type;

    // Private data
    void *privdata;

    // Hash tables
    dictht ht[2];

    // Rehash index
    // Equal to -1 when no rehash is in progress
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */

    // Number of safe iterators currently running
    int iterators; /* number of iterators currently running */

} dict;

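If we think of the dictionary above as a class in the OO sense, it has 4 attributes and 6 methods (the methods being the six function pointers reached through type). Let's go through them one by one:
dictType *type

The type-specific functions of the dictionary. They let the same operations behave differently depending on the dictionary's type. The structure is defined as follows: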

/*
 * Dictionary type-specific functions
 */
typedef struct dictType {

    // Computes the hash value of a key
    unsigned int (*hashFunction)(const void *key);

    // Duplicates a key
    void *(*keyDup)(void *privdata, const void *key);

    // Duplicates a value
    void *(*valDup)(void *privdata, const void *obj);

    // Compares two keys
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);

    // Destroys a key
    void (*keyDestructor)(void *privdata, void *key);

    // Destroys a value
    void (*valDestructor)(void *privdata, void *obj);

} dictType;

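As the definition above shows, all six methods of the dictionary class are public, so each can be specialized (polymorphism) for the key and value types a given dictionary stores.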
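To make the function-pointer polymorphism concrete, here is a minimal sketch. It is not Redis code: miniType, strHash, strKeyCompare, stringKeyType and miniKeysEqual are hypothetical names invented for this note, the struct keeps only two of the six callbacks, and the djb2-style hash is just a stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down analogue of dictType: hash and compare only. */
typedef struct miniType {
    unsigned int (*hashFunction)(const void *key);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
} miniType;

/* djb2-style string hash, chosen only for illustration. */
static unsigned int strHash(const void *key) {
    const unsigned char *p = key;
    unsigned int h = 5381;
    while (*p) h = h * 33 + *p++;
    return h;
}

/* Returns non-zero when the two C strings are equal. */
static int strKeyCompare(void *privdata, const void *key1, const void *key2) {
    (void)privdata; /* unused by this particular type */
    return strcmp(key1, key2) == 0;
}

/* One concrete "type": keys are NUL-terminated C strings. */
static miniType stringKeyType = { strHash, strKeyCompare };

/* Generic code never looks at the keys itself; it only calls through the type. */
static int miniKeysEqual(miniType *type, void *privdata,
                         const void *a, const void *b) {
    assert(type != NULL && type->keyCompare != NULL);
    return type->keyCompare(privdata, a, b);
}
```

A dictionary holding a different kind of key would supply another miniType instance, and miniKeysEqual would not need to change.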
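void *privdata

Private data of the dictionary, used to support polymorphism: it is the first argument passed to the type-specific functions. It is analyzed in more detail below.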

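dictht ht[2]

ht is an array of hash tables; each dictionary holds two of them. One is used for regular dictionary operations and the other for rehash. Let's look at the definition of the hash table first, and then discuss the role the two tables play in the dictionary.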

/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, from the old to the new table. */
typedef struct dictht {

    // Array of hash table buckets
    dictEntry **table;

    // Size of the hash table
    unsigned long size;

    // Size mask, used to compute the bucket index
    // Always equal to size - 1
    unsigned long sizemask;

    // Number of nodes currently stored in this hash table
    unsigned long used;

} dictht;

The hash table has four attributes:
Attribute table

An array of pointers. Each element points to a hash table node. Let's look at the definition of the hash table node first:

/*
 * Hash table node
 */
typedef struct dictEntry {

    // Key
    void *key;

    // Value
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;

    // Points to the next hash table node, forming a linked list
    struct dictEntry *next;

} dictEntry;

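The hash table nodes are where the key-value pairs are actually stored. The key attribute holds the key and the v attribute holds the value. The value can take one of three forms: a pointer to the value, a uint64_t integer, or an int64_t integer.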
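The point of the union is that a small integer can live directly inside the node, with no separate allocation for the value. A minimal sketch, assuming a hypothetical miniEntry that mirrors dictEntry (the accessor names are made up for this note):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the dictEntry layout, for illustration only. */
typedef struct miniEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    struct miniEntry *next;
} miniEntry;

/* Store a signed integer directly in the node; nothing is allocated. */
static void miniSetSigned(miniEntry *e, int64_t n) {
    assert(e != NULL);
    e->v.s64 = n;
}

/* Read it back; the caller must know which union member was written. */
static int64_t miniGetSigned(const miniEntry *e) {
    assert(e != NULL);
    return e->v.s64;
}
```

The node does not record which member is active, so whoever reads the value has to use the same member that was written.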
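Attribute size

The size attribute records the size of the hash table, that is, the number of buckets in table.

Attribute sizemask

The sizemask attribute is always equal to size - 1 and is used to compute the bucket index for a hash value.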
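A short sketch of how such a mask turns a hash value into a bucket index. It assumes the table size is a power of two, which is what makes `hash & (size - 1)` equivalent to `hash % size`; bucketIndex is a hypothetical helper, not a Redis function.

```c
#include <assert.h>

/* Bucket index for a hash value, assuming size is a non-zero power of two. */
static unsigned long bucketIndex(unsigned int hash, unsigned long size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    unsigned long sizemask = size - 1; /* e.g. size 8 -> mask 0b111 */
    return hash & sizemask;            /* keeps only the low bits */
}
```

For example, with size 8 the hash value 13 (binary 1101) lands in bucket 5 (binary 101).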

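Attribute used

The number of nodes currently stored in the hash table. Note that this is not the number of buckets in use; it is the number of key-value pairs actually held in the table.

int rehashidx

This variable records how far rehash has progressed through dict->ht[0]: buckets with a smaller index have already been migrated (rehash proceeds from the first bucket to the last). It is mainly used for progressive rehash (described in a later section). When it is -1, no rehash is in progress.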
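Summary:

The description above may still feel a little confusing, so here is a data member diagram of the dictionary class:
[Figure: data member diagram of the dictionary]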

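Implementation of the dictionary

In this part we analyze the dictionary's API functions in detail.
First, the commonly used dictionary API:
[Figure: table of commonly used dictionary API functions]
Creating a new dictionary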

/*
 * Dictionary operation status
 */
// Operation succeeded
#define DICT_OK 0
// Operation failed (or an error occurred)
#define DICT_ERR 1

/* Reset a hash table already initialized with ht_init().
 * NOTE: This function should only be called by ht_destroy(). */
/*
 * Resets (or initializes) the attributes of the given hash table.
 *
 * Note: the English comment above is outdated.
 *
 * T = O(1)
 */
static void _dictReset(dictht *ht)
{
    ht->table = NULL;
    ht->size = 0;
    ht->sizemask = 0;
    ht->used = 0;
}
/* Create a new hash table */
/*
 * Creates a new dictionary.
 *
 * T = O(1)
 */
dict *dictCreate(dictType *type,
        void *privDataPtr)
{
    dict *d = zmalloc(sizeof(*d));

    _dictInit(d,type,privDataPtr);

    return d;
}

/* Initialize the hash table */
/*
 * T = O(1)
 */
int _dictInit(dict *d, dictType *type,
        void *privDataPtr)
{
    // Initialize the attributes of both hash tables,
    // but do not allocate memory for the bucket arrays yet
    _dictReset(&d->ht[0]);
    _dictReset(&d->ht[1]);

    // Set the type-specific functions
    d->type = type;

    // Set the private data
    d->privdata = privDataPtr;

    // Mark the dictionary as not rehashing
    d->rehashidx = -1;

    // Set the number of safe iterators
    d->iterators = 0;

    return DICT_OK;
}

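There is nothing unusual about initializing a dictionary. It comes down to the following steps:

Allocate memory for the dictionary;
Initialize the two hash tables (without allocating their bucket arrays);
Set the dictionary's type-specific functions and private data;
Set the dictionary's rehash state to -1 (the role of rehash is discussed in detail later);
Set the safe iterator count to 0.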

Trying to add a given key-value pair to the dictionary

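/* Add an element to the target hash table */
/*
 * Tries to add the given key-value pair to the dictionary.
 *
 * The add succeeds only if the key is not already in the dictionary.
 *
 * Returns DICT_OK on success and DICT_ERR on failure.
 *
 * Worst case T = O(N), amortized O(1)
 */
int dictAdd(dict *d, void *key, void *val)
{
    // Try to add the key to the dictionary and get back the new hash node that holds it
    // T = O(N)
    dictEntry *entry = dictAddRaw(d,key);

    // The key already exists, so the add fails
    if (!entry) return DICT_ERR;

    // The key did not exist, so set the node's value
    // T = O(1)
    dictSetVal(d, entry, val);

    // Added successfully
    return DICT_OK;
}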

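As the code above shows, adding a key-value pair is split into two steps:

dictAddRaw: if the key does not exist, creates a node for it and returns that node (returns NULL if the key already exists)
dictSetVal: sets the value

The code comments are clear enough that I won't repeat them. Let's look at the implementation of dictAddRaw first: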

/* Low level add. This function adds the entry but instead of setting
 * a value returns the dictEntry structure to the user, that will make
 * sure to fill the value field as he wishes.
 *
 * This function is also directly exposed to user API to be called
 * mainly in order to store non-pointers inside the hash value, example:
 *
 * entry = dictAddRaw(dict,mykey);
 * if (entry != NULL) dictSetSignedIntegerVal(entry,1000);
 *
 * Return values:
 *
 * If key already exists NULL is returned.
 * If key was added, the hash entry is returned to be manipulated by the caller.
 *
 * T = O(N)
 */
dictEntry *dictAddRaw(dict *d, void *key)
{
    int index;
    dictEntry *entry;
    dictht *ht;

    // If conditions allow, perform a single rehash step
    // T = O(1)
    if (dictIsRehashing(d)) _dictRehashStep(d);

    /* Get the index of the new element, or -1 if
     * the element already exists. */
    // T = O(N)
    if ((index = _dictKeyIndex(d, key)) == -1)
        return NULL;

    // T = O(1)
    /* Allocate the memory and store the new entry */
    // If the dictionary is rehashing, add the new key to hash table 1;
    // otherwise add it to hash table 0
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    // Allocate space for the new node
    entry = zmalloc(sizeof(*entry));
    // Insert the new node at the head of the bucket's linked list
    entry->next = ht->table[index];
    ht->table[index] = entry;
    // Update the number of nodes stored in the hash table
    ht->used++;

    /* Set the hash entry fields. */
    // T = O(1)
    dictSetKey(d, entry, key);

    return entry;
}

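The English comments here are quite clear.
In summary, the function does the following:
1. If conditions allow, perform a single rehash step;
2. Check whether the key already exists and compute the key's index in the hash table;
3. If the dictionary is rehashing, add the new key to hash table 1; otherwise add it to hash table 0;
4. Allocate space for the new node and insert it at the head of the bucket's linked list. It goes at the head because the dictionary keeps no pointer to the tail of a bucket's list, so a new colliding node is placed at the front, which takes O(1);
5. Update the number of nodes stored in the hash table;
6. Set the key of the new node.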
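The duplicate check and the head insertion in the steps above can be sketched with a tiny fixed-size table. This is a simplification for illustration, not the Redis implementation: there is no rehash, the table has a hard-coded 4 buckets, and miniNode, miniTable, miniHash, miniAddRaw and miniAdd are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MINI_SIZE 4 /* power of two, so MINI_SIZE - 1 works as a mask */

typedef struct miniNode {
    const char *key;
    long val;
    struct miniNode *next;
} miniNode;

typedef struct miniTable {
    miniNode *table[MINI_SIZE];
    unsigned long used;
} miniTable;

/* djb2-style string hash, chosen only for illustration. */
static unsigned int miniHash(const char *s) {
    unsigned int h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* In the spirit of dictAddRaw: returns NULL if the key already exists,
 * otherwise links a new node at the head of its bucket and returns it.
 * (This sketch also returns NULL if malloc fails.) */
static miniNode *miniAddRaw(miniTable *t, const char *key) {
    assert(t != NULL && key != NULL);
    unsigned int idx = miniHash(key) & (MINI_SIZE - 1);
    for (miniNode *n = t->table[idx]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0) return NULL; /* duplicate key */
    miniNode *e = malloc(sizeof(*e));
    if (e == NULL) return NULL;
    e->key = key;
    e->val = 0;
    e->next = t->table[idx]; /* old head becomes the second node */
    t->table[idx] = e;       /* new node becomes the head: O(1) */
    t->used++;
    return e;
}

/* In the spirit of dictAdd: 0 on success, 1 if the key was not added. */
static int miniAdd(miniTable *t, const char *key, long val) {
    miniNode *e = miniAddRaw(t, key);
    if (e == NULL) return 1;
    e->val = val;
    return 0;
}
```

With this hash and 4 buckets, the keys "a" and "e" fall into the same bucket, so adding "a" and then "e" leaves "e" at the head of that chain with "a" behind it.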

Now let's take a closer look at the single-step rehash from step 1:

/* This function performs just a step of rehashing, and only if there are
 * no safe iterators bound to our hash table. When we have iterators in the
 * middle of a rehashing we can't mess with the two hash tables otherwise
 * some element can be missed or duplicated.
 *
 * This function is called by common lookup or update operations in the
 * dictionary so that the hash table automatically migrates from H1 to H2
 * while it is actively used.
 *
 * T = O(1)
 */
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}
/* Performs N steps of incremental rehashing. Returns 1 if there are still
 * keys to move from the old to the new hash table, otherwise 0 is returned.
 *
 * Note that a rehashing step consists in moving a bucket (that may have more
 * than one key as we use chaining) from the old to the new hash table.
 *
 * T = O(N)
 */
int dictRehash(dict *d, int n) {

    // Only runs while a rehash is in progress
    if (!dictIsRehashing(d)) return 0;

    // Perform N migration steps
    // T = O(N)
    while(n--) {
        dictEntry *de, *nextde;

        /* Check if we already rehashed the whole table... */
        // If hash table 0 is empty, the rehash is complete
        // T = O(1)
        if (d->ht[0].used == 0) {
            // Free the bucket array of hash table 0
            zfree(d->ht[0].table);
            // Make the old hash table 1 the new hash table 0
            d->ht[0] = d->ht[1];
            // Reset the old hash table 1
            _dictReset(&d->ht[1]);
            // Clear the rehash flag
            d->rehashidx = -1;
            // Return 0 to tell the caller that the rehash has finished
            return 0;
        }

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned)d->rehashidx);

        // Skip empty buckets until the next non-empty one is found
        while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;

        // Point to the head node of this bucket's linked list
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        // T = O(1)
        while(de) {
            unsigned int h;

            // Save the pointer to the next node
            nextde = de->next;

            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;

            // Insert the node at the head of its bucket in the new hash table
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;

            // Update the counters
            d->ht[0].used--;
            d->ht[1].used++;

            // Move on to the next node
            de = nextde;
        }
        // Clear the bucket that has just been migrated
        d->ht[0].table[d->rehashidx] = NULL;
        // Advance the rehash index
        d->rehashidx++;
    }

    return 1;
}

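"Redis Design and Implementation" already covers further rehash details and strategies very clearly, so I won't go into them here.

The other API functions are not particularly special either. The one point I want to stress again is this:
Redis dictionaries use progressive rehash to spread the rehash work across the add, delete, update, and lookup operations. No single operation has to pay for migrating the whole table, so each one stays close to O(1). This is a technique worth learning from.
For the detailed steps of rehash, please refer to "Redis Design and Implementation".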
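To see the amortization at work, here is a stripped-down sketch of one rehash step over two tables. It follows the shape of dictRehash(d, 1) but is not the Redis code: keys are plain unsigned integers used directly as their own hash, error handling is minimal, and every name (rNode, rTable, rDict, rInit, rInsert, rStartRehash, rRehashStep) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct rNode { unsigned int key; struct rNode *next; } rNode;
typedef struct rTable { rNode **table; unsigned long size, used; } rTable;
typedef struct rDict {
    rTable ht[2];
    long rehashidx; /* -1 when no rehash is in progress */
} rDict;

/* Set up ht[0] with `size` buckets (power of two). Returns 0 on success. */
static int rInit(rDict *d, unsigned long size) {
    d->ht[0].table = calloc(size, sizeof(rNode *));
    if (d->ht[0].table == NULL) return 1;
    d->ht[0].size = size; d->ht[0].used = 0;
    d->ht[1].table = NULL; d->ht[1].size = 0; d->ht[1].used = 0;
    d->rehashidx = -1;
    return 0;
}

/* Insert into ht[0]; only meant for building the initial table here. */
static int rInsert(rDict *d, unsigned int key) {
    rNode *n = malloc(sizeof(*n));
    if (n == NULL) return 1;
    unsigned long h = key & (d->ht[0].size - 1);
    n->key = key;
    n->next = d->ht[0].table[h];
    d->ht[0].table[h] = n;
    d->ht[0].used++;
    return 0;
}

/* Allocate ht[1] and mark the rehash as started. Returns 0 on success. */
static int rStartRehash(rDict *d, unsigned long newsize) {
    assert(d->rehashidx == -1);
    assert(newsize != 0 && (newsize & (newsize - 1)) == 0);
    d->ht[1].table = calloc(newsize, sizeof(rNode *));
    if (d->ht[1].table == NULL) return 1;
    d->ht[1].size = newsize; d->ht[1].used = 0;
    d->rehashidx = 0;
    return 0;
}

/* Migrate one non-empty bucket. Returns 1 if work remains, 0 when done. */
static int rRehashStep(rDict *d) {
    if (d->rehashidx == -1) return 0;
    if (d->ht[0].used == 0) {           /* everything moved: swap tables */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        d->ht[1].table = NULL; d->ht[1].size = 0; d->ht[1].used = 0;
        d->rehashidx = -1;
        return 0;
    }
    /* used != 0 guarantees a non-empty bucket at or after rehashidx */
    while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
    rNode *n = d->ht[0].table[d->rehashidx];
    while (n != NULL) {                 /* move the whole chain */
        rNode *next = n->next;
        unsigned long h = n->key & (d->ht[1].size - 1);
        n->next = d->ht[1].table[h];
        d->ht[1].table[h] = n;
        d->ht[0].used--;
        d->ht[1].used++;
        n = next;
    }
    d->ht[0].table[d->rehashidx] = NULL;
    d->rehashidx++;
    return 1;
}
```

Starting from 4 keys in a 2-bucket table and rehashing into 4 buckets, each call moves one bucket (two keys here), and one further call swaps the tables. A caller that runs one step per add or lookup therefore never pays for the whole migration at once.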

That's all for this note. Comments and corrections are welcome.
