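Dict, the hash table, is one of the most fundamental data structures in Redis. Its interface and implementation live in dict.h and dict.c under the src directory. This article walks through those two files to build a better understanding of how Redis implements hashing. The figure below shows how the data structures behind Dict are organized:
Before reading the source, a few words on the overall design. Redis resolves hash collisions by chaining: each bucket holds a linked list of entries. Every dictionary owns two hash tables in order to support incremental rehashing. Redis moves the entries of table 0 into table 1 a bucket at a time until table 0 is empty, then assigns table 1 to table 0 and resets table 1. None of this is visible through the external interface of Dict, and an element added to the dictionary is stored in only one of the two tables. The implementation follows.
I. Basic data structures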
typedef struct dictEntry {
void *key;
void *val;
struct dictEntry *next;
} dictEntry;
typedef struct dictType {
unsigned int (*hashFunction)(const void *key);
void *(*keyDup)(void *privdata, const void *key);
void *(*valDup)(void *privdata, const void *obj);
int (*keyCompare)(void *privdata, const void *key1, const void *key2);
void (*keyDestructor)(void *privdata, void *key);
void (*valDestructor)(void *privdata, void *obj);
} dictType;
/* This is our hash table structure. Every dictionary has two of this as we
* implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
dictEntry **table;
unsigned long size;
unsigned long sizemask;
unsigned long used;
} dictht;
typedef struct dict {
dictType *type;
void *privdata;
dictht ht[2];
int rehashidx; /* rehashing not in progress if rehashidx == -1 */
int iterators; /* number of iterators currently running */
} dict;
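dictEntry: an entry in the hash table. It holds the key, the value, and a pointer to the next entry in the same bucket (an entry whose hash maps to the same index).
dictType: a struct of function pointers: the hash function for keys, the copy functions and destructors for keys and values, and the key comparison function.
dictht: the hash table used inside a dictionary. size is the number of buckets, used is the number of entries currently stored, and sizemask equals size-1. sizemask is ANDed with the hash value to decide which bucket an element goes into.
dict: the dictionary itself. type is the family of functions that handles the elements stored in this dictionary, privdata is the argument passed to those functions, and ht[2] holds the two hash tables. rehashidx is the index of the bucket in ht[0] that the rehash has reached: when the rehash has reached the 10th bucket, rehashidx is 9, and so on. When no rehash is in progress, rehashidx is -1. iterators counts the iterators currently running on the dictionary; in the code shown later, only safe iterators increment it.
II. dict interface functions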
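The role of sizemask can be illustrated with a small sketch. The helper bucket_index below is hypothetical and not part of dict.c; it only assumes, as the text above states, that the bucket count is a power of two, in which case ANDing with size-1 gives the same result as taking the hash modulo size.

```c
#include <assert.h>

/* Hypothetical helper, not part of dict.c: map a hash value to a bucket
 * index the same way dict.c does with `h & sizemask`. */
unsigned long bucket_index(unsigned int hash, unsigned long size)
{
    /* The mask trick is only valid when size is a non-zero power of two. */
    assert(size != 0 && (size & (size - 1)) == 0);
    unsigned long sizemask = size - 1;
    return hash & sizemask;
}
```

For example, with 8 buckets the mask is 7 (binary 111), so a hash value of 13 (binary 1101) lands in bucket 5, the same as 13 % 8.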
dict *dictCreate(dictType *type, void *privDataPtr);
int dictExpand(dict *d, unsigned long size);
int dictAdd(dict *d, void *key, void *val);
int dictReplace(dict *d, void *key, void *val);
int dictDelete(dict *d, const void *key);
int dictDeleteNoFree(dict *d, const void *key);
void dictRelease(dict *d);
dictEntry * dictFind(dict *d, const void *key);
void *dictFetchValue(dict *d, const void *key);
int dictResize(dict *d);
dictIterator *dictGetIterator(dict *d);
dictIterator *dictGetSafeIterator(dict *d);
dictEntry *dictNext(dictIterator *iter);
void dictReleaseIterator(dictIterator *iter);
dictEntry *dictGetRandomKey(dict *d);
void dictPrintStats(dict *d);
unsigned int dictGenHashFunction(const unsigned char *buf, int len);
unsigned int dictGenCaseHashFunction(const unsigned char *buf, int len);
void dictEmpty(dict *d);
void dictEnableResize(void);
void dictDisableResize(void);
int dictRehash(dict *d, int n);
int dictRehashMilliseconds(dict *d, int ms);
The most important of these functions are examined one by one below.
1. dict *dictCreate(dictType *type, void *privDataPtr);
static void _dictReset(dictht *ht)
{
ht->table = NULL;
ht->size = 0;
ht->sizemask = 0;
ht->used = 0;
}
/* Create a new hash table */
dict *dictCreate(dictType *type,
void *privDataPtr)
{
dict *d = zmalloc(sizeof(*d));
_dictInit(d,type,privDataPtr);
return d;
}
/* Initialize the hash table */
int _dictInit(dict *d, dictType *type,
void *privDataPtr)
{
_dictReset(&d->ht[0]);
_dictReset(&d->ht[1]);
d->type = type;
d->privdata = privDataPtr;
d->rehashidx = -1;
d->iterators = 0;
return DICT_OK;
}
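dictCreate creates a new dictionary. It allocates a dict with zmalloc and then calls the internal function _dictInit to initialize its fields. _dictInit first resets both hash tables in ht[2], sets rehashidx to -1 (no rehash in progress) and iterators to 0, and returns DICT_OK. dictCreate itself returns the pointer to the new dict.
So when is memory for the hash table actually allocated and initialized? On the first call to dictAdd: at that point the bucket array of table 0 is allocated (through _dictKeyIndex, _dictExpandIfNeeded and dictExpand, all discussed below).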
2. int dictExpand(dict *d, unsigned long size)
int dictExpand(dict *d, unsigned long size)
{
dictht n; /* the new hashtable */
unsigned long realsize = _dictNextPower(size);
/* the size is invalid if it is smaller than the number of
* elements already inside the hashtable */
if (dictIsRehashing(d) || d->ht[0].used > size)
return DICT_ERR;
/* Allocate the new hashtable and initialize all pointers to NULL */
n.size = realsize;
n.sizemask = realsize-1;
n.table = zcalloc(realsize*sizeof(dictEntry*));
n.used = 0;
/* Is this the first initialization? If so it's not really a rehashing
* we just set the first hash table so that it can accept keys. */
if (d->ht[0].table == NULL) {
d->ht[0] = n;
return DICT_OK;
}
/* Prepare a second hash table for incremental rehashing */
d->ht[1] = n;
d->rehashidx = 0;
return DICT_OK;
}
/* Our hash table capability is a power of two */
static unsigned long _dictNextPower(unsigned long size)
{
unsigned long i = DICT_HT_INITIAL_SIZE;
if (size >= LONG_MAX) return LONG_MAX;
while(1) {
if (i >= size)
return i;
i *= 2;
}
}
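dictExpand grows the hash table. It calls _dictNextPower to compute the real size (the number of buckets) of the new table, which is the smallest power of two that is not less than both the requested size and DICT_HT_INITIAL_SIZE. It then checks whether a rehash is already in progress or whether the requested size is smaller than ht[0].used (the number of elements already in table 0); in either case it returns DICT_ERR right away. Otherwise it allocates the new table and initializes its fields. If ht[0].table == NULL, this is the first initialization (possibly with a user-specified size), so the new table is assigned to ht[0]. Otherwise it is assigned to ht[1] and d->rehashidx is set to 0, which means a rehash is now needed.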
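To make the rounding concrete, here is a minimal standalone sketch of the same computation. The name next_power and the constant INITIAL_SIZE are made up for this example; INITIAL_SIZE is assumed to be 4, the value this article gives later for DICT_HT_INITIAL_SIZE.

```c
#include <limits.h>

/* Assumption: mirrors DICT_HT_INITIAL_SIZE, which this article says is 4. */
#define INITIAL_SIZE 4UL

/* Hypothetical standalone version of _dictNextPower: the smallest value
 * of the form INITIAL_SIZE * 2^k that is not less than size. */
unsigned long next_power(unsigned long size)
{
    unsigned long i = INITIAL_SIZE;
    if (size >= LONG_MAX) return LONG_MAX;
    while (i < size) i *= 2;
    return i;
}
```

Under these assumptions a request for 5 buckets yields 8, a request for 100 yields 128, and any request of 4 or fewer yields 4.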
3. int dictAdd(dict *d, void *key, void *val)
/* Add an element to the target hash table */
int dictAdd(dict *d, void *key, void *val)
{
int index;
dictEntry *entry;
dictht *ht;
if (dictIsRehashing(d)) _dictRehashStep(d);
/* Get the index of the new element, or -1 if
* the element already exists. */
if ((index = _dictKeyIndex(d, key)) == -1)
return DICT_ERR;
/* Allocates the memory and stores key */
ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
entry = zmalloc(sizeof(*entry));
entry->next = ht->table[index];
ht->table[index] = entry;
ht->used++;
/* Set the hash entry fields. */
dictSetHashKey(d, entry, key);
dictSetHashVal(d, entry, val);
return DICT_OK;
}
/* Returns the index of a free slot that can be populated with
* an hash entry for the given 'key'.
* If the key already exists, -1 is returned.
*
* Note that if we are in the process of rehashing the hash table, the
* index is always returned in the context of the second (new) hash table. */
static int _dictKeyIndex(dict *d, const void *key)
{
unsigned int h, idx, table;
dictEntry *he;
/* Expand the hashtable if needed */
if (_dictExpandIfNeeded(d) == DICT_ERR)
return -1;
/* Compute the key hash value */
h = dictHashKey(d, key);
for (table = 0; table <= 1; table++) {
idx = h & d->ht[table].sizemask;
/* Search if this slot does not already contain the given key */
he = d->ht[table].table[idx];
while(he) {
if (dictCompareHashKeys(d, key, he->key))
return -1;
he = he->next;
}
if (!dictIsRehashing(d)) break;
}
return idx;
}
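Let us look at static int _dictKeyIndex(dict *d, const void *key) first. It returns the index of the bucket the key belongs in, or -1 if the key is already present in the hash table. Note that while a rehash is in progress the loop runs over both tables, so whenever the function does not return -1, the index it returns was computed for table 1 (it overwrites the index computed for table 0). The function also calls _dictExpandIfNeeded, which decides whether the dict has to grow and is analyzed later. This is the only place in the whole file where that check is made, because the dict only ever needs to grow when an element is about to be added.
dictAdd adds an element to the hash table. Ignore the statement if (dictIsRehashing(d)) _dictRehashStep(d); for now: it is part of the rehash mechanism, covered later. dictAdd calls _dictKeyIndex to get the bucket index and uses the dictIsRehashing(d) macro to decide whether table 0 or table 1 is in use (during a rehash, new elements go into table 1 while the entries of table 0 are gradually moved over to table 1). The rest is what any chained hash table does: allocate the entry, link it at the head of the bucket in ht, increment ht->used, and set the key and value.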
4. dictEntry *dictFind(dict *d, const void *key)
dictEntry *dictFind(dict *d, const void *key)
{
dictEntry *he;
unsigned int h, idx, table;
if (d->ht[0].size == 0) return NULL; /* We don't have a table at all */
if (dictIsRehashing(d)) _dictRehashStep(d);
h = dictHashKey(d, key);
for (table = 0; table <= 1; table++) {
idx = h & d->ht[table].sizemask;
he = d->ht[table].table[idx];
while(he) {
if (dictCompareHashKeys(d, key, he->key))
return he;
he = he->next;
}
if (!dictIsRehashing(d)) return NULL;
}
return NULL;
}
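dictFind looks up a key in the hash table. Like dictAdd, it first performs one rehash step if a rehash is in progress. It then walks the bucket the key maps to in table 0 and returns the entry if it is found there. Otherwise it checks whether a rehash is in progress (table 1 contains elements only during a rehash): if so, it goes on to search table 1; if not, it returns NULL, meaning the key was not found.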
5. void *dictFetchValue(dict *d, const void *key)
void *dictFetchValue(dict *d, const void *key) {
dictEntry *he;
he = dictFind(d,key);
return he ? dictGetEntryVal(he) : NULL;
}
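dictFetchValue returns the value stored under a key. The implementation is simple: it calls dictFind to locate the entry and uses the ternary operator to return either the value (dictGetEntryVal is a macro that expands to he->val) or NULL when the key is absent.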
6. int dictReplace(dict *d, void *key, void *val)
/* Add an element, discarding the old if the key already exists.
* Return 1 if the key was added from scratch, 0 if there was already an
* element with such key and dictReplace() just performed a value update
* operation. */
int dictReplace(dict *d, void *key, void *val)
{
dictEntry *entry, auxentry;
/* Try to add the element. If the key
* does not exists dictAdd will suceed. */
if (dictAdd(d, key, val) == DICT_OK)
return 1;
/* It already exists, get the entry */
entry = dictFind(d, key);
/* Free the old value and set the new one */
/* Set the new value and free the old one. Note that it is important
* to do that in this order, as the value may just be exactly the same
* as the previous one. In this context, think to reference counting,
* you want to increment (set), and then decrement (free), and not the
* reverse. */
auxentry = *entry;
dictSetHashVal(d, entry, val);
dictFreeEntryVal(d, &auxentry);
return 0;
}
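dictReplace also adds an element to the table, but unlike dictAdd, when the key already exists it overwrites the old value. It returns 0 when it overwrote a value and 1 when the key was newly added. It first calls dictAdd; if that succeeds, the key was not present and the function returns 1. Otherwise it calls dictFind to locate the existing entry, sets the new value, and then calls dictFreeEntryVal to release the old one. As the comment in the code explains, the order matters: the new value is set before the old one is freed, because the two may be the same object.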
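The "set first, free second" order can be illustrated with a toy reference-counted value. Everything in this sketch (rcval, slot, slot_replace and the deliberately wrong slot_replace_reversed) is hypothetical and only models the reasoning in the comment of dictReplace; it is not how Redis objects are implemented. The freed flag stands in for an actual call to free so that the effect can be observed.

```c
/* Hypothetical reference-counted value, not a Redis type. */
typedef struct rcval {
    int refcount;
    int freed;   /* set instead of calling free(), so the effect is visible */
} rcval;

void rc_incr(rcval *v) { v->refcount++; }
void rc_decr(rcval *v) { if (--v->refcount == 0) v->freed = 1; }

/* A single value holder standing in for a dictEntry. */
typedef struct slot { rcval *val; } slot;

/* Same order as dictReplace: take the new value first, release the old after. */
void slot_replace(slot *s, rcval *newval)
{
    slot aux = *s;        /* remember the old value, like auxentry */
    rc_incr(newval);      /* "set": the slot now holds a reference */
    s->val = newval;
    rc_decr(aux.val);     /* "free": drop the reference to the old value */
}

/* Reversed order, shown only to demonstrate the problem. */
void slot_replace_reversed(slot *s, rcval *newval)
{
    rc_decr(s->val);      /* may drop the count to zero and destroy the value */
    rc_incr(newval);      /* too late if newval is the same object */
    s->val = newval;
}
```

When a slot is replaced with the very value it already holds, slot_replace leaves the count at 1 and the value alive, while slot_replace_reversed lets the count touch zero and marks the value as freed.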
7. int dictDelete(dict *ht, const void *key) and int dictDeleteNoFree(dict *ht, const void *key)
/* Search and remove an element */
static int dictGenericDelete(dict *d, const void *key, int nofree)
{
unsigned int h, idx;
dictEntry *he, *prevHe;
int table;
if (d->ht[0].size == 0) return DICT_ERR; /* d->ht[0].table is NULL */
if (dictIsRehashing(d)) _dictRehashStep(d);
h = dictHashKey(d, key);
for (table = 0; table <= 1; table++) {
idx = h & d->ht[table].sizemask;
he = d->ht[table].table[idx];
prevHe = NULL;
while(he) {
if (dictCompareHashKeys(d, key, he->key)) {
/* Unlink the element from the list */
if (prevHe)
prevHe->next = he->next;
else
d->ht[table].table[idx] = he->next;
if (!nofree) {
dictFreeEntryKey(d, he);
dictFreeEntryVal(d, he);
}
zfree(he);
d->ht[table].used--;
return DICT_OK;
}
prevHe = he;
he = he->next;
}
if (!dictIsRehashing(d)) break;
}
return DICT_ERR; /* not found */
}
int dictDelete(dict *ht, const void *key) {
return dictGenericDelete(ht,key,0);
}
int dictDeleteNoFree(dict *ht, const void *key) {
return dictGenericDelete(ht,key,1);
}
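There are two delete operations on a dict. They differ only in whether the key and value destructors are called: dictDelete frees the key and the value of the removed entry, while dictDeleteNoFree unlinks and frees the entry itself but leaves the key and value untouched, so the caller remains responsible for them. Everything else matches the lookup shown earlier and the usual deletion from a chained hash table.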
8. dictEntry *dictGetRandomKey(dict *d)
/* Return a random entry from the hash table. Useful to
* implement randomized algorithms */
dictEntry *dictGetRandomKey(dict *d)
{
dictEntry *he, *orighe;
unsigned int h;
int listlen, listele;
if (dictSize(d) == 0) return NULL;
if (dictIsRehashing(d)) _dictRehashStep(d);
if (dictIsRehashing(d)) {
do {
h = random() % (d->ht[0].size+d->ht[1].size);
he = (h >= d->ht[0].size) ? d->ht[1].table[h - d->ht[0].size] :
d->ht[0].table[h];
} while(he == NULL);
} else {
do {
h = random() & d->ht[0].sizemask;
he = d->ht[0].table[h];
} while(he == NULL);
}
/* Now we found a non empty bucket, but it is a linked
* list and we need to get a random element from the list.
* The only sane way to do so is counting the elements and
* select a random index. */
listlen = 0;
orighe = he;
while(he) {
he = he->next;
listlen++;
}
listele = random() % listlen;
he = orighe;
while(listele--) he = he->next;
return he;
}
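dictGetRandomKey returns a random element from the hash table. It first uses random() to pick a bucket from the two tables (or from table 0 only when no rehash is in progress), retrying until it hits a non-empty bucket. It then counts the length of that bucket's chain, listlen, calls random() again, takes the result modulo listlen to pick a position in the chain, and returns the entry at that position. Because a bucket is chosen first, entries are not returned with exactly equal probability: an entry in a long chain is less likely to be picked than an entry that is alone in its bucket.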
9. static int _dictExpandIfNeeded(dict *d)
/* Expand the hash table if needed */
static int _dictExpandIfNeeded(dict *d)
{
/* Incremental rehashing already in progress. Return. */
if (dictIsRehashing(d)) return DICT_OK;
/* If the hash table is empty expand it to the intial size. */
if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);
/* If we reached the 1:1 ratio, and we are allowed to resize the hash
* table (global setting) or we should avoid it but the ratio between
* elements/buckets is over the "safe" threshold, we resize doubling
* the number of buckets. */
if (d->ht[0].used >= d->ht[0].size &&
(dict_can_resize ||
d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
{
return dictExpand(d, ((d->ht[0].size > d->ht[0].used) ?
d->ht[0].size : d->ht[0].used)*2);
}
return DICT_OK;
}
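_dictExpandIfNeeded is internal to dict.c. It is called only from _dictKeyIndex, which in turn is called by dictAdd, and it decides whether the hash table has to grow. It first checks whether a rehash is in progress and returns DICT_OK if so: a rehash already under way means no further expansion is needed. dictExpand makes the same check, but it returns DICT_ERR in that case, which is why the check has to happen here first. If d->ht[0].size == 0, the dict has never been initialized and memory must be allocated for it; the default initial size, DICT_HT_INITIAL_SIZE, is normally 4. Otherwise the condition described in the code comment is evaluated. When either of the two cases it describes holds, dictExpand is called, and the second argument passed to it is the larger of table 0's bucket count and entry count, multiplied by 2.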
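The growth condition can be isolated into a small predicate to see how the two cases interact. The function should_expand is hypothetical; its parameters can_resize and force_ratio stand in for the globals dict_can_resize and dict_force_resize_ratio, whose actual values are not shown in this article. The sketch deliberately leaves out the "rehash already in progress" early return.

```c
/* Hypothetical predicate mirroring the condition in _dictExpandIfNeeded.
 * can_resize and force_ratio are stand-ins for dict_can_resize and
 * dict_force_resize_ratio; they are parameters here, not the real globals. */
int should_expand(unsigned long used, unsigned long size,
                  int can_resize, unsigned long force_ratio)
{
    /* An empty table is always allocated on first use. */
    if (size == 0) return 1;
    /* Grow at a 1:1 load when resizing is allowed, or when the integer
     * ratio used/size exceeds the forced threshold even if it is not. */
    return used >= size &&
           (can_resize || used / size > force_ratio);
}
```

With an example threshold of 5 and 4 buckets, a table holding 4 entries grows only if resizing is enabled, whereas a table holding 24 entries grows regardless, because 24/4 = 6 exceeds the threshold. Note that the division is an integer division, so 23 entries in 4 buckets (ratio 5) does not yet force a resize.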
10. int dictRehash(dict *d, int n)
int dictRehash(dict *d, int n) {
if (!dictIsRehashing(d)) return 0;
while(n--) {
dictEntry *de, *nextde;
/* Check if we already rehashed the whole table... */
if (d->ht[0].used == 0) {
zfree(d->ht[0].table);
d->ht[0] = d->ht[1];
_dictReset(&d->ht[1]);
d->rehashidx = -1;
return 0;
}
/* Note that rehashidx can't overflow as we are sure there are more
* elements because ht[0].used != 0 */
while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
de = d->ht[0].table[d->rehashidx];
/* Move all the keys in this bucket from the old to the new hash HT */
while(de) {
unsigned int h;
nextde = de->next;
/* Get the index in the new hash table */
h = dictHashKey(d, de->key) & d->ht[1].sizemask;
de->next = d->ht[1].table[h];
d->ht[1].table[h] = de;
d->ht[0].used--;
d->ht[1].used++;
de = nextde;
}
d->ht[0].table[d->rehashidx] = NULL; d->rehashidx++;
}
return 1;
}
long long timeInMilliseconds(void) {
struct timeval tv;
gettimeofday(&tv,NULL);
return (((long long)tv.tv_sec)*1000)+(tv.tv_usec/1000);
}
/* Rehash for an amount of time between ms milliseconds and ms+1 milliseconds */
int dictRehashMilliseconds(dict *d, int ms) {
long long start = timeInMilliseconds();
int rehashes = 0;
while(dictRehash(d,100)) {
rehashes += 100;
if (timeInMilliseconds()-start > ms) break;
}
return rehashes;
}
/* This function performs just a step of rehashing, and only if there are
* no safe iterators bound to our hash table. When we have iterators in the
* middle of a rehashing we can't mess with the two hash tables otherwise
* some element can be missed or duplicated.
*
* This function is called by common lookup or update operations in the
* dictionary so that the hash table automatically migrates from H1 to H2
* while it is actively used. */
static void _dictRehashStep(dict *d) {
if (d->iterators == 0) dictRehash(d,1);
}
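We analyze the functions above first and then describe how Redis actually carries out a rehash. dictRehash is a while loop that migrates the entries of up to n buckets from ht[0] to ht[1]. In each iteration it first checks whether table 0 still has any elements. If not, it frees table 0's bucket array, assigns table 1 to table 0 (a struct assignment, so the table pointer of ht[0] now points to the new array), resets table 1 (only the ht[1] struct is cleared; ht[0] is unaffected), sets rehashidx to -1 and returns 0. Otherwise it skips empty buckets, takes the next non-empty bucket of table 0, and moves every entry in it to the bucket of table 1 selected by the entry's hash. The function returns 1 when there is still work left after n buckets.
dictRehashMilliseconds is meant to be called by the Redis main program. It calls dictRehash repeatedly, 100 buckets at a time, until the rehash is finished or more than ms milliseconds have elapsed. _dictRehashStep rehashes a single bucket, but only when d->iterators is 0: while a safe iterator is bound to the dictionary, moving entries between the two tables could make the iterator miss an element or return it twice, so no rehashing is done in that situation.
Now for the overall flow in Redis. As the code above shows, once the expand has completed the dictionary uses both hash tables at the same time, and its rehashidx field changes from -1 to 0. This is an important change: it marks that rehashing can begin. Redis rehashes incrementally and amortizes the cost by spreading it across dictAdd, dictGetRandomKey, dictFind and dictGenericDelete. Every time one of these functions runs (directly or because another function calls it), _dictRehashStep is executed and moves the entries of one bucket from table 0 to table 1. This avoids doing the whole rehash in one go.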
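The flow described above can be reproduced in a heavily simplified model. The sketch below is not Redis code: the type mini and the functions mini_init, mini_start_rehash, mini_rehash, mini_add and mini_find are all invented for illustration. It assumes unsigned integer keys that serve as their own hash, power-of-two table sizes, no duplicate check, no automatic expansion, and it omits cleanup and allocation error handling.

```c
#include <stdlib.h>

typedef struct node { unsigned int key; struct node *next; } node;
typedef struct table { node **buckets; unsigned long size, used; } table;
typedef struct mini { table ht[2]; long rehashidx; } mini;  /* -1: not rehashing */

void table_init(table *t, unsigned long size) {
    t->buckets = size ? calloc(size, sizeof(node *)) : NULL;
    t->size = size;
    t->used = 0;
}

void mini_init(mini *m, unsigned long size) {   /* size: power of two */
    table_init(&m->ht[0], size);
    table_init(&m->ht[1], 0);
    m->rehashidx = -1;
}

/* Prepare a second table twice as large and mark the rehash as started. */
void mini_start_rehash(mini *m) {
    table_init(&m->ht[1], m->ht[0].size * 2);
    m->rehashidx = 0;
}

/* Migrate up to n buckets. Returns 1 if work remains, 0 when finished. */
int mini_rehash(mini *m, int n) {
    if (m->rehashidx == -1) return 0;
    while (n--) {
        if (m->ht[0].used == 0) {           /* old table drained: swap */
            free(m->ht[0].buckets);
            m->ht[0] = m->ht[1];
            table_init(&m->ht[1], 0);
            m->rehashidx = -1;
            return 0;
        }
        /* used != 0 guarantees a non-empty bucket exists at or after rehashidx */
        while (m->ht[0].buckets[m->rehashidx] == NULL) m->rehashidx++;
        node *de = m->ht[0].buckets[m->rehashidx];
        while (de) {                        /* move the whole chain */
            node *next = de->next;
            unsigned long h = de->key & (m->ht[1].size - 1);
            de->next = m->ht[1].buckets[h];
            m->ht[1].buckets[h] = de;
            m->ht[0].used--;
            m->ht[1].used++;
            de = next;
        }
        m->ht[0].buckets[m->rehashidx++] = NULL;
    }
    return 1;
}

/* Insert a key; while rehashing, new keys go to the new table only. */
void mini_add(mini *m, unsigned int key) {
    mini_rehash(m, 1);                      /* one amortized step */
    table *t = (m->rehashidx != -1) ? &m->ht[1] : &m->ht[0];
    node *e = malloc(sizeof(*e));
    unsigned long h = key & (t->size - 1);
    e->key = key;
    e->next = t->buckets[h];
    t->buckets[h] = e;
    t->used++;
}

/* Look a key up in table 0, and in table 1 too while rehashing. */
int mini_find(mini *m, unsigned int key) {
    mini_rehash(m, 1);                      /* one amortized step */
    for (int i = 0; i <= 1; i++) {
        table *t = &m->ht[i];
        if (t->size != 0)
            for (node *e = t->buckets[key & (t->size - 1)]; e; e = e->next)
                if (e->key == key) return 1;
        if (m->rehashidx == -1) break;
    }
    return 0;
}
```

Filling a 4-bucket table with the keys 0 to 7, calling mini_start_rehash and then simply looking keys up is enough to finish the migration: each lookup moves one bucket, every key stays findable throughout, and after a handful of lookups the model ends up with a single 8-bucket table and rehashidx back at -1.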
11. dictEntry *dictNext(dictIterator *iter)
dictIterator *dictGetIterator(dict *d)
{
dictIterator *iter = zmalloc(sizeof(*iter));
iter->d = d;
iter->table = 0;
iter->index = -1;
iter->safe = 0;
iter->entry = NULL;
iter->nextEntry = NULL;
return iter;
}
dictIterator *dictGetSafeIterator(dict *d) {
dictIterator *i = dictGetIterator(d);
i->safe = 1;
return i;
}
dictEntry *dictNext(dictIterator *iter)
{
while (1) {
if (iter->entry == NULL) {
dictht *ht = &iter->d->ht[iter->table];
if (iter->safe && iter->index == -1 && iter->table == 0)
iter->d->iterators++;
iter->index++;
if (iter->index >= (signed) ht->size) {
if (dictIsRehashing(iter->d) && iter->table == 0) {
iter->table++;
iter->index = 0;
ht = &iter->d->ht[1];
} else {
break;
}
}
iter->entry = ht->table[iter->index];
} else {
iter->entry = iter->nextEntry;
}
if (iter->entry) {
/* We need to save the 'next' here, the iterator user
* may delete the entry we are returning. */
iter->nextEntry = iter->entry->next;
return iter->entry;
}
}
return NULL;
}
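dictNext implements iteration over a dict: each call returns the next element of the hash table. The iterator struct is first initialized by dictGetIterator or dictGetSafeIterator, and dictNext is then called repeatedly. It visits both hash tables in order (the second one only if a rehash is in progress). The function contains five if statements. The first checks whether iter->entry is NULL, which means either that this is the first call or that the chain of the current bucket has been exhausted, so the iterator has to advance to the next bucket. The second checks whether the iterator is safe and whether this is the first call (index == -1 and table == 0); if so, it increments the iterator count of the dict the iterator points to. The third checks whether the index has gone past the end of the current table. If it has, the fourth checks whether a rehash is in progress and the iterator is still on table 0: in that case the iterator moves on to table 1, otherwise it breaks out of the while loop and the function returns NULL. The last if handles the case where iter->entry is non-NULL: it saves the entry's successor in nextEntry, because the caller may delete the entry being returned, and then returns the entry.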
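The reason for saving nextEntry before returning can be shown on a single chain. This sketch is hypothetical and much smaller than dictNext: the types entry and chain_iter and the functions push, chain_next and sum_and_free are invented for the example, and it covers only one linked list rather than two tables of buckets.

```c
#include <stdlib.h>

typedef struct entry { int val; struct entry *next; } entry;

/* Hypothetical iterator over one chain; saved_next plays the role of
 * nextEntry in dictIterator. */
typedef struct chain_iter { entry *cur, *saved_next; } chain_iter;

/* Prepend a value to a chain and return the new head. */
entry *push(entry *head, int val) {
    entry *e = malloc(sizeof(*e));
    e->val = val;
    e->next = head;
    return e;
}

/* Return the next entry, remembering its successor before handing it out. */
entry *chain_next(chain_iter *it) {
    it->cur = it->saved_next;
    if (it->cur) it->saved_next = it->cur->next;
    return it->cur;
}

/* Walk the chain and free every entry while iterating over it. */
int sum_and_free(entry *head) {
    chain_iter it = { NULL, head };
    int sum = 0;
    entry *e;
    while ((e = chain_next(&it)) != NULL) {
        sum += e->val;
        free(e);   /* safe: the iterator never dereferences e again */
    }
    return sum;
}
```

Because the successor is captured before the entry is returned, the caller can free the entry it was just given and the next call to chain_next still finds its way forward, which is the same guarantee dictNext gives to code that deletes entries during iteration.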