Redis Key Points (rehash) <Repost> - CSDN Blog

A hash table is an efficient data structure that is widely used in key-value stores. Redis's dict is a typical hash table implementation.

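Rehashing is the operation that grows a hash table once its size no longer meets demand and too many hash collisions occur. The usual approach is to allocate an additional hash table and re-insert the entries of the original table into it, producing the new table.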

Redis rehashes in two ways: lazy rehashing and active rehashing.

lazy rehashing: rehash one slot on every operation performed on the dict.
active rehashing: spend 1 ms out of every 100 ms rehashing.

The dict implementation mainly uses the structs shown below; it is a typical chained hash table.

A dict has two hash tables, each managed by a dictht struct and numbered 0 and 1.

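Table 0 is used by default. When space runs short, dictExpand is called to grow the hash table, and table 1 is prepared for the incremental rehash. Once the rehash finishes, table 0 is freed and table 1 is stored as table 0.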

rehashidx is the index in ht[0] of the next item to rehash. It is set to -1 when no rehash is in progress.

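iterators records the number of iterators currently active on the dict. Its main purpose is to avoid rehashing while iterators exist, because rehashing during iteration can cause entries to be missed or returned twice.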

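In dictht, table is a hash table in array-plus-pointer form: an array of buckets, each pointing to a chain of entries. size is the size of the hash array (the number of buckets) and used is the number of entries in the table; both values are closely tied to the rehash and resize process. sizemask equals size-1, which makes it easy to map a hash value onto the array.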
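
The role of sizemask can be illustrated with a minimal sketch. It assumes the bucket count is a power of two, which is what makes size-1 usable as a bit mask; the function name bucket_index is hypothetical and does not appear in the Redis source.

```c
#include <assert.h>

/* Hypothetical helper: map a hash value to a bucket index.
 * Assumes size is a power of two, so size - 1 is an all-ones bit mask. */
static unsigned long bucket_index(unsigned long hash, unsigned long size) {
    assert(size != 0 && (size & (size - 1)) == 0); /* power of two only */
    unsigned long sizemask = size - 1;
    /* Same result as hash % size for a power-of-two size, without a division. */
    return hash & sizemask;
}
```

With 8 buckets the mask is 7 (binary 111), so a hash of 13 (binary 1101) lands in bucket 5.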

    typedef struct dictEntry {
        void *key;
        void *val;
        struct dictEntry *next;
    } dictEntry;

    typedef struct dictht {
        dictEntry **table;
        unsigned long size;     /* number of hash buckets */
        unsigned long sizemask; /* mask used to reduce a hash to a bucket index */
        unsigned long used;     /* number of entries */
    } dictht;

    typedef struct dict {
        dictType *type;
        void *privdata;
        dictht ht[2];
        int rehashidx; /* rehashing not in progress if rehashidx == -1 */
        int iterators; /* number of iterators currently running */
    } dict;

    typedef struct dictIterator {
        dict *d;
        int table;
        int index;
        dictEntry *entry, *nextEntry;
    } dictIterator;

When does a dict expand?

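On insertion, dictKeyIndex is called, and it in turn calls _dictExpandIfNeeded to decide whether the dict needs a rehash. When the number of entries in the dict reaches the number of buckets (used >= size) and dict_can_resize is set, dictExpand is called to grow the hash table.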

    /* Expand the hash table if needed */
    static int _dictExpandIfNeeded(dict *d)
    {
        /* If the hash table is empty expand it to the initial size,
         * if the table is "full" double its size. */
        if (dictIsRehashing(d)) return DICT_OK;
        if (d->ht[0].size == 0)
            return dictExpand(d, DICT_HT_INITIAL_SIZE);
        if (d->ht[0].used >= d->ht[0].size && dict_can_resize)
            return dictExpand(d, ((d->ht[0].size > d->ht[0].used) ?
                d->ht[0].size : d->ht[0].used)*2);
        return DICT_OK;
    }

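dictExpand mainly initializes a new hash table, by default twice as large (not simply twice the bucket count: the caller requests twice the larger of size and used), assigns it to ht[1], and switches the dict into the rehashing state. From that point on the dict is rehashing.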
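
The sizing rule can be sketched as follows. This is an illustration under stated assumptions, not the actual Redis code: it assumes the requested size is rounded up to a power of two (so that a size-1 mask works) and that the initial bucket count is 4. The names INITIAL_SIZE, next_power and expanded_size are hypothetical, and overflow handling is omitted.

```c
#include <assert.h>

#define INITIAL_SIZE 4UL /* assumed initial bucket count for this sketch */

/* Smallest power of two that is >= requested, never below INITIAL_SIZE. */
static unsigned long next_power(unsigned long requested) {
    unsigned long size = INITIAL_SIZE;
    while (size < requested) size *= 2;
    return size;
}

/* Bucket count for the new table when a table with `size` buckets
 * holding `used` entries grows: double the larger of the two values,
 * then round up to a power of two. */
static unsigned long expanded_size(unsigned long size, unsigned long used) {
    unsigned long base = (size > used) ? size : used;
    return next_power(base * 2);
}
```

Under these assumptions a table with 4 buckets and 5 entries requests 10 buckets and ends up with 16, which is why the growth is not always exactly twice the old bucket count.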

How the expansion proceeds

The rehash itself is done in dictRehash. First, let's look at when a rehash step runs.

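active rehashing: in serverCron, when there is no background child process, incrementallyRehash is called, which eventually calls dictRehashMilliseconds. incrementallyRehash runs for longer and rehashes more entries per call than a lazy step does. Each call here performs 1 millisecond of rehashing; if the rehash is not finished, it continues in the next loop.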

    /* Rehash for an amount of time between ms milliseconds and ms+1 milliseconds */
    int dictRehashMilliseconds(dict *d, int ms) {
        long long start = timeInMilliseconds();
        int rehashes = 0;

        while(dictRehash(d,100)) {
            rehashes += 100;
            if (timeInMilliseconds()-start > ms) break;
        }
        return rehashes;
    }

lazy rehashing: _dictRehashStep also calls dictRehash, but each call moves only one bucket from ht[0] to ht[1]. Because _dictRehashStep is called by dictGetRandomKey, dictFind, dictGenericDelete and dictAdd, it runs on every insert, delete, lookup and update of the dict, which speeds up the rehash.

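Now for the function that does the rehash. dictRehash performs n incremental steps per call, each step moving one bucket. Since the size of ht[1] was already set when the dict was resized, the main work is to walk ht[0], take each key, and re-hash it according to the bucket count of ht[1]. Once everything has been moved, the old table of ht[0] is freed, ht[0] takes over ht[1], and ht[1] is reset to empty. rehashidx is central to this process: it records the position in ht[0] where the previous rehash step stopped.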
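
The procedure just described can be sketched with a toy two-table dict. This is a simplified illustration, not the Redis implementation: keys are plain integers that hash to themselves, and every name here (entry, table, toydict, toy_rehash and so on) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct entry { unsigned long key; struct entry *next; } entry;
typedef struct table { entry **buckets; unsigned long size, used; } table;
typedef struct toydict { table ht[2]; long rehashidx; } toydict;

static void table_init(table *t, unsigned long size) {
    t->buckets = size ? calloc(size, sizeof(entry *)) : NULL;
    t->size = size;
    t->used = 0;
}

/* Insert at the head of the chain; size must be a power of two. */
static void table_push(table *t, entry *e) {
    unsigned long idx = e->key & (t->size - 1);
    e->next = t->buckets[idx];
    t->buckets[idx] = e;
    t->used++;
}

/* Prepare ht[1] and mark the dict as rehashing. */
static void toy_start_rehash(toydict *d, unsigned long new_size) {
    table_init(&d->ht[1], new_size);
    d->rehashidx = 0;
}

/* Move up to n buckets from ht[0] to ht[1].
 * Returns 1 if work remains, 0 once the rehash has finished. */
static int toy_rehash(toydict *d, int n) {
    if (d->rehashidx == -1) return 0;
    while (n--) {
        if (d->ht[0].used == 0) {       /* everything moved: swap tables */
            free(d->ht[0].buckets);
            d->ht[0] = d->ht[1];
            table_init(&d->ht[1], 0);
            d->rehashidx = -1;
            return 0;
        }
        /* Skip empty buckets; used != 0 guarantees a non-empty one exists. */
        while (d->ht[0].buckets[d->rehashidx] == NULL) d->rehashidx++;
        entry *e = d->ht[0].buckets[d->rehashidx];
        while (e) {                     /* re-bucket the whole chain */
            entry *next = e->next;
            d->ht[0].used--;
            table_push(&d->ht[1], e);
            e = next;
        }
        d->ht[0].buckets[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    return 1;
}
```

A caller would initialize both tables, set rehashidx to -1, fill ht[0], call toy_start_rehash with the new size, and then call toy_rehash(d, 1) repeatedly (for example once per operation) until it returns 0.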

As you can see, Redis rehashes a dict in batches, so requests are not blocked; it is a fairly elegant design.

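However, a dictFind call may have to search both hash tables. The only shortcut is that when the key is not in ht[0] and the dict is not rehashing, it can return NULL right away. If the dict is rehashing and ht[0] has no match, ht[1] must be searched as well.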

In dictAdd, if the dict is rehashing, the new entry is inserted into ht[1]; otherwise it goes into ht[0].
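
The lookup and insert rules during a rehash can be sketched on a toy dict. Again this is an illustration under assumptions rather than the Redis code: keys are integers that hash to themselves, bucket counts are powers of two, and the names toy_find, toy_add and is_rehashing are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct entry { unsigned long key; int val; struct entry *next; } entry;
typedef struct table { entry **buckets; unsigned long size; } table;
typedef struct toydict { table ht[2]; long rehashidx; } toydict;

static int is_rehashing(const toydict *d) { return d->rehashidx != -1; }

/* Look in ht[0] first; consult ht[1] only while a rehash is in progress. */
static entry *toy_find(toydict *d, unsigned long key) {
    if (d->ht[0].size == 0) return NULL;
    for (int t = 0; t <= 1; t++) {
        table *ht = &d->ht[t];
        for (entry *e = ht->buckets[key & (ht->size - 1)]; e; e = e->next)
            if (e->key == key) return e;
        if (!is_rehashing(d)) return NULL; /* ht[1] is unused: stop early */
    }
    return NULL;
}

/* While rehashing, new entries go to ht[1]; otherwise they go to ht[0]. */
static void toy_add(toydict *d, entry *e) {
    table *ht = is_rehashing(d) ? &d->ht[1] : &d->ht[0];
    unsigned long idx = e->key & (ht->size - 1);
    e->next = ht->buckets[idx];
    ht->buckets[idx] = e;
}
```

Sending new entries to ht[1] means ht[0] never gains entries during a rehash, so the migration is guaranteed to drain it eventually.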