Why Redis needs expiration and eviction
Redis is an in-memory database: all of its data lives in memory, and memory is finite. If too much data piles up, memory fills, no more data can be written, and Redis effectively becomes unavailable. To stay usable, Redis removes expired keys and, when memory runs out, evicts keys.
Generally, if maxmemory is not set (or is set to 0), Redis does not limit its memory usage on a 64-bit system; on a 32-bit system it uses at most 3 GB.
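For reference, a minimal illustrative redis.conf fragment (the values below are examples, not recommendations):
# Cap the memory Redis may use for data; 0 (the 64-bit default) means no limit.
maxmemory 100mb
# What to do once the limit is reached; the policies are covered later in this article.
maxmemory-policy noeviction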
Expiration policies
Concept: keys in Redis can be given an expiration time (TTL); once that time is up, the data must be removed from Redis.
An analogy: the fridge at home is full, so we need to find the food that has gone past its date and throw it out. There are generally two ways to catch expired items:
- Check an item when we are about to use it, and throw it away if it has expired.
- Clean out the fridge periodically and discard anything that has expired.
Just like the fridge, Redis has two expiration strategies: lazy expiration and periodic expiration.
Lazy expiration (passive expiration)
Lazy expiration (passive expiration): every time a key is accessed or modified, Redis checks whether that key has expired and, if so, deletes it.
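A rough sketch of the idea in C (not the actual Redis code: the struct, field names and helper below are invented for illustration; Redis itself keeps TTLs in a separate db->expires dict and performs this check in its key-lookup path):
#include <stdio.h>
#include <time.h>
/* Toy entry type just for this sketch. */
typedef struct {
    const char *key;
    long long expire_at_ms;   /* absolute expire time in ms, -1 = no TTL */
    int deleted;
} entry;
static long long now_ms(void) {
    return (long long)time(NULL) * 1000;
}
/* Lazy expiration: every access first checks the TTL and deletes the key
 * on the spot if the TTL has already passed. */
static entry *lookup(entry *e) {
    if (e->deleted) return NULL;
    if (e->expire_at_ms != -1 && e->expire_at_ms < now_ms()) {
        e->deleted = 1;                /* delete lazily, on access */
        return NULL;
    }
    return e;
}
int main(void) {
    entry e = { "cache:user:1", now_ms() - 1000, 0 };   /* expired 1 s ago */
    printf("%s\n", lookup(&e) ? "hit" : "expired: deleted on access");
    return 0;
}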
Periodic expiration
Periodic expiration: Redis runs a scheduled task that periodically checks for expired data and deletes it.
How often does it run?
server.c contains the serverCron function, which drives Redis's periodic tasks. Its frequency is set by the hz value in redis.conf; the default is 10, meaning it runs every 100 ms, i.e. 10 times per second.
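For reference, the relevant setting (illustrative redis.conf fragment):
# serverCron runs server.hz times per second, i.e. once every 1000/hz ms.
# The default of 10 means one run every 100 ms.
hz 10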
How it works
- serverCron triggers the cleanup on a schedule; the frequency comes from the hz value in redis.conf.
- The cleanup does not scan every key, only the keys that have an expiration set (redisDb.expires).
- It does not grab all of those keys at once either. It walks the expires dictionary bucket by bucket until it has sampled 20 keys (configurable). If the 20th key falls partway through a bucket, it still finishes that bucket: say the first bucket yields 10 keys (short of 20), it moves on to the second bucket; if that one holds 30 keys, the whole bucket is scanned, for a total of 40 keys. Besides the 20-key target, a single pass also checks at most 400 buckets.
- Any keys in the sample that have expired are deleted.
- The result is then checked: if (1) nothing was sampled (for instance, 400 buckets were scanned and all were empty), or (2) the ratio of deleted keys to sampled keys exceeds 10% (configurable), steps 3 and 4 are repeated.
- The looping is not unbounded, though: every 16 iterations the elapsed time is checked, and the loop exits once it exceeds a time budget. This loop is how Redis trades CPU time against memory (a rough worked example of the budget follows).
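A rough worked example of that time budget, assuming the defaults ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC = 25 and hz = 10 (the timelimit computation appears in the source below):
timelimit = 25 * 1000000 / 10 / 100 = 25000 microseconds = 25 ms
So one slow expire cycle may spend at most about 25 ms of CPU, i.e. roughly 25% of each 100 ms serverCron period.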
Implementation flow chart
Source code analysis
Entry point (server.c)
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
//......
//entry point: databasesCron drives per-database housekeeping, including active expiration
databasesCron();
//......
//server.hz is the run frequency; it is configurable, defaults to 10, and means one run every 100 ms, 10 runs per second
return 1000/server.hz;
}
The databasesCron function (server.c)
void databasesCron(void) {
/* Expire keys by random sampling. Not required for slaves
* as master will synthesize DELs for us. */
if (server.active_expire_enabled) {
if (iAmMaster()) {
//run the active expire cycle
activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);
} else {
expireSlaveKeys();
}
}
//......
}
The activeExpireCycle function (expire.c)
void activeExpireCycle(int type) {
/* Adjust the running parameters according to the configured expire
* effort. The default effort is 1, and the maximum configurable effort
* is 10. */
unsigned long
effort = server.active_expire_effort-1, /* Rescale from 0 to 9. */
config_keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP +
ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort,
config_cycle_fast_duration = ACTIVE_EXPIRE_CYCLE_FAST_DURATION +
ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort,
config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC +
2*effort,
config_cycle_acceptable_stale = ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE-
effort;
/* This function has some global state in order to continue the work
* incrementally across calls. */
static unsigned int current_db = 0; /* Last DB tested. */
static int timelimit_exit = 0; /* Time limit hit in previous call? */
static long long last_fast_cycle = 0; /* When last fast cycle ran. */
int j, iteration = 0;
int dbs_per_call = CRON_DBS_PER_CALL;
long long start = ustime(), timelimit, elapsed;
/* When clients are paused the dataset should be static not just from the
* POV of clients not being able to write, but also from the POV of
* expires and evictions of keys not being performed. */
if (clientsArePaused()) return;
if (type == ACTIVE_EXPIRE_CYCLE_FAST) {
/* Don't start a fast cycle if the previous cycle did not exit
* for time limit, unless the percentage of estimated stale keys is
* too high. Also never repeat a fast cycle for the same period
* as the fast cycle total duration itself. */
if (!timelimit_exit &&
server.stat_expired_stale_perc < config_cycle_acceptable_stale)
return;
if (start < last_fast_cycle + (long long)config_cycle_fast_duration*2)
return;
last_fast_cycle = start;
}
/* We usually should test CRON_DBS_PER_CALL per iteration, with
* two exceptions:
*
* 1) Don't test more DBs than we have.
* 2) If last time we hit the time limit, we want to scan all DBs
* in this iteration, as there is work to do in some DB and we don't want
* expired keys to use memory for too much time. */
if (dbs_per_call > server.dbnum || timelimit_exit)
dbs_per_call = server.dbnum;
/* We can use at max 'config_cycle_slow_time_perc' percentage of CPU
* time per iteration. Since this function gets called with a frequency of
* server.hz times per second, the following is the max amount of
* microseconds we can spend in this function. */
timelimit = config_cycle_slow_time_perc*1000000/server.hz/100;
timelimit_exit = 0;
if (timelimit <= 0) timelimit = 1;
if (type == ACTIVE_EXPIRE_CYCLE_FAST)
timelimit = config_cycle_fast_duration; /* in microseconds. */
/* Accumulate some global stats as we expire keys, to have some idea
* about the number of keys that are already logically expired, but still
* existing inside the database. */
long total_sampled = 0;
long total_expired = 0;
//loop over the databases; dbs_per_call defaults to CRON_DBS_PER_CALL (16)
for (j = 0; j < dbs_per_call && timelimit_exit == 0; j++) {
/* Expired and checked in a single loop. */
unsigned long expired, sampled;
redisDb *db = server.db+(current_db % server.dbnum);
/* Increment the DB now so we are sure if we run out of time
* in the current DB we'll restart from the next. This allows to
* distribute the time evenly across DBs. */
current_db++;
/* Continue to expire if at the end of the cycle there are still
* a big percentage of keys to expire, compared to the number of keys
* we scanned. The percentage, stored in config_cycle_acceptable_stale
* is not fixed, but depends on the Redis configured "expire effort". */
do {
unsigned long num, slots;
long long now, ttl_sum;
int ttl_samples;
iteration++;
/* If there is nothing to expire try next DB ASAP. */
if ((num = dictSize(db->expires)) == 0) {
db->avg_ttl = 0;
break;
}
slots = dictSlots(db->expires);
now = mstime();
/* When there are less than 1% filled slots, sampling the key
* space is expensive, so stop here waiting for better times...
* The dictionary will be resized asap. */
if (num && slots > DICT_HT_INITIAL_SIZE &&
(num*100/slots < 1)) break;
/* The main collection cycle. Sample random keys among keys
* with an expire set, checking for expired ones. */
expired = 0;
sampled = 0;
ttl_sum = 0;
ttl_samples = 0;
//config_keys_per_loop defaults to 20: sample at most 20 keys per pass
if (num > config_keys_per_loop)
num = config_keys_per_loop;
/* Here we access the low level representation of the hash table
* for speed concerns: this makes this code coupled with dict.c,
* but it hardly changed in ten years.
*
* Note that certain places of the hash table may be empty,
* so we want also a stop condition about the number of
* buckets that we scanned. However scanning for free buckets
* is very fast: we are in the cache line scanning a sequential
* array of NULL pointers, so we can scan a lot more buckets
* than keys in the same time. */
long max_buckets = num*20;
long checked_buckets = 0;
//keep scanning while fewer than num (20) keys have been sampled and fewer than max_buckets (20*num = 400) buckets checked
while (sampled < num && checked_buckets < max_buckets) {
//loop over both hash tables (ht[0] and ht[1]) in case a rehash is in progress
//sample keys that carry an expiration from db->expires
for (int table = 0; table < 2; table++) {
if (table == 1 && !dictIsRehashing(db->expires)) break;
unsigned long idx = db->expires_cursor;
idx &= db->expires->ht[table].sizemask;
dictEntry *de = db->expires->ht[table].table[idx];
long long ttl;
/* Scan the current bucket of the current table. */
checked_buckets++;
//walk the linked list inside this bucket
while(de) {
/* Get the next entry now since this entry may get
* deleted. */
dictEntry *e = de;
de = de->next;
ttl = dictGetSignedIntegerVal(e)-now;
//delete the entry if it has expired
if (activeExpireCycleTryExpire(db,e,now)) expired++;
if (ttl > 0) {
/* We want the average TTL of keys yet
* not expired. */
ttl_sum += ttl;
ttl_samples++;
}
sampled++;
}
}
//cursor recording which bucket was reached; the next run resumes scanning from expires_cursor
db->expires_cursor++;
}
total_expired += expired;
total_sampled += sampled;
/* Update the average TTL stats for this database. */
if (ttl_samples) {
long long avg_ttl = ttl_sum/ttl_samples;
/* Do a simple running average with a few samples.
* We just use the current estimate with a weight of 2%
* and the previous estimate with a weight of 98%. */
if (db->avg_ttl == 0) db->avg_ttl = avg_ttl;
db->avg_ttl = (db->avg_ttl/50)*49 + (avg_ttl/50);
}
/* We can't block forever here even if there are many keys to
* expire. So after a given amount of milliseconds return to the
* caller waiting for the other active expire cycle. */
//every 16 iterations, check whether the time budget has been exceeded; if so, exit the loop
if ((iteration & 0xf) == 0) { /* check once every 16 iterations. */
elapsed = ustime()-start;
if (elapsed > timelimit) {
timelimit_exit = 1;
server.stat_expired_time_cap_reached_count++;
break;
}
}
/* We don't repeat the cycle for the current database if there are
* an acceptable amount of stale keys (logically expired but yet
* not reclaimed). */
// repeat while nothing was sampled || (expired keys * 100 / sampled keys) > the acceptable-stale threshold (10% by default)
} while (sampled == 0 ||
(expired*100/sampled) > config_cycle_acceptable_stale);
}
elapsed = ustime()-start;
server.stat_expire_cycle_time_used += elapsed;
latencyAddSampleIfNeeded("expire-cycle",elapsed/1000);
/* Update our estimate of keys existing but yet to be expired.
* Running average with this sample accounting for 5%. */
double current_perc;
if (total_sampled) {
current_perc = (double)total_expired/total_sampled;
} else
current_perc = 0;
server.stat_expired_stale_perc = (current_perc*0.05)+
(server.stat_expired_stale_perc*0.95);
}
Eviction policies
Overview
- Suppose the keys in memory have no expiration set, or their expirations have not yet arrived. Everything in memory is still valid, yet memory can still fill up. When it is full and new data has to be written, Redis has no choice but to delete some existing data. Which data gets deleted is what the eviction policy decides.
- Eviction never deletes more than necessary: it only deletes until the new data fits.
- Back to the fridge analogy: the fridge is full, everything in it is still good, and we have just bought new groceries that need to go in, so something has to be cleared out, reluctantly. The usual options:
- Buy a new fridge. ---------------add machines / add memory
- Don't put the new groceries in; wait for the surplus to be eaten. ---------------noeviction
- Throw out what has gone uneaten the longest ---------------LRU
- Throw out what is eaten least often ----------LFU
- Throw out what is about to expire -----------ttl
- Throw out some items at random -----------Random
- Redis provides 8 eviction policies; the one to use is selected with maxmemory-policy in the config, as shown below (a runtime CONFIG SET example follows the list of policies).
# maxmemory-policy noeviction //default: evict nothing; Redis can still be read but not written
- The 8 policies are:
- noeviction: the default; nothing is evicted, so Redis can serve reads but rejects writes.
- allkeys-lru: approximate LRU, evicting from all keys.
- allkeys-lfu: approximate LFU, evicting from all keys.
- allkeys-random: random eviction from all keys.
- volatile-lru: approximate LRU, evicting only keys that have an expiration.
- volatile-lfu: approximate LFU, evicting only keys that have an expiration.
- volatile-random: random eviction from keys that have an expiration.
- volatile-ttl: evicts by expiration time, removing the keys closest to expiring.
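Both the policy and the sample size used by the approximate LRU/LFU can also be changed at runtime with CONFIG SET, without a restart; for example:
CONFIG SET maxmemory-policy allkeys-lru
CONFIG SET maxmemory-samples 10
CONFIG GET maxmemory-policy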
Eviction flow
Flow chart
Flow description
- When a client runs a command, Redis checks (in a loop) whether there is enough memory for what the command needs. If there is, the command executes.
- If memory is insufficient, Redis checks whether the current policy is noeviction.
- If it is, an OOM error is returned to the client.
- If it is not, Redis draws a random sample of keys, with the sample size given by maxmemory-samples, and iterates over the sample.
- For each sampled key it computes an eviction score (idle) according to the configured policy.
- It then decides whether the sampled key may enter the eviction pool, an array that holds candidate keys for deletion.
- If the pool is not full, the sampled key goes straight in.
- If the pool is full but holds entries with a lower score than the sampled key, the entry with the lowest score is dropped and the sampled key is inserted.
- Once the whole sample has been processed, the pool is trimmed from the tail: the right-most entry (the best candidate) is evicted.
- After each eviction, Redis re-checks whether memory is now sufficient; if it is, eviction stops, otherwise steps 3 and 4 are repeated.
Source code analysis
The freeMemoryIfNeeded function (evict.c)
int freeMemoryIfNeeded(void) {
int keys_freed = 0;
/* By default replicas should ignore maxmemory
* and just be masters exact copies. */
//replicas ignore maxmemory and simply mirror the master
if (server.masterhost && server.repl_slave_ignore_maxmemory) return C_OK;
size_t mem_reported, mem_tofree, mem_freed;
mstime_t latency, eviction_latency, lazyfree_latency;
long long delta;
int slaves = listLength(server.slaves);
int result = C_ERR;
/* When clients are paused the dataset should be static not just from the
* POV of clients not being able to write, but also from the POV of
* expires and evictions of keys not being performed. */
if (clientsArePaused()) return C_OK;
//check memory usage; if we are still under the limit, return right away
if (getMaxmemoryState(&mem_reported,NULL,&mem_tofree,NULL) == C_OK)
return C_OK;
mem_freed = 0;
latencyStartMonitor(latency);
//if the policy is noeviction nothing may be freed, so report OOM
if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION)
goto cant_free; /* We need to free memory, but policy forbids. */
//reaching this point means memory is over the limit: keep looping until enough memory has been freed
while (mem_freed < mem_tofree) {
int j, k, i;
static unsigned int next_db = 0;
sds bestkey = NULL;
int bestdbid;
redisDb *db;
dict *dict;
dictEntry *de;
//LRU, LFU and volatile-ttl policies
if (server.maxmemory_policy & (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU) ||
server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL)
{
//the eviction pool, EVPOOL_SIZE entries (16 by default)
struct evictionPoolEntry *pool = EvictionPoolLRU;
//loop until a suitable key to evict has been found
while(bestkey == NULL) {
unsigned long total_keys = 0, keys;
/* We don't want to make local-db choices when expiring keys,
* so to start populate the eviction pool sampling keys from
* every DB. */
//sample from every DB
for (i = 0; i < server.dbnum; i++) {
db = server.db+i;
//choose the candidate set: all keys, or only keys with an expiration
dict = (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) ?
db->dict : db->expires;
if ((keys = dictSize(dict)) != 0) {
//key step: sample from the candidate set and push the best eviction candidates into the pool
evictionPoolPopulate(i, dict, db->dict, pool);
total_keys += keys;
}
}
//no candidate keys at all (every candidate dict is empty)
if (!total_keys) break; /* No keys to evict. */
//walk the eviction pool
/* Go backward from best to worst element to evict. */
for (k = EVPOOL_SIZE-1; k >= 0; k--) {
if (pool[k].key == NULL) continue;
bestdbid = pool[k].dbid;
if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) {
de = dictFind(server.db[pool[k].dbid].dict,
pool[k].key);
} else {
de = dictFind(server.db[pool[k].dbid].expires,
pool[k].key);
}
/* Remove the entry from the pool. */
if (pool[k].key != pool[k].cached)
sdsfree(pool[k].key);
pool[k].key = NULL;
pool[k].idle = 0;
/* If the key exists, is our pick. Otherwise it is
* a ghost and we need to try the next element. */
if (de) {
bestkey = dictGetKey(de);
break;
} else {
/* Ghost... Iterate again. */
}
}
}
}
/* volatile-random and allkeys-random policy */
else if (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM ||
server.maxmemory_policy == MAXMEMORY_VOLATILE_RANDOM)
{
/* When evicting a random key, we try to evict a key for
* each DB, so we use the static 'next_db' variable to
* incrementally visit all DBs. */
for (i = 0; i < server.dbnum; i++) {
j = (++next_db) % server.dbnum;
db = server.db+j;
dict = (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM) ?
db->dict : db->expires;
if (dictSize(dict) != 0) {
de = dictGetRandomKey(dict);
bestkey = dictGetKey(de);
bestdbid = j;
break;
}
}
}
/* Finally remove the selected key. */
// remove the selected key
if (bestkey) {
db = server.db+bestdbid;
robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
propagateExpire(db,keyobj,server.lazyfree_lazy_eviction);
/* We compute the amount of memory freed by db*Delete() alone.
* It is possible that actually the memory needed to propagate
* the DEL in AOF and replication link is greater than the one
* we are freeing removing the key, but we can't account for
* that otherwise we would never exit the loop.
*
* Same for CSC invalidation messages generated by signalModifiedKey.
*
* AOF and Output buffer memory will be freed eventually so
* we only care about memory used by the key space. */
delta = (long long) zmalloc_used_memory();
latencyStartMonitor(eviction_latency);
//lazy (asynchronous) eviction if enabled
if (server.lazyfree_lazy_eviction)
dbAsyncDelete(db,keyobj);
else
//synchronous eviction
dbSyncDelete(db,keyobj);
latencyEndMonitor(eviction_latency);
latencyAddSampleIfNeeded("eviction-del",eviction_latency);
delta -= (long long) zmalloc_used_memory();
mem_freed += delta;
server.stat_evictedkeys++;
signalModifiedKey(NULL,db,keyobj);
notifyKeyspaceEvent(NOTIFY_EVICTED, "evicted",
keyobj, db->id);
decrRefCount(keyobj);
keys_freed++;
/* When the memory to free starts to be big enough, we may
* start spending so much time here that is impossible to
* deliver data to the slaves fast enough, so we force the
* transmission here inside the loop. */
if (slaves) flushSlavesOutputBuffers();
/* Normally our stop condition is the ability to release
* a fixed, pre-computed amount of memory. However when we
* are deleting objects in another thread, it's better to
* check, from time to time, if we already reached our target
* memory, since the "mem_freed" amount is computed only
* across the dbAsyncDelete() call, while the thread can
* release the memory all the time. */
if (server.lazyfree_lazy_eviction && !(keys_freed % 16)) {
if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
/* Let's satisfy our stop condition. */
mem_freed = mem_tofree;
}
}
} else {
goto cant_free; /* nothing to free... */
}
}
result = C_OK;
cant_free:
/* We are here if we are not able to reclaim memory. There is only one
* last thing we can try: check if the lazyfree thread has jobs in queue
* and wait... */
if (result != C_OK) {
latencyStartMonitor(lazyfree_latency);
while(bioPendingJobsOfType(BIO_LAZY_FREE)) {
if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
result = C_OK;
break;
}
usleep(1000);
}
latencyEndMonitor(lazyfree_latency);
latencyAddSampleIfNeeded("eviction-lazyfree",lazyfree_latency);
}
latencyEndMonitor(latency);
latencyAddSampleIfNeeded("eviction-cycle",latency);
return result;
}
The evictionPoolPopulate function (evict.c)
/* evictionPoolPopulate is a helper of freeMemoryIfNeeded(): when a key has to be evicted, it moves some sampled entries into the eviction pool.
 * A sampled entry is added to the pool if its idle score is greater than that of some entry already in the pool.
 * If the pool still has empty slots, a sampled entry is always added.
 * Entries in the pool are sorted by idle in ascending order: smallest on the left, largest on the right.
 */
void evictionPoolPopulate(int dbid, dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
int j, k, count;
//the sampled entries
dictEntry *samples[server.maxmemory_samples];
//randomly draw maxmemory_samples keys from the candidate dictionary
count = dictGetSomeKeys(sampledict,samples,server.maxmemory_samples);
//iterate over the sampled entries
for (j = 0; j < count; j++) {
unsigned long long idle;
sds key;
robj *o;
dictEntry *de;
de = samples[j];
key = dictGetKey(de);
/* If the dictionary we are sampling from is not the main
* dictionary (but the expires one) we need to lookup the key
* again in the key dictionary to obtain the value object. */
//for volatile-ttl the score comes straight from the expires dict, so the value object is not needed; every other policy has to look the value object up in the main key space
if (server.maxmemory_policy != MAXMEMORY_VOLATILE_TTL) {
if (sampledict != keydict) de = dictFind(keydict, key);
o = dictGetVal(de);
}
/* Calculate the idle time according to the policy. This is called
* idle just because the code initially handled LRU, but is in fact
* just a score where an higher score means better candidate. */
//compute the eviction score: the higher the score, the sooner the key is evicted
//LRU policies: the score is how long the object has gone without being accessed
if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) {
//estimate the idle time (the eviction weight)
idle = estimateObjectIdleTime(o);
}
//LFU policies: take the access frequency and invert it (255 - counter), so a larger idle still means a better eviction candidate
else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
/* When we use an LRU policy, we sort the keys by idle time
* so that we expire keys starting from greater idle time.
* However when the policy is an LFU one, we have a frequency
* estimation, and we want to evict keys with lower frequency
* first. So inside the pool we put objects using the inverted
* frequency subtracting the actual frequency to the maximum
* frequency of 255. */
idle = 255-LFUDecrAndReturn(o);
}
//volatile-ttl: the score comes directly from the expire time (the sooner it expires, the higher the score)
else if (server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) {
/* In this case the sooner the expire the better. */
idle = ULLONG_MAX - (long)dictGetVal(de);
} else {
serverPanic("Unknown eviction policy in evictionPoolPopulate()");
}
/* Insert the element inside the pool.
* First, find the first empty bucket or the first populated
* bucket that has an idle time smaller than our idle time. */
//with the idle score computed, insert the sampled key into the pool
k = 0;
//advance k to the first empty slot, or the first entry whose idle is >= the current key's idle
while (k < EVPOOL_SIZE &&
pool[k].key &&
pool[k].idle < idle) k++;
//k == 0 means the loop did not advance: every entry in the pool has a larger idle than this key, and the last slot is occupied, so the pool is full; the keys already in the pool are better candidates, skip this one
if (k == 0 && pool[EVPOOL_SIZE-1].key != NULL) {
/* Can't insert if the element is < the worst element we have
* and there are no empty buckets. */
continue;
}
//k points at an empty slot: this key's idle is at least as large as every populated entry, so it can be inserted there directly
else if (k < EVPOOL_SIZE && pool[k].key == NULL) {
/* Inserting into empty position. No setup needed before insert. */
}
//inserting in the middle: existing pool entries must be shifted
else {
/* Inserting in the middle. Now k points to the first element
* greater than the element to insert. */
//if the right-most slot is free, shift the entries from k onward one position to the right
if (pool[EVPOOL_SIZE-1].key == NULL) {
/* Free space on the right? Insert at k shifting
* all the elements from k to end to the right. */
/* Save SDS before overwriting. */
sds cached = pool[EVPOOL_SIZE-1].cached;
memmove(pool+k+1,pool+k,
sizeof(pool[0])*(EVPOOL_SIZE-k-1));
pool[k].cached = cached;
}
//if the right-most slot is occupied, shift the entries to the left of k one position left, pushing the left-most (smallest idle) entry out of the pool
else {
/* No free space on right? Insert at k-1 */
k--;
/* Shift all elements on the left of k (included) to the
* left, so we discard the element with smaller idle time. */
sds cached = pool[0].cached; /* Save SDS before overwriting. */
if (pool[0].key != pool[0].cached) sdsfree(pool[0].key);
memmove(pool,pool+1,sizeof(pool[0])*k);
pool[k].cached = cached;
}
}
/* Try to reuse the cached SDS string allocated in the pool entry,
* because allocating and deallocating this object is costly
* (according to the profiler, not my fantasy. Remember:
* premature optimization bla bla bla. */
//finally, store the current key in slot k of the pool
int klen = sdslen(key);
if (klen > EVPOOL_CACHED_SDS_SIZE) {
pool[k].key = sdsdup(key);
} else {
memcpy(pool[k].cached,key,klen+1);
sdssetlen(pool[k].cached,klen);
pool[k].key = pool[k].cached;
}
pool[k].idle = idle;
pool[k].dbid = dbid;
}
}
The LRU algorithm
Overview
LRU (Least Recently Used) evicts what has gone unused the longest: the longer since the last access, the sooner it is evicted. For Redis, the earlier a key's last access, the more likely it is to be evicted.
Redis implements an approximate LRU: rather than running LRU over the entire data set, it runs it over a random sample. The maxmemory-samples setting controls the sample size; the larger it is, the more accurate the algorithm, but the more it costs.
maxmemory-samples 5 //the default is 5
How it works
How do we know how long an object has gone unaccessed?
- LRU evicts based on when an object was last accessed, so we need that object's last access time. Once we have it, comparing it with the current system time tells us how long the object has gone unaccessed.
- In Redis, every value is wrapped in a redisObject, which carries a field named lru.
The redisObject struct (server.h)
typedef struct redisObject {
unsigned type:4;
unsigned encoding:4;
unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
* LFU data (least significant 8 bits frequency
* and most significant 16 bits access time). */
int refcount;
void *ptr;
} robj;
- redisObject.lru is 24 bits wide and stores the low 24 bits of the object's last access time in seconds (2^24 seconds ≈ 16.78 million seconds, roughly 194 days). The low 24 bits are obtained like this:
long seconds = System.currentTimeMillis() / 1000; //current time in seconds
long lru = seconds & ((1 << 24) - 1); //AND with 24 one-bits to keep only the low 24 bits
- Knowing the object's last access time, we can work out how long it has gone unaccessed:
idle time = (low 24 bits of the current time in seconds) - (low 24 bits of the object's last access time in seconds)
//let lruclock = the low 24 bits of the current time in seconds
idle time = lruclock - redisObject.lru
- There is a catch: 24 bits has an upper bound, 24 one-bits, about 194 days' worth of seconds. As time keeps moving forward, after 194 days the low 24 bits of the system clock wrap back to 24 zero-bits. Redis therefore treats the clock as circular: past 24 bits it starts over from 0. So we cannot blindly subtract the object's lru from the low 24 bits of the clock; we have to compare the two values first.
In everyday life we sometimes work out how many months have passed. For example:
- I last travelled in May this year and it is now July: 7 - 5 = 2 months have passed.
- I last travelled in May last year and it is now March: 12 + 3 - 5 = 10 months have passed.
By the same logic:
If redisObject.lru < lruclock, the object's idle time is lruclock - redisObject.lru.
If redisObject.lru > lruclock, the object's idle time is lruclock + (24-bit maximum) - redisObject.lru.
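A quick worked example with made-up numbers, using the 24-bit maximum LRU_CLOCK_MAX = 2^24 - 1 = 16777215 from the source below: suppose the object was last touched just before the clock wrapped, o->lru = 16777000, and the clock has since wrapped around to lruclock = 100. Then
idle = 100 + (16777215 - 16777000) = 315 seconds
which estimateObjectIdleTime then converts to milliseconds by multiplying by LRU_CLOCK_RESOLUTION (1000).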
- One more wrinkle: if time keeps moving and two full 194-day periods go by, the low 24 bits of the system clock still end up back at 24 zero-bits. How is the idle time computed then?
- I last travelled in May the year before last and it is now March: how many months have passed?
Redis's approximate LRU was never meant to be exact, so it only accounts for a single wrap-around; multiple wrap-arounds are treated as one. In other words, May of the year before last is handled as if it were May of last year.
LRU algorithm flow chart
Source code analysis
The estimateObjectIdleTime function (evict.c)
unsigned long long estimateObjectIdleTime(robj *o) {
//take the low 24 bits of the current time in seconds
unsigned long long lruclock = LRU_CLOCK();
//compare lruclock (the current clock) with the object's lru field
if (lruclock >= o->lru) {
//if lruclock >= robj.lru, return lruclock - o->lru, converted to milliseconds
//the smaller robj.lru is, the larger the result, and the larger the result the sooner the object is evicted
return (lruclock - o->lru) * LRU_CLOCK_RESOLUTION;
} else {
//if lruclock < robj.lru the clock has wrapped: return lruclock + (LRU_CLOCK_MAX - o->lru), converted to milliseconds
//LRU_CLOCK_MAX is the largest 24-bit value, 2^24 - 1
return (lruclock + (LRU_CLOCK_MAX - o->lru)) *
LRU_CLOCK_RESOLUTION;
}
}
The LFU algorithm
LFU (Least Frequently Used) evicts the least frequently used data first. Its yardstick is an access count: the fewer accesses, the sooner a key is evicted. Each access adds 1 to the count; at eviction time the counts are compared and the smallest go first. Plain LFU, however, has a fatal freshness problem.
The freshness problem
An example: last year a celebrity scandal went viral and gathered 40 million clicks. This year a fresh news story has just been published and has 100 clicks. We would want this year's story to be the one that stays; last year's story, however hot it once was, should now rank below it. But under the naive LFU above, the entry with fewer accesses is evicted first, so this year's story is the one that gets evicted. The result: new data cannot get in, and old data never leaves.
That is a serious problem, and Redis naturally addresses it. How?
Another everyday analogy: many of us pay for memberships on sites such as iQiyi or Bilibili. On some of them, if I stop paying or let my VIP lapse, my membership level decays over time; I might be V6 today but drop to V4 after a year without paying.
Redis handles freshness the same way: every so often it decreases a key's access count, so old data with a once-high count sees that count shrink as time passes, until the key becomes evictable.
How it works
- In Redis, every value is wrapped in a redisObject, which carries a field named lru.
The redisObject struct (server.h)
typedef struct redisObject {
unsigned type:4;
unsigned encoding:4;
unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
* LFU data (least significant 8 bits frequency
* and most significant 16 bits access time). */
int refcount;
void *ptr;
} robj;
Under LFU, the high 16 bits of redisObject.lru store a time and the low 8 bits store the object's access frequency; call the low 8 bits the counter.
- The high 16 bits hold the object's last access time in minutes, which tells us how many minutes the object has gone unaccessed. Redis has a setting, lfu-decay-time, that says how many idle minutes it takes to decay the counter by one step.
lfu-decay-time 1 //decay the counter by one per this many idle minutes
So:
decay steps (num_periods) = minutes elapsed since the object's last access / server.lfu_decay_time
The decay is implemented in LFUDecrAndReturn (evict.c):
unsigned long LFUDecrAndReturn(robj *o) {
//shift lru right by 8 bits to get the high 16 bits: the last access time in minutes
unsigned long ldt = o->lru >> 8;
//AND lru with 255 (the largest 8-bit value) to get the 8-bit counter
unsigned long counter = o->lru & 255;
//if lfu_decay_time is configured, divide LFUTimeElapsed(ldt) by it:
//total idle minutes / minutes-per-decay-step = number of decay steps
unsigned long num_periods = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
if (num_periods)
//never go negative: otherwise subtract the decay steps from the counter
counter = (num_periods > counter) ? 0 : counter - num_periods;
return counter;
}
- One more problem: the 8-bit counter tops out at 255, which is not enough to count accesses directly. Redis works around this by making 255 very hard to reach:
- Call the 8-bit value counter. It is capped at 255 and never incremented past it, but in practice reaching 255 is unlikely, so the counter can represent a very large volume of accesses.
- Increments are probabilistic: the probability depends on a base value (LFU_INIT_VAL), the current counter, and the server.lfu_log_factor setting. The larger the counter, the lower the probability of incrementing; the larger lfu-log-factor is, the lower the probability as well.
- If counter <= 5 (LFU_INIT_VAL), every access increments counter by 1.
- If 5 < counter < 255, the probability of incrementing shrinks as counter grows.
The source is LFULogIncr (evict.c):
uint8_t LFULogIncr(uint8_t counter) {
//already at the 8-bit maximum of 255: stay there
if (counter == 255) return 255;
//random number in [0, 1]
double r = (double)rand()/RAND_MAX;
//LFU_INIT_VAL is the base value, defined in server.h, default 5: below it, every access increments the counter
double baseval = counter - LFU_INIT_VAL;
//clamp baseval at 0
if (baseval < 0) baseval = 0;
//server.lfu_log_factor defaults to 10, so p <= 1; the larger the counter, the smaller p
double p = 1.0/(baseval*server.lfu_log_factor+1);
//the smaller p is, the less likely the counter is incremented
if (r < p) counter++;
return counter;
}
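A quick worked example under the defaults (LFU_INIT_VAL = 5, lfu-log-factor = 10), simply plugging numbers into the p formula above:
counter = 5   -> baseval = 0   -> p = 1/(0*10+1)   = 1      (every access increments)
counter = 15  -> baseval = 10  -> p = 1/(10*10+1)  ≈ 0.0099 (roughly 1 in 100 accesses)
counter = 105 -> baseval = 100 -> p = 1/(100*10+1) ≈ 0.001  (roughly 1 in 1000 accesses)
So the counter grows roughly logarithmically with the real access count, which is why 8 bits are enough in practice.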
References
- 《咕泡云课堂》
- Official documentation: https://redis.io/docs/manual/eviction/