Redis Series (4): Expiration and Eviction Policies in Redis Memory Management

Why do expiration and eviction mechanisms exist?

Redis is an in-memory database: all of its data lives in RAM, and RAM is finite. If data keeps accumulating, memory eventually fills up and no new data can be written, at which point Redis is effectively unavailable. To preserve availability, Redis removes expired keys and, when memory runs out, evicts keys.

Generally, if maxmemory is not set (or is set to 0), Redis does not limit its own memory usage on 64-bit systems, while on 32-bit systems it uses at most 3GB.
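
The limit itself is set with the maxmemory directive in redis.conf (it can also be changed at runtime with CONFIG SET maxmemory); the 4gb value below is only an example:

maxmemory 4gb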

Expiration policy

Concept: data in redis can be given an expiration time; once that time arrives, the data needs to be deleted from redis.

An analogy: when the fridge at home is full, we need to find the expired food and throw it away. There are generally two ways to catch expired items:

  1. When you go to use an item, check whether it has expired; if it has, throw it away.
  2. Clean out the fridge periodically and throw away whatever has expired.

Just like the fridge, redis has two expiration strategies: lazy expiration and periodic expiration.

Lazy expiration (passive expiration)

Lazy expiration (passive expiration): every time a key is accessed, redis checks whether that key has expired, and deletes it if it has.
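
A quick redis-cli illustration (session:1 is a made-up key): SET ... EX gives the key a TTL, and the GET issued after the TTL has passed finds the key expired, deletes it on access, and returns nil.

127.0.0.1:6379> SET session:1 "abc" EX 1
OK
127.0.0.1:6379> TTL session:1
(integer) 1
(wait for more than 1 second)
127.0.0.1:6379> GET session:1
(nil)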

Periodic expiration

Periodic expiration: redis has a scheduled task that periodically checks for expired data and deletes it.

How often does it run?

server.c contains the serverCron function, which runs redis's various scheduled tasks. Its frequency is controlled by the hz setting in redis.conf; the default is 10, meaning it runs every 100ms, i.e. 10 times per second.
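
The corresponding redis.conf entry (10 is the default; raising it makes periodic tasks, including expiration checks, run more often at the cost of extra CPU):

hz 10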

Implementation flow

  1. The scheduled serverCron task performs the cleanup; its frequency comes from the hz setting in redis.conf.
  2. The cleanup does not scan every key, only the keys that have an expiration time set (redisDb.expires).
  3. The scan does not fetch all of those keys at once either. It walks hash buckets until it has sampled 20 keys (configurable); if the 20th key falls mid-bucket, the current bucket is scanned to the end. For example, if the first bucket yields 10 keys (short of 20), scanning continues into the second bucket; if that bucket holds 30 keys, the whole bucket is scanned, for 40 keys in total. Besides the 20-key limit, a single round also checks at most 400 buckets.
  4. Delete whatever expired keys the scan found.
  5. Evaluate the result: if (1) the scanned buckets (up to 400) were all empty, or (2) the ratio of deleted keys to sampled keys exceeds 10% (configurable), go back and repeat steps 3 and 4 (a worked example follows this list).
  6. The loop is not unbounded either: every 16 iterations it checks the elapsed time and exits once a time budget is exceeded. The loop exists to balance time against space.
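
A worked example of the rule in step 5: if one round samples 40 keys and deletes 8 of them, the stale ratio is 8 * 100 / 40 = 20%, which exceeds 10%, so redis assumes this database still holds plenty of expired keys and immediately runs another round of steps 3 and 4.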

Implementation flowchart

(figure: flowchart of the periodic expiration cycle; not reproduced here)

Source code analysis

Entry point (server.c)
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    //......
    //entry point
    databasesCron();

    //......
    
	//server.hz is the configurable run frequency; the default 10 means once every 100ms, 10 times per second
    return 1000/server.hz;
}
The databasesCron function (server.c)
void databasesCron(void) {
    /* Expire keys by random sampling. Not required for slaves
     * as master will synthesize DELs for us. */
    if (server.active_expire_enabled) {
        if (iAmMaster()) {
			//run the active expire cycle
            activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);
        } else {
            expireSlaveKeys();
        }
    }
    
    //......
}
The activeExpireCycle function (expire.c)
void activeExpireCycle(int type) {
    /* Adjust the running parameters according to the configured expire
     * effort. The default effort is 1, and the maximum configurable effort
     * is 10. */
    unsigned long
    effort = server.active_expire_effort-1, /* Rescale from 0 to 9. */
    config_keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP +
                           ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort,
    config_cycle_fast_duration = ACTIVE_EXPIRE_CYCLE_FAST_DURATION +
                                 ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort,
    config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC +
                                  2*effort,
    config_cycle_acceptable_stale = ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE-
                                    effort;

    /* This function has some global state in order to continue the work
     * incrementally across calls. */
    static unsigned int current_db = 0; /* Last DB tested. */
    static int timelimit_exit = 0;      /* Time limit hit in previous call? */
    static long long last_fast_cycle = 0; /* When last fast cycle ran. */

    int j, iteration = 0;
    int dbs_per_call = CRON_DBS_PER_CALL;
    long long start = ustime(), timelimit, elapsed;

    /* When clients are paused the dataset should be static not just from the
     * POV of clients not being able to write, but also from the POV of
     * expires and evictions of keys not being performed. */
    if (clientsArePaused()) return;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST) {
        /* Don't start a fast cycle if the previous cycle did not exit
         * for time limit, unless the percentage of estimated stale keys is
         * too high. Also never repeat a fast cycle for the same period
         * as the fast cycle total duration itself. */
        if (!timelimit_exit &&
            server.stat_expired_stale_perc < config_cycle_acceptable_stale)
            return;

        if (start < last_fast_cycle + (long long)config_cycle_fast_duration*2)
            return;

        last_fast_cycle = start;
    }

    /* We usually should test CRON_DBS_PER_CALL per iteration, with
     * two exceptions:
     *
     * 1) Don't test more DBs than we have.
     * 2) If last time we hit the time limit, we want to scan all DBs
     * in this iteration, as there is work to do in some DB and we don't want
     * expired keys to use memory for too much time. */
    if (dbs_per_call > server.dbnum || timelimit_exit)
        dbs_per_call = server.dbnum;

    /* We can use at max 'config_cycle_slow_time_perc' percentage of CPU
     * time per iteration. Since this function gets called with a frequency of
     * server.hz times per second, the following is the max amount of
     * microseconds we can spend in this function. */
    timelimit = config_cycle_slow_time_perc*1000000/server.hz/100;
    timelimit_exit = 0;
    if (timelimit <= 0) timelimit = 1;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST)
        timelimit = config_cycle_fast_duration; /* in microseconds. */

    /* Accumulate some global stats as we expire keys, to have some idea
     * about the number of keys that are already logically expired, but still
     * existing inside the database. */
    long total_sampled = 0;
    long total_expired = 0;
	
	//for loop over the databases; dbs_per_call defaults to 16 (CRON_DBS_PER_CALL)
    for (j = 0; j < dbs_per_call && timelimit_exit == 0; j++) {
        /* Expired and checked in a single loop. */
        unsigned long expired, sampled;

        redisDb *db = server.db+(current_db % server.dbnum);

        /* Increment the DB now so we are sure if we run out of time
         * in the current DB we'll restart from the next. This allows to
         * distribute the time evenly across DBs. */
        current_db++;

        /* Continue to expire if at the end of the cycle there are still
         * a big percentage of keys to expire, compared to the number of keys
         * we scanned. The percentage, stored in config_cycle_acceptable_stale
         * is not fixed, but depends on the Redis configured "expire effort". */
        do {
            unsigned long num, slots;
            long long now, ttl_sum;
            int ttl_samples;
            iteration++;

            /* If there is nothing to expire try next DB ASAP. */
            if ((num = dictSize(db->expires)) == 0) {
                db->avg_ttl = 0;
                break;
            }
            slots = dictSlots(db->expires);
            now = mstime();

            /* When there are less than 1% filled slots, sampling the key
             * space is expensive, so stop here waiting for better times...
             * The dictionary will be resized asap. */
            if (num && slots > DICT_HT_INITIAL_SIZE &&
                (num*100/slots < 1)) break;

            /* The main collection cycle. Sample random keys among keys
             * with an expire set, checking for expired ones. */
            expired = 0;
            sampled = 0;
            ttl_sum = 0;
            ttl_samples = 0;
			//config_keys_per_loop defaults to 20: sample at most 20 keys per round
            if (num > config_keys_per_loop)
                num = config_keys_per_loop;

            /* Here we access the low level representation of the hash table
             * for speed concerns: this makes this code coupled with dict.c,
             * but it hardly changed in ten years.
             *
             * Note that certain places of the hash table may be empty,
             * so we want also a stop condition about the number of
             * buckets that we scanned. However scanning for free buckets
             * is very fast: we are in the cache line scanning a sequential
             * array of NULL pointers, so we can scan a lot more buckets
             * than keys in the same time. */
            long max_buckets = num*20;
            long checked_buckets = 0;
			//keep scanning while fewer than num (20) keys have been sampled and fewer than max_buckets (20*20=400) buckets checked
            while (sampled < num && checked_buckets < max_buckets) {
				//loop twice, over ht[0] and ht[1], in case a rehash is in progress
				//fetch keys with an expiration time from db.expires
                for (int table = 0; table < 2; table++) {
                    if (table == 1 && !dictIsRehashing(db->expires)) break;

                    unsigned long idx = db->expires_cursor;
                    idx &= db->expires->ht[table].sizemask;
                    dictEntry *de = db->expires->ht[table].table[idx];
                    long long ttl;

                    /* Scan the current bucket of the current table. */
                    checked_buckets++;
					//walk the linked list inside the bucket
                    while(de) {
                        /* Get the next entry now since this entry may get
                         * deleted. */
                        dictEntry *e = de;
                        de = de->next;

                        ttl = dictGetSignedIntegerVal(e)-now;
						//the deletion: expire the entry if its time is up
                        if (activeExpireCycleTryExpire(db,e,now)) expired++;
                        if (ttl > 0) {
                            /* We want the average TTL of keys yet
                             * not expired. */
                            ttl_sum += ttl;
                            ttl_samples++;
                        }
                        sampled++;
                    }
                }
				//cursor marking which hash bucket the scan has reached; the next round resumes from expires_cursor
                db->expires_cursor++;
            }
            total_expired += expired;
            total_sampled += sampled;

            /* Update the average TTL stats for this database. */
            if (ttl_samples) {
                long long avg_ttl = ttl_sum/ttl_samples;

                /* Do a simple running average with a few samples.
                 * We just use the current estimate with a weight of 2%
                 * and the previous estimate with a weight of 98%. */
                if (db->avg_ttl == 0) db->avg_ttl = avg_ttl;
                db->avg_ttl = (db->avg_ttl/50)*49 + (avg_ttl/50);
            }

            /* We can't block forever here even if there are many keys to
             * expire. So after a given amount of milliseconds return to the
             * caller waiting for the other active expire cycle. */
			//every 16 iterations, check whether the cleanup has used up its time budget; if so, exit the loop
            if ((iteration & 0xf) == 0) { /* check once every 16 iterations. */
                elapsed = ustime()-start;
                if (elapsed > timelimit) {
                    timelimit_exit = 1;
                    server.stat_expired_time_cap_reached_count++;
                    break;
                }
            }
            /* We don't repeat the cycle for the current database if there are
             * an acceptable amount of stale keys (logically expired but yet
             * not reclaimed). */
		// keep looping if nothing was sampled || (expired count * 100 / sampled count) > 10%
        } while (sampled == 0 ||
                 (expired*100/sampled) > config_cycle_acceptable_stale);
    }

    elapsed = ustime()-start;
    server.stat_expire_cycle_time_used += elapsed;
    latencyAddSampleIfNeeded("expire-cycle",elapsed/1000);

    /* Update our estimate of keys existing but yet to be expired.
     * Running average with this sample accounting for 5%. */
    double current_perc;
    if (total_sampled) {
        current_perc = (double)total_expired/total_sampled;
    } else
        current_perc = 0;
    server.stat_expired_stale_perc = (current_perc*0.05)+
                                     (server.stat_expired_stale_perc*0.95);
}

Eviction policy

Overview

  • Suppose the keys in memory have no expiration time set, or their expiration times have not yet arrived. All the data in memory is still valid, yet memory can still fill up. When it is full and new data must be written, redis has no choice but to delete some existing data. That is what the eviction policy does.

  • Eviction never deletes more than necessary: it only deletes until the new data fits.

  • Back to the fridge analogy: the fridge is full, everything in it is still good, and we have just bought new groceries that have to go in, so something has to give. The usual options:

    • Buy a new fridge. --------------- add machines, add memory
    • Don't put the new groceries in; wait until the extras get eaten. --------------- noeviction
    • Throw out what hasn't been eaten in a long time. --------------- LRU
    • Throw out what is rarely eaten. ---------- LFU
    • Throw out what is about to expire. ----------- ttl
    • Throw out a few items at random. ----------- Random
  • redis has 8 eviction policies; you choose one by setting maxmemory-policy in the config, as follows.

# maxmemory-policy noeviction //the default: evict no data; reads work but writes fail
  • The 8 policies are:
    • noeviction: the default. Evict nothing; redis can still serve reads but rejects writes.
    • allkeys-lru: approximate LRU, evicting from all keys.
    • allkeys-lfu: approximate LFU, evicting from all keys.
    • allkeys-random: random eviction from all keys.
    • volatile-lru: approximate LRU, evicting only among keys that have an expiration set.
    • volatile-lfu: approximate LFU, evicting only among keys that have an expiration set.
    • volatile-random: random eviction among keys that have an expiration set.
    • volatile-ttl: evict by expiration time, removing the keys that are closest to expiring.
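
The policy can also be inspected and switched at runtime without a restart, using the standard CONFIG commands (allkeys-lru is just an example choice):

CONFIG SET maxmemory-policy allkeys-lru
CONFIG GET maxmemory-policy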

Eviction flow

Flowchart

(figure: eviction flowchart; not reproduced here)

Flow description

  1. When a user issues a command, redis checks (in a loop) whether memory can accommodate what the command needs. If memory suffices, the command executes.
  2. If memory is insufficient, redis checks whether the current policy is noeviction.
    • If it is, an OOM error is returned to the user.
  3. Otherwise, redis randomly samples maxmemory-samples keys (per the config) and loops over the samples (a standalone sketch of the pool insertion follows this list):
    • For each sample, compute its eviction score (idle) according to the configured policy.
    • Decide whether the sample may enter the eviction pool, an array that holds the candidate keys for deletion.
      • If the pool is not full, the sample goes straight in.
      • If the pool is full but some entry in it has a lower score than the sample, the lowest-scoring entry is removed and the sample is inserted.
  4. After looping over the samples, evict from the tail of the pool: delete the key of the rightmost (highest-scoring) entry.
  5. After one round of eviction, check memory again; if it now suffices, stop; if not, repeat steps 3 and 4.
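
Before diving into the real source, here is a minimal, self-contained C sketch of just the pool insertion from step 3 (illustrative only, not Redis source; pool_insert and the fixed-size key buffer are my own simplifications). The pool is a fixed-size array kept sorted by idle in ascending order; a full pool drops its lowest-idle entry to admit a better candidate.

#include <stdio.h>
#include <string.h>

#define POOL_SIZE 16            /* Redis uses EVPOOL_SIZE = 16 */

struct entry {
    char key[64];               /* candidate key; empty string = free slot */
    unsigned long long idle;    /* eviction score: higher = evict sooner */
};

static struct entry pool[POOL_SIZE];

static void pool_insert(const char *key, unsigned long long idle) {
    int k = 0;
    /* Find the first free slot, or the first entry whose idle >= ours. */
    while (k < POOL_SIZE && pool[k].key[0] && pool[k].idle < idle) k++;

    if (k == 0 && pool[POOL_SIZE-1].key[0]) {
        /* Pool is full and our score beats nothing: not a candidate. */
        return;
    } else if (k < POOL_SIZE && !pool[k].key[0]) {
        /* Free slot: insert directly, nothing to shift. */
    } else if (!pool[POOL_SIZE-1].key[0]) {
        /* Free space on the right: shift entries k..end one step right. */
        memmove(pool+k+1, pool+k, sizeof(pool[0])*(POOL_SIZE-k-1));
    } else {
        /* Pool full, inserting in the middle: drop the leftmost
         * (lowest-idle) entry and shift everything before k left. */
        k--;
        memmove(pool, pool+1, sizeof(pool[0])*k);
    }
    snprintf(pool[k].key, sizeof(pool[k].key), "%s", key);
    pool[k].idle = idle;
}

int main(void) {
    pool_insert("a", 30);
    pool_insert("b", 10);
    pool_insert("c", 20);
    /* Prints b(10) c(20) a(30): sorted ascending by idle. */
    for (int k = 0; k < POOL_SIZE && pool[k].key[0]; k++)
        printf("%s(%llu) ", pool[k].key, pool[k].idle);
    printf("\n");
    return 0;
}

The rightmost populated slot is always the best eviction candidate, which is exactly the end that the real freeMemoryIfNeeded() walks first.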

Source code analysis

The freeMemoryIfNeeded function (evict.c)

int freeMemoryIfNeeded(void) {
    int keys_freed = 0;
    /* By default replicas should ignore maxmemory
     * and just be masters exact copies. */
	//a replica configured to ignore maxmemory does no eviction of its own
    if (server.masterhost && server.repl_slave_ignore_maxmemory) return C_OK;

    size_t mem_reported, mem_tofree, mem_freed;
    mstime_t latency, eviction_latency, lazyfree_latency;
    long long delta;
    int slaves = listLength(server.slaves);
    int result = C_ERR;

    /* When clients are paused the dataset should be static not just from the
     * POV of clients not being able to write, but also from the POV of
     * expires and evictions of keys not being performed. */
    if (clientsArePaused()) return C_OK;
    //check whether memory is actually over the limit; if not, return right away
    if (getMaxmemoryState(&mem_reported,NULL,&mem_tofree,NULL) == C_OK)
        return C_OK;

    mem_freed = 0;

    latencyStartMonitor(latency);
    //if the policy says never evict, fail with OOM
    if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION)
        goto cant_free; /* We need to free memory, but policy forbids. */
    //reaching here means memory is short: keep spinning until enough memory has been freed
    while (mem_freed < mem_tofree) {
        int j, k, i;
        static unsigned int next_db = 0;
        sds bestkey = NULL;
        int bestdbid;
        redisDb *db;
        dict *dict;
        dictEntry *de;
        //if the eviction policy is LRU | LFU | TTL
        if (server.maxmemory_policy & (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU) ||
            server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL)
        {
            //the eviction pool, 16 entries by default
            struct evictionPoolEntry *pool = EvictionPoolLRU;
            //spin until a suitable key to evict (bestkey) is found
            while(bestkey == NULL) {
                unsigned long total_keys = 0, keys;

                /* We don't want to make local-db choices when expiring keys,
                 * so to start populate the eviction pool sampling keys from
                 * every DB. */
				//look across the different DBs
                for (i = 0; i < server.dbnum; i++) {
                    db = server.db+i;
                    //pick the eviction scope: all keys, or only keys with an expiration
                    dict = (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) ?
                            db->dict : db->expires;
                    if ((keys = dictSize(dict)) != 0) {
                        //key step: sample from this range and feed the best eviction candidates into the pool
                        evictionPoolPopulate(i, dict, db->dict, pool);
                        total_keys += keys;
                    }
                }
                //no candidate keys anywhere, nothing to evict
                if (!total_keys) break; /* No keys to evict. */

				//walk the eviction pool
                /* Go backward from best to worst element to evict. */
                for (k = EVPOOL_SIZE-1; k >= 0; k--) {
                    if (pool[k].key == NULL) continue;
                    bestdbid = pool[k].dbid;

                    if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) {
                        de = dictFind(server.db[pool[k].dbid].dict,
                            pool[k].key);
                    } else {
                        de = dictFind(server.db[pool[k].dbid].expires,
                            pool[k].key);
                    }

                    /* Remove the entry from the pool. */
                    if (pool[k].key != pool[k].cached)
                        sdsfree(pool[k].key);
                    pool[k].key = NULL;
                    pool[k].idle = 0;

                    /* If the key exists, is our pick. Otherwise it is
                     * a ghost and we need to try the next element. */
                    if (de) {
                        bestkey = dictGetKey(de);
                        break;
                    } else {
                        /* Ghost... Iterate again. */
                    }
                }
            }
        }

        /* volatile-random and allkeys-random policy */
        else if (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM ||
                 server.maxmemory_policy == MAXMEMORY_VOLATILE_RANDOM)
        {
            /* When evicting a random key, we try to evict a key for
             * each DB, so we use the static 'next_db' variable to
             * incrementally visit all DBs. */
            for (i = 0; i < server.dbnum; i++) {
                j = (++next_db) % server.dbnum;
                db = server.db+j;
                dict = (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM) ?
                        db->dict : db->expires;
                if (dictSize(dict) != 0) {
                    de = dictGetRandomKey(dict);
                    bestkey = dictGetKey(de);
                    bestdbid = j;
                    break;
                }
            }
        }

        /* Finally remove the selected key. */
        // remove the chosen key
        if (bestkey) {
            db = server.db+bestdbid;
            robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
            propagateExpire(db,keyobj,server.lazyfree_lazy_eviction);
            /* We compute the amount of memory freed by db*Delete() alone.
             * It is possible that actually the memory needed to propagate
             * the DEL in AOF and replication link is greater than the one
             * we are freeing removing the key, but we can't account for
             * that otherwise we would never exit the loop.
             *
             * Same for CSC invalidation messages generated by signalModifiedKey.
             *
             * AOF and Output buffer memory will be freed eventually so
             * we only care about memory used by the key space. */
            delta = (long long) zmalloc_used_memory();
            latencyStartMonitor(eviction_latency);
            //with lazy eviction enabled, delete asynchronously
            if (server.lazyfree_lazy_eviction)
                dbAsyncDelete(db,keyobj);
            else
                //synchronous deletion
                dbSyncDelete(db,keyobj);
            latencyEndMonitor(eviction_latency);
            latencyAddSampleIfNeeded("eviction-del",eviction_latency);
            delta -= (long long) zmalloc_used_memory();
            mem_freed += delta;
            server.stat_evictedkeys++;
            signalModifiedKey(NULL,db,keyobj);
            notifyKeyspaceEvent(NOTIFY_EVICTED, "evicted",
                keyobj, db->id);
            decrRefCount(keyobj);
            keys_freed++;

            /* When the memory to free starts to be big enough, we may
             * start spending so much time here that is impossible to
             * deliver data to the slaves fast enough, so we force the
             * transmission here inside the loop. */
            if (slaves) flushSlavesOutputBuffers();

            /* Normally our stop condition is the ability to release
             * a fixed, pre-computed amount of memory. However when we
             * are deleting objects in another thread, it's better to
             * check, from time to time, if we already reached our target
             * memory, since the "mem_freed" amount is computed only
             * across the dbAsyncDelete() call, while the thread can
             * release the memory all the time. */
            if (server.lazyfree_lazy_eviction && !(keys_freed % 16)) {
                if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
                    /* Let's satisfy our stop condition. */
                    mem_freed = mem_tofree;
                }
            }
        } else {
            goto cant_free; /* nothing to free... */
        }
    }
    result = C_OK;

cant_free:
    /* We are here if we are not able to reclaim memory. There is only one
     * last thing we can try: check if the lazyfree thread has jobs in queue
     * and wait... */
    if (result != C_OK) {
        latencyStartMonitor(lazyfree_latency);
        while(bioPendingJobsOfType(BIO_LAZY_FREE)) {
            if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
                result = C_OK;
                break;
            }
            usleep(1000);
        }
        latencyEndMonitor(lazyfree_latency);
        latencyAddSampleIfNeeded("eviction-lazyfree",lazyfree_latency);
    }
    latencyEndMonitor(latency);
    latencyAddSampleIfNeeded("eviction-cycle",latency);
    return result;
}

The evictionPoolPopulate function (evict.c)

/* evictionPoolPopulate is a helper for freeMemoryIfNeeded(): when a key must be evicted, it moves a few sampled entries into the eviction pool.
 * A sampled element enters the pool if its idle score is greater than that of some element already in the pool.
 * If the pool has free slots, a sampled element always enters.
 * Pool elements are kept sorted by idle in ascending order: smallest on the left, largest on the right.
 */
void evictionPoolPopulate(int dbid, dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
    int j, k, count;
    //the entries to sample
    dictEntry *samples[server.maxmemory_samples];
    //randomly draw the samples from the candidate range
    count = dictGetSomeKeys(sampledict,samples,server.maxmemory_samples);
    //loop over the samples
    for (j = 0; j < count; j++) {
        unsigned long long idle;
        sds key;
        robj *o;
        dictEntry *de;

        de = samples[j];
        key = dictGetKey(de);

        /* If the dictionary we are sampling from is not the main
         * dictionary (but the expires one) we need to lookup the key
         * again in the key dictionary to obtain the value object. */
         //for volatile-ttl the score comes straight from the expires dict, so the value object is not needed; every other policy must fetch the value object from the main keyspace
        if (server.maxmemory_policy != MAXMEMORY_VOLATILE_TTL) {
            if (sampledict != keydict) de = dictFind(keydict, key);
            o = dictGetVal(de);
        }

        /* Calculate the idle time according to the policy. This is called
         * idle just because the code initially handled LRU, but is in fact
         * just a score where an higher score means better candidate. */
        //compute the eviction score: the larger the value, the sooner the key is evicted
         //LRU policy: score by how long the object has gone unaccessed
        if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) {
			//estimate the idle time (the eviction weight)
            idle = estimateObjectIdleTime(o);
        } 
        //LFU policy: score by access frequency, inverted as 255 - counter, so a larger idle still means easier to evict
        else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
            /* When we use an LRU policy, we sort the keys by idle time
             * so that we expire keys starting from greater idle time.
             * However when the policy is an LFU one, we have a frequency
             * estimation, and we want to evict keys with lower frequency
             * first. So inside the pool we put objects using the inverted
             * frequency subtracting the actual frequency to the maximum
             * frequency of 255. */
            idle = 255-LFUDecrAndReturn(o);
        } 
        //ttl: score directly from the expiration time
        else if (server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) {
            /* In this case the sooner the expire the better. */
            idle = ULLONG_MAX - (long)dictGetVal(de);
        } else {
            serverPanic("Unknown eviction policy in evictionPoolPopulate()");
        }

        /* Insert the element inside the pool.
         * First, find the first empty bucket or the first populated
         * bucket that has an idle time smaller than our idle time. */
         //with the sample's idle computed, place it into the eviction pool
        k = 0;
        //spin forward to find the last slot whose idle is smaller than the current key's
        while (k < EVPOOL_SIZE &&
               pool[k].key &&
               pool[k].idle < idle) k++;

        //k==0 means the loop above never advanced, i.e. every entry in the pool has a larger idle than this sample; if the last slot is also occupied, the pool is full, so skip this sample (the pool's entries are better eviction candidates)
        if (k == 0 && pool[EVPOOL_SIZE-1].key != NULL) {
            /* Can't insert if the element is < the worst element we have
             * and there are no empty buckets. */
            continue;
        } 
        //k points at an empty slot: the current sample's idle is larger than everything before it, so insert here with nothing to move
        else if (k < EVPOOL_SIZE && pool[k].key == NULL) {
            /* Inserting into empty position. No setup needed before insert. */
        } 
        //inserting in the middle: pool entries have to be shifted
        else {
            /* Inserting in the middle. Now k points to the first element
             * greater than the element to insert.  */
             //if the rightmost pool slot is empty, shift the entries from k onward one step to the right
            if (pool[EVPOOL_SIZE-1].key == NULL) {
                /* Free space on the right? Insert at k shifting
                 * all the elements from k to end to the right. */

                /* Save SDS before overwriting. */
                sds cached = pool[EVPOOL_SIZE-1].cached;
                memmove(pool+k+1,pool+k,
                    sizeof(pool[0])*(EVPOOL_SIZE-k-1));
                pool[k].cached = cached;
            } 
            //if the rightmost pool slot is occupied, shift the entries before k one step to the left, pushing the leftmost element out of the pool
            else {
                /* No free space on right? Insert at k-1 */
                k--;
                /* Shift all elements on the left of k (included) to the
                 * left, so we discard the element with smaller idle time. */
                sds cached = pool[0].cached; /* Save SDS before overwriting. */
                if (pool[0].key != pool[0].cached) sdsfree(pool[0].key);
                memmove(pool,pool+1,sizeof(pool[0])*k);
                pool[k].cached = cached;
            }
        }

        /* Try to reuse the cached SDS string allocated in the pool entry,
         * because allocating and deallocating this object is costly
         * (according to the profiler, not my fantasy. Remember:
         * premature optimization bla bla bla. */
         //finally, place the current key into pool slot k
        int klen = sdslen(key);
        if (klen > EVPOOL_CACHED_SDS_SIZE) {
            pool[k].key = sdsdup(key);
        } else {
            memcpy(pool[k].cached,key,klen+1);
            sdssetlen(pool[k].cached,klen);
            pool[k].key = pool[k].cached;
        }
        pool[k].idle = idle;
        pool[k].dbid = dbid;
    }
}

The LRU algorithm

Overview

LRU (Least Recently Used) evicts what has gone unused the longest: the older the last access, the sooner the eviction. For redis, the earlier a key's last access time, the more likely it is to be evicted.

redis's LRU is an approximate LRU: instead of running LRU over all of its data, it runs LRU over a random sample. The maxmemory-samples parameter sets the sample size; the larger it is, the more accurate the algorithm, and the higher the cost.

maxmemory-samples 5		//the default is 5

How it works

How does redis know how long an object has gone unaccessed?

  1. LRU evicts based on an object's access time, so redis must record each object's last access time. Comparing that timestamp with the current system time yields how long the object has gone unaccessed.

  2. In redis, every value is wrapped in a redisObject, which carries a field named lru.

The redisObject struct (server.h)

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
} robj;
  3. redisObject.lru: a 24-bit field that stores the low 24 bits of the object's last access time in seconds (24 bits of seconds wrap roughly every 194 days). Taking the low 24 bits looks like this (Java-style pseudocode):
long time = System.currentTimeMillis() / 1000;	//current time in seconds
long lruclock = time & ((1 << 24) - 1);	//AND the seconds with 24 low-order ones
  4. Once we know the object's last access time, we can compute how long it has gone unaccessed:
time the object has gone unaccessed = (low 24 bits of the current time in seconds) - (low 24 bits of the object's last access time in seconds)

//let lruclock = the low 24 bits of the current time in seconds
time the object has gone unaccessed = lruclock - redisObject.lru
  5. But there is a problem: 24 bits has an upper bound, 24 ones, roughly 194 days. As time keeps marching forward, after 194 days the low 24 bits of the system clock wrap back to 24 zeros. redis therefore treats the clock as circular: past 24 bits it starts again from 0. So we cannot simply subtract the object's lru from the clock's low 24 bits to get the idle time; we must compare the two values first.

In everyday life we sometimes count how many months have passed. For example:

  1. My last trip was in May this year and it is now August: 8 - 5 = 3 months have passed.
  2. My last trip was in May last year and it is now March: 12 + 3 - 5 = 10 months have passed.

By the same reasoning:

If redisObject.lru < lruclock, the idle time is lruclock - redisObject.lru.

If redisObject.lru > lruclock, the clock must have wrapped, so the idle time is lruclock + (24-bit maximum - redisObject.lru).
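
A worked example: with the 24-bit maximum LRU_CLOCK_MAX = 2^24 - 1 = 16777215, suppose redisObject.lru = 16777000 and the current lruclock = 100. Since lru > lruclock the clock has wrapped, so the idle time is 100 + (16777215 - 16777000) = 315 clock ticks (estimateObjectIdleTime below multiplies by LRU_CLOCK_RESOLUTION to convert ticks into milliseconds).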

  6. One further problem: if time keeps going and two full 194-day periods pass, the low 24 bits of the system clock come back around to 24 zeros again. How should the idle time be computed then? In trip terms:
  1. My last trip was in May the year before last, and it is now March. How many months have passed?

For redis this does not matter much: the approximate LRU was never meant to be precise, so redis only accounts for a single wrap and treats multiple wraps as one. That is, May of the year before last is handled exactly like May of last year.

LRU execution flowchart

(figure: LRU execution flowchart; not reproduced here)

Source code analysis

The estimateObjectIdleTime function (evict.c)
unsigned long long estimateObjectIdleTime(robj *o) {
    //the low 24 bits of the current time in seconds
    unsigned long long lruclock = LRU_CLOCK();
    //compare lruclock (the current system time) with the cached object's lru field
    if (lruclock >= o->lru) {
        //if lruclock >= robj.lru, return lruclock - o->lru, then convert the unit
        //the smaller robj.lru is, the larger the result, and the easier the key is to evict
        return (lruclock - o->lru) * LRU_CLOCK_RESOLUTION;
    } else {
        //if lruclock < robj.lru, return lruclock + (LRU_CLOCK_MAX - o->lru), then convert the unit
        //LRU_CLOCK_MAX is the 24-bit maximum, 2^24 - 1
        return (lruclock + (LRU_CLOCK_MAX - o->lru)) *
                    LRU_CLOCK_RESOLUTION;
    }
}

The LFU algorithm

LFU (Least Frequently Used) evicts the least frequently used data first. Its yardstick is the access count: the fewer the accesses, the easier the eviction. Each access adds 1 to the count; at eviction time the counts are compared and the lowest go first. Plain LFU, however, suffers from a fatal freshness problem.

The freshness problem

An example: a celebrity scandal last year was extremely hot at the time, with 40 million clicks. A news story that just came out today has 100 clicks. The new story is the one that ought to be surfaced now; however hot last year's scandal was, its display priority should by now rank below today's news. But under the LFU scheme above, fewer accesses means earlier eviction, so today's news is the easy victim. The resulting problem: new data cannot get in, and old data never leaves.

This is a serious problem, and Redis naturally addresses it. How?

An analogy: many of us pay for memberships, say on iQiyi or Bilibili. On some sites, if I stop paying, or let my VIP lapse, my membership level decays as time goes by. I might be V6 now, but after a year without topping up I could drop to V4.

redis solves the freshness problem in a similar way: every so often it decays a key's access counter, so old data with a high access count watches its counter shrink over time until the key becomes evictable.

How it works

  1. In redis, every value is wrapped in a redisObject, which carries a field named lru.

The redisObject struct (server.h)

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
} robj;

Under LFU, the first 16 bits of redisObject.lru hold a time, and the last 8 bits hold the object's access frequency; call the latter the counter.

  2. The first 16 bits hold the object's last access time, in minutes. From it redis can tell how many minutes the object has gone unaccessed. redis has a setting, lfu-decay-time, meaning: decrement the counter once per that many unaccessed minutes.
lfu-decay-time 1	//decrement once per this many minutes without access

So:

number of decrements (num_periods) = minutes elapsed since the object's last access / server.lfu_decay_time
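
For reference, the elapsed minutes come from LFUTimeElapsed in the same file, which handles 16-bit wraparound just like the LRU clock handles its 24-bit wraparound; it looks roughly like this (quoted from memory, so treat it as a sketch):

unsigned long LFUTimeElapsed(unsigned long ldt) {
    //current time in minutes, masked to 16 bits: (server.unixtime/60) & 65535
    unsigned long now = LFUGetTimeInMinutes();
    if (now >= ldt) return now-ldt;
    //the 16-bit clock wrapped
    return 65535-ldt+now;
}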

The source that computes the decayed LFU score, LFUDecrAndReturn (evict.c):

unsigned long LFUDecrAndReturn(robj *o) {
    //shift the lru field right by 8 bits to get the leading 16-bit time
    unsigned long ldt = o->lru >> 8;
    //AND the lru field with 255 (the 8-bit maximum) to get the 8-bit counter
    unsigned long counter = o->lru & 255;
    //if lfu_decay_time is configured, divide LFUTimeElapsed(ldt) by it:
    //total unaccessed minutes / setting (minutes per decay step) = number of decay steps
    unsigned long num_periods = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
    if (num_periods)
        //never decay below zero: otherwise subtract the decay steps from the counter
        counter = (num_periods > counter) ? 0 : counter - num_periods;
    return counter;
}
  3. One remaining problem: the 8-bit maximum is 255, which is not enough to store raw access counts. redis handles this by making 255 very hard to reach. The scheme:
    • Call the 8-bit value counter. It caps at 255: once there, it is never incremented further. In practice the odds of reaching 255 are low, so the counter can represent an enormous volume of accesses.
    • Increments are probabilistic: the probability depends on the base value (LFU_INIT_VAL), the current counter value, and the server.lfu_log_factor setting. The larger the counter, the lower the odds of an increment; the larger lfu-log-factor is configured, the lower the odds as well.
    • If counter <= 5 (LFU_INIT_VAL), every access increments counter by 1.
    • If 5 < counter < 255, the odds of an increment shrink as counter grows.

The source: the LFULogIncr function (evict.c)

uint8_t LFULogIncr(uint8_t counter) {
    //already at the 8-bit maximum 255: just return 255
    if (counter == 255) return 255;
    //draw a random number in [0,1)
    double r = (double)rand()/RAND_MAX;
    //LFU_INIT_VAL is the base value, 5 by default (defined in server.h): at 5 or below, every access increments counter
    double baseval = counter - LFU_INIT_VAL;
    //keep baseval >= 0
    if (baseval < 0) baseval = 0;
    //server.lfu_log_factor defaults to 10, so p <= 1; the larger counter is, the smaller p
    double p = 1.0/(baseval*server.lfu_log_factor+1);
    //the smaller p is, the lower the odds that counter is incremented
    if (r < p) counter++;
    return counter;
}
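
A worked example: with the default lfu-log-factor of 10 and a key whose counter has already reached 100, baseval = 100 - 5 = 95 and p = 1.0 / (95*10 + 1) ≈ 0.001, so each access has only about a 0.1% chance of bumping the counter. This is why 8 bits suffice and 255 is effectively reserved for extremely hot keys.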

