Redis Series (4): Expiration and Eviction Policies in Redis Memory Management

Why do expiration and eviction mechanisms exist?

Redis is an in-memory database: all of its data lives in RAM, and RAM is finite. If data keeps accumulating, memory eventually fills up and no new data can be written, at which point Redis is effectively unavailable. To preserve availability, Redis removes expired keys and, when memory runs out, evicts keys.

Generally, if maxmemory is not set (or is set to 0), Redis does not limit its own memory usage on 64-bit systems, while on 32-bit systems it uses at most 3GB.
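
The limit itself is set with the maxmemory directive in redis.conf (it can also be changed at runtime with CONFIG SET maxmemory); the 4gb value below is only an example:

maxmemory 4gb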

Expiration policy

Concept: data in redis can be given an expiration time; once that time arrives, the data needs to be deleted from redis.

An analogy: when the fridge at home is full, we need to find the expired food and throw it away. There are generally two ways to catch expired items:

  1. When you go to use an item, check whether it has expired; if it has, throw it away.
  2. Clean out the fridge periodically and throw away whatever has expired.

Just like the fridge, redis has two expiration strategies: lazy expiration and periodic expiration.

Lazy expiration (passive expiration)

Lazy expiration (passive expiration): every time a key is accessed, redis checks whether that key has expired, and deletes it if it has.
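
A quick redis-cli illustration (session:1 is a made-up key): SET ... EX gives the key a TTL, and the GET issued after the TTL has passed finds the key expired, deletes it on access, and returns nil.

127.0.0.1:6379> SET session:1 "abc" EX 1
OK
127.0.0.1:6379> TTL session:1
(integer) 1
(wait for more than 1 second)
127.0.0.1:6379> GET session:1
(nil)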

Periodic expiration

Periodic expiration: redis has a scheduled task that periodically checks for expired data and deletes it.

How often does it run?

server.c contains the serverCron function, which runs redis's various scheduled tasks. Its frequency is controlled by the hz setting in redis.conf; the default is 10, meaning it runs every 100ms, i.e. 10 times per second.
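
The corresponding redis.conf entry (10 is the default; raising it makes periodic tasks, including expiration checks, run more often at the cost of extra CPU):

hz 10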

Implementation flow

  1. The scheduled serverCron task performs the cleanup; its frequency comes from the hz setting in redis.conf.
  2. The cleanup does not scan every key, only the keys that have an expiration time set (redisDb.expires).
  3. The scan does not fetch all of those keys at once either. It walks hash buckets until it has sampled 20 keys (configurable); if the 20th key falls mid-bucket, the current bucket is scanned to the end. For example, if the first bucket yields 10 keys (short of 20), scanning continues into the second bucket; if that bucket holds 30 keys, the whole bucket is scanned, for 40 keys in total. Besides the 20-key limit, a single round also checks at most 400 buckets.
  4. Delete whatever expired keys the scan found.
  5. Evaluate the result: if (1) the scanned buckets (up to 400) were all empty, or (2) the ratio of deleted keys to sampled keys exceeds 10% (configurable), go back and repeat steps 3 and 4 (a worked example follows this list).
  6. The loop is not unbounded either: every 16 iterations it checks the elapsed time and exits once a time budget is exceeded. The loop exists to balance time against space.
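
A worked example of the rule in step 5: if one round samples 40 keys and deletes 8 of them, the stale ratio is 8 * 100 / 40 = 20%, which exceeds 10%, so redis assumes this database still holds plenty of expired keys and immediately runs another round of steps 3 and 4.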

Implementation flowchart

(figure: flowchart of the periodic expiration cycle; not reproduced here)

Source code analysis

Entry point (server.c)
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    //......
    //entry point
    databasesCron();

    //......
    
	//server.hz is the configurable run frequency; the default 10 means once every 100ms, 10 times per second
    return 1000/server.hz;
}
The databasesCron function (server.c)
void databasesCron(void) {
    /* Expire keys by random sampling. Not required for slaves
     * as master will synthesize DELs for us. */
    if (server.active_expire_enabled) {
        if (iAmMaster()) {
			//run the active expire cycle
            activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);
        } else {
            expireSlaveKeys();
        }
    }
    
    //......
}
The activeExpireCycle function (expire.c)
void activeExpireCycle(int type) {
    /* Adjust the running parameters according to the configured expire
     * effort. The default effort is 1, and the maximum configurable effort
     * is 10. */
    unsigned long
    effort = server.active_expire_effort-1, /* Rescale from 0 to 9. */
    config_keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP +
                           ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort,
    config_cycle_fast_duration = ACTIVE_EXPIRE_CYCLE_FAST_DURATION +
                                 ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort,
    config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC +
                                  2*effort,
    config_cycle_acceptable_stale = ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE-
                                    effort;

    /* This function has some global state in order to continue the work
     * incrementally across calls. */
    static unsigned int current_db = 0; /* Last DB tested. */
    static int timelimit_exit = 0;      /* Time limit hit in previous call? */
    static long long last_fast_cycle = 0; /* When last fast cycle ran. */

    int j, iteration = 0;
    int dbs_per_call = CRON_DBS_PER_CALL;
    long long start = ustime(), timelimit, elapsed;

    /* When clients are paused the dataset should be static not just from the
     * POV of clients not being able to write, but also from the POV of
     * expires and evictions of keys not being performed. */
    if (clientsArePaused()) return;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST) {
        /* Don't start a fast cycle if the previous cycle did not exit
         * for time limit, unless the percentage of estimated stale keys is
         * too high. Also never repeat a fast cycle for the same period
         * as the fast cycle total duration itself. */
        if (!timelimit_exit &&
            server.stat_expired_stale_perc < config_cycle_acceptable_stale)
            return;

        if (start < last_fast_cycle + (long long)config_cycle_fast_duration*2)
            return;

        last_fast_cycle = start;
    }

    /* We usually should test CRON_DBS_PER_CALL per iteration, with
     * two exceptions:
     *
     * 1) Don't test more DBs than we have.
     * 2) If last time we hit the time limit, we want to scan all DBs
     * in this iteration, as there is work to do in some DB and we don't want
     * expired keys to use memory for too much time. */
    if (dbs_per_call > server.dbnum || timelimit_exit)
        dbs_per_call = server.dbnum;

    /* We can use at max 'config_cycle_slow_time_perc' percentage of CPU
     * time per iteration. Since this function gets called with a frequency of
     * server.hz times per second, the following is the max amount of
     * microseconds we can spend in this function. */
    timelimit = config_cycle_slow_time_perc*1000000/server.hz/100;
    timelimit_exit = 0;
    if (timelimit <= 0) timelimit = 1;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST)
        timelimit = config_cycle_fast_duration; /* in microseconds. */

    /* Accumulate some global stats as we expire keys, to have some idea
     * about the number of keys that are already logically expired, but still
     * existing inside the database. */
    long total_sampled = 0;
    long total_expired = 0;
	
	//for loop over the databases; dbs_per_call defaults to 16 (CRON_DBS_PER_CALL)
    for (j = 0; j < dbs_per_call && timelimit_exit == 0; j++) {
        /* Expired and checked in a single loop. */
        unsigned long expired, sampled;

        redisDb *db = server.db+(current_db % server.dbnum);

        /* Increment the DB now so we are sure if we run out of time
         * in the current DB we'll restart from the next. This allows to
         * distribute the time evenly across DBs. */
        current_db++;

        /* Continue to expire if at the end of the cycle there are still
         * a big percentage of keys to expire, compared to the number of keys
         * we scanned. The percentage, stored in config_cycle_acceptable_stale
         * is not fixed, but depends on the Redis configured "expire effort". */
        do {
            unsigned long num, slots;
            long long now, ttl_sum;
            int ttl_samples;
            iteration++;

            /* If there is nothing to expire try next DB ASAP. */
            if ((num = dictSize(db->expires)) == 0) {
                db->avg_ttl = 0;
                break;
            }
            slots = dictSlots(db->expires);
            now = mstime();

            /* When there are less than 1% filled slots, sampling the key
             * space is expensive, so stop here waiting for better times...
             * The dictionary will be resized asap. */
            if (num && slots > DICT_HT_INITIAL_SIZE &&
                (num*100/slots < 1)) break;

            /* The main collection cycle. Sample random keys among keys
             * with an expire set, checking for expired ones. */
            expired = 0;
            sampled = 0;
            ttl_sum = 0;
            ttl_samples = 0;
			//config_keys_per_loop defaults to 20: sample at most 20 keys per round
            if (num > config_keys_per_loop)
                num = config_keys_per_loop;

            /* Here we access the low level representation of the hash table
             * for speed concerns: this makes this code coupled with dict.c,
             * but it hardly changed in ten years.
             *
             * Note that certain places of the hash table may be empty,
             * so we want also a stop condition about the number of
             * buckets that we scanned. However scanning for free buckets
             * is very fast: we are in the cache line scanning a sequential
             * array of NULL pointers, so we can scan a lot more buckets
             * than keys in the same time. */
            long max_buckets = num*20;
            long checked_buckets = 0;
			//keep scanning while fewer than num (20) keys have been sampled and fewer than max_buckets (20*20=400) buckets checked
            while (sampled < num && checked_buckets < max_buckets) {
				//loop twice, over ht[0] and ht[1], in case a rehash is in progress
				//fetch keys with an expiration time from db.expires
                for (int table = 0; table < 2; table++) {
                    if (table == 1 && !dictIsRehashing(db->expires)) break;

                    unsigned long idx = db->expires_cursor;
                    idx &= db->expires->ht[table].sizemask;
                    dictEntry *de = db->expires->ht[table].table[idx];
                    long long ttl;

                    /* Scan the current bucket of the current table. */
                    checked_buckets++;
					//walk the linked list inside the bucket
                    while(de) {
                        /* Get the next entry now since this entry may get
                         * deleted. */
                        dictEntry *e = de;
                        de = de->next;

                        ttl = dictGetSignedIntegerVal(e)-now;
						//the deletion: expire the entry if its time is up
                        if (activeExpireCycleTryExpire(db,e,now)) expired++;
                        if (ttl > 0) {
                            /* We want the average TTL of keys yet
                             * not expired. */
                            ttl_sum += ttl;
                            ttl_samples++;
                        }
                        sampled++;
                    }
                }
				//cursor marking which hash bucket the scan has reached; the next round resumes from expires_cursor
                db->expires_cursor++;
            }
            total_expired += expired;
            total_sampled += sampled;

            /* Update the average TTL stats for this database. */
            if (ttl_samples) {
                long long avg_ttl = ttl_sum/ttl_samples;

                /* Do a simple running average with a few samples.
                 * We just use the current estimate with a weight of 2%
                 * and the previous estimate with a weight of 98%. */
                if (db->avg_ttl == 0) db->avg_ttl = avg_ttl;
                db->avg_ttl = (db->avg_ttl/50)*49 + (avg_ttl/50);
            }

            /* We can't block forever here even if there are many keys to
             * expire. So after a given amount of milliseconds return to the
             * caller waiting for the other active expire cycle. */
			//every 16 iterations, check whether the cleanup has used up its time budget; if so, exit the loop
            if ((iteration & 0xf) == 0) { /* check once every 16 iterations. */
                elapsed = ustime()-start;
                if (elapsed > timelimit) {
                    timelimit_exit = 1;
                    server.stat_expired_time_cap_reached_count++;
                    break;
                }
            }
            /* We don't repeat the cycle for the current database if there are
             * an acceptable amount of stale keys (logically expired but yet
             * not reclaimed). */
		// keep looping if nothing was sampled || (expired count * 100 / sampled count) > 10%
        } while (sampled == 0 ||
                 (expired*100/sampled) > config_cycle_acceptable_stale);
    }

    elapsed = ustime()-start;
    server.stat_expire_cycle_time_used += elapsed;
    latencyAddSampleIfNeeded("expire-cycle",elapsed/1000);

    /* Update our estimate of keys existing but yet to be expired.
     * Running average with this sample accounting for 5%. */
    double current_perc;
    if (total_sampled) {
        current_perc = (double)total_expired/total_sampled;
    } else
        current_perc = 0;
    server.stat_expired_stale_perc = (current_perc*0.05)+
                                     (server.stat_expired_stale_perc*0.95);
}

Eviction policy

Overview

  • Suppose the keys in memory have no expiration time set, or their expiration times have not yet arrived. All the data in memory is still valid, yet memory can still fill up. When it is full and new data must be written, redis has no choice but to delete some existing data. That is what the eviction policy does.

  • Eviction never deletes more than necessary: it only deletes until the new data fits.

  • Back to the fridge analogy: the fridge is full, everything in it is still good, and we have just bought new groceries that have to go in, so something has to give. The usual options:

    • Buy a new fridge. --------------- add machines, add memory
    • Don't put the new groceries in; wait until the extras get eaten. --------------- noeviction
    • Throw out what hasn't been eaten in a long time. --------------- LRU
    • Throw out what is rarely eaten. ---------- LFU
    • Throw out what is about to expire. ----------- ttl
    • Throw out a few items at random. ----------- Random
  • redis has 8 eviction policies; you choose one by setting maxmemory-policy in the config, as follows.

# maxmemory-policy noeviction //the default: evict no data; reads work but writes fail
  • The 8 policies are:
    • noeviction: the default. Evict nothing; redis can still serve reads but rejects writes.
    • allkeys-lru: approximate LRU, evicting from all keys.
    • allkeys-lfu: approximate LFU, evicting from all keys.
    • allkeys-random: random eviction from all keys.
    • volatile-lru: approximate LRU, evicting only among keys that have an expiration set.
    • volatile-lfu: approximate LFU, evicting only among keys that have an expiration set.
    • volatile-random: random eviction among keys that have an expiration set.
    • volatile-ttl: evict by expiration time, removing the keys that are closest to expiring.
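
The policy can also be inspected and switched at runtime without a restart, using the standard CONFIG commands (allkeys-lru is just an example choice):

CONFIG SET maxmemory-policy allkeys-lru
CONFIG GET maxmemory-policy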

Eviction flow

Flowchart

(figure: eviction flowchart; not reproduced here)

Flow description

  1. When a user issues a command, redis checks (in a loop) whether memory can accommodate what the command needs. If memory suffices, the command executes.
  2. If memory is insufficient, redis checks whether the current policy is noeviction.
    • If it is, an OOM error is returned to the user.
  3. Otherwise, redis randomly samples maxmemory-samples keys (per the config) and loops over the samples (a standalone sketch of the pool insertion follows this list):
    • For each sample, compute its eviction score (idle) according to the configured policy.
    • Decide whether the sample may enter the eviction pool, an array that holds the candidate keys for deletion.
      • If the pool is not full, the sample goes straight in.
      • If the pool is full but some entry in it has a lower score than the sample, the lowest-scoring entry is removed and the sample is inserted.
  4. After looping over the samples, evict from the tail of the pool: delete the key of the rightmost (highest-scoring) entry.
  5. After one round of eviction, check memory again; if it now suffices, stop; if not, repeat steps 3 and 4.
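
Before diving into the real source, here is a minimal, self-contained C sketch of just the pool insertion from step 3 (illustrative only, not Redis source; pool_insert and the fixed-size key buffer are my own simplifications). The pool is a fixed-size array kept sorted by idle in ascending order; a full pool drops its lowest-idle entry to admit a better candidate.

#include <stdio.h>
#include <string.h>

#define POOL_SIZE 16            /* Redis uses EVPOOL_SIZE = 16 */

struct entry {
    char key[64];               /* candidate key; empty string = free slot */
    unsigned long long idle;    /* eviction score: higher = evict sooner */
};

static struct entry pool[POOL_SIZE];

static void pool_insert(const char *key, unsigned long long idle) {
    int k = 0;
    /* Find the first free slot, or the first entry whose idle >= ours. */
    while (k < POOL_SIZE && pool[k].key[0] && pool[k].idle < idle) k++;

    if (k == 0 && pool[POOL_SIZE-1].key[0]) {
        /* Pool is full and our score beats nothing: not a candidate. */
        return;
    } else if (k < POOL_SIZE && !pool[k].key[0]) {
        /* Free slot: insert directly, nothing to shift. */
    } else if (!pool[POOL_SIZE-1].key[0]) {
        /* Free space on the right: shift entries k..end one step right. */
        memmove(pool+k+1, pool+k, sizeof(pool[0])*(POOL_SIZE-k-1));
    } else {
        /* Pool full, inserting in the middle: drop the leftmost
         * (lowest-idle) entry and shift everything before k left. */
        k--;
        memmove(pool, pool+1, sizeof(pool[0])*k);
    }
    snprintf(pool[k].key, sizeof(pool[k].key), "%s", key);
    pool[k].idle = idle;
}

int main(void) {
    pool_insert("a", 30);
    pool_insert("b", 10);
    pool_insert("c", 20);
    /* Prints b(10) c(20) a(30): sorted ascending by idle. */
    for (int k = 0; k < POOL_SIZE && pool[k].key[0]; k++)
        printf("%s(%llu) ", pool[k].key, pool[k].idle);
    printf("\n");
    return 0;
}

The rightmost populated slot is always the best eviction candidate, which is exactly the end that the real freeMemoryIfNeeded() walks first.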

Source code analysis

The freeMemoryIfNeeded function (evict.c)

int freeMemoryIfNeeded(void) {
    int keys_freed = 0;
    /* By default replicas should ignore maxmemory
     * and just be masters exact copies. */
	//a replica configured to ignore maxmemory does no eviction of its own
    if (server.masterhost && server.repl_slave_ignore_maxmemory) return C_OK;

    size_t mem_reported, mem_tofree, mem_freed;
    mstime_t latency, eviction_latency, lazyfree_latency;
    long long delta;
    int slaves = listLength(server.slaves);
    int result = C_ERR;

    /* When clients are paused the dataset should be static not just from the
     * POV of clients not being able to write, but also from the POV of
     * expires and evictions of keys not being performed. */
    if (clientsArePaused()) return C_OK;
    //check whether memory is actually over the limit; if not, return right away
    if (getMaxmemoryState(&mem_reported,NULL,&mem_tofree,NULL) == C_OK)
        return C_OK;

    mem_freed = 0;

    latencyStartMonitor(latency);
    //if the policy says never evict, fail with OOM
    if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION)
        goto cant_free; /* We need to free memory, but policy forbids. */
    //reaching here means memory is short: keep spinning until enough memory has been freed
    while (mem_freed < mem_tofree) {
        int j, k, i;
        static unsigned int next_db = 0;
        sds bestkey = NULL;
        int bestdbid;
        redisDb *db;
        dict *dict;
        dictEntry *de;
        //if the eviction policy is LRU | LFU | TTL
        if (server.maxmemory_policy & (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU) ||
            server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL)
        {
            //the eviction pool, 16 entries by default
            struct evictionPoolEntry *pool = EvictionPoolLRU;
            //spin until a suitable key to evict (bestkey) is found
            while(bestkey == NULL) {
                unsigned long total_keys = 0, keys;

                /* We don't want to make local-db choices when expiring keys,
                 * so to start populate the eviction pool sampling keys from
                 * every DB. */
				//look across the different DBs
                for (i = 0; i < server.dbnum; i++) {
                    db = server.db+i;
                    //pick the eviction scope: all keys, or only keys with an expiration
                    dict = (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) ?
                            db->dict : db->expires;
                    if ((keys = dictSize(dict)) != 0) {
                        //key step: sample from this range and feed the best eviction candidates into the pool
                        evictionPoolPopulate(i, dict, db->dict, pool);
                        total_keys += keys;
                    }
                }
                //no candidate keys anywhere, nothing to evict
                if (!total_keys) break; /* No keys to evict. */

				//walk the eviction pool
                /* Go backward from best to worst element to evict. */
                for (k = EVPOOL_SIZE-1; k >= 0; k--) {
                    if (pool[k].key == NULL) continue;
                    bestdbid = pool[k].dbid;

                    if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) {
                        de = dictFind(server.db[pool[k].dbid].dict,
                            pool[k].key);
                    } else {
                        de = dictFind(server.db[pool[k].dbid].expires,
                            pool[k].key);
                    }

                    /* Remove the entry from the pool. */
                    if (pool[k].key != pool[k].cached)
                        sdsfree(pool[k].key);
                    pool[k].key = NULL;
                    pool[k].idle = 0;

                    /* If the key exists, is our pick. Otherwise it is
                     * a ghost and we need to try the next element. */
                    if (de) {
                        bestkey = dictGetKey(de);
                        break;
                    } else {
                        /* Ghost... Iterate again. */
                    }
                }
            }
        }

        /* volatile-random and allkeys-random policy */
        else if (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM ||
                 server.maxmemory_policy == MAXMEMORY_VOLATILE_RANDOM)
        {
            /* When evicting a random key, we try to evict a key for
             * each DB, so we use the static 'next_db' variable to
             * incrementally visit all DBs. */
            for (i = 0; i < server.dbnum; i++) {
                j = (++next_db) % server.dbnum;
                db = server.db+j;
                dict = (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM) ?
                        db->dict : db->expires;
                if (dictSize(dict) != 0) {
                    de = dictGetRandomKey(dict);
                    bestkey = dictGetKey(de);
                    bestdbid = j;
                    break;
                }
            }
        }

        /* Finally remove the selected key. */
        // remove the chosen key
        if (bestkey) {
            db = server.db+bestdbid;
            robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
            propagateExpire(db,keyobj,server.lazyfree_lazy_eviction);
            /* We compute the amount of memory freed by db*Delete() alone.
             * It is possible that actually the memory needed to propagate
             * the DEL in AOF and replication link is greater than the one
             * we are freeing removing the key, but we can't account for
             * that otherwise we would never exit the loop.
             *
             * Same for CSC invalidation messages generated by signalModifiedKey.
             *
             * AOF and Output buffer memory will be freed eventually so
             * we only care about memory used by the key space. */
            delta = (long long) zmalloc_used_memory();
            latencyStartMonitor(eviction_latency);
            //with lazy eviction enabled, delete asynchronously
            if (server.lazyfree_lazy_eviction)
                dbAsyncDelete(db,keyobj);
            else
                //synchronous deletion
                dbSyncDelete(db,keyobj);
            latencyEndMonitor(eviction_latency);
            latencyAddSampleIfNeeded("eviction-del",eviction_latency);
            delta -= (long long) zmalloc_used_memory();
            mem_freed += delta;
            server.stat_evictedkeys++;
            signalModifiedKey(NULL,db,keyobj);
            notifyKeyspaceEvent(NOTIFY_EVICTED, "evicted",
                keyobj, db->id);
            decrRefCount(keyobj);
            keys_freed++;

            /* When the memory to free starts to be big enough, we may
             * start spending so much time here that is impossible to
             * deliver data to the slaves fast enough, so we force the
             * transmission here inside the loop. */
            if (slaves) flushSlavesOutputBuffers();

            /* Normally our stop condition is the ability to release
             * a fixed, pre-computed amount of memory. However when we
             * are deleting objects in another thread, it's better to
             * check, from time to time, if we already reached our target
             * memory, since the "mem_freed" amount is computed only
             * across the dbAsyncDelete() call, while the thread can
             * release the memory all the time. */
            if (server.lazyfree_lazy_eviction && !(keys_freed % 16)) {
                if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
                    /* Let's satisfy our stop condition. */
                    mem_freed = mem_tofree;
                }
            }
        } else {
            goto cant_free; /* nothing to free... */
        }
    }
    result = C_OK;

cant_free:
    /* We are here if we are not able to reclaim memory. There is only one
     * last thing we can try: check if the lazyfree thread has jobs in queue
     * and wait... */
    if (result != C_OK) {
        latencyStartMonitor(lazyfree_latency);
        while(bioPendingJobsOfType(BIO_LAZY_FREE)) {
            if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
                result = C_OK;
                break;
            }
            usleep(1000);
        }
        latencyEndMonitor(lazyfree_latency);
        latencyAddSampleIfNeeded("eviction-lazyfree",lazyfree_latency);
    }
    latencyEndMonitor(latency);
    latencyAddSampleIfNeeded("eviction-cycle",latency);
    return result;
}

The evictionPoolPopulate function (evict.c)

/* evictionPoolPopulate is a helper for freeMemoryIfNeeded(): when a key must be evicted, it moves a few sampled entries into the eviction pool.
 * A sampled element enters the pool if its idle score is greater than that of some element already in the pool.
 * If the pool has free slots, a sampled element always enters.
 * Pool elements are kept sorted by idle in ascending order: smallest on the left, largest on the right.
 */
void evictionPoolPopulate(int dbid, dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
    int j, k, count;
    //the entries to sample
    dictEntry *samples[server.maxmemory_samples];
    //randomly draw the samples from the candidate range
    count = dictGetSomeKeys(sampledict,samples,server.maxmemory_samples);
    //loop over the samples
    for (j = 0; j < count; j++) {
        unsigned long long idle;
        sds key;
        robj *o;
        dictEntry *de;

        de = samples[j];
        key = dictGetKey(de);

        /* If the dictionary we are sampling from is not the main
         * dictionary (but the expires one) we need to lookup the key
         * again in the key dictionary to obtain the value object. */
         //for volatile-ttl the score comes straight from the expires dict, so the value object is not needed; every other policy must fetch the value object from the main keyspace
        if (server.maxmemory_policy != MAXMEMORY_VOLATILE_TTL) {
            if (sampledict != keydict) de = dictFind(keydict, key);
            o = dictGetVal(de);
        }

        /* Calculate the idle time according to the policy. This is called
         * idle just because the code initially handled LRU, but is in fact
         * just a score where an higher score means better candidate. */
        //compute the eviction score: the larger the value, the sooner the key is evicted
         //LRU policy: score by how long the object has gone unaccessed
        if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) {
			//estimate the idle time (the eviction weight)
            idle = estimateObjectIdleTime(o);
        } 
        //LFU policy: score by access frequency, inverted as 255 - counter, so a larger idle still means easier to evict
        else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
            /* When we use an LRU policy, we sort the keys by idle time
             * so that we expire keys starting from greater idle time.
             * However when the policy is an LFU one, we have a frequency
             * estimation, and we want to evict keys with lower frequency
             * first. So inside the pool we put objects using the inverted
             * frequency subtracting the actual frequency to the maximum
             * frequency of 255. */
            idle = 255-LFUDecrAndReturn(o);
        } 
        //ttl: score directly from the expiration time
        else if (server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) {
            /* In this case the sooner the expire the better. */
            idle = ULLONG_MAX - (long)dictGetVal(de);
        } else {
            serverPanic("Unknown eviction policy in evictionPoolPopulate()");
        }

        /* Insert the element inside the pool.
         * First, find the first empty bucket or the first populated
         * bucket that has an idle time smaller than our idle time. */
         //with the sample's idle computed, place it into the eviction pool
        k = 0;
        //spin forward to find the last slot whose idle is smaller than the current key's
        while (k < EVPOOL_SIZE &&
               pool[k].key &&
               pool[k].idle < idle) k++;

        //k==0 means the loop above never advanced, i.e. every entry in the pool has a larger idle than this sample; if the last slot is also occupied, the pool is full, so skip this sample (the pool's entries are better eviction candidates)
        if (k == 0 && pool[EVPOOL_SIZE-1].key != NULL) {
            /* Can't insert if the element is < the worst element we have
             * and there are no empty buckets. */
            continue;
        } 
        //k points at an empty slot: the current sample's idle is larger than everything before it, so insert here with nothing to move
        else if (k < EVPOOL_SIZE && pool[k].key == NULL) {
            /* Inserting into empty position. No setup needed before insert. */
        } 
        //inserting in the middle: pool entries have to be shifted
        else {
            /* Inserting in the middle. Now k points to the first element
             * greater than the element to insert.  */
             //if the rightmost pool slot is empty, shift the entries from k onward one step to the right
            if (pool[EVPOOL_SIZE-1].key == NULL) {
                /* Free space on the right? Insert at k shifting
                 * all the elements from k to end to the right. */

                /* Save SDS before overwriting. */
                sds cached = pool[EVPOOL_SIZE-1].cached;
                memmove(pool+k+1,pool+k,
                    sizeof(pool[0])*(EVPOOL_SIZE-k-1));
                pool[k].cached = cached;
            } 
            //if the rightmost pool slot is occupied, shift the entries before k one step to the left, pushing the leftmost element out of the pool
            else {
                /* No free space on right? Insert at k-1 */
                k--;
                /* Shift all elements on the left of k (included) to the
                 * left, so we discard the element with smaller idle time. */
                sds cached = pool[0].cached; /* Save SDS before overwriting. */
                if (pool[0].key != pool[0].cached) sdsfree(pool[0].key);
                memmove(pool,pool+1,sizeof(pool[0])*k);
                pool[k].cached = cached;
            }
        }

        /* Try to reuse the cached SDS string allocated in the pool entry,
         * because allocating and deallocating this object is costly
         * (according to the profiler, not my fantasy. Remember:
         * premature optimization bla bla bla. */
         //finally, place the current key into pool slot k
        int klen = sdslen(key);
        if (klen > EVPOOL_CACHED_SDS_SIZE) {
            pool[k].key = sdsdup(key);
        } else {
            memcpy(pool[k].cached,key,klen+1);
            sdssetlen(pool[k].cached,klen);
            pool[k].key = pool[k].cached;
        }
        pool[k].idle = idle;
        pool[k].dbid = dbid;
    }
}

The LRU algorithm

Overview

LRU (Least Recently Used) evicts what has gone unused the longest: the older the last access, the sooner the eviction. For redis, the earlier a key's last access time, the more likely it is to be evicted.

redis's LRU is an approximate LRU: instead of running LRU over all of its data, it runs LRU over a random sample. The maxmemory-samples parameter sets the sample size; the larger it is, the more accurate the algorithm, and the higher the cost.

maxmemory-samples 5		//the default is 5

How it works

How does redis know how long an object has gone unaccessed?

  1. LRU evicts based on an object's access time, so redis must record each object's last access time. Comparing that timestamp with the current system time yields how long the object has gone unaccessed.

  2. In redis, every value is wrapped in a redisObject, which carries a field named lru.

The redisObject struct (server.h)

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
} robj;
  3. redisObject.lru: a 24-bit field that stores the low 24 bits of the object's last access time in seconds (24 bits of seconds wrap roughly every 194 days). Taking the low 24 bits looks like this (Java-style pseudocode):
long time = System.currentTimeMillis() / 1000;	//current time in seconds
long lruclock = time & ((1 << 24) - 1);	//AND the seconds with 24 low-order ones
  4. Once we know the object's last access time, we can compute how long it has gone unaccessed:
time the object has gone unaccessed = (low 24 bits of the current time in seconds) - (low 24 bits of the object's last access time in seconds)

//let lruclock = the low 24 bits of the current time in seconds
time the object has gone unaccessed = lruclock - redisObject.lru
  5. But there is a problem: 24 bits has an upper bound, 24 ones, roughly 194 days. As time keeps marching forward, after 194 days the low 24 bits of the system clock wrap back to 24 zeros. redis therefore treats the clock as circular: past 24 bits it starts again from 0. So we cannot simply subtract the object's lru from the clock's low 24 bits to get the idle time; we must compare the two values first.

In everyday life we sometimes count how many months have passed. For example:

  1. My last trip was in May this year and it is now August: 8 - 5 = 3 months have passed.
  2. My last trip was in May last year and it is now March: 12 + 3 - 5 = 10 months have passed.

By the same reasoning:

If redisObject.lru < lruclock, the idle time is lruclock - redisObject.lru.

If redisObject.lru > lruclock, the clock must have wrapped, so the idle time is lruclock + (24-bit maximum - redisObject.lru).
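
A worked example: with the 24-bit maximum LRU_CLOCK_MAX = 2^24 - 1 = 16777215, suppose redisObject.lru = 16777000 and the current lruclock = 100. Since lru > lruclock the clock has wrapped, so the idle time is 100 + (16777215 - 16777000) = 315 clock ticks (estimateObjectIdleTime below multiplies by LRU_CLOCK_RESOLUTION to convert ticks into milliseconds).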

  6. One further problem: if time keeps going and two full 194-day periods pass, the low 24 bits of the system clock come back around to 24 zeros again. How should the idle time be computed then? In trip terms:
  1. My last trip was in May the year before last, and it is now March. How many months have passed?

For redis this does not matter much: the approximate LRU was never meant to be precise, so redis only accounts for a single wrap and treats multiple wraps as one. That is, May of the year before last is handled exactly like May of last year.

LRU execution flowchart

(figure: LRU execution flowchart; not reproduced here)

Source code analysis

The estimateObjectIdleTime function (evict.c)
unsigned long long estimateObjectIdleTime(robj *o) {
    //the low 24 bits of the current time in seconds
    unsigned long long lruclock = LRU_CLOCK();
    //compare lruclock (the current system time) with the cached object's lru field
    if (lruclock >= o->lru) {
        //if lruclock >= robj.lru, return lruclock - o->lru, then convert the unit
        //the smaller robj.lru is, the larger the result, and the easier the key is to evict
        return (lruclock - o->lru) * LRU_CLOCK_RESOLUTION;
    } else {
        //if lruclock < robj.lru, return lruclock + (LRU_CLOCK_MAX - o->lru), then convert the unit
        //LRU_CLOCK_MAX is the 24-bit maximum, 2^24 - 1
        return (lruclock + (LRU_CLOCK_MAX - o->lru)) *
                    LRU_CLOCK_RESOLUTION;
    }
}

The LFU algorithm

LFU (Least Frequently Used) evicts the least frequently used data first. Its yardstick is the access count: the fewer the accesses, the easier the eviction. Each access adds 1 to the count; at eviction time the counts are compared and the lowest go first. Plain LFU, however, suffers from a fatal freshness problem.

The freshness problem

An example: a celebrity scandal last year was extremely hot at the time, with 40 million clicks. A news story that just came out today has 100 clicks. The new story is the one that ought to be surfaced now; however hot last year's scandal was, its display priority should by now rank below today's news. But under the LFU scheme above, fewer accesses means earlier eviction, so today's news is the easy victim. The resulting problem: new data cannot get in, and old data never leaves.

This is a serious problem, and Redis naturally addresses it. How?

An analogy: many of us pay for memberships, say on iQiyi or Bilibili. On some sites, if I stop paying, or let my VIP lapse, my membership level decays as time goes by. I might be V6 now, but after a year without topping up I could drop to V4.

redis solves the freshness problem in a similar way: every so often it decays a key's access counter, so old data with a high access count watches its counter shrink over time until the key becomes evictable.

How it works

  1. In redis, every value is wrapped in a redisObject, which carries a field named lru.

The redisObject struct (server.h)

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
} robj;

Under LFU, the first 16 bits of redisObject.lru hold a time, and the last 8 bits hold the object's access frequency; call the latter the counter.

  2. The first 16 bits hold the object's last access time, in minutes. From it redis can tell how many minutes the object has gone unaccessed. redis has a setting, lfu-decay-time, meaning: decrement the counter once per that many unaccessed minutes.
lfu-decay-time 1	//decrement once per this many minutes without access

So:

number of decrements (num_periods) = minutes elapsed since the object's last access / server.lfu_decay_time
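
For reference, the elapsed minutes come from LFUTimeElapsed in the same file, which handles 16-bit wraparound just like the LRU clock handles its 24-bit wraparound; it looks roughly like this (quoted from memory, so treat it as a sketch):

unsigned long LFUTimeElapsed(unsigned long ldt) {
    //current time in minutes, masked to 16 bits: (server.unixtime/60) & 65535
    unsigned long now = LFUGetTimeInMinutes();
    if (now >= ldt) return now-ldt;
    //the 16-bit clock wrapped
    return 65535-ldt+now;
}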

The source that computes the decayed LFU score, LFUDecrAndReturn (evict.c):

unsigned long LFUDecrAndReturn(robj *o) {
    //shift the lru field right by 8 bits to get the leading 16-bit time
    unsigned long ldt = o->lru >> 8;
    //AND the lru field with 255 (the 8-bit maximum) to get the 8-bit counter
    unsigned long counter = o->lru & 255;
    //if lfu_decay_time is configured, divide LFUTimeElapsed(ldt) by it:
    //total unaccessed minutes / setting (minutes per decay step) = number of decay steps
    unsigned long num_periods = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
    if (num_periods)
        //never decay below zero: otherwise subtract the decay steps from the counter
        counter = (num_periods > counter) ? 0 : counter - num_periods;
    return counter;
}
  3. One remaining problem: the 8-bit maximum is 255, which is not enough to store raw access counts. redis handles this by making 255 very hard to reach. The scheme:
    • Call the 8-bit value counter. It caps at 255: once there, it is never incremented further. In practice the odds of reaching 255 are low, so the counter can represent an enormous volume of accesses.
    • Increments are probabilistic: the probability depends on the base value (LFU_INIT_VAL), the current counter value, and the server.lfu_log_factor setting. The larger the counter, the lower the odds of an increment; the larger lfu-log-factor is configured, the lower the odds as well.
    • If counter <= 5 (LFU_INIT_VAL), every access increments counter by 1.
    • If 5 < counter < 255, the odds of an increment shrink as counter grows.

The source: the LFULogIncr function (evict.c)

uint8_t LFULogIncr(uint8_t counter) {
    //already at the 8-bit maximum 255: just return 255
    if (counter == 255) return 255;
    //draw a random number in [0,1)
    double r = (double)rand()/RAND_MAX;
    //LFU_INIT_VAL is the base value, 5 by default (defined in server.h): at 5 or below, every access increments counter
    double baseval = counter - LFU_INIT_VAL;
    //keep baseval >= 0
    if (baseval < 0) baseval = 0;
    //server.lfu_log_factor defaults to 10, so p <= 1; the larger counter is, the smaller p
    double p = 1.0/(baseval*server.lfu_log_factor+1);
    //the smaller p is, the lower the odds that counter is incremented
    if (r < p) counter++;
    return counter;
}
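
A worked example: with the default lfu-log-factor of 10 and a key whose counter has already reached 100, baseval = 100 - 5 = 95 and p = 1.0 / (95*10 + 1) ≈ 0.001, so each access has only about a 0.1% chance of bumping the counter. This is why 8 bits suffice and 255 is effectively reserved for extremely hot keys.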

