To understand how Redis reclaims memory, we first need to look at the redisObject struct:
typedef struct redisObject {
    unsigned type:4;        /* type of the value */
    unsigned encoding:4;    /* internal encoding of the stored data */
    unsigned lru:LRU_BITS;  /* 24 bits. In LRU mode, all 24 bits hold the access time
                             * (relative to the global lru_clock). In LFU mode, the least
                             * significant 8 bits hold the access frequency and the most
                             * significant 16 bits the access time; since 8 bits can only
                             * count up to 255, the frequency is stored logarithmically. */
    int refcount;           /* reference count */
    void *ptr;              /* pointer to the actual data. In newer Redis versions, for
                             * strings <= 44 bytes the sds is allocated together with the
                             * redisObject, so a single allocation suffices (embstr). */
} robj;
In functions such as lazyfreeFreeObject we can see that Redis calls decrRefCount() to decrement an object's reference count, and when o->refcount == 1 it calls the matching freeXXXObject() function to release the memory. In other words, Redis reclaims object memory through reference counting.
void decrRefCount(robj *o) {
    if (o->refcount == 1) {
        switch(o->type) {
        case OBJ_STRING: freeStringObject(o); break;
        case OBJ_LIST: freeListObject(o); break;
        case OBJ_SET: freeSetObject(o); break;
        case OBJ_ZSET: freeZsetObject(o); break;
        case OBJ_HASH: freeHashObject(o); break;
        case OBJ_MODULE: freeModuleObject(o); break;
        case OBJ_STREAM: freeStreamObject(o); break;
        default: serverPanic("Unknown object type"); break;
        }
        zfree(o);
    } else {
        if (o->refcount <= 0) serverPanic("decrRefCount against refcount <= 0");
        if (o->refcount != OBJ_SHARED_REFCOUNT) o->refcount--;
    }
}
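The flow above — decrement, free when the count reaches 1, never touch the shared-object sentinel — can be sketched in Java. This is a simplified model, not Redis code; the class name and the boolean `freed` flag are made up for illustration:

```java
// Simplified model of Redis-style reference counting (illustrative only).
public class RefCountedObject {
    // Redis marks forever-shared objects with a sentinel refcount
    // (OBJ_SHARED_REFCOUNT is INT_MAX in the C source).
    public static final int SHARED = Integer.MAX_VALUE;

    public int refcount = 1;
    public boolean freed = false;

    public void incrRefCount() {
        if (refcount != SHARED) refcount++;
    }

    // Mirrors decrRefCount(): free at 1, panic below 1,
    // and never decrement the shared sentinel.
    public void decrRefCount() {
        if (refcount == 1) {
            freed = true;   // stands in for freeStringObject()/zfree()
            refcount = 0;
        } else {
            if (refcount <= 0)
                throw new IllegalStateException("decrRefCount against refcount <= 0");
            if (refcount != SHARED) refcount--;
        }
    }

    public static void main(String[] args) {
        RefCountedObject o = new RefCountedObject();
        o.incrRefCount();   // refcount = 2
        o.decrRefCount();   // refcount = 1
        o.decrRefCount();   // last reference gone: object is freed
        System.out.println(o.freed);   // true
    }
}
```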
Cache eviction algorithms
When Redis's memory footprint exceeds the physical memory limit, its data starts swapping to disk frequently, which makes performance collapse and defeats the purpose of Redis.
So we configure maxmemory to cap memory usage. When actual usage exceeds maxmemory, Redis offers the following eviction policies (maxmemory-policy) to decide how to make room for reads and writes:
# volatile-lru -> Evict using approximated LRU, only keys with an expire set.
# allkeys-lru -> Evict any key using approximated LRU.
# volatile-lfu -> Evict using approximated LFU, only keys with an expire set.
# allkeys-lfu -> Evict any key using approximated LFU.
# volatile-random -> Remove a random key having an expire set.
# allkeys-random -> Remove a random key, any key.
# volatile-ttl -> Remove the key with the nearest expire time (minor TTL)
# noeviction -> Don't evict anything, just return an error on write operations.
Let's see how the LRU that Redis implements differs from a textbook LRU. First, here is an LRU cache I wrote in Java for a LeetCode exercise:
import java.util.HashMap;

class LRUCache {
    MyCache<Integer, Integer> cache;

    public LRUCache(int capacity) {
        cache = new MyCache<>(capacity);
    }

    public void put(int key, int value) {
        cache.addCache(key, value);
    }

    public int get(int key) {
        Integer integer = cache.get(key);
        return integer == null ? -1 : integer;
    }

    public static class Node<K, V> {
        public K key;
        public V value;
        Node<K, V> last;
        Node<K, V> next;

        public Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    public static class NodeDoubleLinkedList<K, V> {
        public Node<K, V> head;
        public Node<K, V> tail;

        public NodeDoubleLinkedList() {
            head = null;
            tail = null;
        }

        public void add(Node<K, V> newNode) {
            if (newNode == null) return;
            newNode.next = null;   // clear any stale link before appending at the tail
            if (head == null) {
                head = newNode;
                tail = newNode;
            } else {
                tail.next = newNode;
                newNode.last = tail;
                tail = newNode;
            }
        }

        public Node<K, V> removeHead() {
            if (head == null) {
                return null;
            } else {
                Node<K, V> res = head;
                head = head.next;
                if (head == null) tail = null;   // list became empty
                else head.last = null;
                return res;
            }
        }

        // Move the node to the tail; the head is evicted first.
        public void moveNodeToTail(Node<K, V> node) {
            if (tail == node) return;
            if (head == node) {
                head = head.next;
                head.last = null;
            } else {
                node.last.next = node.next;
                node.next.last = node.last;
            }
            add(node);
        }
    }

    public static class MyCache<K, V> {
        // key -> node lookup
        private HashMap<K, Node<K, V>> cache;
        // doubly linked list ordered by recency of use
        private NodeDoubleLinkedList<K, V> nodeList;
        private final int capacity;

        public MyCache(int cap) {
            cache = new HashMap<K, Node<K, V>>(cap);
            nodeList = new NodeDoubleLinkedList<K, V>();
            capacity = cap;
        }

        public void addCache(K key, V value) {
            if (cache.containsKey(key)) {
                Node<K, V> kvNode = cache.get(key);
                kvNode.value = value;
                nodeList.moveNodeToTail(kvNode);
            } else {
                Node<K, V> kvNode = new Node<>(key, value);
                nodeList.add(kvNode);
                cache.put(key, kvNode);
                if (cache.size() > capacity) {
                    cache.remove(nodeList.removeHead().key);
                }
            }
        }

        public V get(K key) {
            Node<K, V> kvNode = cache.get(key);
            if (kvNode == null) {
                return null;
            } else {
                nodeList.moveNodeToTail(kvNode);
                return kvNode.value;
            }
        }
    }
}
/**
* Your LRUCache object will be instantiated and called as such:
* LRUCache obj = new LRUCache(capacity);
* int param_1 = obj.get(key);
* obj.put(key,value);
*/
Essentially, it just rewires linked-list pointers so that the most recently used element sits at the tail and the head is evicted first.
Redis, however, has to stay fast: commands execute on a single thread, and maintaining a full LRU linked list over every key would add per-access bookkeeping and memory overhead. So Redis implements an approximated LRU instead, which works like this:
1. When a write command finds that memory exceeds maxmemory,
2. Redis randomly samples 5 keys (configurable via maxmemory-samples),
3. and evicts the oldest of them according to the unsigned lru:LRU_BITS field of redisObject.
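The sampling step can be sketched in Java. This is a toy model, not Redis code — the map of access clocks, the method name, and the sample count are illustrative (the real implementation in evict.c also keeps a small pool of eviction candidates across rounds):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Approximate LRU: instead of maintaining a full linked list,
// sample a few random keys and evict the one with the oldest access clock.
public class ApproxLru {
    public static String pickEvictionVictim(Map<String, Long> lastAccess,
                                            int samples, Random rnd) {
        List<String> keys = new ArrayList<>(lastAccess.keySet());
        String victim = null;
        for (int i = 0; i < samples && !keys.isEmpty(); i++) {
            String k = keys.get(rnd.nextInt(keys.size()));
            // keep the key with the smallest (oldest) access time
            if (victim == null || lastAccess.get(k) < lastAccess.get(victim)) {
                victim = k;
            }
        }
        return victim;
    }

    public static void main(String[] args) {
        Map<String, Long> lastAccess = new HashMap<>();
        lastAccess.put("a", 1L);   // oldest, should be evicted
        lastAccess.put("b", 2L);
        lastAccess.put("c", 3L);
        System.out.println(pickEvictionVictim(lastAccess, 100, new Random(42)));
    }
}
```

The larger the sample, the closer this gets to true LRU; Redis defaults maxmemory-samples to 5 as a speed/accuracy trade-off.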
Since we keep mentioning redisObject's unsigned lru:LRU_BITS field, it's worth asking where its value comes from.
It is first assigned when the redisObject is created:
/* ===================== Creation and parsing of objects ==================== */
robj *createObject(int type, void *ptr) {
    robj *o = zmalloc(sizeof(*o));
    o->type = type;
    o->encoding = OBJ_ENCODING_RAW;
    o->ptr = ptr;
    o->refcount = 1;

    /* Set the LRU to the current lruclock (minutes resolution), or
     * alternatively the LFU counter. */
    if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
        o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
    } else {
        o->lru = LRU_CLOCK();
    }
    return o;
}
Let's look at the LFUGetTimeInMinutes() function:
unsigned long LFUGetTimeInMinutes(void) {
    return (server.unixtime/60) & 65535;
}
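The & 65535 keeps only the low 16 bits, so this minute-resolution clock wraps around every 65536 minutes (roughly 45.5 days). The same arithmetic in Java, for illustration:

```java
public class LfuClock {
    // Same expression as LFUGetTimeInMinutes():
    // unix seconds -> minutes, keep the low 16 bits.
    public static long minutes16(long unixtimeSeconds) {
        return (unixtimeSeconds / 60) & 65535;
    }

    public static void main(String[] args) {
        System.out.println(minutes16(120));          // 2 minutes after the epoch
        System.out.println(minutes16(65536L * 60));  // 65536 minutes: wraps back to 0
    }
}
```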
It reads the unixtime variable from server, so let's check server.c to see how unixtime is maintained:
/* We take a cached value of the unix time in the global state because with
* virtual memory and aging there is to store the current time in objects at
* every object access, and accuracy is not needed. To access a global var is
* a lot faster than calling time(NULL).
*
* This function should be fast because it is called at every command execution
* in call(), so it is possible to decide if to update the daylight saving
* info or not using the 'update_daylight_info' argument. Normally we update
* such info only when calling this function from serverCron() but not when
* calling it from call(). */
void updateCachedTime(int update_daylight_info) {
    server.ustime = ustime();
    server.mstime = server.ustime / 1000;
    time_t unixtime = server.mstime / 1000;
    atomicSet(server.unixtime, unixtime);

    /* To get information about daylight saving time, we need to call
     * localtime_r and cache the result. However calling localtime_r in this
     * context is safe since we will never fork() while here, in the main
     * thread. The logging function will call a thread safe version of
     * localtime that has no locks. */
    if (update_daylight_info) {
        struct tm tm;
        time_t ut = server.unixtime;
        localtime_r(&ut,&tm);
        server.daylight_active = tm.tm_isdst;
    }
}
As the comment above updateCachedTime() explains, Redis caches the unix time in global state: objects would otherwise have to fetch the current time on every access, full accuracy isn't needed, and reading a global variable is much faster than calling time(NULL). The function has to be fast because call() invokes it on every command execution, so the update_daylight_info argument controls whether the daylight-saving info is refreshed; normally that only happens when serverCron() calls it, not call().
In other words, Redis caches the system time so it doesn't have to make a system call every time it needs it. A system call means switching from user mode to kernel mode, which would run against Redis's single-minded pursuit of speed.
Back to eviction. Next up: the LFU algorithm.
LFU is an eviction policy added in Redis 4.0. The name stands for Least Frequently Used, and the core idea is to evict by recent access frequency: rarely accessed keys go first, frequently accessed keys stay. That way a burst of newly inserted data cannot push the genuinely hot data out.
LFU reflects a key's hotness better than LRU. Under LRU, a key that has gone unaccessed for ages but happens to be touched once right before eviction is treated as hot and kept, while keys that are actually likely to be accessed again may be evicted. LFU avoids this, because a single access does not make a key hot. LFU ranks keys by a counter that grows on each access; a larger counter roughly means more frequent access, and keys with equal counters are ordered by time.
LFU comes in two flavors:
volatile-lfu: apply LFU eviction only to keys with an expire set
allkeys-lfu: apply LFU eviction to all keys
LFU splits the 24-bit internal clock of the key object into two parts: the high 16 bits (ldt) still act as a clock, and the low 8 bits (logc) act as a counter, as noted in the redisObject struct earlier.
logc is 8 bits and stores the access frequency. Since 8 bits max out at 255 — nowhere near enough for a raw count — what is stored is the logarithm of the frequency, and the value also decays over time; once it drops low enough, the key becomes easy to reclaim. To keep newly created objects from being evicted immediately, these 8 bits are initialized to a value greater than zero, LFU_INIT_VAL (5 by default).
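Redis implements this logarithmic counter probabilistically: the higher the counter already is, the lower the chance the next access increments it. A Java sketch modeled on Redis's LFULogIncr (the constant names match redis.conf defaults; treat the exact shape as an approximation of the real code):

```java
import java.util.Random;

public class LfuCounter {
    static final int LFU_INIT_VAL = 5;   // new keys start here so they survive a while
    static final int LOG_FACTOR = 10;    // redis.conf: lfu-log-factor (default 10)

    // Probabilistic increment: P(increment) = 1 / ((counter - INIT) * factor + 1).
    // The counter saturates at 255 because it lives in 8 bits.
    public static int logIncr(int counter, Random rnd) {
        if (counter == 255) return 255;
        double baseval = Math.max(0, counter - LFU_INIT_VAL);
        double p = 1.0 / (baseval * LOG_FACTOR + 1);
        if (rnd.nextDouble() < p) counter++;
        return counter;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        int c = LFU_INIT_VAL;
        for (int i = 0; i < 1_000_000; i++) c = logIncr(c, rnd);
        System.out.println(c);   // grows very slowly; never exceeds 255
    }
}
```

Below LFU_INIT_VAL the probability is 1, so brand-new or decayed-to-zero counters climb back quickly; above it, each further step gets progressively harder, which is what makes 8 bits enough.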
ldt is 16 bits and stores the time logc was last updated. With only 16 bits the precision cannot be high: it holds the minute-resolution timestamp modulo 2^16.
ldt also differs from the LRU-mode lru field in when it is updated: not when the object is accessed, but when Redis's eviction logic runs. Eviction is checked before each command executes, but it only kicks in once memory has reached the maxmemory limit. Each eviction round uses a random strategy: it samples some keys, refreshes their "heat", and evicts the key with the lowest heat. Because the sampling is random, a given key's ldt may be refreshed rather slowly when there are many keys — but at minute-level precision there is no need for frequent updates anyway.
Whenever ldt is refreshed, logc is decayed at the same time.
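The decay works off the minutes elapsed since ldt: the counter drops by one for every lfu-decay-time minutes that have passed (default 1). A Java sketch modeled on Redis's LFUDecrAndReturn, which also shows how ldt and logc are unpacked from the 24-bit field:

```java
public class LfuDecay {
    static final int DECAY_TIME = 1;  // redis.conf: lfu-decay-time, in minutes (default 1)

    // The 24-bit field packs ldt (high 16 bits) and logc (low 8 bits),
    // mirroring o->lru = (LFUGetTimeInMinutes() << 8) | counter.
    public static int pack(int ldt, int logc) { return (ldt << 8) | logc; }
    public static int ldt(int lru)  { return lru >> 8; }
    public static int logc(int lru) { return lru & 255; }

    // Subtract one from the counter per DECAY_TIME elapsed minutes, floored at 0.
    public static int decayedCounter(int lru, int nowMinutes) {
        int elapsed = (nowMinutes - ldt(lru)) & 65535;  // handle 16-bit wraparound
        int periods = elapsed / DECAY_TIME;
        int counter = logc(lru);
        return periods > counter ? 0 : counter - periods;
    }

    public static void main(String[] args) {
        int lru = pack(100, 5);                    // last touched at minute 100, counter 5
        System.out.println(decayedCounter(lru, 103));  // 3 minutes later: 5 - 3 = 2
    }
}
```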
Now let's look at the expiration policy and lazy deletion — that is, how Redis handles expired keys.
Redis puts every key that has an expire time into a separate dict and periodically scans that dict to delete the keys that have reached their deadline. On top of the periodic scan it also uses a lazy strategy: when a client accesses a key, Redis checks its expire time and deletes it on the spot if it has expired.
The periodic deletion scans the expires dict ten times per second: each scan grabs 20 random keys and deletes the expired ones among them; if more than 1/4 of the sample was expired, it immediately runs another scan.
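That scan loop can be sketched as follows — sample up to 20 random keys from the expires dict, delete the expired ones, and repeat while more than a quarter of the sample was expired. Names and structure are illustrative; the real cycle in expire.c also enforces a time budget so it cannot hog the event loop:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class ExpireCycle {
    static final int SAMPLE_SIZE = 20;

    // One activeExpireCycle-style pass over the expires dict (key -> deadline).
    // Keeps re-sampling while more than 25% of each sample turns out expired.
    public static int expirePass(Map<String, Long> expires, long now, Random rnd) {
        int totalDeleted = 0;
        int expired;
        do {
            expired = 0;
            List<String> keys = new ArrayList<>(expires.keySet());
            for (int i = 0; i < SAMPLE_SIZE && !keys.isEmpty(); i++) {
                String k = keys.remove(rnd.nextInt(keys.size()));
                if (expires.get(k) <= now) {   // past its deadline: delete it
                    expires.remove(k);
                    expired++;
                }
            }
            totalDeleted += expired;
        } while (expired > SAMPLE_SIZE / 4);   // scan again if >1/4 were expired
        return totalDeleted;
    }

    public static void main(String[] args) {
        Map<String, Long> expires = new HashMap<>();
        for (int i = 0; i < 30; i++) expires.put("k" + i, 0L);  // all already expired
        System.out.println(expirePass(expires, 100, new Random(7)));  // prints 30
    }
}
```

Note how the loop condition is exactly what produces the stall described next: when almost everything is expired, the pass keeps re-scanning until expired keys become sparse.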
Now imagine a large Redis instance in which every key expires at the same moment. What happens?
This is the "cache avalanche" beloved of interview prep, and the standard fix is well known: add a random offset to each expire time. But what exactly goes wrong, and why?
As noted above, once the expired ratio of a sample exceeds 1/4, Redis keeps scanning the expires dict in a loop until expired keys become sparse in it and the loop count drops off. That loop shows up as visible latency spikes on live read/write traffic. A second cause of the same kind of stall is the memory allocator having to reclaim memory pages over and over, which burns CPU as well.
In large production systems Redis usually runs as a cluster, so do replica nodes expire keys the same way as the master?
No. A replica does not run the expiration scan; its handling of expired keys is passive. When a key expires on the master, the master appends a del command to the AOF and propagates it to all replicas, and each replica deletes the expired key by executing that del.
Lazy deletion
The lazy strategy means that when a client accesses a key, Redis checks its expire time; if the key has expired it is deleted on the spot and nothing is returned.
This exists because the periodic scan can miss plenty of keys whose time is already up: such a key lingers in memory until something in your system actually queries it, and only then does Redis delete it.
In short: periodic deletion is batched and centralized; lazy deletion is scattered and on demand.
lazyfree
Here is the relevant section of the Redis config file:
############################# LAZY FREEING ####################################
# Redis has two primitives to delete keys. One is called DEL and is a blocking
# deletion of the object. It means that the server stops processing new commands
# in order to reclaim all the memory associated with an object in a synchronous
# way. If the key deleted is associated with a small object, the time needed
# in order to execute the DEL command is very small and comparable to most other
# O(1) or O(log_N) commands in Redis. However if the key is associated with an
# aggregated value containing millions of elements, the server can block for
# a long time (even seconds) in order to complete the operation.
#
# For the above reasons Redis also offers non blocking deletion primitives
# such as UNLINK (non blocking DEL) and the ASYNC option of FLUSHALL and
# FLUSHDB commands, in order to reclaim memory in background. Those commands
# are executed in constant time. Another thread will incrementally free the
# object in the background as fast as possible.
#
# DEL, UNLINK and ASYNC option of FLUSHALL and FLUSHDB are user-controlled.
# It's up to the design of the application to understand when it is a good
# idea to use one or the other. However the Redis server sometimes has to
# delete keys or flush the whole database as a side effect of other operations.
# Specifically Redis deletes objects independently of a user call in the
# following scenarios:
#
# 1) On eviction, because of the maxmemory and maxmemory policy configurations,
# in order to make room for new data, without going over the specified
# memory limit.
# 2) Because of expire: when a key with an associated time to live (see the
# EXPIRE command) must be deleted from memory.
# 3) Because of a side effect of a command that stores data on a key that may
# already exist. For example the RENAME command may delete the old key
# content when it is replaced with another one. Similarly SUNIONSTORE
# or SORT with STORE option may delete existing keys. The SET command
# itself removes any old content of the specified key in order to replace
# it with the specified string.
# 4) During replication, when a replica performs a full resynchronization with
# its master, the content of the whole database is removed in order to
# load the RDB file just transferred.
#
# In all the above cases the default is to delete objects in a blocking way,
# like if DEL was called. However you can configure each case specifically
# in order to instead release memory in a non-blocking way like if UNLINK
# was called, using the following configuration directives.
# eviction when memory reaches maxmemory
lazyfree-lazy-eviction no
# deletion of expired keys
lazyfree-lazy-expire no
# implicit DEL of the destination key, e.g. by RENAME
lazyfree-lazy-server-del no
# the flush a replica performs after receiving an RDB file
replica-lazy-flush no
Deleting a large key with DEL, or wiping a database that holds many keys with FLUSHDB or FLUSHALL, can block Redis. Likewise, while cleaning up expired data or evicting over-limit data, the server can block if it happens to hit a large key.
To solve this, Redis 4.0 introduced the lazyfree mechanism: the deletion of a key or a database can be handed off to a background thread, avoiding server blocking as much as possible.
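The idea behind lazyfree can be sketched with a queue and a background thread: the caller only unlinks the key from the keyspace — a cheap pointer operation — and hands the object to another thread to release. This is a simplified model of the concept, not the bio.c implementation; all names here are made up:

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class LazyFree {
    private final Map<String, Object> keyspace = new ConcurrentHashMap<>();
    private final BlockingQueue<Object> freeQueue = new LinkedBlockingQueue<>();

    public LazyFree() {
        // Background "lazyfree" thread: drains the queue and drops the references
        // (real Redis walks the data structure and frees it incrementally).
        Thread bg = new Thread(() -> {
            try {
                while (true) freeQueue.take();   // the object becomes garbage here
            } catch (InterruptedException e) { /* shutdown */ }
        });
        bg.setDaemon(true);
        bg.start();
    }

    public void put(String key, Object value) { keyspace.put(key, value); }

    public boolean exists(String key) { return keyspace.containsKey(key); }

    // UNLINK-style delete: O(1) for the caller no matter how big the value is.
    public boolean unlink(String key) {
        Object v = keyspace.remove(key);
        if (v == null) return false;
        freeQueue.offer(v);   // the actual freeing happens on the background thread
        return true;
    }
}
```

From the client's point of view the key is gone immediately; only the reclamation of its memory is deferred, which is exactly the observable behavior of UNLINK versus DEL.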