1. 存储的结构
在 redis 字符串对象 String 的介绍中,我们知道 redis 对于字符串的存储共有 3 种存储形式,其存储的内存结构如以下图片示例:
-
OBJ_ENCODING_INT
: 保存的字符串长度小于 20,并且是可以解析为 long 类型的整数值,那么存储方式就是直接将 redisObject 的 ptr 指针指向这个整数值
-
OBJ_ENCODING_EMBSTR
: 长度小于 44 (OBJ_ENCODING_EMBSTR_SIZE_LIMIT)的字符串将以简单动态字符串(SDS) 的形式存储在 redisObject 中,但是redisObject 对象头会和 SDS 对象连续存在一起
-
OBJ_ENCODING_RAW
: 字符串以简单动态字符串(SDS) 的形式存储,redisObject 对象头和 SDS 对象在内存地址上一般是不连续的两块内存
2. 数据存储源码分析
2.1 数据存储过程
-
在 Redis 6.0 源码阅读笔记(1)-Redis 服务端启动及命令执行 中我们已经知道客户端保存字符串的
set
命令将会调用到t_string.c#setCommand()
函数,其源码实现如下:该方法中有以下两个重点函数被调用,本节主要关注 tryObjectEncoding() 函数
- tryObjectEncoding() 将客户端传输过来的需保存的字符串对象尝试进行编码,以节约内存
- setGenericCommand() 将 key-value 保存到数据库中
void setCommand(client *c) { ...... c->argv[2] = tryObjectEncoding(c->argv[2]); setGenericCommand(c,flags,c->argv[1],c->argv[2],expire,unit,NULL,NULL); }
-
object.c#tryObjectEncoding()
函数逻辑很清晰,可以看到主要进行了以下几个操作:- 当字符串长度小于 20 并且可以被解析为 long 类型数据时,这个数据将以整数形式保存,并以
robj->ptr = (void*) value
这种直接赋值的形式存储 - 当字符串长度小于等于 OBJ_ENCODING_EMBSTR_SIZE_LIMIT 配置并且还是 raw 编码时,调用 createEmbeddedStringObject() 函数将其转化为 embstr 编码
- 这个字符串对象已经不能进行转码了,只好调用 trimStringObjectIfNeeded() 函数尝试从字符串对象中移除所有空余空间
robj *tryObjectEncoding(robj *o) { long value; sds s = o->ptr; size_t len; /* Make sure this is a string object, the only type we encode * in this function. Other types use encoded memory efficient * representations but are handled by the commands implementing * the type. */ serverAssertWithInfo(NULL,o,o->type == OBJ_STRING); /* We try some specialized encoding only for objects that are * RAW or EMBSTR encoded, in other words objects that are still * in represented by an actually array of chars. */ if (!sdsEncodedObject(o)) return o; /* It's not safe to encode shared objects: shared objects can be shared * everywhere in the "object space" of Redis and may end in places where * they are not handled. We handle them only as values in the keyspace. */ if (o->refcount > 1) return o; /* Check if we can represent this string as a long integer. * Note that we are sure that a string larger than 20 chars is not * representable as a 32 nor 64 bit integer. */ len = sdslen(s); if (len <= 20 && string2l(s,len,&value)) { /* This object is encodable as a long. Try to use a shared object. * Note that we avoid using shared integers when maxmemory is used * because every object needs to have a private LRU field for the LRU * algorithm to work well. */ if ((server.maxmemory == 0 || !(server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS)) && value >= 0 && value < OBJ_SHARED_INTEGERS) { decrRefCount(o); incrRefCount(shared.integers[value]); return shared.integers[value]; } else { if (o->encoding == OBJ_ENCODING_RAW) { sdsfree(o->ptr); o->encoding = OBJ_ENCODING_INT; o->ptr = (void*) value; return o; } else if (o->encoding == OBJ_ENCODING_EMBSTR) { decrRefCount(o); return createStringObjectFromLongLongForValue(value); } } } /* If the string is small and is still RAW encoded, * try the EMBSTR encoding which is more efficient. * In this representation the object and the SDS string are allocated * in the same chunk of memory to save space and cache misses. */ if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT) { robj *emb; if (o->encoding == OBJ_ENCODING_EMBSTR) return o; emb = createEmbeddedStringObject(s,sdslen(s)); decrRefCount(o); return emb; } /* We can't encode the object... * * Do the last try, and at least optimize the SDS string inside * the string object to require little space, in case there * is more than 10% of free space at the end of the SDS string. * * We do that only for relatively large strings as this branch * is only entered if the length of the string is greater than * OBJ_ENCODING_EMBSTR_SIZE_LIMIT. */ trimStringObjectIfNeeded(o); /* Return the original object. */ return o; }
- 当字符串长度小于 20 并且可以被解析为 long 类型数据时,这个数据将以整数形式保存,并以
-
object.c#createEmbeddedStringObject()
函数实现 embstr 编码也很简单,主要步骤如下:- 首先调用 zmalloc() 函数申请内存,可以看到此处不仅申请了需要存储的字符串的内存及 redisObject 的内存,还申请了 SDS 实现结构体之一 sdshdr8 的内存,这也就是上文所说embstr 编码只申请一次内存,并且redisObject 对象头会和 SDS 对象连续存在一起的由来
- 将 redisObject 对象的 ptr 指针指向 sdshdr8 开始的内存地址
- 填充 sdshdr8 对象各个属性,包括
len
字符串长度,alloc
字符数组容量,实际存储字符串的buf
字符数组
robj *createEmbeddedStringObject(const char *ptr, size_t len) { robj *o = zmalloc(sizeof(robj)+sizeof(struct sdshdr8)+len+1); struct sdshdr8 *sh = (void*)(o+1); o->type = OBJ_STRING; o->encoding = OBJ_ENCODING_EMBSTR; o->ptr = sh+1; o->refcount = 1; if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) { o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL; } else { o->lru = LRU_CLOCK(); } sh->len = len; sh->alloc = len; sh->flags = SDS_TYPE_8; if (ptr == SDS_NOINIT) sh->buf[len] = '\0'; else if (ptr) { memcpy(sh->buf,ptr,len); sh->buf[len] = '\0'; } else { memset(sh->buf,0,len+1); } return o; }
-
raw 编码字符串的创建可参考
object.c#createRawStringObject()
函数,其涉及到两次内存申请,sds.c#sdsnewlen()
申请内存创建 SDS 对象,object.c#createObject()
申请内存创建 redisObject 对象robj *createRawStringObject(const char *ptr, size_t len) { return createObject(OBJ_STRING, sdsnewlen(ptr,len)); }
-
从检测容量大小的的函数
t_string.c#checkStringLength()
看,字符串最大长度为 512M,超出该数值将报错static int checkStringLength(client *c, long long size) { if (size > 512*1024*1024) { addReplyError(c,"string exceeds maximum allowed size (512MB)"); return C_ERR; } return C_OK; }
2.2 简单动态字符串 SDS
2.2.1 SDS 结构体
SDS(简单动态字符串)
在 Redis 中是实现字符串存储的工具,本质上依然是字符数组,但它不像C语言字符串那样以‘\0’来标识字符串结束
传统C字符串符合ASCII编码,这种编码的操作的特点就是:遇零则止 。即当读一个字符串时,只要遇到’\0’就认为到达末尾,忽略’\0’以后的所有字符。另外其获得字符串长度的做法是遍历字符串,遇零则止,时间复杂度为O(n),比较低效
SDS 的实现结构定义在 sds.h
中,其定义如下。因为 SDS 判断是否到达字符串末尾的依据是表头的 len 属性,所以能高效计算字符串长度并快速追加数据
sds 结构一共有 5 种 Header 定义,目的是为不同长度的字符串提供不同大小的 Header,以节省内存。以 sdshdr8 为例,其 len 属性为 uint8_t 类型,占用内存大小为 1 字节,则存储的字符串最大长度为256。Header 主要包含以下几个属性:
len
: 字符串真正的长度,不包含空终止字符alloc
: 除去表头和终止符的 buf 数组长度,也就是最大容量flags
: 标志 header 的类型buf
: 字符数组,实际存储字符
/* Note: sdshdr5 is never used, we just access the flags byte directly.
* However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
uint8_t len; /* used */
uint8_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
uint16_t len; /* used */
uint16_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
uint32_t len; /* used */
uint32_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
uint64_t len; /* used */
uint64_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
2.2.2 SDS 容量调整
-
SDS 扩容的函数是
sds.c#sdsMakeRoomFor()
,当字符串长度小于 1M 时,扩容都是加倍现有的空间,如果超过 1M,扩容时一次只会多扩 1M 的空间。以下为源码实现:字符串在长度小于 SDS_MAX_PREALLOC(1024*1024,也就是1MB,定义在 sds.h) 之前,采用 2 倍扩容,也就是保留 100% 的冗余空间。当长度超过 SDS_MAX_PREALLOC 之后,每次扩容只会多分配 SDS_MAX_PREALLOC 大小的冗余空间,避免加倍扩容后的冗余空间过大导致浪费
sds sdsMakeRoomFor(sds s, size_t addlen) { void *sh, *newsh; size_t avail = sdsavail(s); size_t len, newlen; char type, oldtype = s[-1] & SDS_TYPE_MASK; int hdrlen; /* Return ASAP if there is enough space left. */ if (avail >= addlen) return s; len = sdslen(s); sh = (char*)s-sdsHdrSize(oldtype); newlen = (len+addlen); if (newlen < SDS_MAX_PREALLOC) newlen *= 2; else newlen += SDS_MAX_PREALLOC; type = sdsReqType(newlen); /* Don't use type 5: the user is appending to the string and type 5 is * not able to remember empty space, so sdsMakeRoomFor() must be called * at every appending operation. */ if (type == SDS_TYPE_5) type = SDS_TYPE_8; hdrlen = sdsHdrSize(type); if (oldtype==type) { newsh = s_realloc(sh, hdrlen+newlen+1); if (newsh == NULL) return NULL; s = (char*)newsh+hdrlen; } else { /* Since the header size changes, need to move the string forward, * and can't use realloc */ newsh = s_malloc(hdrlen+newlen+1); if (newsh == NULL) return NULL; memcpy((char*)newsh+hdrlen, s, len+1); s_free(sh); s = (char*)newsh+hdrlen; s[-1] = type; sdssetlen(s, len); } sdssetalloc(s, newlen); return s; }
-
SDS 缩容函数为
sds.c#sdsclear()
,从源码实现来看,其主要有以下操作,也就是它并不释放实际占用的内存,体现出一种惰性策略- 重置 SDS 表头的 len 属性值为 0
- 将结束符放到 buf 数组最前面,相当于惰性地删除 buf 中的内容
/* Modify an sds string in-place to make it empty (zero length). * However all the existing buffer is not discarded but set as free space * so that next append operations will not require allocations up to the * number of bytes previously available. */ void sdsclear(sds s) { sdssetlen(s, 0); s[0] = '\0'; }