前言
redis中数组
和链表
结构都可以实现列表类型,链表好理解,我们前文已经讲过了,不再赘述,列表如何通过数组实现的,大家可以带着这个疑问阅读文章;
先说一下链表的缺点
:
1)链表中每一个节点都占用独立的一块内存,导致内存碎片过多;
2)链表节点中前后节点指针占用过多的额外内存;
学过数据结构的可以知道,数组可以较好的解决上面的两个问题,ziplist
是一种类似数组的紧凑型链表格式。它会申请一整块内存,在这个内存上存放该链表所有数据。
定义
ziplist
结构如下:
zlbytes:uint32_t
,记录整个压缩列表占用对内存字节数,包括zlbtes占用的4字节;
zltail:uint32_t
,记录压缩列表「尾部」节点距离起始地址由多少字节,也就是列表尾的偏移量,用于支持链表从尾部弹出或反向(从尾到头)变量链表。
zllen:uint16_t
,记录压缩列表包含的节点数量;
zlend
:标记压缩列表的结束点,固定值 0xFF(十进制255)。
entry
就是ziplist中保存的节点,entry的格式如下:
data
:记录了「前一个节点」的长度;
prevlen
:记录前驱节点长度,单位为字节,该属性长度为1字节或5字节。
1)如果前驱节点小于254,则使用1字节存储前驱节点长度;
2)否则,使用5字节,并且第一个字节固定为254,剩下4个字节存储前驱节点长度;
encoding
:代表当前节点元素的编码格式,包含编码类型和节点长度。一个ziplist中,不同节点元素的编码格式可以不同。编码格式规范如下:
① 00pppppp(pppppp代表encoding的低6位,下同):字符串编码,长度小于或等于63(26-1),长度存放在encoding的低6位中。
② 01pppppp:字符串编码, 长度小于或等于16383(214-1),长度存放在encoding的后6位和encoding后1字节中。
③ 10000000:字符串编码,长度大于16383(214-1),长度存放在encoding后4字节中。
④ 11000000:数值编码, 类型为int16_t,占用2字节。
⑤ 11010000:数值编码,类型为int32_t,占用4字节。
⑥ 11100000:数值编码,类型为int64_t,占用8字节。
⑦ 11110000:数值编码,使用3字节保存一个整数。
⑧ 11111110:数值编码,使用1字节保存一个整数。
⑨ 1111xxxx:使用encoding低4位存储一个整数,存储数值范围为0~12。该编码下encoding低4位的可用范围为0001~1101,encoding低4位减1为实际存储的值。
⑩ 11111111:255,ziplist结束节点。
注意第②、③种编码格式,除了encoding属性,还需要额外的空间存储节点元素长度。第⑨种格式也比较特殊,节点元素直接存放在encoding属性上。该编码是针对小数字的优化。这时entry-data为空。
字节序
encoding
属性使用多个字节存储节点元素长度,这种多字节数据存储在计算机内存中或者进行网络传输时的字节顺序称为字节序,字节序有两种类型:大端字节序
和小端字节序
。
ziplist
采取的是小端字节序,CPU可以先读取并处理低位字节,执行计算的借位、进位操作时效率更高。
源码分析是信息来源的一手资料=
源码分析
注:
源码在ziplist.h
和ziplist.c
中
ziplist
中查找元素使用ziplistFind
函数:
/* Find pointer to the entry equal to the specified entry. Skip 'skip' entries
* between every comparison. Returns NULL when the field could not be found. */
unsigned char *ziplistFind(unsigned char *p, unsigned char *vstr, unsigned int vlen, unsigned int skip) {
int skipcnt = 0;
unsigned char vencoding = 0;
long long vll = 0;
while (p[0] != ZIP_END) {
unsigned int prevlensize, encoding, lensize, len;
unsigned char *q;
ZIP_DECODE_PREVLENSIZE(p, prevlensize);
ZIP_DECODE_LENGTH(p + prevlensize, encoding, lensize, len);
q = p + prevlensize + lensize;
if (skipcnt == 0) {
/* Compare current entry with specified entry */
if (ZIP_IS_STR(encoding)) {
if (len == vlen && memcmp(q, vstr, vlen) == 0) {
return p;
}
} else {
/* Find out if the searched field can be encoded. Note that
* we do it only the first time, once done vencoding is set
* to non-zero and vll is set to the integer value. */
if (vencoding == 0) {
if (!zipTryEncoding(vstr, vlen, &vll, &vencoding)) {
/* If the entry can't be encoded we set it to
* UCHAR_MAX so that we don't retry again the next
* time. */
vencoding = UCHAR_MAX;
}
/* Must be non-zero by now */
assert(vencoding);
}
/* Compare current entry with specified entry, do it only
* if vencoding != UCHAR_MAX because if there is no encoding
* possible for the field it can't be a valid integer. */
if (vencoding != UCHAR_MAX) {
long long ll = zipLoadInteger(q, encoding);
if (ll == vll) {
return p;
}
}
}
/* Reset skip count */
skipcnt = skip;
} else {
/* Skip entry */
skipcnt--;
}
/* Move to next entry */
p = q + len;
}
return NULL;
}
参数说明:
p
:指定从ziplist哪个节点开始查找
vstr、vlen
:待查找元素的内容和长度
skip
:间隔多少个字节才执行一次元素对比操作
ziplist
中插入节点:
/* Insert item at "p". */
unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen;
unsigned int prevlensize, prevlen = 0;
size_t offset;
int nextdiff = 0;
unsigned char encoding = 0;
long long value = 123456789; /* initialized to avoid warning. Using a value
that is easy to see if for some reason
we use it uninitialized. */
zlentry tail;
/* Find out prevlen for the entry that is inserted. */
if (p[0] != ZIP_END) {
ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
} else {
unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
if (ptail[0] != ZIP_END) {
prevlen = zipRawEntryLength(ptail);
}
}
/* See if the entry can be encoded */
if (zipTryEncoding(s,slen,&value,&encoding)) {
/* 'encoding' is set to the appropriate integer encoding */
reqlen = zipIntSize(encoding);
} else {
/* 'encoding' is untouched, however zipStoreEntryEncoding will use the
* string length to figure out how to encode it. */
reqlen = slen;
}
/* We need space for both the length of the previous entry and
* the length of the payload. */
reqlen += zipStorePrevEntryLength(NULL,prevlen);
reqlen += zipStoreEntryEncoding(NULL,encoding,slen);
/* When the insert position is not equal to the tail, we need to
* make sure that the next entry can hold this entry's length in
* its prevlen field. */
int forcelarge = 0;
nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;
if (nextdiff == -4 && reqlen < 4) {
nextdiff = 0;
forcelarge = 1;
}
/* Store offset because a realloc may change the address of zl. */
offset = p-zl;
zl = ziplistResize(zl,curlen+reqlen+nextdiff);
p = zl+offset;
/* Apply memory move when necessary and update tail offset. */
if (p[0] != ZIP_END) {
/* Subtract one because of the ZIP_END bytes */
memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);
/* Encode this entry's raw length in the next entry. */
if (forcelarge)
zipStorePrevEntryLengthLarge(p+reqlen,reqlen);
else
zipStorePrevEntryLength(p+reqlen,reqlen);
/* Update offset for tail */
ZIPLIST_TAIL_OFFSET(zl) =
intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);
/* When the tail contains more than one entry, we need to take
* "nextdiff" in account as well. Otherwise, a change in the
* size of prevlen doesn't have an effect on the *tail* offset. */
zipEntry(p+reqlen, &tail);
if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
ZIPLIST_TAIL_OFFSET(zl) =
intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
}
} else {
/* This element will be the new tail. */
ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
}
/* When nextdiff != 0, the raw length of the next entry has changed, so
* we need to cascade the update throughout the ziplist */
if (nextdiff != 0) {
offset = p-zl;
zl = __ziplistCascadeUpdate(zl,p+reqlen);
p = zl+offset;
}
/* Write the entry */
p += zipStorePrevEntryLength(p,prevlen);
p += zipStoreEntryEncoding(p,encoding,slen);
if (ZIP_IS_STR(encoding)) {
memcpy(p,s,slen);
} else {
zipSaveInteger(p,value,encoding);
}
ZIPLIST_INCR_LENGTH(zl,1);
return zl;
}
参数说明:
zl
:待插入ziplist
p
:指向插入位置的后驱节点
s、slen
:待插入元素的内容和长度
插入中函数中用到三个比较重要的函数zipStorePrevEntryLength
、zipStoreEntryEncoding
和zipPrevLenByteDiff
zipStorePrevEntryLength函数计算prelen属性长度(1字节或者5字节)
/* Encode the length of the previous entry and write it to "p". Return the
* number of bytes needed to encode this length if "p" is NULL. */
unsigned int zipStorePrevEntryLength(unsigned char *p, unsigned int len) {
if (p == NULL) {
return (len < ZIP_BIG_PREVLEN) ? 1 : sizeof(len)+1;
} else {
if (len < ZIP_BIG_PREVLEN) {
p[0] = len;
return 1;
} else {
return zipStorePrevEntryLengthLarge(p,len);
}
}
}
zipStoreEntryEncoding
函数计算额外存放字节元素长度所需字节数(encoding
编码中第2、3种格式);reqlen变量值添加这两个函数后返回值后成为插入节点长度;
/* Write the encoding header of the entry in 'p'. If p is NULL it just returns
* the amount of bytes required to encode such a length. Arguments:
*
* 'encoding' is the encoding we are using for the entry. It could be
* ZIP_INT_* or ZIP_STR_* or between ZIP_INT_IMM_MIN and ZIP_INT_IMM_MAX
* for single-byte small immediate integers.
*
* 'rawlen' is only used for ZIP_STR_* encodings and is the length of the
* string that this entry represents.
*
* The function returns the number of bytes used by the encoding/length
* header stored in 'p'. */
unsigned int zipStoreEntryEncoding(unsigned char *p, unsigned char encoding, unsigned int rawlen) {
unsigned char len = 1, buf[5];
if (ZIP_IS_STR(encoding)) {
/* Although encoding is given it may not be set for strings,
* so we determine it here using the raw length. */
if (rawlen <= 0x3f) {
if (!p) return len;
buf[0] = ZIP_STR_06B | rawlen;
} else if (rawlen <= 0x3fff) {
len += 1;
if (!p) return len;
buf[0] = ZIP_STR_14B | ((rawlen >> 8) & 0x3f);
buf[1] = rawlen & 0xff;
} else {
len += 4;
if (!p) return len;
buf[0] = ZIP_STR_32B;
buf[1] = (rawlen >> 24) & 0xff;
buf[2] = (rawlen >> 16) & 0xff;
buf[3] = (rawlen >> 8) & 0xff;
buf[4] = rawlen & 0xff;
}
} else {
/* Implies integer encoding, so length is always 1. */
if (!p) return len;
buf[0] = encoding;
}
/* Store this length at p. */
memcpy(p,buf,len);
return len;
}
zipPrevLenByteDiff
函数计算后驱节点prevlen属性长度需调整多少个字节,结果存放在nextdiff变量
中;
/* Given a pointer 'p' to the prevlen info that prefixes an entry, this
* function returns the difference in number of bytes needed to encode
* the prevlen if the previous entry changes of size.
*
* So if A is the number of bytes used right now to encode the 'prevlen'
* field.
*
* And B is the number of bytes that are needed in order to encode the
* 'prevlen' if the previous element will be updated to one of size 'len'.
*
* Then the function returns B - A
*
* So the function returns a positive number if more space is needed,
* a negative number if less space is needed, or zero if the same space
* is needed. */
int zipPrevLenByteDiff(unsigned char *p, unsigned int len) {
unsigned int prevlensize;
ZIP_DECODE_PREVLENSIZE(p, prevlensize);
return zipStorePrevEntryLength(NULL, len) - prevlensize;
}
级联更新:__ziplistCascadeUpdate
函数
定义:因为zlEntry中保存了前一个元素的长度,而且这个长度又是一个边长的整数,所以当一个zlEntry的前面的元素改变(内容改变或者被删除,或者插入了新的元素)时,可能造成prevlen长度的改变,而prevlen长度的改变又会使整个zlEntry的的长度改变,当当前zlEntry的长度改变的时候又可能会造成下一个zlEntry的长度的改变,这就造成了级联更新.如果ziplist中元素很多造成的级联更新影响可能会很大,影响整个redis服务,所以ziplist不适合存放的元素过多.
/* When an entry is inserted, we need to set the prevlen field of the next
* entry to equal the length of the inserted entry. It can occur that this
* length cannot be encoded in 1 byte and the next entry needs to be grow
* a bit larger to hold the 5-byte encoded prevlen. This can be done for free,
* because this only happens when an entry is already being inserted (which
* causes a realloc and memmove). However, encoding the prevlen may require
* that this entry is grown as well. This effect may cascade throughout
* the ziplist when there are consecutive entries with a size close to
* ZIP_BIG_PREVLEN, so we need to check that the prevlen can be encoded in
* every consecutive entry.
*
* Note that this effect can also happen in reverse, where the bytes required
* to encode the prevlen field can shrink. This effect is deliberately ignored,
* because it can cause a "flapping" effect where a chain prevlen fields is
* first grown and then shrunk again after consecutive inserts. Rather, the
* field is allowed to stay larger than necessary, because a large prevlen
* field implies the ziplist is holding large entries anyway.
*
* The pointer "p" points to the first entry that does NOT need to be
* updated, i.e. consecutive fields MAY need an update. */
unsigned char *__ziplistCascadeUpdate(unsigned char *zl, unsigned char *p) {
size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), rawlen, rawlensize;
size_t offset, noffset, extra;
unsigned char *np;
zlentry cur, next;
while (p[0] != ZIP_END) {
zipEntry(p, &cur);
rawlen = cur.headersize + cur.len;
rawlensize = zipStorePrevEntryLength(NULL,rawlen);
/* Abort if there is no next entry. */
if (p[rawlen] == ZIP_END) break;
zipEntry(p+rawlen, &next);
/* Abort when "prevlen" has not changed. */
if (next.prevrawlen == rawlen) break;
if (next.prevrawlensize < rawlensize) {
/* The "prevlen" field of "next" needs more bytes to hold
* the raw length of "cur". */
offset = p-zl;
extra = rawlensize-next.prevrawlensize;
zl = ziplistResize(zl,curlen+extra);
p = zl+offset;
/* Current pointer and offset for next element. */
np = p+rawlen;
noffset = np-zl;
/* Update tail offset when next element is not the tail element. */
if ((zl+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))) != np) {
ZIPLIST_TAIL_OFFSET(zl) =
intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+extra);
}
/* Move the tail to the back. */
memmove(np+rawlensize,
np+next.prevrawlensize,
curlen-noffset-next.prevrawlensize-1);
zipStorePrevEntryLength(np,rawlen);
/* Advance the cursor */
p += rawlen;
curlen += extra;
} else {
if (next.prevrawlensize > rawlensize) {
/* This would result in shrinking, which we want to avoid.
* So, set "rawlen" in the available bytes. */
zipStorePrevEntryLengthLarge(p+rawlen,rawlen);
} else {
zipStorePrevEntryLength(p+rawlen,rawlen);
}
/* Stop here, as the raw length of "next" has not changed. */
break;
}
}
return zl;
}
注意:
Redis 针对压缩列表在设计上的不足,在后来的版本中,新增设计了两种数据结构:quicklist
(Redis 3.2 引入) 和 listpack
(Redis 5.0 引入)。这两种数据结构的设计目标,就是尽可能地保持压缩列表节省内存的优势,同时解决压缩列表的「连锁更新」的问题。
相关配置:
1.hash-max-ziplist-entries 512
当hash中元素的个数少于512的时候使用ziplist保存数据,超过的时候使用dict
2.hash-max-ziplist-value 64
当hash中最大元素的字节占用不超过64的时候使用ziplist保存数据,超过的时候使用dict