一 ziplist简介以及应用
ziplist称之为压缩链表,顾名思义,压缩链表有压缩节省空间的语义。回想redis另外一种链表:双向链表,每个节点都需要prev、next指针来指向前后节点,如何数据只占1字节,大量的此类数据会造成空间上的浪费。因此redis提出了另外一种更具有空间效率的链表:压缩链表,压缩链表则没有这两个指针,压缩链表含有两个数,一个代表前一个节点的长度,有一个表示当前节点的长度,最少只需2个字节,相比双向链表节省了很多空间。
压缩链表是列表键和哈希键的底层实现之一。当一个列表键只包含少量列表项,并且每个列表项要么是小整数值,要么就是长度较短的字符串,那么Redis就会使用压缩链表来做列表键的实现。
二 一个例子
虽然注释已经详细说明了ziplist的设计思路,但是ziplist的代码较难理解,不妨先来看一个例子:
创建一个空的ziplist链表
加入了第1个节点"abcde"以后
加入了第2个节点"ABC"以后(加在最后)
加入了第3个节点"10"以后(加在最前面)
三 ziplist实现
先看一下作者对ziplist的一些讲解
/* The ziplist is a specially encoded dually linked list that is designed
* to be very memory efficient.
*
* Ziplist 是为了尽可能地节约内存而设计的特殊编码双端链表。
*
* It stores both strings and integer values,
* where integers are encoded as actual integers instead of a series of
* characters.
*
* Ziplist 可以储存字符串值和整数值,
* 其中,整数值被保存为实际的整数,而不是字符数组。
*
* It allows push and pop operations on either side of the list
* in O(1) time. However, because every operation requires a reallocation of
* the memory used by the ziplist, the actual complexity is related to the
* amount of memory used by the ziplist.
*
* Ziplist 允许在列表的两端进行 O(1) 复杂度的 push 和 pop 操作。
* 但是,因为这些操作都需要对整个 ziplist 进行内存重分配,
* 所以实际的复杂度和 ziplist 占用的内存大小有关。
*
* ----------------------------------------------------------------------------
*
* ZIPLIST OVERALL LAYOUT:
* Ziplist 的整体布局:
*
* The general layout of the ziplist is as follows:
* 以下是 ziplist 的一般布局:
*
* <zlbytes><zltail><zllen><entry><entry><zlend>
*
* <zlbytes> is an unsigned integer to hold the number of bytes that the
* ziplist occupies. This value needs to be stored to be able to resize the
* entire structure without the need to traverse it first.
*
* <zlbytes> 是一个无符号整数,保存着 ziplist 使用的内存数量。
*
* 通过这个值,程序可以直接对 ziplist 的内存大小进行调整,
* 而无须为了计算 ziplist 的内存大小而遍历整个列表。
*
* <zltail> is the offset to the last entry in the list. This allows a pop
* operation on the far side of the list without the need for full traversal.
*
* <zltail> 保存着到达列表中最后一个节点的偏移量。
*
* 这个偏移量使得对表尾的 pop 操作可以在无须遍历整个列表的情况下进行。
*
* <zllen> is the number of entries.When this value is larger than 2**16-2,
* we need to traverse the entire list to know how many items it holds.
*
* <zllen> 保存着列表中的节点数量。
*
* 当 zllen 保存的值大于 2**16-2 时,
* 程序需要遍历整个列表才能知道列表实际包含了多少个节点。
*
* <zlend> is a single byte special value, equal to 255, which indicates the
* end of the list.
*
* <zlend> 的长度为 1 字节,值为 255 ,标识列表的末尾。
*
* ZIPLIST ENTRIES:
* ZIPLIST 节点:
*
* Every entry in the ziplist is prefixed by a header that contains two pieces
* of information. First, the length of the previous entry is stored to be
* able to traverse the list from back to front. Second, the encoding with an
* optional string length of the entry itself is stored.
*
* 每个 ziplist 节点的前面都带有一个 header ,这个 header 包含两部分信息:
*
* 1)前置节点的长度,在程序从后向前遍历时使用。
*
* 2)当前节点所保存的值的类型和长度。
*
* The length of the previous entry is encoded in the following way:
* If this length is smaller than 254 bytes, it will only consume a single
* byte that takes the length as value. When the length is greater than or
* equal to 254, it will consume 5 bytes. The first byte is set to 254 to
* indicate a larger value is following. The remaining 4 bytes take the
* length of the previous entry as value.
*
* 编码前置节点的长度的方法如下:
*
* 1) 如果前置节点的长度小于 254 字节,那么程序将使用 1 个字节来保存这个长度值。
*
* 2) 如果前置节点的长度大于等于 254 字节,那么程序将使用 5 个字节来保存这个长度值:
* a) 第 1 个字节的值将被设为 254 ,用于标识这是一个 5 字节长的长度值。
* b) 之后的 4 个字节则用于保存前置节点的实际长度。
*
* The other header field of the entry itself depends on the contents of the
* entry. When the entry is a string, the first 2 bits of this header will hold
* the type of encoding used to store the length of the string, followed by the
* actual length of the string. When the entry is an integer the first 2 bits
* are both set to 1. The following 2 bits are used to specify what kind of
* integer will be stored after this header. An overview of the different
* types and encodings is as follows:
*
* header 另一部分的内容和节点所保存的值有关。
*
* 1) 如果节点保存的是字符串值,
* 那么这部分 header 的头 2 个位将保存编码字符串长度所使用的类型,
* 而之后跟着的内容则是字符串的实际长度。
*
* |00pppppp| - 1 byte
* String value with length less than or equal to 63 bytes (6 bits).
* 字符串的长度小于或等于 63 字节。
* |01pppppp|qqqqqqqq| - 2 bytes
* String value with length less than or equal to 16383 bytes (14 bits).
* 字符串的长度小于或等于 16383 字节。
* |10______|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes
* String value with length greater than or equal to 16384 bytes.
* 字符串的长度大于或等于 16384 字节。
*
* 2) 如果节点保存的是整数值,
* 那么这部分 header 的头 2 位都将被设置为 1 ,
* 而之后跟着的 2 位则用于标识节点所保存的整数的类型。
*
* |11000000| - 1 byte
* Integer encoded as int16_t (2 bytes).
* 节点的值为 int16_t 类型的整数,长度为 2 字节。
* |11010000| - 1 byte
* Integer encoded as int32_t (4 bytes).
* 节点的值为 int32_t 类型的整数,长度为 4 字节。
* |11100000| - 1 byte
* Integer encoded as int64_t (8 bytes).
* 节点的值为 int64_t 类型的整数,长度为 8 字节。
* |11110000| - 1 byte
* Integer encoded as 24 bit signed (3 bytes).
* 节点的值为 24 位(3 字节)长的整数。
* |11111110| - 1 byte
* Integer encoded as 8 bit signed (1 byte).
* 节点的值为 8 位(1 字节)长的整数。
* |1111xxxx| - (with xxxx between 0000 and 1101) immediate 4 bit integer.
* Unsigned integer from 0 to 12. The encoded value is actually from
* 1 to 13 because 0000 and 1111 can not be used, so 1 should be
* subtracted from the encoded 4 bit value to obtain the right value.
* 节点的值为介于 0 至 12 之间的无符号整数。
* 因为 0000 和 1111 都不能使用,所以位的实际值将是 1 至 13 。
* 程序在取得这 4 个位的值之后,还需要减去 1 ,才能计算出正确的值。
* 比如说,如果位的值为 0001 = 1 ,那么程序返回的值将是 1 - 1 = 0 。
* |11111111| - End of ziplist.
* ziplist 的结尾标识
*
* All the integers are represented in little endian byte order.
*
* 所有整数都表示为小端字节序。
*
* ----------------------------------------------------------------------------
*
* Copyright (c) 2009-2012, Pieter Noordhuis <pcnoordhuis at gmail dot com>
* Copyright (c) 2009-2012, Salvatore Sanfilippo <antirez at gmail dot com>
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Redis nor the names of its contributors may be used
* to endorse or promote products derived from this software without
* specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
ziplist节点结构体
/*
* 保存 ziplist 节点信息的结构
*/
typedef struct zlentry {
// prevrawlen :前置节点的长度
// prevrawlensize :编码 prevrawlen 所需的字节大小
unsigned int prevrawlensize, prevrawlen;
// len :当前节点值的长度
// lensize :编码 len 所需的字节大小
unsigned int lensize, len;
// 当前节点 header 的大小
// 等于 prevrawlensize + lensize
unsigned int headersize;
// 当前节点值所使用的编码类型
unsigned char encoding;
// 指向当前节点的指针
unsigned char *p;
} zlentry;
以下是插入的操作
/* Insert item at "p". */
/*
* 根据指针 p 所指定的位置,将长度为 slen 的字符串 s 插入到 zl 中。
*
* 函数的返回值为完成插入操作之后的 ziplist
*
* T = O(N^2)
*/
static unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
// 记录当前 ziplist 的长度
size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen, prevlen = 0;
size_t offset;
int nextdiff = 0;
unsigned char encoding = 0;
long long value = 123456789; /* initialized to avoid warning. Using a value
that is easy to see if for some reason
we use it uninitialized. */
zlentry entry, tail;
/* Find out prevlen for the entry that is inserted. */
if (p[0] != ZIP_END) {
// 如果 p[0] 不指向列表末端,说明列表非空,并且 p 正指向列表的其中一个节点
// 那么取出 p 所指向节点的信息,并将它保存到 entry 结构中
// 然后用 prevlen 变量记录前置节点的长度
// (当插入新节点之后 p 所指向的节点就成了新节点的前置节点)
// T = O(1)
entry = zipEntry(p);
prevlen = entry.prevrawlen;
} else {
// 如果 p 指向表尾末端,那么程序需要检查列表是否为:
// 1)如果 ptail 也指向 ZIP_END ,那么列表为空;
// 2)如果列表不为空,那么 ptail 将指向列表的最后一个节点。
unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
if (ptail[0] != ZIP_END) {
// 表尾节点为新节点的前置节点
// 取出表尾节点的长度
// T = O(1)
prevlen = zipRawEntryLength(ptail);
}
}
/* See if the entry can be encoded */
// 尝试看能否将输入字符串转换为整数,如果成功的话:
// 1)value 将保存转换后的整数值
// 2)encoding 则保存适用于 value 的编码方式
// 无论使用什么编码, reqlen 都保存节点值的长度
// T = O(N)
if (zipTryEncoding(s,slen,&value,&encoding)) {
/* 'encoding' is set to the appropriate integer encoding */
reqlen = zipIntSize(encoding);
} else {
/* 'encoding' is untouched, however zipEncodeLength will use the
* string length to figure out how to encode it. */
reqlen = slen;
}
/* We need space for both the length of the previous entry and
* the length of the payload. */
// 计算编码前置节点的长度所需的大小
// T = O(1)
reqlen += zipPrevEncodeLength(NULL,prevlen);
// 计算编码当前节点值所需的大小
// T = O(1)
reqlen += zipEncodeLength(NULL,encoding,slen);
/* When the insert position is not equal to the tail, we need to
* make sure that the next entry can hold this entry's length in
* its prevlen field. */
// 只要新节点不是被添加到列表末端,
// 那么程序就需要检查看 p 所指向的节点(的 header)能否编码新节点的长度。
// nextdiff 保存了新旧编码之间的字节大小差,如果这个值大于 0
// 那么说明需要对 p 所指向的节点(的 header )进行扩展
// T = O(1)
nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;
/* Store offset because a realloc may change the address of zl. */
// 因为重分配空间可能会改变 zl 的地址
// 所以在分配之前,需要记录 zl 到 p 的偏移量,然后在分配之后依靠偏移量还原 p
offset = p-zl;
// curlen 是 ziplist 原来的长度
// reqlen 是整个新节点的长度
// nextdiff 是新节点的后继节点扩展 header 的长度(要么 0 字节,要么 4 个字节)
// T = O(N)
zl = ziplistResize(zl,curlen+reqlen+nextdiff);
p = zl+offset;
/* Apply memory move when necessary and update tail offset. */
if (p[0] != ZIP_END) {
// 新元素之后还有节点,因为新元素的加入,需要对这些原有节点进行调整
/* Subtract one because of the ZIP_END bytes */
// 移动现有元素,为新元素的插入空间腾出位置
// T = O(N)
memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);
/* Encode this entry's raw length in the next entry. */
// 将新节点的长度编码至后置节点
// p+reqlen 定位到后置节点
// reqlen 是新节点的长度
// T = O(1)
zipPrevEncodeLength(p+reqlen,reqlen);
/* Update offset for tail */
// 更新到达表尾的偏移量,将新节点的长度也算上
ZIPLIST_TAIL_OFFSET(zl) =
intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);
/* When the tail contains more than one entry, we need to take
* "nextdiff" in account as well. Otherwise, a change in the
* size of prevlen doesn't have an effect on the *tail* offset. */
// 如果新节点的后面有多于一个节点
// 那么程序需要将 nextdiff 记录的字节数也计算到表尾偏移量中
// 这样才能让表尾偏移量正确对齐表尾节点
// T = O(1)
tail = zipEntry(p+reqlen);
if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
ZIPLIST_TAIL_OFFSET(zl) =
intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
}
} else {
/* This element will be the new tail. */
// 新元素是新的表尾节点
ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
}
/* When nextdiff != 0, the raw length of the next entry has changed, so
* we need to cascade the update throughout the ziplist */
// 当 nextdiff != 0 时,新节点的后继节点的(header 部分)长度已经被改变,
// 所以需要级联地更新后续的节点
if (nextdiff != 0) {
offset = p-zl;
// T = O(N^2)
zl = __ziplistCascadeUpdate(zl,p+reqlen);
p = zl+offset;
}
/* Write the entry */
// 一切搞定,将前置节点的长度写入新节点的 header
p += zipPrevEncodeLength(p,prevlen);
// 将节点值的长度写入新节点的 header
p += zipEncodeLength(p,encoding,slen);
// 写入节点值
if (ZIP_IS_STR(encoding)) {
// T = O(N)
memcpy(p,s,slen);
} else {
// T = O(1)
zipSaveInteger(p,value,encoding);
}
// 更新列表的节点数量计数器
// T = O(1)
ZIPLIST_INCR_LENGTH(zl,1);
return zl;
}
以下是删除的操作
/* Delete "num" entries, starting at "p". Returns pointer to the ziplist.
*
* 从位置 p 开始,连续删除 num 个节点。
*
* 函数的返回值为处理删除操作之后的 ziplist 。
*
* T = O(N^2)
*/
static unsigned char *__ziplistDelete(unsigned char *zl, unsigned char *p, unsigned int num) {
unsigned int i, totlen, deleted = 0;
size_t offset;
int nextdiff = 0;
zlentry first, tail;
// 计算被删除节点总共占用的内存字节数
// 以及被删除节点的总个数
// T = O(N)
first = zipEntry(p);
for (i = 0; p[0] != ZIP_END && i < num; i++) {
p += zipRawEntryLength(p);
deleted++;
}
// totlen 是所有被删除节点总共占用的内存字节数
totlen = p-first.p;
if (totlen > 0) {
if (p[0] != ZIP_END) {
// 执行这里,表示被删除节点之后仍然有节点存在
/* Storing `prevrawlen` in this entry may increase or decrease the
* number of bytes required compare to the current `prevrawlen`.
* There always is room to store this, because it was previously
* stored by an entry that is now being deleted. */
// 因为位于被删除范围之后的第一个节点的 header 部分的大小
// 可能容纳不了新的前置节点,所以需要计算新旧前置节点之间的字节数差
// T = O(1)
nextdiff = zipPrevLenByteDiff(p,first.prevrawlen);
// 如果有需要的话,将指针 p 后退 nextdiff 字节,为新 header 空出空间
p -= nextdiff;
// 将 first 的前置节点的长度编码至 p 中
// T = O(1)
zipPrevEncodeLength(p,first.prevrawlen);
/* Update offset for tail */
// 更新到达表尾的偏移量
// T = O(1)
ZIPLIST_TAIL_OFFSET(zl) =
intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))-totlen);
/* When the tail contains more than one entry, we need to take
* "nextdiff" in account as well. Otherwise, a change in the
* size of prevlen doesn't have an effect on the *tail* offset. */
// 如果被删除节点之后,有多于一个节点
// 那么程序需要将 nextdiff 记录的字节数也计算到表尾偏移量中
// 这样才能让表尾偏移量正确对齐表尾节点
// T = O(1)
tail = zipEntry(p);
if (p[tail.headersize+tail.len] != ZIP_END) {
ZIPLIST_TAIL_OFFSET(zl) =
intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
}
/* Move tail to the front of the ziplist */
// 从表尾向表头移动数据,覆盖被删除节点的数据
// T = O(N)
memmove(first.p,p,
intrev32ifbe(ZIPLIST_BYTES(zl))-(p-zl)-1);
} else {
// 执行这里,表示被删除节点之后已经没有其他节点了
/* The entire tail was deleted. No need to move memory. */
// T = O(1)
ZIPLIST_TAIL_OFFSET(zl) =
intrev32ifbe((first.p-zl)-first.prevrawlen);
}
/* Resize and update length */
// 缩小并更新 ziplist 的长度
offset = first.p-zl;
zl = ziplistResize(zl, intrev32ifbe(ZIPLIST_BYTES(zl))-totlen+nextdiff);
ZIPLIST_INCR_LENGTH(zl,-deleted);
p = zl+offset;
/* When nextdiff != 0, the raw length of the next entry has changed, so
* we need to cascade the update throughout the ziplist */
// 如果 p 所指向的节点的大小已经变更,那么进行级联更新
// 检查 p 之后的所有节点是否符合 ziplist 的编码要求
// T = O(N^2)
if (nextdiff != 0)
zl = __ziplistCascadeUpdate(zl,p);
}
return zl;
}