Redis Source Code Analysis (8): The Compressed List (ziplist) Data Structure -------- Study Notes on ziplist.c and ziplist.h

Article link: https://blog.csdn.net/chenxun_2010/article/details/78854557


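This article introduces the ziplist data structure in Redis: the roles of the zlbytes, zltail, zllen and zlend fields, the definition of a ziplist entry, and the simplified form in which zlentry's attributes are actually stored. In particular, when storing an entry a ziplist keeps only the encoding and length information for strings; integer lengths are fixed, so the encoding alone is enough. Incrementing the entry count in the header is O(1).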


I. Introduction to ziplist

/* The ziplist is a specially encoded dually linked list that is designed
 * to be very memory efficient.
 *
 * It stores both strings and integer values, where integers are encoded as
 * actual integers instead of a series of characters.
 *
 * It allows push and pop operations on either side of the list
 * in O(1) time. However, because every operation requires a reallocation of
 * the memory used by the ziplist, the actual complexity is related to the
 * amount of memory used by the ziplist.
 */

II. ziplist structure

<zlbytes>  <zltail>  <zllen> <entry> <entry> ...... <entry> <zlend>
|-----ziplist header--------|----------entry---------------|--end--|

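zlbytes: 4 bytes, an unsigned integer holding the number of bytes of memory the ziplist uses. With this value the program can resize the ziplist's memory directly, without traversing the whole list to compute its size.

zltail: 4 bytes, holding the offset (from the start of the ziplist) to the last entry in the list. This offset lets operations on the tail proceed without traversing the whole list.

zllen: 2 bytes, holding the number of entries in the list. When the list has more than 2^16-2 entries, the field stays at 2^16-1 (UINT16_MAX) and the program has to traverse the whole list to learn how many entries it really contains.

zlend: 1 byte with the value 255, marking the end of the list.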
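
To make the four fields concrete, here is a minimal sketch that reads the header out of a raw byte buffer. The names `zl_header`, `read_le` and `zl_read_header` are hypothetical helpers, not part of ziplist.c, and the sketch assumes the header fields are stored little-endian (which is what the `intrev32ifbe` calls in the real macros imply).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of the ziplist header (not a Redis type). */
typedef struct {
    uint32_t zlbytes;  /* total bytes used by the ziplist */
    uint32_t zltail;   /* offset of the last entry from the start */
    uint16_t zllen;    /* entry count, or 65535 if it must be recounted */
} zl_header;

/* Read an n-byte little-endian unsigned integer (n <= 4). */
static uint32_t read_le(const unsigned char *p, size_t n) {
    uint32_t v = 0;
    assert(p != NULL && n <= 4);
    for (size_t i = 0; i < n; i++) v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* Decode <zlbytes> <zltail> <zllen> from the first 10 bytes. */
static zl_header zl_read_header(const unsigned char *zl) {
    zl_header h;
    h.zlbytes = read_le(zl, 4);
    h.zltail  = read_le(zl + 4, 4);
    h.zllen   = (uint16_t)read_le(zl + 8, 2);
    return h;
}

/* An empty ziplist: 10-byte header plus the 1-byte end marker. */
static const unsigned char EMPTY_ZIPLIST[11] = {
    11, 0, 0, 0,   /* zlbytes = 11 */
    10, 0, 0, 0,   /* zltail  = 10 (points at zlend when the list is empty) */
    0, 0,          /* zllen   = 0 */
    0xFF           /* zlend   = 255 */
};
```

Calling `zl_read_header(EMPTY_ZIPLIST)` yields zlbytes 11, zltail 10 and zllen 0, and the byte at index zlbytes - 1 is the 255 end marker.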

/* 
Example of an empty ziplist
area        |<---- ziplist header ---->|<-- end -->|
size          4 bytes   4 bytes 2 bytes  1 byte
            +---------+--------+-------+-----------+
component   | zlbytes | zltail | zllen | zlend     |
            |         |        |       |           |
value       |  1011   |  1010  |   0   | 1111 1111 |
            +---------+--------+-------+-----------+
                                       ^
                                       |
                               ZIPLIST_ENTRY_HEAD
                                       &
address                        ZIPLIST_ENTRY_TAIL
                                       &
                               ZIPLIST_ENTRY_END


Example of a non-empty ziplist
area        |<---- ziplist header ---->|<----------- entries ------------->|<-end->|
size          4 bytes  4 bytes  2 bytes    ?        ?        ?        ?     1 byte
            +---------+--------+-------+--------+--------+--------+--------+-------+
component   | zlbytes | zltail | zllen | entry1 | entry2 |  ...   | entryN | zlend |
            +---------+--------+-------+--------+--------+--------+--------+-------+
                                       ^                          ^        ^
address                                |                          |        |
                                ZIPLIST_ENTRY_HEAD                |   ZIPLIST_ENTRY_END
                                                                  |
                                                        ZIPLIST_ENTRY_TAIL
*/

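A ziplist entry is defined as follows:

/*
 * Structure holding the information of a ziplist entry
 */
typedef struct zlentry {

    // prevrawlen: length of the previous entry
    // prevrawlensize: number of bytes needed to encode prevrawlen
    unsigned int prevrawlensize, prevrawlen;

    // len: length of the current entry's value
    // lensize: number of bytes needed to encode len
    unsigned int lensize, len;

    // size of the current entry's header,
    // equal to prevrawlensize + lensize
    unsigned int headersize;

    // encoding used by the current entry's value
    unsigned char encoding;

    // pointer to the current entry
    unsigned char *p;

} zlentry;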

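As you can see, zlentry has quite a few fields. In practice, a ziplist does not store every zlentry field for each entry; the stored layout is simplified:

prevlen	encode & len	value

prevlen: the length of the previous entry
encode & len: whether the value stored in this entry is an int or a string and, for a string, its length
value: the value of this entry

Note: encode with top two bits 00, 01 or 10 means the value stored in this entry is a string
      encode with top two bits 11 means the value stored in this entry is an int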
/*
 * String encoding types
 */
#define ZIP_STR_06B (0 << 6)
#define ZIP_STR_14B (1 << 6)
#define ZIP_STR_32B (2 << 6)

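When the entry holds a string, the encode/len field is encoded as follows:

encoding	bytes used	encode/len layout	string length range	len value
ZIP_STR_06B	1 byte	00XXXXXX	length <= 63	low 6 bits
ZIP_STR_14B	2 bytes	01XXXXXX XXXXXXXX	length <= 16383	low 14 bits
ZIP_STR_32B	5 bytes	10000000 XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX	length <= 2^32-1	the 32 bits after the first byte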
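
The table can be turned into a small encoder/decoder pair. This is only a sketch: `str_header_encode` and `str_header_decode` are hypothetical names, and the multi-byte lengths are written most-significant byte first, which is my understanding of what ziplist.c does for the 14-bit and 32-bit forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZIP_STR_MASK 0xc0
#define ZIP_STR_06B (0 << 6)
#define ZIP_STR_14B (1 << 6)
#define ZIP_STR_32B (2 << 6)

/* Write the encode/len header for a string of 'rawlen' bytes into 'buf'
 * (at least 5 bytes long) and return the number of bytes written. */
static unsigned int str_header_encode(unsigned char *buf, uint32_t rawlen) {
    assert(buf != NULL);
    if (rawlen <= 0x3f) {                       /* fits in 6 bits */
        buf[0] = (unsigned char)(ZIP_STR_06B | rawlen);
        return 1;
    }
    if (rawlen <= 0x3fff) {                     /* fits in 14 bits */
        buf[0] = (unsigned char)(ZIP_STR_14B | ((rawlen >> 8) & 0x3f));
        buf[1] = (unsigned char)(rawlen & 0xff);
        return 2;
    }
    buf[0] = ZIP_STR_32B;                       /* low 6 bits unused */
    buf[1] = (unsigned char)((rawlen >> 24) & 0xff);
    buf[2] = (unsigned char)((rawlen >> 16) & 0xff);
    buf[3] = (unsigned char)((rawlen >> 8) & 0xff);
    buf[4] = (unsigned char)(rawlen & 0xff);
    return 5;
}

/* Read a string header at 'p'. Stores the string length in '*rawlen' and
 * returns the header size, or 0 if the entry is not a string (top bits 11). */
static unsigned int str_header_decode(const unsigned char *p, uint32_t *rawlen) {
    assert(p != NULL && rawlen != NULL);
    switch (p[0] & ZIP_STR_MASK) {
    case ZIP_STR_06B:
        *rawlen = p[0] & 0x3f;
        return 1;
    case ZIP_STR_14B:
        *rawlen = ((uint32_t)(p[0] & 0x3f) << 8) | p[1];
        return 2;
    case ZIP_STR_32B:
        *rawlen = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
                  ((uint32_t)p[3] << 8) | p[4];
        return 5;
    default:
        return 0;
    }
}
```

A 5-byte string gets the single header byte 0x05, a 64-byte string needs the 2-byte form 0x40 0x40, and anything from 16384 bytes up switches to the 5-byte form.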

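prevlen: the length of the previous entry

previous entry length	bytes used	layout
less than 254	1	xxxxxxxx (a single byte holding the length)
254 or more	5	11111110 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx (the marker byte 0xFE followed by a 4-byte length)
/*
 * ziplist end marker, and the marker for a 5-byte length
 */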
#define ZIP_END 255
#define ZIP_BIGLEN 254
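
As a sketch of the prevlen rule, the following pair encodes and decodes the field. `prevlen_encode` and `prevlen_decode` are hypothetical names, and I am assuming the 4-byte length after the 0xFE marker is stored little-endian, like the header fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZIP_END 255
#define ZIP_BIGLEN 254

/* Write the prevlen field for a previous entry of 'len' bytes into 'p'
 * (at least 5 bytes long) and return how many bytes were used. */
static unsigned int prevlen_encode(unsigned char *p, uint32_t len) {
    assert(p != NULL);
    if (len < ZIP_BIGLEN) {          /* short form: the byte is the length */
        p[0] = (unsigned char)len;
        return 1;
    }
    p[0] = ZIP_BIGLEN;               /* long form: marker + 4-byte length */
    for (int i = 0; i < 4; i++) p[1 + i] = (unsigned char)(len >> (8 * i));
    return 5;
}

/* Read the prevlen field at 'p'. The caller is expected to have checked
 * that p[0] is not ZIP_END before treating it as an entry. */
static unsigned int prevlen_decode(const unsigned char *p, uint32_t *len) {
    assert(p != NULL && len != NULL && p[0] != ZIP_END);
    if (p[0] < ZIP_BIGLEN) {
        *len = p[0];
        return 1;
    }
    *len = 0;
    for (int i = 0; i < 4; i++) *len |= (uint32_t)p[1 + i] << (8 * i);
    return 5;
}
```

The byte values 254 and 255 are reserved as markers, which is why a previous entry of exactly 254 bytes already needs the 5-byte form.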

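Because the length of each integer type is fixed, only the encoding needs to be stored; the length of the value can be derived from the encoding.

encoding	value size	encoding byte	value range
ZIP_INT_XX	0 bytes (the value lives in the encoding byte)	11 11 0001 ~ 11 11 1101	0 ~ 12
ZIP_INT_8B	1 byte	11 11 1110	-2^7 ~ 2^7-1
ZIP_INT_16B	2 bytes	11 00 0000	-2^15 ~ 2^15-1
ZIP_INT_24B	3 bytes	11 11 0000	-2^23 ~ 2^23-1
ZIP_INT_32B	4 bytes	11 01 0000	-2^31 ~ 2^31-1
ZIP_INT_64B	8 bytes	11 10 0000	-2^63 ~ 2^63-1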

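/*
 * Integer encoding types
 */
#define ZIP_INT_16B (0xc0 | 0<<4)   /* 11 00 0000  -2^15 ~ 2^15-1 */
#define ZIP_INT_32B (0xc0 | 1<<4)   /* 11 01 0000  -2^31 ~ 2^31-1 */
#define ZIP_INT_64B (0xc0 | 2<<4)   /* 11 10 0000  -2^63 ~ 2^63-1 */
#define ZIP_INT_24B (0xc0 | 3<<4)   /* 11 11 0000  -2^23 ~ 2^23-1 */
#define ZIP_INT_8B 0xfe             /* 11 11 1110  -2^7 ~ 2^7-1 */

#define ZIP_INT_IMM_MIN 0xf1    /* 11110001 */
#define ZIP_INT_IMM_MAX 0xfd    /* 11111101 */

  Immediate values 0 ~ 12 are stored as 1111xxxx, that is 11110001 ~ 11111101.

   The low nibbles 0000 and 1111 are already taken (11110000 is ZIP_INT_24B and 11111111 is ZIP_END), so they cannot be used.
   0xfe, with low nibble 1110, is used for the ZIP_INT_8B encoding.
   That leaves only 13 usable patterns, so what about 13, 14 and 15? They cannot be immediates; they fall within the ZIP_INT_8B range and are stored with that encoding.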

The explanation is as follows:

 * If the entry holds an integer value,
 *    the first 2 bits of this part of the header are both set to 1,
 *    and the following 2 bits identify the type of integer the entry holds.
 *
 * |11000000| - 1 byte
 *      Integer encoded as int16_t (2 bytes).
 * |11010000| - 1 byte
 *      Integer encoded as int32_t (4 bytes).
 * |11100000| - 1 byte
 *      Integer encoded as int64_t (8 bytes).
 * |11110000| - 1 byte
 *      Integer encoded as 24 bit signed (3 bytes).
 * |11111110| - 1 byte
 *      Integer encoded as 8 bit signed (1 byte).
 * |1111xxxx| - (with xxxx between 0001 and 1101) immediate 4 bit integer.
 *      Unsigned integer from 0 to 12. The encoded value is actually from
 *      1 to 13 because 0000 and 1111 can not be used, so 1 should be
 *      subtracted from the encoded 4 bit value to obtain the right value.
 *      For example, if the bits are 0001 = 1, the returned value is 1 - 1 = 0.
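
The rules above can be sketched as a function that picks the smallest encoding for a value, plus the "subtract 1" step for immediates. `int_encoding_for`, `imm_value`, `ZL_INT24_MIN` and `ZL_INT24_MAX` are hypothetical names introduced for this illustration; the order of the range checks is my reading of how ziplist.c chooses an encoding.

```c
#include <assert.h>
#include <stdint.h>

#define ZIP_INT_16B (0xc0 | 0<<4)
#define ZIP_INT_32B (0xc0 | 1<<4)
#define ZIP_INT_64B (0xc0 | 2<<4)
#define ZIP_INT_24B (0xc0 | 3<<4)
#define ZIP_INT_8B 0xfe
#define ZIP_INT_IMM_MIN 0xf1    /* 11110001 */
#define ZIP_INT_IMM_MAX 0xfd    /* 11111101 */

/* 24-bit signed range; these two names are assumptions of this sketch. */
#define ZL_INT24_MAX 0x7fffff
#define ZL_INT24_MIN (-ZL_INT24_MAX - 1)

/* Pick the most compact encoding byte for 'value'. */
static unsigned char int_encoding_for(int64_t value) {
    if (value >= 0 && value <= 12)
        return (unsigned char)(ZIP_INT_IMM_MIN + value);  /* 1111xxxx */
    if (value >= INT8_MIN && value <= INT8_MAX) return ZIP_INT_8B;
    if (value >= INT16_MIN && value <= INT16_MAX) return ZIP_INT_16B;
    if (value >= ZL_INT24_MIN && value <= ZL_INT24_MAX) return ZIP_INT_24B;
    if (value >= INT32_MIN && value <= INT32_MAX) return ZIP_INT_32B;
    return ZIP_INT_64B;
}

/* Recover the value from an immediate encoding byte: low nibble minus 1. */
static int imm_value(unsigned char encoding) {
    assert(encoding >= ZIP_INT_IMM_MIN && encoding <= ZIP_INT_IMM_MAX);
    return (encoding & 0x0f) - 1;
}
```

With this ordering, 12 is still an immediate (0xfd), while 13, 14 and 15 are the first values to use ZIP_INT_8B and therefore cost one extra byte for the value.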

/* Macro to determine if the entry is a string. String entries never start
 * with "11" as most significant bits of the first byte. */
#define ZIP_IS_STR(enc) (((enc) & ZIP_STR_MASK) < ZIP_STR_MASK)
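The meaning is clear: the macro checks whether the top two bits of the encoding are 11. A string encoding can never start with 11, so this effectively tests whether the entry's value is a string.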

/* Utility macros.*/

/* Return total bytes a ziplist is composed of.
 * Locates the zlbytes field, which records the number of bytes the whole
 * ziplist occupies. Can be used to read the field or to assign to it. */
#define ZIPLIST_BYTES(zl)       (*((uint32_t*)(zl)))   //zlbytes


/* Return the offset of the last item inside the ziplist.
 * Locates the zltail field, which records the offset to the tail entry.
 * Can be used to read the field or to assign to it. */
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t))))  //zltail

/* Return the length of a ziplist, or UINT16_MAX if the length cannot be
 * determined without scanning the whole ziplist.
 * Locates the zllen field, which records the number of entries.
 * Can be used to read the field or to assign to it. */
#define ZIPLIST_LENGTH(zl)      (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))

/* The size of a ziplist header: two 32 bit integers for the total
 * bytes count and last item offset. One 16 bit integer for the number
 * of items field. */
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t)*2+sizeof(uint16_t))

/* Size of the "end of ziplist" entry. Just one byte. */
#define ZIPLIST_END_SIZE        (sizeof(uint8_t))

/* Return the pointer to the first entry of a ziplist
 * (the position where that entry starts). */
#define ZIPLIST_ENTRY_HEAD(zl)  ((zl)+ZIPLIST_HEADER_SIZE)

/* Return the pointer to the last entry of a ziplist
 * (the position where that entry starts), using the
 * last entry offset inside the ziplist header. */
#define ZIPLIST_ENTRY_TAIL(zl)  ((zl)+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)))

/* Return the pointer to the last byte of a ziplist, which is, the
 * end of ziplist FF entry. */
#define ZIPLIST_ENTRY_END(zl)   ((zl)+intrev32ifbe(ZIPLIST_BYTES(zl))-1)
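
To see where the three address macros land, here is a portable sketch over a hand-built two-entry ziplist. The helpers `load_le32`, `zl_entry_head`, `zl_entry_tail` and `zl_entry_end`, as well as the `SAMPLE` buffer, are hypothetical stand-ins that read bytes explicitly instead of casting pointers, under the assumption that the header is little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZL_HEADER_SIZE 10  /* 4 (zlbytes) + 4 (zltail) + 2 (zllen) */

/* Load a 32-bit little-endian value without relying on host byte order. */
static uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Counterpart of ZIPLIST_ENTRY_HEAD: first byte after the header. */
static const unsigned char *zl_entry_head(const unsigned char *zl) {
    assert(zl != NULL);
    return zl + ZL_HEADER_SIZE;
}

/* Counterpart of ZIPLIST_ENTRY_TAIL: start of the last entry. */
static const unsigned char *zl_entry_tail(const unsigned char *zl) {
    assert(zl != NULL);
    return zl + load_le32(zl + 4);
}

/* Counterpart of ZIPLIST_ENTRY_END: the final 0xFF byte. */
static const unsigned char *zl_entry_end(const unsigned char *zl) {
    assert(zl != NULL);
    return zl + load_le32(zl) - 1;
}

/* Hand-built ziplist with two entries: the string "hi" and the integer 5. */
static const unsigned char SAMPLE[17] = {
    17, 0, 0, 0,           /* zlbytes = 17 */
    14, 0, 0, 0,           /* zltail  = 14 */
    2, 0,                  /* zllen   = 2 */
    0x00, 0x02, 'h', 'i',  /* entry1: prevlen 0, ZIP_STR_06B with len 2 */
    0x04, 0xf6,            /* entry2: prevlen 4, immediate integer 5 */
    0xFF                   /* zlend */
};
```

For `SAMPLE`, the head is at offset 10, the tail at offset 14 (its first byte is the prevlen 4 of the "hi" entry), and the end marker at offset 16.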

Increment the ziplist's entry count, T = O(1):

#define ZIPLIST_INCR_LENGTH(zl,incr) { \
    if (ZIPLIST_LENGTH(zl) < UINT16_MAX) \
        ZIPLIST_LENGTH(zl) = intrev16ifbe(intrev16ifbe(ZIPLIST_LENGTH(zl))+incr); \
}
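
The saturation behaviour of this macro is easy to miss, so here is a tiny sketch of it on a plain `uint16_t` in host byte order. `zllen_incr` is a hypothetical stand-in for the macro, with the byte-order conversion left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add 'incr' to the cached entry count unless it is already pinned at
 * UINT16_MAX, in which case the real count has to be found by traversal. */
static void zllen_incr(uint16_t *zllen, int incr) {
    assert(zllen != NULL);
    if (*zllen < UINT16_MAX) *zllen = (uint16_t)(*zllen + incr);
}
```

Once the field reaches 65535 it stops changing, even when entries are removed, which matches the note on zllen: from that point on the count can only be trusted after walking the list.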

/* Extract the encoding from the byte pointed by 'ptr' and set it into
 * 'encoding'.
 *
 * T = O(1)
 */
#define ZIP_ENTRY_ENCODING(ptr, encoding) do {  \
    (encoding) = (ptr[0]); \
    if ((encoding) < ZIP_STR_MASK) (encoding) &= ZIP_STR_MASK; \
} while(0)
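
A short sketch of what the two macros do to real header bytes may help. `ZIP_STR_MASK` is taken to be 0xc0 (the top two bits), which is how I recall it being defined in ziplist.c, and `entry_encoding` is a hypothetical wrapper added only so the macro can be used as an expression.

```c
#include <assert.h>
#include <stddef.h>

#define ZIP_STR_MASK 0xc0
#define ZIP_IS_STR(enc) (((enc) & ZIP_STR_MASK) < ZIP_STR_MASK)

#define ZIP_ENTRY_ENCODING(ptr, encoding) do {  \
    (encoding) = ((ptr)[0]); \
    if ((encoding) < ZIP_STR_MASK) (encoding) &= ZIP_STR_MASK; \
} while(0)

/* Return the normalized encoding of the encode/len byte at 'ptr'.
 * For strings the length bits are masked off, leaving 0x00, 0x40 or 0x80;
 * for integers the whole byte is the encoding and is returned unchanged. */
static unsigned char entry_encoding(const unsigned char *ptr) {
    unsigned char enc;
    assert(ptr != NULL);
    ZIP_ENTRY_ENCODING(ptr, enc);
    return enc;
}
```

For instance, the header byte 0x05 (a 5-byte string) normalizes to 0x00, while 0xf3 (the immediate integer 2) stays 0xf3 and fails the `ZIP_IS_STR` test.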

/* Return bytes needed to store integer encoded by 'encoding'
 *
 * T = O(1)
 */
static unsigned int zipIntSize(unsigned char encoding) {

    switch(encoding) {
    case ZIP_INT_8B:  return 1;
    case ZIP_INT_16B: return 2;
    case ZIP_INT_24B: return 3;
    case ZIP_INT_32B: return 4;
    case ZIP_INT_64B: return 8;
    default: return 0; /* 4 bit immediate */
    }

    assert(NULL);
    return 0;
}
