Redis Source Code Analysis, Part 7: The Basic Data Structure ziplist


fpcc


Categories: Database Development, Big Data. Tags: redis

Article link: https://blog.csdn.net/fpcc/article/details/108965117


I. The ziplist (compressed list)

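As mentioned earlier in this series, the ziplist is the encoding that hashes and sorted sets (which otherwise use a skip list) use while they hold only a small amount of data. Its definition and usage are explained clearly in the comment at the top of the source file, ziplist.c. Here is that English comment: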
The ziplist is a specially encoded dually linked list that is designed to be very memory efficient. It stores both strings and integer values, where integers are encoded as actual integers instead of a series of characters. It allows push and pop operations on either side of the list in O(1) time. However, because every operation requires a reallocation of the memory used by the ziplist, the actual complexity is related to the amount of memory used by the ziplist.
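In other words, the ziplist is a specially encoded doubly linked list designed for memory efficiency. It stores both strings and integers, and integers are stored in binary form rather than as strings of characters. Push and pop at either end take O(1) time, but because every operation has to reallocate the ziplist's memory, the actual cost depends on how much memory the ziplist occupies.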

Its memory layout, from the same comment:
The general layout of the ziplist is as follows:

 <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>

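[Figure: ziplist layout and field sizes]

The figure (not preserved here) gives the size of each field. In short: zlbytes is a 4-byte unsigned integer holding the total size of the ziplist in bytes; zltail is a 4-byte offset from the start of the ziplist to the last entry, so the tail can be reached without a scan; zllen is a 2-byte entry count (when there are more than 2^16-2 entries it is set to 2^16-1, and the real count has to be found by walking the list); and zlend is the single byte 0xFF that marks the end.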
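To make the header concrete, here is a minimal C sketch that reads the three header fields from a raw buffer. The helper names (read_u32_le, zl_bytes and so on) and the ZL_* constants are hypothetical, not Redis's own macros (Redis uses ZIPLIST_BYTES, ZIPLIST_TAIL_OFFSET and ZIPLIST_LENGTH together with intrev32ifbe). The sample bytes are the 15-byte ziplist from the example in the source comment.

```c
#include <assert.h>   /* for the usage checks */
#include <stdint.h>

/* Hypothetical names for the header layout <zlbytes><zltail><zllen>. */
#define ZL_OFF_BYTES  0    /* zlbytes: uint32_t, little endian */
#define ZL_OFF_TAIL   4    /* zltail:  uint32_t, little endian */
#define ZL_OFF_LEN    8    /* zllen:   uint16_t, little endian */
#define ZL_HEADER_LEN 10   /* 4 + 4 + 2 bytes */
#define ZL_END_BYTE   0xFF /* zlend */

/* Assemble little-endian values byte by byte, so the result is the
 * same on little- and big-endian hosts. */
static uint32_t read_u32_le(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_u16_le(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t zl_bytes(const unsigned char *zl) { return read_u32_le(zl + ZL_OFF_BYTES); }
static uint32_t zl_tail(const unsigned char *zl)  { return read_u32_le(zl + ZL_OFF_TAIL); }
static uint16_t zl_len(const unsigned char *zl)   { return read_u16_le(zl + ZL_OFF_LEN); }

/* Cheap sanity check, assuming the buffer really is zlbytes long:
 * the header must fit, the last byte must be zlend, and zltail must
 * point inside the buffer. */
static int zl_header_ok(const unsigned char *zl) {
    uint32_t total = zl_bytes(zl);
    if (total < ZL_HEADER_LEN + 1) return 0;
    return zl[total - 1] == ZL_END_BYTE && zl_tail(zl) < total;
}

/* The 15-byte ziplist from the source comment, holding "2" and "5". */
static const unsigned char example_zl[15] = {
    0x0f, 0x00, 0x00, 0x00,  /* zlbytes = 15 */
    0x0c, 0x00, 0x00, 0x00,  /* zltail  = 12 */
    0x02, 0x00,              /* zllen   = 2  */
    0x00, 0xf3,              /* entry "2" */
    0x02, 0xf6,              /* entry "5" */
    0xff                     /* zlend */
};
```

With that buffer, zl_bytes returns 15, zl_tail returns 12 and zl_len returns 2, and byte 12 is the prevlen field of the last entry. Reading byte by byte keeps the result independent of the host's byte order, which is the problem intrev32ifbe solves in the Redis code.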

/*
* ZIPLIST ENTRIES
* ===============
*

* <prevlen> <encoding>
*
* The length of the previous entry, <prevlen>, is encoded in the following way:
* If this length is smaller than 254 bytes, it will only consume a single
* byte representing the length as an unsigned 8 bit integer. When the length
* is greater than or equal to 254, it will consume 5 bytes. The first byte is
* set to 254 (FE) to indicate a larger value is following. The remaining 4
* bytes take the length of the previous entry as value.
*
* So practically an entry is encoded in the following way:
*
* <prevlen from 0 to 253> <encoding> <entry>
*
* Or alternatively if the previous entry length is greater than 253 bytes
* the following encoding is used:
*
* 0xFE <4 bytes unsigned little endian prevlen> <encoding> <entry>
*
* The encoding field of the entry depends on the content of the
* entry. When the entry is a string, the first 2 bits of the encoding first
* byte will hold the type of encoding used to store the length of the string,
* followed by the actual length of the string. When the entry is an integer
* the first 2 bits are both set to 1. The following 2 bits are used to specify
* what kind of integer will be stored after this header. An overview of the
* different types and encodings is as follows. The first byte is always enough
* to determine the kind of entry.
*
* |00pppppp| - 1 byte
*      String value with length less than or equal to 63 bytes (6 bits).
*      "pppppp" represents the unsigned 6 bit length.
* |01pppppp|qqqqqqqq| - 2 bytes
*      String value with length less than or equal to 16383 bytes (14 bits).
*      IMPORTANT: The 14 bit number is stored in big endian.
* |10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes
*      String value with length greater than or equal to 16384 bytes.
*      Only the 4 bytes following the first byte represents the length
*      up to 2^32-1. The 6 lower bits of the first byte are not used and
*      are set to zero.
*      IMPORTANT: The 32 bit number is stored in big endian.

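Note: the three encodings above, whose two highest bits are 00, 01 and 10, are byte-array encodings: the entry's data is a byte array, and its length is stored in the bits that remain after the top two (for 10, in the four bytes that follow the first byte).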

* |11000000| - 3 bytes
*      Integer encoded as int16_t (2 bytes).
* |11010000| - 5 bytes
*      Integer encoded as int32_t (4 bytes).
* |11100000| - 9 bytes
*      Integer encoded as int64_t (8 bytes).
* |11110000| - 4 bytes
*      Integer encoded as 24 bit signed (3 bytes).
* |11111110| - 2 bytes
*      Integer encoded as 8 bit signed (1 byte).
* |1111xxxx| - (with xxxx between 0001 and 1101) immediate 4 bit integer.
*      Unsigned integer from 0 to 12. The encoded value is actually from
*      1 to 13 because 0000 and 1111 can not be used, so 1 should be
*      subtracted from the encoded 4 bit value to obtain the right value.
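Note: the example further down uses this special case, in which length and data are merged into a single byte: the high four bits (1111) mark the encoding and the low four bits hold the value. Only 13 values fit, because 0000, 1110 and 1111 would collide with the encodings 11110000 (24-bit), 11111110 (8-bit) and 11111111 (end marker) in this list. The stored nibble therefore ranges from 1 to 13, and since the real values start at 0, you subtract 1 to get the actual value (0 to 12).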
* |11111111| - End of ziplist special entry.

Note: the encodings above that begin with 11 represent integers of different types (11111111 is the end marker).
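As the comment says, the first byte of the encoding field is enough to classify an entry. Below is a minimal decoding sketch based on the byte patterns listed above. The names enc_info and decode_encoding are hypothetical; in Redis this work is done by the ZIP_DECODE_LENGTH macro and zipIntSize. For integers, len is the payload size, matching how zlentry documents it.

```c
#include <assert.h>
#include <stdint.h>

/* Result of decoding an entry's <encoding> field (hypothetical type). */
typedef struct {
    int is_str;            /* 1 = byte array, 0 = integer */
    unsigned int lensize;  /* bytes used by the encoding field itself */
    uint32_t len;          /* string length, or integer payload size */
    int imm;               /* value of a 4-bit immediate, else -1 */
} enc_info;

/* Decode the encoding field starting at p. Returns 0 on success,
 * -1 for 0xFF (zlend) or an unused 11xxxxxx pattern. */
static int decode_encoding(const unsigned char *p, enc_info *out) {
    unsigned char b = p[0];
    out->imm = -1;
    switch (b >> 6) {
    case 0: /* |00pppppp|: 6-bit length */
        out->is_str = 1; out->lensize = 1; out->len = b & 0x3f;
        return 0;
    case 1: /* |01pppppp|qqqqqqqq|: 14-bit length, big endian */
        out->is_str = 1; out->lensize = 2;
        out->len = ((uint32_t)(b & 0x3f) << 8) | p[1];
        return 0;
    case 2: /* |10000000| + 4 bytes: 32-bit length, big endian */
        out->is_str = 1; out->lensize = 5;
        out->len = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
                   ((uint32_t)p[3] << 8) | p[4];
        return 0;
    }
    /* Top two bits are 11: an integer, or the end marker. */
    out->is_str = 0; out->lensize = 1;
    switch (b) {
    case 0xC0: out->len = 2; return 0;  /* int16_t */
    case 0xD0: out->len = 4; return 0;  /* int32_t */
    case 0xE0: out->len = 8; return 0;  /* int64_t */
    case 0xF0: out->len = 3; return 0;  /* 24-bit signed */
    case 0xFE: out->len = 1; return 0;  /* 8-bit signed */
    case 0xFF: return -1;               /* zlend, not an entry */
    }
    if (b >= 0xF1 && b <= 0xFD) {       /* |1111xxxx|, xxxx in 0001..1101 */
        out->len = 0;                   /* value lives in the encoding byte */
        out->imm = (b & 0x0F) - 1;      /* stored value is off by one */
        return 0;
    }
    return -1;                          /* e.g. 0xC1: not a valid encoding */
}
```

For instance, 0xF3 decodes to the immediate 2 and 0x0B to an 11-byte string, the two cases used in the examples of the comment.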

*
* Like for the ziplist header, all the integers are represented in little
* endian byte order, even when this code is compiled in big endian systems.
*
* EXAMPLES OF ACTUAL ZIPLISTS
* ===========================
*
* The following is a ziplist containing the two elements representing
* the strings "2" and "5". It is composed of 15 bytes, that we visually
* split into sections:
*
*  [0f 00 00 00] [0c 00 00 00] [02 00] [00 f3] [02 f6] [ff]
*        |             |          |       |       |     |
*     zlbytes        zltail    entries   "2"     "5"   end

                              Note: "entries" here is really the zllen field, i.e. 2 elements.
*
* The first 4 bytes represent the number 15, that is the number of bytes
* the whole ziplist is composed of. The second 4 bytes are the offset
* at which the last ziplist entry is found, that is 12, in fact the
* last entry, that is "5", is at offset 12 inside the ziplist.
* The next 16 bit integer represents the number of elements inside the
* ziplist, its value is 2 since there are just two elements inside.
* Finally "00 f3" is the first entry representing the number 2. It is
* composed of the previous entry length, which is zero because this is
* our first entry, and the byte F3 which corresponds to the encoding
* |1111xxxx| with xxxx between 0001 and 1101. We need to remove the "F"
* higher order bits 1111, and subtract 1 from the "3", so the entry value
* is "2". The next entry has a prevlen of 02, since the first entry is
* composed of exactly two bytes. The entry itself, F6, is encoded exactly
* like the first entry, and 6-1 = 5, so the value of the entry is 5.
* Finally the special entry FF signals the end of the ziplist.
*
* Adding another element to the above string with the value "Hello World"
* allows us to show how the ziplist encodes small strings. We'll just show
* the hex dump of the entry itself. Imagine the bytes as following the
* entry that stores "5" in the ziplist above:
*
* [02] [0b] [48 65 6c 6c 6f 20 57 6f 72 6c 64]
*
* The first byte, 02, is the length of the previous entry. The next
* byte represents the encoding in the pattern |00pppppp| that means
* that the entry is a string of length <pppppp>, so 0B means that
* an 11 bytes string follows. From the third byte (48) to the last (64)
* there are just the ASCII characters for "Hello World".
*
*/

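The comment above defines the encodings and works through concrete examples; it is easy to follow. Pay attention to byte order: the header fields, the 4-byte length in the 5-byte prevlen form, and integer payloads are little endian, while the 14-bit and 32-bit string lengths inside the encoding field are big endian (see the IMPORTANT notes).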
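The store side follows the same rules. Here is a minimal sketch that writes a prevlen field and a string encoding, then rebuilds the "Hello World" entry from the comment. The helpers store_prevlen, store_str_encoding and build_str_entry are hypothetical; they only loosely mirror zipStorePrevEntryLength and zipStoreEntryEncoding, including the convention (visible in __ziplistInsert below) that passing NULL just measures the size. Integer encodings are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BIG_PREVLEN 254  /* prevlen values >= 254 need the 5-byte form */

/* Write <prevlen> into p (if p is non-NULL) and return its size. */
static unsigned int store_prevlen(unsigned char *p, uint32_t len) {
    if (len < BIG_PREVLEN) {
        if (p) p[0] = (unsigned char)len;
        return 1;
    }
    if (p) {
        p[0] = 0xFE;                                 /* 4-byte length follows */
        p[1] = (unsigned char)(len & 0xff);          /* little endian */
        p[2] = (unsigned char)((len >> 8) & 0xff);
        p[3] = (unsigned char)((len >> 16) & 0xff);
        p[4] = (unsigned char)((len >> 24) & 0xff);
    }
    return 5;
}

/* Write a string-length <encoding> into p (if non-NULL); return its size. */
static unsigned int store_str_encoding(unsigned char *p, uint32_t slen) {
    if (slen <= 0x3f) {                              /* |00pppppp| */
        if (p) p[0] = (unsigned char)slen;
        return 1;
    }
    if (slen <= 0x3fff) {                            /* |01pppppp|qqqqqqqq|, big endian */
        if (p) {
            p[0] = (unsigned char)(0x40 | (slen >> 8));
            p[1] = (unsigned char)(slen & 0xff);
        }
        return 2;
    }
    if (p) {                                         /* |10000000| + 4 bytes, big endian */
        p[0] = 0x80;
        p[1] = (unsigned char)((slen >> 24) & 0xff);
        p[2] = (unsigned char)((slen >> 16) & 0xff);
        p[3] = (unsigned char)((slen >> 8) & 0xff);
        p[4] = (unsigned char)(slen & 0xff);
    }
    return 5;
}

/* Build a complete string entry in out; return the bytes written. */
static size_t build_str_entry(unsigned char *out, uint32_t prevlen,
                              const char *s, uint32_t slen) {
    size_t n = store_prevlen(out, prevlen);
    n += store_str_encoding(out + n, slen);
    memcpy(out + n, s, slen);
    return n + slen;
}
```

build_str_entry(buf, 2, "Hello World", 11) produces the 13 bytes [02] [0b] [48 65 ... 64] shown in the comment. A prevlen of 300 becomes FE 2C 01 00 00 (little endian), whereas a string length of 300 becomes 41 2C (big endian).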

II. Source code analysis

Let's look at the source:

1. Definitions

typedef struct zlentry {
   unsigned int prevrawlensize; /* Bytes used to encode the previous entry len*/
   unsigned int prevrawlen;     /* Previous entry len. */
   unsigned int lensize;        /* Bytes used to encode this entry type/len.
                                   For example strings have a 1, 2 or 5 bytes
                                   header. Integers always use a single byte.*/
   unsigned int len;            /* Bytes used to represent the actual entry.
                                   For strings this is just the string length
                                   while for integers it is 1, 2, 3, 4, 8 or
                                   0 (for 4 bit immediate) depending on the
                                   number range. */
   unsigned int headersize;     /* prevrawlensize + lensize. */
   unsigned char encoding;      /* Set to ZIP_STR_* or ZIP_INT_* depending on
                                   the entry encoding. However for 4 bits
                                   immediate integers this can assume a range
                                   of values and must be range-checked. */
   unsigned char *p;            /* Pointer to the very start of the entry, that
                                   is, this points to prev-entry-len field. */
} zlentry;
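zlentry is just the decoded view of one entry's header. Below is a minimal sketch of filling a simplified version of it and using headersize + len to step from entry to entry. The type entry_info and the functions decode_entry and walk_offsets are hypothetical and omit the encoding and p members; the real work is done by zipEntry and zipRawEntryLength.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified mirror of zlentry (hypothetical; no encoding or p). */
typedef struct {
    unsigned int prevrawlensize, prevrawlen;
    unsigned int lensize, len, headersize;
} entry_info;

/* Decode the entry at p. Assumes p points at a valid entry, not zlend. */
static void decode_entry(const unsigned char *p, entry_info *e) {
    if (p[0] < 0xFE) {                       /* 1-byte prevlen */
        e->prevrawlensize = 1;
        e->prevrawlen = p[0];
    } else {                                 /* 0xFE + 4 bytes, little endian */
        e->prevrawlensize = 5;
        e->prevrawlen = (unsigned int)p[1] | ((unsigned int)p[2] << 8) |
                        ((unsigned int)p[3] << 16) | ((unsigned int)p[4] << 24);
    }
    const unsigned char *q = p + e->prevrawlensize;   /* encoding field */
    unsigned char b = q[0];
    if ((b >> 6) == 0) {
        e->lensize = 1; e->len = b & 0x3f;
    } else if ((b >> 6) == 1) {
        e->lensize = 2; e->len = ((b & 0x3fu) << 8) | q[1];
    } else if ((b >> 6) == 2) {
        e->lensize = 5;
        e->len = ((unsigned int)q[1] << 24) | ((unsigned int)q[2] << 16) |
                 ((unsigned int)q[3] << 8) | q[4];
    } else {                                 /* integers use a 1-byte encoding */
        e->lensize = 1;
        switch (b) {
        case 0xC0: e->len = 2; break;
        case 0xD0: e->len = 4; break;
        case 0xE0: e->len = 8; break;
        case 0xF0: e->len = 3; break;
        case 0xFE: e->len = 1; break;
        default:   e->len = 0; break;        /* 4-bit immediate */
        }
    }
    e->headersize = e->prevrawlensize + e->lensize;
}

/* Walk from the first entry to zlend, recording each entry's offset.
 * Returns the number of entries visited (at most max). */
static unsigned int walk_offsets(const unsigned char *zl, unsigned int *offs,
                                 unsigned int max) {
    const unsigned char *p = zl + 10;        /* skip the 10-byte header */
    unsigned int n = 0;
    entry_info e;
    while (p[0] != 0xFF && n < max) {
        decode_entry(p, &e);
        offs[n++] = (unsigned int)(p - zl);
        p += e.headersize + e.len;           /* whole entry, as zipRawEntryLength */
    }
    return n;
}

/* The 15-byte example ziplist ("2", "5") from the source comment. */
static const unsigned char example_zl[15] = {
    0x0f, 0, 0, 0, 0x0c, 0, 0, 0, 0x02, 0,   /* header */
    0x00, 0xf3, 0x02, 0xf6,                  /* entries "2" and "5" */
    0xff                                     /* zlend */
};
```

For the first entry of the example, prevrawlensize and lensize are both 1, len is 0 (a 4-bit immediate) and headersize is 2, so the walk finds entries at offsets 10 and 12.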

unsigned char *ziplistNew(void) {
    unsigned int bytes = ZIPLIST_HEADER_SIZE+ZIPLIST_END_SIZE;
    unsigned char * zl = zmalloc(bytes);
    ZIPLIST_BYTES(zl) = intrev32ifbe(bytes);
    ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE);
    ZIPLIST_LENGTH(zl) = 0;
    zl[bytes-1] = ZIP_END;
    return zl;
}

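Creation needs little explanation: a few macros initialize the header. An empty ziplist is 11 bytes, the 10-byte header plus the 1-byte end marker; zltail points just past the header (that is, at the end marker) and zllen is 0.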
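A standalone analogue of ziplistNew, as a minimal sketch: new_empty_ziplist and write_u32_le are hypothetical names, plain malloc stands in for zmalloc, and explicit byte stores stand in for intrev32ifbe so the header is little endian on any host.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HEADER_SIZE 10   /* zlbytes(4) + zltail(4) + zllen(2) */
#define END_SIZE    1    /* zlend */

/* Store v as 4 little-endian bytes. */
static void write_u32_le(unsigned char *p, uint32_t v) {
    p[0] = v & 0xff;         p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff; p[3] = (v >> 24) & 0xff;
}

/* Allocate header + end marker, store zlbytes, point zltail at the end
 * of the header, zero zllen. Returns NULL if malloc fails; the caller
 * frees the result with free(). */
static unsigned char *new_empty_ziplist(void) {
    uint32_t bytes = HEADER_SIZE + END_SIZE;
    unsigned char *zl = malloc(bytes);
    if (zl == NULL) return NULL;
    write_u32_le(zl, bytes);              /* zlbytes = 11 */
    write_u32_le(zl + 4, HEADER_SIZE);    /* zltail = 10: no entries yet */
    zl[8] = 0; zl[9] = 0;                 /* zllen = 0 */
    zl[bytes - 1] = 0xFF;                 /* zlend */
    return zl;
}
```

The result is the byte sequence 0b 00 00 00 0a 00 00 00 00 00 ff.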

2. Insertion

unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
  // Current total size of the ziplist; prevlen will hold the previous entry's length
   size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen;
   unsigned int prevlensize, prevlen = 0;
   size_t offset;
   int nextdiff = 0;
   unsigned char encoding = 0;
   long long value = 123456789; /* initialized to avoid warning. Using a value
                                   that is easy to see if for some reason
                                   we use it uninitialized. */
   zlentry tail;

  // Determine the length of the entry before the insertion point
   /* Find out prevlen for the entry that is inserted. */
   if (p[0] != ZIP_END) {
       ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
   } else {
       unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
       if (ptail[0] != ZIP_END) {
           prevlen = zipRawEntryLength(ptail);
       }
   }

   // Compute reqlen = bytes for the prevlen field + bytes for the encoding field + payload
   /* See if the entry can be encoded */
   if (zipTryEncoding(s,slen,&value,&encoding)) {
       /* 'encoding' is set to the appropriate integer encoding */
       reqlen = zipIntSize(encoding);
   } else {
       /* 'encoding' is untouched, however zipStoreEntryEncoding will use the
        * string length to figure out how to encode it. */
       reqlen = slen;
   }
   /* We need space for both the length of the previous entry and
    * the length of the payload. */
    // Add the sizes of the prevlen and encoding fields
   reqlen += zipStorePrevEntryLength(NULL,prevlen);
   reqlen += zipStoreEntryEncoding(NULL,encoding,slen);

   /* When the insert position is not equal to the tail, we need to
    * make sure that the next entry can hold this entry's length in
    * its prevlen field. */
    // Check whether the next entry's prevlen field must change size (nextdiff)
   int forcelarge = 0;
   nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;
   if (nextdiff == -4 && reqlen < 4) {
       nextdiff = 0;
       forcelarge = 1;
   }

   // Resize (realloc) the ziplist
   /* Store offset because a realloc may change the address of zl. */
   offset = p-zl;
   zl = ziplistResize(zl,curlen+reqlen+nextdiff);
   p = zl+offset;

   /* Apply memory move when necessary and update tail offset. */
   if (p[0] != ZIP_END) {
       /* Subtract one because of the ZIP_END bytes */
       memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);

       /* Encode this entry's raw length in the next entry. */
       if (forcelarge)
           zipStorePrevEntryLengthLarge(p+reqlen,reqlen);
       else
           zipStorePrevEntryLength(p+reqlen,reqlen);

       /* Update offset for tail */
       ZIPLIST_TAIL_OFFSET(zl) =
           intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);

       /* When the tail contains more than one entry, we need to take
        * "nextdiff" in account as well. Otherwise, a change in the
        * size of prevlen doesn't have an effect on the *tail* offset. */
       zipEntry(p+reqlen, &tail);
       if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
           ZIPLIST_TAIL_OFFSET(zl) =
               intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
       }
   } else {
       /* This element will be the new tail. */
       ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
   }

   /* When nextdiff != 0, the raw length of the next entry has changed, so
    * we need to cascade the update throughout the ziplist */
   if (nextdiff != 0) {
       offset = p-zl;
       zl = __ziplistCascadeUpdate(zl,p+reqlen);
       p = zl+offset;
   }

   // Write the new entry
   /* Write the entry */
   p += zipStorePrevEntryLength(p,prevlen);
   p += zipStoreEntryEncoding(p,encoding,slen);
   if (ZIP_IS_STR(encoding)) {
       memcpy(p,s,slen);
   } else {
       zipSaveInteger(p,value,encoding);
   }
   ZIPLIST_INCR_LENGTH(zl,1);
   return zl;
}
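The size arithmetic at the start of __ziplistInsert can be checked by hand. Here is a minimal sketch of reqlen and nextdiff for string payloads only: prevlen_size, str_encoding_size, req_len_str and next_diff are hypothetical helpers. It ignores the integer path (zipTryEncoding plus zipIntSize) and does not model the forcelarge special case, in which Redis keeps a 5-byte prevlen field instead of shrinking it.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the <prevlen> field needed to store len: 1 or 5 bytes. */
static unsigned int prevlen_size(uint32_t len) {
    return len < 254 ? 1 : 5;
}

/* Size of the <encoding> field for a string payload of slen bytes. */
static unsigned int str_encoding_size(uint32_t slen) {
    if (slen <= 0x3f) return 1;
    if (slen <= 0x3fff) return 2;
    return 5;
}

/* reqlen for inserting a string after an entry of prevlen bytes:
 * prevlen field + encoding field + payload. */
static uint32_t req_len_str(uint32_t prevlen, uint32_t slen) {
    return prevlen_size(prevlen) + str_encoding_size(slen) + slen;
}

/* nextdiff: how much the following entry's prevlen field must grow
 * (+4) or could shrink (-4) to hold reqlen, in the spirit of
 * zipPrevLenByteDiff. cur_prevlensize is what that entry uses now. */
static int next_diff(unsigned int cur_prevlensize, uint32_t reqlen) {
    return (int)prevlen_size(reqlen) - (int)cur_prevlensize;
}
```

Inserting "Hello World" after the 2-byte entry "5" needs 1 + 1 + 11 = 13 bytes. A 300-byte string needs 1 + 2 + 300 = 303 bytes, and if it goes in front of an entry with a 1-byte prevlen, nextdiff is 4: that entry grows, which is exactly when __ziplistCascadeUpdate gets called.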

3. Deletion
Pop also ends up calling this function:

unsigned char *__ziplistDelete(unsigned char *zl, unsigned char *p, unsigned int num) {
   unsigned int i, totlen, deleted = 0;
   size_t offset;
   int nextdiff = 0;
   zlentry first, tail;

   zipEntry(p, &first);
   for (i = 0; p[0] != ZIP_END && i < num; i++) {
       p += zipRawEntryLength(p);
       deleted++;
   }

   totlen = p-first.p; /* Bytes taken by the element(s) to delete. */
   if (totlen > 0) {
       if (p[0] != ZIP_END) {
           /* Storing `prevrawlen` in this entry may increase or decrease the
            * number of bytes required compare to the current `prevrawlen`.
            * There always is room to store this, because it was previously
            * stored by an entry that is now being deleted. */
           // nextdiff: bytes needed to store first.prevrawlen in p's prevlen field, minus the bytes p uses now
           nextdiff = zipPrevLenByteDiff(p,first.prevrawlen);

           /* Note that there is always space when p jumps backward: if
            * the new previous entry is large, one of the deleted elements
            * had a 5 bytes prevlen header, so there is for sure at least
            * 5 bytes free and we need just 4. */
           p -= nextdiff;
           zipStorePrevEntryLength(p,first.prevrawlen);

           /* Update offset for tail */
           ZIPLIST_TAIL_OFFSET(zl) =
               intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))-totlen);

           /* When the tail contains more than one entry, we need to take
            * "nextdiff" in account as well. Otherwise, a change in the
            * size of prevlen doesn't have an effect on the *tail* offset. */
           // If p's header changed size and p is not the tail, the tail offset moves too
           zipEntry(p, &tail);
           if (p[tail.headersize+tail.len] != ZIP_END) {
               ZIPLIST_TAIL_OFFSET(zl) =
                  intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
           }

           /* Move tail to the front of the ziplist */
           memmove(first.p,p,
               intrev32ifbe(ZIPLIST_BYTES(zl))-(p-zl)-1);
       } else {
           /* The entire tail was deleted. No need to move memory. */
           ZIPLIST_TAIL_OFFSET(zl) =
               intrev32ifbe((first.p-zl)-first.prevrawlen);
       }

       /* Resize and update length */
       offset = first.p-zl;
       zl = ziplistResize(zl, intrev32ifbe(ZIPLIST_BYTES(zl))-totlen+nextdiff);
       ZIPLIST_INCR_LENGTH(zl,-deleted);
       p = zl+offset;

       /* When nextdiff != 0, the raw length of the next entry has changed, so
        * we need to cascade the update throughout the ziplist */
       // nextdiff != 0: the entry at p changed size, so the entries after it
       // may need their prevlen fields updated as well
       if (nextdiff != 0)
           zl = __ziplistCascadeUpdate(zl,p);
   }
   return zl;
}

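Deletion is somewhat simpler than insertion: it never encodes a new entry, it only closes the gap and fixes up the prevlen field of the entry that follows the deleted range.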
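Both functions above call __ziplistCascadeUpdate when nextdiff != 0. The reason is a chain reaction: when an entry grows past 253 bytes, the next entry's prevlen field grows from 1 to 5 bytes, which can push that entry past 253 bytes as well, and so on. Here is a toy model of that propagation over an array of entry sizes. cascade_grow is a hypothetical function that touches no real bytes. It is grow-only on the assumption that, as in the Redis 5.x implementation, a prevlen field that is larger than needed is kept rather than shrunk, and the cascade stops there.

```c
#include <assert.h>
#include <stddef.h>

/* sizes[i]: total bytes of entry i (including its prevlen field);
 * plsize[i]: size of entry i's prevlen field (1 or 5).
 * After entry `from` changed size, grow prevlen fields forward as needed.
 * Returns how many entries grew. */
static size_t cascade_grow(unsigned int *sizes, unsigned int *plsize,
                           size_t n, size_t from) {
    size_t grown = 0;
    for (size_t i = from; i + 1 < n; i++) {
        unsigned int need = sizes[i] < 254 ? 1 : 5;  /* bytes to store sizes[i] */
        if (plsize[i + 1] >= need) break;            /* next entry already fits */
        plsize[i + 1] = 5;                           /* 1 -> 5 bytes */
        sizes[i + 1] += 4;                           /* next entry grows by 4 */
        grown++;
    }
    return grown;
}
```

With sizes {300, 250, 252, 253, 100} and all prevlen fields 1 byte, every following entry grows in turn (four updates). With {300, 250, 100, 250}, the cascade stops after two updates, because the 104-byte entry still fits in a 1-byte prevlen. Long runs of 250 to 253-byte entries are the worst case.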

4. Lookup

unsigned char *ziplistFind(unsigned char *p, unsigned char *vstr, unsigned int vlen, unsigned int skip) {
   int skipcnt = 0;
   unsigned char vencoding = 0;
   long long vll = 0;

   while (p[0] != ZIP_END) {
       unsigned int prevlensize, encoding, lensize, len;
       unsigned char * q;

       // p points at the entry's prevlen (previous_entry_length) field: decode its size, then the encoding and length after it
       ZIP_DECODE_PREVLENSIZE(p, prevlensize);
       ZIP_DECODE_LENGTH(p + prevlensize, encoding, lensize, len);
       q = p + prevlensize + lensize;

       if (skipcnt == 0) {
           /* Compare current entry with specified entry */
           // Check the encoding type: string entries are compared byte by byte
           if (ZIP_IS_STR(encoding)) {
               if (len == vlen && memcmp(q, vstr, vlen) == 0) {
                   return p;
               }
           } else {
               /* Find out if the searched field can be encoded. Note that
                * we do it only the first time, once done vencoding is set
                * to non-zero and vll is set to the integer value. */
               if (vencoding == 0) {
                   if (!zipTryEncoding(vstr, vlen, &vll, &vencoding)) {
                       /* If the entry can't be encoded we set it to
                        * UCHAR_MAX so that we don't retry again the next
                        * time. */
                       vencoding = UCHAR_MAX;
                   }
                   /* Must be non-zero by now */
                   assert(vencoding);
               }

               /* Compare current entry with specified entry, do it only
                * if vencoding != UCHAR_MAX because if there is no encoding
                * possible for the field it can't be a valid integer. */
               if (vencoding != UCHAR_MAX) {
                   long long ll = zipLoadInteger(q, encoding);
                   if (ll == vll) {
                       return p;
                   }
               }
           }

           /* Reset skip count */
           skipcnt = skip;
       } else {
           /* Skip entry */
           skipcnt--;
       }

       /* Move to next entry */
       p = q + len;
   }

   return NULL;
}

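Lookup handles string entries and integer entries separately: strings are compared with memcmp, integers by value. The search key is converted to an integer at most once (the result is cached in vencoding/vll), and after each comparison the skip argument makes the loop pass over that many entries, which lets a caller compare only every other entry, such as only the field names of a hash stored as field/value pairs.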
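Here is a minimal sketch of the same loop over a small hand-built ziplist ("2", "5", "Hello World"). find_entry and parse_ll are hypothetical, and the sketch only understands 1-byte prevlen fields, 6-bit string lengths, 8-bit integers and 4-bit immediates. parse_ll is a simplified strict parser standing in for zipTryEncoding (at most 18 digits, no leading zeros), and the key is converted up front rather than lazily.

```c
#include <assert.h>
#include <string.h>

/* Strict decimal parse: optional '-', digits only, no leading zeros. */
static int parse_ll(const char *s, size_t len, long long *v) {
    size_t i = 0;
    long long acc = 0;
    int neg = (len > 0 && s[0] == '-');
    if (neg) i = 1;
    if (i >= len || len - i > 18) return 0;              /* empty or too long */
    if (s[i] == '0' && (neg || len - i > 1)) return 0;   /* "007", "-0" */
    for (; i < len; i++) {
        if (s[i] < '0' || s[i] > '9') return 0;
        acc = acc * 10 + (s[i] - '0');
    }
    *v = neg ? -acc : acc;
    return 1;
}

/* Return the offset of the first matching entry, or -1. After each
 * comparison, the next `skip` entries are passed over. */
static long find_entry(const unsigned char *zl, const char *vstr,
                       size_t vlen, unsigned int skip) {
    const unsigned char *p = zl + 10;             /* first entry */
    unsigned int skipcnt = 0;
    long long vll = 0;
    int v_is_int = parse_ll(vstr, vlen, &vll);    /* convert the key once */
    while (p[0] != 0xFF) {
        const unsigned char *q = p + 1;           /* encoding (1-byte prevlen assumed) */
        unsigned char b = q[0];
        int is_str = (b >> 6) == 0;
        size_t len;
        long long ll = 0;
        if (is_str) len = b & 0x3f;
        else if (b == 0xFE) { len = 1; ll = (signed char)q[1]; }
        else { len = 0; ll = (b & 0x0F) - 1; }    /* 4-bit immediate */
        if (skipcnt == 0) {
            if (is_str ? (len == vlen && memcmp(q + 1, vstr, vlen) == 0)
                       : (v_is_int && ll == vll))
                return (long)(p - zl);
            skipcnt = skip;                       /* reset skip count */
        } else {
            skipcnt--;                            /* skip this entry */
        }
        p = q + 1 + len;                          /* next entry */
    }
    return -1;
}

static const unsigned char example_zl[] = {
    0x1c, 0, 0, 0, 0x0e, 0, 0, 0, 0x03, 0,        /* zlbytes 28, zltail 14, zllen 3 */
    0x00, 0xf3,                                   /* "2" */
    0x02, 0xf6,                                   /* "5" */
    0x02, 0x0b, 'H','e','l','l','o',' ','W','o','r','l','d',
    0xff
};
```

Searching for "5" with skip = 0 returns offset 12. With skip = 1, only the entries at offsets 10 and 14 are compared, so "5" is not found, while "Hello World" still is.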

III. Summary

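The ziplist is essentially an array-like structure reserved for small data sets. For sorted sets, the defaults allow at most 128 elements with every member at most 64 bytes (zset-max-ziplist-entries / zset-max-ziplist-value); for hashes they are 512 entries and 64 bytes (hash-max-ziplist-entries / hash-max-ziplist-value). Lookups and updates in the middle take linear time, but because the size is capped, so is the cost. That is the general approach: use different structures for different scenarios. Compared with a skip list, the ziplist spends more time per operation but much less memory. Trade-offs like this show up everywhere in system design.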