Redis Source Code Analysis

Latest recommended article published on 2024-08-22 10:22:39

cheng~cheng

Category: Server-side Development. Tags: backend

Article link: https://blog.csdn.net/cheng_1017/article/details/113938024

1. Key Redis configuration file (redis.conf)

2. Redis Sentinel configuration file (sentinel.conf)

3. Redis startup flow

4. Key Redis data structures

5. References

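1. Key Redis configuration file (redis.conf)

redis.conf is the main Redis configuration file. It can pull in additional configuration files through the include directive. The most important settings are described below.

databases 16: sets the number of Redis databases. The default database is DB 0; to use a different one, the client selects it by number on its connection (for example with SELECT <dbid>).

save 900 1: controls how often the dataset is written to disk once it has changed. This rule means: after 900 sec (15 min) if at least 1 key changed. Commenting out all save lines disables snapshotting. Note that the book Systems Performance (《性能之巅》) points out that such saves can cause intermittent I/O spikes and added service latency, so pay close attention when many large key-value pairs change frequently.

replicaof <masterip> <masterport>: replica configuration. A replica instance is a copy of a Redis master. Replication is asynchronous, but the master can be configured to stop accepting writes unless it is connected to at least a given number of replicas.

maxclients 10000: the maximum number of clients that can be connected to the Redis server at the same time. The default is 10000.

maxmemory-policy noeviction: how Redis selects keys to remove when memory runs out. The default is not to evict anything. The available policies are: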
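A minimal sketch of how a set of save points such as `save 900 1` could be evaluated. The type `save_point` and the function `should_snapshot` are illustrative names assumed for this example, not the Redis API, and the exact comparison Redis uses may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One "save <seconds> <changes>" rule (hypothetical type for illustration). */
struct save_point {
    time_t seconds;    /* minimum time since the last snapshot */
    long long changes; /* minimum number of modifications */
};

/* Returns 1 when any rule is satisfied: at least `changes` modifications
 * have accumulated and at least `seconds` have passed since the last save. */
int should_snapshot(const struct save_point *rules, size_t n,
                    long long dirty, time_t now, time_t last_save) {
    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (dirty >= rules[i].changes && now - last_save >= rules[i].seconds)
            return 1;
    }
    return 0;
}
```

With the rules `{900, 1}` and `{300, 10}`, a single change triggers a snapshot only after 900 seconds, while 10 changes trigger one after 300 seconds.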

# volatile-lru -> Evict using approximated LRU, only keys with an expire set.
# allkeys-lru -> Evict any key using approximated LRU.
# volatile-lfu -> Evict using approximated LFU, only keys with an expire set.
# allkeys-lfu -> Evict any key using approximated LFU.
# volatile-random -> Remove a random key having an expire set.
# allkeys-random -> Remove a random key, any key.
# volatile-ttl -> Remove the key with the nearest expire time (minor TTL)
# noeviction -> Don't evict anything, just return an error on write operations.
#
# LRU means Least Recently Used
# LFU means Least Frequently Used
#
# Both LRU, LFU and volatile-ttl are implemented using approximated
# randomized algorithms.

# Note: with any of the above policies, Redis will return an error on write
#       operations, when there are no suitable keys for eviction.
#
#       At the date of writing these commands are: set setnx setex append
#       incr decr rpush lpush rpushx lpushx linsert lset rpoplpush sadd
#       sinter sinterstore sunion sunionstore sdiff sdiffstore zadd zincrby
#       zunionstore zinterstore hset hsetnx hmset hincrby incrby decrby
#       getset mset msetnx exec sort

# The default is:
#
# maxmemory-policy noeviction
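The idea behind the approximated LRU policies can be sketched as follows: rather than tracking a global ordering, look at a small sample of keys and evict the one that has been idle the longest. The `sample` struct and `pick_lru_victim` are hypothetical names for this illustration; the real implementation is more elaborate.

```c
#include <assert.h>
#include <stddef.h>

/* A sampled key with its idle time (hypothetical type for illustration). */
struct sample {
    const char *key;
    unsigned long idle_ms; /* time since the last access */
    int has_expire;        /* non-zero if the key has a TTL */
};

/* Among a small sample, pick the key that has been idle the longest.
 * volatile_only != 0 mimics volatile-lru (only keys with a TTL qualify);
 * volatile_only == 0 mimics allkeys-lru. Returns NULL when no key
 * qualifies, the case in which a write would be answered with an error. */
const char *pick_lru_victim(const struct sample *s, size_t n, int volatile_only) {
    const struct sample *best = NULL;
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (volatile_only && !s[i].has_expire) continue;
        if (best == NULL || s[i].idle_ms > best->idle_ms) best = &s[i];
    }
    return best ? best->key : NULL;
}
```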

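lazyfree-lazy-eviction / lazyfree-lazy-expire / lazyfree-lazy-server-del no: these settings choose how keys are freed on eviction, on expiration, and on deletions performed as a side effect of other commands. Keys can be freed in a blocking way or asynchronously.

io-threads 4: Redis is mostly single threaded, although some operations, such as UNLINK and slow I/O accesses, are performed on side threads. Client socket reads and writes can now also be handled in separate I/O threads. Because writing is particularly slow, users normally rely on pipelining to improve per-core performance; with I/O threads, Redis can typically be sped up about two times. Threading is disabled by default, and enabling it is recommended only on machines with 4 or more cores. For example, use 2 or 3 I/O threads on a 4-core machine and 6 I/O threads on an 8-core machine. By default the threads are used only for writes; to also thread reads, set io-threads-do-reads yes, although threading reads usually does not help much. This directive cannot be changed at runtime, and the feature does not work when SSL is enabled.

lua-time-limit 5000: the maximum execution time of a Lua script, in milliseconds. Once a script runs past this limit, only the SCRIPT KILL and SHUTDOWN NOSAVE commands are available.

cluster-enabled yes: a normal Redis instance runs standalone and is not part of a Redis Cluster; only an instance started with this setting can join one. Every cluster node has its own cluster configuration file, named by cluster-config-file nodes-6379.conf, which records cluster state such as updates about the other nodes. Redis Cluster can run under Docker, but it is best to specify an explicit IP address in that case.

Advanced configuration: thresholds for the compact encodings of each data type. For hashes: hash-max-ziplist-entries 512 and hash-max-ziplist-value 64. For lists: list-max-ziplist-size -2, which limits the size of each internal list node (-2: max size: 8 Kb); adjust as needed. For sets: set-max-intset-entries 512. For sorted sets: zset-max-ziplist-entries 128 and zset-max-ziplist-value 64, which work the same way as the hash settings. For HyperLogLog: the size limit hll-sparse-max-bytes 3000. For streams: the maximum size of a single node, stream-node-max-bytes 4096, and the maximum number of entries it may hold, stream-node-max-entries 100.

activerehashing yes: enables active rehashing of the database hash table. Active rehashing uses 1 millisecond every 100 milliseconds of CPU time in order to help rehashing the main Redis hash table (the one mapping top-level keys to values).

loadmodule /path/to/my_module.so: previously, the only way to add a feature or transaction-like logic to Redis was a Lua script. A script is not atomic in the full transactional sense (changes made before a failure are not rolled back), and it cannot use the low-level API, so its advantage over a plain sequence of commands is limited. Modules can be loaded and unloaded dynamically, can implement low-level data structures, and can also call high-level commands. All of this only requires including the header redismodule.h, which keeps modules as simple and elegant as Redis itself.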

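2. Redis Sentinel configuration file (sentinel.conf)

Redis uses Sentinel to manage the master and replica nodes of a replicated deployment, monitoring the state of each node and reporting changes promptly. Sentinel is a special mode: Redis provides a sentinel command, and a Sentinel runs as an independent process. It works by sending commands to the Redis servers and waiting for their replies, which lets it monitor several running Redis instances. A single Sentinel process can itself fail, so several Sentinels are normally deployed; they also monitor each other, forming a multi-Sentinel setup.

The failover process works as follows. Suppose the master goes down and Sentinel 1 detects it first. Failover does not start immediately: at this point only Sentinel 1 subjectively considers the master unavailable, which is called subjectively down. When other Sentinels also detect that the master is unavailable and their number reaches the configured threshold, the master is considered objectively down. The Sentinels then hold a vote, and the Sentinel chosen by the vote carries out the failover. After the switch succeeds, the other Sentinels learn about it through publish/subscribe and point the replicas they monitor at the new master. For clients, all of this is transparent. (Reference: https://www.jianshu.com/p/06ab9daf921d)

sentinel monitor mymaster 127.0.0.1 6379 2: uses Sentinel to monitor the state of the master; the last argument is the quorum. Tells Sentinel to monitor this master, and to consider it in O_DOWN (Objectively Down) state only if at least <quorum> sentinels agree.

sentinel auth-pass <master-name> <password>: sets the authentication password. Set the password to use to authenticate with the master and replicas. Useful if there is a password set in the Redis instances to monitor.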
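The step from subjectively down to objectively down can be sketched as a simple count against the quorum. This is only an illustration under the assumption that each Sentinel's opinion is available as a flag; `is_objectively_down` is a hypothetical helper, not a Sentinel function.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the master should be flagged objectively down: the number
 * of Sentinels that individually consider it down (subjectively down)
 * has reached the configured quorum. */
int is_objectively_down(const int *sdown_votes, size_t n, int quorum) {
    int agree = 0;
    assert(sdown_votes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (sdown_votes[i]) agree++;
    return agree >= quorum;
}
```

With three Sentinels and a quorum of 2, one Sentinel alone cannot trigger a failover; a second one has to agree first.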

3. Redis startup flow

The entry file of Redis is server.c and the entry point is its main() function. The startup steps are:

* `initServerConfig()` setups the default values of the `server` structure.
* `initServer()` allocates the data structures needed to operate, setup the listening socket, and so forth.
* `aeMain()` starts the event loop which listens for new connections.

There are two special functions called periodically by the event loop:

1. `serverCron()` is called periodically (according to `server.hz` frequency), and performs tasks that must be performed from time to time, like checking for timedout clients.
2. `beforeSleep()` is called every time the event loop fired, Redis served a few requests, and is returning back into the event loop.

Inside server.c you can find code that handles other vital things of the Redis server:

* `call()` is used in order to call a given command in the context of a given client.
* `activeExpireCycle()` handles expiration of keys with a time to live set via the `EXPIRE` command.
* `freeMemoryIfNeeded()` is called when a new write command should be performed but Redis is out of memory according to the `maxmemory` directive.
* The global variable `redisCommandTable` defines all the Redis commands, specifying the name of the command, the function implementing the command, the number of arguments required, and other properties of each command.
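How the two periodic hooks relate to the event loop can be sketched with a small simulation. This is not the ae library: `run_loop` and `loop_stats` are names invented for the example, and time is simulated by assuming each pass of the loop takes a fixed number of milliseconds.

```c
#include <assert.h>

/* Counters filled in by the simulated loop (hypothetical type). */
struct loop_stats {
    int before_sleep_calls;
    int cron_calls;
};

/* Simulate `iterations` passes of an event loop in which each pass takes
 * `ms_per_iteration` milliseconds. The before-sleep hook runs on every
 * pass; the cron hook runs when its timer, set to 1000/hz ms, is due. */
struct loop_stats run_loop(int iterations, int ms_per_iteration, int hz) {
    struct loop_stats st = {0, 0};
    assert(hz > 0 && hz <= 1000);
    int period_ms = 1000 / hz;
    int now_ms = 0, next_cron_ms = period_ms;
    for (int i = 0; i < iterations; i++) {
        st.before_sleep_calls++;     /* beforeSleep()-style hook */
        now_ms += ms_per_iteration;  /* wait for events and serve them */
        if (now_ms >= next_cron_ms) {/* serverCron()-style timer is due */
            st.cron_calls++;
            next_cron_ms = now_ms + period_ms;
        }
    }
    return st;
}
```

For example, 100 passes of 10 ms each at hz = 10 run the before-sleep hook 100 times and the cron hook 10 times.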

The source of main() in server.c, the entry point of the Redis server, is as follows:

int main(int argc, char **argv) {
    struct timeval tv;
    int j;
    /* Compiled only when REDIS_TEST is defined at build time: run the unit
     * test named on the command line instead of starting the server. */
#ifdef REDIS_TEST
    if (argc == 3 && !strcasecmp(argv[1], "test")) {
        if (!strcasecmp(argv[2], "ziplist")) {
            return ziplistTest(argc, argv);
        } else if (!strcasecmp(argv[2], "quicklist")) {
            quicklistTest(argc, argv);
        } else if (!strcasecmp(argv[2], "intset")) {
            return intsetTest(argc, argv);
        } else if (!strcasecmp(argv[2], "zipmap")) {
            return zipmapTest(argc, argv);
        } else if (!strcasecmp(argv[2], "sha1test")) {
            return sha1Test(argc, argv);
        } else if (!strcasecmp(argv[2], "util")) {
            return utilTest(argc, argv);
        } else if (!strcasecmp(argv[2], "endianconv")) {
            return endianconvTest(argc, argv);
        } else if (!strcasecmp(argv[2], "crc64")) {
            return crc64Test(argc, argv);
        } else if (!strcasecmp(argv[2], "zmalloc")) {
            return zmalloc_test(argc, argv);
        }

        return -1; /* test not found */
    }
#endif

    /* Conditional initialization, selected by build-time macros. */
    /* We need to initialize our libraries, and the server configuration. */
#ifdef INIT_SETPROCTITLE_REPLACEMENT
    spt_init(argc, argv);
#endif
    setlocale(LC_COLLATE,"");
    tzset(); /* Populates 'timezone' global. */
    /* Install the out-of-memory handler. */
    zmalloc_set_oom_handler(redisOutOfMemoryHandler);
    srand(time(NULL)^getpid());
    srandom(time(NULL)^getpid());
    gettimeofday(&tv,NULL);
    crc64_init();

    uint8_t hashseed[16];
    getRandomBytes(hashseed,sizeof(hashseed));
    dictSetHashFunctionSeed(hashseed);
    server.sentinel_mode = checkForSentinelMode(argc,argv);
    /* Initialize the server configuration with default values. */
    initServerConfig();
    ACLInit(); /* The ACL subsystem must be initialized ASAP because the
                  basic networking code and client creation depends on it. */
    /* Initialize the module subsystem. */
    moduleInitModulesSystem();
    tlsInit();

    /* Store the executable path and arguments in a safe place in order
     * to be able to restart the server later. */
    server.executable = getAbsolutePath(argv[0]);
    server.exec_argv = zmalloc(sizeof(char*)*(argc+1));
    server.exec_argv[argc] = NULL;
    for (j = 0; j < argc; j++) server.exec_argv[j] = zstrdup(argv[j]);

    /* We need to init sentinel right now as parsing the configuration file
     * in sentinel mode will have the effect of populating the sentinel
     * data structures with master nodes to monitor. */
    /* Initialize the Sentinel configuration and state. */
    if (server.sentinel_mode) {
        initSentinelConfig();
        initSentinel();
    }

    /* Check if we need to start in redis-check-rdb/aof mode. We just execute
     * the program main. However the program is part of the Redis executable
     * so that we can easily execute an RDB check on loading errors. */
    if (strstr(argv[0],"redis-check-rdb") != NULL)
        redis_check_rdb_main(argc,argv,NULL);
    else if (strstr(argv[0],"redis-check-aof") != NULL)
        redis_check_aof_main(argc,argv);

    /* Check and parse the command-line arguments. */
    if (argc >= 2) {
        /* argv[0] is the program itself, so the real arguments start at
         * argv[1]; parsing begins at j = 1. */
        j = 1; /* First option to parse in argv[] */
        sds options = sdsempty();
        char *configfile = NULL;

        /* Handle the basic options. */
        /* Handle special options --help and --version */
        if (strcmp(argv[1], "-v") == 0 ||
            strcmp(argv[1], "--version") == 0) version();
        if (strcmp(argv[1], "--help") == 0 ||
            strcmp(argv[1], "-h") == 0) usage();
        if (strcmp(argv[1], "--test-memory") == 0) {
            if (argc == 3) {
                memtest(atoi(argv[2]),50);
                exit(0);
            } else {
                fprintf(stderr,"Please specify the amount of memory to test in megabytes.\n");
                fprintf(stderr,"Example: ./redis-server --test-memory 4096\n\n");
                exit(1);
            }
        }

        /* An argument that does not start with "--" is taken to be the
         * config file name. */
        /* First argument is the config file name? */
        if (argv[j][0] != '-' || argv[j][1] != '-') {
            /* For example, with `redis-server ./redis.conf`: resolve
             * ./redis.conf to an absolute path, then move on to the next
             * argument. */
            configfile = argv[j];
            server.configfile = getAbsolutePath(configfile);
            /* Replace the config file in server.exec_argv with
             * its absolute path. */
            zfree(server.exec_argv[j]);
            server.exec_argv[j] = zstrdup(server.configfile);
            j++;
        }

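        /* Parse the remaining arguments and append them, as config lines,
         * after the contents of the configuration file (the file on disk
         * is not modified). */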
        /* All the other options are parsed and conceptually appended to the
         * configuration file. For instance --port 6380 will generate the
         * string "port 6380\n" to be parsed after the actual file name
         * is parsed, if any. */
        while(j != argc) {
            if (argv[j][0] == '-' && argv[j][1] == '-') {
                /* Option name */
                if (!strcmp(argv[j], "--check-rdb")) {
                    /* Argument has no options, need to skip for parsing. */
                    j++;
                    continue;
                }
                if (sdslen(options)) options = sdscat(options,"\n");
                options = sdscat(options,argv[j]+2);
                options = sdscat(options," ");
            } else {
                /* Option argument */
                options = sdscatrepr(options,argv[j],strlen(argv[j]));
                options = sdscat(options," ");
            }
            j++;
        }

        /* In Sentinel mode the configuration must be a file on disk;
         * reading it from stdin ("-") is rejected. */
        if (server.sentinel_mode && configfile && *configfile == '-') {
            serverLog(LL_WARNING,
                "Sentinel config from STDIN not allowed.");
            serverLog(LL_WARNING,
                "Sentinel needs config file on disk to save state.  Exiting...");
            exit(1);
        }

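        /* Load the configuration file, then apply the options given on the
         * command line, which override the settings in the file. The file
         * is opened with fopen() and read with fgets();
         * loadServerConfigFromString() parses the result and maps each
         * directive onto the global server struct. See that function for
         * how each directive is parsed. */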
        loadServerConfig(configfile,options);
        sdsfree(options);
    }

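    /* Log the startup banner; if no config file was given, print an example
     * command line for the current mode (sentinel or redis). */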
    serverLog(LL_WARNING, "oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo");
    serverLog(LL_WARNING,
        "Redis version=%s, bits=%d, commit=%s, modified=%d, pid=%d, just started",
            REDIS_VERSION,
            (sizeof(long) == 8) ? 64 : 32,
            redisGitSHA1(),
            strtol(redisGitDirty(),NULL,10) > 0,
            (int)getpid());

    if (argc == 1) {
        serverLog(LL_WARNING, "Warning: no config file specified, using the default config. In order to specify a config file use %s /path/to/%s.conf", argv[0], server.sentinel_mode ? "sentinel" : "redis");
    } else {
        serverLog(LL_WARNING, "Configuration loaded");
    }

    /* Determine the supervision mode and whether to run in the background. */
    server.supervised = redisIsSupervised(server.supervised_mode);
    int background = server.daemonize && !server.supervised;
    /* When running as a daemon, fork(): the child process keeps running and
     * the parent calls exit(0). */
    if (background) daemonize();
    readOOMScoreAdj();

    /* With the configuration loaded, initialize the server resources. */
    initServer();
    if (background || server.pidfile) createPidFile();
    redisSetProcTitle(argv[0]);
    redisAsciiArt();
    checkTcpBacklogSettings();

    /* The if branch runs when not in Sentinel mode. */
    if (!server.sentinel_mode) {
        /* Things not needed when running in Sentinel mode. */
        serverLog(LL_WARNING,"Server initialized");
    #ifdef __linux__
        linuxMemoryWarnings();
    #endif
        moduleInitModulesSystemLast();
        /* Load the configured modules. */
        moduleLoadFromQueue();
        ACLLoadUsersAtStartup();
        /* Final startup steps: create the background I/O (BIO) threads and
         * initialize the I/O threads (at most 128). */
        InitServerLast();
        /* Load the database contents from disk. */
        loadDataFromDisk();
        if (server.cluster_enabled) {
            if (verifyClusterConfigWithData() == C_ERR) {
                serverLog(LL_WARNING,
                    "You can't have keys in a DB different than DB 0 when in "
                    "Cluster mode. Exiting.");
                exit(1);
            }
        }
        if (server.ipfd_count > 0 || server.tlsfd_count > 0)
            serverLog(LL_NOTICE,"Ready to accept connections");
        if (server.sofd > 0)
            serverLog(LL_NOTICE,"The server is now ready to accept connections at %s", server.unixsocket);
        if (server.supervised_mode == SUPERVISED_SYSTEMD) {
            if (!server.masterhost) {
                redisCommunicateSystemd("STATUS=Ready to accept connections\n");
                redisCommunicateSystemd("READY=1\n");
            } else {
                redisCommunicateSystemd("STATUS=Waiting for MASTER <-> REPLICA sync\n");
            }
        }
    } else {
        InitServerLast();
        sentinelIsRunning();
        if (server.supervised_mode == SUPERVISED_SYSTEMD) {
            redisCommunicateSystemd("STATUS=Ready to accept connections\n");
            redisCommunicateSystemd("READY=1\n");
        }
    }

    /* Warning the user about suspicious maxmemory setting. */
    if (server.maxmemory > 0 && server.maxmemory < 1024*1024) {
        serverLog(LL_WARNING,"WARNING: You specified a maxmemory value that is less than 1MB (current value is %llu bytes). Are you sure this is what you really want?", server.maxmemory);
    }

    redisSetCpuAffinity(server.server_cpulist);
    setOOMScoreAdj(-1);

    /* Run the event loop: wait for events and handle them. */
    aeMain(server.el);
    aeDeleteEventLoop(server.el);
    return 0;
}
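The option handling in main() above turns `--port 6380` into the config line `port 6380`. A simplified sketch of that transformation, using a fixed buffer instead of sds: `options_to_config` is a hypothetical helper, and unlike the real code it copies option arguments as-is instead of quoting them with sdscatrepr().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Turn "--name arg1 arg2 ..." command-line options into config-file style
 * lines, starting at argv[first]. Each word is followed by a space and each
 * new option name starts a new line. Returns 0 on success, -1 if `out`
 * (capacity `cap`) is too small. */
int options_to_config(int argc, char **argv, int first, char *out, size_t cap) {
    size_t used = 0;
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (int j = first; j < argc; j++) {
        int is_name = (strncmp(argv[j], "--", 2) == 0);
        const char *word = is_name ? argv[j] + 2 : argv[j];
        /* A new option name starts a new line, except for the first one. */
        const char *sep = (is_name && used > 0) ? "\n" : "";
        int n = snprintf(out + used, cap - used, "%s%s ", sep, word);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return 0;
}
```

The resulting string is what gets parsed after the configuration file, which is why command-line options override settings from the file.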

In server.c, the important variable redisCommandTable records the handler function for each Redis command. When the server receives a command, it calls the corresponding function to parse and process it.
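A command table of this kind can be sketched as an array of name, handler, and arity, with a lookup before dispatch. Everything here (`command`, `lookup_command`, `dispatch`, the two stub handlers and their return codes) is a simplified assumption for illustration; the real table carries more fields per command.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

typedef int command_proc(int argc, const char **argv);

struct command {
    const char *name;
    command_proc *proc;
    int arity; /* > 0: exact argument count; < 0: at least -arity.
                  The count includes the command name itself. */
};

/* Stub handlers; the return values are arbitrary markers for the example. */
static int get_command(int argc, const char **argv) { (void)argc; (void)argv; return 1; }
static int set_command(int argc, const char **argv) { (void)argc; (void)argv; return 2; }

static const struct command command_table[] = {
    {"get", get_command, 2},
    {"set", set_command, -3},
};

/* Case-insensitive linear search by command name. */
const struct command *lookup_command(const char *name) {
    size_t n = sizeof(command_table) / sizeof(command_table[0]);
    for (size_t i = 0; i < n; i++)
        if (strcasecmp(name, command_table[i].name) == 0)
            return &command_table[i];
    return NULL;
}

/* Returns the handler's result, -1 for an unknown command, or -2 when the
 * number of arguments does not match the command's arity. */
int dispatch(int argc, const char **argv) {
    assert(argc >= 1);
    const struct command *c = lookup_command(argv[0]);
    if (c == NULL) return -1;
    if ((c->arity > 0 && argc != c->arity) || argc < -c->arity) return -2;
    return c->proc(argc, argv);
}
```

A linear scan keeps the sketch short; a hash table keyed by command name would avoid scanning on every request.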

4. Key Redis data structures

The key Redis data structures are:

redisObject: every basic Redis value is wrapped in a redisObject, which encapsulates the underlying data structure.

/* Redis wraps every basic type in this object; the typedef name is robj. */
typedef struct redisObject {
    unsigned type:4;        /* Object type, e.g. OBJ_STRING, OBJ_LIST */
    unsigned encoding:4;    /* Encoding of the structure that ptr points to,
                             * e.g. OBJ_ENCODING_INT */
    unsigned lru:LRU_BITS;  /* Time of the last access by the program:
                             * LRU time (relative to global lru_clock) or
                             * LFU data (least significant 8 bits frequency
                             * and most significant 16 bits access time). */
    int refcount;           /* Reference count, used for automatic memory
                             * reclamation */
    void *ptr;              /* Pointer to the underlying data structure */
} robj;

redisDb: Redis supports multiple databases, 16 by default. Each database is represented by a redisDb struct.

/* Redis database representation. There are multiple databases identified
 * by integers from 0 (the default database) up to the max configured
 * database. The database number is the 'id' field in the structure. */
typedef struct redisDb {
    dict *dict;                 /* The keyspace for this DB, stored as a dict */
    dict *expires;              /* Timeout of keys with a timeout set */
    dict *blocking_keys;        /* Keys with clients waiting for data (BLPOP)*/
    dict *ready_keys;           /* Blocked keys that received a PUSH */
    dict *watched_keys;         /* WATCHED keys for MULTI/EXEC CAS */
    int id;                     /* Database ID */
    long long avg_ttl;          /* Average TTL, just for stats */
    unsigned long expires_cursor; /* Cursor of the active expire cycle. */
    list *defrag_later;         /* List of key names to attempt to defrag one by one, gradually. */
} redisDb;

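zskiplistNode: the type of each node in a zskiplist. A node stores its element and score, a backward pointer, and an array of levels, each holding a forward pointer and a span.

zskiplist: the skip list. A skip list is an ordered data structure that keeps several pointers to other nodes in each node, so that other nodes can be reached quickly. In most cases a skip list performs about as well as a balanced tree, and it is simpler to implement. Redis uses skip lists in only two places: sorted set keys, and as an internal data structure in cluster nodes.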

/* ZSETs use a specialized version of Skiplists */
typedef struct zskiplistNode {
    sds ele;
    double score;
    struct zskiplistNode *backward;
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned long span;
    } level[];
} zskiplistNode;

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;
    int level;
} zskiplist;

[Figure: skip list diagram. Source: https://www.cnblogs.com/yinbiao/p/11238374.html]

[Figure: multi-level skip list. Source: https://blog.csdn.net/zy450271923/article/details/106970148/]

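A skip list is a multi-level linked list like the one in the figure above. To look up 46, the steps are:
(1) Search level L4: compare with 55; 1 comparison
(2) Search level L3: -> 21 -> 55; 2 comparisons
(3) Search level L2: -> 37 -> 55; 2 comparisons
(4) Search level L1: -> 46; 1 comparison, and the value is found

Conceptually, a skip list is built by promoting every second element to the level above and repeating this to stack up several levels of linked lists. Each level then holds about 1/2 as many elements as the level below it, so a lookup resembles binary search and takes O(logN) time on average. In practice, the level of each node is chosen at random on insertion rather than by strict alternation.
In the textbook form, each node holds two pointers: one to the next element in the same list (next) and one to the element on the level below (down). The zskiplistNode shown above is laid out differently: a single node carries a level[] array with one forward pointer per level, plus a backward pointer.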
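The top-down search described in the steps above can be sketched with nodes that, like zskiplistNode, hold one forward pointer per level. The `node` type, `MAX_LEVEL`, and `skiplist_find` are simplified stand-ins assumed for this example (integer values, no scores or spans).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVEL 4

struct node {
    int value;
    struct node *forward[MAX_LEVEL]; /* forward[i]: next node on level i */
};

/* Search from the highest level down: on each level, advance while the
 * next node is still smaller than the target, then drop one level. The
 * candidate is the node following the stopping point on level 0. */
const struct node *skiplist_find(const struct node *header, int levels, int target) {
    assert(header != NULL && levels >= 1 && levels <= MAX_LEVEL);
    const struct node *x = header;
    for (int i = levels - 1; i >= 0; i--) {
        while (x->forward[i] != NULL && x->forward[i]->value < target)
            x = x->forward[i];
    }
    x = x->forward[0];
    return (x != NULL && x->value == target) ? x : NULL;
}
```

To try it, link a header node to nodes 21, 37, 46, and 55 so that 55 appears on all four levels, 21 on three, 37 on two, and 46 only on the lowest; searching for 46 then visits 21 and 37 before finding it.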

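zset: the Redis sorted set. It pairs a skip list, which keeps the elements ordered by score, with a dict that maps each member to its score.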

typedef struct zset {
    dict *dict;
    zskiplist *zsl;
} zset;

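dict: the data structure behind the whole database as well as the hash type; an entire Redis database is one large dict. The key point of dict is the rehash process. A dict is designed with two hash tables. When no resize is in progress, only one of them is used. During a resize, each step moves part of the keys from the old table to the new one. A lookup during this period checks the old table first and then the new one. When the resize completes, the new table takes the place of the old one, which reduces the number of tables a lookup has to visit. For the whole duration of the resize rehashidx is set; a value other than -1 indicates that a rehash is in progress.

/* An entry of the dict: a node in the linked list of a hash bucket. */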
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

/* Function pointers of a dict type: the hash function and the operations
 * on keys and values (duplicate, compare, destroy). */
typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

/* Hash table. */
/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
    dictEntry **table;  /* Array of bucket heads (pointers to dictEntry),
                         * indexed by the key's hash; see _dictKeyIndex in
                         * dict.c. */
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

/* The dict itself. */
typedef struct dict {
    dictType *type;  /* Type-specific operations supported by this dict */
    void *privdata;
    dictht ht[2];    /* Two hash tables: the main one, and one used only
                      * while resizing */
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    unsigned long iterators; /* number of iterators currently running */
} dict;
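The incremental rehash described above can be sketched with a much smaller dict. This is a simplified illustration, not dict.c: `mini_dict` and its functions are hypothetical, keys are plain integers used directly as hashes, and one step migrates exactly one non-empty bucket.

```c
#include <assert.h>
#include <stdlib.h>

struct entry { unsigned long key; int val; struct entry *next; };
struct table { struct entry **buckets; unsigned long size, used; };
struct mini_dict {
    struct table ht[2];
    long rehashidx; /* -1 when no rehash is in progress */
};

/* Begin rehashing into a new table of `new_size` buckets (a power of two). */
int mini_dict_start_rehash(struct mini_dict *d, unsigned long new_size) {
    assert(new_size > 0 && (new_size & (new_size - 1)) == 0);
    if (d->rehashidx != -1) return -1;
    d->ht[1].buckets = calloc(new_size, sizeof(struct entry *));
    if (d->ht[1].buckets == NULL) return -1;
    d->ht[1].size = new_size;
    d->ht[1].used = 0;
    d->rehashidx = 0;
    return 0;
}

/* Move one non-empty bucket from ht[0] to ht[1]. Returns 1 while work
 * remains, 0 once ht[1] has replaced ht[0] and rehashidx is back to -1. */
int mini_dict_rehash_step(struct mini_dict *d) {
    if (d->rehashidx == -1) return 0;
    while ((unsigned long)d->rehashidx < d->ht[0].size &&
           d->ht[0].buckets[d->rehashidx] == NULL)
        d->rehashidx++; /* skip empty buckets */
    if ((unsigned long)d->rehashidx < d->ht[0].size) {
        struct entry *e = d->ht[0].buckets[d->rehashidx];
        while (e != NULL) {
            struct entry *next = e->next;
            unsigned long idx = e->key & (d->ht[1].size - 1);
            e->next = d->ht[1].buckets[idx]; /* push onto the new bucket */
            d->ht[1].buckets[idx] = e;
            d->ht[0].used--;
            d->ht[1].used++;
            e = next;
        }
        d->ht[0].buckets[d->rehashidx++] = NULL;
    }
    if (d->ht[0].used == 0) { /* done: the new table becomes ht[0] */
        free(d->ht[0].buckets);
        d->ht[0] = d->ht[1];
        d->ht[1] = (struct table){NULL, 0, 0};
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}

/* Look in ht[0] first and, only while rehashing, in ht[1] as well. */
struct entry *mini_dict_find(struct mini_dict *d, unsigned long key) {
    for (int t = 0; t <= 1; t++) {
        if (d->ht[t].size != 0) {
            struct entry *e = d->ht[t].buckets[key & (d->ht[t].size - 1)];
            for (; e != NULL; e = e->next)
                if (e->key == key) return e;
        }
        if (d->rehashidx == -1) break;
    }
    return NULL;
}
```

Spreading the migration over many small steps is what keeps any single operation from paying the full cost of a resize.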

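redisServer: the instance representing the entire Redis server; it acts as the god class that controls the whole service.

ziplist: a compact list stored in one contiguous block of memory; each element is decoded into a zlentry. The ziplist is the base structure for compactly encoded hashes, lists, and sorted sets. The ziplist source describes it as follows: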

 * ----------------------------------------------------------------------------
 *
 * ZIPLIST OVERALL LAYOUT
 * ======================
 *
 * The general layout of the ziplist is as follows:
 *
 * <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
 *
 * NOTE: all fields are stored in little endian, if not specified otherwise.
 *
 * <uint32_t zlbytes> is an unsigned integer to hold the number of bytes that
 * the ziplist occupies, including the four bytes of the zlbytes field itself.
 * This value needs to be stored to be able to resize the entire structure
 * without the need to traverse it first.
 *
 * <uint32_t zltail> is the offset to the last entry in the list. This allows
 * a pop operation on the far side of the list without the need for full
 * traversal.
 *
 * <uint16_t zllen> is the number of entries. When there are more than
 * 2^16-2 entries, this value is set to 2^16-1 and we need to traverse the
 * entire list to know how many items it holds.
 *
 * <uint8_t zlend> is a special entry representing the end of the ziplist.
 * Is encoded as a single byte equal to 255. No other normal entry starts
 * with a byte set to the value of 255.
 *
 * ZIPLIST ENTRIES
 * ===============
 *
 * Every entry in the ziplist is prefixed by metadata that contains two pieces
 * of information. First, the length of the previous entry is stored to be
 * able to traverse the list from back to front. Second, the entry encoding is
 * provided. It represents the entry type, integer or string, and in the case
 * of strings it also represents the length of the string payload.
 * So a complete entry is stored like this:
 *
 * <prevlen> <encoding> <entry-data>
 *
 * Sometimes the encoding represents the entry itself, like for small integers
 * as we'll see later. In such a case the <entry-data> part is missing, and we
 * could have just:
 *
 * <prevlen> <encoding>
 *
 * The length of the previous entry, <prevlen>, is encoded in the following way:
 * If this length is smaller than 254 bytes, it will only consume a single
 * byte representing the length as an unsigned 8 bit integer. When the length
 * is greater than or equal to 254, it will consume 5 bytes. The first byte is
 * set to 254 (FE) to indicate a larger value is following. The remaining 4
 * bytes take the length of the previous entry as value.
 *
 * So practically an entry is encoded in the following way:
 *
 * <prevlen from 0 to 253> <encoding> <entry>
 *
 * Or alternatively if the previous entry length is greater than 253 bytes
 * the following encoding is used:
 *
 * 0xFE <4 bytes unsigned little endian prevlen> <encoding> <entry>
 *
 * The encoding field of the entry depends on the content of the
 * entry. When the entry is a string, the first 2 bits of the encoding first
 * byte will hold the type of encoding used to store the length of the string,
 * followed by the actual length of the string. When the entry is an integer
 * the first 2 bits are both set to 1. The following 2 bits are used to specify
 * what kind of integer will be stored after this header. An overview of the
 * different types and encodings is as follows. The first byte is always enough
 * to determine the kind of entry.
 *
 * |00pppppp| - 1 byte
 *      String value with length less than or equal to 63 bytes (6 bits).
 *      "pppppp" represents the unsigned 6 bit length.
 * |01pppppp|qqqqqqqq| - 2 bytes
 *      String value with length less than or equal to 16383 bytes (14 bits).
 *      IMPORTANT: The 14 bit number is stored in big endian.
 * |10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes
 *      String value with length greater than or equal to 16384 bytes.
 *      Only the 4 bytes following the first byte represents the length
 *      up to 2^32-1. The 6 lower bits of the first byte are not used and
 *      are set to zero.
 *      IMPORTANT: The 32 bit number is stored in big endian.
 * |11000000| - 3 bytes
 *      Integer encoded as int16_t (2 bytes).
 * |11010000| - 5 bytes
 *      Integer encoded as int32_t (4 bytes).
 * |11100000| - 9 bytes
 *      Integer encoded as int64_t (8 bytes).
 * |11110000| - 4 bytes
 *      Integer encoded as 24 bit signed (3 bytes).
 * |11111110| - 2 bytes
 *      Integer encoded as 8 bit signed (1 byte).
 * |1111xxxx| - (with xxxx between 0000 and 1101) immediate 4 bit integer.
 *      Unsigned integer from 0 to 12. The encoded value is actually from
 *      1 to 13 because 0000 and 1111 can not be used, so 1 should be
 *      subtracted from the encoded 4 bit value to obtain the right value.
 * |11111111| - End of ziplist special entry.
 *
 * Like for the ziplist header, all the integers are represented in little
 * endian byte order, even when this code is compiled in big endian systems.
 *
 * EXAMPLES OF ACTUAL ZIPLISTS
 * ===========================
 *
 * The following is a ziplist containing the two elements representing
 * the strings "2" and "5". It is composed of 15 bytes, that we visually
 * split into sections:
 *
 *  [0f 00 00 00] [0c 00 00 00] [02 00] [00 f3] [02 f6] [ff]
 *        |             |          |       |       |     |
 *     zlbytes        zltail    entries   "2"     "5"   end
 *
 * The first 4 bytes represent the number 15, that is the number of bytes
 * the whole ziplist is composed of. The second 4 bytes are the offset
 * at which the last ziplist entry is found, that is 12, in fact the
 * last entry, that is "5", is at offset 12 inside the ziplist.
 * The next 16 bit integer represents the number of elements inside the
 * ziplist, its value is 2 since there are just two elements inside.
 * Finally "00 f3" is the first entry representing the number 2. It is
 * composed of the previous entry length, which is zero because this is
 * our first entry, and the byte F3 which corresponds to the encoding
 * |1111xxxx| with xxxx between 0001 and 1101. We need to remove the "F"
 * higher order bits 1111, and subtract 1 from the "3", so the entry value
 * is "2". The next entry has a prevlen of 02, since the first entry is
 * composed of exactly two bytes. The entry itself, F6, is encoded exactly
 * like the first entry, and 6-1 = 5, so the value of the entry is 5.
 * Finally the special entry FF signals the end of the ziplist.
 *
 * Adding another element to the above string with the value "Hello World"
 * allows us to show how the ziplist encodes small strings. We'll just show
 * the hex dump of the entry itself. Imagine the bytes as following the
 * entry that stores "5" in the ziplist above:
 *
 * [02] [0b] [48 65 6c 6c 6f 20 57 6f 72 6c 64]
 *
 * The first byte, 02, is the length of the previous entry. The next
 * byte represents the encoding in the pattern |00pppppp| that means
 * that the entry is a string of length <pppppp>, so 0B means that
 * an 11 bytes string follows. From the third byte (48) to the last (64)
 * there are just the ASCII characters for "Hello World".
 *
 * ----------------------------------------------------------------------------
 *
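The 15-byte example from the comment above can be decoded by hand with a few lines of C. This sketch handles only what that example needs, a 1-byte prevlen and the 4-bit immediate integer encoding; `zl_read_header` and `zl_decode_small_ints` are hypothetical helpers, not functions from ziplist.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers byte by byte, independent of host order. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint16_t read_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

struct zl_header { uint32_t zlbytes, zltail; uint16_t zllen; };

/* Parse the 10-byte header: <zlbytes> <zltail> <zllen>. */
struct zl_header zl_read_header(const unsigned char *zl) {
    assert(zl != NULL);
    struct zl_header h = { read_le32(zl), read_le32(zl + 4), read_le16(zl + 8) };
    return h;
}

/* Decode entries that use a 1-byte prevlen and the 4-bit immediate
 * encoding |1111xxxx| (xxxx from 0001 to 1101, value = xxxx - 1).
 * Returns the number of values written to `out`, or -1 if an entry uses
 * any other encoding or `out` (capacity `cap`) is full. */
int zl_decode_small_ints(const unsigned char *zl, int *out, size_t cap) {
    const unsigned char *p = zl + 10; /* skip the header */
    size_t n = 0;
    while (*p != 0xFF) {              /* 0xFF marks the end of the ziplist */
        if (p[0] >= 254) return -1;   /* 5-byte prevlen is not handled */
        unsigned char enc = p[1];
        if (enc < 0xF1 || enc > 0xFD || n == cap) return -1;
        out[n++] = (enc & 0x0F) - 1;
        p += 2;                       /* prevlen byte + encoding byte */
    }
    return (int)n;
}
```

Feeding it the bytes `0f 00 00 00 0c 00 00 00 02 00 00 f3 02 f6 ff` yields a header of 15 total bytes, tail offset 12, and 2 entries, with the entries decoding to 2 and 5 as the comment explains.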

/* Decoded representation of a ziplist entry. */
/* We use this function to receive information about a ziplist entry.
 * Note that this is not how the data is actually encoded, is just what we
 * get filled by a function in order to operate more easily. */
typedef struct zlentry {
    unsigned int prevrawlensize; /* Bytes used to encode the previous entry len*/
    unsigned int prevrawlen;     /* Previous entry len. */
    unsigned int lensize;        /* Bytes used to encode this entry type/len.
                                    For example strings have a 1, 2 or 5 bytes
                                    header. Integers always use a single byte.*/
    unsigned int len;            /* Bytes used to represent the actual entry.
                                    For strings this is just the string length
                                    while for integers it is 1, 2, 3, 4, 8 or
                                    0 (for 4 bit immediate) depending on the
                                    number range. */
    unsigned int headersize;     /* prevrawlensize + lensize. */
    unsigned char encoding;      /* Set to ZIP_STR_* or ZIP_INT_* depending on
                                    the entry encoding. However for 4 bits
                                    immediate integers this can assume a range
                                    of values and must be range-checked. */
    unsigned char *p;            /* Pointer to the very start of the entry, that
                                    is, this points to prev-entry-len field. */
} zlentry;

/* Create a new empty ziplist. */
unsigned char *ziplistNew(void) {
    /* Allocation size: the header plus the end marker. */
    unsigned int bytes = ZIPLIST_HEADER_SIZE+ZIPLIST_END_SIZE;
    unsigned char *zl = zmalloc(bytes);
    ZIPLIST_BYTES(zl) = intrev32ifbe(bytes);
    ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE);
    ZIPLIST_LENGTH(zl) = 0;
    zl[bytes-1] = ZIP_END;
    return zl;
}


/* The size of a ziplist header: two 32 bit integers for the total
 * bytes count and last item offset. One 16 bit integer for the number
 * of items field. */
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t)*2+sizeof(uint16_t))

/* Size of the "end of ziplist" entry. Just one byte. */
#define ZIPLIST_END_SIZE        (sizeof(uint8_t))
/* Return total bytes a ziplist is composed of. */
#define ZIPLIST_BYTES(zl)       (*((uint32_t*)(zl)))

/* Return the offset of the last item inside the ziplist. */
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t))))

/* Return the length of a ziplist, or UINT16_MAX if the length cannot be
 * determined without scanning the whole ziplist. */
#define ZIPLIST_LENGTH(zl)      (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))

quicklist: a doubly linked list whose nodes are ziplists. The Redis list type is implemented on top of quicklist.

/* A node of the quicklist. */
/* quicklistNode is a 32 byte struct describing a ziplist for a quicklist.
 * We use bit fields keep the quicklistNode at 32 bytes.
 * count: 16 bits, max 65536 (max zl bytes is 65k, so max count actually < 32k).
 * encoding: 2 bits, RAW=1, LZF=2.
 * container: 2 bits, NONE=1, ZIPLIST=2.
 * recompress: 1 bit, bool, true if node is temporarily decompressed for usage.
 * attempted_compress: 1 bit, boolean, used for verifying during testing.
 * extra: 10 bits, free for future use; pads out the remainder of 32 bits */
typedef struct quicklistNode {
    struct quicklistNode *prev;
    struct quicklistNode *next;
    unsigned char *zl;
    unsigned int sz;             /* ziplist size in bytes */
    unsigned int count : 16;     /* count of items in ziplist */
    unsigned int encoding : 2;   /* RAW==1 or LZF==2 */
    unsigned int container : 2;  /* NONE==1 or ZIPLIST==2 */
    unsigned int recompress : 1; /* was this node previous compressed? */
    unsigned int attempted_compress : 1; /* node can't compress; too small */
    unsigned int extra : 10; /* more bits to steal for future usage */
} quicklistNode;


/* quicklist is a 40 byte struct (on 64-bit systems) describing a quicklist.
 * 'count' is the number of total entries.
 * 'len' is the number of quicklist nodes.
 * 'compress' is: -1 if compression disabled, otherwise it's the number
 *                of quicklistNodes to leave uncompressed at ends of quicklist.
 * 'fill' is the user-requested (or default) fill factor.
 * 'bookmarks' are an optional feature that is used by realloc this struct,
 *      so that they don't consume memory when not used. */
typedef struct quicklist {
    quicklistNode *head;
    quicklistNode *tail;
    unsigned long count;        /* total count of all entries in all ziplists */
    unsigned long len;          /* number of quicklistNodes */
    int fill : QL_FILL_BITS;              /* fill factor for individual nodes */
    unsigned int compress : QL_COMP_BITS; /* depth of end nodes not to compress;0=off */
    unsigned int bookmark_count: QL_BM_BITS;
    quicklistBookmark bookmarks[];
} quicklist;

sds: the data structure behind the Redis string type. It comes in several header variants (sdshdr5, sdshdr8, sdshdr16, sdshdr32, and sdshdr64), each suited to a different case.

typedef char *sds;

/* Note: sdshdr5 is never used, we just access the flags byte directly.
 * However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

#define SDS_TYPE_5  0
#define SDS_TYPE_8  1
#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4
#define SDS_TYPE_MASK 7
#define SDS_TYPE_BITS 3
#define SDS_HDR_VAR(T,s) struct sdshdr##T *sh = (void*)((s)-(sizeof(struct sdshdr##T)));
#define SDS_HDR(T,s) ((struct sdshdr##T *)((s)-(sizeof(struct sdshdr##T))))
#define SDS_TYPE_5_LEN(f) ((f)>>SDS_TYPE_BITS)
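The header macros above rely on one trick: an sds is a plain `char *` that points at buf, and the header sits immediately before it in memory. A minimal sketch of that layout with a single header size follows; `hdr8`, `mini_sds_new`, `mini_sds_len`, and `mini_sds_free` are simplified stand-ins written for this example, not the sds API, and like the original it assumes a compiler that supports `__attribute__((__packed__))`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as sdshdr8: packed, so the header is exactly 3 bytes. */
struct __attribute__((__packed__)) hdr8 {
    uint8_t len;         /* used bytes */
    uint8_t alloc;       /* capacity, excluding header and terminator */
    unsigned char flags; /* 3 low bits hold the header type */
    char buf[];
};

#define TYPE_8 1
#define TYPE_MASK 7

/* Allocate header + payload + NUL and return a pointer to the payload,
 * so the result can be passed to functions expecting a C string. */
char *mini_sds_new(const char *init) {
    size_t len = strlen(init);
    if (len > UINT8_MAX) return NULL; /* only the 8-bit header is sketched */
    struct hdr8 *sh = malloc(sizeof(*sh) + len + 1);
    if (sh == NULL) return NULL;
    sh->len = (uint8_t)len;
    sh->alloc = (uint8_t)len;
    sh->flags = TYPE_8;
    memcpy(sh->buf, init, len + 1);
    return sh->buf;
}

/* O(1) length: the flags byte is at s[-1], and the header is found by
 * stepping back sizeof(struct hdr8) bytes from the string pointer. */
size_t mini_sds_len(const char *s) {
    unsigned char flags = (unsigned char)s[-1];
    assert((flags & TYPE_MASK) == TYPE_8);
    const struct hdr8 *sh = (const void *)(s - sizeof(struct hdr8));
    return sh->len;
}

void mini_sds_free(char *s) {
    if (s != NULL) free(s - sizeof(struct hdr8));
}
```

Because the length is stored in the header, it is available without scanning for the terminating NUL, and the string may contain embedded zero bytes.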


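/* Function that creates a string object: strings of at most 44 bytes are
 * created as embedded strings (EMBSTR), longer ones as raw strings. The
 * threshold is chosen around a 64-byte allocation. */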
/* Create a string object with EMBSTR encoding if it is smaller than
 * OBJ_ENCODING_EMBSTR_SIZE_LIMIT, otherwise the RAW encoding is
 * used.
 *
 * The current limit of 44 is chosen so that the biggest string object
 * we allocate as EMBSTR will still fit into the 64 byte arena of jemalloc. */
#define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44
robj *createStringObject(const char *ptr, size_t len) {
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)
        return createEmbeddedStringObject(ptr,len);
    else
        return createRawStringObject(ptr,len);
}
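The arithmetic behind the limit of 44 can be spelled out. The sizes below are assumptions for a typical 64-bit build (a 16-byte redisObject and the 3-byte packed sdshdr8 shown earlier), and `embstr_alloc_size` and `uses_embstr` are hypothetical helpers for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes on a typical 64-bit build (not queried from Redis):
 * 16-byte redisObject, 3-byte sdshdr8 header, 1-byte NUL terminator. */
enum { ROBJ_SIZE = 16, SDSHDR8_SIZE = 3, NUL_SIZE = 1, EMBSTR_LIMIT = 44 };

/* Bytes needed to hold object, sds header, and string in one allocation. */
size_t embstr_alloc_size(size_t len) {
    assert(len <= (size_t)-1 - (ROBJ_SIZE + SDSHDR8_SIZE + NUL_SIZE));
    return ROBJ_SIZE + SDSHDR8_SIZE + len + NUL_SIZE;
}

/* 1 if a string of `len` bytes would use the embedded encoding. */
int uses_embstr(size_t len) {
    return len <= EMBSTR_LIMIT;
}
```

Under these assumptions a 44-byte string needs 16 + 3 + 44 + 1 = 64 bytes, which is exactly the 64-byte arena mentioned in the comment; one more byte would no longer fit.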