Redis源码分析

目录

1.Redis关键配置文件(redis.conf)

2.Redis哨兵配置文件(sentinel.conf)

3.Redis启动流程

4.Redis关键数据结构

5.参考文章


1.Redis关键配置文件(redis.conf)

redis.conf:该配置文件为redis的主要配置文件,可以在当前的配置文件中通过include包含多个子配置文件。该配置文件中包含的重点配置说明如下:

databases 16 该配置设置了Redis数据库的数量,默认的数据库是DB 0,如果制定数据库,需要在连接redis的时候指定数据库编号。

save 900 1 当DB的数据发生了变化的时候,隔多久将数据写入到磁盘上。前面的配置含义:after 900 sec (15 min) if at least 1 key changed。移除该注视表示禁用save能力。注意在《性能之巅》书中提到,该命令可能会导致间歇性的IO飙升,服务延时,如果涉及大量的大key-value频繁变更时需要重点关注一下。

replicaof <masterip> <masterport> 副本配置,redis的replication是将replication实例作为Redis Server的一个复制,首先是异步,可以配置指定的副本数量的前提下才接收写操作。

maxclients 10000 设置同时能够连接到Redis Server的最大客户端数量,默认10000. 

maxmemory-policy noeviction:内存不足时,移除现有的redis的key的策略,默认不驱逐,可选择的范围如下:

# volatile-lru -> Evict using approximated LRU, only keys with an expire set.
# allkeys-lru -> Evict any key using approximated LRU.
# volatile-lfu -> Evict using approximated LFU, only keys with an expire set.
# allkeys-lfu -> Evict any key using approximated LFU.
# volatile-random -> Remove a random key having an expire set.
# allkeys-random -> Remove a random key, any key.
# volatile-ttl -> Remove the key with the nearest expire time (minor TTL)
# noeviction -> Don't evict anything, just return an error on write operations.
#
# LRU means Least Recently Used
# LFU means Least Frequently Used
#
# Both LRU, LFU and volatile-ttl are implemented using approximated
# randomized algorithms.

# Note: with any of the above policies, Redis will return an error on write
#       operations, when there are no suitable keys for eviction.
#
#       At the date of writing these commands are: set setnx setex append
#       incr decr rpush lpush rpushx lpushx linsert lset rpoplpush sadd
#       sinter sinterstore sunion sunionstore sdiff sdiffstore zadd zincrby
#       zunionstore zinterstore hset hsetnx hmset hincrby incrby decrby
#       getset mset msetnx exec sort

# The default is:
#
# maxmemory-policy noeviction

lazyfree-lazy-eviction/expire/server-del no:该配置指定驱逐/过期/删除对应key的策略,支持阻塞式的删除,也支持异步处理的方式。

io-threads 4:Redis通常是单线程的,但是在如UNLINK,慢IO访问和其他的一些处理上是支持多线程的。现在同样支持处理客户端的socket读和写采用不同的IO线程,如写慢的时候读可以采用管道方式充分利用每个核加速提升Redis的性能,通常使用IO多线程能够提升Redis两倍的性能。默认情况下,多线程是禁用的,我们建议是在服务器有4个或者更多的核的时候才启用该配置。例如,如果你有一个4核的服务器,可以采用配置2或者3个IO线程; 如果有8核,建议配置6个IO线程。默认我们只针对写操作才启用多线程处理,如果需要启用读操作支持多线程,需要配置 io-threads-do-reads yes才能启用,通常情况下读操作没有太多的提升。当前配置不能在运行态配置,并且在SSL启用的时候,该配置无效。

lua-time-limit 5000:用于配置LUA脚本的执行时间,单位是毫秒。在较长时间的脚本执行过程中,只有SCRIPT KILL和SHUTDOWN NOSAVE命令两个命令可以用。

cluster-enabled yes:通常Redis实例独立运行,并不作为Redis集群的一部分。只有通过该配置启用redis实例,才能作为集群的一部分。每个集群都有自己的集群配置文件,可以通过配置项 cluster-config-file nodes-6379.conf 指定集群的配置,如获取集群节点的更新情况。Redis集群支持Docker的,最好是能够指定IP。

高级配置:配置hash数据结构的threshold如hash-max-ziplist-entries 512, hash-max-ziplist-value 64等。配置list数据结构最大的元素个数,如list-max-ziplist-size -2 (-2: max size: 8 Kb)配置,可按需调整。配置set的元素个数,如set-max-intset-entries 512。配置sort set数据结构的zset-max-ziplist-entries 128,zset-max-ziplist-value 64也是一样。配置HyperLogLog数据结构的尺寸限制,如hll-sparse-max-bytes 3000。配置stream数据结构的单个元素大小限制,如stream-node-max-bytes 4096,包含元素的数量限制,如stream-node-max-entries 100。

activerehashing yes:是否开启数据库的rehash操作,Active rehashing uses 1 millisecond every 100 milliseconds of CPU time in order to help rehashing the main Redis hash table (the one mapping top-level keys to values).

loadmodule /path/to/my_module.so:以往我们想给 Redis 加个功能或类似事务的东西只能用 Lua 脚本,这个东西没有实现真正的原子性,另外也无法使用底层的 API ,实质上比单纯的命令脚本提升有限。模块 Module 可以动态的载入和卸载,可以实现底层的数据结构也可以调用高层的指令,这一切都只需要包含头文件 redismodule.h ,和 Redis 本身一样简洁优雅。

2.Redis哨兵配置文件(sentinel.conf)

Redis采用哨兵模式对集群情况下的主从节点管理,并监控集群中各个节点的运行状态并及时上报。哨兵模式是一种特殊的模式,首先Redis提供了哨兵的命令,哨兵是一个独立的进程,作为进程,它会独立运行。其原理是哨兵通过发送命令,等待Redis服务器响应,从而监控运行的多个Redis实例。然而一个哨兵进程对Redis服务器进行监控,可能会出现问题,为此,我们可以使用多个哨兵进行监控。各个哨兵之间还会进行监控,这样就形成了多哨兵模式。用文字描述一下故障切换(failover)的过程。假设主服务器宕机,哨兵1先检测到这个结果,系统并不会马上进行failover过程,仅仅是哨兵1主观的认为主服务器不可用,这个现象成为主观下线。当后面的哨兵也检测到主服务器不可用,并且数量达到一定值时,那么哨兵之间就会进行一次投票,投票的结果由一个哨兵发起,进行failover操作。切换成功后,就会通过发布订阅模式,让各个哨兵把自己监控的从服务器实现切换主机,这个过程称为客观下线。这样对于客户端而言,一切都是透明的。(参考文章:https://www.jianshu.com/p/06ab9daf921d

sentinel monitor mymaster 127.0.0.1 6379 2: 使用Sentinel监控master节点的状态。Tells Sentinel to monitor this master, and to consider it in O_DOWN (Objectively Down) state only if at least <quorum> sentinels agree.

sentinel auth-pass <master-name> <password>: 设置鉴权的密码。Set the password to use to authenticate with the master and replicas. Useful if there is a password set in the Redis instances to monitor.

3.Redis启动流程

redis的入口文件为server.c,入口函数为main函数。redis的启动流程步骤如下:

* `initServerConfig()` setups the default values of the `server` structure. 
* `initServer()` allocates the data structures needed to operate, setup the listening socket, and so forth.
* `aeMain()` starts the event loop which listens for new connections.

There are two special functions called periodically by the event loop:

1. `serverCron()` is called periodically (according to `server.hz` frequency), and performs tasks that must be performed from time to time, like checking for timedout clients.
2. `beforeSleep()` is called every time the event loop fired, Redis served a few requests, and is returning back into the event loop.

Inside server.c you can find code that handles other vital things of the Redis server:

* `call()` is used in order to call a given command in the context of a given client.
* `activeExpireCycle()` handles eviction of keys with a time to live set via the `EXPIRE` command.
* `freeMemoryIfNeeded()` is called when a new write command should be performed but Redis is out of memory according to the `maxmemory` directive.
* The global variable `redisCommandTable` defines all the Redis commands, specifying the name of the command, the function implementing the command, the number of arguments required, and other properties of each command.
server.c文件中main函数作为redis server端入口源码如下:

int main(int argc, char **argv) {
    struct timeval tv;
    int j;
#在预编译阶段,设置宏定义,运行阶段根据编译设置的宏进行运行代码段选择,这里用于是否是测试的校验
#ifdef REDIS_TEST
    if (argc == 3 && !strcasecmp(argv[1], "test")) {
        if (!strcasecmp(argv[2], "ziplist")) {
            return ziplistTest(argc, argv);
        } else if (!strcasecmp(argv[2], "quicklist")) {
            quicklistTest(argc, argv);
        } else if (!strcasecmp(argv[2], "intset")) {
            return intsetTest(argc, argv);
        } else if (!strcasecmp(argv[2], "zipmap")) {
            return zipmapTest(argc, argv);
        } else if (!strcasecmp(argv[2], "sha1test")) {
            return sha1Test(argc, argv);
        } else if (!strcasecmp(argv[2], "util")) {
            return utilTest(argc, argv);
        } else if (!strcasecmp(argv[2], "endianconv")) {
            return endianconvTest(argc, argv);
        } else if (!strcasecmp(argv[2], "crc64")) {
            return crc64Test(argc, argv);
        } else if (!strcasecmp(argv[2], "zmalloc")) {
            return zmalloc_test(argc, argv);
        }

        return -1; /* test not found */
    }
#endif

#根据初始化宏定义,按需初始化操作
    /* We need to initialize our libraries, and the server configuration. */
#ifdef INIT_SETPROCTITLE_REPLACEMENT
    spt_init(argc, argv);
#endif
    setlocale(LC_COLLATE,"");
    tzset(); /* Populates 'timezone' global. */
#设置内存溢出处理方式
    zmalloc_set_oom_handler(redisOutOfMemoryHandler);
    srand(time(NULL)^getpid());
    srandom(time(NULL)^getpid());
    gettimeofday(&tv,NULL);
    crc64_init();

    uint8_t hashseed[16];
    getRandomBytes(hashseed,sizeof(hashseed));
    dictSetHashFunctionSeed(hashseed);
    server.sentinel_mode = checkForSentinelMode(argc,argv);
#初始化服务端配置
    initServerConfig();
    ACLInit(); /* The ACL subsystem must be initialized ASAP because the
                  basic networking code and client creation depends on it. */
#初始化redis支持的module
    moduleInitModulesSystem();
    tlsInit();

    /* Store the executable path and arguments in a safe place in order
     * to be able to restart the server later. */
    server.executable = getAbsolutePath(argv[0]);
    server.exec_argv = zmalloc(sizeof(char*)*(argc+1));
    server.exec_argv[argc] = NULL;
    for (j = 0; j < argc; j++) server.exec_argv[j] = zstrdup(argv[j]);

    /* We need to init sentinel right now as parsing the configuration file
     * in sentinel mode will have the effect of populating the sentinel
     * data structures with master nodes to monitor. */
#初始化哨兵相关配置
    if (server.sentinel_mode) {
        initSentinelConfig();
        initSentinel();
    }

    /* Check if we need to start in redis-check-rdb/aof mode. We just execute
     * the program main. However the program is part of the Redis executable
     * so that we can easily execute an RDB check on loading errors. */
    if (strstr(argv[0],"redis-check-rdb") != NULL)
        redis_check_rdb_main(argc,argv,NULL);
    else if (strstr(argv[0],"redis-check-aof") != NULL)
        redis_check_aof_main(argc,argv);

#参数检查,并对参数进行相关解析
    if (argc >= 2) {
#第一个参数是执行程序本身,第二个参数才传递的真正参数,即j=1表示开始获取参数并解析
        j = 1; /* First option to parse in argv[] */
        sds options = sdsempty();
        char *configfile = NULL;

#基本参数解析
        /* Handle special options --help and --version */
        if (strcmp(argv[1], "-v") == 0 ||
            strcmp(argv[1], "--version") == 0) version();
        if (strcmp(argv[1], "--help") == 0 ||
            strcmp(argv[1], "-h") == 0) usage();
        if (strcmp(argv[1], "--test-memory") == 0) {
            if (argc == 3) {
                memtest(atoi(argv[2]),50);
                exit(0);
            } else {
                fprintf(stderr,"Please specify the amount of memory to test in megabytes.\n");
                fprintf(stderr,"Example: ./redis-server --test-memory 4096\n\n");
                exit(1);
            }
        }

#检查是不是配置文件,如果不是‘-’或者‘--’开始的表示是配置文件
        /* First argument is the config file name? */
        if (argv[j][0] != '-' || argv[j][1] != '-') {
#对配置文件解析,如redis-server ./redis.conf 启动,获取./redis.conf绝对路径并进行下一个参数解析处理
            configfile = argv[j];
            server.configfile = getAbsolutePath(configfile);
            /* Replace the config file in server.exec_argv with
             * its absolute path. */
            zfree(server.exec_argv[j]);
            server.exec_argv[j] = zstrdup(server.configfile);
            j++;
        }

#解析除前面配置文件参数之外的其他参数,然后将参数添加到配置文件redis.conf中
        /* All the other options are parsed and conceptually appended to the
         * configuration file. For instance --port 6380 will generate the
         * string "port 6380\n" to be parsed after the actual file name
         * is parsed, if any. */
        while(j != argc) {
            if (argv[j][0] == '-' && argv[j][1] == '-') {
                /* Option name */
                if (!strcmp(argv[j], "--check-rdb")) {
                    /* Argument has no options, need to skip for parsing. */
                    j++;
                    continue;
                }
                if (sdslen(options)) options = sdscat(options,"\n");
                options = sdscat(options,argv[j]+2);
                options = sdscat(options," ");
            } else {
                /* Option argument */
                options = sdscatrepr(options,argv[j],strlen(argv[j]));
                options = sdscat(options," ");
            }
            j++;
        }

#如果是运行在哨兵模式,需要指定哨兵配置文件
        if (server.sentinel_mode && configfile && *configfile == '-') {
            serverLog(LL_WARNING,
                "Sentinel config from STDIN not allowed.");
            serverLog(LL_WARNING,
                "Sentinel needs config file on disk to save state.  Exiting...");
            exit(1);
        }

#加载redis.conf配置文件,并添加启动命令中的其他参数,覆盖配置文件中的配置参数,配置文件解析的函数是loadServerConfigFromString,具体的每个配置项的解析可参考该方法
#采用fopen函数打开配置文件,用fgets函数读取配置文件内容,loadServerConfigFromString解析配置并映射到全局的server结构体上
        loadServerConfig(configfile,options);
        sdsfree(options);
    }

#记录启动日志,判断是否为哨兵模式启动,给出启动命令示例
    serverLog(LL_WARNING, "oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo");
    serverLog(LL_WARNING,
        "Redis version=%s, bits=%d, commit=%s, modified=%d, pid=%d, just started",
            REDIS_VERSION,
            (sizeof(long) == 8) ? 64 : 32,
            redisGitSHA1(),
            strtol(redisGitDirty(),NULL,10) > 0,
            (int)getpid());

    if (argc == 1) {
        serverLog(LL_WARNING, "Warning: no config file specified, using the default config. In order to specify a config file use %s /path/to/%s.conf", argv[0], server.sentinel_mode ? "sentinel" : "redis");
    } else {
        serverLog(LL_WARNING, "Configuration loaded");
    }

#解析redis的守护进程管理模式
    server.supervised = redisIsSupervised(server.supervised_mode);
    int background = server.daemonize && !server.supervised;
#如果redis server是作为常驻进程运行,通过fork函数创建子进程继续运行,然后父进程exit(0)
    if (background) daemonize();
    readOOMScoreAdj();

#完成配置加载之后,初始化服务器相关资源
    initServer();
    if (background || server.pidfile) createPidFile();
    redisSetProcTitle(argv[0]);
    redisAsciiArt();
    checkTcpBacklogSettings();

#非哨兵模式,执行if条件内容
    if (!server.sentinel_mode) {
        /* Things not needed when running in Sentinel mode. */
        serverLog(LL_WARNING,"Server initialized");
    #ifdef __linux__
        linuxMemoryWarnings();
    #endif
        moduleInitModulesSystemLast();
#加载各个module内容
        moduleLoadFromQueue();
        ACLLoadUsersAtStartup();
#对Server端做后续启动加载,如BIO的多线程创建以及分配,初始化IO线程(线程最大限制在128)
        InitServerLast();
#从磁盘上加载数据库数据
        loadDataFromDisk();
        if (server.cluster_enabled) {
            if (verifyClusterConfigWithData() == C_ERR) {
                serverLog(LL_WARNING,
                    "You can't have keys in a DB different than DB 0 when in "
                    "Cluster mode. Exiting.");
                exit(1);
            }
        }
        if (server.ipfd_count > 0 || server.tlsfd_count > 0)
            serverLog(LL_NOTICE,"Ready to accept connections");
        if (server.sofd > 0)
            serverLog(LL_NOTICE,"The server is now ready to accept connections at %s", server.unixsocket);
        if (server.supervised_mode == SUPERVISED_SYSTEMD) {
            if (!server.masterhost) {
                redisCommunicateSystemd("STATUS=Ready to accept connections\n");
                redisCommunicateSystemd("READY=1\n");
            } else {
                redisCommunicateSystemd("STATUS=Waiting for MASTER <-> REPLICA sync\n");
            }
        }
    } else {
        InitServerLast();
        sentinelIsRunning();
        if (server.supervised_mode == SUPERVISED_SYSTEMD) {
            redisCommunicateSystemd("STATUS=Ready to accept connections\n");
            redisCommunicateSystemd("READY=1\n");
        }
    }

    /* Warning the user about suspicious maxmemory setting. */
    if (server.maxmemory > 0 && server.maxmemory < 1024*1024) {
        serverLog(LL_WARNING,"WARNING: You specified a maxmemory value that is less than 1MB (current value is %llu bytes). Are you sure this is what you really want?", server.maxmemory);
    }

    redisSetCpuAffinity(server.server_cpulist);
    setOOMScoreAdj(-1);

#监听事件,并做事件处理
    aeMain(server.el);
    aeDeleteEventLoop(server.el);
    return 0;
}

在Redis的server文件server.c文件中,重要的变量redisCommandTable维护了redis的各个命令对应的处理函数,当redis服务端收到对应的命令事件时,会调用对应的函数进行解析处理。

4.Redis关键数据结构

Redis关键数据结构有:

redisObject:所有的Redis基础数据对象,会封装转换为redisObject对象,该对象为对底层的数据对象的封装

#redis对基础类型做了封装,统一转化为该对象,定义别名robj
typedef struct redisObject {
    unsigned type:4;    #对象类型信息,如REDIS_STRING,REDIS_LIST等
    unsigned encoding:4; #表示ptr指针指向的数据结构的对象的编码,如REDIS_ENCODING_INT
    unsigned lru:LRU_BITS; #表示对象最后一次被程序访问的时间
                            /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;    #引用计数,主要是考虑自动内存回收机制使用的
    void *ptr;   #指向了底层的数据类型的对象的指针
} robj;

redisDb:redis是支持多个数据库的,默认16个数据库,这里一个数据库采用redisDb结构表示。

/* Redis database representation. There are multiple databases identified
 * by integers from 0 (the default database) up to the max configured
 * database. The database number is the 'id' field in the structure. */
typedef struct redisDb {
    dict *dict;                 /* The keyspace for this DB 数据库的key采用dict数据结构表示*/
    dict *expires;              /* Timeout of keys with a timeout set 有失效时间的key记录在该集合 */
    dict *blocking_keys;        /* Keys with clients waiting for data (BLPOP)*/
    dict *ready_keys;           /* Blocked keys that received a PUSH */
    dict *watched_keys;         /* WATCHED keys for MULTI/EXEC CAS */
    int id;                     /* Database ID 数据库的ID标示*/ 
    long long avg_ttl;          /* Average TTL, just for stats */
    unsigned long expires_cursor; /* Cursor of the active expire cycle. */
    list *defrag_later;         /* List of key names to attempt to defrag one by one, gradually. */
} redisDb;

zskiplistNode:Redis里面大量使用了zskiplist数据结构,该数据结构list的每一个Node节点的类型为该类型,该类型是一个链表结构体。

zskiplist:跳跃表数据结构。跳跃表是一种有序数据结构,它通过在每个结点中维持多个指向其他结点的指针,从而达到快速访问其他结点的目的。大多数情况下,跳跃表的效率的平衡树不相上下, 并且跳跃表的实现更加简单。Redis只在两个地方使用了跳跃表:有序集合键和集群结点中用作内部数据结构。

/* ZSETs use a specialized version of Skiplists */
typedef struct zskiplistNode {
    sds ele;
    double score;
    struct zskiplistNode *backward;
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned long span;
    } level[];
} zskiplistNode;

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;
    int level;
} zskiplist;

graphviz-8fc5de396a5b52c3d0b1991a1e09558ad055dd86

关于跳跃表图片来自:https://www.cnblogs.com/yinbiao/p/11238374.html

在这里插入图片描述

图片源自:https://blog.csdn.net/zy450271923/article/details/106970148/

跳跃表就像是上图一样的一个多层的链表,如果查询46的话。其步骤是:
(1)查询L4层,查询55,需要查询1次
(2)查询L3层,查询–>21–>55,需要查询2次
(3)查询L2层,查询–>37–>55,需要查询2次
(4)查询L1层,查询–>46,查询1次,找到结果

跳跃表就好像每两个元素抽取一个元素放到上一层,这样一次叠加,就形成了多层的链表。上一层的元素个数是下一层元素个数的1/2,所以查询的时候就类似二分查找。
这种方法类似于二分查找的方法,所以跳跃表的查找的时间复杂度为O(logN)。跳跃表每个节点包含两个指针,一个指向同一链表中的下一个元素(next),一个指向下面一层的元素(down)。

zset:redis的有序集合,是基于跳跃表来实现。

typedef struct zset {
    dict *dict;
    zskiplist *zsl;
} zset;

dict:作为整个数据库的数据结构,以及hash数据结构。整个redis的数据库就是一个大的dict对象。dict关键点是rehash的过程,首先dict的设计上是采用两张hash table,未扩容的时候,只是用其中一张hash table,扩容的时候,会每一次将部分旧哈希表的keys移动到新的哈希表中。然后,扩容期间查找的时候,先从旧的哈希表中查找,然后从新的哈希表中查找。扩容完成之后,会将扩容完成的新的哈希表设置未旧的哈希表(减少查找过程中访问的次数)。整个扩容期间,会置位rehashidx,当不为-1的时候标示正在rehash操作。

#dict的元素对象,哈希桶的链表入口
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

#dict类型包含的函数指针,支持哈希数据结构操作相关方法,如哈希函数,key,value相关操作函数
typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

#哈希表?
/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
    dictEntry **table; #这里是一个二级指针,说明是指向dictEntry入口的指针,难道hash的入口是根据地址的加减计算调整?这里是一个数组,从dict.c文件中_dictKeyIndex函数中可以看得出来。
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

#dict数据结构
typedef struct dict {
    dictType *type;  #支持哪些操作类型
    void *privdata;
    dictht ht[2];    #dict本身包含两个哈希表,一个用于主的,另一个是扩容的时候用到
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */ #标示是否正在重新哈希
    unsigned long iterators; /* number of iterators currently running */
} dict;

redisServer:redis的服务端整个服务的实例,作为服务控制的godclass。

ziplist:压缩表,作为list数据结构的结构体,采用zipentry作为每一个元素对象。压缩表ziplist作为hash,list,set等数据结构的基础结构。ziplist源码中的说明如下:

 * ----------------------------------------------------------------------------
 *
 * ZIPLIST OVERALL LAYOUT
 * ======================
 *
 * The general layout of the ziplist is as follows:
 *
 * <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
 *
 * NOTE: all fields are stored in little endian(小端存储), if not specified otherwise.
 *
 * <uint32_t zlbytes>(压缩表大小,占用字节数) is an unsigned integer to hold the number of bytes that
 * the ziplist occupies, including the four bytes of the zlbytes field itself.
 * This value needs to be stored to be able to resize the entire structure
 * without the need to traverse it first.
 *
 * <uint32_t zltail> (链表最后一个元素的偏移量)is the offset to the last entry in the list. This allows
 * a pop operation on the far side of the list without the need for full
 * traversal.
 *
 * <uint16_t zllen> (链表中元素entry的个数)is the number of entries. When there are more than
 * 2^16-2 entries, this value is set to 2^16-1 and we need to traverse the
 * entire list to know how many items it holds.
 *
 * <uint8_t zlend> (链表尾结点)is a special entry representing the end of the ziplist.
 * Is encoded as a single byte equal to 255. No other normal entry starts
 * with a byte set to the value of 255.
 *
 * ZIPLIST ENTRIES (压缩表的entry元素)
 * ===============
 *
 * Every entry in the ziplist is prefixed by metadata that contains two pieces
 * of information. First, the length of the previous entry is stored to be
 * able to traverse the list from back to front. Second, the entry encoding is
 * provided. It represents the entry type, integer or string, and in the case
 * of strings it also represents the length of the string payload.
 * So a complete entry is stored like this:
 *
 * <prevlen> <encoding> <entry-data>
 *
 * Sometimes the encoding represents the entry itself, like for small integers
 * as we'll see later. In such a case the <entry-data> part is missing, and we
 * could have just:
 *
 * <prevlen> <encoding>
 *
 * The length of the previous entry, <prevlen>, is encoded in the following way:
 * If this length is smaller than 254 bytes, it will only consume a single
 * byte representing the length as an unsinged 8 bit integer. When the length
 * is greater than or equal to 254, it will consume 5 bytes. The first byte is
 * set to 254 (FE) to indicate a larger value is following. The remaining 4
 * bytes take the length of the previous entry as value.
 *
 * So practically an entry is encoded in the following way:
 *
 * <prevlen from 0 to 253> <encoding> <entry>
 *
 * Or alternatively if the previous entry length is greater than 253 bytes
 * the following encoding is used:
 *
 * 0xFE <4 bytes unsigned little endian prevlen> <encoding> <entry>
 *
 * The encoding field of the entry depends on the content of the
 * entry. When the entry is a string, the first 2 bits of the encoding first
 * byte will hold the type of encoding used to store the length of the string,
 * followed by the actual length of the string. When the entry is an integer
 * the first 2 bits are both set to 1. The following 2 bits are used to specify
 * what kind of integer will be stored after this header. An overview of the
 * different types and encodings is as follows. The first byte is always enough
 * to determine the kind of entry.
 *
 * |00pppppp| - 1 byte
 *      String value with length less than or equal to 63 bytes (6 bits).
 *      "pppppp" represents the unsigned 6 bit length.
 * |01pppppp|qqqqqqqq| - 2 bytes
 *      String value with length less than or equal to 16383 bytes (14 bits).
 *      IMPORTANT: The 14 bit number is stored in big endian.
 * |10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes
 *      String value with length greater than or equal to 16384 bytes.
 *      Only the 4 bytes following the first byte represents the length
 *      up to 2^32-1. The 6 lower bits of the first byte are not used and
 *      are set to zero.
 *      IMPORTANT: The 32 bit number is stored in big endian.
 * |11000000| - 3 bytes
 *      Integer encoded as int16_t (2 bytes).
 * |11010000| - 5 bytes
 *      Integer encoded as int32_t (4 bytes).
 * |11100000| - 9 bytes
 *      Integer encoded as int64_t (8 bytes).
 * |11110000| - 4 bytes
 *      Integer encoded as 24 bit signed (3 bytes).
 * |11111110| - 2 bytes
 *      Integer encoded as 8 bit signed (1 byte).
 * |1111xxxx| - (with xxxx between 0000 and 1101) immediate 4 bit integer.
 *      Unsigned integer from 0 to 12. The encoded value is actually from
 *      1 to 13 because 0000 and 1111 can not be used, so 1 should be
 *      subtracted from the encoded 4 bit value to obtain the right value.
 * |11111111| - End of ziplist special entry.
 *
 * Like for the ziplist header, all the integers are represented in little
 * endian byte order, even when this code is compiled in big endian systems.
 *
 * EXAMPLES OF ACTUAL ZIPLISTS
 * ===========================
 *
 * The following is a ziplist containing the two elements representing
 * the strings "2" and "5". It is composed of 15 bytes, that we visually
 * split into sections:
 *
 *  [0f 00 00 00] [0c 00 00 00] [02 00] [00 f3] [02 f6] [ff]
 *        |             |          |       |       |     |
 *     zlbytes        zltail    entries   "2"     "5"   end
 *
 * The first 4 bytes represent the number 15, that is the number of bytes
 * the whole ziplist is composed of. The second 4 bytes are the offset
 * at which the last ziplist entry is found, that is 12, in fact the
 * last entry, that is "5", is at offset 12 inside the ziplist.
 * The next 16 bit integer represents the number of elements inside the
 * ziplist, its value is 2 since there are just two elements inside.
 * Finally "00 f3" is the first entry representing the number 2. It is
 * composed of the previous entry length, which is zero because this is
 * our first entry, and the byte F3 which corresponds to the encoding
 * |1111xxxx| with xxxx between 0001 and 1101. We need to remove the "F"
 * higher order bits 1111, and subtract 1 from the "3", so the entry value
 * is "2". The next entry has a prevlen of 02, since the first entry is
 * composed of exactly two bytes. The entry itself, F6, is encoded exactly
 * like the first entry, and 6-1 = 5, so the value of the entry is 5.
 * Finally the special entry FF signals the end of the ziplist.
 *
 * Adding another element to the above string with the value "Hello World"
 * allows us to show how the ziplist encodes small strings. We'll just show
 * the hex dump of the entry itself. Imagine the bytes as following the
 * entry that stores "5" in the ziplist above:
 *
 * [02] [0b] [48 65 6c 6c 6f 20 57 6f 72 6c 64]
 *
 * The first byte, 02, is the length of the previous entry. The next
 * byte represents the encoding in the pattern |00pppppp| that means
 * that the entry is a string of length <pppppp>, so 0B means that
 * an 11 bytes string follows. From the third byte (48) to the last (64)
 * there are just the ASCII characters for "Hello World".
 *
 * ----------------------------------------------------------------------------
 *
#ziplist的入口对象结构
/* We use this function to receive information about a ziplist entry.
 * Note that this is not how the data is actually encoded, is just what we
 * get filled by a function in order to operate more easily. */
typedef struct zlentry {
    unsigned int prevrawlensize; /* Bytes used to encode the previous entry len*/
    unsigned int prevrawlen;     /* Previous entry len. */
    unsigned int lensize;        /* Bytes used to encode this entry type/len.
                                    For example strings have a 1, 2 or 5 bytes
                                    header. Integers always use a single byte.*/
    unsigned int len;            /* Bytes used to represent the actual entry.
                                    For strings this is just the string length
                                    while for integers it is 1, 2, 3, 4, 8 or
                                    0 (for 4 bit immediate) depending on the
                                    number range. */
    unsigned int headersize;     /* prevrawlensize + lensize. */
    unsigned char encoding;      /* Set to ZIP_STR_* or ZIP_INT_* depending on
                                    the entry encoding. However for 4 bits
                                    immediate integers this can assume a range
                                    of values and must be range-checked. */
    unsigned char *p;            /* Pointer to the very start of the entry, that
                                    is, this points to prev-entry-len field. */
} zlentry;

#初始化ziplist
/* Create a new empty ziplist. */
unsigned char *ziplistNew(void) {
#分配大小
    unsigned int bytes = ZIPLIST_HEADER_SIZE+ZIPLIST_END_SIZE;
    unsigned char *zl = zmalloc(bytes);
    ZIPLIST_BYTES(zl) = intrev32ifbe(bytes);
    ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE);
    ZIPLIST_LENGTH(zl) = 0;
    zl[bytes-1] = ZIP_END;
    return zl;
}


/* The size of a ziplist header: two 32 bit integers for the total
 * bytes count and last item offset. One 16 bit integer for the number
 * of items field. */
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t)*2+sizeof(uint16_t))

/* Size of the "end of ziplist" entry. Just one byte. */
#define ZIPLIST_END_SIZE        (sizeof(uint8_t))
/* Return total bytes a ziplist is composed of. */
#define ZIPLIST_BYTES(zl)       (*((uint32_t*)(zl)))

/* Return the offset of the last item inside the ziplist. */
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t))))

/* Return the length of a ziplist, or UINT16_MAX if the length cannot be
 * determined without scanning the whole ziplist. */
#define ZIPLIST_LENGTH(zl)      (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))

quicklist: quicklist是基于ziplist压缩表实现的一个双向链表。redis的list数据结构的实现是基于quicklist实现的。

#quicklist的每一个Node节点
/* quicklistNode is a 32 byte struct describing a ziplist for a quicklist.
 * We use bit fields keep the quicklistNode at 32 bytes.
 * count: 16 bits, max 65536 (max zl bytes is 65k, so max count actually < 32k).
 * encoding: 2 bits, RAW=1, LZF=2.
 * container: 2 bits, NONE=1, ZIPLIST=2.
 * recompress: 1 bit, bool, true if node is temporarry decompressed for usage.
 * attempted_compress: 1 bit, boolean, used for verifying during testing.
 * extra: 10 bits, free for future use; pads out the remainder of 32 bits */
typedef struct quicklistNode {
    struct quicklistNode *prev;
    struct quicklistNode *next;
    unsigned char *zl;
    unsigned int sz;             /* ziplist size in bytes */
    unsigned int count : 16;     /* count of items in ziplist */
    unsigned int encoding : 2;   /* RAW==1 or LZF==2 */
    unsigned int container : 2;  /* NONE==1 or ZIPLIST==2 */
    unsigned int recompress : 1; /* was this node previous compressed? */
    unsigned int attempted_compress : 1; /* node can't compress; too small */
    unsigned int extra : 10; /* more bits to steal for future usage */
} quicklistNode;


/* quicklist is a 40 byte struct (on 64-bit systems) describing a quicklist.
 * 'count' is the number of total entries.
 * 'len' is the number of quicklist nodes.
 * 'compress' is: -1 if compression disabled, otherwise it's the number
 *                of quicklistNodes to leave uncompressed at ends of quicklist.
 * 'fill' is the user-requested (or default) fill factor.
 * 'bookmakrs are an optional feature that is used by realloc this struct,
 *      so that they don't consume memory when not used. */
typedef struct quicklist {
    quicklistNode *head;
    quicklistNode *tail;
    unsigned long count;        /* total count of all entries in all ziplists */
    unsigned long len;          /* number of quicklistNodes */
    int fill : QL_FILL_BITS;              /* fill factor for individual nodes */
    unsigned int compress : QL_COMP_BITS; /* depth of end nodes not to compress;0=off */
    unsigned int bookmark_count: QL_BM_BITS;
    quicklistBookmark bookmarks[];
} quicklist;

sds:sds数据结构是redis的string类型的数据结构,分为:sdshdr5,sdshdr8,sdshdr16,sdshdr32,sdshdr64等分别应用不同的场景。

typedef char *sds;

/* Note: sdshdr5 is never used, we just access the flags byte directly.
 * However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

#define SDS_TYPE_5  0
#define SDS_TYPE_8  1
#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4
#define SDS_TYPE_MASK 7
#define SDS_TYPE_BITS 3
#define SDS_HDR_VAR(T,s) struct sdshdr##T *sh = (void*)((s)-(sizeof(struct sdshdr##T)));
#define SDS_HDR(T,s) ((struct sdshdr##T *)((s)-(sizeof(struct sdshdr##T))))
#define SDS_TYPE_5_LEN(f) ((f)>>SDS_TYPE_BITS)


#创建String的函数,如果小于44创建内嵌的string,如果大于44则创建raw string,主要是以64个字节作为临界点
/* Create a string object with EMBSTR encoding if it is smaller than
 * OBJ_ENCODING_EMBSTR_SIZE_LIMIT, otherwise the RAW encoding is
 * used.
 *
 * The current limit of 44 is chosen so that the biggest string object
 * we allocate as EMBSTR will still fit into the 64 byte arena of jemalloc. */
#define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44
robj *createStringObject(const char *ptr, size_t len) {
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)
        return createEmbeddedStringObject(ptr,len);
    else
        return createRawStringObject(ptr,len);
}

5.参考文章

https://blog.csdn.net/zy450271923/article/details/106970148/

https://blog.csdn.net/zy450271923/article/details/106970148/

 

 

 

 

 

 

 

 

 

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值