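1. Key Redis configuration file (redis.conf)
redis.conf is the main Redis configuration file; it can pull in additional configuration files through the include directive. The most important settings are described below.
databases 16: sets the number of Redis databases. The default database is DB 0; to use a different one, select it by number on the connection (for example with SELECT <dbid>).
save 900 1: controls how often changed data is written to disk as a snapshot. This rule means "save after 900 seconds (15 minutes) if at least 1 key changed". Commenting out all save lines disables snapshotting. The book Systems Performance (《性能之巅》) notes that this setting can cause intermittent I/O spikes and service latency, so it deserves close attention when many large key-value pairs change frequently.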
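The save rule can be read as a (seconds, changes) pair that is checked periodically. The following is a minimal sketch of that check, not the Redis implementation: the struct save_rule type and the snapshot_due function are hypothetical names introduced for illustration, and the counters are passed in as plain arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical mirror of one "save <seconds> <changes>" rule. */
struct save_rule {
    time_t seconds;  /* minimum age of the last snapshot */
    long   changes;  /* minimum number of changes since that snapshot */
};

/* Returns 1 when at least one rule is satisfied, 0 otherwise.
 * dirty     - number of changes since the last successful snapshot
 * last_save - time of the last successful snapshot
 * now       - current time */
int snapshot_due(const struct save_rule *rules, size_t n,
                 long dirty, time_t last_save, time_t now) {
    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (dirty >= rules[i].changes &&
            now - last_save > rules[i].seconds)
            return 1;
    }
    return 0;
}
```

With the rules {900, 1}, {300, 10} and {60, 10000}, a single change triggers a snapshot only once more than 900 seconds have passed, while 10 changes trigger one after just over 300 seconds.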
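replicaof <masterip> <masterport>: replica configuration. A replica is a Redis instance that maintains a copy of another Redis server. Replication is asynchronous, but the master can be configured to accept writes only while a given number of replicas are connected.
maxclients 10000: the maximum number of clients that can be connected to the Redis server at the same time; the default is 10000.
maxmemory-policy noeviction: how Redis chooses keys to remove when memory runs out. The default is to evict nothing; the available policies are: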
# volatile-lru -> Evict using approximated LRU, only keys with an expire set.
# allkeys-lru -> Evict any key using approximated LRU.
# volatile-lfu -> Evict using approximated LFU, only keys with an expire set.
# allkeys-lfu -> Evict any key using approximated LFU.
# volatile-random -> Remove a random key having an expire set.
# allkeys-random -> Remove a random key, any key.
# volatile-ttl -> Remove the key with the nearest expire time (minor TTL)
# noeviction -> Don't evict anything, just return an error on write operations.
#
# LRU means Least Recently Used
# LFU means Least Frequently Used
#
# Both LRU, LFU and volatile-ttl are implemented using approximated
# randomized algorithms.
# Note: with any of the above policies, Redis will return an error on write
# operations, when there are no suitable keys for eviction.
#
# At the date of writing these commands are: set setnx setex append
# incr decr rpush lpush rpushx lpushx linsert lset rpoplpush sadd
# sinter sinterstore sunion sunionstore sdiff sdiffstore zadd zincrby
# zunionstore zinterstore hset hsetnx hmset hincrby incrby decrby
# getset mset msetnx exec sort
# The default is:
#
# maxmemory-policy noeviction
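The "approximated LRU" wording above means that Redis inspects a small sample of keys rather than keeping a full LRU ordering. The sketch below illustrates the idea under simplifying assumptions: the struct key_info type and pick_lru_victim function are hypothetical, the sample is supplied by the caller instead of being drawn at random, and the real server's bookkeeping is considerably more elaborate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key record: only the fields this sketch needs. */
struct key_info {
    unsigned last_access;  /* clock value of the most recent access */
    int has_expire;        /* 1 if the key has an expire set */
};

/* Pick an eviction victim among `nsample` sampled key indexes.
 * volatile_only = 1 models volatile-lru, 0 models allkeys-lru.
 * Returns the index of the sampled key that has been idle the longest,
 * or -1 when no sampled key is eligible (the situation in which a
 * write command would be answered with an error). */
long pick_lru_victim(const struct key_info *keys,
                     const size_t *sample, size_t nsample,
                     unsigned now, int volatile_only) {
    long best = -1;
    unsigned best_idle = 0;
    assert(keys != NULL && (sample != NULL || nsample == 0));
    for (size_t i = 0; i < nsample; i++) {
        const struct key_info *k = &keys[sample[i]];
        if (volatile_only && !k->has_expire) continue;
        unsigned idle = now - k->last_access;
        if (best == -1 || idle > best_idle) {
            best = (long)sample[i];
            best_idle = idle;
        }
    }
    return best;
}
```

Because only the sample is examined, the victim is the oldest key among those sampled, which is usually close to, but not always, the globally least recently used key.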
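lazyfree-lazy-eviction / lazyfree-lazy-expire / lazyfree-lazy-server-del no: these settings control how keys are freed when they are evicted, when they expire, and when they are deleted as a side effect of other commands. Freeing can be done in a blocking way or asynchronously.
io-threads 4: Redis is mostly single threaded, but some operations, such as UNLINK and slow I/O accesses, already run on side threads. Reading from and writing to client sockets can now also be handled by separate I/O threads. Because writing in particular is slow, users normally rely on pipelining to get more out of each core; according to the comments in redis.conf, I/O threads can roughly double Redis throughput. Threading is disabled by default, and enabling it is recommended only on machines with 4 or more cores: for example, use 2 or 3 I/O threads on a 4-core machine and 6 on an 8-core machine. When I/O threads are enabled they are used only for writes by default; threading reads as well requires io-threads-do-reads yes, although threaded reads usually help little. This setting cannot be changed at runtime, and it has no effect when SSL is enabled.
lua-time-limit 5000: the maximum execution time of a Lua script, in milliseconds. Once a script has run longer than this limit, only the SCRIPT KILL and SHUTDOWN NOSAVE commands are available.
cluster-enabled yes: a normal Redis instance runs standalone and is not part of a Redis Cluster; only an instance started with this setting can be a cluster node. Every cluster node has its own cluster configuration file, named with cluster-config-file nodes-6379.conf, in which the node records the cluster state, such as updates about other nodes. Redis Cluster can run under Docker, but it is best to specify the node's IP address explicitly.
Advanced configuration: thresholds for the compact encodings of each data type, adjustable as needed. For hashes, hash-max-ziplist-entries 512 and hash-max-ziplist-value 64. For lists, the maximum size of each internal list node, list-max-ziplist-size -2 (-2 means a maximum size of 8 Kb). For sets, set-max-intset-entries 512. For sorted sets, zset-max-ziplist-entries 128 and zset-max-ziplist-value 64. For HyperLogLog, the size limit of the sparse representation, hll-sparse-max-bytes 3000. For streams, the maximum size of a single node in bytes, stream-node-max-bytes 4096, and the maximum number of entries per node, stream-node-max-entries 100.
activerehashing yes: enables active rehashing of the database hash table. Active rehashing uses 1 millisecond every 100 milliseconds of CPU time in order to help rehashing the main Redis hash table (the one mapping top-level keys to values).
loadmodule /path/to/my_module.so: previously, the only way to add a feature or transaction-like behavior to Redis was a Lua script. Scripts do not give true transactional atomicity (a script that fails midway is not rolled back) and cannot use the low-level API, so in practice they offer limited gains over plain command sequences. Modules can be loaded and unloaded dynamically, can implement low-level data structures, and can also call high-level commands. All of this requires only including the header redismodule.h, and it is as simple and elegant as Redis itself.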
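2. Redis Sentinel configuration file (sentinel.conf)
Redis uses Sentinel to manage the master and replica nodes of a deployment, monitoring the state of every node and reporting problems promptly. Sentinel is a special mode of Redis: a sentinel runs as a separate, independent process that monitors several Redis instances by sending them commands and waiting for their replies. A single sentinel monitoring the servers can itself fail, so several sentinels are normally deployed; they also monitor one another, which gives the multi-sentinel setup. Failover works as follows. Suppose the master goes down and sentinel 1 detects it first. No failover starts yet: only sentinel 1 believes the master is unavailable, a state called subjectively down (SDOWN). Once other sentinels have also found the master unavailable and their number reaches the configured quorum, the master is considered objectively down (ODOWN). The sentinels then hold a vote, and the sentinel chosen by the vote carries out the failover. After a successful switch, the sentinels use publish/subscribe to make the replicas they monitor follow the new master. For clients all of this is transparent. (Reference: https://www.jianshu.com/p/06ab9daf921d)
sentinel monitor mymaster 127.0.0.1 6379 2: tells Sentinel to monitor this master, and to consider it in O_DOWN (Objectively Down) state only if at least <quorum> sentinels agree. In this example the quorum is 2.
sentinel auth-pass <master-name> <password>: sets the password used to authenticate with the master and replicas. Useful if there is a password set in the Redis instances to monitor.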
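The quorum rule of sentinel monitor can be sketched as a simple count of sentinels that each consider the master subjectively down. This is only an illustration of the rule, not Sentinel's code: the is_objectively_down function and the array of votes are hypothetical, and the real protocol also involves message exchange between sentinels and a leader election before failover.

```c
#include <assert.h>
#include <stddef.h>

/* sdown_votes[i] is nonzero when sentinel i currently sees the master
 * as subjectively down. Returns 1 when at least `quorum` sentinels
 * agree, which is the condition for the objectively down state. */
int is_objectively_down(const int *sdown_votes, size_t n, unsigned quorum) {
    unsigned agree = 0;
    assert(sdown_votes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (sdown_votes[i]) agree++;
    return agree >= quorum;
}
```

With three sentinels and a quorum of 2, one sentinel reporting the master down is not enough; a second report is what moves the master to the objectively down state.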
3. Redis startup flow
The entry file of Redis is server.c and the entry function is main. Startup proceeds through the following steps:
* `initServerConfig()` setups the default values of the `server` structure.
* `initServer()` allocates the data structures needed to operate, setup the listening socket, and so forth.
* `aeMain()` starts the event loop which listens for new connections.
There are two special functions called periodically by the event loop:
1. `serverCron()` is called periodically (according to `server.hz` frequency), and performs tasks that must be performed from time to time, like checking for timedout clients.
2. `beforeSleep()` is called every time the event loop fired, Redis served a few requests, and is returning back into the event loop.
Inside server.c you can find code that handles other vital things of the Redis server:
* `call()` is used in order to call a given command in the context of a given client.
* `activeExpireCycle()` handles eviction of keys with a time to live set via the `EXPIRE` command.
* `freeMemoryIfNeeded()` is called when a new write command should be performed but Redis is out of memory according to the `maxmemory` directive.
* The global variable `redisCommandTable` defines all the Redis commands, specifying the name of the command, the function implementing the command, the number of arguments required, and other properties of each command.
The main function in server.c, the entry point of the Redis server, is listed below:
int main(int argc, char **argv) {
struct timeval tv;
int j;
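/* REDIS_TEST is a macro set at build time; when it is defined, this block checks whether the process was started to run one of the built-in tests. */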
#ifdef REDIS_TEST
if (argc == 3 && !strcasecmp(argv[1], "test")) {
if (!strcasecmp(argv[2], "ziplist")) {
return ziplistTest(argc, argv);
} else if (!strcasecmp(argv[2], "quicklist")) {
quicklistTest(argc, argv);
} else if (!strcasecmp(argv[2], "intset")) {
return intsetTest(argc, argv);
} else if (!strcasecmp(argv[2], "zipmap")) {
return zipmapTest(argc, argv);
} else if (!strcasecmp(argv[2], "sha1test")) {
return sha1Test(argc, argv);
} else if (!strcasecmp(argv[2], "util")) {
return utilTest(argc, argv);
} else if (!strcasecmp(argv[2], "endianconv")) {
return endianconvTest(argc, argv);
} else if (!strcasecmp(argv[2], "crc64")) {
return crc64Test(argc, argv);
} else if (!strcasecmp(argv[2], "zmalloc")) {
return zmalloc_test(argc, argv);
}
return -1; /* test not found */
}
#endif
/* Initialize on demand, depending on which macros were defined at build time. */
/* We need to initialize our libraries, and the server configuration. */
#ifdef INIT_SETPROCTITLE_REPLACEMENT
spt_init(argc, argv);
#endif
setlocale(LC_COLLATE,"");
tzset(); /* Populates 'timezone' global. */
/* Install the handler that is called when the allocator runs out of memory. */
zmalloc_set_oom_handler(redisOutOfMemoryHandler);
srand(time(NULL)^getpid());
srandom(time(NULL)^getpid());
gettimeofday(&tv,NULL);
crc64_init();
uint8_t hashseed[16];
getRandomBytes(hashseed,sizeof(hashseed));
dictSetHashFunctionSeed(hashseed);
server.sentinel_mode = checkForSentinelMode(argc,argv);
/* Initialize the server configuration with its default values. */
initServerConfig();
ACLInit(); /* The ACL subsystem must be initialized ASAP because the
basic networking code and client creation depends on it. */
/* Initialize the module system. */
moduleInitModulesSystem();
tlsInit();
/* Store the executable path and arguments in a safe place in order
* to be able to restart the server later. */
server.executable = getAbsolutePath(argv[0]);
server.exec_argv = zmalloc(sizeof(char*)*(argc+1));
server.exec_argv[argc] = NULL;
for (j = 0; j < argc; j++) server.exec_argv[j] = zstrdup(argv[j]);
/* We need to init sentinel right now as parsing the configuration file
* in sentinel mode will have the effect of populating the sentinel
* data structures with master nodes to monitor. */
/* Initialize the Sentinel configuration and state. */
if (server.sentinel_mode) {
initSentinelConfig();
initSentinel();
}
/* Check if we need to start in redis-check-rdb/aof mode. We just execute
* the program main. However the program is part of the Redis executable
* so that we can easily execute an RDB check on loading errors. */
if (strstr(argv[0],"redis-check-rdb") != NULL)
redis_check_rdb_main(argc,argv,NULL);
else if (strstr(argv[0],"redis-check-aof") != NULL)
redis_check_aof_main(argc,argv);
/* Check the command-line arguments and parse them. */
if (argc >= 2) {
/* argv[0] is the program itself, so the real arguments start at argv[1]; j = 1 is where parsing begins. */
j = 1; /* First option to parse in argv[] */
sds options = sdsempty();
char *configfile = NULL;
/* Handle the basic standalone options. */
/* Handle special options --help and --version */
if (strcmp(argv[1], "-v") == 0 ||
strcmp(argv[1], "--version") == 0) version();
if (strcmp(argv[1], "--help") == 0 ||
strcmp(argv[1], "-h") == 0) usage();
if (strcmp(argv[1], "--test-memory") == 0) {
if (argc == 3) {
memtest(atoi(argv[2]),50);
exit(0);
} else {
fprintf(stderr,"Please specify the amount of memory to test in megabytes.\n");
fprintf(stderr,"Example: ./redis-server --test-memory 4096\n\n");
exit(1);
}
}
/* Check for a config file: an argument that does not start with "--" is taken as the config file name. */
/* First argument is the config file name? */
if (argv[j][0] != '-' || argv[j][1] != '-') {
/* Handle the config file, e.g. when started as "redis-server ./redis.conf": store the absolute path of ./redis.conf and move on to the next argument. */
configfile = argv[j];
server.configfile = getAbsolutePath(configfile);
/* Replace the config file in server.exec_argv with
* its absolute path. */
zfree(server.exec_argv[j]);
server.exec_argv[j] = zstrdup(server.configfile);
j++;
}
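/* Parse the remaining arguments and collect them in a string that is treated as if it were appended to the contents of redis.conf (the file on disk is not modified). */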
/* All the other options are parsed and conceptually appended to the
* configuration file. For instance --port 6380 will generate the
* string "port 6380\n" to be parsed after the actual file name
* is parsed, if any. */
while(j != argc) {
if (argv[j][0] == '-' && argv[j][1] == '-') {
/* Option name */
if (!strcmp(argv[j], "--check-rdb")) {
/* Argument has no options, need to skip for parsing. */
j++;
continue;
}
if (sdslen(options)) options = sdscat(options,"\n");
options = sdscat(options,argv[j]+2);
options = sdscat(options," ");
} else {
/* Option argument */
options = sdscatrepr(options,argv[j],strlen(argv[j]));
options = sdscat(options," ");
}
j++;
}
/* Sentinel mode requires a config file on disk; reading the configuration from stdin ("-") is rejected. */
if (server.sentinel_mode && configfile && *configfile == '-') {
serverLog(LL_WARNING,
"Sentinel config from STDIN not allowed.");
serverLog(LL_WARNING,
"Sentinel needs config file on disk to save state. Exiting...");
exit(1);
}
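/* Load the redis.conf file and then apply the other startup options on top of it, so that they override the values from the file. The file is opened with fopen() and read with fgets(); loadServerConfigFromString() parses each directive and maps it onto the global server structure (see that function for the handling of each setting). */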
loadServerConfig(configfile,options);
sdsfree(options);
}
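/* Log the startup banner. If no config file was given, print an example command line, using sentinel.conf or redis.conf depending on the mode. */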
serverLog(LL_WARNING, "oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo");
serverLog(LL_WARNING,
"Redis version=%s, bits=%d, commit=%s, modified=%d, pid=%d, just started",
REDIS_VERSION,
(sizeof(long) == 8) ? 64 : 32,
redisGitSHA1(),
strtol(redisGitDirty(),NULL,10) > 0,
(int)getpid());
if (argc == 1) {
serverLog(LL_WARNING, "Warning: no config file specified, using the default config. In order to specify a config file use %s /path/to/%s.conf", argv[0], server.sentinel_mode ? "sentinel" : "redis");
} else {
serverLog(LL_WARNING, "Configuration loaded");
}
/* Determine whether Redis is running under a process supervisor. */
server.supervised = redisIsSupervised(server.supervised_mode);
int background = server.daemonize && !server.supervised;
/* If the server is configured to run as a daemon, daemonize() forks a child that keeps running, and the parent calls exit(0). */
if (background) daemonize();
readOOMScoreAdj();
/* With the configuration loaded, initialize the server's resources. */
initServer();
if (background || server.pidfile) createPidFile();
redisSetProcTitle(argv[0]);
redisAsciiArt();
checkTcpBacklogSettings();
/* Not in Sentinel mode: the body of this if is executed. */
if (!server.sentinel_mode) {
/* Things not needed when running in Sentinel mode. */
serverLog(LL_WARNING,"Server initialized");
#ifdef __linux__
linuxMemoryWarnings();
#endif
moduleInitModulesSystemLast();
/* Load the configured modules. */
moduleLoadFromQueue();
ACLLoadUsersAtStartup();
/* Finish server initialization: create the background I/O (BIO) threads and initialize the I/O threads (at most 128). */
InitServerLast();
/* Load the database contents from disk. */
loadDataFromDisk();
if (server.cluster_enabled) {
if (verifyClusterConfigWithData() == C_ERR) {
serverLog(LL_WARNING,
"You can't have keys in a DB different than DB 0 when in "
"Cluster mode. Exiting.");
exit(1);
}
}
if (server.ipfd_count > 0 || server.tlsfd_count > 0)
serverLog(LL_NOTICE,"Ready to accept connections");
if (server.sofd > 0)
serverLog(LL_NOTICE,"The server is now ready to accept connections at %s", server.unixsocket);
if (server.supervised_mode == SUPERVISED_SYSTEMD) {
if (!server.masterhost) {
redisCommunicateSystemd("STATUS=Ready to accept connections\n");
redisCommunicateSystemd("READY=1\n");
} else {
redisCommunicateSystemd("STATUS=Waiting for MASTER <-> REPLICA sync\n");
}
}
} else {
InitServerLast();
sentinelIsRunning();
if (server.supervised_mode == SUPERVISED_SYSTEMD) {
redisCommunicateSystemd("STATUS=Ready to accept connections\n");
redisCommunicateSystemd("READY=1\n");
}
}
/* Warning the user about suspicious maxmemory setting. */
if (server.maxmemory > 0 && server.maxmemory < 1024*1024) {
serverLog(LL_WARNING,"WARNING: You specified a maxmemory value that is less than 1MB (current value is %llu bytes). Are you sure this is what you really want?", server.maxmemory);
}
redisSetCpuAffinity(server.server_cpulist);
setOOMScoreAdj(-1);
/* Run the event loop: wait for events and handle them. */
aeMain(server.el);
aeDeleteEventLoop(server.el);
return 0;
}
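The option-handling loop in main turns the remaining command-line arguments into text that is parsed like extra lines of the config file. The sketch below reproduces that idea with plain C strings instead of sds; the build_options function is a hypothetical name, and unlike the real code it does not quote the argument values (the original uses sdscatrepr for that).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Append argv[start..argc-1] to `out` as config-file lines.
 * A token starting with "--" begins a new line holding the option name;
 * any other token is appended to the current line as an argument.
 * Returns 0 on success, -1 if `out` is too small. */
int build_options(int argc, const char *const *argv, int start,
                  char *out, size_t cap) {
    size_t len = 0;
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (int j = start; j < argc; j++) {
        const char *tok = argv[j];
        int is_name = (tok[0] == '-' && tok[1] == '-');
        /* Separate options with a newline, as the original loop does. */
        int n = snprintf(out + len, cap - len, "%s%s ",
                         (is_name && len > 0) ? "\n" : "",
                         is_name ? tok + 2 : tok);
        if (n < 0 || (size_t)n >= cap - len) return -1;
        len += (size_t)n;
    }
    return 0;
}
```

For the arguments --port 6380 --appendonly yes this produces two lines, one per option, which can then be parsed exactly like the contents of a configuration file and therefore override it.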
In server.c, the important variable redisCommandTable records the handler function of every Redis command. When the server receives a command, it calls the corresponding function to parse and process it.
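A command table of this kind can be sketched as an array of name, function pointer and arity. Everything in the block below is illustrative: the struct command type, the two toy handlers and the dispatch function are hypothetical, and the linear scan stands in for whatever lookup structure the server actually uses. The arity convention (a positive value is an exact argument count, a negative value is a minimum) is assumed here to follow the Redis convention.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>  /* strcasecmp (POSIX) */

typedef int (*command_proc)(int argc, const char **argv);

struct command {
    const char *name;
    command_proc proc;
    int arity;  /* > 0: exact argc; < 0: minimum argc is -arity */
};

/* Toy handlers that only return a marker value. */
static int ping_command(int argc, const char **argv) { (void)argc; (void)argv; return 1; }
static int set_command(int argc, const char **argv)  { (void)argc; (void)argv; return 2; }

static const struct command command_table[] = {
    {"ping", ping_command, -1},
    {"set",  set_command,  -3},
};

/* Case-insensitive lookup by command name; NULL if unknown. */
const struct command *lookup_command(const char *name) {
    size_t n = sizeof command_table / sizeof command_table[0];
    for (size_t i = 0; i < n; i++)
        if (strcasecmp(name, command_table[i].name) == 0)
            return &command_table[i];
    return NULL;
}

/* Returns the handler's result, -1 for an unknown command,
 * or -2 for a wrong number of arguments. */
int dispatch(int argc, const char **argv) {
    assert(argc >= 1 && argv != NULL);
    const struct command *c = lookup_command(argv[0]);
    if (c == NULL) return -1;
    if ((c->arity > 0 && c->arity != argc) || argc < -c->arity) return -2;
    return c->proc(argc, argv);
}
```

Keeping the handlers behind function pointers means adding a command is a matter of adding one row to the table.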
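4. Key Redis data structures
The key Redis data structures are:
redisObject: every basic Redis value is wrapped in a redisObject, which encapsulates the underlying data object.
/* Redis wraps every basic type in this object; the typedef alias is robj. */
typedef struct redisObject {
unsigned type:4; /* object type, e.g. OBJ_STRING, OBJ_LIST (named REDIS_STRING, REDIS_LIST in older releases) */
unsigned encoding:4; /* encoding of the data structure that ptr points to, e.g. OBJ_ENCODING_INT */
unsigned lru:LRU_BITS; /* time of the last access to the object by the program */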
/* LRU time (relative to global lru_clock) or
* LFU data (least significant 8 bits frequency
* and most significant 16 bits access time). */
int refcount; /* reference count, used mainly for automatic memory reclamation */
void *ptr; /* pointer to the object of the underlying data type */
} robj;
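The comment on the lru field says that in LFU mode the field holds an access time in its most significant 16 bits and a frequency counter in its least significant 8 bits. A minimal sketch of that packing follows; the helper names are hypothetical, and LRU_BITS is assumed to be 24, which is what the 16 + 8 split implies.

```c
#include <assert.h>
#include <stdint.h>

#define LRU_BITS 24  /* assumed: 16 bits of time + 8 bits of counter */

/* Pack an access time and a frequency counter into one 24-bit value. */
uint32_t lfu_pack(uint16_t access_time, uint8_t counter) {
    return ((uint32_t)access_time << 8) | counter;
}

/* Extract the 16-bit access time from the packed value. */
uint16_t lfu_access_time(uint32_t lru) {
    assert(lru < (1u << LRU_BITS));
    return (uint16_t)(lru >> 8);
}

/* Extract the 8-bit frequency counter from the packed value. */
uint8_t lfu_counter(uint32_t lru) {
    assert(lru < (1u << LRU_BITS));
    return (uint8_t)(lru & 0xFF);
}
```

Sharing one bit field between the LRU clock and the LFU data keeps the object header small regardless of which eviction policy is configured.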
redisDb: Redis supports multiple databases (16 by default); each database is represented by one redisDb structure.
/* Redis database representation. There are multiple databases identified
* by integers from 0 (the default database) up to the max configured
* database. The database number is the 'id' field in the structure. */
typedef struct redisDb {
dict *dict; /* The keyspace for this DB, stored as a dict */
dict *expires; /* Timeout of keys with a timeout set */
dict *blocking_keys; /* Keys with clients waiting for data (BLPOP)*/
dict *ready_keys; /* Blocked keys that received a PUSH */
dict *watched_keys; /* WATCHED keys for MULTI/EXEC CAS */
int id; /* Database ID */
long long avg_ttl; /* Average TTL, just for stats */
unsigned long expires_cursor; /* Cursor of the active expire cycle. */
list *defrag_later; /* List of key names to attempt to defrag one by one, gradually. */
} redisDb;
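zskiplistNode: the type of each node in a zskiplist; the node is a linked-list style structure.
zskiplist: the skip list. A skip list is an ordered data structure in which every node keeps several pointers to other nodes, so that other nodes can be reached quickly. In most cases a skip list is about as efficient as a balanced tree, and it is simpler to implement. Redis uses skip lists in only two places: in sorted-set keys and as an internal data structure in cluster nodes.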
/* ZSETs use a specialized version of Skiplists */
typedef struct zskiplistNode {
sds ele;
double score;
struct zskiplistNode *backward;
struct zskiplistLevel {
struct zskiplistNode *forward;
unsigned long span;
} level[];
} zskiplistNode;
typedef struct zskiplist {
struct zskiplistNode *header, *tail;
unsigned long length;
int level;
} zskiplist;
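Skip list figure source: https://www.cnblogs.com/yinbiao/p/11238374.html
Figure source: https://blog.csdn.net/zy450271923/article/details/106970148/
A skip list is a multi-level linked list like the one in the figure referenced above. Looking up 46 proceeds as follows:
(1) Level L4: compare with 55, 1 comparison
(2) Level L3: compare with 21, then 55, 2 comparisons
(3) Level L2: compare with 37, then 55, 2 comparisons
(4) Level L1: compare with 46, 1 comparison, found
A skip list looks as if every second element had been promoted to the level above, level after level, producing a stack of linked lists. Each level holds about half as many elements as the level below it, so a lookup behaves like a binary search.
Because of this similarity to binary search, a lookup in a skip list takes O(logN) time. In the textbook layout each node holds two pointers, one to the next element in the same list (next) and one to the element on the level below (down); the zskiplistNode shown above instead keeps all of a node's forward pointers in its level[] array.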
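The top-down search described above can be sketched with a small skip list whose nodes keep an array of forward pointers. The types and functions below are hypothetical and much simpler than zskiplist: keys are plain integers, node levels are passed in explicitly instead of being drawn at random, and the levels used in the usage note are assumptions chosen so that the comparison counts match the walk-through in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_LEVEL 4

struct node {
    int key;
    struct node *forward[MAX_LEVEL];  /* forward[i]: next node on level i+1 */
};

struct skiplist {
    struct node header;  /* sentinel node; its key is unused */
    int level;           /* highest level currently in use */
};

void sl_init(struct skiplist *sl) {
    sl->level = 1;
    sl->header.key = 0;
    for (int i = 0; i < MAX_LEVEL; i++) sl->header.forward[i] = NULL;
}

/* Insert `key` on levels 1..nlevels. Returns 0, or -1 if malloc fails. */
int sl_insert(struct skiplist *sl, int key, int nlevels) {
    struct node *update[MAX_LEVEL];
    struct node *x = &sl->header;
    assert(nlevels >= 1 && nlevels <= MAX_LEVEL);
    for (int i = MAX_LEVEL - 1; i >= 0; i--) {
        while (x->forward[i] && x->forward[i]->key < key) x = x->forward[i];
        update[i] = x;  /* last node before the insert position on level i */
    }
    struct node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->key = key;
    for (int i = 0; i < MAX_LEVEL; i++) n->forward[i] = NULL;
    for (int i = 0; i < nlevels; i++) {
        n->forward[i] = update[i]->forward[i];
        update[i]->forward[i] = n;
    }
    if (nlevels > sl->level) sl->level = nlevels;
    return 0;
}

/* Search from the top level down; *steps counts key comparisons. */
const struct node *sl_find(const struct skiplist *sl, int key, int *steps) {
    const struct node *x = &sl->header;
    *steps = 0;
    for (int i = sl->level - 1; i >= 0; i--) {
        while (x->forward[i]) {
            (*steps)++;
            if (x->forward[i]->key >= key) break;  /* overshoot: go down */
            x = x->forward[i];
        }
    }
    x = x->forward[0];
    return (x && x->key == key) ? x : NULL;
}

/* Free every node and reset the list to empty. */
void sl_free(struct skiplist *sl) {
    struct node *x = sl->header.forward[0];
    while (x) {
        struct node *next = x->forward[0];
        free(x);
        x = next;
    }
    sl_init(sl);
}
```

Inserting 21 with 3 levels, 37 with 2, 46 with 1 and 55 with 4 and then searching for 46 visits 55 on the top level, 21 and 55 on the third, 37 and 55 on the second, and 46 on the first, for a total of 1 + 2 + 2 + 1 comparisons.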
zset: the Redis sorted set. It is implemented with a skip list (zsl) together with a dict.
typedef struct zset {
dict *dict;
zskiplist *zsl;
} zset;
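dict: the data structure behind the whole database as well as the hash type; the entire Redis database is one large dict object. The key point of dict is the rehash process. By design a dict holds two hash tables. When it is not being resized only one of them is used; while it is being resized, each step moves part of the old table's keys into the new table. A lookup during resizing checks the old table first and then the new one. When resizing completes, the new table takes the place of the old one, which reduces the number of tables a lookup has to visit. For the whole duration of the resize rehashidx is set; a value other than -1 marks a rehash in progress.
/* An element of a dict: an entry in the linked list of a hash bucket */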
typedef struct dictEntry {
void *key;
union {
void *val;
uint64_t u64;
int64_t s64;
double d;
} v;
struct dictEntry *next;
} dictEntry;
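/* Function pointers carried by a dict type: they provide the operations of the hash structure, such as the hash function and the key and value handling functions */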
typedef struct dictType {
uint64_t (*hashFunction)(const void *key);
void *(*keyDup)(void *privdata, const void *key);
void *(*valDup)(void *privdata, const void *obj);
int (*keyCompare)(void *privdata, const void *key1, const void *key2);
void (*keyDestructor)(void *privdata, void *key);
void (*valDestructor)(void *privdata, void *obj);
} dictType;
/* The hash table */
/* This is our hash table structure. Every dictionary has two of this as we
* implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
dictEntry **table; /* an array of bucket heads: each slot points to the first dictEntry of a chain, as can be seen in the _dictKeyIndex function in dict.c */
unsigned long size;
unsigned long sizemask;
unsigned long used;
} dictht;
/* The dict data structure */
typedef struct dict {
dictType *type; /* the operations supported by this dict type */
void *privdata;
dictht ht[2]; /* a dict holds two hash tables: the main one, and one used while resizing */
long rehashidx; /* rehashing not in progress if rehashidx == -1 */
unsigned long iterators; /* number of iterators currently running */
} dict;
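The incremental rehash described in the dict paragraph can be sketched with two small chained hash tables. The mini_dict type and its functions are hypothetical and deliberately reduced: keys are unsigned integers used directly as hash values, table sizes must be powers of two, duplicates are not checked, and memory is not released at the end. One call to md_rehash_step moves a single bucket, which is the essence of spreading the cost over many operations.

```c
#include <assert.h>
#include <stdlib.h>

struct entry { unsigned key; int val; struct entry *next; };
struct table { struct entry **buckets; unsigned long size, used; };
struct mini_dict { struct table ht[2]; long rehashidx; };  /* -1: idle */

static int table_init(struct table *t, unsigned long size) {
    assert(size > 0 && (size & (size - 1)) == 0);  /* power of two */
    t->buckets = calloc(size, sizeof *t->buckets);
    if (t->buckets == NULL) return -1;
    t->size = size;
    t->used = 0;
    return 0;
}

int md_init(struct mini_dict *d, unsigned long size) {
    d->ht[1].buckets = NULL;
    d->ht[1].size = d->ht[1].used = 0;
    d->rehashidx = -1;
    return table_init(&d->ht[0], size);
}

/* Allocate the second table and mark the rehash as started. */
int md_start_rehash(struct mini_dict *d, unsigned long newsize) {
    assert(d->rehashidx == -1);
    if (table_init(&d->ht[1], newsize) != 0) return -1;
    d->rehashidx = 0;
    return 0;
}

/* While rehashing, new keys go to the new table only. */
int md_add(struct mini_dict *d, unsigned key, int val) {
    struct table *t = (d->rehashidx != -1) ? &d->ht[1] : &d->ht[0];
    struct entry *e = malloc(sizeof *e);
    if (e == NULL) return -1;
    unsigned long idx = key & (t->size - 1);
    e->key = key;
    e->val = val;
    e->next = t->buckets[idx];
    t->buckets[idx] = e;
    t->used++;
    return 0;
}

/* Move one non-empty bucket from ht[0] to ht[1].
 * Returns 1 if more buckets remain, 0 when the rehash is finished. */
int md_rehash_step(struct mini_dict *d) {
    if (d->rehashidx == -1) return 0;
    struct table *from = &d->ht[0], *to = &d->ht[1];
    while (from->used > 0 && from->buckets[d->rehashidx] == NULL)
        d->rehashidx++;  /* skip empty buckets */
    if (from->used > 0) {
        struct entry *e = from->buckets[d->rehashidx];
        while (e) {
            struct entry *next = e->next;
            unsigned long idx = e->key & (to->size - 1);
            e->next = to->buckets[idx];
            to->buckets[idx] = e;
            from->used--;
            to->used++;
            e = next;
        }
        from->buckets[d->rehashidx++] = NULL;
    }
    if (from->used == 0) {  /* done: the new table becomes ht[0] */
        free(from->buckets);
        d->ht[0] = d->ht[1];
        d->ht[1].buckets = NULL;
        d->ht[1].size = d->ht[1].used = 0;
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}

/* Look in the old table first, then in the new one while rehashing. */
const struct entry *md_find(const struct mini_dict *d, unsigned key) {
    for (int i = 0; i < 2; i++) {
        const struct table *t = &d->ht[i];
        if (t->size > 0) {
            const struct entry *e = t->buckets[key & (t->size - 1)];
            for (; e != NULL; e = e->next)
                if (e->key == key) return e;
        }
        if (d->rehashidx == -1) break;  /* ht[1] is unused when idle */
    }
    return NULL;
}
```

Between steps the dictionary stays fully usable: keys that have not been moved yet are found in the old table, and keys that were moved or newly added are found in the new one.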
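redisServer: the instance that represents the whole Redis server; it acts as the god class that controls the service.
ziplist: the compressed list, a compact list structure whose elements are described by zlentry. ziplist serves as the underlying structure for small hashes, lists (through quicklist) and sorted sets. The description in the ziplist source code reads: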
* ----------------------------------------------------------------------------
*
* ZIPLIST OVERALL LAYOUT
* ======================
*
* The general layout of the ziplist is as follows:
*
* <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
*
* NOTE: all fields are stored in little endian, if not specified otherwise.
*
* <uint32_t zlbytes> is an unsigned integer to hold the number of bytes that
* the ziplist occupies, including the four bytes of the zlbytes field itself.
* This value needs to be stored to be able to resize the entire structure
* without the need to traverse it first.
*
* <uint32_t zltail> is the offset to the last entry in the list. This allows
* a pop operation on the far side of the list without the need for full
* traversal.
*
* <uint16_t zllen> is the number of entries. When there are more than
* 2^16-2 entries, this value is set to 2^16-1 and we need to traverse the
* entire list to know how many items it holds.
*
* <uint8_t zlend> is a special entry representing the end of the ziplist.
* Is encoded as a single byte equal to 255. No other normal entry starts
* with a byte set to the value of 255.
*
* ZIPLIST ENTRIES
* ===============
*
* Every entry in the ziplist is prefixed by metadata that contains two pieces
* of information. First, the length of the previous entry is stored to be
* able to traverse the list from back to front. Second, the entry encoding is
* provided. It represents the entry type, integer or string, and in the case
* of strings it also represents the length of the string payload.
* So a complete entry is stored like this:
*
* <prevlen> <encoding> <entry-data>
*
* Sometimes the encoding represents the entry itself, like for small integers
* as we'll see later. In such a case the <entry-data> part is missing, and we
* could have just:
*
* <prevlen> <encoding>
*
* The length of the previous entry, <prevlen>, is encoded in the following way:
* If this length is smaller than 254 bytes, it will only consume a single
* byte representing the length as an unsigned 8 bit integer. When the length
* is greater than or equal to 254, it will consume 5 bytes. The first byte is
* set to 254 (FE) to indicate a larger value is following. The remaining 4
* bytes take the length of the previous entry as value.
*
* So practically an entry is encoded in the following way:
*
* <prevlen from 0 to 253> <encoding> <entry>
*
* Or alternatively if the previous entry length is greater than 253 bytes
* the following encoding is used:
*
* 0xFE <4 bytes unsigned little endian prevlen> <encoding> <entry>
*
* The encoding field of the entry depends on the content of the
* entry. When the entry is a string, the first 2 bits of the encoding first
* byte will hold the type of encoding used to store the length of the string,
* followed by the actual length of the string. When the entry is an integer
* the first 2 bits are both set to 1. The following 2 bits are used to specify
* what kind of integer will be stored after this header. An overview of the
* different types and encodings is as follows. The first byte is always enough
* to determine the kind of entry.
*
* |00pppppp| - 1 byte
* String value with length less than or equal to 63 bytes (6 bits).
* "pppppp" represents the unsigned 6 bit length.
* |01pppppp|qqqqqqqq| - 2 bytes
* String value with length less than or equal to 16383 bytes (14 bits).
* IMPORTANT: The 14 bit number is stored in big endian.
* |10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes
* String value with length greater than or equal to 16384 bytes.
* Only the 4 bytes following the first byte represents the length
* up to 2^32-1. The 6 lower bits of the first byte are not used and
* are set to zero.
* IMPORTANT: The 32 bit number is stored in big endian.
* |11000000| - 3 bytes
* Integer encoded as int16_t (2 bytes).
* |11010000| - 5 bytes
* Integer encoded as int32_t (4 bytes).
* |11100000| - 9 bytes
* Integer encoded as int64_t (8 bytes).
* |11110000| - 4 bytes
* Integer encoded as 24 bit signed (3 bytes).
* |11111110| - 2 bytes
* Integer encoded as 8 bit signed (1 byte).
* |1111xxxx| - (with xxxx between 0000 and 1101) immediate 4 bit integer.
* Unsigned integer from 0 to 12. The encoded value is actually from
* 1 to 13 because 0000 and 1111 can not be used, so 1 should be
* subtracted from the encoded 4 bit value to obtain the right value.
* |11111111| - End of ziplist special entry.
*
* Like for the ziplist header, all the integers are represented in little
* endian byte order, even when this code is compiled in big endian systems.
*
* EXAMPLES OF ACTUAL ZIPLISTS
* ===========================
*
* The following is a ziplist containing the two elements representing
* the strings "2" and "5". It is composed of 15 bytes, that we visually
* split into sections:
*
* [0f 00 00 00] [0c 00 00 00] [02 00] [00 f3] [02 f6] [ff]
* | | | | | |
* zlbytes zltail entries "2" "5" end
*
* The first 4 bytes represent the number 15, that is the number of bytes
* the whole ziplist is composed of. The second 4 bytes are the offset
* at which the last ziplist entry is found, that is 12, in fact the
* last entry, that is "5", is at offset 12 inside the ziplist.
* The next 16 bit integer represents the number of elements inside the
* ziplist, its value is 2 since there are just two elements inside.
* Finally "00 f3" is the first entry representing the number 2. It is
* composed of the previous entry length, which is zero because this is
* our first entry, and the byte F3 which corresponds to the encoding
* |1111xxxx| with xxxx between 0001 and 1101. We need to remove the "F"
* higher order bits 1111, and subtract 1 from the "3", so the entry value
* is "2". The next entry has a prevlen of 02, since the first entry is
* composed of exactly two bytes. The entry itself, F6, is encoded exactly
* like the first entry, and 6-1 = 5, so the value of the entry is 5.
* Finally the special entry FF signals the end of the ziplist.
*
* Adding another element to the above string with the value "Hello World"
* allows us to show how the ziplist encodes small strings. We'll just show
* the hex dump of the entry itself. Imagine the bytes as following the
* entry that stores "5" in the ziplist above:
*
* [02] [0b] [48 65 6c 6c 6f 20 57 6f 72 6c 64]
*
* The first byte, 02, is the length of the previous entry. The next
* byte represents the encoding in the pattern |00pppppp| that means
* that the entry is a string of length <pppppp>, so 0B means that
* an 11 bytes string follows. From the third byte (48) to the last (64)
* there are just the ASCII characters for "Hello World".
*
* ----------------------------------------------------------------------------
*
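The worked example in the comment above can be checked with a few lines of decoding code. This is a partial sketch with hypothetical helper names: it reads the three header fields and decodes only entries that use a 1-byte prevlen and the 4-bit immediate integer encoding, which is all the two-element example needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers byte by byte, so the result does not
 * depend on the byte order of the host. */
static uint32_t rd32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint16_t rd16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t zl_bytes(const unsigned char *zl) { return rd32(zl); }      /* zlbytes */
uint32_t zl_tail(const unsigned char *zl)  { return rd32(zl + 4); }  /* zltail  */
uint16_t zl_len(const unsigned char *zl)   { return rd16(zl + 8); }  /* zllen   */

/* Decode the entry at `p` if it has a 1-byte prevlen and a 4-bit
 * immediate integer encoding (0xF1..0xFD). Stores prevlen and value and
 * returns the entry size in bytes; returns 0 for any other entry,
 * including the 0xFF end marker. */
size_t zl_decode_small_int(const unsigned char *p,
                           uint32_t *prevlen, int *value) {
    assert(p != NULL && prevlen != NULL && value != NULL);
    if (p[0] >= 254) return 0;  /* 5-byte prevlen or end marker */
    unsigned char enc = p[1];
    if (enc < 0xF1 || enc > 0xFD) return 0;
    *prevlen = p[0];
    *value = (enc & 0x0F) - 1;  /* the encoded nibble is value + 1 */
    return 2;
}
```

Applied to the 15 example bytes, the header fields decode to 15, 12 and 2, and the entries at offsets 10 and 12 decode to the values 2 and 5, as the comment explains.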
/* The structure describing a ziplist entry */
/* We use this function to receive information about a ziplist entry.
* Note that this is not how the data is actually encoded, is just what we
* get filled by a function in order to operate more easily. */
typedef struct zlentry {
unsigned int prevrawlensize; /* Bytes used to encode the previous entry len*/
unsigned int prevrawlen; /* Previous entry len. */
unsigned int lensize; /* Bytes used to encode this entry type/len.
For example strings have a 1, 2 or 5 bytes
header. Integers always use a single byte.*/
unsigned int len; /* Bytes used to represent the actual entry.
For strings this is just the string length
while for integers it is 1, 2, 3, 4, 8 or
0 (for 4 bit immediate) depending on the
number range. */
unsigned int headersize; /* prevrawlensize + lensize. */
unsigned char encoding; /* Set to ZIP_STR_* or ZIP_INT_* depending on
the entry encoding. However for 4 bits
immediate integers this can assume a range
of values and must be range-checked. */
unsigned char *p; /* Pointer to the very start of the entry, that
is, this points to prev-entry-len field. */
} zlentry;
/* Initialize a ziplist */
/* Create a new empty ziplist. */
unsigned char *ziplistNew(void) {
/* Allocation size: the header plus the end marker */
unsigned int bytes = ZIPLIST_HEADER_SIZE+ZIPLIST_END_SIZE;
unsigned char *zl = zmalloc(bytes);
ZIPLIST_BYTES(zl) = intrev32ifbe(bytes);
ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE);
ZIPLIST_LENGTH(zl) = 0;
zl[bytes-1] = ZIP_END;
return zl;
}
/* The size of a ziplist header: two 32 bit integers for the total
* bytes count and last item offset. One 16 bit integer for the number
* of items field. */
#define ZIPLIST_HEADER_SIZE (sizeof(uint32_t)*2+sizeof(uint16_t))
/* Size of the "end of ziplist" entry. Just one byte. */
#define ZIPLIST_END_SIZE (sizeof(uint8_t))
/* Return total bytes a ziplist is composed of. */
#define ZIPLIST_BYTES(zl) (*((uint32_t*)(zl)))
/* Return the offset of the last item inside the ziplist. */
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t))))
/* Return the length of a ziplist, or UINT16_MAX if the length cannot be
* determined without scanning the whole ziplist. */
#define ZIPLIST_LENGTH(zl) (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))
quicklist: a doubly linked list built on top of ziplist. The Redis list type is implemented with quicklist.
/* A single node of a quicklist */
/* quicklistNode is a 32 byte struct describing a ziplist for a quicklist.
* We use bit fields keep the quicklistNode at 32 bytes.
* count: 16 bits, max 65536 (max zl bytes is 65k, so max count actually < 32k).
* encoding: 2 bits, RAW=1, LZF=2.
* container: 2 bits, NONE=1, ZIPLIST=2.
* recompress: 1 bit, bool, true if node is temporarily decompressed for usage.
* attempted_compress: 1 bit, boolean, used for verifying during testing.
* extra: 10 bits, free for future use; pads out the remainder of 32 bits */
typedef struct quicklistNode {
struct quicklistNode *prev;
struct quicklistNode *next;
unsigned char *zl;
unsigned int sz; /* ziplist size in bytes */
unsigned int count : 16; /* count of items in ziplist */
unsigned int encoding : 2; /* RAW==1 or LZF==2 */
unsigned int container : 2; /* NONE==1 or ZIPLIST==2 */
unsigned int recompress : 1; /* was this node previous compressed? */
unsigned int attempted_compress : 1; /* node can't compress; too small */
unsigned int extra : 10; /* more bits to steal for future usage */
} quicklistNode;
/* quicklist is a 40 byte struct (on 64-bit systems) describing a quicklist.
* 'count' is the number of total entries.
* 'len' is the number of quicklist nodes.
* 'compress' is: -1 if compression disabled, otherwise it's the number
* of quicklistNodes to leave uncompressed at ends of quicklist.
* 'fill' is the user-requested (or default) fill factor.
* 'bookmarks' are an optional feature that is used by realloc this struct,
* so that they don't consume memory when not used. */
typedef struct quicklist {
quicklistNode *head;
quicklistNode *tail;
unsigned long count; /* total count of all entries in all ziplists */
unsigned long len; /* number of quicklistNodes */
int fill : QL_FILL_BITS; /* fill factor for individual nodes */
unsigned int compress : QL_COMP_BITS; /* depth of end nodes not to compress;0=off */
unsigned int bookmark_count: QL_BM_BITS;
quicklistBookmark bookmarks[];
} quicklist;
sds: the data structure behind the Redis string type. It comes with several header types (sdshdr5, sdshdr8, sdshdr16, sdshdr32 and sdshdr64), each suited to a different case.
typedef char *sds;
/* Note: sdshdr5 is never used, we just access the flags byte directly.
* However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
uint8_t len; /* used */
uint8_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
uint16_t len; /* used */
uint16_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
uint32_t len; /* used */
uint32_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
uint64_t len; /* used */
uint64_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
#define SDS_TYPE_5 0
#define SDS_TYPE_8 1
#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4
#define SDS_TYPE_MASK 7
#define SDS_TYPE_BITS 3
#define SDS_HDR_VAR(T,s) struct sdshdr##T *sh = (void*)((s)-(sizeof(struct sdshdr##T)));
#define SDS_HDR(T,s) ((struct sdshdr##T *)((s)-(sizeof(struct sdshdr##T))))
#define SDS_TYPE_5_LEN(f) ((f)>>SDS_TYPE_BITS)
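/* Creates a string object: a length of up to 44 bytes gives an embedded string, a longer one gives a raw string. The threshold comes from the 64-byte allocation size mentioned below. */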
/* Create a string object with EMBSTR encoding if it is smaller than
* OBJ_ENCODING_EMBSTR_SIZE_LIMIT, otherwise the RAW encoding is
* used.
*
* The current limit of 44 is chosen so that the biggest string object
* we allocate as EMBSTR will still fit into the 64 byte arena of jemalloc. */
#define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44
robj *createStringObject(const char *ptr, size_t len) {
if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)
return createEmbeddedStringObject(ptr,len);
else
return createRawStringObject(ptr,len);
}
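Two pieces of arithmetic sit behind the sds definitions above: choosing the smallest header type that can represent a length, and the reason the embedded-string limit is 44. The sketch below shows both under stated assumptions. The function names are hypothetical, the header choice simply follows the width of the len field in each sdshdr struct (5, 8, 16, 32 or 64 bits), and the size calculation assumes a 64-bit build where redisObject occupies 16 bytes and the packed sdshdr8 header 3 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum hdr_type { HDR_5, HDR_8, HDR_16, HDR_32, HDR_64 };

/* Smallest header whose length field can hold `len`. */
enum hdr_type pick_header_type(uint64_t len) {
    if (len < (1u << 5))    return HDR_5;
    if (len < (1u << 8))    return HDR_8;
    if (len < (1u << 16))   return HDR_16;
    if (len < (1ull << 32)) return HDR_32;
    return HDR_64;
}

/* Bytes needed for one embedded string object of `len` bytes:
 * 16 (redisObject, assumed 64-bit layout) + 3 (sdshdr8: len, alloc,
 * flags) + len + 1 (null terminator). */
size_t embstr_alloc_size(size_t len) {
    assert(len <= SIZE_MAX - 20);
    return 16 + 3 + len + 1;
}
```

A 44-byte string therefore needs exactly 64 bytes, while a 45-byte string would need 65 and no longer fit the 64-byte allocation class mentioned in the comment.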
5. References
https://blog.csdn.net/zy450271923/article/details/106970148/