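Environment: Redis source version 5.0.3. I annotated the source while reading it; the git repository is https://gitee.com/xiaoangg/redis_annotation
Corrections are welcome.
Reference book: 《Redis 设计与实现》 (Redis Design and Implementation). Related reading:
Time events in Redis: https://blog.csdn.net/qq_16399991/article/details/107850466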
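Other articles in this series:
Redis source reading (1): SDS, simple dynamic strings
Redis source reading (2): linked lists
Redis source reading (3): the Redis hash table implementation
Redis source analysis (4): the Redis skip list implementation
Redis source analysis (5): the integer set implementation
Redis source analysis (6): ziplists
Redis source analysis (7): redisObject, part 2 (memory reclamation, sharing)
Redis source analysis (8): the database implementation
Redis source analysis (9): RDB persistence
Redis source analysis (10): AOF (append only file) persistence
Redis source analysis (11): events, part 1, file events
Redis source analysis (11): events, part 2, time events
Redis source analysis (12): the standalone database, clients
Redis source analysis (13): the standalone database, server, time events
Redis source analysis (13): the standalone database, server, Redis server initialization
Redis source analysis (14): the multi-machine database (1), differences between old and new replication and how they work
Redis source analysis (14): the multi-machine database (2), replication with SLAVEOF and PSYNC
Redis source analysis (15): the design and implementation of Sentinel
Redis source analysis (16): the design and implementation of Cluster
Redis source analysis (17): the publish/subscribe implementation
Redis source analysis (18): the transaction implementation
Redis source analysis (19): the sorting implementation
Redis source analysis (20): the bitmap implementation
Redis source analysis (21): the slow query log implementation
Redis source analysis (22): the monitor implementation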
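The serverCron time event
serverCron is currently the only time event in Redis; by default it runs every 100 milliseconds.
It manages the server's resources and keeps the server running well.
How serverCron gets registered:
1. server.c/main()
2. server.c/initServer()
3. initServer() calls aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL);
The rest of this post walks through what serverCron does.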
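I. Updating the server time cache
Many operations in the server need the current system time, and every lookup costs a system call. To cut down on system calls, the unixtime and mstime fields of the server state cache the current time.
Because serverCron runs every 100 ms by default, these two fields can lag the real time by roughly that interval.
They are therefore used only where time precision matters little: writing log entries, updating the server's LRU clock, deciding whether to run a persistence task, computing server uptime, and so on.
The source:
server.c/serverCron: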
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
//....
/* Update the time cache. */
updateCachedTime();
//.....
}
server.c/updateCachedTime:
/* We take a cached value of the unix time in the global state because with
* virtual memory and aging there is to store the current time in objects at
* every object access, and accuracy is not needed. To access a global var is
* a lot faster than calling time(NULL) */
void updateCachedTime(void) {
time_t unixtime = time(NULL);
atomicSet(server.unixtime,unixtime); // atomic store
server.mstime = mstime();
/* To get information about daylight saving time, we need to call localtime_r
* and cache the result. However calling localtime_r in this context is safe
* since we will never fork() while here, in the main thread. The logging
* function will call a thread safe version of localtime that has no locks. */
struct tm tm;
localtime_r(&server.unixtime,&tm);
server.daylight_active = tm.tm_isdst;
}
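To make the trade-off concrete, here is a minimal sketch of the same caching pattern outside Redis. The names time_cache, time_cache_update, time_cache_seconds, and time_cache_ms are made up for this illustration and are not Redis identifiers; the sketch assumes a POSIX system that provides gettimeofday.

```c
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical stand-in for the unixtime/mstime fields of the server state. */
struct time_cache {
    time_t unixtime;   /* cached seconds since the epoch */
    long long mstime;  /* cached milliseconds since the epoch */
};

static struct time_cache cache;

/* Refresh the cache; a periodic task (like serverCron) would call this. */
void time_cache_update(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    cache.unixtime = tv.tv_sec;
    cache.mstime = (long long)tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

/* Cheap reads: no system call, but the value may be slightly stale. */
time_t time_cache_seconds(void) { return cache.unixtime; }
long long time_cache_ms(void) { return cache.mstime; }
```

Callers that can tolerate a value up to one refresh interval old read the cached fields; anything that needs an exact timestamp should still ask the system directly.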
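II. Updating the LRU clock
The lruclock field of the server state holds the server's LRU clock. Like the unixtime and mstime fields above, it is a form of cached server time.
Every Redis object has an lru field that records when the object was last accessed:
server.h/struct redisObject:
// redisObject represents values of the five basic data types: string, hash, list, set, zset
typedef struct redisObject {
// 4-bit type field: the data type. Redis has five basic data types (string, hash, list, set, zset),
// so 2^4 = 16 values are more than enough
unsigned type:4;
// 4-bit encoding field: the physical encoding; one data type can have several encodings
unsigned encoding:4;
// lru records when the object was last accessed by a command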
unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
* LFU data (least significant 8 bits frequency
* and most significant 16 bits access time). */
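int refcount; // reference count of the object
void *ptr; // points to the underlying data structure
} robj;
When the server needs the idle time of a database key, it subtracts the object's lru field from the server's lruclock; the difference is the idle time.
Tip: the OBJECT IDLETIME command returns the idle time of a key.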
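As a rough illustration of that subtraction, the sketch below estimates idle time from two LRU clock values. It assumes a 24-bit clock with a resolution of 1000 ms per tick, which is my reading of the constants in server.h, and it handles a single wrap-around of the clock. The function name idle_time_ms is hypothetical, not the name Redis uses.

```c
#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1 << LRU_BITS) - 1)  /* largest value the lru field can hold */
#define LRU_CLOCK_RESOLUTION 1000            /* milliseconds per clock tick (assumed) */

/* Hypothetical helper: idle time in ms, given the server clock and an object's lru. */
unsigned long long idle_time_ms(unsigned long long lruclock,
                                unsigned long long obj_lru) {
    if (lruclock >= obj_lru) {
        return (lruclock - obj_lru) * LRU_CLOCK_RESOLUTION;
    }
    /* The clock wrapped around since the object was last touched. */
    return (lruclock + (LRU_CLOCK_MAX - obj_lru)) * LRU_CLOCK_RESOLUTION;
}
```

With a clock value of 10 and an object last touched at tick 4, the estimate is 6 ticks, i.e. 6000 ms. Because the field is only 24 bits wide, an object that stays idle longer than one full clock cycle cannot be told apart from a recently used one.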
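III. Sampling operation metrics
trackInstantaneousMetric runs every 100 ms and samples how many requests, how much network traffic, and so on the server handled since the previous sample.
It divides that amount by the elapsed milliseconds to get an average per-millisecond rate, then multiplies by 1000 to estimate the per-second rate.
The estimate is stored in the samples ring buffer of the inst_metric field of the server state.
When a client runs the INFO command, the server reads the sampled results from the inst_metric array (server.h).
The code:
server.h/inst_metric structure: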
struct redisServer {
//.....
/* The following two are used to track instantaneous metrics, like
* number of operations per second, network traffic. */
struct {
long long last_sample_time; /* Timestamp of last sample in ms */
long long last_sample_count;/* Count in last sample */
long long samples[STATS_METRIC_SAMPLES];
int idx;
} inst_metric[STATS_METRIC_COUNT];
//.....
}
server.c/serverCron:
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
///.....
// Track instantaneous metrics such as operations per second and network traffic
run_with_period(100) {
trackInstantaneousMetric(STATS_METRIC_COMMAND,server.stat_numcommands); // commands processed
trackInstantaneousMetric(STATS_METRIC_NET_INPUT,
server.stat_net_input_bytes); // network input bytes
trackInstantaneousMetric(STATS_METRIC_NET_OUTPUT,
server.stat_net_output_bytes); // network output bytes
}
}
server.c/trackInstantaneousMetric: storing a sample
/* Add a sample to the operations per second array of samples. */
void trackInstantaneousMetric(int metric, long long current_reading) {
long long t = mstime() - server.inst_metric[metric].last_sample_time; // time elapsed between the two samples, in ms
long long ops = current_reading -
server.inst_metric[metric].last_sample_count; // amount processed during that interval
long long ops_sec;
ops_sec = t > 0 ? (ops*1000/t) : 0; // scale to a per-second rate
// store it in the ring buffer
server.inst_metric[metric].samples[server.inst_metric[metric].idx] =
ops_sec;
server.inst_metric[metric].idx++;
server.inst_metric[metric].idx %= STATS_METRIC_SAMPLES;
server.inst_metric[metric].last_sample_time = mstime();
server.inst_metric[metric].last_sample_count = current_reading;
}
server.c/getInstantaneousMetric: reading the samples
/* Return the mean of all the samples. */
long long getInstantaneousMetric(int metric) {
int j;
long long sum = 0;
for (j = 0; j < STATS_METRIC_SAMPLES; j++)
sum += server.inst_metric[metric].samples[j];
return sum / STATS_METRIC_SAMPLES;
}
Tip: the sampled value is shown as instantaneous_ops_per_sec in the output of INFO stats.
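To see the arithmetic with concrete numbers, here is a standalone sketch of one metric slot. It is not the Redis code: the timestamp is passed in as a parameter so the example is deterministic, the ring size of 16 is assumed to mirror STATS_METRIC_SAMPLES, and the names metric, metric_track, and metric_mean are hypothetical.

```c
#define SAMPLES 16  /* ring size; assumed to mirror STATS_METRIC_SAMPLES */

/* Hypothetical standalone version of one inst_metric slot. */
struct metric {
    long long last_sample_time;   /* ms timestamp of the previous sample */
    long long last_sample_count;  /* counter value at the previous sample */
    long long samples[SAMPLES];   /* per-second estimates, oldest overwritten first */
    int idx;                      /* next slot to write */
};

/* Record one sample; now_ms is supplied by the caller. */
void metric_track(struct metric *m, long long now_ms, long long reading) {
    long long t = now_ms - m->last_sample_time;
    long long ops = reading - m->last_sample_count;
    m->samples[m->idx] = t > 0 ? ops * 1000 / t : 0;
    m->idx = (m->idx + 1) % SAMPLES;
    m->last_sample_time = now_ms;
    m->last_sample_count = reading;
}

/* Mean of all slots, including slots that have not been written yet. */
long long metric_mean(const struct metric *m) {
    long long sum = 0;
    for (int j = 0; j < SAMPLES; j++) sum += m->samples[j];
    return sum / SAMPLES;
}
```

If the counter grows by 50 every 100 ms, each sample works out to 50 * 1000 / 100 = 500 per second. Starting from a zeroed slot, the mean reads 250 after eight samples and reaches 500 only once all sixteen slots are filled, so the reported value lags behind sudden changes in load.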
IV. Updating the peak memory record
The stat_peak_memory field of the server state records the server's peak memory usage.
server.c/serverCron:
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
//......
/* Record the max memory used since the server was started. */
if (zmalloc_used_memory() > server.stat_peak_memory)
server.stat_peak_memory = zmalloc_used_memory();
//.....
}
Tip: the peak is shown as used_memory_peak in the output of INFO memory.
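V. Handling the SIGTERM signal
During initialization the server calls setupSignalHandlers to install its signal handlers.
The handler installed for SIGTERM is sigShutdownHandler.
sigShutdownHandler sets the shutdown_asap flag in the server state to 1.
Every time serverCron runs it checks shutdown_asap; if the flag is 1, it shuts the server down.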
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
//......
/* We received a SIGTERM, shutting down here in a safe way, as it is
* not ok doing so inside the signal handler. */
if (server.shutdown_asap) {
if (prepareForShutdown(SHUTDOWN_NOFLAGS) == C_OK) exit(0);
serverLog(LL_WARNING,"SIGTERM received but errors trying to shut down the server, check the logs for more information");
server.shutdown_asap = 0;
}
//......
}
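The pattern here, where the handler only sets a flag and the periodic task does the real work, can be sketched in isolation. This is a simplified illustration rather than the Redis implementation: it uses the ISO C signal() function for brevity, and the names shutdown_requested, install_sigterm_handler, and cron_should_shutdown are hypothetical.

```c
#include <signal.h>

/* Set from the signal handler, read from the main loop. */
static volatile sig_atomic_t shutdown_requested = 0;

/* The handler only records the request; real cleanup happens elsewhere. */
static void on_sigterm(int sig) {
    (void)sig;
    shutdown_requested = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_sigterm_handler(void) {
    return signal(SIGTERM, on_sigterm) == SIG_ERR ? -1 : 0;
}

/* Called from the periodic task: returns 1 when it is time to shut down. */
int cron_should_shutdown(void) {
    return shutdown_requested ? 1 : 0;
}
```

Keeping the handler this small matters because very few functions are safe to call from inside a signal handler; flushing files or freeing memory there could corrupt state the interrupted code was in the middle of changing.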
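VI. Managing client resources
Every run of serverCron calls clientsCron().
clientsCron does the following:
- Checks whether a client's connection has timed out (no interaction with the server for a long time); if so, the client is released.
- Checks whether a client's input buffer has grown past a certain limit; if so, the buffer is shrunk to release its unused space, so it does not hold on to too much memory.
- Tracks the clients that used the most memory over the last few seconds, so the INFO command can report this without an O(n) scan of the client list.
server.c/clientsCron():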
/* This function is called by serverCron() and is used in order to perform
* operations on clients that are important to perform constantly. For instance
* we use this function in order to disconnect clients after a timeout, including
* clients blocked in some blocking command with a non-zero timeout.
*
* The function makes some effort to process all the clients every second, even
* if this cannot be strictly guaranteed, since serverCron() may be called with
* an actual frequency lower than server.hz in case of latency events like slow
* commands.
*
* It is very important for this function, and the functions it calls, to be
* very fast: sometimes Redis has tens of hundreds of connected clients, and the
* default server.hz value is 10, so sometimes here we need to process thousands
* of clients per second, turning this function into a source of latency.
*/
#define CLIENTS_CRON_MIN_ITERATIONS 5
void clientsCron(void) {
/* Try to process at least numclients/server.hz of clients
* per call. Since normally (if there are no big latency events) this
* function is called server.hz times per second, in the average case we
* process all the clients in 1 second. */
int numclients = listLength(server.clients);
int iterations = numclients/server.hz;
mstime_t now = mstime();
/* Process at least a few clients while we are at it, even if we need
* to process less than CLIENTS_CRON_MIN_ITERATIONS to meet our contract
* of processing each client once per second. */
if (iterations < CLIENTS_CRON_MIN_ITERATIONS)
iterations = (numclients < CLIENTS_CRON_MIN_ITERATIONS) ?
numclients : CLIENTS_CRON_MIN_ITERATIONS;
while(listLength(server.clients) && iterations--) {
client *c;
listNode *head;
/* Rotate the list, take the current head, process.
* This way if the client must be removed from the list it's the
* first element and we don't incur into O(N) computation. */
listRotate(server.clients); // rotate: move the tail node to the head, so successive calls cycle through all clients
head = listFirst(server.clients);
c = listNodeValue(head);
/* The following functions do different service checks on the client.
* The protocol is that they return non-zero if the client was
* terminated. */
if (clientsCronHandleTimeout(c,now)) continue; // disconnect idle clients that timed out
if (clientsCronResizeQueryBuffer(c)) continue; // shrink an oversized query buffer
if (clientsCronTrackExpansiveClients(c)) continue; // track the clients using the most memory
}
}
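The per-call budget computed at the top of clientsCron is easy to check with a few numbers. The helper below isolates just that calculation; the function name clients_per_call is hypothetical, and the client count and hz are passed as plain parameters instead of being read from the server state.

```c
#define CLIENTS_CRON_MIN_ITERATIONS 5

/* Hypothetical helper: how many clients one clientsCron-style call examines. */
int clients_per_call(int numclients, int hz) {
    int iterations = numclients / hz;
    if (iterations < CLIENTS_CRON_MIN_ITERATIONS) {
        iterations = numclients < CLIENTS_CRON_MIN_ITERATIONS
                         ? numclients
                         : CLIENTS_CRON_MIN_ITERATIONS;
    }
    return iterations;
}
```

With hz = 10, a server holding 1000 clients examines 100 per call and covers everyone in about a second. With only 30 clients the floor of 5 applies, so the full list is covered in six calls, and with 3 clients every call simply visits all of them.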
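VII. Managing database resources
Every run of serverCron calls databasesCron, which handles the "background" operations that Redis databases need done incrementally,
such as key expiration, resizing, and rehashing.
server.c/databasesCron: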
/* This function handles 'background' operations we are required to do
* incrementally in Redis databases, such as active key expiring, resizing,
* rehashing. */
void databasesCron(void) {
// server.masterhost == NULL means this instance is a master
/* Expire keys by random sampling. Not required for slaves
* as master will synthesize DELs for us. */
if (server.active_expire_enabled && server.masterhost == NULL) {
activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);
} else if (server.masterhost != NULL) {
expireSlaveKeys();
}
/* Defrag keys gradually. */
if (server.active_defrag_enabled)
activeDefragCycle();
/* Perform hash tables rehashing if needed, but only if there are no
* other processes saving the DB on disk. Otherwise rehashing is bad
* as will cause a lot of copy-on-write of memory pages. */
if (server.rdb_child_pid == -1 && server.aof_child_pid == -1) {
/* We use global counters so if we stop the computation at a given
* DB we'll be able to start from the successive in the next
* cron loop iteration. */
static unsigned int resize_db = 0;
static unsigned int rehash_db = 0;
int dbs_per_call = CRON_DBS_PER_CALL;
int j;
/* Don't test more DBs than we have. */
if (dbs_per_call > server.dbnum) dbs_per_call = server.dbnum;
/* Resize */
for (j = 0; j < dbs_per_call; j++) {
tryResizeHashTables(resize_db % server.dbnum);
resize_db++;
}
/* Rehash */
if (server.activerehashing) {
for (j = 0; j < dbs_per_call; j++) {
int work_done = incrementallyRehash(rehash_db);
if (work_done) {
/* If the function did some work, stop here, we'll do
* more at the next cron loop. */
break;
} else {
/* If this db didn't need rehash, we'll try the next one. */
rehash_db++;
rehash_db %= server.dbnum;
}
}
}
}
}
VIII. Running a delayed BGREWRITEAOF
If a client sends BGREWRITEAOF while the server is running BGSAVE,
the server postpones BGREWRITEAOF until BGSAVE has finished.
Every run of serverCron checks whether a BGREWRITEAOF has been postponed;
if so, it calls rewriteAppendOnlyFileBackground() to perform it.
server.c/serverCron:
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
//.................
// server.aof_rewrite_scheduled records whether a BGREWRITEAOF has been postponed
/* Start a scheduled AOF rewrite if this was requested by the user while
* a BGSAVE was in progress. */
if (server.rdb_child_pid == -1 && server.aof_child_pid == -1 &&
server.aof_rewrite_scheduled)
{
rewriteAppendOnlyFileBackground();
}
//..................
}
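IX. Checking the state of persistence operations
The server state records the child process IDs of BGSAVE and BGREWRITEAOF in rdb_child_pid and aof_child_pid,
so these two fields tell whether a BGSAVE or BGREWRITEAOF is in progress.
If either value is not -1, the program calls wait3 (with WNOHANG, so the call does not block) to check whether a child process has terminated.
If one has, the child has finished its work and the server performs the follow-up steps, such as replacing the old RDB file with the new one.
/* Check if a background saving or AOF rewrite in progress terminated. */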
if (server.rdb_child_pid != -1 || server.aof_child_pid != -1 ||
ldbPendingChildren())
{
int statloc;
pid_t pid;
if ((pid = wait3(&statloc,WNOHANG,NULL)) != 0) { // non-blocking check for a terminated child process
int exitcode = WEXITSTATUS(statloc);
int bysignal = 0;
if (WIFSIGNALED(statloc)) bysignal = WTERMSIG(statloc);
if (pid == -1) {
serverLog(LL_WARNING,"wait3() returned an error: %s. "
"rdb_child_pid = %d, aof_child_pid = %d",
strerror(errno),
(int) server.rdb_child_pid,
(int) server.aof_child_pid);
} else if (pid == server.rdb_child_pid) { // BGSAVE finished: run the follow-up handler
backgroundSaveDoneHandler(exitcode,bysignal);
if (!bysignal && exitcode == 0) receiveChildInfo();
} else if (pid == server.aof_child_pid) { // BGREWRITEAOF finished: run the follow-up handler
backgroundRewriteDoneHandler(exitcode,bysignal);
if (!bysignal && exitcode == 0) receiveChildInfo();
} else {
if (!ldbRemoveChild(pid)) {
serverLog(LL_WARNING,
"Warning, detected child with unmatched pid: %ld",
(long)pid);
}
}
updateDictResizePolicy();
closeChildInfoPipe();
}
} else {
/* If there is not a background saving/rewrite in progress check if
* we have to save/rewrite now. */
for (j = 0; j < server.saveparamslen; j++) { // check each configured save condition
struct saveparam *sp = server.saveparams+j;
/* Save if we reached the given amount of changes,
* the given amount of seconds, and if the latest bgsave was
* successful or if, in case of an error, at least
* CONFIG_BGSAVE_RETRY_DELAY seconds already elapsed. */
if (server.dirty >= sp->changes &&
server.unixtime-server.lastsave > sp->seconds &&
(server.unixtime-server.lastbgsave_try >
CONFIG_BGSAVE_RETRY_DELAY ||
server.lastbgsave_status == C_OK))
{
serverLog(LL_NOTICE,"%d changes in %d seconds. Saving...",
sp->changes, (int)sp->seconds);
rdbSaveInfo rsi, *rsiptr;
rsiptr = rdbPopulateSaveInfo(&rsi);
rdbSaveBackground(server.rdb_filename,rsiptr);
break;
}
}
/* Trigger an AOF rewrite if needed. */
if (server.aof_state == AOF_ON &&
server.rdb_child_pid == -1 &&
server.aof_child_pid == -1 &&
server.aof_rewrite_perc &&
server.aof_current_size > server.aof_rewrite_min_size)
{
long long base = server.aof_rewrite_base_size ?
server.aof_rewrite_base_size : 1;
long long growth = (server.aof_current_size*100/base) - 100;
if (growth >= server.aof_rewrite_perc) {
serverLog(LL_NOTICE,"Starting automatic rewriting of AOF on %lld%% growth",growth);
rewriteAppendOnlyFileBackground();
}
}
}
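The two trigger conditions in the else branch reduce to simple arithmetic, which the sketch below isolates. It is a simplification under stated assumptions: the retry-delay check after a failed BGSAVE is left out, and save_rule, should_bgsave, and aof_growth_percent are hypothetical names rather than Redis identifiers.

```c
/* Hypothetical mirror of one "save <seconds> <changes>" rule. */
struct save_rule {
    long seconds;
    int changes;
};

/* Returns 1 if any rule is met: enough changes and strictly more time elapsed. */
int should_bgsave(const struct save_rule *rules, int nrules,
                  long long dirty, long elapsed_seconds) {
    for (int j = 0; j < nrules; j++) {
        if (dirty >= rules[j].changes && elapsed_seconds > rules[j].seconds)
            return 1;
    }
    return 0;
}

/* AOF growth in percent relative to the size after the last rewrite. */
long long aof_growth_percent(long long current_size, long long base_size) {
    long long base = base_size ? base_size : 1;  /* avoid dividing by zero */
    return current_size * 100 / base - 100;
}
```

With sample rules of 900 seconds / 1 change and 300 seconds / 10 changes, 10 changes after 301 seconds trigger a save, while 1 change after exactly 900 seconds does not, because the time comparison is strict. For the AOF side, a file that grew from 100 bytes to 200 bytes shows 100% growth, which would meet a rewrite threshold of 100.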