[Redis-6.0.8] Redis Persistence Strategies (Part 2)

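Contents

0. Reading references

1. Getting to know AOF

1.1 What AOF is, why it exists, and how to choose a persistence scheme

1.2 Some concepts

1.2.1 Writing the AOF file

1.2.2 AOF rewrite

1.2.3 What triggers AOF file writes

1.2.4 Trigger conditions for AOF rewrite

1.2.5 AOF-related configuration

1.2.6 AOF command synchronization

1.2.7 Executing the bgrewriteaof command

2. Reading the source

2.1 AOF-related code in server.c and server.h

2.2 AOF-related code on the command call path

2.3 Implementation of catAppendOnlyGenericCommand (commands that do not set key expiration)

2.4 Implementation of catAppendOnlyExpireAtCommand (commands that set key expiration)

2.5 Calls to and implementation of flushAppendOnlyFile

2.5.1 The three call sites of flushAppendOnlyFile

2.5.2 Implementation of flushAppendOnlyFile

2.6 aofRewriteBufferAppend: the parent process appends data for the child process

2.7 bgrewriteaofCommand & rewriteAppendOnlyFileBackground

2.8 Two places in serverCron that can trigger rewriteAppendOnlyFileBackground

2.9 startAppendOnly can trigger rewriteAppendOnlyFileBackground

2.10 rewriteAppendOnlyFile, the key sub-step of rewriteAppendOnlyFileBackground

2.11 rewriteAppendOnlyFileBackground in detail

2.11.1 aofCreatePipes: creating the pipes between parent and child

2.11.2 openChildInfoPipe: opening the child info pipe

2.11.3 Diagram of the pipes from 2.11.1 and 2.11.2

2.11.4 redisFork(): creating the child process

2.11.5 Child process logic

2.11.6 Parent process logic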


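0. Reading references

"Redis 5 Design and Source Code Analysis" (《Redis5 设计与源码分析》), AOF chapter

"Redis Design and Implementation" (《Redis设计与实现》) by Huang Jianhong (黄健宏), AOF chapter

menwen's AOF article

商汤小彭: AOF persistence

历小冰's AOF article (describes the three places that trigger a rewrite)

Reading 1

Lazy deletion operations

In my opinion, the second-best Redis AOF write-up on the web

Redis 4.0: optimizing aofrewrite with pipes

Usage of wait3 and wait4

Process control: the functions wait, waitpid, waitid, wait3 and wait4

Muten: Redis configuration file and debugging

Muten: the wait family of functions

"Advanced Programming in the UNIX Environment (3rd Edition)", Section 15.2, Interprocess Communication: Pipes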

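1. Getting to know AOF

1.1 What AOF is, why it exists, and how to choose a persistence scheme

AOF is Redis's other persistence mechanism. Put simply, AOF saves every write command the Redis server executes to a file, so that after a restart Redis can replay those commands in order and return to its previous state.

So why is AOF needed when RDB already exists?
Consider how each one is implemented. RDB saves a snapshot at a point in time, so if Redis fails, everything written between the last RDB save and the failure is lost. With a large data set and high QPS, each RDB save takes longer, and more data is lost when a failure occurs. AOF saves individual commands, so in theory a failure loses only a single command. Because file writes are expensive at the operating-system level, Redis provides configuration parameters that let us pick a policy trading off safety against performance.


Since AOF offers better data safety, can we use AOF alone? The official recommendation is actually to use both persistence methods together. Why does Redis recommend enabling RDB and AOF at the same time?


Let us compare how the two are loaded.
Loading RDB only requires reading the data into memory and building the corresponding data structures (some structures, such as intset and ziplist, are saved directly as strings, so they load even faster). Loading an AOF file requires first creating a fake client and then sending the commands one by one to the Redis server, which executes each of them in full again. According to tests by the Redis author, RDB loads a 1GB file in 10s to 20s, and AOF loads at about half that speed (faster if the AOF has been rewritten).

Because AOF and RDB each have strengths and weaknesses, Redis deployments commonly enable both.

But if both RDB and AOF are configured in production, there is a dilemma at restart: loading RDB first is faster but the data is less complete; loading AOF first is slower but the data is more complete than what RDB holds.

How should we deploy and load them? Let us look for the answer in this article.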

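1.2 Some concepts

1.2.1 Writing the AOF file

AOF persistence ultimately has to write the buffer contents to a file. The write goes through the write function provided by the operating system. After write returns, the data sits in a kernel buffer, and a call to fsync is then needed to flush it to disk.
fsync is a blocking, slow operation, so Redis uses the appendfsync setting to control how often fsync runs.
❏ no: never call fsync; the operating system decides when to flush. Lowest data safety, highest Redis performance.
❏ always: call fsync after every write. Highest data safety, but Redis performance drops.
❏ everysec: call fsync once per second. A compromise that balances data safety and performance.
Production environments usually set appendfsync everysec, that is, one fsync per second.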
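
The write-then-fsync sequence and the three policies can be sketched in a few lines of C. This is a simplified illustration, not the Redis implementation: the names `fsync_policy`, `open_append_only` and `append_with_policy` are invented for this sketch, and the everysec branch calls fsync synchronously, whereas Redis hands it to a background thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy names for this sketch (Redis uses AOF_FSYNC_* constants). */
typedef enum { POLICY_NO, POLICY_ALWAYS, POLICY_EVERYSEC } fsync_policy;

/* Open a file for appending, creating it if needed. */
int open_append_only(const char *path) {
    assert(path != NULL);
    return open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
}

/* Append len bytes to fd, then fsync according to the policy.
 * last_sync holds the second at which fsync last ran.
 * Returns 1 if fsync was called, 0 if it was skipped, -1 on error. */
int append_with_policy(int fd, const char *buf, size_t len,
                       fsync_policy policy, long now, long *last_sync) {
    assert(buf != NULL && last_sync != NULL);
    /* After write() succeeds the data is only in the kernel buffer. */
    if (write(fd, buf, len) != (ssize_t)len) return -1;
    if (policy == POLICY_ALWAYS ||
        (policy == POLICY_EVERYSEC && now > *last_sync)) {
        if (fsync(fd) == -1) return -1;   /* force the data to disk */
        *last_sync = now;
        return 1;
    }
    return 0;   /* POLICY_NO, or everysec within the same second */
}
```

With `POLICY_EVERYSEC`, two appends within the same second produce one fsync at most, which is the trade-off the list above describes.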

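1.2.2 AOF rewrite

As the Redis server runs, the AOF file keeps growing, and under heavy write traffic a single key may have hundreds or thousands of commands recorded for it. AOF rewrite is performed by a forked child process. The rewrite neither modifies nor reads the existing file: the child generates one corresponding command for every key in every database, then replays the commands the parent continued to execute after the rewrite started, producing a new AOF file.

For example, suppose a client runs:
rpush list 1 2 3
rpush list 4
rpush list 5
lpop list 
These four commands are equivalent to:
rpush list 2 3 4 5

AOF rewrite simply writes the current contents of list as "rpush list 2 3 4 5".

Four commands become one, which both shrinks the file and speeds up loading.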
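
The idea that a rewrite serializes current state rather than history can be illustrated with a toy model. The sketch below is not Redis code: `toy_list`, `toy_rpush`, `toy_lpop` and `toy_rewrite` are invented names, the list holds plain ints, and the output is the human-readable command rather than the on-disk protocol format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ITEMS 16

/* Toy in-memory list of ints standing in for a Redis list. */
typedef struct { int items[MAX_ITEMS]; int head, len; } toy_list;

void toy_rpush(toy_list *l, int v) {
    assert(l->head + l->len < MAX_ITEMS);
    l->items[l->head + l->len] = v;
    l->len++;
}

int toy_lpop(toy_list *l) {
    assert(l->len > 0);
    int v = l->items[l->head];
    l->head++;
    l->len--;
    return v;
}

/* Emit a single rpush that rebuilds the list's current contents.
 * Returns the command length, or -1 if out is too small. */
int toy_rewrite(const toy_list *l, const char *key, char *out, size_t outlen) {
    int n = snprintf(out, outlen, "rpush %s", key);
    for (int i = 0; i < l->len; i++) {
        if (n < 0 || (size_t)n >= outlen) return -1;
        n += snprintf(out + n, outlen - (size_t)n, " %d", l->items[l->head + i]);
    }
    return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}
```

Applying the four commands from the example to a `toy_list` and then calling `toy_rewrite` yields the single command `rpush list 2 3 4 5`, regardless of how many commands produced that state.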

1.2.3 What triggers AOF file writes

The appendonly option in the configuration file decides whether AOF is enabled: yes turns it on, anything else leaves it off.

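1.2.4 Trigger conditions for AOF rewrite

1. AOF rewrite can be triggered in two ways:
(1) automatically, through configuration;
(2) explicitly, by running the bgrewriteaof command.

2. Trigger points (all of them end up in rewriteAppendOnlyFileBackground):
(1) The bgrewriteaof command is invoked manually. If an AOF rewrite child is already running, the command returns an error; if some other child process (for example an RDB save) is running, the rewrite is scheduled and runs later; otherwise a rewrite starts immediately.
(2) AOF is switched on at runtime through a configuration command. If no RDB child process is running, a rewrite is triggered to write the current database contents into the rewrite file (see the startAppendOnly function).
(3) In the Redis timer (serverCron), a rewrite starts if one has been scheduled and no RDB or rewrite child is running, or if the AOF file size has reached the configured rewrite thresholds.


3. Configuration and conditions for automatic triggering:
3.1 Configuration
Two parameters in redis.conf determine when a rewrite is triggered:
auto-aof-rewrite-percentage 100: the percentage by which the current AOF size (aof_current_size) must have grown relative to the AOF size after the last rewrite (aof_base_size).
auto-aof-rewrite-min-size 64mb: the minimum AOF file size at which a rewrite is allowed to run.
3.2 Conditions for automatic triggering
An automatic rewrite fires when both of the following hold (men_wen):
(1) aof_current_size > auto-aof-rewrite-min-size;
(2) (aof_current_size - aof_base_size) / aof_base_size * 100 >= auto-aof-rewrite-percentage.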
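
The two conditions can be checked with a small helper. The sketch below mirrors the arithmetic in the serverCron excerpt of section 2.8 (integer growth in percent, with a base of 1 when the base size is 0); the function name `should_auto_rewrite` and its parameter names are invented for illustration.

```c
#include <assert.h>

/* Returns 1 if an automatic AOF rewrite should start, 0 otherwise.
 * rewrite_perc corresponds to auto-aof-rewrite-percentage,
 * min_size to auto-aof-rewrite-min-size (in bytes). */
int should_auto_rewrite(long long current_size, long long base_size,
                        long long min_size, int rewrite_perc) {
    assert(current_size >= 0 && base_size >= 0);
    if (rewrite_perc == 0) return 0;           /* a percentage of 0 disables it */
    if (current_size <= min_size) return 0;    /* condition (1) not met */
    long long base = base_size ? base_size : 1;
    long long growth = (current_size * 100 / base) - 100;
    return growth >= rewrite_perc;             /* condition (2) */
}
```

With the default settings (100 and 64mb), an AOF that was 100 MB after the last rewrite triggers a new rewrite once it reaches 200 MB, while a file that grows from 1 MB to 10 MB does not, because it is still under the minimum size.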


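1.2.5 AOF-related configuration

(1)appendonly no                      
(2)appendfilename "appendonly.aof"
(3)appendfsync everysec|always|no
(4)no-appendfsync-on-rewrite no
(5)auto-aof-rewrite-percentage 100
(6)auto-aof-rewrite-min-size 64mb 
(7)aof-load-truncated yes
(8)aof-use-rdb-preamble yes
(9)aof-rewrite-incremental-fsync yes
(10)dir (directory where the AOF and RDB files are stored)

1.2.5.1 appendonly

Whether AOF is enabled (default no). yes enables AOF; no disables it.

1.2.5.2 appendfilename 

Name of the AOF file (default appendonly.aof).

1.2.5.3 appendfsync 

How often fsync runs (default everysec). The options are no, always and everysec.

❏ no: never call fsync; the operating system decides when to flush. Lowest data safety, highest Redis performance.

❏ always: call fsync after every write. Highest data safety, but Redis performance drops.

❏ everysec: call fsync once per second. A compromise that balances data safety and performance.

1.2.5.4 no-appendfsync-on-rewrite

When this option is enabled and a background RDB snapshot or AOF rewrite is in progress, the main process skips fsync (even if appendfsync is set to always or everysec).

1.2.5.5 auto-aof-rewrite-percentage

First condition for automatic rewrite: the percentage by which the current AOF size (aof_current_size) must have grown relative to the AOF size after the last rewrite (aof_base_size).

1.2.5.6 auto-aof-rewrite-min-size

Second condition for automatic rewrite: the minimum AOF file size at which a rewrite is allowed to run.

1.2.5.7 aof-load-truncated

The AOF file is produced by appending to a log, so if the server fails, the command at the tail may be incomplete. With this option on (default yes), Redis truncates the incomplete tail, continues loading, and writes a notice to the log. With it off, loading the AOF file prints an error and Redis exits.

1.2.5.8 aof-use-rdb-preamble

Whether hybrid persistence is enabled (default yes).

1.2.5.9 aof-rewrite-incremental-fsync

When this option is enabled, the AOF rewrite calls fsync once for every 32MB of data it produces.

1.2.5.10 dir

Directory where the AOF and RDB files are stored.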

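1.2.6 AOF command synchronization

Every command execution goes through the call function, and AOF command synchronization is implemented inside call. If AOF is enabled, each command is appended to aof_buf after it executes. aof_buf is a global SDS buffer defined in struct redisServer.


In what format are commands written to the buffer?
Redis uses the catAppendOnlyGenericCommand function to convert a command into the form stored in the buffer. We can set a breakpoint in that function and print the converted result.

On my machine (the key point is that appendonly must be set to yes in the configuration file), first start the server under gdb, then issue a command from redis-cli in a second terminal: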
cd /home/muten/module/redis-6.0.8/redis-6.0.8/src
gdb ./redis-server
(gdb) set args ../redis.conf
(gdb) b aof.c:542
(gdb) r
(gdb) c
(gdb) c
(gdb) p dst

cd /home/muten/module/redis-6.0.8/redis-6.0.8/src
./redis-cli
127.0.0.1:6379> set name muten

The result of my test is as follows, but it is preceded by a run of characters I do not recognize. Why is that?
At the moment aof-use-rdb-preamble is set to yes.

feedAppendOnlyFile

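1.2.7 Executing the bgrewriteaof command

When a client sends bgrewriteaof, the server calls bgrewriteaofCommand, which creates pipes (their purpose is described later) and forks. The child process calls rewriteAppendOnlyFile to perform the AOF rewrite, while the parent records some statistics and returns to the main loop to keep serving client requests. When the child finishes, the parent runs a callback to do the follow-up work.

We know that RDB saves a point-in-time snapshot, whereas AOF can lose as little as one command on failure. While the child in Figure 20-15 performs the rewrite, thousands of commands may continue to execute in the parent. How do we make sure the rewritten file also contains them? Clearly, the parent must first save the commands executed during the rewrite, and those commands must then be replayed into the rewritten file. To keep the main process blocked for as short a time as possible, Redis sends the commands accumulated by the parent to the child in batches through a pipe, and the child replays them once its rewrite is done. As a result, only a small number of commands are still accumulated in the parent when the child exits, and the parent only has to replay those.

Next is the structure the parent uses to accumulate commands during a rewrite. In Figure 20-13, if the server executes a command while an AOF rewrite is running, the command is also appended to aof_rewrite_buf_blocks. This is a list-type buffer in which each node holds one aofrwblock, defined as follows: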

#define AOF_RW_BUF_BLOCK_SIZE (1024*1024*10)    /* 10 MB per block */

typedef struct aofrwblock {
    unsigned long used, free;
    char buf[AOF_RW_BUF_BLOCK_SIZE];
} aofrwblock;

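Each block holds 10MB of buffer content and records how much of that space is used and how much is free. When a block fills up, a new block is allocated to keep storing the commands that follow.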
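
To make the block-filling behaviour concrete, here is a scaled-down sketch of the same idea. It assumes a hand-rolled singly linked list instead of the Redis list type, uses malloc instead of zmalloc, and shrinks the block size to 10 bytes so the effect is easy to see; `rwblock`, `rwbuf`, `rwbuf_append`, `rwbuf_size` and `rwbuf_free` are names invented for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 10   /* tiny on purpose; Redis uses 10 MB blocks */

typedef struct rwblock {
    unsigned long used, free;
    char buf[SKETCH_BLOCK_SIZE];
    struct rwblock *next;
} rwblock;

typedef struct { rwblock *head, *tail; unsigned long nblocks; } rwbuf;

/* Append len bytes, filling the last block first and allocating
 * new blocks only when the current one is full. */
void rwbuf_append(rwbuf *rb, const char *s, unsigned long len) {
    assert(rb != NULL && s != NULL);
    rwblock *block = rb->tail;
    while (len) {
        if (block) {
            unsigned long thislen = block->free < len ? block->free : len;
            if (thislen) {   /* the current block still has room */
                memcpy(block->buf + block->used, s, thislen);
                block->used += thislen;
                block->free -= thislen;
                s += thislen;
                len -= thislen;
            }
        }
        if (len) {           /* first block, or the last one is full */
            block = malloc(sizeof(*block));
            assert(block != NULL);
            block->used = 0;
            block->free = SKETCH_BLOCK_SIZE;
            block->next = NULL;
            if (rb->tail) rb->tail->next = block; else rb->head = block;
            rb->tail = block;
            rb->nblocks++;
        }
    }
}

/* Total number of bytes stored across all blocks. */
unsigned long rwbuf_size(const rwbuf *rb) {
    unsigned long total = 0;
    for (const rwblock *b = rb->head; b; b = b->next) total += b->used;
    return total;
}

void rwbuf_free(rwbuf *rb) {
    rwblock *b = rb->head;
    while (b) { rwblock *next = b->next; free(b); b = next; }
    rb->head = rb->tail = NULL;
    rb->nblocks = 0;
}
```

Appending 25 bytes to an empty `rwbuf` allocates three blocks, the last one half full; a further 5 bytes fit into that last block without any new allocation.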

2. Reading the source

2.1 AOF-related code in server.c and server.h

/* Command call flags, see call() function */
#define CMD_CALL_NONE 0
#define CMD_CALL_SLOWLOG (1<<0)
#define CMD_CALL_STATS (1<<1)
#define CMD_CALL_PROPAGATE_AOF (1<<2)
#define CMD_CALL_PROPAGATE_REPL (1<<3)
#define CMD_CALL_PROPAGATE (CMD_CALL_PROPAGATE_AOF|CMD_CALL_PROPAGATE_REPL)
#define CMD_CALL_FULL (CMD_CALL_SLOWLOG | CMD_CALL_STATS | CMD_CALL_PROPAGATE)
#define CMD_CALL_NOWRAP (1<<4)  /* Don't wrap also propagate array into
                                   MULTI/EXEC: the caller will handle it.  */

struct redisServer{
   ...
   /* AOF persistence */
    int aof_enabled;                /* AOF configuration */
    int aof_state;                  /* AOF_(ON|OFF|WAIT_REWRITE) */
    int aof_fsync;                  /* Kind of fsync() policy */
    char *aof_filename;             /* Name of the AOF file */
    int aof_no_fsync_on_rewrite;    /* Don't fsync if a rewrite is in prog. */
    int aof_rewrite_perc;           /* Rewrite AOF if % growth is > M and... */
    off_t aof_rewrite_min_size;     /* the AOF file is at least N bytes. */
    off_t aof_rewrite_base_size;    /* AOF size on latest startup or rewrite. */
    off_t aof_current_size;         /* AOF current size. */
    off_t aof_fsync_offset;         /* AOF offset which is already synced to disk. */
    int aof_flush_sleep;            /* Micros to sleep before flush. (used by tests) */
    int aof_rewrite_scheduled;      /* Rewrite once BGSAVE terminates. */
    pid_t aof_child_pid;            /* PID if rewriting process */
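    /*
      If the server executes a command while an AOF rewrite is in progress,
      the command is also written to aof_rewrite_buf_blocks, which can be
      thought of as the rewrite buffer.
    */
    list *aof_rewrite_buf_blocks;   /* Hold changes during an AOF rewrite. */
    /* AOF buffer */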
    sds aof_buf;      /* AOF buffer, written before entering the event loop */
    int aof_fd;       /* File descriptor of currently selected AOF file */
    int aof_selected_db; /* Currently selected DB in AOF */
    time_t aof_flush_postponed_start; /* UNIX time of postponed AOF flush */
    time_t aof_last_fsync;            /* UNIX time of last fsync() */
    time_t aof_rewrite_time_last;   /* Time used by last AOF rewrite run. */
    time_t aof_rewrite_time_start;  /* Current AOF rewrite start time. */
    int aof_lastbgrewrite_status;   /* C_OK or C_ERR */
    unsigned long aof_delayed_fsync;  /* delayed AOF fsync() counter */
    int aof_rewrite_incremental_fsync;/* fsync incrementally while aof rewriting? */
    int rdb_save_incremental_fsync;   /* fsync incrementally while rdb saving? */
    int aof_last_write_status;      /* C_OK or C_ERR */
    int aof_last_write_errno;       /* Valid if aof_last_write_status is ERR */
    int aof_load_truncated;         /* Don't stop on unexpected AOF EOF. */
    int aof_use_rdb_preamble;       /* Use RDB preamble on AOF rewrites. */
    /* AOF pipes used to communicate between parent and child during rewrite. */
    int aof_pipe_write_data_to_child;
    int aof_pipe_read_data_from_parent;
    int aof_pipe_write_ack_to_parent;
    int aof_pipe_read_ack_from_child;
    int aof_pipe_write_ack_to_child;
    int aof_pipe_read_ack_from_parent;
    int aof_stop_sending_diff;     /* If true stop sending accumulated diffs
                                      to child process. */
    sds aof_child_diff;             /* AOF diff accumulator child side. */
    ...
}

2.2 AOF-related code on the command call path

In the configuration table (config.c):
standardConfig configs[] = {
...
createBoolConfig("appendonly", NULL, MODIFIABLE_CONFIG, server.aof_enabled, 0, NULL, updateAppendonly),
...
}


void initServer(void) {  
...
server.aof_state = server.aof_enabled ? AOF_ON : AOF_OFF;  
...
}


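Question: appendonly is set in the configuration file. How does it end up triggering appendfsync,
      and how is it connected to call?
Answer:
      The question misunderstands what appendonly does. appendonly does not trigger any command; it only
      controls whether commands are written to the buffer. If appendonly is off, nothing is written to the
      server's aof_buf when a client triggers call. Once appendonly is on, every write command is
      serialized into aof_buf in a fixed format.

Question: what does the flow look like in detail?
Answer:
      Every client command executes through call. When the command needs to be propagated, call invokes
      propagate; propagate calls feedAppendOnlyFile when AOF is enabled (aof_state is not AOF_OFF);
      and feedAppendOnlyFile in turn calls catAppendOnlyGenericCommand.

Every command execution goes through the call function, and AOF command synchronization is implemented inside call.
Here is the AOF-related part of call: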

void call(client *c, int flags) {
    ...
    if (flags & CMD_CALL_PROPAGATE &&
        (c->flags & CLIENT_PREVENT_PROP) != CLIENT_PREVENT_PROP)
    {
        int propagate_flags = PROPAGATE_NONE;

        /* Check if the command operated changes in the data set. If so
         * set for replication / AOF propagation. */
        if (dirty) propagate_flags |= (PROPAGATE_AOF|PROPAGATE_REPL);

        /* If the client forced AOF / replication of the command, set
         * the flags regardless of the command effects on the data set. */
        if (c->flags & CLIENT_FORCE_REPL) propagate_flags |= PROPAGATE_REPL;
        if (c->flags & CLIENT_FORCE_AOF) propagate_flags |= PROPAGATE_AOF;

        /* However prevent AOF / replication propagation if the command
         * implementations called preventCommandPropagation() or similar,
         * or if we don't have the call() flags to do so. */
        if (c->flags & CLIENT_PREVENT_REPL_PROP ||
            !(flags & CMD_CALL_PROPAGATE_REPL))
                propagate_flags &= ~PROPAGATE_REPL;
        if (c->flags & CLIENT_PREVENT_AOF_PROP ||
            !(flags & CMD_CALL_PROPAGATE_AOF))
                propagate_flags &= ~PROPAGATE_AOF;

        /* Call propagate() only if at least one of AOF / replication
         * propagation is needed. Note that modules commands handle replication
         * in an explicit way, so we never replicate them automatically. */
        if (propagate_flags != PROPAGATE_NONE && !(c->cmd->flags & CMD_MODULE))
            /* Pass the command and its full arguments on toward the AOF buffer */
            propagate(c->cmd,c->db->id,c->argv,c->argc,propagate_flags);
    }
    ...
}



void propagate(struct redisCommand *cmd, int dbid, robj **argv, int argc,
               int flags)
{
    if (server.aof_state != AOF_OFF && flags & PROPAGATE_AOF)
        feedAppendOnlyFile(cmd,dbid,argv,argc);
    if (flags & PROPAGATE_REPL)
        replicationFeedSlaves(server.slaves,dbid,argv,argc);
}
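
The flag handling in call can be isolated into a pure function to see which combinations end up reaching the AOF. The sketch below follows the order of the checks in the excerpt above; the `SK_*` constants and `sketch_propagate_flags` are invented for this example and their numeric values are assumptions, not the values defined in server.h.

```c
#include <assert.h>

/* Hypothetical flag values for this sketch only. */
enum { SK_PROPAGATE_NONE = 0, SK_PROPAGATE_AOF = 1, SK_PROPAGATE_REPL = 2 };
enum { SK_CALL_PROPAGATE_AOF = 1 << 2, SK_CALL_PROPAGATE_REPL = 1 << 3 };
enum {
    SK_CLIENT_FORCE_AOF = 1 << 0,
    SK_CLIENT_FORCE_REPL = 1 << 1,
    SK_CLIENT_PREVENT_AOF_PROP = 1 << 2,
    SK_CLIENT_PREVENT_REPL_PROP = 1 << 3
};

/* Compute the propagation targets for one executed command.
 * dirty is non-zero if the command changed the data set. */
int sketch_propagate_flags(int dirty, int client_flags, int call_flags) {
    int pf = SK_PROPAGATE_NONE;

    /* A command that modified data goes to both AOF and replicas. */
    if (dirty) pf |= (SK_PROPAGATE_AOF | SK_PROPAGATE_REPL);

    /* Forced propagation applies regardless of the command's effect. */
    if (client_flags & SK_CLIENT_FORCE_REPL) pf |= SK_PROPAGATE_REPL;
    if (client_flags & SK_CLIENT_FORCE_AOF) pf |= SK_PROPAGATE_AOF;

    /* Prevention, or missing call() flags, wins over everything above. */
    if ((client_flags & SK_CLIENT_PREVENT_REPL_PROP) ||
        !(call_flags & SK_CALL_PROPAGATE_REPL))
        pf &= ~SK_PROPAGATE_REPL;
    if ((client_flags & SK_CLIENT_PREVENT_AOF_PROP) ||
        !(call_flags & SK_CALL_PROPAGATE_AOF))
        pf &= ~SK_PROPAGATE_AOF;

    return pf;
}
```

A read-only command (dirty is 0, no force flags) yields `SK_PROPAGATE_NONE`, so propagate is never called for it and nothing is added to aof_buf.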

/* Append the command to the AOF buffer (abridged; most branches are elided) */
void feedAppendOnlyFile(struct redisCommand *cmd, int dictid, robj **argv, int argc) {
    ...
    /* Commands that carry a relative expiration are translated to PEXPIREAT */
    if (exarg)
        buf = catAppendOnlyExpireAtCommand(buf,server.expireCommand,argv[1],exarg);
    if (pxarg)
        buf = catAppendOnlyExpireAtCommand(buf,server.pexpireCommand,argv[1],pxarg);
    ...
    /* Commands without expiration information are serialized unchanged */
    buf = catAppendOnlyGenericCommand(buf,argc,argv);
    ...
    if (server.aof_state == AOF_ON)
        server.aof_buf = sdscatlen(server.aof_buf,buf,sdslen(buf));

    /* If a background append only file rewriting is in progress we want to
     * accumulate the differences between the child DB and the current one
     * in a buffer, so that when the child process will do its work we
     * can append the differences to the new append only file. */
    if (server.aof_child_pid != -1)
        aofRewriteBufferAppend((unsigned char*)buf,sdslen(buf));
    ...
}

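2.3 Implementation of catAppendOnlyGenericCommand (commands that do not set key expiration)

Note: catAppendOnlyGenericCommand only serializes a command as-is. Commands such as expire and pexpire
      must preserve the expiration information, so they go through catAppendOnlyExpireAtCommand instead.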

sds catAppendOnlyGenericCommand(sds dst, int argc, robj **argv) {
    char buf[32];
    int len, j;
    robj *o;

    buf[0] = '*';
    len = 1+ll2string(buf+1,sizeof(buf)-1,argc);
    buf[len++] = '\r';
    buf[len++] = '\n';
    dst = sdscatlen(dst,buf,len);

    for (j = 0; j < argc; j++) {
        o = getDecodedObject(argv[j]);
        buf[0] = '$';
        len = 1+ll2string(buf+1,sizeof(buf)-1,sdslen(o->ptr));
        buf[len++] = '\r';
        buf[len++] = '\n';
        dst = sdscatlen(dst,buf,len);
        dst = sdscatlen(dst,o->ptr,sdslen(o->ptr));
        dst = sdscatlen(dst,"\r\n",2);
        decrRefCount(o);
    }
    return dst;
}
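
The output format is easier to see with a stand-alone encoder. The sketch below produces the same layout (`*<argc>`, then `$<len>` and the bytes for each argument, every part terminated by CRLF) but is a simplification: it works on NUL-terminated C strings and a fixed output buffer, so unlike the sds-based original it is not binary-safe. The name `resp_encode` is invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode argv as a multi-bulk command into out.
 * Returns the number of bytes written, or -1 if out is too small. */
int resp_encode(char *out, size_t outlen, int argc, const char **argv) {
    assert(out != NULL && argv != NULL && argc >= 0);
    size_t pos = 0;

    /* Header: '*' followed by the argument count. */
    int n = snprintf(out, outlen, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= outlen) return -1;
    pos = (size_t)n;

    /* Each argument: '$' + byte length, CRLF, the bytes, CRLF. */
    for (int j = 0; j < argc; j++) {
        n = snprintf(out + pos, outlen - pos, "$%zu\r\n%s\r\n",
                     strlen(argv[j]), argv[j]);
        if (n < 0 || (size_t)n >= outlen - pos) return -1;
        pos += (size_t)n;
    }
    return (int)pos;
}
```

For the command `set name muten` used in the gdb session earlier, the encoder produces `*3\r\n$3\r\nset\r\n$4\r\nname\r\n$5\r\nmuten\r\n`.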

2.4 Implementation of catAppendOnlyExpireAtCommand (commands that set key expiration)

sds catAppendOnlyExpireAtCommand(sds buf, struct redisCommand *cmd, robj *key, robj *seconds) {
    long long when;
    robj *argv[3];

    /* Make sure we can use strtoll */
    seconds = getDecodedObject(seconds);
    when = strtoll(seconds->ptr,NULL,10);
    /* Convert argument into milliseconds for EXPIRE, SETEX, EXPIREAT */
    if (cmd->proc == expireCommand || cmd->proc == setexCommand ||
        cmd->proc == expireatCommand)
    {
        when *= 1000;
    }
    /* Convert into absolute time for EXPIRE, PEXPIRE, SETEX, PSETEX */
    if (cmd->proc == expireCommand || cmd->proc == pexpireCommand ||
        cmd->proc == setexCommand || cmd->proc == psetexCommand)
    {
        when += mstime();
    }
    decrRefCount(seconds);

    argv[0] = createStringObject("PEXPIREAT",9);
    argv[1] = key;
    argv[2] = createStringObjectFromLongLong(when);
    buf = catAppendOnlyGenericCommand(buf, 3, argv);
    decrRefCount(argv[0]);
    decrRefCount(argv[2]);
    return buf;
}
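
The two conversions in this function (seconds to milliseconds, relative to absolute) can be tested in isolation. Storing an absolute PEXPIREAT timestamp means that replaying the AOF later does not restart the key's time to live. The sketch below takes the current time as a parameter instead of calling mstime(); the `sk_cmd` enum and `to_pexpireat_ms` are names invented for this example.

```c
#include <assert.h>

/* Hypothetical command tags standing in for the cmd->proc comparisons. */
typedef enum { SK_EXPIRE, SK_PEXPIRE, SK_EXPIREAT, SK_SETEX, SK_PSETEX } sk_cmd;

/* Convert a command's time argument to an absolute Unix time in
 * milliseconds, which is what PEXPIREAT expects. */
long long to_pexpireat_ms(sk_cmd cmd, long long when, long long now_ms) {
    assert(now_ms >= 0);
    /* EXPIRE, SETEX and EXPIREAT take seconds: scale to milliseconds. */
    if (cmd == SK_EXPIRE || cmd == SK_SETEX || cmd == SK_EXPIREAT)
        when *= 1000;
    /* EXPIRE, PEXPIRE, SETEX and PSETEX are relative: add the current time. */
    if (cmd == SK_EXPIRE || cmd == SK_PEXPIRE ||
        cmd == SK_SETEX || cmd == SK_PSETEX)
        when += now_ms;
    return when;
}
```

For instance, with a current time of 1000000 ms, `EXPIRE key 10` is logged as `PEXPIREAT key 1010000`, while `EXPIREAT key 2000` is only scaled and becomes `PEXPIREAT key 2000000`.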

2.5 Calls to and implementation of flushAppendOnlyFile

2.5.1 The three call sites of flushAppendOnlyFile


First call site (force = 0, no forced write):
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    ....
   /* AOF postponed flush: Try at every cron cycle if the slow fsync
     * completed. */
    if (server.aof_flush_postponed_start) flushAppendOnlyFile(0);

    /* AOF write errors: in this case we have a buffer to flush as well and
     * clear the AOF error in case of success to make the DB writable again,
     * however to try every second is enough in case of 'hz' is set to
     * an higher frequency. */
    run_with_period(1000) {
        if (server.aof_last_write_status == C_ERR)
            flushAppendOnlyFile(0);
    }
    ...
}

Second call site (force = 0, no forced write):
void beforeSleep(struct aeEventLoop *eventLoop) {
    ...
   /* Write the AOF buffer on disk */
    flushAppendOnlyFile(0);
   ...
}


Third call site (force = 1, forced write):
int prepareForShutdown(int flags) {

    ...
    if (server.aof_state != AOF_OFF) {
        /* Kill the AOF saving child as the AOF we already have may be longer
         * but contains the full dataset anyway. */
        if (server.aof_child_pid != -1) {
            /* If we have AOF enabled but haven't written the AOF yet, don't
             * shutdown or else the dataset will be lost. */
            if (server.aof_state == AOF_WAIT_REWRITE) {
                serverLog(LL_WARNING, "Writing initial AOF, can't exit.");
                return C_ERR;
            }
            serverLog(LL_WARNING,
                "There is a child rewriting the AOF. Killing it!");
            killAppendOnlyChild();
        }
        /* Append only file: flush buffers and fsync() the AOF at exit */
        serverLog(LL_NOTICE,"Calling fsync() on the AOF file.");
        flushAppendOnlyFile(1);
        redis_fsync(server.aof_fd);
    }
    ...


}

2.5.2 Implementation of flushAppendOnlyFile

Reference: C operator precedence


/* Write the AOF buffer to the AOF file.
   About the force parameter:
   When the fsync policy is everysec and a background thread is still running
   fsync, the flush may be delayed, because the write could block behind that
   fsync. In that case Redis remembers that there is a buffer to flush as soon
   as possible and retries from serverCron(). If force is set to 1, the write
   proceeds regardless of the background fsync.
*/


void flushAppendOnlyFile(int force) {
    ssize_t nwritten;
    int sync_in_progress = 0;
    mstime_t latency;
    
    /* The AOF buffer is empty */
    if (sdslen(server.aof_buf) == 0) {
        /* Check if we need to do fsync even the aof buffer is empty,
         * because previously in AOF_FSYNC_EVERYSEC mode, fsync is
         * called only when aof buffer is not empty, so if users
         * stop write commands before fsync called in one second,
         * the data in page cache cannot be flushed in time. */
        /* Decide whether an fsync is still owed; if so, jump to try_fsync */
        if (server.aof_fsync == AOF_FSYNC_EVERYSEC &&
            server.aof_fsync_offset != server.aof_current_size &&
            server.unixtime > server.aof_last_fsync &&
            !(sync_in_progress = aofFsyncInProgress())) {
            goto try_fsync;
        } else {
            return;
        }
    }
    
    /* The fsync policy is everysec */
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
       sync_in_progress = aofFsyncInProgress(); /* is the BIO thread still running an AOF fsync? */

    /* The fsync policy is everysec and the write is not forced */
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC && !force) {
        /* With this append fsync policy we do background fsyncing.
         * If the fsync is still in progress we can try to delay
         * the write for a couple of seconds. */
        if (sync_in_progress) {
            if (server.aof_flush_postponed_start == 0) {
                /* No previous write postponing, remember that we are
                 * postponing the flush and return. */
                server.aof_flush_postponed_start = server.unixtime;
                return;
            } else if (server.unixtime - server.aof_flush_postponed_start < 2) {
                /* We were already waiting for fsync to finish, but for less
                 * than two seconds this is still ok. Postpone again. */
                return;
            }
            /* Otherwise fall trough, and go write since we can't wait
             * over two seconds. */
            server.aof_delayed_fsync++;
            serverLog(LL_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.");
        }
    }
    /* We want to perform a single write. This should be guaranteed atomic
     * at least if the filesystem we are writing is a real physical one.
     * While this will save us against the server being killed I don't think
     * there is much to do about the whole server stopping for power problems
     * or alike */

    if (server.aof_flush_sleep && sdslen(server.aof_buf)) {
        usleep(server.aof_flush_sleep);
    }

    latencyStartMonitor(latency);
    nwritten = aofWrite(server.aof_fd,server.aof_buf,sdslen(server.aof_buf));
    latencyEndMonitor(latency);
    /* We want to capture different events for delayed writes:
     * when the delay happens with a pending fsync, or with a saving child
     * active, and when the above two conditions are missing.
     * We also use an additional event name to save all samples which is
     * useful for graphing / monitoring purposes. */
    if (sync_in_progress) {
        latencyAddSampleIfNeeded("aof-write-pending-fsync",latency);
    } else if (hasActiveChildProcess()) {
        latencyAddSampleIfNeeded("aof-write-active-child",latency);
    } else {
        latencyAddSampleIfNeeded("aof-write-alone",latency);
    }
    latencyAddSampleIfNeeded("aof-write",latency);

    /* We performed the write so reset the postponed flush sentinel to zero. */
    server.aof_flush_postponed_start = 0;

    if (nwritten != (ssize_t)sdslen(server.aof_buf)) {
        static time_t last_write_error_log = 0;
        int can_log = 0;

        /* Limit logging rate to 1 line per AOF_WRITE_LOG_ERROR_RATE seconds. */
        if ((server.unixtime - last_write_error_log) > AOF_WRITE_LOG_ERROR_RATE) {
            can_log = 1;
            last_write_error_log = server.unixtime;
        }

        /* Log the AOF write error and record the error code. */
        if (nwritten == -1) {
            if (can_log) {
                serverLog(LL_WARNING,"Error writing to the AOF file: %s",
                    strerror(errno));
                server.aof_last_write_errno = errno;
            }
        } else {
            if (can_log) {
                serverLog(LL_WARNING,"Short write while writing to "
                                       "the AOF file: (nwritten=%lld, "
                                       "expected=%lld)",
                                       (long long)nwritten,
                                       (long long)sdslen(server.aof_buf));
            }

            if (ftruncate(server.aof_fd, server.aof_current_size) == -1) {
                if (can_log) {
                    serverLog(LL_WARNING, "Could not remove short write "
                             "from the append-only file.  Redis may refuse "
                             "to load the AOF the next time it starts.  "
                             "ftruncate: %s", strerror(errno));
                }
            } else {
                /* If the ftruncate() succeeded we can set nwritten to
                 * -1 since there is no longer partial data into the AOF. */
                nwritten = -1;
            }
            server.aof_last_write_errno = ENOSPC;
        }

        /* Handle the AOF write error. */
        if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
            /* We can't recover when the fsync policy is ALWAYS since the
             * reply for the client is already in the output buffers, and we
             * have the contract with the user that on acknowledged write data
             * is synced on disk. */
            serverLog(LL_WARNING,"Can't recover from AOF write error when the AOF fsync policy is 'always'. Exiting...");
            exit(1);
        } else {
            /* Recover from failed write leaving data into the buffer. However
             * set an error to stop accepting writes as long as the error
             * condition is not cleared. */
            server.aof_last_write_status = C_ERR;

            /* Trim the sds buffer if there was a partial write, and there
             * was no way to undo it with ftruncate(2). */
            if (nwritten > 0) {
                server.aof_current_size += nwritten;
                sdsrange(server.aof_buf,nwritten,-1);
            }
            return; /* We'll try again on the next call... */
        }
    } else {
        /* Successful write(2). If AOF was in error state, restore the
         * OK state and log the event. */
        if (server.aof_last_write_status == C_ERR) {
            serverLog(LL_WARNING,
                "AOF write error looks solved, Redis can write again.");
            server.aof_last_write_status = C_OK;
        }
    }
    server.aof_current_size += nwritten;

    /* Re-use AOF buffer when it is small enough. The maximum comes from the
     * arena size of 4k minus some overhead (but is otherwise arbitrary). */
    if ((sdslen(server.aof_buf)+sdsavail(server.aof_buf)) < 4000) {
        sdsclear(server.aof_buf);
    } else {
        sdsfree(server.aof_buf);
        server.aof_buf = sdsempty();
    }

try_fsync:
    /* Don't fsync if no-appendfsync-on-rewrite is set to yes and there are
     * children doing I/O in the background. */
    if (server.aof_no_fsync_on_rewrite && hasActiveChildProcess())
        return;

    /* Perform the fsync if needed. */
    if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
        /* redis_fsync is defined as fdatasync() for Linux in order to avoid
         * flushing metadata. */
        latencyStartMonitor(latency);
        redis_fsync(server.aof_fd); /* Let's try to get this data on the disk */
        latencyEndMonitor(latency);
        latencyAddSampleIfNeeded("aof-fsync-always",latency);
        server.aof_fsync_offset = server.aof_current_size;
        server.aof_last_fsync = server.unixtime;
    } else if ((server.aof_fsync == AOF_FSYNC_EVERYSEC &&
                server.unixtime > server.aof_last_fsync)) {
        if (!sync_in_progress) {
            aof_background_fsync(server.aof_fd);
            server.aof_fsync_offset = server.aof_current_size;
        }
        server.aof_last_fsync = server.unixtime;
    }
}

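2.6 aofRewriteBufferAppend: the parent process appends data for the child process

/*
As shown in section 2.2, when feedAppendOnlyFile runs and the parent finds that a child
process is performing a rewrite, the parent appends the new data to the rewrite buffer and
sends it on to the child, so that the rewritten file is as complete as possible.
*/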

/* Append data to the AOF rewrite buffer, allocating new blocks if needed. */
void aofRewriteBufferAppend(unsigned char *s, unsigned long len) {
    listNode *ln = listLast(server.aof_rewrite_buf_blocks);
    aofrwblock *block = ln ? ln->value : NULL;

    while(len) {
        /* If we already got at least an allocated block, try appending
         * at least some piece into it. */
        if (block) {
            unsigned long thislen = (block->free < len) ? block->free : len;
            if (thislen) {  /* The current block is not already full. */
                memcpy(block->buf+block->used, s, thislen);
                block->used += thislen;
                block->free -= thislen;
                s += thislen;
                len -= thislen;
            }
        }

        if (len) { /* First block to allocate, or need another block. */
            int numblocks;

            block = zmalloc(sizeof(*block));
            block->free = AOF_RW_BUF_BLOCK_SIZE;
            block->used = 0;
            listAddNodeTail(server.aof_rewrite_buf_blocks,block);

            /* Log every time we cross more 10 or 100 blocks, respectively
             * as a notice or warning. */
            numblocks = listLength(server.aof_rewrite_buf_blocks);
            if (((numblocks+1) % 10) == 0) {
                int level = ((numblocks+1) % 100) == 0 ? LL_WARNING :
                                                         LL_NOTICE;
                serverLog(level,"Background AOF buffer size: %lu MB",
                    aofRewriteBufferSize()/(1024*1024));
            }
        }
    }

    /* Install a file event to send data to the rewrite child if there is
     * not one already. */
    if (aeGetFileEvents(server.el,server.aof_pipe_write_data_to_child) == 0) {
        aeCreateFileEvent(server.el, server.aof_pipe_write_data_to_child,
            AE_WRITABLE, aofChildWriteDiffData, NULL);
    }
}

2.7 bgrewriteaofCommand & rewriteAppendOnlyFileBackground

void bgrewriteaofCommand(client *c) {
    /* A rewrite is already in progress: return an error */
    if (server.aof_child_pid != -1) {
        addReplyError(c,"Background append only file rewriting already in progress");
    } 
    /* Another child process is working: schedule the rewrite for later */
    else if (hasActiveChildProcess()) {
        server.aof_rewrite_scheduled = 1;
        addReplyStatus(c,"Background append only file rewriting scheduled");
    }
    /* Fork a child and run the rewrite asynchronously */
    else if (rewriteAppendOnlyFileBackground() == C_OK) {
        addReplyStatus(c,"Background append only file rewriting started");
    } 
    else /* The rewrite could not be started: check the server logs */
    {
        addReplyError(c,"Can't execute an AOF background rewriting. "
                        "Please check the server logs for more information.");
    }
}



int rewriteAppendOnlyFileBackground(void) {
    pid_t childpid;

    if (hasActiveChildProcess()) return C_ERR;
    if (aofCreatePipes() != C_OK) return C_ERR;
    openChildInfoPipe();
    if ((childpid = redisFork()) == 0) {
        char tmpfile[256];

        /* Child */
        redisSetProcTitle("redis-aof-rewrite");
        redisSetCpuAffinity(server.aof_rewrite_cpulist);
        snprintf(tmpfile,256,"temp-rewriteaof-bg-%d.aof", (int) getpid());
        if (rewriteAppendOnlyFile(tmpfile) == C_OK) {
            sendChildCOWInfo(CHILD_INFO_TYPE_AOF, "AOF rewrite");
            exitFromChild(0);
        } else {
            exitFromChild(1);
        }
    } else {
        /* Parent */
        if (childpid == -1) {
            closeChildInfoPipe();
            serverLog(LL_WARNING,
                "Can't rewrite append only file in background: fork: %s",
                strerror(errno));
            aofClosePipes();
            return C_ERR;
        }
        serverLog(LL_NOTICE,
            "Background append only file rewriting started by pid %d",childpid);
        server.aof_rewrite_scheduled = 0;
        server.aof_rewrite_time_start = time(NULL);
        server.aof_child_pid = childpid;
        /* We set appendseldb to -1 in order to force the next call to the
         * feedAppendOnlyFile() to issue a SELECT command, so the differences
         * accumulated by the parent into server.aof_rewrite_buf will start
         * with a SELECT statement and it will be safe to merge. */
        server.aof_selected_db = -1;
        replicationScriptCacheFlush();
        return C_OK;
    }
    return C_OK; /* unreached */
}
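
The fork-and-pipe pattern behind rewriteAppendOnlyFileBackground can be reduced to a few POSIX calls. The sketch below is a deliberately simplified assumption of the flow: one pipe instead of the several Redis creates, a child that only counts the bytes it receives instead of writing a file, and a parent that blocks in waitpid, whereas Redis returns to its event loop and checks for finished children later. The name `fork_and_send_diff` is invented for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child and send it `diff` through a pipe.
 * The child reads until EOF and exits with 0 if it received every byte.
 * Returns the child's exit status, or -1 on error. */
int fork_and_send_diff(const char *diff) {
    assert(diff != NULL);
    int fds[2];
    size_t len = strlen(diff);

    if (pipe(fds) == -1) return -1;      /* create the pipe before forking */

    pid_t pid = fork();
    if (pid == -1) {                     /* fork failed: clean up the pipe */
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: read everything the parent sends. */
        char buf[256];
        size_t total = 0;
        ssize_t n;
        close(fds[1]);
        while ((n = read(fds[0], buf, sizeof buf)) > 0) total += (size_t)n;
        _exit(total == len ? 0 : 1);
    }

    /* Parent: send the accumulated data, then signal EOF by closing. */
    close(fds[0]);
    ssize_t written = write(fds[1], diff, len);
    close(fds[1]);

    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    if (written != (ssize_t)len) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Creating the pipe before the fork is what lets both processes share it: each side closes the end it does not use, which mirrors the order of aofCreatePipes followed by redisFork in the function above.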

2.8 Two places in serverCron that can trigger rewriteAppendOnlyFileBackground

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    ....
   /* Start a scheduled AOF rewrite if this was requested by the user while
     * a BGSAVE was in progress. */
    if (!hasActiveChildProcess() &&
        server.aof_rewrite_scheduled)
    {
        rewriteAppendOnlyFileBackground(); 
    }

     /* Check if a background saving or AOF rewrite in progress terminated. */
    if (hasActiveChildProcess() || ldbPendingChildren())
    {
        checkChildrenDone();
    } else {
        /* If there is not a background saving/rewrite in progress check if
         * we have to save/rewrite now. */
        for (j = 0; j < server.saveparamslen; j++) {
            struct saveparam *sp = server.saveparams+j;

            /* Save if we reached the given amount of changes,
             * the given amount of seconds, and if the latest bgsave was
             * successful or if, in case of an error, at least
             * CONFIG_BGSAVE_RETRY_DELAY seconds already elapsed. */
            if (server.dirty >= sp->changes &&
                server.unixtime-server.lastsave > sp->seconds &&
                (server.unixtime-server.lastbgsave_try >
                 CONFIG_BGSAVE_RETRY_DELAY ||
                 server.lastbgsave_status == C_OK))
            {
                serverLog(LL_NOTICE,"%d changes in %d seconds. Saving...",
                    sp->changes, (int)sp->seconds);
                rdbSaveInfo rsi, *rsiptr;
                rsiptr = rdbPopulateSaveInfo(&rsi);
                rdbSaveBackground(server.rdb_filename,rsiptr);
                break;
            }
        }

        /* Trigger an AOF rewrite if needed. */
        if (server.aof_state == AOF_ON &&
            !hasActiveChildProcess() &&
            server.aof_rewrite_perc &&
            server.aof_current_size > server.aof_rewrite_min_size)
        {
            long long base = server.aof_rewrite_base_size ?
                server.aof_rewrite_base_size : 1;
            long long growth = (server.aof_current_size*100/base) - 100;
            if (growth >= server.aof_rewrite_perc) {
                serverLog(LL_NOTICE,"Starting automatic rewriting of AOF on %lld%% growth",growth);
                rewriteAppendOnlyFileBackground();
            }
        }
    }
    ...
}
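
The automatic trigger at the end of serverCron boils down to a percentage-growth check against the size recorded after the previous rewrite. The following is a minimal sketch of that arithmetic only; the function name aof_should_auto_rewrite and its parameters are hypothetical stand-ins for the server.aof_* fields, not part of Redis.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the growth check in serverCron.
 *   current_size ~ server.aof_current_size
 *   base_size    ~ server.aof_rewrite_base_size (size after the last rewrite)
 *   rewrite_perc ~ server.aof_rewrite_perc
 *   min_size     ~ server.aof_rewrite_min_size */
bool aof_should_auto_rewrite(long long current_size, long long base_size,
                             int rewrite_perc, long long min_size) {
    assert(current_size >= 0 && base_size >= 0);
    if (rewrite_perc == 0) return false;         /* automatic rewrite disabled */
    if (current_size <= min_size) return false;  /* file is still too small */
    long long base = base_size ? base_size : 1;  /* avoid division by zero */
    long long growth = (current_size * 100 / base) - 100;
    return growth >= rewrite_perc;
}
```

Assuming a percentage of 100 and a minimum size of 64 MB, a file that was 64 MB after the last rewrite qualifies once it reaches 128 MB, because the growth is then exactly 100%.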

2.9 startAppendOnly can also trigger rewriteAppendOnlyFileBackground

/* Called when the user switches from "appendonly no" to "appendonly yes"
 * at runtime using the CONFIG command. */
int startAppendOnly(void) {
    char cwd[MAXPATHLEN]; /* Current working dir path for error messages. */
    int newfd;

    newfd = open(server.aof_filename,O_WRONLY|O_APPEND|O_CREAT,0644);
    serverAssert(server.aof_state == AOF_OFF);
    if (newfd == -1) {
        char *cwdp = getcwd(cwd,MAXPATHLEN);

        serverLog(LL_WARNING,
            "Redis needs to enable the AOF but can't open the "
            "append only file %s (in server root dir %s): %s",
            server.aof_filename,
            cwdp ? cwdp : "unknown",
            strerror(errno));
        return C_ERR;
    }
    if (hasActiveChildProcess() && server.aof_child_pid == -1) {
        server.aof_rewrite_scheduled = 1;
        serverLog(LL_WARNING,"AOF was enabled but there is already another background operation. An AOF background was scheduled to start when possible.");
    } else {
        /* If there is a pending AOF rewrite, we need to switch it off and
         * start a new one: the old one cannot be reused because it is not
         * accumulating the AOF buffer. */
        if (server.aof_child_pid != -1) {
            serverLog(LL_WARNING,"AOF was enabled but there is already an AOF rewriting in background. Stopping background AOF and starting a rewrite now.");
            killAppendOnlyChild();
        }
        if (rewriteAppendOnlyFileBackground() == C_ERR) {
            close(newfd);
            serverLog(LL_WARNING,"Redis needs to enable the AOF but can't trigger a background AOF rewrite operation. Check the above logs for more info about the error.");
            return C_ERR;
        }
    }
    /* We correctly switched on AOF, now wait for the rewrite to be complete
     * in order to append data on disk. */
    server.aof_state = AOF_WAIT_REWRITE;
    server.aof_last_fsync = server.unixtime;
    server.aof_fd = newfd;
    return C_OK;
}
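
The branching in startAppendOnly has three outcomes: defer the rewrite, kill a running AOF child and restart, or start immediately. A small sketch of just that decision follows; the enum and the function choose_enable_action are hypothetical names introduced for illustration, and the two arguments stand in for hasActiveChildProcess() and server.aof_child_pid.

```c
#include <assert.h>

/* Hypothetical names for the three outcomes of startAppendOnly(). */
enum aof_enable_action {
    AOF_ENABLE_SCHEDULE,       /* a non-AOF child is running: set aof_rewrite_scheduled */
    AOF_ENABLE_KILL_AND_START, /* an AOF rewrite child is running: kill it, start anew */
    AOF_ENABLE_START           /* no child is running: start the rewrite right away */
};

enum aof_enable_action choose_enable_action(int has_active_child,
                                            int aof_child_pid) {
    assert(aof_child_pid == -1 || aof_child_pid > 0);
    if (has_active_child && aof_child_pid == -1)
        return AOF_ENABLE_SCHEDULE;
    if (aof_child_pid != -1)
        return AOF_ENABLE_KILL_AND_START;
    return AOF_ENABLE_START;
}
```

In all three cases that succeed, the real function then moves the server to AOF_WAIT_REWRITE, so nothing is appended to the new file until the rewrite completes.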

2.10 rewriteAppendOnlyFile, the key sub-step of rewriteAppendOnlyFileBackground

/* Write a sequence of commands able to fully rebuild the dataset into
 * "filename". Used both by REWRITEAOF and BGREWRITEAOF.
 *
 * In order to minimize the number of commands needed in the rewritten
 * log Redis uses variadic commands when possible, such as RPUSH, SADD
 * and ZADD. However at max AOF_REWRITE_ITEMS_PER_CMD items per time
 * are inserted using a single command. */
int rewriteAppendOnlyFile(char *filename) {
    rio aof;
    FILE *fp;
    char tmpfile[256];
    char byte;

    /* Note that we have to use a different temp name here compared to the
     * one used by rewriteAppendOnlyFileBackground() function. */
    snprintf(tmpfile,256,"temp-rewriteaof-%d.aof", (int) getpid());
    fp = fopen(tmpfile,"w");
    if (!fp) {
        serverLog(LL_WARNING, "Opening the temp file for AOF rewrite in rewriteAppendOnlyFile(): %s", strerror(errno));
        return C_ERR;
    }

    server.aof_child_diff = sdsempty();
    rioInitWithFile(&aof,fp);

    if (server.aof_rewrite_incremental_fsync)
        rioSetAutoSync(&aof,REDIS_AUTOSYNC_BYTES);

    startSaving(RDBFLAGS_AOF_PREAMBLE);

    if (server.aof_use_rdb_preamble) {
        int error;
        if (rdbSaveRio(&aof,&error,RDBFLAGS_AOF_PREAMBLE,NULL) == C_ERR) {
            errno = error;
            goto werr;
        }
    } else {
        if (rewriteAppendOnlyFileRio(&aof) == C_ERR) goto werr;
    }

    /* Do an initial slow fsync here while the parent is still sending
     * data, in order to make the next final fsync faster. */
    if (fflush(fp) == EOF) goto werr;
    if (fsync(fileno(fp)) == -1) goto werr;

    /* Read again a few times to get more data from the parent.
     * We can't read forever (the server may receive data from clients
     * faster than it is able to send data to the child), so we try to read
     * some more data in a loop as soon as there is a good chance more data
     * will come. If it looks like we are wasting time, we abort (this
     * happens after 20 ms without new data). */
    int nodata = 0;
    mstime_t start = mstime();
    while(mstime()-start < 1000 && nodata < 20) {
        if (aeWait(server.aof_pipe_read_data_from_parent, AE_READABLE, 1) <= 0)
        {
            nodata++;
            continue;
        }
        nodata = 0; /* Start counting from zero, we stop on N *contiguous*
                       timeouts. */
        aofReadDiffFromParent();
    }

    /* Ask the master to stop sending diffs. */
    if (write(server.aof_pipe_write_ack_to_parent,"!",1) != 1) goto werr;
    if (anetNonBlock(NULL,server.aof_pipe_read_ack_from_parent) != ANET_OK)
        goto werr;
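    /* We read the ACK from the server using a 10 seconds timeout. Normally
     * it should reply ASAP, but just in case we lose its reply, we are sure
     * the child will eventually get terminated.
     * (Note: the comment says 10 seconds, but the timeout actually passed
     * to syncRead() below is 5000 ms.) */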
    if (syncRead(server.aof_pipe_read_ack_from_parent,&byte,1,5000) != 1 ||
        byte != '!') goto werr;
    serverLog(LL_NOTICE,"Parent agreed to stop sending diffs. Finalizing AOF...");

    /* Read the final diff if any. */
    aofReadDiffFromParent();

    /* Write the received diff to the file. */
    serverLog(LL_NOTICE,
        "Concatenating %.2f MB of AOF diff received from parent.",
        (double) sdslen(server.aof_child_diff) / (1024*1024));
    if (rioWrite(&aof,server.aof_child_diff,sdslen(server.aof_child_diff)) == 0)
        goto werr;

    /* Make sure data will not remain on the OS's output buffers */
    if (fflush(fp) == EOF) goto werr;
    if (fsync(fileno(fp)) == -1) goto werr;
    if (fclose(fp) == EOF) goto werr;

    /* Use RENAME to make sure the DB file is changed atomically only
     * if the generate DB file is ok. */
    if (rename(tmpfile,filename) == -1) {
        serverLog(LL_WARNING,"Error moving temp append only file on the final destination: %s", strerror(errno));
        unlink(tmpfile);
        stopSaving(0);
        return C_ERR;
    }
    serverLog(LL_NOTICE,"SYNC append only file rewrite performed");
    stopSaving(1);
    return C_OK;

werr:
    serverLog(LL_WARNING,"Write error writing append only file on disk: %s", strerror(errno));
    fclose(fp);
    unlink(tmpfile);
    stopSaving(0);
    return C_ERR;
}
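
The read loop in the middle of rewriteAppendOnlyFile stops either after a total time budget or after N contiguous 1 ms waits without data. The sketch below reproduces that pattern on a plain pipe, using poll() in place of aeWait(); the helpers now_ms and drain_until_idle are hypothetical names, and the EOF handling is an addition for the sake of a standalone example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: monotonic time in milliseconds (plays the role of mstime()). */
static long long now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Read from fd until max_total_ms has elapsed, max_idle consecutive 1 ms
 * polls saw no data, or the writer closed the pipe.
 * Returns the number of bytes read, or -1 on a read error. */
long drain_until_idle(int fd, int max_total_ms, int max_idle) {
    char buf[4096];
    long total = 0;
    int nodata = 0;
    long long start = now_ms();

    assert(fd >= 0);
    while (now_ms() - start < max_total_ms && nodata < max_idle) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, 1) <= 0) {  /* 1 ms timeout (or error): nothing to read */
            nodata++;
            continue;
        }
        nodata = 0;                   /* reset: only contiguous timeouts count */
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n < 0) return -1;
        if (n == 0) break;            /* writer closed its end */
        total += n;
    }
    return total;
}
```

Calling drain_until_idle(fd, 1000, 20) corresponds to the limits used in the listing above: at most one second overall, giving up after 20 consecutive empty polls.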

2.11 rewriteAppendOnlyFileBackground in detail

2.11.1 aofCreatePipes - creating the pipes for parent-child communication

/* Create the pipes used for parent - child process IPC during rewrite.
 * We have a data pipe used to send AOF incremental diffs to the child,
 * and two other pipes used by the children to signal it finished with
 * the rewrite so no more data should be written, and another for the
 * parent to acknowledge it understood this new condition. */
int aofCreatePipes(void) {
    int fds[6] = {-1, -1, -1, -1, -1, -1};
    int j;
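    /* Data pipe from parent to child: parent writes, child reads */
    if (pipe(fds) == -1) goto error; /* parent -> children data. */
    /* Control pipe on which the child asks the parent to stop sending:
     * child writes, parent reads */
    if (pipe(fds+2) == -1) goto error; /* children -> parent ack. */
    /* Control pipe on which the parent replies to the child:
     * parent writes, child reads */
    if (pipe(fds+4) == -1) goto error; /* parent -> children ack. */
    /* Parent -> children data is non blocking. */
    /* Set both ends of the data pipe to non-blocking */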
    if (anetNonBlock(NULL,fds[0]) != ANET_OK) goto error;
    if (anetNonBlock(NULL,fds[1]) != ANET_OK) goto error;
    if (aeCreateFileEvent(server.el, fds[2], AE_READABLE, aofChildPipeReadable, NULL) == AE_ERR) goto error;
    
    /*   man 3 pipe
         int pipe(int fildes[2]);
         Data can be written to the file descriptor fildes[1]
         and read from the file descriptor fildes[0]
         fildes[0] -- read end
         fildes[1] -- write end
     */

    server.aof_pipe_write_data_to_child = fds[1];   /* parent writes diff data to the child */
    server.aof_pipe_read_data_from_parent = fds[0]; /* child reads diff data from the parent */
    server.aof_pipe_write_ack_to_parent = fds[3];   /* child sends the stop request to the parent */
    server.aof_pipe_read_ack_from_child = fds[2];   /* parent reads the stop request from the child */
    server.aof_pipe_write_ack_to_child = fds[5];    /* parent sends its reply to the child */
    server.aof_pipe_read_ack_from_parent = fds[4];  /* child reads the parent's reply */
    server.aof_stop_sending_diff = 0;               /* initialize the "stop sending diffs" flag to 0 */
    return C_OK;

error:
    serverLog(LL_WARNING,"Error opening /setting AOF rewrite IPC pipes: %s",
        strerror(errno));
    for (j = 0; j < 6; j++) if(fds[j] != -1) close(fds[j]);
    return C_ERR;
}
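
To see how the three pipes cooperate, here is a self-contained sketch of the stop handshake between a forked child and its parent. It keeps the fds[] layout of aofCreatePipes and the "!" byte used in rewriteAppendOnlyFile, but everything else (the function names close_all and run_stop_handshake, the blocking reads, the sample payload, closing unused ends) is an assumption made for a compact demo and not how Redis drives these descriptors through its event loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Close every descriptor in fds[0..n-1] that is still open. */
static void close_all(int *fds, int n) {
    for (int j = 0; j < n; j++)
        if (fds[j] != -1) { close(fds[j]); fds[j] = -1; }
}

/* Returns 0 if the child completed the handshake, -1 otherwise. */
int run_stop_handshake(void) {
    /* [0]/[1] data, [2]/[3] child->parent ack, [4]/[5] parent->child ack.
     * Even index = read end, odd index = write end. */
    int fds[6] = {-1, -1, -1, -1, -1, -1};
    int status = 0, ok;
    char byte = 0;
    pid_t pid;

    if (pipe(fds) == -1 || pipe(fds + 2) == -1 || pipe(fds + 4) == -1) {
        close_all(fds, 6);
        return -1;
    }
    pid = fork();
    if (pid == -1) { close_all(fds, 6); return -1; }

    if (pid == 0) {
        /* Child: read one diff chunk, ask the parent to stop, wait for the ack. */
        char buf[64];
        close(fds[1]); close(fds[2]); close(fds[5]);
        if (read(fds[0], buf, sizeof(buf)) <= 0) _exit(1);
        if (write(fds[3], "!", 1) != 1) _exit(1);
        if (read(fds[4], &byte, 1) != 1 || byte != '!') _exit(1);
        _exit(0);
    }

    /* Parent: send a diff, wait for the stop request, acknowledge it. */
    assert(pid > 0);
    close(fds[0]); fds[0] = -1;
    close(fds[3]); fds[3] = -1;
    close(fds[4]); fds[4] = -1;
    ok = write(fds[1], "SET k v\r\n", 9) == 9 &&
         read(fds[2], &byte, 1) == 1 && byte == '!' &&
         /* Redis would set server.aof_stop_sending_diff = 1 at this point. */
         write(fds[5], "!", 1) == 1;
    close_all(fds, 6);
    if (waitpid(pid, &status, 0) != pid) return -1;
    return (ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Two separate ack pipes are needed because each pipe is one-directional: the child needs its own channel to say "stop", and the parent needs another to confirm that no further diffs will follow.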

2.11.2 openChildInfoPipe - opening the parent-child communication pipe

/* Open a child-parent channel used in order to move information about the
 * RDB / AOF saving process from the child to the parent (for instance
 * the amount of copy on write memory used) */
/* Reader's note: openChildInfoPipe() can be used to collect the amount of
 * memory the child process consumed through copy-on-write. */
void openChildInfoPipe(void) {
    if (pipe(server.child_info_pipe) == -1) {
        /* On error our two file descriptors should be still set to -1,
         * but we call anyway cloesChildInfoPipe() since can't hurt. */
        closeChildInfoPipe();
    } else if (anetNonBlock(NULL,server.child_info_pipe[0]) != ANET_OK) {
        closeChildInfoPipe();
    } else {
        memset(&server.child_info_data,0,sizeof(server.child_info_data));
    }
}

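2.11.3 Diagram of the pipes from 2.11.1 and 2.11.2

  parent                                                         child
  aof_pipe_write_data_to_child (fds[1]) --- diff data ---------> aof_pipe_read_data_from_parent (fds[0])
  aof_pipe_read_ack_from_child (fds[2]) <-- "!" stop request --- aof_pipe_write_ack_to_parent (fds[3])
  aof_pipe_write_ack_to_child  (fds[5]) --- "!" ack -----------> aof_pipe_read_ack_from_parent (fds[4])
  child_info_pipe[0]                    <-- child info (COW) --- child_info_pipe[1]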

2.11.4 redisFork() - creating the child process

int redisFork() {
    int childpid;
    long long start = ustime();
    if ((childpid = fork()) == 0) {
        /* Child */
        setOOMScoreAdj(CONFIG_OOM_BGCHILD);
        setupChildSignalHandlers();
        closeClildUnusedResourceAfterFork(); /* sic: the name is misspelled ("Clild") in the Redis source */
    } else {
        /* Parent */
        server.stat_fork_time = ustime()-start;
        server.stat_fork_rate = (double) zmalloc_used_memory() * 1000000 / server.stat_fork_time / (1024*1024*1024); /* GB per second. */
        latencyAddSampleIfNeeded("fork",server.stat_fork_time/1000);
        if (childpid == -1) {
            return -1;
        }
        updateDictResizePolicy();
    }
    return childpid;
}
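
The stat_fork_rate line converts "bytes of used memory" and "microseconds spent in fork()" into GB per second. A worked sketch of that unit conversion follows; fork_rate_gb_per_sec is a hypothetical helper, and the guard against a zero duration is an addition that the original expression does not have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper reproducing the stat_fork_rate arithmetic:
 * bytes * 1e6 / microseconds = bytes per second, then / 2^30 = GB per second. */
double fork_rate_gb_per_sec(size_t used_memory_bytes, long long fork_time_us) {
    assert(fork_time_us >= 0);
    if (fork_time_us == 0) return 0.0;  /* added guard, not in the Redis code */
    return (double)used_memory_bytes * 1000000 / fork_time_us
           / (1024 * 1024 * 1024);
}
```

For example, 1 GB of used memory and a fork() that took one second (1000000 microseconds) gives a rate of 1.0 GB per second; the same memory forked in half a second gives 2.0.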

2.11.5 Child process logic

 if ((childpid = redisFork()) == 0) {
        char tmpfile[256];

        /* Child */
        redisSetProcTitle("redis-aof-rewrite");
        redisSetCpuAffinity(server.aof_rewrite_cpulist);
        snprintf(tmpfile,256,"temp-rewriteaof-bg-%d.aof", (int) getpid());
        if (rewriteAppendOnlyFile(tmpfile) == C_OK) {
            sendChildCOWInfo(CHILD_INFO_TYPE_AOF, "AOF rewrite");
            exitFromChild(0);
        } else {
            exitFromChild(1);
        }
    }

2.11.6 Parent process logic

else {
        /* Parent */
        if (childpid == -1) {
            closeChildInfoPipe();
            serverLog(LL_WARNING,
                "Can't rewrite append only file in background: fork: %s",
                strerror(errno));
            aofClosePipes();
            return C_ERR;
        }
        serverLog(LL_NOTICE,
            "Background append only file rewriting started by pid %d",childpid);
        server.aof_rewrite_scheduled = 0;
        server.aof_rewrite_time_start = time(NULL);
        server.aof_child_pid = childpid;
        /* We set appendseldb to -1 in order to force the next call to the
         * feedAppendOnlyFile() to issue a SELECT command, so the differences
         * accumulated by the parent into server.aof_rewrite_buf will start
         * with a SELECT statement and it will be safe to merge. */
        server.aof_selected_db = -1;
        replicationScriptCacheFlush();
        return C_OK;
    }

serverCron calls checkChildrenDone, and checkChildrenDone in turn calls receiveChildInfo, so this call is made on the parent side.
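
The parent never blocks waiting for the rewrite child; it checks for a finished child on each timer tick. The sketch below illustrates that non-blocking pattern with waitpid() and WNOHANG (Redis itself uses wait3() for this check). The functions poll_child_once and fork_and_reap are hypothetical, and the 1 ms sleep merely simulates the periodic timer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for one periodic child check.
 * Returns 1 and stores the exit code if a child was reaped, 0 if children
 * are still running, -1 on error (for example, there are no children). */
int poll_child_once(int *exitcode) {
    int status;
    pid_t pid;

    assert(exitcode != NULL);
    pid = waitpid(-1, &status, WNOHANG);   /* never blocks */
    if (pid == 0) return 0;
    if (pid == -1) return -1;
    *exitcode = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return 1;
}

/* Fork a child that exits with `code`, then poll once per millisecond
 * until it has been reaped. Returns the observed exit code, or -1. */
int fork_and_reap(int code) {
    int exitcode = -1;
    pid_t pid = fork();

    if (pid == -1) return -1;
    if (pid == 0) _exit(code);             /* child: exit immediately */
    for (int i = 0; i < 5000; i++) {
        int r = poll_child_once(&exitcode);
        if (r == 1) return exitcode;
        if (r == -1) return -1;
        struct timespec ts = {0, 1000000}; /* 1 ms between checks */
        nanosleep(&ts, NULL);
    }
    return -1;
}
```

In the child logic shown in 2.11.5, exit code 0 means the rewrite succeeded and 1 means it failed, which is the value the parent would observe here.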
