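[Redis Source Code Analysis] - Redis Persistence: AOF

Original work. If you repost it, please credit the source: http://blog.csdn.net/xiejingfa/article/details/51644390

See also the index of the Redis source code analysis series.

Redis provides two persistence mechanisms: RDB and AOF. The previous article covered RDB persistence; this one looks at AOF persistence. It mainly deals with the file aof.c.

As the previous article showed, RDB persistence stores every key-value pair in the Redis database in a disk file, using an agreed format. AOF persistence takes a different approach: it records every write command in a disk file. Redis commands fall into two groups, read commands and write commands, and only write commands change the state of the data. AOF therefore records the write commands the server executes. After a crash, replaying the commands recorded in the AOF file restores the database to its previous state. Seen this way, the AOF mechanism resembles logging.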

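1. A first look at AOF

AOF stands for Append Only File. Let's start with an example to get a concrete feel for what an AOF file looks like.

First, run the following commands in a Redis client to store some data: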

127.0.0.1:6379> flushdb
OK
127.0.0.1:6379> set mystr "this is redis"
OK
127.0.0.1:6379> hset myhash name xiejingfa
(integer) 1
127.0.0.1:6379> lpush mylist one two three
(integer) 3
127.0.0.1:6379> sadd myset hello world
(integer) 2
127.0.0.1:6379> zadd myzset 1 a 2 b 3 c 4 d
(integer) 4

Redis provides the BGREWRITEAOF command to rewrite the AOF file; it is covered in detail below. For now, enter BGREWRITEAOF in the client:

127.0.0.1:6379> BGREWRITEAOF
Background append only file rewriting started

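Once it succeeds, locate the AOF file (appendonly.aof) on disk. Its contents are as follows (the // annotations are added here for explanation and are not part of the file):

*2          // the next command has 2 arguments (the command name counts as one)
$6          // the first argument is 6 bytes long
SELECT      // the first argument
$1          // the second argument is 1 byte long
0           // the second argument
*3          // the next command has 3 arguments
$3          // ...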
SET
$5
mystr
$13
this is redis
*5
$5
RPUSH
$6
mylist
$5
three
$3
two
$3
one
*4
$5
HMSET
$6
myhash
$4
name
$9
xiejingfa
*4
$4
SADD
$5
myset
$5
world
$5
hello
*10
$4
ZADD
$6
myzset
$1
1
$1
a
$1
2
$1
b
$1
3
$1
c
$1
4
$1
d

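As we can see, the contents of the AOF file are stored entirely as plain text.

2. How AOF is implemented

2.1. AOF file format

As the example above shows, every command written to the AOF file is plain text. Compared with the RDB file format, the AOF format is much simpler. Each command in the AOF file is stored in the following format, with every line terminated by \r\n:

*<count>    // <count> is the number of arguments in the command, including the command name
$<len>      // <len> is the length of the 1st argument
<content>   // <content> is the content of the 1st argument
$<len>      // <len> is the length of the 2nd argument
<content>   // <content> is the content of the 2nd argument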
...


The catAppendOnlyGenericCommand function in aof.c takes a command and its arguments and builds a string that follows the AOF file format.

sds catAppendOnlyGenericCommand(sds dst, int argc, robj **argv) {
    char buf[32];
    int len, j;
    robj *o;

    // Build the "*<count>\r\n" header, where <count> is the number of arguments
    buf[0] = '*';
    len = 1+ll2string(buf+1,sizeof(buf)-1,argc);
    buf[len++] = '\r';
    buf[len++] = '\n';
    dst = sdscatlen(dst,buf,len);

    // Rebuild the command: each item is "$<len>\r\n<content>\r\n", where <len> is the byte length of <content>
    for (j = 0; j < argc; j++) {
        o = getDecodedObject(argv[j]);
        buf[0] = '$';
        len = 1+ll2string(buf+1,sizeof(buf)-1,sdslen(o->ptr));
        buf[len++] = '\r';
        buf[len++] = '\n';
        dst = sdscatlen(dst,buf,len);
        dst = sdscatlen(dst,o->ptr,sdslen(o->ptr));
        dst = sdscatlen(dst,"\r\n",2);
        decrRefCount(o);
    }
    // Return the string with the rebuilt command appended
    return dst;
}
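To experiment with the format outside of Redis, here is a minimal standalone sketch that produces the same layout with plain C strings. The function name aof_encode_command is hypothetical and not part of Redis. Unlike the sds-based original, it relies on strlen and is therefore not binary-safe.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Redis): encode argc/argv as
 * "*<count>\r\n$<len>\r\n<content>\r\n...".
 * Returns the number of bytes written, or 0 if dst is too small. */
size_t aof_encode_command(char *dst, size_t cap, int argc, const char **argv) {
    size_t off = 0;
    int n;

    assert(dst != NULL && argc > 0);

    /* Header: "*<count>\r\n" */
    n = snprintf(dst + off, cap - off, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= cap - off) return 0;
    off += (size_t)n;

    for (int j = 0; j < argc; j++) {
        size_t len = strlen(argv[j]);

        /* Length prefix: "$<len>\r\n" */
        n = snprintf(dst + off, cap - off, "$%zu\r\n", len);
        if (n < 0 || (size_t)n >= cap - off) return 0;
        off += (size_t)n;

        /* Content plus "\r\n"; the extra byte keeps room for the final NUL */
        if (len + 2 + 1 > cap - off) return 0;
        memcpy(dst + off, argv[j], len);
        off += len;
        memcpy(dst + off, "\r\n", 2);
        off += 2;
    }
    dst[off] = '\0';
    return off;
}
```

Called with the three arguments SET, mystr and "this is redis", it yields the same bytes as the SET record in the sample AOF file shown earlier.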

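2.2. The AOF buffer

AOF persistence has to record every write command in a file to preserve the server state, but file writes are relatively slow, so writing to the AOF file once per write command would be inefficient. To improve efficiency, Redis adds an intermediate layer, the AOF buffer: after executing a write command, Redis first appends it to the AOF buffer, and the buffer's contents are written to the file at a later point.

The AOF buffer is defined in the redisServer struct and is simply an sds string.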

struct redisServer {
    ...
    // AOF buffer
    sds aof_buf;      /* AOF buffer, written before entering the event loop */
    ...
};

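Appending a command to the buffer is done by the feedAppendOnlyFile function. If a background AOF rewrite is in progress, the function also appends the command to the AOF rewrite buffer.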

void feedAppendOnlyFile(struct redisCommand *cmd, int dictid, robj **argv, int argc) {
    sds buf = sdsempty();
    robj *tmpargv[3];

    /* The DB this command was targeting is not the same as the last command
     * we appended. To issue a SELECT command is needed. */
    // If this command targets a different database than server.aof_selected_db, emit an explicit SELECT first
    if (dictid != server.aof_selected_db) {
        char seldb[64];

        snprintf(seldb,sizeof(seldb),"%d",dictid);
        buf = sdscatprintf(buf,"*2\r\n$6\r\nSELECT\r\n$%lu\r\n%s\r\n",
            (unsigned long)strlen(seldb),seldb);
        server.aof_selected_db = dictid;
    }

    // Handle the EXPIRE, PEXPIRE and EXPIREAT commands
    if (cmd->proc == expireCommand || cmd->proc == pexpireCommand ||
        cmd->proc == expireatCommand) {
        /* Translate EXPIRE/PEXPIRE/EXPIREAT into PEXPIREAT */
        buf = catAppendOnlyExpireAtCommand(buf,cmd,argv[1],argv[2]);
    } 
    // Handle the SETEX and PSETEX commands
    else if (cmd->proc == setexCommand || cmd->proc == psetexCommand) {
        /* Translate SETEX/PSETEX to SET and PEXPIREAT */
        tmpargv[0] = createStringObject("SET",3);
        tmpargv[1] = argv[1];
        tmpargv[2] = argv[3];
        buf = catAppendOnlyGenericCommand(buf,3,tmpargv);
        decrRefCount(tmpargv[0]);
        buf = catAppendOnlyExpireAtCommand(buf,cmd,argv[1],argv[2]);
    } 
    // All other commands go through catAppendOnlyGenericCommand()
    else {
        /* All the other commands don't need translation or need the
         * same translation already operated in the command vector
         * for the replication itself. */
        buf = catAppendOnlyGenericCommand(buf,argc,argv);
    }

    /* Append to the AOF buffer. This will be flushed on disk just before
     * of re-entering the event loop, so before the client will get a
     * positive reply about the operation performed. */
    if (server.aof_state == REDIS_AOF_ON)
        server.aof_buf = sdscatlen(server.aof_buf,buf,sdslen(buf));

    /* If a background append only file rewriting is in progress we want to
     * accumulate the differences between the child DB and the current one
     * in a buffer, so that when the child process will do its work we
     * can append the differences to the new append only file. */
    if (server.aof_child_pid != -1)
        aofRewriteBufferAppend((unsigned char*)buf,sdslen(buf));

    sdsfree(buf);
}
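The reason relative timeouts are rewritten as PEXPIREAT is that a relative TTL replayed at load time would be measured from the moment of loading rather than from the moment the command originally ran. The conversion can be sketched as follows. This is an illustration under stated assumptions, not the actual catAppendOnlyExpireAtCommand: the enum and the function name to_pexpireat_ms are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins for the command kinds handled above. */
typedef enum {
    CMD_EXPIRE,    /* relative, seconds      */
    CMD_PEXPIRE,   /* relative, milliseconds */
    CMD_EXPIREAT,  /* absolute, seconds      */
    CMD_SETEX,     /* relative, seconds      */
    CMD_PSETEX     /* relative, milliseconds */
} expire_cmd;

/* Convert the timeout argument of an expire-style command into the
 * absolute millisecond timestamp that PEXPIREAT expects.
 * now_ms is the current Unix time in milliseconds. */
long long to_pexpireat_ms(expire_cmd cmd, long long when, long long now_ms) {
    assert(cmd >= CMD_EXPIRE && cmd <= CMD_PSETEX);

    /* Seconds-based commands are scaled to milliseconds first. */
    if (cmd == CMD_EXPIRE || cmd == CMD_EXPIREAT || cmd == CMD_SETEX)
        when *= 1000;

    /* Relative commands are anchored to the current time. */
    if (cmd == CMD_EXPIRE || cmd == CMD_PEXPIRE ||
        cmd == CMD_SETEX || cmd == CMD_PSETEX)
        when += now_ms;

    return when;
}
```

For example, an EXPIRE of 10 seconds issued at now_ms = 1000000 becomes PEXPIREAT 1010000, which means the same thing no matter when the AOF is replayed.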

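2.3. Sync policy

As described above, calling feedAppendOnlyFile only appends the command to the AOF buffer server.aof_buf; nothing has been written to the disk file yet.

In modern operating systems, when a program writes data to a file, the OS first stores the data in a buffer for efficiency and only writes it to disk once the buffer fills up or a time limit expires. To force the OS to write the buffered data to disk, a program can call fsync(). How often fsync() is called is the "sync policy" discussed in this section.

The AOF sync policy is set with the appendfsync option in redis.conf (values always, everysec and no) and is stored in server.aof_fsync. Three policies are supported: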

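aof_fsync value        Behavior
AOF_FSYNC_EVERYSEC     fsync once per second
AOF_FSYNC_ALWAYS       fsync after the writes of every event loop iteration
AOF_FSYNC_NO           no explicit fsync; the operating system decides when to sync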


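Each option is described in detail below.

2.3.1. AOF_FSYNC_NO

In this mode, the Redis server writes the data in the AOF buffer server.aof_buf to the AOF file on every event loop iteration but never calls fsync itself; the operating system decides when to sync. This mode is the fastest (no sync is performed) but also the least safe (if the machine crashes, all data written since the last sync is lost).

2.3.2. AOF_FSYNC_ALWAYS

In this mode, the Redis server writes the data in the AOF buffer server.aof_buf to the AOF file on every event loop iteration and then syncs the AOF file. This mode is the slowest (every event loop iteration performs a sync) but also the safest (if the machine crashes, only the data handled in the current event loop iteration is lost).

2.3.3. AOF_FSYNC_EVERYSEC

In this mode, the Redis server writes the data in the AOF buffer server.aof_buf to the AOF file on every event loop iteration and syncs the AOF file once per second. This mode is a middle ground between speed and safety (if the machine crashes, only about the last second of data is lost) and is Redis's default sync policy.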
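The three policies boil down to one decision made after each write to the AOF file. The sketch below captures only that decision; the enum and the function name should_fsync are hypothetical. The sketch does not perform the sync itself, which in EVERYSEC mode Redis hands to a background thread.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical names mirroring the three policies described above. */
typedef enum { FSYNC_NO, FSYNC_ALWAYS, FSYNC_EVERYSEC } fsync_policy;

/* Decide whether fsync() should be issued after this event-loop write.
 * now and last_fsync are Unix times in seconds. */
int should_fsync(fsync_policy policy, time_t now, time_t last_fsync) {
    assert(now >= last_fsync);
    switch (policy) {
    case FSYNC_ALWAYS:   return 1;                /* after every write        */
    case FSYNC_EVERYSEC: return now > last_fsync; /* at most once per second  */
    case FSYNC_NO:       return 0;                /* leave it to the OS       */
    }
    return 0;
}
```

With FSYNC_EVERYSEC, a second write within the same wall-clock second returns 0, so up to about one second of writes can sit unsynced in the OS cache.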

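2.4. Restoring data

Restoring data means parsing and executing the commands stored in the AOF file, which brings the database back to its previous state.

In Redis a command has to be executed by a redisClient instance, so loading the AOF file requires creating a fake Redis client. Once the fake client exists, restoring data is a matter of reading commands from the AOF file and handing them to the fake client for execution.

This is done by the loadAppendOnlyFile function in aof.c. Its implementation is fairly straightforward, so the code is not reproduced here; see the annotated source linked at the end of the article if needed.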
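To make the parsing step concrete, here is a minimal sketch that decodes one command from an in-memory buffer in the AOF format. It is not the code of loadAppendOnlyFile, which reads from a file and executes each command through the fake client. The names aof_arg, aof_parse_command and MAX_ARGS are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_ARGS 16  /* arbitrary limit for this sketch */

/* One parsed argument: a pointer into the input buffer plus its length. */
typedef struct { const char *ptr; size_t len; } aof_arg;

/* Parse "<prefix><digits>\r\n" at *p and advance *p past it.
 * Returns the number, or -1 on malformed or truncated input. */
static long long parse_prefixed_number(const char **p, const char *end, char prefix) {
    const char *s = *p;
    long long v = 0;

    if (s >= end || *s != prefix) return -1;
    s++;
    if (s >= end || *s < '0' || *s > '9') return -1;
    while (s < end && *s >= '0' && *s <= '9') {
        v = v * 10 + (*s - '0');
        s++;
        if (v > 1000000000LL) return -1;  /* reject absurd lengths */
    }
    if (end - s < 2 || s[0] != '\r' || s[1] != '\n') return -1;
    *p = s + 2;
    return v;
}

/* Parse one command from buf[0..len). On success returns argc, fills
 * args[0..argc) and stores the number of bytes consumed in *consumed.
 * Returns -1 if the input is truncated or malformed. */
int aof_parse_command(const char *buf, size_t len, aof_arg *args, size_t *consumed) {
    const char *p = buf, *end = buf + len;
    long long argc, j;

    assert(buf != NULL && args != NULL && consumed != NULL);

    argc = parse_prefixed_number(&p, end, '*');
    if (argc < 1 || argc > MAX_ARGS) return -1;

    for (j = 0; j < argc; j++) {
        long long alen = parse_prefixed_number(&p, end, '$');
        if (alen < 0 || end - p < alen + 2) return -1;
        if (p[alen] != '\r' || p[alen + 1] != '\n') return -1;
        args[j].ptr = p;
        args[j].len = (size_t)alen;
        p += alen + 2;
    }
    *consumed = (size_t)(p - buf);
    return (int)argc;
}
```

A loader would call this repeatedly, advancing the buffer by *consumed each time. A return value of -1 in the middle of the data corresponds to a truncated or corrupted AOF file.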

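3. AOF rewrite

3.1. How AOF rewrite is implemented

What has been described so far is essentially enough to persist data, but Redis also takes the following scenario into account: the AOF file simply stores the write commands without merging them. As the server keeps accepting commands, the AOF file grows rapidly and becomes inefficient (the larger the file, the more disk space it occupies and the longer restoring data takes).

To solve this problem, Redis provides a feature called AOF rewrite. What is it?

AOF rewrite can be thought of as merging commands. Suppose the Redis server receives the following 5 commands: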

127.0.0.1:6379> lpush mylist one two
(integer) 2
127.0.0.1:6379> rpush mylist three four
(integer) 4
127.0.0.1:6379> lpop mylist
"two"
127.0.0.1:6379> rpop mylist
"four"
127.0.0.1:6379> lrange mylist 0 -1
1) "one"
2) "three"

The AOF file needs 4 records to store the first 4 commands (the write commands), yet mylist ends up holding only two elements.

127.0.0.1:6379> lrange mylist 0 -1
1) "one"
2) "three"

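So a single command, rpush mylist one three, can replace the four commands above and shrink the AOF file. (RPUSH is needed rather than LPUSH, because lpush mylist one three would store the elements in the reverse order.) This is what AOF rewrite does.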
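The equivalence can be checked with a toy model of a list. The tiny_list type and its tl_* functions below are hypothetical and only mimic the semantics of LPUSH, RPUSH, LPOP and RPOP; they have nothing to do with how Redis stores lists. Replaying the four logged write commands and replaying a single RPUSH of the surviving elements should give the same result.

```c
#include <assert.h>
#include <string.h>

#define TL_CAP 16

/* Hypothetical toy list, only large enough for the example above. */
typedef struct { const char *items[TL_CAP]; int len; } tiny_list;

/* Append at the tail, like RPUSH with one element. */
void tl_rpush(tiny_list *l, const char *v) {
    assert(l->len < TL_CAP);
    l->items[l->len++] = v;
}

/* Insert at the head, like LPUSH with one element. */
void tl_lpush(tiny_list *l, const char *v) {
    assert(l->len < TL_CAP);
    memmove(l->items + 1, l->items, (size_t)l->len * sizeof(l->items[0]));
    l->items[0] = v;
    l->len++;
}

/* Remove and return the head, like LPOP. */
const char *tl_lpop(tiny_list *l) {
    const char *v;
    assert(l->len > 0);
    v = l->items[0];
    l->len--;
    memmove(l->items, l->items + 1, (size_t)l->len * sizeof(l->items[0]));
    return v;
}

/* Remove and return the tail, like RPOP. */
const char *tl_rpop(tiny_list *l) {
    assert(l->len > 0);
    return l->items[--l->len];
}

/* Returns 1 if both lists hold the same strings in the same order. */
int tl_equal(const tiny_list *a, const tiny_list *b) {
    if (a->len != b->len) return 0;
    for (int i = 0; i < a->len; i++)
        if (strcmp(a->items[i], b->items[i]) != 0) return 0;
    return 1;
}
```

Note that a multi-element LPUSH pushes its arguments one at a time, so lpush mylist one two corresponds to tl_lpush("one") followed by tl_lpush("two").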

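So how is the AOF rewritten? The simplest way is to traverse the key space of the current database, express the object behind each key as a command, and save that command to the AOF file.

AOF rewrite is implemented by the rewriteAppendOnlyFile function: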

int rewriteAppendOnlyFile(char *filename) {
    dictIterator *di = NULL;
    dictEntry *de;
    rio aof;
    FILE *fp;
    char tmpfile[256];
    int j;
    long long now = mstime();
    char byte;
    size_t processed = 0;

    /* Note that we have to use a different temp name here compared to the
     * one used by rewriteAppendOnlyFileBackground() function. */
    snprintf(tmpfile,256,"temp-rewriteaof-%d.aof", (int) getpid());
    // Open the temp file
    fp = fopen(tmpfile,"w");
    if (!fp) {
        // Failed to open the file
        redisLog(REDIS_WARNING, "Opening the temp file for AOF rewrite in rewriteAppendOnlyFile(): %s", strerror(errno));
        return REDIS_ERR;
    }

    server.aof_child_diff = sdsempty();
    // Initialize the file-backed rio object
    rioInitWithFile(&aof,fp);
    // fsync once for every REDIS_AOF_AUTOSYNC_BYTES bytes written
    if (server.aof_rewrite_incremental_fsync)
        rioSetAutoSync(&aof,REDIS_AOF_AUTOSYNC_BYTES);
    // Iterate over all databases and rebuild the commands
    for (j = 0; j < server.dbnum; j++) {
        // Prefix of the SELECT command
        char selectcmd[] = "*2\r\n$6\r\nSELECT\r\n";
        // Current database
        redisDb *db = server.db+j;
        // Key space of the current database
        dict *d = db->dict;
        // If the key space is empty, move on to the next database
        if (dictSize(d) == 0) continue;
        // Create an iterator over the key space
        di = dictGetSafeIterator(d);
        if (!di) {
            fclose(fp);
            return REDIS_ERR;
        }

        /* SELECT the new DB */
        // Write a SELECT command so the data is restored into the right database
        if (rioWrite(&aof,selectcmd,sizeof(selectcmd)-1) == 0) goto werr;
        if (rioWriteBulkLongLong(&aof,j) == 0) goto werr;

        /* Iterate this DB writing every entry */
        while((de = dictNext(di)) != NULL) {
            sds keystr;
            robj key, *o;
            long long expiretime;

            // Fetch the key
            keystr = dictGetKey(de);
            // Fetch the corresponding value
            o = dictGetVal(de);
            initStaticStringObject(key,keystr);

            // Fetch the expire time of the key
            expiretime = getExpire(db,&key);

            /* If this key is already expired skip it */
            if (expiretime != -1 && expiretime < now) continue;

            /* Save the key and associated value */
            // Rebuild the appropriate command according to the type of the value object

            // string objects
            if (o->type == REDIS_STRING) {
                /* Emit a SET command */
                char cmd[]="*3\r\n$3\r\nSET\r\n";
                if (rioWrite(&aof,cmd,sizeof(cmd)-1) == 0) goto werr;
                /* Key and value */
                if (rioWriteBulkObject(&aof,&key) == 0) goto werr;
                if (rioWriteBulkObject(&aof,o) == 0) goto werr;
            }
            // list objects
            else if (o->type == REDIS_LIST) {
                if (rewriteListObject(&aof,&key,o) == 0) goto werr;
            }
            // set objects
            else if (o->type == REDIS_SET) {
                if (rewriteSetObject(&aof,&key,o) == 0) goto werr;
            }
            // zset objects
            else if (o->type == REDIS_ZSET) {
                if (rewriteSortedSetObject(&aof,&key,o) == 0) goto werr;
            }
            // hash objects
            else if (o->type == REDIS_HASH) {
                if (rewriteHashObject(&aof,&key,o) == 0) goto werr;
            } else {
                redisPanic("Unknown object type");
            }
            /* Save the expire time */
            // The expire time of the key is saved with a PEXPIREAT command
            if (expiretime != -1) {
                char cmd[]="*3\r\n$9\r\nPEXPIREAT\r\n";
                if (rioWrite(&aof,cmd,sizeof(cmd)-1) == 0) goto werr;
                if (rioWriteBulkObject(&aof,&key) == 0) goto werr;
                if (rioWriteBulkLongLong(&aof,expiretime) == 0) goto werr;
            }
            /* Read some diff from the parent process from time to time. */
            if (aof.processed_bytes > processed+1024*10) {
                processed = aof.processed_bytes;
                aofReadDiffFromParent();
            }
        }
        dictReleaseIterator(di);
        di = NULL;
    }

    /* Do an initial slow fsync here while the parent is still sending
     * data, in order to make the next final fsync faster. */
    if (fflush(fp) == EOF) goto werr;
    if (fsync(fileno(fp)) == -1) goto werr;

    /* Read again a few times to get more data from the parent.
     * We can't read forever (the server may receive data from clients
     * faster than it is able to send data to the child), so we try to read
     * some more data in a loop as soon as there is a good chance more data
     * will come. If it looks like we are wasting time, we abort (this
     * happens after 20 ms without new data). */
    int nodata = 0;
    mstime_t start = mstime();
    while(mstime()-start < 1000 && nodata < 20) {
        if (aeWait(server.aof_pipe_read_data_from_parent, AE_READABLE, 1) <= 0)
        {
            nodata++;
            continue;
        }
        nodata = 0; /* Start counting from zero, we stop on N *contiguous*
                       timeouts. */
        aofReadDiffFromParent();
    }

    /* Ask the master to stop sending diffs. */
    // Tell the parent process to stop sending data
    if (write(server.aof_pipe_write_ack_to_parent,"!",1) != 1) goto werr;
    if (anetNonBlock(NULL,server.aof_pipe_read_ack_from_parent) != ANET_OK)
        goto werr;
    /* We read the ACK from the server using a 10 seconds timeout. Normally
     * it should reply ASAP, but just in case we lose its reply, we are sure
     * the child will eventually get terminated. */
    if (syncRead(server.aof_pipe_read_ack_from_parent,&byte,1,5000) != 1 ||
        byte != '!') goto werr;
    redisLog(REDIS_NOTICE,"Parent agreed to stop sending diffs. Finalizing AOF...");

    /* Read the final diff if any. */
    // Read the remaining diff data
    aofReadDiffFromParent();

    /* Write the received diff to the file. */
    // Write the diff data received so far to the AOF file
    redisLog(REDIS_NOTICE,
        "Concatenating %.2f MB of AOF diff received from parent.",
        (double) sdslen(server.aof_child_diff) / (1024*1024));
    if (rioWrite(&aof,server.aof_child_diff,sdslen(server.aof_child_diff)) == 0)
        goto werr;

    /* Make sure data will not remain on the OS's output buffers */
    // Make sure the data in the OS buffers has reached the file
    if (fflush(fp) == EOF) goto werr;
    if (fsync(fileno(fp)) == -1) goto werr;
    if (fclose(fp) == EOF) goto werr;

    /* Use RENAME to make sure the DB file is changed atomically only
     * if the generate DB file is ok. */
    // Rename the file
    if (rename(tmpfile,filename) == -1) {
        redisLog(REDIS_WARNING,"Error moving temp append only file on the final destination: %s", strerror(errno));
        unlink(tmpfile);
        return REDIS_ERR;
    }
    redisLog(REDIS_NOTICE,"SYNC append only file rewrite performed");
    return REDIS_OK;

werr:
    redisLog(REDIS_WARNING,"Write error writing append only file on disk: %s", strerror(errno));
    fclose(fp);
    unlink(tmpfile);
    if (di) dictReleaseIterator(di);
    return REDIS_ERR;
}

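The implementation of rewriteAppendOnlyFile shows that, to minimize the number of commands written, Redis uses variadic commands such as RPUSH, SADD and ZADD wherever possible.

To avoid overflowing the input buffer of the client that replays the command, Redis treats hash, list, set and zset objects, which may hold many elements, specially: if the element count exceeds REDIS_AOF_REWRITE_ITEMS_PER_CMD (default 64), the object is saved with several commands, so that no single command carries more than REDIS_AOF_REWRITE_ITEMS_PER_CMD elements.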
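The chunking arithmetic is simple enough to state on its own. The sketch below uses hypothetical helper names (commands_needed, next_cmd_items) and a local ITEMS_PER_CMD constant set to the value quoted above. It shows how many commands a collection needs and how many elements the next command carries.

```c
#include <assert.h>

#define ITEMS_PER_CMD 64  /* value of REDIS_AOF_REWRITE_ITEMS_PER_CMD quoted above */

/* Number of commands needed to re-create a collection of `items` elements. */
long commands_needed(long items) {
    assert(items >= 0);
    return (items + ITEMS_PER_CMD - 1) / ITEMS_PER_CMD;
}

/* Number of elements carried by the next command when `remaining`
 * elements are still to be written. */
long next_cmd_items(long remaining) {
    assert(remaining > 0);
    return remaining > ITEMS_PER_CMD ? ITEMS_PER_CMD : remaining;
}
```

A list of 130 elements is therefore written as three RPUSH commands carrying 64, 64 and 2 elements.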

Let's look at how Redis rewrites a list object; the other types (hash, set, zset) are handled in a similar way.

int rewriteListObject(rio *r, robj *key, robj *o) {
    long long count = 0, items = listTypeLength(o);

    // Handle ziplist-encoded list objects
    if (o->encoding == REDIS_ENCODING_ZIPLIST) {
        unsigned char *zl = o->ptr;
        unsigned char *p = ziplistIndex(zl,0);
        unsigned char *vstr;
        unsigned int vlen;
        long long vlong;

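        // In the AOF file, each RPUSH command carries at most REDIS_AOF_REWRITE_ITEMS_PER_CMD elements.
        // Walk the ziplist and pack every REDIS_AOF_REWRITE_ITEMS_PER_CMD elements into one RPUSH command.
        // Why? Consider what would happen if a list with a huge number of elements went into a single RPUSH.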
        while(ziplistGet(p,&vstr,&vlen,&vlong)) {
            if (count == 0) {
                int cmd_items = (items > REDIS_AOF_REWRITE_ITEMS_PER_CMD) ?
                    REDIS_AOF_REWRITE_ITEMS_PER_CMD : items;

                if (rioWriteBulkCount(r,'*',2+cmd_items) == 0) return 0;
                if (rioWriteBulkString(r,"RPUSH",5) == 0) return 0;
                if (rioWriteBulkObject(r,key) == 0) return 0;
            }

            // Fetch the element value and write it to the rio object
            if (vstr) {
                if (rioWriteBulkString(r,(char*)vstr,vlen) == 0) return 0;
            } else {
                if (rioWriteBulkLongLong(r,vlong) == 0) return 0;
            }
            // Advance to the next element
            p = ziplistNext(zl,p);
            // Count the element just written; once the count reaches REDIS_AOF_REWRITE_ITEMS_PER_CMD,
            // the remaining elements go into another RPUSH command
            if (++count == REDIS_AOF_REWRITE_ITEMS_PER_CMD) count = 0;
            items--;
        }
    } 
    // Handle linked-list-encoded list objects
    else if (o->encoding == REDIS_ENCODING_LINKEDLIST) {
        list *list = o->ptr;
        listNode *ln;
        listIter li;

        // Same approach as for the ziplist: pack every REDIS_AOF_REWRITE_ITEMS_PER_CMD elements into one RPUSH command
        listRewind(list,&li);
        while((ln = listNext(&li))) {
            robj *eleobj = listNodeValue(ln);

            if (count == 0) {
                int cmd_items = (items > REDIS_AOF_REWRITE_ITEMS_PER_CMD) ?
                    REDIS_AOF_REWRITE_ITEMS_PER_CMD : items;

                if (rioWriteBulkCount(r,'*',2+cmd_items) == 0) return 0;
                if (rioWriteBulkString(r,"RPUSH",5) == 0) return 0;
                if (rioWriteBulkObject(r,key) == 0) return 0;
            }
            if (rioWriteBulkObject(r,eleobj) == 0) return 0;
            if (++count == REDIS_AOF_REWRITE_ITEMS_PER_CMD) count = 0;
            items--;
        }
    } else {
        redisPanic("Unknown list encoding");
    }
    return 1;
}

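3.2. AOF background rewrite

The rewriteAppendOnlyFile function described above does the rewrite job well, but it introduces another problem: blocking. The function performs a large number of writes, which would block the Redis main process and leave the server unable to serve requests during the rewrite. As with RDB persistence, Redis solves this by forking a child process to perform the rewrite while the parent keeps accepting commands and serving clients. Redis calls this "AOF background rewrite".

Background rewrite removes the blocking problem but creates a new one: while the child rewrites the AOF, the parent keeps processing commands, and any new write command makes the server's current state differ from the state that the child's rewritten AOF file would restore. To solve this, Redis introduces the AOF rewrite buffer, which accumulates every operation that modifies the database while the child is rewriting. In other words, while a background rewrite is running, the write commands accepted by the parent are additionally appended to the AOF rewrite buffer. When the child finishes, the parent detects its exit and appends the contents of the rewrite buffer to the rewritten AOF file.

The concepts mentioned above are covered one by one below.

3.2.1. The AOF rewrite buffer

The AOF rewrite buffer is defined in the redisServer struct: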

struct redisServer {
    ...
    // Linked list of AOF rewrite buffer blocks
    list *aof_rewrite_buf_blocks;   /* Hold changes during an AOF rewrite. */
    ...
};

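As we can see, the AOF rewrite buffer aof_rewrite_buf_blocks is actually a linked list in which every element is a buffer block, defined as follows:

/* Each block is 10 MB */
#define AOF_RW_BUF_BLOCK_SIZE (1024*1024*10)    /* 10 MB per block */

/* A block of the AOF rewrite buffer */
typedef struct aofrwblock {
    // Number of bytes used and number of bytes free in this block
    unsigned long used, free;
    // Storage of the block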
    char buf[AOF_RW_BUF_BLOCK_SIZE];
} aofrwblock;

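The AOF rewrite buffer only needs an append operation, but a single very large allocation cannot be relied upon (it will not always succeed), so Redis implements the buffer as multiple blocks of AOF_RW_BUF_BLOCK_SIZE bytes each.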
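The append-only block list can be sketched in a few lines. This is a simplified illustration, not the code of aofRewriteBufferAppend: it uses a hand-rolled singly linked list instead of Redis's list type, a deliberately tiny block size so the behavior is easy to observe, and hypothetical names (block, block_list, block_list_append).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 10  /* tiny on purpose; the real blocks are 10 MB */

typedef struct block {
    size_t used, free;      /* bytes used and bytes free in buf */
    struct block *next;
    char buf[BLOCK_SIZE];
} block;

typedef struct { block *head, *tail; size_t nblocks; } block_list;

/* Append len bytes: fill the tail block first, then allocate new blocks
 * as needed. Returns 0 on success, -1 if an allocation fails. */
int block_list_append(block_list *l, const char *s, size_t len) {
    assert(l != NULL && (s != NULL || len == 0));
    while (len > 0) {
        block *b = l->tail;
        if (b != NULL && b->free > 0) {
            size_t n = b->free < len ? b->free : len;
            memcpy(b->buf + b->used, s, n);
            b->used += n;
            b->free -= n;
            s += n;
            len -= n;
        }
        if (len > 0) {
            /* Tail is missing or full: link a fresh block at the end. */
            block *nb = malloc(sizeof(*nb));
            if (nb == NULL) return -1;
            nb->used = 0;
            nb->free = BLOCK_SIZE;
            nb->next = NULL;
            if (l->tail != NULL) l->tail->next = nb; else l->head = nb;
            l->tail = nb;
            l->nblocks++;
        }
    }
    return 0;
}

/* Total number of bytes stored across all blocks. */
size_t block_list_size(const block_list *l) {
    size_t total = 0;
    for (const block *b = l->head; b != NULL; b = b->next) total += b->used;
    return total;
}

/* Release every block and reset the list to empty. */
void block_list_free(block_list *l) {
    block *b = l->head;
    while (b != NULL) {
        block *next = b->next;
        free(b);
        b = next;
    }
    l->head = l->tail = NULL;
    l->nblocks = 0;
}
```

Each allocation is bounded by the block size, so growth never requires one huge contiguous region or a realloc that copies everything accumulated so far.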

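3.2.2. While a background rewrite is running, write commands accepted by the parent are also appended to the AOF rewrite buffer

This happens in the feedAppendOnlyFile function, which was covered above and is not repeated here.

3.2.3. When the child finishes the rewrite, the parent detects its exit and appends the AOF rewrite buffer to the rewritten AOF file

Once the child has finished the rewrite, the parent (the Redis main process) obtains the child's exit status in the serverCron function in redis.c and then calls backgroundRewriteDoneHandler.
backgroundRewriteDoneHandler appends the data in the rewrite buffer aof_rewrite_buf_blocks to the AOF file.

As covered earlier, serverCron runs periodically. Only the code related to AOF background rewrite is shown here: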

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    ...
    /* Check if a background saving or AOF rewrite in progress terminated. */
    if (server.rdb_child_pid != -1 || server.aof_child_pid != -1) {
        int statloc;
        pid_t pid;

        if ((pid = wait3(&statloc,WNOHANG,NULL)) != 0) {
            int exitcode = WEXITSTATUS(statloc);
            int bysignal = 0;

            if (WIFSIGNALED(statloc)) bysignal = WTERMSIG(statloc);

            if (pid == -1) {
                ...
            } else if (pid == server.aof_child_pid) {
                // Call backgroundRewriteDoneHandler
                backgroundRewriteDoneHandler(exitcode,bysignal);
            } 
            ...
        }
    } 
    ...
}
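The fork-then-poll pattern used here can be tried in isolation. The sketch below is an assumption-laden illustration rather than Redis code: the function names are hypothetical, the child exits immediately instead of rewriting anything, and it uses waitpid with WNOHANG on a specific pid where serverCron uses wait3.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that stands in for the rewrite process and exits with
 * child_exitcode. Returns the child's pid, or -1 if fork() fails. */
pid_t start_background_job(int child_exitcode) {
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the actual rewrite work would happen here. */
        _exit(child_exitcode);
    }
    return pid;
}

/* One non-blocking poll, meant to be called from a periodic timer.
 * Returns 1 and fills *exitcode and *bysignal if the child terminated,
 * 0 if it is still running, -1 on error. */
int poll_background_job(pid_t pid, int *exitcode, int *bysignal) {
    int status;
    pid_t r;

    assert(exitcode != NULL && bysignal != NULL);

    r = waitpid(pid, &status, WNOHANG);
    if (r == 0) return 0;     /* still running, try again later */
    if (r == -1) return -1;

    *bysignal = WIFSIGNALED(status) ? WTERMSIG(status) : 0;
    *exitcode = WIFEXITED(status) ? WEXITSTATUS(status) : 0;
    return 1;
}
```

Because the poll never blocks, the parent can keep serving clients between calls. The exit code and signal number it collects are the two values that a completion handler needs.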

backgroundRewriteDoneHandler is implemented as follows.

void backgroundRewriteDoneHandler(int exitcode, int bysignal) {
    if (!bysignal && exitcode == 0) {
        int newfd, oldfd;
        char tmpfile[256];
        long long now = ustime();
        mstime_t latency;

        redisLog(REDIS_NOTICE,
            "Background AOF rewrite terminated with success");

        /* Flush the differences accumulated by the parent to the
         * rewritten AOF. */
        // Append the data the parent accumulated in the rewrite buffer to the AOF file
        latencyStartMonitor(latency);
        snprintf(tmpfile,256,"temp-rewriteaof-bg-%d.aof",
            (int)server.aof_child_pid);
        // Open the temp file
        newfd = open(tmpfile,O_WRONLY|O_APPEND);
        if (newfd == -1) {
            redisLog(REDIS_WARNING,
                "Unable to open the temporary AOF produced by the child: %s", strerror(errno));
            goto cleanup;
        }

        // Append the rewrite buffer to the AOF file
        if (aofRewriteBufferWrite(newfd) == -1) {
            redisLog(REDIS_WARNING,
                "Error trying to flush the parent diff to the rewritten AOF: %s", strerror(errno));
            close(newfd);
            goto cleanup;
        }
        latencyEndMonitor(latency);
        latencyAddSampleIfNeeded("aof-rewrite-diff-write",latency);

        redisLog(REDIS_NOTICE,
            "Residual parent diff successfully flushed to the rewritten AOF (%.2f MB)", (double) aofRewriteBufferSize() / (1024*1024));

        /* The only remaining thing to do is to rename the temporary file to
         * the configured file and switch the file descriptor used to do AOF
         * writes. We don't want close(2) or rename(2) calls to block the
         * server on old file deletion.
         *
         * There are two possible scenarios:
         *
         * 1) AOF is DISABLED and this was a one time rewrite. The temporary
         * file will be renamed to the configured file. When this file already
         * exists, it will be unlinked, which may block the server.
         *
         * 2) AOF is ENABLED and the rewritten AOF will immediately start
         * receiving writes. After the temporary file is renamed to the
         * configured file, the original AOF file descriptor will be closed.
         * Since this will be the last reference to that file, closing it
         * causes the underlying file to be unlinked, which may block the
         * server.
         *
         * To mitigate the blocking effect of the unlink operation (either
         * caused by rename(2) in scenario 1, or by close(2) in scenario 2), we
         * use a background thread to take care of this. First, we
         * make scenario 1 identical to scenario 2 by opening the target file
         * when it exists. The unlink operation after the rename(2) will then
         * be executed upon calling close(2) for its descriptor. Everything to
         * guarantee atomicity for this switch has already happened by then, so
         * we don't care what the outcome or duration of that close operation
         * is, as long as the file descriptor is released again. */

        if (server.aof_fd == -1) {
            /* AOF disabled */

             /* Don't care if this fails: oldfd will be -1 and we handle that.
              * One notable case of -1 return is if the old file does
              * not exist. */
             // Open the existing AOF file, if there is one
             oldfd = open(server.aof_filename,O_RDONLY|O_NONBLOCK);
        } else {
            /* AOF enabled */
            oldfd = -1; /* We'll set this to the current AOF filedes later. */
        }

        /* Rename the temporary file. This will not unlink the target file if
         * it exists, because we reference it with "oldfd". */
        latencyStartMonitor(latency);
        // Rename the temp file. The old AOF file (if any) is not unlinked at this point because oldfd still references it
        if (rename(tmpfile,server.aof_filename) == -1) {
            redisLog(REDIS_WARNING,
                "Error trying to rename the temporary AOF file: %s", strerror(errno));
            close(newfd);
            if (oldfd != -1) close(oldfd);
            goto cleanup;
        }
        latencyEndMonitor(latency);
        latencyAddSampleIfNeeded("aof-rename",latency);

        if (server.aof_fd == -1) {
            /* AOF disabled, we don't need to set the AOF file descriptor
             * to this new file, so we can close it. */
            // AOF is disabled: just close the new file
            close(newfd);
        } else {
            /* AOF enabled, replace the old fd with the new one. */
            // AOF is enabled: replace the fd of the old AOF file with the fd of the new one
            oldfd = server.aof_fd;
            server.aof_fd = newfd;
            // Sync again (the AOF rewrite buffer was appended to the AOF file above)
            if (server.aof_fsync == AOF_FSYNC_ALWAYS)
                aof_fsync(newfd);
            else if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
                aof_background_fsync(newfd);

            // Force a SELECT to be emitted before the next command
            server.aof_selected_db = -1; /* Make sure SELECT is re-issued */
            aofUpdateCurrentSize();
            server.aof_rewrite_base_size = server.aof_current_size;

            /* Clear regular AOF buffer since its contents was just written to
             * the new AOF from the background rewrite buffer. */
            // Clear the AOF buffer, since its contents have already been written to the new AOF file
            sdsfree(server.aof_buf);
            server.aof_buf = sdsempty();
        }

        server.aof_lastbgrewrite_status = REDIS_OK;

        redisLog(REDIS_NOTICE, "Background AOF rewrite finished successfully");
        /* Change state from WAIT_REWRITE to ON if needed */
        if (server.aof_state == REDIS_AOF_WAIT_REWRITE)
            server.aof_state = REDIS_AOF_ON;

        /* Asynchronously close the overwritten AOF. */
        // Close the old AOF file asynchronously
        if (oldfd != -1) bioCreateBackgroundJob(REDIS_BIO_CLOSE_FILE,(void*)(long)oldfd,NULL,NULL);

        redisLog(REDIS_VERBOSE,
            "Background AOF rewrite signal handler took %lldus", ustime()-now);
    } 
    // The AOF rewrite failed
    else if (!bysignal && exitcode != 0) {
        server.aof_lastbgrewrite_status = REDIS_ERR;

        redisLog(REDIS_WARNING,
            "Background AOF rewrite terminated with error");
    } else {
        server.aof_lastbgrewrite_status = REDIS_ERR;

        redisLog(REDIS_WARNING,
            "Background AOF rewrite terminated by signal %d", bysignal);
    }

cleanup:
    // Close the pipes
    aofClosePipes();
    // Reset the AOF rewrite buffer
    aofRewriteBufferReset();
    // Remove the temp file
    aofRemoveTempFile(server.aof_child_pid);
    // Reset the related state
    server.aof_child_pid = -1;
    server.aof_rewrite_time_last = time(NULL)-server.aof_rewrite_time_start;
    server.aof_rewrite_time_start = -1;
    /* Schedule a new rewrite if we are waiting for it to switch the AOF ON. */
    if (server.aof_state == REDIS_AOF_WAIT_REWRITE)
        server.aof_rewrite_scheduled = 1;
}
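The trick of holding a descriptor on the old file so that rename(2) does not delete it can be observed with a small POSIX experiment. The sketch below is illustrative only and uses hypothetical function names. Unlike Redis, which hands the final close to a background thread, it closes the descriptor inline.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Create or truncate `path` and write a NUL-terminated string to it.
 * Returns 0 on success, -1 on error. */
int write_file(const char *path, const char *text) {
    size_t len = strlen(text);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) return -1;
    if (write(fd, text, len) != (ssize_t)len) { close(fd); return -1; }
    return close(fd);
}

/* Replace `target` with `tmp` while holding a descriptor on the old
 * target, then read the old content back through that descriptor.
 * Returns the number of bytes read into out, or -1 on error. */
ssize_t swap_and_read_old(const char *tmp, const char *target,
                          char *out, size_t cap) {
    ssize_t n;
    int oldfd = open(target, O_RDONLY);

    assert(out != NULL);
    if (oldfd == -1) return -1;

    /* Atomically switch the name to the new file. The old file's data is
     * not released yet because oldfd still references it. */
    if (rename(tmp, target) == -1) { close(oldfd); return -1; }

    n = read(oldfd, out, cap);

    /* Dropping the last reference is what actually frees the old file;
     * this is the potentially slow step Redis moves off the main thread. */
    close(oldfd);
    return n;
}
```

After the call, opening the target path yields the new content, while the bytes returned through the old descriptor are still the old content.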

Finally, here is the annotated source:
aof.c: https://github.com/xiejingfa/the-annotated-redis-3.0/blob/master/aof.c

If you found this article helpful, please consider giving the repository a star. Thanks!
