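In addition to RDB persistence, Redis provides AOF (Append Only File) persistence. Whereas RDB persistence records the database state by saving the key-value pairs in the database, AOF persistence records it by saving the write commands that the Redis server executes. Compared with RDB persistence, AOF persistence typically loses less data after a crash, but it may reduce Redis performance.
All commands written to the AOF file are stored in the Redis unified request protocol format.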
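To make the unified request protocol format concrete, here is a minimal sketch of how a command's arguments could be encoded. The helper resp_encode is hypothetical and is not a Redis function; it also assumes NUL-terminated arguments, whereas Redis itself handles binary-safe strings.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Redis): encode argc arguments as
 * "*<argc>\r\n" followed by "$<len>\r\n<arg>\r\n" for each argument.
 * Returns the number of bytes written, or -1 if out is too small.
 * Assumes NUL-terminated arguments, so it is not binary safe. */
int resp_encode(char *out, size_t cap, int argc, const char **argv) {
    size_t used;
    int n = snprintf(out, cap, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= cap) return -1;
    used = (size_t)n;
    for (int i = 0; i < argc; i++) {
        n = snprintf(out + used, cap - used, "$%zu\r\n%s\r\n",
                     strlen(argv[i]), argv[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

For the command SET key value, this produces the bytes *3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n, which is the form in which the command would appear in the AOF file.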
The redisServer struct, which represents the Redis server, has the following AOF-related members:
struct redisServer {
...
/* AOF persistence */
int aof_state; /* REDIS_AOF_(ON|OFF|WAIT_REWRITE) */
int aof_fsync; /* Kind of fsync() policy */
char *aof_filename; /* Name of the AOF file */
...
pid_t aof_child_pid; /* PID if rewriting process */
list *aof_rewrite_buf_blocks; /* Hold changes during an AOF rewrite. */
sds aof_buf; /* AOF buffer, written before entering the event loop */
int aof_fd; /* File descriptor of currently selected AOF file */
...
/* AOF pipes used to communicate between parent and child during rewrite. */
int aof_pipe_write_data_to_child;
int aof_pipe_read_data_from_parent;
int aof_pipe_write_ack_to_parent;
int aof_pipe_read_ack_from_child;
int aof_pipe_write_ack_to_child;
int aof_pipe_read_ack_from_parent;
int aof_stop_sending_diff; /* If true stop sending accumulated diffs
to child process. */
sds aof_child_diff; /* AOF diff accumulator child side. */
...
};
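I: AOF persistence
The implementation of AOF persistence can be divided into three steps: command append, file write, and file sync.
1: Command append
When AOF persistence is enabled, the Redis server calls feedAppendOnlyFile after it executes a write command. The function encodes the command in the unified request protocol and appends the encoded content to the AOF buffer server.aof_buf. The code of feedAppendOnlyFile is as follows: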
void feedAppendOnlyFile(struct redisCommand *cmd, int dictid, robj **argv, int argc) {
sds buf = sdsempty();
robj *tmpargv[3];
/* The DB this command was targeting is not the same as the last command
* we appended. To issue a SELECT command is needed. */
if (dictid != server.aof_selected_db) {
char seldb[64];
snprintf(seldb,sizeof(seldb),"%d",dictid);
buf = sdscatprintf(buf,"*2\r\n$6\r\nSELECT\r\n$%lu\r\n%s\r\n",
(unsigned long)strlen(seldb),seldb);
server.aof_selected_db = dictid;
}
if (cmd->proc == expireCommand || cmd->proc == pexpireCommand ||
cmd->proc == expireatCommand) {
/* Translate EXPIRE/PEXPIRE/EXPIREAT into PEXPIREAT */
buf = catAppendOnlyExpireAtCommand(buf,cmd,argv[1],argv[2]);
} else if (cmd->proc == setexCommand || cmd->proc == psetexCommand) {
/* Translate SETEX/PSETEX to SET and PEXPIREAT */
tmpargv[0] = createStringObject("SET",3);
tmpargv[1] = argv[1];
tmpargv[2] = argv[3];
buf = catAppendOnlyGenericCommand(buf,3,tmpargv);
decrRefCount(tmpargv[0]);
buf = catAppendOnlyExpireAtCommand(buf,cmd,argv[1],argv[2]);
} else {
/* All the other commands don't need translation or need the
* same translation already operated in the command vector
* for the replication itself. */
buf = catAppendOnlyGenericCommand(buf,argc,argv);
}
/* Append to the AOF buffer. This will be flushed on disk just before
* of re-entering the event loop, so before the client will get a
* positive reply about the operation performed. */
if (server.aof_state == REDIS_AOF_ON)
server.aof_buf = sdscatlen(server.aof_buf,buf,sdslen(buf));
/* If a background append only file rewriting is in progress we want to
* accumulate the differences between the child DB and the current one
* in a buffer, so that when the child process will do its work we
* can append the differences to the new append only file. */
if (server.aof_child_pid != -1)
aofRewriteBufferAppend((unsigned char*)buf,sdslen(buf));
sdsfree(buf);
}
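The function first checks whether the database index dictid of this command is the same as server.aof_selected_db, the database index of the previous command; if they differ, it encodes a SELECT command;
If the command is EXPIRE, PEXPIRE, or EXPIREAT, it calls catAppendOnlyExpireAtCommand to encode the command in the form of a PEXPIREAT command;
If the command is SETEX or PSETEX, it first calls catAppendOnlyGenericCommand to encode a SET command, then calls catAppendOnlyExpireAtCommand to encode a PEXPIREAT command;
All other commands are encoded directly with catAppendOnlyGenericCommand;
If server.aof_state is REDIS_AOF_ON, AOF is enabled, and the encoded buf is appended to the AOF buffer server.aof_buf;
In addition, if server.aof_child_pid is not -1, a child process is performing an AOF rewrite, so aofRewriteBufferAppend is called to append the encoded buf to the AOF rewrite buffer server.aof_rewrite_buf_blocks.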
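The translation into PEXPIREAT is presumably done so that replaying the AOF file later yields the same absolute expiry instead of restarting a relative countdown. The following is a minimal sketch of that conversion under this assumption; the enum ttl_kind and the function to_pexpireat_ms are hypothetical names and do not appear in the Redis source.

```c
#include <stdio.h>

/* Hypothetical tags for the commands that carry an expiry. */
typedef enum {
    TTL_EXPIRE, TTL_PEXPIRE, TTL_EXPIREAT, TTL_SETEX, TTL_PSETEX
} ttl_kind;

/* Hypothetical helper: convert the expiry argument of a command into the
 * absolute millisecond timestamp that a PEXPIREAT command would carry.
 * now_ms is the current time in milliseconds, passed in for testability. */
long long to_pexpireat_ms(ttl_kind kind, long long when, long long now_ms) {
    /* Commands expressed in seconds are scaled to milliseconds. */
    if (kind == TTL_EXPIRE || kind == TTL_EXPIREAT || kind == TTL_SETEX)
        when *= 1000;
    /* Relative commands become absolute by adding the current time;
     * EXPIREAT is already absolute. */
    if (kind != TTL_EXPIREAT)
        when += now_ms;
    return when;
}
```

With now_ms equal to 1000000, EXPIRE with 10 seconds and PEXPIRE with 10000 milliseconds both map to 1010000, so the two commands leave the same record in the AOF file.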
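2: File write and file sync
To make file writes more efficient, modern operating systems usually do not write data to disk as soon as a program calls write on a file descriptor. The data is first held in an in-memory buffer, and the operating system writes the buffered data to disk only when the buffer fills up or a time limit expires.
This improves efficiency, but it also puts the written data at risk: if the machine crashes, the data still held in the in-memory buffer is lost.
For this reason, the operating system provides the fsync function, which forces the buffered data to be written to disk immediately so that the written data is safe.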
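The difference between write and fsync can be illustrated with a minimal POSIX sketch. The function durable_append is a hypothetical helper for illustration, not Redis code: write only hands the data to the operating system, and the data is forced to disk only when fsync returns.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not part of Redis): append len bytes to path and
 * force them to stable storage. Returns 0 on success, -1 on error. */
int durable_append(const char *path, const char *data, size_t len) {
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd == -1) return -1;

    size_t off = 0;
    while (off < len) {
        /* write may accept fewer bytes than requested, so loop. */
        ssize_t n = write(fd, data + off, len - off);
        if (n == -1 && errno == EINTR) continue;
        if (n <= 0) { close(fd); return -1; }
        off += (size_t)n;
    }

    /* At this point the data may still sit in the OS buffer;
     * fsync blocks until it has been pushed to the device. */
    if (fsync(fd) == -1) { close(fd); return -1; }
    return close(fd);
}
```

Skipping the fsync call would make the function return sooner, at the cost of losing the appended bytes if the machine goes down before the operating system flushes its buffer.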
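In the Redis server's main loop, the content of the AOF buffer server.aof_buf is written to the AOF file just before the server re-enters the event loop, and fsync is performed at a time that depends on the sync policy. The policy is set with the appendfsync option in the configuration file. There are three policies:
a: appendfsync no
Redis does not call fsync, and syncing is left entirely to the operating system. This is the fastest option but also the least safe.
b: appendfsync always
Each time write is called to write the AOF buffer server.aof_buf to the AOF file, fsync is called immediately afterwards. This is the safest option but also the slowest.
c: appendfsync everysec
fsync is performed once per second, in the background, which is a compromise between speed and safety. If the user does not set the appendfsync option, everysec is used as the default.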
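The three policies can be summarized as a small decision function. This is a simplified sketch under the assumption that time is tracked in whole seconds; the names fsync_policy and should_fsync are hypothetical and are not taken from the Redis source.

```c
#include <stdio.h>

/* Hypothetical constants mirroring the three appendfsync settings. */
typedef enum { FSYNC_NO, FSYNC_ALWAYS, FSYNC_EVERYSEC } fsync_policy;

/* Hypothetical helper: returns 1 if an fsync should be issued after a
 * write performed at time now (in seconds), given the time of the last
 * fsync, and 0 otherwise. */
int should_fsync(fsync_policy policy, long now, long last_fsync) {
    switch (policy) {
    case FSYNC_ALWAYS:
        return 1;                 /* sync after every write */
    case FSYNC_EVERYSEC:
        return now > last_fsync;  /* at most one sync per second */
    case FSYNC_NO:
    default:
        return 0;                 /* leave syncing to the OS */
    }
}
```

Under everysec, several writes that fall within the same second trigger at most one fsync, which is why roughly one second of writes can be lost on a crash.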
Both writing the AOF buffer server.aof_buf to the AOF file and choosing when to fsync under each policy are implemented in the function flushAppendOnlyFile, whose code is as follows:
void flushAppendOnlyFile(int force) {
ssize_t nwritten;
int sync_in_progress = 0;
mstime_t latency;
if (sdslen(server.aof_buf) == 0) return;
if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
sync_in_progress = bioPendingJobsOfType(REDIS_BIO_AOF_FSYNC) != 0;
if (server.aof_fsync == AOF_FSYNC_EVERYSEC && !force) {
/* With this append fsync policy we do background fsyncing.
* If the fsync is still in progress we can try to delay
* the write for a couple of seconds. */
if (sync_in_progress) {
if (server.aof_flush_postponed_start == 0) {
/* No previous write postponing, remember that we are
* postponing the flush and return. */
server.aof_flush_postponed_start = server.unixtime;
return;
} else if (server.unixtime - server.aof_flush_postponed_start < 2) {
/* We were already waiting for fsync to finish, but for less
* than two seconds this is still ok. Postpone again. */
return;
}
/* Otherwise fall trough, and go write since we can't wait
* over two seconds. */
server.aof_delayed_fsync++;
redisLog(REDIS_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.");
}
}
/* We want to perform a single write. This should be guaranteed atomic
* at least if the filesystem we are writing is a real physical one.
* While this will save us against the server being killed I don't think
* there is much to do about the whole server stopping for power problems
* or alike */
latencyStartMonitor(latency);
nwritten = write(server.aof_fd,server.aof_buf,sdslen(server.aof_buf));
latencyEndMonitor(latency);
/* We want to capture different events for delayed writes:
* when the delay happens with a pending fsync, or with a saving child
* active, and when the above two conditions are missing.
* We also use an additional event name to save all sampl