redis源码

最新推荐文章于 2024-08-22 10:22:39 发布

qq_41157212

最新推荐文章于 2024-08-22 10:22:39 发布

阅读量601

点赞数

文章标签： default

本文链接：https://blog.csdn.net/qq_41157212/article/details/120161788

版权

原文链接

链接

版本

使用6.2.4

sds.h sds.c

内存对齐

__attribute__((__packed__))可以让编译器对结构体不进行内存对齐，详细参考

#include <stdint.h>
#include <stdio.h>

struct __attribute__((__packed__)) sdshdr64 {
    uint64_t len;        /* used */
    uint64_t alloc;      /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

struct _sdshdr64 {
    uint64_t len;        /* used */
    uint64_t alloc;      /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

int main() {
    printf("packed: %d\n", sizeof(struct sdshdr64));
    printf("nopacked: %d\n", sizeof(struct _sdshdr64));
}
/*
gcc a.c -o a && ./a
packed: 17
nopacked: 24
*/

宏#\

##后标识的字符串会被替换，然后其左右的内容加上自己会被合并到一起，编译器将其视为标识符进行解析，详细参考

sds.h 源码

sds 可以被简单的认为是一个 char*

typedef char *sds;

接下来是5种 sds 他们是sdshdr5, sdshdr8, sdshdr16, sdshdr32, sdshdr64, 分别可以储存长度为$2^5$, $2^8$, $2^{16}$, $2^{32}$, $2^{64}$ 的字符串。

__attribute__ ((__packed__))是编译器指令，可以取消内存对齐，让内存紧凑排列，这部分首先看后四个结构体，他们的内存结构定义几乎一摸一样。

len: 字符串的长度
alloc：分配的空间大小
flags：字符串的类型（5种），所以只有最低的三位有意义，高5位不做使用。
buf：字符串的实际内容

对于sdshdr5,他比较特殊，实际上他的len和alloc一定相等，并储存于flags的高5位上，借此实现了内存压缩。

/* Note: sdshdr5 is never used, we just access the flags byte directly.
 * However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

sds 把字符串的内容，以及他的元信息（字符串类型、字符串长度、字符串分配的空间）储存在了一起，让内存排列更加紧致。

adlist.c adlist.h

很普通的链表，并没有什么很特殊的地方，注意listIter的direction是迭代器的方向。

typedef struct listNode {
    struct listNode *prev;
    struct listNode *next;
    void *value;
} listNode;

typedef struct listIter {
    listNode *next;
    int direction;
} listIter;

typedef struct list {
    listNode *head;
    listNode *tail;
    void *(*dup)(void *ptr);
    void (*free)(void *ptr);
    int (*match)(void *ptr, void *key);
    unsigned long len;
} list;

mt19937-64.c mt19937-64.h

梅森素数

在OEIS上，梅森素数有这些, 维基百科上也有说明, 我们需要注意到的是$2^{19937}-1$是一个梅森素数

线性反馈移位寄存器

线性反馈移位寄存器（Linear Feedback Shifting Register，简称 LFSR）

假设你有一个寄存器，寄存器中储存着一些二进制位，寄存器中有几个位被标记了，接下来会有无限轮操作，每轮操作如下

寄存器输出最低位x（x=0或1）。
寄存器选择被标记的位和x，取出其值，放到一起进行异或，得到y（y=0或1）。
寄存器把自己右移1位，然后把值y放入最高位。

具体来说，你有一个$8$位寄存器，初始储存着$00001111$，其中$3$,$5$,$7$位被标记了，于是开始操作。

第一轮输出$x=1$，然后从低位到高位选择了$1$,$0$,$0$, 最后$y=1 \oplus1 \oplus 0 \oplus 0=0$，寄存器变成了$00000111$

第二轮输出$x=1$，然后从低位到高位选择了$1$,$0$,$0$, 最后$y=1 \oplus1 \oplus 0 \oplus 0=0$，寄存器变成了$00000011$

第三轮输出$x=1$，然后从低位到高位选择了$0$,$0$,$0$, 最后$y=1 \oplus 0 \oplus 0 \oplus 0=1$，寄存器变成了$10000001$

第四轮输出$x=1$，然后从低位到高位选择了$0$,$0$,$0$, 最后$y=1 \oplus 0 \oplus 0 \oplus 0=1$，寄存器变成了$11000000$

第五轮输出$x=0$，然后从低位到高位选择了$0$,$0$,$1$, 最后$y=0 \oplus 0 \oplus 0 \oplus 1=1$，寄存器变成了$11100000$

……

梅森旋转算法

这是一个随机数生成算法，这里有一篇有趣的Blog，有兴趣可以读一下。这里引用一些主要内容。

梅森旋转算法（Mersenne Twister Algorithm，简称 MT）

$32$ 位的梅森旋转算法能够产生周期为 $P$ 的 $w$-比特的随机数序列${\vec xi}$；其中 $w=32$。这也就是说，每一个$\vec x$ 是一个长度为 $32$ 的行向量，并且其中的每一个元素都是二元数域$\mathbb{F}2 \overset{\text{def}}{=} {0, 1}$中的元素。现在，我们定义如下一些记号，来描述梅森旋转算法是如何进行旋转（线性移位）的。

$n$：参与梅森旋转的随机数个数；
$r$：$[0, w)$ 之间的整数；
$m$：$(0, n]$之间的整数；
$\mathbf{A}$：$w \times w$ 的常矩阵；
$\vec x^{(u)}$：$\vec x$的最高 $w - r$ 比特组成的数（低位补零）；
$\vec x^{(l)}$：$\vec x$的最低 r 比特组成的数（高位补零）。

梅森旋转算法，首先需要根据随机数种子初始化$ n $个行向量： $$ \vec x0, \vec x1, \ldots, \vec x{n - 1}. $$ 而后根据下式，从$ k=0$ 开始依次计算 $\vec x{n}$： $$ \begin{equation}\vec x{k + n} \overset{\text{def}}{=} \vec x{k + m}\oplus \bigl(\vec x{k}^{(u)}\mid \vec x{k + 1}^{(l)}\bigr)\mathbf{A}.\label{eq:twister}\end{equation} $$

其中，$\vec x\mid \vec x'$表示两个二进制数按位或；$\vec x\oplus \vec x'$表示两个二进制数按位半加（不进位，也就是按位异或）；$\vec x\mathbf A$ 则表示按位半加的矩阵乘法。在 MT 中，$\mathbf A$ 被定义为 $$ \begin{pmatrix} & 1 \ & & 1 \ & & & \ddots \ & & & & 1 \ a{w - 1} & a{w - 2} & a{w - 3} & \cdots & a0 \end{pmatrix} $$

我们现在看看这个计算和旋转有什么关系。首先不考虑矩阵$\mathbf A$.

则有$\vec x{k + n} \overset{\text{def}}{=} \vec x{k + m}\oplus \bigl(\vec x{k}^{(u)}\mid \vec x{k + 1}^{(l)}\bigr)$, 这个式子笔者看了很久才明白他就是$w$轮线性反馈移位寄存器变换。下图是计算$xn$的时候的异或情况，可以看到$xn$的每一个位都是独立的异或

回过头来看 2 式，不难发现，这其实相当于一个 $nw - r$ 级的线性反馈移位寄存器（取 $\vec xk^{(u)}$的最高 $w−r$ 位与 $\vec x{k + 1}^{(l)}$的最低 $r $位进行迭代异或，再经过一个不影响周期的线性变换 $\mathbf A$）。只不过，2 式每一次运算，相当于 $LFSR$ 进行了 $w$ 轮计算。若 $w$ 与 $nw−r$ 互素，那么这一微小的改变是不会影响 $LFSR$ 的周期的。考虑到 $LFSR$ 的计算过程像是在「旋转」，这即是「梅森『旋转』」名字的来由。

mt19937源码

主要的计算都在这里

unsigned long long genrand64_int64(void)
{
    //...
    for (i=0;i<NN-MM;i++) {
        x = (mt[i]&UM)|(mt[i+1]&LM);
        mt[i] = mt[i+MM] ^ (x>>1) ^ mag01[(int)(x&1ULL)];
    }
    for (;i<NN-1;i++) {
        x = (mt[i]&UM)|(mt[i+1]&LM);
        mt[i] = mt[i+(MM-NN)] ^ (x>>1) ^ mag01[(int)(x&1ULL)];
    }
    //...
}

然后是63位生成

/* generates a random number on [0, 2^63-1]-interval */
long long genrand64_int63(void)
{
    return (long long)(genrand64_int64() >> 1);
}

实数的生成

/* generates a random number on [0,1]-real-interval */
double genrand64_real1(void)
{
    return (genrand64_int64() >> 11) * (1.0/9007199254740991.0);
}

/* generates a random number on [0,1)-real-interval */
double genrand64_real2(void)
{
    return (genrand64_int64() >> 11) * (1.0/9007199254740992.0);
}

/* generates a random number on (0,1)-real-interval */
double genrand64_real3(void)
{
    return ((genrand64_int64() >> 12) + 0.5) * (1.0/4503599627370496.0);
}

dict.c dict.h

字典源码

字典结构体定义，需要注意这里有两个dictht，即两个字典，这涉及到了一个重hash问题，redis使用了渐进式rehash算法，即把重hash分布到各个地方(插入、查询等)，使得重hash的复杂度降低为$O1$，

redis是单线程，绝对不能出现过于耗时的操作，否则影响redis延时

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int16_t pauserehash; /* If >0 rehashing is paused (<0 indicates coding error) */
} dict;

server.h server.c-1-跳表

跳表定义在这里

/* ZSETs use a specialized version of Skiplists */
typedef struct zskiplistNode {
    sds ele;
    double score;
    struct zskiplistNode *backward;
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned long span;
    } level[];
} zskiplistNode;

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;
    int level;
} zskiplist;

intset.c intset.h

整数集合，这里可以储存整数

typedef struct intset {
    uint32_t encoding;
    uint32_t length;
    int8_t contents[];
} intset;

intset *intsetNew(void);
intset *intsetAdd(intset *is, int64_t value, uint8_t *success);
intset *intsetRemove(intset *is, int64_t value, int *success);
uint8_t intsetFind(intset *is, int64_t value);
int64_t intsetRandom(intset *is);
uint8_t intsetGet(intset *is, uint32_t pos, int64_t *value);
uint32_t intsetLen(const intset *is);
size_t intsetBlobLen(intset *is);
int intsetValidateIntegrity(const unsigned char *is, size_t size, int deep);

encoding是编码方式，指的是contents中的数据如何储存，编码方式分为三种

/* Note that these encodings are ordered, so:
 * INTSET_ENC_INT16 < INTSET_ENC_INT32 < INTSET_ENC_INT64. */
#define INTSET_ENC_INT16 (sizeof(int16_t))
#define INTSET_ENC_INT32 (sizeof(int32_t))
#define INTSET_ENC_INT64 (sizeof(int64_t))

length是数字的个数

contents是内容，但是他不一定是8位的整数，取决于encoding的值。

整数集合升级

由于整数集合初始情况储存的是INTSETENCINT16，当你插入一个32位的数字以后，会出现溢出，这时候就需要进行升级，就直接开辟新的空间然后拷贝过去，复杂的$O(N)$

不支持降级

ziplist.c ziplist.h

压缩列表

server.h server.c-2-对象

redis对象都在这里统一起来

```c typedef struct redisObject { unsigned type:4; unsigned encoding:4; unsigned lru:LRUBITS; /* LRU time (relative to global lruclock) or * LFU data (least significant 8 bits frequency * and most significant 16 bits access time). */ int refcount; void *ptr; } robj;

# server.h-3-db

这次主要关注redisServer，这个结构体有460行，笔者省去了一些,可以砍刀redisDb是一个数组，dbnum记录他的数量，一般情况下，dbnum为6

c struct redisServer { // … redisDb db; // … int dbnum; / Total number of configured DBs */ // … };

然后是客户端这边, 注意到client,. 这里也有一个指针，当然他指向的就是当前使用的db，而不是数组。

c typedef struct client { // … redisDb db; / Pointer to currently SELECTed DB. */ // … } client;

看完服务器和客户端，然后看db

c /* Redis database representation. There are multiple databases identified

by integers from 0 (the default database) up to the max configured
database. The database number is the 'id' field in the structure. */ typedef struct redisDb { dict *dict; /* The keyspace for this DB */ dict *expires; /* Timeout of keys with a timeout set */ dict *blockingkeys; /* Keys with clients waiting for data (BLPOP)*/ dict *readykeys; /* Blocked keys that received a PUSH */ dict *watchedkeys; /* WATCHED keys for MULTI/EXEC CAS / int id; / Database ID / long long avgttl; / Average TTL, just for stats / unsigned long expirescursor; / Cursor of the active expire cycle. */ list *defraglater; /* List of key names to attempt to defrag one by one, gradually. */ } redisDb;

对于redisDb，笔者这里引用一下《Redis设计与实现》中的一个图，读者可以看的更加清晰

> ![](image-2021-06-30-16.09.00.000.png)



# rio.c rio.h

rio即redis io， 主要实现了redis中的io操作, rio是一个结构体，他就是`_rio`, 下面是源码。

c struct rio { /* Backend functions. * Since this functions do not tolerate short writes or reads the return * value is simplified to: zero on error, non zero on complete success. / sizet (read)(struct rio *, void *buf, sizet len); size_t (*write)(struct _rio *, const void buf, sizet len); offt (tell)(struct _rio ); int (flush)(struct _rio ); / The updatecksum method if not NULL is used to compute the checksum of * all the data that was read or written so far. The method should be * designed so that can be called with the current checksum, and the buf * and len fields pointing to the new block of data to add to the checksum * computation. / void (updatecksum)(struct rio *, const void *buf, sizet len);

/* The current checksum and flags (see RIO_FLAG_*) */
uint64_t cksum, flags;

/* number of bytes read or written */
size_t processed_bytes;

/* maximum single read or write chunk size */
size_t max_processing_chunk;

/* Backend-specific vars. */
union {
    /* In-memory buffer target. */
    struct {
        sds ptr;
        off_t pos;
    } buffer;
    /* Stdio file pointer target. */
    struct {
        FILE *fp;
        off_t buffered; /* Bytes written since last fsync. */
        off_t autosync; /* fsync after 'autosync' bytes written. */
    } file;
    /* Connection object (used to read from socket) */
    struct {
        connection *conn;   /* Connection */
        off_t pos;    /* pos in buf that was returned */
        sds buf;      /* buffered data */
        size_t read_limit;  /* don't allow to buffer/read more than that */
        size_t read_so_far; /* amount of data read from the rio (not buffered) */
    } conn;
    /* FD target (used to write to pipe). */
    struct {
        int fd;       /* File descriptor. */
        off_t pos;
        sds buf;
    } fd;
} io;

};

简单来说，他的这些字段，分别对应这些内容：

|         字段         |         内容         |
| :------------------: | :------------------: |
|         read         |  读数据，是函数指针  |
|        write         |  写数据，是函数指针  |
|         tell         |   tell，是函数指针   |
|        flush         |  flush，是函数指针   |
|     update_cksum     |  校验和，是函数指针  |
|        cksum         |      当前校验和      |
|        flags         |   是否发生读写错误   |
|   processed_bytes    |   已经处理的字节数   |
| max_processing_chunk | 单次最大处理的字节数 |
|          io          |    具体的读写目标    |

这里的函数指针主要作用是给后面的下面这些函数使用，这种编程方式有一点像面向对象中的抽象类。注意看，下面的`rioWrite`使用了对象`r`的`write`方法，实现了任意 长度`len`的写入。而对象`r`的`write`方法是不支持任意长度len的。`rioRead`也是同理了。

c static inline sizet rioWrite(rio *r, const void *buf, sizet len) { if (r->flags & RIOFLAGWRITEERROR) return 0; while (len) { sizet bytestowrite = (r->maxprocessingchunk && r->maxprocessingchunk < len) ? r->maxprocessingchunk : len; if (r->updatecksum) r->updatecksum(r,buf,bytestowrite); if (r->write(r,buf,bytestowrite) == 0) { r->flags |= RIOFLAGWRITEERROR; return 0; } buf = (char*)buf + bytestowrite; len -= bytestowrite; r->processedbytes += bytestowrite; } return 1; }

static inline sizet rioRead(rio *r, void *buf, sizet len) { if (r->flags & RIOFLAGREADERROR) return 0; while (len) { sizet bytestoread = (r->maxprocessingchunk && r->maxprocessingchunk < len) ? r->maxprocessingchunk : len; if (r->read(r,buf,bytestoread) == 0) { r->flags |= RIOFLAGREADERROR; return 0; } if (r->updatecksum) r->updatecksum(r,buf,bytestoread); buf = (char*)buf + bytestoread; len -= bytestoread; r->processedbytes += bytestoread; } return 1; }

static inline off_t rioTell(rio *r) { return r->tell(r); }

static inline int rioFlush(rio *r) { return r->flush(r); }

这里有一个有趣的函数

c /* Flushes any buffer to target device if applicable. Returns 1 on success

and 0 on failures. */ static int rioBufferFlush(rio *r) { UNUSED(r); return 1; /* Nothing to do, our write just appends to the buffer. */ }

其中的`UNUSED`来自于一个宏` #define UNUSED(V) ((void) V)`, 其作用是消除编译器的警告： 变量未使用。

最后是整个`bufferio`的源码, 定义了一些函数，这些函数只给rioBufferIO这个对象使用。这是一种单例模式。

c /* ------------------------- Buffer I/O implementation ----------------------- */

/* Returns 1 or 0 for success/failure. */ static sizet rioBufferWrite(rio *r, const void buf, sizet len) { r->io.buffer.ptr = sdscatlen(r->io.buffer.ptr,(char)buf,len); r->io.buffer.pos += len; return 1; }

/* Returns 1 or 0 for success/failure. */ static sizet rioBufferRead(rio *r, void buf, sizet len) { if (sdslen(r->io.buffer.ptr)-r->io.buffer.pos < len) return 0; / not enough buffer to return len bytes. */ memcpy(buf,r->io.buffer.ptr+r->io.buffer.pos,len); r->io.buffer.pos += len; return 1; }

/* Returns read/write position in buffer. */ static off_t rioBufferTell(rio *r) { return r->io.buffer.pos; }

/* Flushes any buffer to target device if applicable. Returns 1 on success

and 0 on failures. */ static int rioBufferFlush(rio *r) { UNUSED(r); return 1; /* Nothing to do, our write just appends to the buffer. */ }

static const rio rioBufferIO = { rioBufferRead, rioBufferWrite, rioBufferTell, rioBufferFlush, NULL, /* update_checksum / 0, / current checksum / 0, / flags / 0, / bytes read or written / 0, / read/write chunk size / { { NULL, 0 } } / union for io-specific vars */ };

void rioInitWithBuffer(rio *r, sds s) { *r = rioBufferIO; r->io.buffer.ptr = s; r->io.buffer.pos = 0; }

文件io和缓冲区io相差不大，注意关注文件io的写函数，这里涉及到一个[异步刷盘](https://blog.csdn.net/mengyafei43/article/details/38319783)的问题。

redis对多个操作系统做了兼容,在linux下`redis_fsync`就是`fsync`，文件读写也有自己的缓冲区，一旦开启了自动同步`io.file.autosync`，则每写入一定数量`io.file.buffered`的数据，就进行同步`fsync(fileno(fp))`。

c /* Returns 1 or 0 for success/failure. */ static sizet rioFileWrite(rio *r, const void *buf, sizet len) { size_t retval;

retval = fwrite(buf,len,1,r->io.file.fp);
r->io.file.buffered += len;

if (r->io.file.autosync &&
    r->io.file.buffered >= r->io.file.autosync)
{
    fflush(r->io.file.fp);
    if (redis_fsync(fileno(r->io.file.fp)) == -1) return 0;
    r->io.file.buffered = 0;
}
return retval;

}

接下来的两个io分别是connection io和 file descriptor io, 前者只实现了从socket中读取数据的接口，后者只实现了向fd中写数据的接口（`This target is used to write the RDB file to pipe, when the master just streams the data to the replicas without creating an RDB on-disk image (diskless replication option)`）。

# rdb.c rdb.h

## rdbSaveRio

直接看函数`rdbSaveRio`的实现，第一部分是一些准备工作，RDB的版本被储存到了字符串magic中

c int rdbSaveRio(rio *rdb, int *error, int rdbflags, rdbSaveInfo *rsi) { // … dictIterator *di = NULL; dictEntry *de; char magic[10]; uint64t cksum; sizet processed = 0; int j; long keycount = 0; long long infoupdatedtime = 0; char *pname = (rdbflags & RDBFLAGSAOF_PREAMBLE) ? "AOF rewrite" : "RDB";

if (server.rdb_checksum)
    rdb->update_cksum = rioGenericUpdateChecksum;
snprintf(magic,sizeof(magic),"REDIS%04d",RDB_VERSION);
// ...

}

第二部分`rdbWriteRaw`直接把magic版本数据写入rdb输出流，`rdbSaveInfoAuxFields`写入了一些kv对，分别是`redis-ver`,`redis-bits`,`ctime`和`used-mem`。

对于`rdbSaveModulesAux`，他是module.c和module.h中的内容，大概就是保存了一个modules字典。

c int rdbSaveInfoAuxFields(rio *rdb, int rdbflags, rdbSaveInfo *rsi) { // … if (rdbSaveAuxFieldStrStr(rdb,"redis-ver",REDISVERSION) == -1) return -1; if (rdbSaveAuxFieldStrInt(rdb,"redis-bits",redisbits) == -1) return -1; if (rdbSaveAuxFieldStrInt(rdb,"ctime",time(NULL)) == -1) return -1; if (rdbSaveAuxFieldStrInt(rdb,"used-mem",zmallocusedmemory()) == -1) return -1; // … return 1; }

int rdbSaveRio(rio *rdb, int *error, int rdbflags, rdbSaveInfo *rsi) { // … if (rdbWriteRaw(rdb,magic,9) == -1) goto werr; if (rdbSaveInfoAuxFields(rdb,rdbflags,rsi) == -1) goto werr; if (rdbSaveModulesAux(rdb, REDISMODULEAUXBEFORE_RDB) == -1) goto werr; // … }

第三部分开始处理数据库,其主体如下。依次写入了数据库的编号、数据库kv个数，数据库超时kv个数。

c int rdbSaveRio(rio *rdb, int *error, int rdbflags, rdbSaveInfo *rsi) { // … for (j = 0; j < server.dbnum; j++) { redisDb *db = server.db+j; dict *d = db->dict; if (dictSize(d) == 0) continue; di = dictGetSafeIterator(d);

    /* Write the SELECT DB opcode */
    if (rdbSaveType(rdb,RDB_OPCODE_SELECTDB) == -1) goto werr;
    if (rdbSaveLen(rdb,j) == -1) goto werr;

    /* Write the RESIZE DB opcode. */
    uint64_t db_size, expires_size;
    db_size = dictSize(db->dict);
    expires_size = dictSize(db->expires);
    if (rdbSaveType(rdb,RDB_OPCODE_RESIZEDB) == -1) goto werr;
    if (rdbSaveLen(rdb,db_size) == -1) goto werr;
    if (rdbSaveLen(rdb,expires_size) == -1) goto werr;

    /* Iterate this DB writing every entry */
    while((de = dictNext(di)) != NULL) {
        // ...
    }
}
// ...

}

第三部分的`while`循环中，对整个数据库的kv字典进行了迭代，依次写入了rio的流。

c /* Iterate this DB writing every entry */ while((de = dictNext(di)) != NULL) { sds keystr = dictGetKey(de); robj key, *o = dictGetVal(de); long long expire;

initStaticStringObject(key,keystr);
expire = getExpire(db,&key);
if (rdbSaveKeyValuePair(rdb,&key,o,expire) == -1) goto werr;

/* When this RDB is produced as part of an AOF rewrite, move
         * accumulated diff from parent to child while rewriting in
         * order to have a smaller final write. */
if (rdbflags & RDBFLAGS_AOF_PREAMBLE &&
    rdb->processed_bytes > processed+AOF_READ_DIFF_INTERVAL_BYTES)
{
    processed = rdb->processed_bytes;
    aofReadDiffFromParent();
}

/* Update child info every 1 second (approximately).
         * in order to avoid calling mstime() on each iteration, we will
         * check the diff every 1024 keys */
if ((key_count++ & 1023) == 0) {
    long long now = mstime();
    if (now - info_updated_time >= 1000) {
        sendChildInfo(CHILD_INFO_TYPE_CURRENT_INFO, key_count, pname);
        info_updated_time = now;
    }
}

}

最后一部分，写入了结束符和checksum

c /* If we are storing the replication information on disk, persist * the script cache as well: on successful PSYNC after a restart, we need * to be able to process any EVALSHA inside the replication backlog the * master will send us. */ if (rsi && dictSize(server.luascripts)) { di = dictGetIterator(server.luascripts); while((de = dictNext(di)) != NULL) { robj *body = dictGetVal(de); if (rdbSaveAuxField(rdb,"lua",3,body->ptr,sdslen(body->ptr)) == -1) goto werr; } dictReleaseIterator(di); di = NULL; /* So that we don't release it again on error. */ }

if (rdbSaveModulesAux(rdb, REDISMODULEAUXAFTER_RDB) == -1) goto werr;

/* EOF opcode */ if (rdbSaveType(rdb,RDBOPCODEEOF) == -1) goto werr;

/* CRC64 checksum. It will be zero if checksum computation is disabled, the * loading code skips the check in this case. */ cksum = rdb->cksum; memrev64ifbe(&cksum); if (rioWrite(rdb,&cksum,8) == 0) goto werr; return C_OK;

## rdbSave

首先rdbSave创建了一个名为`temp-pid.rdb`的文件，该文件将用于输出rdb的结果。

c int rdbSave(char *filename, rdbSaveInfo *rsi) { char tmpfile[256]; char cwd[MAXPATHLEN]; /* Current working dir path for error messages. */ FILE *fp = NULL; rio rdb; int error = 0;

snprintf(tmpfile,256,"temp-%d.rdb", (int) getpid());
fp = fopen(tmpfile,"w");
if (!fp) {
    char *cwdp = getcwd(cwd,MAXPATHLEN);
    serverLog(LL_WARNING,
        "Failed opening the RDB file %s (in server root dir %s) "
        "for saving: %s",
        filename,
        cwdp ? cwdp : "unknown",
        strerror(errno));
    return C_ERR;
}
// ...

}

然后使用该文件初始化rio流，并根据配置文件rio是否进行自动刷盘。

c int rdbSave(char *filename, rdbSaveInfo *rsi) { // … rioInitWithFile(&rdb,fp); startSaving(RDBFLAGS_NONE);

if (server.rdb_save_incremental_fsync)
    rioSetAutoSync(&rdb,REDIS_AUTOSYNC_BYTES);
// ...

}

接着执行`rdbSaveRio`，并刷盘

c int rdbSave(char *filename, rdbSaveInfo *rsi) { // … if (rdbSaveRio(&rdb,&error,RDBFLAGSNONE,rsi) == CERR) { errno = error; goto werr; }

/* Make sure data will not remain on the OS's output buffers */
if (fflush(fp)) goto werr;
if (fsync(fileno(fp))) goto werr;
if (fclose(fp)) { fp = NULL; goto werr; }
fp = NULL;
// ...

}

最后把这个rdb文件命名为`filename`，并结束rdb。

## rdbSaveBackground

fork出一个子进程，子进程执行rdb任务。

c int rdbSaveBackground(char *filename, rdbSaveInfo *rsi) { pid_t childpid;

if (hasActiveChildProcess()) return C_ERR;

server.dirty_before_bgsave = server.dirty;
server.lastbgsave_try = time(NULL);

if ((childpid = redisFork(CHILD_TYPE_RDB)) == 0) {
    int retval;

    /* Child */
    redisSetProcTitle("redis-rdb-bgsave");
    redisSetCpuAffinity(server.bgsave_cpulist);
    retval = rdbSave(filename,rsi);
    if (retval == C_OK) {
        sendChildCowInfo(CHILD_INFO_TYPE_RDB_COW_SIZE, "RDB");
    }
    exitFromChild((retval == C_OK) ? 0 : 1);
} else {
    /* Parent */
    if (childpid == -1) {
        server.lastbgsave_status = C_ERR;
        serverLog(LL_WARNING,"Can't save in background: fork: %s",
            strerror(errno));
        return C_ERR;
    }
    serverLog(LL_NOTICE,"Background saving started by pid %ld",(long) childpid);
    server.rdb_save_time_start = time(NULL);
    server.rdb_child_type = RDB_CHILD_TYPE_DISK;
    return C_OK;
}
return C_OK; /* unreached */

} ```