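This post walks through the source code of Redis 6.2, pulled directly from GitHub.
Redis starts in the main function, which mainly initializes data, initializes and registers handlers, creates events, and sets up time-event processing. The three core calls in main are initServer(), InitServerLast(), and aeMain(server.el). This version of Redis uses a reactor model: a main thread plus I/O worker threads, with an I/O multiplexer loop that handles accept and event processing. The figure below shows the basic processing model.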
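1. Startup initialization
1.1 Main initialization work in initServer()
- Calls aeCreateEventLoop to initialize the event loop
- Allocates the databases: server.db = zmalloc
- Listens on the configured ports with listenToPort
- Registers the timer callback with aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL). serverCron is the periodic scheduler that drives time-consuming work, such as periodically checking for expired keys and backing up data. For many expensive operations Redis forks a child process that communicates with the parent through a pipe, so reads and writes in the parent process are not affected
- createSocketAcceptHandler(&server.ipfd, acceptTcpHandler) and createSocketAcceptHandler(&server.tlsfd, acceptTLSHandler): when a connection event arrives, the accept path registers readQueryFromClient as the read handler, and replies are sent through writeToClient. From then on, client reads and writes go through these two functions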
- aeCreateFileEvent(server.el, server.module_blocked_pipe[0], AE_READABLE, moduleBlockedClientPipeReadable,NULL)
The underlying system call is epoll_ctl(state->epfd,op,fd,&ee); this applies to Linux.
- Sets the event loop's beforeSleep callback
- Single-threaded:
writeToClient(c,0)
writeToClient(client *c, int handler_installed)
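- Multi-threaded: handleClientsWithPendingWritesUsingThreads, which runs depending on the configuration
In practice Redis does not switch to multiple threads just because the configuration says so. Threaded I/O is only activated under certain conditions, as this check in handleClientsWithPendingWritesUsingThreads in networking.c shows.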
    if (server.io_threads_num == 1 || stopThreadedIOIfNeeded()) {
        return handleClientsWithPendingWrites();
    }
    /* Start threads if needed. */
    if (!server.io_threads_active) startThreadedIO();
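stopThreadedIOIfNeeded() checks whether pending < (server.io_threads_num*2), that is, whether the number of clients currently waiting to be written is less than twice the number of I/O threads set in the configuration file. If so, it calls stopThreadedIO() to pause the threads. The threads are paused by taking their mutexes with pthread_mutex_lock, and before stopping, stopThreadedIO() calls handleClientsWithPendingReadsUsingThreads() once more to handle any clients that still have pending reads.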
handleClientsWithPendingWrites
- Registers sendReplyToClient as the write handler
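The activation rule described above can be sketched as a small pure function. This is only an illustration of the threshold logic, not Redis code: use_threaded_writes and its parameters are hypothetical names, and the side effects of the real functions (pausing or starting the threads) are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not a Redis function: returns true when the
 * pending writes would be handed to the I/O threads. */
bool use_threaded_writes(int io_threads_num, int pending_clients) {
    assert(io_threads_num >= 1 && pending_clients >= 0);
    /* A single configured thread means threaded I/O is never used. */
    if (io_threads_num == 1) return false;
    /* Same threshold as the check quoted above: with fewer than two
     * pending clients per configured thread, the main thread writes
     * the replies itself. */
    if (pending_clients < io_threads_num * 2) return false;
    return true;
}
```

With 4 configured threads, for example, 7 pending clients stay on the main thread while 8 are handed to the I/O threads.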
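1.2 InitServerLast() mainly prepares the I/O threads
- initThreadedIO(void) creates the threads; these are the threads behind multi-threaded reads and writes, i.e. Redis threaded I/O
- This is the core of thread creation, and it is where IOThreadMain() is registered as the thread entry function
// The thread is created here with pthread_create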
pthread_create(&tid,NULL,IOThreadMain,(void*)(long)i)
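The core of IOThreadMain is a while(1) loop. Unlike the main thread, it does not call epoll_wait: it spins briefly on getIOPendingCount(id) until the main thread hands it work, then walks its own client list, calling readQueryFromClient(c->conn) for reads and writeToClient(c,0) for writes. On the read path there is also postponeClientRead, which adds the client to the head of a pending list. This is the multi-threaded path: it prepares the client read events for distribution, after which they are handed to the I/O threads. The main function for processing the buffered input is processInputBuffer(c).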
void *IOThreadMain(void *myid) {
    /* The ID is the thread number (from 0 to server.iothreads_num-1), and is
     * used by the thread to just manipulate a single sub-array of clients. */
    long id = (unsigned long)myid;
    char thdname[16];
    snprintf(thdname, sizeof(thdname), "io_thd_%ld", id);
    redis_set_thread_title(thdname);
    redisSetCpuAffinity(server.server_cpulist);
    makeThreadKillable();
    while(1) {
        /* Wait for start */
        for (int j = 0; j < 1000000; j++) {
            if (getIOPendingCount(id) != 0) break;
        }
        /* Give the main thread a chance to stop this thread. */
        if (getIOPendingCount(id) == 0) {
            pthread_mutex_lock(&io_threads_mutex[id]);
            pthread_mutex_unlock(&io_threads_mutex[id]);
            continue;
        }
        serverAssert(getIOPendingCount(id) != 0);
        /* Process: note that the main thread will never touch our list
         * before we drop the pending count to 0. */
        listIter li;
        listNode *ln;
        listRewind(io_threads_list[id],&li);
        while((ln = listNext(&li))) {
            client *c = listNodeValue(ln);
            if (io_threads_op == IO_THREADS_OP_WRITE) {
                writeToClient(c,0);
            } else if (io_threads_op == IO_THREADS_OP_READ) {
                readQueryFromClient(c->conn);
            } else {
                serverPanic("io_threads_op value is unknown");
            }
        }
        listEmpty(io_threads_list[id]);
        setIOPendingCount(id, 0);
    }
}
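Each I/O thread above only touches its own io_threads_list[id], so the main thread has to split the pending clients across those lists first. A minimal sketch of that step follows, assuming a round-robin assignment by item index. The assignment rule, the helper distribute_round_robin, and MAX_IO_THREADS are illustrative assumptions, and the sketch counts clients per list instead of linking real client structures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IO_THREADS 128 /* illustrative upper bound for this sketch */

/* Hypothetical helper: counts how many of `nclients` pending clients each
 * of `nthreads` per-thread lists would receive under round-robin
 * assignment. counts[] must have room for `nthreads` entries. */
void distribute_round_robin(size_t nclients, int nthreads, size_t counts[]) {
    assert(nthreads >= 1 && nthreads <= MAX_IO_THREADS);
    for (int t = 0; t < nthreads; t++) counts[t] = 0;
    for (size_t item_id = 0; item_id < nclients; item_id++) {
        /* Client number item_id goes to list item_id modulo nthreads. */
        int target = (int)(item_id % (size_t)nthreads);
        counts[target]++;
    }
}
```

Distributing 5 clients over 3 lists this way gives 2, 2, and 1 clients, so no list is more than one client longer than another.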
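1.3 aeMain(server.el) starts the event loop
This is the heart of the event model. aeProcessEvents calls aeApiPoll, which is where the I/O multiplexing system call happens: on Linux this is the familiar epoll, called through epoll_wait.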
    while (!eventLoop->stop) {
        // Entry point for event processing
        aeProcessEvents(eventLoop, AE_ALL_EVENTS|
                                   AE_CALL_BEFORE_SLEEP|
                                   AE_CALL_AFTER_SLEEP);
    }
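The register-then-poll pattern behind this loop can be tried in isolation. The sketch below is Linux only and is not taken from Redis: poll_pipe_once is a hypothetical function that registers the read end of a pipe with epoll_ctl, makes it readable, and asks epoll_wait how many descriptors are ready.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: returns the number of ready events reported by
 * epoll_wait for a pipe that has one byte waiting, or -1 on error. */
int poll_pipe_once(void) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    int epfd = epoll_create1(0);
    if (epfd == -1) { close(fds[0]); close(fds[1]); return -1; }

    struct epoll_event ee = {0};
    ee.events = EPOLLIN;   /* interested in readability */
    ee.data.fd = fds[0];

    int ready = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ee) == 0 &&
        write(fds[1], "x", 1) == 1) {
        struct epoll_event fired[1];
        /* Wait at most one second; the byte written above is already
         * buffered, so the call should return right away. */
        ready = epoll_wait(epfd, fired, 1, 1000);
        if (ready == 1 && fired[0].data.fd != fds[0]) ready = -1;
    }
    close(epfd);
    close(fds[0]);
    close(fds[1]);
    return ready;
}
```

An event loop repeats the wait step and, for each fired descriptor, looks up and calls the handler that was registered for it.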
The concrete processing follows:
int aeProcessEvents(aeEventLoop *eventLoop, int flags)
{
    int processed = 0, numevents;
    /* Nothing to do? return ASAP */
    if (!(flags & AE_TIME_EVENTS) && !(flags & AE_FILE_EVENTS)) return 0;
    /* Note that we want to call select() even if there are no
     * file events to process as long as we want to process time
     * events, in order to sleep until the next time event is ready
     * to fire. */
    if (eventLoop->maxfd != -1 ||
        ((flags & AE_TIME_EVENTS) && !(flags & AE_DONT_WAIT))) {
        int j;
        struct timeval tv, *tvp;
        long msUntilTimer = -1;
        if (flags & AE_TIME_EVENTS && !(flags & AE_DONT_WAIT))
            msUntilTimer = msUntilEarliestTimer(eventLoop);
        if (msUntilTimer >= 0) {
            tv.tv_sec = msUntilTimer / 1000;
            tv.tv_usec = (msUntilTimer % 1000) * 1000;
            tvp = &tv;
        } else {
            /* If we have to check for events but need to return
             * ASAP because of AE_DONT_WAIT we need to set the timeout
             * to zero */
            if (flags & AE_DONT_WAIT) {
                tv.tv_sec = tv.tv_usec = 0;
                tvp = &tv;
            } else {
                /* Otherwise we can block */
                tvp = NULL; /* wait forever */
            }
        }
        if (eventLoop->flags & AE_DONT_WAIT) {
            tv.tv_sec = tv.tv_usec = 0;
            tvp = &tv;
        }
        if (eventLoop->beforesleep != NULL && flags & AE_CALL_BEFORE_SLEEP)
            eventLoop->beforesleep(eventLoop);
        /* Calls epoll_wait (on Linux), which returns only on timeout or
         * when an event occurs; the returned events are then processed
         * in a loop. */
        numevents = aeApiPoll(eventLoop, tvp);
        /* After sleep callback. */
        if (eventLoop->aftersleep != NULL && flags & AE_CALL_AFTER_SLEEP)
            eventLoop->aftersleep(eventLoop);
        for (j = 0; j < numevents; j++) {
            aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
            int mask = eventLoop->fired[j].mask;
            int fd = eventLoop->fired[j].fd;
            int fired = 0; /* Number of events fired for current fd. */
            /* Normally we execute the readable event first, and the writable
             * event later. This is useful as sometimes we may be able
             * to serve the reply of a query immediately after processing the
             * query.
             *
             * However if AE_BARRIER is set in the mask, our application is
             * asking us to do the reverse: never fire the writable event
             * after the readable. In such a case, we invert the calls.
             * This is useful when, for instance, we want to do things
             * in the beforeSleep() hook, like fsyncing a file to disk,
             * before replying to a client. */
            int invert = fe->mask & AE_BARRIER;
            /* Note the "fe->mask & mask & ..." code: maybe an already
             * processed event removed an element that fired and we still
             * didn't processed, so we check if the event is still valid.
             *
             * Fire the readable event if the call sequence is not
             * inverted. */
            if (!invert && fe->mask & mask & AE_READABLE) {
                fe->rfileProc(eventLoop,fd,fe->clientData,mask);
                fired++;
                fe = &eventLoop->events[fd]; /* Refresh in case of resize. */
            }
            /* Fire the writable event. */
            if (fe->mask & mask & AE_WRITABLE) {
                if (!fired || fe->wfileProc != fe->rfileProc) {
                    fe->wfileProc(eventLoop,fd,fe->clientData,mask);
                    fired++;
                }
            }
            /* If we have to invert the call, fire the readable event now
             * after the writable one. */
            if (invert) {
                fe = &eventLoop->events[fd]; /* Refresh in case of resize. */
                if ((fe->mask & mask & AE_READABLE) &&
                    (!fired || fe->wfileProc != fe->rfileProc))
                {
                    fe->rfileProc(eventLoop,fd,fe->clientData,mask);
                    fired++;
                }
            }
            processed++;
        }
    }
    /* Check time events */
    // This is where key expiration and RDB generation are triggered;
    // some expensive operations are handled by a forked child process.
    if (flags & AE_TIME_EVENTS)
        processed += processTimeEvents(eventLoop);
    return processed; /* return the number of processed file/time events */
}
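The read/write ordering in the loop above, including the AE_BARRIER inversion, is easier to see with the handlers replaced by letters. The sketch below mirrors only the three conditionals. dispatch_order and same_handler are hypothetical names, and the AE_* constants are defined locally for the sketch rather than included from ae.h.

```c
#include <assert.h>
#include <string.h>

#define AE_READABLE 1
#define AE_WRITABLE 2
#define AE_BARRIER  4

/* Hypothetical helper: writes the call order for one fired fd into `out`
 * ('R' = read handler, 'W' = write handler). `same_handler` models the
 * case where rfileProc and wfileProc point to the same function. */
void dispatch_order(int fe_mask, int fired_mask, int same_handler,
                    char out[3]) {
    int n = 0, fired = 0;
    int invert = fe_mask & AE_BARRIER;

    /* Normal order: the readable event goes first. */
    if (!invert && (fe_mask & fired_mask & AE_READABLE)) {
        out[n++] = 'R';
        fired++;
    }
    /* Writable event, skipped if the same handler already ran. */
    if (fe_mask & fired_mask & AE_WRITABLE) {
        if (!fired || !same_handler) {
            out[n++] = 'W';
            fired++;
        }
    }
    /* Inverted order: the readable event comes after the writable one. */
    if (invert && (fe_mask & fired_mask & AE_READABLE) &&
        (!fired || !same_handler)) {
        out[n++] = 'R';
    }
    out[n] = '\0';
}
```

When both events fire, this yields "RW" without the barrier flag and "WR" with it. When both slots hold the same handler, that handler is called only once.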
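2. Main read/write processing flow
The write flow works roughly as follows. The network layer prepares the data, and the Redis server parses the request and looks up the command function. A function such as XXXgenericCommand then parses the arguments. The first thing the command callback does is encoding optimization, which involves the storage encodings and data structures Redis uses, such as quicklist, ziplist, skiplist, ht, and intset. The value is then stored in the db, with rehashing advancing incrementally along the way; while a rehash is in progress, new data is written to ht[1]. See the source code for the exact data formats.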
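The "look up the command function" step can be pictured as a table of names and function pointers. The sketch below is an assumption-heavy miniature, not the Redis command table: command_table, lookup_command, and the two placeholder handlers are hypothetical, and the lookup is a linear scan for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

typedef int (*command_proc)(int argc, const char **argv);

/* Placeholder handlers: each returns 1 when the argument count is what
 * the command expects, 0 otherwise. They do not touch any data. */
static int set_command(int argc, const char **argv) {
    (void)argv;
    return argc == 3; /* SET key value */
}
static int get_command(int argc, const char **argv) {
    (void)argv;
    return argc == 2; /* GET key */
}

struct command_entry {
    const char *name;
    command_proc proc;
};

static const struct command_entry command_table[] = {
    {"set", set_command},
    {"get", get_command},
};

/* Hypothetical lookup: case-insensitive scan of the table. Returns the
 * handler, or NULL for an unknown command. */
command_proc lookup_command(const char *name) {
    size_t n = sizeof(command_table) / sizeof(command_table[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(command_table[i].name, name) == 0)
            return command_table[i].proc;
    }
    return NULL;
}
```

A caller would parse the request into argv, look up argv[0], and invoke the returned handler if it is not NULL.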
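2.1 Source code
Read and write events are handled in aeProcessEvents(aeEventLoop *eventLoop, int flags) in ae.c. The core is numevents = aeApiPoll(eventLoop, tvp), which obtains the events through a system call. On Linux, aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) in ae_epoll.c makes the following call to get the events that have fired, and the handlers registered in initServer() are then invoked to process them.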
epoll_wait(state->epfd,state->events,eventLoop->setsize,
tvp ? (tvp->tv_sec*1000 + (tvp->tv_usec + 999)/1000) : -1);
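The timeout expression in that call converts an optional struct timeval into the millisecond argument epoll_wait takes. Pulled out into a function it looks like the sketch below; timeval_to_epoll_ms is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: NULL means "block indefinitely" (-1). The
 * microsecond part is rounded up to whole milliseconds, so a nonzero
 * sub-millisecond timeout never turns into 0. */
int timeval_to_epoll_ms(const struct timeval *tvp) {
    if (tvp == NULL) return -1;
    return (int)(tvp->tv_sec * 1000 + (tvp->tv_usec + 999) / 1000);
}
```

A timeout of 1 second and 500 microseconds, for example, becomes 1001 ms rather than 1000 ms, so the wait is never shorter than requested.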
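Read events are handled by fe->rfileProc(eventLoop,fd,fe->clientData,mask); the corresponding handler is readQueryFromClient. Write events are dispatched the same way through wfileProc: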
fe->wfileProc(eventLoop,fd,fe->clientData,mask);
fired++;