redis bind multiple IPs_Redis Network Communication Module Source Code Analysis (Part 1)

friendcallmetwodog

Published 2021-01-20 01:27:58

Tags: redis bind multiple IPs

Original link: https://blog.csdn.net/weixin_30916255/article/details/113538448

Redis Network Communication Module Source Code Analysis

If you are not familiar with GDB, learn its basics first.

Building Redis from source

wget http://download.redis.io/releases/redis-2.8.17.tar.gz   # a newer release works too
tar zxvf redis-2.8.17.tar.gz
cd redis-2.8.17
make -j 4

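After a successful build, several executables are generated in the src directory; redis-server and redis-cli are the two programs we are going to debug.

Enter the src directory and start redis-server under GDB:

gdb redis-server
(gdb) r

The above is what redis-server looks like after it starts successfully.

Open another session, enter the Redis src directory again, and start the Redis client redis-cli under GDB:

gdb redis-cli
(gdb) r

The above is what redis-cli looks like after it starts successfully.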

Communication example

The listening socket

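As we know, network communication at the application layer roughly follows these steps:

1. The server creates a listening socket.
2. It binds the listening socket to the desired IP address and port (Socket API bind).
3. It starts listening (Socket API listen).
4. It waits indefinitely for clients; accept (Socket API) takes each incoming connection and produces a client socket for that client.
5. It sends and receives data on the client socket, and closes the socket when necessary.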
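The five steps above can be sketched in plain POSIX C. This is a minimal illustration, not Redis code: make_listener and demo_accept_once are hypothetical helpers, and the sketch binds to 127.0.0.1 with port 0 (so the kernel picks a free port) instead of Redis's 6379.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Steps 1-3: create a TCP socket, bind it to 127.0.0.1:port and listen.
 * Port 0 lets the kernel choose a free port. Returns the fd or -1. */
int make_listener(unsigned short port, int backlog) {
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, backlog) == -1) {
        close(fd);   /* same cleanup anetListen performs on failure */
        return -1;
    }
    return fd;
}

/* Steps 4-5: connect a client to the listener, accept the connection,
 * then close both ends. Returns 0 on success, -1 on failure. */
int demo_accept_once(int lfd) {
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) == -1) return -1;

    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd == -1) return -1;
    if (connect(cfd, (struct sockaddr *)&addr, len) == -1) {
        close(cfd);
        return -1;
    }

    int afd = accept(lfd, NULL, NULL);   /* blocks until the connection is queued */
    close(cfd);
    if (afd == -1) return -1;
    close(afd);
    return 0;
}
```

A caller would create the listener once with make_listener(0, 16) and then accept connections on it; in Redis the accepting happens inside the event loop rather than in a blocking call like this.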

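Following this flow, let us first examine the first three steps. Since redis-server listens for clients on port 6379 by default, we can use that as a clue.

Then search the whole Redis codebase for the code that calls bind():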

static int anetListen(char *err, int s, struct sockaddr *sa, socklen_t len, int backlog) {
    if (bind(s,sa,len) == -1) {
        anetSetError(err, "bind: %s", strerror(errno));
        close(s);
        return ANET_ERR;
    }

    if (listen(s, backlog) == -1) {
        anetSetError(err, "listen: %s", strerror(errno));
        close(s);
        return ANET_ERR;
    }
    return ANET_OK;
}

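In GDB, set a breakpoint on this function (b anetListen) and rerun redis-server.

When the breakpoint is hit, inspect the call stack (bt).

From this stack, together with the port number 6379 visible in frame #2, we can confirm that this is the logic we are looking for, and that it runs on the main thread (the outermost frame of the stack is main()).

Let us look at the code at frame #1: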

static int _anetTcpServer(char *err, int port, char *bindaddr, int af, int backlog)
{
    int s = -1, rv;
    char _port[6];  /* strlen("65535") */
    struct addrinfo hints, *servinfo, *p;

    snprintf(_port,6,"%d",port);
    memset(&hints,0,sizeof(hints));
    hints.ai_family = af;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;    /* No effect if bindaddr != NULL */

    if ((rv = getaddrinfo(bindaddr,_port,&hints,&servinfo)) != 0) {
        anetSetError(err, "%s", gai_strerror(rv));
        return ANET_ERR;
    }
    for (p = servinfo; p != NULL; p = p->ai_next) {
        if ((s = socket(p->ai_family,p->ai_socktype,p->ai_protocol)) == -1)
            continue;

        if (af == AF_INET6 && anetV6Only(err,s) == ANET_ERR) goto error;
        if (anetSetReuseAddr(err,s) == ANET_ERR) goto error;
        if (anetListen(err,s,p->ai_addr,p->ai_addrlen,backlog) == ANET_ERR) goto error;
        goto end;
    }
    if (p == NULL) {
        anetSetError(err, "unable to bind socket, errno: %d", errno);
        goto error;
    }

error:
    if (s != -1) close(s);
    s = ANET_ERR;
end:
    freeaddrinfo(servinfo);
    return s;
}

The system API getaddrinfo is used to resolve the bind address and port into socket address structures. gethostbyname is not used here because it can only resolve IPv4 host information, whereas getaddrinfo handles both IPv4 and IPv6. Its signature is:

int getaddrinfo(const char *node, const char *service,
                       const struct addrinfo *hints,
                       struct addrinfo **res);

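See the Linux man page for the details of this function. Before calling getaddrinfo, a server typically sets ai_flags in hints to AI_PASSIVE, because the result will be passed to bind, and usually passes NULL as node; getaddrinfo then returns the wildcard address (0.0.0.0 for IPv4, :: for IPv6). A client, by contrast, normally does not set AI_PASSIVE, and both node (the host) and service (which I prefer to think of as the port) should be non-NULL.
After resolving the address information, the code walks the returned list and creates the listening socket from the first entry for which socket() succeeds. For IPv6 it restricts the socket to IPv6 only (anetV6Only), then enables SO_REUSEADDR (anetSetReuseAddr), and finally calls anetListen, which binds first and then listens. At this point redis-server can accept client connections on port 6379.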
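To see the AI_PASSIVE behaviour concretely, the hypothetical helper below (not part of Redis) calls getaddrinfo with node set to NULL and restricts the results to IPv4 so the returned address is easy to check. Assuming a POSIX getaddrinfo, the passive lookup should yield 0.0.0.0, while the same lookup without AI_PASSIVE yields the loopback address 127.0.0.1, as the man page describes.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve (NULL, port) for IPv4 TCP with the given
 * ai_flags and store the first address (host byte order) in *addr.
 * Returns 0 on success, -1 if getaddrinfo fails. */
int first_ipv4_for_null_node(int flags, const char *port, uint32_t *addr) {
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* keep the example IPv4-only */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = flags;           /* AI_PASSIVE: the address is meant for bind() */

    if (getaddrinfo(NULL, port, &hints, &res) != 0) return -1;
    *addr = ntohl(((struct sockaddr_in *)res->ai_addr)->sin_addr.s_addr);
    freeaddrinfo(res);
    return 0;
}
```

This is why _anetTcpServer can pass bindaddr straight through: when no bind address is configured, NULL plus AI_PASSIVE gives a socket address that listens on every interface.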

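Accepting client connections

By the same reasoning, to find out how redis-server accepts client connections, we only need to search for the socket API accept.

This leads us to the anetGenericAccept function in anet.c: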

static int anetGenericAccept(char *err, int s, struct sockaddr *sa, socklen_t *len) {
    int fd;
    while(1) {
        fd = accept(s,sa,len);
        if (fd == -1) {
            if (errno == EINTR)
                continue;
            else {
                anetSetError(err, "accept: %s", strerror(errno));
                return ANET_ERR;
            }
        }
        break;
    }
    return fd;
}

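Set a breakpoint on this function with the b command and rerun redis-server. GDB does not hit the breakpoint while the program starts up. Once it is fully running, open a new redis-cli to simulate a new client connecting to redis-server. The breakpoint now fires; inspect the call stack: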

Breakpoint 2, anetGenericAccept (err=0x745bb0 <server+560> "", s=s@entry=11, sa=sa@entry=0x7fffffffe2b0, len=len@entry=0x7fffffffe2ac) at anet.c:531
531     static int anetGenericAccept(char *err, int s, struct sockaddr *sa, socklen_t *len) {
(gdb) bt
#0  anetGenericAccept (err=0x745bb0 <server+560> "", s=s@entry=11, sa=sa@entry=0x7fffffffe2b0, len=len@entry=0x7fffffffe2ac) at anet.c:531
#1  0x0000000000427a1d in anetTcpAccept (err=<optimized out>, s=s@entry=11, ip=ip@entry=0x7fffffffe370 "317P237[", ip_len=ip_len@entry=46, 
    port=port@entry=0x7fffffffe36c) at anet.c:552
#2  0x0000000000437fb1 in acceptTcpHandler (el=<optimized out>, fd=11, privdata=<optimized out>, mask=<optimized out>) at networking.c:689
#3  0x00000000004267f0 in aeProcessEvents (eventLoop=eventLoop@entry=0x7ffff083a0a0, flags=flags@entry=11) at ae.c:440
#4  0x0000000000426adb in aeMain (eventLoop=0x7ffff083a0a0) at ae.c:498
#5  0x00000000004238ef in main (argc=<optimized out>, argv=0x7fffffffe588) at server.c:3894

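Analyzing this call stack, the flow is as follows: in main, the initServer function creates the listening socket, binds the address and starts listening; main then calls aeMain, which runs a loop that keeps processing "events".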

void aeMain(aeEventLoop *eventLoop) {
    eventLoop->stop = 0;
    while (!eventLoop->stop) {
        if (eventLoop->beforesleep != NULL)
            eventLoop->beforesleep(eventLoop);
        aeProcessEvents(eventLoop, AE_ALL_EVENTS|AE_CALL_AFTER_SLEEP);
    }
}
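The shape of aeMain reduces to a few lines. In this hypothetical sketch, miniLoop keeps only the fields aeMain touches, and mini_process_events is a stand-in for aeProcessEvents that sets stop after a fixed number of rounds, playing the role of whatever in the real server asks the loop to stop.

```c
#include <stddef.h>

/* Hypothetical, stripped-down event loop: only the pieces aeMain touches. */
typedef struct miniLoop miniLoop;
typedef void beforeSleepProc(miniLoop *loop);

struct miniLoop {
    int stop;                      /* the loop exits once this becomes 1 */
    beforeSleepProc *beforesleep;  /* optional hook run before each poll */
    int iterations;                /* how many times "events" were processed */
    int before_calls;              /* how many times the hook ran */
    int stop_after;                /* demo only: request a stop after N rounds */
};

/* Stand-in for aeProcessEvents: pretends to handle events, then asks the
 * loop to stop once stop_after rounds have run. */
static void mini_process_events(miniLoop *loop) {
    loop->iterations++;
    if (loop->iterations >= loop->stop_after) loop->stop = 1;
}

/* Demo hook: counts its own invocations. */
static void count_before_sleep(miniLoop *loop) { loop->before_calls++; }

/* Same structure as aeMain: reset stop, then hook + process until stopped. */
void mini_main(miniLoop *loop) {
    loop->stop = 0;
    while (!loop->stop) {
        if (loop->beforesleep != NULL) loop->beforesleep(loop);
        mini_process_events(loop);
    }
}
```

Because the stop flag is checked only between iterations, a stop request takes effect after the current round of event processing completes.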

The loop exits when eventLoop->stop becomes 1. The event-processing code is as follows:

int aeProcessEvents(aeEventLoop *eventLoop, int flags)
{
    int processed = 0, numevents;

    /* Nothing to do? return ASAP */
    if (!(flags & AE_TIME_EVENTS) && !(flags & AE_FILE_EVENTS)) return 0;

    /* Note that we want call select() even if there are no
     * file events to process as long as we want to process time
     * events, in order to sleep until the next time event is ready
     * to fire. */
    if (eventLoop->maxfd != -1 ||
        ((flags & AE_TIME_EVENTS) && !(flags & AE_DONT_WAIT))) {
        int j;
        aeTimeEvent *shortest = NULL;
        struct timeval tv, *tvp;

        if (flags & AE_TIME_EVENTS && !(flags & AE_DONT_WAIT))
            shortest = aeSearchNearestTimer(eventLoop);
        if (shortest) {
            long now_sec, now_ms;

            aeGetTime(&now_sec, &now_ms);
            tvp = &tv;

            /* How many milliseconds we need to wait for the next
             * time event to fire? */
            long long ms =
                (shortest->when_sec - now_sec)*1000 +
                shortest->when_ms - now_ms;

            if (ms > 0) {
                tvp->tv_sec = ms/1000;
                tvp->tv_usec = (ms % 1000)*1000;
            } else {
                tvp->tv_sec = 0;
                tvp->tv_usec = 0;
            }
        } else {
            /* If we have to check for events but need to return
             * ASAP because of AE_DONT_WAIT we need to set the timeout
             * to zero */
            if (flags & AE_DONT_WAIT) {
                tv.tv_sec = tv.tv_usec = 0;
                tvp = &tv;
            } else {
                /* Otherwise we can block */
                tvp = NULL; /* wait forever */
            }
        }

        /* Call the multiplexing API, will return only on timeout or when
         * some event fires. */
        numevents = aeApiPoll(eventLoop, tvp);

        /* After sleep callback. */
        if (eventLoop->aftersleep != NULL && flags & AE_CALL_AFTER_SLEEP)
            eventLoop->aftersleep(eventLoop);

        for (j = 0; j < numevents; j++) {
            aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
            int mask = eventLoop->fired[j].mask;
            int fd = eventLoop->fired[j].fd;
            int rfired = 0;

            /* note the fe->mask & mask & ... code: maybe an already processed
             * event removed an element that fired and we still didn't
             * processed, so we check if the event is still valid. */
            if (fe->mask & mask & AE_READABLE) {
                rfired = 1;
                fe->rfileProc(eventLoop,fd,fe->clientData,mask);
            }
            if (fe->mask & mask & AE_WRITABLE) {
                if (!rfired || fe->wfileProc != fe->rfileProc)
                    fe->wfileProc(eventLoop,fd,fe->clientData,mask);
            }
            processed++;
        }
    }
    /* Check time events */
    if (flags & AE_TIME_EVENTS)
        processed += processTimeEvents(eventLoop);

    return processed; /* return the number of processed file/time events */
}
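The timeout arithmetic in aeProcessEvents above can be isolated into a small pure function. The name timeout_until is hypothetical; the body follows the same steps: convert the gap between the nearest timer's when_sec/when_ms and the current time into milliseconds, then into a struct timeval, clamping an overdue timer to a zero timeout.

```c
#include <sys/time.h>

/* Mirrors the timeout computation in aeProcessEvents: how long may the
 * poll sleep before the nearest timer (when_sec/when_ms) is due? */
void timeout_until(long when_sec, long when_ms, long now_sec, long now_ms,
                   struct timeval *tv) {
    long long ms = (long long)(when_sec - now_sec) * 1000 + (when_ms - now_ms);
    if (ms > 0) {
        tv->tv_sec = ms / 1000;
        tv->tv_usec = (ms % 1000) * 1000;
    } else {
        /* Timer already due: poll without blocking. */
        tv->tv_sec = 0;
        tv->tv_usec = 0;
    }
}
```

For example, a timer due at 10 s + 250 ms when the clock reads 8 s + 900 ms gives a gap of 1350 ms, i.e. tv_sec = 1 and tv_usec = 350000.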

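The code first checks the flags argument to see whether there is anything to process. If timer events are requested (the AE_TIME_EVENTS flag) and AE_DONT_WAIT is not set, it looks for the timer that will expire soonest: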

/* Search the first timer to fire.
 * This operation is useful to know how many time the select can be
 * put in sleep without to delay any event.
 * If there are no timers NULL is returned.
 *
 * Note that's O(N) since time events are unsorted.
 * Possible optimizations (not needed by Redis so far, but...):
 * 1) Insert the event in order, so that the nearest is just the head.
 *    Much better but still insertion or deletion of timers is O(N).
 * 2) Use a skiplist to have this operation as O(1) and insertion as O(log(N)).
 */
static aeTimeEvent *aeSearchNearestTimer(aeEventLoop *eventLoop)
{
    aeTimeEvent *te = eventLoop->timeEventHead;
    aeTimeEvent *nearest = NULL;

    while(te) {
        if (!nearest || te->when_sec < nearest->when_sec ||
                (te->when_sec == nearest->when_sec &&
                 te->when_ms < nearest->when_ms))
            nearest = te;
        te = te->next;
    }
    return nearest;
}

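This code is well commented and easy to follow. The comment tells us that, because the timer set is unsorted, the list has to be traversed, which costs O(N). The comment also hints at possible future optimizations: keep the list sorted by expiry time in ascending order, so the head is always the nearest timer and finding it costs O(1), although insertion and deletion remain O(N); or use a skiplist, which keeps finding the nearest timer at O(1) while bringing insertion down to O(log(N)).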
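The first two approaches in that comment can be compared side by side. The sketch below uses a hypothetical timer node with the same when_sec/when_ms pair as aeTimeEvent: nearest_unsorted is the O(N) scan that aeSearchNearestTimer performs, and insert_sorted is optimization 1 from the comment, which pays O(N) on insertion so that the nearest timer is always the list head.

```c
#include <stddef.h>

/* Hypothetical minimal timer node with the same expiry fields as aeTimeEvent. */
typedef struct timer {
    long when_sec, when_ms;
    struct timer *next;
} timer;

/* Nonzero if a expires strictly before b. */
static int earlier(const timer *a, const timer *b) {
    return a->when_sec < b->when_sec ||
           (a->when_sec == b->when_sec && a->when_ms < b->when_ms);
}

/* O(N) scan over an unsorted list, as in aeSearchNearestTimer. */
timer *nearest_unsorted(timer *head) {
    timer *nearest = NULL;
    for (timer *t = head; t; t = t->next)
        if (!nearest || earlier(t, nearest)) nearest = t;
    return nearest;
}

/* Optimization 1: keep the list sorted on insert (O(N)); the nearest
 * timer is then simply the head (O(1)). Returns the new head. */
timer *insert_sorted(timer *head, timer *t) {
    timer **pp = &head;
    while (*pp && !earlier(t, *pp)) pp = &(*pp)->next;
    t->next = *pp;
    *pp = t;
    return head;
}
```

The trade-off is where the linear cost is paid: the unsorted list pays it on every poll, the sorted list on every insertion. Since the loop looks up the nearest timer on each iteration, the sorted variant moves work off the hot path.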

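Next, the code gets the current system time (aeGetTime(&now_sec, &now_ms);) and subtracts it from the expiry time of the nearest timer to obtain an interval, clamped to zero if the timer is already due. This interval is passed to numevents = aeApiPoll(eventLoop, tvp);. Redis picks a different I/O multiplexing mechanism for each platform: epoll on Linux, kqueue on macOS and the BSDs, evport on Solaris, and select as the fallback everywhere else, so on Linux aeApiPoll() is built on epoll. Here we focus on the Linux implementation: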

static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, numevents = 0;

    retval = epoll_wait(state->epfd,state->events,eventLoop->setsize,
            tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
    if (retval > 0) {
        int j;

        numevents = retval;
        for (j = 0; j < numevents; j++) {
            int mask = 0;
            struct epoll_event *e = state->events+j;

            if (e->events & EPOLLIN) mask |= AE_READABLE;
            if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
            if (e->events & EPOLLERR) mask |= AE_WRITABLE;
            if (e->events & EPOLLHUP) mask |= AE_WRITABLE;
            eventLoop->fired[j].fd = e->data.fd;
            eventLoop->fired[j].mask = mask;
        }
    }
    return numevents;
}

The signature of epoll_wait is:

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

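The last parameter, timeout, deserves attention. If the tvp passed in is NULL, which per the analysis above means there is no timer event to wait for, the timeout is set to -1, and epoll_wait blocks indefinitely until an event arrives. Blocking this way wastes no CPU time. Otherwise, timeout is set to the interval until the nearest timer event, so that epoll_wait wakes up in time and the program can promptly handle the expired timer (covered later).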
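The timeout conversion in aeApiPoll, and the blocking behaviour just described, can be checked with a short Linux-only program. epoll_timeout_ms mirrors the expression passed to epoll_wait; wait_on_pipe is a hypothetical demo that watches the read end of a pipe, so a zero timeout returns immediately with nothing ready, and a -1 timeout returns as soon as data is available.

```c
#include <sys/epoll.h>
#include <sys/time.h>
#include <unistd.h>

/* The timeout aeApiPoll hands to epoll_wait: NULL -> -1 (block until an
 * event arrives), otherwise the timeval converted to milliseconds. */
int epoll_timeout_ms(const struct timeval *tvp) {
    return tvp ? (int)(tvp->tv_sec * 1000 + tvp->tv_usec / 1000) : -1;
}

/* Hypothetical demo: watch a pipe's read end, optionally write one byte
 * first, then wait. Returns epoll_wait's result (ready fds) or -1. */
int wait_on_pipe(int write_first, const struct timeval *tvp) {
    int p[2], n = -1;
    struct epoll_event ev, out[4];
    int epfd = epoll_create1(0);
    if (epfd == -1) return -1;
    if (pipe(p) == -1) {
        close(epfd);
        return -1;
    }

    ev.events = EPOLLIN;      /* read interest, the epoll side of AE_READABLE */
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        (!write_first || write(p[1], "x", 1) == 1))
        n = epoll_wait(epfd, out, 4, epoll_timeout_ms(tvp));

    close(p[0]);
    close(p[1]);
    close(epfd);
    return n;
}
```

Note that sub-millisecond parts of tv_usec are truncated by the division, so a timer less than 1 ms away yields a zero timeout and the loop simply polls again.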

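With epoll, every fd (for network I/O, a socket), including the listening fd and ordinary client fds, is registered with the epoll instance whose descriptor epfd is kept in the apidata field (an aeApiState) of the event loop object aeEventLoop. When events fire on some fds, epoll_wait fills state->events, and aeApiPoll copies each ready fd, together with its event type (the mask field), into the fired field of aeEventLoop. We will finish this flow first, and then look at where and when the epfd used by epoll_wait is created, and how the listening fd and client fds are attached to it.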
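The translation from epoll flags to Redis's AE masks, and the copy into the fired array, can be isolated as below. firedEvent and record_fired are hypothetical names modelled on aeFiredEvent and the loop in aeApiPoll; the values AE_READABLE = 1 and AE_WRITABLE = 2 follow Redis's ae.h. Note that EPOLLERR and EPOLLHUP are both mapped to AE_WRITABLE.

```c
#include <sys/epoll.h>

#define AE_READABLE 1   /* values as in Redis's ae.h */
#define AE_WRITABLE 2

/* Hypothetical counterpart of aeFiredEvent: one ready fd and its AE mask. */
typedef struct { int fd; int mask; } firedEvent;

/* Copies n ready epoll events into fired[], translating epoll flags into
 * AE masks the same way aeApiPoll does. Returns n. */
int record_fired(const struct epoll_event *evs, int n, firedEvent *fired) {
    for (int j = 0; j < n; j++) {
        int mask = 0;
        if (evs[j].events & EPOLLIN)  mask |= AE_READABLE;
        if (evs[j].events & EPOLLOUT) mask |= AE_WRITABLE;
        if (evs[j].events & EPOLLERR) mask |= AE_WRITABLE;
        if (evs[j].events & EPOLLHUP) mask |= AE_WRITABLE;
        fired[j].fd = evs[j].data.fd;
        fired[j].mask = mask;
    }
    return n;
}
```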

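Once we have the fds with pending events, the next step is to handle them. In aeProcessEvents, the fds recorded in the previous step are taken out of the fired array of the aeEventLoop object and handled according to their event type (read or write):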

for (j = 0; j < numevents; j++) {
    aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
    int mask = eventLoop->fired[j].mask;
    int fd = eventLoop->fired[j].fd;
    int rfired = 0;

    /* note the fe->mask & mask & ... code: maybe an already processed
     * event removed an element that fired and we still didn't
     * processed, so we check if the event is still valid. */
    if (fe->mask & mask & AE_READABLE) {
        rfired = 1;
        fe->rfileProc(eventLoop,fd,fe->clientData,mask);
    }
    if (fe->mask & mask & AE_WRITABLE) {
        if (!rfired || fe->wfileProc != fe->rfileProc)
            fe->wfileProc(eventLoop,fd,fe->clientData,mask);
    }
    processed++;
}

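The read handler rfileProc and the write handler wfileProc are function pointers set up earlier, when the file event was registered (aeCreateFileEvent), so here they are simply called: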

typedef void aeFileProc(struct aeEventLoop *eventLoop, int fd, void *clientData, int mask);

/* File event structure */
typedef struct aeFileEvent {
    int mask; /* one of AE_(READABLE|WRITABLE) */
    aeFileProc *rfileProc;
    aeFileProc *wfileProc;
    void *clientData;
} aeFileEvent;
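The dispatch step can be exercised on its own. In this hypothetical sketch, fileEvent mirrors aeFileEvent, and dispatch repeats the two checks from the loop in aeProcessEvents: an event fires only if it is still registered in fe->mask, and when the same function serves as both read and write handler it runs only once per iteration.

```c
#include <stddef.h>

#define AE_READABLE 1   /* values as in Redis's ae.h */
#define AE_WRITABLE 2

typedef struct loop loop;   /* opaque stand-in for aeEventLoop */
typedef void fileProc(loop *el, int fd, void *clientData, int mask);

/* Hypothetical mirror of aeFileEvent. */
typedef struct {
    int mask;               /* events currently registered for this fd */
    fileProc *rfileProc;
    fileProc *wfileProc;
    void *clientData;
} fileEvent;

/* Dispatch one fired fd the way the aeProcessEvents loop does. */
void dispatch(loop *el, fileEvent *events, int fd, int firedMask) {
    fileEvent *fe = &events[fd];
    int rfired = 0;
    if (fe->mask & firedMask & AE_READABLE) {
        rfired = 1;
        fe->rfileProc(el, fd, fe->clientData, firedMask);
    }
    if (fe->mask & firedMask & AE_WRITABLE) {
        /* skip the write call if the same handler already ran for reading */
        if (!rfired || fe->wfileProc != fe->rfileProc)
            fe->wfileProc(el, fd, fe->clientData, firedMask);
    }
}

/* Demo handler: counts invocations through clientData. */
void count_calls(loop *el, int fd, void *clientData, int mask) {
    (void)el; (void)fd; (void)mask;
    ++*(int *)clientData;
}
```

Indexing events directly by fd is what makes this lookup O(1): the fired array only has to carry the fd and the mask.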

Creating the epoll fd

Searching for the keyword epoll_create, we find aeApiCreate, the function that creates the epfd, in ae_epoll.c:

static int aeApiCreate(aeEventLoop *eventLoop) {
    aeApiState *state = zmalloc(sizeof(aeApiState));

    if (!state) return -1;
    state->events = zmalloc(sizeof(struct epoll_event)*eventLoop->setsize);
    if (!state->events) {
        zfree(state);
        return -1;
    }
    state->epfd = epoll_create(1024); /* 1024 is just a hint for the kernel */
    if (state->epfd == -1) {
        zfree(state->events);
        zfree(state);
        return -1;
    }
    eventLoop->apidata = state;
    return 0;
}
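For the registration side mentioned earlier, here is a minimal Linux sketch using hypothetical helpers (not Redis code). epoll_create's size argument has been ignored since Linux 2.6.8 but must still be greater than zero, consistent with the comment calling 1024 "just a hint"; the sketch uses epoll_create1(0) instead. watch_fd adds an fd with EPOLL_CTL_ADD the first time and changes its event set with EPOLL_CTL_MOD afterwards, the general pattern for attaching listening and client fds to an epfd.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Creates an epoll instance; returns the epfd or -1. */
int make_epfd(void) {
    return epoll_create1(0);
}

/* Hypothetical helper: register fd for `events`, using ADD the first time
 * and MOD when the fd is already registered. Returns 0 or -1 (errno set). */
int watch_fd(int epfd, int fd, unsigned int events, int already_registered) {
    struct epoll_event ee = {0};
    ee.events = events;
    ee.data.fd = fd;   /* comes back in epoll_wait's results */
    return epoll_ctl(epfd, already_registered ? EPOLL_CTL_MOD : EPOLL_CTL_ADD,
                     fd, &ee);
}
```

Using the wrong operation is an error rather than a no-op: ADD on an fd that is already registered fails with EEXIST, and MOD on an unregistered fd fails with ENOENT, so the caller has to track which fds are registered.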

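Set a breakpoint on this function with the GDB b command, rerun redis-server with run, and when the breakpoint is hit, view the call stack with bt. It turns out that the epfd is also created in the initServer function discussed above: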

(gdb) bt
#0  aeCreateEventLoop (setsize=10128) at ae.c:79
#1  0x000000000042f542 in initServer () at server.c:1841
#2  0x0000000000423803 in main (argc=<optimized out>, argv=0x7fffffffe588) at server.c:3857

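aeCreateEventLoop creates not only the epfd but also the aeEventLoop object the whole event loop needs, and this object is stored in the el field of a Redis global variable. That global variable is named server and is of a struct type; its definition is as follows: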

//in server.c
struct redisServer server; /* Server global state */
//in server.h
struct redisServer {
    /* General */
    //some fields omitted...
    aeEventLoop *el;
    unsigned int lruclock;      /* Clock for LRU eviction */
    //too long, more fields omitted...
};
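The layout that aeCreateEventLoop builds, per-fd arrays sized by setsize with the loop pointer kept in a global, can be sketched as follows. miniEventLoop, mini_create, mini_delete and the global app are hypothetical, and the sketch leaves out the aeApiCreate step that creates the epfd; the point is that registered events are indexed directly by fd, so setsize bounds the largest fd the loop can track.

```c
#include <stdlib.h>

#define AE_NONE 0   /* "no events registered", as in Redis's ae.h */

typedef struct { int mask; } loopSlot;            /* per-fd registration */
typedef struct { int fd; int mask; } firedSlot;   /* one ready fd after a poll */

typedef struct {
    int setsize;        /* capacity: fds must be < setsize */
    int maxfd;          /* highest registered fd, -1 when none */
    loopSlot *events;   /* indexed directly by fd */
    firedSlot *fired;   /* filled by the poll step */
} miniEventLoop;

/* Hypothetical analogue of aeCreateEventLoop: allocate the loop and its
 * per-fd arrays, and mark every slot as unregistered. */
miniEventLoop *mini_create(int setsize) {
    miniEventLoop *el = malloc(sizeof(*el));
    if (el == NULL) return NULL;
    el->events = malloc(sizeof(loopSlot) * setsize);
    el->fired = malloc(sizeof(firedSlot) * setsize);
    if (el->events == NULL || el->fired == NULL) {
        free(el->events);
        free(el->fired);
        free(el);
        return NULL;
    }
    el->setsize = setsize;
    el->maxfd = -1;
    for (int i = 0; i < setsize; i++) el->events[i].mask = AE_NONE;
    return el;
}

/* Releases everything mini_create allocated. */
void mini_delete(miniEventLoop *el) {
    free(el->events);
    free(el->fired);
    free(el);
}

/* Process-wide state, in the spirit of Redis's global `server` and its el field. */
struct { miniEventLoop *el; } app;
```

Keeping the loop in a global means any handler can reach it without threading a pointer through every call, which is the convenience the global server struct provides, at the cost of hidden shared state.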