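Classic books on TCP/IP network programming describe a model like this:
"The server listens on a well-known port and forks several child processes; when a new connection request arrives, a child obtains the connection by calling accept and handles it."
It all sounds straightforward, but on closer inspection it raises questions. For example: the parent and the child live in separate address spaces, so how can a port that the parent listens on be accepted in the child?
There are also discussions online claiming, for example, that "multiple processes calling accept on the same descriptor trigger the thundering herd effect".
Suddenly things look murky again.
Against this background, this article combines experiments with a reading of the kernel source to get to the bottom of it.
The server model used in this article is as follows: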
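int main(int argc, char *argv[]){
    socket();        /* create the TCP socket */
    bind();          /* bind it to the well-known port */
    listen();        /* put the socket into the listening state */
    fork();          /* the child inherits the listening descriptor */
    if( parent ){
        accept();    /* the parent accepts on the listening socket */
    }
    else if( child ){
        accept();    /* the child accepts on the same listening socket */
    }
    else{
        /* error: fork failed */
    }
    return 0;
}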
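This goes one step further than the architecture described at the start of the article: we also call accept() in the parent process to see what happens.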
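The outline above can be fleshed out roughly as follows. This is a minimal sketch and not the actual source of the test server, which the article does not show: the function names (open_listener, accept_loop, run_prefork_server), the loopback bind address, and the backlog of 16 are all assumptions, and the log strings simply mirror the server output shown further down.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a TCP socket bound to 127.0.0.1:port and put it in LISTEN state.
 * Returns the descriptor, or -1 on failure. Port 0 lets the kernel choose.
 * (Assumption: the original server binds to a LAN address instead.) */
int open_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept connections until accept() fails, draining each connection
 * until the peer closes it. */
void accept_loop(int listenfd)
{
    for (;;) {
        printf("server %d now do accept!\n", (int)getpid());
        int conn = accept(listenfd, NULL, NULL);
        if (conn < 0) {
            perror("accept");
            return;
        }
        printf("server %d accept clientsock %d\n", (int)getpid(), conn);

        char buf[512];
        ssize_t n;
        while ((n = recv(conn, buf, sizeof(buf), 0)) > 0)
            ;                           /* discard the payload */
        if (n == 0)
            printf("server %d recv zero\n", (int)getpid());
        close(conn);
    }
}

/* socket/bind/listen once, then fork: both processes accept on the same fd. */
int run_prefork_server(uint16_t port)
{
    int listenfd = open_listener(port);
    if (listenfd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(listenfd);
        return -1;
    }
    /* Reached by the parent (pid > 0) and by the child (pid == 0). */
    accept_loop(listenfd);
    return 0;
}
```

Calling run_prefork_server(54321) from main should give two processes that both block in accept on the same descriptor, which is the situation examined in the rest of the article.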
First, start the server:
$ ps -ef | grep server
yyy 6182 3573 0 18:08 pts/3 00:00:00 ./tcp_server_tem
yyy 6183 6182 0 18:08 pts/3 00:00:00 ./tcp_server_tem
$ sudo netstat -antp | grep 54321
tcp 0 0 192.168.31.162:54321 0.0.0.0:* LISTEN 6182/tcp_server_tem
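According to ps, both the parent (6182) and the child (6183) have started normally, while netstat shows only the parent (6182) listening on the specified port (54321).
If only the parent is listening on that port, how does the child manage to accept successfully?
A socket is, in essence, a file descriptor, so let's look into each process's descriptor table: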
$ ls -l /proc/6182/fd
total 0
lrwx------ 1 yyy yyy 64 Feb 3 18:08 0 -> /dev/pts/3
lrwx------ 1 yyy yyy 64 Feb 3 18:08 1 -> /dev/pts/3
lrwx------ 1 yyy yyy 64 Feb 3 18:08 2 -> /dev/pts/3
lrwx------ 1 yyy yyy 64 Feb 3 18:08 3 -> socket:[50899]
$ ls -l /proc/6183/fd
total 0
lrwx------ 1 yyy yyy 64 Feb 3 18:08 0 -> /dev/pts/3
lrwx------ 1 yyy yyy 64 Feb 3 18:08 1 -> /dev/pts/3
lrwx------ 1 yyy yyy 64 Feb 3 18:08 2 -> /dev/pts/3
lrwx------ 1 yyy yyy 64 Feb 3 18:08 3 -> socket:[50899]
$ cat /proc/net/tcp | grep 50899
1: A21FA8C0:D431 00000000:0000 0A 00000000:00000000 00:00000000 00000000 1001 0 50899 1 0000000000000000 100 0 0 10 -1
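As we can see, although netstat does not show it, both the parent and the child actually hold the listening socket (the child inherited it through the descriptor copy performed by fork), and both descriptors point to the same inode (50899). In the /proc/net/tcp entry, A21FA8C0:D431 is 192.168.31.162:54321 in hexadecimal and state 0A is LISTEN. Parent and child can therefore each operate on the same sock object in the kernel through their own descriptor.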
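The same check can be made from inside the program rather than through /proc. The sketch below is only an illustration under assumptions: fd_inode and child_shares_socket are hypothetical helper names, and it relies on Linux reporting the socket's inode number in st_ino, which should be the number shown as socket:[N] under /proc/<pid>/fd.

```c
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Inode number behind a descriptor, or 0 on error. */
unsigned long fd_inode(int fd)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return 0;
    return (unsigned long)st.st_ino;
}

/* Fork and let the child compare its view of fd with the parent's.
 * Returns 1 if the inode matches, 0 if it differs, -1 on error. */
int child_shares_socket(int fd)
{
    unsigned long parent_ino = fd_inode(fd);
    if (parent_ino == 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Same descriptor number, inherited across fork. */
        _exit(fd_inode(fd) == parent_ino ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

If the descriptor really is shared rather than duplicated into a new socket, child_shares_socket should return 1 for any socket created before the fork.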
Now let's start the client and see the result (in this test the client and the server run on the same physical machine):
server 6183 accept clientsock 4
server 6183 recv zero
server 6183 now do accept!
server 6182 accept clientsock 4
server 6182 recv zero
server 6182 now do accept!
server 6183 accept clientsock 4
server 6183 recv zero
server 6183 now do accept!
server 6182 accept clientsock 4
server 6182 recv zero
server 6182 now do accept!
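Interestingly, both the parent and the child obtain new connections through accept, and in this run they even appear to take turns.
So how is accept actually implemented? To find out we have to dig into the protocol stack source.
The kernel version examined is 3.16.1.
In the TCP/IP stack, the accept system call is implemented by inet_csk_accept: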
net/ipv4/inet_connection_sock.c
/*
 * This will accept the next outstanding connection.
 */
struct sock *inet_csk_accept(struct sock *sk, int flags, int *err)
{
    struct inet_connection_sock *icsk = inet_csk(sk);
    struct request_sock_queue *queue = &icsk->icsk_accept_queue;
    struct sock *newsk;
    struct request_sock *req;
    int error;

    lock_sock(sk);

    /* We need to make sure that this socket is listening,
     * and that it has something pending.
     */
    error = -EINVAL;
    if (sk->sk_state != TCP_LISTEN)
        goto out_err;

    /* Find already established connection */
    if (reqsk_queue_empty(queue)) {
        long timeo = sock_rcvtimeo(sk, flags & O_NONBLOCK);

        /* If this is a non blocking socket don't sleep */
        error = -EAGAIN;
        if (!timeo)
            goto out_err;

        error = inet_csk_wait_for_connect(sk, timeo);
        if (error)
            goto out_err;
    }
    req = reqsk_queue_remove(queue);
    newsk = req->sk;

    sk_acceptq_removed(sk);
    if (sk->sk_protocol == IPPROTO_TCP && queue->fastopenq != NULL) {
        spin_lock_bh(&queue->fastopenq->lock);
        if (tcp_rsk(req)->listener) {
            /* We are still waiting for the final ACK from 3WHS
             * so can't free req now. Instead, we set req->sk to
             * NULL to signify that the child socket is taken
             * so reqsk_fastopen_remove() will free the req
             * when 3WHS finishes (or is aborted).
             */
            req->sk = NULL;
            req = NULL;
        }
        spin_unlock_bh(&queue->fastopenq->lock);
    }
out:
    release_sock(sk);
    if (req)
        __reqsk_free(req);
    return newsk;
out_err:
    newsk = NULL;
    req = NULL;
    *err = error;
    goto out;
}
EXPORT_SYMBOL(inet_csk_accept);
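The non-blocking branch of inet_csk_accept (error = -EAGAIN when timeo is zero) can be observed from user space. The following is a minimal sketch and not part of the original test program; the function name nonblocking_accept_errno is made up for illustration, and the listener is bound to an ephemeral loopback port so that no client can be pending.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Listen on an ephemeral loopback port with O_NONBLOCK set, call accept()
 * while no client has connected, and return the resulting errno.
 * Returns 0 if accept() unexpectedly succeeded, -1 if the setup failed. */
int nonblocking_accept_errno(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = 0;                              /* kernel picks a port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }

    int result = 0;
    int conn = accept(fd, NULL, NULL);  /* accept queue is empty here */
    if (conn < 0)
        result = errno;
    else
        close(conn);

    close(fd);
    return result;
}
```

With an empty accept queue the call should return immediately instead of sleeping, and the value returned is expected to be EAGAIN (the same value as EWOULDBLOCK on Linux).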
If no connection is ready to be accepted at the time of the call and the socket is in blocking mode, execution enters inet_csk_wait_for_connect:
/*
 * Wait for an incoming connection, avoid race conditions. This must be called
 * with the socket locked.
 */
static int inet_csk_wait_for_connect(struct sock *sk, long timeo)
{
    struct inet_connection_sock *icsk = inet_csk(sk);
    DEFINE_WAIT(wait);
    int err;

    /*
     * True wake-one mechanism for incoming connections: only
     * one process gets woken up, not the 'whole herd'.
     * Since we do not 'race & poll' for established sockets
     * anymore, the common case will execute the loop only once.
     *
     * Subtle issue: "add_wait_queue_exclusive()" will be added
     * after any current non-exclusive waiters, and we know that
     * it will always _stay_ after any new non-exclusive waiters
     * because all non-exclusive waiters are added at the
     * beginning of the wait-queue. As such, it's ok to "drop"
     * our exclusiveness temporarily when we get woken up without
     * having to remove and re-insert us on the wait queue.
     */
    for (;;) {
        prepare_to_wait_exclusive(sk_sleep(sk), &wait,
                                  TASK_INTERRUPTIBLE);
        release_sock(sk);
        if (reqsk_queue_empty(&icsk->icsk_accept_queue))
            timeo = schedule_timeout(timeo);
        lock_sock(sk);
        err = 0;
        if (!reqsk_queue_empty(&icsk->icsk_accept_queue))
            break;
        err = -EINVAL;
        if (sk->sk_state != TCP_LISTEN)
            break;
        err = sock_intr_errno(timeo);
        if (signal_pending(current))
            break;
        err = -EAGAIN;
        if (!timeo)
            break;
    }
    finish_wait(sk_sleep(sk), &wait);
    return err;
}
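The function sleeps on the socket's wait queue, registering itself through prepare_to_wait_exclusive. As the comment makes clear, this is a "wake-one" mechanism: an incoming connection wakes only one of the waiting processes rather than the "whole herd", so the thundering herd problem does not arise for processes blocked in accept.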
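User space cannot see the wakeups themselves, but it can at least confirm the visible consequence: one incoming connection completes exactly one accept call, however many processes are waiting. The sketch below is an illustration under assumptions, not the article's test program. The name count_accepts_for_one_connection is made up, and the one-second SO_RCVTIMEO on the listener is there only so that the losing workers time out (through the -EAGAIN path at the end of the loop above) instead of sleeping forever.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork nworkers children that all call accept() on one listening socket,
 * make a single client connection, and return how many children got it.
 * Returns -1 if the setup fails. */
int count_accepts_for_one_connection(int nworkers)
{
    struct sockaddr_in addr = {0};
    socklen_t len = sizeof(addr);
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks */

    /* The receive timeout bounds how long accept() may sleep. */
    if (setsockopt(lfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 16) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }

    int started = 0;
    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            /* Worker: exit 0 if it obtained the connection, 1 on timeout. */
            int conn = accept(lfd, NULL, NULL);
            _exit(conn >= 0 ? 0 : 1);
        }
        started++;
    }

    /* A single client connection to the port the kernel assigned. */
    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd >= 0)
        (void)connect(cfd, (struct sockaddr *)&addr, sizeof(addr));

    int winners = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            winners++;
    }

    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return winners;
}
```

With four workers the function is expected to return 1. Note that this only shows the connection being handed to a single process; it does not by itself prove that the other processes were never woken, which is what the kernel comment is about.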
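There is another model: accept first and fork afterwards, then close the accepted socket in the parent and the listening socket in the child. Its drawback is the cost of one fork system call per connection, although modern fork implementations use copy-on-write, which keeps that cost down, so I won't go into it further here.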
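For comparison, the accept-then-fork model can be sketched as below. This is a rough illustration under assumptions rather than code from the article: accept_then_fork, loopback_listener, and send_greeting are hypothetical names, and the handler only writes one line so that the example stays short.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Listen on an ephemeral loopback port; the bound address is stored in
 * *bound so that a client can connect to it. Returns the fd or -1. */
int loopback_listener(struct sockaddr_in *bound)
{
    socklen_t len = sizeof(*bound);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(bound, 0, sizeof(*bound));
    bound->sin_family = AF_INET;
    bound->sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)bound, len) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)bound, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept one connection and hand it to a freshly forked child.
 * The child runs handler(connfd) and exits with its return value.
 * Returns the child's pid, or -1 on failure. */
pid_t accept_then_fork(int listenfd, int (*handler)(int connfd))
{
    int connfd = accept(listenfd, NULL, NULL);
    if (connfd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(connfd);
        return -1;
    }
    if (pid == 0) {
        close(listenfd);            /* the child never accepts */
        int rc = handler(connfd);
        close(connfd);
        _exit(rc);
    }
    close(connfd);                  /* the parent keeps only the listener */
    return pid;
}

/* Example handler: greet the client with one line. */
int send_greeting(int connfd)
{
    static const char msg[] = "hello\n";
    ssize_t n = write(connfd, msg, sizeof(msg) - 1);
    return n == (ssize_t)(sizeof(msg) - 1) ? 0 : 1;
}
```

A real server would call accept_then_fork in a loop and reap finished children with waitpid (or a SIGCHLD handler) so that they do not linger as zombies.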
The New Year holiday is almost here, so the next post will probably come out in the Year of the Monkey. Happy New Year, everyone!