The C10K Problem

huntzw

Category: System. Tags: c, thread, socket, freebsd, asynchronous, features

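To learn some of the fundamentals, I made a brief translation of the article, which follows.

Today's web servers are expected to handle thousands of client requests at the same time (the "10K" in the title stands for ten thousand clients). After all, the web is big now and so are the machines, so hardware is no longer the bottleneck. The rest of this article describes how to configure the operating system and write code to support thousands of clients, with a focus on Unix systems.

Of course, you should first read the well-known Unix Network Programming: Networking APIs by Richard Stevens.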

I/O frameworks

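First, let's look at several well-known I/O frameworks.

· ACE – a heavyweight C++ I/O framework that contains object-oriented implementations of several I/O strategies. Its Reactor is an OO way of doing nonblocking I/O, and its Proactor is an OO way of doing asynchronous I/O.

· ASIO – a C++ I/O framework in the Boost library. It is like ACE updated for the STL era.

· Libevent – a lightweight C I/O framework. It supports kqueue and select; poll and epoll support was still planned when the original article was written and has since been added. It is level-triggered only.

· Poller – a lightweight C++ I/O framework that implements a level-triggered readiness API. It is useful for benchmarks that compare the performance of the various APIs.

· rn – a lightweight C I/O framework. It is LGPL-licensed, so it can be used in commercial software, and it is written in C, so it can be used in non-C++ software.

· Cory Nelson's Scale! library – an asynchronous socket, file, and pipe I/O library for Windows.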

I/O Strategies

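Designers of networking software have many options. For example:

1. Whether and how to issue multiple I/O calls from a single thread

· Don't: use blocking/synchronous calls throughout, and use multiple threads or processes to achieve concurrency.

· Use nonblocking calls (e.g. write() on a socket set to O_NONBLOCK) to start I/O, and readiness notification (e.g. poll()) to know when it is OK to start the next I/O on that channel. This is generally usable only with network I/O, not disk I/O.

· Use asynchronous calls (e.g. aio_write()) to start I/O, and completion notification (e.g. signals or completion ports) to know when the I/O finishes. This works for both network and disk I/O.

2. How to control the code servicing each client

· One process for each client.

· One OS-level thread handles many clients; each client is controlled by a user-level thread or a state machine.

· One OS-level thread for each client.

· One OS-level thread for each active client.

3. Whether to use standard OS services, or put some code into the kernel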
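
The nonblocking option under question 1 can be illustrated with a small sketch. The helper names set_nonblocking and write_some are hypothetical (they are not from the original article); the sketch only assumes the standard fcntl() and write() calls. It shows that a write on an O_NONBLOCK descriptor accepts what the kernel can buffer and then fails with EAGAIN instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to nonblocking mode. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: write as much of buf as the kernel accepts right now.
 * Returns the number of bytes written (possibly fewer than len), or -1 on a
 * real error. *would_block is set to 1 if we stopped because of EAGAIN. */
long write_some(int fd, const char *buf, size_t len, int *would_block)
{
    size_t done = 0;

    *would_block = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;
        } else if (n == 0) {
            break;                      /* nothing accepted; stop trying */
        } else if (errno == EINTR) {
            continue;                   /* interrupted by a signal; retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            *would_block = 1;           /* kernel buffer full; resume later */
            break;
        } else {
            return -1;                  /* real error */
        }
    }
    return (long)done;
}
```

When would_block comes back set, a server would typically remember the unsent remainder, ask its readiness mechanism to report when the descriptor becomes writable, and call write_some again at that point.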

The following combinations are popular:

1. Serve many clients with each thread, and use nonblocking I/O and level-triggered readiness notification

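The traditional approach is to set every socket nonblocking and use select() or poll() to learn which sockets have data waiting. The term "level-triggered" comes from computer hardware design and is the opposite of "edge-triggered"; Jonathan Lemon introduced the terms in his BSDCON 2000 paper on kqueue(). A significant bottleneck of this method is that read() or sendfile() from disk blocks when the page is not currently in memory (that is, when a page fault occurs), and setting nonblocking mode on a disk file handle has no effect. One workaround is to use memory-mapped files and issue the I/O only when mincore() reports that the pages are resident, but memory-mapped files still block on page faults in the same way. There are several ways to find out which nonblocking sockets are ready for I/O:

· The traditional select(). Unfortunately, select() is limited to FD_SETSIZE descriptors.

· The traditional poll(). It has no hardcoded limit on the number of descriptors, but it becomes slow with thousands of sockets, because most of them are idle at any given moment and scanning all of them takes time.

· /dev/poll. You tell the OS once which files you are interested in by writing to an open /dev/poll handle, and from then on you read the set of currently ready descriptors from that handle.

· kqueue(). The poll replacement on FreeBSD.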
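
As a minimal sketch of level-triggered notification with poll(), the hypothetical helper collect_readable below (the name and the MAX_FDS limit are assumptions made for illustration) polls a small set of descriptors and returns the ones that are readable. Because poll() is level-triggered, a descriptor keeps being reported on every call until its data is actually read.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 64  /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: poll nfds descriptors for readability.
 * Stores the readable ones in ready[] and returns how many there are,
 * or -1 on error. timeout_ms follows poll() semantics (0 = do not wait). */
int collect_readable(const int *fds, int nfds, int timeout_ms, int *ready)
{
    struct pollfd pfds[MAX_FDS];
    int i, count = 0;

    if (nfds < 0 || nfds > MAX_FDS)
        return -1;

    for (i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;    /* we only care about readability */
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)nfds, timeout_ms) < 0)
        return -1;

    /* Every descriptor must be scanned, ready or not; this is the cost
     * that grows with the number of mostly idle connections. */
    for (i = 0; i < nfds; i++) {
        if (pfds[i].revents & POLLIN)
            ready[count++] = pfds[i].fd;
    }
    return count;
}
```

A server loop would call this helper repeatedly and perform a nonblocking read() on each descriptor it returns. The full scan on every call is the reason /dev/poll and kqueue() were introduced.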

2. Serve many clients with each thread, and use nonblocking I/O and readiness change notification

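Readiness change notification is edge-triggered notification. You give the kernel a file descriptor, and when that descriptor changes from not ready to ready, the kernel notifies you. The notification is sent only once: the kernel assumes you now know the descriptor is ready and stays silent until the descriptor becomes not ready and then ready again. You must be prepared for spurious events, and if you miss an event, the connection it belongs to may stall forever. Nevertheless, the original author found that edge-triggered notification made programming nonblocking clients with OpenSSL easier. The following APIs provide it:

· kqueue() is the edge-triggered mechanism on FreeBSD (edge-triggered behavior is selected with the EV_CLEAR flag).

· epoll is the edge-triggered mechanism on Linux 2.6 kernels (edge-triggered behavior is selected with the EPOLLET flag).

· Realtime signals are the edge-triggered mechanism on Linux 2.4 kernels. A typical setup looks like this: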

/* Fragment: assumes fd and signum are already set, and that <fcntl.h>,
   <signal.h> and <unistd.h> are included with _GNU_SOURCE defined
   (F_SETSIG is Linux-specific). */
sigset_t sigset;
int flags;

/* Mask off SIGIO and the signal you want to use. */
sigemptyset(&sigset);
sigaddset(&sigset, signum);
sigaddset(&sigset, SIGIO);
sigprocmask(SIG_BLOCK, &sigset, NULL);

/* For each file descriptor, invoke F_SETOWN, F_SETSIG, and set O_ASYNC. */
fcntl(fd, F_SETOWN, (int) getpid());
fcntl(fd, F_SETSIG, signum);
flags = fcntl(fd, F_GETFL);
flags |= O_NONBLOCK|O_ASYNC;
fcntl(fd, F_SETFL, flags);
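
For comparison, here is a minimal Linux-only sketch of edge-triggered notification with epoll. The helper names watch_edge_triggered and drain are hypothetical, chosen for illustration. The sketch shows the two rules that edge-triggered code has to follow: the descriptor must be nonblocking, and after each event it must be read until EAGAIN, because the kernel will not report the same readiness again.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: make fd nonblocking and register it with epfd for
 * edge-triggered read events. Returns 0 on success, -1 on error. */
int watch_edge_triggered(int epfd, int fd)
{
    struct epoll_event ev;
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN | EPOLLET;   /* EPOLLET selects edge-triggered mode */
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: read until the descriptor is empty (EAGAIN) or at
 * end of file. Returns the number of bytes consumed, or -1 on a real error. */
long drain(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0)
            total += n;
        else if (n == 0)
            return total;                          /* end of file */
        else if (errno == EINTR)
            continue;                              /* interrupted; retry */
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;                          /* nothing left for now */
        else
            return -1;
    }
}
```

If a handler returns before drain reaches EAGAIN and no new data arrives, epoll_wait() stays silent about that descriptor. This is the "lost event" hazard described above.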

3. Serve many clients with each server thread, and use asynchronous I/O

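This approach has not yet become popular on Unix systems. One example is the POSIX AIO interface (the aio_* calls), which glibc 2.1 and later provide.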

4. Serve one client with each server thread, and use blocking I/O

On Linux, this approach relies on NPTL threads.

While NPTL uses the three kernel features introduced by NGPT: getpid() returns PID, CLONE_THREAD and futexes; NPTL also uses (and relies on) a much wider set of new kernel features, developed as part of this project.
Some of the items NGPT introduced into the kernel around 2.5.8 got modified, cleaned up and extended, such as thread group handling (CLONE_THREAD). [the CLONE_THREAD changes which impacted NGPT's compatibility got synced with the NGPT folks, to make sure NGPT does not break in any unacceptable way.]
A short list: TLS support, various clone extensions (CLONE_SETTLS, CLONE_SETTID, CLONE_CLEARTID), POSIX thread-signal handling, sys_exit() extension (release TID futex upon VM-release), the sys_exit_group() system-call, sys_execve() enhancements and support for detached threads.
There was also work put into extending the PID space - eg. procfs crashed due to 64K PID assumptions, max_pid, and pid allocation scalability work. Plus a number of performance-only improvements were done as well.
In essence the new features are a no-compromises approach to 1:1 threading - the kernel now helps in everything where it can improve threading, and we precisely do the minimally necessary set of context switches and kernel calls for every basic threading primitive.