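I recently spent three days tracking down a bug. At first I had no idea it was network-related; narrowing it down step by step, I eventually determined that the problem was on the network receive side.
Only after digging into my predecessor's code did I start to see what was going on. The problem was intermittent and hard to reproduce, but once I had traced it to the network layer I wrote a test program and it reproduced right away. The culprit was the way accept was used with epoll.
The symptom on our side was that data sat in the server's receive buffer and was never picked up, even though the connection was already established.
Here is what our code does.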
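The problem lies in the epoll trigger mode. epoll has a level-triggered mode and an edge-triggered mode, and in our code both the listening descriptor and the client descriptors use edge-triggered mode.
Edge-triggered means that epoll notifies you only when the state changes, i.e. on a transition readable -> not readable -> readable. If the descriptor simply stays readable, it is reported only once.
So with the connection-handling code in step 2 below, if many clients connect at the same time, the server may have several connections pending, but the listening socket is reported only once. The code calls accept once per notification, so the remaining connections stay queued in the kernel and the server application never picks them up.
One fix is to switch the listening descriptor to level-triggered mode, which is the default anyway: just remove EPOLLET. The next epoll_wait then reports the listening socket again, and the next accept can pick up the remaining connections.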
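To make the difference between the two modes concrete, here is a small sketch rather than anything taken from our server. The function name events_after_single_accept is made up for illustration; it uses a loopback TCP listener, connects three clients, accepts only one of them, and then asks epoll again. The short pause after the connects is an assumption to let the handshakes finish before polling.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo helper (not from the original code).
 * Three clients connect, the server accepts only ONE of them, and the
 * function returns how many events the next epoll_wait() reports
 * (or -1 if the setup fails). */
int events_after_single_accept(int edge_triggered)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct epoll_event ev, out;
    struct timespec pause = { 0, 50 * 1000 * 1000 }; /* 50 ms */
    int clients[3] = { -1, -1, -1 };
    int lfd = -1, epfd = -1, conn = -1, result = -1, i;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    epfd = epoll_create1(0);
    if (lfd < 0 || epfd < 0)
        goto done;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel choose a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 16) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto done;

    memset(&ev, 0, sizeof(ev));
    ev.events = edge_triggered ? (EPOLLIN | EPOLLET) : EPOLLIN;
    ev.data.fd = lfd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, lfd, &ev) < 0)
        goto done;

    /* Queue three connections before the server looks at the listener. */
    for (i = 0; i < 3; i++) {
        clients[i] = socket(AF_INET, SOCK_STREAM, 0);
        if (clients[i] < 0 ||
            connect(clients[i], (struct sockaddr *)&addr, sizeof(addr)) < 0)
            goto done;
    }
    nanosleep(&pause, NULL); /* assumption: enough for all handshakes */

    /* First wait reports the listener; accept a single connection. */
    if (epoll_wait(epfd, &out, 1, 1000) != 1)
        goto done;
    conn = accept(lfd, NULL, NULL);
    if (conn < 0)
        goto done;

    /* Two connections are still pending. Is the listener reported again? */
    result = epoll_wait(epfd, &out, 1, 200);

done:
    for (i = 0; i < 3; i++)
        if (clients[i] >= 0)
            close(clients[i]);
    if (conn >= 0)
        close(conn);
    if (epfd >= 0)
        close(epfd);
    if (lfd >= 0)
        close(lfd);
    return result;
}
```

On Linux I would expect events_after_single_accept(1) to return 0, because the two pending connections produce no new edge, and events_after_single_accept(0) to return 1, because a level-triggered listener stays reported while its queue is non-empty.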
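The other fix is to change the single accept in the connection-handling code into a while loop, just as you would when reading or writing data in edge-triggered mode: drain everything the event stands for before moving on. This requires the listening socket to be non-blocking, so that accept returns -1 with EAGAIN once the queue is empty instead of blocking.
That is: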
while( ( remoteFd = accept(m_socket, ( struct sockaddr* ) &client, &length))>0)
1. Set up the listening socket
m_epoll = epoll_create( 128);
ev.data.fd = m_socket;
ev.events = EPOLLIN|EPOLLET; // edge-triggered
epoll_ctl(m_epoll, EPOLL_CTL_ADD, m_socket, &ev);
2. Handle client connections
nfds = epoll_wait(m_epoll, events, 32, m_timeout);
for (i = 0; i < nfds; i++)
{
    if (events[i].data.fd < 0)
        continue;
    if (events[i].data.fd == m_socket)
    {
        struct sockaddr_in client;
        socklen_t length = sizeof (client );
        // accept is called only once per notification: this is the bug
        remoteFd = accept(m_socket, ( struct sockaddr* ) &client, &length);
        if (remoteFd > 0)
        {
            printf("[CSocketServer::WaitForEvent] client ip=%s port=%d",inet_ntoa(client.sin_addr),ntohs(client.sin_port));
            fcntl(remoteFd, F_SETFL, fcntl(remoteFd, F_GETFL) | O_NONBLOCK);
            ev.data.fd = remoteFd;
            ev.events = EPOLLIN|EPOLLET|EPOLLHUP; // edge-triggered
            epoll_ctl(m_epoll, EPOLL_CTL_ADD, remoteFd, &ev);
        }
        continue;
    }
    else if (events[i].events & EPOLLIN)
    {
    }
    else if (events[i].events & EPOLLHUP)
    {
    }
}
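In fact, the example in man epoll also registers the listening descriptor in level-triggered mode (the default) and uses edge-triggered mode only for the client descriptors. My guess is that my predecessor copied the code from somewhere online, which left this problem lying dormant for years.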
Example for Suggested Usage
While the usage of epoll when employed as a level-triggered interface does have the same semantics as poll(2), the edge-triggered usage requires more clarification to avoid stalls in the application event loop. In this example, listener is a non-blocking socket on which listen(2) has been called. The function do_use_fd() uses the new ready file descriptor until EAGAIN is returned by either read(2) or write(2). An event-driven state machine application should, after having received EAGAIN, record its current state so that at the next call to do_use_fd() it will continue to read(2) or write(2) from where it stopped before.
#define MAX_EVENTS 10
struct epoll_event ev, events[MAX_EVENTS];
int listen_sock, conn_sock, nfds, epollfd;
/* Set up listening socket, 'listen_sock' (socket(),
   bind(), listen()) */
epollfd = epoll_create(10);
if (epollfd == -1) {
    perror("epoll_create");
    exit(EXIT_FAILURE);
}
ev.events = EPOLLIN; // level-triggered (the default)
ev.data.fd = listen_sock;
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == -1) {
    perror("epoll_ctl: listen_sock");
    exit(EXIT_FAILURE);
}
for (;;) {
    nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
    if (nfds == -1) {
        perror("epoll_pwait");
        exit(EXIT_FAILURE);
    }
    for (n = 0; n < nfds; ++n) {
        if (events[n].data.fd == listen_sock) {
            conn_sock = accept(listen_sock, (struct sockaddr *) &local, &addrlen);
            if (conn_sock == -1) {
                perror("accept");
                exit(EXIT_FAILURE);
            }
            setnonblocking(conn_sock);
            ev.events = EPOLLIN | EPOLLET;
            ev.data.fd = conn_sock;
            if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock,
                          &ev) == -1) {
                perror("epoll_ctl: conn_sock");
                exit(EXIT_FAILURE);
            }
        } else {
            do_use_fd(events[n].data.fd);
        }
    }
}