POLLERR and POLLIN reported at the same time

Goal

  • Use the poll mechanism to monitor a virtual NIC (tun/tap device) and a physical NIC
  • Set the events field so that the poller triggers on POLLIN | POLLPRI
  • Listen for data
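
The setup in the list above can be sketched as follows. This is a minimal sketch, not the original project's code: the helper name watch_for_input is hypothetical, and it only fills in one struct pollfd entry. The point it illustrates is that the event flags are combined with bitwise OR; combining them with & would yield 0 and request nothing.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical helper: prepare one pollfd entry so that poll() reports
 * readable data (POLLIN) and priority data (POLLPRI) on fd.
 * The flags are OR-ed together; POLLIN & POLLPRI would evaluate to 0. */
void watch_for_input(struct pollfd *entry, int fd)
{
    assert(entry != NULL);
    entry->fd = fd;
    entry->events = POLLIN | POLLPRI;
    entry->revents = 0;
}
```

A caller would fill one entry per NIC descriptor and then pass the array to poll(), for example poll(entries, 2, timeout_ms).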

Problem

  While listening, revents keeps reporting POLLERR.
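
When chasing a symptom like this, it helps to log exactly which bits are set in revents on each wakeup. The helper below is a sketch for that purpose; the name describe_revents and the output format are assumptions, not part of the original code.

```c
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the names of the bits set in revents into buf,
 * separated by '|'. Returns buf; an empty string means no known bit is set. */
char *describe_revents(short revents, char *buf, size_t len)
{
    static const struct { short bit; const char *name; } table[] = {
        { POLLIN, "POLLIN" },   { POLLPRI, "POLLPRI" }, { POLLOUT, "POLLOUT" },
        { POLLERR, "POLLERR" }, { POLLHUP, "POLLHUP" }, { POLLNVAL, "POLLNVAL" },
    };
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (!(revents & table[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", table[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;                       /* buffer full: stop appending */
        used += (size_t)n;
    }
    return buf;
}
```

Calling it with POLLIN | POLLERR and a 64-byte buffer produces the string "POLLIN|POLLERR", which makes it obvious in a log when both bits arrive together.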

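Analysis & diagnosis

1. Packet errors

  Disabled packet transmission in every module so that no traffic passes through the NIC; the problem persists.

2. File descriptor errors

  Verified that the file descriptors are correct; the problem persists.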

3. Revisiting the definition of POLLERR

POLLERR
  Error condition (only returned in revents; ignored in events). This bit is also set for a file descriptor referring to the write end of a pipe when the read end has been closed.
https://man7.org/linux/man-pages/man2/poll.2.html
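  In other words, POLLERR cannot be requested through events: it is reported only in revents, and setting it in events has no effect. It is also reported for the write end of a pipe once the read end has been closed.
  That does not match the situation here, though, because nothing is writing to the NIC.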
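
The pipe case from the man page can be reproduced in a few lines. This is a sketch assuming Linux behaviour as documented in poll(2); the function name revents_after_reader_closed is hypothetical. It requests no events at all, which also shows that POLLERR is reported regardless of what events contains.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Create a pipe, close its read end, then poll the write end.
 * Returns the revents reported for the write end, or -1 on failure. */
int revents_after_reader_closed(void)
{
    int fds[2];
    struct pollfd entry;
    int rc;

    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);                 /* the reader goes away */

    entry.fd = fds[1];
    entry.events = 0;              /* request nothing; POLLERR is reported anyway */
    entry.revents = 0;
    rc = poll(&entry, 1, 0);       /* timeout 0: return immediately */
    assert(rc <= 1);               /* only one entry was passed in */
    close(fds[1]);

    return rc == 1 ? entry.revents : -1;
}
```

On Linux the returned value has POLLERR set, even though events was 0.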

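Answers from other engineers:
  1. POLLERR means an asynchronous error has occurred on the socket (I do not fully understand the details), or the file descriptor does not support polling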
https://stackoverflow.com/questions/24791625/how-to-handle-the-linux-socket-revents-pollerr-pollhup-and-pollnval

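4. Re-checking revents for every NIC: the physical and virtual NICs behave differently, and only the virtual NIC reports POLLERR

  The virtual NIC (down by default because of the software architecture) differs from the physical NIC in that it is not in the RUNNING state. After running ifconfig eth1.1 up, the POLLERR reports on that NIC disappear.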

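5. Why does a NIC being down cause an error?

  Looking at the kernel

  Because the device is used only for Ethernet layer 2 here, start with the kernel's tap.c (see the C code below). EPOLLERR is returned only when q, taken from file->private_data, is NULL. What makes it NULL may go beyond an empty data stream (I am not familiar with the VFS architecture, so this is unclear for now; my guess is that the down state of the tap device wakes up poll while no data is delivered).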

static __poll_t tap_poll(struct file *file, poll_table *wait)
{
	struct tap_queue *q = file->private_data;
	__poll_t mask = EPOLLERR;

	if (!q)
		goto out;

	mask = 0;
	poll_wait(file, &q->sock.wq.wait, wait);

	if (!ptr_ring_empty(&q->ring))
		mask |= EPOLLIN | EPOLLRDNORM;

	if (sock_writeable(&q->sk) ||
	    (!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &q->sock.flags) &&
	     sock_writeable(&q->sk)))
		mask |= EPOLLOUT | EPOLLWRNORM;

out:
	return mask;
}

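  In tun.c, two conditions trigger POLLERR:
  1. tun_get() returns NULL, i.e. the file has no tun device attached
  2. The network device is not in the NETREG_REGISTERED state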

/* Poll */
static __poll_t tun_chr_poll(struct file *file, poll_table *wait)
{
	struct tun_file *tfile = file->private_data;
	struct tun_struct *tun = tun_get(tfile);
	struct sock *sk;
	__poll_t mask = 0;

	if (!tun)
		return EPOLLERR;

	sk = tfile->socket.sk;

	poll_wait(file, sk_sleep(sk), wait);

	if (!ptr_ring_empty(&tfile->tx_ring))
		mask |= EPOLLIN | EPOLLRDNORM;

	/* Make sure SOCKWQ_ASYNC_NOSPACE is set if not writable to
	 * guarantee EPOLLOUT to be raised by either here or
	 * tun_sock_write_space(). Then process could get notification
	 * after it writes to a down device and meets -EIO.
	 */
	if (tun_sock_writeable(tun, tfile) ||
	    (!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags) &&
	     tun_sock_writeable(tun, tfile)))
		mask |= EPOLLOUT | EPOLLWRNORM;

	if (tun->dev->reg_state != NETREG_REGISTERED)
		mask = EPOLLERR;

	tun_put(tun);
	return mask;
}

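To sum up, it is still the NIC's down state that triggers the poll report. I have seen a description like this in another book:

"Note that fwide does not change the orientation of a stream that is already oriented. Also note that there is no error return. Consider what would happen if the stream is invalid. The only recourse we have is to clear errno before calling fwide and check the value of errno when we return."

This indirectly suggests that the error has to be cleared before POLLERR stops being reported, as follows: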

ret = getsockopt(_fd.fd, SOL_SOCKET, SO_ERROR, &opt_val, &optlen);

After the error has been read, POLLERR is no longer reported.
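
The getsockopt call can be wrapped in a small helper that is invoked whenever revents contains POLLERR. This is a sketch assuming the polled descriptor is a socket, as the SO_ERROR call implies; the name take_socket_error is hypothetical. Reading SO_ERROR both returns and clears the pending error, and on a descriptor that is not a socket getsockopt fails with ENOTSOCK.

```c
#include <assert.h>
#include <sys/socket.h>

/* Read and clear the pending error on a socket.
 * Returns the pending errno value (0 if there is none), or -1 if
 * getsockopt() itself failed, e.g. because fd is not a socket. */
int take_socket_error(int fd)
{
    int pending = 0;
    socklen_t len = sizeof(pending);

    assert(fd >= 0);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &pending, &len) != 0)
        return -1;
    return pending;
}
```

In a poll loop this would be called in the POLLERR branch before handling POLLIN, and the returned value logged with strerror() so that the actual cause is visible.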

