Yesterday, as soon as I started the client, it kept printing:
Tue Jul 17 16:06:21 2012 us=390000 Attempting to establish TCP connection with 192.168.1.86:10443 [nonblock]
Tue Jul 17 16:06:21 2012 us=390000 TCP: connect to 192.168.1.86:10443 failed, will try again in 5 seconds: Operation would block (WSAEWOULDBLOCK)
Tue Jul 17 16:06:26 2012 us=390000 TCP: connect to 192.168.1.86:10443 failed, will try again in 5 seconds: Operation would block (WSAEWOULDBLOCK)
My first suspicion was the client, but its configuration looked perfectly normal, so I turned to the server side.
The server log showed:
Tue Jul 17 10:48:20 2012 us=189690 192.168.1.189:52252 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:25 2012 us=186285 192.168.1.189:52254 Connection reset, restarting [0]
Tue Jul 17 10:48:25 2012 us=186317 192.168.1.189:52254 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:30 2012 us=188313 192.168.1.189:52256 Connection reset, restarting [0]
Tue Jul 17 10:48:30 2012 us=188344 192.168.1.189:52256 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:35 2012 us=188504 192.168.1.189:52258 Connection reset, restarting [0]
Tue Jul 17 10:48:35 2012 us=188536 192.168.1.189:52258 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:45 2012 us=188543 192.168.1.189:52265 Connection reset, restarting [0]
Tue Jul 17 10:48:45 2012 us=188576 192.168.1.189:52265 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:50 2012 us=187213 192.168.1.189:52270 Connection reset, restarting [0]
Tue Jul 17 10:48:50 2012 us=187244 192.168.1.189:52270 SIGUSR1[soft,connection-reset] received, client-instance restarting
Tue Jul 17 10:48:55 2012 us=185208 192.168.1.189:52287 Connection reset, restarting [0]
Tue Jul 17 10:48:55 2012 us=185239 192.168.1.189:52287 SIGUSR1[soft,connection-reset] received, client-instance restarting
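It looked as if the client was dropping the connection itself. A packet capture confirmed it: the client sent a SYN and then moved on to a new source port.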
When I reconfigured the client to connect over UDP, it worked normally.
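After a whole day of reading the configuration documentation, I tried the official stock client instead, and everything just worked. I nearly fell off my chair.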
Compare the official client's output:
Tue Jul 17 16:13:32 2012 us=250000 Attempting to establish TCP connection with 192.168.1.86:10443
Tue Jul 17 16:13:32 2012 us=250000 TCP connection established with 192.168.1.86:10443
The official build uses a blocking connect(); note that its log line lacks the [nonblock] tag.
OpenVPN's connection code lives in socket.c:
int
openvpn_connect (socket_descriptor_t sd,
                 struct openvpn_sockaddr *remote,
                 int connect_timeout,
                 volatile int *signal_received)
{
  int status = 0;

#ifdef CONNECT_NONBLOCK
  set_nonblock (sd);
  status = connect (sd, (struct sockaddr *) &remote->sa, sizeof (remote->sa));
  if (status)
    status = openvpn_errno_socket ();
  /* Winsock reports a pending connect as WSAEWOULDBLOCK (10035),
     so on Windows this test fails and the loop is never entered. */
  if (status == EINPROGRESS)
    {
      while (true)
        {
          fd_set writes;
          struct timeval tv;

          FD_ZERO (&writes);
          FD_SET (sd, &writes);
          tv.tv_sec = 0;
          tv.tv_usec = 0;

          status = select (sd + 1, NULL, &writes, NULL, &tv);

          if (signal_received)
            {
              get_signal (signal_received);
              if (*signal_received)
                {
                  status = 0;
                  break;
                }
            }
          if (status < 0)
            {
              status = openvpn_errno_socket ();
              break;
            }
          if (status <= 0)
            {
              if (--connect_timeout < 0)
                {
                  status = ETIMEDOUT;
                  break;
                }
              openvpn_sleep (1);
              continue;
            }

          /* got it */
          {
            int val = 0;
            socklen_t len;

            len = sizeof (val);
            if (getsockopt (sd, SOL_SOCKET, SO_ERROR, (void *) &val, &len) == 0
                && len == sizeof (val))
              status = val;
            else
              status = openvpn_errno_socket ();
            break;
          }
        }
    }
#else
  status = connect (sd, (struct sockaddr *) &remote->sa, sizeof (remote->sa));
  if (status)
    status = openvpn_errno_socket ();
#endif

  return status;
}
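For comparison, here is a minimal POSIX sketch of the pattern the CONNECT_NONBLOCK branch above implements: switch the socket to non-blocking mode, treat EINPROGRESS as "still connecting", wait for the socket to become writable with select(), then read the outcome with getsockopt(SO_ERROR). The helper name nonblock_connect is hypothetical, and it waits once with the whole timeout instead of polling every second and checking for signals as OpenVPN does. On Winsock the counterparts are ioctlsocket(FIONBIO), WSAGetLastError() and WSAEWOULDBLOCK.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical POSIX helper, not OpenVPN code.
 * Returns 0 on success, otherwise an errno value
 * (ETIMEDOUT if the handshake does not finish in time). */
int nonblock_connect(int sd, const struct sockaddr *addr, socklen_t addrlen,
                     int timeout_sec)
{
  /* Put the socket into non-blocking mode. */
  int flags = fcntl(sd, F_GETFL, 0);
  if (flags < 0 || fcntl(sd, F_SETFL, flags | O_NONBLOCK) < 0)
    return errno;

  if (connect(sd, addr, addrlen) == 0)
    return 0;                 /* connected immediately */
  if (errno != EINPROGRESS)   /* Winsock would report WSAEWOULDBLOCK here */
    return errno;

  /* Wait until the socket is writable, i.e. the handshake has finished. */
  fd_set writes;
  FD_ZERO(&writes);
  FD_SET(sd, &writes);
  struct timeval tv = { timeout_sec, 0 };

  int n = select(sd + 1, NULL, &writes, NULL, &tv);
  if (n < 0)
    return errno;
  if (n == 0)
    return ETIMEDOUT;

  /* Writable does not mean success: fetch the handshake's result. */
  int err = 0;
  socklen_t len = sizeof err;
  if (getsockopt(sd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
    return errno;
  return err;
}
```

Called on a fresh TCP socket, it returns 0 once the connection is up, ETIMEDOUT if select() times out, or the socket error reported by the kernel (for example ECONNREFUSED).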
The CONNECT_NONBLOCK macro is defined in syshead.h:
/*
 * Is non-blocking connect() supported?
 */

#if defined(HAVE_GETSOCKOPT) && defined(SOL_SOCKET) && defined(SO_ERROR) && defined(EINPROGRESS) && defined(ETIMEDOUT)
#define CONNECT_NONBLOCK
#endif
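I never modified this file, so presumably something in the build environment made all of these macros defined. That in turn defined CONNECT_NONBLOCK and sent openvpn_connect() down the non-blocking path.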
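One way to test the build-environment theory is to compile a small probe with the same toolchain and headers as the client. The sketch below (the function name nonblock_prereqs_met is hypothetical) checks the header-level part of the syshead.h condition. HAVE_GETSOCKOPT is left out on the assumption that it comes from the build's config header rather than from the system headers.

```c
#include <errno.h>          /* EINPROGRESS, ETIMEDOUT */
#ifdef _WIN32
#include <winsock2.h>       /* SOL_SOCKET, SO_ERROR */
#else
#include <sys/socket.h>     /* SOL_SOCKET, SO_ERROR */
#endif

/* Hypothetical probe: returns 1 if the headers seen by this compiler
 * define every header-provided macro that the CONNECT_NONBLOCK test
 * in syshead.h requires, 0 otherwise. */
int nonblock_prereqs_met(void)
{
#if defined(SOL_SOCKET) && defined(SO_ERROR) && defined(EINPROGRESS) && defined(ETIMEDOUT)
  return 1;
#else
  return 0;
#endif
}
```

If this returns 1 under the Windows toolchain, the system headers alone supply everything except HAVE_GETSOCKOPT, which would explain why the non-blocking path got compiled in.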
This turned out to be a bug in OpenVPN; its author does not seem to have been very familiar with Windows socket programming.
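On Windows, set_nonblock (sd) switches the socket to non-blocking mode, so the following connect() returns immediately instead of waiting for the handshake:
  status = connect (sd, (struct sockaddr *) &remote->sa, sizeof (remote->sa));
  if (status)
    status = openvpn_errno_socket ();
A status of 10035 (WSAEWOULDBLOCK) at this point is normal: it only means the connection is still being set up.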
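The check if (status == EINPROGRESS) is what breaks. EINPROGRESS is 115 (/* Operation now in progress */), so 10035 never matches, the select() loop is never entered, and openvpn_connect() returns 10035 as an error. That is exactly the "failed, will try again in 5 seconds: Operation would block (WSAEWOULDBLOCK)" line in the client log.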
The fix is to accept the Winsock code as well:
  if (status == WSAEWOULDBLOCK || status == EINPROGRESS)

It has worked fine so far.
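A variant of the fix above selects the expected code per platform instead of accepting both everywhere. This is a minimal sketch; the helper name connect_in_progress is hypothetical and not part of OpenVPN, and it relies on _WIN32, the macro Windows compilers predefine.

```c
#include <stdbool.h>
#ifdef _WIN32
#include <winsock2.h>   /* WSAEWOULDBLOCK */
#else
#include <errno.h>      /* EINPROGRESS */
#endif

/* Hypothetical helper: true if the error code returned after a
 * non-blocking connect() means "handshake still in progress"
 * rather than a real failure. */
bool connect_in_progress(int err)
{
#ifdef _WIN32
  /* Winsock reports a pending non-blocking connect as WSAEWOULDBLOCK (10035). */
  return err == WSAEWOULDBLOCK;
#else
  /* POSIX reports it as EINPROGRESS. */
  return err == EINPROGRESS;
#endif
}
```

In the code above, the test would then read if (connect_in_progress (status)), keeping the platform difference in one place.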