Server-side problems in multithreaded network communication on Linux

jerry_fight

Category: Advanced Programming in the Linux Environment. Tags: linux, multithreaded network programming

Article link: https://blog.csdn.net/jerry_fight/article/details/8537887

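A multithreaded network server program that I wrote for Linux has been running in production for a long time. Once the number of client connections reaches a certain level, the server dies and produces a core file.

The implementation model is as follows:

The main thread loops, waiting for client connections: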
while (main_loop)
  {
    tmv.tv_sec = 5;
    tmv.tv_usec = 0;

    memcpy(&g_listener.read_set, &g_listener.active_set, sizeof(fd_set));
    if ((ns=select(g_listener.ns, &g_listener.read_set, NULL, NULL, &tmv)) < 0)
    {
      if (errno == EINTR)
        continue;

      goto err;
    }
    if (ns == 0)
    {
      /* timeout */
      continue;
    }

    if (FD_ISSET(g_listener.sd, &g_listener.read_set))
    {
re_accept:
      slen = sizeof(ws);
      if ((sd=accept(g_listener.sd, (struct sockaddr *)&ws, &slen)) < 0)
      {
        if (errno == EINTR)
          goto re_accept;
        continue;
      }

      xlog_debug1("new connection accepted, sd=%d ...", sd);

      pthread_mutex_lock(&g_ws_mutex);
      for (i=0; i<g_max_links; i++)
      {
        if (g_wst[i].ws_thrid == 0)
          break;
      }
      pthread_mutex_unlock(&g_ws_mutex);

      if (i >= g_max_links)
      {
        close(sd);
        continue;
      }

      /* create thread, ws_server_thread */
      /* note: &sd points at a variable that the next accept() overwrites */
      if ((ec=pthread_create(&thrid, &attr, (void *)ws_server_thread, &sd)))
      {
        close(sd);
        xlog_error(__FILE__, __LINE__, "pthread_create(): %s", strerror(ec));
        continue;
      }

      xlog_debug1("new thread %u for socket %d created ...", thrid, sd);

      /* note: the new thread may already be running before its ID is stored here */
      pthread_mutex_lock(&g_ws_mutex);
      g_wst[i].ws_thrid = thrid;
      pthread_mutex_unlock(&g_ws_mutex);

      /* save the socket id */
    }
  }

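Debugging with GDB later showed that a global flag related to shared memory had been modified, even though nothing in my code deliberately or explicitly writes to it. Further debugging showed that a global array sits immediately before this variable in memory, so an out-of-bounds write to the array would explain the corruption. Checking the code confirmed it: a badly handled error path overran the array. OK, the cause was found.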
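The post does not show the faulty error-handling path, so the following is only a sketch of the general defence, under stated assumptions: `demo_table`, `demo_find_slot`, `demo_release_slot` and `DEMO_MAX_LINKS` are hypothetical names, not taken from the original program. The idea is that a failed lookup returns -1 and the caller never indexes the array with the "not found" value, which is the kind of write that could land on the variable stored after the array.

```c
#include <assert.h>
#include <pthread.h>

#define DEMO_MAX_LINKS 4          /* hypothetical table size */

struct demo_slot {
    pthread_t thrid;              /* owner of the slot */
    int       in_use;             /* 0 = free, 1 = taken */
};

static struct demo_slot demo_table[DEMO_MAX_LINKS];

/* Return the index of the slot owned by `self`, or -1 if there is none. */
static int demo_find_slot(pthread_t self)
{
    for (int i = 0; i < DEMO_MAX_LINKS; i++) {
        if (demo_table[i].in_use && pthread_equal(demo_table[i].thrid, self))
            return i;
    }
    return -1;
}

/* Free the caller's slot. When the caller is not registered, the table is
 * left untouched instead of being indexed with an out-of-range value. */
static int demo_release_slot(pthread_t self)
{
    int i = demo_find_slot(self);
    if (i < 0)
        return -1;
    assert(i < DEMO_MAX_LINKS);   /* holds by construction of demo_find_slot */
    demo_table[i].in_use = 0;
    return i;
}
```

In a real server the table would also be protected by a mutex, as `g_wst` is protected by `g_ws_mutex` in the code above; the sketch leaves that out to keep the bounds check visible.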

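But why did the server only fail after a certain number of connections, while a few or a few dozen connections caused no trouble, and sometimes it never died at all? It turned out that the error-handling path behaved correctly most of the time and went wrong only in one particular case: when the main thread runs slower than the child thread. So that was the real problem. The child thread works correctly only under one precondition: the main thread must store the child's thread ID before the child starts looking for it, because the main thread manages the child threads through the IDs it stores. I had simply assumed that the main thread would generally be a little faster than the child, and I concluded this from a few runs that happened not to fail. That was sloppy and wrong.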

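Solution: add a function to the child thread that waits for the main thread, implemented by calling pthread_cond_timedwait(). Using sleep or select would also work, but is unsuitable in some situations.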

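After adding the wait, a new problem appeared: whenever two connections arrived in quick succession, both child threads received the same argument. Looking at the code, sure enough, the child thread started waiting as soon as it began and read the argument through the pointer only afterwards, by which time the main thread's variables had already changed, so of course the value was wrong.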
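The fix shown in this post is to copy the argument before waiting. An alternative that removes the dependency on timing altogether is to give each thread its own heap-allocated argument, so that no two threads ever share the address of the main loop's `sd` variable. The sketch below assumes hypothetical names (`demo_conn_arg`, `demo_conn_thread`, `demo_spawn`) and is not the author's code.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Per-connection argument, owned by exactly one thread. */
struct demo_conn_arg {
    int sd;     /* connected socket descriptor */
    int slot;   /* index reserved for this thread in the thread table */
};

static void *demo_conn_thread(void *arg)
{
    /* Take a private copy and free the heap block right away. */
    struct demo_conn_arg local = *(struct demo_conn_arg *)arg;
    free(arg);

    /* Any waiting can happen here; `local` can no longer change under us. */

    return (void *)(intptr_t)local.sd;   /* returned only so a caller can check it */
}

/* Create a thread with its own copy of (sd, slot). Returns 0 or an error number. */
static int demo_spawn(pthread_t *thr, int sd, int slot)
{
    struct demo_conn_arg *a = malloc(sizeof *a);
    int ec;

    if (a == NULL)
        return ENOMEM;
    a->sd = sd;
    a->slot = slot;

    ec = pthread_create(thr, NULL, demo_conn_thread, a);
    if (ec != 0)
        free(a);   /* the thread never started, so the block is still ours */
    return ec;
}
```

Passing the reserved slot index along with the descriptor would also let the child skip the search through the thread table, although that goes beyond what the original program does.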

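The child-thread model after the fix is as follows. It copies the argument first, then waits, then processes requests in a loop: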
/* connected socket descriptor */
  sd = *((int *)arg);
  
  /* wait 1 second so that the main thread can put my ID into the queue */
  thread_sleep(1, 0);
  
  /* set to non-blocking mode */
  if (fcntl(sd, F_SETFL, O_NONBLOCK) < 0)
  {
    xlog_error(__FILE__, __LINE__, "fcntl(): %s", strerror(errno));
  } 
  
  xlog_debug1("head shmm info: flag=%x, nseg=%d", g_shinf.flg, g_shinf.nseg);
  xlog_debug1("int [%u] thread, sd:%d", pthread_self(), sd);
  
  pthread_mutex_lock(&g_ws_mutex);
  for (n = 0; n < MANAGER_MAX_LINKS; n++)
  {
    if (g_wst[n].ws_thrid == pthread_self())
      break;
  }   
  pthread_mutex_unlock(&g_ws_mutex);
  
  if (n >= MANAGER_MAX_LINKS)
  {
    close(sd);
    xlog_debug1("not found myself:%u in the queue of all threads.", pthread_self());
    return;
  } 
  
  pthread_mutex_init(&g_wst[n].mutex, NULL);
  pthread_cond_init(&g_wst[n].cond, NULL);
  
  inited = 1;
  while (g_ws_server_loop)
  {
    tmv.tv_sec  = 5;
    tmv.tv_usec = 0;
    FD_ZERO(&rdset);
    FD_SET(sd, &rdset);

    xlog_debug1("[%u] [sd:%d] selecting...", pthread_self(), sd);
    if ((ns = select(sd+1, &rdset, NULL, NULL, &tmv)) < 0)
    {
      if (errno == EINTR)
        continue;

      break;
    }

    /* timeout, to next loop */
    if (ns == 0)
      continue;

    if (!FD_ISSET(sd, &rdset))
      continue;

    xlog_debug1("[%u] workstation command arrival ...", pthread_self());

    /* receive the workstation command; keep one byte free for the '\0' added below */
to_recv1:
    if ((rc=recv(sd, buf, MAX_COMMAND_BUF_SIZE - 1, 0)) < 0)
    {
      if (errno == EINTR || errno == EAGAIN)
        goto to_recv1;
      if (errno == EPIPE)
        break;

      break;
    }

    xlog_debug1("[%u] %d bytes received ...", pthread_self(), rc);

    /* peer closed */
    if (rc == 0)
      break;

    buf[rc] = '\0';
    /* process command */
    sz = manager_command(buf);
    p  = buf;

    while (sz > 0)
    {
      /* to send buf */
      if ((rc=send(sd, p, sz, MSG_NOSIGNAL)) < 0)
      {
        if (errno == EINTR || errno == EAGAIN)
          continue;
        goto err_exit;
      }

      if (rc == 0)
        goto err_exit;
   
      p  += rc;
      sz -= rc;
    } 
  } 

  close(sd);
    
  xlog_debug1("[%u] [sd:%u]thread will be exited.", pthread_self(), sd);

  pthread_exit(NULL);