lwip tcp_tw_pcbs list problem in tcp_slowtmr()

Reposted 2015-07-10 14:18:05

I have been having a problem in the tcp_slowtmr() function in tcp.c.  I have been using the raw API for quite a while to implement TCP servers listening on several different ports, and I have not really had any problems so far.  Recently I have also implemented TCP client connections, which quickly open a connection, read/write some data and then close the connection.  This sequence repeats over and over, connecting to several different remote TCP servers.  Things seem to work fairly well until I start introducing error conditions such as extending remote server response times or reducing my timeouts waiting for data to be received.  This causes my code to time out and close the TCP connection (probably while some responses may still come back later).  I have also tried disconnecting the network connections of some of my remote servers so that the initial client connection attempts fail.  I am mainly trying to do some general stress testing with conditions that may normally occur when the application is deployed.



The problem occurs when I add these stress tests, and possibly also under normal conditions after an extended period of time.  I have not nailed down the exact cause yet.  Eventually I end up in a lockup when calling the tcp_slowtmr() function.  The lockup occurs while cycling through the following lines of tcp_slowtmr():



  /* Steps through all of the TIME-WAIT PCBs. */
  prev = NULL;
  pcb = tcp_tw_pcbs;
  while (pcb != NULL) {
    LWIP_ASSERT("tcp_slowtmr: TIME-WAIT pcb->state == TIME-WAIT", pcb->state == TIME_WAIT);
    pcb_remove = 0;

    /* Check if this PCB has stayed long enough in TIME-WAIT */
    if ((u32_t)(tcp_ticks - pcb->tmr) > 2 * TCP_MSL / TCP_SLOW_INTERVAL) {
      ++pcb_remove;
    }

    /* If the PCB should be removed, do it. */
    if (pcb_remove) {
      struct tcp_pcb *pcb2;
      tcp_pcb_purge(pcb);
      /* Remove PCB from tcp_tw_pcbs list. */
      if (prev != NULL) {
        LWIP_ASSERT("tcp_slowtmr: middle tcp != tcp_tw_pcbs", pcb != tcp_tw_pcbs);
        prev->next = pcb->next;
      } else {
        /* This PCB was the first. */
        LWIP_ASSERT("tcp_slowtmr: first pcb == tcp_tw_pcbs", tcp_tw_pcbs == pcb);
        tcp_tw_pcbs = pcb->next;
      }
      pcb2 = pcb;
      pcb = pcb->next;
      memp_free(MEMP_TCP_PCB, pcb2);
    } else {
      prev = pcb;
      pcb = pcb->next;
    }
  }
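
For reference, the expiry test in that loop can be tried in isolation. The following is only a minimal sketch: the helper name tw_expired is hypothetical, and the values 60000 ms for TCP_MSL and 500 ms for TCP_SLOW_INTERVAL are assumptions that I believe match the lwip 1.4.1 defaults, so check them against your own port.

```c
#include <stdint.h>

/* Assumed values, believed to match the lwip 1.4.1 defaults; verify in your port. */
#define TCP_MSL            60000UL  /* maximum segment lifetime in ms */
#define TCP_SLOW_INTERVAL  500UL    /* period of tcp_slowtmr() in ms */

/* Hypothetical helper: returns 1 when a PCB that entered TIME-WAIT at tick
 * `entered` has stayed longer than 2 * MSL at tick `now`, otherwise 0. */
static int tw_expired(uint32_t now, uint32_t entered)
{
    /* Unsigned subtraction yields the elapsed ticks modulo 2^32, so the
     * comparison stays correct when the tick counter wraps around. */
    return (uint32_t)(now - entered) > 2 * TCP_MSL / TCP_SLOW_INTERVAL;
}
```

Under these assumed values the threshold is 240 slow-timer ticks, i.e. 120 seconds. Since the tick counter does not advance inside the loop, the result for a given PCB cannot change between iterations.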



The problem occurs when the first item on the tcp_tw_pcbs list points back to itself:  pcb->next == pcb, so the code never exits the while loop.  The tcp_ticks value never changes in this part of the code, so the expiry check never becomes true and pcb_remove stays at 0 as well.
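
As a debugging aid, a self-referencing list like this can be detected before the loop spins forever. The sketch below is not lwip code: struct node is a hypothetical stand-in for struct tcp_pcb, and list_reg and list_has_cycle are illustrative names. It also shows one generic way a head-inserted list can end up with pcb->next == pcb, namely inserting the same node twice; whether that is what happens in this application has not been established.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct tcp_pcb: only the link field matters here. */
struct node {
    struct node *next;
};

/* Head insertion: n becomes the new first element of the list.
 * If n is already the head, n->next ends up pointing at n itself. */
static void list_reg(struct node **head, struct node *n)
{
    n->next = *head;
    *head = n;
}

/* Floyd's cycle detection: returns 1 if following the next pointers
 * never reaches NULL, otherwise 0. */
static int list_has_cycle(const struct node *head)
{
    const struct node *slow = head;
    const struct node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;        /* advances one step */
        fast = fast->next->next;  /* advances two steps */
        if (slow == fast) {
            return 1;             /* the pointers met, so the list loops */
        }
    }
    return 0;                     /* reached the end of the list */
}
```

A check of this kind could be run on tcp_tw_pcbs from a debug build just before tcp_slowtmr() is called, to catch the corruption closer to where it is introduced.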



This is a single-threaded application, and only the standard interrupt handling functions are being used.  The application runs on an LPC4350 using the LPCOpen library from NXP with lwip v1.4.1.



It seems like the problem is created when I start calling tcp_close() to close my client connections.  If I use tcp_abort() instead, I don't seem to have the problem.  It does, however, cause undesirable sequences in Wireshark.



What is the recommended sequence to close a tcp client session using the raw api?



Any suggestions as to what I may be doing wrong here?  Or could this possibly be a bug that has been seen before in lwip?



Thanks,
Greg Dunn
