linux - TCP connections fail randomly under high load

Our application uses non-blocking sockets with connect and select (C code). The pseudocode is as follows:

int ConnectToServer(struct sockaddr_in *pSelfAddr, struct sockaddr_in *pDestAddr)
{
    /* INVALID_SOCKET, closesocket() and writeLog() are not defined in this excerpt. */
    int sktConnect = -1;

    sktConnect = socket(AF_INET, SOCK_STREAM, 0);
    if (sktConnect == INVALID_SOCKET)
        return -1;

    /* Switch the socket to non-blocking mode. */
    fcntl(sktConnect, F_SETFL, fcntl(sktConnect, F_GETFL) | O_NONBLOCK);

    /* Optionally bind to a specific local address. */
    if (pSelfAddr != 0)
    {
        if (bind(sktConnect, (const struct sockaddr*)(void *)pSelfAddr, sizeof(*pSelfAddr)) != 0)
        {
            closesocket(sktConnect);
            return -1;
        }
    }

    errno = 0;
    int nRc = connect(sktConnect, (const struct sockaddr*)(void *)pDestAddr, sizeof(*pDestAddr));
    if (nRc != -1)
    {
        return sktConnect;   /* connected immediately */
    }

    if (errno != EINPROGRESS)
    {
        int savedError = errno;
        closesocket(sktConnect);
        return -1;
    }

    /* Wait up to 2 seconds for the socket to become writable. */
    fd_set scanSet;
    FD_ZERO(&scanSet);
    FD_SET(sktConnect, &scanSet);

    struct timeval waitTime;
    waitTime.tv_sec = 2;
    waitTime.tv_usec = 0;

    int tmp;
    tmp = select(sktConnect + 1, (fd_set*)0, &scanSet, (fd_set*)0, &waitTime);
    if (tmp == -1 || !FD_ISSET(sktConnect, &scanSet))
    {
        /* select() returns 0 on a timeout without reporting an error, so errno
           here is typically still EINPROGRESS (115) from connect(). */
        int savedErrorNo = errno;
        writeLog("Connect %s failed after select, cause %d, error %s", inet_ntoa(pDestAddr->sin_addr), savedErrorNo, strerror(savedErrorNo));
        closesocket(sktConnect);
        return -1;
    }

    . . . . .
}

There are 80 such nodes, and the application connects to all of its peer nodes in round-robin fashion.

During this phase, some nodes fail to connect (API: connect followed by select), with error number 115.
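Error 115 on Linux is EINPROGRESS, the value connect() sets when a non-blocking connection has been started but not yet finished. Because select() signals a timeout by returning 0 rather than by setting errno, reading errno after a timed-out select() most likely just returns that earlier value, which would explain the "Operation now in progress" text in the log. The following is a minimal sketch, not the original application's code, of a connect routine that reports a timeout as ETIMEDOUT and otherwise reads the real result from SO_ERROR. The names connect_with_timeout and fail_and_close are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close fd without clobbering the errno that describes the real failure. */
static int fail_and_close(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper, not the original ConnectToServer: returns a connected
 * socket, or -1 with errno describing the failure (ETIMEDOUT on timeout). */
int connect_with_timeout(const struct sockaddr_in *dest, int timeout_sec)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    /* FD_SET is undefined for descriptors at or above FD_SETSIZE. */
    if (fd >= FD_SETSIZE) {
        errno = EMFILE;
        return fail_and_close(fd);
    }

    int flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return fail_and_close(fd);

    if (connect(fd, (const struct sockaddr *)dest, sizeof(*dest)) == 0)
        return fd;                      /* connected immediately */
    if (errno != EINPROGRESS)
        return fail_and_close(fd);      /* immediate failure, errno is valid */

    fd_set wset;
    FD_ZERO(&wset);
    FD_SET(fd, &wset);
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

    int rc = select(fd + 1, NULL, &wset, NULL, &tv);
    if (rc == -1)
        return fail_and_close(fd);      /* select itself failed */
    if (rc == 0) {
        errno = ETIMEDOUT;              /* timeout: replace the stale EINPROGRESS */
        return fail_and_close(fd);
    }

    /* Writable means the handshake finished; SO_ERROR says how it ended. */
    int so_error = 0;
    socklen_t len = sizeof(so_error);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) == -1)
        return fail_and_close(fd);
    if (so_error != 0) {
        errno = so_error;
        return fail_and_close(fd);
    }
    return fd;
}
```

A caller would log strerror(errno) when the function returns -1, which separates a genuine timeout from a refused or unreachable peer.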

In the tcpdump output below, the success scenario shows the full handshake (SYN, SYN+ACK, ACK), but for the failed node not even a SYN appears in the tcpdump logs.

The tcpdump log is:

387937 2012-07-05 07:45:30.646514 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [SYN] Seq=0 Ack=0 Win=5792 Len=0 MSS=1460 TSV=1414450402 TSER=912308224 WS=8

387947 2012-07-05 07:45:30.780762 10.137.165.136 10.18.92.173 TCP 8441 > 33728 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=912309754 TSER=1414450402 WS=8

387948 2012-07-05 07:45:30.780773 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=1414450435 TSER=912309754

The three packets above show the successful handshake.

387949 2012-07-05 07:45:30.782652 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [PSH, ACK] Seq=1 Ack=1 Win=5888 Len=320 TSV=1414450436 TSER=912309754

387967 2012-07-05 07:45:30.915615 10.137.165.136 10.18.92.173 TCP 8441 > 33728 [ACK] Seq=1 Ack=321 Win=6912 Len=0 TSV=912309788 TSER=1414450436

388011 2012-07-05 07:45:31.362712 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [PSH, ACK] Seq=321 Ack=1 Win=5888 Len=320 TSV=1414450581 TSER=912309788

388055 2012-07-05 07:45:31.495558 10.137.165.136 10.18.92.173 TCP 8441 > 33728 [ACK] Seq=1 Ack=641 Win=7936 Len=0 TSV=912309933 TSER=1414450581

388080 2012-07-05 07:45:31.702336 10.137.165.136 10.18.92.173 TCP 8441 > 33728 [PSH, ACK] Seq=1 Ack=641 Win=7936 Len=712 TSV=912309985 TSER=1414450581

388081 2012-07-05 07:45:31.702350 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [ACK] Seq=641 Ack=713 Win=7424 Len=0 TSV=1414450666 TSER=912309985

388142 2012-07-05 07:45:32.185612 10.137.165.136 10.18.92.173 TCP 8441 > 33728 [PSH, ACK] Seq=713 Ack=641 Win=7936 Len=320 TSV=912310106 TSER=1414450666

388143 2012-07-05 07:45:32.185629 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [ACK] Seq=641 Ack=1033 Win=8704 Len=0 TSV=1414450786 TSER=912310106

388169 2012-07-05 07:45:32.362622 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [PSH, ACK] Seq=641 Ack=1033 Win=8704 Len=320 TSV=1414450831 TSER=912310106

388212 2012-07-05 07:45:32.494833 10.137.165.136 10.18.92.173 TCP 8441 > 33728 [ACK] Seq=1033 Ack=961 Win=9216 Len=0 TSV=912310183 TSER=1414450831

388219 2012-07-05 07:45:32.501613 10.137.165.136 10.18.92.173 TCP 8441 > 33728 [PSH, ACK] Seq=1033 Ack=961 Win=9216 Len=356 TSV=912310185 TSER=1414450831

388220 2012-07-05 07:45:32.501624 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [ACK] Seq=961 Ack=1389 Win=10240 Len=0 TSV=1414450865 TSER=912310185

The application log reports the connection error (API: connect followed by select):

[5258: 2012-07-05 07:45:30]Connect [10.137.165.136

[5258: 2012-07-05 07:45:32]Connect 10.137.165.137 fail after select, cause:115, error Operation now in progress. Check whether remote machine exist and the network is normal or not.

[5258: 2012-07-05 07:45:32]Connect to server([10.137.165.137

The success log corresponds to the first 3 entries of the tcpdump, while the failure log has no corresponding event in the tcpdump at all.

My question is: when the client calls connect in the failed case, I cannot see any event in tcpdump on the client side (not even the initial SYN). What could be the reason for this randomness?
