This article is an original work by freas_1990. Please credit the source when reposting: http://blog.csdn.net/freas_1990/article/details/18999825
As everyone knows, no TCP/IP topic is examined more often than the three-way handshake and the four-way teardown.
When the server receives a SYN packet, the three-way handshake begins and the tcp_sock's state changes to SYN_RECV. That SYN is the first step of the handshake; the server then sends a SYN+ACK packet back to the client and waits for the final ACK that completes the handshake.
But have you ever wondered how this tcp_sock is constructed? What do the socket()/bind()/listen()/accept() system calls we know from socket programming have to do with the three-way handshake? How do the two correspond?
The simplified flow is as follows:
1. The server calls socket(). This system call mainly does two things:
A. sys_socket() in the Linux kernel creates the socket descriptor.
B. inet_create() calls sk_alloc() to create a transport control block.
2. The request_sock_queue inside inet_connection_sock holds the transport control blocks for connections that are still being established, as well as for connections that are established but have not yet been accepted. The linked list formed by rskq_accept_head and rskq_accept_tail in request_sock_queue holds the request blocks of completed connections, while the syn_table hash table (not a linked list) in listen_sock holds the request blocks of connections still mid-handshake, chained together through dl_next.
3. Once the server has called the listen system call, it can receive new connections. When a client sends a SYN segment to the server, a connection request block is created for that request; once it is set up, the server sends back a SYN+ACK segment. When the server then receives the client's ACK segment, it creates the TCP transport control block (for the TCP protocol) and attaches this tcp_sock to the sk member of the tcp_request_sock. At the same time, the now-completed connection request block is moved onto the rskq_accept_head queue, where it waits for the server's accept call.
4. The accept system call takes a request transport control block off the rskq_accept_head queue, associates it with a socket, and then frees the connection request block (kfree).
The core of the implementation is as follows:
```c
/* This is not only more efficient than what we used to do, it eliminates
 * a lot of code duplication between IPv4/IPv6 SYN recv processing. -DaveM
 *
 * Actually, we could lots of memory writes here. tp of listening
 * socket contains all necessary default parameters.
 */
struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
				      struct sk_buff *skb)
{
	struct sock *newsk = inet_csk_clone(sk, req, GFP_ATOMIC);

	if (newsk != NULL) {
		const struct inet_request_sock *ireq = inet_rsk(req);
		struct tcp_request_sock *treq = tcp_rsk(req);
		struct inet_connection_sock *newicsk = inet_csk(sk);
		struct tcp_sock *newtp;

		/* Now setup tcp_sock */
		newtp = tcp_sk(newsk);
		newtp->pred_flags = 0;
		newtp->rcv_nxt = treq->rcv_isn + 1;
		newtp->snd_nxt = newtp->snd_una = newtp->snd_sml = treq->snt_isn + 1;

		tcp_prequeue_init(newtp);

		tcp_init_wl(newtp, treq->snt_isn, treq->rcv_isn);

		newtp->srtt = 0;
		newtp->mdev = TCP_TIMEOUT_INIT;
		newicsk->icsk_rto = TCP_TIMEOUT_INIT;

		newtp->packets_out = 0;
		newtp->left_out = 0;
		newtp->retrans_out = 0;
		newtp->sacked_out = 0;
		newtp->fackets_out = 0;
		newtp->snd_ssthresh = 0x7fffffff;

		/* So many TCP implementations out there (incorrectly) count the
		 * initial SYN frame in their delayed-ACK and congestion control
		 * algorithms that we must have the following bandaid to talk
		 * efficiently to them.  -DaveM
		 */
		newtp->snd_cwnd = 2;
		newtp->snd_cwnd_cnt = 0;
		newtp->bytes_acked = 0;

		newtp->frto_counter = 0;
		newtp->frto_highmark = 0;

		newicsk->icsk_ca_ops = &tcp_init_congestion_ops;

		tcp_set_ca_state(newsk, TCP_CA_Open);
		tcp_init_xmit_timers(newsk);
		skb_queue_head_init(&newtp->out_of_order_queue);
		newtp->rcv_wup = treq->rcv_isn + 1;
		newtp->write_seq = treq->snt_isn + 1;
		newtp->pushed_seq = newtp->write_seq;
		newtp->copied_seq = treq->rcv_isn + 1;

		newtp->rx_opt.saw_tstamp = 0;

		newtp->rx_opt.dsack = 0;
		newtp->rx_opt.eff_sacks = 0;

		newtp->rx_opt.num_sacks = 0;
		newtp->urg_data = 0;

		if (sock_flag(newsk, SOCK_KEEPOPEN))
			inet_csk_reset_keepalive_timer(newsk,
						       keepalive_time_when(newtp));

		newtp->rx_opt.tstamp_ok = ireq->tstamp_ok;
		if ((newtp->rx_opt.sack_ok = ireq->sack_ok) != 0) {
			if (sysctl_tcp_fack)
				newtp->rx_opt.sack_ok |= 2;
		}
		newtp->window_clamp = req->window_clamp;
		newtp->rcv_ssthresh = req->rcv_wnd;
		newtp->rcv_wnd = req->rcv_wnd;
		newtp->rx_opt.wscale_ok = ireq->wscale_ok;
		if (newtp->rx_opt.wscale_ok) {
			newtp->rx_opt.snd_wscale = ireq->snd_wscale;
			newtp->rx_opt.rcv_wscale = ireq->rcv_wscale;
		} else {
			newtp->rx_opt.snd_wscale = newtp->rx_opt.rcv_wscale = 0;
			newtp->window_clamp = min(newtp->window_clamp, 65535U);
		}
		newtp->snd_wnd = ntohs(skb->h.th->window) << newtp->rx_opt.snd_wscale;
		newtp->max_window = newtp->snd_wnd;

		if (newtp->rx_opt.tstamp_ok) {
			newtp->rx_opt.ts_recent = req->ts_recent;
			newtp->rx_opt.ts_recent_stamp = xtime.tv_sec;
			newtp->tcp_header_len = sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED;
		} else {
			newtp->rx_opt.ts_recent_stamp = 0;
			newtp->tcp_header_len = sizeof(struct tcphdr);
		}
#ifdef CONFIG_TCP_MD5SIG
		newtp->md5sig_info = NULL;	/*XXX*/
		if (newtp->af_specific->md5_lookup(sk, newsk))
			newtp->tcp_header_len += TCPOLEN_MD5SIG_ALIGNED;
#endif
		if (skb->len >= TCP_MIN_RCVMSS + newtp->tcp_header_len)
			newicsk->icsk_ack.last_seg_size = skb->len - newtp->tcp_header_len;
		newtp->rx_opt.mss_clamp = req->mss;
		TCP_ECN_openreq_child(newtp, req);

		TCP_INC_STATS_BH(TCP_MIB_PASSIVEOPENS);
	}
	return newsk;
}
```