Author: gfree.wind@gmail.com
Blog: blog.focus-linux.net   linuxfocus.blog.chinaunix.net
 
 
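The copyleft of this article belongs to gfree.wind@gmail.com. It is released under the GPL and may be freely copied and reposted. When reposting, please keep the document intact and credit the original author and the original link. Any commercial use is strictly prohibited.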
======================================================================================================
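Yesterday I wrote a post on how TCP connection setup is implemented, focusing on how the first SYN packet is handled. There was quite a lot in that code I did not fully understand; I only walked through the flow. A bit embarrassing.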

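Today I pick up where yesterday's walkthrough left off. After the SYN+ACK has been sent, the newly created request_sock has been added to the parent socket's icsk_accept_queue (more precisely, to the listen_opt SYN table inside it). Ignoring error cases such as retransmissions, the next question is how the final ACK of the TCP three-way handshake is handled.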
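As a rough mental model of the two stages a passive connection goes through (this is not the kernel's struct request_sock_queue; every name below is hypothetical), a listener can be pictured as holding a table of half-open requests, which gains an entry when the SYN is answered, and a separate queue of established children waiting for accept(), which gains an entry only when the final ACK is accepted. The sketch also models the accept-queue-full case, loosely following the tcp_abort_on_overflow choice made at the listen_overflow label in tcp_check_req: either keep the request and ignore this ACK, or drop the request and reset the peer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_MAX 8

/* Hypothetical model of a listener; not struct request_sock_queue. */
struct listener_model {
    int half_open[MODEL_QUEUE_MAX]; /* requests that were sent SYN+ACK */
    size_t n_half_open;
    int accepted[MODEL_QUEUE_MAX];  /* children waiting for accept() */
    size_t n_accepted;
    size_t accept_backlog;          /* models the listen() backlog limit */
    bool abort_on_overflow;         /* models the tcp_abort_on_overflow sysctl */
};

enum ack_result { ACK_ESTABLISHED, ACK_IGNORED, ACK_RESET, ACK_NO_REQUEST };

/* SYN received and SYN+ACK sent: remember the half-open request. */
bool model_on_syn(struct listener_model *l, int req_id)
{
    if (l->n_half_open == MODEL_QUEUE_MAX)
        return false;
    l->half_open[l->n_half_open++] = req_id;
    return true;
}

static void remove_half_open(struct listener_model *l, size_t idx)
{
    assert(idx < l->n_half_open);
    l->half_open[idx] = l->half_open[--l->n_half_open]; /* order not kept */
}

/* Final ACK received: promote the matching request to the accept queue. */
enum ack_result model_on_final_ack(struct listener_model *l, int req_id)
{
    for (size_t i = 0; i < l->n_half_open; i++) {
        if (l->half_open[i] != req_id)
            continue;
        if (l->n_accepted >= l->accept_backlog ||
            l->n_accepted == MODEL_QUEUE_MAX) {
            if (!l->abort_on_overflow)
                return ACK_IGNORED;  /* keep the request, ignore this ACK */
            remove_half_open(l, i);
            return ACK_RESET;        /* forget the request, peer gets RST */
        }
        remove_half_open(l, i);
        l->accepted[l->n_accepted++] = req_id;
        return ACK_ESTABLISHED;
    }
    return ACK_NO_REQUEST;
}
```

With an accept backlog of 1, the first completed handshake is promoted; a second one is either left half-open or reset, depending on abort_on_overflow.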

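Following the same path as in the previous article, the final ACK arrives in tcp_v4_do_rcv. At this point the lookup still matches the parent socket, i.e. the socket in the LISTEN state, so tcp_v4_hnd_req is called again. The previous article did not analyze this function; it only noted that for the first SYN it returns the sk argument it was given.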
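For a listening socket, tcp_v4_do_rcv treats the pointer returned by tcp_v4_hnd_req in one of three ways: NULL means the segment is dropped, the listener itself means normal listen-state processing continues (the first-SYN case), and any other socket is a new child to be handed to tcp_child_process. The small user-space model below only captures that three-way branch; struct sock_model, dispatch_listen and the enum are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock; only what the model needs. */
struct sock_model {
    const char *name;
};

enum listen_action {
    DROP_SEGMENT,        /* lookup returned NULL */
    PROCESS_ON_LISTENER, /* lookup returned the listener itself (first SYN) */
    PROCESS_ON_CHILD     /* lookup returned a new child (final ACK) */
};

/* Classify the pointer returned by a tcp_v4_hnd_req-like lookup. */
enum listen_action dispatch_listen(const struct sock_model *listener,
                                   const struct sock_model *nsk)
{
    assert(listener != NULL);
    if (nsk == NULL)
        return DROP_SEGMENT;
    if (nsk == listener)
        return PROCESS_ON_LISTENER;
    return PROCESS_ON_CHILD;
}
```

Returning the listener itself as a "nothing special" signal lets one function cover all three outcomes without an extra status parameter.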

Here is the code of tcp_v4_hnd_req:
    static struct sock *tcp_v4_hnd_req(struct sock *sk, struct sk_buff *skb)
    {
        struct tcphdr *th = tcp_hdr(skb);
        const struct iphdr *iph = ip_hdr(skb);
        struct sock *nsk;
        struct request_sock **prev;
        /* Find possible connection requests. */
        /*
         * When the SYN was handled, the matching request_sock was added to
         * listen_opt inside icsk_accept_queue, so this lookup finds req.
         * Note that the function also returns prev, which points to the link
         * that references req in its hash chain (the previous element's
         * dl_next, or the bucket head). Returning it lets tcp_check_req
         * remove req later without searching for it a second time.
         */
        struct request_sock *req = inet_csk_search_req(sk, &prev, th->source,
                                                       iph->saddr, iph->daddr);
        if (req)
            return tcp_check_req(sk, skb, req, prev);
        /* ...... ...... */
    }
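Why returning prev saves a second lookup can be shown with a small singly linked hash chain. This is a simplified sketch, not the kernel's listen_opt layout: struct req_model, chain_search and chain_unlink are hypothetical names, and the key is reduced to the (remote port, remote address, local address) triple that tcp_v4_hnd_req passes to inet_csk_search_req. The search hands back the address of the link that points at the match, so unlinking is a single pointer assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified request entry on one hash chain. */
struct req_model {
    uint16_t rmt_port;
    uint32_t rmt_addr;
    uint32_t loc_addr;
    struct req_model *dl_next;
};

/* Walk the chain; on a match, store the address of the link that points
 * at it (either the chain head or the previous entry's dl_next). */
struct req_model *chain_search(struct req_model **head,
                               struct req_model ***prevp,
                               uint16_t rport, uint32_t raddr, uint32_t laddr)
{
    struct req_model **prev;
    struct req_model *req;

    for (prev = head; (req = *prev) != NULL; prev = &req->dl_next) {
        if (req->rmt_port == rport && req->rmt_addr == raddr &&
            req->loc_addr == laddr) {
            *prevp = prev;
            return req;
        }
    }
    return NULL;
}

/* O(1) unlink using the link found by chain_search: no second walk. */
void chain_unlink(struct req_model **prev, struct req_model *req)
{
    assert(*prev == req);
    *prev = req->dl_next;
    req->dl_next = NULL;
}
```

Storing a pointer to the link rather than to the previous node also removes the special case for the first element: when the match is at the head, prev simply points at the head pointer.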
Next, tcp_check_req:
    struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
                               struct request_sock *req,
                               struct request_sock **prev)
    {
        struct tcp_options_received tmp_opt;
        const u8 *hash_location;
        struct sock *child;
        const struct tcphdr *th = tcp_hdr(skb);
        __be32 flg = tcp_flag_word(th) & (TCP_FLAG_RST|TCP_FLAG_SYN|TCP_FLAG_ACK);
        int paws_reject = 0;

        // saw_tstamp must be reset every time, because it describes the current segment only
        tmp_opt.saw_tstamp = 0;
        if (th->doff > (sizeof(struct tcphdr)>>2)) {
            // The TCP header carries options; parse them
            tcp_parse_options(skb, &tmp_opt, &hash_location, 0);
            /*
             * If the TCP header carries the Timestamp option, it serves
             * two purposes:
             * 1. RTT measurement
             * 2. PAWS, i.e. Protection Against Wrapped Sequences
             * See RFC 1323.
             */
            if (tmp_opt.saw_tstamp) {
                // This is the PAWS check
                tmp_opt.ts_recent = req->ts_recent;
                /* We do not store true stamp, but it is not required,
                 * it can be estimated (approximately)
                 * from another data.
                 */
                tmp_opt.ts_recent_stamp = get_seconds() - ((TCP_TIMEOUT_INIT/HZ)<<req->retrans);
                paws_reject = tcp_paws_reject(&tmp_opt, th->rst);
            }
        }

        /* Check for pure retransmitted SYN. */
        if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn &&
            flg == TCP_FLAG_SYN &&
            !paws_reject) {
            /* A retransmitted SYN: its sequence number equals the original
             * ISN, so simply resend the SYN+ACK. */
            /*
             * RFC793 draws ( It was fixed in RFC1122)
             * this case on figure 6 and figure 8, but formal
             * protocol description says NOTHING.
             * To be more exact, it says that we should send ACK,
             * because this segment (at least, if it has no data)
             * is out of window.
             *
             * CONCLUSION: RFC793 (even with RFC1122) DOES NOT
             * describe SYN-RECV state. All the description
             * is wrong, we cannot believe to it and should
             * rely only on common sense and implementation
             * experience.
             *
             * Enforce "SYN-ACK" according to figure 8, figure 6
             * of RFC793, fixed by RFC1122.
             */
            req->rsk_ops->rtx_syn_ack(sk, req, NULL);
            return NULL;
        }

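        // A long series of further checks is omitted here. Interested readers
        // can read the source directly; its comments are clear.
        /* ...... ...... */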

        /* OK, ACK is valid, create big socket and
         * feed this segment to it. It will repeat all
         * the tests. THIS SEGMENT MUST MOVE SOCKET TO
         * ESTABLISHED STATE. If it will be dropped after
         * socket is created, wait for troubles.
         */
        /*
         * For TCP over IPv4, syn_recv_sock is tcp_v4_syn_recv_sock. Its
         * details are skipped here; its main job is to build a new socket
         * from the information in sk, skb and req. It returns NULL on
         * failure, for example when the accept queue is already full,
         * hence the jump to listen_overflow.
         */
        child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL);
        if (child == NULL)
            goto listen_overflow;
        // Use prev to unlink req from listen_opt in the accept queue,
        // then update the listen queue's length counters
        inet_csk_reqsk_queue_unlink(sk, req, prev);
        inet_csk_reqsk_queue_removed(sk, req);

        // Now add req, together with the new socket child, to the parent
        // socket's accept queue proper. Do not confuse this with
        // inet_csk_reqsk_queue_hash_add from the previous article, which
        // adds a request_sock to the listen (SYN) queue.
        inet_csk_reqsk_queue_add(sk, req, child);
        // Return the newly created socket child
        return child;

    listen_overflow:
        if (!sysctl_tcp_abort_on_overflow) {
            inet_rsk(req)->acked = 1;
            return NULL;
        }

    embryonic_reset:
        NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_EMBRYONICRSTS);
        if (!(flg & TCP_FLAG_RST))
            req->rsk_ops->send_reset(sk, skb);

        inet_csk_reqsk_queue_drop(sk, req, prev);
        return NULL;
    }
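The PAWS check in tcp_check_req (tcp_paws_reject) comes down to comparing 32-bit timestamps modulo 2^32, as described in RFC 1323 (since superseded by RFC 7323). The sketch below is a simplified model, not the kernel's tcp_paws_check/tcp_paws_reject: paws_model_reject and its parameters are hypothetical, it omits the kernel's relaxation for RST segments and its special handling of a zero ts_recent, and it assumes the RFC's 24-day idle limit after which ts_recent is treated as stale. It also suggests why an estimated ts_recent_stamp is acceptable in tcp_check_req: the stamp only feeds age checks like this one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAWS_IDLE_LIMIT_SEC (60L * 60 * 24 * 24) /* 24 days, per RFC 1323 */

/* True if tsval is strictly older than ts_recent in modulo-2^32 order. */
static bool ts_before(uint32_t tsval, uint32_t ts_recent)
{
    return (int32_t)(tsval - ts_recent) < 0;
}

/* Simplified PAWS decision (hypothetical helper, not the kernel's).
 * ts_recent:       last timestamp accepted from the peer
 * ts_recent_stamp: local time (seconds) when ts_recent was recorded
 * now:             current local time (seconds)
 * tsval:           timestamp carried by the incoming segment */
bool paws_model_reject(uint32_t ts_recent, long ts_recent_stamp,
                       long now, uint32_t tsval)
{
    assert(now >= ts_recent_stamp);
    if (now - ts_recent_stamp >= PAWS_IDLE_LIMIT_SEC)
        return false;              /* ts_recent too old to trust */
    return ts_before(tsval, ts_recent);
}
```

The signed cast of the difference is what makes the comparison survive wraparound: a tsval of 0x10 counts as newer than a ts_recent of 0xFFFFFFF0, even though it is numerically smaller.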
So for the final ACK, tcp_v4_hnd_req returns the new socket created in the function above, and tcp_v4_do_rcv therefore calls tcp_child_process:
    int tcp_child_process(struct sock *parent, struct sock *child,
                          struct sk_buff *skb)
    {
        int ret = 0;
        int state = child->sk_state;

        /* Again this check keeps the TCP state consistent: the segment is
         * processed here only if no user context holds the child socket;
         * otherwise it is deferred to the child's backlog. */
        if (!sock_owned_by_user(child)) {
            ret = tcp_rcv_state_process(child, skb, tcp_hdr(skb),
                                        skb->len);
            /* Wakeup parent, send SIGIO */
            if (state == TCP_SYN_RECV && child->sk_state != state)
                parent->sk_data_ready(parent, 0);
        } else {
            /* Alas, it is possible again, because we do lookup
             * in main socket hash table and lock on listening
             * socket does not protect us more.
             */
            __sk_add_backlog(child, skb);
        }

        bh_unlock_sock(child);
        sock_put(child);
        return ret;
    }
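The branch in tcp_child_process follows a general kernel pattern. If no process context owns the socket, the segment is processed immediately. Otherwise it goes onto the socket's backlog and is processed when the owner releases the socket (release_sock). The sketch below models only that decision, plus the rule that the listener is woken when the child leaves SYN_RECV. All names are hypothetical; there is no locking and no real TCP state machine, and the fixed-size backlog is a simplification of this model only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { M_SYN_RECV, M_ESTABLISHED };

#define BACKLOG_MAX 4

struct child_model {
    enum model_state state;
    bool owned_by_user;        /* models sock_owned_by_user() */
    int backlog[BACKLOG_MAX];  /* deferred segment ids */
    size_t n_backlog;
};

struct parent_model {
    int wakeups;               /* counts sk_data_ready()-style wakeups */
};

/* Hypothetical state processing: a valid final ACK completes the handshake. */
static void process_segment(struct child_model *c, int seg)
{
    (void)seg;
    if (c->state == M_SYN_RECV)
        c->state = M_ESTABLISHED;
}

void child_process(struct parent_model *p, struct child_model *c, int seg)
{
    enum model_state before = c->state;

    if (!c->owned_by_user) {
        process_segment(c, seg);
        if (before == M_SYN_RECV && c->state != before)
            p->wakeups++;          /* let accept() on the listener proceed */
        return;
    }
    assert(c->n_backlog < BACKLOG_MAX); /* bounded only in this model */
    c->backlog[c->n_backlog++] = seg;   /* handled when the owner releases it */
}
```

Deferring instead of blocking matters because this path runs in softirq context, which must not sleep waiting for the user to release the socket.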
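I'm getting sleepy, and I still haven't finished going through how the final ACK of the three-way handshake is handled. I'll stop here, since I'm no longer reading efficiently, and continue tomorrow. Once the passive-open side of the handshake is done, I'll also look at the active-open flow.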