TCP/IP Driver, Part 11: Deriving the inet_csk and inet_sk Functions in Kernel 2.6.26


http://fpcfjf.blog.163.com/blog/static/55469793201372033430691/ 


While the kernel handles sys_bind (sys_bind() -> inet_bind() -> inet_csk_get_port()), the function int inet_csk_get_port(struct sock *sk, unsigned short snum) contains the following code:

net\ipv4\inet_connection_sock.c

……..

                            ret = 1;

                            if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb)) {

                                     if (sk->sk_reuse && sk->sk_state != TCP_LISTEN &&

                                         smallest_size != -1 && --attempts >= 0) {

                                               spin_unlock(&head->lock);

                                               goto again;

                                     }

 

…….

success:

         if (!inet_csk(sk)->icsk_bind_hash)

                   inet_bind_hash(sk, tb, snum);

……

Net\ipv4\inet_hashtables.c

void inet_bind_hash(struct sock *sk, struct inet_bind_bucket *tb,

                       const unsigned short snum)

{

         struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;

 

         atomic_inc(&hashinfo->bsockets);

 

         inet_sk(sk)->num = snum;

         sk_add_bind_node(sk, &tb->owners);

         tb->num_owners++;

         inet_csk(sk)->icsk_bind_hash = tb;

}

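The code above is really just there to introduce the two functions below. They are very simple: each just casts the sock pointer sk to one of two struct types, inet_sock and inet_connection_sock. But did you notice that when the socket was first created, the only allocation was sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot); (in the inet_create() function in net\ipv4\af_inet.c)? In other words, only an object of type struct sock *sk was created. So why is it valid, after the cast, to directly access the fields of those two larger types? That is where today's question starts. First, look at the two casting functions.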

Include\net\inet_sock.h:

static inline struct inet_sock *inet_sk(const struct sock *sk)

{

         return (struct inet_sock *)sk;

}

Include\net\inet_connection_sock.h:

static inline struct inet_connection_sock *inet_csk(const struct sock *sk)

{

         return (struct inet_connection_sock *)sk;

}

Now look at a few structs. To keep things simple, they are not copied in full:

struct tcp_sock {

         struct inet_connection_sock     inet_conn; //inet_connection_sock has to be the first member of tcp_sock

         ...

};

inet_connection_sock - INET connection oriented sock

struct inet_connection_sock {

         struct inet_sock           icsk_inet; //inet_sock has to be the first member!

 

         ...

};

struct inet_sock - representation of INET sockets

struct inet_sock {

         struct sock             sk; //       sk and pinet6 has to be the first two members of inet_sock ...

};

 

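Be sure to read the English comments: the first member is mandatory and must not be moved, which is the precondition for the cast. Now, given that the cast is legal, does that mean the result can actually be used? Notice that, reading from top to bottom, each struct contains the next one, so sock is the smallest. Yet the code shown earlier appears to allocate only an object of that smallest type. Since the cast pointer evidently can be used, this brings to mind the C/C++ idiom of allocating a block sized for the largest struct and then casting the pointer to whichever type is needed. In C++ the parent/child class relationship gives a more vivid picture: allocate a child-class object, cast the pointer to the parent class, and access only the parent's part through it. The conclusion is then obvious:

The allocated memory must be larger than struct sock. To confirm this, we need to go back to the socket creation process.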
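The first-member rule can be checked in user space. The following is a minimal sketch, not kernel code: the struct names with the _model suffix, their fields, and the helper functions are all hypothetical stand-ins. It allocates the outermost type, hands out a pointer to the innermost one, and then casts back up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structs; all fields are invented. */
struct sock_model      { int sk_state; };
struct inet_sock_model { struct sock_model sk; unsigned short num; };
struct icsk_model      { struct inet_sock_model icsk_inet; void *icsk_bind_hash; };
struct tcp_sock_model  { struct icsk_model inet_conn; unsigned int snd_cwnd; };

/* The casts below are only valid because each nested struct sits at offset 0. */
_Static_assert(offsetof(struct inet_sock_model, sk) == 0, "sk must be first");
_Static_assert(offsetof(struct icsk_model, icsk_inet) == 0, "icsk_inet must be first");
_Static_assert(offsetof(struct tcp_sock_model, inet_conn) == 0, "inet_conn must be first");

/* User-space analogues of inet_sk() and inet_csk(). */
struct inet_sock_model *to_inet_sk(struct sock_model *sk)
{
    return (struct inet_sock_model *)sk;
}

struct icsk_model *to_inet_csk(struct sock_model *sk)
{
    return (struct icsk_model *)sk;
}

/* Allocate the largest type, but return a pointer typed as the smallest. */
struct sock_model *alloc_as_sock(void)
{
    struct tcp_sock_model *tp = calloc(1, sizeof(*tp));
    return tp ? &tp->inet_conn.icsk_inet.sk : NULL;
}

/* Returns 1 if writes made through the cast pointers land in the tcp object. */
int layout_demo(void)
{
    struct sock_model *sk = alloc_as_sock();
    if (sk == NULL)
        return -1;

    to_inet_sk(sk)->num = 8080;
    to_inet_csk(sk)->icsk_bind_hash = sk;

    struct tcp_sock_model *tp = (struct tcp_sock_model *)sk;
    assert((void *)tp == (void *)sk);   /* same address at every level */

    int ok = tp->inet_conn.icsk_inet.num == 8080 &&
             tp->inet_conn.icsk_bind_hash == (void *)sk;
    free(tp);
    return ok;
}
```

Had alloc_as_sock() allocated only sizeof(struct sock_model), the same casts would still compile but the writes would run past the end of the block, which is exactly why the allocation size matters.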

So let's jump into the function that really does the allocation:

sk = sk_alloc(net, PF_INET, GFP_KERNEL, answer_prot); -> sk = sk_prot_alloc(prot, priority | __GFP_ZERO, family);

net\core\sock.c

static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,

                   int family)

{

         struct sock *sk;

         struct kmem_cache *slab;

 

         slab = prot->slab;

         if (slab != NULL)

                   sk = kmem_cache_alloc(slab, priority);  /* (1) */

         else

                   sk = kmalloc(prot->obj_size, priority);    /* (2) */

 

         if (sk != NULL) {

                   if (security_sk_alloc(sk, family, priority))

                            goto out_free;

 

                   if (!try_module_get(prot->owner))

                            goto out_free_sec;

         }

 

         return sk;

 

out_free_sec:

         security_sk_free(sk);

out_free:

         if (slab != NULL)

                   kmem_cache_free(slab, sk);

         else

                   kfree(sk);

         return NULL;

}

Look at the two parts marked (1) and (2) (highlighted in red in the original). Part (2), prot->obj_size, is the simpler one:

struct proto tcp_prot = {

         .name                         = "TCP",

.init                     = tcp_v4_init_sock,

…..

         .obj_size          = sizeof(struct tcp_sock),

…..

         .compat_getsockopt        = compat_tcp_getsockopt,

#endif

};

Clearly, what gets allocated is the size of tcp_sock.
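The idea behind path (2) can be sketched in user space as follows. This is only an illustration under stated assumptions: the _m types, their fields, and sk_alloc_m() are hypothetical, and calloc stands in for kmalloc with __GFP_ZERO. The point is that the generic allocator knows nothing about TCP or UDP; it only reads obj_size from the protocol descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct proto_m;

/* Hypothetical stand-ins; only the nesting mirrors the kernel. */
struct sock_m      { const struct proto_m *sk_prot; int sk_family; };
struct inet_sock_m { struct sock_m sk; unsigned short num; };
struct udp_sock_m  { struct inet_sock_m inet; int pending; };
struct tcp_sock_m  { struct inet_sock_m inet; unsigned int snd_cwnd; char opts[64]; };

/* Protocol descriptor: each protocol declares how big its socket object is. */
struct proto_m {
    const char *name;
    size_t      obj_size;
};

const struct proto_m tcp_prot_m = { .name = "TCP", .obj_size = sizeof(struct tcp_sock_m) };
const struct proto_m udp_prot_m = { .name = "UDP", .obj_size = sizeof(struct udp_sock_m) };

/* Generic allocator: sized by the descriptor, typed as the base struct. */
struct sock_m *sk_alloc_m(const struct proto_m *prot, int family)
{
    /* A protocol object must at least contain the base struct. */
    assert(prot->obj_size >= sizeof(struct sock_m));

    struct sock_m *sk = calloc(1, prot->obj_size);
    if (sk != NULL) {
        sk->sk_prot = prot;
        sk->sk_family = family;
    }
    return sk;
}
```

A caller that passes &tcp_prot_m gets a block of sizeof(struct tcp_sock_m) bytes, so casting the returned pointer to struct tcp_sock_m * and touching snd_cwnd stays inside the allocation.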

Now turn back to part (1), which allocates from a slab cache. Where does its object size come from? No rush; let's trace backwards step by step.

The code uses the slab that belongs to struct proto *prot. So what exactly is this prot? Tracing upward leads into the function mentioned earlier:

static int inet_create(struct net *net, struct socket *sock, int protocol)

{

         struct sock *sk;

         struct list_head *p;

         struct inet_protosw *answer;

         struct inet_sock *inet;

         struct proto *answer_prot;

         ………

 

         /* Look for the requested type/protocol pair. */

         answer = NULL;

lookup_protocol:

         err = -ESOCKTNOSUPPORT;

         rcu_read_lock();

         list_for_each_rcu(p, &inetsw[sock->type]) {

                   answer = list_entry(p, struct inet_protosw, list);

 

                   /* Check the non-wild match. */

                   if (protocol == answer->protocol) {

                            if (protocol != IPPROTO_IP)

                                     break;

                   } else {

                            /* Check for the two wild cases. */

                            if (IPPROTO_IP == protocol) {

                                     protocol = answer->protocol;

                                     break;

                            }

                            if (IPPROTO_IP == answer->protocol)

                                     break;

                   }

                   err = -EPROTONOSUPPORT;

                   answer = NULL;

         }

 

         if (unlikely(answer == NULL)) {

                   if (try_loading_module < 2) {

                            rcu_read_unlock();

                            /*

                             * Be more specific, e.g. net-pf-2-proto-132-type-1

                             * (net-pf-PF_INET-proto-IPPROTO_SCTP-type-SOCK_STREAM)

                             */

                            if (++try_loading_module == 1)

                                     request_module("net-pf-%d-proto-%d-type-%d",

                                                      PF_INET, protocol, sock->type);

                            /*

                             * Fall back to generic, e.g. net-pf-2-proto-132

                             * (net-pf-PF_INET-proto-IPPROTO_SCTP)

                             */

                            else

                                     request_module("net-pf-%d-proto-%d",

                                                      PF_INET, protocol);

                            goto lookup_protocol;

                   } else

                            goto out_rcu_unlock;

         }

……

         sock->ops = answer->ops;

         answer_prot = answer->prot;

}

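As you can see, the prot used for the allocation comes from a field of answer, which naturally leads to the list_for_each_rcu macro and the few lines after it. That loop walks the list in the array element inetsw[sock->type] (the array is defined in net\ipv4\af_inet.c as static struct list_head inetsw[SOCK_MAX];). The role of inetsw is as follows:

The entries of the inet_protosw array are registered into the lists held by the elements of inetsw. This is done in the following function: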

Net\ipv4\af_inet.c

static int __init inet_init(void)

{

         struct sk_buff *dummy_skb;

         struct inet_protosw *q;

         struct list_head *r;

         int rc = -EINVAL;

 

         BUILD_BUG_ON(sizeof(struct inet_skb_parm) > sizeof(dummy_skb->cb));

 

         rc = proto_register(&tcp_prot, 1);

         if (rc)

                   goto out;

 

         rc = proto_register(&udp_prot, 1);

         if (rc)

                   goto out_unregister_tcp_proto;

 

         rc = proto_register(&raw_prot, 1);

         if (rc)

                   goto out_unregister_udp_proto;

 

         /*

          *     Tell SOCKET that we are alive...

          */

 

         (void)sock_register(&inet_family_ops);

 

#ifdef CONFIG_SYSCTL

         ip_static_sysctl_init();

#endif

 

         /*

          *     Add all the base protocols.

          */

 

         if (inet_add_protocol(&icmp_protocol, IPPROTO_ICMP) < 0)

                   printk(KERN_CRIT "inet_init: Cannot add ICMP protocol\n");

         if (inet_add_protocol(&udp_protocol, IPPROTO_UDP) < 0)

                   printk(KERN_CRIT "inet_init: Cannot add UDP protocol\n");

         if (inet_add_protocol(&tcp_protocol, IPPROTO_TCP) < 0)

                   printk(KERN_CRIT "inet_init: Cannot add TCP protocol\n");

#ifdef CONFIG_IP_MULTICAST

         if (inet_add_protocol(&igmp_protocol, IPPROTO_IGMP) < 0)

                   printk(KERN_CRIT "inet_init: Cannot add IGMP protocol\n");

#endif

 

         /* Register the socket-side information for inet_create. */

         for (r = &inetsw[0]; r < &inetsw[SOCK_MAX]; ++r)

                   INIT_LIST_HEAD(r);

 

         for (q = inetsw_array; q < &inetsw_array[INETSW_ARRAY_LEN]; ++q)

                   inet_register_protosw(q);

 

         /*

          *     Set the ARP module up

          */

 

         arp_init();

 

         /*

          *     Set the IP module up

          */

 

         ip_init();

 

         tcp_v4_init();

 

         /* Setup TCP slab cache for open requests. */

         tcp_init();

 

         /* Setup UDP memory threshold */

         udp_init();

 

         /* Add UDP-Lite (RFC 3828) */

         udplite4_register();

 

         /*

          *     Set the ICMP layer up

          */

 

         if (icmp_init() < 0)

                   panic("Failed to create the ICMP control socket.\n");

 

         /*

          *     Initialise the multicast router

          */

#if defined(CONFIG_IP_MROUTE)

         if (ip_mr_init())

                   printk(KERN_CRIT "inet_init: Cannot init ipv4 mroute\n");

#endif

         /*

          *     Initialise per-cpu ipv4 mibs

          */

 

         if (init_ipv4_mibs())

                   printk(KERN_CRIT "inet_init: Cannot init ipv4 mibs\n");

 

         ipv4_proc_init();

 

         ipfrag_init();

 

         dev_add_pack(&ip_packet_type);

 

         rc = 0;

out:

         return rc;

out_unregister_udp_proto:

         proto_unregister(&udp_prot);

out_unregister_tcp_proto:

         proto_unregister(&tcp_prot);

         goto out;

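}

Note the two for loops in the code above (highlighted in red in the original): the first initializes the inetsw array, and the second adds the entries of inetsw_array, shown below, to the corresponding lists in inetsw.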

 

static struct inet_protosw inetsw_array[] =

{

         {

                   .type =       SOCK_STREAM,

                   .protocol =   IPPROTO_TCP,

                   .prot =       &tcp_prot,

                   .ops =        &inet_stream_ops,

                   .capability = -1,

                   .no_check =   0,

                   .flags =      INET_PROTOSW_PERMANENT |

                                  INET_PROTOSW_ICSK,

         },

 

         {

                   .type =       SOCK_DGRAM,

                   .protocol =   IPPROTO_UDP,

                   .prot =       &udp_prot,

                   .ops =        &inet_dgram_ops,

                   .capability = -1,

                   .no_check =   UDP_CSUM_DEFAULT,

                   .flags =      INET_PROTOSW_PERMANENT,

       },

 

 

       {

                .type =       SOCK_RAW,

                .protocol =   IPPROTO_IP,       /* wild card */

                .prot =       &raw_prot,

                .ops =        &inet_sockraw_ops,

                .capability = CAP_NET_RAW,

                .no_check =   UDP_CSUM_DEFAULT,

                .flags =      INET_PROTOSW_REUSE,

       }

};
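The matching rules of the lookup loop in inet_create() are easier to follow when pulled out into a small function. The sketch below is a user-space model only: struct protosw_m, the _M constants, and lookup_protosw() are hypothetical, and a plain array replaces the RCU-protected list for one socket type. The branch structure follows the loop quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol numbers mirroring IPPROTO_IP/TCP/UDP. */
enum { IPPROTO_IP_M = 0, IPPROTO_TCP_M = 6, IPPROTO_UDP_M = 17 };

/* Hypothetical stand-in for one inet_protosw entry. */
struct protosw_m {
    int         protocol;   /* IPPROTO_IP_M acts as a wildcard */
    const char *prot_name;  /* stands in for the .prot pointer */
};

/*
 * Search the entries registered for one socket type.
 * Returns the matching entry or NULL. When the caller passes the
 * wildcard protocol, *protocol is overwritten with the entry's protocol.
 */
const struct protosw_m *lookup_protosw(const struct protosw_m *list,
                                       size_t n, int *protocol)
{
    assert(protocol != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct protosw_m *answer = &list[i];

        if (*protocol == answer->protocol) {
            /* Exact match, but wildcard-vs-wildcard does not count. */
            if (*protocol != IPPROTO_IP_M)
                return answer;
        } else {
            /* Wild case 1: the caller asked for "any protocol". */
            if (*protocol == IPPROTO_IP_M) {
                *protocol = answer->protocol;
                return answer;
            }
            /* Wild case 2: the entry accepts any protocol. */
            if (answer->protocol == IPPROTO_IP_M)
                return answer;
        }
    }
    return NULL;
}
```

With a stream list holding only the TCP entry, a request for protocol 0 resolves to TCP and the protocol is rewritten to 6, while a request for UDP finds nothing. With a raw list holding only the wildcard entry, any nonzero protocol matches, but protocol 0 does not.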

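From this you can see that prot is assigned from tcp_prot. But if you jump back to the static variable tcp_prot shown earlier, you will find that nothing there initializes slab. So after all this searching, we still have not found the object size used for the slab allocation. Where is it?

Do you see the first function called in inet_init(void), proto_register (highlighted in red in the original)?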

Net\core\sock.c

int proto_register(struct proto *prot, int alloc_slab)

{

         if (alloc_slab) {

                   prot->slab = kmem_cache_create(prot->name, prot->obj_size, 0,

                                               SLAB_HWCACHE_ALIGN | prot->slab_flags,

                                               NULL);

 

……

}

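After all that effort, here it finally is: the slab cache is created with prot->obj_size as its object size. This shows that allocating struct sock *sk actually allocates the size of struct tcp_sock. Combined with the nesting relationship among tcp_sock, inet_connection_sock and inet_sock seen earlier, it is now clear why the casts are valid.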
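To tie the two allocation paths together, here is a rough user-space model of the registration step. Everything in it is an assumption made for illustration: struct cache_model is a toy replacement for struct kmem_cache that only remembers an object size, the _model functions are hypothetical, and the error code -1 is a placeholder rather than the kernel's return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct kmem_cache: it only records the object size. */
struct cache_model {
    const char *name;
    size_t      obj_size;
};

/* Hypothetical protocol descriptor with an optional per-protocol cache. */
struct proto_model {
    const char         *name;
    size_t              obj_size;
    struct cache_model *slab;   /* NULL until registration creates it */
};

struct cache_model *cache_create_model(const char *name, size_t size)
{
    struct cache_model *c = malloc(sizeof(*c));
    if (c != NULL) {
        c->name = name;
        c->obj_size = size;
    }
    return c;
}

/* Every object from a given cache has the size fixed at creation time. */
void *cache_alloc_model(struct cache_model *c)
{
    return calloc(1, c->obj_size);
}

/* Registration is where the cache object size is bound to obj_size. */
int proto_register_model(struct proto_model *prot, int alloc_slab)
{
    assert(prot->obj_size > 0);
    if (alloc_slab) {
        prot->slab = cache_create_model(prot->name, prot->obj_size);
        if (prot->slab == NULL)
            return -1;          /* placeholder error code */
    }
    return 0;
}

/* Both paths end up handing out obj_size bytes. */
void *sk_prot_alloc_model(struct proto_model *prot)
{
    if (prot->slab != NULL)
        return cache_alloc_model(prot->slab);   /* path (1) */
    return calloc(1, prot->obj_size);           /* path (2) */
}
```

Whether or not a cache was requested at registration, the allocation function returns a block of obj_size bytes, which is the property the casts rely on.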

One more remark about inet_create(). At the end of that function:

         if (sk->sk_prot->init) {

                   err = sk->sk_prot->init(sk);

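This init is the one defined in tcp_prot above; take a look. The function it calls is static int tcp_v4_init_sock(struct sock *sk), which mainly performs some initialization of tcp_sock and inet_connection_sock. The same source file also contains a similar-looking function, void __init tcp_v4_init(void), which is called from inet_init() shown earlier, after proto_register (so it runs first, before any socket is created). Do not confuse the two.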
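The per-socket init hook can be modeled the same way. In this sketch the _model types, create_sock_model(), and the values written by the init function are illustrative assumptions and not the kernel's definitions. It shows the generic creation path calling through the function pointer, and the protocol-specific callback casting the base pointer up to its own type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sock_model;

/* Hypothetical descriptor with an optional per-socket init callback. */
struct proto_model {
    const char *name;
    size_t      obj_size;
    int       (*init)(struct sock_model *sk);
};

struct sock_model     { const struct proto_model *sk_prot; };
struct tcp_sock_model { struct sock_model sk; unsigned int snd_cwnd; unsigned int mss_cache; };

/* Protocol-specific init: safe to cast because obj_size covers tcp_sock_model. */
static int tcp_init_sock_model(struct sock_model *sk)
{
    struct tcp_sock_model *tp = (struct tcp_sock_model *)sk;

    tp->snd_cwnd = 2;       /* illustrative starting values */
    tp->mss_cache = 536;
    return 0;
}

const struct proto_model tcp_prot_model = {
    .name     = "TCP",
    .obj_size = sizeof(struct tcp_sock_model),
    .init     = tcp_init_sock_model,
};

/* Generic creation path: allocate by obj_size, then run the hook if present. */
struct sock_model *create_sock_model(const struct proto_model *prot)
{
    assert(prot->obj_size >= sizeof(struct sock_model));

    struct sock_model *sk = calloc(1, prot->obj_size);
    if (sk == NULL)
        return NULL;

    sk->sk_prot = prot;
    if (sk->sk_prot->init != NULL && sk->sk_prot->init(sk) != 0) {
        free(sk);
        return NULL;
    }
    return sk;
}
```

The generic code never names the TCP type; only the callback does, and it runs once per socket, unlike a boot-time initializer that runs once for the whole protocol.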

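It took a fairly long time to put this document together, and it was very rewarding.

Finally, thanks again to everyone who selflessly shares resources on the net.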

