How to learn Linux kernel network protocol stack

A new colleague joined our department, and my manager asked me to give a presentation on the Linux kernel network protocol stack, which is how this article came about. Why text rather than slides? Because I really don’t like PowerPoint!

Preparation

People who haven’t studied Linux kernel networking tend to find it either alluring or intimidating. But once you understand a piece of it deeply and verify that understanding, the positive feedback brings the satisfaction of things suddenly clicking into place.

In retrospect, when I entered this topic, I had the following questions:

Q1: The kernel network subsystem is so large. Where should I start? Will I get lost in it?

Q2: The kernel network code changes so fast. Which version should I study?

Q3: Are there any good materials or tutorials?

Q4: How do I verify my understanding?

Now I can give some short answers:

Q1: The kernel network subsystem is so large. Where should I start? Will I get lost in it?

Although the kernel network subsystem contains a lot of code, the core paths are clearly separated from the supporting code. I also think it is better than a lot of open source code: it is well commented, with only a few baffling spots. The reason for every change can be found in the community git repository, which is very helpful.

Q2: The kernel network code changes so fast. Which version should I study?

Use whichever version you need. If your job specifies a version, use that one. If a tutorial or book is based on a particular version, choose it. If it’s just for your own study, I suggest keeping several versions of the code on hand: for example, 2.6, 3.7, 4.4, 4.9, 5.3 (these version numbers are simply ones I picked).

Alternatively, https://elixir.bootlin.com/ lets you browse every version of the code online.

Q3: Are there any good materials or tutorials?

I haven’t seen a complete tutorial, probably because there is too much material and it is too varied. However, almost every aspect of kernel networking has been discussed and analyzed somewhere. Here I recommend the following three books, all of which are comprehensive. The later part of this article is also heavily influenced by them.

  • Understanding Linux Network Internals
  • TCP/IP Architecture, Design, and Implementation in Linux
  • Linux Kernel Networking: Implementation and Theory

These books have Chinese editions, but I think reading the original English avoids misunderstandings introduced by translation.

Q4: How do I verify my understanding?

Verification is very important; otherwise, how do you know whether you are right? Besides the primitive approach of adding printk debugging output and recompiling the kernel, there are some cleverer tools that can help.

  • SystemTap: almost omnipotent. You can place a probe point in the kernel and run your own code there.
  • kprobe: a simple way to quickly check whether a function gets executed.
  • packetdrill: useful for verifying TCP protocol behavior.

Protocol stack details

The following sections introduce some concepts that come up again and again in the kernel network protocol stack.

sk_buff

The kernel obviously needs a data structure to represent packets. That structure is sk_buff (short for socket buffer), the counterpart of the mbuf in the BSD kernel described in "TCP/IP Illustrated, Volume 2".

The sk_buff structure does not store the packet contents itself. It points into the memory that actually holds the packet through several pointers:

[Figure: sk_buff and its pointers into the packet data buffer]

sk_buff is the structure that runs through the whole protocol stack. When a packet passes from one layer to the next, the kernel only needs to adjust the pointer positions in the sk_buff.

[Figure: sk_buff pointers being adjusted as a packet moves between layers]
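To make the pointer adjustment concrete, here is a minimal user-space sketch. It is not the kernel code: struct mini_skb and the mini_skb_* helpers are hypothetical names I made up, loosely modeled on the head/data/tail/end pointers of sk_buff and on helpers such as skb_reserve, skb_put, skb_push and skb_pull.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>   /* memcpy, for callers that fill the payload */

/* Hypothetical user-space model; the real struct sk_buff has many more fields. */
struct mini_skb {
    unsigned char *head;  /* start of the allocated buffer */
    unsigned char *data;  /* start of the data seen by the current layer */
    unsigned char *tail;  /* end of the data */
    unsigned char *end;   /* end of the allocated buffer */
};

struct mini_skb *mini_skb_alloc(size_t size)
{
    struct mini_skb *skb = malloc(sizeof(*skb));

    if (!skb)
        return NULL;
    skb->head = malloc(size);
    if (!skb->head) {
        free(skb);
        return NULL;
    }
    skb->data = skb->tail = skb->head;
    skb->end = skb->head + size;
    return skb;
}

void mini_skb_free(struct mini_skb *skb)
{
    free(skb->head);
    free(skb);
}

size_t mini_skb_len(const struct mini_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}

/* Leave headroom in front of the payload for headers added later. */
void mini_skb_reserve(struct mini_skb *skb, size_t len)
{
    assert(len <= (size_t)(skb->end - skb->tail));
    skb->data += len;
    skb->tail += len;
}

/* Append len bytes; returns where the caller should write them. */
unsigned char *mini_skb_put(struct mini_skb *skb, size_t len)
{
    unsigned char *start = skb->tail;

    assert(len <= (size_t)(skb->end - skb->tail));
    skb->tail += len;
    return start;
}

/* Send path: a lower layer prepends its header by moving data backwards. */
unsigned char *mini_skb_push(struct mini_skb *skb, size_t len)
{
    assert(len <= (size_t)(skb->data - skb->head));
    skb->data -= len;
    return skb->data;
}

/* Receive path: a layer strips its header by moving data forwards. */
unsigned char *mini_skb_pull(struct mini_skb *skb, size_t len)
{
    assert(len <= mini_skb_len(skb));
    skb->data += len;
    return skb->data;
}
```

On the send path a caller would reserve headroom, put the payload, then push one header per layer. On the receive path each layer pulls its own header. The packet bytes are never copied between layers; only data and tail move.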

net_device

The kernel uses net_device to represent a network card. Network cards can be divided into physical NICs and virtual NICs. A physical NIC is one that can actually send packets out of the local machine, which includes both the NIC of a real physical machine and the NIC of a VM, while devices such as tun/tap, vxlan and veth pairs are virtual NICs.

As the figure below shows, each network card has two ends. One end is the protocol stack (IP, TCP, UDP); the other end varies. For a physical NIC it is the device driver provided by the NIC vendor, while for virtual NICs it differs greatly from one type to another. It is thanks to virtual NICs that the kernel can support features such as tunnel encapsulation and container communication.

[Figure: the two ends of a network card, with the protocol stack on one side and a device driver or virtual backend on the other]
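The "two ends" idea can be sketched in user space as one transmit interface with different backends. This is only an illustration under my own assumptions: struct nic, struct nic_ops and the *_like_xmit functions are hypothetical names, not kernel APIs. The veth-like backend reflects the fact that a veth device hands each transmitted packet to its peer's receive side.

```c
#include <assert.h>
#include <stddef.h>

struct nic;

/* Lower end: what differs from one device type to another. */
struct nic_ops {
    int (*xmit)(struct nic *dev, const void *pkt, size_t len);
};

/* Hypothetical stand-in for a network device. */
struct nic {
    const char *name;
    const struct nic_ops *ops;
    struct nic *peer;            /* used only by the veth-like device */
    unsigned long tx_packets;
    unsigned long rx_packets;
};

/* Upper end: the protocol stack calls this the same way for every device. */
int stack_xmit(struct nic *dev, const void *pkt, size_t len)
{
    assert(dev && dev->ops && dev->ops->xmit);
    return dev->ops->xmit(dev, pkt, len);
}

/* "Physical" backend: pretend the packet was handed to hardware. */
int phys_like_xmit(struct nic *dev, const void *pkt, size_t len)
{
    (void)pkt;
    (void)len;
    dev->tx_packets++;
    return 0;
}

/* veth-like backend: transmitting on one end is receiving on the peer. */
int veth_like_xmit(struct nic *dev, const void *pkt, size_t len)
{
    (void)pkt;
    (void)len;
    if (!dev->peer)
        return -1;
    dev->tx_packets++;
    dev->peer->rx_packets++;
    return 0;
}

const struct nic_ops phys_like_ops = { .xmit = phys_like_xmit };
const struct nic_ops veth_like_ops = { .xmit = veth_like_xmit };
```

The caller of stack_xmit never needs to know which kind of device it is talking to, which is roughly why new virtual device types can be added without touching the protocol layers.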

socket & sock

User space uses library functions such as socket(), bind(), listen() and accept() for network programming. The socket and sock discussed here are two data structures in the kernel: socket faces up toward the user, and sock faces down toward the protocol stack.

As the figure below shows, the two structures actually correspond one-to-one.

[Figure: one-to-one correspondence between struct socket and struct sock]

Note that both structures carry a pointer to a table of operations, but the types differ. In socket it is ops, a pointer to struct proto_ops; in sock it is sk_prot, a pointer to struct proto. Both are determined when the structure is created.

Recall the prototype of the socket() function from network programming:

#include <sys/socket.h>

sockfd = socket(int socket_family, int socket_type, int protocol);

In fact, socket->ops and sock->sk_prot are determined by the first two parameters, socket_family and socket_type.

If socket_family is the most commonly used PF_INET protocol family, the values of socket->ops and sock->sk_prot are recorded in the INET protocol switch table:

static struct inet_protosw inetsw_array[] =
{
    {
        .type =       SOCK_STREAM,
        .protocol =   IPPROTO_TCP,
        .prot =       &tcp_prot,          // becomes sock->sk_prot
        .ops =        &inet_stream_ops,   // becomes socket->ops
        .flags =      INET_PROTOSW_PERMANENT |
                      INET_PROTOSW_ICSK,
    },

    {
        .type =       SOCK_DGRAM,
        .protocol =   IPPROTO_UDP,
        .prot =       &udp_prot,          // becomes sock->sk_prot
        .ops =        &inet_dgram_ops,    // becomes socket->ops
        .flags =      INET_PROTOSW_PERMANENT,
    },
    .......
};

L3->L4

We know that the network protocol stack is layered, but in terms of implementation the layering of the kernel protocol stack is only logical; in essence it is just function calls. The send path (an upper layer calling a lower layer) is usually a direct call, because there is no ambiguity: TCP, for example, knows that IP sits below it. The receive path is different. When a packet is at the IP layer, the layer above may be TCP, UDP, ICMP and so on, so the receive path uses a callback registration mechanism.

Take the INET protocol family as an example. The registration interface is:

int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol);

When the kernel network subsystem is initialized, the L4 protocols (such as TCP and UDP below) are registered:

static struct net_protocol tcp_protocol = {
    ......
    .handler    =    tcp_v4_rcv,
    ......
};

static struct net_protocol udp_protocol = {
    .....
    .handler =    udp_rcv,
    .....
};

At the IP layer, if the route lookup shows that the packet is destined for the local machine, it is handed to the matching L4 handler according to the packet's L4 protocol:

static int ip_local_deliver_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
      ......
      ipprot = rcu_dereference(inet_protos[protocol]);
      ......
      ret = ipprot->handler(skb);     
      ......
}
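The registration and dispatch steps above can be sketched in user space as a table indexed by protocol number. This is a simplified model, not the kernel implementation: struct pkt, struct l4_protocol, add_protocol and local_deliver are hypothetical names, and the demo handlers simply return their protocol number so the dispatch is easy to observe.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_L4_PROTOS 256

/* Hypothetical stand-in for a received packet. */
struct pkt {
    unsigned char l4_proto;   /* protocol field from the IP header */
};

struct l4_protocol {
    int (*handler)(struct pkt *p);
};

/* One slot per IP protocol number, playing the role of inet_protos[]. */
static const struct l4_protocol *l4_protos[MAX_L4_PROTOS];

/* Register a handler; fails with -1 if the slot is already taken. */
int add_protocol(const struct l4_protocol *prot, unsigned char num)
{
    if (l4_protos[num])
        return -1;
    l4_protos[num] = prot;
    return 0;
}

/* IP-layer side: look up the handler by protocol number and call it. */
int local_deliver(struct pkt *p)
{
    const struct l4_protocol *prot = l4_protos[p->l4_proto];

    if (!prot)
        return -1;            /* nobody registered for this protocol */
    return prot->handler(p);
}

/* Demo handlers: return the protocol number so callers can see who ran. */
int demo_tcp_rcv(struct pkt *p) { (void)p; return 6; }
int demo_udp_rcv(struct pkt *p) { (void)p; return 17; }

const struct l4_protocol demo_tcp = { .handler = demo_tcp_rcv };
const struct l4_protocol demo_udp = { .handler = demo_udp_rcv };
```

After registering demo_tcp under 6 and demo_udp under 17, a packet with l4_proto set to 17 reaches demo_udp_rcv without the IP-layer code knowing anything about UDP.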

L2->L3

L2->L3 works the same way, except that the registration interface becomes:

void dev_add_pack(struct packet_type *pt)

Who registers? Obviously IP does, at the very least:

static struct packet_type ip_packet_type = {
    .type = cpu_to_be16(ETH_P_IP),
    .func = ip_rcv,
};

On the receive path, the device driver stores the packet's L3 type in skb->protocol. When the kernel then receives the packet in netif_receive_skb, it calls the matching callback according to that protocol:

static int __netif_receive_skb(struct sk_buff *skb)
{
    ......
    type = skb->protocol;
    ......
    ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
}
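One detail worth seeing in code is that the type is compared in network byte order, which is why the registration uses cpu_to_be16(). Below is a small user-space sketch under my own assumptions: struct frame, struct pkt_type, add_pack and receive_frame are hypothetical names, htons() stands in for cpu_to_be16(), and a plain linked list stands in for the kernel's handler lists. Unlike the one-slot-per-protocol table at L4, several handlers may match the same type here.

```c
#include <arpa/inet.h>   /* htons */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_IPV4 0x0800   /* EtherType for IPv4 */
#define DEMO_ETH_ARP  0x0806   /* EtherType for ARP */

/* Hypothetical stand-in for a received frame. */
struct frame {
    uint16_t protocol;         /* network byte order, as set by the driver */
};

struct pkt_type {
    uint16_t type;             /* network byte order */
    int (*func)(struct frame *f);
    struct pkt_type *next;
};

static struct pkt_type *ptype_list;

/* Register an L3 handler by putting it at the front of the list. */
void add_pack(struct pkt_type *pt)
{
    pt->next = ptype_list;
    ptype_list = pt;
}

/* Call every handler whose type matches; returns how many were called. */
int receive_frame(struct frame *f)
{
    struct pkt_type *pt;
    int delivered = 0;

    for (pt = ptype_list; pt; pt = pt->next) {
        if (pt->type == f->protocol) {
            pt->func(f);
            delivered++;
        }
    }
    return delivered;
}

/* Demo handlers that only count how often they were called. */
int ipv4_seen;
int arp_seen;

int demo_ip_rcv(struct frame *f)  { (void)f; ipv4_seen++; return 0; }
int demo_arp_rcv(struct frame *f) { (void)f; arp_seen++;  return 0; }
```

A handler would be registered with .type = htons(DEMO_ETH_IPV4) and .func = demo_ip_rcv, mirroring ip_packet_type above. A frame whose protocol matches no registered type is simply not delivered to anyone.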

Netfilter

Netfilter is a path that packets inevitably pass through in the kernel protocol stack. As the figure below shows, Netfilter places hook points at five locations in the kernel. Users can filter and modify packets at these hook points by configuring iptables rules.

[Figure: the five Netfilter hook points in the kernel]

In kernel code we often come across calls to NF_HOOK. My suggestion: if you are not looking at Netfilter for the time being, just skip over it and follow okfn.

static inline int
NF_HOOK(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk, struct sk_buff *skb,
    struct net_device *in, struct net_device *out,
    int (*okfn)(struct net *, struct sock *, struct sk_buff *))
{
    int ret = nf_hook(pf, hook, net, sk, skb, in, out, okfn);
    if (ret == 1)
        ret = okfn(net, sk, skb);
    return ret;
}
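Why it is safe to follow okfn can be seen in a small user-space model of the same pattern. Everything here is a sketch under my own assumptions: struct pkt, hook_fn, register_hook, run_hooks and hook_then_continue are hypothetical names, and returning -1 for a dropped packet is my simplification (the kernel returns a negative error code).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet. */
struct pkt {
    int dport;
};

enum verdict { VERDICT_DROP, VERDICT_ACCEPT };

typedef enum verdict (*hook_fn)(const struct pkt *p);

#define MAX_HOOKS 8
static hook_fn hook_chain[MAX_HOOKS];
static size_t nr_hooks;

int register_hook(hook_fn fn)
{
    if (nr_hooks == MAX_HOOKS)
        return -1;
    hook_chain[nr_hooks++] = fn;
    return 0;
}

/* Returns 1 when every hook accepted the packet, -1 when one dropped it. */
int run_hooks(const struct pkt *p)
{
    size_t i;

    for (i = 0; i < nr_hooks; i++) {
        if (hook_chain[i](p) == VERDICT_DROP)
            return -1;
    }
    return 1;
}

/* Same shape as NF_HOOK: run the hooks, then continue with okfn. */
int hook_then_continue(struct pkt *p, int (*okfn)(struct pkt *))
{
    int ret = run_hooks(p);

    if (ret == 1)
        ret = okfn(p);
    return ret;
}

/* Demo hook: drop anything addressed to port 23. */
enum verdict drop_telnet(const struct pkt *p)
{
    return p->dport == 23 ? VERDICT_DROP : VERDICT_ACCEPT;
}

/* Demo okfn: count packets that made it past the hooks. */
int packets_continued;

int count_okfn(struct pkt *p)
{
    (void)p;
    packets_continued++;
    return 0;
}
```

With no hooks registered, hook_then_continue always ends up calling okfn, so when reading the main path the call collapses to okfn(...). Once drop_telnet is registered, a packet to port 23 never reaches okfn.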

dst_entry

The kernel needs to decide whether a received packet should be delivered locally or forwarded. For packets sent by the local machine, the kernel needs to decide which network card to send them out of. Both decisions are made by querying the FIB (Forwarding Information Base). The FIB can be thought of as a database whose data comes from routes configured by the user or generated automatically by the kernel.

The input to a FIB lookup is the packet's sk_buff, and the output is a dst_entry. The dst_entry is then attached to the skb:

static inline void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
{
    skb->_skb_refdst = (unsigned long)dst;
}

The most important members of dst_entry are an input pointer and an output pointer:

struct dst_entry {
    ......
    int            (*input)(struct sk_buff *);
    int            (*output)(struct net *net, struct sock *sk, struct sk_buff *skb);
    ......
};

- For packets to be delivered to the local machine

rth->dst.input = ip_local_deliver;

- For packets to be forwarded

rth->dst.input = ip_forward;

- For packets sent by the local machine

rth->dst.output = ip_output;
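The way the lookup result steers a packet can be sketched in user space as follows. This is only a model under my own assumptions: struct route stands in for dst_entry, route_lookup replaces the real FIB query with a comparison against a single local address, and the handlers return a constant so the chosen path is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pkt;

/* Hypothetical stand-in for dst_entry: only the input pointer is modeled. */
struct route {
    int (*input)(struct pkt *p);
};

/* Hypothetical stand-in for a packet carrying its lookup result. */
struct pkt {
    uint32_t daddr;             /* destination address */
    const struct route *dst;    /* attached by route_lookup() */
};

enum { WENT_LOCAL = 1, WENT_FORWARD = 2 };

int demo_local_deliver(struct pkt *p) { (void)p; return WENT_LOCAL; }
int demo_forward(struct pkt *p)       { (void)p; return WENT_FORWARD; }

const struct route local_route   = { .input = demo_local_deliver };
const struct route forward_route = { .input = demo_forward };

/* Toy "FIB lookup": local if the destination is our address, else forward. */
void route_lookup(struct pkt *p, uint32_t local_addr)
{
    p->dst = (p->daddr == local_addr) ? &local_route : &forward_route;
}

/* After the lookup, the IP layer just calls through the attached pointer. */
int pkt_input(struct pkt *p)
{
    assert(p->dst != NULL);
    return p->dst->input(p);
}
```

The point of the design is that the decision is made once, at lookup time. After that, the receive path does not need an if/else on the destination; it simply calls whatever function the route entry carries.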

Reposted from: https://developpaper.com/how-to-learn-linux-kernel-network-protocol-stack/
