The journey of a packet through the Linux 2.6.10 network stack


                    

 

http://svn.gnumonks.org/trunk/doc/packet-journey-2.6.xml

Author: Harald Welte netfilter core team

Email: laforge@netfilter.org

Date: Sep 14, 2004

Revision: 1.4

                   

 

 

This document describes the journey of a network packet inside the Linux kernel 2.6.x. The path has changed quite a bit since 2.2, because the globally serialized bottom half was abandoned in favor of the new softirq system.

                   

 

 

Preface

 

I apologize for my ignorance, but this document has a strong focus on the "default case": x86 architecture and IP packets which get forwarded. If you want to contribute your favourite part, feel free to send me a patch.

                   

 

While I've been working on netfilter/iptables for quite some time, I am definitely no core networking guru, and the information provided by this document may be wrong. So don't expect too much; I'll always appreciate your comments and bugfixes.

            

 

The document tries to reflect the latest kernel at the time of its writing, which is 2.6.10-rc2. If you are working on an earlier or later kernel, parts of the network stack might already have changed again.

                 

 

Receiving the packet

The receive interrupt

If the network card receives an ethernet frame which matches the local MAC address, an address programmed into the multicast filter, or the link-layer broadcast address, it issues an interrupt.

                     

 

The network driver for this particular card handles the interrupt and fetches the packet data into RAM via DMA, PIO or whatever mechanism the hardware uses. It then allocates an skb and calls a function of the protocol-independent device support routines: net/core/dev.c:netif_rx(skb).

             

 

Please note that in Linux 2.6.x, drivers can also be written to support the so-called NAPI (New API). NAPI tries to prevent DoS attacks caused by packet floods that make the CPU spin in the hardirq handler. Instead of using netif_rx() the way described above, NAPI drivers disable interrupt generation on the card and schedule polling by calling the function include/linux/netdevice.h:netif_rx_schedule(dev).

               

 

netif_rx()

At this early time, the kernel checks whether there are any users of netpoll registered. Netpoll is a low-level mechanism for network access to incoming packets, used by code that wants to avoid using the full network stack, like netconsole.

                

 

If the driver didn't already timestamp the skb, and some piece of code inside the kernel requested timestamps by asserting netstamp_needed, the kernel timestamps the skb now by calling include/net/sock.h:net_timestamp().
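The conditional timestamping described above can be illustrated with a small user-space sketch. It is not kernel code: `toy_skb`, `toy_netstamp_needed` and `toy_stamp_if_needed()` are hypothetical stand-ins invented for this example, and the timestamp is reduced to a plain `time_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the kernel's skb; only the timestamp is modelled.
 * A stamp of 0 means "nobody has timestamped this packet yet". */
struct toy_skb {
    time_t stamp;
};

/* Stand-in for netstamp_needed: non-zero while someone wants timestamps. */
static int toy_netstamp_needed;

/* Stamp the packet only if it is still unstamped AND timestamps were
 * requested. Returns true if this call set the stamp. */
static bool toy_stamp_if_needed(struct toy_skb *skb)
{
    if (skb->stamp != 0 || toy_netstamp_needed == 0)
        return false;
    skb->stamp = time(NULL);
    return skb->stamp != 0;
}
```

The point of the two-part condition is that a driver-supplied timestamp is never overwritten, and no time is spent reading the clock when nobody asked for timestamps.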

               

 

Afterwards the skb gets enqueued in the appropriate queue for the processor handling this packet. If the queue backlog is full, the packet is dropped at this point. After enqueuing the skb, the receive softirq is marked for execution via include/linux/netdevice.h:netif_rx_schedule(). The attentive reader will have noticed that this function was previously mentioned in relation to NAPI drivers. And yes, indeed, this is the point where the two code paths rejoin and continue their common way through the rest of the stack.

             

 

If the queue is already full (queue->throttle != 0), then the packet is dropped rather than enqueued.

 

netif_rx() returns the queue congestion level to give some feedback to the driver. The congestion level can be one of NET_RX_SUCCESS, NET_RX_CN_LOW, NET_RX_CN_MOD, NET_RX_CN_HIGH or NET_RX_DROP.
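As a rough illustration of the enqueue-or-drop behaviour of netif_rx(), here is a user-space sketch of a bounded backlog queue. All names (`toy_backlog`, `toy_netif_rx()`, `TOY_MAX_BACKLOG`, the two result codes) are hypothetical; the sketch ignores the throttle flag and the intermediate congestion levels, and simply records that the receive softirq would be raised.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_BACKLOG 4            /* assumed limit, deliberately tiny */

enum toy_rx_result { TOY_RX_SUCCESS, TOY_RX_DROP };

struct toy_skb { int id; };

/* One such queue would exist per CPU; this sketch models a single one. */
struct toy_backlog {
    struct toy_skb *slots[TOY_MAX_BACKLOG];  /* ring buffer of packets */
    size_t head;                             /* index of oldest packet */
    size_t len;                              /* packets currently queued */
    int softirq_raised;                      /* "RX softirq pending" flag */
    unsigned long dropped;                   /* drop counter */
};

/* Enqueue the packet, or drop it when the backlog is full. */
static enum toy_rx_result toy_netif_rx(struct toy_backlog *q,
                                       struct toy_skb *skb)
{
    if (q->len == TOY_MAX_BACKLOG) {
        q->dropped++;
        return TOY_RX_DROP;
    }
    q->slots[(q->head + q->len) % TOY_MAX_BACKLOG] = skb;
    q->len++;
    q->softirq_raised = 1;   /* mark the receive softirq for execution */
    return TOY_RX_SUCCESS;
}

/* Remove the oldest packet, as the softirq side would do later. */
static struct toy_skb *toy_dequeue(struct toy_backlog *q)
{
    struct toy_skb *skb;

    if (q->len == 0)
        return NULL;
    skb = q->slots[q->head];
    q->head = (q->head + 1) % TOY_MAX_BACKLOG;
    q->len--;
    return skb;
}
```

The hard-interrupt side only does the cheap work (enqueue and flag), while the expensive protocol processing is deferred to whoever dequeues later.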

                  

 

The interrupt handler now exits and all interrupts are re-enabled.

                      

 

The network RX softirq

Like in Linux 2.4, the whole network stack is running in softirq context. Softirqs have the major advantage that they may run on more than one CPU simultaneously (as opposed to the old "bottom halves" in Linux 2.2.x).

                        

 

Our network receive softirq is registered in net/core/dev.c:net_dev_init() using the function kernel/softirq.c:open_softirq() provided by the softirq subsystem.

                 

 

Further handling of our packet is done in the network receive softirq (NET_RX_SOFTIRQ), which is called from kernel/softirq.c:__do_softirq() via kernel/softirq.c:do_softirq(). do_softirq() itself is called from four places within the kernel:

from kernel/irq/handle.c:irq_exit(), which is called by architecture-specific code after the hardware interrupt handler has finished.

from kernel/softirq.c:ksoftirqd(), that is the kernel softirq daemon.

from kernel/softirq.c:local_bh_enable(), that is FIXME.

from net/core/dev.c:netif_rx_ni(), which is a special version of netif_rx(), used by bluetooth bnep and the tun driver.

                                   

 

So if execution passes one of these points, __do_softirq() is called. It detects that NET_RX_SOFTIRQ is marked and calls net/core/dev.c:net_rx_action(). Here the skbs are dequeued from the local CPU's backlog queue (net/core/dev.c:process_backlog()) using a weighting scheme between the different incoming devices.
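The register / mark / dispatch cycle of the softirq system can be sketched in user space as a pending bitmask plus a table of handlers. This is only a simplified model with hypothetical names (`toy_open_softirq()`, `toy_raise_softirq()`, `toy_do_softirq()`); it has no per-CPU state, no locking and no restart logic.

```c
#include <assert.h>

enum { TOY_NET_TX_SOFTIRQ, TOY_NET_RX_SOFTIRQ, TOY_NR_SOFTIRQS };

typedef void (*toy_softirq_fn)(void);

static toy_softirq_fn toy_vec[TOY_NR_SOFTIRQS];  /* registered handlers */
static unsigned int toy_pending;                 /* one bit per softirq */

/* Register a handler, as net_dev_init() does for the RX softirq. */
static void toy_open_softirq(int nr, toy_softirq_fn fn)
{
    toy_vec[nr] = fn;
}

/* Mark a softirq for execution; nothing runs yet. */
static void toy_raise_softirq(int nr)
{
    toy_pending |= 1u << nr;
}

/* Run every handler whose bit is set. Returns how many handlers ran. */
static int toy_do_softirq(void)
{
    unsigned int pending = toy_pending;
    int nr, ran = 0;

    toy_pending = 0;   /* clear first, so handlers may re-raise */
    for (nr = 0; nr < TOY_NR_SOFTIRQS; nr++) {
        if ((pending & (1u << nr)) && toy_vec[nr]) {
            toy_vec[nr]();
            ran++;
        }
    }
    return ran;
}

/* Example handler standing in for net_rx_action(). */
static int toy_rx_calls;
static void toy_net_rx_action(void)
{
    toy_rx_calls++;
}
```

Raising is cheap and can happen in hard-interrupt context, while the handler itself only runs at one of the dispatch points listed above.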

                                            

 

netif_receive_skb()

The next function is net/core/dev.c:netif_receive_skb() , which is the main input function for the receive softirq.

                                  

 

First there is again a check for any netpoll users via netpoll_rx(). If there is no timestamp, and timestamps have been requested somewhere in the kernel, net_timestamp() is called.

                             

 

In case the incoming interface is part of a group of bonded interfaces, skb_bond() saves skb->dev to skb->real_dev and changes skb->dev to point to the master device structure.
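The pointer swap performed for bonded interfaces is small enough to sketch directly. The structures below are hypothetical user-space stand-ins (`toy_net_device`, `toy_skb`); in particular, the `master` pointer on the device is an assumption of this sketch about how a slave finds its bonding master.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device: 'master' is NULL unless the device is a bonding slave. */
struct toy_net_device {
    const char *name;
    struct toy_net_device *master;
};

/* Hypothetical skb: only the two device pointers are modelled. */
struct toy_skb {
    struct toy_net_device *dev;       /* device the stack should see */
    struct toy_net_device *real_dev;  /* physical device, if redirected */
};

/* If the receiving device is a slave, remember it in real_dev and make
 * the packet appear to have arrived on the master device. */
static void toy_skb_bond(struct toy_skb *skb)
{
    struct toy_net_device *dev = skb->dev;

    if (dev->master) {
        skb->real_dev = dev;
        skb->dev = dev->master;
    }
}
```

After the swap, the rest of the stack treats the packet as if it had arrived on the master device, while the physical interface remains available in real_dev.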

                                         

 

Now the packet is delivered to all layer 3 protocol handlers that have registered for all packets (such as PF_PACKET sockets) by calling deliver_skb().

                               

 

If the kernel supports 'tc actions' (i.e. it was compiled with CONFIG_NET_CLS_ACT enabled), the ingress filter is now run via ing_filter(). If the filter verdict is TC_ACT_SHOT or TC_ACT_STOLEN, the skb is dropped by kfree_skb() and all further processing of the packet stops.

                              

 

Next, the kernel checks (include/linux/divert.h:handle_diverter()) whether somebody uses the packet diverter, another obscure feature of the Linux kernel. If so, processing continues at net/core/dv.c:divert_frame().

                        

 

If the kernel has support for ethernet bridging (i.e. CONFIG_BRIDGE is enabled), the packet is handled via net/core/dev.c:handle_bridge() and br_handle_frame_hook().

                                  

 

Finally, the regular layer 3 packet handlers are called by a lookup in the ptype hash and a successive call to net/core/dev.c:deliver_skb() .
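The two delivery stages of netif_receive_skb(), first to handlers registered for all packets and then to the handlers found through the protocol hash, can be sketched as follows. This is a user-space model with hypothetical names (`toy_packet_type`, `toy_dev_add_pack()`, `toy_netif_receive_skb()`); the hash size and hash function are assumptions, byte order is ignored, and the ingress filter, diverter and bridge steps in between are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 16          /* assumed number of hash buckets */
#define TOY_ETH_P_ALL 0x0003      /* "give me every packet" */
#define TOY_ETH_P_IP  0x0800
#define TOY_ETH_P_ARP 0x0806

struct toy_skb { uint16_t protocol; };

typedef void (*toy_handler_fn)(struct toy_skb *skb);

struct toy_packet_type {
    uint16_t type;                 /* protocol this handler wants */
    toy_handler_fn func;
    struct toy_packet_type *next;  /* singly linked list in its bucket */
};

static struct toy_packet_type *toy_ptype_all;                  /* taps */
static struct toy_packet_type *toy_ptype_base[TOY_HASH_SIZE];  /* by type */

/* Register a handler either as a tap or in its protocol's hash bucket. */
static void toy_dev_add_pack(struct toy_packet_type *pt)
{
    struct toy_packet_type **head =
        pt->type == TOY_ETH_P_ALL
            ? &toy_ptype_all
            : &toy_ptype_base[pt->type & (TOY_HASH_SIZE - 1)];

    pt->next = *head;
    *head = pt;
}

/* Deliver to all taps, then to every handler matching the protocol.
 * Returns the number of handlers that received the packet. */
static int toy_netif_receive_skb(struct toy_skb *skb)
{
    struct toy_packet_type *pt;
    int delivered = 0;

    for (pt = toy_ptype_all; pt; pt = pt->next) {
        pt->func(skb);
        delivered++;
    }
    for (pt = toy_ptype_base[skb->protocol & (TOY_HASH_SIZE - 1)];
         pt; pt = pt->next) {
        if (pt->type == skb->protocol) {  /* buckets may hold other types */
            pt->func(skb);
            delivered++;
        }
    }
    return delivered;
}

/* Example handlers that just count their invocations. */
static int toy_tap_count, toy_ip_count, toy_arp_count;
static void toy_tap_rcv(struct toy_skb *skb) { (void)skb; toy_tap_count++; }
static void toy_ip_rcv(struct toy_skb *skb)  { (void)skb; toy_ip_count++; }
static void toy_arp_rcv(struct toy_skb *skb) { (void)skb; toy_arp_count++; }
```

The explicit type comparison inside the bucket loop matters: several protocol numbers can hash to the same bucket, so a bucket hit alone does not mean the handler wants the packet.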
