How VXLAN tunnel send and receive work

 

commit ee122c79d4227f6ec642157834b6a90fcffa4382
Author: Thomas Graf <tgraf@suug.ch>
Date:   Tue Jul 21 10:43:58 2015 +0200

    vxlan: Flow based tunneling

    Allows putting a VXLAN device into a new flow-based mode in which
    skbs with a ip_tunnel_info dst metadata attached will be encapsulated
    according to the instructions stored in there with the VXLAN device
    defaults taken into consideration.

    Similar on the receive side, if the VXLAN_F_COLLECT_METADATA flag is
    set, the packet processing will populate a ip_tunnel_info struct for
    each packet received and attach it to the skb using the new metadata
    dst.  The metadata structure will contain the outer header and tunnel
    header fields which have been stripped off. Layers further up in the
    stack such as routing, tc or netfitler can later match on these fields
    and perform forwarding. It is the responsibility of upper layers to
    ensure that the flag is set if the metadata is needed. The flag limits
    the additional cost of metadata collecting based on demand.

    This prepares the VXLAN device to be steered by the routing and other
    subsystems which allows to support encapsulation for a large number
    of tunnel endpoints and tunnel ids through a single net_device which
    improves the scalability.

    It also allows for OVS to leverage this mode which in turn allows for
    the removal of the OVS specific VXLAN code.

    Because the skb is currently scrubed in vxlan_rcv(), the attachment of
    the new dst metadata is postponed until after scrubing which requires
    the temporary addition of a new member to vxlan_metadata. This member
    is removed again in a later commit after the indirect VXLAN receive API
    has been removed.

    Signed-off-by: Thomas Graf <tgraf@suug.ch>
    Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Previously, a vxlan device was created with a command like this:

ip link add name vxlan1 type vxlan id $vni dev $link remote $link_remote_ip dstport $vxlan_port

Each VNI needs its own vxlan dev.

With the commit above, a single vxlan dev per host is enough:

ip link add $vx type vxlan dstport $vxlan_port dev $link external udp6zerocsumrx udp6zerocsumtx

After the PF (physical function) receives a packet, the call stack is as follows:

        gro_cells_receive

        b'vxlan_rcv+0x1 [vxlan]'
        b'udp_queue_rcv_skb+0x45 [kernel]'
        b'udp_unicast_rcv_skb+0x79 [kernel]'
        b'__udp4_lib_rcv+0x326 [kernel]'
        b'udp_rcv+0x1a [kernel]'
        b'ip_protocol_deliver_rcu+0x1d0 [kernel]'
        b'ip_local_deliver_finish+0x94 [kernel]'
        b'ip_local_deliver+0x82 [kernel]'
        b'ip_sublist_rcv_finish+0xd2 [kernel]'
        b'ip_list_rcv_finish.constprop.0+0x19f [kernel]'
        b'ip_list_rcv+0x15b [kernel]'
        b'__netif_receive_skb_list_core+0x28d [kernel]'
        b'__netif_receive_skb_list+0x102 [kernel]'
        b'netif_receive_skb_list_internal+0x12c [kernel]'
        b'napi_complete_done+0x7a [kernel]'
        b'mlx5e_napi_poll+0x1bb [mlx5_core]'
        b'__napi_poll+0x2f [kernel]'
        b'net_rx_action+0x27b [kernel]'
        b'__softirqentry_text_start+0xc6 [kernel]'
        b'__irq_exit_rcu+0xbf [kernel]'
        b'irq_exit_rcu+0xe [kernel]'
        b'common_interrupt+0x8d [kernel]'
        b'asm_common_interrupt+0x1e [kernel]'
        b'cpuidle_enter_state+0x10a [kernel]'
        b'cpuidle_enter+0x2e [kernel]'
        b'cpuidle_idle_call+0x12d [kernel]'
        b'do_idle+0x94 [kernel]'
        b'cpu_startup_entry+0x20 [kernel]'
        b'start_secondary+0x96 [kernel]'
        b'secondary_startup_64_no_verify+0xb0 [kernel]'

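gro_cells_receive() queues the packet on the vxlan dev's GRO cell and raises a softirq for it, which is handled in the following stack: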

        b'fl_classify+0x1 [cls_flower]'
        b'tcf_classify+0x7a [kernel]'
        b'sch_handle_ingress.constprop.0+0x133 [kernel]'
        b'__netif_receive_skb_core+0x579 [kernel]'
        b'__netif_receive_skb_list_core+0x12a [kernel]'
        b'__netif_receive_skb_list+0x102 [kernel]'
        b'netif_receive_skb_list_internal+0x12c [kernel]'
        b'napi_complete_done+0x7a [kernel]'
        b'gro_cell_poll+0x77 [kernel]'
        b'__napi_poll+0x2f [kernel]'
        b'net_rx_action+0x27b [kernel]'
        b'__softirqentry_text_start+0xc6 [kernel]'
        b'__irq_exit_rcu+0xbf [kernel]'
        b'irq_exit_rcu+0xe [kernel]'
        b'common_interrupt+0x8d [kernel]'
        b'asm_common_interrupt+0x1e [kernel]'
        b'cpuidle_enter_state+0x10a [kernel]'
        b'cpuidle_enter+0x2e [kernel]'
        b'cpuidle_idle_call+0x12d [kernel]'
        b'do_idle+0x94 [kernel]'
        b'cpu_startup_entry+0x20 [kernel]'
        b'start_secondary+0x96 [kernel]'
        b'secondary_startup_64_no_verify+0xb0 [kernel]'

The tc flower filter on the vxlan dev is then invoked:

filter parent ffff: protocol ip pref 1 flower chain 0
filter parent ffff: protocol ip pref 1 flower chain 0 handle 0x1
  dst_mac 02:25:d0:13:01:02
  src_mac 24:25:d0:e1:00:00
  eth_type ipv4
  enc_dst_ip 192.168.1.13
  enc_src_ip 192.168.1.14
  enc_key_id 4
  enc_dst_port 4789
  skip_hw
  not_in_hw
        action order 1: tunnel_key  unset pipe
         index 3 ref 1 bind 1 installed 20085 sec used 1 sec
        Action statistics:
        Sent 1647408 bytes 19612 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

        action order 2: mirred (Egress Redirect to device enp4s0f0_1) stolen
        index 3 ref 1 bind 1 installed 20085 sec used 1 sec
        Action statistics:
        Sent 1647408 bytes 19612 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
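
A filter like the one dumped above could be installed with commands along the following lines. This is only a sketch: the helper `build_tc_commands()` and the device name `vxlan0` are hypothetical, the MAC addresses, IPs, VNI and port are copied from the dump, and the script only prints the `tc` command lines rather than running them.

```python
import shlex


def build_tc_commands(vxlan_dev, out_dev, dst_mac, src_mac,
                      enc_dst_ip, enc_src_ip, vni, dst_port=4789):
    """Return the tc command lines (as strings) for an ingress flower filter.

    Hypothetical helper for illustration; nothing is executed here.
    """
    # The ingress qdisc provides the "parent ffff:" hook seen in the dump.
    qdisc = ["tc", "qdisc", "add", "dev", vxlan_dev, "ingress"]
    flt = [
        "tc", "filter", "add", "dev", vxlan_dev, "parent", "ffff:",
        "protocol", "ip", "prio", "1", "flower", "skip_hw",
        # Inner (decapsulated) frame keys.
        "dst_mac", dst_mac, "src_mac", src_mac,
        # Outer tunnel keys; these need the tunnel metadata on the skb.
        "enc_dst_ip", enc_dst_ip, "enc_src_ip", enc_src_ip,
        "enc_key_id", str(vni), "enc_dst_port", str(dst_port),
        # Drop the tunnel metadata, then redirect to the representor.
        "action", "tunnel_key", "unset",
        "action", "mirred", "egress", "redirect", "dev", out_dev,
    ]
    return [shlex.join(qdisc), shlex.join(flt)]


cmds = build_tc_commands(
    "vxlan0", "enp4s0f0_1",
    dst_mac="02:25:d0:13:01:02", src_mac="24:25:d0:e1:00:00",
    enc_dst_ip="192.168.1.13", enc_src_ip="192.168.1.14", vni=4,
)
for line in cmds:
    print(line)
```

The printed lines can be reviewed and then run as root on a host where the vxlan dev and the representor actually exist.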

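If the vxlan dev is created the old way, i.e. without the external keyword, vxlan_collect_metadata() returns false in vxlan_rcv(), so neither udp_tun_rx_dst() nor skb_dst_set() is called and the tunnel information is lost. fl_classify() then finds no match: fl_mask_lookup() returns NULL and tcf_exts_exec() is never executed.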
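
The effect of the missing metadata can be illustrated with a toy model. This is not kernel code: the classes `TunnelInfo` and `Skb` are made up for the illustration, and the Python functions only borrow the names vxlan_rcv and fl_classify to show which step they stand in for. The real flower classifier works on masked flow keys, which this sketch reduces to a plain equality check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TunnelInfo:
    """Outer-header fields that flower's enc_* keys match on."""
    src_ip: str
    dst_ip: str
    vni: int
    dst_port: int


@dataclass
class Skb:
    """Toy packet: inner destination MAC plus optional tunnel metadata."""
    inner_dst_mac: str
    tun_info: Optional[TunnelInfo] = None


def vxlan_rcv(outer, inner_dst_mac, collect_metadata):
    """Decapsulate; keep the outer fields only in external mode."""
    skb = Skb(inner_dst_mac=inner_dst_mac)
    if collect_metadata:
        # Stands in for udp_tun_rx_dst() + skb_dst_set().
        skb.tun_info = outer
    return skb


def fl_classify(skb, enc_key, dst_mac):
    """True only if both the tunnel keys and the inner dst_mac match."""
    return skb.tun_info == enc_key and skb.inner_dst_mac == dst_mac


outer = TunnelInfo("192.168.1.14", "192.168.1.13", 4, 4789)
mac = "02:25:d0:13:01:02"

external = vxlan_rcv(outer, mac, collect_metadata=True)
legacy = vxlan_rcv(outer, mac, collect_metadata=False)

print(fl_classify(external, outer, mac))  # True: metadata attached
print(fl_classify(legacy, outer, mac))    # False: tunnel info was lost
```

In the legacy case the inner frame is identical, but with no tunnel metadata on the packet the enc_* keys of the filter have nothing to match against.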

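However, if the vxlan dev is created the old way and the filter is offloaded directly, traffic still passes, because it is then handled entirely in hardware.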
