ifb vs. tc police: a performance comparison for receive-direction rate limiting

References:

[1] https://www.cnblogs.com/xingmuxin/p/10826703.html

[2] https://blog.csdn.net/dog250/article/details/40680765 (dog250's analysis of ifb)

1. Using ifb to rate-limit a container's egress traffic

Create an ifb device, mirror the veth's traffic onto it, and cap the send-side traffic at 10 Gbit/s:

modprobe ifb    # the numifbs module parameter sets how many ifbX devices are created (default 2)

ifconfig ifb1 up

# tc qdisc add dev cor3cb4dbf9d2f ingress
# tc filter add dev cor3cb4dbf9d2f parent ffff: protocol ip u32 match u32 0 0 action mirred egress redirect dev ifb1

# tc qdisc add dev ifb1 root handle 1: htb default 10

# tc class add dev ifb1 parent 1: classid 1:1 htb rate 10000mbit
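For reference, here is a consolidated sketch of the whole setup. The explicit numifbs and the 1:10 leaf class are additions for completeness, not part of the capture above: with only the 1:1 class, traffic falling into the non-existent default class 1:10 would bypass the shaper.

# consolidated setup sketch; cor3cb4dbf9d2f is the container's veth as above
modprobe ifb numifbs=2                  # creates ifb0 and ifb1
ip link set ifb1 up

# redirect all ingress traffic of the veth to ifb1
tc qdisc add dev cor3cb4dbf9d2f ingress
tc filter add dev cor3cb4dbf9d2f parent ffff: protocol ip u32 \
    match u32 0 0 action mirred egress redirect dev ifb1

# shape on ifb1 with HTB; "default 10" sends all traffic through the 1:10 leaf
tc qdisc add dev ifb1 root handle 1: htb default 10
tc class add dev ifb1 parent 1: classid 1:1 htb rate 10000mbit
tc class add dev ifb1 parent 1:1 classid 1:10 htb rate 10000mbit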

 

# tc -s qdisc ls dev cor3cb4dbf9d2f
qdisc noqueue 0: root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc ingress ffff: parent ffff:fff1 ----------------
Sent 3828667309684 bytes 2555852818 pkt (dropped 1751977805, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

cor3cb4dbf9d2f is the veth device of one container on the host (bond0 is the host's physical NIC). Even though the limit was set to 10 Gbit/s, mirroring the veth's traffic to the ifb device caused heavy packet loss (see the dropped counter above) and a sharp performance degradation.

In a test sending UDP packets with iperf from inside the container, the container could natively reach about 1.1 Mpps; with ifb in the path, it topped out at roughly 400 Kpps.
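The test traffic can be generated along these lines; the server address and payload size here are illustrative, not from the original test:

# inside the container: flood small UDP packets to maximize PPS
iperf -c 10.0.0.2 -u -b 10000M -l 64 -t 60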

2. Using tc ingress policing to rate-limit a container's egress traffic

# tc qdisc add dev cor3cb4dbf9d2f ingress

# tc filter add dev cor3cb4dbf9d2f parent ffff: protocol all prio 49 basic police rate 20000mbit burst 1mb mtu 65535 drop

# tc -s qdisc ls dev cor3cb4dbf9d2f
qdisc noqueue 0: root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc ingress ffff: parent ffff:fff1 ----------------
Sent 60489927666 bytes 40380462 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

The stats show that adding the ingress policing rule has almost no impact on PPS.
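The per-filter police statistics can be checked as well; a sketch of the command (output omitted):

tc -s filter show dev cor3cb4dbf9d2f parent ffff: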

3. Why is the performance gap so large?

3.1 A look at tc ingress policing first

On Linux, tc ingress policing is implemented by the act_police module (net/sched/act_police.c), which provides token-bucket rate limiting.

man tc-police:

The police action allows to limit bandwidth of traffic matched by the filter it is attached to. Basically there are two different algorithms available to measure the packet rate: The first one uses an internal dual token bucket and is configured using the rate, burst, mtu, peakrate, overhead and linklayer parameters. The second one uses an in-kernel sampling mechanism. It can be fine-tuned using the estimator filter parameter.
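As a concrete illustration of the dual-bucket parameters (eth0 and the rates here are illustrative, not from the tests above): rate/burst form the long-term bucket, while peakrate/mtu bound short bursts:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
    police rate 1gbit burst 1mb peakrate 2gbit mtu 65535 drop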

# stap -L 'module("act_police").function("tcf_act_police")'

module("act_police").function("tcf_act_police@net/sched/act_police.c:201") $skb:struct sk_buff* $a:struct tc_action const* $res:struct tcf_result*

 

The stap script (ingress.stp):

probe module("act_police").function("tcf_act_police@net/sched/act_police.c:201")
{
        print_regs()
        print_backtrace()
}

 

Note: running stap with --all-modules hit a "dwfl_module_relocate_address no matching address range" error, hence the explicit -d flags below.

# stap -d kernel -d cls_basic ./ingress.stp -v
Pass 1: parsed user script and 478 library scripts using 252716virt/50228res/3484shr/46764data kb, in 380usr/20sys/399real ms.
Pass 2: analyzed script: 1 probe, 1 function, 0 embeds, 0 globals using 254592virt/53004res/4420shr/48640data kb, in 20usr/10sys/31real ms.
Pass 3: using cached /root/.systemtap/cache/02/stap_0230fe404204c0df4d44d24dae88f0de_1344.c
Pass 4: using cached /root/.systemtap/cache/02/stap_0230fe404204c0df4d44d24dae88f0de_1344.ko
Pass 5: starting run.
 0xffffffffc09bb000 : tcf_act_police+0x0/0x280 [act_police]
 0xffffffff9486fd75 : tcf_action_exec+0xa5/0x140 [kernel]
 0xffffffffc09b6211 : basic_classify+0x71/0xd0 [cls_basic]
 0xffffffff9486d1bb : tcf_classify+0x7b/0x140 [kernel]
 0xffffffff9483b41e : __netif_receive_skb_core+0x5ce/0xa10 [kernel]
 0xffffffff9483b878 : __netif_receive_skb+0x18/0x60 [kernel]
 0xffffffff9483c83e : process_backlog+0xae/0x180 [kernel]
 0xffffffff9483bf1f : net_rx_action+0x26f/0x390 [kernel]
 0xffffffff942a2155 : __do_softirq+0xf5/0x280 [kernel]
 0xffffffff9497a32c : call_softirq+0x1c/0x30 [kernel]
 0xffffffff9422e675 : do_softirq+0x65/0xa0 [kernel]

The receive-side policing path is: __netif_receive_skb_core -> sch_handle_ingress -> tcf_classify -> basic_classify -> tcf_action_exec -> tcf_act_police

tcf_act_police runs each packet through the token bucket, so short burst spikes are allowed; once the configured rate is exceeded, packets are dropped and the drop counter rises. The performance cost comes mainly from (a measurement sketch follows the list):

1) the extra tc processing logic (classification);

2) the token-bucket accounting.
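To watch this path under load, a SystemTap one-liner in the same spirit as the scripts above can count how often tcf_act_police fires; a rough sketch, reusing the probe point discovered earlier:

stap -e 'global n
probe module("act_police").function("tcf_act_police") { n++ }
probe timer.s(1) { printf("policed pkts/s: %d\n", n); n = 0 }'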

3.2 Now the ifb path

ifb is normally used together with tc ingress: the tc filter rule mirreds/redirects packets to the ifb device, and the rate-limiting rules are then attached on the ifb device. From man tc-mirred:

The mirred action allows packet mirroring (copying) or redirecting (stealing) the packet it receives. Mirroring is what is sometimes referred to as Switch Port Analyzer (SPAN) and is commonly used to analyze and/or debug flows

The in-kernel flow:

#  stap -L 'module("act_mirred").function("tcf_mirred")'

module("act_mirred").function("tcf_mirred@net/sched/act_mirred.c:159") $skb:struct sk_buff* $a:struct tc_action const* $res:struct tcf_result* $m:struct tcf_mirred* $err:int

 

mirr.stp

probe module("act_mirred").function("tcf_mirred@net/sched/act_mirred.c:159")

{

        print_backtrace()

        println("")

}

 0xffffffffc09c5640 : tcf_mirred+0x0/0x2d0 [act_mirred]
 0xffffffff9486fd75 : tcf_action_exec+0xa5/0x140 [kernel]
 0xffffffffc0998d6f [cls_u32]
 0xffffffff9486d1bb : tcf_classify+0x7b/0x140 [kernel]
 0xffffffff9483b41e : __netif_receive_skb_core+0x5ce/0xa10 [kernel]
 0xffffffff9483b878 : __netif_receive_skb+0x18/0x60 [kernel]
 0xffffffff9483c83e : process_backlog+0xae/0x180 [kernel]
 0xffffffff9483bf1f : net_rx_action+0x26f/0x390 [kernel]
 0xffffffff942a2155 : __do_softirq+0xf5/0x280 [kernel]

The receive-side flow is: __netif_receive_skb_core -> sch_handle_ingress -> tcf_classify -> u32 classification (cls_u32, per the backtrace above) -> tcf_action_exec -> tcf_mirred, which hands the packet to the ifb device and enters ifb's xmit path.

The ifb driver emulates a virtual NIC that can be thought of as a NIC with only TC filtering capability. "Only filtering" because it never changes a packet's direction: a packet headed out that gets redirected to ifb still leaves through the original NIC after ifb's TC processing, and a packet received on a NIC and redirected to ifb resumes reception processing on the original NIC after ifb's TC processing. Either way, once redirected to ifb, the packet goes through a dev_queue_xmit operation on the ifb virtual device. See [2].

Taking the ingress-direction ifb (the left-hand side of the figure in [2]) as an example: a packet received on NIC A matches the tc mirred rule and is mirrored/redirected to the ifb device; after a series of processing steps on ifb, the traffic is finally "returned" to NIC A. At the code level:

#  stap -L 'module("ifb").function("ifb_xmit")'

module("ifb").function("ifb_xmit@drivers/net/ifb.c:193") $skb:struct sk_buff* $dev:struct net_device*

 

# cat ifb.stp

probe module("ifb").function("ifb_xmit@drivers/net/ifb.c:193")

{

        print_backtrace()

        println("Dev Index:%d ", $dev->ifindex)

}

 

# stap -d kernel -d ifb -d act_mirred ./ifb.stp -v
 0xffffffffc09c0490 : ifb_xmit+0x0/0x100 [ifb]
 0xffffffff9483ace6 : dev_hard_start_xmit+0x246/0x3b0 [kernel]
 0xffffffff94867d6a : sch_direct_xmit+0x11a/0x250 [kernel]
 0xffffffff94867f2e : __qdisc_run+0x8e/0x360 [kernel]
 0xffffffff9483d9e8 : __dev_queue_xmit+0x218/0x650 [kernel]
 0xffffffff9483de30 : dev_queue_xmit+0x10/0x20 [kernel]
 0xffffffffc09c5898 : tcf_mirred+0x258/0x2d0 [act_mirred]
 0xffffffff9486fd75 : tcf_action_exec+0xa5/0x140 [kernel]
 0xffffffffc0998d6f [cls_u32]
 0xffffffff9486d1bb : tcf_classify+0x7b/0x140 [kernel]
 0xffffffff9483b41e : __netif_receive_skb_core+0x5ce/0xa10 [kernel]
 0xffffffff9483b878 : __netif_receive_skb+0x18/0x60 [kernel]
 0xffffffff9483c83e : process_backlog+0xae/0x180 [kernel]
 0xffffffff9483bf1f : net_rx_action+0x26f/0x390 [kernel]
 0xffffffff942a2155 : __do_softirq+0xf5/0x280 [kernel]
 0xffffffff942a2318 : run_ksoftirqd+0x38/0x50 [kernel]
 0xffffffff942cb814 : smpboot_thread_fn+0x144/0x1a0 [kernel]
 0xffffffff942c2e81 : kthread+0xd1/0xe0 [kernel]
 0xffffffff94976c1d : ret_from_fork_nospec_begin+0x7/0x21 [kernel]
Dev Index: 183

 

Step 1: reception on NIC A

__netif_receive_skb_core -> sch_handle_ingress -> tcf_classify -> u32 classification -> tcf_action_exec -> tcf_mirred -> ifb_xmit

 

Step 2: tasklet scheduling on the ifb device

static netdev_tx_t ifb_xmit(struct sk_buff *skb, struct net_device *dev)
{
    struct ifb_private *dp = netdev_priv(dev);

    u64_stats_update_begin(&dp->rsync);
    dp->rx_packets++;
    dp->rx_bytes += skb->len;
    u64_stats_update_end(&dp->rsync);

    if (G_TC_FROM(skb->tc_verd) == AT_STACK || !skb->skb_iif) {
        dev_kfree_skb(skb);
        dev->stats.rx_dropped++;
        return NETDEV_TX_OK;
    }

    if (skb_queue_len(&dp->rq) >= dev->tx_queue_len) {
        netif_stop_queue(dev);
    }

    /* queue the packet on ifb's rq and let the tasklet process it */
    __skb_queue_tail(&dp->rq, skb);
    if (!dp->tasklet_pending) {
        dp->tasklet_pending = 1;
        tasklet_schedule(&dp->ifb_tasklet);
    }

    return NETDEV_TX_OK;
}

 

static void ri_tasklet(unsigned long dev)
{
    struct net_device *_dev = (struct net_device *)dev;
    struct ifb_private *dp = netdev_priv(_dev);
    struct netdev_queue *txq;
    struct sk_buff *skb;

    txq = netdev_get_tx_queue(_dev, 0);
    if ((skb = skb_peek(&dp->tq)) == NULL) {
        if (__netif_tx_trylock(txq)) {
            skb_queue_splice_tail_init(&dp->rq, &dp->tq);
            __netif_tx_unlock(txq);
        } else {
            /* reschedule */
            goto resched;
        }
    }

    while ((skb = __skb_dequeue(&dp->tq)) != NULL) {
        u32 from = G_TC_FROM(skb->tc_verd);

        skb_reset_tc(skb);
        skb->tc_verd = SET_TC_NCLS(skb->tc_verd);   /* note: after ifb has processed the packet, TC_NCLS is set */

        u64_stats_update_begin(&dp->tsync);
        dp->tx_packets++;
        dp->tx_bytes += skb->len;
        u64_stats_update_end(&dp->tsync);

        rcu_read_lock();
        skb->dev = dev_get_by_index_rcu(dev_net(_dev), skb->skb_iif);
        if (!skb->dev) {
            rcu_read_unlock();
            dev_kfree_skb(skb);
            _dev->stats.tx_dropped++;
            if (skb_queue_len(&dp->tq) != 0)
                goto resched;
            break;
        }
        rcu_read_unlock();
        skb->skb_iif = _dev->ifindex;

        if (from & AT_EGRESS) {
            dev_queue_xmit(skb);
        } else if (from & AT_INGRESS) {
            skb_pull_rcsum(skb, skb->mac_len);
            netif_receive_skb(skb);
        } else
            BUG();
    }

    if (__netif_tx_trylock(txq)) {
        if ((skb = skb_peek(&dp->rq)) == NULL) {
            dp->tasklet_pending = 0;
            if (netif_queue_stopped(_dev))
                netif_wake_queue(_dev);
        } else {
            __netif_tx_unlock(txq);
            goto resched;
        }
        __netif_tx_unlock(txq);
    } else {
resched:
        dp->tasklet_pending = 1;
        tasklet_schedule(&dp->ifb_tasklet);
    }
}

 

Step 3: back on NIC A

__netif_receive_skb_core -> skb_skip_tc_classify

Look at skb_skip_tc_classify:

static inline bool skb_skip_tc_classify(struct sk_buff *skb)
{
#ifdef CONFIG_NET_CLS_ACT
    if (skb->tc_verd & TC_NCLS) {
        /* TC_NCLS was set by ifb, so when the packet comes back to
         * NIC A, this round of tc classification is skipped */
        skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
        return true;
    }
#endif
    return false;
}

That really is quite a detour.

Putting the ifb flow above together, its performance overhead comes from the following places (see the perf sketch after this list):

1) the skb_clone(skb, GFP_ATOMIC) in tcf_mirred, which also shows up as a clear hotspot in perf;

2) enqueueing and dequeueing on the ifb device, plus the tasklet scheduling;

3) the subsequent tc rule processing (shaping) on the ifb device;

4) re-enqueueing into the receive queue and re-triggering __netif_receive_skb_core.
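Those hotspots can be confirmed with perf; a sketch (the exact symbol names depend on the kernel build):

perf record -a -g -- sleep 10
perf report --stdio | grep -E 'tcf_mirred|skb_clone|ifb_xmit|ri_tasklet'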

4. Conclusion

Based on the tests above, purely in terms of receive-direction rate-limiting performance, the tc police approach outperforms the ifb approach by a wide margin.
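For completeness, switching a veth from the ifb scheme to plain ingress policing can be done roughly like this (a sketch reusing the device names from above):

# remove the mirred redirect and the HTB on ifb1, then police directly
tc filter del dev cor3cb4dbf9d2f parent ffff:
tc qdisc del dev ifb1 root
tc filter add dev cor3cb4dbf9d2f parent ffff: protocol all prio 49 \
    basic police rate 20000mbit burst 1mb mtu 65535 drop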

=================================================================================

All of the above is my personal understanding; corrections and discussion are welcome. Shawn

 
