switchdev QoS

overview:

https://github.com/Mellanox/mlxsw/wiki/Quality-of-Service

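After a packet enters the switch, it is assigned a switch priority (SP, 0-7). The SP can be derived from either the packet's 802.1p priority or the DSCP field of its IP header. With 802.1p, the mapping between priority and SP is one-to-one. With DSCP, the mapping is configured with the following commands: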

lldptool -T -i sw1p5 -V APP    app=3,5,24  # add: map DSCP 24 to SP 3 (selector 5 = DSCP)
lldptool -T -i sw1p5 -V APP -d app=3,5,24  # delete the mapping
lldptool -t -i sw1p5 -V APP -c app         # show the configured APP entries
Once the first APP rule is added to a port, the port switches to DSCP-based mapping. Without any APP rule it uses the 802.1p mapping by default.
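The selection between the two mappings can be sketched as follows. This is only an illustrative model, not mlxsw code: `classify_switch_priority` and `dscp_to_sp` are hypothetical names, and the handling of unmapped DSCP values, non-IP packets and untagged packets (all sent to SP 0 here) is an assumption that the notes above do not confirm.

```python
from typing import Dict, Optional


def classify_switch_priority(pcp: Optional[int],
                             dscp: Optional[int],
                             dscp_to_sp: Dict[int, int]) -> int:
    """Return the switch priority (0-7) for an incoming packet.

    dscp_to_sp mirrors the port's APP rules, e.g. {24: 3} for app=3,5,24.
    """
    if dscp_to_sp:
        # At least one APP rule exists, so the port uses DSCP.
        # ASSUMPTION: unmapped DSCP values and non-IP packets get SP 0.
        if dscp is None:
            return 0
        return dscp_to_sp.get(dscp, 0)
    # No APP rules: the 802.1p priority maps one-to-one to the SP.
    # ASSUMPTION: untagged packets get SP 0.
    return pcp if pcp is not None else 0
```

With no APP rules a packet with 802.1p priority 5 gets SP 5. After `app=3,5,24` is added, the same packet carrying DSCP 24 gets SP 3.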

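The packet is then placed in the port's headroom buffer according to its SP. The headroom buffer (also called the PG buffer, for priority group) holds the port's incoming packets while they go through the switch pipeline, which is presumably where the egress port is looked up. It also holds packets of lossless flows that are not admitted to the shared buffer. The SP-to-PG-buffer mapping is set with the lldptool ETS up2tc option: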

lldptool -T -i sw1p5 -V ETS-CFG up2tc=0:0,1:1,2:2,3:3,4:4,5:5,6:6,7:7
This command creates the SP-to-TC mapping.
It also creates the SP-to-PG-buffer mapping.
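As a rough illustration of how one up2tc table serves both purposes, the sketch below parses an `up2tc=` argument and looks up the ingress PG on the ingress port's table and the egress TC on the egress port's table. The function names are hypothetical, and the second table in the usage note is an invented example rather than a recommended configuration.

```python
from typing import Dict, Tuple


def parse_up2tc(arg: str) -> Dict[int, int]:
    """Parse an argument such as 'up2tc=0:0,1:1' into {priority: tc}."""
    _, _, body = arg.partition("=")
    mapping = {}
    for pair in body.split(","):
        prio, tc = pair.split(":")
        mapping[int(prio)] = int(tc)
    return mapping


def ingress_pg_and_egress_tc(priority: int,
                             ingress_up2tc: Dict[int, int],
                             egress_up2tc: Dict[int, int]) -> Tuple[int, int]:
    """Look up the PG on the ingress port and the TC on the egress port."""
    return ingress_up2tc[priority], egress_up2tc[priority]
```

For example, with the identity mapping on the ingress port and a hypothetical `up2tc=0:0,1:0,2:0,3:0,4:1,5:1,6:1,7:1` on the egress port, priority 5 lands in PG 5 on ingress and TC 1 on egress.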

lldptool -T -i sw1p5 -V ETS-CFG tsa=0:ets,1:ets,2:ets,3:ets,4:ets,5:ets,6:ets,7:ets tcbw=12,12,12,12,13,13,13,13  # the tcbw values must sum to 100
lldptool -T -i sw1p5 -V ETS-CFG tsa=0:strict,1:strict,2:strict,3:strict,4:strict,5:strict,6:strict,7:strict 
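Because the tcbw percentages must add up to 100, it can help to validate them before building the command line. The helper below is a sketch under that one rule; `ets_cfg_args` is a hypothetical name, it only formats strings and does not invoke lldptool, and it accepts only the two TSA values used in these notes.

```python
from typing import Dict, List


def ets_cfg_args(tsa: Dict[int, str], tcbw: List[int]) -> List[str]:
    """Build the tsa= and tcbw= arguments for an ETS-CFG command."""
    if sorted(tsa) != list(range(8)):
        raise ValueError("tsa must name every TC 0-7")
    for alg in tsa.values():
        # Only the two algorithms shown in these notes are accepted here.
        if alg not in ("ets", "strict"):
            raise ValueError(f"unsupported TSA: {alg}")
    if len(tcbw) != 8:
        raise ValueError("tcbw needs one value per TC")
    if sum(tcbw) != 100:
        raise ValueError("tcbw values must sum to 100")
    tsa_arg = "tsa=" + ",".join(f"{tc}:{tsa[tc]}" for tc in range(8))
    tcbw_arg = "tcbw=" + ",".join(str(bw) for bw in tcbw)
    return [tsa_arg, tcbw_arg]
```

Calling it with all TCs set to `ets` and `[12, 12, 12, 12, 13, 13, 13, 13]` reproduces the arguments of the ETS command above, while eight values of 12 (sum 96) raise `ValueError`.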

devlink sb
# Sets the pool and quota used by each TC of each port in the egress direction
# Sets the pool and quota used by each TC of each port in the ingress direction


Once the switch pipeline has finished processing the packet, its ingress port, switch priority (SP), egress port and TC are known. Based on this information the packet is classified into an ingress pool and an egress pool of the shared buffer. There is really only one shared buffer: the ingress and egress pools are virtual containers meant to help you formulate the admission rules. When a packet matches a pool, it is counted against that pool's usage. For example, binding a TC to pool 4 with devlink simply means that the packet is counted as part of pool 4. Each direction can have up to 4 pools. Before the packet enters the shared buffer, the shared buffer quotas associated with it are checked to decide whether it is admitted (see devlink sb).

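The packet stays in the shared buffer until the egress port transmits it. Packets are queued by TC, and the queues are scheduled according to each TC's TSA (transmission selection algorithm), which is either strict priority or ETS. The SP-to-TC mapping and the TSA of each TC are configured with the lldptool ETS-CFG TLV.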

> Thank you, Ido.
>
> So, according to your explanation, my understanding is:
>
> The packet has only one TC (I previously thought the packet had different ingress and egress TCs),
> and the TC is determined by the egress port's up2tc setting?

Ingress TC = PG.

When a packet arrives, it's classified to a PG buffer based on its
802.1p priority and up2tc mapping you configured on the ingress port.

The packet then goes through the switch's pipeline which determines its
egress port. The egress TC is determined based on the packet's 802.1p
priority and the egress port's up2tc mapping.

You now have the following information about the packet: Ingress Port,
Ingress PG, Egress Port, Egress TC, which the switch uses to check
admission for the shared buffer, where the packet is stored prior to
transmission.

> Once out of the switch's pipeline, the packet's TC is known; assume it is TC1.
>
> If the TC1 packet passes the admission rules below, it will be sent to the shared buffer.
>     Ingress{Port}.Usage < Thres && Ingress{Port,PG}.Usage < Thres && Egress{Port}.Usage < Thres && Egress{Port,TC}.Usage < Thres
>
> We assume the packet is received from Port1 and egress port2.
> And the mapping between port TC to pool is like this:
>     devlink sb tc bind set Port1 tc 1 type ingress pool 0 th 9

Packet will be mapped to ingress TC (PG) 1 according to up2tc
configuration on Port1.

>     devlink sb tc bind set Port2 tc 1 type egress pool 4 th 9

Packet will be mapped to egress TC 1 according to up2tc configuration of
Port2.

> Then the packet will be counted as part of pool 0 (because it is received from Port1 and its TC is 1, so it maps to pool 0 according to the above setting),
> and it will also be counted as part of pool 4 (because it will egress Port2 and its TC is 1, so it maps to pool 4 according to the above setting).
>
> Is this right ?

Yes. When a packet is admitted to the shared buffer it increments four
quotas:
Ingress{Port}, Ingress{Port, PG}, Egress{Port}, Egress{Port, TC}

Please let me know if further clarifications are required.

If a packet belongs to a lossless flow (a flow whose priority has PFC enabled) and is not admitted to the shared buffer, it is stored in the headroom.
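The admission rule quoted in the exchange above, together with the lossless fallback, can be modelled roughly as follows. This is a simplified sketch, not the hardware algorithm: `SharedBuffer` is a hypothetical class, thresholds are plain byte counts (the devlink `th` value is not interpreted here), and dropping a lossy packet that fails admission is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class SharedBuffer:
    # Threshold per quota key, e.g. ("ingress", "Port1", 1) -> bytes.
    thresholds: Dict[Tuple, int]
    usage: Dict[Tuple, int] = field(default_factory=lambda: defaultdict(int))

    def admit(self, in_port: str, pg: int, out_port: str, tc: int,
              size: int, lossless: bool = False) -> str:
        # The four quotas: Ingress{Port}, Ingress{Port,PG},
        # Egress{Port}, Egress{Port,TC}.
        keys = [("ingress", in_port), ("ingress", in_port, pg),
                ("egress", out_port), ("egress", out_port, tc)]
        if all(self.usage[k] < self.thresholds[k] for k in keys):
            for k in keys:
                self.usage[k] += size  # admission increments all four
            return "shared"
        # Not admitted: lossless traffic stays in the headroom.
        # ASSUMPTION: lossy traffic is dropped.
        return "headroom" if lossless else "drop"
```

With every threshold set to 100 bytes, a first 100-byte packet from Port1/PG 1 to Port2/TC 1 is admitted and raises all four counters to 100. A second identical packet fails the check and is either dropped or, if its priority has PFC enabled, kept in the headroom.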

ETS:

The transmit path of a network port is modeled as a set of queues called traffic classes which are numbered 0 through N-1, where N is in the range 1 to 8. The user priorities 0-7 are mapped to the set of traffic classes. Further details and definition of the default priority to traffic class mappings are provided in the IEEE Standard 802.1Q-2011.

A transmission selection algorithm is used to select which traffic class is chosen next to dequeue a frame and transmit to the LAN. The default transmission selection algorithm is the Strict Priority algorithm. This algorithm always selects the highest numbered traffic class which has frames to transmit first before a lower numbered traffic class is selected.
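A minimal sketch of the Strict Priority selection described above, assuming each traffic class is a simple FIFO queue. `strict_priority_dequeue` is a hypothetical name used only for illustration.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


def strict_priority_dequeue(
        queues: Dict[int, Deque[str]]) -> Optional[Tuple[int, str]]:
    """Pop the next frame from the highest numbered non-empty TC."""
    for tc in sorted(queues, reverse=True):
        if queues[tc]:
            return tc, queues[tc].popleft()
    return None  # nothing to transmit
```

As long as the higher numbered queue keeps receiving frames, the lower numbered one is never served, which is the blocking behaviour that motivates ETS.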

Since the Strict Priority algorithm could allow a traffic flow on a higher numbered traffic class to block a lower numbered traffic class from getting a chance to transmit, another traffic selection algorithm has been defined for DCB called the Enhanced Transmission Selection (ETS) algorithm. ETS works by assigning a percentage of available bandwidth to traffic classes. Available bandwidth is defined as the amount of bandwidth left after higher priority transmission algorithms (like Strict Priority) have executed. The bandwidth percentage allocated to an ETS traffic class is the guaranteed amount of available bandwidth which will be made available to that traffic class. If an ETS traffic class does not use all of the bandwidth allocated to it, then other ETS traffic classes may be able to exceed their bandwidth allocations.

ETS allows multiple traffic flows operating on different traffic classes to each receive their fair share of network bandwidth. Obviously, if the strict priority algorithm is used in combination with the ETS algorithm, then care should be taken to ensure that the traffic flows on the strict priority traffic classes are relatively low volume flows.
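To make the bandwidth arithmetic concrete, the sketch below computes steady-state shares for a mix of strict and ETS traffic classes. It is an idealised fluid model under stated assumptions, not a description of how a real scheduler works: `allocate_bandwidth` is a hypothetical function, demands are offered rates in the same unit as the link rate, and unused ETS bandwidth is redistributed to the other ETS classes in proportion to their percentages.

```python
from typing import Dict


def allocate_bandwidth(link: float, demand: Dict[int, float],
                       tsa: Dict[int, str],
                       tcbw: Dict[int, int]) -> Dict[int, float]:
    """Return the bandwidth each TC receives under strict + ETS."""
    alloc = {tc: 0.0 for tc in demand}
    remaining = float(link)
    # Strict TCs are served first, highest numbered TC first.
    for tc in sorted((t for t in demand if tsa[t] == "strict"),
                     reverse=True):
        alloc[tc] = min(demand[tc], remaining)
        remaining -= alloc[tc]
    # ETS TCs share what is left according to their percentages.
    active = {t for t in demand if tsa[t] == "ets" and demand[t] > 0}
    while active and remaining > 1e-9:
        total_w = sum(tcbw[t] for t in active)
        if total_w == 0:
            break
        budget = remaining
        satisfied = set()
        for t in active:
            need = demand[t] - alloc[t]
            if need <= budget * tcbw[t] / total_w:
                # This TC needs less than its share; the rest is freed.
                alloc[t] += need
                remaining -= need
                satisfied.add(t)
        if not satisfied:
            # Every remaining TC can use its full share.
            for t in active:
                alloc[t] += budget * tcbw[t] / total_w
            remaining = 0.0
            break
        active -= satisfied
    return alloc
```

On a link of 100 units with strict TC 7 offering 10 and ETS TCs 0 and 1 (25% and 75%) each offering 80, TC 7 gets 10 and the remaining 90 is split 22.5 / 67.5. If instead two ETS classes at 50% each offer 10 and 200, the first gets 10 and the second exceeds its allocation and gets 90.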

lldptool Priority-based Flow Control (PFC)

To enable PFC for priorities 1, 2 and 3, run:
$ lldptool -T -i sw1p5 -V PFC enabled=1,2,3

devlink sb

To bind packets originating from a {Port, PG} to an ingress pool, run:
devlink sb tc bind set pci/0000:03:00.0/1 tc 0 type ingress pool 0 th 9
Similarly for egress, to bind packets directed to a {Port, TC} to an egress pool, run:
devlink sb tc bind set sw1p17 tc 0 type egress pool 4 th 9
