Network Acceleration Techniques

Latest recommended article published on 2024-04-18 16:51:05

oneslide

Category: Computer Networks

Article link: https://blog.csdn.net/qq_33745102/article/details/103699802

This article introduces several hardware network acceleration techniques:

RSS

Receive Side Scaling (RSS) uses multiple cores to speed up the processing of network packets.

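Figure: CPU load before and after enabling RSS

Without RSS, a single core handles all incoming packets and, once it is done, uses an interrupt to notify the CPU running the application that needs them. The processing capacity of that one core then becomes the bottleneck for all network processing, as the figure above shows.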

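To reduce lock contention, each core is responsible for a few queues. RSS extracts fields from the packet header (for example, the source and destination addresses), computes a 32-bit hash over them with the Toeplitz algorithm, takes the least significant bits (LSBs) of the hash as an index, and uses the RSS indirection table to forward the packet to one of the queues.
The figure below illustrates this:

[Figure]

Packets for the same destination are normally forwarded to the same queue and are therefore handled by the same core. This core affinity, combined with parallel processing across queues, speeds up packet processing.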
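The queue-selection step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a driver implementation: the 40-byte key is only an example value, the 128-entry indirection table and the four-queue layout are made up for the sketch, and the function names are hypothetical.

```python
import socket
import struct

# Example 40-byte RSS key (an assumption; real NICs are programmed by the driver).
RSS_KEY = bytes([
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
])

# Hypothetical indirection table: 128 entries spread over 4 receive queues.
INDIRECTION_TABLE = [i % 4 for i in range(128)]


def toeplitz_hash(data: bytes, key: bytes = RSS_KEY) -> int:
    """Return the 32-bit Toeplitz hash of `data` under `key`."""
    key_bits = len(key) * 8
    if len(data) * 8 + 32 > key_bits:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        # Walk the input bit by bit, most significant bit first.
        if data[i // 8] & (0x80 >> (i % 8)):
            # XOR in the 32-bit window of the key that starts at bit i.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def flow_tuple(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Concatenate the IPv4/TCP header fields in network byte order."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))


def select_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Pick a receive queue: hash, take the low bits, look up the table."""
    h = toeplitz_hash(flow_tuple(src_ip, dst_ip, src_port, dst_port))
    index = h & (len(INDIRECTION_TABLE) - 1)  # LSBs; table size is a power of two
    return INDIRECTION_TABLE[index]


if __name__ == "__main__":
    print(select_queue("10.0.0.1", "10.0.0.2", 40000, 80))
```

Because the hash depends only on the header fields, every packet of a given flow yields the same index and therefore the same queue, which is what gives RSS its core affinity.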

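Modern NICs process packets in batches: an interrupt is raised only after a certain number of packets has accumulated or a certain amount of time has passed. This technique is called interrupt moderation.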
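To make the effect of interrupt moderation concrete, here is a small simulation. It is only a sketch: the function name, the two thresholds (`max_packets`, `max_delay_us`), and the arrival pattern are assumptions for illustration, and real NICs implement the timer in hardware rather than checking it on packet arrival.

```python
def count_interrupts(arrival_times_us, max_packets, max_delay_us):
    """Count interrupts when packets are coalesced by count or by elapsed time.

    arrival_times_us: packet arrival times in microseconds, in ascending order.
    """
    interrupts = 0
    pending = 0                 # packets received but not yet signalled
    first_pending_time = None   # arrival time of the oldest pending packet
    for t in arrival_times_us:
        # Time threshold: the oldest pending packet has waited long enough.
        if pending and t - first_pending_time >= max_delay_us:
            interrupts += 1
            pending = 0
        if pending == 0:
            first_pending_time = t
        pending += 1
        # Count threshold: enough packets have accumulated.
        if pending >= max_packets:
            interrupts += 1
            pending = 0
    if pending:
        interrupts += 1  # the timer eventually fires for the leftovers
    return interrupts


if __name__ == "__main__":
    arrivals = list(range(100))  # one packet per microsecond
    print(count_interrupts(arrivals, max_packets=1, max_delay_us=50))
    print(count_interrupts(arrivals, max_packets=10, max_delay_us=50))
```

With one interrupt per packet the 100 packets cost 100 interrupts; batching ten packets per interrupt reduces that to 10 in this model, at the price of some added latency.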

Flow Director

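RSS has a limitation: the application that receives the packets does not necessarily run on the same CPU as the core that processed them. Flow Director was introduced to make up for this shortcoming. The idea is similar to session persistence in server design: a flow should be assigned to a specific core, namely the core running the application that consumes that flow.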

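There are two ways to specify which core handles a flow:

Manual mode -- Externally Programmed

EP mode for short. The mapping is specified by software. A system administrator who understands the characteristics of the traffic may prefer this mode.

Automatic mode -- Application Targeted Routing

ATR mode for short. It samples packets to infer which CPU a flow's packets are likely coming from. This mode is typically used when the characteristics of the traffic cannot be predicted, and it is also the default mode.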

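To build the mapping between a flow (which here presumably means a connection) and a CPU number, information can be gathered from several sources, for example outgoing packets.

In ATR mode, sampling is normally used to populate the table that maps flows to CPU numbers (the Perfect-Match Table). A flow is represented by a hash computed from the packet header. For every incoming packet, the header is extracted and hashed; if the hash matches an entry in the table, the packet is sent to the CPU recorded in that entry; otherwise, the NIC falls back to ordinary RSS.

Identifying flows requires a fairly large table; Intel's mapping table holds 8K entries.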
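The ATR lookup described above can be modelled roughly as follows. This is a software sketch under several assumptions, not Intel's hardware logic: `zlib.crc32` stands in for the NIC's hash, the sampling interval of 20 packets and the oldest-entry eviction policy are invented for the example, and all class and function names are hypothetical. Only the 8K table size comes from the text.

```python
import zlib
from collections import OrderedDict

NUM_QUEUES = 4  # hypothetical number of RSS queues used for the fallback


def flow_hash(ip_a, port_a, ip_b, port_b):
    """Direction-independent flow hash (crc32 is a stand-in for the NIC hash)."""
    endpoints = sorted([(ip_a, port_a), (ip_b, port_b)])
    return zlib.crc32(repr(endpoints).encode())


class FlowDirectorSketch:
    def __init__(self, capacity=8 * 1024, sample_every=20):
        self.table = OrderedDict()        # flow hash -> CPU number
        self.capacity = capacity          # 8K entries, as in the text
        self.sample_every = sample_every  # assumed sampling interval
        self._tx_seen = 0

    def on_transmit(self, src_ip, src_port, dst_ip, dst_port, cpu):
        """Sample outgoing packets to learn which CPU owns each flow."""
        self._tx_seen += 1
        if self._tx_seen % self.sample_every != 0:
            return
        key = flow_hash(src_ip, src_port, dst_ip, dst_port)
        self.table[key] = cpu
        self.table.move_to_end(key)
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)  # evict the oldest entry (assumption)

    def on_receive(self, src_ip, src_port, dst_ip, dst_port):
        """Return the CPU for an incoming packet, falling back to RSS-style hashing."""
        key = flow_hash(src_ip, src_port, dst_ip, dst_port)
        if key in self.table:
            return self.table[key]
        return key % NUM_QUEUES  # no match: behave like plain RSS


if __name__ == "__main__":
    fd = FlowDirectorSketch(sample_every=1)
    fd.on_transmit("10.0.0.1", 5000, "10.0.0.2", 80, cpu=3)
    print(fd.on_receive("10.0.0.2", 80, "10.0.0.1", 5000))
```

The reply packets of a sampled connection are steered to the CPU that sent the request, while unknown flows still get a deterministic queue from the fallback hash.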

TSO

When a system needs to send large chunks of data out over a computer network, the chunks first need breaking down into smaller segments that can pass through all the network elements like routers and switches between the source and destination computers. This process is referred to as segmentation. Often the TCP protocol in the host computer performs this segmentation. Offloading this work to the NIC is called TCP segmentation offload (TSO).
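As a rough illustration of the segmentation work described above, the sketch below splits a large payload into segments no larger than a given maximum segment size (MSS). The function name and the MSS value of 1460 bytes are assumptions for the example; a real TCP stack or NIC also builds headers and checksums for every segment, which is omitted here.

```python
def segment_payload(payload: bytes, mss: int, start_seq: int = 0):
    """Split `payload` into (sequence_number, chunk) pairs of at most `mss` bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [
        ((start_seq + offset) % 2**32, payload[offset:offset + mss])  # 32-bit sequence space
        for offset in range(0, len(payload), mss)
    ]


if __name__ == "__main__":
    data = bytes(64 * 1024)                  # one 64 KiB send from the application
    segments = segment_payload(data, mss=1460)
    print(len(segments))                     # number of segments on the wire
    print(len(segments[-1][1]))              # size of the final, shorter segment
```

Without offload the host CPU runs this loop (plus header construction) for every large send; with TSO the host hands the NIC one large buffer and the NIC produces the individual segments.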

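TSO is also known as large send offload (LSO), and it requires support from the NIC. The main idea is to move the work of splitting data from the CPU to a TSO-capable NIC. The following command shows whether a NIC supports TSO: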

[root@localhost ~]# ethtool --show-offload ens33
Features for ens33:
rx-checksumming: off
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: off [fixed]
	tx-tcp-mangleid-segmentation: off
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
busy-poll: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
rx-gro-hw: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
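Output like the listing above is easy to check by script. The following sketch parses `ethtool --show-offload` text into a dictionary; the helper names are hypothetical, and the parser assumes the `name: on|off [fixed]` line layout seen in the listing, which may differ between ethtool versions.

```python
import subprocess


def parse_offload_features(output: str) -> dict:
    """Map each feature name to its state and whether the state is fixed."""
    features = {}
    for line in output.splitlines():
        name, sep, value = line.strip().partition(":")
        parts = value.split()
        if not sep or not parts:
            continue  # skips blank lines and the "Features for <iface>:" header
        features[name] = {"enabled": parts[0] == "on", "fixed": "[fixed]" in parts}
    return features


def read_offload_features(interface: str) -> dict:
    """Run ethtool for `interface` (requires ethtool to be installed) and parse it."""
    result = subprocess.run(
        ["ethtool", "--show-offload", interface],
        capture_output=True, text=True, check=True,
    )
    return parse_offload_features(result.stdout)


if __name__ == "__main__":
    sample = (
        "Features for ens33:\n"
        "tcp-segmentation-offload: on\n"
        "\ttx-tcp-segmentation: on\n"
        "\ttx-tcp6-segmentation: off [fixed]\n"
    )
    print(parse_offload_features(sample)["tcp-segmentation-offload"])
```

In the listing above, `tcp-segmentation-offload: on` indicates that TSO is enabled on `ens33`, while entries marked `[fixed]` cannot be changed on that device.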