DPDK: L3 Forwarding and IP Route Selection Algorithms

范桂飓

Last modified 2023-03-22 17:33:54


First published 2023-03-21 01:00:11

Article link: https://blog.csdn.net/Jmilk/article/details/129673463


L3 Forwarding Application

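L3 Forwarding Application (l3fwd) is an IP packet forwarding sample application that implements two route lookup algorithms: LPM (longest prefix match) and EM (exact match).

EM (exact match): the most basic route lookup algorithm. The packet's IP 5-tuple is hashed, and the hash value locates a candidate entry in the route table; the next hop is returned only when the entry's key is identical to the packet's 5-tuple. Because it is hash-based, EM offers higher lookup performance.
LPM (longest prefix match): a CIDR-based route lookup algorithm. Of all IP/netmask entries in the route table that cover the packet's dstIP, the one with the longest prefix supplies the next hop. LPM offers more flexibility.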

Installation and deployment

Deployment topology

                      |----------------|
                      |     l3fwd      |
                      |                |
                      |  eth1   eth2   |
              1.1.1.1 |----|------|----|  2.1.1.1
    52:54:00:5E:8C:DF      |      |       52:54:00:A7:A7:F6
                          /        \
                         /          \
                        /            \
                       /              \
  52:54:00:4a:1f:6d   /                \   52:54:00:53:5a:d2
           1.1.1.2   |                  |  2.1.1.2
            |--------|------|   |-------|-------|
            |      eth1     |   |     eth2      |
            |               |   |               |
            |    server0    |   |    server1    |
            |---------------|   |---------------|

Building and running l3fwd

Build:

$ cat dpdk.rc
export RTE_SDK=/opt/dpdk-18.08
export RTE_TARGET=x86_64-native-linuxapp-gcc
export DPDK_BUILD=${RTE_SDK}/${RTE_TARGET}
export LD_LIBRARY_PATH=${RTE_SDK}/${RTE_TARGET}/lib:/usr/local/lib:/usr/lib:

$ source dpdk.rc

$ cd ${RTE_SDK}/examples/l3fwd
$ make
$ ll build/l3fwd

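Command-line format:
- -p PORTMASK: hexadecimal bitmask of the ports to use.
- -P: put all ports in promiscuous mode, so that frames are accepted regardless of whether their dstMAC is the local NIC's address.
- -E: enable the exact match algorithm.
- -L: enable the longest prefix match algorithm.
- --config: the (port,queue,lcore) mappings. They must be consistent with -l {lcore_list} and -p {PORTMASK}.
- --eth-dest: the dstMAC address for port X.
- --enable-jumbo: enable jumbo frames. A jumbo frame is larger than a standard Ethernet frame (MTU 1500 bytes); its MTU can reach 9000 bytes, so it carries more data per frame. Every device on the path must support jumbo frames, otherwise transmission errors or data loss occur.
- --max-pkt-len: with jumbo frames enabled, the maximum packet length, in decimal.
- --no-numa: disable NUMA awareness.
- --hash-entry-num: the number of hash entries, in hexadecimal.
- --parse-ptype: parse the packet's protocol type in software; by default it is parsed by hardware.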

./l3fwd [EAL options] -- -p PORTMASK
                         [-P]
                         [-E]
                         [-L]
                         --config(port,queue,lcore)[,(port,queue,lcore)]
                         [--eth-dest=X,MM:MM:MM:MM:MM:MM]
                         [--enable-jumbo [--max-pkt-len PKTLEN]]
                         [--no-numa]
                         [--hash-entry-num]
                         [--ipv6]
                         [--parse-ptype]

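Run:
- -l 1: use a single lcore. The test machine's NICs do not support multiple queues, and l3fwd sets up one Tx queue per lcore on each port, so only 1 lcore can be used; with more, rte_eth_dev_configure() fails.
- 52:54:00:4a:1f:6d: the MAC address of server0's eth1, used as the dstMAC of frames that l3fwd sends out of port 0. l3fwd does not implement ARP, so the dstMAC for each port has to be specified manually.
- 52:54:00:53:5a:d2: the MAC address of server1's eth2, used as the dstMAC of frames that l3fwd sends out of port 1.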

./build/l3fwd -l 1 -- -p 0x3 -P --config="(0,0,1),(1,0,1)" --parse-ptype --eth-dest=0,52:54:00:4a:1f:6d --eth-dest=1,52:54:00:53:5a:d2

soft parse-ptype is enabled
LPM or EM none selected, default LPM on

Initializing port 0 ... Creating queues: nb_rxq=1 nb_txq=1... Port 0 modified RSS hash function based on hardware support,requested:0xa38c configured:0
 Address:52:54:00:5E:8C:DF, Destination:52:54:00:4A:1F:6D, Allocated mbuf pool on socket 0

LPM: Adding route 0x01010100 / 24 (0)
LPM: Adding route 0x02010100 / 24 (1)
LPM: Adding route IPV6 / 48 (0)
LPM: Adding route IPV6 / 48 (1)
txq=1,0,0

Initializing port 1 ... Creating queues: nb_rxq=1 nb_txq=1... Port 1 modified RSS hash function based on hardware support,requested:0xa38c configured:0
 Address:52:54:00:A7:A7:F6, Destination:52:54:00:53:5A:D2, txq=1,0,0

Skipping disabled port 2

Initializing rx queues on lcore 1 ... rxq=0,0,0 rxq=1,0,0

Port 0: softly parse packet type info
Port 1: softly parse packet type info

Checking link statusdone
Port0 Link Up. Speed 10000 Mbps -full-duplex
Port1 Link Up. Speed 10000 Mbps -full-duplex
L3FWD: entering main loop on lcore 1
L3FWD:  -- lcoreid=1 portid=0 rxqueueid=0
L3FWD:  -- lcoreid=1 portid=1 rxqueueid=0

Testing L3 forwarding

The default route table is defined in l3fwd_lpm.c: 1.1.1.0/24 is routed to port0 and 2.1.1.0/24 to port1.

static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
	{IPv4(1, 1, 1, 0), 24, 0},
	{IPv4(2, 1, 1, 0), 24, 1},
	{IPv4(3, 1, 1, 0), 24, 2},
	{IPv4(4, 1, 1, 0), 24, 3},
	{IPv4(5, 1, 1, 0), 24, 4},
	{IPv4(6, 1, 1, 0), 24, 5},
	{IPv4(7, 1, 1, 0), 24, 6},
	{IPv4(8, 1, 1, 0), 24, 7},
};

server0: because l3fwd cannot handle ARP, a static neighbor entry with the MAC address of l3fwd port0 has to be added manually.

$ ip addr add dev eth1 1.1.1.2/24
$ ip route add 2.1.1.2 via 1.1.1.1 dev eth1
$ ip nei add 1.1.1.1 lladdr 52:54:00:5E:8C:DF dev eth1

server1

$ ip addr add dev eth2 2.1.1.2/24
$ ip route add 1.1.1.2 via 2.1.1.1 dev eth2
$ ip nei add 2.1.1.1 lladdr 52:54:00:A7:A7:F6 dev eth2

Ping server1 from server0

$ ping 2.1.1.2
PING 2.1.1.2 (2.1.1.2) 56(84) bytes of data.
64 bytes from 2.1.1.2: icmp_seq=30 ttl=63 time=0.659 ms

Performance parameters

10GbE, Intel X710, 2 NUMA nodes, 4 ports

[Figure: performance parameters]

Implementation analysis

Annotated code

GitHub: https://github.com/JmilkFan/dpdk-samples

Function call graph

[Figure: function call graph]

LPM Library

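Docs: http://doc.dpdk.org/guides/prog_guide/lpm_lib.html#implementation-details

The DPDK LPM (Longest Prefix Match) library is a high-performance prefix-based route lookup library. During packet forwarding it quickly finds the route table entry with the longest prefix matching the dstIP address.

The LPM library has the following characteristics:

High performance: lookups are table-driven (a variation of the DIR-24-8 algorithm), so a match costs one or two memory accesses.
Multi-core (multi-thread) use: lookups can run concurrently on several lcores, making use of multi-core processors.
Flexible configuration: the route table can be changed at runtime; entries can be added, deleted or modified to follow changes in the network topology.
Memory management: table memory is allocated from DPDK-managed memory on the NUMA socket chosen when the table is created.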

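Algorithm design

Two-level lookup tables

The LPM library is flexible, and to keep lookups fast the implementation relies on directly indexed tables rather than a tree walk. The 32-bit IP address is split into 2 parts:

tbl24 (high 24 bits): 1 table with 2^24 entries.
tbl8 (low 8 bits): up to 256 tables with 2^8 entries each.

[Figure: tbl24 and tbl8 table layout]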

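Lookup algorithm

With two levels of tables, a lookup whose matching route has a prefix length <= 24 bits needs only one table access to obtain the NextHop. A prefix length > 24 bits requires 2 table accesses, but such routes are comparatively rare.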

[Figure: lookup flow]

Example

[Figure: lookup example]

Core data structures

[Figure: core data structures]

rte_lpm

struct rte_lpm {

	/* LPM metadata. */
	char name[RTE_LPM_NAMESIZE];  // table name
	uint32_t max_rules;           // maximum number of rules (entries)
	uint32_t number_tbl8s;        // number of tbl8 groups
	struct rte_lpm_rule_info rule_info[RTE_LPM_MAX_DEPTH];  // per-depth rule info, 32 elements

	/* LPM Tables. */
	struct rte_lpm_tbl_entry tbl24[RTE_LPM_TBL24_NUM_ENTRIES] __rte_cache_aligned;  // tbl24 array, 2^24 entries
	struct rte_lpm_tbl_entry *tbl8;  // pointer to the tbl8 area, 256 * number_tbl8s entries
	struct rte_lpm_rule *rules_tbl;  // pointer to the rule array, max_rules entries
};


/* Per-depth rule info: rules with the same prefix length occupy indexes first_rule through first_rule + used_rules - 1. */
struct rte_lpm_rule_info {
	uint32_t used_rules;  /**< Used rules so far. */
	uint32_t first_rule;  /**< Indexes the first rule of a given depth. */
};


/**
 * Entry type used by both the tbl24 and tbl8 tables.
 */
struct rte_lpm_tbl_entry {
	/**
	 * Stores Next hop (tbl8 or tbl24 when valid_group is not set) or
	 * a group index pointing to a tbl8 structure (tbl24 only, when
	 * valid_group is set)
	 */
	uint32_t next_hop    :24;  // tbl24 entry with a tbl8 attached: index of the tbl8 group; tbl24 entry without a tbl8: the actual next hop.
	/* Using single uint8_t to store 3 values. */
	uint32_t valid       :1;   // whether this entry is valid
	/**
	 * For tbl24:
	 *  - valid_group == 0: entry stores a next hop
	 *  - valid_group == 1: entry stores a group_index pointing to a tbl8
	 * For tbl8:
	 *  - valid_group indicates whether the current tbl8 is in use or not
	 */
	uint32_t valid_group :1;  // in tbl24: 1 means next_hop holds a tbl8 group index, 0 means it holds the actual next hop.
	uint32_t depth       :6;  // prefix length (depth) of the rule stored in this entry.
};

/* A concrete route table rule. */
struct rte_lpm_rule {
	uint32_t ip;        // IP address (prefix) of the rule
	uint32_t next_hop;  // next hop
};

rte_lpm_config

/** LPM configuration structure. */
struct rte_lpm_config {
	uint32_t max_rules;      /**< Max number of rules. */
	uint32_t number_tbl8s;   /**< Number of tbl8s to allocate. */
	int flags;               /**< This field is currently unused. */
};

API functions

API	Implementation	Purpose
rte_lpm_create	rte_lpm_create_v1604	Create a route table
rte_lpm_free	rte_lpm_free_v1604	Free the memory used by a route table
rte_lpm_add	rte_lpm_add_v1604	Add a route
rte_lpm_delete	rte_lpm_delete_v1604	Delete a route
rte_lpm_delete_all	rte_lpm_delete_all_v1604	Delete all routes in the table
rte_lpm_lookup	(inline in rte_lpm.h)	Look up a route
rte_lpm_find_existing	rte_lpm_find_existing_v1604	Find a route table by name
rte_lpm_is_rule_present	rte_lpm_is_rule_present_v1604	Check whether a rule exists

rte_lpm_create()

struct rte_lpm *rte_lpm_create(const char *name, int socket_id, const struct rte_lpm_config *config)

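Purpose: create a new LPM table.

Parameters:

name: name of the LPM table.
socket_id: NUMA socket on which the LPM table is allocated.
config: configuration of the LPM table.

Return value:

Success: pointer to the newly created LPM table.
Failure: NULL.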

rte_lpm_free()

void rte_lpm_free(struct rte_lpm *lpm)

Purpose: free the memory used by an LPM table.

Parameters:

lpm: pointer to an initialized LPM table.

Return value: none.

rte_lpm_add()

int rte_lpm_add(struct rte_lpm *lpm, uint32_t ip, uint8_t depth, uint32_t next_hop)

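Purpose: add a rule (entry) to the LPM table.

Parameters:

lpm: pointer to an initialized LPM table.
ip: IP address (prefix) of the rule.
depth: prefix length (netmask length) of the rule.
next_hop: next hop.

Return value:

Success: 0.
Failure: a negative value.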

rte_lpm_delete()

int rte_lpm_delete(struct rte_lpm *lpm, uint32_t ip, uint8_t depth)

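Purpose: delete a rule (entry) from the LPM table.

Parameters:

lpm: pointer to an initialized LPM table.
ip: IP address (prefix) of the rule.
depth: prefix length (netmask length) of the rule.

Return value:

Success: 0.
Failure: a negative value.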

rte_lpm_lookup()

int rte_lpm_lookup(const struct rte_lpm *lpm, uint32_t ip, uint32_t *next_hop)

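Purpose: find the longest prefix in the LPM table that matches the given dstIP address, and return its next hop.

Parameters:

lpm: pointer to an initialized LPM table.
ip: the dstIP address to look up.
next_hop: output; on a successful lookup, receives the next hop of the matching prefix.

Return value:

Match found: 0.
No match: -ENOENT.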

rte_lpm_is_rule_present()

int rte_lpm_is_rule_present(const struct rte_lpm *lpm, uint32_t ip, uint8_t depth, uint32_t *next_hop)

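Purpose: check whether the given rule (ip/depth) exists in the LPM table.

Parameters:

lpm: pointer to an initialized LPM table.
ip: IP address (prefix) of the rule.
depth: prefix length (netmask length) of the rule.
next_hop: output; next hop of the rule, valid only if the rule is found.

Return value:

Rule exists: 1.
Rule does not exist: 0.
Failure: a negative value.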

rte_lpm_check_params()

int rte_lpm_check_params(const struct rte_lpm_config *config)

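Purpose: check whether the config parameters of an LPM table are valid.

Parameters:

config: LPM table configuration parameters.

Return value:

Parameters valid: 0.
Parameters invalid: -EINVAL.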

rte_lpm_find_existing()

struct rte_lpm *rte_lpm_find_existing(const char *name)

Purpose: find an existing LPM table by name.

Parameters:

name: name of the LPM table.

Return value:

Found: pointer to the existing LPM table.
Not found: NULL.
