Reposted from the DPDK and SPDK community.
Modern high-performance NICs generally support steering and filtering of traffic flows. Through configuration, a given flow can be directed to a specific device queue, and if the core polling that queue is the same core running the application that processes the flow, a measurable performance benefit results. In addition, the NIC's flow-filtering capability can be configured to drop specified flows, blocking illegitimate traffic at the hardware level without CPU involvement. The DPDK sample application flow_filtering demonstrates the first capability, flow steering.
The flow_filtering sample configures a NIC flow-filtering rule that steers matching traffic to a designated device queue. Its main function follows the usual initialization sequence: EAL initialization, creation of the mempool that holds mbufs, and port setup via the init_port function. The sample runs with a single port. It then calls generate_ipv4_flow to install the flow rule. Beyond the queue steering shown here, NIC flow rules support other actions as well, such as dropping matching flows.
    int main(int argc, char **argv)
    {
        ret = rte_eal_init(argc, argv);

        nr_ports = rte_eth_dev_count_avail();
        if (nr_ports == 0)
            rte_exit(EXIT_FAILURE, ":: no Ethernet ports found\n");
        port_id = 0;
        if (nr_ports != 1) {
            printf(":: warn: %d ports detected, but we use only one: port %u\n",
                   nr_ports, port_id);
        }
        mbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", 4096, 128, 0,
                                            RTE_MBUF_DEFAULT_BUF_SIZE,
                                            rte_socket_id());
        if (mbuf_pool == NULL)
            rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n");

        init_port();

        flow = generate_ipv4_flow(port_id, selected_queue,
                                  SRC_IP, EMPTY_MASK,
                                  DEST_IP, FULL_MASK, &error);
        if (!flow) {
            printf("Flow can't be created %d message: %s\n",
                   error.type,
                   error.message ? error.message : "(no stated reason)");
            rte_exit(EXIT_FAILURE, "error in creating flow");
        }

        main_loop();
        return 0;
    }
The function generate_ipv4_flow installs the flow rule, steering the configured flow to the specified device queue. As the following four macros show, the flow targeted by this sample has source IP 0.0.0.0 with EMPTY_MASK (0), i.e. any source IP matches, and destination IP 192.168.1.1 with FULL_MASK (0xffffffff). In other words, all traffic destined for 192.168.1.1 is steered to device queue 1 (selected_queue).
    static uint8_t selected_queue = 1;
    #define SRC_IP ((0<<24) + (0<<16) + (0<<8) + 0)        /* src ip = 0.0.0.0 */
    #define DEST_IP ((192<<24) + (168<<16) + (1<<8) + 1)   /* dest ip = 192.168.1.1 */
    #define FULL_MASK 0xffffffff                           /* full mask */
    #define EMPTY_MASK 0x0                                 /* empty mask */
First, the flow attribute rte_flow_attr is set to ingress, i.e. the rule applies to traffic in the receive direction.
    struct rte_flow *
    generate_ipv4_flow(uint16_t port_id, uint16_t rx_q,
                       uint32_t src_ip, uint32_t src_mask,
                       uint32_t dest_ip, uint32_t dest_mask,
                       struct rte_flow_error *error)
    {
        struct rte_flow_attr attr;
        struct rte_flow_item pattern[MAX_PATTERN_NUM];
        struct rte_flow_action action[MAX_ACTION_NUM];
        struct rte_flow *flow = NULL;
        struct rte_flow_action_queue queue = { .index = rx_q };
        struct rte_flow_item_ipv4 ip_spec;
        struct rte_flow_item_ipv4 ip_mask;

        /* set the rule attribute. in this case only ingress packets
         * will be checked. */
        memset(&attr, 0, sizeof(struct rte_flow_attr));
        attr.ingress = 1;
Second, define the action to take when the rule matches. Here it is RTE_FLOW_ACTION_TYPE_QUEUE, which steers matching packets to the specified device queue. To drop matching flows instead, the action type would be RTE_FLOW_ACTION_TYPE_DROP; this sample does not use it.
    /* create the action sequence. one action only, move packet to queue */
    action[0].type = RTE_FLOW_ACTION_TYPE_QUEUE;
    action[0].conf = &queue;
    action[1].type = RTE_FLOW_ACTION_TYPE_END;
Third, set up the pattern sequence the rule matches against. Since the rule ultimately matches an IPv4 destination address (192.168.1.1), the first pattern level is set to the Ethernet type RTE_FLOW_ITEM_TYPE_ETH, and the second level carries the source IP/mask and destination IP/mask with type RTE_FLOW_ITEM_TYPE_IPV4.
    /* set the first level of the pattern (ETH). since in this example
     * we just want to get the ipv4 we set this level to allow all. */
    pattern[0].type = RTE_FLOW_ITEM_TYPE_ETH;

    /* setting the second level of the pattern (IP). in this example
     * this is the level we care about so we set it according to the
     * parameters. */
    memset(&ip_spec, 0, sizeof(struct rte_flow_item_ipv4));
    memset(&ip_mask, 0, sizeof(struct rte_flow_item_ipv4));
    ip_spec.hdr.dst_addr = htonl(dest_ip);
    ip_mask.hdr.dst_addr = dest_mask;
    ip_spec.hdr.src_addr = htonl(src_ip);
    ip_mask.hdr.src_addr = src_mask;
    pattern[1].type = RTE_FLOW_ITEM_TYPE_IPV4;
    pattern[1].spec = &ip_spec;
    pattern[1].mask = &ip_mask;

    /* the final level must be always type end */
    pattern[2].type = RTE_FLOW_ITEM_TYPE_END;
Fourth, with all rule parameters now set, rte_flow_validate is called to verify the configuration. It is implemented in lib/librte_ethdev/rte_flow.c: it fetches the device's flow-rule operations table rte_flow_ops and invokes its validate callback. For Intel's IXGBE NIC driver, for example, the validate pointer refers to ixgbe_flow_validate in drivers/net/ixgbe/ixgbe_flow.c. This function only checks whether the NIC supports the specified rule parameters, e.g. IXGBE does not support matching on MAC addresses, and whether the queue number in the parameters exceeds the device's maximum. Passing validation does not guarantee that the rule can ultimately be installed, because the NIC memory that stores flow rules may already be full.
    int
    rte_flow_validate(uint16_t port_id,
                      const struct rte_flow_attr *attr,
                      const struct rte_flow_item pattern[],
                      const struct rte_flow_action actions[],
                      struct rte_flow_error *error)
    {
        const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
        struct rte_eth_dev *dev = &rte_eth_devices[port_id];

        if (unlikely(!ops))
            return -rte_errno;
        if (likely(!!ops->validate))
            return flow_err(port_id,
                            ops->validate(dev, attr, pattern, actions, error),
                            error);
        return rte_flow_error_set(error, ENOSYS,
                                  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
                                  NULL, rte_strerror(ENOSYS));
    }
Fifth, rte_flow_create is called to create the flow rule. It also lives in lib/librte_ethdev/rte_flow.c and, like rte_flow_validate above, wraps the device-specific rule-creation callback create. Again taking Intel's IXGBE driver as an example, the rule-creation function is ixgbe_flow_create in drivers/net/ixgbe/ixgbe_flow.c; its main job is to program the configured rule into the NIC hardware.
    struct rte_flow *
    rte_flow_create(uint16_t port_id,
                    const struct rte_flow_attr *attr,
                    const struct rte_flow_item pattern[],
                    const struct rte_flow_action actions[],
                    struct rte_flow_error *error)
    {
        struct rte_eth_dev *dev = &rte_eth_devices[port_id];
        struct rte_flow *flow;
        const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);

        if (unlikely(!ops))
            return NULL;
        if (likely(!!ops->create)) {
            flow = ops->create(dev, attr, pattern, actions, error);
            if (flow == NULL)
                flow_err(port_id, -rte_errno, error);
            return flow;
        }
        rte_flow_error_set(error, ENOSYS,
                           RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
                           NULL, rte_strerror(ENOSYS));
        return NULL;
    }
The flow_filtering sample mainly exercises the NIC's Flow Director feature. For the IXGBE driver of Intel's 82599 NIC, ixgbe_parse_fdir_filter_normal parses the rule parameters set by the application. First, because the pattern list configured earlier contains no item of type RTE_FLOW_ITEM_TYPE_FUZZY, signature_match fails and the rule uses RTE_FDIR_MODE_PERFECT matching. Next, the first pattern item is of type RTE_FLOW_ITEM_TYPE_ETH but carries no spec or mask, so the IXGBE driver does nothing with it and moves on to the next item. The following item is of type RTE_FLOW_ITEM_TYPE_IPV4: the configured source and destination IP addresses are copied into the src_ip[0] and dst_ip[0] members of the rule's ixgbe_fdir.formatted structure, and the masks into mask.src_ipv4_mask and mask.dst_ipv4_mask.
    static int
    ixgbe_parse_fdir_filter_normal(struct rte_eth_dev *dev,
                                   const struct rte_flow_attr *attr,
                                   const struct rte_flow_item pattern[],
                                   const struct rte_flow_action actions[],
                                   struct ixgbe_fdir_rule *rule,
                                   struct rte_flow_error *error)
    {
        if (signature_match(pattern))
            rule->mode = RTE_FDIR_MODE_SIGNATURE;
        else
            rule->mode = RTE_FDIR_MODE_PERFECT;

        if (item->type == RTE_FLOW_ITEM_TYPE_ETH) {
            /* If both spec and mask are NULL, it means don't care
             * about ETH. Do nothing. */

            /* Check if the next not-void item is VLAN or IPv4.
             * IPv6 is not supported. */
            item = next_no_fuzzy_pattern(pattern, item);
        }
        if (item->type == RTE_FLOW_ITEM_TYPE_IPV4) {
            /* Set the flow type even if there's no content,
             * as we must have a flow type. */
            rule->ixgbe_fdir.formatted.flow_type = IXGBE_ATR_FLOW_TYPE_IPV4;
            rule->b_mask = TRUE;
            ipv4_mask = item->mask;
            rule->mask.dst_ipv4_mask = ipv4_mask->hdr.dst_addr;
            rule->mask.src_ipv4_mask = ipv4_mask->hdr.src_addr;

            if (item->spec) {
                rule->b_spec = TRUE;
                ipv4_spec = item->spec;
                rule->ixgbe_fdir.formatted.dst_ip[0] = ipv4_spec->hdr.dst_addr;
                rule->ixgbe_fdir.formatted.src_ip[0] = ipv4_spec->hdr.src_addr;
            }
        }
        return ixgbe_parse_fdir_act_attr(attr, actions, rule, error);
    }
Finally, ixgbe_parse_fdir_act_attr parses the configured action, here copying the device queue number into the rule's queue field. As shown above, the driver function ixgbe_parse_fdir_filter_normal mirrors the earlier configuration function generate_ipv4_flow. Parsing yields a fully initialized ixgbe_fdir_rule structure, which ixgbe_fdir_set_input_mask and ixgbe_fdir_filter_program then write into the NIC's Flow Director hardware registers. Intel's 82599 NIC supports at most 8K-2 flow filters of the RTE_FDIR_MODE_PERFECT type; for details see the 82599 datasheet: https://www.intel.cn/content/www/cn/zh/embedded/products/networking/82599-10-gbe-controller-datasheet.html.
    static int
    ixgbe_parse_fdir_act_attr(const struct rte_flow_attr *attr,
                              const struct rte_flow_action actions[],
                              struct ixgbe_fdir_rule *rule,
                              struct rte_flow_error *error)
    {
        if (act->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
            act_q = (const struct rte_flow_action_queue *)act->conf;
            rule->queue = act_q->index;
        }
    }
At the end of the flow_filtering sample, main_loop receives packets from all of the device's queues and prints each packet's source and destination MAC addresses together with the index of the queue it arrived on. This output confirms whether the flow rule took effect: every packet destined for 192.168.1.1 should be received on queue 1.
    static void
    main_loop(void)
    {
        while (!force_quit) {
            for (i = 0; i < nr_queues; i++) {
                nb_rx = rte_eth_rx_burst(port_id, i, mbufs, 32);
                if (nb_rx) {
                    for (j = 0; j < nb_rx; j++) {
                        struct rte_mbuf *m = mbufs[j];

                        eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
                        print_ether_addr("src=", &eth_hdr->s_addr);
                        print_ether_addr(" - dst=", &eth_hdr->d_addr);
                        printf(" - queue=0x%x", (unsigned int)i);
                        printf("\n");

                        rte_pktmbuf_free(m);
                    }
                }
            }
        }
    }
Finally, on Linux the same kind of rule can be configured with the ethtool utility. The commands below, in order: enable the NIC's flow-rule (ntuple) feature; check that it is enabled; steer flows with destination IP 192.168.1.1 to queue 1 and list the installed rules; and, last, delete the rule. See ethtool's help output for further configuration options.
    / # ethtool --features eth0 ntuple on
    / #
    / # ethtool --show-features eth0
    ntuple-filters: on
    / #
    / # ethtool --config-ntuple eth0 flow-type ip4 src-ip 0.0.0.0 m 0.0.0.0 dst-ip 192.168.1.1 m 255.255.255.255 action 1
    Added rule with ID 7423
    / #
    / # ethtool --show-ntuple eth0
    4 RX rings available
    Total 1 rules

    Filter: 7423
            Rule Type: Raw IPv4
            Src IP addr: 0.0.0.0 mask: 255.255.255.255
            Dest IP addr: 192.168.1.1 mask: 255.255.255.255
            TOS: 0x0 mask: 0xff
            Protocol: 0 mask: 0xff
            L4 bytes: 0x0 mask: 0xffffffff
            Action: Direct to queue 1

    / #
    / # ethtool --config-ntuple eth0 delete 7423
    / #
    / # ethtool --show-ntuple eth0
    4 RX rings available
    Total 0 rules

    / #