About the Author
Zhang Kai, Software Engineer
Works mainly on Linux kernel networking and virtualization research and development.
Modern high-performance NICs generally support flow steering and flow filtering: with the right configuration, a given traffic flow can be directed to a specific device queue, and if the core polling that queue is also the core running the application that processes the flow, a measurable performance gain can be obtained. The NIC's flow-filtering capability can also be configured to drop specified flows, blocking illegitimate traffic at the hardware level without any CPU involvement. The DPDK example flow_filtering demonstrates the first capability, flow steering.
The flow_filtering example configures a NIC flow rule that steers matching traffic to a chosen device queue. Its main() function follows the usual initialization sequence: EAL initialization, creation of the mempool that backs the mbufs, and port initialization in init_port(). The example runs with a single port. main() then calls generate_ipv4_flow(), which performs the actual flow-rule configuration. Besides the queue steering shown here, NIC flow rules support other actions as well, such as dropping matching traffic.
int main(int argc, char **argv)
{
	struct rte_flow_error error;
	struct rte_flow *flow;
	struct rte_mempool *mbuf_pool;
	uint16_t nr_ports;
	uint16_t port_id;
	int ret;

	ret = rte_eal_init(argc, argv);
	if (ret < 0)
		rte_exit(EXIT_FAILURE, ":: invalid EAL arguments\n");
	nr_ports = rte_eth_dev_count_avail();
	if (nr_ports == 0)
		rte_exit(EXIT_FAILURE, ":: no Ethernet ports found\n");
	port_id = 0;
	if (nr_ports != 1)
		printf(":: warn: %d ports detected, but we use only one: port %u\n",
		       nr_ports, port_id);
	mbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", 4096, 128, 0,
					    RTE_MBUF_DEFAULT_BUF_SIZE,
					    rte_socket_id());
	if (mbuf_pool == NULL)
		rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n");
	init_port();
	flow = generate_ipv4_flow(port_id, selected_queue,
				  SRC_IP, EMPTY_MASK,
				  DEST_IP, FULL_MASK, &error);
	if (!flow) {
		printf("Flow can't be created %d message: %s\n",
		       error.type,
		       error.message ? error.message : "(no stated reason)");
		rte_exit(EXIT_FAILURE, "error in creating flow");
	}
	main_loop();
	return 0;
}
generate_ipv4_flow() builds the flow rule that steers the selected traffic to the chosen device queue. The four macros below define the flow to be steered: the source IP is 0.0.0.0 with EMPTY_MASK (0), i.e., the source address is a don't-care; the destination IP is 192.168.1.1 with FULL_MASK (0xffffffff). In other words, every flow destined for 192.168.1.1 is steered to device queue 1 (selected_queue).
static uint8_t selected_queue = 1;
#define SRC_IP ((0<<24) + (0<<16) + (0<<8) + 0) /* src ip = 0.0.0.0 */
#define DEST_IP ((192<<24) + (168<<16) + (1<<8) + 1) /* dest ip = 192.168.1.1 */
#define FULL_MASK 0xffffffff /* full mask */
#define EMPTY_MASK 0x0 /* empty mask */
First, the flow attribute rte_flow_attr is set to ingress, i.e., the rule applies to traffic in the receive direction.
struct rte_flow *generate_ipv4_flow(uint16_t port_id, uint16_t rx_q,
				    uint32_t src_ip, uint32_t src_mask,
				    uint32_t dest_ip, uint32_t dest_mask,
				    struct rte_flow_error *error)
{
	struct rte_flow_attr attr;
	struct rte_flow_item pattern[MAX_PATTERN_NUM];
	struct rte_flow_action action[MAX_ACTION_NUM];
	struct rte_flow *flow = NULL;
	struct rte_flow_action_queue queue = { .index = rx_q };
	struct rte_flow_item_ipv4 ip_spec;
	struct rte_flow_item_ipv4 ip_mask;

	/* Set the rule attribute: only ingress packets will be checked. */
	memset(&attr, 0, sizeof(struct rte_flow_attr));
	attr.ingress = 1;
Second, define the action to take once the rule matches. Here it is RTE_FLOW_ACTION_TYPE_QUEUE: matching packets are steered to the specified device queue. To drop matching traffic instead, the action type would be RTE_FLOW_ACTION_TYPE_DROP, which this example does not use.
	/* Create the action sequence: one action only, move the packet
	 * to the selected queue. */
	action[0].type = RTE_FLOW_ACTION_TYPE_QUEUE;
	action[0].conf = &queue;
	action[1].type = RTE_FLOW_ACTION_TYPE_END;
Third, build the pattern sequence to match. Since the final match is on an IPv4 destination address (192.168.1.1), the first pattern level is the Ethernet type RTE_FLOW_ITEM_TYPE_ETH, and the second level, of type IPv4 (RTE_FLOW_ITEM_TYPE_IPV4), carries the source IP/mask and destination IP/mask.
	/* Set the first level of the pattern (ETH). Since in this example we
	 * just want IPv4, this level is set to allow everything. */
	pattern[0].type = RTE_FLOW_ITEM_TYPE_ETH;
	/* Set the second level of the pattern (IP). This is the level we
	 * care about, so fill it in from the function parameters. */
	memset(&ip_spec, 0, sizeof(struct rte_flow_item_ipv4));
	memset(&ip_mask, 0, sizeof(struct rte_flow_item_ipv4));
	ip_spec.hdr.dst_addr = htonl(dest_ip);
	ip_mask.hdr.dst_addr = dest_mask;
	ip_spec.hdr.src_addr = htonl(src_ip);
	ip_mask.hdr.src_addr = src_mask;
	pattern[1].type = RTE_FLOW_ITEM_TYPE_IPV4;
	pattern[1].spec = &ip_spec;
	pattern[1].mask = &ip_mask;
	/* The final level must always be of type end. */
	pattern[2].type = RTE_FLOW_ITEM_TYPE_END;

	/* Tail of the function, shown here for completeness: validate the
	 * rule, then create it. */
	int res = rte_flow_validate(port_id, &attr, pattern, action, error);
	if (!res)
		flow = rte_flow_create(port_id, &attr, pattern, action, error);
	return flow;
}
Fourth, with all rule parameters set, rte_flow_validate() is called to verify the configuration. It is implemented in lib/librte_ethdev/rte_flow.c: it looks up the device's flow-rule operations (rte_flow_ops) and invokes the validate callback. For Intel's IXGBE NIC driver, the validate pointer refers to ixgbe_flow_validate(), implemented in drivers/net/ixgbe/ixgbe_flow.c. This function only checks whether the NIC supports the given rule parameters; for example, IXGBE does not support matching on MAC addresses, and the queue index in the parameters must not exceed the device's maximum queue count. Passing validation does not guarantee that the rule can ultimately be installed, because the NIC memory that stores flow rules may already be full.
int rte_flow_validate(uint16_t port_id,
		      const struct rte_flow_attr *attr,
		      const struct rte_flow_item pattern[],
		      const struct rte_flow_action actions[],
		      struct rte_flow_error *error)
{
	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
	struct rte_eth_dev *dev = &rte_eth_devices[port_id];

	if (unlikely(!ops))
		return -rte_errno;
	if (likely(!!ops->validate))
		return flow_err(port_id, ops->validate(dev, attr, pattern,
						       actions, error), error);
	return rte_flow_error_set(error, ENOSYS,
				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
				  NULL, rte_strerror(ENOSYS));
}
Fifth, rte_flow_create() is called to create the rule. Also implemented in lib/librte_ethdev/rte_flow.c, it mirrors rte_flow_validate() above in that it wraps the device-specific create callback. Again taking Intel's IXGBE driver as the example, the create function is ixgbe_flow_create() in drivers/net/ixgbe/ixgbe_flow.c; its main job is to program the configured flow rule into the NIC hardware.
struct rte_flow *rte_flow_create(uint16_t port_id,
				 const struct rte_flow_attr *attr,
				 const struct rte_flow_item pattern[],
				 const struct rte_flow_action actions[],
				 struct rte_flow_error *error)
{
	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
	struct rte_flow *flow;
	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);

	if (unlikely(!ops))
		return NULL;
	if (likely(!!ops->create)) {
		flow = ops->create(dev, attr, pattern, actions, error);
		if (flow == NULL)
			flow_err(port_id, -rte_errno, error);
		return flow;
	}
	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
			   NULL, rte_strerror(ENOSYS));
	return NULL;
}
The flow_filtering example relies mainly on the NIC's Flow Director feature. In the IXGBE driver for Intel's 82599 NIC, ixgbe_parse_fdir_filter_normal() parses the flow-rule parameters set by the application. First, because no item of type RTE_FLOW_ITEM_TYPE_FUZZY was included in the pattern above, signature_match() fails and the RTE_FDIR_MODE_PERFECT matching mode is used. Second, the first item in the pattern chain is of type RTE_FLOW_ITEM_TYPE_ETH but carries no spec or mask, so the IXGBE driver ignores it and moves on to the next item. The last meaningful pattern item is of type RTE_FLOW_ITEM_TYPE_IPV4: the source and destination IP addresses set earlier are copied into the src_ip[0] and dst_ip[0] members of the rule's ixgbe_fdir.formatted structure, and the masks into mask.src_ipv4_mask and mask.dst_ipv4_mask.
static int
ixgbe_parse_fdir_filter_normal(struct rte_eth_dev *dev,
			       const struct rte_flow_attr *attr,
			       const struct rte_flow_item pattern[],
			       const struct rte_flow_action actions[],
			       struct ixgbe_fdir_rule *rule,
			       struct rte_flow_error *error)
{
	if (signature_match(pattern))
		rule->mode = RTE_FDIR_MODE_SIGNATURE;
	else
		rule->mode = RTE_FDIR_MODE_PERFECT;
	if (item->type == RTE_FLOW_ITEM_TYPE_ETH) {
		/* If both spec and mask are NULL, it means we don't care
		 * about ETH. Do nothing. */
		/* Check that the next non-void item is VLAN or IPv4;
		 * IPv6 is not supported. */
		item = next_no_fuzzy_pattern(pattern, item);
	}
	if (item->type == RTE_FLOW_ITEM_TYPE_IPV4) {
		/* Set the flow type even if there is no content,
		 * as we must have a flow type. */
		rule->ixgbe_fdir.formatted.flow_type = IXGBE_ATR_FLOW_TYPE_IPV4;
		rule->b_mask = TRUE;
		ipv4_mask = item->mask;
		rule->mask.dst_ipv4_mask = ipv4_mask->hdr.dst_addr;
		rule->mask.src_ipv4_mask = ipv4_mask->hdr.src_addr;
		if (item->spec) {
			rule->b_spec = TRUE;
			ipv4_spec = item->spec;
			rule->ixgbe_fdir.formatted.dst_ip[0] = ipv4_spec->hdr.dst_addr;
			rule->ixgbe_fdir.formatted.src_ip[0] = ipv4_spec->hdr.src_addr;
		}
	}
	return ixgbe_parse_fdir_act_attr(attr, actions, rule, error);
}
Finally, ixgbe_parse_fdir_act_attr() parses the configured action; here it copies the configured device queue index into the rule's queue field. As can be seen, the driver function ixgbe_parse_fdir_filter_normal() is the exact counterpart of the configuration function generate_ipv4_flow(). Parsing yields a fully initialized ixgbe_fdir_rule structure, which ixgbe_fdir_set_input_mask() and ixgbe_fdir_filter_program() then write into the NIC's Flow Director hardware registers. Intel's 82599 NIC supports at most 8K-2 flow filters of type RTE_FDIR_MODE_PERFECT; for details see the 82599 datasheet: https://www.intel.cn/content/www/cn/zh/embedded/products/networking/82599-10-gbe-controller-datasheet.html.
static int
ixgbe_parse_fdir_act_attr(const struct rte_flow_attr *attr,
			  const struct rte_flow_action actions[],
			  struct ixgbe_fdir_rule *rule,
			  struct rte_flow_error *error)
{
	if (act->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
		act_q = (const struct rte_flow_action_queue *)act->conf;
		rule->queue = act_q->index;
	}
	return 0;
}
At the end of the flow_filtering example, main_loop() is called to receive packets from all of the device's queues and print, for each packet, the source and destination MAC addresses along with the index of the queue it arrived on. This output confirms whether the flow rule took effect: every packet destined for IP address 192.168.1.1 should be received on queue 1.
static void main_loop(void)
{
	struct rte_mbuf *mbufs[32];
	struct ether_hdr *eth_hdr;
	uint16_t nb_rx;
	uint16_t i;
	uint16_t j;

	while (!force_quit) {
		for (i = 0; i < nr_queues; i++) {
			nb_rx = rte_eth_rx_burst(port_id, i, mbufs, 32);
			if (nb_rx) {
				for (j = 0; j < nb_rx; j++) {
					struct rte_mbuf *m = mbufs[j];

					eth_hdr = rte_pktmbuf_mtod(m,
							struct ether_hdr *);
					print_ether_addr("src=",
							 &eth_hdr->s_addr);
					print_ether_addr(" - dst=",
							 &eth_hdr->d_addr);
					printf(" - queue=0x%x", (unsigned int)i);
					printf("\n");
					rte_pktmbuf_free(m);
				}
			}
		}
	}
}
Finally, on Linux the same kind of flow rule can be configured with the ethtool utility. The commands below, in order: enable the NIC's flow-rule (ntuple) feature; check that it is enabled; install a rule steering traffic destined for IP 192.168.1.1 to queue 1 and list the installed rules; and, last, delete the rule. See the ethtool help or man page for more configuration options.
/ # ethtool --features eth0 ntuple on
/ #
/ # ethtool --show-features eth0
ntuple-filters: on
/ #
/ # ethtool --config-ntuple eth0 flow-type ip4 src-ip 0.0.0.0 m 0.0.0.0 dst-ip 192.168.1.1 m 255.255.255.255 action 1
Added rule with ID 7423
/ #
/ # ethtool --show-ntuple eth0
4 RX rings available
Total 1 rules
Filter: 7423
Rule Type: Raw IPv4
Src IP addr: 0.0.0.0 mask: 255.255.255.255
Dest IP addr: 192.168.1.1 mask: 255.255.255.255
TOS: 0x0 mask: 0xff
Protocol: 0 mask: 0xff
L4 bytes: 0x0 mask: 0xffffffff
Action: Direct to queue 1
/ #
/ # ethtool --config-ntuple eth0 delete 7423
/ #
/ # ethtool --show-ntuple eth0
4 RX rings available
Total 0 rules
/ #
(Published with the author's permission)