OVS-DPDK Datapath Classifier

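OVS-DPDK has three tiers of lookup tables/caches. An incoming packet is first matched against the EMC (exact match cache). On a miss it is passed to dpcls (the datapath classifier). dpcls is implemented with the tuple space search (TSS) algorithm, so it can match on arbitrary bits of the packet header. If the packet also misses in dpcls, it is sent to the OpenFlow pipeline, i.e. the ofproto classifier, which is controlled by the SDN controller.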
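The cascade described above can be sketched roughly as follows. This is a toy model, not OVS code: `toy_rule`, `toy_table_lookup` and `toy_classify` are hypothetical names, each tier is a small array scanned linearly instead of a hash table, and the key is reduced to a single IPv4 source address.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_tier { TOY_TIER_EMC, TOY_TIER_DPCLS, TOY_TIER_OFPROTO };

/* Hypothetical rule: matches when (src_ip & mask) == value. */
struct toy_rule {
    uint32_t mask;
    uint32_t value;
    int out_port;
};

/* Linear scan standing in for the real hash-based lookups. */
static const struct toy_rule *
toy_table_lookup(const struct toy_rule *rules, size_t n, uint32_t src_ip)
{
    for (size_t i = 0; i < n; i++) {
        if ((src_ip & rules[i].mask) == rules[i].value) {
            return &rules[i];
        }
    }
    return NULL;
}

/* Try the EMC first, then dpcls, and fall back to the ofproto classifier.
 * Returns the tier that handled the packet. */
static enum toy_tier
toy_classify(uint32_t src_ip,
             const struct toy_rule *emc, size_t n_emc,
             const struct toy_rule *dpcls, size_t n_dpcls,
             int *out_port)
{
    const struct toy_rule *rule;

    if ((rule = toy_table_lookup(emc, n_emc, src_ip)) != NULL) {
        *out_port = rule->out_port;
        return TOY_TIER_EMC;
    }
    if ((rule = toy_table_lookup(dpcls, n_dpcls, src_ip)) != NULL) {
        *out_port = rule->out_port;
        return TOY_TIER_DPCLS;
    }
    *out_port = -1;   /* No cached decision: hand the packet to the slow path. */
    return TOY_TIER_OFPROTO;
}
```

An EMC entry uses an all-ones mask (exact match), while a dpcls entry may wildcard bits, which is why the same function can model both tiers here.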


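Once a packet has been classified, a variety of actions can be applied to it, such as forwarding it to a specific port, adding a VLAN tag, dropping it, or sending it to the connection tracking module.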
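As an illustration of one such action, here is a minimal sketch of pushing an 802.1Q VLAN tag onto an Ethernet frame held in a flat byte buffer. `toy_push_vlan` is a hypothetical helper, not the OVS implementation; it assumes priority and DEI bits of zero and a buffer with spare room at the tail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN 12   /* Destination MAC + source MAC. */
#define VLAN_TAG_LEN  4    /* TPID (0x8100) + TCI. */

/* Inserts a VLAN tag right after the two MAC addresses.
 * Returns the new frame length, or 0 if the frame is shorter than an
 * Ethernet header or the buffer has no room for the tag. */
static size_t
toy_push_vlan(uint8_t *frame, size_t len, size_t cap, uint16_t vid)
{
    if (len < ETH_ADDRS_LEN + 2 || len + VLAN_TAG_LEN > cap) {
        return 0;
    }
    /* Shift the EtherType and payload back by four bytes. */
    memmove(frame + ETH_ADDRS_LEN + VLAN_TAG_LEN, frame + ETH_ADDRS_LEN,
            len - ETH_ADDRS_LEN);
    frame[12] = 0x81;                          /* TPID high byte. */
    frame[13] = 0x00;                          /* TPID low byte. */
    frame[14] = (uint8_t) ((vid >> 8) & 0x0F); /* PCP=0, DEI=0, VID[11:8]. */
    frame[15] = (uint8_t) (vid & 0xFF);        /* VID[7:0]. */
    return len + VLAN_TAG_LEN;
}
```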

EMC Call Graph

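When a packet is received, its headers are extracted and stored in a miniflow. A miniflow is a sparse representation of struct flow and has two advantages:

1. It uses less memory and touches fewer cache lines.

2. struct flow is very large and most of its values are 0, so a miniflow makes it possible to iterate quickly over only the non-zero values. Each uint64_t in struct flow occupies one bit in miniflow.map.bits.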
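The bitmap-plus-packed-values idea behind the two points above can be sketched as follows. This is a simplified model under stated assumptions: the full flow is taken to be exactly 64 `uint64_t` words so that a single 64-bit map suffices, and `toy_miniflow`, `toy_miniflow_pack` and `toy_miniflow_get` are hypothetical names rather than the OVS API.

```c
#include <stddef.h>
#include <stdint.h>

#define FULL_WORDS 64   /* Toy "struct flow" of 64 uint64_t words (assumption). */

struct toy_miniflow {
    uint64_t map;                 /* Bit i set: word i of the full flow is non-zero. */
    uint64_t values[FULL_WORDS];  /* Non-zero words only, packed in index order. */
};

/* Compresses a full flow. Returns the number of values stored. */
static size_t
toy_miniflow_pack(const uint64_t full[FULL_WORDS], struct toy_miniflow *mf)
{
    size_t n = 0;

    mf->map = 0;
    for (size_t i = 0; i < FULL_WORDS; i++) {
        if (full[i]) {
            mf->map |= UINT64_C(1) << i;
            mf->values[n++] = full[i];
        }
    }
    return n;
}

/* Reads word 'idx' (0..63) back. The packed position of a word equals the
 * number of map bits set below its index. */
static uint64_t
toy_miniflow_get(const struct toy_miniflow *mf, unsigned idx)
{
    uint64_t bit = UINT64_C(1) << idx;
    uint64_t below;
    size_t pos = 0;

    if (!(mf->map & bit)) {
        return 0;                 /* Word was zero in the full flow. */
    }
    for (below = mf->map & (bit - 1); below; below &= below - 1) {
        pos++;                    /* Clear the lowest set bit and count it. */
    }
    return mf->values[pos];
}
```

Iterating over a miniflow then only visits the stored values, so the cost scales with the number of non-zero words rather than with the size of the full structure.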


emc_processing fills in struct netdev_flow_key keys[PKT_ARRAY_SIZE]: keys[i] holds the miniflow of the i-th packet that missed the emc_cache.
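The way missed packets end up packed at the front of the array can be sketched in isolation. In this hypothetical helper, packets are plain integers and the EMC result is given as a precomputed `hit[]` array; the real function interleaves lookup and compaction in one loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Moves every packet that missed to the front of 'packets', preserving
 * order, and returns how many missed. Slot n_missed never runs ahead of
 * the read index i, so no unread packet is overwritten. */
static size_t
toy_compact_misses(int packets[], const bool hit[], size_t cnt)
{
    size_t n_missed = 0;

    for (size_t i = 0; i < cnt; i++) {
        if (!hit[i]) {
            packets[n_missed++] = packets[i];
        }
    }
    return n_missed;
}
```

After the call, only the first `n_missed` slots are meaningful, which is why the next stage can treat them as a dense batch.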


In the EMC stage, packets are processed by the following function:

static inline size_t
emc_processing(struct dp_netdev_pmd_thread *pmd, struct dp_packet_batch *packets_,
               struct netdev_flow_key *keys,
               struct packet_batch_per_flow batches[], size_t *n_batches,
               bool md_is_valid, odp_port_t port_no)
{
    struct emc_cache *flow_cache = &pmd->flow_cache;
    struct netdev_flow_key *key = &keys[0];
    size_t i, n_missed = 0, n_dropped = 0;
    struct dp_packet **packets = packets_->packets;
    int cnt = packets_->count;

    /* Process each packet in the dp_packet_batch in turn. */
    for (i = 0; i < cnt; i++) {
        struct dp_netdev_flow *flow;
        struct dp_packet *packet = packets[i];

        /* Drop the packet if it is shorter than an Ethernet header. */
        if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) {
            dp_packet_delete(packet);
            n_dropped++;
            continue;
        }

        /* Manually prefetching the next packet's data reduces read latency
         * and so improves performance. */
        if (i != cnt - 1) {
            /* Prefetch next packet data and metadata. */
            OVS_PREFETCH(dp_packet_data(packets[i+1]));
            pkt_metadata_prefetch_init(&packets[i+1]->md);
        }

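        /* Initialize the metadata: first zero every byte of pkt_metadata
         * that precedes its flow_in_port member, then set in_port.odp_port
         * to port_no and tunnel.ip_dst to 0, which marks the remaining
         * tunnel fields as unused. */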
        if (!md_is_valid) {
            pkt_metadata_init(&packet->md, port_no);
        }

        /* Extract the miniflow from the values in pkt_metadata and from
         * dp_packet->mbuf. */
        miniflow_extract(packet, &key->mf);
        key->len = 0; /* Not computed yet. */
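        /* Compute the hash stored in the netdev_flow_key that holds this
         * packet's miniflow. emc_lookup uses this hash to find the entry.
         * The hash is either supplied by the NIC on receive, when RSS mode
         * is enabled, or computed by miniflow_hash_5tuple. */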
        key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);

        /* Find the dp_netdev_flow. A hit requires three conditions: key->hash
         * matches, the emc_entry is alive, and the miniflows are equal. */
        flow = emc_lookup(flow_cache, key);
        if (OVS_LIKELY(flow)) {
            /* Classify the packet by its dp_netdev_flow: all packets that
             * map to the same dp_netdev_flow go into the same
             * packet_batch_per_flow. */
            dp_netdev_queue_batches(packet, flow, &key->mf, batches,
                                    n_batches);
        } else {
            /* Exact match cache missed. Group missed packets together at
             * the beginning of the 'packets' array.  */
            packets[n_missed] = packet;
            /* 'key[n_missed]' contains the key of the current packet and it
             * must be returned to the caller. The next key should be extracted
             * to 'keys[n_missed + 1]'. */
            key = &keys[++n_missed];
        }
    }

    dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT, cnt - n_dropped - n_missed);

    return n_missed;
}
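The three hit conditions used by the lookup (matching hash, live entry, equal key) can be sketched with a toy cache. Everything here is an assumption made for illustration: the `toy_` names, the 16-entry table, the two probe positions taken from successive 4-bit slices of the hash, and an insert that always overwrites the first candidate slot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_EMC_SHIFT   4
#define TOY_EMC_ENTRIES (1u << TOY_EMC_SHIFT)
#define TOY_EMC_MASK    (TOY_EMC_ENTRIES - 1)
#define TOY_EMC_SEGS    2   /* Probe positions tried per lookup. */
#define TOY_KEY_WORDS   2

struct toy_key {
    uint32_t hash;
    uint64_t words[TOY_KEY_WORDS];   /* Stand-in for the miniflow. */
};

struct toy_flow {
    bool dead;
    int id;
};

struct toy_emc_entry {
    struct toy_flow *flow;           /* NULL when the slot is empty. */
    struct toy_key key;
};

struct toy_emc {
    struct toy_emc_entry entries[TOY_EMC_ENTRIES];
};

/* Returns the cached flow, or NULL on a miss. */
static struct toy_flow *
toy_emc_lookup(struct toy_emc *emc, const struct toy_key *key)
{
    uint32_t h = key->hash;

    for (int seg = 0; seg < TOY_EMC_SEGS; seg++, h >>= TOY_EMC_SHIFT) {
        struct toy_emc_entry *e = &emc->entries[h & TOY_EMC_MASK];

        if (e->flow && !e->flow->dead            /* Entry is alive. */
            && e->key.hash == key->hash          /* Hashes match. */
            && !memcmp(e->key.words, key->words, sizeof key->words)) {
            return e->flow;                      /* Keys are equal. */
        }
    }
    return NULL;
}

/* Simplified insert: always overwrites the first candidate slot. */
static void
toy_emc_insert(struct toy_emc *emc, const struct toy_key *key,
               struct toy_flow *flow)
{
    struct toy_emc_entry *e = &emc->entries[key->hash & TOY_EMC_MASK];

    e->flow = flow;
    e->key = *key;
}
```

Comparing the full key after the hash guards against two different flows that happen to share a hash value.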

Datapath Classifier Call Graph

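For each subtable, the miniflow extracted from a packet is combined with the subtable's mask to produce a search key, which dpcls_lookup uses for matching. The command ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=21.2.10.1/24,actions=output:2 creates a flow in the ofproto classifier. When a packet with source IP "21.2.10.5" arrives for the first time, neither the EMC nor dpcls has a match for it. Through the learning mechanism, entries for this flow are then installed in both dpcls and the EMC. Suppose we have the following rule: Rule #1: Src IP="21.2.10.*"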


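To store the wildcard rule Rule #1, a suitable "Mask #1" must first be created. A mask has 1 in every bit position that must be matched and 0 everywhere else, so Mask #1 is "0xFF.FF.FF.00". A hash table "HT 1" is then instantiated as a subtable.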


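HT 1 also stores similar rules, that is, rules with the same fields and the same mask, for example Rule #1A: Src IP="83.83.83.*". Each subtable therefore holds the rules that share the same fields and the same mask.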


When a packet with Src IP=21.2.10.99 is processed, the address is ANDed with Mask #1 of subtable HT 1. The result is hashed, and that hash is compared against the hashes stored in HT 1.
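The HT 1 example can be written out as a small sketch. The names (`toy_subtable`, `toy_hash`, `IPV4`) and the chained 8-bucket table are assumptions made for illustration, and the output port 3 given to Rule #1A in the usage below is invented; OVS uses a cmap and hashes the whole masked miniflow instead.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS 8
#define IPV4(a, b, c, d) \
    (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) \
     | ((uint32_t) (c) << 8) | (uint32_t) (d))

struct toy_rule {
    uint32_t value;          /* Masked Src IP this rule matches. */
    int out_port;
    struct toy_rule *next;   /* Chain for rules sharing a bucket. */
};

struct toy_subtable {
    uint32_t mask;                          /* e.g. 0xFFFFFF00 for a /24. */
    struct toy_rule *buckets[TOY_BUCKETS];
};

/* Arbitrary integer mixer standing in for the real hash function. */
static uint32_t
toy_hash(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x45d9f3bu;
    x ^= x >> 16;
    return x;
}

static void
toy_subtable_insert(struct toy_subtable *st, struct toy_rule *rule)
{
    struct toy_rule **head;

    rule->value &= st->mask;                /* Store only the matched bits. */
    head = &st->buckets[toy_hash(rule->value) % TOY_BUCKETS];
    rule->next = *head;
    *head = rule;
}

/* Mask the packet field, hash the result, then confirm the match to rule
 * out hash collisions. */
static const struct toy_rule *
toy_subtable_lookup(const struct toy_subtable *st, uint32_t src_ip)
{
    uint32_t masked = src_ip & st->mask;
    const struct toy_rule *r = st->buckets[toy_hash(masked) % TOY_BUCKETS];

    for (; r; r = r->next) {
        if (r->value == masked) {
            return r;
        }
    }
    return NULL;
}
```

With Rule #1 (21.2.10.0) and Rule #1A (83.83.83.0) inserted into a subtable whose mask is 0xFFFFFF00, looking up 21.2.10.99 masks to 21.2.10.0 and returns Rule #1, while 10.0.0.1 returns no rule and would be tried against the next subtable.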


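The hierarchy is: one dpcls holds many subtables, and each subtable holds many rules. While it looks up the hash values, cmap_find_batch also fills in, for each miniflow, the node from which the matching rule is obtained.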

static bool
dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key keys[],
             struct dpcls_rule **rules, const size_t cnt,
             int *num_lookups_p)
{
    /* The received 'cnt' miniflows are the search-keys that will be processed
     * to find a matching entry into the available subtables.
     * The number of bits in map_type is equal to NETDEV_MAX_BURST. */
    typedef uint32_t map_type;
#define MAP_BITS (sizeof(map_type) * CHAR_BIT)
    BUILD_ASSERT_DECL(MAP_BITS >= NETDEV_MAX_BURST);

    struct dpcls_subtable *subtable;

    map_type keys_map = TYPE_MAXIMUM(map_type); /* Set all bits. */
    map_type found_map;
    uint32_t hashes[MAP_BITS];
    const struct cmap_node *nodes[MAP_BITS];

    if (cnt != MAP_BITS) {
        /* The number of 1 bits in keys_map equals the number of packets,
         * and bit i corresponds to packet i. */
        keys_map >>= MAP_BITS - cnt; /* Clear extra bits. */
    }
    memset(rules, 0, cnt * sizeof *rules);

    int lookups_match = 0, subtable_pos = 1;

    /* The Datapath classifier - aka dpcls - is composed of subtables.
     * Subtables are dynamically created as needed when new rules are inserted.
     * Each subtable collects rules with matches on a specific subset of packet
     * fields as defined by the subtable's mask.  We proceed to process every
     * search-key against each subtable, but when a match is found for a
     * search-key, the search for that key can stop because the rules are
     * non-overlapping. */
    PVECTOR_FOR_EACH (subtable, &cls->subtables) {
        int i;

        /* Compute hashes for the remaining keys.  Each search-key is
         * masked with the subtable's mask to avoid hashing the wildcarded
         * bits. */
        ULLONG_FOR_EACH_1(i, keys_map) {
            /* Compute a Murmur hash over each packet's miniflow keys[i],
             * masked with the subtable's mask. */
            hashes[i] = netdev_flow_key_hash_in_mask(&keys[i],
                                                     &subtable->mask);
        }
        /* Lookup. */
        /* For every bit set in keys_map, look up hashes[i] in subtable->rules.
         * On a hit, set that bit in found_map and store the pointer to the
         * corresponding rule node in nodes. */
        found_map = cmap_find_batch(&subtable->rules, keys_map, hashes, nodes);
        /* Check results.  When the i-th bit of found_map is set, it means
         * that a set of nodes with a matching hash value was found for the
         * i-th search-key.  Due to possible hash collisions we need to check
         * which of the found rules, if any, really matches our masked
         * search-key. */
        ULLONG_FOR_EACH_1(i, found_map) {
            struct dpcls_rule *rule;

            CMAP_NODE_FOR_EACH (rule, cmap_node, nodes[i]) {
                /* Compare rule->mask & keys[i] against rule->flow. */
                if (OVS_LIKELY(dpcls_rule_matches_key(rule, &keys[i]))) {
                    rules[i] = rule;
                    /* Even at 20 Mpps the 32-bit hit_cnt cannot wrap
                     * within one second optimization interval. */
                    subtable->hit_cnt++;
                    lookups_match += subtable_pos;
                    goto next;
                }
            }
            /* None of the found rules was a match.  Reset the i-th bit to
             * keep searching this key in the next subtable. */
            ULLONG_SET0(found_map, i);  /* Did not match. */
        next:
            ;                     /* Keep Sparse happy. */
        }
        keys_map &= ~found_map;             /* Clear the found rules. */
        if (!keys_map) {
            if (num_lookups_p) {
                *num_lookups_p = lookups_match;
            }
            return true;              /* All found. */
        }
        subtable_pos++;
    }
    if (num_lookups_p) {
        *num_lookups_p = lookups_match;
    }
    return false;                     /* Some misses. */
}
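The bookkeeping that dpcls_lookup does with keys_map and found_map can be isolated in a few lines. This sketch uses hypothetical `toy_` helpers and a plain loop in place of the ULLONG_FOR_EACH_1 macro; the found maps passed in stand for whatever each subtable happened to match.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAP_BITS (sizeof(uint32_t) * CHAR_BIT)

/* Returns a map with the low 'cnt' bits set (0 <= cnt <= 32). Bit i stands
 * for the i-th search key that still needs a match. */
static uint32_t
toy_keys_map_init(unsigned cnt)
{
    uint32_t map = UINT32_MAX;

    if (cnt == 0) {
        return 0;
    }
    if (cnt != TOY_MAP_BITS) {
        map >>= TOY_MAP_BITS - cnt;   /* Clear the extra high bits. */
    }
    return map;
}

/* Writes the indices of the set bits, lowest first, into 'idx' and
 * returns how many there are. */
static unsigned
toy_set_bits(uint32_t map, unsigned idx[])
{
    unsigned n = 0;

    for (unsigned i = 0; map; i++, map >>= 1) {
        if (map & 1) {
            idx[n++] = i;
        }
    }
    return n;
}

/* After one subtable: drop the keys it matched. Returns true when every
 * key has been matched, so the remaining subtables can be skipped. */
static bool
toy_after_subtable(uint32_t *keys_map, uint32_t found_map)
{
    *keys_map &= ~found_map;
    return *keys_map == 0;
}
```

Because matched keys are cleared from the map, later subtables only hash and probe the keys that are still unresolved.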

Each dp_packet maps, through its miniflow, to a dp_netdev_flow, and each dp_netdev_flow has its own rule.


static inline void
fast_path_processing(struct dp_netdev_pmd_thread *pmd,
                     struct dp_packet_batch *packets_,
                     struct netdev_flow_key *keys,
                     struct packet_batch_per_flow batches[], size_t *n_batches,
                     odp_port_t in_port,
                     long long now)
{
    int cnt = packets_->count;
#if !defined(__CHECKER__) && !defined(_WIN32)
    const size_t PKT_ARRAY_SIZE = cnt;
#else
    /* Sparse or MSVC doesn't like variable length array. */
    enum { PKT_ARRAY_SIZE = NETDEV_MAX_BURST };
#endif
    struct dp_packet **packets = packets_->packets;
    struct dpcls *cls;
    struct dpcls_rule *rules[PKT_ARRAY_SIZE];
    struct dp_netdev *dp = pmd->dp;
    struct emc_cache *flow_cache = &pmd->flow_cache;
    int miss_cnt = 0, lost_cnt = 0;
    int lookup_cnt = 0, add_lookup_cnt;
    bool any_miss;
    size_t i;

    for (i = 0; i < cnt; i++) {
        /* Key length is needed in all the cases, hash computed on demand. */
        keys[i].len = netdev_flow_key_size(miniflow_n_values(&keys[i].mf));
    }
    /* Get the classifier for the in_port */
    /* Hash in_port and use that hash to find the dpcls in pmd->classifiers.
     * Each in_port has its own dpcls. */
    cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port);
    if (OVS_LIKELY(cls)) {
        any_miss = !dpcls_lookup(cls, keys, rules, cnt, &lookup_cnt);
    } else {
        any_miss = true;
        memset(rules, 0, sizeof(rules));
    }
    /* Every packets[i] whose rules[i] is NULL goes through the upcall path. */
    if (OVS_UNLIKELY(any_miss) && !fat_rwlock_tryrdlock(&dp->upcall_rwlock)) {
        uint64_t actions_stub[512 / 8], slow_stub[512 / 8];
        struct ofpbuf actions, put_actions;

        ofpbuf_use_stub(&actions, actions_stub, sizeof actions_stub);
        ofpbuf_use_stub(&put_actions, slow_stub, sizeof slow_stub);

        for (i = 0; i < cnt; i++) {
            struct dp_netdev_flow *netdev_flow;

            if (OVS_LIKELY(rules[i])) {
                continue;
            }

            /* It's possible that an earlier slow path execution installed
             * a rule covering this flow.  In this case, it's a lot cheaper
             * to catch it here than execute a miss. */
            /* Read in_port from the miniflow in keys and use it to find the
             * dpcls. If one is found, call dpcls_lookup to search for the
             * rule once more. */
            netdev_flow = dp_netdev_pmd_lookup_flow(pmd, &keys[i],
                                                    &add_lookup_cnt);
            if (netdev_flow) {
                lookup_cnt += add_lookup_cnt;
                rules[i] = &netdev_flow->cr;
                continue;
            }

            miss_cnt++;
            handle_packet_upcall(pmd, packets[i], &keys[i], &actions,
                                 &put_actions, &lost_cnt, now);
        }

        ofpbuf_uninit(&actions);
        ofpbuf_uninit(&put_actions);
        fat_rwlock_unlock(&dp->upcall_rwlock);
    } else if (OVS_UNLIKELY(any_miss)) {
        for (i = 0; i < cnt; i++) {
            if (OVS_UNLIKELY(!rules[i])) {
                dp_packet_delete(packets[i]);
                lost_cnt++;
                miss_cnt++;
            }
        }
    }

    for (i = 0; i < cnt; i++) {
        struct dp_packet *packet = packets[i];
        struct dp_netdev_flow *flow;

        if (OVS_UNLIKELY(!rules[i])) {
            continue;
        }
        /* Get the dp_netdev_flow that corresponds to each packet's
         * dpcls_rule, insert that flow into the EMC, and queue the packet
         * according to the flow. */
        flow = dp_netdev_flow_cast(rules[i]);

        emc_insert(flow_cache, &keys[i], flow);
        dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
    }

    dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt);
    dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
    dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
    dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
}

Action Execution Call Graph

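Packets that belong to the same flow are queued into the same group (batch), and the batch is processed according to that flow's actions. To improve forwarding performance, all packets in a batch are processed together.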
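Grouping packets per flow can be sketched as below. This is a simplified model rather than the OVS code: packets are plain integers, `toy_flow`, `toy_batch` and `toy_queue` are hypothetical names, and each flow caches a pointer to its batch so that a later packet of the same flow finds it without searching.

```c
#include <stddef.h>

#define TOY_MAX_BURST 32

struct toy_batch;

struct toy_flow {
    int out_port;
    struct toy_batch *batch;     /* Batch in use for this burst, or NULL. */
};

struct toy_batch {
    struct toy_flow *flow;
    int packets[TOY_MAX_BURST];
    size_t count;
};

/* Appends 'packet' to the batch of 'flow', opening a new batch in
 * 'batches' the first time the flow is seen in this burst. The caller
 * must provide room for one batch per distinct flow. */
static void
toy_queue(int packet, struct toy_flow *flow,
          struct toy_batch batches[], size_t *n_batches)
{
    struct toy_batch *b = flow->batch;

    if (!b) {
        b = &batches[(*n_batches)++];
        b->flow = flow;
        b->count = 0;
        flow->batch = b;
    }
    b->packets[b->count++] = packet;
}
```

Once the burst is fully classified, the actions of each flow need to be looked up only once per batch instead of once per packet.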


The actions for each batch are then executed. When the action is to forward the packets to an output port, the call path is:

netdev_send -> netdev_dpdk_send__ -> netdev_dpdk_eth_tx_burst -> rte_eth_tx_burst, which transmits the packets.




