OVS netdev, ofproto, dpif, etc.

How to Port Open vSwitch to New Software or Hardware
====================================================

Open vSwitch (OVS) is intended to be easily ported to new software and
hardware platforms.  This document describes the types of changes that
are most likely to be necessary in porting OVS to Unix-like platforms.
(Porting OVS to other kinds of platforms is likely to be more
difficult.)


Vocabulary
----------

For historical reasons, different words are used for essentially the
same concept in different areas of the Open vSwitch source tree.  Here
is a concordance, indexed by the area of the source tree:

        datapath/       vport           ---
        vswitchd/       iface           port
        ofproto/        port            bundle
        ofproto/bond.c  slave           bond
        lib/lacp.c      slave           lacp
        lib/netdev.c    netdev          ---
        database        Interface       Port


Open vSwitch Architectural Overview
-----------------------------------

The following diagram shows the very high-level architecture of Open
vSwitch from a porter's perspective.

                   +-------------------+
                   |    ovs-vswitchd   |<-->ovsdb-server
                   +-------------------+
                   |      ofproto      |<-->OpenFlow controllers
                   +--------+-+--------+
                   | netdev | | ofproto|
                   +--------+ |provider|
                   | netdev | +--------+
                   |provider|
                   +--------+

Some of the components are generic.  Modulo bugs or inadequacies,
these components should not need to be modified as part of a port:

  - "ovs-vswitchd" is the main Open vSwitch userspace program, in
    vswitchd/.  It reads the desired Open vSwitch configuration from
    the ovsdb-server program over an IPC channel and passes this
    configuration down to the "ofproto" library.  It also passes
    certain status and statistical information from ofproto back
    into the database.

  - "ofproto" is the Open vSwitch library, in ofproto/, that
    implements an OpenFlow switch.  It talks to OpenFlow controllers
    over the network and to switch hardware or software through an
    "ofproto provider", explained further below.

  - "netdev" is the Open vSwitch library, in lib/netdev.c, that
    abstracts interacting with network devices, that is, Ethernet
    interfaces.  The netdev library is a thin layer over "netdev
    provider" code, explained further below.

The other components may need attention during a port.  You will
almost certainly have to implement a "netdev provider".  Depending on
the type of port you are doing and the desired performance, you may
also have to implement an "ofproto provider" or a lower-level
component called a "dpif" provider.

The following sections talk about these components in more detail.


Writing a netdev Provider
-------------------------

A "netdev provider" implements an operating system and hardware
specific interface to "network devices", e.g. eth0 on Linux.  Open
vSwitch must be able to open each port on a switch as a netdev, so you
will need to implement a "netdev provider" that works with your switch
hardware and software.

struct netdev_class, in lib/netdev-provider.h, defines the interfaces
required to implement a netdev.  That structure contains many function
pointers, each of which has a comment that is meant to describe its
behavior in detail.  If the requirements are unclear, please report
this as a bug.

The netdev interface can be divided into a few rough categories:

  * Functions required to properly implement OpenFlow features.  For
    example, OpenFlow requires the ability to report the Ethernet
    hardware address of a port.  These functions must be implemented
    for minimally correct operation.

  * Functions required to implement optional Open vSwitch features.
    For example, the Open vSwitch support for in-band control
    requires netdev support for inspecting the TCP/IP stack's ARP
    table.  These functions must be implemented if the corresponding
    OVS features are to work, but may be omitted initially.

  * Functions needed in some implementations but not in others.  For
    example, most kinds of ports (see below) do not need
    functionality to receive packets from a network device.

The existing netdev implementations may serve as useful examples
during a port:

  * lib/netdev-linux.c implements netdev functionality for Linux
    network devices, using Linux kernel calls.  It may be a good
    place to start for full-featured netdev implementations.

  * lib/netdev-vport.c provides support for "virtual ports"
    implemented by the Open vSwitch datapath module for the Linux
    kernel.  This may serve as a model for minimal netdev
    implementations.

  * lib/netdev-dummy.c is a fake netdev implementation useful only
    for testing.


Porting Strategies
------------------

After a netdev provider has been implemented for a system's network
devices, you may choose among three basic porting strategies.

The lowest-effort strategy is to use the "userspace switch"
implementation built into Open vSwitch.  This ought to work, without
writing any more code, as long as the netdev provider that you
implemented supports receiving packets.  It yields poor performance,
however, because every packet passes through the ovs-vswitchd process.
See [INSTALL.userspace.md] for instructions on how to configure a
userspace switch.

If the userspace switch is not the right choice for your port, then
you will have to write more code.  You may implement either an
"ofproto provider" or a "dpif provider".  Which you should choose
depends on a few different factors:

  * Only an ofproto provider can take full advantage of hardware
    with built-in support for wildcards (e.g. an ACL table or a
    TCAM).

  * A dpif provider can take advantage of the Open vSwitch built-in
    implementations of bonding, LACP, 802.1ag, 802.1Q VLANs, and
    other features.  An ofproto provider has to provide its own
    implementations, if the hardware can support them at all.

  * A dpif provider is usually easier to implement, but most
    appropriate for software switching.  It "explodes" wildcard
    rules into exact-match entries (with an optional wildcard mask).
    This allows fast hash lookups in software, but makes
    inefficient use of TCAMs in hardware that support wildcarding.

The following sections describe how to implement each kind of port.


ofproto Providers
-----------------

An "ofproto provider" is what ofproto uses to directly monitor and
control an OpenFlow-capable switch.  struct ofproto_class, in
ofproto/ofproto-provider.h, defines the interfaces to implement an
ofproto provider for new hardware or software.  That structure contains
many function pointers, each of which has a comment that is meant to
describe its behavior in detail.  If the requirements are unclear,
please report this as a bug.

The ofproto provider interface is preliminary.  Please let us know if
it seems unsuitable for your purpose.  We will try to improve it.


Writing a dpif Provider
-----------------------

Open vSwitch has a built-in ofproto provider named "ofproto-dpif",
which is built on top of a library for manipulating datapaths, called
"dpif".  A "datapath" is a simple flow table, one that is only required
to support exact-match flows, that is, flows without wildcards.  When a
packet arrives on a network device, the datapath looks for it in this
table.  If there is a match, then it performs the associated actions.
If there is no match, the datapath passes the packet up to ofproto-dpif,
which maintains the full OpenFlow flow table.  If the packet matches in
this flow table, then ofproto-dpif executes its actions and inserts a
new entry into the dpif flow table.  (Otherwise, ofproto-dpif passes the
packet up to ofproto to send the packet to the OpenFlow controller, if
one is configured.)
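
The miss-and-install behavior described above can be illustrated with a small, self-contained sketch.  It is not OVS code: every name in it (pkt_key, wc_rule, toy_switch, and so on) is hypothetical, the key has only two fields, and the "OpenFlow table" is a linear list of rules with a mask on a single field.  It only shows the shape of the idea: try the exact-match table first, fall back to the wildcard table on a miss, and cache the result as an exact-match entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified packet key (real flow keys carry far
 * more fields). */
struct pkt_key {
    uint32_t in_port;
    uint32_t ip_dst;
};

/* Slow-path rule with a wildcard mask on ip_dst.  Stands in for the full
 * OpenFlow flow table kept by the upper layer. */
struct wc_rule {
    uint32_t ip_dst;
    uint32_t ip_mask;
    int out_port;
};

/* Fast-path entry: exact match only, no wildcards. */
struct exact_entry {
    bool used;
    struct pkt_key key;
    int out_port;
};

#define CACHE_SIZE 64
#define NO_MATCH (-1)

struct toy_switch {
    const struct wc_rule *rules;            /* Slow-path table. */
    size_t n_rules;
    struct exact_entry cache[CACHE_SIZE];   /* Fast-path table. */
    unsigned int n_misses;                  /* Packets sent to the slow path. */
};

static void toy_switch_init(struct toy_switch *sw,
                            const struct wc_rule *rules, size_t n_rules)
{
    assert(sw != NULL);
    memset(sw, 0, sizeof *sw);
    sw->rules = rules;
    sw->n_rules = n_rules;
}

static size_t key_hash(const struct pkt_key *k)
{
    return (k->in_port * 2654435761u ^ k->ip_dst) % CACHE_SIZE;
}

static bool key_equal(const struct pkt_key *a, const struct pkt_key *b)
{
    return a->in_port == b->in_port && a->ip_dst == b->ip_dst;
}

/* Slow path: linear scan of the wildcard rules, first match wins. */
static int slow_path_lookup(const struct toy_switch *sw,
                            const struct pkt_key *k)
{
    for (size_t i = 0; i < sw->n_rules; i++) {
        const struct wc_rule *r = &sw->rules[i];
        if ((k->ip_dst & r->ip_mask) == (r->ip_dst & r->ip_mask)) {
            return r->out_port;
        }
    }
    return NO_MATCH;
}

/* Looks up 'k' in the exact-match cache.  On a miss, consults the slow
 * path and, if a rule matches, installs an exact-match entry so that the
 * next identical packet is handled without a miss.  Returns the output
 * port or NO_MATCH. */
static int toy_switch_process(struct toy_switch *sw, const struct pkt_key *k)
{
    struct exact_entry *e = &sw->cache[key_hash(k)];
    if (e->used && key_equal(&e->key, k)) {
        return e->out_port;
    }

    sw->n_misses++;
    int out_port = slow_path_lookup(sw, k);
    if (out_port != NO_MATCH) {
        e->used = true;         /* Overwrites whatever occupied the slot. */
        e->key = *k;
        e->out_port = out_port;
    }
    return out_port;
}
```

With one rule covering 10.0.0.0/8, the first packet to 10.0.0.1 increments n_misses and installs an entry; a second identical packet is served from the cache and leaves n_misses unchanged.  A one-slot-per-hash cache is chosen here only to keep the sketch short.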

When calculating the dpif flow, ofproto-dpif generates an exact-match
flow that describes the missed packet.  It makes an effort to figure out
what fields can be wildcarded based on the switch's configuration and
OpenFlow flow table.  The dpif is free to ignore the suggested wildcards
and only support the exact-match entry.  However, if the dpif supports
wildcarding, then it can use the masks to match multiple flows with
fewer entries and potentially significantly reduce the number of flow
misses handled by ofproto-dpif.

The "dpif" library in turn delegates much of its functionality to a
"dpif provider".  The following diagram shows how dpif providers fit
into the Open vSwitch architecture:

                _
               |   +-------------------+
               |   |    ovs-vswitchd   |<-->ovsdb-server
               |   +-------------------+
               |   |      ofproto      |<-->OpenFlow controllers
               |   +--------+-+--------+  _
               |   | netdev | |ofproto-|   |
     userspace |   +--------+ |  dpif  |   |
               |   | netdev | +--------+   |
               |   |provider| |  dpif  |   |
               |   +---||---+ +--------+   |
               |       ||     |  dpif  |   | implementation of
               |       ||     |provider|   | ofproto provider
               |_      ||     +---||---+   |
                       ||         ||       |
                _  +---||-----+---||---+   |
               |   |          |datapath|   |
        kernel |   |          +--------+  _|
               |   |                   |
               |_  +--------||---------+
                            ||
                         physical
                           NIC

struct dpif_class, in lib/dpif-provider.h, defines the interfaces
required to implement a dpif provider for new hardware or software.
That structure contains many function pointers, each of which has a
comment that is meant to describe its behavior in detail.  If the
requirements are unclear, please report this as a bug.

There are two existing dpif implementations that may serve as
useful examples during a port:

  * lib/dpif-netlink.c is a Linux-specific dpif implementation that
    talks to an Open vSwitch-specific kernel module (whose sources
    are in the "datapath" directory).  The kernel module performs
    all of the switching work, passing packets that do not match any
    flow table entry up to userspace.  This dpif implementation is
    essentially a wrapper around calls into the kernel module.

  * lib/dpif-netdev.c is a generic dpif implementation that performs
    all switching internally.  This is how the Open vSwitch
    userspace switch is implemented.

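vswitchd is the core component of OVS; all OpenFlow-related logic is implemented in vswitchd. Generally speaking, OVS consists of three parts: the datapath, vswitchd, and ovsdb. The datapath is usually tied to a specific data-plane platform, such as a white-box switch or the Linux kernel, and it is not a mandatory component. ovsdb stores the configuration of the vswitch itself, such as ports, topology, and rules. In the OVS distribution package vswitchd takes the form of a userspace process, but this is not absolute. The excerpt above describes how to port OVS to other platforms, and it is so far the only official document that gives a rough description of the OVS architecture.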

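As the excerpt shows, vswitchd itself has a layered structure. The top daemon layer mainly communicates with ovsdb, pushing down and updating configuration. In the middle is the ofproto layer, which communicates with OpenFlow controllers and exposes the ofproto provider interface through ofproto_class; the platform-specific OpenFlow implementations are unified behind the ofproto_class interface.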

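In OVS terms, a netdev represents the device implementation of a specific platform, e.g. the Linux kernel's net_device, or a port when OVS is ported to a switch platform. struct netdev_class defines the interface that a concrete netdev provider must implement. Each platform implementation has to support this uniform interface in order to create, destroy, open, and close netdev devices, among other operations.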
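
To make the "uniform interface" idea concrete, here is a minimal sketch of a class structure made of function pointers, a small registry keyed by type name, and a generic wrapper that dispatches through the class. It is not the OVS implementation: toy_dev, toy_dev_class, toy_register_class and the other names are hypothetical, the registry is a fixed array instead of a cmap, and returning EEXIST for a duplicate type is an assumption of this sketch. The one convention taken from the header quoted in this post is that an optional member may be left null, in which case the operation is treated as EOPNOTSUPP.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_dev;

/* Hypothetical, cut-down analogue of struct netdev_class: a type name
 * plus a table of operations. */
struct toy_dev_class {
    const char *type;           /* e.g. "eth", "tunnel". */

    /* Optional.  May be NULL if the class has no notion of an MTU. */
    int (*get_mtu)(const struct toy_dev *dev, int *mtup);
};

struct toy_dev {
    const struct toy_dev_class *dev_class;
    int mtu;
};

#define MAX_CLASSES 8
static const struct toy_dev_class *registry[MAX_CLASSES];
static size_t n_registered;

/* Returns the registered class named 'type', or NULL if there is none. */
static const struct toy_dev_class *toy_lookup_class(const char *type)
{
    for (size_t i = 0; i < n_registered; i++) {
        if (!strcmp(registry[i]->type, type)) {
            return registry[i];
        }
    }
    return NULL;
}

/* Registers 'c'.  Returns 0 on success or a positive errno value. */
static int toy_register_class(const struct toy_dev_class *c)
{
    assert(c != NULL && c->type != NULL);
    if (toy_lookup_class(c->type)) {
        return EEXIST;          /* Assumption: duplicates are rejected. */
    }
    if (n_registered >= MAX_CLASSES) {
        return ENOSPC;
    }
    registry[n_registered++] = c;
    return 0;
}

/* Generic wrapper used by callers.  A NULL member means "not supported". */
static int toy_dev_get_mtu(const struct toy_dev *dev, int *mtup)
{
    const struct toy_dev_class *c = dev->dev_class;
    return c->get_mtu ? c->get_mtu(dev, mtup) : EOPNOTSUPP;
}

/* Example provider that implements get_mtu. */
static int eth_get_mtu(const struct toy_dev *dev, int *mtup)
{
    *mtup = dev->mtu;
    return 0;
}

static const struct toy_dev_class eth_class = { "eth", eth_get_mtu };

/* Example provider that leaves the optional member unset. */
static const struct toy_dev_class tunnel_class = { "tunnel", NULL };
```

After registering both classes, toy_dev_get_mtu() on an "eth" device returns 0 and fills in the MTU, while the same call on a "tunnel" device returns EOPNOTSUPP without the caller needing to know which provider is behind the device.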

The different netdev types are registered through netdev_register_provider. Internally, vswitchd keeps all registered netdev types in a struct cmap named netdev_classes. struct netdev is defined as follows:

/* A network device (e.g. an Ethernet device).
 *
 * Network device implementations may read these members but should not modify
 * them. */
struct netdev {
    /* The following do not change during the lifetime of a struct netdev. */
    char *name;                         /* Name of network device. */
    const struct netdev_class *netdev_class; /* Functions to control
                                                this device. */

    /* A sequence number which indicates changes in one of 'netdev''s
     * properties.   It must be nonzero so that users have a value which
     * they may use as a reset when tracking 'netdev'.
     *
     * Minimally, the sequence number is required to change whenever
     * 'netdev''s flags, features, ethernet address, or carrier changes. */
    uint64_t change_seq;

    /* A netdev provider might be unable to change some of the device's
     * parameter (n_rxq, mtu) when the device is in use.  In this case
     * the provider can notify the upper layer by calling
     * netdev_request_reconfigure().  The upper layer will react by stopping
     * the operations on the device and calling netdev_reconfigure() to allow
     * the configuration changes.  'last_reconfigure_seq' remembers the value
     * of 'reconfigure_seq' when the last reconfiguration happened. */
    struct seq *reconfigure_seq;
    uint64_t last_reconfigure_seq;

    /* If this is 'true', the user explicitly specified an MTU for this
     * netdev.  Otherwise, Open vSwitch is allowed to override it. */
    bool mtu_user_config;

    /* The core netdev code initializes these at netdev construction and only
     * provides read-only access to its clients.  Netdev implementations may
     * modify them. */
    int n_txq;
    int n_rxq;
    int ref_cnt;                        /* Times this device was opened. */
    struct shash_node *node;            /* Pointer to element in global map. */
    struct ovs_list saved_flags_list; /* Contains "struct netdev_saved_flags". */
};
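
struct netdev holds only generic state. The header comment on struct netdev_class, quoted further below, says that each implementation embeds the generic structure in its own larger structure and recovers the outer structure with casts or the CONTAINER_OF macro, and that every object goes through alloc, construct, destruct, and dealloc. The following sketch walks through that pattern in isolation. It is not OVS code: base_dev, my_dev, TOY_CONTAINER_OF, client_open and client_close are hypothetical names, and TOY_CONTAINER_OF is a simplified stand-in for the real macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Generic "base" state, loosely modeled on struct netdev. */
struct base_dev {
    char *name;
    int ref_cnt;
};

/* A hypothetical provider's "derived" state, embedding the base state. */
struct my_dev {
    struct base_dev up;         /* Base state. */
    int hw_handle;              /* Derived state. */
};

/* Simplified stand-in for the CONTAINER_OF macro. */
#define TOY_CONTAINER_OF(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

static struct my_dev *my_dev_cast(struct base_dev *dev)
{
    return TOY_CONTAINER_OF(dev, struct my_dev, up);
}

/* Provider side: the four life-cycle functions. */
static struct base_dev *my_alloc(void)
{
    struct my_dev *d = calloc(1, sizeof *d);    /* Room for base + derived. */
    return d ? &d->up : NULL;
}

static int my_construct(struct base_dev *dev)
{
    my_dev_cast(dev)->hw_handle = 42;           /* Pretend to open hardware. */
    return 0;
}

static void my_destruct(struct base_dev *dev)
{
    my_dev_cast(dev)->hw_handle = -1;           /* Pretend to close it. */
}

static void my_dealloc(struct base_dev *dev)
{
    free(my_dev_cast(dev));
}

/* Client side: drives the life cycle in the documented order. */
static struct base_dev *client_open(const char *name)
{
    struct base_dev *dev = my_alloc();          /* Step 1: raw memory. */
    if (!dev) {
        return NULL;
    }

    dev->name = malloc(strlen(name) + 1);       /* Step 2: base state. */
    if (!dev->name) {
        my_dealloc(dev);                        /* Step 7. */
        return NULL;
    }
    strcpy(dev->name, name);
    dev->ref_cnt = 1;

    if (my_construct(dev)) {                    /* Step 3: derived state. */
        free(dev->name);                        /* Step 6. */
        my_dealloc(dev);                        /* Step 7. */
        return NULL;
    }
    return dev;                                 /* Step 4: in use. */
}

static void client_close(struct base_dev *dev)
{
    assert(dev != NULL);
    my_destruct(dev);                           /* Step 5: derived state. */
    free(dev->name);                            /* Step 6: base state. */
    my_dealloc(dev);                            /* Step 7: raw memory. */
}
```

A caller only ever holds a struct base_dev pointer, for example the one returned by client_open("eth0"), while the provider reaches its private hw_handle through my_dev_cast(). Splitting allocation from construction lets the generic code fill in the base state before the provider's constructor runs.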
struct netdev_class is defined as follows. As can be seen, netdev_class is closer to an ops structure, with operations for managing the device's queues added:

/* Network device class structure, to be defined by each implementation of a
 * network device.
 *
 * These functions return 0 if successful or a positive errno value on failure,
 * except where otherwise noted.
 *
 *
 * Data Structures
 * ===============
 *
 * These functions work primarily with two different kinds of data structures:
 *
 *   - "struct netdev", which represents a network device.
 *
 *   - "struct netdev_rxq", which represents a handle for capturing packets
 *     received on a network device
 *
 * Each of these data structures contains all of the implementation-independent
 * generic state for the respective concept, called the "base" state.  None of
 * them contains any extra space for implementations to use.  Instead, each
 * implementation is expected to declare its own data structure that contains
 * an instance of the generic data structure plus additional
 * implementation-specific members, called the "derived" state.  The
 * implementation can use casts or (preferably) the CONTAINER_OF macro to
 * obtain access to derived state given only a pointer to the embedded generic
 * data structure.
 *
 *
 * Life Cycle
 * ==========
 *
 * Four stylized functions accompany each of these data structures:
 *
 *            "alloc"          "construct"        "destruct"       "dealloc"
 *            ------------   ----------------  ---------------  --------------
 * netdev      ->alloc        ->construct        ->destruct        ->dealloc
 * netdev_rxq  ->rxq_alloc    ->rxq_construct    ->rxq_destruct    ->rxq_dealloc
 *
 * Any instance of a given data structure goes through the following life
 * cycle:
 *
 *   1. The client calls the "alloc" function to obtain raw memory.  If "alloc"
 *      fails, skip all the other steps.
 *
 *   2. The client initializes all of the data structure's base state.  If this
 *      fails, skip to step 7.
 *
 *   3. The client calls the "construct" function.  The implementation
 *      initializes derived state.  It may refer to the already-initialized
 *      base state.  If "construct" fails, skip to step 6.
 *
 *   4. The data structure is now initialized and in use.
 *
 *   5. When the data structure is no longer needed, the client calls the
 *      "destruct" function.  The implementation uninitializes derived state.
 *      The base state has not been uninitialized yet, so the implementation
 *      may still refer to it.
 *
 *   6. The client uninitializes all of the data structure's base state.
 *
 *   7. The client calls the "dealloc" to free the raw memory.  The
 *      implementation must not refer to base or derived state in the data
 *      structure, because it has already been uninitialized.
 *
 * If a netdev supports multi-queue I/O, then netdev->construct should
 * initialize netdev->n_rxq to the number of queues.
 *
 * Each "alloc" function allocates and returns a new instance of the respective
 * data structure.  The "alloc" function is not given any information about the
 * use of the new data structure, so it cannot perform much initialization.
 * Its purpose is just to ensure that the new data structure has enough room
 * for base and derived state.  It may return a null pointer if memory is not
 * available, in which case none of the other functions is called.
 *
 * Each "construct" function initializes derived state in its respective data
 * structure.  When "construct" is called, all of the base state has already
 * been initialized, so the "construct" function may refer to it.  The
 * "construct" function is allowed to fail, in which case the client calls the
 * "dealloc" function (but not the "destruct" function).
 *
 * Each "destruct" function uninitializes and frees derived state in its
 * respective data structure.  When "destruct" is called, the base state has
 * not yet been uninitialized, so the "destruct" function may refer to it.  The
 * "destruct" function is not allowed to fail.
 *
 * Each "dealloc" function frees raw memory that was allocated by the
 * "alloc" function.  The memory's base and derived members might not have ever
 * been initialized (but if "construct" returned successfully, then it has been
 * "destruct"ed already).  The "dealloc" function is not allowed to fail.
 *
 *
 * Device Change Notification
 * ==========================
 *
 * Minimally, implementations are required to report changes to netdev flags,
 * features, ethernet address or carrier through connectivity_seq. Changes to
 * other properties are allowed to cause notification through this interface,
 * although implementations should try to avoid this. connectivity_seq_get()
 * can be used to acquire a reference to the struct seq. The interface is
 * described in detail in seq.h. */
struct netdev_class {
    /* Type of netdevs in this class, e.g. "system", "tap", "gre", etc.
     *
     * One of the providers should supply a "system" type, since this is
     * the type assumed if no type is specified when opening a netdev.
     * The "system" type corresponds to an existing network device on
     * the system. */
    const char *type;

    /* If 'true' then this netdev should be polled by PMD threads. */
    bool is_pmd;

/* ## ------------------- ## */
/* ## Top-Level Functions ## */
/* ## ------------------- ## */

    /* Called when the netdev provider is registered, typically at program
     * startup.  Returning an error from this function will prevent any network
     * device in this class from being opened.
     *
     * This function may be set to null if a network device class needs no
     * initialization at registration time. */
    int (*init)(void);

    /* Performs periodic work needed by netdevs of this class.  May be null if
     * no periodic work is necessary.
     *
     * 'netdev_class' points to the class.  It is useful in case the same
     * function is used to implement different classes. */
    void (*run)(const struct netdev_class *netdev_class);

    /* Arranges for poll_block() to wake up if the "run" member function needs
     * to be called.  Implementations are additionally required to wake
     * whenever something changes in any of its netdevs which would cause their
     * ->change_seq() function to change its result.  May be null if nothing is
     * needed here.
     *
     * 'netdev_class' points to the class.  It is useful in case the same
     * function is used to implement different classes. */
    void (*wait)(const struct netdev_class *netdev_class);

/* ## ---------------- ## */
/* ## netdev Functions ## */
/* ## ---------------- ## */

    /* Life-cycle functions for a netdev.  See the large comment above on
     * struct netdev_class. */
    struct netdev *(*alloc)(void);
    int (*construct)(struct netdev *);
    void (*destruct)(struct netdev *);
    void (*dealloc)(struct netdev *);

    /* Fetches the device 'netdev''s configuration, storing it in 'args'.
     * The caller owns 'args' and pre-initializes it to an empty smap.
     *
     * If this netdev class does not have any configuration options, this may
     * be a null pointer. */
    int (*get_config)(const struct netdev *netdev, struct smap *args);

    /* Changes the device 'netdev''s configuration to 'args'.
     *
     * If this netdev class does not support configuration, this may be a null
     * pointer. */
    int (*set_config)(struct netdev *netdev, const struct smap *args);

    /* Returns the tunnel configuration of 'netdev'.  If 'netdev' is
     * not a tunnel, returns null.
     *
     * If this function would always return null, it may be null instead. */
    const struct netdev_tunnel_config *
        (*get_tunnel_config)(const struct netdev *netdev);

    /* Build Tunnel header.  Ethernet and ip header parameters are passed to
     * tunnel implementation to build entire outer header for given flow. */
    int (*build_header)(const struct netdev *, struct ovs_action_push_tnl *data,
                        const struct netdev_tnl_build_header_params *params);

    /* build_header() cannot build the entire header for all packets of a
     * given flow.  push_header() is called for each packet to build the
     * header specific to that packet on actual transmit.  It uses the partial
     * header built by build_header(), which is passed as 'data'. */
    void (*push_header)(struct dp_packet *packet,
                        const struct ovs_action_push_tnl *data);

    /* Pop tunnel header from packet, build tunnel metadata and resize packet
     * for further processing.
     * Returns NULL in case of error or if the tunnel implementation queued the
     * packet for further processing. */
    struct dp_packet * (*pop_header)(struct dp_packet *packet);

    /* Returns the id of the numa node the 'netdev' is on.  If there is no
     * such info, returns NETDEV_NUMA_UNSPEC. */
    int (*get_numa_id)(const struct netdev *netdev);

    /* Configures the number of tx queues of 'netdev'. Returns 0 if successful,
     * otherwise a positive errno value.
     *
     * 'n_txq' specifies the exact number of transmission queues to create.
     *
     * The caller will call netdev_reconfigure() (if necessary) before using
     * netdev_send() on any of the newly configured queues, giving the provider
     * a chance to adjust its settings.
     *
     * On error, the tx queue configuration is unchanged. */
    int (*set_tx_multiq)(struct netdev *netdev, unsigned int n_txq);

    /* Sends buffers on 'netdev'.
     * Returns 0 if successful (for every buffer), otherwise a positive errno
     * value.  Returns EAGAIN without blocking if one or more packets cannot be
     * queued immediately. Returns EMSGSIZE if a partial packet was transmitted
     * or if a packet is too big or too small to transmit on the device.
     *
     * If the function returns a non-zero value, some of the packets might have
     * been sent anyway.
     *
     * If 'may_steal' is false, the caller retains ownership of all the
     * packets.  If 'may_steal' is true, the caller transfers ownership of all
     * the packets to the network device, regardless of success.
     *
     * If 'concurrent_txq' is true, the caller may perform concurrent calls
     * to netdev_send() with the same 'qid'. The netdev provider is responsible
     * for making sure that these concurrent calls do not create a race
     * condition by using locking or other synchronization if required.
     *
     * The network device is expected to maintain one or more packet
     * transmission queues, so that the caller does not ordinarily have to
     * do additional queuing of packets.  'qid' specifies the queue to use
     * and can be ignored if the implementation does not support multiple
     * queues.
     *
     * May return EOPNOTSUPP if a network device does not implement packet
     * transmission through this interface.  This function may be set to null
     * if it would always return EOPNOTSUPP anyhow.  (This will prevent the
     * network device from being usefully used by the netdev-based "userspace
     * datapath".  It will also prevent the OVS implementation of bonding from
     * working properly over 'netdev'.) */
    int (*send)(struct netdev *netdev, int qid, struct dp_packet_batch *batch,
                bool may_steal, bool concurrent_txq);

    /* Registers with the poll loop to wake up from the next call to
     * poll_block() when the packet transmission queue for 'netdev' has
     * sufficient room to transmit a packet with netdev_send().
     *
     * The network device is expected to maintain one or more packet
     * transmission queues, so that the caller does not ordinarily have to
     * do additional queuing of packets.  'qid' specifies the queue to use
     * and can be ignored if the implementation does not support multiple
     * queues.
     *
     * May be null if not needed, such as for a network device that does not
     * implement packet transmission through the 'send' member function. */
    void (*send_wait)(struct netdev *netdev, int qid);

    /* Sets 'netdev''s Ethernet address to 'mac' */
    int (*set_etheraddr)(struct netdev *netdev, const struct eth_addr mac);

    /* Retrieves 'netdev''s Ethernet address into 'mac'.
     *
     * This address will be advertised as 'netdev''s MAC address through the
     * OpenFlow protocol, among other uses. */
    int (*get_etheraddr)(const struct netdev *netdev, struct eth_addr *mac);

    /* Retrieves 'netdev''s MTU into '*mtup'.
     *
     * The MTU is the maximum size of transmitted (and received) packets, in
     * bytes, not including the hardware header; thus, this is typically 1500
     * bytes for Ethernet devices.
     *
     * If 'netdev' does not have an MTU (e.g. as some tunnels do not), then
     * this function should return EOPNOTSUPP.  This function may be set to
     * null if it would always return EOPNOTSUPP. */
    int (*get_mtu)(const struct netdev *netdev, int *mtup);

    /* Sets 'netdev''s MTU to 'mtu'.
     *
     * If 'netdev' does not have an MTU (e.g. as some tunnels do not), then
     * this function should return EOPNOTSUPP.  This function may be set to
     * null if it would always return EOPNOTSUPP. */
    int (*set_mtu)(struct netdev *netdev, int mtu);

    /* Returns the ifindex of 'netdev', if successful, as a positive number.
     * On failure, returns a negative errno value.
     *
     * The desired semantics of the ifindex value are a combination of those
     * specified by POSIX for if_nametoindex() and by SNMP for ifIndex.  An
     * ifindex value should be unique within a host and remain stable at least
     * until reboot.  SNMP says an ifindex "ranges between 1 and the value of
     * ifNumber" but many systems do not follow this rule anyhow.
     *
     * This function may be set to null if it would always return -EOPNOTSUPP.
     */
    int (*get_ifindex)(const struct netdev *netdev);

    /* Sets 'carrier' to true if carrier is active (link light is on) on
     * 'netdev'.
     *
     * May be null if device does not provide carrier status (will be always
     * up as long as device is up).
     */
    int (*get_carrier)(const struct netdev *netdev, bool *carrier);

    /* Returns the number of times 'netdev''s carrier has changed since being
     * initialized.
     *
     * If null, callers will assume the number of carrier resets is zero. */
    long long int (*get_carrier_resets)(const struct netdev *netdev);

    /* Forces ->get_carrier() to poll 'netdev''s MII registers for link status
     * instead of checking 'netdev''s carrier.  'netdev''s MII registers will
     * be polled once every 'interval' milliseconds.  If 'netdev' does not
     * support MII, another method may be used as a fallback.  If 'interval' is
     * less than or equal to zero, reverts ->get_carrier() to its normal
     * behavior.
     *
     * Most network devices won't support this feature and will set this
     * function pointer to NULL, which is equivalent to returning EOPNOTSUPP.
     */
    int (*set_miimon_interval)(struct netdev *netdev, long long int interval);

    /* Retrieves current device stats for 'netdev' into 'stats'.
     *
     * A network device that supports some statistics but not others should
     * set the values of the unsupported statistics to all-1-bits
     * (UINT64_MAX). */
    int (*get_stats)(const struct netdev *netdev, struct netdev_stats *);

    /* Stores the features supported by 'netdev' into each of '*current',
     * '*advertised', '*supported', and '*peer'.  Each value is a bitmap of
     * NETDEV_F_* bits.
     *
     * This function may be set to null if it would always return EOPNOTSUPP.
     */
    int (*get_features)(const struct netdev *netdev,
                        enum netdev_features *current,
                        enum netdev_features *advertised,
                        enum netdev_features *supported,
                        enum netdev_features *peer);

    /* Set the features advertised by 'netdev' to 'advertise', which is a
     * set of NETDEV_F_* bits.
     *
     * This function may be set to null for a network device that does not
     * support configuring advertisements. */
    int (*set_advertisements)(struct netdev *netdev,
                              enum netdev_features advertise);

    /* Attempts to set input rate limiting (policing) policy, such that up to
     * 'kbits_rate' kbps of traffic is accepted, with a maximum accumulative
     * burst size of 'kbits_burst' kb.
     *
     * This function may be set to null if policing is not supported. */
    int (*set_policing)(struct netdev *netdev, unsigned int kbits_rate,
                        unsigned int kbits_burst);

    /* Adds to 'types' all of the forms of QoS supported by 'netdev', or leaves
     * it empty if 'netdev' does not support QoS.  Any names added to 'types'
     * should be documented as valid for the "type" column in the "QoS" table
     * in vswitchd/vswitch.xml (which is built as ovs-vswitchd.conf.db(8)).
     *
     * Every network device must support disabling QoS with a type of "", but
     * this function must not add "" to 'types'.
     *
     * The caller is responsible for initializing 'types' (e.g. with
     * sset_init()) before calling this function.  The caller retains ownership
     * of 'types'.
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*get_qos_types)(const struct netdev *netdev, struct sset *types);

    /* Queries 'netdev' for its capabilities regarding the specified 'type' of
     * QoS.  On success, initializes 'caps' with the QoS capabilities.
     *
     * Should return EOPNOTSUPP if 'netdev' does not support 'type'.  May be
     * NULL if 'netdev' does not support QoS at all. */
    int (*get_qos_capabilities)(const struct netdev *netdev,
                                const char *type,
                                struct netdev_qos_capabilities *caps);

    /* Queries 'netdev' about its currently configured form of QoS.  If
     * successful, stores the name of the current form of QoS into '*typep'
     * and any details of configuration as string key-value pairs in
     * 'details'.
     *
     * A '*typep' of "" indicates that QoS is currently disabled on 'netdev'.
     *
     * The caller initializes 'details' before calling this function.  The
     * caller takes ownership of the string key-value pairs added to
     * 'details'.
     *
     * The netdev retains ownership of '*typep'.
     *
     * '*typep' will be one of the types returned by netdev_get_qos_types() for
     * 'netdev'.  The contents of 'details' should be documented as valid for
     * '*typep' in the "other_config" column in the "QoS" table in
     * vswitchd/vswitch.xml (which is built as ovs-vswitchd.conf.db(8)).
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*get_qos)(const struct netdev *netdev,
                   const char **typep, struct smap *details);

    /* Attempts to reconfigure QoS on 'netdev', changing the form of QoS to
     * 'type' with details of configuration from 'details'.
     *
     * On error, the previous QoS configuration is retained.
     *
     * When this function changes the type of QoS (not just 'details'), this
     * also resets all queue configuration for 'netdev' to their defaults
     * (which depend on the specific type of QoS).  Otherwise, the queue
     * configuration for 'netdev' is unchanged.
     *
     * 'type' should be "" (to disable QoS) or one of the types returned by
     * netdev_get_qos_types() for 'netdev'.  The contents of 'details' should
     * be documented as valid for the given 'type' in the "other_config" column
     * in the "QoS" table in vswitchd/vswitch.xml (which is built as
     * ovs-vswitchd.conf.db(8)).
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*set_qos)(struct netdev *netdev,
                   const char *type, const struct smap *details);

    /* Queries 'netdev' for information about the queue numbered 'queue_id'.
     * If successful, adds that information as string key-value pairs to
     * 'details'.  Returns 0 if successful, otherwise a positive errno value.
     *
     * Should return EINVAL if 'queue_id' is greater than or equal to the
     * number of supported queues (as reported in the 'n_queues' member of
     * struct netdev_qos_capabilities by 'get_qos_capabilities').
     *
     * The caller initializes 'details' before calling this function.  The
     * caller takes ownership of the string key-value pairs added to
     * 'details'.
     *
     * The returned contents of 'details' should be documented as valid for the
     * given 'type' in the "other_config" column in the "Queue" table in
     * vswitchd/vswitch.xml (which is built as ovs-vswitchd.conf.db(8)).
     */
    int (*get_queue)(const struct netdev *netdev,
                     unsigned int queue_id, struct smap *details);

    /* Configures the queue numbered 'queue_id' on 'netdev' with the key-value
     * string pairs in 'details'.  The contents of 'details' should be
     * documented as valid for the given 'type' in the "other_config" column in
     * the "Queue" table in vswitchd/vswitch.xml (which is built as
     * ovs-vswitchd.conf.db(8)).  Returns 0 if successful, otherwise a positive
     * errno value.  On failure, the given queue's configuration should be
     * unmodified.
     *
     * Should return EINVAL if 'queue_id' is greater than or equal to the
     * number of supported queues (as reported in the 'n_queues' member of
     * struct netdev_qos_capabilities by 'get_qos_capabilities'), or if
     * 'details' is invalid for the type of queue.
     *
     * This function does not modify 'details', and the caller retains
     * ownership of it.
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*set_queue)(struct netdev *netdev,
                     unsigned int queue_id, const struct smap *details);

    /* Attempts to delete the queue numbered 'queue_id' from 'netdev'.
     *
     * Should return EINVAL if 'queue_id' is greater than or equal to the
     * number of supported queues (as reported in the 'n_queues' member of
     * struct netdev_qos_capabilities by 'get_qos_capabilities').  Should
     * return EOPNOTSUPP if 'queue_id' is valid but may not be deleted (e.g. if
     * 'netdev' has a fixed set of queues with the current QoS mode).
     *
     * May be NULL if 'netdev' does not support QoS at all, or if all of its
     * QoS modes have fixed sets of queues. */
    int (*delete_queue)(struct netdev *netdev, unsigned int queue_id);

    /* Obtains statistics about 'queue_id' on 'netdev'.  Fills 'stats' with the
     * queue's statistics.  May set individual members of 'stats' to all-1-bits
     * if the statistic is unavailable.
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*get_queue_stats)(const struct netdev *netdev, unsigned int queue_id,
                           struct netdev_queue_stats *stats);

    /* Attempts to begin dumping the queues in 'netdev'.  On success, returns 0
     * and initializes '*statep' with any data needed for iteration.  On
     * failure, returns a positive errno value.
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*queue_dump_start)(const struct netdev *netdev, void **statep);

    /* Attempts to retrieve another queue from 'netdev' for 'state', which was
     * initialized by a successful call to the 'queue_dump_start' function for
     * 'netdev'.  On success, stores a queue ID into '*queue_id' and fills
     * 'details' with the configuration of the queue with that ID.  Returns EOF
     * if the last queue has been dumped, or a positive errno value on error.
     * This function will not be called again once it returns nonzero once for
     * a given iteration (but the 'queue_dump_done' function will be called
     * afterward).
     *
     * The caller initializes and clears 'details' before calling this
     * function.  The caller takes ownership of the string key-value pairs
     * added to 'details'.
     *
     * The returned contents of 'details' should be documented as valid for the
     * given 'type' in the "other_config" column in the "Queue" table in
     * vswitchd/vswitch.xml (which is built as ovs-vswitchd.conf.db(8)).
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*queue_dump_next)(const struct netdev *netdev, void *state,
                           unsigned int *queue_id, struct smap *details);

    /* Releases resources from 'netdev' for 'state', which was initialized by a
     * successful call to the 'queue_dump_start' function for 'netdev'.
     *
     * May be NULL if 'netdev' does not support QoS at all. */
    int (*queue_dump_done)(const struct netdev *netdev, void *state);

    /* Iterates over all of 'netdev''s queues, calling 'cb' with the queue's
     * ID, its statistics, and the 'aux' specified by the caller.  The order of
     * iteration is unspecified, but (when successful) each queue must be
     * visited exactly once.
     *
     * 'cb' will not modify or free the statistics passed in. */
    int (*dump_queue_stats)(const struct netdev *netdev,
                            void (*cb)(unsigned int queue_id,
                                       struct netdev_queue_stats *,
                                       void *aux),
                            void *aux);

    /* Assigns 'addr' as 'netdev''s IPv4 address and 'mask' as its netmask.  If
     * 'addr' is INADDR_ANY, 'netdev''s IPv4 address is cleared.
     *
     * This function may be set to null if it would always return EOPNOTSUPP
     * anyhow. */
    int (*set_in4)(struct netdev *netdev, struct in_addr addr,
                   struct in_addr mask);

    /* Retrieves all IP addresses assigned to 'netdev'.  On success, allocates
     * arrays of addresses and masks, stores them in '*in' and '*mask', stores
     * the number of entries in '*n_in6', and returns 0.
     * Otherwise, returns a positive errno value and clears '*in', '*mask',
     * and '*n_in6'.
     *
     * The following error values have well-defined meanings:
     *
     *   - EADDRNOTAVAIL: 'netdev' has no assigned IPv6 address.
     *
     *   - EOPNOTSUPP: No IPv6 network stack attached to 'netdev'.
     *
     * 'in' may be null, in which case the addresses themselves are not
     * reported. */
    int (*get_addr_list)(const struct netdev *netdev, struct in6_addr **in,
                         struct in6_addr **mask, int *n_in6);

    /* Adds 'router' as a default IP gateway for the TCP/IP stack that
     * corresponds to 'netdev'.
     *
     * This function may be set to null if it would always return EOPNOTSUPP
     * anyhow. */
    int (*add_router)(struct netdev *netdev, struct in_addr router);

    /* Looks up the next hop for 'host' in the host's routing table.  If
     * successful, stores the next hop gateway's address (0 if 'host' is on a
     * directly connected network) in '*next_hop' and a copy of the name of the
     * device to reach 'host' in '*netdev_name', and returns 0.  The caller is
     * responsible for freeing '*netdev_name' (by calling free()).
     *
     * This function may be set to null if it would always return EOPNOTSUPP
     * anyhow. */
    int (*get_next_hop)(const struct in_addr *host, struct in_addr *next_hop,
                        char **netdev_name);

    /* Retrieves driver information of the device.
     *
     * Populates 'smap' with key-value pairs representing the status of the
     * device.  'smap' is a set of key-value string pairs representing netdev
     * type specific information.  For more information see
     * ovs-vswitchd.conf.db(5).
     *
     * The caller is responsible for destroying 'smap' and its data.
     *
     * This function may be set to null if it would always return EOPNOTSUPP
     * anyhow. */
    int (*get_status)(const struct netdev *netdev, struct smap *smap);

    /* Looks up the ARP table entry for 'ip' on 'netdev' and stores the
     * corresponding MAC address in 'mac'.  A return value of ENXIO, in
     * particular, indicates that there is no ARP table entry for 'ip' on
     * 'netdev'.
     *
     * This function may be set to null if it would always return EOPNOTSUPP
     * anyhow. */
    int (*arp_lookup)(const struct netdev *netdev, ovs_be32 ip,
                      struct eth_addr *mac);

    /* Retrieves the current set of flags on 'netdev' into '*old_flags'.  Then,
     * turns off the flags that are set to 1 in 'off' and turns on the flags
     * that are set to 1 in 'on'.  (No bit will be set to 1 in both 'off' and
     * 'on'; that is, off & on == 0.)
     *
     * This function may be invoked from a signal handler.  Therefore, it
     * should not do anything that is not signal-safe (such as logging). */
    int (*update_flags)(struct netdev *netdev, enum netdev_flags off,
                        enum netdev_flags on, enum netdev_flags *old_flags);

    /* If the provider called netdev_request_reconfigure(), the upper layer
     * will eventually call this.  The provider can update the device
     * configuration knowing that the upper layer will not call rxq_recv() or
     * send() until this function returns.
     *
     * On error, the configuration is indeterminate and the device cannot be
     * used to send and receive packets until a successful configuration is
     * applied. */
    int (*reconfigure)(struct netdev *netdev);

/* ## -------------------- ## */
/* ## netdev_rxq Functions ## */
/* ## -------------------- ## */

/* If a particular netdev class does not support receiving packets, all these
 * function pointers must be NULL. */

    /* Life-cycle functions for a netdev_rxq.  See the large comment above on
     * struct netdev_class. */
    struct netdev_rxq *(*rxq_alloc)(void);
    int (*rxq_construct)(struct netdev_rxq *);
    void (*rxq_destruct)(struct netdev_rxq *);
    void (*rxq_dealloc)(struct netdev_rxq *);

    /* Attempts to receive a batch of packets from 'rx'.  In 'batch', the
     * caller supplies 'packets' as the pointer to the beginning of an array
     * of NETDEV_MAX_BURST pointers to dp_packet.  If successful, the
     * implementation stores pointers to up to NETDEV_MAX_BURST dp_packets into
     * the array, transferring ownership of the packets to the caller, stores
     * the number of received packets into 'count', and returns 0.
     *
     * The implementation does not necessarily initialize any non-data members
     * of 'packets' in 'batch'.  That is, the caller must initialize layer
     * pointers and metadata itself, if desired, e.g. with pkt_metadata_init()
     * and miniflow_extract().
     *
     * Implementations should allocate buffers with DP_NETDEV_HEADROOM bytes of
     * headroom.
     *
     * Returns EAGAIN immediately if no packet is ready to be received or
     * another positive errno value if an error was encountered. */
    int (*rxq_recv)(struct netdev_rxq *rx, struct dp_packet_batch *batch);

    /* Registers with the poll loop to wake up from the next call to
     * poll_block() when a packet is ready to be received with
     * netdev_rxq_recv() on 'rx'. */
    void (*rxq_wait)(struct netdev_rxq *rx);

    /* Discards all packets waiting to be received from 'rx'. */
    int (*rxq_drain)(struct netdev_rxq *rx);
};
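
The queue_dump_start / queue_dump_next / queue_dump_done trio above follows a start/next/done iterator protocol in which "next" returns 0 per item, EOF at the end, and a positive errno value on error. The following is a minimal standalone sketch of that protocol, not code from the OVS tree: struct toy_dev, struct toy_dump_state, and every toy_* function are hypothetical names, and the 'details' smap argument is left out to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>      /* For EOF. */
#include <stdlib.h>

/* Hypothetical toy device with a fixed list of queue IDs (not OVS code). */
struct toy_dev {
    const unsigned int *queue_ids;
    size_t n_queues;
};

/* Iteration state handed back to the caller through '*statep'. */
struct toy_dump_state {
    size_t pos;
};

/* Begins a dump.  Returns 0 on success, otherwise a positive errno value. */
static int
toy_queue_dump_start(const struct toy_dev *dev, void **statep)
{
    struct toy_dump_state *state = calloc(1, sizeof *state);

    (void) dev;
    if (!state) {
        return ENOMEM;
    }
    *statep = state;
    return 0;
}

/* Stores the next queue ID and returns 0, or returns EOF once every queue
 * has been dumped. */
static int
toy_queue_dump_next(const struct toy_dev *dev, void *state_,
                    unsigned int *queue_id)
{
    struct toy_dump_state *state = state_;

    assert(state != NULL);
    if (state->pos >= dev->n_queues) {
        return EOF;
    }
    *queue_id = dev->queue_ids[state->pos++];
    return 0;
}

/* Releases the iteration state. */
static int
toy_queue_dump_done(const struct toy_dev *dev, void *state)
{
    (void) dev;
    free(state);
    return 0;
}

/* Caller side: counts the queues, or returns -1 if the dump failed. */
static int
toy_count_queues(const struct toy_dev *dev)
{
    void *state;
    unsigned int queue_id;
    int n = 0;
    int error = toy_queue_dump_start(dev, &state);

    if (error) {
        return -1;
    }
    while (!(error = toy_queue_dump_next(dev, state, &queue_id))) {
        n++;
    }
    toy_queue_dump_done(dev, state);    /* Always called after a start. */
    return error == EOF ? n : -1;
}
```

Note that the caller invokes the "done" function whether iteration ended with EOF or with an error, which matches the contract in the comments above.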

The netdev_class implementations that exist so far include netdev_linux_class, netdev_internal_class, netdev_tap_class, dpdk_class, dpdk_ring_class, dpdk_vhost_class, dpdk_vhost_client_class, patch_class, and others. The netdevs in lib/netdev-linux.c are kernel-based network devices implemented by calling kernel APIs, while lib/netdev-vport.c implements vport-based netdevs on top of the datapath module.
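
Many members of struct netdev_class "may be set to null", in which case the generic layer is expected to report EOPNOTSUPP on the provider's behalf. A minimal standalone sketch of that dispatch pattern follows. It is an illustration only: struct toy_netdev_class, the toy_* functions, the two-entry table, and the use of ENODEV for an unknown type are all assumptions made for this example, not the OVS registry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for struct netdev_class. */
struct toy_netdev_class {
    const char *type;
    /* May be NULL if the class does not support policing. */
    int (*set_policing)(unsigned int kbits_rate, unsigned int kbits_burst);
};

/* A real provider would program the device here; this one just succeeds. */
static int
toy_system_set_policing(unsigned int kbits_rate, unsigned int kbits_burst)
{
    (void) kbits_rate;
    (void) kbits_burst;
    return 0;
}

/* Illustrative class table: one class with policing, one without. */
static const struct toy_netdev_class toy_classes[] = {
    { "system", toy_system_set_policing },
    { "patch", NULL },
};

/* Finds the class registered for 'type', or returns NULL. */
static const struct toy_netdev_class *
toy_lookup_class(const char *type)
{
    size_t i;

    assert(type != NULL);
    for (i = 0; i < sizeof toy_classes / sizeof toy_classes[0]; i++) {
        if (!strcmp(toy_classes[i].type, type)) {
            return &toy_classes[i];
        }
    }
    return NULL;
}

/* Generic wrapper: a NULL member is reported as EOPNOTSUPP, so providers
 * never need stub functions for features they lack. */
static int
toy_set_policing(const char *type, unsigned int rate, unsigned int burst)
{
    const struct toy_netdev_class *cls = toy_lookup_class(type);

    if (!cls) {
        return ENODEV;      /* Assumed error for an unknown type. */
    }
    return cls->set_policing ? cls->set_policing(rate, burst) : EOPNOTSUPP;
}
```

With this table, toy_set_policing("system", 1000, 100) succeeds, while the same call for "patch" yields EOPNOTSUPP without the provider writing any code.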

The ofproto layer defines its OpenFlow interface through ofproto_class. Several other important data structures are also tied to ofproto: struct ofproto, struct ofport, struct rule, struct oftable, and struct ofgroup.

1. struct ofproto represents one OpenFlow switch. It holds a pointer to its struct ofproto_class, a hash map of struct ofport, the struct oftable flow tables, a hash map of struct ofgroup, and so on.

/* An OpenFlow switch.
 *
 * With few exceptions, ofproto implementations may look at these fields but
 * should not modify them. */
struct ofproto {
    struct hmap_node hmap_node; /* In global 'all_ofprotos' hmap. */
    const struct ofproto_class *ofproto_class;
    char *type;                 /* Datapath type. */
    char *name;                 /* Datapath name. */

    /* Settings. */
    uint64_t fallback_dpid;     /* Datapath ID if no better choice found. */
    uint64_t datapath_id;       /* Datapath ID. */
    bool forward_bpdu;          /* Option to allow forwarding of BPDU frames
                                 * when NORMAL action is invoked. */
    char *mfr_desc;             /* Manufacturer (NULL for default). */
    char *hw_desc;              /* Hardware (NULL for default). */
    char *sw_desc;              /* Software version (NULL for default). */
    char *serial_desc;          /* Serial number (NULL for default). */
    char *dp_desc;              /* Datapath description (NULL for default). */
    enum ofputil_frag_handling frag_handling;

    /* Datapath. */
    struct hmap ports;          /* Contains "struct ofport"s. */
    struct shash port_by_name;
    struct simap ofp_requests;  /* OpenFlow port number requests. */
    uint16_t alloc_port_no;     /* Last allocated OpenFlow port number. */
    uint16_t max_ports;         /* Max possible OpenFlow port num, plus one. */
    struct hmap ofport_usage;   /* Map ofport to last used time. */
    uint64_t change_seq;        /* Change sequence for netdev status. */

    /* Flow tables. */
    long long int eviction_group_timer; /* For rate limited reheapification. */
    struct oftable *tables;
    int n_tables;
    ovs_version_t tables_version;  /* Controls which rules are visible to
                                    * table lookups. */

    /* Rules indexed on their cookie values, in all flow tables. */
    struct hindex cookies OVS_GUARDED_BY(ofproto_mutex);
    struct hmap learned_cookies OVS_GUARDED_BY(ofproto_mutex);

    /* List of expirable flows, in all flow tables. */
    struct ovs_list expirable OVS_GUARDED_BY(ofproto_mutex);

    /* Meter table.
     * OpenFlow meters start at 1.  To avoid confusion we leave the first
     * pointer in the array un-used, and index directly with the OpenFlow
     * meter_id. */
    struct ofputil_meter_features meter_features;
    struct meter **meters; /* 'meter_features.max_meter' + 1 pointers. */

    /* OpenFlow connections. */
    struct connmgr *connmgr;

    /* Delayed rule executions.
     *
     * We delay calls to ->ofproto_class->rule_execute() past releasing
     * ofproto_mutex during a flow_mod, because otherwise a "learn" action
     * triggered by executing the packet would try to recursively modify
     * the flow table and reacquire the global lock. */
    struct guarded_list rule_executes; /* Contains "struct rule_execute"s. */

    int min_mtu;                    /* Current MTU of non-internal ports. */

    /* Groups. */
    struct cmap groups;               /* Contains "struct ofgroup"s. */
    uint32_t n_groups[4] OVS_GUARDED; /* # of existing groups of each type. */
    struct ofputil_group_features ogf;
};
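
The "Meter table" comment in struct ofproto describes an array of 'meter_features.max_meter' + 1 pointers in which slot 0 is left unused, so that an OpenFlow meter_id indexes the array directly. A small standalone sketch of that indexing scheme is shown below. The toy_* types and functions are hypothetical and only model the layout described in the comment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a meter. */
struct toy_meter {
    uint32_t rate_kbps;
};

struct toy_meter_table {
    uint32_t max_meter;
    struct toy_meter **meters;  /* 'max_meter' + 1 pointers; slot 0 unused. */
};

/* Allocates the pointer array.  Returns 0 on success, -1 on failure. */
static int
toy_meter_table_init(struct toy_meter_table *t, uint32_t max_meter)
{
    t->max_meter = max_meter;
    t->meters = calloc((size_t) max_meter + 1, sizeof *t->meters);
    return t->meters ? 0 : -1;
}

/* Valid meter IDs run from 1 to 'max_meter' inclusive. */
static int
toy_meter_id_valid(const struct toy_meter_table *t, uint32_t meter_id)
{
    return meter_id >= 1 && meter_id <= t->max_meter;
}

/* Installs 'm' under 'meter_id'.  Returns 0 on success, -1 for a bad ID. */
static int
toy_meter_install(struct toy_meter_table *t, uint32_t meter_id,
                  struct toy_meter *m)
{
    assert(t->meters != NULL);
    if (!toy_meter_id_valid(t, meter_id)) {
        return -1;
    }
    t->meters[meter_id] = m;    /* Direct index, no "- 1" adjustment. */
    return 0;
}

/* Returns the meter for 'meter_id', or NULL if absent or out of range. */
static struct toy_meter *
toy_meter_lookup(const struct toy_meter_table *t, uint32_t meter_id)
{
    return toy_meter_id_valid(t, meter_id) ? t->meters[meter_id] : NULL;
}
```

The cost is one wasted pointer. The benefit is that no call site has to remember an off-by-one conversion between meter IDs and array indexes.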

2. struct ofport represents one port of an OpenFlow switch and is associated with a struct netdev, the device abstraction behind it.

/* An OpenFlow port within a "struct ofproto".
 *
 * The port's name is netdev_get_name(port->netdev).
 *
 * With few exceptions, ofproto implementations may look at these fields but
 * should not modify them. */
struct ofport {
    struct hmap_node hmap_node; /* In struct ofproto's "ports" hmap. */
    struct ofproto *ofproto;    /* The ofproto that contains this port. */
    struct netdev *netdev;
    struct ofputil_phy_port pp;
    ofp_port_t ofp_port;        /* OpenFlow port number. */
    uint64_t change_seq;
    long long int created;      /* Time created, in msec. */
    int mtu;
};

3. struct rule represents one OpenFlow rule. Each rule points to a struct rule_actions, which holds the rule's sequence of actions (struct ofpact).

struct rule {
    /* Where this rule resides in an OpenFlow switch.
     *
     * These are immutable once the rule is constructed, hence 'const'. */
    struct ofproto *const ofproto; /* The ofproto that contains this rule. */
    const struct cls_rule cr;      /* In owning ofproto's classifier. */
    const uint8_t table_id;        /* Index in ofproto's 'tables' array. */
    bool removed;                  /* Rule has been removed from the ofproto
                                    * data structures. */

    /* Protects members marked OVS_GUARDED.
     * Readers only need to hold this mutex.
     * Writers must hold both this mutex AND ofproto_mutex.
     * By implication writers can read *without* taking this mutex while they
     * hold ofproto_mutex. */
    struct ovs_mutex mutex OVS_ACQ_AFTER(ofproto_mutex);

    /* Number of references.
     * The classifier owns one reference.
     * Any thread trying to keep a rule from being freed should hold its own
     * reference. */
    struct ovs_refcount ref_count;

    /* A "flow cookie" is the OpenFlow name for a 64-bit value associated with
     * a flow. */
    ovs_be64 flow_cookie OVS_GUARDED;
    struct hindex_node cookie_node OVS_GUARDED_BY(ofproto_mutex);

    enum ofputil_flow_mod_flags flags OVS_GUARDED;

    /* Timeouts. */
    uint16_t hard_timeout OVS_GUARDED; /* In seconds from ->modified. */
    uint16_t idle_timeout OVS_GUARDED; /* In seconds from ->used. */

    /* Eviction precedence. */
    const uint16_t importance;

    /* Removal reason for sending flow removed message.
     * Used only if 'flags' has OFPUTIL_FF_SEND_FLOW_REM set and if the
     * value is not OVS_OFPRR_NONE. */
    uint8_t removed_reason;

    /* Eviction groups (see comment on struct eviction_group for explanation).
     *
     * 'eviction_group' is this rule's eviction group, or NULL if it is not in
     * any eviction group.  When 'eviction_group' is nonnull, 'evg_node' is in
     * the ->eviction_group->rules hmap. */
    struct eviction_group *eviction_group OVS_GUARDED_BY(ofproto_mutex);
    struct heap_node evg_node OVS_GUARDED_BY(ofproto_mutex);

    /* OpenFlow actions.  See struct rule_actions for more thread-safety
     * notes. */
    const struct rule_actions * const actions;

    /* In owning meter's 'rules' list.  An empty list if there is no meter. */
    struct ovs_list meter_list_node OVS_GUARDED_BY(ofproto_mutex);

    /* Flow monitors (e.g. for NXST_FLOW_MONITOR, related to struct ofmonitor).
     *
     * 'add_seqno' is the sequence number when this rule was created.
     * 'modify_seqno' is the sequence number when this rule was last modified.
     * See 'monitor_seqno' in connmgr.c for more information. */
    enum nx_flow_monitor_flags monitor_flags OVS_GUARDED_BY(ofproto_mutex);
    uint64_t add_seqno OVS_GUARDED_BY(ofproto_mutex);
    uint64_t modify_seqno OVS_GUARDED_BY(ofproto_mutex);

    /* Optimisation for flow expiry.  In ofproto's 'expirable' list if this
     * rule is expirable, otherwise empty. */
    struct ovs_list expirable OVS_GUARDED_BY(ofproto_mutex);

    /* Times.  Last so that they are more likely close to the stats managed
     * by the provider. */
    long long int created OVS_GUARDED; /* Creation time. */

    /* Must hold 'mutex' for both read/write, 'ofproto_mutex' not needed. */
    long long int modified OVS_GUARDED; /* Time of last modification. */
};
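
The 'ref_count' comment in struct rule states the ownership rule: the classifier owns one reference, and any thread that needs the rule to stay alive takes its own. A minimal sketch of that pattern using C11 atomics follows. It is an assumption-laden illustration: struct toy_rule and the toy_rule_* functions are hypothetical, and where the real tree uses its own struct ovs_refcount and may defer destruction, this sketch simply frees the object when the last reference is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reduced rule: just a reference count and a payload. */
struct toy_rule {
    atomic_uint ref_count;
    int priority;
};

/* Creates a rule holding one reference, owned by the creator (in the
 * real design, this is the reference owned by the classifier). */
static struct toy_rule *
toy_rule_create(int priority)
{
    struct toy_rule *rule = malloc(sizeof *rule);

    if (rule) {
        atomic_init(&rule->ref_count, 1);
        rule->priority = priority;
    }
    return rule;
}

/* Takes an additional reference so that 'rule' cannot be freed while the
 * caller is still using it. */
static void
toy_rule_ref(struct toy_rule *rule)
{
    assert(rule != NULL);
    atomic_fetch_add(&rule->ref_count, 1);
}

/* Drops one reference.  Frees 'rule' and returns true if that was the
 * last reference, otherwise returns false. */
static bool
toy_rule_unref(struct toy_rule *rule)
{
    if (atomic_fetch_sub(&rule->ref_count, 1) == 1) {
        free(rule);
        return true;
    }
    return false;
}
```

A reader thread would call toy_rule_ref() before using the rule outside the lock and toy_rule_unref() when done, so removing the rule from the table never frees memory that is still in use.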

struct rule_actions {
    /* Flags.
     *
     * 'has_meter' is true if 'ofpacts' contains an OFPACT_METER action.
     *
     * 'has_learn_with_delete' is true if 'ofpacts' contains an OFPACT_LEARN
     * action whose flags include NX_LEARN_F_DELETE_LEARNED. */
    bool has_meter;
    bool has_learn_with_delete;
    bool has_groups;

    /* Actions. */
    uint32_t ofpacts_len;         /* Size of 'ofpacts', in bytes. */
    struct ofpact ofpacts[];      /* Sequence of "struct ofpacts". */
};

struct ofpact {
    /* We want the space advantage of an 8-bit type here on every
     * implementation, without giving up the advantage of having a useful type
     * on implementations that support packed enums. */
#ifdef HAVE_PACKED_ENUM
    enum ofpact_type type;      /* OFPACT_*. */
#else
    uint8_t type;               /* OFPACT_* */
#endif

    uint8_t raw;                /* Original type when added, if any. */
    uint16_t len;               /* Length of the action, in bytes, including
                                 * struct ofpact, excluding padding. */
};
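
struct rule_actions stores its actions as a packed, variable-length sequence: each struct ofpact header carries a 'len' that includes the header but excludes padding, so a walker has to round each length up to the padding granularity to find the next action. The sketch below illustrates that walk in standalone form. The toy_* names are hypothetical, and the 8-byte alignment (TOY_ALIGNTO) is an assumption chosen for the example rather than a value taken from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ALIGNTO 8   /* Assumed padding granularity, in bytes. */
#define TOY_ALIGN(LEN) (((LEN) + TOY_ALIGNTO - 1) / TOY_ALIGNTO * TOY_ALIGNTO)

/* Hypothetical header modeled on struct ofpact. */
struct toy_ofpact {
    uint8_t type;
    uint8_t raw;
    uint16_t len;       /* Includes this header, excludes padding. */
};

/* Appends an action with 'payload_len' zeroed payload bytes at offset
 * '*used' in 'buf'.  Returns 0 on success, -1 if it does not fit. */
static int
toy_put_action(uint8_t *buf, size_t size, size_t *used,
               uint8_t type, uint16_t payload_len)
{
    struct toy_ofpact hdr;
    size_t padded;

    assert(buf != NULL);
    if (payload_len > UINT16_MAX - sizeof hdr) {
        return -1;
    }
    hdr.type = type;
    hdr.raw = 0;
    hdr.len = (uint16_t) (sizeof hdr + payload_len);
    padded = TOY_ALIGN((size_t) hdr.len);
    if (*used + padded > size) {
        return -1;
    }
    memset(buf + *used, 0, padded);
    memcpy(buf + *used, &hdr, sizeof hdr);
    *used += padded;
    return 0;
}

/* Counts the actions in the first 'ofpacts_len' bytes of 'buf' by hopping
 * from header to header using the padded length. */
static size_t
toy_count_actions(const uint8_t *buf, size_t ofpacts_len)
{
    size_t n = 0;
    size_t ofs = 0;

    while (ofs + sizeof(struct toy_ofpact) <= ofpacts_len) {
        struct toy_ofpact hdr;

        memcpy(&hdr, buf + ofs, sizeof hdr);
        if (hdr.len < sizeof hdr) {
            break;      /* Malformed length; stop rather than loop forever. */
        }
        n++;
        ofs += TOY_ALIGN((size_t) hdr.len);
    }
    return n;
}
```

For example, an action with no payload has 'len' 4 and occupies 8 bytes, and one with a 6-byte payload has 'len' 10 and occupies 16 bytes, so the two together fill 24 bytes of the buffer.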

struct ofproto_class is an interface (factory) class whose corresponding implementation is ofproto-dpif. Let's look at the interface definition first.

struct ofproto_class {
/* ## ----------------- ## */
/* ## Factory Functions ## */
/* ## ----------------- ## */

    /* Initializes provider.  The caller may pass in 'iface_hints',
     * which contains an shash of "struct iface_hint" elements indexed
     * by the interface's name.  The provider may use these hints to
     * describe the startup configuration in order to reinitialize its
     * state.  The caller owns the provided data, so a provider must
     * make copies of anything required.  An ofproto provider must
     * remove any existing state that is not described by the hint, and
     * may choose to remove it all. */
    void (*init)(const struct shash *iface_hints);

    /* Enumerates the types of all supported ofproto types into 'types'.  The
     * caller has already initialized 'types'.  The implementation should add
     * its own types to 'types' but not remove any existing ones, because other
     * ofproto classes might already have added names to it. */
    void (*enumerate_types)(struct sset *types);

    /* Enumerates the names of all existing datapaths of the specified 'type'
     * into 'names'.  The caller has already initialized 'names' as an empty
     * sset.
     *
     * 'type' is one of the types enumerated by ->enumerate_types().
     *
     * Returns 0 if successful, otherwise a positive errno value.
     */
    int (*enumerate_names)(const char *type, struct sset *names);

    /* Deletes the datapath with the specified 'type' and 'name'.  The caller
     * should have closed any open ofproto with this 'type' and 'name'; this
     * function is allowed to fail if that is not the case.
     *
     * 'type' is one of the types enumerated by ->enumerate_types().
     * 'name' is one of the names enumerated by ->enumerate_names() for 'type'.
     *
     * Returns 0 if successful, otherwise a positive errno value.
     */
    int (*del)(const char *type, const char *name);

    /* Returns the type to pass to netdev_open() when a datapath of type
     * 'datapath_type' has a port of type 'port_type', for a few special
     * cases when a netdev type differs from a port type.  For example,
     * when using the userspace datapath, a port of type "internal"
     * needs to be opened as "tap".
     *
     * Returns either 'type' itself or a string literal, which must not
     * be freed. */
    const char *(*port_open_type)(const char *datapath_type,
                                  const char *port_type);

    /* Performs any periodic activity required on ofprotos of type
     * 'type'.
     *
     * An ofproto provider may implement it or not, depending on whether
     * it needs type-level maintenance.
     *
     * Returns 0 if successful, otherwise a positive errno value. */
    int (*type_run)(const char *type);

    /* Causes the poll loop to wake up when a type 'type''s 'run'
     * function needs to be called, e.g. by calling the timer or fd
     * waiting functions in poll-loop.h.
     *
     * An ofproto provider may implement it or not, depending on whether
     * it needs type-level maintenance. */
    void (*type_wait)(const char *type);

    /* Performs any periodic activity required by 'ofproto'.  It should:
     *
     *   - Call connmgr_send_packet_in() for each received packet that missed
     *     in the OpenFlow flow table or that had a OFPP_CONTROLLER output
     *     action.
     *
     *   - Call ofproto_rule_expire() for each OpenFlow flow that has reached
     *     its hard_timeout or idle_timeout, to expire the flow.
     *
     * Returns 0 if successful, otherwise a positive errno value. */
    int (*run)(struct ofproto *ofproto);

    /* Causes the poll loop to wake up when 'ofproto''s 'run' function needs to
     * be called, e.g. by calling the timer or fd waiting functions in
     * poll-loop.h.  */
    void (*wait)(struct ofproto *ofproto);

    /* Every "struct rule" in 'ofproto' is about to be deleted, one by one.
     * This function may prepare for that, for example by clearing state in
     * advance.  It should *not* actually delete any "struct rule"s from
     * 'ofproto', only prepare for it.
     *
     * This function is optional; it's really just for optimization in case
     * it's cheaper to delete all the flows from your hardware in a single pass
     * than to do it one by one. */
    void (*flush)(struct ofproto *ofproto);

    /* Helper for the OpenFlow OFPT_TABLE_FEATURES request.
     *
     * The 'features' array contains 'ofproto->n_tables' elements.  Each
     * element is initialized as:
     *
     *   - 'table_id' to the array index.
     *
     *   - 'name' to "table#" where # is the table ID.
     *
     *   - 'metadata_match' and 'metadata_write' to OVS_BE64_MAX.
     *
     *   - 'config' to the table miss configuration.
     *
     *   - 'max_entries' to 1,000,000.
     *
     *   - Both 'nonmiss' and 'miss' to:
     *
     *     * 'next' to all 1-bits for all later tables.
     *
     *     * 'instructions' to all instructions.
     *
     *     * 'write' and 'apply' both to:
     *
     *       - 'ofpacts': All actions.
     *
     *       - 'set_fields': All fields.
     *
     *   - 'match', 'mask', and 'wildcard' to all fields.
     *
     * If 'stats' is nonnull, it also contains 'ofproto->n_tables' elements.
     * Each element is initialized as:
     *
     *   - 'table_id' to the array index.
     *
     *   - 'active_count' to the 'n_flows' of struct ofproto for the table.
     *
     *   - 'lookup_count' and 'matched_count' to 0.
     *
     * The implementation should update any members in each element for which
     * it has better values:
     *
     *   - Any member of 'features' to better describe the implementation's
     *     capabilities.
     *
     *   - 'lookup_count' to the number of packets looked up in this flow table
     *     so far.
     *
     *   - 'matched_count' to the number of packets looked up in this flow
     *     table so far that matched one of the flow entries.
     */
    void (*query_tables)(struct ofproto *ofproto,
                         struct ofputil_table_features *features,
                         struct ofputil_table_stats *stats);

    /* Sets the current tables version the provider should use for classifier
     * lookups.  This must be called with a new version number after each set
     * of flow table changes has been completed, so that datapath revalidation
     * can be triggered. */
    void (*set_tables_version)(struct ofproto *ofproto, ovs_version_t version);

    struct ofproto *(*alloc)(void);
    int (*construct)(struct ofproto *ofproto);
    void (*destruct)(struct ofproto *ofproto);
    void (*dealloc)(struct ofproto *ofproto);

    struct ofport *(*port_alloc)(void);
    int (*port_construct)(struct ofport *ofport);
    void (*port_destruct)(struct ofport *ofport, bool del);
    void (*port_dealloc)(struct ofport *ofport);

    /* Called after 'ofport->netdev' is replaced by a new netdev object.  If
     * the ofproto implementation uses the ofport's netdev internally, then it
     * should switch to using the new one.  The old one has been closed.
     *
     * An ofproto implementation that doesn't need to do anything in this
     * function may use a null pointer. */
    void (*port_modified)(struct ofport *ofport);

    /* Called after an OpenFlow request changes a port's configuration.
     * 'ofport->pp.config' contains the new configuration.  'old_config'
     * contains the previous configuration.
     *
     * The caller implements OFPUTIL_PC_PORT_DOWN using netdev functions to
     * turn NETDEV_UP on and off, so this function doesn't have to do anything
     * for that bit (and it won't be called if that is the only bit that
     * changes). */
    void (*port_reconfigured)(struct ofport *ofport,
                              enum ofputil_port_config old_config);

    /* Looks up a port named 'devname' in 'ofproto'.  On success, returns 0 and
     * initializes '*port' appropriately. Otherwise, returns a positive errno
     * value.
     *
     * The caller owns the data in 'port' and must free it with
     * ofproto_port_destroy() when it is no longer needed. */
    int (*port_query_by_name)(const struct ofproto *ofproto,
                              const char *devname, struct ofproto_port *port);

    /* Attempts to add 'netdev' as a port on 'ofproto'.  Returns 0 if
     * successful, otherwise a positive errno value.  The caller should
     * inform the implementation of the OpenFlow port through the
     * ->port_construct() method.
     *
     * It doesn't matter whether the new port will be returned by a later call
     * to ->port_poll(); the implementation may do whatever is more
     * convenient. */
    int (*port_add)(struct ofproto *ofproto, struct netdev *netdev);

    /* Deletes port number 'ofp_port' from the datapath for 'ofproto'.  Returns
     * 0 if successful, otherwise a positive errno value.
     *
     * It doesn't matter whether the deleted port will be returned by a later call
     * to ->port_poll(); the implementation may do whatever is more
     * convenient. */
    int (*port_del)(struct ofproto *ofproto, ofp_port_t ofp_port);

    /* Refreshes datapath configuration of 'port'.
     * Returns 0 if successful, otherwise a positive errno value. */
    int (*port_set_config)(const struct ofport *port, const struct smap *cfg);

    /* Get port stats */
    int (*port_get_stats)(const struct ofport *port,
                          struct netdev_stats *stats);

    /* Port iteration functions.
     *
     * The client might not be entirely in control of the ports within an
     * ofproto.  Some hardware implementations, for example, might have a fixed
     * set of ports in a datapath.  For this reason, the client needs a way to
     * iterate through all the ports that are actually in a datapath.  These
     * functions provide that functionality.
     *
     * The 'state' pointer provides the implementation a place to
     * keep track of its position.  Its format is opaque to the caller.
     *
     * The ofproto provider retains ownership of the data that it stores into
     * ->port_dump_next()'s 'port' argument.  The data must remain valid until
     * at least the next call to ->port_dump_next() or ->port_dump_done() for
     * 'state'.  The caller will not modify or free it.
     *
     * Details
     * =======
     *
     * ->port_dump_start() attempts to begin dumping the ports in 'ofproto'.
     * On success, it should return 0 and initialize '*statep' with any data
     * needed for iteration.  On failure, returns a positive errno value, and
     * the client will not call ->port_dump_next() or ->port_dump_done().
     *
     * ->port_dump_next() attempts to retrieve another port from 'ofproto' for
     * 'state'.  If there is another port, it should store the port's
     * information into 'port' and return 0.  It should return EOF if all ports
     * have already been iterated.  Otherwise, on error, it should return a
     * positive errno value.  This function will not be called again once it
     * returns nonzero once for a given iteration (but the 'port_dump_done'
     * function will be called afterward).
     *
     * ->port_dump_done() allows the implementation to release resources used
     * for iteration.  The caller might decide to stop iteration in the middle
     * by calling this function before ->port_dump_next() returns nonzero.
     *
     * Usage Example
     * =============
     *
     * int error;
     * void *state;
     *
     * error = ofproto->ofproto_class->port_dump_start(ofproto, &state);
     * if (!error) {
     *     for (;;) {
     *         struct ofproto_port port;
     *
     *         error = ofproto->ofproto_class->port_dump_next(
     *                     ofproto, state, &port);
     *         if (error) {
     *             break;
     *         }
     *         // Do something with 'port' here (without modifying or freeing
     *         // any of its data).
     *     }
     *     ofproto->ofproto_class->port_dump_done(ofproto, state);
     * }
     * // 'error' is now EOF (success) or a positive errno value (failure).
     */
    int (*port_dump_start)(const struct ofproto *ofproto, void **statep);
    int (*port_dump_next)(const struct ofproto *ofproto, void *state,
                          struct ofproto_port *port);
    int (*port_dump_done)(const struct ofproto *ofproto, void *state);

    struct rule *(*rule_alloc)(void);
    enum ofperr (*rule_construct)(struct rule *rule)
        /* OVS_REQUIRES(ofproto_mutex) */;
    void (*rule_insert)(struct rule *rule, struct rule *old_rule,
                        bool forward_counts)
        /* OVS_REQUIRES(ofproto_mutex) */;
    void (*rule_delete)(struct rule *rule) /* OVS_REQUIRES(ofproto_mutex) */;
    void (*rule_destruct)(struct rule *rule);
    void (*rule_dealloc)(struct rule *rule);

    /* Applies the actions in 'rule' to 'packet'.  (This implements sending
     * buffered packets for OpenFlow OFPT_FLOW_MOD commands.)
     *
     * Takes ownership of 'packet' (so it should eventually free it, with
     * ofpbuf_delete()).
     *
     * 'flow' reflects the flow information for 'packet'.  All of the
     * information in 'flow' is extracted from 'packet', except for
     * flow->tunnel and flow->in_port, which are assigned the correct values
     * for the incoming packet.  The register values are zeroed.  'packet''s
     * header pointers and offsets (e.g. packet->l3) are appropriately
     * initialized.  packet->l3 is aligned on a 32-bit boundary.
     *
     * The implementation should add the statistics for 'packet' into 'rule'.
     *
     * Returns 0 if successful, otherwise an OpenFlow error code. */
    enum ofperr (*rule_execute)(struct rule *rule, const struct flow *flow,
                                struct dp_packet *packet);

    /* Implements the OpenFlow OFPT_PACKET_OUT command.  The datapath should
     * execute the 'ofpacts_len' bytes of "struct ofpacts" in 'ofpacts'.
     *
     * The caller retains ownership of 'packet' and of 'ofpacts', so
     * ->packet_out() should not modify or free them.
     *
     * This function must validate that it can correctly implement 'ofpacts'.
     * If not, then it should return an OpenFlow error code.
     *
     * 'flow' reflects the flow information for 'packet'.  All of the
     * information in 'flow' is extracted from 'packet', except for
     * flow->in_port (see below).  flow->tunnel and its register values are
     * zeroed.
     *
     * flow->in_port comes from the OpenFlow OFPT_PACKET_OUT message.  The
     * implementation should reject invalid flow->in_port values by returning
     * OFPERR_OFPBRC_BAD_PORT.  (If the implementation called
     * ofproto_init_max_ports(), then the client will reject these ports
     * itself.)  For consistency, the implementation should consider valid for
     * flow->in_port any value that could possibly be seen in a packet that it
     * passes to connmgr_send_packet_in().  Ideally, even an implementation
     * that never generates packet-ins (e.g. due to hardware limitations)
     * should still allow flow->in_port values for every possible physical port
     * and OFPP_LOCAL.  The only virtual ports (those above OFPP_MAX) that the
     * caller will ever pass in as flow->in_port, other than OFPP_LOCAL, are
     * OFPP_NONE and OFPP_CONTROLLER.  The implementation should allow both of
     * these, treating each of them as packets generated by the controller as
     * opposed to packets originating from some switch port.
     *
     * (Ordinarily the only effect of flow->in_port is on output actions that
     * involve the input port, such as actions that output to OFPP_IN_PORT,
     * OFPP_FLOOD, or OFPP_ALL.  flow->in_port can also affect Nicira extension
     * "resubmit" actions.)
     *
     * 'packet' is not matched against the OpenFlow flow table, so its
     * statistics should not be included in OpenFlow flow statistics.
     *
     * Returns 0 if successful, otherwise an OpenFlow error code. */
    enum ofperr (*packet_out)(struct ofproto *ofproto, struct dp_packet *packet,
                              const struct flow *flow,
                              const struct ofpact *ofpacts,
                              size_t ofpacts_len);

    enum ofperr (*nxt_resume)(struct ofproto *ofproto,
                              const struct ofputil_packet_in_private *);

    /* Registers meta-data associated with the 'n_qdscp' Qualities of Service
     * 'queues' attached to 'ofport'.  This data is not intended to be
     * sufficient to implement QoS.  Instead, providers may use this
     * information to implement features which require knowledge of what queues
     * exist on a port, and some basic information about them.
     *
     * EOPNOTSUPP as a return value indicates that this ofproto_class does not
     * support QoS, as does a null pointer. */
    int (*set_queues)(struct ofport *ofport,
                      const struct ofproto_port_queue *queues, size_t n_qdscp);

    /* If 's' is nonnull, this function registers a "bundle" associated with
     * client data pointer 'aux' in 'ofproto'.  A bundle is the same concept as
     * a Port in OVSDB, that is, it consists of one or more "slave" devices
     * (Interfaces, in OVSDB) along with VLAN and LACP configuration and, if
     * there is more than one slave, a bonding configuration.  If 'aux' is
     * already registered then this function updates its configuration to 's'.
     * Otherwise, this function registers a new bundle.
     *
     * If 's' is NULL, this function unregisters the bundle registered on
     * 'ofproto' associated with client data pointer 'aux'.  If no such bundle
     * has been registered, this has no effect.
     *
     * This function affects only the behavior of the NXAST_AUTOPATH action and
     * output to the OFPP_NORMAL port.  An implementation that does not support
     * it at all may set it to NULL or return EOPNOTSUPP.  An implementation
     * that supports only a subset of the functionality should implement what
     * it can and return 0. */
    int (*bundle_set)(struct ofproto *ofproto, void *aux,
                      const struct ofproto_bundle_settings *s);

    /* If 'port' is part of any bundle, removes it from that bundle.  If the
     * bundle now has no ports, deletes the bundle.  If the bundle now has only
     * one port, deconfigures the bundle's bonding configuration. */
    void (*bundle_remove)(struct ofport *ofport);

    /* These functions should be NULL if an implementation does not support
     * them.  They must be all null or all non-null. */

    /* Initializes 'features' to describe the metering features supported by
     * 'ofproto'. */
    void (*meter_get_features)(const struct ofproto *ofproto,
                               struct ofputil_meter_features *features);

    /* If '*id' is UINT32_MAX, adds a new meter with the given 'config'.  On
     * success the function must store a provider meter ID other than
     * UINT32_MAX in '*id'.  All further references to the meter will be made
     * with the returned provider meter id rather than the OpenFlow meter id.
     * The caller does not try to interpret the provider meter id, giving the
     * implementation the freedom to either use the OpenFlow meter_id value
     * provided in the meter configuration, or any other value suitable for the
     * implementation.
     *
     * If '*id' is a value other than UINT32_MAX, modifies the existing meter
     * with that meter provider ID to have configuration 'config', while
     * leaving '*id' unchanged.  On failure, the existing meter configuration
     * is left intact. */
    enum ofperr (*meter_set)(struct ofproto *ofproto, ofproto_meter_id *id,
                             const struct ofputil_meter_config *config);

    /* Gets the meter and meter band packet and byte counts for maximum of
     * 'stats->n_bands' bands for the meter with provider ID 'id' within
     * 'ofproto'.  The caller fills in the other stats values.  The band stats
     * are copied to memory at 'stats->bands' provided by the caller.  The
     * number of returned band stats is returned in 'stats->n_bands'. */
    enum ofperr (*meter_get)(const struct ofproto *ofproto,
                             ofproto_meter_id id,
                             struct ofputil_meter_stats *stats);

    /* Deletes a meter, making the 'ofproto_meter_id' invalid for any
     * further calls. */
    void (*meter_del)(struct ofproto *, ofproto_meter_id);

/* ## --------------------- ## */
/* ## Datapath information  ## */
/* ## --------------------- ## */
    /* Retrieve the version string of the datapath. The version
     * string can be NULL if it cannot be determined.
     *
     * The version returned is read only. The caller should not
     * free it.
     *
     * This function should be NULL if an implementation does not support it.
     */
    const char *(*get_datapath_version)(const struct ofproto *);

/* ## ------------------- ## */
/* ## Connection tracking ## */
/* ## ------------------- ## */
    /* Flushes the connection tracking tables. If 'zone' is not NULL,
     * only deletes connections in '*zone'. */
    void (*ct_flush)(const struct ofproto *, const uint16_t *zone);
};
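The port iteration protocol described in the comments above (start, next until nonzero, then done) can be sketched from the provider's side. This is only a minimal illustration under stated assumptions: every `sketch_*` name is hypothetical, the types are simplified stand-ins for `struct ofproto` and `struct ofproto_port`, and the port set is a fixed array, as it might be on hardware with a fixed set of ports.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>    /* For EOF. */
#include <stdlib.h>

/* Hypothetical stand-in for struct ofproto_port. */
struct sketch_port {
    const char *name;        /* Owned by the provider, not the caller. */
    unsigned int ofp_port;
};

/* Hypothetical stand-in for struct ofproto: a fixed set of ports. */
struct sketch_ofproto {
    const struct sketch_port *ports;
    size_t n_ports;
};

/* Iteration state, opaque to the caller: a cursor into the port array. */
struct sketch_dump_state {
    size_t next;
};

static int
sketch_port_dump_start(const struct sketch_ofproto *ofproto, void **statep)
{
    struct sketch_dump_state *state = calloc(1, sizeof *state);

    (void) ofproto;
    if (!state) {
        return ENOMEM;       /* Caller will not call next() or done(). */
    }
    *statep = state;
    return 0;
}

static int
sketch_port_dump_next(const struct sketch_ofproto *ofproto, void *state_,
                      struct sketch_port *port)
{
    struct sketch_dump_state *state = state_;

    if (state->next >= ofproto->n_ports) {
        return EOF;          /* All ports have been iterated. */
    }
    *port = ofproto->ports[state->next++];
    return 0;
}

static int
sketch_port_dump_done(const struct sketch_ofproto *ofproto, void *state)
{
    (void) ofproto;
    free(state);             /* Release the iteration resources. */
    return 0;
}

/* Caller side, following the usage example in the comment: counts the ports
 * and stores the final status (EOF on success) in '*errorp'. */
static size_t
sketch_count_ports(const struct sketch_ofproto *ofproto, int *errorp)
{
    void *state;
    size_t n = 0;
    int error = sketch_port_dump_start(ofproto, &state);

    if (!error) {
        for (;;) {
            struct sketch_port port;

            error = sketch_port_dump_next(ofproto, state, &port);
            if (error) {
                break;
            }
            n++;             /* Use 'port' here without modifying it. */
        }
        sketch_port_dump_done(ofproto, state);
    }
    *errorp = error;
    return n;
}
```

Calling `sketch_count_ports()` on an ofproto with three ports should return 3 and leave EOF in `*errorp`. The caller never frees the port data, because the provider retains ownership of it.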
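Next, let's look at ofproto-dpif, a concrete implementation of ofproto_class. ofproto-dpif handles packets that miss in the datapath: it looks up the OpenFlow flow tables to compute the concrete actions and installs them into the datapath. ofproto-dpif also passes packets further up to ofproto, which ultimately hands them to the centralized controller for processing.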

/* Ofproto-dpif -- DPIF based ofproto implementation.
 *
 * Ofproto-dpif provides an ofproto implementation for those platforms which
 * implement the netdev and dpif interface defined in netdev.h and dpif.h.  The
 * most important of which is the Linux Kernel Module (dpif-netlink), but
 * alternatives are supported such as a userspace only implementation
 * (dpif-netdev), and a dummy implementation used for unit testing.
 *
 * Ofproto-dpif is divided into three major chunks.
 *
 * - ofproto-dpif.c
 *   The main ofproto-dpif module is responsible for implementing the
 *   provider interface, installing and removing datapath flows, maintaining
 *   packet statistics, running protocols (BFD, LACP, STP, etc), and
 *   configuring relevant submodules.
 *
 * - ofproto-dpif-upcall.c
 *   Ofproto-dpif-upcall is responsible for retrieving upcalls from the kernel,
 *   processing miss upcalls, and handing more complex ones up to the main
 *   ofproto-dpif module.  Miss upcall processing boils down to figuring out
 *   what each packet's actions are, executing them (i.e. asking the kernel to
 *   forward it), and handing it up to ofproto-dpif to decide whether or not
 *   to install a kernel flow.
 *
 * - ofproto-dpif-xlate.c
 *   Ofproto-dpif-xlate is responsible for translating OpenFlow actions into
 *   datapath actions. */
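The miss path described above (the datapath misses, the OpenFlow table is consulted, and the result is installed in the datapath) can be illustrated with a toy model. This is a rough sketch and not how ofproto-dpif is actually structured: all `sketch_*` names are hypothetical, a flow is reduced to its input port, and an action is reduced to a single output port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PORTS 8
#define SKETCH_NO_PORT   (-1)    /* "Drop": no output port. */

/* Toy "OpenFlow table": maps each input port to an output port.  It stands
 * in for the slow-path lookup and translation done in userspace. */
struct sketch_of_table {
    int out_port[SKETCH_MAX_PORTS];
};

/* Toy "datapath": an exact-match cache of actions that were already
 * computed, plus hit and miss counters. */
struct sketch_datapath {
    bool installed[SKETCH_MAX_PORTS];
    int out_port[SKETCH_MAX_PORTS];
    unsigned int n_hits;
    unsigned int n_misses;
};

/* Slow path (the "upcall"): looks up the OpenFlow table, installs the
 * resulting action in the datapath, and returns it for this packet. */
static int
sketch_handle_miss(struct sketch_datapath *dp,
                   const struct sketch_of_table *table, int in_port)
{
    int out = table->out_port[in_port];

    dp->installed[in_port] = true;
    dp->out_port[in_port] = out;
    return out;
}

/* Fast path: uses the cached action if present, otherwise falls back to
 * the slow path.  Returns the output port, or SKETCH_NO_PORT to drop. */
static int
sketch_receive(struct sketch_datapath *dp,
               const struct sketch_of_table *table, int in_port)
{
    if (in_port < 0 || in_port >= SKETCH_MAX_PORTS) {
        return SKETCH_NO_PORT;
    }
    if (dp->installed[in_port]) {
        dp->n_hits++;
        return dp->out_port[in_port];
    }
    dp->n_misses++;
    return sketch_handle_miss(dp, table, in_port);
}
```

In this model only the first packet on a given input port takes the slow path. Later packets are served from the cache, which is the reason for installing datapath flows at all.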
The structure of ofproto-dpif is shown below:

               |   +-------------------+
               |   |    ovs-vswitchd   |<-->ovsdb-server
               |   +-------------------+
               |   |      ofproto      |<-->OpenFlow controllers
               |   +--------+-+--------+  _
               |   | netdev | |ofproto-|   |
     userspace |   +--------+ |  dpif  |   |
               |   | netdev | +--------+   |
               |   |provider| |  dpif  |   |
               |   +---||---+ +--------+   |
               |       ||     |  dpif  |   | implementation of
               |       ||     |provider|   | ofproto provider
               |_      ||     +---||---+   |
                       ||         ||       |
                _  +---||-----+---||---+   |
               |   |          |datapath|   |
        kernel |   |          +--------+  _|
               |   |                   |
               |_  +--------||---------+
                            ||
                         physical
                           NIC
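struct dpif_class is the factory interface class for datapath interface implementations. It is used to interact with an actual datapath, e.g. openvswitch.ko or the userspace datapath. There are currently two dpif implementations, dpif-netlink and dpif-netdev: the former is based on the kernel datapath, the latter on the userspace datapath. The code can be found in lib/dpif-netlink.c and lib/dpif-netdev.c.
The interface of struct dpif_class is defined in lib/dpif-provider.h, for reference only.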
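
The "factory interface" idea can be sketched as a table of function pointers selected by a type string. This is a simplified illustration, not the real definition in lib/dpif-provider.h: the `sketch_*` names and the `describe` member are hypothetical placeholders for the real callbacks, and the type strings "system" and "netdev" should be read as illustrative labels for the kernel and userspace datapaths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct dpif_class. */
struct sketch_dpif_class {
    const char *type;                  /* Datapath type name. */
    const char *(*describe)(void);     /* Placeholder for the real callbacks. */
};

static const char *
sketch_netlink_describe(void)
{
    return "kernel datapath";
}

static const char *
sketch_netdev_describe(void)
{
    return "userspace datapath";
}

/* One class instance per datapath implementation. */
static const struct sketch_dpif_class sketch_dpif_netlink_class = {
    "system", sketch_netlink_describe
};
static const struct sketch_dpif_class sketch_dpif_netdev_class = {
    "netdev", sketch_netdev_describe
};

/* Registry of the known classes. */
static const struct sketch_dpif_class *const sketch_classes[] = {
    &sketch_dpif_netlink_class,
    &sketch_dpif_netdev_class,
};

/* Returns the class registered for 'type', or NULL if there is none. */
static const struct sketch_dpif_class *
sketch_dpif_lookup(const char *type)
{
    size_t n = sizeof sketch_classes / sizeof sketch_classes[0];

    for (size_t i = 0; i < n; i++) {
        if (!strcmp(sketch_classes[i]->type, type)) {
            return sketch_classes[i];
        }
    }
    return NULL;
}
```

Code above this layer only holds a pointer to the class and calls through it, so the kernel and userspace datapaths can be swapped without changing the callers.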

