tcp-ip : net_device structure

The net_device struct represents the network device. It can be a physical device, like an Ethernet device, or it can be a software device, like a bridge device or a VLAN device.

[include/linux/netdevice.h]

  • char            name[IFNAMSIZ];
    The name of the network device. This is the name that you see with the ifconfig or ip commands (for example eth0, eth1, and so on). IFNAMSIZ is 16, so the name can hold up to 15 characters plus the terminating NUL byte. In newer distributions with biosdevname support, the naming scheme corresponds to the physical location of the network device: PCI network devices are named p<slot>p<port>, according to the chassis labels, and embedded ports (on-motherboard interfaces) are named em<port> (for example, em1, em2, and so on). There is a special suffix for SR-IOV devices and Network Partitioning (NPAR)-enabled devices.
  • struct hlist_node    name_hlist;
    The node that links the device into a hash table of network devices, indexed by the network device name. A lookup in this hash table is performed by dev_get_by_name(). Insertion into this hash table is performed by the list_netdevice() method, and removal from this hash table is done with the unlist_netdevice() method.
  • char *ifalias
    SNMP alias interface name. Its length can be up to 256 (IFALIASZ).
    You can create an alias to a network device using this command line:
    ip link set <devName> alias myalias
    The ifalias name is exported via sysfs by /sys/class/net/<devName>/ifalias.
  • unsigned int irq
    The Interrupt Request (IRQ) number of the device. The network driver should call request_irq() to register itself with this IRQ number. Typically this is done in the probe() callback of the network device driver.
    The network driver should call the free_irq() method when it no longer uses this irq. In many cases, this irq is shared (the request_irq() method is called with the IRQF_SHARED flag). You can view the number of interrupts that occurred on each core by running cat /proc/interrupts. You can set the SMP affinity of the irq by echo irqMask > /proc/irq/<irqNumber>/smp_affinity. In an SMP machine, setting the SMP affinity of interrupts means setting which cores are allowed to handle the interrupt. Some PCI network interfaces use Message Signaled Interrupts (MSIs). PCI MSI interrupts are never shared, so the IRQF_SHARED flag is not set when calling the request_irq() method in these network drivers.
  • unsigned long state
    __LINK_STATE_START: This flag is set when the device is brought up, by the dev_open() method, and is cleared when the device is brought down.
    __LINK_STATE_PRESENT: This flag is set in device registration, by the register_netdevice() method, and is cleared in the netif_device_detach() method.
    __LINK_STATE_NOCARRIER: This flag shows whether the device detected loss of carrier. It is set by the netif_carrier_off() method and cleared by the netif_carrier_on() method. It is exported by sysfs via /sys/class/net/<devName>/carrier.
    __LINK_STATE_LINKWATCH_PENDING: This flag is set by the linkwatch_fire_event() method and cleared by the linkwatch_do_dev() method.
    __LINK_STATE_DORMANT: The dormant state indicates that the interface is not able to pass packets (that is, it is not “up”); however, this is a “pending” state, waiting for some external event.
    The state flag can be set with the generic set_bit() method.
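    The way these flags behave as bit positions inside the state word can be sketched in userspace. This is only a model: the enum order follows the order in which the flags are listed above, the model_* helpers are hypothetical non-atomic stand-ins for the kernel's set_bit(), clear_bit(), and test_bit(), and model_carrier_ok() is an illustrative name, not a kernel function.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Bit positions, in the order the flags are listed above. */
enum netdev_state_model {
    LINK_STATE_START,
    LINK_STATE_PRESENT,
    LINK_STATE_NOCARRIER,
    LINK_STATE_LINKWATCH_PENDING,
    LINK_STATE_DORMANT,
};

/* Non-atomic stand-ins for the kernel bit helpers (hypothetical names). */
static void model_set_bit(int nr, unsigned long *word)
{
    assert(nr >= 0 && nr < (int)(sizeof(*word) * CHAR_BIT));
    *word |= 1UL << nr;
}

static void model_clear_bit(int nr, unsigned long *word)
{
    assert(nr >= 0 && nr < (int)(sizeof(*word) * CHAR_BIT));
    *word &= ~(1UL << nr);
}

static bool model_test_bit(int nr, const unsigned long *word)
{
    assert(nr >= 0 && nr < (int)(sizeof(*word) * CHAR_BIT));
    return (*word >> nr) & 1UL;
}

/* The carrier is considered present while the NOCARRIER bit is clear. */
static bool model_carrier_ok(const unsigned long *state)
{
    return !model_test_bit(LINK_STATE_NOCARRIER, state);
}
```

    Setting LINK_STATE_NOCARRIER in this model corresponds to what netif_carrier_off() does to the real state field, and clearing it corresponds to netif_carrier_on().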
  • netdev_features_t features
    The set of currently active device features. These features should be changed only by the network core or in error paths of the ndo_set_features() callback. Network driver developers are responsible for setting the initial set of the device features. Sometimes they can use a wrong combination of features. The network core fixes this by removing an offending feature in the netdev_fix_features() method, which is invoked when the network interface is registered (in the register_netdevice() method); a proper message is also written to the kernel log.
    1. NETIF_F_IP_CSUM means that the network device can checksum L4 IPv4 TCP/UDP packets.
      NETIF_F_IPV6_CSUM means that the network device can checksum L4 IPv6 TCP/UDP packets.
      NETIF_F_HW_CSUM means that the device can checksum in hardware all L4 packets.
      You cannot activate NETIF_F_HW_CSUM together with NETIF_F_IP_CSUM, or together with NETIF_F_IPV6_CSUM, because that will cause duplicate checksumming.
      If the driver features set includes both NETIF_F_HW_CSUM and NETIF_F_IP_CSUM features, then you will get a kernel message saying “mixed HW and IP checksum settings.” In such a case, the netdev_fix_features() method removes the NETIF_F_IP_CSUM feature. If the driver features set includes both NETIF_F_HW_CSUM and NETIF_F_IPV6_CSUM features, you get again the same message as in the previous case.
      In order for a device to support TSO (TCP Segmentation Offload), it needs also to support Scatter/Gather and TCP checksum; this means that both NETIF_F_SG and NETIF_F_IP_CSUM features must be set. If the driver features set does not include the NETIF_F_SG feature, then you will get a kernel message saying “Dropping TSO features since no SG feature,” and the NETIF_F_ALL_TSO feature will be removed. If the driver features set does not include the NETIF_F_IP_CSUM feature and does not include NETIF_F_HW_CSUM, then you will get a kernel message saying “Dropping TSO features since no CSUM feature,” and the NETIF_F_TSO will be removed.
    2. NETIF_F_LLTX is the LockLess TX flag and is considered deprecated. When it is set, you don’t use the generic Tx lock (This is why it is called LockLess TX).
      NETIF_F_LLTX is used in tunnel drivers like VXLAN, VETH, and in IP over IP (IPIP) tunneling driver. For example, in the IPIP tunnel module, you set the NETIF_F_LLTX flag in the ipip_tunnel_setup() method.
      The NETIF_F_LLTX flag is also used in a few drivers that have implemented their own Tx lock, like the cxgb network driver.
    3. NETIF_F_GRO is used to indicate that the device supports GRO (Generic Receive Offload). With GRO, incoming packets are merged at reception time. The GRO feature improves network performance. This flag is checked in the beginning of the dev_gro_receive() method; devices that do not have this flag set will not perform the GRO handling part in this method. A driver that wants to use GRO should call the napi_gro_receive() method in the Rx path of the driver.
      You can enable or disable GRO with ethtool, by ethtool -K <deviceName> gro on and ethtool -K <deviceName> gro off, respectively. You can check whether GRO is set by running ethtool -k <deviceName> and looking at the gro field.
    4. NETIF_F_GSO is set to indicate that the device supports Generic Segmentation Offload (GSO). GSO is a performance optimization, based on traversing the networking stack once instead of many times, for big packets. So the idea is to avoid segmentation in Layer 4 and defer segmentation as much as possible.
      The sysadmin can enable or disable GSO with ethtool, by ethtool -K <deviceName> gso on and ethtool -K <deviceName> gso off, respectively. You can check whether GSO is set by running ethtool -k <deviceName> and looking at the gso field.
      To work with GSO, you should work in Scatter/Gather mode. The NETIF_F_SG flag must be set.
    5. NETIF_F_NETNS_LOCAL is set for network namespace local devices. These are network devices that are not allowed to move between network namespaces. The loopback, VXLAN, and PPP network devices are examples of namespace local devices. All these devices have the NETIF_F_NETNS_LOCAL flag set.
      A sysadmin can check whether an interface has the NETIF_F_NETNS_LOCAL flag set or not by ethtool -k <deviceName>.
      This feature is fixed and cannot be changed by ethtool. Trying to move a network device of this type to a different namespace results in an error (-EINVAL). For details, look in the dev_change_net_namespace() method.
      When deleting a network namespace, devices that do not have the NETIF_F_NETNS_LOCAL flag set are moved to the default initial network namespace (init_net). Network namespace local devices that have the NETIF_F_NETNS_LOCAL flag set are not moved to the default initial network namespace (init_net), but are deleted.
    6. NETIF_F_HW_VLAN_CTAG_RX is for use by devices that support VLAN Rx hardware acceleration. “CTAG” was added to indicate that this device differs from an “STAG” device (Service provider tagging). A device driver that sets the NETIF_F_HW_VLAN_RX feature must also define the ndo_vlan_rx_add_vid() and ndo_vlan_rx_kill_vid() callbacks. Failure to do so prevents device registration and results in a “Buggy VLAN acceleration in driver” kernel error message.
    7. NETIF_F_HW_VLAN_CTAG_TX is for use by devices that support VLAN Tx hardware acceleration.
    8. NETIF_F_VLAN_CHALLENGED is set for devices that can’t handle VLAN packets. Setting this feature prevents registration of a VLAN device on top of the device.
      For example, some types of Intel e100 network device drivers set the NETIF_F_VLAN_CHALLENGED feature.
      You can check whether the NETIF_F_VLAN_CHALLENGED is set by running ethtool -k <deviceName> and looking at the vlan-challenged field. This is a fixed value that you cannot change with the ethtool command.
    9. NETIF_F_SG is set when the network interface supports Scatter/Gather IO. You can enable and disable Scatter/Gather with ethtool, by ethtool -K <deviceName> sg on and ethtool -K <deviceName> sg off, respectively. You can check whether Scatter/Gather is set by running ethtool -k <deviceName> and looking at the sg field.
    10. NETIF_F_HIGHDMA is set if the device can perform access by DMA to high memory. The practical implication of setting this feature is that the ndo_start_xmit() callback of the net_device_ops object can handle SKBs whose frags elements are in high memory.
      You can check whether the NETIF_F_HIGHDMA is set by running ethtool -k <deviceName> and looking at the highdma field. This is a fixed value that you cannot change with the ethtool command.
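    The dependency rules described for the checksum, TSO, and GSO features can be sketched as a small userspace function. This is a simplified model, not the kernel's netdev_fix_features(): the F_* bit values are made up for illustration, a single F_TSO bit stands in for the whole NETIF_F_ALL_TSO group, and the GSO rule carries no log message because the text above does not quote one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative bit values only; the real NETIF_F_* bits live in kernel headers. */
#define F_SG        (1u << 0)
#define F_IP_CSUM   (1u << 1)
#define F_HW_CSUM   (1u << 2)
#define F_IPV6_CSUM (1u << 3)
#define F_TSO       (1u << 4)
#define F_GSO       (1u << 5)

/* Returns the feature set with offending combinations removed. */
static uint32_t fix_features_model(uint32_t f)
{
    /* HW_CSUM already covers every L4 checksum, so drop the narrower flags. */
    if ((f & F_HW_CSUM) && (f & (F_IP_CSUM | F_IPV6_CSUM))) {
        fprintf(stderr, "mixed HW and IP checksum settings.\n");
        f &= ~(F_IP_CSUM | F_IPV6_CSUM);
    }

    /* TSO needs Scatter/Gather. */
    if ((f & F_TSO) && !(f & F_SG)) {
        fprintf(stderr, "Dropping TSO features since no SG feature.\n");
        f &= ~F_TSO;
    }

    /* TSO needs TCP checksum offload. */
    if ((f & F_TSO) && !(f & (F_IP_CSUM | F_HW_CSUM))) {
        fprintf(stderr, "Dropping TSO features since no CSUM feature.\n");
        f &= ~F_TSO;
    }

    /* GSO needs Scatter/Gather as well (modeled silently). */
    if ((f & F_GSO) && !(f & F_SG))
        f &= ~F_GSO;

    /* Post-condition of the model: TSO never survives without SG. */
    assert(!(f & F_TSO) || (f & F_SG));
    return f;
}
```

    For example, passing F_HW_CSUM | F_IP_CSUM to this model returns just F_HW_CSUM, which mirrors the "mixed HW and IP checksum settings" case described above.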
  • netdev_features_t hw_features
    The set of features that the user may change. This means that their state may be changed (enabled or disabled) for a particular device at a user’s request. This set should be initialized in the ndo_init() callback and not changed later.
  • netdev_features_t wanted_features
    The set of features that were requested by the user. A user may request to change various offloading features—for example, by running ethtool -K eth1 rx on. This generates a feature change event notification (NETDEV_FEAT_CHANGE) to be sent by the netdev_features_change() method.
  • netdev_features_t vlan_features
    The set of features whose state is inherited by child VLAN devices.
    For example, let’s look at the rtl_init_one() method, which is the probe callback of the r8169 network device driver:
    int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
    {
            . . .
            dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO | NETIF_F_HIGHDMA;
            . . .
    }
    This initialization means that all child VLAN devices will have these features.
    For example, let’s say that your eth0 device is an r8169 device, and you add a VLAN device thus: vconfig add eth0 100. Then, in the initialization in the VLAN module, there is this code related to vlan_features:
    static int vlan_dev_init(struct net_device *dev)
    {
            . . .
            dev->features |= real_dev->vlan_features | NETIF_F_LLTX;
            . . .
    }
    This means that it sets the features of the VLAN child device to be the vlan_features of the real device (which is eth0 in this case), which were set according to what you saw earlier in the rtl_init_one() method.
  • netdev_features_t hw_enc_features
    The mask of features inherited by encapsulating devices. This field indicates what encapsulation offloads the hardware is capable of doing, and drivers will need to set them appropriately.
  • int ifindex
    The ifindex (Interface index) is a unique device identifier. This index is incremented by 1 each time you create a new network device, by the dev_new_index() method. The first network device you create, which is almost always the loopback device, has ifindex of 1.
    Cyclic integer overflow is handled by the method that handles assignment of the ifindex number.
    The ifindex is exported by sysfs via /sys/class/net/<devName>/ifindex.
  • struct net_device_stats stats
    The statistics struct, which was left as a legacy, includes fields like the number of rx_packets or the number of tx_packets. New device drivers use the rtnl_link_stats64 struct instead of the net_device_stats struct.
    The statistics are exported via /sys/class/net/<deviceName>/statistics.
    Some drivers implement the get_ethtool_stats() callback. These drivers show statistics by ethtool -S <deviceName>.
  • atomic_long_t rx_dropped
    A counter of the number of packets that were dropped in the RX path by the core network stack. This counter should not be used by drivers.
    Do not confuse the rx_dropped field of the net_device with the dropped field of the softnet_data struct. The softnet_data struct represents a per-CPU object. They are not equivalent because the rx_dropped counter of the net_device might be incremented in several methods, whereas the dropped counter of softnet_data is incremented only by the enqueue_to_backlog() method. The dropped counter of softnet_data is exported by /proc/net/softnet_stat.
    In /proc/net/softnet_stat you have one line per CPU. The first column is the total packets counter, and the second one is the dropped packets counter.
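    Reading the first two columns of one such line can be sketched as follows. The columns are printed in hexadecimal; parse_softnet_line() is a hypothetical helper name, and any sample line passed to it in a test is made-up data, not output captured from a real machine.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parses the first two hexadecimal columns of one /proc/net/softnet_stat
 * line: the total packets counter and the dropped packets counter.
 * Returns 0 on success and -1 if the line does not start with two hex fields.
 */
static int parse_softnet_line(const char *line,
                              unsigned int *total, unsigned int *dropped)
{
    assert(line != NULL && total != NULL && dropped != NULL);
    return sscanf(line, "%x %x", total, dropped) == 2 ? 0 : -1;
}
```

    A caller would typically open /proc/net/softnet_stat, read it line by line with fgets(), and pass each line (one per CPU) to this helper.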
  • struct net_device_ops *netdev_ops
    1. The ndo_init() callback is called when the network device is registered.
    2. The ndo_uninit() callback is called when the network device is unregistered or when the registration fails.
    3. The ndo_open() callback handles change of device state, when a network device state is being changed from down state to up state.
    4. The ndo_stop() callback is called when a network device state is being changed to be down.
    5. The ndo_validate_addr() callback is called to check whether the MAC address is valid. Many network drivers set the generic eth_validate_addr() method to be the ndo_validate_addr() callback. The generic eth_validate_addr() method succeeds (returns 0) if the MAC address is not a multicast address and is not all zeroes; otherwise it returns an error.
    6. The ndo_set_mac_address() callback sets the MAC address. Many network drivers, for example the VETH driver and the VXLAN driver, set the generic eth_mac_addr() method to be the ndo_set_mac_address() callback of struct net_device_ops for setting their MAC address.
    7. The ndo_start_xmit() callback handles packet transmission. It cannot be NULL.
    8. The ndo_select_queue() callback is used to select a Tx queue, when working with multiqueues. If the ndo_select_queue() callback is not set, then the __netdev_pick_tx() method is called.
    9. The ndo_change_mtu() callback handles modifying the MTU. It should check that the specified MTU is not less than 68, which is the minimum MTU. In many cases, network drivers set the ndo_change_mtu() callback to be the generic eth_change_mtu() method. The eth_change_mtu() method should be overridden if jumbo frames are supported.
    10. The ndo_do_ioctl() callback is called when getting an IOCTL request that is not handled by the generic interface code.
    11. The ndo_tx_timeout() callback is called when the transmitter has been idle for quite a while (for watchdog usage).
    12. The ndo_add_slave() callback is called to set a specified network device as a slave to a specified network device. It is used, for example, in the team network driver and in the bonding network driver.
    13. The ndo_del_slave() callback is called to remove a previously enslaved network device.
    14. The ndo_set_features() callback is called to update the configuration of a network device with new features.
    15. The ndo_vlan_rx_add_vid() callback is called when registering a VLAN id if the network device supports VLAN filtering (the NETIF_F_HW_VLAN_FILTER flag is set in the device features).
    16. The ndo_vlan_rx_kill_vid() callback is called when unregistering a VLAN id if the network device supports VLAN filtering (the NETIF_F_HW_VLAN_FILTER flag is set in the device features).
    17. There are also several callbacks for handling SR-IOV devices, for example, ndo_set_vf_mac() and ndo_set_vf_vlan().
    18. The ndo_set_rx_mode() callback is called (by the dev_set_rx_mode() method) primarily whenever the unicast or multicast address lists or the network interface flags are updated.
  • struct ethtool_ops *ethtool_ops
    The ethtool_ops structure includes pointers for several callbacks for handling offloads, getting and setting various device settings, reading registers, getting statistics, reading RX flow hash indirection table, WakeOnLAN parameters, and many more. If the network driver does not initialize the ethtool_ops object, the networking core provides a default empty ethtool_ops object named default_ethtool_ops.
    SET_ETHTOOL_OPS (netdev,ops): A macro which sets the specified ethtool_ops for the specified net_device.
    You can view the offload parameters of a network interface device by running ethtool -k <deviceName>. You can set some offload parameters of a network interface device by running ethtool -K <deviceName> offloadParameter off/on. See man 8 ethtool.
  • const struct header_ops *header_ops
    The header_ops struct includes callbacks for creating the Layer 2 header, parsing it, rebuilding it, and more. For Ethernet it is eth_header_ops.
  • unsigned int flags
    The interface flags of the network device that you can see from userspace
    1. IFF_UP flag is set when the interface state is changed from down to up.
    2. IFF_PROMISC is set when the interface is in promiscuous mode (receives all packets).
      When running sniffers like wireshark or tcpdump, the network interface is in promiscuous mode.
    3. IFF_LOOPBACK is set for the loopback device.
    4. IFF_NOARP is set for devices which do not use the ARP protocol. IFF_NOARP is set, for example, in tunnel devices (see, for example, the ipip_tunnel_setup() method).
    5. IFF_POINTOPOINT is set for PPP devices.
    6. IFF_MASTER is set for master devices. See, for example, the bond_setup() method for bonding devices.
    7. IFF_LIVE_ADDR_CHANGE flag indicates that the device supports hardware address modification when it’s running. See the eth_mac_addr() method.
    8. IFF_UNICAST_FLT flag is set when the network driver handles unicast address filtering.
    9. IFF_BONDING is set for a bonding master device or bonding slave device. The bonding driver provides a method for aggregating multiple network interfaces into a single logical interface.
    10. IFF_TEAM_PORT is set for a device used as a team port. The teaming driver is a load-balancing network software driver intended to replace the bonding driver.
    11. IFF_MACVLAN_PORT is set for a device used as a macvlan port.
    12. IFF_EBRIDGE is set for an Ethernet bridging device.

    The flags field is exported by sysfs via /sys/class/net/<devName>/flags. Some of these flags can be set by userspace tools. For example, ifconfig <deviceName> -arp will set the IFF_NOARP network interface flag, and ifconfig <deviceName> arp will clear the IFF_NOARP flag. Note that you can do the same with the iproute2 ip command: ip link set dev <deviceName> arp on and ip link set dev <deviceName> arp off.

  • unsigned int priv_flags
    The interface flags that are invisible from userspace. For example, IFF_EBRIDGE for a bridge interface, IFF_BONDING for a bonding interface, or IFF_SUPP_NOFCS for an interface that supports sending a custom FCS.
  • unsigned short gflags
    Global flags (kept as legacy).
  • unsigned short padded
    How much padding is added by the alloc_netdev() method.
  • unsigned char operstate
    RFC 2863 operstate
  • unsigned char link_mode
    Mapping policy to operstate
  • unsigned int mtu
    The network interface MTU (Maximum Transmission Unit) value. The maximum size of frame the device can handle. RFC 791 sets 68 as a minimum MTU. Each protocol has MTU of its own. The default MTU for Ethernet is 1,500 bytes. It is set in the ether_setup() method. Ethernet packets with sizes higher than 1,500 bytes, up to 9,000 bytes, are called Jumbo frames. The network interface MTU is exported by sysfs via /sys/class/net/<devName>/mtu.
    The sysadmin can change the MTU of a network interface to 1,400, for example, in one of the following ways:
    ifconfig <netDevice> mtu 1400
    ip link set <netDevice> mtu 1400
    echo 1400 > /sys/class/net/<netDevice>/mtu
    Many drivers implement the ndo_change_mtu() callback to change the MTU to perform driver-specific needed actions (like resetting the network card).
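    The range check that such a callback is expected to perform can be sketched as below. This is a userspace model, not the kernel's eth_change_mtu(): change_mtu_model() and the max_mtu parameter are hypothetical, with max_mtu standing in for whatever upper limit a given driver supports (1,500 for plain Ethernet, more when jumbo frames are supported).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MIN_MTU 68   /* minimum MTU, as stated above */

/*
 * Stores new_mtu in *mtu if it lies within [MODEL_MIN_MTU, max_mtu].
 * Returns 0 on success and -EINVAL if the value is out of range.
 */
static int change_mtu_model(unsigned int *mtu, int new_mtu, int max_mtu)
{
    assert(mtu != NULL);
    if (new_mtu < MODEL_MIN_MTU || new_mtu > max_mtu)
        return -EINVAL;
    *mtu = (unsigned int)new_mtu;
    return 0;
}
```

    With max_mtu set to 1500, a request for 1400 succeeds while requests for 67 or 9000 are rejected; a driver that supports jumbo frames would pass a larger max_mtu.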
  • unsigned short type
    The network interface hardware type. For example, for Ethernet it is ARPHRD_ETHER and is set in ether_setup(). The type is exported by sysfs via /sys/class/net/<devName>/type.
  • unsigned short hard_header_len
    The hardware header length. Ethernet headers, for example, consist of MAC source address, MAC destination address, and a type. The MAC source and destination addresses are 6 bytes each, and the type is 2 bytes. So the Ethernet header length is 14 bytes. The Ethernet header length is set to 14 (ETH_HLEN) in the ether_setup() method. The ether_setup() method is responsible for initializing some Ethernet device defaults, like the hard header len, Tx queue len, MTU, type, and more.
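    The 14-byte layout can be checked with a small userspace struct. The names here (struct eth_hdr_model, MODEL_ETH_ALEN, MODEL_ETH_HLEN, eth_payload_model()) are illustrative stand-ins rather than the kernel's own definitions. Note that on the wire the destination address comes first, followed by the source address and the type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_ALEN 6    /* bytes in one MAC address */
#define MODEL_ETH_HLEN 14   /* total Ethernet header length */

struct eth_hdr_model {
    uint8_t  h_dest[MODEL_ETH_ALEN];    /* destination MAC address */
    uint8_t  h_source[MODEL_ETH_ALEN];  /* source MAC address */
    uint16_t h_proto;                   /* type field, big-endian on the wire */
};

/* 6 + 6 + 2 bytes, with no padding needed between the members. */
_Static_assert(sizeof(struct eth_hdr_model) == MODEL_ETH_HLEN,
               "Ethernet header must be 14 bytes");

/* Returns a pointer just past the header, or NULL if the frame is too short. */
static const uint8_t *eth_payload_model(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    return len >= MODEL_ETH_HLEN ? frame + MODEL_ETH_HLEN : NULL;
}
```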
  • unsigned char perm_addr[MAX_ADDR_LEN]
    The permanent hardware address (MAC address) of the device.
  • unsigned char addr_assign_type
    Hardware address assignment type, can be one of the following:
    NET_ADDR_PERM
    NET_ADDR_RANDOM
    NET_ADDR_STOLEN
    NET_ADDR_SET
    By default, the MAC address is permanent (NET_ADDR_PERM). If the MAC address was generated with a helper method named eth_hw_addr_random(), the type of the MAC address is NET_ADDR_RANDOM. The type of the MAC address is stored in the addr_assign_type member of the net_device. Also when changing the MAC address of the device, with eth_mac_addr(), you clear NET_ADDR_RANDOM from the addr_assign_type (if it was marked as NET_ADDR_RANDOM before). When a network device is registered (by the register_netdevice() method), if the addr_assign_type equals NET_ADDR_PERM, dev->perm_addr is set to be dev->dev_addr. When you set a MAC address, you set the addr_assign_type to be NET_ADDR_SET. This indicates that the MAC address of a device has been set by the dev_set_mac_address() method.
    The addr_assign_type is exported by sysfs via /sys/class/net/<devName>/addr_assign_type.
  • unsigned char addr_len
    The hardware address length in octets. For Ethernet addresses, it is 6 (ETH_ALEN) bytes and is set in the ether_setup() method.
    The addr_len is exported by sysfs via /sys/class/net/<deviceName>/addr_len.
  • unsigned char neigh_priv_len
    Used in the neigh_alloc() method; neigh_priv_len is initialized only in the ATM code.
  • struct netdev_hw_addr_list uc
    Unicast MAC addresses list, initialized by the dev_uc_init() method. There are three types of packets in Ethernet: unicast, multicast, and broadcast. Unicast is destined for one machine, multicast is destined for a group of machines, and broadcast is destined for all the machines in the LAN.
  • struct netdev_hw_addr_list mc
    Multicast MAC addresses list, initialized by the dev_mc_init() method.
  • unsigned int promiscuity
    A counter of the times a network interface card is told to work in promiscuous mode.
    With promiscuous mode, packets with MAC destination address which is different than the interface MAC address are not rejected. The promiscuity counter is used, for example, to enable more than one sniffing client; so when opening some sniffing clients (like wireshark), this counter is incremented by 1 for each client you open, and closing that client will decrement the promiscuity counter. When the last instance of the sniffing client is closed, promiscuity will be set to 0, and the device will exit from working in promiscuous mode. It is used also in the bridging subsystem, as the bridge interface needs to work in promiscuous mode. So when adding a bridge interface, the network interface card is set to work in promiscuous mode.
    dev_set_promiscuity(struct net_device *dev, int inc): Increments/decrements the promiscuity counter of the specified network device according to the specified increment. The dev_set_promiscuity() method can get a positive increment or a negative increment parameter. As long as the promiscuity counter remains above zero, the interface remains in promiscuous mode. Once it reaches zero, the device reverts back to normal filtering operation. Because promiscuity is an integer, the dev_set_promiscuity() method takes into account cyclic overflow of integer, which means it handles the case when the promiscuity counter is incremented when it reaches the maximum positive value an unsigned integer can reach.
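    The counter semantics, including the overflow case, can be sketched in userspace like this. It is a simplified model and not the kernel's implementation: struct promisc_model and set_promiscuity_model() are hypothetical names, MODEL_IFF_PROMISC uses the same value as IFF_PROMISC (0x100), and returning -EOVERFLOW on wrap-around is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define MODEL_IFF_PROMISC 0x100u   /* same value as IFF_PROMISC */

struct promisc_model {
    unsigned int flags;
    unsigned int promiscuity;   /* number of users that asked for promiscuous mode */
};

/* Adds inc (positive or negative) to the counter and keeps the flag in sync. */
static int set_promiscuity_model(struct promisc_model *dev, int inc)
{
    assert(dev != NULL);
    unsigned int old = dev->promiscuity;

    dev->flags |= MODEL_IFF_PROMISC;
    dev->promiscuity += (unsigned int)inc;   /* unsigned arithmetic wraps safely */

    if (dev->promiscuity == 0) {
        if (inc < 0) {
            /* The last user is gone: go back to normal filtering. */
            dev->flags &= ~MODEL_IFF_PROMISC;
        } else {
            /* The counter wrapped past UINT_MAX: undo and report it. */
            dev->promiscuity = old;
            return -EOVERFLOW;
        }
    }
    return 0;
}
```

    Two increments followed by two decrements leave the counter at 0 with the flag cleared, which matches the sniffing-client scenario described above.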
  • unsigned int allmulti
    The allmulti counter of the network device enables or disables the allmulticast mode. When selected, all multicast packets on the network will be received by the interface.
    You can set a network device to work in allmulticast mode by ifconfig eth0 allmulti. You disable the allmulti flag by ifconfig eth0 -allmulti.
    Enabling/disabling the allmulticast mode can also be performed with the ip command:
    ip link set p2p1 allmulticast on
    ip link set p2p1 allmulticast off
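    You can also see the allmulticast state by inspecting the interface flags. For example, ifconfig prints a line such as the following when the allmulticast mode is on (ip addr show lists the same flag names between angle brackets):
    flags=4610<BROADCAST,ALLMULTI,MULTICAST> mtu 1500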
    dev_set_allmulti(struct net_device *dev, int inc): Increments/decrements the allmulti counter of the specified network device according to the specified increment (which can be a positive or a negative integer). The dev_set_allmulti() method also sets the IFF_ALLMULTI flag of the network device when setting the allmulticast mode and removes this flag when disabling the allmulticast mode.
  • struct in_device __rcu *ip_ptr
    This pointer is assigned to a pointer to struct in_device, which represents IPv4-specific data, in inetdev_init().
  • struct inet6_dev __rcu *ip6_ptr
    This pointer is assigned to a pointer to struct inet6_dev, which represents IPv6-specific data, in ipv6_add_dev().
  • struct wireless_dev *ieee80211_ptr
    This is a pointer for the wireless device, assigned in the ieee80211_if_add() method.
  • unsigned long last_rx
    Time of last Rx. It should not be set by network device drivers, unless really needed. Used, for example, in the bonding driver code.
  • struct list_head dev_list
    The global list of network devices. Insertion to the list is done with the list_netdevice() method, when the network device is registered. Removal from the list is done with the unlist_netdevice() method, when the network device is unregistered.
  • struct list_head napi_list
    NAPI stands for New API, a technique by which the network driver works in polling mode, and not in interrupt-driven mode, when it is under high traffic. Using NAPI under high traffic has been proven to improve performance. When working with NAPI, instead of getting an interrupt for each received packet, the network stack buffers the packets and from time to time triggers the poll method that the driver registered with the netif_napi_add() method. A NAPI driver starts out in interrupt-driven mode. When there is an interrupt for the first received packet, you reach the interrupt service routine (ISR), which is the method that was registered with request_irq(). Then the driver disables interrupts and notifies NAPI to take control, usually by calling the __napi_schedule() method from the ISR. When the traffic is low, the network driver switches back to interrupt-driven mode.
    Nowadays, most network drivers work with NAPI. The napi_list object is the list of napi_struct objects. The netif_napi_add() method adds napi_struct objects to this list, and the netif_napi_del() method deletes napi_struct objects from this list. When calling the netif_napi_add() method, the driver should specify its polling method and a weight parameter. The weight is a limit on the number of packets the driver will pass to the stack in each polling cycle. It is recommended to use a weight of 64. If a driver attempts to call netif_napi_add() with a weight higher than 64 (NAPI_POLL_WEIGHT), there is a kernel error message.
    The network driver should call napi_enable() to enable NAPI scheduling. Usually this is done in the ndo_open() callback of the net_device_ops object. The network driver should call napi_disable() to disable NAPI scheduling. Usually this is done in the ndo_stop() callback of net_device_ops. NAPI is implemented using softirqs. This softirq handler is the net_rx_action() method and is registered by calling open_softirq(NET_RX_SOFTIRQ, net_rx_action) by the net_dev_init() method.
    The net_rx_action() method invokes the poll method of the network driver which was registered with NAPI. The maximum number of packets (taken from all interfaces which are registered to polling) in one polling cycle (NAPI poll) is by default 300. It is the netdev_budget variable, and can be modified via a procfs entry, /proc/sys/net/core/netdev_budget. The napi_complete() method removes a device from the polling list. When a network driver wants to return to work in interrupt-driven mode, it should call the napi_complete() method to remove itself from the polling list.
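    The interplay between the per-device weight and the global budget can be sketched with a small userspace simulation. This is a deliberately simplified model, not the kernel's net_rx_action(): the function names are hypothetical, devices are polled round-robin from an array, there is no time limit, and a device leaves the polling list (the equivalent of calling napi_complete()) as soon as it has no pending packets.

```c
#include <assert.h>

#define MODEL_NAPI_WEIGHT   64    /* recommended per-device weight */
#define MODEL_NETDEV_BUDGET 300   /* default netdev_budget */

/* Models a driver poll method: handles at most weight packets. */
static int poll_model(int *pending, int weight)
{
    int work = *pending < weight ? *pending : weight;
    *pending -= work;
    return work;
}

/*
 * pending[i] is the number of packets waiting on device i.
 * Returns the total number of packets handled in one polling cycle.
 */
static int net_rx_action_model(int *pending, int ndev, int budget, int weight)
{
    int total = 0;
    int on_list = 0;

    assert(pending != 0 && ndev >= 0 && weight > 0);

    for (int i = 0; i < ndev; i++)
        if (pending[i] > 0)
            on_list++;

    /* Poll round-robin until the budget is used up or every device is done. */
    for (int i = 0; on_list > 0 && budget > 0; i = (i + 1) % ndev) {
        if (pending[i] == 0)
            continue;
        int work = poll_model(&pending[i], weight);
        total += work;
        budget -= work;
        if (pending[i] == 0)
            on_list--;   /* the device leaves the polling list */
    }
    return total;
}
```

    In this model the budget is checked before each poll rather than inside it, so one cycle can slightly overshoot the budget: with two busy devices, a weight of 64, and a budget of 300, five polls run and 320 packets are handled.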
  • struct list_head unreg_list
    The list of unregistered network devices. Devices are added to this list when they are unregistered.
  • unsigned char *dev_addr
    The MAC address of the network interface. Sometimes you want to assign a random MAC address. You do that by calling the eth_hw_addr_random() method, which also sets the addr_assign_type to be NET_ADDR_RANDOM.
    The dev_addr field is exported by sysfs via /sys/class/net/<devName>/address. You can change dev_addr with userspace tools like ifconfig or ip of iproute2.
    is_zero_ether_addr(const u8 *addr): Returns true if the address is all zeroes.
    is_multicast_ether_addr(const u8 *addr): Returns true if the address is a multicast address. By definition the broadcast address is also a multicast address.
    is_valid_ether_addr(const u8 *addr): Returns true if the specified MAC address is not 00:00:00:00:00:00, is not a multicast address, and is not a broadcast address (FF:FF:FF:FF:FF:FF).
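    The three predicates can be re-created in userspace to show how they relate. The model_* functions below are illustrative stand-ins, not the kernel helpers themselves: the multicast test relies on the group bit (the least significant bit of the first octet), which is also set in the broadcast address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_ALEN 6   /* bytes in an Ethernet address */

/* True if all six octets are zero. */
static bool model_is_zero_ether_addr(const uint8_t *addr)
{
    assert(addr != NULL);
    for (int i = 0; i < MODEL_ETH_ALEN; i++)
        if (addr[i] != 0)
            return false;
    return true;
}

/* True if the group bit is set; this covers the broadcast address too. */
static bool model_is_multicast_ether_addr(const uint8_t *addr)
{
    assert(addr != NULL);
    return (addr[0] & 0x01) != 0;
}

/* A valid device address is neither multicast/broadcast nor all zeroes. */
static bool model_is_valid_ether_addr(const uint8_t *addr)
{
    return !model_is_multicast_ether_addr(addr) &&
           !model_is_zero_ether_addr(addr);
}
```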
  • struct netdev_hw_addr_list dev_addrs
    The list of device hardware addresses.
  • unsigned char broadcast[MAX_ADDR_LEN]
    The hardware broadcast address. For Ethernet devices, the broadcast address is initialized to FF:FF:FF:FF:FF:FF in the ether_setup() method. The broadcast address is exported by sysfs via /sys/class/net/<devName>/broadcast.
  • struct kset *queues_kset
    A kset is a group of kobjects of a specific type, belonging to a specific subsystem. The kobject structure is the basic type of the device model. A Tx queue is represented by struct netdev_queue, and the Rx queue is represented by struct netdev_rx_queue. Each of them holds a kobject pointer. The queues_kset object is a group of all kobjects of the Tx queues and Rx queues.
    Each Rx queue has the sysfs entry /sys/class/net/<deviceName>/queues/<rx-queueNumber>, and each Tx queue has the sysfs entry /sys/class/net/<deviceName>/queues/<tx-queueNumber>.
    These entries are added with the rx_queue_add_kobject() method and the netdev_queue_add_kobject() method respectively.
  • struct netdev_rx_queue *_rx
    An array of Rx queues (netdev_rx_queue objects), initialized by the netif_alloc_rx_queues() method. The Rx queue to be used is determined in the get_rps_cpu() method.
  • unsigned int num_rx_queues
    The number of Rx queues allocated in the register_netdev() method
  • unsigned int real_num_rx_queues
    Number of Rx queues currently active in the device.
    netif_set_real_num_rx_queues (struct net_device *dev, unsigned int rxq): Sets the actual number of Rx queues used for the specified device according to the specified number of Rx queues. The relevant sysfs entries (/sys/class/net/<devName>/queues/*) are updated (only in the case that the state of the device is NETREG_REGISTERED or NETREG_UNREGISTERING). Note that alloc_netdev_mq() initializes num_rx_queues, real_num_rx_queues, num_tx_queues and real_num_tx_queues to the same value. One can set the number of Tx queues and Rx queues by using ip link when adding a device.
    For example, if you want to create a VLAN device with 6 Tx queues and 7 Rx queues, you can run this command:
    ip link add link p2p1 name p2p1.1 numtxqueues 6 numrxqueues 7 type vlan id 8
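    The relation between the allocated and the active number of Rx queues can be sketched as follows. This is a userspace illustration only: struct sketch_dev and sketch_set_real_num_rx_queues() are hypothetical names, and the bounds check (at least one queue, at most the allocated number) is an assumption about how such a setter would reasonably behave.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the two Rx queue counters of a device. */
struct sketch_dev {
    unsigned int num_rx_queues;      /* queues allocated for the device */
    unsigned int real_num_rx_queues; /* queues currently active */
};

/*
 * Sets the number of active Rx queues. The active count can never
 * exceed the number of queues that were allocated.
 */
static int sketch_set_real_num_rx_queues(struct sketch_dev *dev,
                                         unsigned int rxq)
{
    assert(dev != NULL);

    if (rxq < 1 || rxq > dev->num_rx_queues)
        return -EINVAL;

    dev->real_num_rx_queues = rxq;
    return 0;
}
```

    With a device that has 7 allocated Rx queues, a request for 4 active queues succeeds, while a request for 0 or 8 is rejected and leaves the active count unchanged.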
  • rx_handler_func_t __rcu *rx_handler
    netdev_rx_handler_register(struct net_device *dev, rx_handler_func_t *rx_handler, void *rx_handler_data): Registers a receive handler for the specified network device. The rx_handler callback is set by calling the netdev_rx_handler_register() method. It is used, for example, in bonding, team, openvswitch, macvlan, and bridge devices.
    netdev_rx_handler_unregister(struct net_device *dev): Unregisters a receive handler for the specified network device.
  • void __rcu *rx_handler_data
    The rx_handler_data field is also set by the netdev_rx_handler_register() method when a non-NULL value is passed to the netdev_rx_handler_register() method.
  • struct netdev_queue __rcu *ingress_queue
    struct netdev_queue *dev_ingress_queue(struct net_device *dev): Returns the ingress_queue of the specified net_device
  • struct netdev_queue *_tx
    An array of Tx queues (netdev_queue objects), initialized by the netif_alloc_netdev_queues() method.
    netdev_get_tx_queue(const struct net_device *dev, unsigned int index): Returns the Tx queue (netdev_queue object), an element of the _tx array of the specified network device at the specified index.
  • unsigned int num_tx_queues
    Number of Tx queues, allocated by the alloc_netdev_mq() method.
  • unsigned int real_num_tx_queues
    Number of Tx queues currently active in the device.
  • struct Qdisc *qdisc
    Each device maintains a queue of packets to be transmitted, named qdisc. The Qdisc (Queuing Disciplines) layer implements the Linux kernel traffic management. The default qdisc is pfifo_fast. You can set a different qdisc using tc, the traffic control tool of the iproute2 package. You can view the qdisc of your network device by using the ip command:
    ip addr show <deviceName>
    For example, running
    ip addr show eth1
    can give:
    2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:e0:4c:53:44:58 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.200/24 brd 192.168.2.255 scope global eth1
    inet6 fe80::2e0:4cff:fe53:4458/64 scope link
    valid_lft forever preferred_lft forever
    In this example, you can see that a qdisc of pfifo_fast is used, which is the default.
  • unsigned long tx_queue_len
    The maximum number of allowed packets per queue. Each hardware layer has its own tx_queue_len default. For Ethernet devices, tx_queue_len is set to 1,000 by default (see the ether_setup() method). For FDDI, tx_queue_len is set to 100 by default.
    The tx_queue_len field is set to 0 for virtual devices, such as the VLAN device, because the actual transmission of packets is done by the real device on which these virtual devices are based. You can set the Tx queue length of a device by using the ifconfig command (the option is called txqueuelen) or the ip link set command (the option is also called txqueuelen; ip link show displays the value as qlen), for example:
    ifconfig p2p1 txqueuelen 900
    ip link set txqueuelen 950 dev p2p1
    The Tx queue length is exported via the following sysfs entry: /sys/class/net/<deviceName>/tx_queue_len.
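    Reading the exported value from a program is a matter of opening the sysfs file and parsing one decimal number. The sketch below shows one way to do that in userspace C; the helper names tx_queue_len_path() and read_ulong_file() are hypothetical and introduced only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Builds "/sys/class/net/<dev>/tx_queue_len" into buf.
 * Returns 0 on success, -1 if the path does not fit.
 */
static int tx_queue_len_path(const char *dev, char *buf, size_t len)
{
    int n;

    assert(dev != NULL && buf != NULL);
    n = snprintf(buf, len, "/sys/class/net/%s/tx_queue_len", dev);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Reads a single unsigned decimal value from a text file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_ulong_file(const char *path, unsigned long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%lu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

    A caller would first build the path for a device name such as p2p1 and then pass it to read_ulong_file(); both helpers report failure through their return value instead of printing.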
  • unsigned long trans_start
    The time (in jiffies) of the last transmission
  • int watchdog_timeo
    The watchdog is a timer that will invoke a callback when the network interface was idle and did not perform transmission in some specified timeout interval. Usually the driver defines a watchdog callback which will reset the network interface in such a case. The ndo_tx_timeout() callback of net_device_ops serves as the watchdog callback. The watchdog_timeo field represents the timeout that is used by the watchdog.
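    The timeout decision can be illustrated with a small userspace sketch that compares the current tick count with trans_start plus watchdog_timeo. The function names are hypothetical, and the wrap-safe comparison is an assumption about how a free-running tick counter such as jiffies is best compared. Any further conditions a real watchdog may apply (for example, whether the Tx queue is stopped) are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Wrap-safe "a is after b" for a free-running tick counter.
 * The unsigned subtraction is reinterpreted as signed, so the result
 * stays correct when the counter overflows between b and a.
 */
static bool ticks_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * True if more than watchdog_timeo ticks have passed since the
 * last transmission started.
 */
static bool tx_timed_out(unsigned long now, unsigned long trans_start,
                         int watchdog_timeo)
{
    assert(watchdog_timeo > 0);
    return ticks_after(now, trans_start + (unsigned long)watchdog_timeo);
}
```

    A plain comparison such as now > trans_start + watchdog_timeo would give wrong answers around the point where the counter wraps to zero, which is why the subtraction form is used here.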
  • int __percpu *pcpu_refcnt
    Per CPU network device reference counter
  • struct hlist_node index_hlist
    This is a hash table of network devices, indexed by the network device index (the ifindex field). A lookup in this table is performed by the dev_get_by_index() method. Insertion into this table is performed by the list_netdevice() method, and removal from this table is done with the unlist_netdevice() method.
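    A lookup by ifindex in a chained hash table can be sketched as follows. This is a userspace illustration with hypothetical names (idx_dev, idx_table, idx_insert, idx_lookup); the table size of 256 buckets and the masking of the index to select a bucket are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define IDX_HASH_BITS    8
#define IDX_HASH_ENTRIES (1u << IDX_HASH_BITS)

/* Hypothetical stand-in for a device that is linked into the table. */
struct idx_dev {
    int ifindex;
    const char *name;
    struct idx_dev *next; /* next device in the same bucket */
};

struct idx_table {
    struct idx_dev *head[IDX_HASH_ENTRIES];
};

/* Selects a bucket from the low bits of the interface index. */
static unsigned int idx_bucket(int ifindex)
{
    return (unsigned int)ifindex & (IDX_HASH_ENTRIES - 1);
}

/* Inserts a device at the head of its bucket. */
static void idx_insert(struct idx_table *t, struct idx_dev *dev)
{
    unsigned int b;

    assert(t != NULL && dev != NULL);
    b = idx_bucket(dev->ifindex);
    dev->next = t->head[b];
    t->head[b] = dev;
}

/* Walks one bucket; returns the matching device or NULL. */
static struct idx_dev *idx_lookup(const struct idx_table *t, int ifindex)
{
    struct idx_dev *dev;

    assert(t != NULL);
    for (dev = t->head[idx_bucket(ifindex)]; dev != NULL; dev = dev->next)
        if (dev->ifindex == ifindex)
            return dev;
    return NULL;
}
```

    Two devices whose indexes differ by a multiple of 256 land in the same bucket, so the lookup must compare the full ifindex while walking the chain.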
  • enum {...} reg_state
    An enum that represents the various registration states of the network device.
    1. NETREG_UNINITIALIZED: When the device memory is allocated, in the alloc_netdev_mqs() method.
    2. NETREG_REGISTERED: When the net_device is registered, in the register_netdevice() method.
    3. NETREG_UNREGISTERING: When unregistering a device, in the rollback_registered_many() method.
    4. NETREG_UNREGISTERED: The network device is unregistered but it is not freed yet.
    5. NETREG_RELEASED: The network device is in the last stage of freeing the allocated memory of the network device, in the free_netdev() method.
    6. NETREG_DUMMY: Used in the dummy device, in the init_dummy_netdev() method.
  • bool dismantle
    A Boolean flag that shows that the device is in dismantle phase, which means that it is going to be freed.
  • enum {...} rtnl_link_state
    This is an enum that can have two values that represent the two phases of creating a new link:
    1. RTNL_LINK_INITIALIZING: The ongoing state, when creating the link is still not finished.
    2. RTNL_LINK_INITIALIZED: The final state, when work is finished.
  • void (*destructor)(struct net_device *dev)
    This destructor callback is called when unregistering a network device, in the netdev_run_todo() method. It enables network devices to perform additional tasks that need to be done for unregistering. For example, the loopback device destructor callback, loopback_dev_free(), calls free_percpu() for freeing its statistics object and free_netdev(). Likewise the team device destructor callback, team_destructor(), also calls free_percpu() for freeing its statistics object and free_netdev(). And there are many other network device drivers that define a destructor callback.
  • struct net *nd_net
    The network namespace this network device is inside.
    Namespaces provide process virtualization, which is considered lightweight in comparison to other virtualization solutions like KVM and Xen. There is currently support for six namespaces in the Linux kernel. In order to support network namespaces, a structure called net was added. This structure represents a network namespace. The process descriptor (task_struct) refers to the network namespace and other namespaces via a member named nsproxy, which was added for namespaces support. The nsproxy object includes a network namespace object called net_ns, and also four other namespace objects of the following namespaces: pid namespace, mount namespace, uts namespace, and ipc namespace. The sixth namespace, the user namespace, is kept in struct cred (the credentials object), which is a member of the process descriptor (task_struct).
    Network namespaces provide a partitioning and isolation mechanism which enables one process or a group of processes to have a private view of a full network stack of their own.
    By default, after boot all network interfaces belong to the default network namespace, init_net. You can create a network namespace with userspace tools using the ip command from the iproute2 package or with the unshare command of util-linux, or by writing your own userspace application and invoking the unshare() or the clone() system calls with the CLONE_NEWNET flag. Moreover, you can also change the network namespace of a process by invoking the setns() system call. This setns() system call and the unshare() system call were added specially to support namespaces. The setns() system call can attach to the calling process an existing namespace of any type (network namespace, pid namespace, mount namespace, and so on). You need CAP_SYS_ADMIN privilege to call setns() for all namespaces, except the user namespace.
    A network device belongs to exactly one network namespace at a given moment. And a network socket belongs to exactly one network namespace at a given moment. Namespaces do not have names, but they do have a unique inode which identifies them. This unique inode is generated when the namespace is created and can be read by reading a procfs entry (the command ls -al /proc/<pid>/ns/ shows the namespace symbolic links of a process, each of which contains the unique inode number of a namespace; you can also read these symbolic links with the readlink command).
    For example, using the ip command, creating a new namespace called myns1 is done thus:
    ip netns add myns1
    Each newly created network namespace includes only the loopback device and includes no sockets. Each device (like a bridge device or a VLAN device) that is created from a process that runs in that namespace (like a shell) belongs to that namespace.
    Removing a namespace is done using the following command:
    ip netns del myns1
    After deleting a namespace, all its physical network devices are moved to the default network namespace. Local devices (namespace local devices that have the NETIF_F_NETNS_LOCAL flag set, like PPP device or VXLAN device) are not moved to the default network namespace but are deleted.
    Showing the list of all network namespaces on the system is done with this command:
    ip netns list
    Assigning the p2p1 interface to the myns1 network namespace is done by the command:
    ip link set p2p1 netns myns1
    Opening a shell in myns1 is done thus:
    ip netns exec myns1 bash
    With the unshare utility, creating a new namespace and starting a bash shell inside is done thus:
    unshare --net bash
    Two network namespaces can communicate by using a special virtual Ethernet driver, veth.
    dev_change_net_namespace(struct net_device *dev, struct net *net, const char *pat): Moves the network device to a different network namespace, specified by the net parameter. Local devices (devices in which the NETIF_F_NETNS_LOCAL feature is set) are not allowed to change their namespace. This method returns -EINVAL for this type of device. The pat parameter, when it is not NULL, is the name pattern to try if the current device name is already taken in the destination network namespace. The method also sends a KOBJ_REMOVE uevent for removing the old namespace entries from sysfs, and a KOBJ_ADD uevent to add the sysfs entries to the new namespace. This is done by invoking the kobject_uevent() method specifying the corresponding uevent.
    dev_net_set(struct net_device *dev, struct net *net): Decrements the reference count of the nd_net (namespace object) of the specified device and assigns the specified network namespace to it.
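    The inode that identifies a network namespace can also be read from a program by resolving the /proc/<pid>/ns/net symbolic link. The sketch below is a userspace illustration; the helper names are hypothetical, and it assumes the link target has the form net:[<inode>], as printed by readlink on current kernels.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Parses a link target of the assumed form "net:[<inode>]".
 * Returns 0 and stores the inode on success, -1 otherwise.
 */
static int parse_netns_link(const char *target, unsigned long *inode)
{
    char end;

    assert(target != NULL && inode != NULL);
    if (sscanf(target, "net:[%lu%c", inode, &end) != 2 || end != ']')
        return -1;
    return 0;
}

/*
 * Reads the network namespace inode of a process.
 * pid is a decimal string, or "self" for the calling process.
 */
static int read_netns_inode(const char *pid, unsigned long *inode)
{
    char path[64];
    char target[64];
    ssize_t n;
    int len;

    assert(pid != NULL && inode != NULL);
    len = snprintf(path, sizeof(path), "/proc/%s/ns/net", pid);
    if (len < 0 || (size_t)len >= sizeof(path))
        return -1;

    n = readlink(path, target, sizeof(target) - 1);
    if (n < 0)
        return -1;
    target[n] = '\0'; /* readlink() does not terminate the string */

    return parse_netns_link(target, inode);
}
```

    Two processes are in the same network namespace exactly when read_netns_inode() yields the same inode for both, which gives a simple way to check the effect of ip netns exec or unshare --net.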
  • struct pcpu_lstats __percpu *lstats
    The loopback network device statistics.
  • struct pcpu_tstats __percpu *tstats
    The tunnel statistics.
  • struct pcpu_dstats __percpu *dstats
    The dummy network device statistics.
  • struct pcpu_vstats __percpu *vstats
    The VETH (Virtual Ethernet) statistics.
  • struct device dev
    The device object associated with the network device. Every device in the Linux kernel is associated with a device object, which is an instance of the device structure.
    SET_NETDEV_DEV (net, pdev): Sets the parent of the dev member of the specified network device to be that specified device (the second argument, pdev).
    With virtual devices, you do not call the SET_NETDEV_DEV() macro. As a result, entries for these virtual devices are created under /sys/devices/virtual/net.
    The SET_NETDEV_DEV() macro should be called before calling the register_
    With the udevadm tool (udev management tool), you can find the device type, for example, for a bridge device named mybr:
    udevadm info -q all -p /sys/devices/virtual/net/mybr
    P: /devices/virtual/net/mybr
    E: DEVPATH=/devices/virtual/net/mybr
    E: DEVTYPE=bridge
    E: ID_MM_CANDIDATE=1
    E: IFINDEX=7
    E: INTERFACE=mybr
    E: SUBSYSTEM=net
  • const struct attribute_group *sysfs_groups[4]
    Used by networking sysfs.
  • struct rtnl_link_ops *rtnl_link_ops
    The rtnetlink link operations object. It consists of various callbacks for handling network devices, for example:
    1. newlink() for configuring and registering a new device.
    2. changelink() for changing parameters of an existing device.
    3. dellink() for removing a device.
    4. get_num_tx_queues() for getting the number of Tx queues.
    5. get_num_rx_queues() for getting the number of Rx queues.
    Registration and unregistration of rtnl_link_ops object is done with the rtnl_link_register() method and the rtnl_link_unregister() method, respectively.
  • unsigned int gso_max_size
    netif_set_gso_max_size(struct net_device *dev, unsigned int size): Sets the specified gso_max_size for the specified network device.
  • u8 num_tc
    The number of traffic classes in the net device. The maximum value of num_tc is TC_MAX_QUEUE, which is 16.
  • struct netdev_tc_txq tc_to_txq[TC_MAX_QUEUE]
    u8 prio_tc_map[TC_BITMASK + 1];
    struct netprio_map __rcu *priomap
    The network priority cgroup module provides an interface to set the priority of network traffic. The cgroups layer is a Linux kernel layer that enables process resource management and process isolation. It enables assigning one task or several tasks to a system resource, like a networking resource, memory resource, CPU resource, and so on. The cgroups layer implements a Virtual File System (VFS) and is managed by filesystem operations like mounting/unmounting, creating files and directories, writing to cgroup VFS control files, and so forth. There is no relation between the cgroup implementation and the namespaces implementation.
    There are two networking cgroups modules: net_prio and net_cls. These two cgroup modules are relatively short and simple.
    Setting the priority of network traffic with the netprio cgroup module is done by writing an entry to a cgroup control file, /sys/fs/cgroup/net_prio/<group>/net_prio.ifpriomap. The entry is in the form "deviceName priority". It is true that an application can set the priority of its traffic via the setsockopt() system call with SO_PRIORITY, but this is not always possible. Sometimes you cannot change the code of certain applications.
    Moreover, you may want to let the system administrator decide on priority according to a site-specific setup. The netprio kernel module is a solution when using the setsockopt() system call with SO_PRIORITY is not feasible. The netprio module also exports another /sys/fs/cgroup/net_prio entry, net_prio.prioidx. The net_prio.prioidx entry is a read-only file and contains a unique integer value that the kernel uses as an internal representation of this cgroup.
    The network classifier cgroup provides an interface to tag network packets with a class identifier (classid). Creating a net_cls cgroups instance creates a net_cls.classid control file. This net_cls.classid value is initialized to 0. You can set up rules for this classid with tc, the traffic control command of iproute2.
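    Writing a net_prio.ifpriomap entry from a program amounts to formatting a "deviceName priority" string and writing it to the control file. The following userspace sketch shows this under stated assumptions: the helper names are hypothetical, and the caller supplies the full path of the control file for the chosen cgroup.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats an ifpriomap entry of the form "deviceName priority".
 * Returns the entry length, or -1 if it does not fit in buf.
 */
static int format_ifpriomap_entry(char *buf, size_t len,
                                  const char *dev, unsigned int prio)
{
    int n;

    assert(buf != NULL && dev != NULL);
    n = snprintf(buf, len, "%s %u", dev, prio);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Writes one entry to the control file at path, for example
 * /sys/fs/cgroup/net_prio/<group>/net_prio.ifpriomap.
 * Returns 0 on success, -1 on any failure.
 */
static int write_ifpriomap(const char *path, const char *dev,
                           unsigned int prio)
{
    char entry[64];
    FILE *f;
    int ok;

    assert(path != NULL);
    if (format_ifpriomap_entry(entry, sizeof(entry), dev, prio) < 0)
        return -1;

    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = (fputs(entry, f) >= 0);
    if (fclose(f) != 0) /* the write may only fail when flushed */
        ok = 0;
    return ok ? 0 : -1;
}
```

    Checking the result of fclose() matters here because a rejected value is typically reported only when the buffered data is actually written to the control file.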
  • struct phy_device *phydev
    The associated PHY device. The phy_device is the Layer 1 (the physical layer) device. For many devices, PHY flow control parameters like autonegotiation, speed, or duplex can be configured via the PHY device with ethtool commands.
  • int group
    The group that the network device belongs to. It is initialized with INIT_NETDEV_GROUP (0) by default. The group is exported by sysfs via /sys/class/net/<devName>/netdev_group.
  • struct pm_qos_request pm_qos_req
    Power Management Quality Of Service request object
  • The netdev_priv(struct net_device *netdev) method returns a pointer to the end of the net_device. This area is used by drivers, which define a private network interface structure in order to store private data. For example:
    static int e1000_open(struct net_device *netdev)
    {
        struct e1000_adapter *adapter = netdev_priv(netdev);
        . . .
    }
    The netdev_priv() method is used also for software devices, like the VLAN device.
    So you have:
    static inline struct vlan_dev_priv *vlan_dev_priv(const struct net_device *dev)
    {
        return netdev_priv(dev);
    }
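    The idea of a private area placed right after the device structure can be shown with a small userspace sketch. Everything here is hypothetical and simplified: struct sketch_netdev stands in for net_device, and the 32-byte rounding of the offset is an assumption chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PRIV_ALIGN 32 /* assumed alignment of the private area offset */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct sketch_netdev {
    char name[16];
    unsigned int irq;
};

/* Offset of the private area from the start of the device structure. */
static size_t sketch_priv_offset(void)
{
    return ALIGN_UP(sizeof(struct sketch_netdev), PRIV_ALIGN);
}

/* Allocates the device and sizeof_priv bytes of private data in one block. */
static struct sketch_netdev *sketch_alloc_netdev(size_t sizeof_priv)
{
    return calloc(1, sketch_priv_offset() + sizeof_priv);
}

/* Returns the private area that follows the device structure. */
static void *sketch_netdev_priv(const struct sketch_netdev *dev)
{
    assert(dev != NULL);
    return (char *)dev + sketch_priv_offset();
}
```

    Since the device and its private data live in a single allocation, one free() of the device pointer releases both, and a driver never needs to store a separate pointer to its private structure.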
  • The alloc_netdev(sizeof_priv, name, setup) macro is for allocation and initialization of a network device. It is in fact a wrapper around alloc_netdev_mqs(), with one Tx queue and one Rx queue. sizeof_priv is the size of private data to allocate space for. The setup method is a callback to initialize the network device. For Ethernet devices, it is usually ether_setup().
    For Ethernet devices, you can use the alloc_etherdev() or alloc_etherdev_mq() macros, which eventually invoke alloc_etherdev_mqs(); alloc_etherdev_mqs() is also a wrapper around alloc_netdev_mqs(), with the ether_setup() as the setup callback method.
  • Software devices usually define a setup method of their own. So, in PPP you have the ppp_setup() method and for VLAN you have vlan_setup(struct net_device *dev)