Device Registration and Initialization

When a Device Is Registered

The registration of a network device takes place in the following situations:
Loading an NIC’s device driver
An NIC’s device driver is initialized at boot time if it is built into the kernel, and at runtime if it is loaded as a module. Whenever initialization occurs, all the NICs controlled by that driver are registered.
Inserting a hot-pluggable network device
When a user inserts a hot-pluggable NIC, the kernel notifies its driver, which then registers the device. (For the sake of simplicity, we’ll assume the device driver is already loaded.)

When a Device Is Unregistered
Two main conditions trigger the unregistration of a device:
Unloading an NIC device driver
This can be done only for drivers loaded as modules, of course, not for those built into the kernel. When the administrator unloads an NIC’s device driver, all the associated NICs must be unregistered.
Removing a hot-pluggable network device
When a user removes a hot-pluggable NIC from a system whose running kernel has support for hot-pluggable devices, the network device is unregistered.

Allocating net_device Structures

These data structures are allocated with alloc_netdev, defined in net/core/dev.c, which requires three input parameters:
Size of private data structure
This parameter specifies the size of the driver’s private data block, which is allocated together with the net_device structure.
Device name
This may be a partial name that the kernel will complete through some scheme that ensures unique device names.
Setup routine
This routine is used to initialize a portion of the net_device’s fields.
The return value is a pointer to the net_device structure allocated, or NULL in case of errors.

Every device is assigned a name that depends on the device type and that, to be unique, contains a number that is assigned sequentially as devices of the same type are registered. Ethernet devices, for instance, are called eth0, eth1, and so on. A single device may be given different names depending on the order in which the devices are registered. For instance, if you had two cards handled by two different modules, the names of the devices would depend on the order in which the two modules were loaded. Hot-pluggable devices lend themselves particularly to unanticipated name changes.

(a) Device registration model; (b) device unregistration model

Device Initialization

The net_device structure is pretty big. Its fields are initialized in chunks by different pieces of code:
Device drivers
Parameters such as IRQ, I/O memory, and I/O port, whose values depend on the hardware configuration, are taken care of by the device driver.
Device type
The initialization of fields common to all the devices of a device type family is taken care of by the xxx_setup routines. For example, Ethernet devices use ether_setup.
Features
Mandatory and optional features also need to be initialized. For example, the queuing discipline (i.e., QoS) is initialized in register_netdevice.

Device Driver Initializations

The net_device fields initialized by the device driver are usually taken care of by the xxx_probe function.

Virtual devices usually inherit configuration parameters from the real devices they are associated with, and then adjust them if needed.

Organization of net_device Structures
Normally, the second part (the driver’s private block) is allocated together with the first one (the net_device structure itself) so that a single kmalloc is sufficient, but there are also cases where the driver prefers to allocate its private block by itself. The size of the driver’s private block and its content change not only from one device type to another (e.g., Token Ring versus Ethernet) but also among devices of the same type (e.g., two different Ethernet cards).

dev_base and the next pointer in net_device point to the beginning of the net_device structure, not to the beginning of the allocated block. However, the size of the initial padding is saved in dev->padded, which allows the kernel to release the whole memory block when it is time to do so.
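The single-allocation layout described above can be sketched in user space. This is only an illustrative model, not the kernel's alloc_netdev: the toy_ names, the 32-byte alignment, and the use of calloc in place of kmalloc are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_ALIGN      32u               /* assumed alignment, for illustration only */
#define TOY_ALIGN_MASK (TOY_ALIGN - 1)

struct toy_netdev {
    char name[16];
    unsigned short padded;  /* bytes between the start of the block and the struct */
    void *priv;             /* driver's private area, inside the same block */
};

/* Allocate the device structure and the private area with one allocation. */
struct toy_netdev *toy_alloc_netdev(size_t sizeof_priv, const char *name,
                                    void (*setup)(struct toy_netdev *))
{
    assert(name != NULL);

    /* Round the struct size up so the private area starts aligned. */
    size_t dev_size = (sizeof(struct toy_netdev) + TOY_ALIGN_MASK)
                      & ~(size_t)TOY_ALIGN_MASK;
    /* Extra room so the struct itself can be aligned inside the block. */
    size_t alloc_size = dev_size + sizeof_priv + TOY_ALIGN_MASK;

    unsigned char *block = calloc(1, alloc_size);
    if (!block)
        return NULL;

    uintptr_t aligned = ((uintptr_t)block + TOY_ALIGN_MASK)
                        & ~(uintptr_t)TOY_ALIGN_MASK;
    struct toy_netdev *dev = (struct toy_netdev *)aligned;

    /* Remember the initial padding so the whole block can be freed later. */
    dev->padded = (unsigned short)(aligned - (uintptr_t)block);
    if (sizeof_priv)
        dev->priv = (unsigned char *)dev + dev_size;

    strncpy(dev->name, name, sizeof(dev->name) - 1);
    if (setup)
        setup(dev);
    return dev;
}

/* Free the whole block: step back over the padding to find its start. */
void toy_free_netdev(struct toy_netdev *dev)
{
    if (dev)
        free((unsigned char *)dev - dev->padded);
}
```

A caller would use toy_alloc_netdev(sizeof(struct my_priv), "eth%d", my_setup) and later toy_free_netdev(dev), where my_priv and my_setup are hypothetical driver-side names. The point of the sketch is that the pointer handed around is the aligned structure, while the saved padding is what makes the original block recoverable.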

net_device data structures are inserted both in a global list and in two hash tables. These different structures allow the kernel to easily browse or look up the net_device database as required. Here are the details:

dev_base
This global list of all net_device instances allows the kernel to easily browse devices in case, for instance, it has to get some statistics, change a configuration across all devices as a consequence of a user command, or find devices matching given search criteria.
Because each driver has its own definition for the private data structure, the global list of net_device structures may link together elements of different sizes.

Global list of registered devices

dev_name_head
This is a hash table indexed on the device name. It is useful, for instance, when applying a configuration change via the ioctl interface. The old-generation configuration tools that talk to the kernel via the ioctl interface usually refer to devices by their names.
dev_index_head
This is a hash table indexed on the device ID dev->ifindex. Cross-references to net_device structures usually store either device IDs or pointers to net_device structures; dev_index_head is useful for the former. Also, the new-generation configuration tool ip (from the IPROUTE2 package), which talks to the kernel via the Netlink socket, usually refers to devices by their ID.

Hash tables used to search net_device instances based on device name and device index

Device State
The net_device structure includes different fields that define the current state of the device. These include:
flags
Bitmap used to store different flags. Most of them represent a device’s capabilities. However, one of them, IFF_UP, is used to say whether the device is enabled (up) or disabled (down). You can find the list of IFF_XXX flags in include/linux/if.h.
reg_state
Device registration state.

state
Device state with regard to its queuing discipline.

You may find a little bit of overlap sometimes between these variables. For example, every time IFF_UP is set in flags, __LINK_STATE_START is set in state, and vice versa. Both of them are set and cleared, respectively, by dev_open and dev_close. However, their domains are different, and a little bit of overlap may sometimes be introduced when writing modular code.

Queuing Discipline State
Each network device is assigned a queuing discipline, which is used by Traffic Control to implement its QoS mechanisms. The state field of net_device is one of the structure’s fields used by Traffic Control. state is a bitmap, and the following list shows the flags that can be set. They are defined in include/linux/netdevice.h.
__LINK_STATE_START
The device is up. This flag can be checked with netif_running.
__LINK_STATE_PRESENT
The device is present. This flag may look superfluous, but take into account that hot-pluggable devices can be temporarily removed. The flag is also cleared and restored, respectively, when the system goes into suspend mode and then resumes. The flag can be checked with netif_device_present.
__LINK_STATE_NOCARRIER
There is no carrier. The flag can be checked with netif_carrier_ok.
__LINK_STATE_LINKWATCH_EVENT
The device’s link state has changed.
__LINK_STATE_XOFF
__LINK_STATE_SCHED
__LINK_STATE_RX_SCHED
These three flags are used by the code that manages ingress and egress traffic on the device.

Registration State
The state of a device with regard to its registration with the network stack is saved in the reg_state field of the net_device structure. The NETREG_XXX values it can take are defined in include/linux/netdevice.h, within the net_device structure definition. Here is a brief description:
NETREG_UNINITIALIZED
Defined as 0. When the net_device data structure is allocated and its contents zeroed, this value represents the 0 in dev->reg_state.
NETREG_REGISTERING
The net_device structure has been added to the global list and hash tables, but the kernel still needs to add an entry to the /sys filesystem.
NETREG_REGISTERED
The device has been fully registered.
NETREG_UNREGISTERING
The net_device structure has been removed from the global list and hash tables, but the unregistration has not been completed yet.
NETREG_UNREGISTERED
The device has been fully unregistered (which includes removing the entry from /sys), but the net_device structure has not been freed yet.
NETREG_RELEASED
All the references to the net_device structure have been released. The data structure can be freed, from the networking code’s perspective.
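The list above reads as a strictly linear progression, which can be modeled with a small enum. This is a sketch under that assumption; the TOY_ names are invented for the example and only mirror the order in which the NETREG_XXX values are listed here.

```c
#include <assert.h>

/* Mirrors the order of the NETREG_XXX values listed above (toy names). */
enum toy_reg_state {
    TOY_NETREG_UNINITIALIZED = 0,  /* value of a freshly zeroed structure */
    TOY_NETREG_REGISTERING,
    TOY_NETREG_REGISTERED,
    TOY_NETREG_UNREGISTERING,
    TOY_NETREG_UNREGISTERED,
    TOY_NETREG_RELEASED
};

/* A step is legal only if it moves to the very next state in the list. */
int toy_reg_transition_ok(enum toy_reg_state from, enum toy_reg_state to)
{
    return from != TOY_NETREG_RELEASED && (int)to == (int)from + 1;
}

/* Advance one step; the caller must not advance past the final state. */
enum toy_reg_state toy_reg_advance(enum toy_reg_state s)
{
    assert(s != TOY_NETREG_RELEASED);
    return (enum toy_reg_state)((int)s + 1);
}
```

Because the first value is 0, a structure that has just been allocated and zeroed is automatically in the uninitialized state, which is the property the NETREG_UNINITIALIZED entry points out.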

Registering and Unregistering Devices

Network devices are registered and unregistered with the kernel with register_netdev and unregister_netdev, respectively. These are simple wrappers that take care of locking and then invoke the routines register_netdevice and unregister_netdevice, respectively.
All of them are defined in net/core/dev.c.
Changes of state may use intermediate states between NETREG_UNINITIALIZED and NETREG_REGISTERED. These progressions are handled by netdev_run_todo.
The two net_device virtual functions init and uninit can be used by device drivers to initialize and clean up private data, respectively, when registering and unregistering a device. They are mainly used by virtual devices.

The unregistration of a device cannot be completed until all references to the associated net_device data structure have been released: netdev_wait_allrefs does not return until that condition is met.
Both the registration and unregistration of a device are completed by netdev_run_todo.

net_device’s registration state machine

Split Operations: netdev_run_todo

register_netdevice takes care of a portion of the registration, and then lets netdev_run_todo complete it. At first, it may not be clear how this happens by looking at the code.
Changes to net_device structures are protected with the Routing Netlink semaphore via rtnl_lock and rtnl_unlock, which is why register_netdev acquires the lock (semaphore) at the beginning and releases it before returning. Once register_netdevice is done with its job, it adds the new net_device structure to net_todo_list with net_set_todo. That list contains the devices whose registration (or unregistration, as we will see in a moment) has to be completed. The list is not processed by a separate kernel thread or by means of a periodic timer; it will be up to register_netdev to indirectly process it when releasing the lock.

Thus, rtnl_unlock not only releases the lock, but also calls netdev_run_todo. The latter function browses net_todo_list and completes the registration of all its net_device instances.
Only one CPU can be running netdev_run_todo at any one time. Serialization is enforced with the net_todo_run_mutex mutex.
The unregistration of a device is handled exactly the same way.

Structure of register_netdev and unregister_netdev

Note that since the registration and unregistration tasks handled by netdev_run_todo do not hold the lock, this function can safely sleep and leave the semaphore available.
It may seem that the kernel cannot have more than one net_device instance in net_todo_list by the time netdev_run_todo is called. How can there be more than one element if register_netdev and unregister_netdev add only one net_device instance to the list and then process the latter right away when releasing the lock? Well, for example, it is possible for a device driver to use a loop like the following to unregister all of its devices in one shot (see, for instance, tun_cleanup in drivers/net/tun.c):
rtnl_lock();
loop for each device driven by this driver {
    ...
    unregister_netdevice(dev);
    ...
}
rtnl_unlock();
This is better than the following approach, which gets and releases the lock and processes net_todo_list at each iteration of the loop:
loop for each device driven by this driver {
    ...
    unregister_netdev(dev);
    ...
}
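The difference between the two loops can be made concrete with a small user-space model. This is a sketch, not kernel code: the toy_ functions only imitate the relationship described above (the unlock operation triggers processing of the todo list), and the counter exists purely to show how often the list gets processed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_TODO 16

struct toy_dev { int id; int unregistered; };

static struct toy_dev *toy_todo_list[TOY_MAX_TODO];
static size_t toy_todo_len;
static int toy_lock_held;
static int toy_run_todo_calls;  /* how many times the list was processed */

/* Model of netdev_run_todo: complete the work for every queued device. */
static void toy_run_todo(void)
{
    toy_run_todo_calls++;
    for (size_t i = 0; i < toy_todo_len; i++)
        toy_todo_list[i]->unregistered = 1;
    toy_todo_len = 0;
}

void toy_rtnl_lock(void)   { toy_lock_held = 1; }

/* Releasing the lock also processes the todo list. */
void toy_rtnl_unlock(void) { toy_lock_held = 0; toy_run_todo(); }

/* Model of unregister_netdevice: the caller must already hold the lock. */
int toy_unregister_netdevice(struct toy_dev *dev)
{
    assert(toy_lock_held);
    if (toy_todo_len == TOY_MAX_TODO)
        return -1;
    toy_todo_list[toy_todo_len++] = dev;
    return 0;
}

/* Model of unregister_netdev: the wrapper that takes care of locking. */
int toy_unregister_netdev(struct toy_dev *dev)
{
    toy_rtnl_lock();
    int err = toy_unregister_netdevice(dev);
    toy_rtnl_unlock();
    return err;
}
```

With three devices, the batched form (one lock, three calls to toy_unregister_netdevice, one unlock) processes the list once with three entries, while three calls to toy_unregister_netdev process it three times with one entry each.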

Device Registration Status Notification
Both kernel components and user-space applications may be interested in knowing when a network device is registered, unregistered, goes down, or comes up. Notifications about these events are sent via two channels:
netdev_chain
Kernel components can register with this notification chain.
Netlink’s RTMGRP_LINK multicast group
User-space applications, such as monitoring tools or routing protocols, can register with RTnetlink’s RTMGRP_LINK multicast group.

netdev_chain notification chain
The progress through the various stages of registering and unregistering a device is reported with the netdev_chain notification chain. This chain is defined in net/core/dev.c, and kernel components interested in these kinds of events register and unregister with the chain with register_netdevice_notifier and unregister_netdevice_notifier, respectively.
All the NETDEV_XXX events that are reported via netdev_chain are listed in include/linux/notifier.h. Here are the ones we have seen in this chapter, together with the conditions that trigger them:
NETDEV_UP
NETDEV_GOING_DOWN
NETDEV_DOWN
NETDEV_UP is sent to report about a device that has been enabled, and is generated by dev_open.
NETDEV_GOING_DOWN is sent when the device is about to be disabled. NETDEV_DOWN is sent when the device has been disabled. They are both generated by dev_close.

NETDEV_REGISTER
The device has been registered. This event is generated by register_netdevice.
NETDEV_UNREGISTER
The device has been unregistered. This event is generated by unregister_netdevice.
NETDEV_REBOOT
The device has restarted due to a hardware failure. Currently not used.
NETDEV_CHANGEADDR
The hardware address (or the associated broadcast address) of the device has changed.
NETDEV_CHANGENAME
The device has changed its name.

NETDEV_CHANGE
The state or configuration of the device has changed. This is used in all the cases not covered by NETDEV_CHANGEADDR and NETDEV_CHANGENAME. It is currently used when something changes in dev->flags.
The NETDEV_CHANGEXXX notifications are usually generated in response to a user configuration change.

Note that register_netdevice_notifier, when registering with the chain, also replays (to the new registrant only) all the past NETDEV_REGISTER and NETDEV_UP notifications for the devices currently registered in the system. This gives the new registrant a clear picture of the current status of the registered devices.
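The replay behavior just described can be illustrated with a minimal user-space notifier chain. This is a sketch with invented toy_ names and fixed-size arrays; it only models the idea that a late registrant is told about devices that already exist, and toy_counting_notifier is a sample callback included so the example can be exercised.

```c
#include <assert.h>
#include <stddef.h>

enum toy_event { TOY_NETDEV_REGISTER, TOY_NETDEV_UP };

struct toy_dev { const char *name; int up; };

typedef void (*toy_notifier_fn)(enum toy_event ev, const struct toy_dev *dev);

#define TOY_MAX 8
static const struct toy_dev *toy_devs[TOY_MAX];
static size_t toy_ndevs;
static toy_notifier_fn toy_chain[TOY_MAX];
static size_t toy_nnotifiers;

/* Register a device and tell every current subscriber about it. */
int toy_register_dev(const struct toy_dev *dev)
{
    if (toy_ndevs == TOY_MAX)
        return -1;
    toy_devs[toy_ndevs++] = dev;
    for (size_t i = 0; i < toy_nnotifiers; i++)
        toy_chain[i](TOY_NETDEV_REGISTER, dev);
    return 0;
}

/* Subscribe, then replay past events to the new subscriber only. */
int toy_register_notifier(toy_notifier_fn fn)
{
    assert(fn != NULL);
    if (toy_nnotifiers == TOY_MAX)
        return -1;
    toy_chain[toy_nnotifiers++] = fn;
    for (size_t i = 0; i < toy_ndevs; i++) {
        fn(TOY_NETDEV_REGISTER, toy_devs[i]);
        if (toy_devs[i]->up)
            fn(TOY_NETDEV_UP, toy_devs[i]);
    }
    return 0;
}

/* Sample subscriber: counts what it has been told. */
static int toy_seen_register, toy_seen_up;
void toy_counting_notifier(enum toy_event ev, const struct toy_dev *dev)
{
    (void)dev;
    if (ev == TOY_NETDEV_REGISTER)
        toy_seen_register++;
    else if (ev == TOY_NETDEV_UP)
        toy_seen_up++;
}
```

If two devices are registered before toy_counting_notifier subscribes, and only one of them is up, the subscriber ends with two register events and one up event, which is the "clear picture of the current status" the text refers to.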
Quite a few kernel components register to netdev_chain. Among them are:
Routing
For instance, the routing subsystem uses this notification to add or remove all the routing entries associated with the device.
Firewall
For example, if the firewall had buffered any packet from a device that now is down, it has to either drop the packet or take another action according to its policies.
Protocol code (i.e., ARP, IP, etc.)
For example, when you change the MAC address of a local device, the ARP table must be updated accordingly.

RTnetlink link notifications

Notifications are sent to the Link multicast group RTMGRP_LINK with rtmsg_ifinfo when something changes in the device’s state or configuration. Among these notifications are:
When a notification is received on the netdev_chain notification chain. RTnetlink registers to the netdev_chain chain and replays the notifications it receives.
When a disabled device is enabled or vice versa (see netdev_state_change).
When a flag in net_device->flags is changed, for example, via a user configuration command (see dev_change_flags).
netplugd is a daemon, part of the net-utils package, that listens to these notifications and reacts according to a user configuration file.

Device Registration
Device registration does not consist simply of inserting the net_device structure into the global list and hash tables. It also involves the initialization of some parameters in the net_device structure, the generation of a broadcast notification that will inform other kernel components about the registration, and other tasks. Devices are registered with register_netdev, which is a simple wrapper around register_netdevice. The wrapper mainly takes care of locking and name completion. The lock protects the dev_base list of registered devices.

register_netdevice Function
register_netdevice starts device registration and calls net_set_todo, which ultimately asks netdev_run_todo to complete the registration. Here are the main tasks carried out by register_netdevice:
Initialize some of the net_device’s fields, including the ones used for locking.

When the kernel has support for the Divert feature, allocate a configuration block needed by the feature and link it to dev->divert. This is taken care of by alloc_divert_blk.
If the device driver had initialized dev->init, execute that function.
Assign the device a unique identifier with dev_new_index. The identifier is generated using a counter that is incremented every time a new device is added to the system. This counter is a 32-bit variable, so dev_new_index includes an if clause to handle wraparound as well as another if clause to handle the possibility that the variable hits a value that was already assigned.
Append net_device to the global list dev_base and insert it into the two hash tables. Adding the structure at the head of dev_base would be faster, but appending it at the tail means the kernel browses the entire list and therefore has a chance to check for duplicate device names. The device name is checked against invalid names with dev_valid_name.

Check the feature flags for invalid combinations. For example:
Scatter/Gather-DMA is useless without L4 hardware checksumming support and is therefore disabled in that situation.
TCP Segmentation Offload (TSO) requires Scatter/Gather-DMA, and is therefore disabled when the latter is not supported.
Set the __LINK_STATE_PRESENT flag in dev->state to make the device available (visible and usable) to the system. The flag is cleared, for example, when a hot-pluggable device is unplugged, or when a system with support for power management goes into suspend mode. The initialization of this flag does not trigger any action; instead, its value is checked in well-defined cases to filter out illegal requests or to get the device state.

Initialize the device’s queuing discipline, used by Traffic Control to implement QoS, with dev_init_scheduler. The queuing discipline defines how egress packets are queued to and dequeued from the egress queue, defines how many packets can be queued before starting to drop them, etc.
Notify all the subsystems interested in device registration via the netdev_chain notification chain.
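The identifier assignment listed among the tasks above (a counter with a wraparound check and a check for values already in use) can be sketched as follows. This is a user-space model under stated assumptions: the counter and the table of in-use identifiers are passed in explicitly, whereas the kernel keeps its own state and looks identifiers up in its device database, and the toy_ names are invented.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Linear search standing in for a lookup in the device-index hash table. */
static int toy_index_in_use(const int *used, size_t n_used, int idx)
{
    for (size_t i = 0; i < n_used; i++)
        if (used[i] == idx)
            return 1;
    return 0;
}

/*
 * Return the next free identifier.
 * *counter holds the last value handed out and is updated in place.
 * Assumes at least one positive identifier is free; otherwise it loops forever.
 */
int toy_new_index(int *counter, const int *used, size_t n_used)
{
    assert(counter != NULL);
    for (;;) {
        /* First check: handle wraparound, keeping identifiers positive. */
        if (*counter == INT_MAX || *counter < 0)
            *counter = 1;
        else
            (*counter)++;
        /* Second check: skip values that are already assigned. */
        if (!toy_index_in_use(used, n_used, *counter))
            return *counter;
    }
}
```

The wraparound test is written against INT_MAX before incrementing because overflowing a signed integer is undefined behavior in standard C; that detail is a choice made for this sketch.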

When netdev_run_todo is called to complete the registration, it just updates dev->reg_state and registers the device in the sysfs filesystem.
Aside from memory allocation problems, device registration can fail only if the device name is invalid or is a duplicate, or when dev->init fails for some reason.
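The feature-flag sanity checks that register_netdevice performs (the Scatter/Gather and TSO examples in the task list) can be sketched like this. The flag names and bit values below are placeholders chosen for the example and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder feature bits (hypothetical values, for illustration only). */
#define TOY_F_SG      (1u << 0)  /* Scatter/Gather-DMA */
#define TOY_F_HW_CSUM (1u << 1)  /* L4 hardware checksumming */
#define TOY_F_TSO     (1u << 2)  /* TCP Segmentation Offload */

struct toy_dev { unsigned features; };

/* Drop feature bits whose prerequisites are missing. */
void toy_fix_features(struct toy_dev *dev)
{
    assert(dev != NULL);

    /* Scatter/Gather is useless without hardware checksumming. */
    if ((dev->features & TOY_F_SG) && !(dev->features & TOY_F_HW_CSUM))
        dev->features &= ~TOY_F_SG;

    /* TSO requires Scatter/Gather, so check it after the rule above. */
    if ((dev->features & TOY_F_TSO) && !(dev->features & TOY_F_SG))
        dev->features &= ~TOY_F_TSO;
}
```

The order of the two checks matters: a device advertising Scatter/Gather and TSO but no checksumming loses Scatter/Gather first, and then loses TSO as a consequence.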

Device Unregistration
To unregister a device, the kernel and the associated device driver need to undo all the operations that were executed during its registration, and more:
Disable the device with dev_close
Release all the allocated resources (IRQ, I/O memory, I/O port, etc.)
Remove the net_device structure from the global list dev_base and the two hash tables.
Once all the references to the structure have been released, free the net_device data structure, the driver’s private data structure, and any other memory block linked to it. The net_device structure is freed with free_netdev. When the kernel is compiled with support for sysfs, free_netdev lets it take care of freeing the structure.
Remove any file that may have been added to the /proc and /sys filesystems.
Note that whenever there is a dependency between devices, unregistering one of them may force the unregistration of all (or part) of the others.

Three function pointers in net_device (represented by a variable named dev) come into the picture when unregistering a device:
dev->stop
This function pointer is initialized by the device driver to one of its local routines. It is invoked by dev_close when disabling a device. Common tasks handled here include stopping the egress queue with netif_stop_queue, releasing hardware resources, stopping any timers used by the device driver, etc. Virtual devices do not need to release any hardware resources, but they may need to take care of other, high-level issues.
dev->uninit
This function pointer is also initialized by the device driver to one of its local routines. Only a few tunneling virtual devices currently initialize it; they point it to a routine that mainly takes care of reference counts.

dev->destructor
When used, this is normally initialized to free_netdev or to a wrapper around it. However, destructor is not commonly initialized; only a few virtual devices use it. Most device drivers call free_netdev directly after unregister_netdevice.

unregister_netdevice Function
unregister_netdevice accepts one parameter, the pointer to the net_device structure it is to remove:
int unregister_netdevice(struct net_device *dev)
Two calls to synchronize_net are used to synchronize unregister_netdevice with the receive engine (net_rx_action) so that it will not access old data after it has been updated by unregister_netdevice.
Other tasks taken care of by unregister_netdevice include:

If the device was not disabled, it has to be disabled first with dev_close. The net_device instance is then removed from the global list dev_base and the two hash tables. Note that this is not sufficient to forbid kernel subsystems from using the device: they may still hold a pointer to the net_device data structure. This is why net_device uses a reference count to keep track of how many references are left to the structure.
All the instances of queuing discipline associated with the device are destroyed with dev_shutdown.
A NETDEV_UNREGISTER notification is sent on the netdev_chain notification chain to let other kernel components know about it. User space has to be notified about the unregistration. For instance, in a system with two NICs that could be used to access the Internet, this notification could be used to start the secondary device.
Any data block linked to the net_device structure is freed. For example, the multicast data dev->mc_list is removed with dev_mc_discard, the Divert block is removed with free_divert_blk, etc. The ones that are not explicitly removed in unregister_netdevice are supposed to be removed by the function handlers that process the notifications mentioned in the previous bullet.
Whatever was done by dev->init in register_netdevice is undone here with dev->uninit.
Features such as bonding allow you to group a set of devices together and treat them as a single virtual device with special characteristics. Among those devices, one is often elected master because it plays a special role within the group. For obvious reasons, the device being removed should release any reference to the master device: having dev->master non-NULL at this point would be a bug. If we stick to the bonding example, the dev->master reference is cleared thanks to the NETDEV_UNREGISTER notifications sent just a few lines of code earlier.
Finally, net_set_todo is called to let netdev_run_todo complete the unregistration, and the reference count is decreased with dev_put. netdev_run_todo unregisters the device from sysfs, changes dev->reg_state to NETREG_UNREGISTERED, waits until all the references are gone, and completes the unregistration with a call to dev->destructor.

Reference Counts
net_device structures cannot be freed until all the references to them are released. The reference count for the structure is kept in dev->refcnt, which is updated every time a reference is added or removed, respectively, with dev_hold and dev_put. When a device is registered with register_netdevice, dev->refcnt is initialized to 1. This first reference is therefore kept by the kernel code that is responsible for the network devices database. This reference will be released only with a call to unregister_netdevice. This means that dev->refcnt will never drop to zero until the device is to be unregistered. Therefore, unlike other kernel objects that are freed by the xxx_put routine when the reference count drops to zero, net_device data structures are not freed until you unregister the device from the kernel.

In summary, the call to dev_put at the end of unregister_netdevice is not sufficient to make a net_device instance eligible for deletion: the kernel still needs to wait until all the references are released. But because the device is no longer usable after it is unregistered, the kernel needs to notify all the reference holders so that they can release their references. This is done by sending a NETDEV_UNREGISTER notification to the netdev_chain notification chain. This also means that reference holders should register to the notification chain; otherwise, they will not be able to receive such notifications and take action accordingly.
unregister_netdevice starts the unregistration process and lets netdev_run_todo complete it. netdev_run_todo calls netdev_wait_allrefs to indefinitely wait until all references to the net_device structure have been released.

Function netdev_wait_allrefs
netdev_wait_allrefs consists of a loop that ends only when the value of dev->refcnt drops to zero. Every second it sends out a NETDEV_UNREGISTER notification, and every 10 seconds it prints a warning on the console. The rest of the time it sleeps. The function does not give up until all the references to the input net_device structure have been released.
Two common cases that would require more than one notification to be sent are:
A bug
For example, a piece of code could hold references to net_device structures, but it may not release them because it has not registered to the netdev_chain notification chain, or because it does not process notifications correctly.
A pending timer
For example, suppose the routine that is executed when some timer expires needs to access data that includes references to net_device structures. In this case, you would need to wait until the timer expires and its handler hopefully releases its references.

Note that netdev_run_todo is started when the lock is released at the end of the unregistration. This means that whoever started the unregistration, most probably the driver, is going to sleep waiting for netdev_run_todo to complete its job.
When the function sends the notification, it also processes the pending link state change events. When a device is being unregistered, the kernel does not need to do anything when informed about a link state change event on that device. Events associated with a device that is about to be removed are therefore treated as no-ops when the link state change event list is processed: the list is cleared, and only the events for other devices are actually processed. This is just an easy way to clean up the link state change queue from events associated with a device about to disappear.
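The waiting loop of netdev_wait_allrefs and its interaction with reference holders can be modeled in user space. This is a simplified sketch: time is collapsed into "rounds" (one round stands for one rebroadcast of the notification), the max_rounds cap exists only so the example terminates (the text says the real function never gives up), and all toy_ names, including the two sample holders, are invented.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev { int refcnt; int notifications_sent; };

/* A holder's reaction to the unregister notification. */
typedef void (*toy_holder_fn)(struct toy_dev *dev);

void toy_dev_hold(struct toy_dev *dev) { dev->refcnt++; }

void toy_dev_put(struct toy_dev *dev)
{
    assert(dev->refcnt > 0);
    dev->refcnt--;
}

/*
 * Rebroadcast the notification until the count reaches zero.
 * Returns the number of rounds used, or -1 if max_rounds was exhausted.
 */
int toy_wait_allrefs(struct toy_dev *dev, toy_holder_fn *holders,
                     size_t n_holders, int max_rounds)
{
    int rounds = 0;
    while (dev->refcnt != 0) {
        if (rounds == max_rounds)
            return -1;  /* the real function would keep waiting and warning */
        for (size_t i = 0; i < n_holders; i++)
            holders[i](dev);
        dev->notifications_sent++;
        rounds++;
    }
    return rounds;
}

/* Well-behaved holder: drops its single reference when notified. */
static int toy_good_holder_has_ref = 1;
void toy_good_holder(struct toy_dev *dev)
{
    if (toy_good_holder_has_ref) {
        toy_good_holder_has_ref = 0;
        toy_dev_put(dev);
    }
}

/* Buggy holder: ignores the notification and never releases anything. */
void toy_buggy_holder(struct toy_dev *dev) { (void)dev; }
```

With the well-behaved holder, one round is enough. With the buggy holder the count never reaches zero, which corresponds to the "A bug" case above, where the kernel would keep sending notifications and printing warnings.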

Enabling and Disabling a Network Device
Once a device has been registered it is available for use, but it will not transmit and receive traffic until it is explicitly enabled by the user (or a user-space application).
Requests to enable a device are taken care of by dev_open, defined in net/core/dev.c.
Enabling a device consists of the following tasks:
Call dev->open if it is defined. Not all device drivers initialize this function.
Set the __LINK_STATE_START flag in dev->state to mark the device as up and running.
Set the IFF_UP flag in dev->flags to mark the device as up.
Call dev_activate to initialize the egress queuing discipline used by Traffic Control, and start the watchdog timer. If there is no user configuration for Traffic Control, assign a default First In, First Out (FIFO) queue.
Send a NETDEV_UP notification to the netdev_chain notification chain to notify interested kernel components that the device is now enabled.

While a device needs to be explicitly enabled, it can be disabled either explicitly by a user command or implicitly by other events. For example, before a device is unregistered, it is first disabled. Network devices are disabled with dev_close. Disabling a device consists of the following tasks:
Send a NETDEV_GOING_DOWN notification to the netdev_chain notification chain to notify interested kernel components that the device is about to be disabled.

Call dev_deactivate to disable the egress queuing discipline, thus making sure the device cannot be used for transmission anymore, and stop the watchdog timer because it is not needed anymore.
Clear the __LINK_STATE_START flag in dev->state to mark the device as down.
If a polling action was scheduled to read ingress packets on the device, wait for that action to complete. Because the __LINK_STATE_START flag has been cleared, no more receive polling will be scheduled on the device, but one could have been pending before the flag was cleared.
Call dev->stop if it is defined. Not all device drivers initialize this function.
Clear the IFF_UP flag in dev->flags to mark the device as down.
Send a NETDEV_DOWN notification to the netdev_chain notification chain to notify interested kernel components that the device is now disabled.
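The two task lists above can be condensed into a small user-space model that follows the same ordering. This is a sketch, not the kernel's dev_open and dev_close: the queuing-discipline and polling steps are reduced to comments, notifications are simply appended to a per-device log, the early return when the device is already in the requested state is an assumption of this example, and all toy_ names are invented.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_IFF_UP           0x1u  /* stand-in for IFF_UP in dev->flags */
#define TOY_LINK_STATE_START 0x1u  /* stand-in for __LINK_STATE_START in dev->state */

enum toy_event { TOY_NETDEV_UP, TOY_NETDEV_GOING_DOWN, TOY_NETDEV_DOWN };

struct toy_dev {
    unsigned flags;
    unsigned state;
    int (*open)(struct toy_dev *dev);  /* optional driver hook */
    int (*stop)(struct toy_dev *dev);  /* optional driver hook */
    enum toy_event log[8];             /* notifications sent, in order */
    size_t nlog;
};

static void toy_notify(struct toy_dev *dev, enum toy_event ev)
{
    if (dev->nlog < sizeof(dev->log) / sizeof(dev->log[0]))
        dev->log[dev->nlog++] = ev;
}

int toy_dev_open(struct toy_dev *dev)
{
    assert(dev != NULL);
    if (dev->flags & TOY_IFF_UP)
        return 0;                       /* already enabled (assumption) */
    if (dev->open) {
        int err = dev->open(dev);
        if (err)
            return err;
    }
    dev->state |= TOY_LINK_STATE_START;
    dev->flags |= TOY_IFF_UP;
    /* the egress queuing discipline would be activated here */
    toy_notify(dev, TOY_NETDEV_UP);
    return 0;
}

int toy_dev_close(struct toy_dev *dev)
{
    assert(dev != NULL);
    if (!(dev->flags & TOY_IFF_UP))
        return 0;                       /* already disabled (assumption) */
    toy_notify(dev, TOY_NETDEV_GOING_DOWN);
    /* the egress queuing discipline would be deactivated here */
    dev->state &= ~TOY_LINK_STATE_START;
    /* a pending receive poll would be waited for here */
    if (dev->stop)
        dev->stop(dev);
    dev->flags &= ~TOY_IFF_UP;
    toy_notify(dev, TOY_NETDEV_DOWN);
    return 0;
}
```

Opening and then closing a device leaves the log with the three events in the order up, going down, down, and leaves both the flag and the state bit cleared.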

Interactions with Power Management

When the kernel has support for power management, NIC device drivers can be notified when the system goes into suspend mode, when it is resumed, etc. This is, for example, how the drivers/net/3c59x.c device driver initializes its pci_driver instance:
static struct pci_driver vortex_driver = {
    .name     = "3c59x",
    .probe    = vortex_init_one,
    .remove   = __devexit_p(vortex_remove_one),
    .id_table = vortex_pci_tbl,
#ifdef CONFIG_PM
    .suspend  = vortex_suspend,
    .resume   = vortex_resume,
#endif
};

When the system goes into suspend mode, the suspend routines provided by device drivers are executed to let drivers take action accordingly. Power management state changes do not affect the registration status dev->reg_state, but the device state dev->state needs to be changed.

Suspending a device
When a device is suspended, its device driver handles the event, by calling, for example, the pci_driver’s suspend routine for PCI devices. Besides the driver-specific actions, a few additional actions must be performed by every device driver:
Clear the __LINK_STATE_PRESENT flag from dev->state because the device is temporarily not going to be operational.
If the device was enabled, disable its egress queue with netif_stop_queue to prevent the device from being used to transmit any other packet. Note that a device that is registered is not necessarily enabled: when a device is recognized, it gets assigned to its device driver by the kernel and is registered; however, the device will not be enabled (and therefore usable) until an explicit user configuration requests it.

These tasks are succinctly implemented by netif_device_detach:
static inline void netif_device_detach(struct net_device *dev)
{
    if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
        netif_running(dev)) {
        netif_stop_queue(dev);
    }
}

Resuming a device
When a device is resumed, its device driver handles the event, by calling, for example, the pci_driver’s resume routine for PCI devices. Again, a few tasks are shared by all device drivers:
Set the __LINK_STATE_PRESENT flag in dev->state because the device is now available again.
If the device was enabled before being suspended, re-enable its egress queue with netif_wake_queue, and restart the watchdog timer used by Traffic Control.
These tasks are implemented by netif_device_attach:
static inline void netif_device_attach(struct net_device *dev)
{
    if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
        netif_running(dev)) {
        netif_wake_queue(dev);
        __netdev_watchdog_up(dev);
    }
}

Link State Change Detection
When an NIC device driver detects the presence or absence of a carrier or signal, either because it was notified by the NIC or via an explicit check by reading a configuration register on the NIC, it can notify the kernel with netif_carrier_on and netif_carrier_off, respectively. These routines are to be called when there is a change in the carrier status; therefore, they do nothing when they are invoked inappropriately.
Here are a few common cases that may lead to a link state change:
A cable is plugged into or unplugged from an NIC.
The device at the other end of the cable is powered down or disabled. Examples of devices include hubs, bridges, routers, and PC NICs.
When netif_carrier_on is called by a device driver that has detected the carrier on one of its devices, the function:

Clears the __LINK_STATE_NOCARRIER flag from dev->state.
Generates a link state change event and submits it for processing with linkwatch_fire_event.
If the device was enabled, starts a watchdog timer. The timer is used by Traffic Control to detect whether a transmission fails and gets stuck (in which case the timer times out).
static inline void netif_carrier_on(struct net_device *dev)
{
        if (test_and_clear_bit(__LINK_STATE_NOCARRIER, &dev->state))
                linkwatch_fire_event(dev);
        if (netif_running(dev))
                __netdev_watchdog_up(dev);
}
When netif_carrier_off is called by a device driver that has detected the loss of a carrier from one of its devices, the function:

Sets the __LINK_STATE_NOCARRIER flag in dev->state.
Generates a link state change event and submits it for processing with linkwatch_fire_event.
Note that both routines generate a link state change event and submit it for processing with linkwatch_fire_event, described in the next section.
static inline void netif_carrier_off(struct net_device *dev)
{
        if (!test_and_set_bit(__LINK_STATE_NOCARRIER, &dev->state))
                linkwatch_fire_event(dev);
}
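To see why these routines do nothing when they are invoked without an actual change in carrier status, here is a minimal userspace model. All names are hypothetical, the counter merely stands in for calls to linkwatch_fire_event, and the watchdog timer is omitted.

```c
/* Hypothetical stand-in for the no-carrier bit in dev->state. */
enum { STATE_NOCARRIER = 1u << 0 };

struct model_dev {
    unsigned int state;   /* models dev->state */
    int events_fired;     /* counts modeled linkwatch_fire_event calls */
};

/* Carrier detected: fire an event only if the bit was previously set. */
static void model_carrier_on(struct model_dev *dev)
{
    if (dev->state & STATE_NOCARRIER) {
        dev->state &= ~STATE_NOCARRIER;
        dev->events_fired++;
    }
}

/* Carrier lost: fire an event only if the bit was previously clear. */
static void model_carrier_off(struct model_dev *dev)
{
    if (!(dev->state & STATE_NOCARRIER)) {
        dev->state |= STATE_NOCARRIER;
        dev->events_fired++;
    }
}

/* Four calls, but only two real transitions. */
int model_carrier_demo(void)
{
    struct model_dev dev = { .state = STATE_NOCARRIER, .events_fired = 0 };

    model_carrier_on(&dev);    /* off -> on: event */
    model_carrier_on(&dev);    /* already on: nothing happens */
    model_carrier_off(&dev);   /* on -> off: event */
    model_carrier_off(&dev);   /* already off: nothing happens */
    return dev.events_fired;
}
```

In this model model_carrier_demo returns 2, one event per real transition.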

Scheduling and processing link state change events
Link state change events are defined with lw_event structures. It’s a pretty simple structure: it includes just a pointer to the associated net_device structure and another field used to link the structure to the global list of pending link state change events, lweventlist. The list is protected by the lweventlist_lock lock.
Note that the lw_event structure does not include any parameter to distinguish between detection and loss of carrier. This is because no differentiation is needed. All the kernel needs to know is that there was a change in the link status, so a reference to the device is sufficient. There will never be more than one lw_event instance in lweventlist for any device, because there’s no reason to record a history or track changes: either the link is operational or it isn’t, so the link state is either on or off.
Two state changes equal no change, three changes equal one, etc., so new events are not queued when the device already has a pending link state change event. The condition can be detected by checking the __LINK_STATE_LINKWATCH_PENDING flag in dev->state.
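The coalescing rule can be illustrated with a small model in which a per-device pending bit keeps a device from being queued twice. This is a sketch under simplifying assumptions: the names are hypothetical, a fixed-size array replaces the linked list, and there is no locking.

```c
#include <stddef.h>

#define MODEL_MAX_PENDING 8

/* Hypothetical stand-in for the linkwatch-pending bit in dev->state. */
enum { STATE_LINKWATCH_PENDING = 1u << 0 };

struct model_dev {
    unsigned int state;   /* models dev->state */
};

/* Models the global list of pending events. */
struct model_event_list {
    struct model_dev *devs[MODEL_MAX_PENDING];
    size_t count;
};

/* Returns 1 if a new event was queued, 0 if it was coalesced with a
 * pending one (or if the model's fixed-size list is full). */
static int model_fire_event(struct model_event_list *list,
                            struct model_dev *dev)
{
    if (dev->state & STATE_LINKWATCH_PENDING)
        return 0;                      /* already has a pending event */
    if (list->count == MODEL_MAX_PENDING)
        return 0;
    dev->state |= STATE_LINKWATCH_PENDING;
    list->devs[list->count++] = dev;
    return 1;
}

/* Three events on one device and one on another leave two entries. */
size_t model_coalesce_demo(void)
{
    struct model_event_list list = { .count = 0 };
    struct model_dev eth0 = { 0 }, eth1 = { 0 };

    model_fire_event(&list, &eth0);
    model_fire_event(&list, &eth0);    /* coalesced */
    model_fire_event(&list, &eth1);
    model_fire_event(&list, &eth0);    /* coalesced */
    return list.count;
}
```

Whoever processes the list only needs to read the current link state of each queued device, which is why no history has to be kept.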

linkwatch_fire_event function

Once the lw_event data structure has been initialized with a reference to the right net_device instance and has been added to the lweventlist list, and the __LINK_STATE_LINKWATCH_PENDING flag has been set in dev->state, linkwatch_fire_event needs to launch the routine that will actually process the elements on the lweventlist list.
This routine, linkwatch_event, is not called directly. It is scheduled for execution by submitting a request to the keventd_wq kernel thread: a work_struct data structure is initialized with a reference to the linkwatch_event routine and is submitted to keventd_wq.
To avoid having the processing routine linkwatch_event run too often, its execution is rate limited to once per second.
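One way to picture the rate limit is as a computation of how long to defer the next run. The sketch below is an assumption about the general technique, not the kernel's code: the names are hypothetical and time is expressed in plain milliseconds rather than kernel ticks.

```c
/* Hypothetical minimum spacing between two runs of the processing routine. */
#define MODEL_MIN_INTERVAL_MS 1000L

struct model_limiter {
    long next_allowed_ms;   /* earliest time the routine may run again */
};

/* How long a newly fired event must wait before processing may start. */
static long model_schedule_delay(const struct model_limiter *lim, long now_ms)
{
    long delay = lim->next_allowed_ms - now_ms;

    return delay > 0 ? delay : 0;
}

/* Record a run so that the next one is pushed at least one interval out. */
static void model_mark_run(struct model_limiter *lim, long now_ms)
{
    lim->next_allowed_ms = now_ms + MODEL_MIN_INTERVAL_MS;
}

/* First event runs at once; one arriving 300 ms later waits 700 ms. */
long model_rate_limit_demo(void)
{
    struct model_limiter lim = { .next_allowed_ms = 0 };
    long first = model_schedule_delay(&lim, 5000);

    model_mark_run(&lim, 5000);
    return first + model_schedule_delay(&lim, 5300);
}
```

An event that arrives while the routine is still within its one-second window is not dropped; its processing is simply deferred until the window expires.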
linkwatch_event processes the elements of the lweventlist list with linkwatch_run_queue. Processing lw_event instances consists simply of:

Clearing the __LINK_STATE_LINKWATCH_PENDING flag on dev->state.
Sending a NETDEV_CHANGE notification on the netdev_chain notification chain.
Sending an RTM_NEWLINK notification to the RTMGRP_LINK RTnetlink group.
The two notifications are sent with netdev_state_change, but only when the device is enabled (dev->flags & IFF_UP): no one cares about link state changes on disabled devices.
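The processing steps listed above can be sketched as a loop over the pending devices. The model below uses hypothetical names, an array instead of the lweventlist list, and a counter in place of the two real notifications.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a dev->state bit and for IFF_UP. */
enum { STATE_LINKWATCH_PENDING = 1u << 0 };
enum { FLAG_UP = 1u << 0 };

struct model_dev {
    unsigned int state;   /* models dev->state */
    unsigned int flags;   /* models dev->flags */
    int notifications;    /* counts modeled netdev_state_change calls */
};

/* Process every pending event; returns how many devices were notified. */
static int model_run_queue(struct model_dev **pending, size_t count)
{
    int notified = 0;

    for (size_t i = 0; i < count; i++) {
        struct model_dev *dev = pending[i];

        /* The device may be queued again from now on. */
        dev->state &= ~STATE_LINKWATCH_PENDING;

        /* Disabled devices are skipped. */
        if (dev->flags & FLAG_UP) {
            dev->notifications++;
            notified++;
        }
    }
    return notified;
}

/* One enabled and one disabled device: only the first is notified. */
int model_run_queue_demo(void)
{
    struct model_dev up = { STATE_LINKWATCH_PENDING, FLAG_UP, 0 };
    struct model_dev down = { STATE_LINKWATCH_PENDING, 0, 0 };
    struct model_dev *pending[] = { &up, &down };
    int notified = model_run_queue(pending, 2);

    if (up.state != 0 || down.state != 0)
        return -1;        /* pending bit must be cleared on both */
    return notified;
}
```

Note that the pending bit is cleared for the disabled device as well, so a later event on it can be queued normally.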

Linkwatch flags
The code in net/core/linkwatch.c defines two flags that can be set in the global variable linkwatch_flags:
LW_RUNNING
When this flag is set, linkwatch_event has been scheduled for execution. The flag is cleared by linkwatch_event itself.
LW_SE_USED
Because lweventlist usually has at most one element, the code optimizes lw_event data structure allocations by statically allocating one and always using it as the first element of the list. Only when the kernel needs to keep track of more than one pending event (events on more than one device) does it allocate additional lw_event structures; otherwise, it simply recycles the same one.
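The static-element optimization can be sketched as an allocator that hands out one preallocated structure first and falls back to the heap only when that one is busy. All names are hypothetical, and malloc and free stand in for the kernel's allocator.

```c
#include <stdbool.h>
#include <stdlib.h>

struct model_event {
    int ifindex;                 /* hypothetical payload */
    struct model_event *next;
};

static struct model_event static_event;   /* the preallocated element */
static bool static_event_used;            /* plays the role of LW_SE_USED */

/* Use the static element when it is free; otherwise go to the heap. */
static struct model_event *model_event_alloc(void)
{
    if (!static_event_used) {
        static_event_used = true;
        return &static_event;
    }
    return malloc(sizeof(struct model_event));
}

/* Release an element, marking the static one as reusable. */
static void model_event_free(struct model_event *ev)
{
    if (ev == &static_event)
        static_event_used = false;
    else
        free(ev);
}

/* Returns 1 when the allocation pattern matches the description. */
int model_static_event_demo(void)
{
    struct model_event *a = model_event_alloc();   /* static element */
    struct model_event *b = model_event_alloc();   /* heap element */
    int ok = (a == &static_event) && b != NULL && b != &static_event;

    model_event_free(a);
    model_event_free(b);

    struct model_event *c = model_event_alloc();   /* static one again */
    ok = ok && (c == &static_event);
    model_event_free(c);
    return ok;
}
```

In the common case of a single pending event, this pattern avoids any dynamic allocation.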

Virtual Devices

Virtual devices need to be registered and enabled just like real ones to be used. However, there are differences:
Virtual devices sometimes call register_netdevice and unregister_netdevice rather than their wrappers, and take care of locking by themselves. They may need to handle locking to keep the lock for a little longer than a real device does. With this approach, the lock could also be misused and held longer than needed, by making it protect additional pieces of code (besides register_netdevice) that could be protected in other ways.

Real devices cannot be unregistered (i.e., destroyed) with user commands; they can only be disabled. Real devices are unregistered at the time their drivers are unloaded (when loaded as modules, of course). Virtual devices, in contrast, may be created and unregistered with user commands, too. Whether this is possible depends on the virtual device driver’s design.
Virtual devices, unlike most real ones, use dev->init, dev->uninit, and dev->destructor. Because most virtual devices implement some kind of more or less complex logic on top of real devices, they use dev->init and dev->uninit to take care of extra initialization and cleanup. dev->destructor is often initialized to free_netdev so that the driver does not need to explicitly call the latter function after unregistration.
Virtual devices do not have a probe routine.
Virtual device drivers register to the netdev_chain notification chain because most virtual devices are defined on top of real devices, so changes to real devices affect virtual ones, too. Let’s see two examples:

Bonding
Bonding is a virtual device that allows you to bundle a set of interfaces and make them look like a single one. Traffic can be distributed between the set of interfaces using different algorithms, one of which is a simple round robin. Take an example in which the bonding interface bond0 bundles eth0 and eth1. When eth0 goes down, bond0 needs to know about it to take it into account when distributing traffic between the real devices. If eth1 went down too, bond0 would have to be disabled because there would not be any working real device left.
VLAN interfaces
Linux supports the 802.1Q protocol and allows you to define Virtual LAN (VLAN) interfaces. Consider a case where the user has defined two VLAN interfaces on eth0. When eth0 goes down, all virtual (VLAN) interfaces must go down, too.
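The VLAN case can be sketched with a toy notification chain: each VLAN interface registers a callback and disables itself when the real device underneath it goes down. Everything here is a hypothetical userspace model; the real netdev_chain uses notifier blocks and a richer set of events.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_event { MODEL_DEV_DOWN, MODEL_DEV_UP };   /* hypothetical events */

struct model_dev {
    const char *name;
    bool up;
};

struct model_vlan {
    struct model_dev dev;     /* the virtual interface itself */
    struct model_dev *real;   /* the real device it is defined on */
};

typedef void (*model_notifier)(struct model_dev *changed,
                               enum model_event ev, void *ctx);

struct model_chain_entry {
    model_notifier fn;
    void *ctx;
};

/* Callback a VLAN driver might register: follow the real device down. */
static void model_vlan_notify(struct model_dev *changed,
                              enum model_event ev, void *ctx)
{
    struct model_vlan *vlan = ctx;

    if (vlan->real == changed && ev == MODEL_DEV_DOWN)
        vlan->dev.up = false;
}

/* Disable a real device and tell every registered listener about it. */
static void model_dev_down(struct model_dev *dev,
                           const struct model_chain_entry *chain, size_t n)
{
    dev->up = false;
    for (size_t i = 0; i < n; i++)
        chain[i].fn(dev, MODEL_DEV_DOWN, chain[i].ctx);
}

/* Returns a bitmask of VLANs still up after eth0 goes down:
 * bit 0 = eth0.10, bit 1 = eth0.20, bit 2 = eth1.30. */
int model_vlan_demo(void)
{
    struct model_dev eth0 = { "eth0", true }, eth1 = { "eth1", true };
    struct model_vlan v10 = { { "eth0.10", true }, &eth0 };
    struct model_vlan v20 = { { "eth0.20", true }, &eth0 };
    struct model_vlan v30 = { { "eth1.30", true }, &eth1 };
    struct model_chain_entry chain[] = {
        { model_vlan_notify, &v10 },
        { model_vlan_notify, &v20 },
        { model_vlan_notify, &v30 },
    };

    model_dev_down(&eth0, chain, 3);
    return (v10.dev.up ? 1 : 0) | (v20.dev.up ? 2 : 0) | (v30.dev.up ? 4 : 0);
}
```

Only the VLAN defined on eth1 survives in this model. A bonding driver would register a similar callback but apply different logic, for example disabling bond0 only when its last working real device has gone down.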

Locking
The dev_base list and the two hash tables dev_name_head and dev_index_head are protected by the dev_base_lock lock. That lock, however, is used only to serialize accesses to the list and tables, not to serialize changes to the contents of net_device data structures. net_device content changes are taken care of by the Routing Netlink semaphore (rtnl_sem), which is acquired and released with rtnl_lock and rtnl_unlock, respectively. This semaphore is used to serialize changes to net_device instances from:
Runtime events
For example, when the link state changes (e.g., a network cable is plugged or unplugged), the kernel needs to change the device state by modifying dev->flags.
Configuration changes
When the user applies a configuration change with commands such as ifconfig and route from the net-tools package, or ip from the IPROUTE2 package, the kernel is notified via ioctl commands and the Netlink socket, respectively. The routines invoked via these interfaces must use locks.

The net_device data structure includes a few fields used for locking, among them:
ingress_lock
queue_lock
Used by Traffic Control when dealing with ingress and egress traffic scheduling, respectively.
xmit_lock
xmit_lock_owner
Used to synchronize accesses to the device driver hard_start_xmit function.

