Chapter 6. The PCI Layer and Network Interface Cards

 

Given the popularity of the PCI bus, on the x86 as well as other architectures, we will spend a few pages on it so that you can understand how PCI devices are managed by the kernel, with special emphasis on network devices. This chapter will help you find a context for the code about device registration we will see in Chapter 8. You will also learn a bit about how PCI handles some nifty kernel features such as probing and power management. For an in-depth discussion of PCI, such as device driver design, PCI bus features, and implementation details, refer to Linux Device Drivers and Understanding the Linux Kernel, as well as the PCI specifications.

The PCI subsystem (also known as the PCI layer) in the kernel provides all the generic functions that are used in common by various PCI device drivers. This subsystem takes a lot of work off the shoulders of the programmer for each individual device, lets drivers be written in a clean manner, and makes it easier for the kernel to collect and maintain information about the devices, such as accounting information and statistics.

In this chapter, we will see the meaning of a few key data structures used by the PCI layer and how these structures are initialized by one common NIC device driver. I'll conclude with a few words on the PCI power management and Wake-on-LAN features.

 

6.1. Data Structures Featured in This Chapter

Here are a few key data structure types used by the PCI layer. There are many others, but the following ones are all we need to know for our overview in this book. The first one is defined in include/linux/mod_devicetable.h, and the other two are defined in include/linux/pci.h.


pci_device_id

Device identifier. This is not a local ID used by Linux, but an ID defined according to the PCI standard. The later section "Registering a PCI NIC Device Driver" shows the ID's definition, and the later section "Example of PCI NIC Driver Registration" presents an example.


pci_dev

Each PCI device is assigned a pci_dev instance, just as network devices are assigned net_device instances. This is the structure used by the kernel to refer to a PCI device.


pci_driver

Defines the interface between the PCI layer and the device drivers. This structure consists mostly of function pointers. Every PCI device driver provides one. See the later section "Example of PCI NIC Driver Registration" for its definition and an example of its initialization.

PCI device drivers are defined by an instance of a pci_driver structure. Here is a description of its main fields, with special attention paid to the case of NIC devices. The function pointers are initialized by the device driver to point to appropriate functions within that driver.


char *name

Name of the driver.


const struct pci_device_id *id_table

Vector of IDs the kernel will use to associate devices to this driver. The section "Example of PCI NIC Driver Registration" shows an example.


int (*probe)(struct pci_dev *dev, const struct pci_device_id *id)

Function invoked by the PCI layer when it finds a match between a device ID for which it is seeking a driver and the id_table mentioned previously. This function should enable the hardware, allocate the net_device structure, and initialize and register the new device.[*] In this function, the driver also allocates any additional data structures (e.g., buffer rings used during transmission or reception) that it may need to work properly.

[*] NIC registration is covered in Chapter 8.


void (*remove)(struct pci_dev *dev)

Function invoked by the PCI layer when the driver is unregistered from the kernel or when a hot-pluggable device is removed. It is the counterpart of probe and is used to clean up any data structure and state.

Network devices use this function to release the allocated I/O ports and I/O memory, to unregister the device, and to free the net_device data structure and any other auxiliary data structure that could have been allocated by the device driver, usually in its probe function.


int (*suspend)(struct pci_dev *dev, pm_message_t state)


int (*resume)(struct pci_dev *dev)

Functions invoked by the PCI layer when the system goes into suspend mode and when it is resumed, respectively. See the later section "Power Management and Wake-on-LAN."


int (*enable_wake)(struct pci_dev *dev, u32 state, int enable)

With this function, a driver can enable or disable the capability of the device to wake the system up by generating specific Power Management Event signals. See the later section "Power Management and Wake-on-LAN."


struct pci_dynids dynids

Dynamic IDs. See the following section.

See the later section "Example of PCI NIC Driver Registration" for an example of initialization of a pci_driver instance.

 

6.2. Registering a PCI NIC Device Driver

PCI devices are uniquely identified by a combination of parameters, including vendor, model, etc. These parameters are stored by the kernel in a data structure of type pci_device_id, defined as follows:

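struct pci_device_id {
     unsigned int vendor, device;        /* vendor and device ID, or PCI_ANY_ID */
     unsigned int subvendor, subdevice;  /* subsystem IDs, usually PCI_ANY_ID */
     unsigned int class, class_mask;     /* device class and the bits of it to compare */
     unsigned long driver_data;          /* private to the driver; not part of the PCI ID */
};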


 

Most of the fields are self-explanatory. vendor and device are usually sufficient to identify the device. subvendor and subdevice are rarely needed and are usually set to a wildcard value (PCI_ANY_ID). class and class_mask represent the class the device belongs to; NETWORK is the class that covers the devices we discuss in this chapter. driver_data is not part of the PCI ID; it is a private parameter used by the driver.

Each device driver registers with the kernel a vector of pci_device_id instances that lists the IDs of the devices it can handle.
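To make the role of the wildcard and of class_mask concrete, here is a minimal userspace sketch of how one table entry could be compared against a device. It is not the kernel's code: struct demo_dev, field_matches, id_matches, match_table, and demo_ids are hypothetical names introduced for illustration, the table contents are examples only, and the value of PCI_ANY_ID and the all-zero terminator are assumptions of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define PCI_ANY_ID 0xFFFFFFFFu   /* wildcard value (assumed for this sketch) */

/* Userspace copy of the structure shown in the text. */
struct pci_device_id {
    unsigned int vendor, device;
    unsigned int subvendor, subdevice;
    unsigned int class, class_mask;
    unsigned long driver_data;
};

/* Hypothetical stand-in for the identity a device reports. */
struct demo_dev {
    unsigned int vendor, device;
    unsigned int subvendor, subdevice;
    unsigned int class;
};

/* A field matches if the table holds the wildcard or the same value. */
static bool field_matches(unsigned int want, unsigned int have)
{
    return want == PCI_ANY_ID || want == have;
}

/* Compare one table entry against a device; only the class bits
 * selected by class_mask take part in the comparison. */
static bool id_matches(const struct pci_device_id *id,
                       const struct demo_dev *dev)
{
    return field_matches(id->vendor, dev->vendor) &&
           field_matches(id->device, dev->device) &&
           field_matches(id->subvendor, dev->subvendor) &&
           field_matches(id->subdevice, dev->subdevice) &&
           ((id->class ^ dev->class) & id->class_mask) == 0;
}

/* Walk a table that ends with an all-zero entry; return the first
 * matching entry, or NULL if the driver does not handle the device. */
static const struct pci_device_id *
match_table(const struct pci_device_id *table, const struct demo_dev *dev)
{
    for (; table->vendor || table->subvendor || table->class_mask; table++)
        if (id_matches(table, dev))
            return table;
    return NULL;
}

/* Example table: vendor and device are fixed, subsystem IDs are
 * wildcards, class_mask is 0 (class ignored), and driver_data is a
 * driver-private tag telling the two entries apart. */
static const struct pci_device_id demo_ids[] = {
    { 0x8086, 0x1229, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 1 },
    { 0x10EC, 0x8139, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 2 },
    { 0 }   /* terminator */
};
```

Calling match_table(demo_ids, &dev) returns the entry whose vendor and device match, and the caller can then read driver_data from it to distinguish between the models the driver supports.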

PCI device drivers register and unregister with the kernel by calling pci_register_driver and pci_unregister_driver, respectively. These functions are defined in drivers/pci/pci.c. There is also pci_module_init, an alias for pci_register_driver: it is the name the routine had in older kernel versions, before pci_register_driver was introduced, and a few drivers still use it.

pci_register_driver requires a pci_driver data structure as an argument. Thanks to the pci_driver's id_table vector, the kernel knows what devices the driver can handle, and thanks to all the virtual functions that are part of pci_driver, the kernel has a mechanism to interact with any device that will be associated with the driver.
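The interplay between id_table and the function pointers can be sketched in userspace as follows. This is only a simplified model under stated assumptions, not the kernel's implementation: every demo_* and nic_* name is hypothetical, matching is reduced to the vendor and device fields, and the device list is a plain array passed in by the caller.

```c
#include <stddef.h>

struct demo_driver;                      /* defined below */

struct demo_id { unsigned int vendor, device; };   /* {0, 0} ends a table */

struct demo_dev {
    unsigned int vendor, device;
    const struct demo_driver *driver;    /* NULL while no driver is bound */
};

/* Reduced model of pci_driver: a name, an ID table, two callbacks. */
struct demo_driver {
    const char *name;
    const struct demo_id *id_table;
    int  (*probe)(struct demo_dev *dev, const struct demo_id *id);
    void (*remove)(struct demo_dev *dev);
};

/* Return the table entry that matches dev, or NULL. */
static const struct demo_id *demo_match(const struct demo_id *t,
                                        const struct demo_dev *dev)
{
    for (; t->vendor; t++)
        if (t->vendor == dev->vendor && t->device == dev->device)
            return t;
    return NULL;
}

/* Offer every unbound device to the driver; a device is bound only if
 * its ID is in the table and probe succeeds. Returns the number bound. */
static int demo_register_driver(const struct demo_driver *drv,
                                struct demo_dev *devs, size_t n)
{
    int bound = 0;

    for (size_t i = 0; i < n; i++) {
        const struct demo_id *id;

        if (devs[i].driver)
            continue;
        id = demo_match(drv->id_table, &devs[i]);
        if (id && drv->probe(&devs[i], id) == 0) {
            devs[i].driver = drv;
            bound++;
        }
    }
    return bound;
}

/* Call remove on every device bound to the driver and unbind it. */
static void demo_unregister_driver(const struct demo_driver *drv,
                                   struct demo_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].driver == drv) {
            drv->remove(&devs[i]);
            devs[i].driver = NULL;
        }
}

/* A toy driver whose callbacks only count how often they are invoked. */
static int probes, removes;

static int nic_probe(struct demo_dev *dev, const struct demo_id *id)
{
    (void)dev; (void)id;
    probes++;
    return 0;                            /* 0 means the device was accepted */
}

static void nic_remove(struct demo_dev *dev)
{
    (void)dev;
    removes++;
}

static const struct demo_id nic_ids[] = { { 0x10EC, 0x8139 }, { 0, 0 } };
static const struct demo_driver nic_driver = {
    "demo_nic", nic_ids, nic_probe, nic_remove
};
```

Registering nic_driver against an array of devices invokes nic_probe once per matching device, and unregistering it invokes nic_remove on each device it had bound, mirroring the probe/remove pairing described for pci_driver.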

One of the great advantages of PCI is its elegant support for probing to find the IRQ and other resources each device needs. A module can be passed input parameters at load time to tell it how to configure all the devices for which it is responsible, but sometimes (especially with buses such as PCI) it is easier to let the driver itself check the devices on the system and configure the ones for which it is responsible. The user can still fall back on manual configuration if necessary.

The /sys filesystem exports information about system buses (PCI, USB, etc.), including the various devices and relationships between them. /sys also allows an administrator to define new IDs for a given device driver so that besides the static IDs registered by the drivers with their pci_driver structures' id_table vector, the kernel can use the user-configured parameters.

We will not cover the probing mechanism used by the kernel to look up a driver based on the device IDs. However, it is worth mentioning that there are two types of probing:


Static

Given a device PCI ID, the kernel can look up the right PCI driver (i.e., the pci_driver instance) based on the id_table vectors. This is called static probing.


Dynamic

This is a lookup based on IDs the user configures manually, a rare practice but one that is occasionally useful, as for debugging. Dynamic refers to the system administrator's ability to add an ID; it does not mean the ID can change on its own.

Since dynamic IDs are configured on a running system, they are useful only when the kernel is compiled with support for Hotplug.
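The difference between the two kinds of lookup can be illustrated with a small userspace sketch: a driver carries a fixed table plus a list that can grow at run time, and a lookup consults both. The names (demo_id, demo_driver, demo_add_dynid, demo_driver_handles, static_ids), the fixed-size array, and the order in which the two lists are searched are all assumptions made for the example, not a description of the kernel's data structures.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DYN_IDS 8   /* arbitrary limit chosen for this sketch */

struct demo_id { unsigned int vendor, device; };

struct demo_driver {
    const struct demo_id *id_table;       /* static IDs, {0, 0}-terminated */
    struct demo_id dynids[MAX_DYN_IDS];   /* IDs added by the administrator */
    size_t n_dynids;
};

/* Add an ID at run time; returns false when the list is full. */
static bool demo_add_dynid(struct demo_driver *drv,
                           unsigned int vendor, unsigned int device)
{
    if (drv->n_dynids == MAX_DYN_IDS)
        return false;
    drv->dynids[drv->n_dynids].vendor = vendor;
    drv->dynids[drv->n_dynids].device = device;
    drv->n_dynids++;
    return true;
}

/* Check the dynamic list first, then the static table. */
static bool demo_driver_handles(const struct demo_driver *drv,
                                unsigned int vendor, unsigned int device)
{
    const struct demo_id *id;

    for (size_t i = 0; i < drv->n_dynids; i++)
        if (drv->dynids[i].vendor == vendor &&
            drv->dynids[i].device == device)
            return true;

    for (id = drv->id_table; id->vendor; id++)
        if (id->vendor == vendor && id->device == device)
            return true;

    return false;
}

/* Static table compiled into the (hypothetical) driver. */
static const struct demo_id static_ids[] = { { 0x10EC, 0x8139 }, { 0, 0 } };
```

A device whose ID is missing from static_ids is rejected until demo_add_dynid is called for it; after that, the same lookup succeeds without the driver being rebuilt, which is the point of dynamic IDs.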

 

6.3. Power Management and Wake-on-LAN

PCI power management events are processed by the suspend and resume functions of the pci_driver data structure. Besides taking care of the PCI state, by saving and restoring it, respectively, these functions need to take special steps in the case of NICs:

  • suspend mainly stops the device egress queue so that no transmission will be allowed on the device.

  • resume re-enables the egress queue so that the device is available again for transmissions.
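The two steps above can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions: struct demo_nic and the demo_* functions are hypothetical, the PCI state is reduced to a single integer, and a real driver would rely on kernel helpers rather than on plain assignments.

```c
#include <stdbool.h>

struct demo_nic {
    bool queue_running;        /* egress queue accepts packets */
    unsigned int cfg;          /* stand-in for the live PCI state */
    unsigned int saved_cfg;    /* copy kept across the suspend */
    unsigned long tx_packets;  /* frames handed to the device */
};

/* Stop the egress queue, then save the state that will be lost. */
static int demo_suspend(struct demo_nic *nic)
{
    nic->queue_running = false;
    nic->saved_cfg = nic->cfg;
    nic->cfg = 0;              /* model the device losing its state */
    return 0;
}

/* Restore the saved state, then let transmissions resume. */
static int demo_resume(struct demo_nic *nic)
{
    nic->cfg = nic->saved_cfg;
    nic->queue_running = true;
    return 0;
}

/* Transmission is refused while the queue is stopped. */
static bool demo_xmit(struct demo_nic *nic)
{
    if (!nic->queue_running)
        return false;
    nic->tx_packets++;
    return true;
}
```

The ordering matters in this model: the queue is stopped before the state is saved and restarted only after the state is restored, so demo_xmit never runs against a device whose configuration is missing.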

Wake-on-LAN (WOL) is a feature that allows an NIC to wake up a system that's in standby mode when it receives a specific type of frame. WOL is normally disabled by default. The feature can be turned on and off with pci_enable_wake.

When the WOL feature was first introduced, only one kind of frame could wake up a system: "Magic Packets."[*] These special frames have two main characteristics:

[*] WOL was introduced by AMD with the name "Magic Packet Technology."

  • The destination MAC address belongs to the receiving NIC (whether the address is unicast, multicast, or broadcast).

  • Somewhere (anywhere) in the frame a sequence of 48 bits is set (i.e., FF:FF:FF:FF:FF:FF) followed by the NIC MAC address repeated at least 16 times in a row.
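The second characteristic is easy to express in code. The following userspace sketch builds the 102-byte pattern (6 bytes of 0xFF followed by 16 copies of the MAC address) and scans a buffer for it at any offset. The names magic_build and magic_present are hypothetical, and the sketch does not model the first characteristic (the check on the destination MAC address), which the NIC performs on the frame header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN      6
#define MAGIC_REPEAT 16
#define MAGIC_LEN    (MAC_LEN + MAGIC_REPEAT * MAC_LEN)   /* 102 bytes */

/* Write the pattern for `mac` into out, which must hold MAGIC_LEN bytes:
 * 48 set bits, then the MAC address repeated 16 times. */
static void magic_build(unsigned char *out, const unsigned char *mac)
{
    memset(out, 0xFF, MAC_LEN);
    for (int i = 0; i < MAGIC_REPEAT; i++)
        memcpy(out + MAC_LEN + i * MAC_LEN, mac, MAC_LEN);
}

/* Return true if the pattern for `mac` occurs anywhere in
 * frame[0 .. len-1]. */
static bool magic_present(const unsigned char *frame, size_t len,
                          const unsigned char *mac)
{
    unsigned char pattern[MAGIC_LEN];

    if (len < MAGIC_LEN)
        return false;

    magic_build(pattern, mac);
    for (size_t off = 0; off + MAGIC_LEN <= len; off++)
        if (memcmp(frame + off, pattern, MAGIC_LEN) == 0)
            return true;
    return false;
}
```

Because the pattern may sit anywhere in the frame, the scan tries every offset; a frame carrying the pattern for a different MAC address is not recognized.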

Now it is possible to allow other frame types to wake up the system, too. A handful of devices can enable or disable the WOL feature based on a parameter that can be set at module load time (see drivers/net/3c59x.c for an example). The ethtool utility allows an administrator to configure what kinds of frames can wake up the system. One choice is ARP packets, as described in the section "Wake-on-LAN Events" in Chapter 28. The net-utils package includes a command, ether-wake, that can be used to generate WOL Ethernet frames.

Whenever a WOL-enabled device recognizes a frame whose type is allowed to wake up the system, it generates a power management notification that does the job.

For more details on power management, refer to the later section "Interactions with Power Management" in Chapter 8.
