2.1 Data Structures
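Chapter 1 covered the relationship among buses, devices and drivers, and pointed out that most network adapters are in fact PCI devices. This chapter therefore explains how a network adapter is registered with the PCI bus, using Intel's E100 driver as the example.
As mentioned earlier, each PCI device is uniquely identified by a set of ID values, which are stored in struct pci_device_id: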
- struct pci_device_id {
- __u32 vendor, device; /* Vendor and device ID or PCI_ANY_ID*/
- __u32 subvendor, subdevice; /* Subsystem ID's or PCI_ANY_ID */
- __u32 class, class_mask; /* (class,subclass,prog-if) triplet */
- kernel_ulong_t driver_data; /* Data private to the driver */
- };
Every PCI device driver has a struct pci_driver variable that describes the driver:
- struct pci_driver {
- struct list_head node;
- char *name;
- const struct pci_device_id *id_table; /* must be non-NULL for probe to be called */
- int (*probe) (struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */
- void (*remove) (struct pci_dev *dev); /* Device removed (NULL if not a hot-plug capable driver) */
- int (*suspend) (struct pci_dev *dev, pm_message_t state); /* Device suspended */
- int (*suspend_late) (struct pci_dev *dev, pm_message_t state);
- int (*resume_early) (struct pci_dev *dev);
- int (*resume) (struct pci_dev *dev); /* Device woken up */
- int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); /* Enable wake event */
- void (*shutdown) (struct pci_dev *dev);
- struct pci_error_handlers *err_handler;
- struct device_driver driver;
- struct pci_dynids dynids;
- int multithread_probe;
- };
Every PCI driver has an id_table member that lists the IDs of all devices the driver is able to handle.
For the E100 driver, the pci_driver variable is defined as:
- static struct pci_driver e100_driver = {
- .name = DRV_NAME,
- .id_table = e100_id_table,
- .probe = e100_probe,
- .remove = __devexit_p(e100_remove),
- #ifdef CONFIG_PM
- /* Power Management hooks */
- .suspend = e100_suspend,
- .resume = e100_resume,
- #endif
- .shutdown = e100_shutdown,
- .err_handler = &e100_err_handler,
- };
Here e100_id_table lists the IDs of the PCI devices the E100 driver supports. It is defined as:
- #define INTEL_8255X_ETHERNET_DEVICE(device_id, ich) {\
- PCI_VENDOR_ID_INTEL, device_id, PCI_ANY_ID, PCI_ANY_ID, \
- PCI_CLASS_NETWORK_ETHERNET << 8, 0xFFFF00, ich }
- static struct pci_device_id e100_id_table[] = {
- INTEL_8255X_ETHERNET_DEVICE(0x1029, 0),
- INTEL_8255X_ETHERNET_DEVICE(0x1030, 0),
- …
- { 0, }
- };
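To make the class and class_mask fields concrete, here is a user-space sketch that expands the same INTEL_8255X_ETHERNET_DEVICE macro against a simplified copy of struct pci_device_id. The constants (PCI_ANY_ID as all ones, PCI_VENDOR_ID_INTEL = 0x8086, PCI_CLASS_NETWORK_ETHERNET = 0x0200) are redefined locally as assumptions mirroring the kernel headers, and demo_ids is a hypothetical two-entry table standing in for the elided e100_id_table.

```c
#include <stdint.h>

/* Local stand-ins for kernel constants (assumed to match
 * <linux/mod_devicetable.h> and <linux/pci_ids.h>). */
#define PCI_ANY_ID                  (~0U)
#define PCI_VENDOR_ID_INTEL         0x8086
#define PCI_CLASS_NETWORK_ETHERNET  0x0200

/* Simplified copy of struct pci_device_id. */
struct pci_device_id {
    uint32_t vendor, device;        /* exact IDs or PCI_ANY_ID */
    uint32_t subvendor, subdevice;  /* subsystem IDs or PCI_ANY_ID */
    uint32_t class, class_mask;     /* 24-bit class/subclass/prog-if */
    unsigned long driver_data;      /* private to the driver */
};

/* Same shape as the e100 macro: any subsystem IDs, class 0x0200xx,
 * with the low (prog-if) byte masked out. */
#define INTEL_8255X_ETHERNET_DEVICE(device_id, ich) { \
    PCI_VENDOR_ID_INTEL, device_id, PCI_ANY_ID, PCI_ANY_ID, \
    PCI_CLASS_NETWORK_ETHERNET << 8, 0xFFFF00, ich }

/* Hypothetical table; the all-zero entry terminates it. */
static const struct pci_device_id demo_ids[] = {
    INTEL_8255X_ETHERNET_DEVICE(0x1029, 0),
    INTEL_8255X_ETHERNET_DEVICE(0x1030, 0),
    { 0, }
};
```

Each entry therefore matches any subsystem vendor/device, and class 0x020000 with mask 0xFFFF00 means "base class 0x02 (network), subclass 0x00 (Ethernet), any programming interface".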
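When the PCI layer detects that a PCI device can be handled by some PCI driver (the check is done by pci_match_one_device), it calls that driver's probe function, which performs the initialization specific to that device. For the E100 driver the probe function is e100_probe, which initializes the network adapter.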
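The comparison performed by pci_match_one_device can be sketched in user space as follows. struct fake_pci_dev and match_one are hypothetical names; the field tests are modeled on the kernel helper (each ID must be equal or PCI_ANY_ID, and the class bits selected by class_mask must agree), but this is an illustration rather than the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_ANY_ID (~0U)   /* wildcard, as in the kernel */

struct pci_device_id {
    uint32_t vendor, device, subvendor, subdevice, class, class_mask;
};

/* Hypothetical, trimmed-down view of the struct pci_dev fields
 * that the comparison reads. */
struct fake_pci_dev {
    uint32_t vendor, device, subsystem_vendor, subsystem_device;
    uint32_t class;   /* 24-bit class/subclass/prog-if */
};

/* Every ID field must be equal or PCI_ANY_ID; XOR exposes differing
 * class bits, and class_mask decides which of them matter. */
static const struct pci_device_id *
match_one(const struct pci_device_id *id, const struct fake_pci_dev *dev)
{
    if ((id->vendor == PCI_ANY_ID || id->vendor == dev->vendor) &&
        (id->device == PCI_ANY_ID || id->device == dev->device) &&
        (id->subvendor == PCI_ANY_ID || id->subvendor == dev->subsystem_vendor) &&
        (id->subdevice == PCI_ANY_ID || id->subdevice == dev->subsystem_device) &&
        !((id->class ^ dev->class) & id->class_mask))
        return id;
    return NULL;
}
```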
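e100_probe is mainly concerned with initializing the adapter's struct net_device. For now, let us focus on the whole path from registering the driver up to the call to e100_probe.
2.2 E100 Initialization
The E100 driver is initialized in e100_init_module(), shown below: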
- static int __init e100_init_module(void)
- {
- if(((1 << debug) - 1) & NETIF_MSG_DRV) {
- printk(KERN_INFO PFX "%s, %s\n", DRV_DESCRIPTION, DRV_VERSION);
- printk(KERN_INFO PFX "%s\n", DRV_COPYRIGHT);
- }
- return pci_register_driver(&e100_driver);
- }
This function calls pci_register_driver() to register the e100_driver driver.
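The banner test at the top of e100_init_module treats debug as a count of enabled message classes: (1 << debug) - 1 sets the lowest debug bits. A minimal sketch, assuming NETIF_MSG_DRV = 0x0001 as in the kernel's netif_msg_* bitmap; banner_enabled is a hypothetical helper.

```c
/* Assumed value of NETIF_MSG_DRV (bit 0 of the netif_msg_* bitmap),
 * redefined so the sketch compiles in user space. */
#define NETIF_MSG_DRV 0x0001

/* Nonzero when e100_init_module() would print its banner for the given
 * "debug" module parameter: the lowest `debug` bits are enabled, so any
 * debug >= 1 turns on NETIF_MSG_DRV. */
static int banner_enabled(int debug)
{
    return ((1 << debug) - 1) & NETIF_MSG_DRV;
}
```

Assuming the e100 default of debug = 3, the mask is 0x7, so the banner is printed; loading the module with debug=0 suppresses it.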
pci_register_driver()->__pci_register_driver()
- /**
- * __pci_register_driver - register a new pci driver
- * @drv: the driver structure to register
- * @owner: owner module of drv
- * @mod_name: module name string
- *
- * Adds the driver structure to the list of registered drivers.
- * Returns a negative value on error, otherwise 0.
- * If no error occurred, the driver remains registered even if
- * no device was claimed during registration.
- */
- int __pci_register_driver(struct pci_driver *drv, struct module *owner, const char *mod_name);
Inside the function there are several initialization statements:
- drv->driver.name = drv->name;
- drv->driver.bus = &pci_bus_type;
- drv->driver.owner = owner;
- drv->driver.mod_name = mod_name;
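That is, the bus pointer of the struct device_driver embedded in the PCI driver is set to the pci_bus_type bus descriptor, and the driver's name and other fields are filled in.
pci_bus_type is defined as follows: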
- struct bus_type pci_bus_type = {
- .name = "pci",
- .match = pci_bus_match,
- .uevent = pci_uevent,
- .probe = pci_device_probe,
- .remove = pci_device_remove,
- .suspend = pci_device_suspend,
- .suspend_late = pci_device_suspend_late,
- .resume_early = pci_device_resume_early,
- .resume = pci_device_resume,
- .shutdown = pci_device_shutdown,
- .dev_attrs = pci_dev_attrs,
- };
Next, driver_register(&drv->driver) is called, which registers the struct device_driver driver member of this PCI driver with the system.
pci_register_driver()->__pci_register_driver()->driver_register()
The code of driver_register() is:
- /**
- * driver_register - register driver with bus
- * @drv: driver to register
- *
- * We pass off most of the work to the bus_add_driver() call,
- * since most of the things we have to do deal with the bus
- * structures.
- *
- * The one interesting aspect is that we setup @drv->unloaded
- * as a completion that gets complete when the driver reference
- * count reaches 0.
- */
- int driver_register(struct device_driver * drv)
- {
- if ((drv->bus->probe && drv->probe) ||
- (drv->bus->remove && drv->remove) ||
- (drv->bus->shutdown && drv->shutdown)) {
- printk(KERN_WARNING "Driver '%s' needs updating - please use bus_type methods\n", drv->name);
- }
- klist_init(&drv->klist_devices, NULL, NULL);
- init_completion(&drv->unloaded);
- return bus_add_driver(drv);
- }
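klist_init() initializes the driver's klist_devices member. klist_devices is a wrapper structure around a linked list; it will link together the devices handled by this driver.
Finally, bus_add_driver() is called. Its job is to add the driver to the driver list of the bus it belongs to.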
pci_register_driver()->__pci_register_driver()->driver_register()->bus_add_driver()->driver_attach()
The most important thing bus_add_driver() does is call driver_attach(), which is defined as:
- /**
- * driver_attach - try to bind driver to devices.
- * @drv: driver.
- *
- * Walk the list of devices that the bus has on it and try to
- * match the driver with each one. If driver_probe_device()
- * returns 0 and the @dev->driver is set, we've found a
- * compatible pair.
- */
- int driver_attach(struct device_driver * drv)
- {
- return bus_for_each_dev(drv->bus, NULL, drv, __driver_attach);
- }
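This function walks all devices on the driver's bus and tries to match each of them against the current driver, to find out whether the driver can support the device; in other words, it ties devices to drivers.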
bus_for_each_dev() scans all devices on the bus drv->bus and passes each device, together with the current driver, to __driver_attach().
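The driver_attach() walk can be modeled in a few lines of user-space C. This is a deliberately simplified, hypothetical model (no locking, klists, reference counts or real probe routine), and the toy_* names and id_match are invented. It keeps the two properties described above: the bus iterator visits every device until a callback returns nonzero, and the __driver_attach()-style callback skips devices that already have a driver and always returns 0, so the whole bus is scanned.

```c
#include <stddef.h>

struct bus;

struct device_driver {
    const char *name;
    unsigned supported_id;   /* stand-in for the id_table */
    struct bus *bus;
    int bound;               /* number of devices bound so far */
};

struct device {
    const char *name;
    unsigned id;                    /* stand-in for the PCI IDs */
    struct device_driver *driver;   /* NULL while unbound */
};

struct bus {
    struct device *devs;
    size_t ndevs;
    int (*match)(struct device *, struct device_driver *);
};

/* Like bus_for_each_dev(): stop early only if fn returns nonzero. */
static int toy_bus_for_each_dev(struct bus *bus, void *data,
                                int (*fn)(struct device *, void *))
{
    int error = 0;
    for (size_t i = 0; i < bus->ndevs && !error; i++)
        error = fn(&bus->devs[i], data);
    return error;
}

/* Like driver_probe_device(): ask the bus's match first, then "probe"
 * (here simply record the binding). */
static int toy_probe_device(struct device_driver *drv, struct device *dev)
{
    if (drv->bus->match && !drv->bus->match(dev, drv))
        return 0;
    dev->driver = drv;
    drv->bound++;
    return 1;
}

/* Like __driver_attach(): skip bound devices, always return 0. */
static int toy_driver_attach_one(struct device *dev, void *data)
{
    struct device_driver *drv = data;
    if (!dev->driver)
        toy_probe_device(drv, dev);
    return 0;
}

static int toy_driver_attach(struct device_driver *drv)
{
    return toy_bus_for_each_dev(drv->bus, drv, toy_driver_attach_one);
}

/* Toy stand-in for pci_bus_match(). */
static int id_match(struct device *dev, struct device_driver *drv)
{
    return dev->id == drv->supported_id;
}
```

Attaching a toy "e100" driver to a bus holding two matching devices and one non-matching device binds both matches and leaves the third unbound.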
pci_register_driver()->__pci_register_driver()->driver_register()->bus_add_driver()->driver_attach()->__driver_attach()
__driver_attach() is the function that ties a driver to a device.
- static int __driver_attach(struct device * dev, void * data)
- {
- struct device_driver * drv = data;
- /*
- * Lock device and try to bind to it. We drop the error
- * here and always return 0, because we need to keep trying
- * to bind to devices and some drivers will return an error
- * simply if it didn't support the device.
- *
- * driver_probe_device() will spit a warning if there
- * is an error.
- */
- if (dev->parent) /* Needed for USB */
- down(&dev->parent->sem);
- down(&dev->sem);
- if (!dev->driver)
- driver_probe_device(drv, dev);
- up(&dev->sem);
- if (dev->parent)
- up(&dev->parent->sem);
- return 0;
- }
The function contains these two statements:
- if (!dev->driver)
- driver_probe_device(drv, dev);
That is, it checks whether the current device is already bound to a driver; if not, it calls driver_probe_device().
pci_register_driver()->__pci_register_driver()->driver_register()->bus_add_driver()->driver_attach()->__driver_attach()->driver_probe_device()
Its code is:
- /**
- * driver_probe_device - attempt to bind device & driver together
- * @drv: driver to bind a device to
- * @dev: device to try to bind to the driver
- *
- * First, we call the bus's match function, if one present, which should
- * compare the device IDs the driver supports with the device IDs of the
- * device. Note we don't do this ourselves because we don't know the
- * format of the ID structures, nor what is to be considered a match and
- * what is not.
- *
- * This function returns 1 if a match is found, an error if one occurs
- * (that is not -ENODEV or -ENXIO), and 0 otherwise.
- *
- * This function must be called with @dev->sem held. When called for a
- * USB interface, @dev->parent->sem must be held as well.
- */
- int driver_probe_device(struct device_driver * drv, struct device * dev)
- {
- struct stupid_thread_structure *data;
- struct task_struct *probe_task;
- int ret = 0;
- if (!device_is_registered(dev))
- return -ENODEV;
- if (drv->bus->match && !drv->bus->match(dev, drv))
- goto done;
- pr_debug("%s: Matched Device %s with Driver %s\n",
- drv->bus->name, dev->bus_id, drv->name);
- data = kmalloc(sizeof(*data), GFP_KERNEL);
- if (!data)
- return -ENOMEM;
- data->drv = drv;
- data->dev = dev;
- if (drv->multithread_probe) {
- probe_task = kthread_run(really_probe, data,
- "probe-%s", dev->bus_id);
- if (IS_ERR(probe_task))
- ret = really_probe(data);
- } else
- ret = really_probe(data);
- done:
- return ret;
- }
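This function first calls the bus's match function to determine whether the current PCI driver can support this PCI device; if it can, execution continues.
drv->bus->match is the match member of pci_bus_type, namely pci_bus_match().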
pci_register_driver()->__pci_register_driver()->driver_register()->bus_add_driver()->driver_attach()->__driver_attach()->driver_probe_device()->pci_bus_match()
- /**
- * pci_bus_match - Tell if a PCI device structure has a matching PCI device id structure
- * @dev: the PCI device structure to match against
- * @drv: the device driver to search for matching PCI device id structures
- *
- * Used by a driver to check whether a PCI device present in the
- * system is in its list of supported devices. Returns the matching
- * pci_device_id structure or %NULL if there is no match.
- */
- static int pci_bus_match(struct device *dev, struct device_driver *drv)
- {
- struct pci_dev *pci_dev = to_pci_dev(dev);
- struct pci_driver *pci_drv = to_pci_driver(drv);
- const struct pci_device_id *found_id;
- found_id = pci_match_device(pci_drv, pci_dev);
- if (found_id)
- return 1;
- return 0;
- }
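pci_bus_match() compares a PCI device with a PCI driver to check whether the driver can support the device. At the top of the function are the two macros to_pci_dev and to_pci_driver. They are needed because, although everything started from a pci_driver structure and a pci_dev structure, the generic code along the way only handled their embedded device_driver and device members; the macros recover the addresses of the enclosing pci_driver and pci_dev structures from those members.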
#define to_pci_dev(n) container_of(n, struct pci_dev, dev)
#define to_pci_driver(drv) container_of(drv,struct pci_driver, driver)
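These two macros are explained in Linux Device Drivers, 3rd Edition. Here they give us E100's pci_driver (e100_driver) and the adapter's pci_dev structure. The next step is to compare the two to see whether they belong together, which is done by pci_match_device().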
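A user-space sketch of what to_pci_dev and to_pci_driver do: container_of subtracts the offset of the embedded member to get back to the enclosing structure. The structures below are hypothetical miniatures; the only property carried over from the kernel is that struct device and struct device_driver are embedded by value, not referenced through a pointer.

```c
#include <stddef.h>

/* User-space container_of: member address minus member offset gives the
 * enclosing structure. (The kernel version adds a typeof-based type
 * check, omitted here to stay in standard C.) */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniatures of the generic structures. */
struct device { int placeholder; };
struct device_driver { const char *name; };

struct pci_dev {
    unsigned short vendor, device;
    struct device dev;              /* embedded generic device */
};

struct pci_driver {
    const char *name;
    struct device_driver driver;    /* embedded generic driver */
};

#define to_pci_dev(n)       container_of(n, struct pci_dev, dev)
#define to_pci_driver(drv)  container_of(drv, struct pci_driver, driver)
```

Given only a struct device * that points at pdev.dev, to_pci_dev() yields &pdev again, which is exactly what pci_bus_match() relies on.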
pci_register_driver()->__pci_register_driver()->driver_register()->bus_add_driver()->driver_attach()->__driver_attach()->driver_probe_device()->pci_bus_match()->pci_match_device()
- /**
- * pci_match_device - Tell if a PCI device structure has a matching PCI device id structure
- * @drv: the PCI driver to match against
- * @dev: the PCI device structure to match against
- *
- * Used by a driver to check whether a PCI device present in the
- * system is in its list of supported devices. Returns the matching
- * pci_device_id structure or %NULL if there is no match.
- */
- const struct pci_device_id *pci_match_device(struct pci_driver *drv,
- struct pci_dev *dev)
- {
- struct pci_dynid *dynid;
- /* Look at the dynamic ids first, before the static ones */
- spin_lock(&drv->dynids.lock);
- list_for_each_entry(dynid, &drv->dynids.list, node) {
- if (pci_match_one_device(&dynid->id, dev)) {
- spin_unlock(&drv->dynids.lock);
- return &dynid->id;
- }
- }
- spin_unlock(&drv->dynids.lock);
- return pci_match_id(drv->id_table, dev);
- }
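pci_match_device() compares a PCI device with the IDs of a PCI driver to see whether they match: it first checks the driver's dynamic IDs (dynids) with pci_match_one_device(), and then the static id_table through pci_match_id(). On a match it returns a pointer to the matching struct pci_device_id, otherwise NULL.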
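The lookup order in pci_match_device (dynamic IDs first, then the static table terminated by an all-zero entry) can be sketched as follows. toy_id, toy_dev and the toy_match_* functions are hypothetical; for brevity the matcher compares only vendor and device, and the sentinel test checks only vendor and class_mask (the kernel's pci_match_id also checks subvendor).

```c
#include <stddef.h>

/* Hypothetical, simplified ID record and device. */
struct toy_id  { unsigned vendor, device, class_mask; };
struct toy_dev { unsigned vendor, device; };

/* Simplified matcher: vendor and device only. */
static int toy_match_one(const struct toy_id *id, const struct toy_dev *dev)
{
    return id->vendor == dev->vendor && id->device == dev->device;
}

/* Like pci_match_id(): walk a static table until the zero sentinel. */
static const struct toy_id *toy_match_id(const struct toy_id *ids,
                                         const struct toy_dev *dev)
{
    if (ids) {
        for (; ids->vendor || ids->class_mask; ids++)
            if (toy_match_one(ids, dev))
                return ids;
    }
    return NULL;
}

/* Like pci_match_device(): dynamic IDs are consulted before id_table. */
static const struct toy_id *toy_match_device(const struct toy_id *dynids,
                                             size_t ndyn,
                                             const struct toy_id *id_table,
                                             const struct toy_dev *dev)
{
    for (size_t i = 0; i < ndyn; i++)
        if (toy_match_one(&dynids[i], dev))
            return &dynids[i];
    return toy_match_id(id_table, dev);
}
```

Because dynamic IDs are checked first, an ID added at run time (for example through the driver's new_id file in sysfs) takes precedence over an identical entry in id_table.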
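If the PCI driver has found a matching PCI device, the match succeeds and control returns to driver_probe_device(), which at the end calls really_probe(), passing it the device_driver and device pointers (packed into data). The following lines of really_probe() call the bus's or the driver's probe function to probe the device.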
pci_register_driver()->__pci_register_driver()->driver_register()->bus_add_driver()->driver_attach()->__driver_attach()->driver_probe_device()->really_probe()
In really_probe():
- if (dev->bus->probe) {
- ret = dev->bus->probe(dev);
- if (ret)
- goto probe_failed;
- } else if (drv->probe) {
- ret = drv->probe(dev);
- if (ret)
- goto probe_failed;
- }
Here dev->bus is pci_bus_type, whose probe function is pci_device_probe().
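The precedence in really_probe() (bus probe first, driver probe only as a fallback) is easy to see in a hypothetical miniature; all toy_* names and the two probe stubs are invented for illustration.

```c
#include <stddef.h>

struct toy_dev;
typedef int (*probe_fn)(struct toy_dev *);

struct toy_bus    { probe_fn probe; };
struct toy_driver { probe_fn probe; };
struct toy_dev    { struct toy_bus *bus; const char *called; };

/* Same dispatch order as really_probe(): a bus-level probe wins;
 * the generic driver's probe is used only when the bus has none. */
static int toy_really_probe(struct toy_dev *dev, struct toy_driver *drv)
{
    int ret = 0;
    if (dev->bus->probe)
        ret = dev->bus->probe(dev);
    else if (drv->probe)
        ret = drv->probe(dev);
    return ret;
}

/* Stubs that record which path was taken. */
static int bus_probe(struct toy_dev *dev)    { dev->called = "bus";    return 0; }
static int driver_probe(struct toy_dev *dev) { dev->called = "driver"; return 0; }
```

For PCI the bus method always exists (pci_device_probe), so the pci_driver's own probe, e100_probe here, is reached indirectly through it rather than through device_driver.probe.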
pci_register_driver()->__pci_register_driver()->driver_register()->bus_add_driver()->driver_attach()->__driver_attach()->driver_probe_device()->really_probe()->pci_device_probe()
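Likewise, this function obtains the pci_dev pointer of the current PCI device and the pci_driver pointer of the PCI driver, using the macros to_pci_dev and to_pci_driver respectively, and then calls __pci_device_probe(). That function in turn calls pci_call_probe(), the last function in the chain.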
pci_register_driver()->__pci_register_driver()->driver_register()->bus_add_driver()->driver_attach()->__driver_attach()->driver_probe_device()->really_probe()->pci_device_probe()->__pci_device_probe()->pci_call_probe()
pci_call_probe() contains this statement:
- static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
- const struct pci_device_id *id)
- {
- int error;
- /* ... omitted ... */
- error = drv->probe(dev, id);
- /* ... omitted ... */
- return error;
- }
This is where the pci_driver's probe function is called. For the E100 driver it is e100_probe, registered at the very beginning, which initializes the adapter's net_device and performs related setup.
pci_register_driver()->__pci_register_driver()->driver_register()->bus_add_driver()->driver_attach()->__driver_attach()->driver_probe_device()->really_probe()->pci_device_probe()->__pci_device_probe()->pci_call_probe()->e100_probe()
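This concludes our analysis of the PCI-layer initialization of the network driver; what remains is the driver's initialization of the network device itself.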
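2.4 Function Call Flowchart
Here I drew a flowchart of the function calls involved in registering a network adapter with the PCI layer, to show more directly the call path from registration down to the driver's own adapter initialization.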
As a supplement, here are my notes on how the open function gets called:
86, How ifconfig eth0 up ends up calling net_device->open: the inside story
# strace ifconfig eth0 up 2>&1 |less -N
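The trace shows that ifconfig first creates a socket file descriptor with sockfd = socket(AF_INET, SOCK_DGRAM, 0),
and then calls ioctl(sockfd, SIOCSIFFLAGS, ...) with the IFF_UP flag added. This is what causes the open method to be called.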
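The same sequence seen in the strace output can be reproduced with a short C program. set_if_up is a hypothetical helper; the calls it uses (socket(2), ioctl(2) with SIOCGIFFLAGS/SIOCSIFFLAGS, and struct ifreq from <net/if.h>) are the standard Linux interfaces. The sketch reads the current flags with SIOCGIFFLAGS first so that only IFF_UP changes.

```c
#define _DEFAULT_SOURCE
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bring an interface up (up != 0) or down (up == 0), roughly as
 * "ifconfig <ifname> up/down" does. Returns 0 on success, -1 on failure.
 * Changing the flags requires CAP_NET_ADMIN. */
static int set_if_up(const char *ifname, int up)
{
    struct ifreq ifr;
    int fd, ret = -1;

    if (strlen(ifname) >= IFNAMSIZ)
        return -1;                      /* name would not fit in ifr_name */
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

    /* Any socket works as a handle for interface ioctls. */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (ioctl(fd, SIOCGIFFLAGS, &ifr) == 0) {   /* read current flags */
        if (up)
            ifr.ifr_flags |= IFF_UP;
        else
            ifr.ifr_flags &= ~IFF_UP;
        ret = ioctl(fd, SIOCSIFFLAGS, &ifr);    /* write them back */
    }
    close(fd);
    return ret;
}
```

On success the kernel side continues through sock_ioctl(), dev_ioctl() and dev_change_flags() into dev_open(), as traced in the following notes.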
All socket file descriptors are created by the socket(2) system call:
sys_socket() > sock_map_fd() > sock_attach_fd() :
dentry->d_op = &sockfs_dentry_operations;
...
init_file(file, sock_mnt, dentry, FMODE_READ|FMODE_WRITE, &socket_file_ops);
SOCK_INODE(sock)->i_fop = &socket_file_ops;
(Recall: isn't this similar to the assignment to inode->i_fop in ext3_iget()?)
static const struct file_operations socket_file_ops = {
.owner = THIS_MODULE,
.llseek = no_llseek,
.aio_read = sock_aio_read,
.aio_write = sock_aio_write,
.poll = sock_poll,
.unlocked_ioctl = sock_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = compat_sock_ioctl,
#endif
.mmap = sock_mmap,
.open = sock_no_open, /* special open code to disallow open via /proc */
.release = sock_close,
.fasync = sock_fasync,
.sendpage = sock_sendpage,
.splice_write = generic_splice_sendpage,
.splice_read = sock_splice_read,
};
Its unlocked_ioctl is sock_ioctl, so let us follow sock_ioctl:
sock_ioctl() --falls through to the default case--> dev_ioctl() > --SIOCSIFFLAGS--> dev_ifsioc()
> dev_change_flags() :
if ((old_flags ^ flags) & IFF_UP) {	/* Bit is different ? */
ret = ((old_flags & IFF_UP) ? dev_close : dev_open)(dev);
If IFF_UP was set, dev_open is called; if IFF_UP was cleared, dev_close is called. Inside dev_open we find:
ret = dev->open(dev);
This is the point at which the open method of struct net_device is called.
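The decision in dev_change_flags() relies on XOR to find the flags that changed. A hypothetical stand-alone version, with IFF_UP redefined as TOY_IFF_UP = 0x1 (its value in <linux/if.h>) and an invented enum for the three outcomes:

```c
/* IFF_UP is 0x1 in <linux/if.h>; redefined so the sketch is standalone. */
#define TOY_IFF_UP 0x1

enum flag_action { ACTION_NONE, ACTION_OPEN, ACTION_CLOSE };

/* Same decision as dev_change_flags(): XOR exposes the bits that changed.
 * Only if IFF_UP is among them is dev_open()/dev_close() needed, and the
 * old state tells which one. */
static enum flag_action up_transition(unsigned old_flags, unsigned flags)
{
    if ((old_flags ^ flags) & TOY_IFF_UP)
        return (old_flags & TOY_IFF_UP) ? ACTION_CLOSE : ACTION_OPEN;
    return ACTION_NONE;
}
```

Changing other flags while the interface stays up (for example toggling promiscuous mode) therefore never re-runs open.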
2). When is the remove function called?
When the pci_dev disappears (the device is unplugged), or when the module is removed with rmmod.
pci_unregister_driver() > driver_unregister() > driver_detach() > __device_release_driver():
if (dev->bus && dev->bus->remove)
dev->bus->remove(dev);
else if (drv->remove)
drv->remove(dev);
For a PCI device, dev->bus here is &pci_bus_type; from the pci_bus_type definition shown earlier, we know its remove function is
pci_device_remove():
struct pci_dev * pci_dev = to_pci_dev(dev);
struct pci_driver * drv = pci_dev->driver;
if (drv) {
if (drv->remove)
drv->remove(pci_dev);
pci_dev->driver = NULL;
}
What happens when a PCI device is added or removed.
(Only enumeration at boot time and hotplug can make a device appear or disappear.)
Discovering a PCI device:
[ pci_scan_slot() > pci_scan_single_device() > pci_scan_device()
> pci_device_add() ]
pci_bus_add_devices() > pci_bus_add_device() > device_add() > bus_attach_device() > device_attach() :
int device_attach(struct device *dev)
{
int ret = 0;
down(&dev->sem);
if (dev->driver) {
ret = device_bind_driver(dev);
if (ret == 0)
ret = 1;
else {
dev->driver = NULL;
ret = 0;
}
} else {
ret = bus_for_each_drv(dev->bus, NULL, dev, __device_attach);
}
up(&dev->sem);
return ret;
}
In other words, if dev->driver is already set, the device is bound to that driver directly; otherwise:
bus_for_each_drv() > __device_attach() > driver_probe_device() > really_probe()
From here on, things proceed exactly as in the path from pci_register_driver() down to really_probe().
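The device-side walk differs from the driver-side one in when it stops: bus_for_each_drv() ends as soon as a callback returns nonzero, and __device_attach() returns nonzero once a driver is bound, so a newly discovered device is bound to the first matching driver only. A hypothetical user-space model (toy_* names invented, matching reduced to a single ID, and the "already has dev->driver" branch simplified to an early return):

```c
#include <stddef.h>

struct toy_driver { const char *name; unsigned id; };
struct toy_device { unsigned id; const struct toy_driver *driver; };

/* Like __device_attach(): 0 means "no match, keep looking",
 * 1 means "bound, stop the walk". */
static int toy_device_attach_one(const struct toy_driver *drv,
                                 struct toy_device *dev)
{
    if (drv->id != dev->id)
        return 0;
    dev->driver = drv;
    return 1;
}

/* Like device_attach() + bus_for_each_drv(): try drivers in order and
 * stop at the first one that binds. */
static int toy_device_attach(const struct toy_driver *drvs, size_t n,
                             struct toy_device *dev)
{
    int ret = 0;
    if (dev->driver)            /* driver preset: treat as bound */
        return 1;
    for (size_t i = 0; i < n && !ret; i++)
        ret = toy_device_attach_one(&drvs[i], dev);
    return ret;
}
```

Contrast this with __driver_attach(), which always returns 0 so that a newly registered driver is offered every unbound device on the bus.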
remove:
=======
pci_remove_bus_device() > pci_destroy_dev() > pci_stop_dev() > device_unregister() >
device_del() > bus_remove_device() > device_release_driver() > __device_release_driver() :
static void __device_release_driver(struct device *dev)
{
struct device_driver *drv;
drv = dev->driver;
if (drv) {
driver_sysfs_remove(dev);
sysfs_remove_link(&dev->kobj, "driver");
if (dev->bus)
blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
BUS_NOTIFY_UNBIND_DRIVER,
dev);
if (dev->bus && dev->bus->remove)
dev->bus->remove(dev);
else if (drv->remove)
drv->remove(dev);
devres_release_all(dev);
dev->driver = NULL;
klist_remove(&dev->knode_driver);
}
}
Note that the remove method of pci_bus_type (a.k.a. pci_device_remove) is tried first if it exists; otherwise the remove method of the
device_driver is called.