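Device Registration and Initialization
Device registration
Network device registration takes place in the following situations:
- Loading a NIC device driver
When a NIC device driver is initialized, every NIC controlled by that driver is registered.
- Inserting a hot-pluggable network device
As earlier chapters showed, loading a PCI device driver causes the pci_driver->probe function to run. The probe function is provided by the driver and is responsible for registering the device.
Device unregistration
The following situations trigger device unregistration:
- Unloading a NIC device driver
This applies only to drivers loaded as modules, not to drivers built into the kernel.
- Removing a hot-pluggable device
Allocating the net_device structure
The kernel allocates struct net_device with alloc_etherdev_mqs, which calls alloc_netdev_mqs to do the actual allocation.
The first argument of alloc_netdev_mqs is the size of the private data area appended for the driver; the driver can use this area to store its own parameters.
The second argument is the device name. alloc_etherdev_mqs passes the naming template eth%d.
The setup function argument initializes some of the net_device fields.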
/**
* alloc_netdev_mqs - allocate network device
* @sizeof_priv: size of private data to allocate space for
* @name: device name format string
* @name_assign_type: origin of device name
* @setup: callback to initialize device
* @txqs: the number of TX subqueues to allocate
* @rxqs: the number of RX subqueues to allocate
*
* Allocates a struct net_device with private data area for driver use
* and performs basic initialization. Also allocates subqueue structs
* for each queue on the device.
*/
struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
unsigned char name_assign_type,
void (*setup)(struct net_device *),
unsigned int txqs, unsigned int rxqs)
alloc_netdev_mqs is usually called through a wrapper function. Ethernet devices, for example, allocate their net_device with alloc_etherdev_mqs.
struct net_device *alloc_etherdev_mqs(int sizeof_priv, unsigned int txqs,
				      unsigned int rxqs)
{
	return alloc_netdev_mqs(sizeof_priv, "eth%d", NET_NAME_UNKNOWN,
				ether_setup, txqs, rxqs);
}
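To make the role of sizeof_priv and the setup callback more concrete, here is a minimal user-space sketch. It is not kernel code: struct demo_netdev, demo_alloc_netdev, demo_netdev_priv and the 32-byte alignment are assumptions chosen for illustration. It only shows the general idea of allocating one block that holds the device structure followed by a driver-private area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_ALIGN 32u	/* assumed alignment of the private area */
#define DEMO_ALIGN_UP(x) (((x) + DEMO_ALIGN - 1) & ~((size_t)DEMO_ALIGN - 1))

/* Hypothetical stand-in for struct net_device. */
struct demo_netdev {
	char name[16];
	unsigned int mtu;
};

/* Allocate the device plus sizeof_priv bytes of driver-private space. */
static struct demo_netdev *demo_alloc_netdev(size_t sizeof_priv,
					     void (*setup)(struct demo_netdev *))
{
	size_t size = DEMO_ALIGN_UP(sizeof(struct demo_netdev)) + sizeof_priv;
	struct demo_netdev *dev = calloc(1, size);

	if (dev && setup)
		setup(dev);	/* fill in type-specific defaults */
	return dev;
}

/* The private area starts right after the aligned device structure. */
static void *demo_netdev_priv(struct demo_netdev *dev)
{
	assert(dev != NULL);
	return (char *)dev + DEMO_ALIGN_UP(sizeof(struct demo_netdev));
}

/* Plays the role of ether_setup: set Ethernet-style defaults. */
static void demo_ether_setup(struct demo_netdev *dev)
{
	dev->mtu = 1500;
}
```

A caller would pass the size of its own private structure, then fetch a pointer to it with demo_netdev_priv and release everything with a single free.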
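NIC registration and unregistration framework
Device registration has two key steps:
- Allocate the net_device structure with alloc_etherdev, which also initializes the parameters common to Ethernet devices.
- Call register_netdev to register the device.
Device unregistration has two key steps:
- Call unregister_netdev to unregister the device.
- Call free_netdev to release the allocated net_device.
Device initialization
When an Ethernet device allocates its net_device, ether_setup initializes some of its fields.
header_ops holds the functions that operate on L2 (link-layer) headers.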
/**
* ether_setup - setup Ethernet network device
* @dev: network device
*
* Fill in the fields of the device structure with Ethernet-generic values.
*/
void ether_setup(struct net_device *dev)
{
	dev->header_ops = &eth_header_ops;
	dev->type = ARPHRD_ETHER;
	dev->hard_header_len = ETH_HLEN;
	dev->min_header_len = ETH_HLEN;
	dev->mtu = ETH_DATA_LEN;
	dev->min_mtu = ETH_MIN_MTU;
	dev->max_mtu = ETH_DATA_LEN;
	dev->addr_len = ETH_ALEN;
	dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;
	dev->flags = IFF_BROADCAST|IFF_MULTICAST;
	dev->priv_flags |= IFF_TX_SKB_SHARING;
	eth_broadcast_addr(dev->broadcast);
}
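The driver itself initializes the netdev_ops and ethtool_ops fields.
netdev_ops holds the functions that may be used to manage the NIC.
ethtool_ops holds optional NIC device operations.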
netdev->netdev_ops = &e100_netdev_ops;
netdev->ethtool_ops = &e100_ethtool_ops;
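Many of the functions mentioned above do not have to be provided. The corresponding function pointers are then NULL, so callers must check them before use.
Organization of net_device structures
Each net_device is inserted into a global list and two hash tables.
dev_list links the net_device structures in the kernel into a list.
name_hlist links them into a hash table keyed by interface name.
index_hlist links them into a hash table keyed by ifindex.
These structures let the kernel look up a net_device in whichever way it needs.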
struct hlist_node name_hlist;
struct hlist_node index_hlist;
struct list_head dev_list;
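The idea of one object being reachable through a list and two hash tables can be sketched in user space as follows. All names (struct demo_dev, demo_list_dev, demo_get_by_name, demo_get_by_index) and the bucket count are hypothetical, and the sketch uses plain singly linked chains with head insertion rather than the kernel's list_head and hlist_node types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_HASH_SIZE 8u

/* Hypothetical miniature of net_device: one node, three linkages. */
struct demo_dev {
	char name[16];
	int ifindex;
	struct demo_dev *next_all;	/* plays the role of dev_list */
	struct demo_dev *next_name;	/* plays the role of name_hlist */
	struct demo_dev *next_index;	/* plays the role of index_hlist */
};

static struct demo_dev *all_devs;
static struct demo_dev *name_hash[DEMO_HASH_SIZE];
static struct demo_dev *index_hash[DEMO_HASH_SIZE];

static unsigned int demo_name_bucket(const char *name)
{
	unsigned int h = 0;

	while (*name)
		h = h * 31 + (unsigned char)*name++;
	return h % DEMO_HASH_SIZE;
}

/* Link the device into the global list and both hash tables. */
static void demo_list_dev(struct demo_dev *dev)
{
	unsigned int nb, ib;

	assert(dev != NULL);
	nb = demo_name_bucket(dev->name);
	ib = (unsigned int)dev->ifindex % DEMO_HASH_SIZE;
	dev->next_all = all_devs;
	all_devs = dev;
	dev->next_name = name_hash[nb];
	name_hash[nb] = dev;
	dev->next_index = index_hash[ib];
	index_hash[ib] = dev;
}

static struct demo_dev *demo_get_by_name(const char *name)
{
	struct demo_dev *d = name_hash[demo_name_bucket(name)];

	while (d && strcmp(d->name, name) != 0)
		d = d->next_name;
	return d;
}

static struct demo_dev *demo_get_by_index(int ifindex)
{
	struct demo_dev *d = index_hash[(unsigned int)ifindex % DEMO_HASH_SIZE];

	while (d && d->ifindex != ifindex)
		d = d->next_index;
	return d;
}
```

Lookup by name or by ifindex walks only one bucket, while code that has to visit every device walks the global list.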
Device state
Fields of the net_device structure related to device state:
unsigned long state;//@state:Generic network queuing layer state, see netdev_state_t
unsigned int flags;//@flags:Interface flags (a la BSD)
enum { NETREG_UNINITIALIZED=0,
NETREG_REGISTERED, /* completed register_netdevice */
NETREG_UNREGISTERING, /* called unregister_netdevice */
NETREG_UNREGISTERED, /* completed unregister todo */
NETREG_RELEASED, /* called free_netdev */
NETREG_DUMMY, /* dummy device for NAPI poll */
} reg_state:8;//Register/unregister state machine
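Queuing discipline state
Every network device is assigned a queuing discipline, which traffic control uses to implement QoS. The state field of net_device is one of the fields used by traffic control.
The following flags can be set in state:
- __LINK_STATE_START
The device is up. This can be tested with netif_running.
- __LINK_STATE_PRESENT
The device is present. Hot-pluggable devices can be removed temporarily. The flag is also cleared and then restored when the system suspends and resumes.
- __LINK_STATE_NOCARRIER
The NIC has no carrier, so the link is down.
- __LINK_STATE_LINKWATCH_PENDING
- __LINK_STATE_DORMANT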
/* These flag bits are private to the generic network queueing
* layer; they may not be explicitly referenced by any other
* code.
*/
enum netdev_state_t {
__LINK_STATE_START,
__LINK_STATE_PRESENT,
__LINK_STATE_NOCARRIER,
__LINK_STATE_LINKWATCH_PENDING,
__LINK_STATE_DORMANT,
};
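The enum members above are bit numbers inside the state word, not masks. The following user-space sketch illustrates that usage. The demo_* helpers are non-atomic stand-ins written for this example (the kernel uses atomic set_bit, clear_bit, test_bit and test_and_clear_bit), and the enum here copies only the first three members.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers, in the same order as the first members of netdev_state_t. */
enum demo_state_bit {
	DEMO_STATE_START,
	DEMO_STATE_PRESENT,
	DEMO_STATE_NOCARRIER,
};

static void demo_set_bit(int nr, unsigned long *word)
{
	assert(nr >= 0 && nr < (int)(sizeof(*word) * 8));
	*word |= 1UL << nr;
}

static void demo_clear_bit(int nr, unsigned long *word)
{
	assert(nr >= 0 && nr < (int)(sizeof(*word) * 8));
	*word &= ~(1UL << nr);
}

static bool demo_test_bit(int nr, const unsigned long *word)
{
	return (*word >> nr) & 1UL;
}

/* Clear the bit and report whether it was set before. */
static bool demo_test_and_clear_bit(int nr, unsigned long *word)
{
	bool old = demo_test_bit(nr, word);

	demo_clear_bit(nr, word);
	return old;
}

/* Same idea as netif_running(): is the START bit set? */
static bool demo_running(const unsigned long *state)
{
	return demo_test_bit(DEMO_STATE_START, state);
}
```

The test-and-clear form matters later in this document: helpers such as netif_device_detach act only when the bit actually changed.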
Registration state
A network device's registration state is stored in the reg_state field.
enum { NETREG_UNINITIALIZED=0,
NETREG_REGISTERED, /* completed register_netdevice */
NETREG_UNREGISTERING, /* called unregister_netdevice */
NETREG_UNREGISTERED, /* completed unregister todo */
NETREG_RELEASED, /* called free_netdev */
NETREG_DUMMY, /* dummy device for NAPI poll */
} reg_state:8;
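Registering and unregistering devices
Network device drivers register and unregister devices with the kernel through register_netdev and unregister_netdev.
Device registration
register_netdev calls register_netdevice to do the actual work.
register_netdevice calls dev_get_valid_name to finalize the interface name. alloc_etherdev_mqs sets the name to "eth%d" when it allocates the net_device, and dev_get_valid_name replaces %d with an interface number.
If dev->netdev_ops->ndo_init is set, that callback is invoked.
A notification is sent on the notifier chain.
The interface is registered in sysfs.
The registration state of the net_device is set to NETREG_REGISTERED.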
ret = dev_get_valid_name(net, dev, dev->name);
if (ret < 0)
	goto out;
/* Init, if this function is available */
if (dev->netdev_ops->ndo_init) {
	ret = dev->netdev_ops->ndo_init(dev);
	if (ret) {
		if (ret > 0)
			ret = -EIO;
		goto out;
	}
}
...
ret = call_netdevice_notifiers(NETDEV_POST_INIT, dev);
ret = notifier_to_errno(ret);
if (ret)
	goto err_uninit;
ret = netdev_register_kobject(dev);
if (ret) {
	dev->reg_state = NETREG_UNREGISTERED;
	goto err_uninit;
}
dev->reg_state = NETREG_REGISTERED;
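The name expansion step described above (turning a template such as "eth%d" into a concrete, unused name) can be illustrated with a small user-space sketch. The function demo_alloc_name, its parameters and the limit of 64 units are assumptions made for this example; the kernel's own implementation is more involved and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define DEMO_IFNAMSIZ 16
#define DEMO_MAX_UNITS 64

/*
 * Expand a template such as "eth%d" into the first name that does not
 * appear in `taken`. Returns the chosen unit number, or -1 if none of
 * the first DEMO_MAX_UNITS candidates is free.
 */
static int demo_alloc_name(const char *fmt, const char *const *taken,
			   size_t ntaken, char out[DEMO_IFNAMSIZ])
{
	assert(fmt != NULL && out != NULL);
	for (int unit = 0; unit < DEMO_MAX_UNITS; unit++) {
		bool in_use = false;

		snprintf(out, DEMO_IFNAMSIZ, fmt, unit);
		for (size_t i = 0; i < ntaken; i++) {
			if (strcmp(out, taken[i]) == 0) {
				in_use = true;
				break;
			}
		}
		if (!in_use)
			return unit;	/* `out` holds the chosen name */
	}
	out[0] = '\0';
	return -1;
}
```

With eth0, eth1 and eth3 already present, the sketch picks eth2, the lowest free unit.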
list_netdevice inserts the net_device into the global list and the two hash tables.
list_netdevice(dev);
static void list_netdevice(struct net_device *dev)
{
	struct net *net = dev_net(dev);
	ASSERT_RTNL();
	write_lock_bh(&dev_base_lock);
	list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
	hlist_add_head_rcu(&dev->name_hlist, dev_name_hash(net, dev->name));
	hlist_add_head_rcu(&dev->index_hlist,
			   dev_index_hash(net, dev->ifindex));
	write_unlock_bh(&dev_base_lock);
	dev_base_seq_inc(net);
}
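dev_init_scheduler initializes the device's queuing discipline, which provides QoS. The queuing discipline defines how egress packets are enqueued on and dequeued from the egress queue, and how many packets may wait in the queue before packets start being dropped.
netdev_run_todo
register_netdevice does part of the registration work and leaves the rest to netdev_run_todo.
Changes to net_device structures must be protected by the rtnl_mutex (Routing Netlink) lock. register_netdev therefore takes the lock with rtnl_lock_killable before calling register_netdevice and releases it afterwards.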
int register_netdev(struct net_device *dev)
{
	int err;
	if (rtnl_lock_killable())
		return -EINTR;
	err = register_netdevice(dev);
	rtnl_unlock();
	return err;
}
rtnl_unlock calls netdev_run_todo. Why does netdev_run_todo have to be called when the lock is released?
void rtnl_unlock(void)
{
	/* This fellow will unlock it for us. */
	netdev_run_todo();
}
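The code and comments of netdev_run_todo show that this design addresses the following problems:
- Deleting sysfs objects triggers hotplug events. Doing it here avoids a deadlock with linkwatch via keventd.
- Because this code runs without holding the RTNL semaphore, it can safely sleep while waiting for the netdev refcnt to drop to zero. It must not return until all unregister events have completed.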
/* The sequence is:
 *
 *	rtnl_lock();
 *	...
 *	register_netdevice(x1);
 *	register_netdevice(x2);
 *	...
 *	unregister_netdevice(y1);
 *	unregister_netdevice(y2);
 *	...
 *	rtnl_unlock();
 *	free_netdev(y1);
 *	free_netdev(y2);
 *
 * We are invoked by rtnl_unlock().
 * This allows us to deal with problems:
 * 1) We can delete sysfs objects which invoke hotplug
 *    without deadlocking with linkwatch via keventd.
 * 2) Since we run with the RTNL semaphore not held, we can sleep
 *    safely in order to wait for the netdev refcnt to drop to zero.
 *
 * We must not return until all unregister events added during
 * the interval the lock was held have been completed.
 */
void netdev_run_todo(void)
{
	struct list_head list;
	/* Snapshot list, allow later requests */
	list_replace_init(&net_todo_list, &list);
	__rtnl_unlock();
	/* Wait for rcu callbacks to finish before next phase */
	if (!list_empty(&list))
		rcu_barrier();
	while (!list_empty(&list)) {
		struct net_device *dev
			= list_first_entry(&list, struct net_device, todo_list);
		list_del(&dev->todo_list);
		if (unlikely(dev->reg_state != NETREG_UNREGISTERING)) {
			pr_err("network todo '%s' but state %d\n",
			       dev->name, dev->reg_state);
			dump_stack();
			continue;
		}
		dev->reg_state = NETREG_UNREGISTERED;
		netdev_wait_allrefs(dev);
		/* paranoia */
		BUG_ON(netdev_refcnt_read(dev));
		BUG_ON(!list_empty(&dev->ptype_all));
		BUG_ON(!list_empty(&dev->ptype_specific));
		WARN_ON(rcu_access_pointer(dev->ip_ptr));
		WARN_ON(rcu_access_pointer(dev->ip6_ptr));
#if IS_ENABLED(CONFIG_DECNET)
		WARN_ON(dev->dn_ptr);
#endif
		if (dev->priv_destructor)
			dev->priv_destructor(dev);
		if (dev->needs_free_netdev)
			free_netdev(dev);
		/* Report a network device has been unregistered */
		rtnl_lock();
		dev_net(dev)->dev_unreg_count--;
		__rtnl_unlock();
		wake_up(&netdev_unregistering_wq);
		/* Free network device */
		kobject_put(&dev->dev.kobj);
	}
}
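Device registration state notification
Registration, unregistration, down and up events of network devices are delivered through two notification channels:
- netdev_chain
- The Netlink RTMGRP_LINK multicast group
netdev_chain
Every stage of device registration and unregistration is reported through this notifier chain.
Kernel components subscribe to and unsubscribe from the chain with register_netdevice_notifier and unregister_netdevice_notifier.
Notifications are sent with call_netdevice_notifiers. The supported events are: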
/* netdevice notifier chain. Please remember to update netdev_cmd_to_name()
* and the rtnetlink notification exclusion list in rtnetlink_event() when
* adding new types.
*/
enum netdev_cmd {
NETDEV_UP = 1, /* For now you can't veto a device up/down */
NETDEV_DOWN,
NETDEV_REBOOT, /* Tell a protocol stack a network interface
detected a hardware crash and restarted
- we can use this eg to kick tcp sessions
once done */
NETDEV_CHANGE, /* Notify device state change */
NETDEV_REGISTER,
NETDEV_UNREGISTER,
NETDEV_CHANGEMTU, /* notify after mtu change happened */
NETDEV_CHANGEADDR,
NETDEV_GOING_DOWN,
NETDEV_CHANGENAME,
NETDEV_FEAT_CHANGE,
NETDEV_BONDING_FAILOVER,
NETDEV_PRE_UP,
NETDEV_PRE_TYPE_CHANGE,
NETDEV_POST_TYPE_CHANGE,
NETDEV_POST_INIT,
NETDEV_RELEASE,
NETDEV_NOTIFY_PEERS,
NETDEV_JOIN,
NETDEV_CHANGEUPPER,
NETDEV_RESEND_IGMP,
NETDEV_PRECHANGEMTU, /* notify before mtu change happened */
NETDEV_CHANGEINFODATA,
NETDEV_BONDING_INFO,
NETDEV_PRECHANGEUPPER,
NETDEV_CHANGELOWERSTATE,
NETDEV_UDP_TUNNEL_PUSH_INFO,
NETDEV_UDP_TUNNEL_DROP_INFO,
NETDEV_CHANGE_TX_QUEUE_LEN,
NETDEV_CVLAN_FILTER_PUSH_INFO,
NETDEV_CVLAN_FILTER_DROP_INFO,
NETDEV_SVLAN_FILTER_PUSH_INFO,
NETDEV_SVLAN_FILTER_DROP_INFO,
};
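When another subsystem subscribes with register_netdevice_notifier, the function replays the interfaces that already exist in the kernel to the new subscriber.
A newly registered subsystem therefore also learns the state of the system's interfaces.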
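The replay behavior described above can be sketched in user space as follows. Everything here is hypothetical (the demo_* names, the fixed-size arrays, the two event types and the counting subscriber); the sketch only shows why a late subscriber still learns about devices that were registered before it subscribed.

```c
#include <assert.h>
#include <stddef.h>

enum demo_event { DEMO_REGISTER = 1, DEMO_UP };

struct demo_dev {
	const char *name;
	int up;			/* non-zero if the interface is up */
};

typedef void (*demo_notifier_fn)(enum demo_event ev, const struct demo_dev *dev);

#define DEMO_MAX 8
static const struct demo_dev *demo_devs[DEMO_MAX];
static size_t demo_ndevs;
static demo_notifier_fn demo_chain[DEMO_MAX];
static size_t demo_nsubs;

/* Deliver one event to every subscriber currently on the chain. */
static void demo_call_notifiers(enum demo_event ev, const struct demo_dev *dev)
{
	for (size_t i = 0; i < demo_nsubs; i++)
		demo_chain[i](ev, dev);
}

/* A device shows up: remember it and tell the existing subscribers. */
static void demo_register_dev(const struct demo_dev *dev)
{
	assert(demo_ndevs < DEMO_MAX);
	demo_devs[demo_ndevs++] = dev;
	demo_call_notifiers(DEMO_REGISTER, dev);
}

/* A late subscriber first receives a replay for every known device. */
static void demo_register_notifier(demo_notifier_fn fn)
{
	assert(demo_nsubs < DEMO_MAX);
	for (size_t i = 0; i < demo_ndevs; i++) {
		fn(DEMO_REGISTER, demo_devs[i]);
		if (demo_devs[i]->up)
			fn(DEMO_UP, demo_devs[i]);
	}
	demo_chain[demo_nsubs++] = fn;
}

/* Example subscriber for the demo: counts what it has been told. */
static int demo_seen_register, demo_seen_up;

static void demo_counting_notifier(enum demo_event ev, const struct demo_dev *dev)
{
	(void)dev;
	if (ev == DEMO_REGISTER)
		demo_seen_register++;
	else if (ev == DEMO_UP)
		demo_seen_up++;
}
```

Subscribing after two devices exist yields two replayed register events, plus one up event for the device that is already up. Later registrations reach the subscriber through the normal path.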
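Kernel components that register with netdev_chain include:
- Routing
- Firewall
- Protocol code
- Virtual devices
- RTnetlink
Device unregistration
To unregister a device, the kernel does the following:
- Closes the device with dev_close.
- Releases all resources (I/O, IRQs, ports).
- Removes the net_device from the global list and the two hash tables.
- Frees the net_device structure once all references to it have been released.
- Removes the files added under /proc and sysfs.
The unregister_netdev function
Like register_netdev, unregister_netdev first takes the lock with rtnl_lock.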
void unregister_netdev(struct net_device *dev)
{
	rtnl_lock();
	unregister_netdevice(dev);
	rtnl_unlock();
}
EXPORT_SYMBOL(unregister_netdev);
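unregister_netdev, through unregister_netdevice, calls unregister_netdevice_queue to remove the device from the kernel. The remaining work is queued with net_set_todo and completed when rtnl_unlock runs.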
/**
* unregister_netdevice_queue - remove device from the kernel
* @dev: device
* @head: list
*
* This function shuts down a device interface and removes it
* from the kernel tables.
* If head not NULL, device is queued to be unregistered later.
*
* Callers must hold the rtnl semaphore. You may want
* unregister_netdev() instead of this.
*/
void unregister_netdevice_queue(struct net_device *dev, struct list_head *head)
{
	ASSERT_RTNL();
	if (head) {
		list_move_tail(&dev->unreg_list, head);
	} else {
		rollback_registered(dev);
		/* Finish processing unregister after unlock */
		net_set_todo(dev);
	}
}
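rollback_registered does the actual unregistration work. It is a thin wrapper around rollback_registered_many, which contains the logic: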
static void rollback_registered_many(struct list_head *head)
{
	struct net_device *dev, *tmp;
	LIST_HEAD(close_head);
	BUG_ON(dev_boot_phase);
	ASSERT_RTNL();
	list_for_each_entry_safe(dev, tmp, head, unreg_list) {
		/* Some devices call without registering
		 * for initialization unwind. Remove those
		 * devices and proceed with the remaining.
		 */
		if (dev->reg_state == NETREG_UNINITIALIZED) {
			pr_debug("unregister_netdevice: device %s/%p never was registered\n",
				 dev->name, dev);
			WARN_ON(1);
			list_del(&dev->unreg_list);
			continue;
		}
		dev->dismantle = true;
		BUG_ON(dev->reg_state != NETREG_REGISTERED);
	}
	/* If device is running, close it first. */
	list_for_each_entry(dev, head, unreg_list)
		list_add_tail(&dev->close_list, &close_head);
	dev_close_many(&close_head, true);
	list_for_each_entry(dev, head, unreg_list) {
		/* And unlink it from device chain. */
		unlist_netdevice(dev);
		dev->reg_state = NETREG_UNREGISTERING;
	}
	flush_all_backlogs();
	synchronize_net();
	list_for_each_entry(dev, head, unreg_list) {
		struct sk_buff *skb = NULL;
		/* Shutdown queueing discipline. */
		dev_shutdown(dev);
		dev_xdp_uninstall(dev);
		/* Notify protocols, that we are about to destroy
		 * this device. They should clean all the things.
		 */
		call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
		if (!dev->rtnl_link_ops ||
		    dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
			skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U, 0,
						     GFP_KERNEL, NULL, 0);
		/*
		 * Flush the unicast and multicast chains
		 */
		dev_uc_flush(dev);
		dev_mc_flush(dev);
		if (dev->netdev_ops->ndo_uninit)
			dev->netdev_ops->ndo_uninit(dev);
		if (skb)
			rtmsg_ifinfo_send(skb, dev, GFP_KERNEL);
		/* Notifier chain MUST detach us all upper devices. */
		WARN_ON(netdev_has_any_upper_dev(dev));
		WARN_ON(netdev_has_any_lower_dev(dev));
		/* Remove entries from kobject tree */
		netdev_unregister_kobject(dev);
#ifdef CONFIG_XPS
		/* Remove XPS queueing entries */
		netif_reset_xps_queues_gt(dev, 0);
#endif
	}
	synchronize_net();
	list_for_each_entry(dev, head, unreg_list)
		dev_put(dev);
}
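Reference counting
A net_device is freed only after all references to it have been released.
So if the reference count is still non-zero after unregister_netdev is called, the net_device structure cannot be freed yet; the kernel has to wait until the other parts of the kernel have dropped their references. Once unregistered, however, the device can no longer be used, so the kernel must tell every reference holder to release its reference. This is also done by sending an unregister notification on netdev_chain.
As the previous section showed, rtnl_unlock calls netdev_run_todo, which calls netdev_wait_allrefs. That function keeps waiting until the reference count of the net_device reaches 0.
netdev_wait_allrefs
netdev_wait_allrefs consists of a loop that ends when netdev_refcnt drops to 0.
Inside the loop, NETDEV_UNREGISTER is sent on the netdev_chain notifier chain once per second.
A warning is printed every 10 seconds.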
/**
* netdev_wait_allrefs - wait until all references are gone.
* @dev: target net_device
*
* This is called when unregistering network devices.
*
* Any protocol or device that holds a reference should register
* for netdevice notification, and cleanup and put back the
* reference if they receive an UNREGISTER event.
* We can get stuck here if buggy protocols don't correctly
* call dev_put.
*/
static void netdev_wait_allrefs(struct net_device *dev)
{
	unsigned long rebroadcast_time, warning_time;
	int refcnt;
	linkwatch_forget_dev(dev);
	rebroadcast_time = warning_time = jiffies;
	refcnt = netdev_refcnt_read(dev);
	while (refcnt != 0) {
		if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
			rtnl_lock();
			/* Rebroadcast unregister notification */
			call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
			__rtnl_unlock();
			rcu_barrier();
			rtnl_lock();
			if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
				     &dev->state)) {
				/* We must not have linkwatch events
				 * pending on unregister. If this
				 * happens, we simply run the queue
				 * unscheduled, resulting in a noop
				 * for this device.
				 */
				linkwatch_run_queue();
			}
			__rtnl_unlock();
			rebroadcast_time = jiffies;
		}
		msleep(250);
		refcnt = netdev_refcnt_read(dev);
		if (refcnt && time_after(jiffies, warning_time + 10 * HZ)) {
			pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
				 dev->name, refcnt);
			warning_time = jiffies;
		}
	}
}
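The timing of the loop above is easier to follow in a deterministic user-space model. In this sketch time is a plain counter instead of jiffies plus msleep, and the reference holder is a hypothetical callback that drops exactly one reference per notification. The names demo_wait_allrefs and demo_notify_unregister and the value of DEMO_HZ are assumptions for illustration only.

```c
#include <assert.h>

#define DEMO_HZ 1000UL		/* fake ticks per second */

struct demo_dev {
	int refcnt;
	int rebroadcasts;	/* how many times UNREGISTER was re-sent */
	int warnings;		/* how many "still waiting" warnings */
};

/* Hypothetical holder: drops one reference each time it is notified. */
static void demo_notify_unregister(struct demo_dev *dev)
{
	dev->rebroadcasts++;
	if (dev->refcnt > 0)
		dev->refcnt--;
}

/*
 * Same shape as netdev_wait_allrefs(): poll the refcount every 250 ms,
 * rebroadcast once more than one second has passed since the last
 * broadcast, warn once more than ten seconds have passed since the
 * last warning. Returns the fake time spent waiting.
 */
static unsigned long demo_wait_allrefs(struct demo_dev *dev)
{
	unsigned long now = 0, rebroadcast_time = 0, warning_time = 0;

	assert(dev->refcnt >= 0);
	while (dev->refcnt != 0) {
		if (now > rebroadcast_time + 1 * DEMO_HZ) {
			demo_notify_unregister(dev);
			rebroadcast_time = now;
		}
		now += DEMO_HZ / 4;	/* stands in for msleep(250) */
		if (dev->refcnt && now > warning_time + 10 * DEMO_HZ) {
			dev->warnings++;
			warning_time = now;
		}
	}
	return now;
}
```

Because the comparison is strict and the poll step is 250 ticks, a rebroadcast happens every 1250 ticks in this model rather than exactly every 1000.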
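Bringing a device up
A device is usable as soon as it is registered, but it cannot transmit or receive packets until it is explicitly brought up, for example by an application. Bringing the device up is the job of dev_open.
Bringing a device up involves the following tasks:
- Call the relevant callbacks that the driver registered in dev->netdev_ops.
- Set the __LINK_STATE_START flag in dev->state.
- Set the IFF_UP flag in dev->flags.
- Call dev_activate to initialize the egress queuing discipline used by traffic control and start the watchdog timer. If traffic control is not configured, a default FIFO queue is assigned.
- Send NETDEV_UP on the netdev_chain notifier chain.
Bringing a device down
A network device is brought down by dev_close, which performs roughly the following tasks:
- Send NETDEV_DOWN on the netdev_chain notifier chain.
- Call dev_deactivate_many to shut down the egress queuing discipline, so that the device can no longer transmit data, and stop the watchdog timer.
- Clear the __LINK_STATE_START flag in dev->state.
- Clear the IFF_UP flag in dev->flags (that is, dev->flags &= ~IFF_UP).
- Call dev->netdev_ops->ndo_stop if it is defined.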
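The flag handling in the two task lists above can be condensed into a user-space sketch. It leaves out queuing disciplines, the watchdog timer and real notifier chains, and it records the last event in a field instead of sending it. All demo_* names, the flag values and the ordering of the steps are assumptions for illustration, not the kernel's dev_open and dev_close.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_IFF_UP 0x1u
#define DEMO_STATE_START_BIT 0

enum { DEMO_EV_NONE, DEMO_EV_UP, DEMO_EV_DOWN };

struct demo_dev;

struct demo_ops {
	int (*open)(struct demo_dev *dev);	/* optional, may be NULL */
	int (*stop)(struct demo_dev *dev);	/* optional, may be NULL */
};

struct demo_dev {
	unsigned long state;
	unsigned int flags;
	const struct demo_ops *ops;
	int last_event;		/* stands in for the notifier chain */
};

static int demo_open(struct demo_dev *dev)
{
	int ret = 0;

	assert(dev != NULL);
	if (dev->flags & DEMO_IFF_UP)
		return 0;				/* already up */
	dev->state |= 1UL << DEMO_STATE_START_BIT;
	if (dev->ops && dev->ops->open)			/* check before calling */
		ret = dev->ops->open(dev);
	if (ret) {					/* driver refused: roll back */
		dev->state &= ~(1UL << DEMO_STATE_START_BIT);
		return ret;
	}
	dev->flags |= DEMO_IFF_UP;
	dev->last_event = DEMO_EV_UP;
	return 0;
}

static void demo_close(struct demo_dev *dev)
{
	assert(dev != NULL);
	if (!(dev->flags & DEMO_IFF_UP))
		return;					/* already down */
	dev->state &= ~(1UL << DEMO_STATE_START_BIT);
	if (dev->ops && dev->ops->stop)
		dev->ops->stop(dev);
	dev->flags &= ~DEMO_IFF_UP;			/* clear with AND-NOT */
	dev->last_event = DEMO_EV_DOWN;
}

/* Demo helper: a driver callback that always fails with -5. */
static int demo_open_fail(struct demo_dev *dev)
{
	(void)dev;
	return -5;
}
```

The sketch also shows the NULL checks on optional callbacks and that a failed open leaves both state and flags unchanged.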
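Updating the device queuing discipline state
Interaction with power management
The suspend and resume functions of the pci_driver structure are initialized depending on whether the kernel supports power management. When the system enters the suspended state, the driver's suspend function is executed so that the driver can take action. Power management does not affect net_device->reg_state, but net_device->state must be updated.
Suspending a device
When a device is suspended, the suspend function handles the event. Its actions include:
- Clear the __LINK_STATE_PRESENT flag in dev->state.
- If the device is up, stop its egress queues so that no more packets are transmitted.
netif_device_detach handles this: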
/**
* netif_device_detach - mark device as removed
* @dev: network device
*
* Mark device as removed from system and therefore no longer available.
*/
void netif_device_detach(struct net_device *dev)
{
	if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
	    netif_running(dev)) {
		netif_tx_stop_all_queues(dev);
	}
}
Resuming a device
The resume function resumes the device. The work is done by netif_device_attach:
void netif_device_attach(struct net_device *dev)
{
	if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
	    netif_running(dev)) {
		netif_tx_wake_all_queues(dev);
		__netdev_watchdog_up(dev);
	}
}
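Link state change detection
A NIC device driver learns whether a carrier signal is present either from a notification raised by the NIC or by reading a NIC register. It then informs the kernel with netif_carrier_on or netif_carrier_off.
The link state changes in these cases:
- A cable is plugged into or unplugged from the NIC.
- The state of the device at the other end of the cable changes.
When the driver detects loss of carrier, it calls netif_carrier_off, which sets the __LINK_STATE_NOCARRIER flag and calls linkwatch_fire_event.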
void netif_carrier_off(struct net_device *dev)
{
	if (!test_and_set_bit(__LINK_STATE_NOCARRIER, &dev->state)) {
		if (dev->reg_state == NETREG_UNINITIALIZED)
			return;
		atomic_inc(&dev->carrier_down_count);
		linkwatch_fire_event(dev);
	}
}
When the driver detects that the link has carrier, it calls netif_carrier_on, which clears the __LINK_STATE_NOCARRIER flag and calls linkwatch_fire_event.
void netif_carrier_on(struct net_device *dev)
{
	if (test_and_clear_bit(__LINK_STATE_NOCARRIER, &dev->state)) {
		if (dev->reg_state == NETREG_UNINITIALIZED)
			return;
		atomic_inc(&dev->carrier_up_count);
		linkwatch_fire_event(dev);
		if (netif_running(dev))
			__netdev_watchdog_up(dev);
	}
}
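linkwatch_fire_event checks whether the __LINK_STATE_LINKWATCH_PENDING flag is set in net_device->state. If it is not, it calls linkwatch_add_event, which simply appends dev->link_watch_list to the tail of the lweventlist list.
The devices on lweventlist are those whose carrier has changed. Even if the carrier changes several times, a device appears on the list only once, because the list links the net_device structure itself.
Once the net_device is on lweventlist, or linkwatch_urgent_event returns true, the event is handed to the keventd_wq kernel thread to be scheduled.
To keep linkwatch_event from running too often, it is rate-limited to once per second.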
static void linkwatch_add_event(struct net_device *dev)
{
	unsigned long flags;
	spin_lock_irqsave(&lweventlist_lock, flags);
	if (list_empty(&dev->link_watch_list)) {
		list_add_tail(&dev->link_watch_list, &lweventlist);
		dev_hold(dev);
	}
	spin_unlock_irqrestore(&lweventlist_lock, flags);
}

void linkwatch_fire_event(struct net_device *dev)
{
	bool urgent = linkwatch_urgent_event(dev);
	if (!test_and_set_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state)) {
		linkwatch_add_event(dev);
	} else if (!urgent)
		return;
	linkwatch_schedule_work(urgent);
}
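The coalescing effect of the pending flag (several carrier changes, one queue entry) can be shown with a short user-space sketch. The queue, the demo_* names and the carrier_changes counter are hypothetical; locking, reference counting and the urgent path are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_dev {
	bool pending;			/* role of __LINK_STATE_LINKWATCH_PENDING */
	int carrier_changes;
	struct demo_dev *next;		/* role of link_watch_list */
};

static struct demo_dev *demo_queue_head, *demo_queue_tail;
static size_t demo_queue_len;

/* Queue the device unless it is already waiting to be processed. */
static void demo_fire_event(struct demo_dev *dev)
{
	assert(dev != NULL);
	dev->carrier_changes++;
	if (dev->pending)
		return;			/* already queued: coalesce */
	dev->pending = true;
	dev->next = NULL;
	if (demo_queue_tail)
		demo_queue_tail->next = dev;
	else
		demo_queue_head = dev;
	demo_queue_tail = dev;
	demo_queue_len++;
}

/* Drain the queue; returns how many devices were handled. */
static size_t demo_run_queue(void)
{
	size_t handled = 0;

	while (demo_queue_head) {
		struct demo_dev *dev = demo_queue_head;

		demo_queue_head = dev->next;
		if (!demo_queue_head)
			demo_queue_tail = NULL;
		demo_queue_len--;
		dev->pending = false;	/* new events may be queued again */
		handled++;
	}
	return handled;
}
```

Three events on one device and one on another leave two entries in the queue; after the queue is drained, the same device can be queued again.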
linkwatch_schedule_work calls the workqueue scheduling functions, so that linkwatch_event, the callback bound to the work item when it was declared, gets called.
static DECLARE_DELAYED_WORK(linkwatch_work, linkwatch_event);
static void linkwatch_schedule_work(int urgent)
{
	unsigned long delay = linkwatch_nextevent - jiffies;
	if (test_bit(LW_URGENT, &linkwatch_flags))
		return;
	/* Minimise down-time: drop delay for up event. */
	if (urgent) {
		if (test_and_set_bit(LW_URGENT, &linkwatch_flags))
			return;
		delay = 0;
	}
	/* If we wrap around we'll delay it by at most HZ. */
	if (delay > HZ)
		delay = 0;
	/*
	 * If urgent, schedule immediate execution; otherwise, don't
	 * override the existing timer.
	 */
	if (test_bit(LW_URGENT, &linkwatch_flags))
		mod_delayed_work(system_wq, &linkwatch_work, 0);
	else
		schedule_delayed_work(&linkwatch_work, delay);
}
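The delay computation in linkwatch_schedule_work relies on unsigned arithmetic: if the next allowed time is already in the past, the subtraction wraps to a very large value, and the comparison against HZ turns that into "run now". A minimal sketch of just that arithmetic follows; demo_delay and the value of DEMO_HZ are assumptions for this example.

```c
#include <assert.h>

#define DEMO_HZ 250UL	/* assumed tick rate for the example */

/*
 * Ticks to wait before the next run. `next` and `now` are free-running
 * unsigned counters, so `next - now` is computed modulo 2^N. A deadline
 * that has already passed produces a huge difference, which the
 * `> DEMO_HZ` test catches and replaces with zero.
 */
static unsigned long demo_delay(unsigned long next, unsigned long now, int urgent)
{
	unsigned long delay = next - now;

	if (urgent)
		return 0;	/* urgent events are never delayed */
	if (delay > DEMO_HZ)
		delay = 0;
	return delay;
}
```

The same subtraction stays correct when the counter itself wraps around between now and next, as long as the two values are less than DEMO_HZ apart.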
linkwatch_event calls __linkwatch_run_queue.
__linkwatch_run_queue calls linkwatch_do_dev for every device queued through link_watch_list.
linkwatch_do_dev clears the __LINK_STATE_LINKWATCH_PENDING flag and sends a notification on netdev_chain.
static void linkwatch_do_dev(struct net_device *dev)
{
	/*
	 * Make sure the above read is complete since it can be
	 * rewritten as soon as we clear the bit below.
	 */
	smp_mb__before_atomic();
	/* We are about to handle this device,
	 * so new events can be accepted
	 */
	clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
	rfc2863_policy(dev);
	if (dev->flags & IFF_UP) {
		if (netif_carrier_ok(dev))
			dev_activate(dev);
		else
			dev_deactivate(dev);
		netdev_state_change(dev);
	}
	dev_put(dev);
}

static void __linkwatch_run_queue(int urgent_only)
{
	struct net_device *dev;
	LIST_HEAD(wrk);
	/*
	 * Limit the number of linkwatch events to one
	 * per second so that a runaway driver does not
	 * cause a storm of messages on the netlink
	 * socket. This limit does not apply to up events
	 * while the device qdisc is down.
	 */
	if (!urgent_only)
		linkwatch_nextevent = jiffies + HZ;
	/* Limit wrap-around effect on delay. */
	else if (time_after(linkwatch_nextevent, jiffies + HZ))
		linkwatch_nextevent = jiffies;
	clear_bit(LW_URGENT, &linkwatch_flags);
	spin_lock_irq(&lweventlist_lock);
	list_splice_init(&lweventlist, &wrk);
	while (!list_empty(&wrk)) {
		dev = list_first_entry(&wrk, struct net_device, link_watch_list);
		list_del_init(&dev->link_watch_list);
		if (urgent_only && !linkwatch_urgent_event(dev)) {
			list_add_tail(&dev->link_watch_list, &lweventlist);
			continue;
		}
		spin_unlock_irq(&lweventlist_lock);
		linkwatch_do_dev(dev);
		spin_lock_irq(&lweventlist_lock);
	}
	if (!list_empty(&lweventlist))
		linkwatch_schedule_work(0);
	spin_unlock_irq(&lweventlist_lock);
}

static void linkwatch_event(struct work_struct *dummy)
{
	rtnl_lock();
	__linkwatch_run_queue(time_after(linkwatch_nextevent, jiffies));
	rtnl_unlock();
}
Virtual devices
Use cases for virtual devices:
- Bonding interfaces
- VLAN interfaces