/**
 *	dev_queue_xmit - transmit a buffer
 *	@skb: buffer to transmit
 *
 *	Queue a buffer for transmission to a network device. The caller must
 *	have set the device and priority and built the buffer before calling
 *	this function. The function can be called from an interrupt.
 *
 *	A negative errno code is returned on a failure. A success does not
 *	guarantee the frame will be transmitted as it may be dropped due
 *	to congestion or traffic shaping.
 *
 * -----------------------------------------------------------------------------------
 *      I notice this method can also return errors from the queue disciplines,
 *      including NET_XMIT_DROP, which is a positive value.  So, errors can also
 *      be positive.
 *
 *      Regardless of the return value, the skb is consumed, so it is currently
 *      difficult to retry a send to this method.  (You can bump the ref count
 *      before sending to hold a reference for retry if you are careful.)
 *
 *      When calling this method, interrupts MUST be enabled.  This is because
 *      the BH enable code must have IRQs enabled so that it will not deadlock.
 *          --BLG
 */
int dev_queue_xmit(struct sk_buff *skb)
{
	struct net_device *dev = skb->dev;
	struct Qdisc *q;
	int rc = -ENOMEM;

	/* GSO will handle the following emulations directly. */
	// In other words: when the skb will go through GSO, skip the software
	// fallbacks (linearization, checksum completion) below.
	if (netif_needs_gso(dev, skb))
		goto gso;

	// Check whether skb_shinfo(skb)->frag_list is set. If it is, but the
	// device does not support skb fragment lists (NETIF_F_FRAGLIST), the
	// fragments must be merged into one linear skb (via __skb_linearize).
	if (skb_shinfo(skb)->frag_list &&
	    !(dev->features & NETIF_F_FRAGLIST) &&
	    __skb_linearize(skb))
		goto out_kfree_skb;

	/* Fragmented skb is linearized if device does not support SG,
	 * or if at least one of fragments is in highmem and device
	 * does not support DMA from it.
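	 */
	// Check skb_shinfo(skb)->nr_frags. A non-zero value means this skb uses
	// scatter/gather I/O. If the device does not support it (NETIF_F_SG),
	// the skb must likewise be linearized (via __skb_linearize).
	if (skb_shinfo(skb)->nr_frags &&
	    (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) &&
	    __skb_linearize(skb))
		goto out_kfree_skb;

	/* If packet is not checksummed and device does not support
	 * checksumming for this protocol, complete checksumming here.
	 */
	// This check concerns checksums. Note that it is not the IP header
	// checksum. The IP header checksum is mandatory in every IP datagram and
	// is computed in software as the one's-complement sum of the header taken
	// 16 bits at a time; it covers only the IP header, not the IP payload.
	// The checksum handled here belongs to the upper-layer protocol (UDP, for
	// example) and covers that protocol's header and data.
	//
	// struct sk_buff has a member, ip_summed, that records the checksum
	// policy. It takes one of three values:
	//   - CHECKSUM_HW: the hardware computes the checksum
	//     (this is the older name; the code below tests CHECKSUM_PARTIAL,
	//     the name used on the transmit path in the version shown),
	//   - CHECKSUM_NONE: the checksum is computed entirely in software,
	//   - CHECKSUM_UNNECESSARY: no checksum is needed.
	//
	// A newly allocated skb defaults to software checksumming. If the device
	// has one of the following three flags, and some other related conditions
	// hold, the checksum is left to the hardware:
	//   - NETIF_F_IP_CSUM: hardware can checksum only TCP/UDP over IPv4,
	//   - NETIF_F_NO_CSUM: no checksum is required (the loopback device, for
	//     example),
	//   - NETIF_F_HW_CSUM: hardware can checksum all packets.
	// (NETIF_F_GEN_CSUM in the code below combines NETIF_F_NO_CSUM and
	// NETIF_F_HW_CSUM.)
	//
	// If the checksum is computed in software, this happens in
	// ip_generic_getfrag while the application data is copied; the result is
	// stored in skb->csum and written by the upper-layer protocol when it
	// fills in its own header. If the checksum is left to the hardware, the
	// upper-layer protocol instead stores in skb->csum the position of the
	// checksum field within its header, so that the hardware can find where
	// to write the checksum it generates.
	//
	// dev_queue_xmit checks the checksum only as a fallback. The fallback
	// applies when skb->ip_summed == CHECKSUM_HW (the hardware was expected
	// to compute the checksum, so none exists yet) and either
	//   - dev->features has none of NETIF_F_HW_CSUM, NETIF_F_NO_CSUM or
	//     NETIF_F_IP_CSUM, that is, the device neither declares checksums
	//     unnecessary nor is able to compute them, or
	//   - dev->features has NETIF_F_IP_CSUM but the packet is not IP.
	// In these cases a software checksum is still needed, so dev_queue_xmit
	// calls skb_checksum_help to fill it in and set skb->ip_summed to
	// CHECKSUM_NONE.
	if (skb->ip_summed == CHECKSUM_PARTIAL) {
		skb_set_transport_header(skb, skb->csum_start -
					      skb_headroom(skb));

		if (!(dev->features & NETIF_F_GEN_CSUM) &&
		    (!(dev->features & NETIF_F_IP_CSUM) ||
		     skb->protocol != htons(ETH_P_IP)))
			if (skb_checksum_help(skb))
				goto out_kfree_skb;
	}

gso:
	spin_lock_prefetch(&dev->queue_lock);

	/* Disable soft irqs for various locks below. Also
	 * stops preemption for RCU.
	 */
	rcu_read_lock_bh();

	/* Updates of qdisc are serialized by queue_lock.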
	 * The struct Qdisc which is pointed to by qdisc is now a
	 * rcu structure - it may be accessed without acquiring
	 * a lock (but the structure may be stale.) The freeing of the
	 * qdisc will be deferred until it's known that there are no
	 * more references to it.
	 *
	 * If the qdisc has an enqueue function, we still need to
	 * hold the queue_lock before calling it, since queue_lock
	 * also serializes access to the device queue.
	 */
	// The qdisc member of struct net_device is a transmit queue that buffers
	// skbs waiting to be sent by the device. If the device has such a queue,
	// add the skb to it and start transmission from the queue.
	q = rcu_dereference(dev->qdisc);
#ifdef CONFIG_NET_CLS_ACT
	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
#endif
	if (q->enqueue) {
		/* Grab device queue */
		spin_lock(&dev->queue_lock);
		q = dev->qdisc;
		if (q->enqueue) {
			rc = q->enqueue(skb, q);
			qdisc_run(dev);
			spin_unlock(&dev->queue_lock);

			rc = rc == NET_XMIT_BYPASS ? NET_XMIT_SUCCESS : rc;
			goto out;
		}
		spin_unlock(&dev->queue_lock);
	}

	// Otherwise, the device has no such queue:
	/* The device has no queue. Common case for software devices:
	   loopback, all the sorts of tunnels...

	   Really, it is unlikely that netif_tx_lock protection is necessary
	   here.  (f.e. loopback and IP tunnels are clean ignoring statistics
	   counters.)
	   However, it is possible, that they rely on protection
	   made by us here.

	   Check this and shot the lock. It is not prone from deadlocks.
	   Either shot noqueue qdisc, it is even simpler 8)
	 */
	// If the device is up, call the device's output function directly to
	// send the packet. One more thing has to happen before the send: if an
	// ETH_P_ALL packet type has been added to ptype_all, a copy of the packet
	// must be handed to that type's receive function, because such a type
	// needs to see every packet, outgoing ones included. (In the version
	// shown, that copy is made inside dev_hard_start_xmit.)
	if (dev->flags & IFF_UP) {
		int cpu = smp_processor_id(); /* ok because BHs are off */

		if (dev->xmit_lock_owner != cpu) {

			HARD_TX_LOCK(dev, cpu);

			if (!netif_queue_stopped(dev)) {
				rc = 0;
				if (!dev_hard_start_xmit(skb, dev)) {
					HARD_TX_UNLOCK(dev);
					goto out;
				}
			}
			HARD_TX_UNLOCK(dev);
			if (net_ratelimit())
				printk(KERN_CRIT "Virtual device %s asks to "
				       "queue packet!\n", dev->name);
		} else {
			/* Recursion is detected!
			 * It is possible, unfortunately */
			if (net_ratelimit())
				printk(KERN_CRIT "Dead loop on virtual device "
				       "%s, fix it urgently!\n", dev->name);
		}
	}

	rc = -ENETDOWN;
	rcu_read_unlock_bh();

out_kfree_skb:
	kfree_skb(skb);
	return rc;
out:
	rcu_read_unlock_bh();
	return rc;
}
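
The annotation on the checksum step describes the IP header checksum as a one's-complement sum over the header taken 16 bits at a time. The following is a minimal user-space sketch of that arithmetic, following RFC 1071. It is not the kernel's own routine, and the function name `inet_checksum_sketch` is hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: one's-complement checksum over a byte buffer,
 * read as big-endian 16-bit words (RFC 1071). The name is hypothetical
 * and does not exist in the kernel.
 */
uint16_t inet_checksum_sketch(const uint8_t *data, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);

	/* Add up all complete 16-bit words. */
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];

	/* An odd trailing byte is padded with a zero low byte. */
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	/* The checksum is the one's complement of the folded sum. */
	return (uint16_t)~sum;
}
```

To build a header, the sender would zero the checksum field, run this over the header bytes, and store the result in that field. A receiver running the same function over the complete header, checksum included, gets 0 when the header is intact.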