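When the IP layer is initialized, a structure is defined for each upper-layer protocol and registered in a global table. When the IP layer receives data that has to be passed up to the next layer, it uses the upper-layer protocol type to find the matching structure in that table and calls the functions in it to process the data: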
[ net/ipv4/af_inet.c ]
static const struct net_protocol tcp_protocol = {
    .early_demux = tcp_v4_early_demux,
    .handler = tcp_v4_rcv,
    .err_handler = tcp_v4_err,
    .no_policy = 1,
    .netns_ok = 1,
    .icmp_strict_tag_validation = 1,
};
Here tcp_v4_rcv is the function the TCP layer uses to receive data.
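When a packet arrives, its header is checked first. The first step is to make sure the whole header is present, i.e. that the packet holds at least that many bytes and that they can be accessed in the skb's linear buffer. This uses the following function: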
[ include/linux/skbuff.h ]
static inline int pskb_may_pull(struct sk_buff *skb, unsigned int len)
{
    /* len <= length of the skb's linear (main) buffer; true in most cases
     */
    if (likely(len <= skb_headlen(skb)))
        return 1;
    /* len > total length of the skb; this rarely happens
     */
    if (unlikely(len > skb->len))
        return 0;
    /* Copy data from the fragments to the tail of the linear buffer.
     * If the linear buffer is too small, memory is reallocated,
     * so after this call every pointer into the skb's data is invalid.
     */
    return __pskb_pull_tail(skb, len - skb_headlen(skb)) != NULL;
}
Next comes checksum verification, which involves several functions:
[ include/linux/skbuff.h ]
static inline int skb_csum_unnecessary(const struct sk_buff *skb)
{
    return skb->ip_summed & CHECKSUM_UNNECESSARY;
}
CHECKSUM_UNNECESSARY:
The hardware you're dealing with doesn't calculate the full checksum (as in CHECKSUM_COMPLETE), but it does parse headers and verify checksums for specific protocols e.g. TCP/UDP/SCTP, then, for such packets it will set CHECKSUM_UNNECESSARY if their checksums are okay. skb->csum is still undefined in this case though. It is a bad option, but, unfortunately, nowadays most vendors do this.
CHECKSUM_COMPLETE:
This is the most generic way. The device supplied checksum of the _whole_ packet as seen by netif_rx() and fills out in skb->csum. Meaning, the hardware doesn't need to parse L3/L4 headers to implement this.
Note: Even if device supports only some protocols, but is able to produce skb->csum, it MUST use CHECKSUM_COMPLETE, not CHECKSUM_UNNECESSARY.
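The TCP checksum also covers a 12-byte pseudo header that is conceptually placed in front of the TCP header. It consists of the 32-bit source address, the 32-bit destination address, 8 reserved (zero) bits, the 8-bit protocol number, and the 16-bit TCP length (header plus data, not counting the pseudo header). The pseudo header is never transmitted; it only contributes to the checksum. The related functions are: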
[ arch/x86/include/asm/checksum_32.h ]
/**
 * csum_tcpudp_nofold - Compute an IPv4 pseudo header checksum.
 * @saddr: source address
 * @daddr: destination address
 * @len: length of packet
 * @proto: ip protocol of packet
 * @sum: initial sum to be added in (32bit unfolded)
 *
 * Returns the pseudo header checksum of the input data. Result is
 * 32bit unfolded.
 */
static inline __wsum
csum_tcpudp_nofold(__be32 saddr, __be32 daddr, unsigned short len,
                   unsigned short proto, __wsum sum)
{
    /* initial sum + daddr + saddr + ((len + proto) << 8), with end-around carry.
     * The << 8 handles byte order: saddr/daddr are in network byte order,
     * while len and proto are host-order (little-endian on x86) values.
     * Shifting by 8 makes len + proto come out byte-swapped once the 32-bit
     * sum is folded to 16 bits, matching the network-order address words.
     */
    asm(" addl %1, %0\n"
        " adcl %2, %0\n"
        " adcl %3, %0\n"
        " adcl $0, %0\n"
        : "=r" (sum)
        : "g" (daddr), "g" (saddr), "g" ((len + proto) << 8), "0" (sum));
    return sum;
}
/**
 * csum_fold - Fold and invert a 32bit checksum.
 * @sum: 32bit unfolded sum
 *
 * Fold a 32bit running checksum to 16bit and invert it. This is usually
 * the last step before putting a checksum into a packet.
 * Make sure not to mix with 64bit checksums.
 */
static inline __sum16 csum_fold(__wsum sum)
{
    /* Add the high and low 16-bit halves of sum: the high half stays in
     * place and the low half is shifted up to meet it. "adcl $0xffff" adds
     * the carry back in (end-around carry); the upper 16 bits of the result
     * are then inverted.
     */
    __asm__(
        " addl %1,%0\n"
        " adcl $0xffff,%0"
        : "=r" (sum)
        : "r" ((__force u32)sum << 16),
          "0" ((__force u32)sum & 0xffff0000)
    );
    return (__force __sum16)(~(__force u32)sum >> 16);
}
/*
 * computes the checksum of the TCP/UDP pseudo-header
 * returns a 16-bit checksum, already complemented
 */
static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr,
                                        unsigned short len,
                                        unsigned short proto,
                                        __wsum sum)
{
    return csum_fold(csum_tcpudp_nofold(saddr, daddr, len, proto, sum));
}
/*
 * Calculate(/check) TCP checksum
 */
static inline __sum16 tcp_v4_check(int len, __be32 saddr,
                                   __be32 daddr, __wsum base)
{
    /* IPPROTO_TCP = 6
     */
    return csum_tcpudp_magic(saddr, daddr, len, IPPROTO_TCP, base);
}
On the receive path, tcp_v4_check() and csum_tcpudp_nofold() are called from the following function:
[ net/ipv4/tcp_ipv4.c ]
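static __sum16 tcp_v4_checksum_init(struct sk_buff *skb)
{
    const struct iphdr *iph = ip_hdr(skb);
    if (skb->ip_summed == CHECKSUM_COMPLETE) {
        /* The device has already summed the whole TCP segment and stored
         * the unfolded result in skb->csum. Adding the 12-byte pseudo header
         * and folding gives 0 if the checksum is correct; in that case the
         * skb is marked CHECKSUM_UNNECESSARY. Otherwise fall through and
         * verify in software.
         */
        if (!tcp_v4_check(skb->len, iph->saddr,
                          iph->daddr, skb->csum)) {
            skb->ip_summed = CHECKSUM_UNNECESSARY;
            return 0;
        }
    }
    /* Seed skb->csum with the unfolded sum of the pseudo header only;
     * the segment itself is added when the checksum is actually verified.
     */
    skb->csum = csum_tcpudp_nofold(iph->saddr, iph->daddr,
                                   skb->len, IPPROTO_TCP, 0);
    if (skb->len <= 76) {
        /* Short segments (at most 76 bytes) are verified over the whole
         * packet right away; a nonzero return means a bad checksum.
         * Longer segments are verified later, for example while the data
         * is copied to user space, so the payload is not walked twice.
         */
        return __skb_checksum_complete(skb);
    }
    return 0;
}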