Network Programming: Sockets Introduction (Part 1)

Preface

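I kept wondering how to write up my notes on UNP. In the end I decided to use them to practice summarizing: focus on the key points and on how things are applied.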

1. Introduction

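This chapter covers the API functions related to socket address structures, along with the functions that convert addresses between binary and text form. Because those conversion functions depend on the specific IP protocol version, UNP introduces its own conversion functions, prefixed with sock_, that are protocol-independent.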

  • These structures can be passed in two directions: from the process to the kernel, and from the kernel to the process.
  • The address conversion functions convert between a text representation of an address and the binary value that goes into a socket address structure.
  • One problem with these address conversion functions is that they are dependent on
    the type of address being converted: IPv4 or IPv6.
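
To make the text/binary conversion concrete, here is a minimal sketch using the standard inet_pton and inet_ntop calls. The helper name ipv4_text_roundtrip is my own and not from UNP. Note that AF_INET is hard-coded, which is exactly the protocol dependence the last point describes.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper (not from UNP): parse dotted-decimal text into its
 * binary form, then format the binary form back into `out`.
 * Returns 0 on success, -1 on failure. */
int ipv4_text_roundtrip(const char *text, char *out, socklen_t outlen)
{
    struct in_addr addr;              /* binary form, network byte order */

    if (inet_pton(AF_INET, text, &addr) != 1)
        return -1;                    /* not a valid IPv4 string */
    if (inet_ntop(AF_INET, &addr, out, outlen) == NULL)
        return -1;                    /* output buffer too small */
    return 0;
}
```

A caller would pass a buffer of at least INET_ADDRSTRLEN bytes for `out`.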

2. Socket Address Structures

2.1 IPv4 Socket Address Structure
struct in_addr {
    in_addr_t s_addr; /* 32-bit IPv4 address */
    /* network byte ordered */
};
struct sockaddr_in {
    uint8_t sin_len; /* length of structure (16) */
    sa_family_t sin_family; /* AF_INET */
    in_port_t sin_port; /* 16-bit TCP or UDP port number */
    /* network byte ordered */
    struct in_addr sin_addr; /* 32-bit IPv4 address */
    /* network byte ordered */
    char sin_zero[8]; /* unused */
};

UNP emphasizes the following points:

  • Having a length field simplifies the handling of variable-length socket address structures.
  • Even if the length field is present, we need never set it and need never examine it, unless we are dealing with routing sockets (Chapter 18). It is used within the kernel by the routines that deal with socket address structures from various protocol families (e.g., the routing table code).
  • The POSIX specification requires only three members in the structure: sin_family, sin_addr, and sin_port.
  • Both the IPv4 address and the TCP or UDP port number are always stored in the structure in network byte order.
  • If serv is defined as an Internet socket address structure, then serv.sin_addr references the 32-bit IPv4 address as an in_addr structure, while serv.sin_addr.s_addr references the same 32-bit IPv4 address as an in_addr_t.
  • The sin_zero member is unused, but we always set it to 0 when filling in one of these structures.

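The first two points are clearly not easy to understand yet and will need to be revisited together with later chapters; for now it is enough to know that the sin_len member is there to help handle variable-length socket address structures. As for the fifth point, the in_addr structure contains a single member, s_addr, of type in_addr_t, which is why there are two ways to refer to the 32-bit address.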
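
As a minimal sketch of these points (the helper names fill_ipv4 and ipv4_host_order are my own, not from UNP): zero the structure first so sin_zero is cleared, store the port and address in network byte order, and leave sin_len alone. The two functions touch the same 32-bit address in both ways, as an in_addr and as an in_addr_t.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *serv from dotted-decimal text and a host-order
 * port. Returns 0 on success, -1 if `ip` is not valid IPv4 text. */
int fill_ipv4(struct sockaddr_in *serv, const char *ip, uint16_t port)
{
    memset(serv, 0, sizeof(*serv));   /* also clears sin_zero */
    serv->sin_family = AF_INET;
    serv->sin_port = htons(port);     /* stored in network byte order */
    /* sin_len is deliberately not set: not every system has it. */
    if (inet_pton(AF_INET, ip, &serv->sin_addr) != 1)  /* as struct in_addr */
        return -1;
    return 0;
}

/* The same 32 bits, read as an in_addr_t and converted to host order. */
uint32_t ipv4_host_order(const struct sockaddr_in *serv)
{
    return ntohl(serv->sin_addr.s_addr);
}
```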

2.2 Generic Socket Address Structure

struct sockaddr {
    uint8_t sa_len;
    sa_family_t sa_family; /* address family: AF_xxx value */
    char sa_data[14]; /* protocol-specific address */
};

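Back when I was reading APUE, I also wondered why a pointer to an address structure always has to be cast to sockaddr * when it is passed to a socket function. UNP now gives the answer: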

But any socket function that takes one of these pointers as an argument must deal with socket address structures from any of the supported protocol families. A problem arises in how to declare the type of pointer that is passed. With ANSI C, the solution is simple: void * is the generic pointer type. But, the socket functions predate ANSI C and the solution chosen in 1982 was to define a generic socket address structure in the sys/socket.h header.

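For a deeper understanding, UNP considers the role of generic socket address structures from both the application programmer's and the kernel's point of view. The kernel's perspective deserves particular attention: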

  • From an application programmer's point of view, the only use of these generic socket address structures is to cast pointers to protocol-specific structures.
  • From the kernel’s perspective, another reason for using pointers to generic socket address structures as arguments is that the kernel must take the caller’s pointer, cast it to a struct sockaddr *, and then look at the value of sa_family to determine the type of the structure.
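
The kernel-side idea can be imitated in user space. The sketch below is only an illustration, not actual kernel code, and sockaddr_port is a name I made up: it receives a generic pointer, inspects sa_family, and only then casts to the protocol-specific type.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: return the port (host byte order) stored in any
 * Internet socket address, or -1 for families it does not know. */
int sockaddr_port(const struct sockaddr *sa)
{
    switch (sa->sa_family) {          /* decide the real type first */
    case AF_INET:
        return ntohs(((const struct sockaddr_in *)sa)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *)sa)->sin6_port);
    default:
        return -1;                    /* unsupported address family */
    }
}
```

The caller does the opposite cast, for example sockaddr_port((struct sockaddr *)&serv), which is the programmer's half of the same convention.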

2.3 IPv6 Socket Address Structure

struct in6_addr {
    uint8_t s6_addr[16]; /* 128-bit IPv6 address */
    /* network byte ordered */
};
#define SIN6_LEN /* required for compile-time tests */
struct sockaddr_in6 {
    uint8_t sin6_len; /* length of this struct (28) */
    sa_family_t sin6_family; /* AF_INET6 */
    in_port_t sin6_port; /* transport layer port# */
    /* network byte ordered */
    uint32_t sin6_flowinfo; /* flow information, undefined */
    struct in6_addr sin6_addr; /* IPv6 address */
    /* network byte ordered */
    uint32_t sin6_scope_id; /* set of interfaces for a scope */
};
  • The IPv6 family is AF_INET6, whereas the IPv4 family is AF_INET.
  • The sin6_flowinfo member is divided into two fields:
      • The low-order 20 bits are the flow label
      • The high-order 12 bits are reserved
  • The sin6_scope_id identifies the scope zone in which a scoped address is
    meaningful, most commonly an interface index for a link-local address.

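The purpose of the flow label in the second point is still unclear to me, and the third point (sin6_scope_id) is even harder to understand. Both can be set aside for now and revisited in the relevant chapters.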
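
Even without understanding what the flow label is for, the layout itself can be sketched. The helper names below are my own, and the sketch assumes that sin6_flowinfo is held in network byte order, so it is converted with ntohl before the low-order 20 bits are masked out.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *serv from IPv6 text and a host-order port.
 * Returns 0 on success, -1 if `ip` is not valid IPv6 text. */
int fill_ipv6(struct sockaddr_in6 *serv, const char *ip, uint16_t port)
{
    memset(serv, 0, sizeof(*serv));   /* flowinfo and scope_id start at 0 */
    serv->sin6_family = AF_INET6;
    serv->sin6_port = htons(port);    /* network byte order */
    return inet_pton(AF_INET6, ip, &serv->sin6_addr) == 1 ? 0 : -1;
}

/* Assumption: sin6_flowinfo is in network byte order.
 * The flow label is the low-order 20 bits of that 32-bit value. */
uint32_t flow_label(const struct sockaddr_in6 *serv)
{
    return ntohl(serv->sin6_flowinfo) & 0x000FFFFFu;
}
```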

2.4 New Generic Socket Address Structure

struct sockaddr_storage {
    uint8_t ss_len; /* length of this struct (implementation dependent) */
    sa_family_t ss_family; /* address family: AF_xxx value */
    /* implementation-dependent elements to provide:
    * a) alignment sufficient to fulfill the alignment requirements of
    * all socket address types that the system supports.
    * b) enough storage to hold any type of socket address that the
    * system supports.
    */
};

UNP introduces it by contrasting it with the original generic socket address structure (struct sockaddr):

  • If any socket address structures that the system supports have alignment requirements, the sockaddr_storage provides the strictest alignment requirement.
  • The sockaddr_storage is large enough to contain any socket address structure that the system supports.

In short: the strictest alignment, plus enough room to hold any socket address structure. At this stage, though, I think it is enough to remember how to use it:

Note that the fields of the sockaddr_storage structure are opaque to the user, except for ss_family and ss_len (if present). The sockaddr_storage must be cast or copied to the appropriate socket address structure for the address given in ss_family to access any other fields.
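
A minimal sketch of the "cast or copied" rule, using the copy variant (storage_port is a hypothetical name of mine): only ss_family is read directly from the sockaddr_storage, and everything else is accessed after copying into the structure that ss_family names.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: return the port (host byte order) held in *ss,
 * or -1 if the family is not AF_INET or AF_INET6. */
int storage_port(const struct sockaddr_storage *ss)
{
    switch (ss->ss_family) {          /* the one field we may read directly */
    case AF_INET: {
        struct sockaddr_in a4;
        memcpy(&a4, ss, sizeof(a4));  /* copy out, then use the real type */
        return ntohs(a4.sin_port);
    }
    case AF_INET6: {
        struct sockaddr_in6 a6;
        memcpy(&a6, ss, sizeof(a6));
        return ntohs(a6.sin6_port);
    }
    default:
        return -1;
    }
}
```

Copying avoids any question about aliasing; casting the pointer, as the socket functions themselves expect, is the other common choice.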

2.5 Comparison of Socket Address Structures

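The figure below rests on two assumptions: first, every structure contains a length field; second, every other member is exactly the minimum size that POSIX requires.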

(Figure to be added.)

UNP points out two things to note:

  • To handle variable-length structures, whenever we pass a pointer to a socket address structure as an argument to one of the socket functions, we pass its length as another argument.
  • The sockaddr_un structure itself is not variable-length (Figure 15.1), but the amount of information (the pathname within the structure) is variable-length. When passing pointers to these structures, we must be careful how we handle the length field, both the length field in the socket address structure itself and the length to and from the kernel.

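However, the following remark puzzled me, since it seems to contradict the points above. Judging from what follows in the book, the points above should be taken as the primary guidance.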

Had the length field been present with the original release of sockets, there would be no need for the length argument to all the socket functions: the third argument to bind and connect, for example. Instead, the size of the structure could be contained in the length field of the structure.

2.6 Value-Result Arguments

Figure 1:

struct sockaddr_in serv;
/* fill in serv{} */
/* SA is UNP's shorthand for struct sockaddr (defined in unp.h) */
connect(sockfd, (SA *) &serv, sizeof(serv));

Three functions, bind, connect, and sendto, pass a socket address structure from the process to the kernel. One argument to these three functions is the pointer to the socket address structure and another argument is the integer size of the structure. Since the kernel is passed both the pointer and the size of what the pointer points to, it knows exactly how much data to copy from the process into the kernel.

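This corresponds to Section 2.5: functions such as bind, connect, and sendto pass an address structure from process space to kernel space, and they need two arguments to do so: a pointer to the structure and the size of the structure.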

Figure 2:

struct sockaddr_un cli; /* Unix domain */
socklen_t len;
len = sizeof(cli); /* len is a value */
getpeername(unixfd, (SA *) &cli, &len);
/* len may have changed */

Four functions, accept, recvfrom, getsockname, and getpeername, pass a socket address structure from the kernel to the process. Two of the arguments to these four functions are the pointer to the socket address structure along with a pointer to an integer containing the size of the structure.

The third argument to getpeername, &len, is a pointer. The reason is given below; it is actually quite simple, though UNP explains it at some length:

The reason that the size changes from an integer to be a pointer to an integer is because the size is both a value when the function is called (it tells the kernel the size of the structure so that the kernel does not write past the end of the structure when filling it in) and a result when the function returns (it tells the process how much information the kernel actually stored in the structure). This type of argument is called a value-result argument.
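
Both directions can be seen in one small sketch. The helper ephemeral_port is a name I made up; it assumes the process is allowed to create an AF_INET socket. bind receives the size by value (process to kernel), while getsockname receives a pointer to the size (kernel to process) and overwrites it with the amount actually stored.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to an ephemeral port and ask the
 * kernel which port it picked. Returns the port in host byte order, or -1
 * on error. *lenp receives the length that getsockname reported. */
int ephemeral_port(socklen_t *lenp)
{
    struct sockaddr_in addr;
    int port = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);         /* 0 lets the kernel choose a port */

    /* Process to kernel: the size is passed by value. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        *lenp = sizeof(addr);         /* value: how big our buffer is */
        /* Kernel to process: the size is passed by pointer. */
        if (getsockname(fd, (struct sockaddr *)&addr, lenp) == 0)
            port = ntohs(addr.sin_port);  /* *lenp is now the result */
    }
    close(fd);
    return port;
}
```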

2.7 Byte Ordering Functions

(Figure to be added.)

The terms little-endian and big-endian indicate which end of the multibyte value, the little end or the big end, is stored at the starting address of the value.

Note the definitions of MSB and LSB: the most significant bit and the least significant bit, respectively.

#include <netinet/in.h>
uint16_t htons(uint16_t host16bitvalue);
uint32_t htonl(uint32_t host32bitvalue);
Both return: value in network byte order
uint16_t ntohs(uint16_t net16bitvalue);
uint32_t ntohl(uint32_t net32bitvalue);
Both return: value in host byte order

A handy way to remember these function names:

In the names of these functions, h stands for host, n stands for network, s stands for short, and l stands for long.

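When using these functions there is no need to care whether the host actually stores data big-endian or little-endian. Just call ntohs (or ntohl) on values received from the network, and htons (or htonl) on values before sending them.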

When using these functions, we do not care about the actual values (big-endian or little-endian) for the host byte order and the network byte order. What we must do is call the appropriate function to convert a given value between the host and network byte order. On those systems that have the same byte ordering as the Internet protocols (big-endian), these four functions are usually defined as null macros.
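
A small sketch of both ideas, with helper names of my own choosing: host_is_little_endian looks at which end of a 16-bit value sits at the starting address, and port_to_wire shows that after htons the bytes in memory are in network (big-endian) order regardless of the host.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    uint16_t value = 0x0102;
    unsigned char bytes[sizeof(value)];

    memcpy(bytes, &value, sizeof(value));
    return bytes[0] == 0x02;          /* little end stored first? */
}

/* Writes the two bytes of `port` into out[] in the order they would
 * appear on the wire (most significant byte first). */
void port_to_wire(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);       /* host order to network order */

    memcpy(out, &net, sizeof(net));
}
```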

The following is also worth noting. There is no need to dig into it now, but it is good to know in advance:

Another important convention in Internet standards is bit ordering. In many Internet standards, you will see pictures of packets that look similar to the following (this is the first 32 bits of the IPv4 header from RFC 791):

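 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+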

This represents four bytes in the order in which they appear on the wire; the leftmost bit is the most significant. However, the numbering starts with zero assigned to the most significant bit.
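
One way to get a feel for this convention is to pull fields out of those first four bytes by hand. In the RFC 791 layout they hold Version (bits 0-3), IHL (bits 4-7), Type of Service (bits 8-15), and Total Length (bits 16-31). The function names below are my own; the sketch only shows that RFC bit 0 is the most significant bit of the first byte.

```c
#include <stdint.h>

/* hdr points at the first four bytes of an IPv4 header, in wire order. */

/* RFC bits 0-3: the four MOST significant bits of the first byte. */
unsigned ipv4_version(const uint8_t *hdr)
{
    return hdr[0] >> 4;
}

/* RFC bits 4-7: header length, counted in 32-bit words. */
unsigned ipv4_ihl(const uint8_t *hdr)
{
    return hdr[0] & 0x0Fu;
}

/* RFC bits 16-31: total length, most significant byte first. */
unsigned ipv4_total_length(const uint8_t *hdr)
{
    return ((unsigned)hdr[2] << 8) | hdr[3];
}
```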

2.8 Byte Manipulation Functions

#include <strings.h>
void bzero(void *dest, size_t nbytes);
void bcopy(const void *src, void *dest, size_t nbytes);
int bcmp(const void *ptr1, const void *ptr2, size_t nbytes);
Returns: 0 if equal, nonzero if unequal

These functions are used as follows:

bzero sets the specified number of bytes to 0 in the destination. We often use this function to initialize a socket address structure to 0. bcopy moves the specified number of bytes from the source to the destination. bcmp compares two arbitrary byte strings. The return value is zero if the two byte strings are identical; otherwise, it is nonzero.

#include <string.h>
void *memset(void *dest, int c, size_t len);
void *memcpy(void *dest, const void *src, size_t nbytes);
int memcmp(const void *ptr1, const void *ptr2, size_t nbytes);
Returns: 0 if equal, <0 or >0 if unequal (see text)

Likewise, these functions are used as follows:

memset sets the specified number of bytes to the value c in the destination. memcpy is similar to bcopy, but the order of the two pointer arguments is swapped. bcopy correctly handles overlapping fields, while the behavior of memcpy is undefined if the source and destination overlap. The ANSI C memmove function must be used when the fields overlap. memcmp compares two arbitrary byte strings and returns 0 if they are identical. If not identical, the return value is either greater than 0 or less than 0, depending on whether the first unequal byte pointed to by ptr1 is greater than or less than the corresponding byte pointed to by ptr2. The comparison is done assuming the two unequal bytes are unsigned chars.
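
Two of these details are easy to get wrong, so here is a minimal sketch with helper names of my own: shift_right_one copies between overlapping regions and therefore uses memmove rather than memcpy, and first_is_greater relies on memcmp comparing bytes as unsigned chars.

```c
#include <string.h>

/* Shift the first n bytes of buf one position to the right.
 * Source and destination overlap, so memmove is required;
 * buf must have room for n + 1 bytes. */
void shift_right_one(char *buf, size_t n)
{
    memmove(buf + 1, buf, n);
}

/* Returns 1 if the first differing byte in a is greater than the
 * corresponding byte in b, comparing bytes as unsigned chars. */
int first_is_greater(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) > 0;
}
```

With unsigned comparison, the byte 0x80 counts as greater than 0x7F even on systems where plain char is signed.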

3. Open Questions for Later Study

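  • What are routing sockets, and what concepts surround them?
  • What is the purpose and significance of the flow label in IPv6?
  • How should the seemingly contradictory statements in Section 2.5 be reconciled?
  • Finally, how should the bit ordering convention in Internet standards be understood?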