TCP/IP Sockets in C, 2e - Chapter 5: Sending and Receiving Data

Article link: https://blog.csdn.net/basil1728/article/details/105204819

Sending and Receiving Data

0. Introduction
1. Integer Encoding
2. Constructing, Framing, and Parsing Messages
- 2.1 Framing

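0. Introduction

The starting point of network socket programming is simply the need to communicate with a program on a remote host. But as the saying goes, "nothing can be accomplished without norms or standards": communication has to follow agreed rules. Both sides need to know which program sends what information, when it is sent, and how the program behaves once it receives it. In networking such a set of rules is called a protocol, and one implemented by applications is called an application protocol.

The TCP/IP protocol suite does not inspect or modify user data while transporting it [that is, the network itself does not examine or alter it, attackers aside]. This gives applications great flexibility in how they encode information. Most application protocols are defined in terms of discrete messages made up of sequences of fields (i.e., a message format), but on the wire a message is just a continuous sequence of bytes.

When you build a program that communicates over sockets, you have two options:

- Define your own application protocol, which gives you a lot of flexibility;
- Implement an existing application protocol.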

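1. Integer Encoding

1.1 Integer Sizes

The C standard specifies only minimum ranges and a relative ordering for the integer types, not exact sizes: char is always one byte, short is no smaller than char, int is no smaller than short, long is no smaller than int, and long long is no smaller than long. Because **char is always one byte**, sizeof(char) is effectively the unit in which sizeof() measures. How many bits are in a C byte? That is given by CHAR_BIT in the standard header <limits.h>: the standard requires at least 8, and it is exactly 8 on POSIX systems. The sizes of the other integer types vary by implementation.

C99 offers a solution in <stdint.h>: the fixed-width types int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t, uint32_t, and uint64_t. C99 also added the type long long. These types are a great help when two hosts with different architectures both implement C99.

You can use TestSizes.c to check the width of each integer type on your machine and how well your implementation supports C99.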

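1.2 Byte Order

The names big-endian and little-endian come from Gulliver's Travels, where people argue about which end of an egg to crack first. Big-endian stores the most significant byte first, which matches the way people write numbers; little-endian stores the least significant byte first, which is what many CPUs use natively. For example, the 16-bit value 0x1234 is laid out in memory as the bytes 0x34 0x12 on a little-endian machine and as 0x12 0x34 on a big-endian one. To exchange data between different machines, sender and receiver must agree on the same byte order.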
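The difference can be made concrete with a small sketch. The helper names `host_is_little_endian`, `put_u16_be`, and `get_u16_be` are made up for this illustration and do not come from the book; the point is only that writing bytes explicitly with shifts yields the same wire layout on any host.

```c
#include <stdint.h>

/* Returns 1 if this host stores the least significant byte first. */
int host_is_little_endian(void) {
    const uint16_t probe = 0x1234;
    const unsigned char *bytes = (const unsigned char *)&probe;
    return bytes[0] == 0x34;
}

/* Writes a 16-bit value to buf in big-endian order,
 * independent of the host's own byte order. */
void put_u16_be(unsigned char *buf, uint16_t value) {
    buf[0] = (unsigned char)(value >> 8);    /* most significant byte first */
    buf[1] = (unsigned char)(value & 0xFF);  /* least significant byte last */
}

/* Reads a big-endian 16-bit value back from buf. */
uint16_t get_u16_be(const unsigned char *buf) {
    return (uint16_t)((buf[0] << 8) | buf[1]);
}
```

Calling `put_u16_be(buf, 0x1234)` leaves 0x12 in `buf[0]` and 0x34 in `buf[1]` whether or not `host_is_little_endian()` returns 1.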

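Big-endian is also called network byte order. The order a given machine uses internally is called native byte order or host byte order; it is little-endian on many common CPUs, but not on all of them. Four functions convert integers between the two: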

Function	Description
`htons()`	host to network short(16 bits)
`htonl()`	host to network long(32 bits)
`ntohs()`	network to host short(16 bits)
`ntohl()`	network to host long(32 bits)

Note: the addresses and ports used in the Socket API are always in network byte order!
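As a minimal sketch of that rule, the function below fills in an IPv4 `struct sockaddr_in`. The name `make_ipv4_address` is an assumption made for this example; `htons()` and `inet_pton()` are the standard POSIX calls.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Fills addr from a dotted-quad IPv4 string and a port given in host order.
 * Returns 1 on success, 0 if ip is not a valid IPv4 address. */
int make_ipv4_address(struct sockaddr_in *addr, const char *ip, uint16_t port) {
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);   /* the port must be stored in network order */
    /* inet_pton() stores the address in network byte order already. */
    return inet_pton(AF_INET, ip, &addr->sin_addr) == 1;
}
```

Reading the fields back for printing goes the other way: `ntohs(addr.sin_port)` and `ntohl(addr.sin_addr.s_addr)`.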

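1.3 Wrapping TCP Sockets in Streams

One way to encode multibyte integers for transmission over a TCP socket is to use the built-in FILE streams, the same facility behind stdin, stdout, and so on.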

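/*
 * fdopen(): associates a stream with an existing file descriptor
 * 	This is like calling fopen() on a network connection
 * @params:
 * 	socketdes: 	an already-open descriptor (here, a connected socket)
 * 	mode:		one of the following, which must be compatible with the file descriptor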
 *		┌─────────────┬───────────────────────────────┐
 *      │     r       │ O_RDONLY                      │
 *      ├─────────────┼───────────────────────────────┤
 *      │     w       │ O_WRONLY | O_CREAT | O_TRUNC  │
 *      ├─────────────┼───────────────────────────────┤
 *      │     a       │ O_WRONLY | O_CREAT | O_APPEND │
 *      ├─────────────┼───────────────────────────────┤
 *      │     r+      │ O_RDWR                        │
 *      ├─────────────┼───────────────────────────────┤
 *      │     w+      │ O_RDWR | O_CREAT | O_TRUNC    │
 *      ├─────────────┼───────────────────────────────┤
 *      │     a+      │ O_RDWR | O_CREAT | O_APPEND   │
 *      └─────────────┴───────────────────────────────┘
 * @return:
 * 		Upon successful completion fopen(), fdopen() and freopen() return a FILE pointer. 
 * 		Otherwise, NULL is returned and errno is set to indicate the error.
 */
FILE *fdopen(int socketdes, const char *mode);
int fclose(FILE* stream);
int fflush(FILE* stream);

/*
 * fwrite(): writes the given number of objects to the stream
 * fread(): reads the given number of objects from the stream
 */
size_t fwrite(const void* ptr, size_t size, size_t nmemb, FILE* stream);
size_t fread(void *ptr, size_t size, size_t nmemb, FILE* stream);

Rough program structure:

Sender

sock = socket(/* ... */);
/* ... connect socket ... */
// wrap the socket in an output stream
FILE* outstream = fdopen(sock, "w");
// send message, converting each object to network byte order before sending
if (fwrite(&val8, sizeof(val8), 1, outstream) != 1) ...
val16 = htons(val16);
if (fwrite(&val16, sizeof(val16), 1, outstream) != 1) ...
val32 = htonl(val32);
if (fwrite(&val32, sizeof(val32), 1, outstream) != 1) ...
val64 = htonll(val64);  // note: htonll() is not a standard function; not every platform provides it
if (fwrite(&val64, sizeof(val64), 1, outstream) != 1) ...
fflush(outstream);  // immediately flush stream buffer to socket
...					// do other work...
fclose(outstream);	// flushes stream and closes socket

Receiver

/* ... csock is connected ...*/
// wrap the socket in an input stream
FILE *instream = fdopen(csock, "r");
// receive message, converting each received object to host byte order
if (fread(&rcv8, sizeof(rcv8), 1, instream) != 1) ...
if (fread(&rcv16, sizeof(rcv16), 1, instream) != 1) ...
rcv16 = ntohs(rcv16); // convert to host order
if (fread(&rcv32, sizeof(rcv32), 1, instream) != 1) ...
rcv32 = ntohl(rcv32);
if (fread(&rcv64, sizeof(rcv64), 1, instream) != 1) ...
rcv64 = ntohll(rcv64); // note: ntohll() is not a standard function either
...
fclose(instream); // closes the socket connection!
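The 64-bit conversions used in the outline above are not part of POSIX, so they may be missing on a given system. A portable replacement can be sketched with shifts; the names `hton64` and `ntoh64` are hypothetical helpers for this note, not library functions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns a value whose in-memory bytes are the
 * big-endian (network order) encoding of host_value. */
uint64_t hton64(uint64_t host_value) {
    unsigned char bytes[8];
    for (int i = 0; i < 8; i++) {
        /* bytes[0] receives the most significant byte */
        bytes[i] = (unsigned char)(host_value >> (56 - 8 * i));
    }
    uint64_t net_value;
    memcpy(&net_value, bytes, sizeof(net_value));
    return net_value;
}

/* Hypothetical helper: inverse of hton64(). */
uint64_t ntoh64(uint64_t net_value) {
    unsigned char bytes[8];
    memcpy(bytes, &net_value, sizeof(bytes));
    uint64_t host_value = 0;
    for (int i = 0; i < 8; i++) {
        host_value = (host_value << 8) | bytes[i];
    }
    return host_value;
}
```

Because the bytes are produced by shifting rather than by reinterpreting memory, the result does not depend on the byte order of the host.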

1.4 Structure Overlays: Alignment and Padding

Alignment rules:

- Data structures are maximally aligned. That is, the address of any instance of a structure (including one in an array) will be divisible by the size of its largest native integer field.
- Fields whose type is a multibyte integer type are aligned to their size (in bytes). Thus, an int32_t integer field's beginning address is always divisible by four, and a uint16_t integer field's address is guaranteed to be divisible by two.

To satisfy these alignment rules, the C compiler is allowed to insert padding between fields.
[Figure: padding example]
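The effect of padding can be sketched with a made-up structure. `struct padded_msg` and both functions below are assumptions for illustration only; the exact amount of padding depends on the compiler and platform, which is why the second function encodes the fields one at a time instead of sending the structure's raw memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message layout used only for this illustration. */
struct padded_msg {
    uint8_t  flag;   /* offset 0 */
    uint32_t count;  /* typically offset 4: 3 padding bytes precede it */
    uint16_t id;     /* typically offset 8, followed by tail padding */
};

/* Number of padding bytes the compiler added to struct padded_msg. */
size_t padded_msg_padding(void) {
    size_t payload = sizeof(uint8_t) + sizeof(uint32_t) + sizeof(uint16_t);
    return sizeof(struct padded_msg) - payload;
}

/* Encodes the fields one by one so the wire format contains no padding.
 * buf must have room for 7 bytes; multibyte fields are written big-endian.
 * Returns the number of bytes written. */
size_t encode_padded_msg(const struct padded_msg *m, unsigned char *buf) {
    size_t off = 0;
    buf[off++] = m->flag;
    buf[off++] = (unsigned char)(m->count >> 24);
    buf[off++] = (unsigned char)(m->count >> 16);
    buf[off++] = (unsigned char)(m->count >> 8);
    buf[off++] = (unsigned char)(m->count);
    buf[off++] = (unsigned char)(m->id >> 8);
    buf[off++] = (unsigned char)(m->id);
    return off;
}
```

On many platforms `sizeof(struct padded_msg)` is 12 even though the fields occupy only 7 bytes, so two peers compiled with different padding rules would disagree about a raw copy of the structure.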

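1.5 Wide Characters: `wchar_t`

The C standard has no string type; a so-called string is just a sequence of characters. A char is typically encoded in ASCII (American Standard Code for Information Interchange), which defines only 128 characters, and even a full 8-bit char can distinguish at most 256 values. That is enough for English but not for scripts such as Chinese.

Standard C provides the type wchar_t, the wide character type, to hold characters from sets that need more than one byte.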

#include <stdlib.h>

/*
 * wcstombs(): converts the wide-character string pwcs to a multibyte string starting at s. At most n bytes are written to s.
 * @params:
 * 	s		Pointer to an array of char elements at least n bytes long.
 * 	pwcs	The wide-character string to be converted.
 * 	n		The maximum number of bytes to be written to s.
 * @return:
 * 	the number of bytes (not characters) converted and written to s, excluding the terminating null character.
 * 	If a wide character that cannot be converted is encountered, (size_t)-1 is returned.
 */
size_t wcstombs(char *restrict s, const wchar_t *restrict pwcs, size_t n);

/*
 * mbstowcs(): converts the multibyte string pointed to by s into wide characters stored in the array pointed to by pwcs.
 * @params:
 * 	pwcs	Pointer to an array of wchar_t elements with room for at least n wide characters.
 * 	s		The multibyte character string to be converted.
 * 	n		The maximum number of wchar_t elements to be written to pwcs.
 * @return:
 * 	the number of wide characters written, excluding the terminating null character.
 * 	If an invalid multibyte character is encountered, (size_t)-1 is returned.
 */
size_t mbstowcs(wchar_t *restrict pwcs, const char *restrict s, size_t n);

When using wide characters, sender and receiver must agree on how these integer values are encoded as byte sequences.
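One possible agreement, shown here purely as an assumption and not as the book's scheme, is to send every wide character as a fixed 4-byte big-endian integer. The function name `encode_wide_be32` is hypothetical, and the sketch treats each wchar_t as one code unit, so it does nothing special on platforms where wchar_t is only 16 bits wide.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Encodes the wide string ws (without its terminating null) into buf,
 * using 4 big-endian bytes per wide character.
 * Returns the number of bytes written, or (size_t)-1 if buf is too small. */
size_t encode_wide_be32(const wchar_t *ws, unsigned char *buf, size_t buf_size) {
    size_t len = wcslen(ws);
    if (len > buf_size / 4) {
        return (size_t)-1;
    }
    for (size_t i = 0; i < len; i++) {
        uint32_t code = (uint32_t)ws[i];
        buf[4 * i]     = (unsigned char)(code >> 24);
        buf[4 * i + 1] = (unsigned char)(code >> 16);
        buf[4 * i + 2] = (unsigned char)(code >> 8);
        buf[4 * i + 3] = (unsigned char)(code);
    }
    return len * 4;
}
```

A fixed width wastes space for mostly-ASCII text, but it keeps decoding trivial; a multibyte encoding produced by wcstombs() is the more compact alternative when both sides agree on the locale.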

1.6 Bit Manipulation

const int BIT5 = 1 << 5;
const int BIT7 = 0x80;
const int BITS2AND3 = 12; // 8 + 4
int bitmap = 128;

Setting specific bits (to 1)

// bit 5 is now one
bitmap |= BIT5;

Clearing specific bits (to 0)

// bit 7 is now zero
bitmap &= ~BIT7;

Setting or clearing several bits at once

// clear bits 2, 3 and 5
bitmap &= ~(BITS2AND3 | BIT5);

Testing whether a specific bit is set

/* As in the book; note that bool requires <stdbool.h> (C99 and later) */
bool bit6Set = (bitmap & (1<<6)) != 0;
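The fragments above can be collected into a compilable sketch. The mask values are the ones from the text; the function names `set_bits`, `clear_bits`, and `any_bit_set` are assumptions introduced for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masks taken from the text above. */
enum { BIT5 = 1 << 5, BIT7 = 0x80, BITS2AND3 = 12 /* 8 + 4 */ };

/* Returns bitmap with every bit in mask set to 1. */
uint32_t set_bits(uint32_t bitmap, uint32_t mask) {
    return bitmap | mask;
}

/* Returns bitmap with every bit in mask cleared to 0. */
uint32_t clear_bits(uint32_t bitmap, uint32_t mask) {
    return bitmap & ~mask;
}

/* Returns true if at least one bit in mask is set in bitmap. */
bool any_bit_set(uint32_t bitmap, uint32_t mask) {
    return (bitmap & mask) != 0;
}
```

Starting from `bitmap = 128`, setting BIT5 gives 160, clearing BIT7 then gives 32, and clearing `BITS2AND3 | BIT5` leaves 0.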

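2. Constructing, Framing, and Parsing Messages

This part introduces an example program, a voting application. It has two functions:

- Query the vote total for a candidate ID;
- Cast a vote for a candidate ID and return the updated vote total.

[Figure: voting protocol diagram]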

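2.1 Framing

Framing lets the receiver determine the boundaries of a message, that is, know whether it has received one complete message.

UDP does not have this problem, because it preserves message boundaries: each datagram is sent and received as a single message (an oversized datagram may be fragmented at the IP layer, but it is reassembled before delivery). TCP, by contrast, is a byte stream with no notion of message boundaries, so the application itself must do the framing [this is the so-called TCP "sticky packet" and split-packet problem].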
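Since TCP delivers a byte stream, a single read may return fewer bytes than the receiver asked for. A common way to cope, sketched here with a hypothetical helper named `read_fully`, is to loop until the requested count has arrived. The sketch uses read(), which works on connected sockets as well as on pipes.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads exactly len bytes from fd unless EOF or an error occurs first.
 * Returns the number of bytes read (len on success, less on early EOF),
 * or -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t len) {
    unsigned char *p = buf;
    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, p + total, len - total);
        if (n == 0) {
            break;                  /* peer closed the connection */
        }
        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted by a signal: retry */
            }
            return -1;
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

This only solves the short-read half of the problem; the receiver still needs a framing rule to know what value of `len` to ask for.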

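If the receiver tries to read more data than one message contains, there are two possible outcomes:

- If the socket has no more data, the receiver blocks and never processes the message it has already received; if the sender is also blocked waiting for a reply, the result is deadlock;
- If data from another message is already in the channel, the receiver reads part or all of it as a continuation of the previous message, which causes an error.

There are two ways to let the receiver find the boundary of a message:

- Delimiter-based: the end of the message is marked by a special marker. This is common for text messages. The drawback is that the marker's byte sequence must not appear in the message content.
- Explicit length: a length field gives the number of bytes in the message. This is simple, but it requires a known upper bound on message size.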
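Both approaches can be sketched on in-memory buffers. The 2-byte big-endian length prefix and the function names below are assumptions chosen for this example; they are not taken from the book's Framer code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length-based framing: writes a 2-byte big-endian length, then the payload.
 * Returns the frame size, or 0 if the message or output buffer does not fit. */
size_t frame_with_length(const void *msg, size_t msg_len,
                         unsigned char *out, size_t out_size) {
    if (msg_len > UINT16_MAX || out_size < msg_len + 2) {
        return 0;
    }
    out[0] = (unsigned char)(msg_len >> 8);
    out[1] = (unsigned char)(msg_len & 0xFF);
    memcpy(out + 2, msg, msg_len);
    return msg_len + 2;
}

/* Looks for one complete length-prefixed frame at the start of in.
 * Returns the frame size and sets *payload and *payload_len,
 * or returns 0 if more bytes are needed. */
size_t next_length_frame(const unsigned char *in, size_t in_len,
                         const unsigned char **payload, size_t *payload_len) {
    if (in_len < 2) {
        return 0;                       /* length field incomplete */
    }
    size_t len = ((size_t)in[0] << 8) | in[1];
    if (in_len < len + 2) {
        return 0;                       /* payload incomplete */
    }
    *payload = in + 2;
    *payload_len = len;
    return len + 2;
}

/* Delimiter-based framing: returns the length of the message that precedes
 * the first delim byte, or -1 if no delimiter has arrived yet. */
long find_delimited(const unsigned char *in, size_t in_len, unsigned char delim) {
    const unsigned char *end = memchr(in, delim, in_len);
    return end ? (long)(end - in) : -1;
}
```

The 16-bit length field is where the upper bound mentioned above comes from: with this layout a single message cannot exceed 65535 bytes.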

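The voting protocol likewise has two encodings:

- Text: a so-called magic string ("Voting") marks the message as belonging to the Vote protocol;
- Binary: individual bits carry specific meanings, much like the protocols in common use.

Code used by the voting program:

- The data structures are defined in VoteProtocol.h;
- The framing functions are declared in Framer.h and implemented in LengthFramer.c and DelimFramer.c;
- The encoding and decoding functions are declared in VoteEncoding.h and implemented in VoteEncodingBin.c and VoteEncodingText.c;
- The socket programs are VoteClientTCP.c and VoteServerTCP.c.
- Other programs: AddressUtility, DieWithMessage.c, TCPClientUtility.c, TCPServerUtility.c.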