UNIX Network Programming: The Sockets Networking API (3rd ed., Vol. 1) reading notes (2) [Chapter 2]

vingstar

Article link: https://blog.csdn.net/vingstar/article/details/10827957

Chapter 2 The Transport Layer: TCP, UDP, and SCTP

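(This chapter reviews the fundamentals, which will be familiar if you have studied computer networking, and establishes the basic terminology and definitions that later parts of the book rely on.)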

2.1 Intro

Our goal is to provide enough detail from a network programming perspective to understand how to use the protocols and provide references to more detailed descriptions of their actual design, implementation, and history.

This chapter focuses on the transport layer: TCP, UDP, and Stream Control Transmission Protocol (SCTP). Most client/server applications use either TCP or UDP. SCTP is a newer protocol, originally designed for transport of telephony signaling across the Internet.

2.2 The Big Picture

This section gives a fairly comprehensive overview of the various network protocols.

IPv4	Internet Protocol version 4. IPv4, which we often denote as just IP, has been the workhorse protocol of the IP suite since the early 1980s. It uses 32-bit addresses (Section A.4). IPv4 provides packet delivery service for TCP, UDP, SCTP, ICMP, and IGMP.
IPv6	Internet Protocol version 6. IPv6 was designed in the mid-1990s as a replacement for IPv4. The major change is a larger address comprising 128 bits (Section A.5), to deal with the explosive growth of the Internet in the 1990s. IPv6 provides packet delivery service for TCP, UDP, SCTP, and ICMPv6. We often use the word "IP" as an adjective, as in IP layer and IP address, when the distinction between IPv4 and IPv6 is not needed.
TCP	Transmission Control Protocol. TCP is a connection-oriented protocol that provides a reliable, full-duplex byte stream to its users. TCP sockets are an example of stream sockets. TCP takes care of details such as acknowledgments, timeouts, retransmissions, and the like. Most Internet application programs use TCP. Notice that TCP can use either IPv4 or IPv6.
UDP	User Datagram Protocol. UDP is a connectionless protocol, and UDP sockets are an example of datagram sockets. There is no guarantee that UDP datagrams ever reach their intended destination. As with TCP, UDP can use either IPv4 or IPv6.
SCTP	Stream Control Transmission Protocol. SCTP is a connection-oriented protocol that provides a reliable full-duplex association. The word "association" is used when referring to a connection in SCTP because SCTP is multihomed, involving a set of IP addresses and a single port for each side of an association. SCTP provides a message service, which maintains record boundaries. As with TCP and UDP, SCTP can use either IPv4 or IPv6, but it can also use both IPv4 and IPv6 simultaneously on the same association.
ICMP	Internet Control Message Protocol. ICMP handles error and control information between routers and hosts. These messages are normally generated by and processed by the TCP/IP networking software itself, not user processes, although we show the `ping` and `traceroute` programs, which use ICMP. We sometimes refer to this protocol as ICMPv4 to distinguish it from ICMPv6.
IGMP	Internet Group Management Protocol. IGMP is used with multicasting (Chapter 21), which is optional with IPv4.
ARP	Address Resolution Protocol. ARP maps an IPv4 address into a hardware address (such as an Ethernet address). ARP is normally used on broadcast networks such as Ethernet, token ring, and FDDI, and is not needed on point-to-point networks.
RARP	Reverse Address Resolution Protocol. RARP maps a hardware address into an IPv4 address. It is sometimes used when a diskless node is booting.
ICMPv6	Internet Control Message Protocol version 6. ICMPv6 combines the functionality of ICMPv4, IGMP, and ARP.
BPF	BSD packet filter. This interface provides access to the datalink layer. It is normally found on Berkeley-derived kernels.
DLPI	Datalink provider interface. This interface also provides access to the datalink layer. It is normally provided with SVR4.

2.3 User Datagram Protocol (UDP)

UDP is a simple transport-layer protocol. It is described in RFC 768 [Postel 1980]. The application writes a message to a UDP socket, which is then encapsulated in a UDP datagram, which is then further encapsulated as an IP datagram, which is then sent to its destination. There is no guarantee that a UDP datagram will ever reach its final destination, that order will be preserved across the network, or that datagrams arrive only once.

The problem that we encounter with network programming using UDP is its lack of reliability. If a datagram reaches its final destination but the checksum detects an error, or if the datagram is dropped in the network, it is not delivered to the UDP socket and is not automatically retransmitted.

Each UDP datagram has a length. The length of a datagram is passed to the receiving application along with the data.

Because UDP is connectionless, its sockets behave differently from TCP sockets: a socket is not tied to one fixed peer once it is set up, and the same socket can send UDP datagrams to different hosts. We also say that UDP provides a connectionless service, as there need not be any long-term relationship between a UDP client and server. For example, a UDP client can create a socket and send a datagram to a given server and then immediately send another datagram on the same socket to a different server. Similarly, a UDP server can receive several datagrams on a single UDP socket, each from a different client.
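The connectionless behavior can be tried out on the loopback interface. The following is a minimal sketch, not code from the book: the function name `udp_one_socket_two_servers` and the two throwaway "servers" are made up for illustration, and it assumes loopback UDP delivery works on the test machine (UDP itself gives no such guarantee).

```python
import socket


def udp_one_socket_two_servers():
    """Send from ONE UDP socket to two different servers on loopback."""
    # Two hypothetical "servers": plain UDP sockets bound to kernel-chosen ports.
    servers = []
    for _ in range(2):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
        s.settimeout(2.0)          # do not hang forever if a datagram is lost
        servers.append(s)

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No connect() is needed: every sendto() names its own destination.
        client.sendto(b"hello", servers[0].getsockname())
        client.sendto(b"goodbye!", servers[1].getsockname())

        results = []
        for s in servers:
            data, peer = s.recvfrom(2048)
            # A datagram arrives whole, so len(data) is the length the sender wrote.
            results.append((data, len(data), peer))
        return results
    finally:
        client.close()
        for s in servers:
            s.close()
```

Both servers see the same source port, because both datagrams left through the same client socket, and each receives its datagram with the original length intact.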

(Nevertheless, it is important to understand that many applications are built using UDP because the application exchanges small amounts of data and UDP avoids the overhead of TCP connection establishment and connection termination.)

2.4 Transmission Control Protocol (TCP)

First, TCP provides connections between clients and servers. A TCP client establishes a connection with a given server, exchanges data with that server across the connection, and then terminates the connection.

TCP also provides reliability. When TCP sends data to the other end, it requires an acknowledgment in return. If an acknowledgment is not received, TCP automatically retransmits the data and waits a longer amount of time. After some number of retransmissions, TCP will give up, with the total amount of time spent trying to send data typically between 4 and 10 minutes (depending on the implementation). More precisely, it provides reliable delivery of data or reliable notification of failure.

TCP contains algorithms to estimate the round-trip time (RTT) between a client and server dynamically so that it knows how long to wait for an acknowledgment.

TCP also sequences the data by associating a sequence number with every byte that it sends. For example, assume an application writes 2,048 bytes to a TCP socket, causing TCP to send two segments, the first containing the data with sequence numbers 1–1,024 and the second containing the data with sequence numbers 1,025–2,048. (A segment is the unit of data that TCP passes to IP.) If the segments arrive out of order, the receiving TCP will reorder the two segments based on their sequence numbers before passing the data to the receiving application. If TCP receives duplicate data from its peer (say the peer thought a segment was lost and retransmitted it, when it wasn't really lost, the network was just overloaded), it can detect that the data has been duplicated (from the sequence numbers), and discard the duplicate data.
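The reordering and duplicate detection can be illustrated with a small simulation. This is only a sketch of the idea, not how any real TCP is implemented: the function `reassemble` is hypothetical, sequence numbers start at 1 as in the example above, and segments are assumed never to overlap partially.

```python
def reassemble(segments):
    """Rebuild the byte stream from (first_seq, data) segments.

    Segments may arrive out of order or more than once.
    """
    stream = bytearray()
    next_seq = 1        # sequence number of the next byte the receiver expects
    pending = {}        # out-of-order segments, keyed by their first sequence number
    for first_seq, data in segments:
        if first_seq + len(data) <= next_seq:
            continue    # duplicate: every byte was already passed to the application
        pending[first_seq] = data
        # Hand over everything that has now become contiguous.
        while next_seq in pending:
            chunk = pending.pop(next_seq)
            stream += chunk
            next_seq += len(chunk)
    return bytes(stream)
```

Feeding it the second segment first, followed by the first segment and then retransmitted copies of both, still yields the original 2,048 bytes in order.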

TCP provides flow control. TCP always tells its peer exactly how many bytes of data it is willing to accept from the peer at any one time. This is called the advertised window. At any time, the window is the amount of room currently available in the receive buffer, guaranteeing that the sender cannot overflow the receive buffer. The window changes dynamically over time: as data is received from the sender, the window size decreases, but as the receiving application reads data from the buffer, the window size increases. It is possible for the window to reach 0: when TCP's receive buffer for a socket is full and it must wait for the application to read data from the buffer before it can take any more data from the peer.
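The bookkeeping behind the advertised window is simple arithmetic: window = buffer capacity minus bytes still queued. A toy model, with a hypothetical `ReceiveBuffer` class and an arbitrary 4,096-byte capacity chosen only for illustration:

```python
class ReceiveBuffer:
    """Toy model of a TCP receive buffer and its advertised window."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queued = 0      # bytes received but not yet read by the application

    @property
    def window(self):
        # The advertised window is the room currently left in the buffer.
        return self.capacity - self.queued

    def receive(self, nbytes):
        """Data arrives from the peer; never accept more than the window."""
        accepted = min(nbytes, self.window)
        self.queued += accepted
        return accepted

    def app_read(self, nbytes):
        """The application reads data, which reopens the window."""
        taken = min(nbytes, self.queued)
        self.queued -= taken
        return taken
```

With a 4,096-byte buffer, receiving 3,000 bytes shrinks the window to 1,096; a further 2,000-byte burst is only accepted up to 1,096 bytes and closes the window to 0 until the application reads.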

Finally, a TCP connection is full-duplex. This means that an application can send and receive data in both directions on a given connection at any time. This means that TCP must keep track of state information such as sequence numbers and window sizes for each direction of data flow: sending and receiving. After a full-duplex connection is established, it can be turned into a simplex connection if desired.

*2.5 SCTP

SCTP provides associations between clients and servers. SCTP also provides applications with reliability, sequencing, flow control, and full-duplex data transfer, like TCP. The word "association" is used in SCTP instead of "connection" to avoid the connotation that a connection involves communication between only two IP addresses. An association refers to a communication between two systems, which may involve more than two addresses due to multihoming.

SCTP is message-oriented. It provides sequenced delivery of individual records. Like UDP, the length of a record written by the sender is passed to the receiving application.

SCTP can provide multiple streams between connection endpoints, each with its own reliable sequenced delivery of messages. A lost message in one of these streams does not block delivery of messages in any of the other streams. This approach is in contrast to TCP, where a loss at any point in the single stream of bytes blocks delivery of all future data on the connection until the loss is repaired.
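The difference between per-stream ordering and a single ordered byte stream can be shown with a simulation. This is a sketch of the ordering rule only and does not use a real SCTP socket: the function `deliverable` and the message tuples are invented for illustration, and per-stream sequence numbers are assumed to start at 0.

```python
from collections import defaultdict


def deliverable(arrived):
    """Return, per stream, the messages that can be delivered in order.

    arrived: iterable of (stream_id, stream_seq, payload) tuples.
    """
    by_stream = defaultdict(dict)
    for stream_id, seq, payload in arrived:
        by_stream[stream_id][seq] = payload

    result = {}
    for stream_id, messages in by_stream.items():
        delivered = []
        seq = 0
        # Deliver in sequence until the first gap (a lost message) in THIS stream.
        while seq in messages:
            delivered.append(messages[seq])
            seq += 1
        result[stream_id] = delivered
    return result
```

If message 1 of stream 0 is lost, stream 1 is still delivered in full. Putting the same four messages on one stream, which is roughly the TCP situation, leaves everything after the gap waiting.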

SCTP also provides a multihoming feature, which allows a single SCTP endpoint to support multiple IP addresses. This feature can provide increased robustness against network failure. An endpoint can have multiple redundant network connections, where each of these networks has a different connection to the Internet infrastructure. SCTP can work around a failure of one network or path across the Internet by switching to another address already associated with the SCTP association.

2.6 TCP connection establishment and termination

Three-Way Handshake

The following scenario occurs when a TCP connection is established:

(1) The server must be prepared to accept an incoming connection. This is normally done by calling socket, bind, and listen and is called a passive open.
(2) The client issues an active open by calling connect. This causes the client TCP to send a "synchronize" (SYN) segment, which tells the server the client's initial sequence number for the data that the client will send on the connection. Normally, there is no data sent with the SYN; it just contains an IP header, a TCP header, and possible TCP options (which we will talk about shortly).
(3) The server must acknowledge (ACK) the client's SYN and the server must also send its own SYN containing the initial sequence number for the data that the server will send on the connection. The server sends its SYN and the ACK of the client's SYN in a single segment.
(4) The client must acknowledge the server's SYN.

The author uses a very apt telephone analogy: The socket function is the equivalent of having a telephone to use. bind is telling other people your telephone number so that they can call you. listen is turning on the ringer so that you will hear when an incoming call arrives. connect requires that we know the other person's phone number and dial it. accept is when the person being called answers the phone. Having the client's identity returned by accept (where the identity is the client's IP address and port number) is similar to having the caller ID feature show the caller's phone number. One difference, however, is that accept returns the client's identity only after the connection has been established, whereas the caller ID feature shows the caller's phone number before we choose whether to answer the phone or not.
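The calls named in the analogy map directly onto the sockets API. A minimal sketch using Python's `socket` module on the loopback interface (the function name `handshake_demo` is made up; the book's own examples are written in C):

```python
import socket


def handshake_demo():
    """Passive open on one socket, active open on another, then accept."""
    # Passive open: socket, bind, listen ("get a phone, publish the number,
    # turn on the ringer").
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(2.0)
    client.settimeout(2.0)
    try:
        listener.bind(("127.0.0.1", 0))   # port 0: the kernel picks a free port
        listener.listen(5)

        # Active open: connect() returns once the three-way handshake is done.
        client.connect(listener.getsockname())

        # accept() hands back the connected socket and the client's identity
        # (IP address and port), like caller ID after answering.
        conn, peer = listener.accept()
        conn.close()
        return peer, client.getsockname()
    finally:
        client.close()
        listener.close()
```

The address returned by accept is the same IP address and port that the client sees as its own local address.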

*TCP Options

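During connection establishment the two ends can also exchange options, such as the MSS option, with which each end announces the maximum segment size it is willing to accept, the window scale option, and the timestamp option.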

TCP connection termination

While it takes three segments to establish a connection, it takes four to terminate a connection.

(1) One application calls close first, and we say that this end performs the active close. This end's TCP sends a FIN segment, which means it is finished sending data.
(2) The other end that receives the FIN performs the passive close. The received FIN is acknowledged by TCP. The receipt of the FIN is also passed to the application as an end-of-file (after any data that may have already been queued for the application to receive), since the receipt of the FIN means the application will not receive any additional data on the connection.
(3) Sometime later, the application that received the end-of-file will close its socket. This causes its TCP to send a FIN.
(4) The TCP on the system that receives this final FIN (the end that did the active close) acknowledges the FIN.

Since a FIN and an ACK are required in each direction, four segments are normally required. We use the qualifier "normally" because in some scenarios, the FIN in Step 1 is sent with data. Also, the segments in Steps 2 and 3 are both from the end performing the passive close and could be combined into one segment.
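How a FIN shows up in the application as end-of-file can be observed on the loopback interface. A rough sketch with made-up names (`half_close_demo`, `_read_until_eof`); note that it uses `shutdown(SHUT_WR)` rather than close for the first FIN, an assumption made here so that the client can still read the reply travelling in the other direction.

```python
import socket


def _read_until_eof(sock):
    """Read until recv() returns b"", which is how a received FIN appears."""
    data = b""
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return data
        data += chunk


def half_close_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(2.0)
    client.settimeout(2.0)
    server = None
    try:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        client.connect(listener.getsockname())
        server, _ = listener.accept()
        server.settimeout(2.0)

        client.sendall(b"request")
        client.shutdown(socket.SHUT_WR)        # step 1: the client's TCP sends a FIN

        received = _read_until_eof(server)     # step 2: queued data first, then EOF
        server.sendall(b"reply")               # the other direction is still open
        server.close()                         # step 3: the server's TCP sends a FIN
        reply = _read_until_eof(client)        # step 4: the client sees EOF as well
        return received, reply
    finally:
        if server is not None:
            server.close()
        client.close()
        listener.close()
```

The server reads the queued request before it sees end-of-file, matching the remark in step 2 that the end-of-file is delivered after any data already queued.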

TCP State Transition Diagram

A TCP connection can be in one of 11 different states.
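As a stand-in for the diagram, the 11 states and the transitions taken during a normal open and close can be written as a lookup table. This is a partial sketch: the event names such as `recv_syn_ack` are invented labels, and less common transitions (for example simultaneous open, or resets) are deliberately left out.

```python
from enum import Enum

TcpState = Enum(
    "TcpState",
    "CLOSED LISTEN SYN_SENT SYN_RCVD ESTABLISHED FIN_WAIT_1 FIN_WAIT_2 "
    "CLOSE_WAIT CLOSING LAST_ACK TIME_WAIT",
)
S = TcpState

# (current state, event) -> next state; only a subset of the full diagram.
TRANSITIONS = {
    (S.CLOSED, "active_open"): S.SYN_SENT,        # client calls connect, sends SYN
    (S.SYN_SENT, "recv_syn_ack"): S.ESTABLISHED,  # client sends the final ACK
    (S.CLOSED, "passive_open"): S.LISTEN,         # server calls listen
    (S.LISTEN, "recv_syn"): S.SYN_RCVD,           # server sends SYN plus ACK
    (S.SYN_RCVD, "recv_ack"): S.ESTABLISHED,
    (S.ESTABLISHED, "close"): S.FIN_WAIT_1,       # active close, sends FIN
    (S.FIN_WAIT_1, "recv_ack"): S.FIN_WAIT_2,
    (S.FIN_WAIT_1, "recv_fin"): S.CLOSING,        # simultaneous close
    (S.CLOSING, "recv_ack"): S.TIME_WAIT,
    (S.FIN_WAIT_2, "recv_fin"): S.TIME_WAIT,
    (S.TIME_WAIT, "timeout_2msl"): S.CLOSED,
    (S.ESTABLISHED, "recv_fin"): S.CLOSE_WAIT,    # passive close
    (S.CLOSE_WAIT, "close"): S.LAST_ACK,          # sends FIN
    (S.LAST_ACK, "recv_ack"): S.CLOSED,
}


def run(events, state=S.CLOSED):
    """Follow a list of events and return every state visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path
```

For example, `run(["active_open", "recv_syn_ack", "close", "recv_ack", "recv_fin", "timeout_2msl"])` traces the end that performs the active close through FIN_WAIT_1, FIN_WAIT_2, and TIME_WAIT back to CLOSED.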

Watching the Packets

The figure shows the complete packet exchange for a client/server TCP connection.

2.7 TIME_WAIT state

The duration that this endpoint remains in this state is twice the maximum segment lifetime (MSL), sometimes called 2MSL.

The recommended value in RFC 1122 [Braden 1989] is 2 minutes, although Berkeley-derived implementations have traditionally used a value of 30 seconds instead.

The MSL is the maximum amount of time that any given IP datagram can live in a network. We know this time is bounded because every datagram contains an 8-bit hop limit with a maximum value of 255. Although this is a hop limit and not a true time limit, the assumption is made that a packet with the maximum hop limit of 255 cannot exist in a network for more than MSL seconds.

This matters when routing changes or similar events in the network produce wandering duplicate packets, which TCP needs to let die out.
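With the two MSL values quoted above, the length of the TIME_WAIT state follows directly. A tiny sketch of the arithmetic (the function name is made up):

```python
def time_wait_seconds(msl_seconds):
    """TIME_WAIT lasts twice the maximum segment lifetime (2MSL)."""
    return 2 * msl_seconds


# MSL values mentioned in the text: 30 seconds (traditional Berkeley-derived
# implementations) and 2 minutes (RFC 1122 recommendation).
BERKELEY_MSL = 30
RFC1122_MSL = 2 * 60
```

So an endpoint stays in TIME_WAIT for somewhere between 60 seconds and 240 seconds, depending on which MSL the implementation uses.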

There are two reasons for the TIME_WAIT state:

To implement TCP's full-duplex connection termination reliably
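To allow old duplicate segments to expire in the network (put simply, any data left over in the network from a connection that has already ended is flushed out, so it cannot interfere with the next connection that uses the same addresses and ports)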

*2.8 SCTP Association Establishment and Termination

Establishing an association requires a four-way handshake.

Unlike TCP, SCTP does not permit a "half-closed" association. When one end shuts down an association, the other end must stop sending new data. The receiver of the shutdown request sends the data that was queued, if any, and then completes the shutdown.

2.9 Port Numbers

The port number is another important abstraction; it defines, so to speak, the different "docks" at which data is loaded and unloaded. At any given time, multiple processes can be using any given transport: UDP, SCTP, or TCP. All three transport layers use 16-bit integer port numbers to differentiate between these processes.

When a client wants to contact a server, the client must identify the server with which it wants to communicate. TCP, UDP, and SCTP define a group of well-known ports to identify well-known services.

Clients, on the other hand, normally use ephemeral ports, that is, short-lived ports. These port numbers are normally assigned automatically by the transport protocol to the client. Clients normally do not care about the value of the ephemeral port; the client just needs to be certain that the ephemeral port is unique on the client host. The transport protocol code guarantees this uniqueness.

The port numbers are divided into three ranges:

(1) The well-known ports: 0 through 1023. These port numbers are controlled and assigned by the IANA. When possible, the same port is assigned to a given service for TCP, UDP, and SCTP.

(2) The registered ports: 1024 through 49151. These are not controlled by the IANA, but the IANA registers and lists the uses of these ports as a convenience to the community. When possible, the same port is assigned to a given service for both TCP and UDP.

(3) The dynamic or private ports: 49152 through 65535. The IANA says nothing about these ports. These are what we call ephemeral ports. (The magic number 49152 is three-fourths of 65536.)
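The three ranges can be captured in a small helper. A sketch only: `port_range` is a made-up name, and the boundaries are exactly the ones listed above.

```python
def port_range(port):
    """Classify a 16-bit port number into one of the three IANA ranges."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit: 0 through 65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"
```

For instance, port 21 (used by the FTP server in the next section) is classified as well-known, while a client port such as 1500 lies in the registered range even though it is used as an ephemeral port in the book's example.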

Socket pair

The socket pair for a TCP connection is the four-tuple that defines the two endpoints of the connection: the local IP address, local port, foreign IP address, and foreign port. A socket pair uniquely identifies every TCP connection on a network. For SCTP, an association is identified by a set of local IP addresses, a local port, a set of foreign IP addresses, and a foreign port.

The two values that identify each endpoint, an IP address and a port number, are often called a socket.

bind lets the application specify the local IP address and local port for TCP, UDP, and SCTP sockets.

2.10 TCP port numbers and Concurrent servers

When we specify the local IP address as an asterisk, it is called the wildcard character. If the host on which the server is running is multihomed (as in this example), the server can specify that it wants only to accept incoming connections that arrive destined to one specific local interface. This is a one-or-any choice for the server. The server cannot specify a list of multiple addresses. The wildcard local address is the "any" choice. The wildcard address was specified by setting the IP address in the socket address structure to INADDR_ANY before calling bind.

Notice that the connected socket uses the same local port (21) as the listening socket. Also notice that on the multihomed server, the local address is filled in for the connected socket (12.106.32.254) once the connection is established.
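Both observations can be reproduced on a single machine. A sketch under assumptions: it uses an arbitrary kernel-chosen port instead of 21 and the loopback address instead of 12.106.32.254, and `wildcard_demo` is a made-up name. In Python, binding to the empty string `""` is the documented way to request INADDR_ANY.

```python
import socket


def wildcard_demo():
    """Bind to the wildcard address, then inspect the connected socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(2.0)
    client.settimeout(2.0)
    try:
        listener.bind(("", 0))                 # "" means INADDR_ANY; port 0: any free port
        listener.listen(1)
        listen_addr = listener.getsockname()   # local address is still the wildcard

        client.connect(("127.0.0.1", listen_addr[1]))
        conn, _ = listener.accept()
        try:
            # Same local port as the listening socket, but the local address
            # is now filled in with the address the client connected to.
            return listen_addr, conn.getsockname()
        finally:
            conn.close()
    finally:
        client.close()
        listener.close()
```

The listening socket reports `0.0.0.0` as its local address, while the connected socket reports `127.0.0.1` with the same port number.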

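Note that a specific connection is identified by its complete socket pair; incoming segments cannot be assigned to one endpoint merely because they share the same destination address and port. Consider a browser: several processes on one machine may ask the same server for the same kind of service (the same port) at the same time, yet the requests themselves differ. The two must clearly be told apart, and the server has to handle them as two separate requests: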

Notice from this example that TCP cannot demultiplex incoming segments by looking at just the destination port number. TCP must look at all four elements in the socket pair to determine which endpoint receives an arriving segment.

In Figure 2.14, we have three sockets with the same local port (21). If a segment arrives from 206.168.112.219 port 1500 destined for 12.106.32.254 port 21, it is delivered to the first child. If a segment arrives from 206.168.112.219 port 1501 destined for 12.106.32.254 port 21, it is delivered to the second child. All other TCP segments destined for port 21 are delivered to the original server with the listening socket.
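The lookup described here can be modelled as a dictionary keyed by the full socket pair. This is a simplified sketch, not kernel code: `SOCKETS`, `WILDCARD`, and `demultiplex` are invented names, and the fallback rule only models a listening socket bound to the wildcard address.

```python
WILDCARD = "*"

# (local IP, local port, foreign IP, foreign port) -> receiving endpoint
SOCKETS = {
    ("12.106.32.254", 21, "206.168.112.219", 1500): "first child",
    ("12.106.32.254", 21, "206.168.112.219", 1501): "second child",
    (WILDCARD, 21, WILDCARD, WILDCARD): "listening server",
}


def demultiplex(local_ip, local_port, foreign_ip, foreign_port):
    """Pick the endpoint for an arriving segment by looking at all four values."""
    exact = (local_ip, local_port, foreign_ip, foreign_port)
    if exact in SOCKETS:
        return SOCKETS[exact]
    # No established connection matches: fall back to a listening socket
    # on that port, if there is one.
    return SOCKETS.get((WILDCARD, local_port, WILDCARD, WILDCARD))
```

Two segments with the same destination address and port still reach different children because their source ports differ, and a segment from any other client falls through to the listening socket.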

2.11 Buffer sizes and limitations

Many networks have an MTU which can be dictated by the hardware. For example, the Ethernet MTU is 1,500 bytes. Other datalinks, such as point-to-point links using the Point-to-Point Protocol (PPP), have a configurable MTU.

The smallest MTU in the path between two hosts is called the path MTU. Today, the Ethernet MTU of 1,500 bytes is often the path MTU.

When an IP datagram is to be sent out an interface, if the size of the datagram exceeds the link MTU, fragmentation is performed by both IPv4 and IPv6. The fragments are not normally reassembled until they reach the final destination. IPv4 hosts perform fragmentation on datagrams that they generate and IPv4 routers perform fragmentation on datagrams that they forward. But with IPv6, only hosts perform fragmentation on datagrams that they generate; IPv6 routers do not fragment datagrams that they are forwarding.
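To get a feel for the numbers, the following sketch computes how an IPv4 datagram would be split for a given MTU. It assumes a 20-byte IPv4 header without options and relies on the IPv4 rule that every fragment except the last carries a multiple of 8 data bytes; the function name `ipv4_fragments` is made up.

```python
def ipv4_fragments(datagram_len, mtu, header_len=20):
    """Return (data_offset, data_len) for each fragment of an IPv4 datagram.

    datagram_len is the total length including the IP header.
    """
    payload = datagram_len - header_len
    if datagram_len <= mtu:
        return [(0, payload)]          # fits in one piece: no fragmentation

    # Fragment offsets are expressed in 8-byte units, so every fragment
    # except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload:
        size = min(per_fragment, payload - offset)
        fragments.append((offset, size))
        offset += size
    return fragments
```

A 4,000-byte datagram sent over Ethernet (MTU 1,500) becomes three fragments carrying 1,480, 1,480, and 1,020 data bytes, each with its own IP header.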

TCP output

Every TCP socket has a send buffer and we can change the size of this buffer with the SO_SNDBUF socket option (Section 7.5). When an application calls write, the kernel copies all the data from the application buffer into the socket send buffer. If there is insufficient room in the socket buffer for all the application's data (either the application buffer is larger than the socket send buffer, or there is already data in the socket send buffer), the process is put to sleep. This assumes the normal default of a blocking socket. (We will talk about nonblocking sockets in Chapter 16.) The kernel will not return from the write until the final byte in the application buffer has been copied into the socket send buffer. Therefore, the successful return from a write to a TCP socket only tells us that we can reuse our application buffer. It does not tell us that either the peer TCP has received the data or that the peer application has received the data.
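The send buffer size can be inspected and changed through the SO_SNDBUF option. A brief sketch using Python's `socket` module; the function name and the 65,536-byte request are arbitrary choices, and the value the kernel actually applies is system-dependent (it may round, cap, or otherwise adjust the request).

```python
import socket


def send_buffer_sizes(requested=65536):
    """Return the default SO_SNDBUF value and the value after requesting a new size."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        default = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        # Read the option back: the kernel reports what it really configured.
        actual = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return default, actual
```

Reading the option back after setting it is the only reliable way to learn the effective buffer size on a particular system.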

TCP takes the data in the socket send buffer and sends it to the peer TCP based on all the rules of TCP data transmission (Chapter 19 and 20 of TCPv1). The peer TCP must acknowledge the data, and as the ACKs arrive from the peer, only then can our TCP discard the acknowledged data from the socket send buffer. TCP must keep a copy of our data until it is acknowledged by the peer.

TCP sends the data to IP in MSS-sized or smaller chunks, prepending its TCP header to each segment, where the MSS is the value announced by the peer, or 536 if the peer did not send an MSS option. IP prepends its header, searches the routing table for the destination IP address (the matching routing table entry specifies the outgoing interface), and passes the datagram to the appropriate datalink. IP might perform fragmentation before passing the datagram to the datalink, but as we said earlier, one goal of the MSS option is to try to avoid fragmentation and newer implementations also use path MTU discovery. Each datalink has an output queue, and if this queue is full, the packet is discarded and an error is returned up the protocol stack: from the datalink to IP and then from IP to TCP. TCP will note this error and try sending the segment later. The application is not told of this transient condition.
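The effect of the MSS on how data is chopped up can be sketched in a few lines. This is a simplification with a made-up function name: a real TCP may send smaller segments for many reasons (window size, timing, and so on), so the result is only the best case of full-sized segments.

```python
DEFAULT_MSS = 536   # used when the peer did not send an MSS option


def segment_sizes(nbytes, peer_mss=None):
    """Split nbytes of send-buffer data into MSS-sized or smaller chunks."""
    mss = DEFAULT_MSS if peer_mss is None else peer_mss
    full, rest = divmod(nbytes, mss)
    return [mss] * full + ([rest] if rest else [])
```

With a peer MSS of 1,024, the 2,048-byte write from the sequencing example earlier becomes two full segments; without an MSS option, 1,000 bytes are sent as 536 plus 464 bytes.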

UDP Output

Since UDP is unreliable, it does not need to keep a copy of the application's data and does not need an actual send buffer. (The application data is normally copied into a kernel buffer of some form as it passes down the protocol stack, but this copy is discarded by the datalink layer after the data is transmitted.)

The successful return from a write to a UDP socket tells us that either the datagram or all fragments of the datagram have been added to the datalink output queue. If there is no room on the queue for the datagram or one of its fragments, ENOBUFS is often returned to the application.

*2.12 Standard Internet Services

echo, telnet, and so on.

2.13 Protocol Usage by Common Internet Applications

The first two applications, ping and traceroute, are diagnostic applications that use ICMP. traceroute builds its own UDP packets to send and reads ICMP replies.

The three popular routing protocols demonstrate the variety of transport protocols used by routing protocols. OSPF uses IP directly, employing a raw socket, while RIP uses UDP and BGP uses TCP.