tcp套接字和udp套接字_通过TCP套接字在引擎盖下看http

tcp套接字和udp套接字

In software engineering we love abstractions. They take care of the tedious details and allow us to put our attention where it belongs. However, there is value in understanding how they do what they do (take this advice from Joel). Following this guidance, I decided to tackle an embarrassingly long-standing gap in my knowledge; translating tcp packets into HTTP responses.

在软件工程中,我们喜欢抽象。 他们会处理乏味的细节,并让我们将注意力放在它所属于的地方。 但是,了解他们如何做自己的工作是有价值的( 请采纳Joel的建议 )。 遵循这一指导,我决定解决我的知识方面一个令人尴尬的长期鸿沟; 将tcp数据包转换为HTTP响应。

This post goes through the steps of writing an HTTP client similar to curl. The steps are;

这篇文章经历了编写类似于curl的HTTP客户端的步骤。 步骤是;

  • Creating a socket

    创建一个套接字
  • Establishing a tcp connection

    建立TCP连接
  • Sending an http request

    发送http请求
  • Reading the http response

    阅读http响应

The code examples in this post are in C. The purpose of this choice is to be as close as possible to the system calls the kernel provides.

这篇文章中的代码示例位于C中。此选择的目的是尽可能接近内核提供的系统调用。

样板 (The Boilerplate)

Now that we are under the hood, we are exposed to the overhead of establishing a tcp connection. First we need to create a socket. Then use that to initiate a tcp handshake.

现在我们处于幕后,我们将面临建立tcp连接的开销。 首先,我们需要创建一个套接字。 然后使用它启动tcp握手

int sockfd = socket(AF_INET, SOCK_STREAM, 0);

The example above uses the socket constructor from the C standard library. The type `SOCK_STREAM` reflects the stream oriented nature of the tcp protocol. This will become relevant later.

上面的示例使用了C标准库中的套接字构造函数。 类型“ SOCK_STREAM”反映了tcp协议的面向流性质。 稍后将变得相关。

A web server is a machine that waits for clients to connect to it. We do that with the connect command. This command tells the kernel to initiate the handshake mentioned above. Handshake is a costly process. Hence http clients usually provide ways to optimize for it. Our naive implementation will not do that.

Web服务器是一台等待客户端连接到它的机器。 我们使用connect命令执行此操作。 该命令告诉内核启动上述握手。 握手是一个昂贵的过程。 因此,http客户端通常提供对其进行优化的方法 。 我们幼稚的实现将无法做到这一点。

int sockfd, portno; // port is 80
struct sockaddr_in serv_addr;
struct hostent *server;server = gethostbyname(“www.wikipedia.com");
if (server == NULL) {
fprintf(stderr,”ERROR, no such host\n”);
exit(0);
}bcopy((char *)server->h_addr,
(char *)&serv_addr.sin_addr.s_addr,
server->h_length);
serv_addr.sin_port = htons(portno);if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
error(“ERROR connecting”);

What happens above is that; we resolve the host name to an address, we use that and the port number to create a socket address, we use the socket address to connect to the host. If the C language constructs are not familiar to you, do not worry. It is unlikely you’ll ever need to learn them. If you are a helpless curious, here are all the details you wish you didn’t ask for.

上面发生的是那件事; 我们将主机名解析为一个地址 ,我们使用该主机名和端口号创建一个套接字地址 ,然后使用套接字地址连接到主机。 如果您不熟悉C语言的构造,请不要担心。 您不太可能需要学习它们。 如果您对无奈感到好奇,那么这里是您不希望得到的所有细节

询问 (The Ask)

We just overcame the tcp entry barrier. We are ready for the first request. We will initiate the conversation with the most well known pick up line in the http playbook. The `GET /` request. Every server falls for that.

我们刚刚克服了TCP进入壁垒。 我们已经准备好第一个请求。 我们将使用http playbook中最知名的接听电话启动对话。 `GET /`请求。 每个服务器都为此而牺牲。

char get_req[] = “GET / HTTP/1.1\r\n\r\n”;
int byte_count = write(sockfd, get_req, strlen(get_req));
if (byte_count < 0) error(“ERROR writing to socket”);

Onto the fun part! Let’s start reading RFC 2616 to understand what is going on. The request string follows the structure identified in section 5. It doesn’t have any headers, which also indicates no message body. Hence the double `\r\n` (CRLF) marks the end of the request.

进入有趣的部分! 让我们开始阅读RFC 2616以了解发生了什么。 请求字符串遵循第5节中确定的结构。 它没有任何标题,也表示没有消息正文。 因此,双`\ r \ n`( CRLF )标记了请求的结束。

Next up is the write command. As well as the socket and the request string, this command expects the length of the input. In high level languages this is done for you (see python’s send method). In C, the command signature reflects the system call. This is why we are writing in C.

接下来是命令。 以及套接字和请求字符串,此命令还要求输入的长度。 在高级语言中,这已为您完成( 请参见python的send方法 )。 在C语言中,命令签名反映了系统调用。 这就是为什么我们用C语言编写。

回报 (The Return)

It is time to receive our very first http response. This is where the stream oriented nature of tcp makes things interesting.

现在是时候收到我们的第一个http响应了。 这就是tcp面向流的性质使事情变得有趣的地方。

In a data stream there is no notion of an individual message. Tcp does not provide boundaries between one chunk of data and another. Hence much like the write command, the read command as well expects a length (count) argument.

在数据流中,没有单独消息的概念。 Tcp不在一个数据块和另一个数据块之间提供边界。 因此,就像写命令一样, 命令也需要一个长度(计数)参数。

The count argument tells the kernel how much of the incoming byte stream the user wants to read. The kernel reads this much data from the beginning of the stream. It removes the data from the stream and returns it to the caller. The next read call repeats this operation, starting from the new beginning.

count参数告诉内核用户要读取多少传入的字节流。 内核从流的开头读取大量数据。 它从流中删除数据并将其返回给调用方。 下一个读取调用从新的起点开始重复此操作。

One common scenario is that the amount of data we are asking for has not arrived to the socket yet. In that case the kernel returns as much data as the socket has. It also returns the length of the data it read from the stream. It is up to the caller to handle the partial responses. One approach is to poll the socket until the data is available (or a timeout has been reached).

一种常见的情况是,我们要求的数据量尚未到达套接字。 在那种情况下,内核返回的数据与套接字的数据一样多。 它还返回从流中读取的数据的长度。 由调用方处理部分响应。 一种方法是轮询套接字,直到数据可用(或已达到超时)。

All right, let’s get back to our http response.

好吧,让我们回到我们的http响应。

欢迎HTTP (Welcome HTTP)

The challenge is that http messages have varying length. In order to understand how much we should read from the stream, we should inspect the http specification. Section 6 defines the http response structure as;

挑战在于http消息的长度各不相同。 为了了解应该从流中读取多少内容,我们应该检查http规范。 第6节将http响应结构定义为:

https://tools.ietf.org/html/rfc2616#section-6
https://tools.ietf.org/html/rfc2616#section-6 https://tools.ietf.org/html/rfc2616#section-6

In plain speak, http response has a status line, a headers section (might be empty) and a response body (optional). The headers section is separated from the message body by the same CRLF delimiter we used in the request. Also the status line and each header end with the same delimiter.

简而言之,http响应具有状态行,标头部分(可能为空)和响应主体(可选)。 标头部分通过我们在请求中使用的相同CRLF分隔符与消息主体分隔开。 状态行和每个标头都以相同的定界符结尾。

Message body does not end with a delimiter. Instead http protocol provides alternative ways to inform the client. One of them is the content-length header. Section 4–4 explains other methods.

邮件正文不以定界符结尾。 相反,http协议提供了通知客户端的替代方法。 其中之一是content-length标头第4–4节介绍了其他方法。

Ok, armed with this knowledge we can come up with a game plan for the http responses that include a content length header;

好的,有了这些知识,我们可以为http响应提供一个包含内容长度标头的游戏计划;

  • Read from the socket until the first delimiter, this is the status line

    从套接字读取直到第一个定界符,这是状态行
  • Read from the socket until the next delimiter, this is the first header

    从套接字读取直到下一个定界符,这是第一个标头
  • Repeat the last step until we come across two delimiters back to back, this is the end of the headers section

    重复最后一步,直到我们背对背遇到两个定界符,这是标题部分的结尾
  • Read the content length from the headers, read as much data as length suggests from the socket, this is the message body

    从标题读取内容长度,从套接字读取尽可能多的数据,这是消息正文
  • Mission accomplished, we read the exact whole message, not a byte less or more

    任务完成了,我们阅读了完整的消息,而不是少一个或多个字节

One last hurdle to tackle is reading the socket until a certain delimiter. Read command doesn’t provide such a functionality. Hence we have to read a fixed sized chunk. Search the data for the delimiter. If not found, load another chunk. At some point we will receive the delimiter.

要解决的最后一个障碍是读取套接字,直到确定定界符为止。 读取命令不提供此类功能。 因此,我们必须读取固定大小的块。 在数据中搜索定界符。 如果找不到,则加载另一个块。 在某些时候,我们将收到定界符。

Once the delimiter is found, we can process all the data up to that point. The implementation should keep a reference of the data that came after the delimiter. This data constitutes the beginning of the next structure in the response.

找到定界符后,我们就可以处理到该点为止的所有数据。 实现应保留定界符之后的数据引用。 此数据构成响应中下一个结构的开始。

Below is a simplified implementation;

下面是一个简化的实现;

void parse_headers(int sockfd, char *buffer, char *content_length) {
bool reached_message_body = false; while(!reached_message_body){
// append CHUNK_SIZE data to the buffer
load_buffer(sockfd, buffer, CHUNK_SIZE); // look for the delimiter index in the buffer,
// if not found return -1
int next_line_start = find_next_line(buffer); // loop while there is a delimiter and
// we have not reached to the message body yet
while(next_line_start != -1 && !reached_message_body) { log_header(buffer, next_line_start); // keep a reference of the content length header
// for reading the message body
copy_if_content_length(buffer, next_line_start,
content_length); // remove the parsed headers from the buffer
release_line(buffer, next_line_start); // check if we reached the start of the message body
// this looks for two delimiters back to back
reached_message_body = is_message_body_start(buffer); // we might have read multiple headers in one read.
// before we read more data check if
// there are any other headers we can consume
if(!reached_message_body) {
next_line_start = find_next_line(buffer);
}
}
}
}

Rest of the implementation is available in github at grandbora/knowledge_gap . It handles only the http responses that have content length header. Extending the functionality to cover chunked http responses is left as an exercise to the user. You are welcome.

其余实现可在github的grandbora / knowledge_gap上找到 。 它仅处理具有内容长度标头的http响应。 扩展功能以覆盖分块的http响应留给用户作为练习。 别客气。

Bye.

再见

翻译自: https://medium.com/@grandbora/looking-under-the-hood-http-over-tcp-sockets-952a944c99da

tcp套接字和udp套接字

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值