An introduction to chunked transfer encoding (Chunked transfer encoding) in the HTTP protocol

iteye_15309

Article link: https://blog.csdn.net/iteye_15309/article/details/82542830

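Recently I needed to write a service that calls a service written by a colleague. Because our product has a large number of users, it has to fire off many requests at once and hand them to the backend for processing. The overall flow is roughly: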

Open a connection, write the command, read the data, finish.

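The backend has to work with the cache and the DB, so each request can take quite a while. This kind of workload is a natural fit for epoll (there was another reason too: "epoll" always seems to get certain colleagues excited, so I wanted to find out what the fuss was about). So I wrote a client in Python, and its performance turned out to be outrageously good.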

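Inspired by that, I planned to write a helper class that uses epoll to send requests to backend service APIs, so that I would only need to run one process (two at most) to drain the queue and send requests to the other services. At that point, though, I could no longer conveniently use urllib2, so I had to write the HTTP headers myself. Things went fairly smoothly at first, but then I hit a problem: I didn't know when to stop reading from the socket.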
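Writing the request by hand is mostly string assembly: a request line, header lines separated by CRLF, and an empty line to end the header block. The sketch below assumes a plain GET with no request body; `build_get_request` is a hypothetical helper, not part of urllib2 or any library, and sending `Connection: close` is a choice of this sketch rather than something the post specifies.

```python
from typing import Dict, Optional


def build_get_request(host: str, path: str = "/",
                      extra_headers: Optional[Dict[str, str]] = None) -> bytes:
    """Build a raw HTTP/1.1 GET request (hypothetical helper, for illustration)."""
    headers = {
        "Host": host,             # mandatory in HTTP/1.1
        "Connection": "close",    # assumption: one request per connection
    }
    if extra_headers:
        headers.update(extra_headers)
    lines = ["GET %s HTTP/1.1" % path]
    lines += ["%s: %s" % (name, value) for name, value in headers.items()]
    # Every header line ends with CRLF, and an empty line (another CRLF) ends the block
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


req = build_get_request("example.com", "/api/user?id=42")
print(req)
```

The resulting bytes can be written with `sock.sendall(req)`. Note that even with `Connection: close`, the server still decides how the response body is framed, which is exactly the problem described next.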

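The reason I didn't know when the socket read should end is that the backend services returned two kinds of HTTP responses. One kind had headers like this: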

< HTTP/1.1 200 OK
< Date: Mon, 07 Oct 2013 12:00:52 GMT
< Server: Apache/2.2.16 (Debian)
< X-Powered-By: PHP/5.3.3-7+squeeze17
< Vary: Accept-Encoding
< Content-Length: 51
< Content-Type: text/html

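The Content-Length header states the size of the response body, so you only need to remember that value: once you have read that many bytes after the empty line that ends the headers, you can close the socket. This case is fairly simple to handle.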
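A minimal sketch of the Content-Length case, assuming the response is read through a buffered binary file object (for a real socket that would be `sock.makefile("rb")`; here `io.BytesIO` stands in for it). `read_headers` and `read_fixed_body` are hypothetical helper names, not a library API.

```python
import io
from typing import BinaryIO, Dict, Tuple


def read_headers(fp: BinaryIO) -> Tuple[str, Dict[str, str]]:
    """Read the status line and header lines up to the empty line."""
    status_line = fp.readline().decode("iso-8859-1").rstrip("\r\n")
    headers: Dict[str, str] = {}
    while True:
        line = fp.readline().decode("iso-8859-1")
        if line in ("\r\n", "\n", ""):  # empty line (or EOF) ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return status_line, headers


def read_fixed_body(fp: BinaryIO, length: int) -> bytes:
    """Read exactly `length` body bytes, failing if the peer closes early."""
    parts = []
    remaining = length
    while remaining > 0:
        data = fp.read(remaining)
        if not data:
            raise EOFError("connection closed with %d bytes missing" % remaining)
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Length: 5\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"hello")
fp = io.BytesIO(raw)
status, headers = read_headers(fp)
body = read_fixed_body(fp, int(headers["content-length"]))
print(status, body)
```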

But I also ran into another kind of header:

< HTTP/1.1 200 OK
< Server: nginx/1.0.5
< Date: Mon, 07 Oct 2013 12:03:12 GMT
< Content-Type: text/html
< Transfer-Encoding: chunked
< Connection: keep-alive
< X-Powered-By: PHP/5.3.6-13ubuntu3.10

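Here "Content-Length" is gone, replaced by "Transfer-Encoding: chunked". Section 3.6.1 of the RFC (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html) defines in detail how the body is structured when "Transfer-Encoding: chunked" is used (RFC 2616 has since been superseded by RFC 7230 and then RFC 9112, which keep the same chunked format). To borrow the explanation from Wikipedia (http://zh.wikipedia.org/wiki/%E5%88%86%E5%9D%97%E4%BC%A0%E8%BE%93%E7%BC%96%E7%A0%81):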

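Normally, the data in an HTTP response is sent in one piece, and the Content-Length header field gives the length of the data. The length matters because the client needs to know where the response ends and where the next response begins. With chunked transfer encoding, however, the data is split into a series of chunks and sent as one or more chunks, so the server can send data without knowing the total size of the content in advance. Usually the chunks are all the same size, but this is not always the case.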
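To make the quoted point concrete, here is a sketch of the sending side, assuming the body is produced piece by piece by a generator. `encode_chunked` and `generate_rows` are hypothetical names for illustration. Each piece is framed with its own length, so the total size is never needed up front.

```python
from typing import Iterable, Iterator


def encode_chunked(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Yield a chunked-encoded body for pieces whose total size is unknown up front."""
    for piece in pieces:
        if not piece:
            continue  # an empty chunk would be read as the last-chunk, so skip it
        # chunk-size in hex, CRLF, chunk-data, CRLF
        yield (b"%x\r\n" % len(piece)) + piece + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk, no trailers, final CRLF


def generate_rows():
    # hypothetical producer: the server does not know the total length in advance
    yield b"Wiki"
    yield b"pedia"
    yield b""
    yield b" in chunks."


body = b"".join(encode_chunked(generate_rows()))
print(body)
```

The 11-byte piece `b" in chunks."` gets the size line `b\r\n`, because sizes are written in hexadecimal.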

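Put simply, the body arrives in chunks, and each chunk starts with its own size: a hexadecimal number on a line of its own (ending in CRLF), followed by that many bytes of data and another CRLF. Once you reach a "last-chunk" of size 0, the data transfer is over (the original text: The chunked encoding is ended by any chunk whose size is zero). After the last-chunk there may be optional trailer headers, and the body ends with an empty line.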
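A minimal sketch of the receiving side, again assuming a buffered binary stream (`sock.makefile("rb")` for a real socket, `io.BytesIO` here). `read_chunked_body` is a hypothetical helper: it reads the size line with `readline`, drops any chunk-extension after `;`, reads exactly that many bytes plus the trailing CRLF, stops at the zero-size last-chunk, and then collects optional trailers up to the final empty line.

```python
import io
from typing import BinaryIO, Dict, Tuple


def read_chunked_body(fp: BinaryIO) -> Tuple[bytes, Dict[str, str]]:
    """Decode a chunked body; return (body, trailers)."""
    parts = []
    while True:
        size_line = fp.readline()
        if not size_line:
            raise EOFError("connection closed before last-chunk")
        # Drop optional chunk-extensions after ';' plus the CRLF, then parse hex
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last-chunk reached
        data = fp.read(size)
        if len(data) != size:
            raise EOFError("truncated chunk")
        parts.append(data)
        if fp.read(2) != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
    # Optional trailer headers, terminated by an empty line
    trailers: Dict[str, str] = {}
    while True:
        line = fp.readline()
        if line in (b"\r\n", b"\n", b""):
            break
        name, _, value = line.decode("iso-8859-1").partition(":")
        trailers[name.strip().lower()] = value.strip()
    return b"".join(parts), trailers


# "5;name=x" carries a chunk-extension; "X-Trailer" is a hypothetical trailer header
raw = b"4\r\nWiki\r\n5;name=x\r\npedia\r\n0\r\nX-Trailer: done\r\n\r\n"
body, trailers = read_chunked_body(io.BytesIO(raw))
print(body, trailers)
```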

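I was going to write the parsing code myself, but then it occurred to me that urllib2's implementation must already contain code for handling such responses. Sure enough, in the Python (2) source, in Lib/httplib.py, I found the implementation: HTTPResponse has a method called "_read_chunked" that does the job in about 50 lines of code. It looks roughly like this: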

    def _read_chunked(self, amt):
        assert self.chunked != _UNKNOWN
        chunk_left = self.chunk_left
        value = []
        while True:
            if chunk_left is None:
                line = self.fp.readline(_MAXLINE + 1)
                if len(line) > _MAXLINE:
                    raise LineTooLong("chunk size")
                i = line.find(';')
                if i >= 0:
                    line = line[:i] # strip chunk-extensions
                try:
                    chunk_left = int(line, 16)
                except ValueError:
                    # close the connection as protocol synchronisation is
                    # probably lost
                    self.close()
                    raise IncompleteRead(''.join(value))
                if chunk_left == 0:
                    break
            if amt is None:
                value.append(self._safe_read(chunk_left))
            elif amt < chunk_left:
                value.append(self._safe_read(amt))
                self.chunk_left = chunk_left - amt
                return ''.join(value)
            elif amt == chunk_left:
                value.append(self._safe_read(amt))
                self._safe_read(2)  # toss the CRLF at the end of the chunk
                self.chunk_left = None
                return ''.join(value)
            else:
                value.append(self._safe_read(chunk_left))
                amt -= chunk_left

            # we read the whole chunk, get another
            self._safe_read(2)      # toss the CRLF at the end of the chunk
            chunk_left = None

        # read and discard trailer up to the CRLF terminator
        ### note: we shouldn't have any trailers!
        while True:
            line = self.fp.readline(_MAXLINE + 1)
            if len(line) > _MAXLINE:
                raise LineTooLong("trailer line")
            if not line:
                # a vanishingly small number of sites EOF without
                # sending the trailer
                break
            if line == '\r\n':
                break

        # we read everything; close the "file"
        self.close()

        return ''.join(value)

Put simply, its implementation uses readline to read the body from the socket line by line,