HTTP/1.1: Understanding chunked encoding and decoding it over a socket (Java)


zuguorui


Article link: https://blog.csdn.net/zuguorui/article/details/60145796


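Recently I have been catching up on Android networking, practising file download and upload with both a raw socket and HttpURLConnection. HttpURLConnection gave me almost no trouble, but it is so well encapsulated that knowing how to use it hardly counts as understanding networking. To go deeper you inevitably have to work with sockets. The first time I did, I ran into chunked encoding, which gave me quite a headache.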

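An HTTP/1.1 response header usually contains Content-Length, which gives the size of the message body in bytes. That normally works fine, but often the server cannot know the size of the data in advance, and that is when you meet chunked encoding.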

Chunked means the server sends the body in pieces. A response header that signals chunked encoding typically looks like this:

HTTP/1.1 200 OK
Date: Wed, 01 Mar 2017 02:31:20 GMT
Server: Apache/2.4.23 (Win64) PHP/5.6.25
X-Powered-By: PHP/5.6.25
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

As you can see, the Content-Length that would normally be there is gone, replaced by Transfer-Encoding.
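As a small illustration of that difference, here is a hedged sketch of how a client might decide from the header which framing the body uses. The class name ResponseFraming and its methods are made up for this example and are not part of the download code in this post.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class ResponseFraming {
    /** Parses "Name: value" lines into a map keyed by lower-cased header name; the status line is skipped. */
    static Map<String, String> parseHeaders(String head) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = head.split("\r\n");
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
                headers.put(name, lines[i].substring(colon + 1).trim());
            }
        }
        return headers;
    }

    /** True if the Transfer-Encoding header mentions chunked. */
    static boolean isChunked(Map<String, String> headers) {
        String value = headers.get("transfer-encoding");
        return value != null && value.toLowerCase(Locale.ROOT).contains("chunked");
    }

    /** Returns the declared Content-Length, or -1 if the header is absent. */
    static long contentLength(Map<String, String> headers) {
        String value = headers.get("content-length");
        return value == null ? -1 : Long.parseLong(value);
    }

    public static void main(String[] args) {
        String head = "HTTP/1.1 200 OK\r\n"
                + "Server: Apache/2.4.23 (Win64) PHP/5.6.25\r\n"
                + "Transfer-Encoding: chunked\r\n"
                + "Content-Type: text/html; charset=UTF-8\r\n";
        Map<String, String> headers = parseHeaders(head);
        System.out.println(isChunked(headers));     // true
        System.out.println(contentLength(headers)); // -1
    }
}
```

Looking up the header by name avoids treating any line that happens to contain the word "chunked" as a signal.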
My server script looks like this:

<?php
/**
 * Created by PhpStorm.
 * User: zu
 * Date: 2017/2/22
 * Time: 14:50
 */

/**
 * Server-side script for downloading a file.
 */


// Check whether the POST request carries a file_name variable; if it does, that file is downloaded.
if(empty($_POST["file_name"]))
{
    echo "NO_FILE_NAME\n";
    print_r($_POST);
    exit();
}

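/* The files are stored on Windows, where file names are GB2312-encoded, so the path has to be converted first. Then check that the parent folder holding the files exists. */
$path = iconv("utf-8", "GB2312", "F:\\NetEaseMusic\\download\\");
if(!file_exists($path))
{
    echo "FOLDER_NOT_FOUND\n";
    print_r($path);
    exit();
}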

$file_map = array();
$files = scandir($path);
for($i = 0; $i < count($files); $i++ )
{
    $file_map[iconv("GB2312", "utf-8", $files[$i])] = $files[$i];
}

if(!array_key_exists($_POST["file_name"], $file_map))
{
    echo "FILE_KEY_NOT_FOUND\n";
    print_r($file_map);
    exit();
}

/* Build the full file path and convert its encoding so the file can be located. */
$path = iconv("utf-8", "GB2312", "F:\\NetEaseMusic\\download\\".$_POST["file_name"]);
//$path = "F:\\NetEaseMusic\\download\\".$file_map[$_POST["file_name"]];
if (!file_exists ( $path )) {
    echo "FILE_NOT_FOUND\n";
    echo "F:\\NetEaseMusic\\download\\".$_POST["file_name"]."\n";
    print($path);
    exit ();
}

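/* The code below supports resuming a download: if the client sent a valid Range header, the file is sent starting from that offset instead of from the beginning. */
$file_size = filesize($path);

$begin = 0;
$end = 0;

if(isset($_SERVER["HTTP_RANGE"]))
{
    $temp = $_SERVER["HTTP_RANGE"];
    if(preg_match('/bytes=\h*(\d+)-(\d*)/i',$temp, $matcher))
    {
        // $matcher[0] is the whole match; the captured offsets are in [1] and [2].
        $begin = intval($matcher[1]);
        // $end is parsed but not used below: the script always sends through to the end of the file.
        $end = intval($matcher[2]);
    }
}
if($begin == 0)
{
    header("HTTP/1.1 200 OK");
}else{
    header("HTTP/1.1 206 Partial Content");
}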


//header("Content-type: application/octet-stream");
//header("content-length: ".$file_size);

//header("Accept-Ranges: bytes");
//header("Accept-Length:".$file_size);
//header("Content-Disposition: attachment; filename=".$path);

/* Keep reading the file and sending the data out. */
$file = fopen($path, "r");
fseek($file, $begin, 0);
while(!feof($file))
{
    echo fread($file, 1024);
}
exit();
?>

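From the script you can see that when the server returns a file to the client, the script simply keeps reading the file and handing the bytes to the server. The server necessarily has a buffer that holds the script's output. If the data sent is smaller than that buffer, the server computes its size and adds Content-Length to the response header automatically. If the buffer fills up while the script is still producing output, and the script has not told the server the total size, the server switches to chunked encoding. In other words, for a text of a few KB the server works out Content-Length by itself and adds it to the response header; for a file of several GB, unless Content-Length is set manually, chunked encoding is used.
Note that if you uncomment the line header("content-length: ".$file_size); in the script above, the server will not use chunked encoding, and the response header will contain the content-length you set.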

Decoding chunked data over a socket is simple in principle; it is just that the implementation has to account for quite a few cases. First, the chunked encoding rules:

HTTP/1.1 200 OK\r\n
Date: Wed, 01 Mar 2017 02:31:20 GMT\r\n
Server: Apache/2.4.23 (Win64) PHP/5.6.25\r\n
X-Powered-By: PHP/5.6.25\r\n
Transfer-Encoding: chunked\r\n
Content-Type: text/html; charset=UTF-8\r\n
\r\n
chunked-length\r\n
chunked-body\r\n
...
chunked-length\r\n
chunked-body\r\n
0\r\n
\r\n

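This is a complete chunked response as received over a socket, with nothing stripped. Every header line still ends with \r\n, an empty line (\r\n) closes the header, and the data follows. A single chunk has this structure: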

chunked-length\r\n
chunked-body\r\n

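chunked-length is the number of bytes in chunked-body, written as a hexadecimal number; be very careful, it is hexadecimal. The length covers chunked-body only, not the \r\n before or after it. Once all chunks have been sent, the server sends one more empty chunk (length 0) to tell the client that the data is complete.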
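To make the layout concrete, here is a minimal sketch that encodes a byte array with these rules. ChunkedEncoder and the fixed chunkSize parameter are assumptions for illustration only; a real server chooses chunk sizes however it likes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class ChunkedEncoder {
    private static final byte[] CRLF = {0x0d, 0x0a};

    /** Splits body into chunks of at most chunkSize bytes and frames each one as hex-length CRLF data CRLF. */
    static byte[] encode(byte[] body, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int offset = 0; offset < body.length; offset += chunkSize) {
            int length = Math.min(chunkSize, body.length - offset);
            // The length is written in hexadecimal and counts only the data bytes.
            byte[] sizeLine = Integer.toHexString(length).getBytes(StandardCharsets.US_ASCII);
            out.write(sizeLine, 0, sizeLine.length);
            out.write(CRLF, 0, CRLF.length);
            out.write(body, offset, length);
            out.write(CRLF, 0, CRLF.length);
        }
        // The terminating chunk: length 0 followed by an empty line.
        out.write('0');
        out.write(CRLF, 0, CRLF.length);
        out.write(CRLF, 0, CRLF.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = "Hello, chunked world!".getBytes(StandardCharsets.US_ASCII);
        String wire = new String(encode(body, 16), StandardCharsets.US_ASCII);
        // Show the control characters explicitly.
        System.out.println(wire.replace("\r", "\\r").replace("\n", "\\n"));
        // prints: 10\r\nHello, chunked w\r\n5\r\norld!\r\n0\r\n\r\n
    }
}
```

The 21-byte body becomes a 16-byte chunk (length 10 in hexadecimal), a 5-byte chunk, and the empty terminating chunk.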

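Note that when decoding you cannot use \r\n to find the end of a chunk body; you have to go by chunked-length, because the body itself may contain \r\n.
My goal was a program that decodes while it downloads. The focus is therefore on everything that can happen at read boundaries: a read may cut a chunked-length in half, or split a \r\n. Either leads to wrong data, and one wrong step makes every following step wrong. Here is the code.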

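This is the method that downloads a file over a socket; it can also decode chunked encoding. It uses BufferedWriter, OutputStreamWriter, DataInputStream, DataOutputStream, File and FileOutputStream from java.io, and Socket, URL and URLEncoder from java.net.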

    private void downloadFileBySocketWithChunke(String urlString, String fileName)
    {

        try{
            /** Build the POST request. The POST data must be URL-encoded or the server will not recognise it; the header lines need no encoding. */
            StringBuilder sb = new StringBuilder();
            String data = URLEncoder.encode("file_name", "utf-8") + "=" +  URLEncoder.encode(fileName, "utf-8");
            sb.append("POST " + urlString + " HTTP/1.1\r\n");
            /** The Host value is hard-coded to my test server; it should match the host in urlString. */
            sb.append("Host: 10.206.68.242\r\n");
            sb.append("Content-Type: application/x-www-form-urlencoded\r\n");
            /** Content-Length is a byte count and must match the body exactly, so no \r\n is appended after data. */
            sb.append("Content-Length: " + data.getBytes("utf-8").length + "\r\n");
            /** Ask the server to close the connection after the response, so the read loop below sees end of stream. */
            sb.append("Connection: close\r\n");
            sb.append("\r\n");
            sb.append(data);

            String temp = sb.toString();

            System.out.println(temp);

            URL url = new URL(urlString);
            /** url.getPort() returns -1 when the URL has no explicit port, so fall back to the protocol default. */
            int port = url.getPort() == -1 ? url.getDefaultPort() : url.getPort();
            Socket socket = new Socket(url.getHost(), port);
            BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), "utf-8"));
            /** Send the POST request to the server through the socket. */
            writer.write(sb.toString());
            writer.flush();


            File file = new File("./" + fileName);
            DataOutputStream out = null;
            DataInputStream in = null;
            try{
                out = new DataOutputStream(new FileOutputStream(file));
                in = new DataInputStream(socket.getInputStream());
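                /** Buffer for the data read from the socket. */
                byte[] buffer = new byte[1024];
                /** Number of bytes read from the socket in this round. */
                int readBytes = 0;
                /** Current position when taking data out of buffer. */
                int readPosition = 0;

                /** Whether the header has been obtained. */
                boolean getHead = false;
                /** Whether the response uses chunked encoding. */
                boolean chunked = false;
                /** Accumulates header text across reads. */
                StringBuilder headTemp = new StringBuilder();
                /** The complete header. */
                String head = null;
                /** Size of the current chunk. */
                int chunkedSize = 0;
                /** Number of bytes of the current chunk read so far. */
                int readSize = 0;

                /**
                 * Holds bytes left over from the previous read. If a chunk was finished in the previous round and the
                 * size line of the next chunk turned out to be cut off, those bytes are stored here and prepended to
                 * the data of the next read from the socket before parsing.
                 * */
                byte[] chunkedSizeBuffer = new byte[32];
                /** Number of valid bytes stored in chunkedSizeBuffer. */
                int chunkedSizeBit = 0;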

                /** Keep reading from the socket. */
                while((readBytes = in.read(buffer)) != -1)
                {
                    readPosition = 0;
                    /** If the header has not been obtained yet, get it first. The header is separated from the body by
                     * \r\n\r\n; \r is 0x0d and \n is 0x0a. */
                    if(!getHead)
                    {
                        /** Find the index of one byte[] inside another; -1 means not found. */
                        int position = findByte(buffer,0, readBytes, new byte[]{0x0d,0x0a,0x0d,0x0a});
                        if(position == -1)
                        {
                            /** Not found: everything in this read is header data, so append it to the header cache.
                             * Only the readBytes bytes just read are valid. A \r\n\r\n split across two reads is
                             * not handled here. */
                            headTemp.append(new String(buffer, 0, readBytes));
                            continue;
                        }else
                        {
                            /** Found: parse the header. */
                            byte[] headBytes = new byte[position];
                            System.arraycopy(buffer, 0, headBytes, 0, position);
                            headTemp.append(new String(headBytes));
                            head = headTemp.toString();
                            getHead = true;
                            String[] infoList = head.split("\r\n");

                            if(!infoList[0].split(" ")[1].equals("200"))
                            {
                                throw new RuntimeException("Connection failed, status code " + infoList[0].split(" ")[1]);
                            }

                            /** Check whether the response is chunked. */
                            for(String s : infoList)
                            {
                                if (s.toLowerCase().contains("chunked"))
                                {
                                    chunked = true;
                                    break;
                                }
                            }
                            /** Move the read position 4 bytes forward, to the start of the body. */
                            readPosition = position + 4;

                        }
                    }

                    if(!chunked)
                    {
                        /** Not chunked: write the bytes straight to the file. */
                        out.write(buffer, readPosition, readBytes - readPosition);
                    }else
                    {
                        while(readPosition < readBytes)
                        {
                            /** If the current chunk has been read completely, the next bytes are the size line of the next chunk. */
                            if(chunkedSize == readSize)
                            {
                                /** If chunkedSizeBit is not 0, the previous read ended inside a size line and the leftover
                                 * bytes are in chunkedSizeBuffer. Prepend them to this read before parsing. This only
                                 * happens at the start of a read, when readPosition is 0. */
                                if(chunkedSizeBit != 0)
                                {
                                    byte[] a = new byte[chunkedSizeBit + readBytes];
                                    System.arraycopy(chunkedSizeBuffer, 0, a, 0, chunkedSizeBit);
                                    System.arraycopy(buffer, 0, a, chunkedSizeBit, readBytes);
                                    buffer = a;
                                    readBytes = chunkedSizeBit + readBytes;
                                    chunkedSizeBit = 0;
                                }
                                /** Skip the \r\n that closes the previous chunk body, if it is there. */
                                int lineStart = readPosition;
                                if(readBytes - lineStart >= 2 && buffer[lineStart] == 0x0d && buffer[lineStart + 1] == 0x0a)
                                {
                                    lineStart += 2;
                                }
                                /** Look for the \r\n that ends the size line. */
                                int lineEnd = -1;
                                for(int i = lineStart; i + 1 < readBytes; i++)
                                {
                                    if(buffer[i] == 0x0d && buffer[i + 1] == 0x0a)
                                    {
                                        lineEnd = i;
                                        break;
                                    }
                                }
                                if(lineEnd == -1)
                                {
                                    /** The size line is cut off by the end of this read. Keep the unread bytes in
                                     * chunkedSizeBuffer and wait for the next read. More than 32 leftover bytes
                                     * without a \r\n means the stream is out of step. */
                                    int rest = readBytes - readPosition;
                                    if(rest > chunkedSizeBuffer.length)
                                    {
                                        System.out.println("read chunk size error");
                                        System.out.println(new String(buffer, readPosition, rest));
                                        return;
                                    }
                                    System.arraycopy(buffer, readPosition, chunkedSizeBuffer, 0, rest);
                                    chunkedSizeBit = rest;
                                    readPosition = readBytes;
                                    continue;
                                }
                                /** Parse the length. Be very careful: it is hexadecimal. Anything after ';' is a
                                 * chunk extension and is ignored. */
                                String sizeLine = new String(buffer, lineStart, lineEnd - lineStart, "US-ASCII");
                                int semicolon = sizeLine.indexOf(';');
                                if(semicolon != -1)
                                {
                                    sizeLine = sizeLine.substring(0, semicolon);
                                }
                                chunkedSize = Integer.parseInt(sizeLine.trim(), 16);
                                readSize = 0;
                                readPosition = lineEnd + 2;
                                /** A chunk of length 0 marks the end of the body. The finally block below still
                                 * flushes and closes the streams. */
                                if(chunkedSize == 0)
                                {
                                    return;
                                }
                            }
                            /** Write the contents of buffer to the file according to the chunk length. There are two cases.
                             * Note that readPosition and readSize must be updated as we go. */
                            if(readBytes - readPosition >= chunkedSize - readSize)
                            {
//                                System.out.println("readBytes - readPosition >= chunkedSize - readSize");
                                out.write(buffer, readPosition, chunkedSize - readSize);
                                readPosition += chunkedSize - readSize;
                                readSize = chunkedSize;
                            }else
                            {
//                                System.out.println("readBytes - readPosition < chunkedSize - readSize");
                                out.write(buffer, readPosition, readBytes - readPosition);
                                readSize += readBytes - readPosition;
                                readPosition = readBytes;
                            }
                        }

                    }


                }
                out.flush();
            }catch (Exception e1)
            {
                e1.printStackTrace();
            }finally {
                try{
                    if(in != null)
                    {
                        in.close();
                    }
                    if(out != null)
                    {
                        out.flush();
                        out.close();
                    }
                }catch (Exception e2)
                {
                    e2.printStackTrace();
                }
            }
            socket.close();

        }catch (Exception e)
        {
            e.printStackTrace();
        }

    }

This is the method that finds the index of one byte[] inside another.

    private int findByte(byte[] src, byte[] mark)
    {
        return findByte(src, 0, src.length, mark);
    }

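    /**
     * Returns the index of the first occurrence of mark in src, searching from index start
     * up to (but not including) index length, or -1 if mark is not found.
     */
    private int findByte(byte[] src, int start, int length, byte[] mark)
    {
        if(length < mark.length || length > src.length)
        {
            return -1;
        }
        /** <= so that a match ending exactly at index length is not missed. */
        for(int i = start; i <= length - mark.length; i++)
        {
            boolean match = true;
            for(int j = 0; j < mark.length; j++)
            {
                if(src[i + j] != mark[j])
                {
                    match = false;
                    break;
                }
            }
            if(match)
            {
                return i;
            }
        }
        return -1;
    }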
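The truncation cases the download method has to handle (a chunk length or a \r\n split across two reads) can also be handled with a small state machine that remembers where it stopped. The following is a minimal sketch under that assumption, not the method used above; IncrementalChunkedDecoder and its method names are invented for this example, and it collects the body in memory only to keep the demonstration short.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class IncrementalChunkedDecoder {
    private enum State { SIZE, SIZE_LF, DATA, DATA_CR, DATA_LF, TRAILER, DONE }

    private State state = State.SIZE;
    private final StringBuilder sizeLine = new StringBuilder();
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();
    private int remaining;         // bytes still missing from the current chunk
    private int trailerLineLength; // non-CRLF bytes seen on the current trailer line

    /** Consumes len bytes starting at off; may be called with pieces of any size. */
    void feed(byte[] buf, int off, int len) {
        int pos = off;
        int end = off + len;
        while (pos < end && state != State.DONE) {
            switch (state) {
                case SIZE: {
                    byte b = buf[pos++];
                    if (b == '\r') state = State.SIZE_LF;
                    else sizeLine.append((char) b);
                    break;
                }
                case SIZE_LF: {
                    expect(buf[pos++], '\n');
                    String s = sizeLine.toString();
                    int semicolon = s.indexOf(';'); // ignore chunk extensions
                    if (semicolon >= 0) s = s.substring(0, semicolon);
                    remaining = Integer.parseInt(s.trim(), 16); // hexadecimal
                    sizeLine.setLength(0);
                    state = remaining == 0 ? State.TRAILER : State.DATA;
                    break;
                }
                case DATA: {
                    int n = Math.min(remaining, end - pos);
                    body.write(buf, pos, n);
                    pos += n;
                    remaining -= n;
                    if (remaining == 0) state = State.DATA_CR;
                    break;
                }
                case DATA_CR:
                    expect(buf[pos++], '\r');
                    state = State.DATA_LF;
                    break;
                case DATA_LF:
                    expect(buf[pos++], '\n');
                    state = State.SIZE;
                    break;
                case TRAILER: {
                    // After the 0 chunk: skip optional trailer lines until an empty line.
                    byte b = buf[pos++];
                    if (b == '\n') {
                        if (trailerLineLength == 0) state = State.DONE;
                        trailerLineLength = 0;
                    } else if (b != '\r') {
                        trailerLineLength++;
                    }
                    break;
                }
                default:
                    break;
            }
        }
    }

    boolean isDone() { return state == State.DONE; }

    byte[] body() { return body.toByteArray(); }

    private static void expect(byte actual, char expected) {
        if (actual != expected) throw new IllegalStateException("malformed chunked stream");
    }

    public static void main(String[] args) {
        byte[] wire = "10\r\nHello, chunked w\r\n5\r\norld!\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        IncrementalChunkedDecoder decoder = new IncrementalChunkedDecoder();
        // Feed one byte at a time: the worst possible fragmentation.
        for (int i = 0; i < wire.length; i++) decoder.feed(wire, i, 1);
        System.out.println(new String(decoder.body(), StandardCharsets.US_ASCII)); // Hello, chunked world!
        System.out.println(decoder.isDone()); // true
    }
}
```

Because every state survives between calls to feed, it makes no difference where a read boundary falls. In a real download the DATA branch would write to the file stream instead of an in-memory buffer.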

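That is an example of downloading over a socket with chunked decoding. It is certainly not optimal. In practice, downloading a file of a dozen or so MB took about as long as reading from the socket without decoding, while a 1.74 GB movie took a few seconds longer. These tests ran on my own machine, where transfer is very fast; in real use the network, not the decoding, should be the bottleneck. HttpURLConnection was still the fastest, by a wide margin. Perhaps this is a weakness of the raw-socket approach itself, since the socket download is slow even without decoding.
There is another approach: size the buffer to the current chunk, then simply read from the socket until the buffer is full; a full buffer means the chunk has been read completely. A chunk can then never be cut in half. If time allows I will implement this in the next post.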
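A rough sketch of that second idea, under some assumptions: the stream is positioned just after the header's empty line, DataInputStream.readFully is used to block until a whole chunk has arrived, and the names BlockingChunkedReader, readLine, copyChunkedBody and decode are made up for this illustration. It is not the follow-up implementation promised above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BlockingChunkedReader {
    /** Reads one line byte by byte and returns it without the trailing CRLF. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                int n = line.length();
                if (n > 0 && line.charAt(n - 1) == '\r') line.setLength(n - 1);
                return line.toString();
            }
            line.append((char) b);
        }
        throw new EOFException("stream ended inside a line");
    }

    /** Copies the decoded body to out; raw must be positioned at the first chunk-size line. */
    static void copyChunkedBody(InputStream raw, OutputStream out) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        while (true) {
            String sizeLine = readLine(in);
            int semicolon = sizeLine.indexOf(';'); // ignore chunk extensions
            if (semicolon >= 0) sizeLine = sizeLine.substring(0, semicolon);
            int size = Integer.parseInt(sizeLine.trim(), 16); // hexadecimal
            if (size == 0) {
                // Terminating chunk: skip optional trailer lines up to the empty line.
                while (!readLine(in).isEmpty()) { }
                return;
            }
            byte[] chunk = new byte[size]; // buffer sized to exactly one chunk
            in.readFully(chunk);           // blocks until the whole chunk has arrived
            out.write(chunk);
            readLine(in);                  // consume the CRLF that follows the chunk data
        }
    }

    /** Convenience wrapper for in-memory data. */
    static byte[] decode(byte[] wire) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copyChunkedBody(new ByteArrayInputStream(wire), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] wire = "10\r\nHello, chunked w\r\n5\r\norld!\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(decode(wire), StandardCharsets.US_ASCII)); // Hello, chunked world!
    }
}
```

With a real socket it would make sense to wrap socket.getInputStream() in a BufferedInputStream first, since readLine reads one byte at a time. Allocating one array per chunk is simple but assumes chunks are small enough to fit in memory.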