httpclient4 file download: CRLF expected at end of chunk

myvictoryhhb

Article link: https://blog.csdn.net/myvictoryhhb/article/details/126921551

Originally published at: https://www.52tect.com/java/2022/09/18/947.html

Exception details

A while ago, a business scenario required calling a third-party system to download files. The download failed with "CRLF expected at end of chunk".

org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk
	at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:250) ~[httpcore-4.4.11.jar!/:4.4.11]
	at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:222) ~[httpcore-4.4.11.jar!/:4.4.11]
	at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:183) ~[httpcore-4.4.11.jar!/:4.4.11]
	at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:210) ~[httpcore-4.4.11.jar!/:4.4.11]
	at org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:312) ~[httpcore-4.4.11.jar!/:4.4.11]
	at org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:142) ~[httpclient-4.5.8.jar!/:4.5.8]
	at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228) ~[httpclient-4.5.8.jar!/:4.5.8]
	at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:172) ~[httpclient-4.5.8.jar!/:4.5.8]
	at java.util.zip.InflaterInputStream.close(InflaterInputStream.java:227) ~[na:1.8.0_322]
	at java.util.zip.GZIPInputStream.close(GZIPInputStream.java:136) ~[na:1.8.0_322]
	at org.apache.http.client.entity.LazyDecompressingInputStream.close(LazyDecompressingInputStream.java:94) ~[httpclient-4.5.8.jar!/:4.5.8]

Download code

HttpEntity httpEntity = null;
InputStream fileInputStream = null;
ByteArrayOutputStream fileByteArrayOutputStream = null;
try (
        CloseableHttpClient closeableHttpClient = httpClientBuilder.build();
        CloseableHttpResponse closeableHttpResponse = closeableHttpClient.execute(httpUriRequest)
) {
    HttpClientFileDownloadResult download = new HttpClientFileDownloadResult();

    String contentType = null;
    if (closeableHttpResponse.containsHeader("content-type")) {
        contentType = closeableHttpResponse.getHeaders("content-type")[0].getValue();
    }
    String result = null;
    String contentDisposition = null;
    if (closeableHttpResponse.containsHeader("content-disposition")) {
        contentDisposition = closeableHttpResponse.getHeaders("content-disposition")[0].getValue();
        LOGGER.info("content-type:{},contentDisposition:{}", contentType, contentDisposition);
    }
    if (ContentType.DEFAULT_BINARY.getMimeType().equals(contentType)) {
        // Binary response: take the file name from Content-Disposition
        String[] contentDispositionArray = contentDisposition != null ? contentDisposition.split(";")
                : new String[] {};
        for (String value : contentDispositionArray) {
            if (value.trim().startsWith("filename")) {
                String filename = value.substring(value.indexOf('=') + 1).trim().replace("\"", "");
                download.setFileName(new String(filename.getBytes(), "utf-8"));
                break;
            }
        }

        result = download.getFileName();

        // Read the whole file content into memory
        httpEntity = closeableHttpResponse.getEntity();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        fileInputStream = httpEntity.getContent();
        IOUtils.copy(fileInputStream, bos);
        download.setFileContent(bos.toByteArray());
    } else {
        // Any other content type: read the body as text
        httpEntity = closeableHttpResponse.getEntity();
        result = EntityUtils.toString(httpEntity, charset);
        download.setResponseBody(result);
    }
    download.setHeaderContentType(contentType);
    return download;
} finally {
    // Close the content stream if one was opened
    try {
        if (fileInputStream != null) {
            fileInputStream.close();
        }
    } catch (Exception e) {
        // This is where the exception turned out to be thrown
        LOGGER.error("fileInputStream close error:" + e.getMessage());
    }
    // Without the fileInputStream.close() above, the call below throws the error instead;
    // the try-catch in finally only surfaces the error earlier and tolerates it
    EntityUtils.consume(httpEntity);
}

// EntityUtils.consume(httpEntity) threw this error, which is why fileInputStream.close() was added to finally
  /**
     * Ensures that the entity content is fully consumed and the content stream, if exists,
     * is closed.
     *
     * @param entity the entity to consume.
     * @throws IOException if an error occurs reading the input stream
     *
     * @since 4.1
     */
    public static void consume(final HttpEntity entity) throws IOException {
        if (entity == null) {
            return;
        }
        if (entity.isStreaming()) {
            final InputStream inStream = entity.getContent();
            if (inStream != null) {
                inStream.close();
            }
        }
    }

Root cause analysis

The response headers were captured:

HTTP/1.1 200
Date:Wed, 27 Sep 2022 11:11:45 GMT
Content-Type:application/octet-stream
Transfer-Encoding:chunked
Connection:keep-alive
Vary:Accept-Encoding
Content-Disposition:attachment; filename="xxx.pdf";filename*=utf-8''xxx.pdf
Strict-Transport-Security:max-age=15724800; includeSubDomains
2022-09-17 11:11:45.890  util: content-type:Content-Type: application/octet-stream,contentDisposition:attachment; filename="xxx.pdf";filename*=utf-8''xxx.pdf

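When the headers include Transfer-Encoding: chunked, the message body uses chunked encoding.

In this scenario, the external file system used chunked transfer out of concern that the files might be too large.

In HTTP/1.1, a response either states the size of its content with Content-Length, or sends content of dynamic size with Transfer-Encoding: chunked. In the latter case the body must follow the rules of chunked encoding.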

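From a packet-capture point of view, two requests whose HTTP parameters (headers and body) are identical are equivalent, whether they were sent by a browser or by Java code.
CRLF: Carriage-Return Line-Feed, that is, carriage return (CR, ASCII 13, \r) followed by line feed (LF, ASCII 10, \n).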

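HTTP/1.1 supports persistent (keep-alive) connections by default. Instead of opening a new connection for every request, multiple requests reuse one established connection. Setting up and tearing down a connection for every HTTP request would cost a great deal of performance.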

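On a persistent connection, the length of every message has to be known exactly. If the Content-Length header gives the size of the request or response, the client or server reads exactly that many bytes from the stream and then treats the message as complete. This is what allows the next request and response to reuse the same socket connection.

Interactive applications have a problem here: they do not know in advance how much data they are going to send.

In HTTP/1.0, the server would omit Content-Length from the response headers and keep writing data; when it was done, it simply closed the connection. A classic HTTP client would keep reading until a read returned -1 (the end-of-stream marker).

To handle this, HTTP/1.1 added a special header, Transfer-Encoding: chunked, which allows the response to be sent in chunks. The size is computed each time data is written to the connection, and a zero-length chunk at the end of the response marks the end of the transfer. In other words, HTTP/1.1 supports chunked encoding, which lets an HTTP message be split into several pieces for transmission. Chunking is mostly used for server responses, but a client can also chunk a large request. Chunked encoding lets the server keep sending body content after the headers have gone out.

A chunked body is a sequence of chunks. Each chunk starts with a hexadecimal number giving the chunk size, followed by a CRLF, then the content of exactly that size, then another CRLF. Each chunk therefore has a header and a body: the header gives the number of bytes (in hexadecimal) in the body that follows, optionally with chunk extensions (usually omitted), and the body is the actual content of that length. The header and the body are separated by a CRLF. A chunk of length 0 marks the end of the transfer.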
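The framing described above can be sketched as a small encoder. This is only an illustration using the JDK; the class name ChunkedEncoderSketch and its encode method are hypothetical and are not part of HttpClient. It writes no chunk extensions and no trailers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of HttpClient): frames payloads as HTTP/1.1 chunks. */
final class ChunkedEncoderSketch {

    private static final byte[] CRLF = {'\r', '\n'};

    /** Encodes each non-empty payload as one chunk, then appends the terminating zero-size chunk. */
    static byte[] encode(byte[]... payloads) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] payload : payloads) {
            if (payload.length == 0) {
                continue; // a zero-size chunk would end the body early
            }
            // Chunk header: size in hexadecimal, then CRLF
            byte[] size = Integer.toHexString(payload.length).getBytes(StandardCharsets.US_ASCII);
            out.write(size, 0, size.length);
            out.write(CRLF, 0, CRLF.length);
            // Chunk data, then the CRLF that closes the chunk
            out.write(payload, 0, payload.length);
            out.write(CRLF, 0, CRLF.length);
        }
        // Last chunk: size 0, no data, then the empty line that ends the body
        out.write('0');
        out.write(CRLF, 0, CRLF.length);
        out.write(CRLF, 0, CRLF.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] wire = encode(
                "Some data...".getBytes(StandardCharsets.US_ASCII),
                "Some more data...".getBytes(StandardCharsets.US_ASCII));
        // Make CR and LF visible when printing
        System.out.println(new String(wire, StandardCharsets.US_ASCII)
                .replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Integer.toHexString produces lowercase digits; the chunk size is case-insensitive on the wire, so c and C mean the same thing.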

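A typical chunked transfer example:

C\r\n
Some data...\r\n
11\r\n
Some more data...\r\n
0\r\n
\r\n
The message above contains two chunks: the first is 12 bytes long (hex C) and the second is 17 bytes long (hex 11). The data of each chunk is followed by a CRLF, and the zero-size chunk plus an empty line ends the body.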
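To see where a message like "CRLF expected at end of chunk" can come from, here is a minimal decoder sketch using only the JDK. ChunkedDecoderSketch is a hypothetical class and is not httpcore's ChunkedInputStream; it only borrows the wording of the message from the stack trace. It skips chunk extensions and discards trailers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal decoder for a chunked body (illustration only). */
final class ChunkedDecoderSketch {

    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            // Chunk header: hexadecimal size, optionally followed by ";extensions"
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size < 0) {
                throw new IOException("Negative chunk size");
            }
            if (size == 0) {
                // Last chunk: skip any trailer lines up to the empty line
                while (!readLine(in).isEmpty()) {
                    // trailers are ignored in this sketch
                }
                return body.toByteArray();
            }
            for (int i = 0; i < size; i++) {
                int b = in.read();
                if (b == -1) {
                    throw new IOException("Truncated chunk");
                }
                body.write(b);
            }
            // The chunk data must be followed by CRLF; a stream that ends here fails too
            if (in.read() != '\r' || in.read() != '\n') {
                throw new IOException("CRLF expected at end of chunk");
            }
        }
    }

    /** Reads one CRLF-terminated line and returns it without the CRLF. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                if (in.read() != '\n') {
                    throw new IOException("LF expected after CR");
                }
                return line.toString();
            }
            line.append((char) b);
        }
        throw new IOException("Unexpected end of stream");
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "C\r\nSome data...\r\n11\r\nSome more data...\r\n0\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(decode(new ByteArrayInputStream(wire)), StandardCharsets.US_ASCII));
    }
}
```

In this sketch, a stream that ends right after the chunk data, before its closing CRLF, is reported the same way as a stream that carries the wrong bytes there.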

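One way to reduce the number of chunks in Java is to avoid calling flush() and to output all the content with a single write() call. If the content is larger than the output buffer, the output is still chunked, but avoiding flush() cuts down on unnecessary chunking.
In some cases a client or server can only handle the old HTTP/1.0 behavior. The Connection: close header should then be used to tell the receiving side not to use a persistent connection.

Connection: keep-alive indicates that a persistent connection may be used.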

Solution

Move CloseableHttpResponse closeableHttpResponse out of the try-with-resources header and close it in the finally block:

if (isOctetStream) {
    try {
        if (closeableHttpResponse != null) {
            closeableHttpResponse.close();
        }
    } catch (Exception e) {
        LOGGER.error("response close error:", e);
    }
} else {
    EntityUtils.consume(httpEntity);
}
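One possible reason the position of the close matters (this is an interpretation, not something stated in the referenced articles): Java closes try-with-resources resources before the finally block of the same statement runs. Any cleanup in finally therefore sees resources that are already closed. The sketch below shows only this ordering with plain JDK classes; CloseOrderSketch and Res are hypothetical names standing in for the client and the response.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo of try-with-resources ordering; no HttpClient involved. */
final class CloseOrderSketch {

    /** Stand-in for a closeable resource; it only records when it is closed. */
    static final class Res implements AutoCloseable {
        private final String name;
        private final List<String> events;

        Res(String name, List<String> events) {
            this.name = name;
            this.events = events;
        }

        @Override
        public void close() {
            events.add("close " + name);
        }
    }

    static List<String> run() {
        List<String> events = new ArrayList<>();
        try (Res client = new Res("client", events);
             Res response = new Res("response", events)) {
            events.add("try body");
        } finally {
            // Both resources are already closed when this block runs
            events.add("finally");
        }
        return events;
    }

    public static void main(String[] args) {
        // prints [try body, close response, close client, finally]
        System.out.println(run());
    }
}
```

With the response declared outside the try-with-resources header, the code in finally decides when the response is closed.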

Thanks to these reference articles: https://www.daimajiaoliu.com/daima/4ed40a71d900410can

http://events.jianshu.io/p/acd6abec1cc3

Licensed under the CC 4.0 BY-SA agreement. When reposting, please include a link to the original source and this notice.