Java HTTP gzip decompression: uncompressing a GZIPed HTTP response in Java

I'm trying to decompress a gzipped HTTP response using GZIPInputStream. However, I always get the same exception when I try to read the stream: java.util.zip.ZipException: invalid bit length repeat

My HTTP Request Header:

GET www.myurl.com HTTP/1.0\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
X-Requested-With: XMLHttpRequest\r\n
Cookie: Some Cookies\r\n\r\n

At the end of the HTTP response headers, I get path=/Content-Encoding: gzip, followed by the gzipped response.

I tried two similar pieces of code to decompress it:

UPDATE: In the following code, tBytes = (the string after 'path=/Content-Encoding: gzip').getBytes ();

GZIPInputStream gzip = new GZIPInputStream (new ByteArrayInputStream (tBytes));
StringBuffer szBuffer = new StringBuffer ();
byte tByte [] = new byte [1024];
while (true)
{
    int iLength = gzip.read (tByte, 0, 1024);
    if (iLength < 0)
        break;
    szBuffer.append (new String (tByte, 0, iLength));
}

And this one, which I found on this forum:

InputStream gzipStream = new GZIPInputStream (new ByteArrayInputStream (tBytes));
Reader decoder = new InputStreamReader (gzipStream, "UTF-8");
BufferedReader buffered = new BufferedReader (decoder);

I guess this is an encoding error.

Best regards,

bill0ute

Solution

You don't show how you get the tBytes that you use to set up the gzip stream here:

GZIPInputStream gzip = new GZIPInputStream (new ByteArrayInputStream (tBytes));

One explanation is that you are including the entire HTTP response in tBytes. Instead, it should be only the content after the HTTP headers.

Another explanation is that the response is chunked.
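If the response does turn out to be chunked (a Transfer-Encoding: chunked header), the gzip data is split into pieces, each prefixed with its size in hexadecimal, and those pieces have to be joined before GZIPInputStream can read them. Here is a minimal sketch of that step. The class and method names are made up for illustration, and it assumes the whole body (everything after the headers) is already in a byte array; chunk extensions and trailers are simply ignored.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class ChunkedDecoder {

    /** Joins the chunks of a "Transfer-Encoding: chunked" body into one byte array. */
    static byte[] decode(byte[] body) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            // Each chunk starts with a line holding its size in hexadecimal.
            int lineEnd = indexOfCrlf(body, pos);
            if (lineEnd < 0) {
                throw new IOException("missing chunk-size line");
            }
            String sizeLine = new String(body, pos, lineEnd - pos, StandardCharsets.US_ASCII);
            int semi = sizeLine.indexOf(';'); // optional chunk extensions are ignored
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size;
            try {
                size = Integer.parseInt(sizeLine.trim(), 16);
            } catch (NumberFormatException e) {
                throw new IOException("bad chunk size: " + sizeLine);
            }
            pos = lineEnd + 2; // move past the CRLF that ends the size line
            if (size == 0) {
                break; // last chunk; any trailers are ignored
            }
            if (size < 0 || size > body.length - pos) {
                throw new IOException("truncated chunk");
            }
            out.write(body, pos, size);
            pos += size + 2; // skip the chunk data and the CRLF that follows it
        }
        return out.toByteArray();
    }

    /** Returns the index of the next CRLF at or after 'from', or -1 if there is none. */
    private static int indexOfCrlf(byte[] data, int from) {
        for (int i = from; i + 1 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }
}
```

The result of decode can then be wrapped in a ByteArrayInputStream and handed to GZIPInputStream.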

edit: You are taking the data after the Content-Encoding line as the message body. However, according to the HTTP/1.1 specification, header fields do not come in any particular order, so this is unreliable.

As explained in this part of the HTTP specification, the message body of a request or response doesn't come after a particular header field but after the first empty line:

Request (section 5) and Response (section 6) messages use the generic message format of RFC 822 [9] for transferring entities (the payload of the message). Both types of message consist of a start-line, zero or more header fields (also known as "headers"), an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields, and possibly a message-body.

You still haven't shown exactly how you compose tBytes, but at this point I think you're erroneously including the empty line in the data that you try to decompress. The message body starts after the CRLF characters of the empty line.
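To make that concrete, here is a rough sketch of locating the body by searching the raw response for the first empty line and decompressing only what follows. The class and method names are invented for the example. It assumes the complete response was read into a byte array and that the body is not chunked. It also works on bytes throughout: converting the compressed data to a String and back with getBytes () can alter it, depending on the charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of any library.
final class HttpBodySplitter {

    /** Returns the index of the first body byte (just past the first CRLF CRLF), or -1. */
    static int bodyOffset(byte[] response) {
        for (int i = 0; i + 3 < response.length; i++) {
            if (response[i] == '\r' && response[i + 1] == '\n'
                    && response[i + 2] == '\r' && response[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    /** Gunzips everything after the header block; assumes the body is not chunked. */
    static byte[] gunzipBody(byte[] response) throws IOException {
        int offset = bodyOffset(response);
        if (offset < 0) {
            throw new IOException("no empty line found after the headers");
        }
        // Hand GZIPInputStream only the bytes that follow the empty line.
        try (GZIPInputStream gzip = new GZIPInputStream(
                new ByteArrayInputStream(response, offset, response.length - offset))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = gzip.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

The decompressed bytes can then be turned into text with new String (bytes, charset), using the charset announced in the Content-Type header.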

May I suggest that you use the httpclient library instead to extract the message body?
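As an illustration of letting a library deal with the headers, here is a sketch that uses the JDK's built-in HttpURLConnection rather than the httpclient library, simply so that it runs without extra dependencies. The class name, method names, and the UTF-8 assumption are mine. The connection parses the headers and undoes any chunking, so the only thing left to do is wrap the stream in GZIPInputStream when the server reports Content-Encoding: gzip.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of any library.
final class GzipFetch {

    /** Reads a body stream to text, gunzipping it first if contentEncoding is "gzip". */
    static String readBody(InputStream raw, String contentEncoding) throws IOException {
        InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(raw)
                : raw;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        // Assumption: the page is UTF-8; a real client would check Content-Type.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Fetches a URL, asking for gzip, and returns the decoded body as text. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream raw = conn.getInputStream()) {
            // The headers are already consumed here; 'raw' contains only the body.
            return readBody(raw, conn.getContentEncoding());
        } finally {
            conn.disconnect();
        }
    }
}
```

A call such as GzipFetch.fetch ("http://www.myurl.com/") would then return the page text directly, with no manual header handling.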
