Decompressing the gzip-compressed payload of a packet with Python

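This post discusses how to extract an HTTP response carrying Content-Encoding: gzip from a .pcap file and decompress it. The HTTPMessage class from Python's httplib module is used to separate the headers of the HTTP message from its body, and the gzip-compressed body is then decompressed. Code examples show how to do this.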

I am currently working on a program that takes a .pcap file and separates the packets by IP address using the scapy package. I want to use the gzip module to decompress the payloads that are gzip-compressed. I can tell that a payload is gzipped because it contains

Content-Encoding: gzip

I am trying to use

fileStream = StringIO.StringIO(payload)
gzipper = gzip.GzipFile(fileobj=fileStream)
data = gzipper.read()

to decompress the payload, where

payload = str(pkt[TCP].payload)

When I try to do this I get the error

IOError: Not a gzipped file

When I print the first payload I get

HTTP/1.1 200 OK
Cache-Control: private, max-age=0
Content-Type: text/html; charset=utf-8
P3P: CP="NON UNI COM NAV STA LOC CURa DEVa PSAa PSDa OUR IND"
Vary: Accept-Encoding
Content-Encoding: gzip
Date: Sat, 30 Mar 2013 19:23:33 GMT
Content-Length: 15534
Connection: keep-alive
Set-Cookie: _FS=NU=1; domain=.bing.com; path=/
Set-Cookie: _SS=SID=F2652FD33DC443498CE043186458C3FC&C=20.0; domain=.bing.com; path=/
Set-Cookie: MUID=2961778241736E4F314E732240626EBE; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/
Set-Cookie: MUIDB=2961778241736E4F314E732240626EBE; expires=Mon, 30-Mar-2015 19:23:33 GMT; path=/
Set-Cookie: OrigMUID=2961778241736E4F314E732240626EBE%2c532012b954b64747ae9b83e7ede66522; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/
Set-Cookie: SRCHD=D=2758763&MS=2758763&AF=NOFORM; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/
Set-Cookie: SRCHUID=V=2&GUID=02F43275DC7F435BB3DF3FD32E181F4D; expires=Mon, 30-Mar-2015 19:23:33 GMT; path=/
Set-Cookie: SRCHUSR=AUTOREDIR=0&GEOVAR=&DOB=20130330; expires=Mon, 30-Mar-2015 19:23:33 GMT; domain=.bing.com; path=/

?}k{?H????+0?#!?,_???$?:?7vf?w?Hb???ƊG???9???/9U?\$;3{9g?ycAӗ???????W{?o?~?FZ?e ]>???n????׻?????????d?t??a?3?

?2?p??eBI?e??????ܒ?P??-?Q?-L?????ǼR?³?ׯ??%'

?2Kf?7???c?Y?I?1+c??,ae]?????

For additional information, this is a packet that was isolated because it contained Content-Encoding: gzip from a sample .pcap file provided by a project.

Solution

In order to decode a gzipped HTTP response, you only need to decode the response body, not the headers.

The payload in your case is the entire TCP payload, i.e. the entire HTTP message including headers and body.
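This also explains the "Not a gzipped file" error: a gzip stream begins with the two magic bytes 0x1f 0x8b, while your payload begins with the status line. A minimal sketch of that check, written for Python 3 rather than the Python 2 used in the question; looks_like_gzip and the sample payload are made up for illustration:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def looks_like_gzip(data: bytes) -> bool:
    """Return True if data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# Hypothetical stand-in for the captured TCP payload: headers plus gzipped body.
body = gzip.compress(b"<html>hello</html>")
payload = b"HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n" + body

print(looks_like_gzip(payload))  # False: the payload starts with b"HTTP/1.1"
print(looks_like_gzip(body))     # True: the body alone is a gzip stream
```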

HTTP messages (requests and responses) use the generic message format of RFC 822, the same format that e-mail messages (RFC 2822) are based upon.

The structure of an 822 message is very simple:

Zero or more header lines (key/value pairs separated by :), each terminated by CRLF

An empty line (just CRLF: carriage return, line feed, i.e. '\r\n')

The message body
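If you did want to do the split by hand, the structure above can be sketched roughly as follows. This is a Python 3 sketch, not the approach recommended in this answer; split_http_message and the sample bytes are hypothetical, and it assumes the whole header block is present in the data:

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The first empty line (CRLF CRLF) separates the headers from the body.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no empty line found: the headers are incomplete")
    lines = head.split(b"\r\n")
    start_line = lines[0].decode("iso-8859-1")
    headers = []  # a list of pairs, because names such as Set-Cookie repeat
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        headers.append((name.decode("iso-8859-1").strip(),
                        value.decode("iso-8859-1").strip()))
    return start_line, headers, body


# Made-up sample message; the body bytes are placeholders, not real gzip data.
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=utf-8\r\n"
       b"Content-Encoding: gzip\r\n"
       b"\r\n"
       b"\x1f\x8b...body bytes...")
start_line, headers, body = split_http_message(raw)
print(start_line)
print(headers)
```

This ignores details such as folded header lines, which is one reason to prefer an existing parser.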

You could now parse this message yourself in order to isolate the body, but I would recommend using the tools Python already provides. The httplib module (Python 2.x) includes the HTTPMessage class, which httplib uses internally to parse HTTP responses. It's not meant to be used directly, but in this case I would probably still use it, because it handles some HTTP-specific details for you.

Here's how you can use it to separate the body from the headers:

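>>> from httplib import HTTPMessage
>>>
>>> # Open in binary mode so the gzipped body is read unaltered
>>> f = open('gzipped_response.payload', 'rb')
>>>
>>> # Or, if you already have the payload in memory as a string:
... # f = StringIO.StringIO(payload)
...
>>> status_line = f.readline()  # consume the status line; it is not a header
>>> msg = HTTPMessage(f, 0)     # reads the headers up to the empty line
>>> body = msg.fp.read()        # the rest of the message is the body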

The HTTPMessage class works in a similar way to rfc822.Message:

First, you need to read (or discard) the status line (HTTP/1.1 200 OK), because that's not part of the RFC822 message, and is not a header.

Then you instantiate HTTPMessage with a handle to an open file and the seekable argument set to 0. The file pointer is stored as msg.fp.

Upon instantiation it calls msg.readheaders(), which reads all header lines until it encounters an empty line (CRLF).

At that point, msg.fp has been advanced to the point where the headers end and the body starts. You can therefore call msg.fp.read() to read the rest of the message - the body.
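httplib and rfc822 exist only in Python 2. On Python 3 the same steps can be sketched with http.client.parse_headers, a module-level helper that, as far as I know, is not listed in the public http.client documentation, so treat relying on it as an assumption. The sample bytes below are made up:

```python
import io
from http.client import parse_headers

# Made-up sample response; in practice this would be the captured payload.
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Encoding: gzip\r\n"
       b"Content-Length: 4\r\n"
       b"\r\n"
       b"BODY")

fp = io.BytesIO(raw)
status_line = fp.readline()  # consume the status line; it is not a header
msg = parse_headers(fp)      # reads header lines up to the empty line
body = fp.read()             # fp now sits at the start of the body

print(msg["Content-Encoding"])  # gzip
print(body)                     # b'BODY'
```

The returned object behaves like an email.message.Message, so header lookups are case-insensitive.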

After that, your code for decompressing the gzipped body just works:

>>> import gzip
>>> import StringIO
>>>
>>> body_stream = StringIO.StringIO(body)
>>> gzipper = gzip.GzipFile(fileobj=body_stream)
>>> data = gzipper.read()
>>>
>>> print data[:25]
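One caveat, stated as an assumption about your capture rather than something shown in the question: the response advertises Content-Length: 15534, so the body may well be spread over several TCP segments, and a single packet's payload would then hold only the beginning of the gzip stream. A Python 3 sketch that decompresses whatever is available uses zlib.decompressobj; gunzip_partial and the sample data are hypothetical:

```python
import gzip
import zlib


def gunzip_partial(body: bytes) -> bytes:
    """Decompress as much of a possibly truncated gzip stream as possible."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return decompressor.decompress(body)


# Made-up sample document, compressed and then cut short.
original = b"".join(b"line %d of a sample page\n" % i for i in range(500))
full = gzip.compress(original)
truncated = full[: len(full) // 2]  # simulate having only the first segment

complete = gunzip_partial(full)      # the whole document
partial = gunzip_partial(truncated)  # only a leading part of the document
print(len(original), len(complete), len(partial))
```

For a complete body, gzip.decompress(body) is the more direct Python 3 call. To recover the full page from a capture, the payloads of all segments belonging to the response would have to be concatenated in sequence order first.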
