Python byte stream handling: decompressing a byte stream in Python?

Latest recommended article published 2024-05-23 09:52:57

weixin_39671964


Tags: Python byte stream handling

Here is the situation:

I get gzipped XML documents from Amazon S3:

import boto
from boto.s3.connection import S3Connection
from boto.s3.key import Key

conn = S3Connection('access Id', 'secret access key')
b = conn.get_bucket('mydev.myorg')
k = Key(b)
k.key = 'documents/document.xml.gz'  # key is an attribute, not a method

I write each one to a temporary file and read it back with gzip:

import gzip

f = open('/tmp/p', 'wb')  # binary mode: the payload is compressed bytes
k.get_file(f)
f.close()

r = gzip.open('/tmp/p', 'rb')
file_content = r.read()
r.close()

Question

How can I unzip the streams directly and read the contents?

I do not want to create temp files; they don't look good.

Solution

Yes, you can use the zlib module to decompress byte streams:

import zlib

def stream_gzip_decompress(stream):
    # 32 + MAX_WBITS: detect a gzip or zlib header automatically
    dec = zlib.decompressobj(32 + zlib.MAX_WBITS)
    for chunk in stream:
        rv = dec.decompress(chunk)
        if rv:  # a chunk may not yield any output yet
            yield rv

Adding 32 to zlib.MAX_WBITS tells zlib to detect the header automatically: it accepts either a gzip or a zlib header, parses it, and returns only the decompressed payload.
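To see the header auto-detection in action without touching S3, here is a minimal, self-contained sketch for Python 3. The `chunked` helper and the sample payload are assumptions made up for illustration; they only imitate a stream that arrives in small pieces.

```python
import gzip
import zlib


def stream_gzip_decompress(stream):
    # 32 + MAX_WBITS: detect a gzip or zlib header automatically
    dec = zlib.decompressobj(32 + zlib.MAX_WBITS)
    for chunk in stream:
        rv = dec.decompress(chunk)
        if rv:
            yield rv


def chunked(data, size):
    """Hypothetical helper: split bytes into fixed-size pieces, like a network stream."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


# Made-up sample document, compressed in both container formats
original = b"<doc>" + b"<item>hello</item>" * 1000 + b"</doc>"
gzip_payload = gzip.compress(original)   # gzip header and trailer
zlib_payload = zlib.compress(original)   # zlib header and trailer

# The same generator handles both formats, 64 bytes at a time
from_gzip = b"".join(stream_gzip_decompress(chunked(gzip_payload, 64)))
from_zlib = b"".join(stream_gzip_decompress(chunked(zlib_payload, 64)))
```

Joining the chunks is only done here to compare against the original; in real use you would process each yielded piece as it arrives.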

The S3 key object is an iterator, so you can do:

for data in stream_gzip_decompress(k):
    pass  # do something with the decompressed data
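If the document is small enough to hold in memory, another way to avoid temp files is to wrap the compressed bytes in `io.BytesIO` and hand that to `gzip.GzipFile`. The sketch below assumes the compressed payload has already been fetched as a bytes object (with boto, possibly via the key's `get_contents_as_string()`); `gunzip_bytes` and the sample data are hypothetical names used only for illustration.

```python
import gzip
import io


def gunzip_bytes(payload):
    """Decompress a complete gzip payload that is already in memory."""
    with gzip.GzipFile(fileobj=io.BytesIO(payload), mode="rb") as gz:
        return gz.read()


# Stand-in for bytes downloaded from S3 (assumption for the example)
sample = gzip.compress(b"<doc>example</doc>")
xml_bytes = gunzip_bytes(sample)
```

This trades streaming for simplicity: the whole compressed object sits in memory at once, so the generator approach remains the better fit for large files.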
