Compressing gz files in Python: decompressing part of a .gz file with Python

So here's the problem. I have a sample.gz file which is roughly 60 KB in size, and I want to decompress only the first 2000 bytes of it. I am running into a "CRC check failed" error, I guess because the gzip CRC field appears at the end of the file, so the gzip module needs the entire gzipped file to decompress. I don't care about the CRC check; even if the CRC check fails, that is OK. Is there a way to get around this and decompress partial .gz files?

The code I have so far is

import gzip
import StringIO

file = open('sample.gz', 'rb')
# Wrap only the first 2000 compressed bytes in a file-like object.
mybuf = StringIO.StringIO(file.read(2000))
f = gzip.GzipFile(fileobj=mybuf)
data = f.read()
print data

The error encountered is

  File "gunzip.py", line 27, in ?
    data = f.read()
  File "/usr/local/lib/python2.4/gzip.py", line 218, in read
    self._read(readsize)
  File "/usr/local/lib/python2.4/gzip.py", line 273, in _read
    self._read_eof()
  File "/usr/local/lib/python2.4/gzip.py", line 309, in _read_eof
    raise IOError, "CRC check failed"
IOError: CRC check failed

Also, is there any way to use the zlib module to do this and ignore the gzip headers?

Solution

It seems that you need to look into Python's zlib library instead.

The GZIP format relies on the same DEFLATE compression that zlib implements, but wraps it in a file-level container with CRC checking, and this container appears to be what you do not want or need at the moment.
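A minimal sketch of that idea, assuming Python 3 (the question's code targets Python 2.4) and a hypothetical helper named `decompress_prefix`. A `zlib` decompression object created with `wbits` set to `zlib.MAX_WBITS | 16` parses the gzip header itself and returns whatever output can be produced from the bytes it is given. The CRC is only compared once the trailer is reached, so a truncated input does not raise.

```python
import gzip
import zlib


def decompress_prefix(compressed_prefix):
    """Return the data recoverable from the first bytes of a gzip stream.

    Hypothetical helper for illustration; not part of any library.
    """
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer
    # instead of the default zlib wrapper.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    # decompress() returns the output it can produce so far; it does not
    # need to see the end of the stream.
    return decompressor.decompress(compressed_prefix)


if __name__ == "__main__":
    # In-memory stand-in for sample.gz so the sketch runs without a file.
    original = b"".join(b"line %d of sample data\n" % i for i in range(20000))
    blob = gzip.compress(original)
    partial = decompress_prefix(blob[:2000])
    print(len(partial), original.startswith(partial))
```

With a real file, the same call would take `open('sample.gz', 'rb').read(2000)` as its argument. The result is a prefix of the uncompressed data, and its length depends on how well the data compresses.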

Edit: the code on Doug Hellmann's site only shows how to compress or decompress with zlib. As indicated above, GZIP is "zlib with an envelope", and you'll need to decode the envelope before getting to the compressed data per se. Here's more info on how to go about it; it's really not that complicated:

See RFC 1952 for details about the GZIP format.

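This format starts with a 10-byte header, followed by optional, uncompressed elements such as the file name or a comment, followed by the DEFLATE-compressed data, itself followed by an 8-byte trailer holding a CRC-32 of the uncompressed data and its length modulo 2^32. (Adler-32 is the checksum of the zlib wrapper format, not of GZIP.)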

By using Python's struct module, parsing the header should be relatively simple.

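The compressed sequence (or its first few thousand bytes, since that is what you want to do) can then be decompressed with Python's zlib module, as in the examples mentioned above. Because the data inside a GZIP file is a raw DEFLATE stream without the zlib header, the decompression object has to be created with a negative wbits value.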

Possible problems to handle: the GZIP file may contain more than one member (several compressed streams concatenated), and the second member may start within the block of a few thousand bytes we wish to decompress.

Sorry to provide neither a simple procedure nor a ready-to-go snippet; however, decoding the file with the indications above should be relatively quick and simple.
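The steps above can be sketched roughly as follows, assuming Python 3 and the header layout from RFC 1952. The names `skip_gzip_header` and `inflate_prefix` are hypothetical helpers, not library functions, and the sketch assumes the whole header fits inside the bytes that were read.

```python
import gzip
import io
import struct
import zlib

# Flag bits of the FLG byte, as defined in RFC 1952.
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def skip_gzip_header(buf):
    """Return the offset of the first DEFLATE byte in buf."""
    # Fixed 10-byte header: ID1 ID2, CM, FLG, MTIME, XFL, OS (little-endian).
    magic, method, flags, _mtime, _xfl, _os = struct.unpack_from("<2sBBIBB", buf, 0)
    if magic != b"\x1f\x8b" or method != 8:
        raise ValueError("not a gzip stream using DEFLATE")
    pos = 10
    if flags & FEXTRA:
        # Two-byte length followed by that many bytes of extra data.
        (xlen,) = struct.unpack_from("<H", buf, pos)
        pos += 2 + xlen
    if flags & FNAME:
        # Zero-terminated original file name.
        pos = buf.index(b"\x00", pos) + 1
    if flags & FCOMMENT:
        # Zero-terminated comment.
        pos = buf.index(b"\x00", pos) + 1
    if flags & FHCRC:
        # Two-byte CRC of the header itself.
        pos += 2
    return pos


def inflate_prefix(buf):
    """Decompress as much as possible from a truncated gzip buffer."""
    start = skip_gzip_header(buf)
    # Negative wbits selects a raw DEFLATE stream: no zlib header and no
    # checksum, so nothing is verified at the end.
    raw = zlib.decompressobj(-zlib.MAX_WBITS)
    return raw.decompress(buf[start:])


if __name__ == "__main__":
    payload = b"".join(b"record %d\n" % i for i in range(50000))
    stream = io.BytesIO()
    # Passing filename makes GzipFile write the optional FNAME field.
    with gzip.GzipFile(filename="sample.txt", mode="wb", fileobj=stream) as gz:
        gz.write(payload)
    head = stream.getvalue()[:2000]
    recovered = inflate_prefix(head)
    print(len(recovered), payload.startswith(recovered))
```

For the multi-member case mentioned above, the decompression object's `eof` attribute becomes true once the first DEFLATE stream has ended, and `unused_data` then holds the bytes that follow it (the 8-byte trailer and any later member), which gives a place to start parsing the next header.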
