The GZIP_MAGIC field

海若

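I once ran into an unusual file of unknown format, so I looked up its first two bytes: 0x8B1F (that is, the bytes 0x1F 0x8B read as a little-endian 16-bit value, the so-called magic number). I guessed that the file was very likely GZip, but decompression failed the CRC check.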
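A minimal sketch of such a magic-number check in Python. The helper name looks_like_gzip is hypothetical, and a match only suggests GZip: as the failed CRC check above shows, it does not prove the rest of the file is valid.

```python
import struct

# The two leading bytes 0x1F 0x8B, read as a little-endian unsigned 16-bit value.
GZIP_MAGIC = 0x8B1F


def looks_like_gzip(data: bytes) -> bool:
    """Return True if data starts with the gzip magic number (hypothetical helper)."""
    if len(data) < 2:
        return False
    (magic,) = struct.unpack("<H", data[:2])
    return magic == GZIP_MAGIC
```

It could be called on the first bytes read from a file opened in "rb" mode.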

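GZip is commonly used to compress the data a server sends to a browser; Deflate is a similar technique. This site uses it to speed up page responses. When sending a GET request, if the client declares support for GZip or Deflate, for example with "Accept-Encoding: gzip, deflate", the server may return a compressed body. A browser normally decompresses it on its own; a program you write yourself has to do the decompression, otherwise all you see is garbage.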
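As a rough sketch of what "decompress it yourself" can look like in Python, the function below picks a decoder from the response's Content-Encoding header value. The name decode_body is hypothetical, and the fallback to a raw deflate stream is an assumption about servers that omit the zlib wrapper, not something the original post describes.

```python
import gzip
import zlib


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Decompress an HTTP response body according to its Content-Encoding value."""
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # "deflate" bodies are normally zlib-wrapped.
            return zlib.decompress(body)
        except zlib.error:
            # Assumption: some servers send a raw deflate stream instead.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # No encoding (or one this sketch does not handle): return the body unchanged.
    return body
```

The function only sees bytes and a header string, so it works with whatever HTTP client fetched the response.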

A simple Python script to decompress GZip data:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import sys
import gzip
import traceback
from io import BytesIO

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Command line parameters error.")
        print("Usage: UnGzip.py GzipFilePath ResFilePath")
        sys.exit(1)

    try:
        # Read the whole compressed file into memory.
        with open(sys.argv[1], "rb") as srcfile:
            srcdata = srcfile.read()

        # Wrap the bytes in a file-like object so GzipFile can read them.
        buf = BytesIO(srcdata)
        f = gzip.GzipFile(fileobj=buf)
        resdata = f.read()

        # Write the decompressed bytes to the result file.
        with open(sys.argv[2], "wb") as resfile:
            resfile.write(resdata)
    except Exception:
        # Print the traceback (for example a failed CRC check) and exit with an error.
        traceback.print_exc(file=sys.stdout)
        sys.exit(1)
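When decompression fails only at the CRC check, as it did for the file described at the start, it can still be useful to see what the stream inflates to. The following is a sketch under several assumptions: the helper name gunzip_lenient is hypothetical, the input is a single-member GZip stream with an intact header, and the header layout and trailer fields (CRC32 and ISIZE) follow RFC 1952. It inflates the payload as raw deflate and reports the CRC comparison instead of raising on a mismatch.

```python
import struct
import zlib

# FLG bits defined in RFC 1952.
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def gunzip_lenient(raw: bytes):
    """Inflate a single-member gzip stream without enforcing the trailer.

    Returns (data, crc_ok). Hypothetical helper; assumes an intact header.
    """
    if len(raw) < 10 or raw[:2] != b"\x1f\x8b" or raw[2] != 8:
        raise ValueError("not a gzip stream using the deflate method")
    flags = raw[3]
    pos = 10  # fixed part: magic (2), CM, FLG, MTIME (4), XFL, OS
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", raw, pos)
        pos += 2 + xlen
    if flags & FNAME:
        pos = raw.index(b"\x00", pos) + 1  # zero-terminated file name
    if flags & FCOMMENT:
        pos = raw.index(b"\x00", pos) + 1  # zero-terminated comment
    if flags & FHCRC:
        pos += 2  # CRC16 of the header

    # Negative wbits selects raw deflate, so zlib does not verify the gzip trailer.
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)
    data = inflater.decompress(raw[pos:])

    # Bytes after the end of the deflate stream start with the 8-byte trailer.
    trailer = inflater.unused_data[:8]
    if len(trailer) < 8:
        return data, False
    stored_crc, stored_size = struct.unpack("<II", trailer)
    crc_ok = (zlib.crc32(data) == stored_crc
              and (len(data) & 0xFFFFFFFF) == stored_size)
    return data, crc_ok
```

If crc_ok comes back False, the returned data should be treated as suspect: the file may be corrupted, truncated, or not really GZip despite the matching magic number.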

RFC document for the GZip format: http://www.ietf.org/rfc/rfc1952.txt

Reposted from: 程序人生 >> GZip magic标志0x8B1F

Summary: a Java method for decompressing Gzip data:

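// Imports needed by the method below (the method itself belongs inside a class).
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

// Decompresses a GZIP byte array; returns null for empty input or on failure.
protected static byte[] unPack(byte[] b) {
    if (b == null || b.length == 0) {
        return null;
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // try-with-resources closes the stream even if reading fails.
    try (GZIPInputStream gunzip = new GZIPInputStream(new ByteArrayInputStream(b))) {
        byte[] buffer = new byte[256];
        int n;
        // read() returns -1 at end of stream.
        while ((n = gunzip.read(buffer)) >= 0) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}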