Python binary slicing: extracting zlib-compressed data from a binary file in Python

Latest recommended article published on 2022-12-14 20:57:40

处黑

Article link: https://blog.csdn.net/weixin_36087357/article/details/111929638

My company uses a legacy file format for electromyography data, which is no longer in production. However, there is some interest in maintaining backward compatibility, so I am studying the possibility of writing a reader for that file format.

From analyzing the very convoluted legacy source code, written in Delphi, I can see that the file reader/writer uses ZLIB. In a hex editor, the file appears to have a header containing ASCII text (with fields like "Player" and "Analyzer" readily readable), followed by a compressed string containing the raw data.
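As a rough illustration of the hex-editor inspection described above, the readable header fields can also be located programmatically. The sketch below is not part of the legacy format's code; the helper name `ascii_strings` and the minimum run length of 4 are assumptions chosen for the example.

```python
import re

def ascii_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII bytes.

    Hypothetical helper: only runs of at least `min_len` printable
    characters (0x20 to 0x7e) are reported, similar to the Unix
    `strings` tool.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")

if __name__ == "__main__":
    # Made-up bytes standing in for a file header; not real .mio content.
    sample = b"\x00\x01Player\x00\x02\x03Analyzer\xff\x10ab\x00"
    for offset, text in ascii_strings(sample):
        print(f"{offset:#06x}  {text}")
```

The offset of the last readable field gives a lower bound on where the binary payload might begin, which narrows the region to search for a compressed stream.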

My question is: how should I proceed in order to identify:

Whether it is a compressed stream;

Where the compressed stream starts and where it ends.

From Wikipedia:

zlib compressed data is typically written with a gzip or a zlib wrapper. The wrapper encapsulates the raw DEFLATE data by adding a header and trailer. This provides stream identification and error detection

Is this relevant?
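To make the wrapper idea concrete: per RFC 1950, a zlib stream begins with a two-byte header (CMF and FLG) and ends with an Adler-32 checksum, while a gzip stream begins with the bytes 1f 8b. The sketch below tests a position in a buffer against those header rules. The function names are hypothetical, and a match is only a hint, since arbitrary bytes can satisfy the check by chance; actually decompressing from that offset is the reliable confirmation.

```python
import zlib

def looks_like_zlib_header(data: bytes, pos: int = 0) -> bool:
    """Check the two-byte zlib header (RFC 1950) at data[pos:pos + 2]."""
    if pos < 0 or pos + 2 > len(data):
        return False
    cmf, flg = data[pos], data[pos + 1]
    return (
        (cmf & 0x0F) == 8                    # compression method 8 = DEFLATE
        and (cmf >> 4) <= 7                  # window size of at most 32 KiB
        and (cmf * 256 + flg) % 31 == 0      # FCHECK makes the pair a multiple of 31
    )

def looks_like_gzip_header(data: bytes, pos: int = 0) -> bool:
    """Check for the gzip magic bytes followed by the DEFLATE method byte."""
    return data[pos:pos + 3] == b"\x1f\x8b\x08"

if __name__ == "__main__":
    # Made-up buffer: an ASCII prefix followed by a real zlib stream.
    blob = b"Player" + zlib.compress(b"raw samples" * 10)
    candidates = [i for i in range(len(blob)) if looks_like_zlib_header(blob, i)]
    print(candidates)
```

Because the check is cheap, it can be used to shortlist candidate offsets before attempting a full decompression at each one.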

I'll be glad to post more information, but I don't know what would be most relevant.

Thanks for any hint.

EDIT: I have the working application and can use it to record actual data of any duration, producing files even smaller than 1 kB if necessary.

Some sample files:

The same as above after a very short (1 second?) datastream has been saved: https://dl.dropbox.com/u/4849855/Mio_File/HeltonFilled.mio

A different one, from a patient named "manco" instead of "Helton", with an even shorter stream (ideal for Hex viewing): https://dl.dropbox.com/u/4849855/Mio_File/manco_short.mio

Instructions: each file should be the file of a patient (a person). Inside these files, one or more exams are saved, each exam consisting of one or more time series. The provided files contain only one exam, with one data series.

Solution

To start, why not scan the files for all valid zlib streams? This is good enough for small files and for figuring out the format:

import zlib
from glob import glob

def zipstreams(filename):
    """Return all zip streams and their positions in file."""
    with open(filename, 'rb') as fh:
        data = fh.read()
    i = 0
    while i < len(data):
        try:
            zo = zlib.decompressobj()