Python binary slicing: extracting zlib-compressed data from a binary file in Python

My company uses a legacy file format for electromyography data that is no longer in production. However, there is some interest in maintaining backward compatibility, so I am studying the possibility of writing a reader for that file format.

From analyzing the very convoluted legacy source code, written in Delphi, I found that the file reader/writer uses ZLIB. In a hex editor, the file appears to start with a binary header containing ASCII fields (strings like "Player" and "Analyzer" are readily readable), followed by a compressed string containing the raw data.

My question is: how should I proceed in order to identify:

Whether it is a compressed stream;

Where the compressed stream starts and where it ends.

From Wikipedia:

zlib compressed data is typically written with a gzip or a zlib wrapper. The wrapper encapsulates the raw DEFLATE data by adding a header and trailer. This provides stream identification and error detection

Is this relevant?

I'll be glad to post more information, but I don't know what would be most relevant.

Thanks for any hint.

EDIT: I have the working application and can use it to record actual data of any duration, producing files even smaller than 1 kB if necessary.

Some sample files:

The same as above after a very short (1 second?) datastream has been saved: https://dl.dropbox.com/u/4849855/Mio_File/HeltonFilled.mio

A different one, from a patient named "manco" instead of "Helton", with an even shorter stream (ideal for Hex viewing): https://dl.dropbox.com/u/4849855/Mio_File/manco_short.mio

Notes on the format: each file holds the records of one patient (a person). Inside a file, one or more exams are saved, each exam consisting of one or more time series. The provided files contain only one exam, with one data series.

Solution

To start, why not scan the files for all valid zlib streams? This brute-force approach is good enough for small files and for figuring out the format:

import zlib
from glob import glob

def zipstreams(filename):
    """Return all zlib streams and their positions in file."""
    with open(filename, 'rb') as fh:
        data = fh.read()
    i = 0
    while i < len(data):
        try:
            zo = zlib.decompressobj()
            yield i, zo.decompress(data[i:])
            # Skip past the bytes consumed by this stream.
            i += len(data[i:]) - len(zo.unused_data)
        except zlib.error:
            # No valid stream starts at this offset; try the next byte.
            i += 1

for filename in glob('*.mio'):
    print(filename)
    for i, data in zipstreams(filename):
        # Start offset of the stream and size of the decompressed payload.
        print(i, len(data))
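
The scan above reports where each stream starts. To also see where it ends, which is the second part of the question, a variant along the following lines may help. This is only a sketch under stated assumptions: the names `looks_like_zlib_header` and `find_zlib_streams` are hypothetical, the data is assumed to use the plain zlib wrapper (RFC 1950) rather than gzip or raw DEFLATE, and `eof` requires Python 3.3 or later. The two-byte header test is what the "stream identification" in the Wikipedia quote refers to, and it lets the scan skip most offsets cheaply.

```python
import zlib


def looks_like_zlib_header(cmf, flg):
    """Check the two-byte zlib header layout from RFC 1950."""
    return (
        (cmf & 0x0F) == 8                   # compression method 8 = DEFLATE
        and (cmf >> 4) <= 7                 # window size of at most 32 KiB
        and (cmf * 256 + flg) % 31 == 0     # header check bits
    )


def find_zlib_streams(data):
    """Yield (start, end, payload) for each complete zlib stream in data."""
    i = 0
    while i + 1 < len(data):
        if looks_like_zlib_header(data[i], data[i + 1]):
            zo = zlib.decompressobj()
            try:
                payload = zo.decompress(data[i:])
            except zlib.error:
                payload = None
            # eof is True only when the end of the stream was reached,
            # so truncated streams are not reported.
            if payload is not None and zo.eof:
                end = len(data) - len(zo.unused_data)
                yield i, end, payload
                i = end
                continue
        i += 1
```

With `data` read from one of the sample files, `for start, end, payload in find_zlib_streams(data)` would give the byte range `data[start:end]` occupied by each compressed stream. Everything before the first `start` is then a candidate for the file header.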

It looks like the data streams contain little-endian double-precision floating-point data:

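import numpy
from matplotlib import pyplot

for filename in glob('*.mio'):
    for i, data in zipstreams(filename):
        if data:
            # '<f8' is little-endian double precision (8-byte float).
            a = numpy.frombuffer(data, dtype='<f8')
            # Plot every value except the first one.
            pyplot.plot(a[1:])
            pyplot.title(filename + ' - %i' % i)
            pyplot.show()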
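
If NumPy is not available, the same interpretation can be sketched with the standard library alone. The helper name `decode_doubles` is hypothetical, and treating the payload as a flat array of little-endian doubles is an assumption based on the observation above, not on a format specification. The sample values are synthetic and do not come from a real .mio file.

```python
import struct
import zlib


def decode_doubles(payload):
    """Interpret payload as consecutive little-endian 8-byte floats."""
    # Ignore any trailing bytes that do not form a complete value.
    usable = len(payload) - len(payload) % 8
    return [value for (value,) in struct.iter_unpack('<d', payload[:usable])]


# Round trip on synthetic data: pack, compress, decompress, decode.
samples = [0.0, 1.5, -2.25, 1e-3]
stream = zlib.compress(struct.pack('<4d', *samples))
print(decode_doubles(zlib.decompress(stream)))
```

If the decoded values look like noise or contain huge exponents, the assumed data type or byte order is probably wrong for that stream.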
