Python binary file merging: reading an entire binary file into Python

I need to read a binary file into Python. The contents are signed 16-bit integers, big endian.

The following Stack Overflow questions suggest how to pull in several bytes at a time, but is this the way to scale up to read in a whole file?

I thought to create a function like:

import os
import struct

import numpy as np

def readmyfile(filename, bytes=2, endian='>h'):
    # One value per `bytes`-sized record; // keeps the count an integer.
    totalBytes = os.path.getsize(filename)
    values = np.empty(totalBytes // bytes)
    with open(filename, 'rb') as f:
        for i in range(len(values)):
            # Unpack one big-endian signed 16-bit integer per read.
            values[i] = struct.unpack(endian, f.read(bytes))[0]
    return values

filecontents = readmyfile('filename')

But this is quite slow (the file is 165924350 bytes). Is there a better way?

Solution

I would read directly until EOF (that is, until read() returns an empty string, or an empty bytes object in Python 3), which removes the need for range() and getsize.
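A minimal sketch of that read-until-EOF loop follows. The function name read_until_eof and its fmt parameter are hypothetical, chosen only for illustration; it still reads one value per call, so it simplifies the code rather than making it fast.

```python
import struct

def read_until_eof(filename, fmt='>h'):
    """Read values one at a time until read() returns no more data."""
    size = struct.calcsize(fmt)  # 2 bytes for '>h'
    values = []
    with open(filename, 'rb') as f:
        while True:
            chunk = f.read(size)
            if len(chunk) < size:  # an empty (or truncated) read means EOF
                break
            values.append(struct.unpack(fmt, chunk)[0])
    return values
```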

Alternatively, using xrange instead of range should improve things, especially memory usage (this applies to Python 2; in Python 3, range is already lazy).

Moreover, as Falmarri suggested, reading more data per call would improve performance quite a lot.
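As a rough illustration of reading more data per call, the sketch below decodes many values from each read(). The name read_in_chunks and the default chunk size are assumptions made for this example, not something from the question.

```python
import struct

def read_in_chunks(filename, values_per_chunk=65536):
    """Decode big-endian signed 16-bit integers, many values per read() call."""
    values = []
    with open(filename, 'rb') as f:
        while True:
            chunk = f.read(2 * values_per_chunk)
            if not chunk:  # empty read means EOF
                break
            count = len(chunk) // 2  # ignore a trailing odd byte, if any
            # A format such as '>4h' unpacks four values in a single call.
            values.extend(struct.unpack('>%dh' % count, chunk[:2 * count]))
    return values
```

The repeat count in the format string is what removes the per-value call overhead: one unpack handles the whole chunk.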

That said, I would not expect miracles, partly because I am not sure a list is the most efficient way to store that much data.
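One compact alternative to a list, if you want to stay within the standard library, is the array module. The sketch below is only an illustration: the name read_as_array is hypothetical, and it assumes the platform's 'h' type code is 2 bytes wide and that the whole file fits in memory.

```python
import sys
from array import array

def read_as_array(filename):
    """Load big-endian signed 16-bit integers into a compact typed array."""
    values = array('h')  # 'h' is a signed short; assumed to be 2 bytes here
    with open(filename, 'rb') as f:
        values.frombytes(f.read())
    if sys.byteorder == 'little':
        values.byteswap()  # the file is big-endian; convert to native order
    return values
```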

What about using a NumPy array and NumPy's facilities for reading and writing binary files? The linked page has a section on reading raw binary files with numpyio.fread. I believe this is exactly what you need.
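For illustration, here is a short sketch that uses numpy.fromfile rather than numpyio.fread. This is a substitution on my part, since numpyio may not be available in current SciPy releases. The function name read_int16_be is hypothetical.

```python
import numpy as np

def read_int16_be(filename):
    """Load a whole file of big-endian signed 16-bit integers as an array."""
    # '>i2' means big-endian ('>'), signed integer ('i'), 2 bytes wide.
    return np.fromfile(filename, dtype='>i2')
```

The returned array keeps the big-endian dtype; calling .astype(np.int16) on it gives a copy in native byte order if later code needs that.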

Note: personally, I have never used NumPy; however, its main raison d'être is exactly the handling of large data sets, which is what your question involves.
