Python binary data processing: handling and working with binary (HEX) data in Python

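This post looks at processing binary files in Python and matching HEX patterns against them. The author's main difficulty is an ASCII encoding error that appears when comparing bytes against a predefined dictionary of patterns. The proposed solution iterates over slices of the binary data and checks whether a matching EOF marker is present.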

I'm trying to do a comparison of some byte values - source A comes from a file that is being 'read':

f = open(fname, "rb")
f_data = f.read()
f.close()

These files can be anything from a few KB to a few MB in size.

Source B is a dictionary of known patterns:

eof_markers = {
    'jpg': b'\xff\xd9',
    'pdf': b'\x25\x25\x45\x4f\x46',
}

(This list will be extended once the basic process works)

Essentially I'm trying to 'read' the file (source A) and then incrementally inspect it, starting from the last byte, for matches against the pattern list, using testString = f_data[-counter:]. If no match is found, it should increase counter by 1 and try to match against the list again.
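The procedure described above can be sketched roughly as follows. This is only a minimal sketch, assuming Python 3: the function name find_eof_marker and the max_tail cap are hypothetical, introduced for illustration, and the sample bytes stand in for f_data read from a real file. Because the window grows by one byte at the front on each step, a newly appearing marker can only sit at the start of the window, so the sketch tests with startswith rather than a full search.

```python
eof_markers = {
    'jpg': b'\xff\xd9',
    'pdf': b'\x25\x25\x45\x4f\x46',  # the ASCII bytes of "%%EOF"
}


def find_eof_marker(data, markers, max_tail=1024):
    """Grow a window from the end of data until it starts with a known marker.

    Returns (name, offset) for the first marker found, or None.
    max_tail is a hypothetical cap on how far back to look.
    """
    limit = min(len(data), max_tail)
    for counter in range(1, limit + 1):
        test_string = data[-counter:]  # the last `counter` bytes
        for name, marker in markers.items():
            # bytes compared with bytes: no encoding is involved
            if test_string.startswith(marker):
                return name, len(data) - counter
    return None


# Sample data standing in for f_data (an assumption for illustration).
sample = b'\x12\x43\xff\xd9\x00\x23'
print(find_eof_marker(sample, eof_markers))
```

The returned offset is the position of the marker counted from the start of the data, which may be handy if the trailing bytes after the marker need to be inspected.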

I've tried a number of different ways to get this working. I can get testString to increment correctly, but I keep running into encoding issues where the various approaches want to ASCII-ify the bytes in order to do the comparison.

I'm a bit lost, and not for the first time I'm wandering around the code changing int to u to b without getting past issues like d9 being a reserved value, which means I can't use the ASCII-type comparison tools, e.g. if format_type in testString: (this results in a UnicodeDecodeError: 'ascii' codec can't decode byte a9).
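One plausible cause of this error (an assumption, since the failing code isn't shown) is that a text string is being compared with raw bytes. In Python 2, mixing a unicode string with a byte str triggers an implicit ASCII decode, which fails on any byte above 0x7f. Python 3 refuses the mix outright with a TypeError. Keeping both sides of the comparison as bytes avoids the codec entirely. A small Python 3 sketch with made-up sample data:

```python
# Sample data standing in for the file contents (an assumption for illustration).
test_string = b'\x12\x43\xff\xd9\x00\x23'
marker = b'\xff\xd9'

# bytes-in-bytes is a plain byte comparison; no encoding or decoding happens.
bytes_match = marker in test_string

# Mixing text with bytes is rejected in Python 3 rather than silently decoded.
try:
    'jpg' in test_string
    mixed_error = None
except TypeError as exc:
    mixed_error = type(exc).__name__

print(bytes_match, mixed_error)
```

If that is what is happening, the thing to test against testString is the dictionary value (the bytes marker), not the dictionary key (the format name).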

I tried to convert everything to an integer, but that threw ValueError: invalid literal for int() with base 2: '.' or ValueError: invalid literal for int() with base 10: '.'. I also tried to convert testString to hex bytes, but kept getting TypeError: hex() argument can't be converted to hex (this is more my lack of understanding than anything else, I'm sure!).
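For reference, the built-in hex() accepts a single integer, so handing it a whole byte string fails. A hedged sketch of ways to get a hex view of bytes, assuming Python 3 and using the standard binascii module (the variable names and sample bytes are illustrative only):

```python
import binascii

test_string = b'\xff\xd9'  # sample bytes, assumed for illustration

# hexlify returns bytes such as b'ffd9'; decode to get ordinary text.
hex_text = binascii.hexlify(test_string).decode('ascii')

# Iterating over bytes in Python 3 yields integers, which hex() does accept.
per_byte = [hex(b) for b in test_string]

# bytes.fromhex turns the hex text back into the original bytes.
round_trip = bytes.fromhex(hex_text)

print(hex_text, per_byte, round_trip == test_string)
```

None of this conversion is needed just to compare a marker with the data, since bytes objects can be compared directly.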

There are a number of resources I've found that talk about encoding / hex comparisons (e.g. stackoverflow.com/questions/10561923/unicodedecodeerror-ascii-codec-cant-decode-byte-0xef-in-position-1), but I've not found anything that I can either fully understand or that points me down the right path.

I've been stuck on this for a while, so any pointers are gratefully received.

Solution

I'm not sure exactly what you're trying to do, but I ran this code in Python 3.2.3.

#f = open(fname, "rb")
#f_data = f.read()
#f.close()
f_data = b'\x12\x43\xff\xd9\x00\x23'

eof_markers = {
    'jpg': b'\xff\xd9',
    'pdf': b'\x25\x25\x45\x4f\x46',
}

# Check the last 4, 3, 2 and 1 bytes for each marker; '!' means "not found".
for counter in range(-4, 0):
    for name, marker in eof_markers.items():
        print(counter, ('' if marker in f_data[counter:] else '!') + name)

I'm using a hardcoded f_data, but you can undo that by uncommenting lines 1-3 and commenting out line 4.

Here's the output:

-4 !pdf
-4 jpg
-3 !pdf
-3 !jpg
-2 !pdf
-2 !jpg
-1 !pdf
-1 !jpg

Is there something this isn't doing that you need to do?
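If the goal is to find where a marker sits rather than to test fixed-size tails, one alternative (not part of the code above, just a sketch) is to let bytes.rfind locate the last occurrence of each marker directly. The helper name last_marker is hypothetical, and the sample bytes are the same hardcoded ones used above:

```python
eof_markers = {
    'jpg': b'\xff\xd9',
    'pdf': b'\x25\x25\x45\x4f\x46',
}


def last_marker(data, markers):
    """Return (name, offset) of the marker occurring closest to the end, or None."""
    best = None
    for name, marker in markers.items():
        offset = data.rfind(marker)  # -1 when the marker is absent
        if offset != -1 and (best is None or offset > best[1]):
            best = (name, offset)
    return best


f_data = b'\x12\x43\xff\xd9\x00\x23'
print(last_marker(f_data, eof_markers))
```

This searches the whole buffer once per marker, which should be fine for files of a few MB, though it has not been measured here.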
