Python binary data processing: using Python to process and work with binary data (HEX)


weixin_39794130


Tags: python, binary data processing

Article link: https://blog.csdn.net/weixin_39794130/article/details/111417325


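This post discusses processing binary files in Python and matching HEX patterns against them. The main problem the author ran into was ASCII encoding errors when trying to compare bytes against a predefined dictionary of patterns. The solution iterates over slices of the binary data and checks whether a matching EOF marker is present.

(Summary generated automatically by CSDN.)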

I'm trying to compare some byte values. Source A comes from a file that is being read:

f = open(fname, "rb")
f_data = f.read()
f.close()

These files can range from a few KB to a few MB in size.
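Since both example markers sit at the end of a file, there is no need to read a multi-MB file fully just to inspect its tail. A minimal sketch, assuming a 1024-byte window is long enough to contain the marker plus any trailing bytes (the helper name read_tail and the window size are hypothetical choices, not from the question):

```python
import os


def read_tail(fname, n=1024):
    """Return at most the last n bytes of a file (hypothetical helper)."""
    with open(fname, "rb") as f:
        f.seek(0, os.SEEK_END)       # jump to the end to learn the file size
        size = f.tell()
        f.seek(max(0, size - n))     # back up n bytes, but never before byte 0
        return f.read()              # "rb" mode always yields bytes, never str
```

The max(0, ...) guard matters for files shorter than the window, because seeking to a negative absolute position raises an error.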

Source B is a dictionary of known patterns:

eof_markers = {
    'jpg': b'\xff\xd9',
    'pdf': b'\x25\x25\x45\x4f\x46',
}

(This list will be extended once the basic process works)
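Both patterns are ordinary bytes objects, so they can be inspected without any text decoding. A small sketch showing that the PDF pattern is simply the ASCII text %%EOF written with \x escapes, and that 0xFFD9 is the JPEG end-of-image marker:

```python
eof_markers = {
    'jpg': b'\xff\xd9',                 # JPEG "end of image" marker
    'pdf': b'\x25\x25\x45\x4f\x46',     # the same bytes as the ASCII text "%%EOF"
}

# \x escapes and printable ASCII characters produce identical bytes objects.
assert eof_markers['pdf'] == b'%%EOF'

for name, marker in eof_markers.items():
    # bytes.hex() (Python 3.5+) shows the raw values as hex text, no codec involved.
    print(name, marker.hex(), len(marker))
```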

Essentially, I'm trying to read the file (source A) and then incrementally inspect its trailing bytes for matches against the pattern list, using testString = f_data[-counter:]. If no match is found, counter should be increased by 1 and the list of patterns tried again.
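The loop described above can be sketched while keeping every value as bytes. One detail worth noting: counter has to start at 1, because f_data[-0:] is the whole of f_data, not an empty tail. The function name find_eof_marker and the max_back cap are hypothetical:

```python
def find_eof_marker(f_data, eof_markers, max_back=None):
    """Widen a window from the end of f_data one byte at a time until a marker fits.

    Returns (name, counter) for the first match, or None if nothing matches.
    max_back is a hypothetical cap on how far back to look.
    """
    limit = len(f_data) if max_back is None else min(max_back, len(f_data))
    # counter starts at 1: f_data[-0:] would be all of f_data, not an empty tail.
    for counter in range(1, limit + 1):
        test_string = f_data[-counter:]          # still bytes, so no codec is involved
        for name, marker in eof_markers.items():
            if marker in test_string:            # bytes-in-bytes substring test
                return name, counter
    return None


markers = {'jpg': b'\xff\xd9', 'pdf': b'%%EOF'}
print(find_eof_marker(b'\x12\x43\xff\xd9\x00\x23', markers))   # ('jpg', 4)
```

Because the window grows by one byte on the left each step, a marker can only newly appear at the front of test_string, so test_string.startswith(marker) would find the same first match with less work than the `in` test.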

I've tried a number of different ways to get this working. I can get testString to increment correctly, but I keep running into encoding issues where the various approaches want to ASCII-ify the bytes in order to do the comparison.

I'm a bit lost, and not for the first time I'm wandering around the code changing int to u to b without getting past issues like d9 being a "reserved" value (0xd9 lies outside the 7-bit ASCII range), and therefore not being able to use ASCII-style comparison tools. For example, if format_type in testString: results in UnicodeDecodeError: 'ascii' codec can't decode byte a9.
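A short demonstration of why keeping both sides as bytes avoids the codec error. The traceback above most likely comes from a step that implicitly decodes bytes as ASCII (for example, mixing str and unicode under Python 2); this sketch assumes Python 3, where the same mistakes surface differently:

```python
data = b'\x12\x43\xff\xd9\x00\x23'
marker = b'\xff\xd9'

# bytes-in-bytes: a plain substring test on raw values, no text codec involved.
found = marker in data
print(found)

# Decoding bytes >= 0x80 (here 0xff and 0xd9) with the ASCII codec fails.
try:
    data.decode('ascii')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc
print('decode failed:', decode_error)

# Python 3 refuses to mix str and bytes instead of silently decoding.
try:
    'jpg' in data
    mix_error = None
except TypeError as exc:
    mix_error = exc
print('type mismatch:', mix_error)
```

The takeaway is that the comparison never needs ASCII at all: as long as the file is opened in "rb" mode and the patterns are b'...' literals, `in` works directly on the raw byte values.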

I tried to convert everything to an integer, but that threw ValueError: invalid literal for int() with base 2: '.' or ValueError: invalid literal for int() with base 10: '.'. I also tried to convert testString to hex bytes, but kept getting TypeError: hex() argument can't be converted to hex (this is more my lack of understanding than anything else, I'm sure!).
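Those errors arise because int() parses digit text (and '.' is not a digit), while hex() expects a single integer rather than a bytes object. A sketch of the standard conversions that do work on raw bytes in Python 3 (bytes.hex needs Python 3.5 or later):

```python
import binascii

tail = b'\xff\xd9'

# Indexing or iterating bytes in Python 3 already yields ints in the range 0-255.
print(tail[0], list(tail))                    # 255 [255, 217]

# Whole byte strings to hex text and back, with no int() parsing of characters.
print(tail.hex())                             # ffd9
print(binascii.hexlify(tail))                 # b'ffd9'
print(bytes.fromhex('ff d9') == tail)         # True

# Interpret the two bytes as one big-endian integer.
print(int.from_bytes(tail, 'big') == 0xffd9)  # True
```

None of these are needed for the marker comparison itself; they are mainly useful for printing or logging what was found.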

I've found a number of resources that discuss encoding/hex comparisons (e.g. stackoverflow.com/questions/10561923/unicodedecodeerror-ascii-codec-cant-decode-byte-0xef-in-position-1), but I haven't found anything that I can fully understand or that points me down the right path.

I've been stuck on this for a while, so any pointers are gratefully received.

Solution

I'm not sure exactly what you're trying to do, but I ran this code in Python 3.2.3.

#f = open(fname, "rb")
#f_data = f.read()
#f.close()
f_data = b'\x12\x43\xff\xd9\x00\x23'
eof_markers = {
    'jpg':b'\xff\xd9',
    'pdf':b'\x25\x25\x45\x4f\x46',
}
for counter in range(-4, 0):
    for name, marker in eof_markers.items():
        print(counter, ('' if marker in f_data[counter:] else '!') + name)

I'm using a hardcoded f_data, but you can undo that by uncommenting lines 1-3 and commenting out line 4.

Here's the output:

-4 !pdf
-4 jpg
-3 !pdf
-3 !jpg
-2 !pdf
-2 !jpg
-1 !pdf
-1 !jpg
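In this output, jpg matches only at counter -4, meaning the marker starts four bytes from the end. If that distance is all that is needed, bytes.rfind can report it directly instead of widening a slice step by step. This is a swapped-in technique rather than what the answer does, and last_marker_offsets is a hypothetical helper name:

```python
f_data = b'\x12\x43\xff\xd9\x00\x23'
eof_markers = {
    'jpg': b'\xff\xd9',
    'pdf': b'\x25\x25\x45\x4f\x46',
}


def last_marker_offsets(f_data, eof_markers):
    """Map each marker name to its distance from the end of f_data (None if absent)."""
    offsets = {}
    for name, marker in eof_markers.items():
        idx = f_data.rfind(marker)               # -1 when the marker is missing
        offsets[name] = None if idx == -1 else len(f_data) - idx
    return offsets


print(last_marker_offsets(f_data, eof_markers))   # {'jpg': 4, 'pdf': None}
```

On Python 3.7 and later, dicts preserve insertion order, so running the answer's loop there lists jpg before pdf for each counter; the pdf-first order shown above reflects Python 3.2's arbitrary dict ordering.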

Is there something this isn't doing that you need to do?
