There are two ways to determine whether a file is binary. The most accurate approach is to combine the two.
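(1) Method 1:
This method first checks whether the file starts with a BOM. If it does not, it looks for a zero byte within the first 8192 bytes: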
import codecs

file_path = "/home/ubuntu/zgd/ztest/_gs418_510txp_v6.6.2.7.stk.extracted/test"

#: BOMs to indicate that a file is a text file even if it contains zero bytes.
_TEXT_BOMS = (
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF8,
)


def is_binary_file(file_path):
    # Read only the first 8192 bytes; the with block closes the file on exit,
    # so no explicit close() is needed.
    with open(file_path, 'rb') as file:
        initial_bytes = file.read(8192)
    # Binary if the data does not start with a text BOM and contains a zero byte.
    return not any(initial_bytes.startswith(bom) for bom in _TEXT_BOMS) and b'\0' in initial_bytes


if __name__ == "__main__":
    # print() with a single argument works on both Python 2 and Python 3.
    print(is_binary_file(file_path))
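To make the rule easier to check without a file on disk, here is a minimal sketch that applies the same test to byte strings held in memory. The helper name looks_binary and the three sample inputs are assumptions made for illustration; they are not part of the original code.

```python
import codecs

# Same BOM list as in Method 1.
_TEXT_BOMS = (
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF8,
)


def looks_binary(initial_bytes):
    """Hypothetical helper: apply the Method 1 rule to bytes already in memory."""
    has_text_bom = any(initial_bytes.startswith(bom) for bom in _TEXT_BOMS)
    return not has_text_bom and b'\0' in initial_bytes


# Illustrative inputs (assumed, not taken from the original post).
utf16_text = codecs.BOM_UTF16_LE + u"hello".encode("utf-16-le")  # zero bytes, but BOM first
plain_ascii = b"hello world\n"                                   # no BOM, no zero byte
elf_header = b"\x7fELF\x02\x01\x01\x00"                          # no BOM, contains a zero byte

print(looks_binary(utf16_text))   # False
print(looks_binary(plain_ascii))  # False
print(looks_binary(elf_header))   # True
```

One limitation worth keeping in mind: UTF-16 or UTF-32 text saved without a BOM contains zero bytes and would be reported as binary by this rule, which is one reason to combine it with a second method.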
The is_binary_file() function above can also be written as follows:
def is_binary_file(file_path):