Detecting a File's Encoding When Reading It in Python

xiaofeihfh

Article link: https://blog.csdn.net/xiaofeihfh/article/details/121284131

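Every batch of files I read seems to come in a different encoding, which is a real headache.

To handle text files whose encoding varies from file to file, detect the encoding first and then read with it: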

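import os
import chardet

# file_path: path of the file to read
# Sample only the first bytes; 32 bytes is often too little for a reliable guess
sample_size = min(32, os.path.getsize(file_path))
with open(file_path, 'rb') as fb:
    raw = fb.read(sample_size)
result = chardet.detect(raw)
encoding = result['encoding']  # may be None if detection fails
with open(file_path, 'r', encoding=encoding) as f:
    f_content = f.readlines()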

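Some files' encodings are still not detected correctly this way, most likely because a 32-byte sample gives chardet too little data to work with.

Update: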
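For files where detection still fails, one possible fallback is to try a short list of likely encodings in order. The following is a minimal sketch using only the standard library; the function name `decode_with_fallback` and the candidate list are assumptions chosen for illustration (Chinese-language text), not part of the original post.

```python
def decode_with_fallback(data, candidates=("utf-8-sig", "gb18030")):
    """Try each candidate encoding in order; return (text, encoding_used).

    The candidate list is an assumption: strict UTF-8 first (it rejects most
    non-UTF-8 input), then GB18030, which is a superset of GBK and GB2312.
    """
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, but mark undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace"), "utf-8 (lossy)"


text, used = decode_with_fallback("你好".encode("gbk"))
print(used, text)
```

Order matters here: a permissive encoding placed first would accept almost any input and hide the right answer, so the strictest candidate goes first.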

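import chardet

# First open the file in binary mode
with open(absPath, 'rb') as frb:
    # Detect the encoding from the whole file's bytes
    cur_encoding = chardet.detect(frb.read())['encoding']
# Reopen in text mode with the detected encoding
with open(absPath, 'r', encoding=cur_encoding) as fr:
    Content = fr.read()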
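The snippet above can be wrapped in a small helper that also copes with a failed or low-confidence guess. This is only a sketch under stated assumptions: the name `read_text`, the `default` encoding, and the 0.5 confidence threshold are hypothetical choices, and the optional import simply lets the code fall back to the default when chardet is not installed.

```python
try:
    import chardet  # third-party package: pip install chardet
except ImportError:  # assumption: fall back to the default encoding without it
    chardet = None


def read_text(path, default="utf-8", min_confidence=0.5):
    """Read a file as text, using chardet's guess when it looks trustworthy."""
    with open(path, "rb") as fb:
        raw = fb.read()

    encoding = default
    if chardet is not None:
        guess = chardet.detect(raw)  # dict with 'encoding' and 'confidence'
        if guess["encoding"] and guess["confidence"] >= min_confidence:
            encoding = guess["encoding"]

    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Wrong guess or unknown codec name: decode lossily with the default.
        return raw.decode(default, errors="replace")
```

Reading the bytes once and decoding them in memory avoids opening the file twice, at the cost of holding the whole file in memory.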

Using the chardet package this way solves the encoding problem when reading files.

When reading Chinese-language files on Linux, specify the encoding explicitly when opening the file.
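As a rough illustration of that last point: when no `encoding` argument is given, `open()` in text mode uses a locale-dependent default, so the same script can behave differently from one Linux machine to another. The sketch below writes a GBK file and reads it back with the encoding stated; the helper name `read_with_encoding` and the choice of GBK are assumptions for the demo.

```python
import locale
import os
import tempfile


def read_with_encoding(path, encoding):
    """Read a text file, naming the encoding instead of relying on the locale."""
    with open(path, "r", encoding=encoding) as fr:
        return fr.read()


# What open() would use if no encoding were given; varies by system and locale.
print("locale default:", locale.getpreferredencoding(False))

# Demo: write a GBK-encoded file, then read it back with the encoding stated.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(demo_path, "w", encoding="gbk") as fw:
        fw.write("中文内容\n")
    demo_text = read_with_encoding(demo_path, "gbk")
    print(demo_text)
finally:
    os.remove(demo_path)
```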