UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 0: invalid continuation byte

小龙狗

Article link: https://blog.csdn.net/ShyLoneGirl/article/details/116207734

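When Python raises a UnicodeDecodeError while reading a .txt or .csv file, the usual cause is that the file's encoding does not match the encoding the program specified. Fixes include checking and changing the file's encoding, detecting the original encoding with the `chardet` library, or writing a function that converts files to another encoding. In every case, make sure the correct encoding is passed to open().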

Problem description

When reading text files such as .txt or .csv in Python, the following error occurs:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 0: invalid continuation byte

The code being executed is:

filename = 'cdtest.txt'
with open(filename, 'r', encoding='UTF-8') as fr:
    tempContent = fr.read()

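Cause analysis

The file contains bytes that the program cannot decode, most likely because the file's actual encoding differs from the encoding the program specified. For example, the file is encoded as GBK, but the encoding argument of open() is set to something else.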
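The mismatch can be reproduced without any file at all. The following is a minimal sketch, assuming the sample text "你好" (chosen here only for illustration): it encodes the text as GBK and then tries to decode those bytes as UTF-8, which fails at the first byte, just as in the error message above.

```python
# Reproduce the error in memory: encode as GBK, then decode as UTF-8.
text = "你好"                      # illustrative sample text (an assumption)
raw = text.encode("gbk")           # the bytes a GBK-encoded file would contain
print(raw)                         # the first byte is 0xc4

try:
    raw.decode("utf-8")            # wrong codec for these bytes
    error = None
except UnicodeDecodeError as exc:
    error = exc                    # keep a reference; `exc` is cleared after the block
    print(exc)

# Decoding with the encoding the bytes were written in succeeds.
decoded = raw.decode("gbk")
print(decoded)
```

UTF-8 treats 0xc4 as the first byte of a two-byte sequence and expects the next byte to be in the range 0x80 to 0xBF. In GBK text the second byte often falls outside that range, which is why the decoder reports an invalid continuation byte.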

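Solutions

Change the file's encoding directly

The most direct approach is to open the document in Notepad and save it (or use Save As to create a new file), setting the encoding to UTF-8 in the save dialog. UTF-8 is one of the most commonly used encodings.

Read the file using its own encoding

Modify the code as follows, here changing the encoding to GBK. This only works if you already know which encoding the file uses.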

filename = 'cdtest.txt'
with open(filename, 'r', encoding='GBK') as fr:
    tempContent = fr.read()
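When the encoding is not known for certain but there are only a few plausible candidates, one option is to try them in order. This is a rough sketch rather than a robust solution; the helper name `read_text_with_fallback` and the default candidate list are assumptions made for this example, not part of the original code.

```python
import os
import tempfile

def read_text_with_fallback(path, encodings=("utf-8", "gbk")):
    """Return (text, encoding) for the first candidate that decodes without error."""
    last_error = None
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as fr:
                return fr.read(), enc
        except UnicodeDecodeError as exc:
            last_error = exc       # remember the failure and try the next candidate
    if last_error is None:
        raise ValueError("no candidate encodings were given")
    raise last_error

# Demo: write a GBK-encoded temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "cdtest.txt")
    with open(sample_path, "w", encoding="gbk") as fw:
        fw.write("你好")
    content, used = read_text_with_fallback(sample_path)

print(used, content)
```

The order of candidates matters. UTF-8 is strict and rejects most non-UTF-8 input, so it goes first; GBK accepts many byte sequences, so a clean GBK decode does not prove the file was really GBK.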

How do you find out a file's encoding?

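import chardet  # third-party package: pip install chardet

def GetEncodingSheme(_filename):
    # Read raw bytes; detection has to run before any decoding.
    with open(_filename, 'rb') as file:
        buf = file.read()
    # chardet.detect() returns a dict; 'encoding' is None if detection fails.
    result = chardet.detect(buf)
    return result['encoding']

if __name__ == '__main__':
    filename = 'cdtestu.txt'
    print(GetEncodingSheme(filename))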
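Statistical detection is a guess, but some files state their encoding outright through a byte order mark (BOM) at the start. As a complement to `chardet`, and as a different technique from the one used above, the sketch below checks for a BOM using only the standard library. The helper name `sniff_bom` is an assumption made for this example.

```python
import codecs

# Check longer BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)

def sniff_bom(data):
    """Return a codec name if `data` starts with a known BOM, otherwise None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None

with_bom = "你好".encode("utf-8-sig")   # UTF-8 with a leading BOM
no_bom = "你好".encode("gbk")           # GBK has no BOM

print(sniff_bom(with_bom))
print(sniff_bom(no_bom))
```

For a file, passing the first four bytes read in 'rb' mode is enough. A result of None only means there is no BOM; GBK files and most UTF-8 files carry none, so detection or prior knowledge is still needed in that case.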

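Change a file's encoding programmatically

First determine the file's current encoding (see the detection function above), then re-encode the text and save it as a new file.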

def ChangeEncoding(_infilename, _outfilname, _encodingsheme='UTF-8'):
    # Detect the source encoding with GetEncodingSheme() defined above.
    ifEncodeSheme = GetEncodingSheme(_infilename)
    # Decode the input using the detected encoding.
    with open(_infilename, 'r', encoding=ifEncodeSheme) as fr:
        tempContent = fr.read()
    # Re-encode the text and write it to the output file.
    with open(_outfilname, 'w', encoding=_encodingsheme) as fw:
        fw.write(tempContent)

if __name__ == '__main__':
    ChangeEncoding('ascii.txt', 'ascii2.txt', 'GB2312')
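The same idea extends to converting several files at once. The following is a minimal sketch, assuming every file in the source directory shares one known encoding, so no detection step is involved. The function name `convert_directory`, its parameters, and the sample file names are all hypothetical.

```python
import tempfile
from pathlib import Path

def convert_directory(src_dir, dst_dir, src_encoding,
                      dst_encoding="utf-8", pattern="*.txt"):
    """Re-encode every file matching `pattern` from src_dir into dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(src_dir.glob(pattern)):
        text = src.read_text(encoding=src_encoding)        # decode the original
        (dst_dir / src.name).write_text(text, encoding=dst_encoding)
        converted.append(src.name)
    return converted

# Demo: create two GBK files in a temporary directory and convert them.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "gbk"
    dst = Path(tmp) / "utf8"
    src.mkdir()
    (src / "a.txt").write_text("你好", encoding="gbk")
    (src / "b.txt").write_text("世界", encoding="gbk")
    names = convert_directory(src, dst, "gbk")
    result = (dst / "a.txt").read_bytes()

print(names)
```

Writing to a separate output directory leaves the originals untouched, which makes it easy to retry if the assumed source encoding turns out to be wrong.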