Python error troubleshooting notes

ronvicki

Column: steps  Tags: python

Article link: https://blog.csdn.net/ronvicki/article/details/80991217

【2018-7-10】

UnicodeDecodeError

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte

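Counting how often characters appear in Romance of the Three Kingdoms (from Song Tian's textbook 《Python语言程序设计基础》, "Fundamentals of Python Programming"). The file threekingdoms.txt holds the full text of the novel.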

#CalThreeKingdomsV1.py
import jieba
txt = open("threekingdoms.txt", "r", encoding='utf-8').read()
words = jieba.lcut(txt)                  # segment the text into a list of words
counts = {}                              # word -> number of occurrences
for word in words:
    if len(word) == 1:                   # skip single characters (particles, punctuation)
        continue
    else:
        counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)   # most frequent first
for i in range(15):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
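The tally above (dict.get plus a manual sort) can also be sketched with collections.Counter. To keep the sketch runnable without jieba or the novel's text, the token list is a made-up stand-in for jieba.lcut(txt); it is an illustration, not the textbook's code.

```python
from collections import Counter

# Hypothetical, already-segmented tokens standing in for jieba.lcut(txt).
words = ["曹操", "曰", "孔明", "曹操", "将军", "孔明", "曹操", "之"]

# Same rule as the original loop: ignore single-character tokens.
counts = Counter(word for word in words if len(word) > 1)

# most_common(n) replaces the manual sort and does not fail if fewer than n words exist.
for word, count in counts.most_common(15):
    print("{0:<10}{1:>5}".format(word, count))
```

Unlike `items[i]` inside `range(15)`, `most_common(15)` simply returns fewer entries when the text has fewer than 15 distinct words.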

Pressing F5 to run the program produced this error:

Traceback (most recent call last):
  File "C:\Users\Vicki\Desktop\MOOC\python\WEEK6\CalThreeKingdomsV1.py", line 4, in <module>
    txt = open("threekingdoms.txt", "r", encoding='utf-8').read()
  File "E:\Python36\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte

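[Cause] An encoding mismatch. When the full text of Romance of the Three Kingdoms was pasted into threekingdoms.txt, the editor saved it with its default "ANSI" encoding (GBK on Simplified-Chinese Windows), but the program opens the file with encoding='utf-8'. The byte 0xC8 at position 0 starts a two-byte UTF-8 sequence, and the byte after it is not a valid continuation byte, so decoding fails immediately.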
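To see the mismatch in isolation, the sketch below encodes a short sample string as GBK (what an "ANSI" save produces on Simplified-Chinese Windows) and decodes it with both codecs. The sample text and the try_decode helper are illustrative assumptions; the exact failing byte in a real file depends on its first character.

```python
# Sample text (an assumption: any Chinese text behaves the same way).
text = "三国演义"

# Encode it the way an "ANSI" save does on Simplified-Chinese Windows (GBK).
gbk_bytes = text.encode("gbk")


def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, returning a short error description instead of raising."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return f"UnicodeDecodeError: {err.reason} at position {err.start}"


print(gbk_bytes.hex())                 # the raw bytes as stored on disk
print(try_decode(gbk_bytes, "utf-8"))  # wrong codec: fails like the traceback
print(try_decode(gbk_bytes, "gbk"))    # matching codec: the original text comes back
```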

[Fix] Use "Save As" to save threekingdoms.txt again with the encoding set to "UTF-8", then press F5 to rerun the program.
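Either fix can also be scripted. The sketch below assumes the file is currently GBK; convert_to_utf8 is a hypothetical helper, and a temporary file stands in for the real threekingdoms.txt. Option A keeps the file and declares its real encoding in open(); option B converts the file once so the script's encoding='utf-8' works unchanged.

```python
import tempfile
from pathlib import Path

SAMPLE = "滚滚长江东逝水"  # sample text (assumption), not read from the real file


def convert_to_utf8(path: Path, src_encoding: str = "gbk") -> None:
    """Re-save a text file as UTF-8 (hypothetical helper; assumes it is GBK now)."""
    text = path.read_text(encoding=src_encoding)  # raises if the guess is wrong
    path.write_text(text, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "threekingdoms.txt"
    path.write_bytes(SAMPLE.encode("gbk"))  # simulate an "ANSI" save

    # Option A: leave the file alone and tell open() its real encoding.
    with open(path, "r", encoding="gbk") as f:
        txt_gbk = f.read()

    # Option B: convert once; afterwards encoding='utf-8' (as in the script) works.
    convert_to_utf8(path)
    with open(path, "r", encoding="utf-8") as f:
        txt_utf8 = f.read()

print(txt_gbk == txt_utf8 == SAMPLE)
```

Older versions of Windows Notepad write a byte-order mark when saving as "UTF-8". With encoding='utf-8' that mark stays in the text as '\ufeff'; encoding='utf-8-sig' strips it.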

[Output]:

== RESTART: C:\Users\Vicki\Desktop\MOOC\python\WEEK6\CalThreeKingdomsV1.py ==
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\Vicki\AppData\Local\Temp\jieba.cache
Loading model cost 1.498 seconds.
Prefix dict has been built succesfully.
曹操          953
孔明          836
将军          772
却说          656
玄德          585
关公          510
丞相          491
二人          469
不可          440
荆州          425
玄德曰         390
孔明曰         390
不能          384
如此          378
张飞          358
>>>

SyntaxError: Non-UTF-8 code starting with...

SyntaxError: Non-UTF-8 code starting with '\xbd' in file

Improved version of the character count (from the same textbook by Song Tian). Again, threekingdoms.txt holds the full text of the novel.

#CalThreeKingdomsV2.py
import jieba
excludes = {"将军","却说","荆州","二人","不可","不能","如此"}  # frequent words that are not names
txt = open("threekingdoms.txt", "r", encoding='utf-8').read()
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:                       # skip single characters
        continue
    elif word == "诸葛亮" or word == "孔明曰":  # merge aliases into one canonical name
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in excludes:
    counts.pop(word, None)                   # drop non-names; no KeyError if one never appeared
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(10):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
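The if/elif chain in CalThreeKingdomsV2.py maps several names for one person onto a single key. A lookup table expresses the same mapping and is easier to extend. The sketch below uses the same aliases and exclusions with a made-up token list in place of jieba.lcut(txt); ALIASES, EXCLUDES and count_characters are illustrative names, not from the textbook.

```python
from collections import Counter

# Alias table copied from the if/elif chain: variant -> canonical name.
ALIASES = {
    "诸葛亮": "孔明", "孔明曰": "孔明",
    "关公": "关羽", "云长": "关羽",
    "玄德": "刘备", "玄德曰": "刘备",
    "孟德": "曹操", "丞相": "曹操",
}
EXCLUDES = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此"}


def count_characters(words):
    """Count multi-character tokens, merging aliases and dropping excluded words."""
    counts = Counter(ALIASES.get(w, w) for w in words if len(w) > 1)
    for w in EXCLUDES:
        counts.pop(w, None)  # tolerate excluded words that never occur
    return counts


# Hypothetical segmented tokens standing in for jieba.lcut(txt).
sample = ["玄德曰", "云长", "玄德", "将军", "关公", "曹操", "丞相", "也"]
result = count_characters(sample)
print(result.most_common(3))
```

Adding a new alias then means adding one dictionary entry rather than another elif branch.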

Pressing F5 to run the program produced this error:

 File "CalThreeKingdomsV2.py", line 3

SyntaxError: Non-UTF-8 code starting with '\xbd' in file CalThreeKingdomsV2.py on line 3, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

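[Cause] The .py file itself was saved as GBK ("ANSI"), and line 3 is the first line that contains Chinese characters. Python 3 reads source files as UTF-8 unless an encoding is declared, so the GBK bytes on that line (starting with 0xBD) are rejected before the program runs.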

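[Fix] Declare the file's encoding at the very top of the .py file: # coding=gbk (PEP 263 requires the declaration on line 1 or 2). Alternatively, re-save the .py file as UTF-8, which needs no declaration.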
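The sketch below reproduces both the error and the fix with a throwaway script: it writes the same source as GBK without a declaration, as GBK with # coding=gbk on line 1, and as UTF-8, then runs each one with the current interpreter. The demo file contents and the run_script helper are made up for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# A cut-down, hypothetical stand-in for CalThreeKingdomsV2.py: a line with
# Chinese characters and no encoding declaration.
BODY = 'excludes = {"将军", "却说"}\nprint(len(excludes))\n'


def run_script(source: bytes) -> subprocess.CompletedProcess:
    """Write raw bytes to a temporary .py file and run it with this interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo.py"
        path.write_bytes(source)
        return subprocess.run(
            [sys.executable, str(path)],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )


# 1) Saved as GBK ("ANSI"), no declaration: Python 3 assumes UTF-8 -> SyntaxError.
no_cookie = run_script(BODY.encode("gbk"))

# 2) Same GBK bytes plus "# coding=gbk" on line 1 (PEP 263): compiles and runs.
with_cookie = run_script(("# coding=gbk\n" + BODY).encode("gbk"))

# 3) Re-saved as UTF-8: no declaration needed.
as_utf8 = run_script(BODY.encode("utf-8"))

for label, result in [("gbk, no cookie", no_cookie),
                      ("gbk + cookie", with_cookie),
                      ("utf-8", as_utf8)]:
    print(label, "-> exit code", result.returncode)
```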

[Output]:

== RESTART: C:\Users\Vicki\Desktop\MOOC\python\WEEK6\CalThreeKingdomsV2.py ==
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\Vicki\AppData\Local\Temp\jieba.cache
Loading model cost 1.011 seconds.
Prefix dict has been built succesfully.
曹操         1451
孔明         1383
刘备         1252
关羽          784
张飞          358
商议          344
如何          338
主公          331
军士          317
吕布          300
>>>