Learning Python: the built-in function open

jadechenyt

Article link: https://blog.csdn.net/jadechenyt/article/details/103122080

Task: count how many times each word appears in a text file.

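First attempt:

import re

with open("test.txt",'r') as text:  # no encoding given, so the platform default is used
    words = text.read().split()
    for word in words:
        if words.count(word)>1:  # only report words that occur more than once
            print('{},{}times'.format(word,words.count(word)))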

Running it fails with an encoding error:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-19-d8c8e28c5803> in <module>
      2 
      3 with open("test.txt",'r') as text:
----> 4     words = text.read().split()
      5     for word in words:
      6         if words.count(word)>1:

UnicodeDecodeError: 'gbk' codec can't decode byte 0x9d in position 220: illegal multibyte sequence
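
The traceback names 'gbk' because open() was called without an encoding argument, so Python fell back to the platform's default text encoding. The sketch below reproduces that kind of failure and shows one possible fix. It assumes the file is actually UTF-8, which the post does not state; SAMPLE, try_read and demo are hypothetical names used only for illustration.

```python
import os
import tempfile

# Hypothetical sample text. The curly quotes are non-ASCII; the closing
# quote is encoded in UTF-8 as the bytes 0xE2 0x80 0x9D.
SAMPLE = "a virtual machine is “virtual” in the sense that a user sees a machine"


def try_read(path, encoding):
    """Return the file's words, or None if its bytes are not valid in `encoding`."""
    try:
        with open(path, "r", encoding=encoding) as text:
            return text.read().split()
    except UnicodeDecodeError:
        return None


def demo():
    """Write SAMPLE as UTF-8, then read it back as GBK and as UTF-8."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(SAMPLE)
        return try_read(path, "gbk"), try_read(path, "utf-8")
    finally:
        os.remove(path)


if __name__ == "__main__":
    gbk_words, utf8_words = demo()
    print("read as gbk:  ", gbk_words)
    print("read as utf-8:", utf8_words)
```

If the real file uses a different encoding, pass that name instead; the point is only that the encoding is stated explicitly rather than left to the platform.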

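Second attempt: open the file in 'rb' mode, which reads it as raw bytes and skips decoding altogether.

with open("test.txt",'rb') as text:  # binary mode: read() returns bytes, not str
    words = text.read().split()
    for word in words:
        if words.count(word)>1:
            print('{},{}times'.format(word,words.count(word)))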

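This runs without error, but every word in the output carries a b prefix. The prefix marks a bytes literal: because the file was read in binary mode, each word is a bytes object rather than a str. The words themselves are not wrong; to print them as ordinary text, the bytes have to be decoded.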

b'a',4times
b'virtual',6times
b'in',6times
b'the',6times
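
To illustrate where the b prefix comes from, here is a small sketch that formats the same word before and after decoding. The names raw, decoded and show are hypothetical, and UTF-8 is used only as an example codec (ASCII-only words such as these decode to the same text under UTF-8 and GBK).

```python
def show(word):
    """Format a value the same way the print call in the post does."""
    return '{}'.format(word)


# Splitting bytes yields a list of bytes objects, not strings.
raw_words = b"a virtual a".split()

raw = raw_words[1]             # a bytes object
decoded = raw.decode("utf-8")  # a str object

if __name__ == "__main__":
    print(show(raw))      # the bytes repr, including the b prefix
    print(show(decoded))  # plain text
```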

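Third attempt: decode each word as it is printed. Note that cp936 is Python's alias for GBK, the same codec that failed in the first attempt; decoding word by word only succeeds here because the words that actually get printed contain bytes that are valid in GBK.

with open("test.txt",'rb') as text:
    words = text.read().split()
    for word in words:
        if words.count(word)>1:
            # decode the bytes to str before formatting
            print('{}-{}times'.format(word.decode("cp936"),words.count(word)))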

Result:

a-4times
virtual-6times
in-6times
the-6times
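
The loop used in all three attempts prints one line for every occurrence of a repeated word and rescans the whole list on each iteration. A possible alternative, sketched below, decodes the file once when opening it and counts with collections.Counter, so each word is reported once. The default encoding of "utf-8" is an assumption about the file, and count_words and report are hypothetical helper names.

```python
import os
import tempfile
from collections import Counter


def count_words(path, encoding="utf-8"):
    """Count whitespace-separated words in a text file.

    `encoding` is an assumption about the file; change it to match the real data.
    """
    with open(path, "r", encoding=encoding) as text:
        return Counter(text.read().split())


def report(counts, min_count=2):
    """Return one formatted line per word that occurs at least `min_count` times."""
    return ['{}-{}times'.format(word, n)
            for word, n in counts.items() if n >= min_count]


if __name__ == "__main__":
    # Hypothetical sample file standing in for test.txt.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write("the cat and the dog and the bird")
        for line in report(count_words(path)):
            print(line)
    finally:
        os.remove(path)
```

Setting min_count=1 lists every word, which is closer to the task as stated; the default of 2 mirrors the count > 1 filter in the original code.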
