**
Using Python to count how often each word appears in CET-4 exam papers
**
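This idea came up in class today. Doing it on a Hadoop big-data cluster felt like too much trouble, so I wrote a small program of my own instead.
1. Open PyCharm, click the project panel on the left, and create a new Python file named wordcount
As shown in the figure
2. Copy the code below into the wordcount file, as shown in the figure below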
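import io
import sys

# Force UTF-8 on standard output so any printed text displays correctly
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')

# Optional list of words to skip (disabled by default)
# exclude=["the","as","to","and","is","are","in","with","our","by","this","a","some","or","you","my","of","one","C)","D)","A)","B)","-"," "]

# Read the exam text. split() with no argument splits on any whitespace,
# including newlines, and never produces empty strings.
# If a.txt contains bytes that are not valid UTF-8, add errors='ignore' here.
with open("a.txt", "r", encoding='utf-8') as fi:
    txt = fi.read().split()

# Count how many times each word occurs
d = {}
for word in txt:
    # if word in exclude:
    #     continue
    d[word] = d.get(word, 0) + 1

# Sort by count, most frequent first
ls = list(d.items())
ls.sort(key=lambda x: x[1], reverse=True)

# Write one "word:count" line per word; the with block closes the file
with open("b.txt", "w", encoding='utf-8') as fo:
    for word, count in ls:
        fo.write("{}:{}\n".format(word, count))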
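3. In the same directory as the wordcount file, create two files, a.txt and b.txt, as shown in the figure
4. Open a.txt and paste the CET-4 exam text into it. I put in ten sets of past papers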
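5. Open the wordcount file and press F5 to run the program (if F5 does not run it in your PyCharm keymap, use Run from the right-click menu, or Shift+F10 in the default Windows/Linux keymap)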
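PS: While writing the code I ran into a character-encoding problem. After looking it up online, it turned out I had not set up UTF-8 correctly
6. The count for each word is saved in b.txt. Open it to see the results
Haha, all done!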
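
One thing to watch for in the results: the script counts "The" and "the" as two different words, and punctuation stays glued to the word next to it ("exam." versus "exam"). Below is a minimal sketch of one way to tidy that up, using `collections.Counter` and a regular expression from the standard library. The function names `count_words` and `write_counts` are my own for illustration and are not part of the original script. The pattern only keeps plain ASCII letters and straight apostrophes, which is an assumption that may not suit every exam text.

```python
import re
from collections import Counter


def count_words(text, exclude=()):
    """Count case-insensitive word frequencies, skipping words in `exclude`."""
    # Lowercase first, then keep runs of letters; a straight apostrophe is
    # allowed inside a word so that "exam's" stays in one piece.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    skip = set(exclude)
    return Counter(w for w in words if w not in skip)


def write_counts(counts, path):
    """Write one `word:count` line per word, most frequent first."""
    with open(path, "w", encoding="utf-8") as fo:
        for word, n in counts.most_common():
            fo.write("{}:{}\n".format(word, n))


if __name__ == "__main__":
    # Small in-memory sample; for real use, read a.txt with encoding='utf-8'
    sample = "The exam is hard. The exam's questions: A) the first, B) the second."
    print(count_words(sample, exclude=["a", "b"]).most_common(3))
```

To use it on the real data, read a.txt with `open("a.txt", encoding="utf-8")`, pass the text to `count_words`, and hand the result to `write_counts(counts, "b.txt")`. Passing option letters such as "a", "b", "c", "d" in `exclude` drops the answer labels, but note that it also drops the article "a".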