Python novel word-frequency statistics: counting word frequencies in an English novel

Latest recommended article published on 2023-05-16 23:54:22

weixin_39862669


Tags: python, novel word-frequency statistics

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# 1. Import modules
import string  # string constants; string.punctuation is used below
import matplotlib.pyplot as plt  # plotting library, imported under the alias plt
# from matplotlib import pyplot as plt  # equivalent form of the same import

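# 2. Read the file and split it into words
hist = {}  # empty dict mapping each word to its count (unordered)
data = []  # empty list of [count, word] pairs, later sorted from most to least frequent
with open('The Myths(神话).txt', 'r') as f:  # open the file; it is closed automatically
    content = f.read()  # read the whole file into one string
content = content.replace('-', ' ')  # replace hyphens with spaces
words = content.split()  # split on whitespace to get the individual words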

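# Iterate over the words: clean each one and count its occurrences
for i in range(len(words)):
    words[i] = words[i].strip(string.punctuation)  # strip leading and trailing punctuation
    words[i] = words[i].lower()  # normalize to lowercase
    if not words[i]:  # skip tokens that consisted only of punctuation
        continue
    if words[i] in hist:  # the word has been seen before
        hist[words[i]] = hist[words[i]] + 1
    else:  # first occurrence of the word
        hist[words[i]] = 1
# print(hist)  # print the dict (word -> count, unordered)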

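# Convert the dict into a list and sort it
for key, value in hist.items():  # iterate over the dict
    temp = [value, key]  # [count, word]; the count comes first so sorting orders by frequency
    data.append(temp)  # add the pair to the list
data.sort(reverse=True)  # sort in descending order of count
# print(data)  # print the list (sorted from most to least frequent)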

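# Draw a bar chart (words ranked 1-10)
plt.rcParams['font.sans-serif'] = ['SimHei']  # use the SimHei font so the Chinese characters in the title display correctly
for i in range(0, 10):
    plt.bar((data[i][1],), (data[i][0],), label=data[i][1])  # one labeled bar per word
plt.title('Word frequencies in the novel "The Myths(神话)" (Top 1-10)')  # chart title
plt.xlabel('Word')  # x-axis label
plt.ylabel('Frequency')  # y-axis label
plt.legend(title='Word-frequency bar chart')  # legend with one entry per bar
plt.show()  # display the figure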

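# Draw a bar chart (words ranked 11-20)
for i in range(10, 20):
    plt.bar((data[i][1],), (data[i][0],), label=data[i][1])  # one labeled bar per word
plt.legend(title='Bar chart')  # legend with one entry per bar
plt.xlabel('Word')  # x-axis label
plt.ylabel('Frequency')  # y-axis label
plt.title('Word frequencies in the novel "The Myths(神话)" (Top 11-20)')  # chart title
plt.show()  # display the figure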
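
The counting and ranking steps of the script can also be written with `collections.Counter` from the standard library. The following is only a minimal sketch of that alternative, not the original author's method: the function names `count_words` and `ranked_slice` and the sample sentence are made up for illustration. It applies the same cleaning steps as the script (hyphens to spaces, split on whitespace, strip punctuation, lowercase) and skips tokens that are empty after stripping.

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercase word to its number of occurrences."""
    counts = Counter()
    # Same cleaning steps as the script: hyphens become spaces, then split on whitespace.
    for token in text.replace('-', ' ').split():
        word = token.strip(string.punctuation).lower()
        if word:  # skip tokens that consisted only of punctuation
            counts[word] += 1
    return counts


def ranked_slice(counts, start, stop):
    """Return the (word, count) pairs at positions start..stop-1 of the ranking."""
    # most_common() returns all pairs ordered from most to least frequent.
    return counts.most_common()[start:stop]


# Hypothetical sample text standing in for the contents of the novel.
sample = "The gods spoke. The hero listened -- the hero, the well-known hero."
counts = count_words(sample)
print(ranked_slice(counts, 0, 2))  # [('the', 4), ('hero', 3)]
```

With the novel's text loaded into a string, `ranked_slice(counts, 0, 10)` and `ranked_slice(counts, 10, 20)` would give the two groups of words that the script plots. One difference to be aware of: the script's `data.sort(reverse=True)` breaks ties between equally frequent words by reverse alphabetical order, while `most_common()` keeps tied words in the order they were first seen.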
