Word Frequency Counting

Y_Jireh

Article link: https://blog.csdn.net/Y_Jireh/article/details/101173891

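This post shows how to count word frequencies in Python. It notes how punctuation and letter case distort the counts and proposes a fix: preprocess the text by stripping punctuation and lowercasing every word, then build a word-frequency dictionary to get meaningful counts.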

Counting repeated words

Text source: https://pan.baidu.com/s/1o75GKZ4

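path = r'C:\Users\Administrator\PycharmProjects\untitled1\Walden.txt'
with open(path, 'r', encoding='utf8') as text:
    # Split the whole file on whitespace to get the raw tokens.
    words = text.read().split()
    print(words)
    # Print every token with its count; a repeated token is printed each time it occurs.
    for word in words:
        print('{}-{} times'.format(word, words.count(word)))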

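Observations:

Some words with punctuation attached were counted as separate words;
Some words had their count printed more than once;
Because Python string comparison is case-sensitive, words capitalized at the start of a sentence were counted separately.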
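
The three issues can be reproduced without the full text. The following is a minimal sketch that applies the same naive split-and-count approach to a short made-up sentence; the `sample` string and the `raw_counts` name are assumptions for illustration and do not come from Walden.txt.

```python
# Hypothetical sample text standing in for the contents of Walden.txt.
sample = "The pond was quiet. By the pond, he said, the water was deep."

# Naive approach: split on whitespace and count the raw tokens.
raw_words = sample.split()
raw_counts = [(word, raw_words.count(word)) for word in raw_words]

# 1. Punctuation sticks to tokens, so "pond" and "pond," are counted separately.
# 2. A token that occurs twice is also reported twice ("the" and "was" here).
# 3. Case matters, so "The" and "the" are treated as different words.
for word, count in raw_counts:
    print('{}-{} times'.format(word, count))
```

Running this prints one line per token, which makes the duplicated lines and the split counts easy to spot.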

Now let's adjust the counting method to address these points by preprocessing the words:

import string

path = r'C:\Users\Administrator\PycharmProjects\untitled1\Walden.txt'
with open(path, 'r', encoding='utf8') as text:
    # Strip leading/trailing punctuation from each token and lowercase it.
    words = [raw_word.strip(string.punctuation).lower() for raw_word in text.read().split()]
    # Unique words, so that each word is reported only once.
    words_index = set(words)
    counts_dict = {index:
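
The listing above breaks off partway through the dictionary comprehension. The following is a minimal sketch of how the remaining step might look, assuming `counts_dict` is meant to map each unique word to its number of occurrences. The `sample` string, the completed comprehension, and the sorted output are all assumptions for illustration, not the original author's code.

```python
import string

# Hypothetical sample text standing in for the contents of Walden.txt.
sample = "The pond was quiet. By the pond, he said, the water was deep."

# Same preprocessing as the listing: strip surrounding punctuation, then lowercase.
words = [raw_word.strip(string.punctuation).lower() for raw_word in sample.split()]

# Unique words, so each one is reported only once.
words_index = set(words)

# Assumed completion: map each unique word to how often it occurs.
counts_dict = {index: words.count(index) for index in words_index}

# Print the most frequent words first (this ordering is an assumption).
for word in sorted(counts_dict, key=lambda w: counts_dict[w], reverse=True):
    print('{}-{} times'.format(word, counts_dict[word]))
```

With the preprocessing in place, "The" and "the" are merged, "pond," is counted together with "pond", and each word appears once in the output. For larger texts, `collections.Counter(words)` from the standard library builds the same mapping in a single pass instead of calling `words.count` once per unique word.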
