Python Word Frequency Count

Latest recommended article published on 2024-09-06 10:00:00

RNGXiaohua

Category column: Python小程序 (Python mini-programs)

Article link: https://blog.csdn.net/qq_38768811/article/details/99975511

# Word frequency count: convert every word to lowercase and strip the punctuation attached to some words

import string

with open("D:/test.txt", 'r', encoding='utf-8') as text:
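    # Store every word in a list
    words = [word.strip(string.punctuation).lower() for word in text.read().split()]
    # Drop tokens that are empty after stripping (for example a lone "--")
    words = [word for word in words if word]
    # set() turns the list into a set, so each distinct word appears only once
    words_index = set(words)
    # Use a dict to map each word to the number of times it occurs
    count_dict = {index: words.count(index) for index in words_index}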
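# Write the result to a file, most frequent words first.
# Mode "a" appends, so running the script again adds to the existing D:/result.txt.
# The with block closes the file so that all output is flushed to disk.
with open("D:/result.txt", "a", encoding='utf-8') as out_file:
    for word in sorted(count_dict, key=lambda x: count_dict[x], reverse=True):
        print("%-20s" % word, count_dict[word], file=out_file)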
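The script above calls words.count() once for every distinct word, so it rescans the whole list each time. As a possible alternative, here is a minimal sketch that does the same counting in one pass with collections.Counter from the standard library. The function name count_words and the sample sentence are made up for illustration and are not part of the original script.

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased, punctuation-stripped word to its count.

    count_words is a hypothetical helper name used only in this sketch.
    """
    tokens = (token.strip(string.punctuation).lower() for token in text.split())
    # Skip tokens that consisted of nothing but punctuation.
    return Counter(token for token in tokens if token)


# Illustrative input; the original script reads D:/test.txt instead.
sample = "The cat saw the dog. The dog ran!"
counts = count_words(sample)

# most_common() yields (word, count) pairs sorted from most to least frequent.
for word, count in counts.most_common():
    print("%-20s" % word, count)
```

To write to a file as the original does, pass file=out_file to print() inside a with open(...) block. Note that string.punctuation only covers ASCII punctuation, so full-width marks such as "，" or "。" would stay attached to the words.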