Counting the occurrences of each word in Python

今天吃了饺子吗

Article link: https://blog.csdn.net/ethereal_tl/article/details/127943817

Programming requirements

Following the function docstrings, fill in the missing statements so that the program does the following:

The word_frequency() function counts how many times each word occurs and returns the result as a dict.
The top_ten_words() function prints the n most frequent words and their counts, one per line, in descending order of count.

As prompted, enter a non-negative integer n. The program prints the n most frequent words and their counts, one per line, with the word and its count separated by a space.

import string

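def word_frequency(txt):
    """Take a string with punctuation and symbols already removed,
    count how often each word occurs, and return a dict that maps
    each word to its count."""
    word_list = txt.split()
    d = {}  # start with an empty dict
    for word in word_list:
        if word in d:
            d[word] += 1
        else:
            d[word] = 1

    return d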

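def top_ten_words(frequency, cnt):
    """Take the word-frequency dict and print the cnt most frequent
    words with their counts, one pair per line."""
    dic = sorted(frequency.items(), key=lambda x: x[1], reverse=True)

    # Slicing avoids an IndexError when cnt exceeds the number of distinct words
    for word, count in dic[:cnt]:
        print(word, count)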

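def read_file(file):
    """Take a file name and read the file's contents into a string.
    Keep only characters with a code point below 256 (English letters
    and Western symbols), which filters out Chinese characters.
    Convert everything to lowercase, replace every punctuation mark
    and symbol with a space, and return the resulting string."""
    with open(file, 'r', encoding='utf-8') as novel:
        txt = novel.read()
    english_only_txt = ''.join(x for x in txt if ord(x) < 256)
    english_only_txt = english_only_txt.lower()
    for character in string.punctuation:
        english_only_txt = english_only_txt.replace(character, ' ')
    return english_only_txt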

if __name__ == '__main__':
    filename = 'Who Moved My Cheese.txt'  # file name
    content = read_file(filename)  # returns the cleaned text as a string
    frequency_result = word_frequency(content)  # count word frequencies
    n = int(input())
    top_ten_words(frequency_result, n)
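
As an alternative sketch, the counting and ranking steps can also be written with `collections.Counter` from the standard library. The helper name `top_words` and the sample sentence are assumptions made for illustration; they are not part of the original exercise.

```python
from collections import Counter


def top_words(txt, cnt):
    """Return the cnt most frequent words in txt as (word, count) pairs.

    top_words is a hypothetical helper, not part of the original exercise.
    """
    counts = Counter(txt.split())  # same counts as word_frequency(), held in a Counter
    # most_common() sorts by count, descending; it simply returns fewer
    # pairs when cnt is larger than the number of distinct words.
    return counts.most_common(cnt)


if __name__ == '__main__':
    sample = 'the cheese moved and the mice found the cheese'
    for word, count in top_words(sample, 2):
        print(word, count)
```

Because `most_common()` never indexes past the end of the data, this version needs no extra guard for a large `cnt`.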

Key points:

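1. items() and iteritems() are built-in dictionary methods. In Python 2, items() returns a list and iteritems() returns an iterator. In Python 3, iteritems() no longer exists and items() returns a view object, which sorted() accepts directly.

2. string.punctuation: a string constant (not a function, so it takes no arguments) that contains all ASCII punctuation characters.

3. ord(c): takes a single character c and returns its Unicode code point as a decimal integer; for ASCII characters this equals the ASCII value.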
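
The three points above can be checked with a short script. This is a minimal sketch assuming Python 3; the sample dictionary and strings are made up for illustration.

```python
import string

# 1. In Python 3, dict.items() returns a view object, not a list.
freq = {'cheese': 3, 'maze': 1}
items = freq.items()
print(type(items).__name__)   # dict_items
freq['mice'] = 2              # the view reflects later changes to the dict
print(len(items))             # 3
print(sorted(items, key=lambda pair: pair[1], reverse=True))

# 2. string.punctuation is a plain str constant, not a function.
print(len(string.punctuation))   # 32
cleaned = 'who moved my cheese?!'
for ch in string.punctuation:
    cleaned = cleaned.replace(ch, ' ')
print(cleaned.split())

# 3. ord() returns the Unicode code point; ASCII characters are below 128.
print(ord('a'), ord('é'), ord('中'))   # 97 233 20013
```

The last line also shows why the `ord(x) < 256` filter in read_file() drops Chinese characters while keeping accented Latin letters such as 'é'.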