[Python] Counting how many times each word appears in an English novel

测试@小成同学

Article link: https://blog.csdn.net/weixin_45589713/article/details/105346016

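1. Problem description

Taking THE TRAGEDY OF ROMEO AND JULIET as the example text, count how often each word occurs and sort the words by count from highest to lowest. An excerpt of the text: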

Serv. Up.

Rom. Whither?

Serv. To supper, to our house.

Rom. Whose house?

Serv. My master's.

Rom. Indeed I should have ask'd you that before.

Signior Valentio and His cousin Tybalt;

[Gives back the paper.] A fair assembly. Whither should they come?

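2. Approach

Step 1: Read the contents of the txt file

Step 2: Preprocess the text by removing English punctuation

Step 3: Split the text to get a list of words

Step 4: Build a dictionary of word counts

Step 5: Sort the dictionary by count

Step 6: Display the first 10 items of the dictionary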
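
The six steps can also be sketched more compactly with `collections.Counter` from the standard library. This is only an illustrative variant, not the script used in this article: the function name `count_words` and the sample string are assumptions, and it works on an in-memory string rather than on romeo.txt so that it runs without the file.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each word in text to its number of occurrences."""
    # Step 2: replace the punctuation handled in this article with spaces.
    cleaned = re.sub(r'[?.!,;"/\[\]-]', ' ', text)
    # Step 3: split on whitespace to get the word list.
    words = cleaned.split()
    # Step 4: Counter builds the frequency dictionary in one call.
    return Counter(words)


# Hypothetical sample text, adapted from the excerpt in section 1.
sample = "Rom. Whose house? Serv. My master's. Rom. Whither?"
counts = count_words(sample)
# Steps 5 and 6: most_common(n) sorts by count, descending, and keeps n items.
top = counts.most_common(2)
print(top)
```

Step 1 is left out on purpose; for the real file, `text` would be whatever `file.read()` returns.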

3. Implementation

import re
with open('romeo.txt') as file:
    # 1. Read the contents of the text file
    file_txt = file.read()
    # 2. Preprocess the text: replace punctuation with spaces
    word_text = re.sub(r'[?.!,;"/\[\]]', ' ', file_txt)
    # Replace every hyphen with a space; this also splits hyphenated words
    word_texts = re.sub(r"-", ' ', word_text)
    # 3. Split on whitespace to get the word list
    wordlist = word_texts.split()
    # 4. Build the word-frequency dictionary
    word_dict = {}
    for word in wordlist:
        word_dict[word] = word_dict.get(word, 0) + 1
    # 5. Sort by count; reverse=True gives descending order
    dict_order = dict(sorted(word_dict.items(), key=lambda x: x[1], reverse=True))
    # 6. Show the first 10 items
    print(list(dict_order.items())[:10])
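
The hyphen step in the script above replaces every `-`, including hyphens inside compound words. If the goal is to drop only standalone hyphens and keep such words whole, one possible approach is a regular expression with lookarounds. The helper name `strip_standalone_hyphens` and the sample sentence are hypothetical, and here "standalone" is assumed to mean "not surrounded by letters on both sides".

```python
import re


# Hypothetical helper; not part of the original script.
def strip_standalone_hyphens(text):
    # A hyphen is replaced unless it has an ASCII letter on both sides.
    return re.sub(r'(?<![A-Za-z])-|-(?![A-Za-z])', ' ', text)


sample = "A well-known tale - told twice--over"
words = strip_standalone_hyphens(sample).split()
print(words)
```

Swapping this into the script would change how hyphenated words are counted, so the totals in the next section may not be reproduced exactly.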

4. Output

D:\Anaconda3\python.exe G:/pythonworkspaces/djangoproject1/student_manager/test1.py
[('the', 604), ('I', 574), ('and', 490), ('to', 488), ('a', 399), ('of', 371), ('my', 311), ('is', 307), ('in', 286), ('that', 270)]

Process finished with exit code 0
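
The counting in this article is case-sensitive, so forms such as "The" and "the" are tallied as separate words. A minimal sketch of case-insensitive counting follows, assuming that merging them is wanted; the word list is made up for illustration and is not taken from romeo.txt.

```python
from collections import Counter

# Hypothetical word list standing in for the output of the split step.
words = ["The", "the", "I", "THE", "and", "And"]

# Lowercase each word before counting so that case variants share one entry.
folded = Counter(word.lower() for word in words)
top_two = folded.most_common(2)
print(top_two)
```

Lowercasing also turns the pronoun "I" into "i", which may or may not be acceptable depending on how the results are presented.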