A Simple Way to Count Word Frequencies in English Text with Python

被南宫问雅摸过虾头

Last modified 2022-03-14 15:53:49

Tags: python

First published 2022-03-12 23:29:53

Article link: https://blog.csdn.net/weixin_59249304/article/details/123443285

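1. Problem description:

Given a string of English text, count how often each word occurs and store the result in a dictionary.

2. Approach:

Methods used:

replace("a","b") returns a copy of the string in which every occurrence of a is replaced with b

split() called without arguments splits the string on whitespace (spaces, tabs, newlines) and returns the pieces as a list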
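To make the two methods concrete, here is a minimal sketch on a short phrase taken from the test text used later. The variable names are only illustrative.

```python
# replace() returns a new string; the original string is left unchanged
text = "Oh, yes! She was"
cleaned = text.replace(",", " ").replace("!", " ")
print(cleaned)   # Oh  yes  She was

# split() with no arguments treats any run of whitespace as one separator,
# so the doubled spaces left by replace() do not produce empty strings
words = cleaned.split()
print(words)     # ['Oh', 'yes', 'She', 'was']

# tabs and newlines count as whitespace too
print("a\tb\nc  d".split())   # ['a', 'b', 'c', 'd']
```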

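Steps:

Step 1. Because the given text is English, words can be delimited by spaces and punctuation marks. Deal with the punctuation first: use replace() to turn each punctuation mark that appears in the text into a space (a space rather than an empty string, so that the words on either side of a mark stay separated for the next step), then use split() to break the text into words and collect them in a list.

Step 2. Create an empty dictionary and iterate over the elements of the list. For each element, check whether it is already in the dictionary: if it is not, add it as a key with the value 1; if it is, add 1 to that key's value. The key-value pairs of the dictionary are then the words and the number of times each one occurs.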
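The two steps can be sketched as one small function and tried on a short sentence before running them on the full text. The function name `count_words` and its `punctuation` parameter are hypothetical, chosen only for this illustration.

```python
def count_words(text, punctuation=",.!"):
    """Return a dict mapping each word in text to its number of occurrences."""
    # Step 1: turn each punctuation mark into a space, then split on whitespace
    for mark in punctuation:
        text = text.replace(mark, " ")
    words = text.split()

    # Step 2: count the words in a dictionary
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


sample = "She chose her colours. She adjusted her petals one by one."
result = count_words(sample)
print(result)
# {'She': 2, 'chose': 1, 'her': 2, 'colours': 1, 'adjusted': 1, 'petals': 1, 'one': 2, 'by': 1}
```

The default `punctuation` only covers the three marks that occur in the test text; other texts would need more marks added.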

3. Implementation and results

This example uses an excerpt from The Little Prince as the test text.

# data holds the test text
data = '''The shrub soon stopped growing, and began to get ready to produce a flower. The little prince, who was present at the first appearance of a huge bud, felt at once that some sort of miraculous apparition must emerge from it. But the flower was not satisfied to complete the preparations for her beauty in the shelter of her green chamber. She chose her colours with the greatest care. She adjusted her petals one by one. She did not wish to go out into the world all rumpled, like the field poppies. It was only in the full radiance of her beauty that she wished to appear. Oh, yes! She was a coquettish creature! And her mysterious adornment lasted for days and days.'''

# Replace the punctuation marks that occur in the text with spaces
str_data = data.replace("!", " ").replace(",", " ").replace(".", " ")
# Split the string into a list of words
list_data = str_data.split()

Splitting the text this way yields a list of the individual words.
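As a quick check of what the list looks like, the sketch below applies the same replace-and-split step to just the first sentence of the test text (shortened here so the block runs on its own) and prints the first few elements and the length.

```python
# First sentence of the test text only, so that the example is self-contained
data = "The shrub soon stopped growing, and began to get ready to produce a flower."

str_data = data.replace("!", " ").replace(",", " ").replace(".", " ")
list_data = str_data.split()

print(list_data[:5])    # ['The', 'shrub', 'soon', 'stopped', 'growing']
print(len(list_data))   # 14
```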

dic_data = {}
# Walk through the list and record each word with its number of occurrences
for i in list_data:
    if i in dic_data:
        dic_data[i] += 1
    else:
        dic_data[i] = 1

Inspect the data stored in the dictionary: each key is a counted word and each value is the number of times that word occurs, i.e. {"word": count}
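One thing to be aware of is that the approach above is case-sensitive, so "She" and "she" end up as separate keys. As a possible variation (not the method used in this article), the sketch below lowercases the text first, strips all ASCII punctuation with `str.translate` instead of chained `replace()` calls, and lets `collections.Counter` from the standard library do the counting. It runs on a two-sentence excerpt of the test text.

```python
import string
from collections import Counter

excerpt = ("It was only in the full radiance of her beauty that she wished "
           "to appear. Oh, yes! She was a coquettish creature!")

# Build a table that maps every ASCII punctuation character to a space
table = str.maketrans(string.punctuation, " " * len(string.punctuation))

# Lowercase so that "She" and "she" are counted as the same word
words = excerpt.lower().translate(table).split()

# Counter is a dict subclass that does the counting loop for us
counts = Counter(words)

# most_common(n) lists the n most frequent words with their counts
for word, n in counts.most_common(3):
    print(word, n)
```

Whether to merge upper and lower case depends on the task; for a plain word-frequency count it is usually what you want.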
