[Paddle领航团 Python Basics Course, Major Assignment 1] English Word Frequency Counting Task

三岁学编程

Category: Pilot Team image classification course (领航团图像分类课程). Tags: python

Article link: https://blog.csdn.net/weixin_45623093/article/details/113746031

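Assignment

Count the frequency of every word in a CET-6 (College English Test, Band 6) exam paper and return a dictionary in the following form:

{'and': 100, 'abandon': 5}

The exam text is stored at ./artical.txt

File download link

Tip: how to read the file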

def get_artical(artical_path):
    with open(artical_path) as fr:
        data = fr.read()
    return data

get_artical('./artical.txt')
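
The tip above relies on the platform's default text encoding and discards the returned string. A small variant, assuming the file is UTF-8 encoded (the post does not say), passes the encoding explicitly and keeps the result. The temporary file only stands in for ./artical.txt so that the snippet runs on its own, and the `encoding` parameter is an addition for illustration.

```python
import os
import tempfile

def get_artical(artical_path, encoding='utf-8'):
    """Read the whole file and return its contents as one string."""
    with open(artical_path, encoding=encoding) as fr:
        return fr.read()

# Write a throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write("First line.\nSecond line.\n")

data = get_artical(tmp.name)   # keep the returned text for later processing
os.remove(tmp.name)            # clean up the throwaway file
print(repr(data))
```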

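Processing requirements

(a) '\n' is the newline character and must be removed
(b) Punctuation must be handled

['.', ',', '!', '?', ';', '\'', '\"', '/', '-', '(', ')']

(c) Digits must be handled

['1','2','3','4','5','6','7','8','9','0']

(d) Mind the case
Some words are capitalized only because they start a sentence, so every word must be converted to lowercase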

'String'.lower()

(e) Bonus points

Look up regular expressions on your own and use them in your code (the re module)

Reference: https://docs.python.org/3.7/library/re.html
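
For the bonus item, one possible way to use the re module is to let a single pattern do the cleaning: lowercase the text, then extract runs of letters, so that newlines, punctuation and digits are dropped without being listed one by one. This is a minimal sketch, not the course's reference solution. The function name `tokenize` and the sample sentence are made up for illustration, and treating apostrophes and hyphens as separators is an assumption (so "it's" becomes "it" and "s").

```python
import re

def tokenize(text):
    """Return the lowercase words in text, treating every non-letter as a separator."""
    # [a-z]+ matches runs of ASCII letters; newlines, punctuation and digits never match
    return re.findall(r"[a-z]+", text.lower())

sample = "Hello, world!\nHello again: it's 2021."
words = tokenize(sample)
print(words)  # ['hello', 'world', 'hello', 'again', 'it', 's']
```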

Method 1:

1. Read the document

2. Clean the data

3. Get the vocabulary

4. Get the word frequencies

5. Assemble the result

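# Pseudocode
# Get the document contents
def get_artical(artical_path):
    with open(artical_path) as fr:
        data = fr.read()
    return data

data = get_artical('./artical.txt')

a = ['.', ',', '!', '?', ';', '\'', '\"', '/', '-', '(', ')']
b = ['1','2','3','4','5','6','7','8','9','0']

# Process the data
new_data = data.lower()  # convert to lowercase
define the working variables

# Remove the dirty data
for each character in new_data:
    if the character is not in a, not in b, and is not '\n':
        append it to the cleaned data

# Collect the words whose frequency we will count (option 1)
for each word in the cleaned data:
    check whether it is already in our word list
        if not, add it
    otherwise
        pass
# Or convert the cleaned data to a tuple (option 2)
newword = tuple(newdata)


# Count the word frequencies
for each word we want to count:
    for each word in the cleaned newdata:
        check whether it matches and add to the count
    append the count to a list

for each index in range(len(newword)):
    pair the word with its frequency and put both in the dictionary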
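
The nested loops in the pseudocode rescan the whole word list once per word. As a possible alternative, the counts can be built in a single pass with a plain dict. This is a minimal sketch, assuming the text has already been cleaned and split into a list of words; `count_words` and the sample list are illustrative names, not part of the original post.

```python
def count_words(words):
    """Count occurrences of each word in a single pass over the list."""
    freq = {}
    for word in words:
        # dict.get returns 0 the first time a word is seen
        freq[word] = freq.get(word, 0) + 1
    return freq

sample_words = ["and", "abandon", "and", "the", "and"]
print(count_words(sample_words))  # {'and': 3, 'abandon': 1, 'the': 1}
```

`collections.Counter(sample_words)` from the standard library builds the same mapping.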

Method 1 code

def get_artical(artical_path):
    # Read the document
    with open(artical_path) as fr:
        data = fr.read()
    return data

# Dirty data that needs to be removed
not_1 = ['.', ',', '!', '?', ';', '\'', '\"', '/', '-', '(', ')']
not_2 = ['1','2','3','4','5','6','7','8','9','0']

# Read the data
data = get_artical('./artical.txt')


# Lowercase the data and clean out the dirty characters
data = data.lower()
new_data = ''
for i in data:
    if i not in not_1 and i not in not_2 and i
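
The listing is cut off at this point in the source. What follows is only a guess at how the same character-filtering idea could be carried through to the final dictionary, following the steps of the pseudocode; it is not the author's remaining code. It assumes newlines should become spaces rather than vanish (so the last word of one line does not fuse with the first word of the next) and uses an inline sample string in place of ./artical.txt. `clean_text` and `word_freq` are hypothetical names.

```python
# Characters to strip, copied from the assignment
not_1 = ['.', ',', '!', '?', ';', '\'', '\"', '/', '-', '(', ')']
not_2 = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '0']

def clean_text(data):
    """Lowercase the text, drop punctuation and digits, turn newlines into spaces."""
    kept = []
    for ch in data.lower():
        if ch == '\n':
            kept.append(' ')   # assumption: keep a word boundary at line breaks
        elif ch not in not_1 and ch not in not_2:
            kept.append(ch)
    return ''.join(kept)

def word_freq(data):
    """Return a {word: count} dict for the cleaned text."""
    words = clean_text(data).split()   # split on any run of whitespace
    # Step 3: collect each distinct word once, in order of first appearance
    unique = []
    for word in words:
        if word not in unique:
            unique.append(word)
    # Step 4: count how often each distinct word occurs
    counts = [words.count(word) for word in unique]
    # Step 5: pair every word with its count
    return dict(zip(unique, counts))

sample = "And then?\nThen we abandon 2 plans, and go."
print(word_freq(sample))
# {'and': 2, 'then': 2, 'we': 1, 'abandon': 1, 'plans': 1, 'go': 1}
```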
