Reading an All-English txt File with Python and Counting Its Words and Word Frequencies

Article link: https://blog.csdn.net/qq_45969502/article/details/130103698

The process has three steps: 1. read the file; 2. count the words; 3. count how often each word is used.

1. Read the txt file

# Analyze an English text file: how many words it uses and how often each word occurs
path = 'English1.txt'
with open(path, 'r', encoding='UTF-8') as f:
    content = f.read()      # the whole file as a single string

Passing encoding='UTF-8' prevents garbled characters when the file is UTF-8 encoded; f.read() returns the entire file contents as a str.

2. Count the words

# Count the words
content_words = content.split()
words = [word for word in content_words if word not in ['.', ',']]
print('%s contains %d words in total' % (path, len(words)))

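Calling split() with no arguments splits the string on whitespace and returns the pieces as a list. Punctuation attached to a word (for example "word,") stays part of that token.

The list comprehension then drops tokens that consist solely of '.' or ','. It does not strip punctuation that is attached to a word.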
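Because split() leaves punctuation attached to words, "world" and "world." would be counted as different words. The following is a minimal sketch of one way to normalize tokens before counting. The helper name tokenize and the choice to lowercase every word are assumptions made for illustration, not part of the original code.

```python
import string

def tokenize(text):
    """Split on whitespace, strip leading/trailing punctuation, lowercase.

    Hypothetical helper: the lowercasing step is an assumption, drop it
    if "The" and "the" should be counted separately.
    """
    tokens = []
    for raw in text.split():
        # strip() only removes punctuation at the edges, so an inner
        # apostrophe such as the one in "don't" is preserved
        word = raw.strip(string.punctuation).lower()
        if word:  # skip tokens that were punctuation only
            tokens.append(word)
    return tokens

sample = "Hello, world. Hello again , world!"
words = tokenize(sample)
print(words)       # ['hello', 'world', 'hello', 'again', 'world']
print(len(words))  # 5
```

Applied to the file contents, this would replace the split() and list comprehension steps with words = tokenize(content).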

3. Count word frequencies

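# Count word frequencies
dict1 = {}
for word in words:
    if word not in dict1:
        dict1[word] = 1      # first occurrence of this word
    else:
        dict1[word] += 1     # seen before: increment its count
print('Word frequencies in %s:' % path, dict1)
print('%s uses %d distinct words' % (path, len(dict1)))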

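A dictionary stores each word as a key and its number of occurrences as the value, so len(dict1) gives the number of distinct words.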
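The same counting can be written with collections.Counter from the standard library, which also makes it easy to list the most frequent words. This is a sketch on a small hard-coded word list (an assumption for illustration) rather than on the contents of English1.txt.

```python
from collections import Counter

# Illustrative word list; in the article this would be the `words` list
words = ["the", "cat", "saw", "the", "dog", "the", "cat"]

freq = Counter(words)       # maps each word to its occurrence count
print(len(freq))            # number of distinct words: 4
print(freq.most_common(2))  # [('the', 3), ('cat', 2)]

# Relative frequency: occurrences divided by the total word count
total = sum(freq.values())
for word, count in freq.most_common(2):
    print(f"{word}: {count} ({count / total:.1%})")
```

Counter is a dict subclass, so freq[word] and len(freq) behave like the hand-written dictionary, and a word that never appears simply returns 0.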

Partial results are shown below: