Python: Word Frequency Count for an English Article

Latest recommended article published on 2024-08-10 23:14:03

言安(celien)


Category: Python Learning. Tags: python

Article link: https://blog.csdn.net/qq_69814495/article/details/137088578


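# Open the file and read its full contents.
# The with statement closes the file automatically, even if reading fails.
with open("C:/python数据/article.txt", 'r', encoding="utf-8") as file:
    contend = file.read()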

# Convert to lowercase so that "The" and "the" count as the same word
contend = contend.lower()

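# Remove special symbols and punctuation
# (the backslash is written as \\ because \/ is not a valid escape sequence)
for s in ".,;:?\"!#@$%&^*()\\/~{}[]|":
    contend = contend.replace(s, '')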

# Split the text on whitespace into a list of words
contend = contend.split()

# Create an empty dictionary to record how many times each word appears
wordDict = {}

for s in contend:
    wordDict[s] = wordDict.get(s, 0) + 1  # get() returns 0 for a word not seen before

# items() returns an iterable of all key-value pairs in wordDict.
# Each pair is a word and the number of times it appears in the article.
lst = list(wordDict.items())

# Sort by count, from highest to lowest
lst.sort(key=lambda x: -x[1])

# Iterate over the sorted list and print each word with its count
for s in lst:
    print(s[0], s[1])

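# Optionally drop words that appear fewer than two times
print("Results after removing words that appear fewer than two times:")
for s in lst:
    if s[1] >= 2:  # keep words with a count of two or more
        print(s[0], s[1])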
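
For comparison, the same counting steps can be sketched with the standard library's `collections.Counter` and `re`. This is only an illustrative alternative, not the method used above. The function name `word_frequencies`, the `min_count` parameter, and the sample sentence are all made up for this example. The regular expression also tokenizes differently from the character-by-character `replace` loop: it keeps only runs of letters and apostrophes, so digits are dropped and hyphenated words are split.

```python
import re
from collections import Counter


def word_frequencies(text, min_count=1):
    """Return (word, count) pairs sorted by count, highest first.

    Hypothetical helper for illustration; min_count is the smallest
    count a word needs in order to be kept.
    """
    # Lowercase the text, then treat each run of letters/apostrophes as a word.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() returns the pairs ordered from highest to lowest count.
    return [(w, c) for w, c in counts.most_common() if c >= min_count]


# Made-up sample text; replace it with the contents of your own file.
sample = "The cat and the dog. The bird saw the cat!"
for word, count in word_frequencies(sample, min_count=2):
    print(word, count)
```

With `min_count=2`, only words that appear at least twice are printed, which matches the filtering step at the end of the script above.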