How to open a txt file in Python and count word frequencies: counting word occurrences in a txt document with Python

Latest recommended article published on 2023-11-28 00:00:24

weixin_39731271


# -*- coding: utf-8 -*-

import jieba

# Read the file

# Use a context manager so the file is closed once it has been read
with open(r'E:\Chrome_download\tieba.txt', encoding='utf-8') as f:
    txt = f.read()

print(txt)

# Segment the text into words

words = jieba.lcut(txt)

string = ' '.join(words)

print(words)

print(f"Number of words: {len(words)}")  # total word count

print(f"Number of distinct words: {len(set(words))}")  # distinct word count

# Build the word-frequency dictionary

counts ={}

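for word in words:
    if len(word) == 1:
        continue  # skip single-character tokens
    # This idiom is worth understanding:
    # dict.get(key, default=None)
    #   key     -- the key to look up in the dictionary
    #   default -- the value returned when key is not present
    # A word seen for the first time therefore starts from 0.
    counts[word] = counts.get(word, 0) + 1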

print(counts)  # print the finished dictionary
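The counting idiom above can be tried on its own, without jieba or an input file. The following is a minimal sketch; `sample_words` is a hypothetical token list standing in for the output of `jieba.lcut(txt)`, not data from the original file.

```python
# Hypothetical sample tokens, standing in for the output of jieba.lcut(txt)
sample_words = ["贴吧", "的", "帖子", "贴吧", "用户", "帖子", "贴吧"]

counts = {}
for word in sample_words:
    if len(word) == 1:
        continue  # skip single-character tokens, as the script does
    # get() returns the current count, or 0 if the word has not been seen yet
    counts[word] = counts.get(word, 0) + 1

print(counts)
print(counts.get("不存在"))     # missing key, no default given: None
print(counts.get("不存在", 0))  # missing key, default supplied: 0
```

Without the default argument, `counts.get(word) + 1` would fail on the first occurrence of each word, because `None + 1` raises a `TypeError`.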

# Convert to a list

items = list(counts.items())  # items() yields (key, value) tuples

print(items)

# Sort by count, highest first

items.sort(key=lambda x:x[1],reverse=True)

print(items)

# Print the top 15

# Slicing avoids an IndexError when there are fewer than 15 distinct words
for word, count in items[:15]:
    print(f"{word}--appeared--{count}-times")
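As an alternative to building the dictionary, converting it to a list, and sorting by hand, the standard library's `collections.Counter` covers the same steps. This is a sketch of a different technique rather than the script's own method; the helper name `top_words`, its parameters, and `sample_words` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def top_words(words, n=15, min_len=2):
    """Return the n most frequent tokens that have at least min_len characters."""
    counter = Counter(w for w in words if len(w) >= min_len)
    # most_common(n) sorts by count, highest first, and simply returns
    # fewer entries when fewer than n distinct tokens exist
    return counter.most_common(n)


# Hypothetical sample tokens, standing in for the output of jieba.lcut(txt)
sample_words = ["贴吧", "的", "帖子", "贴吧", "用户", "帖子", "贴吧"]

for word, count in top_words(sample_words, n=2):
    print(f"{word}--appeared--{count}-times")
```

With real input, the call would be `top_words(jieba.lcut(txt))`, assuming `txt` holds the file contents as in the script.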
