Word Frequency Counting with Python

Artra_Soong

Categories: python, word frequency counting

Article link: https://blog.csdn.net/qq_30230591/article/details/118889566


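# '标题.txt' in the code below is your own text data file
import jieba

counts = {}     # maps each word to the number of times it occurs
with open('./标题.txt', 'r', encoding='gbk') as f:
    for line in f:    # read the file one line at a time
        words = jieba.lcut(line)    # segment the line into a list of words
        for word in words:
            if len(word) == 1:    # skip single-character tokens
                continue
            counts[word] = counts.get(word, 0) + 1    # add 1 each time the word appears

items = list(counts.items())    # convert the key-value pairs to a list
items.sort(key=lambda x: x[1], reverse=True)    # sort by count, largest first

# 30 is the number of top-frequency words to print
for word, count in items[:30]:    # slicing avoids an IndexError when there are fewer than 30 words
    print("{0:<5}{1:>5}".format(word, count))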
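
The same counting and sorting can also be sketched with collections.Counter from the standard library, which may be a little shorter. This is only an illustrative variant, not the approach used in the script above: the names top_words, tokenize and min_len are hypothetical, and the sample lines are made-up English text split on whitespace so that the sketch runs without jieba installed.

```python
from collections import Counter


def top_words(lines, tokenize, n=30, min_len=2):
    """Return up to n (word, count) pairs, most frequent first.

    tokenize is any function that turns a line into a list of tokens
    (hypothetical parameter; with jieba installed it could be jieba.lcut).
    """
    counts = Counter()
    for line in lines:
        # Keep only tokens of at least min_len characters, which mirrors
        # the single-character check in the script above.
        counts.update(w for w in tokenize(line) if len(w) >= min_len)
    # most_common returns fewer than n pairs when fewer distinct words exist.
    return counts.most_common(n)


# Made-up sample data, tokenized with str.split instead of jieba.
sample = ["red green red", "blue red green"]
result = top_words(sample, str.split, n=2)
for word, count in result:
    print("{0:<5}{1:>5}".format(word, count))
```

To apply this to the file from the script above, one could pass the open file object as lines and jieba.lcut as tokenize, assuming jieba is installed.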
