Extracting the Most Frequent Words in an Article

Latest recommended article published on 2022-04-04 00:26:03

灵均兰草

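Categories: A Closer Look at Words, Text Analysis. Tags: extracting the most frequent words in an article, high-frequency word extraction, word cloud, statistics, text analysis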

Article link: https://blog.csdn.net/ljlchrr/article/details/84001508

High-frequency word extraction: finding the words that appear most often in an article

1. Install the required library from the command line (cmd): pip install jieba (jieba does the Chinese word segmentation)

jieba.lcut() is the segmentation function; it returns the segmented words as a list.

hist.sort(key=lambda x: x[1], reverse=True) sorts the (word, count) pairs by count; reverse=True makes the order descending.
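To show the sort call on its own, here is a minimal sketch with made-up data. The sample words and counts are hypothetical and are not taken from meizu.txt.

```python
# Hypothetical sample counts, for illustration only
counts = {"手机": 5, "屏幕": 2, "电池": 9}

hist = list(counts.items())                  # list of (word, count) tuples
hist.sort(key=lambda x: x[1], reverse=True)  # sort in place by count, largest first

print(hist)  # [('电池', 9), ('手机', 5), ('屏幕', 2)]
```

Because list.sort is stable, words with equal counts keep the order in which they were first inserted into the dict.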

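# -*- coding:utf-8 -*-
import jieba

# Read the whole article; meizu.txt is the input file used in this post
with open('meizu.txt', 'r', encoding='utf-8') as f:
    content = f.read()
words = jieba.lcut(content)  # segment the text into a list of words

counts = {}

for word in words:
    if len(word) == 1:  # skip single-character tokens
        continue
    counts[word] = counts.get(word, 0) + 1  # dict.get supplies 0 for unseen words

hist = list(counts.items())  # list of (word, count) tuples
hist.sort(key=lambda x: x[1], reverse=True)  # sort by count, descending

# Print the 20 most frequent words; slicing avoids an IndexError on short texts
for word, count in hist[:20]:
    print("{:<10}{:>5}".format(word, count))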
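The counting and sorting steps can also be written with collections.Counter from the standard library. This is only an alternative sketch, not the approach used in the post: the helper name top_words and its parameters n and min_len are hypothetical, and the sample list stands in for the output of jieba.lcut(content) so that the snippet runs without jieba installed.

```python
from collections import Counter


def top_words(words, n=20, min_len=2):
    """Return the n most common words that have at least min_len characters."""
    counter = Counter(w for w in words if len(w) >= min_len)
    return counter.most_common(n)  # list of (word, count), highest count first


# Hypothetical, already-segmented input standing in for jieba.lcut(content)
sample = ["魅族", "手机", "的", "屏幕", "，", "魅族", "手机", "魅族"]

for word, count in top_words(sample, n=2):
    print("{:<10}{:>5}".format(word, count))
```

Counter.most_common(n) returns fewer than n pairs when the input has fewer distinct words, so short texts need no special handling.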