Word Frequency Counting

Claroja

Article link: https://blog.csdn.net/claroja/article/details/79044578

import jieba
import pandas as pd

# Load the stop-word list (one word per line); stop-word lists can be collected online.
with open("./stopwords.txt", "r", encoding="utf-8") as f:
    stopwords = {line.strip() for line in f}

def word_counts(text):
    seg_list = jieba.cut(text)  # segment the text with jieba
    words_list = []
    for word in seg_list:
        # Drop stop words, whitespace-only tokens, and single-character tokens
        if word not in stopwords and not word.isspace() and len(word) > 1:
            words_list.append(word)
    counts = pd.Series(words_list).value_counts()  # count word frequencies
    return counts  # a Series, so it can be saved directly with to_csv
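
The filter-and-count step can also be sketched with only the standard library, which may be handy when pandas is not available. This is a rough illustration, not the approach used above: the name `count_tokens` and the sample tokens and stop words are made up for the example, and the input is assumed to be already segmented (for instance, the output of `jieba.cut`).

```python
from collections import Counter

def count_tokens(tokens, stopwords):
    """Count tokens, skipping stop words, whitespace-only tokens,
    and single-character tokens (same filter as word_counts)."""
    kept = [
        t for t in tokens
        if t not in stopwords and not t.isspace() and len(t) > 1
    ]
    return Counter(kept)

# Hypothetical pre-segmented input and stop-word set, for illustration only
tokens = ["文本", "分析", "的", "文本", " ", "统计", "词频", "文本"]
stopwords = {"的", "统计"}

counts = count_tokens(tokens, stopwords)
for word, n in counts.most_common():
    print(word, n)
```

`Counter.most_common()` returns the words sorted by descending frequency, which plays the same role as `value_counts()` on a Series.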