Using a Stop Word List

Latest recommended article published on 2024-06-27 17:42:25

星夜猫

Tags: python

Article link: https://blog.csdn.net/qq_44418077/article/details/109556480

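Using a stop word list:

Problem description:

Do not mechanically copy a stop word list template from the web. Adapt the list to your own task, because a list that does not fit the data will hurt the results.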
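As a rough illustration of adapting a list instead of using it verbatim, the sketch below starts from a generic list and then adds and removes entries. The function name `customize_stopwords` and the sample words are hypothetical and chosen only for this example. Which words to add or keep depends entirely on your own data.

```python
def customize_stopwords(base, extra=(), keep=()):
    """Return a new stop word set: `base` plus `extra`, minus `keep`."""
    stop = set(base)
    stop.update(extra)             # task-specific noise words to drop as well
    stop.difference_update(keep)   # words the task still needs, so do not filter them
    return stop


# Hypothetical generic list, standing in for one copied from the web
generic = ["的", "了", "不", "是"]

# Assumption for this example: in sentiment analysis the negation "不" may
# carry meaning, so it is kept; "转发" (repost) is treated as noise.
custom = customize_stopwords(generic, extra=["转发"], keep=["不"])
print(sorted(custom))
```

Returning a new set leaves the generic list untouched, so the same base list can be adapted differently for each task.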

Using the stop word dictionary

The stop word file has to be read in and split into individual words, for example:

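# Read the stop word file (one word per line) into a dict for fast lookup
chinese_stop = {}
with open("data/Chinese_stop.txt", encoding="utf-8", errors="ignore") as s:
    for word in s:
        word = word.strip()
        if word:  # skip blank lines
            chinese_stop[word] = 1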

One way to use it:

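# ci is the list of segmented words; handle only those that are not stop words
for i in ci:
    if i not in chinese_stop:  # a membership test on the dict checks its keys
        ...  # process the non-stop word i here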
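A slightly fuller sketch of the lookup above is shown here. It assumes `ci` is a list of tokens produced by a word segmenter and uses small stand-in data so that it runs on its own. The helper name `filter_tokens` is hypothetical and not part of the original code.

```python
def filter_tokens(ci, stop_words):
    """Keep the tokens in `ci` that are neither blank nor stop words."""
    return [i for i in ci if i.strip() and i not in stop_words]


# Stand-in for the dict loaded from the stop word file
chinese_stop = {"的": 1, "了": 1, "是": 1}

# Stand-in for the output of a word segmenter
ci = ["今天", "的", "天气", "是", "晴天", " "]

filtered = filter_tokens(ci, chinese_stop)
print(filtered)
```

Because `in` on a dict tests its keys directly, calling `.keys()` is not needed. A `set` would work equally well when only membership matters.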

For English text, if you only need basic stop word removal, you can use the stop words that ship with the NLTK library, as follows:

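# Import the stop word corpus (run nltk.download('stopwords') once beforehand)
from nltk.corpus import stopwords
# Load the English stop words and append common punctuation tokens
stopwords_en = stopwords.words('english') + ['.', ',', '``', "''", '?', '!', '--', ';', ':', '(', ')', "'"]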
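Once a list such as `stopwords_en` is available, filtering a tokenized sentence could look like the sketch below. The helper `remove_stopwords` and the small `sample_stop` list are hypothetical; the sample list stands in for `stopwords_en` so that the example runs without the NLTK data. Tokens are lowercased before the comparison because the NLTK English stop words are all lowercase.

```python
def remove_stopwords(tokens, stop_words):
    """Drop tokens whose lowercase form appears in `stop_words`."""
    stop_set = set(stop_words)  # set membership is faster than scanning a list
    return [t for t in tokens if t.lower() not in stop_set]


# Small stand-in for stopwords_en (assumption: the real list would be passed instead)
sample_stop = ["the", "is", "on", ".", ","]
tokens = ["The", "cat", "is", "on", "the", "mat", "."]

print(remove_stopwords(tokens, sample_stop))
```

Converting the list to a set once, before the loop, avoids a linear scan of the stop word list for every token.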