Removing Stop Words from English Text

Latest recommended article published on 2022-12-21 13:46:22

平平无奇代码小孩

Tags: nlp

Article link: https://blog.csdn.net/XXACY123321/article/details/117029866

You need to install nltk. After installing it you also need the stopwords corpus, which goes in the corpora folder.
![The folder must be in exactly the right place; otherwise it is a real hassle, and you keep getting an error saying stopwords cannot be found](https://img-blog.csdnimg.cn/20210519145514680.png)
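To make the folder requirement concrete, here is a minimal sketch of how one might check and fetch the corpus from Python instead of copying files by hand. The helper names `candidate_stopwords_dirs` and `ensure_stopwords` and the example directory are made up for illustration; `nltk.data.find`, `nltk.download` and `nltk.data.path` are the NLTK calls it relies on.

```python
import os


def candidate_stopwords_dirs(search_paths):
    """Return the corpora/stopwords folder NLTK would expect under each data directory."""
    return [os.path.join(p, "corpora", "stopwords") for p in search_paths]


def ensure_stopwords():
    """Return True if the stopwords corpus is available, downloading it if needed."""
    import nltk  # imported here so the helper above also works without NLTK installed

    try:
        # Raises LookupError when no data directory contains corpora/stopwords
        nltk.data.find("corpora/stopwords")
        return True
    except LookupError:
        # Fetches the corpus into one of NLTK's data directories
        return bool(nltk.download("stopwords"))


# Hypothetical data directory, only to show the expected layout
print(candidate_stopwords_dirs(["/usr/share/nltk_data"]))
```

Calling `ensure_stopwords()` once before using `stopwords.words('english')` should avoid the "cannot find stopwords" error, and `candidate_stopwords_dirs(nltk.data.path)` lists the folders NLTK actually searches on your machine.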

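import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# If the data is missing, run once: nltk.download('stopwords')
# (word_tokenize also needs NLTK's punkt tokenizer data)

# Paste the text that needs stop-word removal here
text = """Removal of amoxicillin from aqueous solution using sludge-based activated carbon modified ."""

# Build the English stop-word set once; membership tests on a set are fast
stop_words = set(stopwords.words('english'))
# Split the text into word and punctuation tokens
word_tokens = word_tokenize(text)

# Keep only the tokens that are not stop words
# (the comparison is case-sensitive and the list is lowercase, so "The" is kept)
filtered_sentence = [w for w in word_tokens if w not in stop_words]

print("\n\nFiltered Sentence \n\n")
print(" ".join(filtered_sentence))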

The output, produced from the full passage rather than the shortened sample shown in the code, is:

Filtered Sentence 


Removal amoxicillin aqueous solution using sludge-based activated carbon modified walnut shell nano-titanium dioxide . Dewatered municipal sludge used raw material prepare activated carbon ( SAC ) , SAC modified walnut shell nano-titanium dioxide ( MSAC ) . The results showed MSAC higher specific surface area ( S-BET ) ( 279.147 ( 2 ) /g ) total pore volume ( V-T ) ( 0.324 cm ( 3 ) /g ) SAC . 

Process finished with exit code 0
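In the output above, "The" and the punctuation marks survive, because the stop-word list is lowercase and the tokens are compared as-is. Below is a minimal sketch of a case-insensitive variant that also drops punctuation. To keep it runnable without NLTK data, it assumes a tiny hand-picked stop-word set and a simple regex tokenizer as stand-ins for `stopwords.words('english')` and `word_tokenize`; the function name `remove_stop_words` and the sample sentence are made up for illustration.

```python
import re

# Tiny illustrative stop-word set (an assumption for this sketch);
# in practice pass set(stopwords.words('english')) from nltk.corpus instead.
DEMO_STOP_WORDS = {"the", "of", "from", "a", "an", "and", "to", "in"}


def remove_stop_words(text, stop_words):
    """Return the tokens of text that are not stop words, ignoring case."""
    # Simple regex tokenizer standing in for word_tokenize: it keeps words,
    # hyphenated words and decimal numbers, and skips bare punctuation.
    tokens = re.findall(r"[A-Za-z0-9]+(?:[-.][A-Za-z0-9]+)*", text)
    # Lowercase each token only for the comparison; the original casing is kept
    return [t for t in tokens if t.lower() not in stop_words]


# Made-up sample sentence, not taken from the original text
sample = "The removal of amoxicillin from aqueous solution using sludge-based carbon."
print(" ".join(remove_stop_words(sample, DEMO_STOP_WORDS)))
```

With the real NLTK pipeline the same idea is a one-word change: compare `w.lower()` instead of `w` against `stop_words`.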

I'm a complete beginner myself, a humanities master's student...
I'm just recording my own workflow here.
