Removing stopwords from English text

You need to install nltk first (for example with `pip install nltk`). After that you also need the stopwords data, which goes under the corpora folder.

Make sure the folder is in exactly the right place. Otherwise it turns into a real headache: NLTK keeps raising an error saying it cannot find stopwords.
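If you are not sure which folder NLTK actually searches, you can ask NLTK itself. The sketch below prints the search directories, optionally adds your own `nltk_data` location, and checks whether `corpora/stopwords` can be found. The path `/path/to/nltk_data` and the helper `has_resource` are hypothetical placeholders. If the check fails, `nltk.download('stopwords')` fetches the corpus into a default location (this needs network access). Note that `word_tokenize` also depends on a tokenizer model, `punkt` or, in recent NLTK releases, `punkt_tab`, which is downloaded the same way.

```python
import nltk

# NLTK looks for data packages in each of these directories, in order.
# The stopwords corpus must end up at <one of these dirs>/corpora/stopwords.
print(nltk.data.path)

# Hypothetical custom location: replace with wherever your nltk_data folder lives
custom_dir = "/path/to/nltk_data"
if custom_dir not in nltk.data.path:
    nltk.data.path.append(custom_dir)


def has_resource(name):
    """Return True if NLTK can find the resource under any search directory."""
    try:
        nltk.data.find(name)
        return True
    except LookupError:
        return False


# If this prints False, nltk.download("stopwords") can fetch the corpus
print("stopwords found:", has_resource("corpora/stopwords"))
```

Appending to `nltk.data.path` only affects the current Python session. Placing the folder in one of the directories printed on the first line avoids having to do this every time.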

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Put the text you want to remove stopwords from here
text = """Removal of amoxicillin from aqueous solution using sludge-based activated carbon modified ."""

# NLTK's built-in English stopword list; a set makes membership checks fast
stop_words = set(stopwords.words('english'))
# Split the text into word and punctuation tokens
word_tokens = word_tokenize(text)

# Keep only the tokens that are not in the stopword list
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)

print("\n\nFiltered Sentence \n\n")
print(" ".join(filtered_sentence))
```

The output is:

```text
Filtered Sentence 


Removal amoxicillin aqueous solution using sludge-based activated carbon modified walnut shell nano-titanium dioxide . Dewatered municipal sludge used raw material prepare activated carbon ( SAC ) , SAC modified walnut shell nano-titanium dioxide ( MSAC ) . The results showed MSAC higher specific surface area ( S-BET ) ( 279.147 ( 2 ) /g ) total pore volume ( V-T ) ( 0.324 cm ( 3 ) /g ) SAC . 

Process finished with exit code 0
```
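In the output above, "The" survives, and so do brackets, commas, and full stops. NLTK's English stopword entries are all lowercase and the check `w not in stop_words` is case-sensitive, while punctuation tokens are not stopwords at all. Below is a minimal sketch of a variant that compares the lowercase form of each token and can optionally drop tokens made up entirely of punctuation. The function name `remove_stopwords`, the `drop_punct` flag, and the tiny stopword set are hypothetical, chosen only for illustration; in practice you would pass `set(stopwords.words('english'))`.

```python
import string


def remove_stopwords(tokens, stop_words, drop_punct=False):
    """Return the tokens whose lowercase form is not a stopword.

    If drop_punct is True, also drop tokens consisting only of punctuation.
    """
    kept = []
    for tok in tokens:
        # Compare in lowercase so "The" matches the stopword "the"
        if tok.lower() in stop_words:
            continue
        # Tokens like "(" or "." are not stopwords, so filter them separately
        if drop_punct and all(ch in string.punctuation for ch in tok):
            continue
        kept.append(tok)
    return kept


# Hypothetical tiny stopword set for illustration only
demo_stop_words = {"the", "of", "from", "with", "and"}
tokens = ["The", "results", "showed", "(", "MSAC", ")", "of", "SAC", "."]

print(remove_stopwords(tokens, demo_stop_words))
# ['results', 'showed', '(', 'MSAC', ')', 'SAC', '.']
print(remove_stopwords(tokens, demo_stop_words, drop_punct=True))
# ['results', 'showed', 'MSAC', 'SAC']
```

Only the comparison is lowercased, so the kept tokens retain their original capitalization. Hyphenated tokens such as "sludge-based" and numbers such as "279.147" are not dropped, because they contain characters other than punctuation.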

I'm a complete beginner myself, a humanities master's student,
and I'm just writing down how I worked through this.
