Natural Language Processing: NLTK

dff505045

Original link: http://www.cnblogs.com/lovely7/p/6144936.html

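import re
from nltk.tokenize import MWETokenizer

# Word pairs to merge into single tokens (joined with "_" by default).
tokenizer = MWETokenizer([('molecular', 'pathogenesis'), ('molecular', 'basis'), ('cognitive', 'assessment'),
                          ('clinical', 'intervention'), ('clinical', 'interventions'),
                          ('risk', 'factor'), ('risk', 'factors'), ('assisted', 'care')])

# titleandabstractList and i are defined elsewhere in the original script (not shown here).
all_the_text = titleandabstractList[i].lower()
# Strip double quotes, commas, and periods before splitting on whitespace.
all_the_text = re.sub(r'[",.]', '', all_the_text)
for word in tokenizer.tokenize(all_the_text.split()):
    print(word)  # placeholder body: the original snippet stops at the loop header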
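To make the effect of the multi-word tokenizer concrete, here is a rough pure-Python sketch of the merging step. It is not NLTK's implementation: `merge_mwes` is a hypothetical helper written for illustration, and the sample sentence is made up. It assumes the same cleanup as the snippet above (lowercase, strip quotes, commas, and periods) and joins matched words with an underscore, which is the default separator of `MWETokenizer`.

```python
import re


def merge_mwes(tokens, mwes, separator="_"):
    """Greedily merge the longest known multi-word expression at each position."""
    mwe_set = {tuple(m) for m in mwes}
    max_len = max((len(m) for m in mwe_set), default=1)
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest candidate window first, down to two tokens.
        for size in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + size]) in mwe_set:
                merged.append(separator.join(tokens[i:i + size]))
                i += size
                break
        else:
            # No expression starts here, so keep the single token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical sample data, standing in for one entry of titleandabstractList.
mwes = [("risk", "factors"), ("molecular", "pathogenesis"), ("assisted", "care")]
text = 'Risk factors, and the "molecular pathogenesis".'

cleaned = re.sub(r'[",.]', "", text.lower())
result = merge_mwes(cleaned.split(), mwes)
print(result)
```

Each merged expression comes back as one token such as `risk_factors`, so later steps like frequency counting treat the phrase as a single unit.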

Reposted from: https://www.cnblogs.com/lovely7/p/6144936.html