NLTK Stemming

无知书童

Article link: https://blog.csdn.net/qq_28404829/article/details/100181667

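In NLP, after a sentence or document has been split into tokens, the tokens are usually stemmed. Stemming reduces inflected English words to a common stem: plural nouns lose their plural ending, and verbs in different tenses and forms are reduced toward their base form. A stemmer does this by stripping suffixes according to rules, so the resulting stem is not always a dictionary word.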
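To make the description above concrete, here is a minimal sketch that stems a handful of single words with NLTK's `PorterStemmer`. The word list and the helper name `stem_words` are illustrative choices, not part of the original post, and the sketch assumes `nltk` is installed (this stemmer needs no corpus download).

```python
from nltk.stem import PorterStemmer


def stem_words(words):
    """Map each word to its Porter stem (hypothetical helper, for illustration only)."""
    stemmer = PorterStemmer()
    return {word: stemmer.stem(word) for word in words}


# Illustrative inputs: plural nouns and inflected verb forms
stems = stem_words(["rooms", "readers", "learning", "originated", "studies"])
for word, stem in stems.items():
    print(f"{word} -> {stem}")
```

Plural and verb endings are stripped, but a stem such as the one printed for "studies" need not be a real English word.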

NLTK ships several stemmers. This post uses two of them: "porter" (PorterStemmer) and "snowball" (SnowballStemmer).

import nltk

word_data = "It originated from the idea that there are readers who prefer learning new skills from the comforts of their drawing rooms"
# Naive whitespace tokenization; sufficient here because the sentence has no punctuation
words = word_data.split(" ")

porterStemmer = nltk.stem.PorterStemmer()
# SnowballStemmer supports several languages, so the language must be given
snowballStemmer = nltk.stem.SnowballStemmer('english')

def stem_tokens(tokens, stemmer):
    """Return a list with the stem of every token, using the given stemmer."""
    return [stemmer.stem(token) for token in tokens]

print(word_data)
print(stem_tokens(words, porterStemmer))
print(stem_tokens(words, snowballStemmer))

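Output (the original sentence, then the Porter stems, then the Snowball stems):

It originated from the idea that there are readers who prefer learning new skills from the comforts of their drawing rooms
['It', 'origin', 'from', 'the', 'idea', 'that', 'there', 'are', 'reader', 'who', 'prefer', 'learn', 'new', 'skill', 'from', 'the', 'comfort', 'of', 'their', 'draw', 'room']
['it', 'origin', 'from', 'the', 'idea', 'that', 'there', 'are', 'reader', 'who', 'prefer', 'learn', 'new', 'skill', 'from', 'the', 'comfort', 'of', 'their', 'draw', 'room']

The only difference in this run is the case of the first token: the Snowball stemmer lowercases its input. Newer NLTK releases appear to lowercase in PorterStemmer by default as well, so the first list may also start with 'it' depending on the installed version.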
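On the sentence above the two stemmers agree on every stem, so it can help to put them side by side on other words. The following is a rough sketch under stated assumptions: the helper `compare_stemmers` and the sample words are hypothetical and not from the original post, and `nltk` is assumed to be installed. It flags every word on which the two stemmers disagree without claiming in advance which ones those are.

```python
from nltk.stem import PorterStemmer, SnowballStemmer


def compare_stemmers(tokens):
    """Return (token, porter_stem, snowball_stem) tuples (hypothetical helper)."""
    porter = PorterStemmer()
    snowball = SnowballStemmer("english")
    return [(token, porter.stem(token), snowball.stem(token)) for token in tokens]


# Illustrative sample words, chosen only for demonstration
sample = ["rooms", "learning", "generously", "fairly"]
rows = compare_stemmers(sample)

for token, porter_stem, snowball_stem in rows:
    # Mark rows where the two algorithms produce different stems
    marker = "" if porter_stem == snowball_stem else "  <- differs"
    print(f"{token:12} {porter_stem:12} {snowball_stem:12}{marker}")
```

Running a comparison like this on a sample of your own vocabulary is a cheap way to decide which stemmer fits a given task.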
