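Python: getting the base form of an English word

I am trying to get the base English word for a word that has been modified from its base form. This question has been asked here before, but I did not see a proper answer, so I am trying to put it this way. I tried two stemmers and one lemmatizer from the NLTK package: the Porter stemmer, the Snowball stemmer, and the WordNet lemmatizer.

I tried this code: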

from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer
from nltk.stem.wordnet import WordNetLemmatizer

words = ['arrival', 'conclusion', 'ate']

# Run each word through both stemmers and the lemmatizer
for word in words:
    print(" Original Word =>", word)
    print("porter stemmer=>", PorterStemmer().stem(word))
    snowball_stemmer = SnowballStemmer("english")
    print("snowball stemmer=>", snowball_stemmer.stem(word))
    # No part of speech is passed here, so the lemmatizer treats each word as a noun
    print("WordNet Lemmatizer=>", WordNetLemmatizer().lemmatize(word))

This is the output I get:

Original Word => arrival
porter stemmer=> arriv
snowball stemmer=> arriv
WordNet Lemmatizer=> arrival

Original Word => conclusion
porter stemmer=> conclus
snowball stemmer=> conclus
WordNet Lemmatizer=> conclusion

Original Word => ate
porter stemmer=> ate
snowball stemmer=> ate
WordNet Lemmatizer=> ate

But I want this output:

Input : arrival
Output: arrive

Input : conclusion
Output: conclude

Input : ate
Output: eat

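How can I achieve this? Are there any tools available for it? I know this is called morphological analysis, but there must be some tools that already implement it. Thanks for any help :)

First edit

I tried this code: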

import nltk
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from nltk.corpus import wordnet as wn

query = "The Indian economy is the worlds tenth largest by nominal GDP and third largest by purchasing power parity"

# Helpers that group Penn Treebank tags by part of speech
def is_noun(tag):
    return tag in ['NN', 'NNS', 'NNP', 'NNPS']

def is_verb(tag):
    return tag in ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']

def is_adverb(tag):
    return tag in ['RB', 'RBR', 'RBS']

def is_adjective(tag):
    return tag in ['JJ', 'JJR', 'JJS']

# Convert a Penn Treebank tag to a WordNet part of speech, defaulting to noun
def penn_to_wn(tag):
    if is_adjective(tag):
        return wn.ADJ
    elif is_noun(tag):
        return wn.NOUN
    elif is_adverb(tag):
        return wn.ADV
    elif is_verb(tag):
        return wn.VERB
    return wn.NOUN

# Tag every token, then lemmatize it with the matching WordNet part of speech
tags = nltk.pos_tag(word_tokenize(query))
for tag in tags:
    wn_tag = penn_to_wn(tag[1])
    print(tag[0] + "---> " + WordNetLemmatizer().lemmatize(tag[0], wn_tag))

Here I tried to use the WordNet lemmatizer by giving it the appropriate part-of-speech tag for each word. This is the output:

The---> The
Indian---> Indian
economy---> economy
is---> be
the---> the
worlds---> world
tenth---> tenth
largest---> large
by---> by
nominal---> nominal
GDP---> GDP
and---> and
third---> third
largest---> large
by---> by
purchasing---> purchase
power---> power
parity---> parity

Still, words like "arrival" and "conclusion" are not handled by this approach. Is there any solution for this?
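
As a side note on the tag mapping in the code above, the four helper functions could probably be collapsed into a single prefix lookup, since every Penn Treebank tag listed there starts with JJ, NN, RB or VB. The following is only a minimal sketch: the names `PENN_PREFIX_TO_WN` and `penn_to_wn_compact` are made up for illustration, and the single-letter values are written out literally (they are the values of `wn.ADJ`, `wn.NOUN`, `wn.ADV` and `wn.VERB`) so the snippet runs without NLTK installed. It does not solve the "arrival"/"conclusion" problem; it only shortens the mapping step.

```python
# WordNet's single-letter part-of-speech constants, written out literally
# (assumption for this sketch: wn.ADJ == 'a', wn.NOUN == 'n',
# wn.ADV == 'r', wn.VERB == 'v').
WN_ADJ, WN_NOUN, WN_ADV, WN_VERB = 'a', 'n', 'r', 'v'

# Hypothetical lookup table keyed by the first two characters of a Penn tag.
PENN_PREFIX_TO_WN = {
    'JJ': WN_ADJ,   # JJ, JJR, JJS
    'NN': WN_NOUN,  # NN, NNS, NNP, NNPS
    'RB': WN_ADV,   # RB, RBR, RBS
    'VB': WN_VERB,  # VB, VBD, VBG, VBN, VBP, VBZ
}

def penn_to_wn_compact(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, defaulting to noun."""
    return PENN_PREFIX_TO_WN.get(tag[:2], WN_NOUN)

# Show the mapping for a few sample tags, including one (DT) that falls
# through to the noun default.
for sample in ['JJS', 'NNPS', 'RBR', 'VBD', 'DT']:
    print(sample, '->', penn_to_wn_compact(sample))
```

The result can be passed as the second argument to `WordNetLemmatizer().lemmatize(...)` in the same way as `wn_tag` above.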
