Working with WordNet in Python

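This article shows how to use Python's nltk library to work with WordNet, a semantically oriented dictionary of English that resembles a traditional thesaurus but has a richer structure. WordNet lets you look up synonyms, disambiguate word senses, and explore the hierarchy of the vocabulary. For example, 'motorcar' has only one sense, 'car.n.01', whose synonyms include 'car' and 'auto'. WordNet also provides definitions, example sentences, and hierarchical relations that help clarify how words relate to one another.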

>>> from nltk.corpus import wordnet as wn

>>> wn.synsets('motorcar')

>>> wn.synset('car.n.01').lemma_names()

2.5 WordNet

WordNet is a semantically oriented dictionary of English, similar to a traditional thesaurus but with a richer structure. NLTK includes the English WordNet, with 155,287 words and 117,659 synonym sets. We’ll begin by looking at synonyms and how they are accessed in WordNet.

Senses and Synonyms

Consider the sentence in (1a). If we replace the word motorcar in (1a) with automobile, to get (1b), the meaning of the sentence stays pretty much the same:

(1) a. Benz is credited with the invention of the motorcar.

b. Benz is credited with the invention of the automobile.

Since everything else in the sentence has remained unchanged, we can conclude that the words motorcar and automobile have the same meaning, i.e., they are synonyms.

We can explore these words with the help of WordNet:

>>> from nltk.corpus import wordnet as wn

>>> wn.synsets('motorcar')

[Synset('car.n.01')]

Thus, motorcar has just one possible meaning and it is identified as car.n.01, the first noun sense of car. The entity car.n.01 is called a synset, or “synonym set,” a collection of synonymous words (or “lemmas”):

>>> wn.synset('car.n.01').lemma_names()

['car', 'auto', 'automobile', 'machine', 'motorcar']
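The lookup above can be generalized to every sense of a word. The following is a minimal sketch, assuming NLTK 3 (where `name()` and `lemma_names()` are methods) and an installed WordNet corpus; the helper name `synonyms_by_sense` is illustrative and not part of NLTK.

```python
def synonyms_by_sense(word, wordnet):
    """Map each synset name for `word` to the list of its lemma names.

    `wordnet` is any object with a WordNet-style `synsets(word)` method,
    such as `nltk.corpus.wordnet`. This helper is hypothetical, not an NLTK API.
    """
    return {s.name(): list(s.lemma_names()) for s in wordnet.synsets(word)}


if __name__ == "__main__":
    try:
        from nltk.corpus import wordnet as wn

        # One line per sense: the synset name followed by its synonyms.
        for name, lemmas in synonyms_by_sense("motorcar", wn).items():
            print(name, lemmas)
    except (ImportError, LookupError):
        # NLTK or its WordNet corpus is missing in this environment.
        print("Install NLTK and run nltk.download('wordnet') first.")
```

Passing the corpus reader in as an argument keeps the helper independent of where the WordNet data comes from, which also makes it easy to try with a small stand-in object.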

Each word of a synset can have several meanings, e.g., car can also signify a train carriage, a gondola, or an elevator car. However, we are only interested in the single meaning that is common to all words of this synset. Synsets also come with a prose definition and some example sentences:

>>> wn.synset('car.n.01').definition()

'a motor vehicle with four wheels; usually propelled by an internal combustion engine'

>>> wn.synset('car.n.01').examples()

['he needs a car to get to work']
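Because a word such as car belongs to several synsets, it can help to list each sense next to its definition and examples. This is a rough sketch under the same assumptions as before (NLTK 3 method-style accessors, WordNet data installed); `describe_senses` is a hypothetical helper name.

```python
def describe_senses(word, wordnet):
    """Return one (synset name, definition, examples) tuple per sense of `word`.

    `wordnet` is expected to behave like `nltk.corpus.wordnet`.
    """
    return [
        (s.name(), s.definition(), list(s.examples()))
        for s in wordnet.synsets(word)
    ]


if __name__ == "__main__":
    try:
        from nltk.corpus import wordnet as wn

        for name, definition, examples in describe_senses("car", wn):
            print(f"{name}: {definition}")
            for sentence in examples:
                print(f"    e.g. {sentence}")
    except (ImportError, LookupError):
        # NLTK or its WordNet corpus is missing in this environment.
        print("Install NLTK and run nltk.download('wordnet') first.")
```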

Although definitions help humans to understand the intended meaning of a synset, the words of the synset are often more useful for our programs. To eliminate ambiguity, we will identify these words as car.n.01.automobile, car.n.01.motorcar, and so on. This pairing of a synset with a word is called a lemma. We can get all the lemmas for a given synset①, look up a particular lemma②, get the synset corresponding to a lemma③, and get the “name” of a lemma④:

>>> wn.synset('car.n.01').lemmas() # ①

[Lemma('car.n.01.car'), Lemma('car.n.01.auto'), Lemma('car.n.01.automobile'),

Lemma('car.n.01.machine'), Lemma('car.n.01.motorcar')]

>>> wn.lemma('car.n.01.automobile') # ②

Lemma('car.n.01.automobile')

>>> wn.lemma('car.n.01.automobile').synset() # ③

Synset('car.n.01')

>>> wn.lemma('car.n.01.automobile').name() # ④

'automobile'
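The identifier car.n.01.automobile packs two pieces of information: the synset (car.n.01) and the word within it (automobile). The pure-Python sketch below only illustrates that naming convention; it is not how NLTK parses identifiers internally, and `LemmaId` and `split_lemma_id` are made-up names. It assumes the part-of-speech tag is one of WordNet's single letters (n, v, a, s, r).

```python
import re
from typing import NamedTuple


class LemmaId(NamedTuple):
    synset: str  # e.g. 'car.n.01'
    word: str    # e.g. 'automobile'


# Layout: <headword>.<part-of-speech letter>.<sense number>.<word>
_LEMMA_ID = re.compile(r"^(?P<synset>.+?\.[nvasr]\.\d+)\.(?P<word>.+)$")


def split_lemma_id(identifier):
    """Split 'car.n.01.automobile' into its synset name and word."""
    match = _LEMMA_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a lemma identifier: {identifier!r}")
    return LemmaId(match.group("synset"), match.group("word"))


print(split_lemma_id("car.n.01.automobile"))
# prints: LemmaId(synset='car.n.01', word='automobile')
```

The two fields correspond to what the lemma object's synset and name lookups report in NLTK, so the sketch is mainly useful for reading identifiers without loading the corpus.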
