Working with WordNet in Python
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('motorcar')
>>> wn.synset('car.n.01').lemma_names()
2.5 WordNet
WordNet is a semantically oriented dictionary of English, similar to a traditional thesaurus but with a richer structure. NLTK includes the English WordNet, with 155,287 words and 117,659 synonym sets. We’ll begin by looking at synonyms and how they are accessed in WordNet.
Senses and Synonyms
Consider the sentence in (1a). If we replace the word motorcar in (1a) with automobile, to get (1b), the meaning of the sentence stays pretty much the same:
(1) a. Benz is credited with the invention of the motorcar.
b. Benz is credited with the invention of the automobile.
Since everything else in the sentence has remained unchanged, we can conclude that the words motorcar and automobile have the same meaning, i.e., they are synonyms.
We can explore these words with the help of WordNet:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('motorcar')
[Synset('car.n.01')]
Thus, motorcar has just one possible meaning and it is identified as car.n.01, the first noun sense of car. The entity car.n.01 is called a synset, or “synonym set,” a collection of synonymous words (or “lemmas”):
>>> wn.synset('car.n.01').lemma_names()
['car', 'auto', 'automobile', 'machine', 'motorcar']
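The substitution test above can also be phrased as a lookup: two words are synonymous in some sense if WordNet returns at least one synset for both of them. The following is a minimal sketch, assuming NLTK 3 or later with the WordNet corpus installed. `shared_synsets` is a hypothetical helper written for illustration, not part of NLTK.

```python
def shared_synsets(wordnet, word_a, word_b):
    """Return the synsets that WordNet gives for both words.

    `wordnet` is any object with a synsets(word) method,
    for example nltk.corpus.wordnet.
    """
    synsets_b = set(wordnet.synsets(word_b))
    # Keep the order in which WordNet lists the senses of word_a.
    return [s for s in wordnet.synsets(word_a) if s in synsets_b]


def main():
    # Imported here so the helper above stays usable without NLTK installed.
    from nltk.corpus import wordnet as wn

    print(shared_synsets(wn, 'motorcar', 'automobile'))
    print(shared_synsets(wn, 'motorcar', 'bicycle'))


if __name__ == "__main__":
    try:
        main()
    except (ImportError, LookupError) as err:
        # NLTK raises LookupError when the WordNet data has not been downloaded.
        print("WordNet is not available:", err)
```

An empty result only means the two words share no synset in WordNet. It says nothing about looser relations such as hypernymy.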
Each word of a synset can have several meanings, e.g., car can also signify a train carriage, a gondola, or an elevator car. However, we are only interested in the single meaning that is common to all words of this synset. Synsets also come with a prose definition and some example sentences:
>>> wn.synset('car.n.01').definition()
'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
>>> wn.synset('car.n.01').examples()
['he needs a car to get to work']
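To see the other meanings of car mentioned above (train carriage, gondola, elevator car), one can list every synset for the word together with its definition. This is a rough sketch, assuming NLTK 3 or later, where `name()` and `definition()` are methods. `describe_senses` is a hypothetical helper name chosen for this example.

```python
def describe_senses(wordnet, word):
    """Return (synset name, definition) pairs for every sense of `word`.

    `wordnet` is any object whose synsets(word) method returns objects
    with name() and definition() methods, for example nltk.corpus.wordnet.
    """
    return [(s.name(), s.definition()) for s in wordnet.synsets(word)]


def main():
    # Imported here so the helper above stays usable without NLTK installed.
    from nltk.corpus import wordnet as wn

    for name, definition in describe_senses(wn, 'car'):
        print(name, '-', definition)


if __name__ == "__main__":
    try:
        main()
    except (ImportError, LookupError) as err:
        # NLTK raises LookupError when the WordNet data has not been downloaded.
        print("WordNet is not available:", err)
```

Each printed line pairs a synset identifier such as car.n.01 with its definition, which makes it easy to pick out the one sense that a given synonym set is about.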
Although definitions help humans to understand the intended meaning of a synset, the words of the synset are often more useful for our programs. To eliminate ambiguity, we will identify these words as car.n.01.automobile, car.n.01.motorcar, and so on. This pairing of a synset with a word is called a lemma. We can get all the lemmas for a given synset ①, look up a particular lemma ②, get the synset corresponding to a lemma ③, and get the “name” of a lemma ④:
>>> wn.synset('car.n.01').lemmas()  # ①
[Lemma('car.n.01.car'), Lemma('car.n.01.auto'), Lemma('car.n.01.automobile'),
Lemma('car.n.01.machine'), Lemma('car.n.01.motorcar')]
>>> wn.lemma('car.n.01.automobile')  # ②
Lemma('car.n.01.automobile')
>>> wn.lemma('car.n.01.automobile').synset()  # ③