WordNet: Introduction and Usage

aodeng9367

Tags: python, artificial intelligence

Original article: http://www.cnblogs.com/wycg1984/archive/2009/09/15/1567247.html

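WordNet is a lexical database of English, a kind of dictionary. A word can have several different meanings, and each meaning corresponds to a different sense. Conversely, one sense can correspond to several words. For example, topic and subject are synonymous in some contexts. The words grouped under one sense, each with its ambiguity resolved, are called lemmas. For example, "publish" is a word that has several senses: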

1. (39) print, publish -- (put into print; "The newspaper published the news of the royal couple's divorce"; "These news should not be printed")

2. (14) publish, bring out, put out, issue, release -- (prepare and issue for public distribution or sale; "publish a magazine or newspaper")

3. (4) publish, write -- (have (one's written work) issued for publication; "How many books did Georges Simenon write?"; "She published 25 books during her long career")

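In sense 1, print and publish are both lemmas. The number 39 in parentheses after sense 1 is the number of times publish was tagged with sense 1 in an external sense-tagged corpus. Clearly, publish is used in sense 1 most of the time and only rarely in sense 3.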
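The many-to-many relation between words and senses can be sketched in plain Python, along with how the tag counts reveal the dominant sense. This is a toy model. It is not WordNet's storage format or NLTK's API. The sense ids publish#1 to publish#3 and the helpers word_index and sense_distribution are hypothetical names. The data are just the three senses and counts listed above.

```python
from collections import defaultdict

# Hypothetical toy data copied from the three "publish" senses listed above:
# sense id -> (lemmas in that sense, tag count of "publish" in that sense).
# The ids are made up for illustration; they are not real WordNet synset names.
SENSES = {
    "publish#1": (["print", "publish"], 39),
    "publish#2": (["publish", "bring out", "put out", "issue", "release"], 14),
    "publish#3": (["publish", "write"], 4),
}


def word_index(senses):
    """Invert sense -> lemmas into word -> senses (the many-to-many link)."""
    index = defaultdict(list)
    for sense_id, (lemmas, _count) in senses.items():
        for lemma in lemmas:
            index[lemma].append(sense_id)
    return dict(index)


def sense_distribution(senses):
    """Relative frequency of each sense, computed from the tag counts."""
    total = sum(count for _lemmas, count in senses.values())
    return {sid: count / total for sid, (_lemmas, count) in senses.items()}


index = word_index(SENSES)
print(index["publish"])  # ['publish#1', 'publish#2', 'publish#3']
print(index["print"])    # ['publish#1']

dist = sense_distribution(SENSES)
print(max(dist, key=dist.get))      # publish#1
print(round(dist["publish#1"], 3))  # 0.684
```

Inverting the sense → lemmas table is what lets one word reach several senses. The relative frequencies make the observation above concrete: sense 1 accounts for 39 of the 57 tagged occurrences.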

Using WordNet in Practice

NLTK is a natural language processing toolkit for Python that provides functions for accessing WordNet's features. Some commonly used ones are listed below.

Load WordNet itself. On first use, download the data once with import nltk; nltk.download('wordnet'):

from nltk.corpus import wordnet

Get all senses (synsets) of a word, including the senses of its inflected forms. Here "published" also matches the verb publish:

wordnet.synsets('published')

[Synset('print.v.01'),
 Synset('publish.v.02'),
 Synset('publish.v.03'),
 Synset('published.a.01'),
 Synset('promulgated.s.01')]
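The lookup above finds the verb senses of publish even though the query was "published". As far as I know, NLTK does this by first reducing the word to its dictionary forms with WordNet's morphy procedure, which applies suffix-detachment rules for each part of speech. The following is a stdlib-only toy sketch of that idea, not NLTK's code. INDEX, RULES, base_forms and synsets are hypothetical names. The index holds only the entries from the output above, and WordNet's exception lists for irregular forms are omitted.

```python
# Hypothetical hand-built index standing in for the WordNet database:
# (lemma, part of speech) -> synset names, copied from the output above.
# "a" here lumps adjectives and adjective satellites ("s") together.
INDEX = {
    ("publish", "v"): ["print.v.01", "publish.v.02", "publish.v.03"],
    ("published", "a"): ["published.a.01", "promulgated.s.01"],
}

# Suffix-detachment rules in the style of WordNet's morphy:
# (suffix to strip, replacement), tried for each part of speech.
RULES = {
    "v": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
          ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "a": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
}


def base_forms(word, pos):
    """Return the word itself plus any one-rule reduction that is in INDEX."""
    candidates = [word]
    for suffix, replacement in RULES[pos]:
        if word.endswith(suffix):
            candidates.append(word[: -len(suffix)] + replacement)
    forms = []
    for form in candidates:
        if (form, pos) in INDEX and form not in forms:
            forms.append(form)
    return forms


def synsets(word):
    """Collect synsets across parts of speech, verbs before adjectives."""
    result = []
    for pos in ("v", "a"):
        for form in base_forms(word, pos):
            for name in INDEX[(form, pos)]:
                if name not in result:
                    result.append(name)
    return result


print(base_forms("published", "v"))  # ['publish']
print(synsets("published"))
# ['print.v.01', 'publish.v.02', 'publish.v.03', 'published.a.01', 'promulgated.s.01']
```

For "published", the verb rules produce the candidates "publishe" and "publish", and only "publish" is in the index. The adjective lookup matches "published" directly. Together these reproduce the list above. With NLTK and its WordNet data installed, the real library's reduction can be checked with wordnet.morphy('published', wordnet.VERB).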