Retrieving words with Python: python – How to retrieve the target synse... in NLTK's WordNet

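For some reason, WordNet indexes antonym relations at the Lemma level rather than the Synset level (see http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&s=good&i=8&h=00001000000000000000000000000000#c), so the question is whether Synsets and Lemmas have a many-to-many or a one-to-one relationship.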

For an ambiguous word, that is, one word with many meanings, we have a one-to-many relationship from String to Synset, e.g.

>>> wn.synsets('dog')

[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]

For one meaning/concept with several surface forms, we have a one-to-many relationship from Synset to String (where String refers to the lemma name):

>>> dog = wn.synset('dog.n.1')

>>> dog.definition()

u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'

>>> dog.lemma_names()

[u'dog', u'domestic_dog', u'Canis_familiaris']

Note: so far we have been comparing the relationship between Strings and Synsets, not between Lemmas and Synsets.
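The two one-to-many relations above can be viewed side by side. The following is a minimal sketch, not part of the original answer: the helper name `senses_and_names` and the `demo` function are hypothetical, and the `wordnet` argument is assumed to be NLTK's `nltk.corpus.wordnet` module (or anything exposing the same `synsets()`, `name()` and `lemma_names()` methods).

```python
def senses_and_names(wordnet, word):
    """Map each synset name found for `word` to the lemma names of that synset.

    One string leads to many synsets (the dict keys), and each synset
    leads to many strings (the list stored under its key).
    """
    return {ss.name(): list(ss.lemma_names()) for ss in wordnet.synsets(word)}


def demo():
    # Assumes NLTK and its WordNet corpus are installed
    # (for example via nltk.download('wordnet')).
    from nltk.corpus import wordnet as wn
    for synset_name, names in senses_and_names(wn, 'dog').items():
        print(synset_name, names)
```

Calling `demo()` on a machine with the WordNet corpus available should list every sense of 'dog' together with the alternative names for that sense.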

The "cute" thing is that Lemma and String have a one-to-one relationship:

>>> wn.synsets('dog')

[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]

>>> wn.synsets('dog')[0]

Synset('dog.n.01')

>>> wn.synsets('dog')[0].definition()

u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'

>>> wn.synsets('dog')[0].lemmas()

[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]

>>> wn.synsets('dog')[0].lemmas()[0]

Lemma('dog.n.01.dog')

>>> wn.synsets('dog')[0].lemmas()[0].name()

u'dog'

Lemma attributes, accessible via methods with the same name::

name: The canonical name of this lemma.

synset: The synset that this lemma belongs to.

syntactic_marker: For adjectives, the WordNet string identifying the syntactic position relative modified noun. See: 07004. For all other parts of speech, this attribute is None.

count: The frequency of this lemma in wordnet.

So we can do the following, knowing that each Lemma object returns exactly one synset:

>>> wn.synsets('dog')[0].lemmas()[0]

Lemma('dog.n.01.dog')

>>> wn.synsets('dog')[0].lemmas()[0].synset()

Synset('dog.n.01')
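The claim that each Lemma points back to exactly one Synset can be checked for any given word. This is a small sketch under assumptions: `lemmas_point_back` and `demo` are hypothetical names introduced here, and `wordnet` is assumed to be `nltk.corpus.wordnet` or an object with the same `synsets()`, `lemmas()` and `synset()` methods.

```python
def lemmas_point_back(wordnet, word):
    """Return True if every lemma of every synset of `word`
    reports that same synset as its owner."""
    return all(
        lemma.synset() == ss
        for ss in wordnet.synsets(word)
        for lemma in ss.lemmas()
    )


def demo():
    # Assumes NLTK and its WordNet corpus are installed
    # (for example via nltk.download('wordnet')).
    from nltk.corpus import wordnet as wn
    print(lemmas_point_back(wn, 'dog'))
```

The check only covers the word passed in; it illustrates the Lemma-to-Synset direction and does not prove anything about the whole database.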

Suppose you are doing some sentiment analysis and need the antonyms of every adjective in WordNet; you can easily collect the Synsets of the antonyms:

>>> from itertools import chain

>>> from nltk.corpus import wordnet as wn

>>> all_adj_in_wn = wn.all_synsets(pos='a')

>>> def get_antonyms(ss):

...     # Collect the synset of every antonym of every lemma in ss.

...     return set(chain(*[[a.synset() for a in l.antonyms()] for l in ss.lemmas()]))

...

>>> for ss in all_adj_in_wn:

...     print ss, ':', get_antonyms(ss)

...

Synset('able.a.01') : set([Synset('unable.a.01')])
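The transcript above uses Python 2 syntax. For readers on Python 3, here is a hedged sketch of the same idea. `antonym_table` and `demo` are hypothetical helpers added for illustration, and skipping synsets without antonyms is a choice made here, not something the original answer does.

```python
from itertools import chain


def get_antonyms(ss):
    """Return the set of synsets reached through the antonyms of ss's lemmas."""
    return set(chain.from_iterable(
        (ant.synset() for ant in lemma.antonyms()) for lemma in ss.lemmas()
    ))


def antonym_table(synsets):
    """Map each synset that has at least one antonym to its antonym synsets."""
    table = {}
    for ss in synsets:
        antonyms = get_antonyms(ss)
        if antonyms:  # skip synsets whose lemmas list no antonym
            table[ss] = antonyms
    return table


def demo(limit=5):
    # Assumes NLTK and its WordNet corpus are installed
    # (for example via nltk.download('wordnet')).
    from itertools import islice
    from nltk.corpus import wordnet as wn
    table = antonym_table(wn.all_synsets(pos='a'))
    for ss, antonyms in islice(table.items(), limit):
        print(ss, ':', antonyms)
```

Because antonymy is stored on lemmas, the lookup has to go Synset to Lemma, Lemma to antonym Lemma, and then back to Synset through `synset()`, which is exactly the path `get_antonyms` follows.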
