Stanford Universal Dependencies with Python NLTK

Is there any way I can get the Universal dependencies using Python or NLTK? I can only produce the parse tree.

Example:

Input sentence:

My dog also likes eating sausage.

Output:

Universal dependencies:

nmod:poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)

Solution

Wordseer's stanford-corenlp-python fork is a good start, as it works with the recent CoreNLP release (3.5.2). However, it gives you raw output that you need to transform manually. For example, given you have the wrapper running:

>>> import json, jsonrpclib
>>> from pprint import pprint
>>>
>>> server = jsonrpclib.Server("http://localhost:8080")
>>>
>>> pprint(json.loads(server.parse('John loves Mary.'))) # doctest: +SKIP
{u'sentences': [{u'dependencies': [[u'root', u'ROOT', u'0', u'loves', u'2'],
                                   [u'nsubj', u'loves', u'2', u'John', u'1'],
                                   [u'dobj', u'loves', u'2', u'Mary', u'3'],
                                   [u'punct', u'loves', u'2', u'.', u'4']],
                 u'parsetree': [],
                 u'text': u'John loves Mary.',
                 u'words': [[u'John',
                             {u'CharacterOffsetBegin': u'0',
                              u'CharacterOffsetEnd': u'4',
                              u'Lemma': u'John',
                              u'PartOfSpeech': u'NNP'}],
                            [u'loves',
                             {u'CharacterOffsetBegin': u'5',
                              u'CharacterOffsetEnd': u'10',
                              u'Lemma': u'love',
                              u'PartOfSpeech': u'VBZ'}],
                            [u'Mary',
                             {u'CharacterOffsetBegin': u'11',
                              u'CharacterOffsetEnd': u'15',
                              u'Lemma': u'Mary',
                              u'PartOfSpeech': u'NNP'}],
                            [u'.',
                             {u'CharacterOffsetBegin': u'15',
                              u'CharacterOffsetEnd': u'16',
                              u'Lemma': u'.',
                              u'PartOfSpeech': u'.'}]]}]}
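
If all you are after is the plain "relation(head-index, dependent-index)" notation from the question, you can format it straight from this raw output. A minimal sketch, assuming the same `server` proxy is running; it relies only on the five-element layout of each `dependencies` entry shown above:

>>> result = json.loads(server.parse('My dog also likes eating sausage.'))
>>> for sentence in result['sentences']:
...     # Each entry: [relation, head word, head index, dependent word, dependent index].
...     for rel, head, head_n, word, n in sentence['dependencies']:
...         print('%s(%s-%s, %s-%s)' % (rel, head, head_n, word, n))

Note that whether the labels come out as Universal Dependencies (e.g. nmod:poss) or an older dependency scheme depends on the CoreNLP version and configuration behind the wrapper.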

In case you want to use the dependency parser, you can reuse NLTK's DependencyGraph with a bit of effort:

>>> import jsonrpclib, json
>>> from nltk.parse import DependencyGraph
>>> from pprint import pprint
>>>
>>> server = jsonrpclib.Server("http://localhost:8080")
>>> parses = json.loads(
...     server.parse(
...         'John loves Mary. '
...         'I saw a man with a telescope. '
...         'Ballmer has been vocal in the past warning that Linux is a threat to Microsoft.'
...     )
... )['sentences']
>>>
>>> def transform(sentence):
...     for rel, _, head, word, n in sentence['dependencies']:
...         n = int(n)
...
...         word_info = sentence['words'][n - 1][1]
...         tag = word_info['PartOfSpeech']
...         lemma = word_info['Lemma']
...         if rel == 'root':
...             # NLTK expects that the root relation is labelled as ROOT!
...             rel = 'ROOT'
...
...         # Hack: Return values we don't know as '_'.
...         # Also, consider tag and ctag to be equal.
...         # n is used to sort words as they appear in the sentence.
...         yield n, '_', word, lemma, tag, tag, '_', head, rel, '_', '_'
...
>>> dgs = [
...     DependencyGraph(
...         ' '.join(items)  # NLTK expects an iterable of strings...
...         for n, *items in sorted(transform(parse))
...     )
...     for parse in parses
... ]
>>>
>>> # Play around with the information we've got.
>>>
>>> pprint(list(dgs[0].triples()))
[(('loves', 'VBZ'), 'nsubj', ('John', 'NNP')),
 (('loves', 'VBZ'), 'dobj', ('Mary', 'NNP')),
 (('loves', 'VBZ'), 'punct', ('.', '.'))]
>>>
>>> print(dgs[1].tree())
(saw I (man a (with (telescope a))) .)
>>>
>>> print(dgs[2].to_conll(4)) # doctest: +NORMALIZE_WHITESPACE
Ballmer NNP 4 nsubj
has VBZ 4 aux
been VBN 4 cop
vocal JJ 0 ROOT
in IN 4 prep
the DT 8 det
past JJ 8 amod
warning NN 5 pobj
that WDT 13 dobj
Linux NNP 13 nsubj
is VBZ 13 cop
a DT 13 det
threat NN 8 rcmod
to TO 13 prep
Microsoft NNP 14 pobj
. . 4 punct
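
As a side note, newer NLTK releases also ship a CoreNLP client of their own (nltk.parse.corenlp), which talks to the stock CoreNLP HTTP server and makes both the jsonrpclib wrapper and the transform() shim above unnecessary. A minimal sketch, assuming a CoreNLP server is already listening on localhost:9000:

>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>>
>>> # Assumes a server started along the lines of:
>>> #   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
>>> parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> dg, = parser.raw_parse('My dog also likes eating sausage.')
>>> print(dg.to_conll(4)) # doctest: +SKIP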
