Stanford Universal Dependencies with Python NLTK

Is there any way I can get the Universal dependencies using Python or NLTK? I can only produce the parse tree.

Example:

Input sentence:

My dog also likes eating sausage.

Output:

Universal dependencies:

nmod:poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)

Solution

Wordseer's stanford-corenlp-python fork is a good start, as it works with the recent CoreNLP release (3.5.2). However, it gives you raw output that you need to transform manually. For example, given you have the wrapper running:

>>> import json, jsonrpclib
>>> from pprint import pprint
>>>
>>> server = jsonrpclib.Server("http://localhost:8080")
>>>
>>> pprint(json.loads(server.parse('John loves Mary.'))) # doctest: +SKIP
{u'sentences': [{u'dependencies': [[u'root', u'ROOT', u'0', u'loves', u'2'],
                                   [u'nsubj', u'loves', u'2', u'John', u'1'],
                                   [u'dobj', u'loves', u'2', u'Mary', u'3'],
                                   [u'punct', u'loves', u'2', u'.', u'4']],
                 u'parsetree': [],
                 u'text': u'John loves Mary.',
                 u'words': [[u'John',
                             {u'CharacterOffsetBegin': u'0',
                              u'CharacterOffsetEnd': u'4',
                              u'Lemma': u'John',
                              u'PartOfSpeech': u'NNP'}],
                            [u'loves',
                             {u'CharacterOffsetBegin': u'5',
                              u'CharacterOffsetEnd': u'10',
                              u'Lemma': u'love',
                              u'PartOfSpeech': u'VBZ'}],
                            [u'Mary',
                             {u'CharacterOffsetBegin': u'11',
                              u'CharacterOffsetEnd': u'15',
                              u'Lemma': u'Mary',
                              u'PartOfSpeech': u'NNP'}],
                            [u'.',
                             {u'CharacterOffsetBegin': u'15',
                              u'CharacterOffsetEnd': u'16',
                              u'Lemma': u'.',
                              u'PartOfSpeech': u'.'}]]}]}
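
If all you are after is the plain "relation(head-index, dependent-index)" notation from the question, you can format it straight from this raw output. A minimal sketch, assuming the same `server` proxy is running; it relies only on the five-element layout of each `dependencies` entry shown above:

>>> result = json.loads(server.parse('My dog also likes eating sausage.'))
>>> for sentence in result['sentences']:
...     # Each entry: [relation, head word, head index, dependent word, dependent index].
...     for rel, head, head_n, word, n in sentence['dependencies']:
...         print('%s(%s-%s, %s-%s)' % (rel, head, head_n, word, n))

Note that whether the labels come out as Universal Dependencies (e.g. nmod:poss) or an older dependency scheme depends on the CoreNLP version and configuration behind the wrapper.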

In case you want to use the dependency parser, you can reuse NLTK's DependencyGraph with a bit of effort:

>>> import jsonrpclib, json
>>> from nltk.parse import DependencyGraph
>>> from pprint import pprint
>>>
>>> server = jsonrpclib.Server("http://localhost:8080")
>>> parses = json.loads(
...     server.parse(
...         'John loves Mary. '
...         'I saw a man with a telescope. '
...         'Ballmer has been vocal in the past warning that Linux is a threat to Microsoft.'
...     )
... )['sentences']
>>>
>>> def transform(sentence):
...     for rel, _, head, word, n in sentence['dependencies']:
...         n = int(n)
...
...         word_info = sentence['words'][n - 1][1]
...         tag = word_info['PartOfSpeech']
...         lemma = word_info['Lemma']
...         if rel == 'root':
...             # NLTK expects that the root relation is labelled as ROOT!
...             rel = 'ROOT'
...
...         # Hack: Return values we don't know as '_'.
...         # Also, consider tag and ctag to be equal.
...         # n is used to sort words as they appear in the sentence.
...         yield n, '_', word, lemma, tag, tag, '_', head, rel, '_', '_'
...
>>> dgs = [
...     DependencyGraph(
...         ' '.join(items)  # NLTK expects an iterable of strings...
...         for n, *items in sorted(transform(parse))
...     )
...     for parse in parses
... ]
>>>
>>> # Play around with the information we've got.
>>>
>>> pprint(list(dgs[0].triples()))
[(('loves', 'VBZ'), 'nsubj', ('John', 'NNP')),
 (('loves', 'VBZ'), 'dobj', ('Mary', 'NNP')),
 (('loves', 'VBZ'), 'punct', ('.', '.'))]
>>>
>>> print(dgs[1].tree())
(saw I (man a (with (telescope a))) .)
>>>
>>> print(dgs[2].to_conll(4)) # doctest: +NORMALIZE_WHITESPACE
Ballmer NNP 4 nsubj
has VBZ 4 aux
been VBN 4 cop
vocal JJ 0 ROOT
in IN 4 prep
the DT 8 det
past JJ 8 amod
warning NN 5 pobj
that WDT 13 dobj
Linux NNP 13 nsubj
is VBZ 13 cop
a DT 13 det
threat NN 8 rcmod
to TO 13 prep
Microsoft NNP 14 pobj
. . 4 punct
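
As a side note, newer NLTK releases also ship a CoreNLP client of their own (nltk.parse.corenlp), which talks to the stock CoreNLP HTTP server and makes both the jsonrpclib wrapper and the transform() shim above unnecessary. A minimal sketch, assuming a CoreNLP server is already listening on localhost:9000:

>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>>
>>> # Assumes a server started along the lines of:
>>> #   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
>>> parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> dg, = parser.raw_parse('My dog also likes eating sausage.')
>>> print(dg.to_conll(4)) # doctest: +SKIP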
