Python NLP: Anaphora resolution in stanford-nlp using Python

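Here is one possible solution that uses the data structure output by CoreNLP, which provides all the information needed. It is not a complete solution and will probably need extending to handle every case, but it is a good starting point.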
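The code below relies on only a few fields of CoreNLP's JSON output. The dictionary here is an abridged, hand-written stand-in for that output. Its values are illustrative rather than real server output, and real mentions and tokens carry more fields. `describe_chains` is a hypothetical helper that walks the `corefs` chains the same way `resolve` does, converting the 1-based `sentNum` and `startIndex` into list indices:

```python
# Abridged, hand-written stand-in for CoreNLP's JSON output (illustrative only).
# 'corefs' maps a chain id to its mentions; sentNum and startIndex are 1-based.
sample_output = {
    'corefs': {
        '1': [
            {'text': 'Tom', 'type': 'PROPER', 'sentNum': 1, 'startIndex': 1},
            {'text': 'He', 'type': 'PRONOMINAL', 'sentNum': 2, 'startIndex': 1},
        ],
    },
    'sentences': [
        {'tokens': [
            {'word': 'Tom', 'lemma': 'Tom', 'pos': 'NNP', 'after': ' '},
            {'word': 'sleeps', 'lemma': 'sleep', 'pos': 'VBZ', 'after': ''},
            {'word': '.', 'lemma': '.', 'pos': '.', 'after': ' '},
        ]},
        {'tokens': [
            {'word': 'He', 'lemma': 'he', 'pos': 'PRP', 'after': ' '},
            {'word': 'snores', 'lemma': 'snore', 'pos': 'VBZ', 'after': ''},
            {'word': '.', 'lemma': '.', 'pos': '.', 'after': ''},
        ]},
    ],
}


def describe_chains(corenlp_output):
    """Return, per chain, (mention text, mention type, word at the mention's start token)."""
    chains = []
    for mentions in corenlp_output['corefs'].values():
        chain = []
        for mention in mentions:
            # Convert the 1-based sentNum/startIndex into 0-based list indices.
            sentence = corenlp_output['sentences'][mention['sentNum'] - 1]
            token = sentence['tokens'][mention['startIndex'] - 1]
            chain.append((mention['text'], mention['type'], token['word']))
        chains.append(chain)
    return chains


print(describe_chains(sample_output))
# [[('Tom', 'PROPER', 'Tom'), ('He', 'PRONOMINAL', 'He')]]
```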

```python
import json

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')


def resolve(corenlp_output):
    """ Transfer the word form of the antecedent to its associated pronominal anaphor(s) """
    for coref in corenlp_output['corefs']:
        mentions = corenlp_output['corefs'][coref]
        antecedent = mentions[0]  # the antecedent is the first mention in the coreference chain
        for j in range(1, len(mentions)):
            mention = mentions[j]
            if mention['type'] == 'PRONOMINAL':
                # get the attributes of the target mention in the corresponding sentence
                target_sentence = mention['sentNum']
                target_token = mention['startIndex'] - 1
                # transfer the antecedent's word form to the appropriate token in the sentence
                corenlp_output['sentences'][target_sentence - 1]['tokens'][target_token]['word'] = antecedent['text']


def print_resolved(corenlp_output):
    """ Print the "resolved" output """
    possessives = ['hers', 'his', 'their', 'theirs']
    for sentence in corenlp_output['sentences']:
        for token in sentence['tokens']:
            output_word = token['word']
            # check lemmas as well as tags for possessive pronouns in case of tagging errors
            if token['lemma'] in possessives or token['pos'] == 'PRP$':
                output_word += "'s"  # add the possessive morpheme
            output_word += token['after']
            print(output_word, end='')


text = "Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but " \
       "hers is blue. It is older than hers. The big cat ate its dinner."

output = nlp.annotate(text, properties={'annotators': 'dcoref', 'outputFormat': 'json', 'ner.useSUTime': 'false'})
# pycorenlp hands back the raw response text instead of a dict when its own JSON decoding fails
if isinstance(output, str):
    output = json.loads(output)

resolve(output)

print('Original:', text)
print('Resolved: ', end='')
print_resolved(output)
```
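`resolve` treats the first mention in each chain as the antecedent. That fails on cataphora such as "When he arrived, Tom sat.", where the pronoun comes first. The first mention is then itself a pronoun, so nothing useful gets substituted. CoreNLP's JSON output flags one mention per chain with `isRepresentativeMention`, and that mention is typically the most informative one, such as a proper name. Below is a hedged variant that prefers the flagged mention and falls back to the first mention when no mention carries the flag. The names `pick_antecedent` and `resolve_with_representative` are hypothetical, and the small document is hand-written for illustration:

```python
def pick_antecedent(mentions):
    """Prefer the mention flagged as representative; otherwise use the first one."""
    for mention in mentions:
        if mention.get('isRepresentativeMention'):
            return mention
    return mentions[0]


def resolve_with_representative(corenlp_output):
    """Variant of resolve() that substitutes the representative mention's text."""
    for mentions in corenlp_output['corefs'].values():
        antecedent = pick_antecedent(mentions)
        for mention in mentions:
            # Skip the antecedent itself and every non-pronominal mention.
            if mention is antecedent or mention['type'] != 'PRONOMINAL':
                continue
            sentence = corenlp_output['sentences'][mention['sentNum'] - 1]
            sentence['tokens'][mention['startIndex'] - 1]['word'] = antecedent['text']


# Hand-written example of a cataphoric chain: "When he arrived, Tom sat."
doc = {
    'corefs': {'1': [
        {'text': 'he', 'type': 'PRONOMINAL', 'sentNum': 1, 'startIndex': 2},
        {'text': 'Tom', 'type': 'PROPER', 'sentNum': 1, 'startIndex': 5,
         'isRepresentativeMention': True},
    ]},
    'sentences': [{'tokens': [{'word': w} for w in
                              ['When', 'he', 'arrived', ',', 'Tom', 'sat', '.']]}],
}
resolve_with_representative(doc)
print([t['word'] for t in doc['sentences'][0]['tokens']])
# ['When', 'Tom', 'arrived', ',', 'Tom', 'sat', '.']
```

With `mentions[0]` as the antecedent, this chain would leave "he" unchanged, because the only other mention is a proper noun.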

This gives the following output:

```
Original: Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but hers is blue. It is older than hers. The big cat ate its dinner.
Resolved: Tom and Jane are good friends. Tom and Jane are cool. Tom knows a lot of things and so does Jane. Tom's car is red, but Jane's is blue. His car is older than Jane's. The big cat ate The big cat's dinner.
```

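As you can see, this solution does not correct the capitalization when a pronoun's antecedent is sentence-initial and therefore title-cased: the last sentence gets "The big cat" where it should be "the big cat". The right fix depends on the antecedent's category. A common-noun antecedent needs lowercasing, while a proper-noun antecedent does not.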
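One way to approximate the common-noun versus proper-noun distinction is to look at the Penn Treebank POS tag of the antecedent's first token. The helper below assumes a tag of `NNP` or `NNPS` signals a proper noun. In that case it keeps the casing; otherwise it lowercases the first letter, except for "I". When the pronoun itself opens a sentence, it capitalizes instead. `case_adjusted_text` is a hypothetical helper, not part of CoreNLP, and the sample tokens are hand-written:

```python
def case_adjusted_text(corenlp_output, antecedent, target_token_index):
    """Return the antecedent's text with its first letter cased for the target slot.

    Hypothetical helper. Assumes CoreNLP-style dicts: 1-based sentNum/startIndex
    on the antecedent mention, Penn Treebank tags in token['pos'], and a 0-based
    position of the pronoun being replaced within its own sentence.
    """
    text = antecedent['text']
    if not text:
        return text
    if target_token_index == 0:
        # Sentence-initial slot: capitalise the first letter whatever the antecedent is.
        return text[0].upper() + text[1:]
    sentence = corenlp_output['sentences'][antecedent['sentNum'] - 1]
    first_token = sentence['tokens'][antecedent['startIndex'] - 1]
    if first_token['pos'] in ('NNP', 'NNPS') or first_token['word'] == 'I':
        # Proper nouns (and "I") keep the casing they had in the original text.
        return text
    # Common-noun phrase moved mid-sentence: "The big cat" -> "the big cat".
    return text[0].lower() + text[1:]


# Hand-written tokens for "The big cat met Tom ."
sample = {'sentences': [{'tokens': [
    {'word': 'The', 'pos': 'DT'}, {'word': 'big', 'pos': 'JJ'},
    {'word': 'cat', 'pos': 'NN'}, {'word': 'met', 'pos': 'VBD'},
    {'word': 'Tom', 'pos': 'NNP'}, {'word': '.', 'pos': '.'},
]}]}
cat = {'text': 'The big cat', 'sentNum': 1, 'startIndex': 1}
tom = {'text': 'Tom', 'sentNum': 1, 'startIndex': 5}

print(case_adjusted_text(sample, cat, 3))  # the big cat
print(case_adjusted_text(sample, tom, 3))  # Tom
```

In `resolve`, the assignment would then use `case_adjusted_text(corenlp_output, antecedent, target_token)` in place of `antecedent['text']`. Checking the first token's tag rather than the mention type also keeps phrases such as "Tom's car" from being lowercased to "tom's car".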

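Some other ad hoc handling may be necessary (as with the possessives in my test sentences). The code also assumes you do not want to reuse the original output tokens, since it modifies them in place. One way around this is to copy the original data structure, or to store the result in a new attribute and change the print_resolved function accordingly.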
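Below is a sketch of the new-attribute option. Antecedents are stored under an extra token key, `'resolved'`, a name chosen here for illustration, so `token['word']` keeps the original form. The text is returned as a string rather than printed. The other option mentioned above would be to run the original `resolve` on `copy.deepcopy(output)`. The function names and the sample document below are hypothetical:

```python
POSSESSIVES = ['hers', 'his', 'their', 'theirs']


def resolve_into_attribute(corenlp_output, key='resolved'):
    """Store each antecedent under a new token key instead of overwriting 'word'."""
    for mentions in corenlp_output['corefs'].values():
        antecedent = mentions[0]  # same choice of antecedent as resolve()
        for mention in mentions[1:]:
            if mention['type'] == 'PRONOMINAL':
                sentence = corenlp_output['sentences'][mention['sentNum'] - 1]
                sentence['tokens'][mention['startIndex'] - 1][key] = antecedent['text']


def resolved_text(corenlp_output, key='resolved'):
    """Build the resolved text as a string; the original 'word' values stay untouched."""
    pieces = []
    for sentence in corenlp_output['sentences']:
        for token in sentence['tokens']:
            if key in token:
                word = token[key]
                # Only possessive pronouns that were actually replaced receive 's.
                if token['lemma'] in POSSESSIVES or token['pos'] == 'PRP$':
                    word += "'s"
            else:
                word = token['word']
            pieces.append(word + token['after'])
    return ''.join(pieces)


def tok(word, lemma, pos, after):
    return {'word': word, 'lemma': lemma, 'pos': pos, 'after': after}


# Hand-written document for "Tom sleeps. He loves his bed."
sample = {
    'corefs': {'1': [
        {'text': 'Tom', 'type': 'PROPER', 'sentNum': 1, 'startIndex': 1},
        {'text': 'He', 'type': 'PRONOMINAL', 'sentNum': 2, 'startIndex': 1},
        {'text': 'his', 'type': 'PRONOMINAL', 'sentNum': 2, 'startIndex': 3},
    ]},
    'sentences': [
        {'tokens': [tok('Tom', 'Tom', 'NNP', ' '), tok('sleeps', 'sleep', 'VBZ', ''),
                    tok('.', '.', '.', ' ')]},
        {'tokens': [tok('He', 'he', 'PRP', ' '), tok('loves', 'love', 'VBZ', ' '),
                    tok('his', 'he', 'PRP$', ' '), tok('bed', 'bed', 'NN', ''),
                    tok('.', '.', '.', '')]},
    ],
}

resolve_into_attribute(sample)
print(resolved_text(sample))                       # Tom sleeps. Tom loves Tom's bed.
print(sample['sentences'][1]['tokens'][0]['word'])  # He
```

Because the replacement lives under its own key, this version can also tell which tokens were actually substituted. As written, `print_resolved` appends "'s" to any `PRP$` token, even one that `resolve` never replaced.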

Correcting any coreference resolution errors is another challenge altogether!
