Java named entity extraction: how to extract named entities and verbs from text

My aim is to extract each named entity (Person) and the verbs connected to it from a text. For example, I have this text:

Dumbledore turned and walked back down the street. Harry Potter rolled over inside his blankets without waking up.

Ideally, I should get:

Dumbledore turned walked; Harry Potter rolled

I use Stanford NER to find and mark persons, then I delete all sentences that don't contain a named entity. So, in the end I have a 'pure' text that consists only of sentences with names of characters.

After that I use Stanford Dependencies. As a result I get something like this (CoNLL-U output format):

1 Dumbledore _ _ NN _ 2 nsubj _ _
2 turned _ _ VBD _ 0 root _ _
3 and _ _ CC _ 2 cc _ _
4 walked _ _ VBD _ 2 conj _ _
5 back _ _ RB _ 4 advmod _ _
6 down _ _ IN _ 8 case _ _
7 the _ _ DT _ 8 det _ _
8 street _ _ NN _ 4 nmod _ _
9 . _ _ . _ 2 punct _ _

1 Harry _ _ NNP _ 2 compound _ _
2 Potter _ _ NNP _ 3 nsubj _ _
3 rolled _ _ VBD _ 0 root _ _
4 over _ _ IN _ 3 compound:prt _ _
5 inside _ _ IN _ 7 case _ _
6 his _ _ PRP$ _ 7 nmod:poss _ _
7 blankets _ _ NNS _ 3 nmod _ _
8 without _ _ IN _ 9 mark _ _
9 waking _ _ VBG _ 3 advcl _ _
10 up _ _ RP _ 9 compound:prt _ _
11 . _ _ . _ 3 punct _ _

And that's where all my problems start. I know the person and the verb, but I have no idea how to extract them from this format.

I guess I can do it this way: find each NN/NNP token in the table, find its 'parent' (the head it points to), and then extract all the 'child' words of that parent. Theoretically it should work. Theoretically.

Can anyone suggest another way to get a person and its action from the text? Or is there a more rational way to do it?

I'll be very grateful for any help!

Solution

Here is some sample code to help with your problem:

import java.io.*;
import java.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.util.*;

public class NERAndVerbExample {

    public static void main(String[] args) throws IOException {
        // Build a pipeline that runs NER, dependency parsing, and entity mention detection
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,depparse,entitymentions");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "John Smith went to the store.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        System.out.println("---");
        System.out.println("text: " + text);
        System.out.println("");
        System.out.println("dependency edges:");
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            SemanticGraph sg = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
            // One line per edge: governor word,index,tag,ner - relation -> dependent word,index,tag,ner
            for (SemanticGraphEdge sge : sg.edgeListSorted()) {
                System.out.println(
                    sge.getGovernor().word() + "," + sge.getGovernor().index() + "," + sge.getGovernor().tag() + ","
                    + sge.getGovernor().ner()
                    + " - " + sge.getRelation().getLongName()
                    + " -> "
                    + sge.getDependent().word() + ","
                    + sge.getDependent().index() + "," + sge.getDependent().tag() + "," + sge.getDependent().ner());
            }
            System.out.println();
            System.out.println("entity mentions:");
            // One line per mention: text, index of its last token, NER tag
            for (CoreMap entityMention : sentence.get(CoreAnnotations.MentionsAnnotation.class)) {
                int lastTokenIndex = entityMention.get(CoreAnnotations.TokensAnnotation.class).size() - 1;
                System.out.println(entityMention.get(CoreAnnotations.TextAnnotation.class) +
                    "\t" +
                    entityMention.get(CoreAnnotations.TokensAnnotation.class)
                        .get(lastTokenIndex).get(CoreAnnotations.IndexAnnotation.class) + "\t" +
                    entityMention.get(CoreAnnotations.NamedEntityTagAnnotation.class));
            }
        }
    }
}

I'm hoping to add some syntactic sugar to Stanford CoreNLP 3.8.0 to assist with working with the entity mentions.

To explain this code a bit: the entitymentions annotator goes through the tokens and groups adjacent tokens with the same NER tag together. So "John Smith" gets marked as a single entity mention.
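To make the grouping idea concrete, here is a minimal sketch in plain Java that does not use CoreNLP at all. The class `MentionGrouping` and the method `groupMentions` are hypothetical names made up for illustration, and the logic is a simplification: it only merges adjacent tokens that carry the same non-"O" tag, which is an assumption about the general idea, not a copy of what the real annotator does internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Stanford CoreNLP.
public class MentionGrouping {

    /**
     * Groups adjacent tokens that share the same non-"O" NER tag.
     * Each result entry has the form "mention text<TAB>tag".
     */
    public static List<String> groupMentions(String[] words, String[] nerTags) {
        List<String> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < words.length; i++) {
            String tag = nerTags[i];
            if (!tag.equals(currentTag)) {
                // The tag changed: close the open mention, if there is one
                if (!currentTag.equals("O")) {
                    mentions.add(current + "\t" + currentTag);
                }
                current.setLength(0);
                currentTag = tag;
            }
            if (!tag.equals("O")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words[i]);
            }
        }
        // Close a mention that runs to the end of the sentence
        if (!currentTag.equals("O")) {
            mentions.add(current + "\t" + currentTag);
        }
        return mentions;
    }

    public static void main(String[] args) {
        String[] words = {"John", "Smith", "went", "to", "the", "store", "."};
        String[] tags = {"PERSON", "PERSON", "O", "O", "O", "O", "O"};
        for (String mention : groupMentions(words, tags)) {
            System.out.println(mention);
        }
    }
}
```

With the sample sentence, the two PERSON tokens are merged into one mention, "John Smith", tagged PERSON.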

If you go through the dependency graph, you can get the index of each word.

Likewise if you access the list of tokens for an entity mention, you can also find the index of each word for the entity mention.

With a little more code you can link those together by token index and form the entity mention/verb pairs you were asking for.
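One possible way to do that linking is sketched below in plain Java, so it runs without CoreNLP or its models. Everything here is an assumption made for illustration: `MentionVerbLinker`, `Edge`, and `verbsFor` are hypothetical names, `Edge` is just a stand-in for the governor index, relation, and dependent index you would read from each `SemanticGraphEdge`, and the relation strings are the short names shown in the question's CoNLL-U output (`nsubj`, `conj`). The rule it applies is: take the verb that governs an `nsubj` edge pointing into the mention, then add any verbs joined to it by `conj`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Stanford CoreNLP.
public class MentionVerbLinker {

    /** Stand-in for a dependency edge: governor index, relation short name, dependent index. */
    public static class Edge {
        final int governor;
        final String relation;
        final int dependent;

        public Edge(int governor, String relation, int dependent) {
            this.governor = governor;
            this.relation = relation;
            this.dependent = dependent;
        }
    }

    /**
     * Returns, in sentence order, the verbs whose subject is one of the mention's tokens,
     * plus verbs coordinated with them. words and tags are 1-indexed (slot 0 is unused).
     */
    public static List<String> verbsFor(Set<Integer> mentionTokens, List<Edge> edges,
                                        String[] words, String[] tags) {
        // TreeSet keeps token indexes sorted, so verbs come out in sentence order
        Set<Integer> verbs = new TreeSet<>();
        for (Edge e : edges) {
            if (e.relation.equals("nsubj") && mentionTokens.contains(e.dependent)
                    && tags[e.governor].startsWith("VB")) {
                verbs.add(e.governor);
            }
        }
        // Second pass: verbs attached to a subject verb by a conj edge ("turned and walked")
        Set<Integer> conjoined = new TreeSet<>();
        for (Edge e : edges) {
            if (e.relation.equals("conj") && verbs.contains(e.governor)
                    && tags[e.dependent].startsWith("VB")) {
                conjoined.add(e.dependent);
            }
        }
        verbs.addAll(conjoined);
        List<String> result = new ArrayList<>();
        for (int index : verbs) {
            result.add(words[index]);
        }
        return result;
    }

    public static void main(String[] args) {
        // First sentence from the question, copied from its CoNLL-U rows
        String[] words = {"", "Dumbledore", "turned", "and", "walked", "back", "down", "the", "street", "."};
        String[] tags = {"", "NN", "VBD", "CC", "VBD", "RB", "IN", "DT", "NN", "."};
        List<Edge> edges = Arrays.asList(
            new Edge(2, "nsubj", 1), new Edge(2, "cc", 3), new Edge(2, "conj", 4),
            new Edge(4, "advmod", 5), new Edge(8, "case", 6), new Edge(8, "det", 7),
            new Edge(4, "nmod", 8), new Edge(2, "punct", 9));
        Set<Integer> dumbledore = new HashSet<>(Arrays.asList(1));
        System.out.println("Dumbledore " + String.join(" ", verbsFor(dumbledore, edges, words, tags)));
    }
}
```

For the first sentence this yields "Dumbledore turned walked". For a multi-token mention such as "Harry Potter", pass both token indexes in the set; the `nsubj` edge points at "Potter", which is enough to find "rolled". Verbs attached by other relations, such as the `advcl` on "waking", are deliberately left out by this rule.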

As you can see in the current code it is quite cumbersome to access info for an entity mention, so I am going to try to improve that in 3.8.0.
