1. Getting started with Stanford CoreNLP

The core CoreNLP package is built around two classes: Annotation and Annotator.

An Annotation is the data structure that holds the results of Annotators; Annotations are basically maps. Annotators are more like functions, except that they operate on Annotations rather than on Objects.
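To make the "Annotations are maps" idea concrete, here is a minimal sketch of a map that is keyed by classes and returns a value whose type is declared by the key. It is a toy illustration only: Key, WordKey, IndexKey, and ToyCoreMap are hypothetical names invented for this example, not CoreNLP classes, and the real CoreMap is considerably richer.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration only: NOT CoreNLP's real CoreMap implementation.
class TypedMapSketch {

    // A key is a class; its type parameter names the type of the value it maps to.
    interface Key<V> {}

    // Hypothetical keys for a token's text and its position in the sentence.
    static final class WordKey implements Key<String> {}
    static final class IndexKey implements Key<Integer> {}

    // A map indexed by key classes; get() returns the value type declared by the key.
    static final class ToyCoreMap {
        private final Map<Class<?>, Object> values = new HashMap<>();

        <V> void set(Class<? extends Key<V>> key, V value) {
            values.put(key, value);
        }

        @SuppressWarnings("unchecked")
        <V> V get(Class<? extends Key<V>> key) {
            // Safe as long as values are only stored through set().
            return (V) values.get(key);
        }
    }

    public static void main(String[] args) {
        ToyCoreMap token = new ToyCoreMap();
        token.set(WordKey.class, "Shanghai");
        token.set(IndexKey.class, 11);

        String word = token.get(WordKey.class);     // no cast needed at the call site
        Integer index = token.get(IndexKey.class);
        System.out.println(word + "-" + index);     // prints Shanghai-11
    }
}
```

The call token.get(WordKey.class) has the same shape as the token.get(TextAnnotation.class) calls in the example further down: the class passed in selects the entry and fixes the return type.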

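Annotators perform tasks such as tokenization, parsing, named entity recognition (NER), and part-of-speech (POS) tagging. Annotators and Annotations are combined in AnnotationPipelines. The StanfordCoreNLP class extends AnnotationPipeline and is customized with NLP Annotators. The output of the Annotators is accessed through CoreMap and CoreLabel.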
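The relationship between annotators and a pipeline can be sketched in a few lines. This is a simplified model under stated assumptions: ToyAnnotator, WhitespaceTokenizer, TokenCounter, and ToyPipeline are hypothetical names, and the annotation is a plain string-keyed map to keep the example short. It only shows the idea that each annotator reads what earlier ones stored and adds its own result, which is why the order of annotators matters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: these types are hypothetical and far simpler than CoreNLP's.
class PipelineSketch {

    // An annotator reads what earlier annotators stored and adds its own result.
    interface ToyAnnotator {
        void annotate(Map<String, Object> annotation);
    }

    // Splits the raw text on whitespace and stores the tokens.
    static final class WhitespaceTokenizer implements ToyAnnotator {
        public void annotate(Map<String, Object> annotation) {
            String text = (String) annotation.get("text");
            annotation.put("tokens", Arrays.asList(text.trim().split("\\s+")));
        }
    }

    // Depends on "tokens", so it has to run after the tokenizer.
    static final class TokenCounter implements ToyAnnotator {
        public void annotate(Map<String, Object> annotation) {
            List<?> tokens = (List<?>) annotation.get("tokens");
            annotation.put("tokenCount", tokens.size());
        }
    }

    // Runs its annotators, in the order they were added, on the same annotation.
    static final class ToyPipeline {
        private final List<ToyAnnotator> annotators = new ArrayList<>();

        ToyPipeline add(ToyAnnotator annotator) {
            annotators.add(annotator);
            return this;
        }

        void annotate(Map<String, Object> annotation) {
            for (ToyAnnotator annotator : annotators) {
                annotator.annotate(annotation);
            }
        }
    }

    // Wraps the text in a fresh annotation and runs the two-step pipeline on it.
    static Map<String, Object> run(String text) {
        Map<String, Object> annotation = new HashMap<>();
        annotation.put("text", text);
        new ToyPipeline()
                .add(new WhitespaceTokenizer())
                .add(new TokenCounter())
                .annotate(annotation);
        return annotation;
    }

    public static void main(String[] args) {
        Map<String, Object> annotation = run("Shanghai had 200 banks");
        System.out.println(annotation.get("tokens"));
        System.out.println(annotation.get("tokenCount"));
    }
}
```

Swapping the two add() calls would make TokenCounter fail, because "tokens" would not be there yet. The same reasoning explains why an annotator list such as "tokenize,ssplit,pos" is written with the prerequisites first.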

1. Create a StanfordCoreNLP object with StanfordCoreNLP(Properties props).

2. Annotate arbitrary text with annotate(Annotation document).

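import java.util.List;
import java.util.Map;
import java.util.Properties;

// Assumption: the coref classes are imported from the dcoref package, which
// matches the "dcoref" annotator used below. Newer CoreNLP releases moved them
// to edu.stanford.nlp.coref (CorefChain to edu.stanford.nlp.coref.data);
// adjust these two imports to your version.
import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;

public class WordSeg {
	public static void main(String[] args) {
		// Configure a pipeline with POS tagging, lemmatization, NER, parsing,
		// and coreference resolution
		Properties props = new Properties();
		props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,dcoref");
		// Build the StanfordCoreNLP object from these properties
		StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

		String text = "Until the 19th century and the first opium war, Shanghai was considered to be essentially a fishing village. However, in 1914, Shanghai had 200 banks dealing with 80% of its foreign investments in China.";
		// Create an Annotation that so far holds only the text above
		Annotation document = new Annotation(text);
		System.out.println("Empty Annotation:" + document);
		// Run all the annotators configured above on the text
		pipeline.annotate(document);

		// These are all the sentences in the text.
		// A CoreMap is a map that uses class objects as keys and has values with custom types
		List<CoreMap> sentences = document.get(SentencesAnnotation.class);

		for (CoreMap sentence : sentences) {
			System.out.println("sentence:" + sentence);
			// A CoreLabel is a CoreMap with additional token-specific methods
			for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
				System.out.println("token:" + token);
				// The text of the token (the word)
				String word = token.get(TextAnnotation.class);
				System.out.println("word:" + word);
				// The POS tag of the token
				String pos = token.get(PartOfSpeechAnnotation.class);
				System.out.println("pos:" + pos);
				// The NER label of the token
				String ne = token.get(NamedEntityTagAnnotation.class);
				System.out.println("ne:" + ne);
			}
			// The parse tree of the sentence
			Tree tree = sentence.get(TreeAnnotation.class);
			System.out.println(tree);
			// The dependency graph of the sentence
			SemanticGraph dependencies = sentence.get(CollapsedDependenciesAnnotation.class);
			System.out.println(dependencies);
		}
		// The coreference chains, keyed by chain ID
		Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
		System.out.println(graph);
	}
}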
Part of the output is shown below:
sentence:Until the 19th century and the first opium war, Shanghai was considered to be essentially a fishing village.
token:Until-1
word:Until
pos:IN
ne:O
token:the-2
word:the
pos:DT
ne:DATE
token:19th-3
word:19th
pos:JJ
ne:DATE
token:century-4
word:century
pos:NN
ne:DATE
token:and-5
word:and
pos:CC
ne:O
// Parse tree
(ROOT (S (PP (IN Until) (NP (NP (DT the) (JJ 19th) (NN century)) (CC and) (NP (DT the) (JJ first) (NN opium) (NN war)))) (, ,) (NP (NNP Shanghai)) (VP (VBD was) (VP (VBN considered) (S (VP (TO to) (VP (VB be) (NP (RB essentially) (DT a) (NN fishing) (NN village))))))) (. .)))
// Dependency graph
-> considered/VBN (root)
  -> century/NN (nmod:until)
    -> Until/IN (case)
    -> the/DT (det)
    -> 19th/JJ (amod)
    -> and/CC (cc)
    -> war/NN (conj:and)
      -> the/DT (det)
      -> first/JJ (amod)
      -> opium/NN (compound)
  -> ,/, (punct)
  -> Shanghai/NNP (nsubjpass)
  -> was/VBD (auxpass)
  -> village/NN (xcomp)
    -> to/TO (mark)
    -> be/VB (cop)
    -> essentially/RB (advmod)
    -> a/DT (det)
    -> fishing/NN (compound)
  -> ./. (punct)
// Coreference chains
{1=CHAIN1-["first" in sentence 1], 2=CHAIN2-["Shanghai" in sentence 1, "Shanghai" in sentence 2], 3=CHAIN3-["the 19th century and the first opium war" in sentence 1], 4=CHAIN4-["the 19th century" in sentence 1], 5=CHAIN5-["the first opium war" in sentence 1], 6=CHAIN6-["essentially a fishing village" in sentence 1], 8=CHAIN8-["200" in sentence 2], 9=CHAIN9-["China" in sentence 2], 10=CHAIN10-["1914" in sentence 2, "its" in sentence 2], 11=CHAIN11-["200 banks dealing with 80 % of its foreign investments in China" in sentence 2], 12=CHAIN12-["its foreign investments" in sentence 2]}
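The bracketed parse tree in the output above nests each word inside a (TAG word) pair, with phrase labels such as NP and VP wrapped around them. As a reading aid, the sketch below pulls those innermost pairs out of such a string with a regular expression. It is a standalone illustration, not part of CoreNLP: TreeLeavesSketch and taggedWords are hypothetical names, and it assumes tags and words contain no parentheses or whitespace, which holds for the tree shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone reading aid for bracketed parse trees; not a CoreNLP class.
class TreeLeavesSketch {

    // Matches an innermost "(TAG word)" pair: no nested parentheses inside.
    private static final Pattern PRETERMINAL =
            Pattern.compile("\\(([^()\\s]+)\\s+([^()\\s]+)\\)");

    // Returns "word/TAG" strings in the order the words appear in the tree.
    static List<String> taggedWords(String bracketedTree) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PRETERMINAL.matcher(bracketedTree);
        while (matcher.find()) {
            result.add(matcher.group(2) + "/" + matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String tree = "(ROOT (S (NP (NNP Shanghai)) (VP (VBD was) (NP (DT a) (NN village))) (. .)))";
        for (String taggedWord : taggedWords(tree)) {
            System.out.println(taggedWord);
        }
    }
}
```

Phrase nodes such as (NP ...) are skipped because their second element starts with an opening parenthesis, so only the word-level pairs match.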


If the following error appears when you run the program:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

the fix is to add this dependency to pom.xml:

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>1.7.12</version>
</dependency>

