1. Getting Started with Stanford CoreNLP

The CoreNLP core package is built around two classes: Annotation and Annotator.

Annotations are the data structures that hold the results produced by Annotators; an Annotation is essentially a map. Annotators are more like functions, except that they operate on Annotations rather than on arbitrary Objects.

Annotators can tokenize, parse, recognize named entities (NER), and tag parts of speech (POS). Annotators and Annotations are combined in AnnotationPipelines: StanfordCoreNLP extends the AnnotationPipeline class and configures its own NLP Annotators. Annotator output is retrieved through CoreMap and CoreLabel.
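The "an Annotation is essentially a map" idea means a map keyed by Class objects, so each annotation type is stored and retrieved type-safely. A minimal illustrative sketch of that pattern (TinyCoreMap is a hypothetical toy class, not part of CoreNLP; the real CoreMap keys are CoreAnnotation classes):

```java
import java.util.HashMap;
import java.util.Map;

// A toy version of the CoreMap idea: a heterogeneous map keyed by
// Class objects, so each value comes back with its declared type.
class TinyCoreMap {
    private final Map<Class<?>, Object> data = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        data.put(key, value);
    }

    <T> T get(Class<T> key) {
        // key.cast() recovers the static type without an unchecked cast
        return key.cast(data.get(key));
    }
}

public class TinyCoreMapDemo {
    public static void main(String[] args) {
        TinyCoreMap token = new TinyCoreMap();
        token.set(String.class, "Shanghai");   // e.g. the token text
        token.set(Integer.class, 11);          // e.g. a token index
        System.out.println(token.get(String.class));   // Shanghai
        System.out.println(token.get(Integer.class));  // 11
    }
}
```

This is why calls like `token.get(TextAnnotation.class)` in the example below return a `String` without any casting at the call site.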

1. Create a StanfordCoreNLP object via StanfordCoreNLP(Properties props).

2. Annotate arbitrary text via annotate(Annotation document).
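Before the example below will compile and run, the CoreNLP jar and its models need to be on the classpath. A minimal Maven sketch, assuming version 3.6.0 (adjust to your version; the models jar is published under the `models` classifier):

```xml
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.6.0</version>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.6.0</version>
    <classifier>models</classifier>
</dependency>
```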

import java.util.List;
import java.util.Map;
import java.util.Properties;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;

public class WordSeg {
	public static void main(String[] args) {
		// Create a StanfordCoreNLP pipeline with POS tagging, lemmatization,
		// NER, parsing, and coreference resolution
		Properties props = new Properties();
		props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,dcoref");
		StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

		String text = "Until the 19th century and the first opium war, Shanghai was considered to be essentially a fishing village. However, in 1914, Shanghai had 200 banks dealing with 80% of its foreign investments in China.";
		// Create an empty Annotation from the text
		Annotation document = new Annotation(text);
		System.out.println("empty Annotation: " + document);
		// Run all the annotators configured above over the text
		pipeline.annotate(document);

		// All the sentences in the text
		// (a CoreMap maps Class-object keys to custom value types)
		List<CoreMap> sentences = document.get(SentencesAnnotation.class);

		for (CoreMap sentence : sentences) {
			System.out.println("sentence: " + sentence);
			// A CoreLabel is a CoreMap with extra token-specific methods
			for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
				System.out.println("token: " + token);
				// The text of the token (the word itself)
				String word = token.get(TextAnnotation.class);
				System.out.println("word: " + word);
				// The POS tag of the token
				String pos = token.get(PartOfSpeechAnnotation.class);
				System.out.println("pos: " + pos);
				// The NER label of the token
				String ne = token.get(NamedEntityTagAnnotation.class);
				System.out.println("ne: " + ne);
			}
			// The parse tree of the sentence
			Tree tree = sentence.get(TreeAnnotation.class);
			System.out.println(tree);
			// The dependency graph of the sentence
			SemanticGraph dependencies = sentence.get(CollapsedDependenciesAnnotation.class);
			System.out.println(dependencies);
		}
		// The coreference chains of the document
		Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
		System.out.println(graph);
	}
}
Part of the output is shown below:
sentence:Until the 19th century and the first opium war, Shanghai was considered to be essentially a fishing village.
token:Until-1
word:Until
pos:IN
ne:O
token:the-2
word:the
pos:DT
ne:DATE
token:19th-3
word:19th
pos:JJ
ne:DATE
token:century-4
word:century
pos:NN
ne:DATE
token:and-5
word:and
pos:CC
ne:O
// parse tree
(ROOT (S (PP (IN Until) (NP (NP (DT the) (JJ 19th) (NN century)) (CC and) (NP (DT the) (JJ first) (NN opium) (NN war)))) (, ,) (NP (NNP Shanghai)) (VP (VBD was) (VP (VBN considered) (S (VP (TO to) (VP (VB be) (NP (RB essentially) (DT a) (NN fishing) (NN village))))))) (. .)))
// dependency graph
-> considered/VBN (root)
  -> century/NN (nmod:until)
    -> Until/IN (case)
    -> the/DT (det)
    -> 19th/JJ (amod)
    -> and/CC (cc)
    -> war/NN (conj:and)
      -> the/DT (det)
      -> first/JJ (amod)
      -> opium/NN (compound)
  -> ,/, (punct)
  -> Shanghai/NNP (nsubjpass)
  -> was/VBD (auxpass)
  -> village/NN (xcomp)
    -> to/TO (mark)
    -> be/VB (cop)
    -> essentially/RB (advmod)
    -> a/DT (det)
    -> fishing/NN (compound)
  -> ./. (punct)
// coreference chains
{1=CHAIN1-["first" in sentence 1], 2=CHAIN2-["Shanghai" in sentence 1, "Shanghai" in sentence 2], 3=CHAIN3-["the 19th century and the first opium war" in sentence 1], 4=CHAIN4-["the 19th century" in sentence 1], 5=CHAIN5-["the first opium war" in sentence 1], 6=CHAIN6-["essentially a fishing village" in sentence 1], 8=CHAIN8-["200" in sentence 2], 9=CHAIN9-["China" in sentence 2], 10=CHAIN10-["1914" in sentence 2, "its" in sentence 2], 11=CHAIN11-["200 banks dealing with 80 % of its foreign investments in China" in sentence 2], 12=CHAIN12-["its foreign investments" in sentence 2]}


If you see the following error when running:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

the fix is to add the following dependency to pom.xml:

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>1.7.12</version>
</dependency>

