Code
// openie depends on the tokenize, ssplit, pos, and depparse annotators
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class Try1 {
public static void main(String[] args){
Properties props = new Properties(); // Properties is a map-like key/value structure
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, depparse, natlog, openie");
/* tokenize  Tokenizes the text into a sequence of tokens. For Chinese this means word
 *           segmentation (splitting a sentence into words); for English it is simpler.
 * ssplit    Splits a sequence of tokens into sentences.
 * cleanxml  Removes most or all XML tags from the document.
 * truecase  Determines the likely true case of tokens in text.
 * pos       Part of speech: labels tokens with their POS tag (CC, DT, JJR, TO, VB, etc.).
 * lemma     Lemmatization: reduces each word to its base form, e.g. sings -> sing, is -> be.
 * gender    Adds likely gender information to names.
 * ner       Named entity recognition: recognizes seven entity types:
 *           Time, Location, Organization, Person, Money, Percent, Date.
 * parse     Finds the syntactic structure of a sentence: which words group together,
 *           and which words are the subject or object of a given verb.
 * depparse  Neural-network dependency parser (an alternative to parse).
 * sentiment Sentiment analysis with a compositional model over trees using deep learning.
 * natlog    Natural logic, e.g. "some cute rabbits are small" -> "some rabbits are small".
 * dcoref    Coreference resolution: implements mention detection and both pronominal
 *           and nominal coreference resolution.
 * openie    Open information extraction: extracts relation triples.
 */
/*
 * Choose one of the two pipeline initializations below
 */
StanfordCoreNLP pipeline = new StanfordCoreNLP(props); // initialize the pipeline from props
// StanfordCoreNLP pipeline = new StanfordCoreNLP(
// PropertiesUtils.asProperties(
// "annotators", "tokenize,ssplit,pos,lemma,ner,depparse,natlog,openie",
// "ssplit.isOneSentence", "true",
// "parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz",
// "tokenize.language", "en"));
String text = "Stanford University is located in Stanford which is one of the best universities in 2015.\n";
Annotation doc = new Annotation(text); // wrap the raw string in an Annotation
pipeline.annotate(doc);
/* StanfordCoreNLP.annotate(Annotation) runs the whole annotator pipeline
 * over the string; doc now holds the processed result
 */
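// A hedged sketch, not part of the original post: after annotate(), the relation
// triples produced by the "openie" annotator can be read per sentence via
// NaturalLogicAnnotations.RelationTriplesAnnotation (this additionally requires
// importing edu.stanford.nlp.naturalli.NaturalLogicAnnotations and
// edu.stanford.nlp.ie.util.RelationTriple). Kept commented out so it does not
// change what this program prints:
//
// for (CoreMap s : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
//     for (RelationTriple triple : s.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class)) {
//         System.out.println(triple.confidence + "\t"
//                 + triple.subjectLemmaGloss() + "\t"
//                 + triple.relationLemmaGloss() + "\t"
//                 + triple.objectLemmaGloss());
//     }
// }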
int sentNo = 0;
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
for(CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)){ // for each sentence in doc
System.out.println("Sentence #" + ++sentNo + ": " + sentence.get(TextAnnotation.class)); // print the sentence text
// each element of SentencesAnnotation is one sentence produced by ssplit
System.out.println("word\tpos\tlemma\tner");
// a CoreLabel is a CoreMap with additional token-specific methods
for(CoreLabel token : sentence.get(TokensAnnotation.class)){ // for each token of the sentence
String word = token.get(TextAnnotation.class); // the token's surface text
String lema = token.get(LemmaAnnotation.class); // the token's lemma