Calling the Stanford Parser from Java in Eclipse

Latest recommended article published 2022-06-13 15:53:04

Johnsonzx

Tags: java eclipse nlp

Reposted from http://blog.sina.com.cn/s/blog_8af106960101a64w.html

Official Stanford Parser site: http://nlp.stanford.edu/software/lex-parser.shtml#Download

==================================================================================

Download

Official site -> Download Stanford Parser version 3.2.0 -> stanford-parser-full-2013-06-20.zip

==================================================================================

Extract: stanford-parser-full-2013-06-20.zip -> stanford-parser-full

The relevant steps inside that folder:

-> extract stanford-parser.jar -> stanford-parser folder

-> extract stanford-parser-3.2.0-models.jar -> stanford-parser-3.2.0-models folder -> copy englishPCFG.ser.gz from edu\stanford\nlp\models\lexparser in that folder into the project's source folder.
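The manual extract-and-copy step above could also be done programmatically. The following is only a sketch using the standard java.util.jar API, not part of the Stanford distribution; the class name ModelExtractor and the method extractEntry are hypothetical, and the jar location is passed in by the caller. Note that entry names inside a jar use forward slashes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: copies a single entry out of a jar into a target folder.
class ModelExtractor {

    static Path extractEntry(Path jar, String entryName, Path targetDir) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            JarEntry entry = jarFile.getJarEntry(entryName);
            if (entry == null) {
                throw new IOException("Entry not found in jar: " + entryName);
            }
            Files.createDirectories(targetDir);
            // Keep only the file name, e.g. englishPCFG.ser.gz
            String fileName = entryName.substring(entryName.lastIndexOf('/') + 1);
            Path target = targetDir.resolve(fileName);
            try (InputStream in = jarFile.getInputStream(entry)) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return target;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to stanford-parser-3.2.0-models.jar (supplied by the caller)
        Path written = extractEntry(
                Paths.get(args[0]),
                "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
                Paths.get("source"));
        System.out.println("Model copied to " + written.toAbsolutePath());
    }
}
```

Run from the project root, this would place the model in the same source folder that the manual copy targets.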

==================================================================================

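Usage:

Reference 1: http://sbp810050504.blog.51cto.com/2799422/778398

Reference 2: http://blog.sina.com.cn/s/blog_59e0c16f0100ufsv.html

First, put stanford-parser.jar into the project's lib folder, then right-click it and choose Build Path -> Add to Build Path. Only stanford-parser.jar needs to be added.

Reference 1 also adds englishPCFG.ser.gz to Referenced Libraries. That is wrong! It caused an error locally, so do not add it.

In addition, I had to write an absolute path when initializing the parser, which also differs from the references.

Update: copying englishPCFG.ser.gz into the project's source folder is enough; a relative path then works.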
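Whether a relative path such as source/englishPCFG.ser.gz works depends on the working directory of the running program (for an Eclipse run configuration this is normally the project root, but that is an assumption about the setup). A minimal sketch for checking where the path actually resolves before loading the model is shown here; ModelPathCheck and its methods are hypothetical names, and only the standard java.nio.file API is used.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: shows where a relative model path resolves and whether it is readable.
class ModelPathCheck {

    // Resolves modelPath against baseDir; an absolute modelPath is returned unchanged.
    static Path resolveModel(Path baseDir, String modelPath) {
        return baseDir.resolve(modelPath).toAbsolutePath().normalize();
    }

    // True only if the path points at an existing, readable regular file.
    static boolean isUsableModel(Path model) {
        return Files.isRegularFile(model) && Files.isReadable(model);
    }

    public static void main(String[] args) {
        Path workingDir = Paths.get(System.getProperty("user.dir"));
        Path model = resolveModel(workingDir, "source/englishPCFG.ser.gz");
        System.out.println(model + (isUsableModel(model) ? " found" : " NOT found"));
    }
}
```

If the printed path is not inside the project, the relative path passed to loadModel will likely fail in the same way, and either the working directory or the path needs adjusting.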

Project layout:

  stanfordpatser (project name)
   src (code)
     stanfordparser (package)
       Parser.java (class)
   lib
     stanford-parser.jar
   source
     englishPCFG.ser.gz

==================================================================================

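Calling code in the local .java file

import java.io.StringReader;
import java.util.List;

import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.PTBTokenizer;

// The statements below belong inside a method such as main.

// Absolute path used locally at first:
// LexicalizedParser lp = LexicalizedParser.loadModel("...\\stanford-parser-full\\englishPCFG.ser.gz");

// A relative path is enough
LexicalizedParser lp = LexicalizedParser.loadModel("source/englishPCFG.ser.gz");

// Tokenize the raw sentence, then parse the token list
String subsen = "One beer later and I'm walking down the street smoking a cig with them";
PTBTokenizer<Word> ptb = PTBTokenizer.newPTBTokenizer(new StringReader(subsen));
List<Word> words = ptb.tokenize();
System.out.println(lp.parse(words));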
=================

Result:

(ROOT (S (NP (NP (CD One) (NN beer) (RB later)) (CC and) (NP (PRP I))) (VP (VBP 'm) (VP (VBG walking) (PRT (RP down)) (NP (NP (DT the) (NN street)) (VP (VBG smoking) (NP (DT a) (NN cig)) (PP (IN with) (NP (PRP them)))))))))
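The bracketed result is a Penn Treebank style tree in which each innermost pair has the form (TAG word). As an illustration of how to read it, the sketch below pulls those pairs out of the printed string with a regular expression. It works on the text only and does not use the parser's Tree API; PennTagReader and taggedTokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts word/TAG pairs from a bracketed parse string.
class PennTagReader {

    // Matches innermost "(TAG word)" pairs: neither part may contain spaces or parentheses.
    private static final Pattern LEAF = Pattern.compile("\\(([^\\s()]+) ([^\\s()]+)\\)");

    static List<String> taggedTokens(String bracketed) {
        List<String> result = new ArrayList<>();
        Matcher m = LEAF.matcher(bracketed);
        while (m.find()) {
            // group(1) is the part-of-speech tag, group(2) is the word
            result.add(m.group(2) + "/" + m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String tree = "(ROOT (S (NP (DT This)) (VP (VBZ is) (NP (DT an) (JJ easy) (NN sentence))) (. .)))";
        System.out.println(taggedTokens(tree));
    }
}
```

For real code it is usually better to walk the Tree object returned by the parser; the string approach is only meant to make the output format easier to follow.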

==================================================================================

Bundled sample ParserDemo.java

import java.io.StringReader;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.process.TokenizerFactory;
import edu.stanford.nlp.trees.*;

class ParserDemo {

  public static void main(String[] args) {
    // Absolute path used at first:
    // LexicalizedParser lp = LexicalizedParser.loadModel("D:\\my download\\Parser\\Stanford Parser\\stanford-parser-full\\englishPCFG.ser.gz");

    // A relative path is enough
    LexicalizedParser lp = LexicalizedParser.loadModel("source/englishPCFG.ser.gz");
    demoAPI(lp);
  }

  public static void demoAPI(LexicalizedParser lp) {
    // Block 1: this option shows parsing a list of correctly tokenized words
    String[] sent = { "This", "is", "an", "easy", "sentence", "." };
    List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
    Tree parse = lp.apply(rawWords);
    parse.pennPrint();
    System.out.println();

    // Block 2: this option shows loading and using an explicit tokenizer
    String sent2 = "This is another sentence.";
    TokenizerFactory<CoreLabel> tokenizerFactory =
      PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
    List<CoreLabel> rawWords2 =
      tokenizerFactory.getTokenizer(new StringReader(sent2)).tokenize();
    parse = lp.apply(rawWords2);

    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
    System.out.println(tdl);

    // for (TypedDependency td : tdl) {
    //   System.out.println(td);        // the full dependency, e.g. nsubj(sentence-4, This-1)
    //   System.out.println(td.gov());  // the governor, e.g. sentence-4
    //   System.out.println(td.dep());  // the dependent, e.g. This-1
    //   System.out.println(td.reln()); // the relation, e.g. nsubj
    // }
    System.out.println();

    // Block 3: print the tree and its dependencies with TreePrint
    TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
    tp.printTree(parse);
  }
}
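The comments in the demo describe a printed dependency such as nsubj(sentence-4, This-1) as a relation, a governor, and a dependent, where the number after each hyphen is the token's position in the sentence. The sketch below splits such a printed line into those parts. It works on the printed text only and is not the TypedDependency API; DependencyLine is a hypothetical class, and the pattern assumes the simple relation(word-index, word-index) form shown in the comments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class for one printed dependency such as "nsubj(sentence-4, This-1)".
class DependencyLine {

    // relation(governor-index, dependent-index)
    private static final Pattern FORMAT =
            Pattern.compile("(\\w+)\\((.+)-(\\d+), (.+)-(\\d+)\\)");

    final String relation;
    final String governor;
    final int governorIndex;
    final String dependent;
    final int dependentIndex;

    private DependencyLine(String relation, String governor, int governorIndex,
                           String dependent, int dependentIndex) {
        this.relation = relation;
        this.governor = governor;
        this.governorIndex = governorIndex;
        this.dependent = dependent;
        this.dependentIndex = dependentIndex;
    }

    static DependencyLine parse(String line) {
        Matcher m = FORMAT.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a dependency line: " + line);
        }
        return new DependencyLine(
                m.group(1),
                m.group(2), Integer.parseInt(m.group(3)),
                m.group(4), Integer.parseInt(m.group(5)));
    }

    public static void main(String[] args) {
        DependencyLine d = parse("nsubj(sentence-4, This-1)");
        System.out.println(d.relation + ": " + d.dependent + " depends on " + d.governor);
    }
}
```

Inside the parser itself, the same information is available directly through the gov(), dep() and reln() calls shown in the commented loop.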

Output:

Loading parser from serialized file D:\my download\Parser\Stanford Parser\stanford-parser-full\englishPCFG.ser.gz ... done [1.6 sec].

Loading parser from serialized file source/englishPCFG.ser.gz ... done [1.4 sec].
(ROOT
  (S
    (NP (DT This))
    (VP (VBZ is)
      (NP (DT an) (JJ easy) (NN sentence)))
    (. .)))