stanford parser 中文句法分析

为了进行中文句法分析,使用了stanford parser.开始时,一头茫然。搜索网上资源也有很多异常,勉强处理了下,现将可以运行的代码粘贴如下,希望对于用到的人有所帮助:
import java.util.*;
import java.io.StringReader;


import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.process.TokenizerFactory;
import edu.stanford.nlp.ling.CoreLabel;  
import edu.stanford.nlp.ling.HasWord;  
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;

class ParserDemo {

  public static void main(String[] args) {
    LexicalizedParser lp = 
       LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz");
    if (args.length > 0) {
      demoDP(lp, args[0]);
    } else {
      demoAPI(lp);
    }
  }

  public static void demoDP(LexicalizedParser lp, String filename) {
    // This option shows loading and sentence-segment and tokenizing
    // a file using DocumentPreprocessor
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    // You could also create a tokenier here (as below) and pass it
    // to DocumentPreprocessor
    for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
      Tree parse = lp.apply(sentence);
      parse.pennPrint();
      System.out.println();
      
      GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
      Collection tdl = gs.typedDependenciesCCprocessed(true);
      System.out.println(tdl);
      System.out.println();
    }
  }

  public static void demoAPI(LexicalizedParser lp) {
    // This option shows parsing a list of correctly tokenized words
    String[] sent = { "我", "是", "一名", "好", "学生", "。" };
    List<CoreLabel> rawWords = new ArrayList<CoreLabel>();
    for (String word : sent) {
      CoreLabel l = new CoreLabel();
      l.setWord(word);
      rawWords.add(l);
    }
    Tree parse = lp.apply(rawWords);
    parse.pennPrint();
    System.out.println();

    TreebankLanguagePack tlp = lp.getOp().langpack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
    System.out.println(tdl);
    System.out.println();
  
    TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed",tlp);
    tp.printTree(parse);
  }

  private ParserDemo() {} // static methods only

}

运行结果:

Loading parser from serialized file edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz ... done [3.3 sec].
(ROOT
  (IP
    (NP (PN 我))
    (VP (VC 是)
      (NP
        (QP (CD 一名))
        (ADJP (JJ 好))
        (NP (NN 学生))))
    (PU 。)))

[top(是-2, 我-1), root(ROOT-0, 是-2), nummod(学生-5, 一名-3), amod(学生-5, 好-4), attr(是-2, 学生-5)]

(ROOT
  (IP
    (NP (PN 我))
    (VP (VC 是)
      (NP
        (QP (CD 一名))
        (ADJP (JJ 好))
        (NP (NN 学生))))
    (PU 。)))

top(是-2, 我-1)
root(ROOT-0, 是-2)
nummod(学生-5, 一名-3)
amod(学生-5, 好-4)
attr(是-2, 学生-5)


  • 1
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值