Reposted from http://blog.sina.com.cn/s/blog_8af106960101a64w.html
Stanford Parser (syntactic parser) official site: http://nlp.stanford.edu/software/lex-parser.shtml#Download
==================================================================================
Download:
Official site -> Download Stanford Parser version 3.2.0 -> stanford-parser-full-2013-06-20.zip
==================================================================================
Unzip: stanford-parser-full-2013-06-20.zip -> stanford-parser-full
The relevant steps are:
-> unzip stanford-parser.jar -> stanford-parser folder
-> unzip stanford-parser-3.2.0-models.jar -> stanford-parser-3.2.0-models folder -> copy englishPCFG.ser.gz from
edu\stanford\nlp\models\lexparser inside that folder into the project's source folder.
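
The manual unzip-and-copy step can also be scripted, since a jar is an ordinary zip archive. The following is a minimal sketch using only the Java standard library; the class name ModelExtractor and the jar location passed in main are assumptions made for illustration, so adjust them to wherever the download was unpacked.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Stanford Parser.
class ModelExtractor {
    // Entry path inside the models jar (zip entries always use forward slashes).
    static final String MODEL_ENTRY = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";

    /**
     * Copies one entry out of a jar into targetDir and returns the path
     * of the extracted file. The target directory is created if needed.
     */
    static Path extractEntry(Path jar, String entryName, Path targetDir) throws IOException {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("Entry not found in " + jar + ": " + entryName);
            }
            Files.createDirectories(targetDir);
            // Keep only the file name, dropping the package-style directories.
            String fileName = entryName.substring(entryName.lastIndexOf('/') + 1);
            Path target = targetDir.resolve(fileName);
            try (InputStream in = zip.getInputStream(entry)) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return target;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the models jar relative to the working directory.
        Path jar = Paths.get("stanford-parser-full", "stanford-parser-3.2.0-models.jar");
        Path out = extractEntry(jar, MODEL_ENTRY, Paths.get("source"));
        System.out.println("Extracted to " + out);
    }
}
```

Run from the project root, this would leave englishPCFG.ser.gz in the source folder, the same place the manual copy puts it.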
==================================================================================
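Usage:
Reference 1: http://sbp810050504.blog.51cto.com/2799422/778398
Reference 2: http://blog.sina.com.cn/s/blog_59e0c16f0100ufsv.html
First, put stanford-parser.jar into the project's lib folder, then right-click it and choose Build Path -> Add to Build Path. Only stanford-parser.jar needs to be added.
Reference 1 also adds englishPCFG.ser.gz to Referenced Libraries. That is wrong: it caused an error locally, so do not add it.
Also, the model path passed at initialization had to be an absolute path at first, which again differs from the references.
Update: copying englishPCFG.ser.gz into the project's source folder is enough; a relative path then works.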
Project layout:
stanfordpatser (project name)
    src (code)
        stanfordparser (package)
            Parser.java (class)
    lib
        stanford-parser.jar
    source
        englishPCFG.ser.gz
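
Whether the relative path source/englishPCFG.ser.gz works depends on the JVM's working directory, which is usually the project root when running from Eclipse. A small check like the one below can make a wrong path fail early with a readable message. It is only a sketch: the class ModelPathCheck and its method are hypothetical names, and it uses nothing from Stanford Parser itself.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Stanford Parser.
class ModelPathCheck {
    /**
     * Resolves a possibly relative model path against baseDir and throws
     * IllegalArgumentException when no regular file exists at that location.
     */
    static Path resolveModel(Path baseDir, String modelPath) {
        Path resolved = baseDir.resolve(modelPath).toAbsolutePath().normalize();
        if (!Files.isRegularFile(resolved)) {
            throw new IllegalArgumentException("Model file not found: " + resolved);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Relative paths are resolved against the JVM working directory.
        Path workingDir = Paths.get(System.getProperty("user.dir"));
        Path model = resolveModel(workingDir, "source/englishPCFG.ser.gz");
        System.out.println("Model found at " + model);
        // The resolved path could then be handed to the parser, e.g.
        // LexicalizedParser.loadModel(model.toString());
    }
}
```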
==================================================================================
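Calling the parser from a local .java file:
import java.io.StringReader;
import java.util.List;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.PTBTokenizer;
// LexicalizedParser lp = LexicalizedParser.loadModel("...\\stanford-parser-full\\englishPCFG.ser.gz"); // absolute path on the local machine
// A relative path is enough
LexicalizedParser lp = LexicalizedParser.loadModel("source/englishPCFG.ser.gz");
String subsen = "One beer later and I'm walking down the street smoking a cig with them";
PTBTokenizer<Word> ptb = PTBTokenizer.newPTBTokenizer(new StringReader(subsen));
List<Word> words = ptb.tokenize();
System.out.println(lp.parse(words));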
=================
Result:
(ROOT (S (NP (NP (CD One) (NN beer) (RB later)) (CC and) (NP (PRP I))) (VP (VBP 'm) (VP (VBG walking) (PRT (RP down)) (NP (NP (DT the) (NN street)) (VP (VBG smoking) (NP (DT a) (NN cig)) (PP (IN with) (NP (PRP them)))))))))
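
The bracketed result is plain text, so the words and their part-of-speech tags can be pulled out of it with ordinary string processing. The sketch below assumes the output has already been captured as a String and uses a regular expression rather than the parser's own Tree API; the class BracketLeaves is a hypothetical name introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Stanford Parser.
class BracketLeaves {
    // A leaf looks like "(TAG word)": a tag, one space, a word,
    // and no nested brackets in between.
    private static final Pattern PRETERMINAL =
            Pattern.compile("\\(([^()\\s]+) ([^()\\s]+)\\)");

    /** Returns "word/TAG" for every leaf, in sentence order. */
    static List<String> taggedWords(String bracketed) {
        List<String> result = new ArrayList<>();
        Matcher m = PRETERMINAL.matcher(bracketed);
        while (m.find()) {
            result.add(m.group(2) + "/" + m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String tree = "(ROOT (S (NP (DT This)) (VP (VBZ is) "
                + "(NP (DT an) (JJ easy) (NN sentence))) (. .)))";
        // prints [This/DT, is/VBZ, an/DT, easy/JJ, sentence/NN, ./.]
        System.out.println(taggedWords(tree));
    }
}
```

When the Tree object is still available in the program, working with it directly is likely the more robust choice; the string approach is mainly useful for saved output.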
==================================================================================
The bundled sample, ParserDemo.java:
import java.io.StringReader;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.process.TokenizerFactory;
import edu.stanford.nlp.trees.*;

class ParserDemo {
  public static void main(String[] args) {
    // LexicalizedParser lp = LexicalizedParser.loadModel("D:\\my download\\Parser\\Stanford Parser\\stanford-parser-full\\englishPCFG.ser.gz");
    // A relative path is enough
    LexicalizedParser lp = LexicalizedParser.loadModel("source/englishPCFG.ser.gz");
    demoAPI(lp);
  }

  public static void demoAPI(LexicalizedParser lp) {
    // Part 1: this option shows parsing a list of correctly tokenized words
    String[] sent = { "This", "is", "an", "easy", "sentence", "." };
    List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
    Tree parse = lp.apply(rawWords);
    parse.pennPrint();
    System.out.println();

    // Part 2: this option shows loading and using an explicit tokenizer
    String sent2 = "This is another sentence.";
    TokenizerFactory<CoreLabel> tokenizerFactory =
        PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
    List<CoreLabel> rawWords2 =
        tokenizerFactory.getTokenizer(new StringReader(sent2)).tokenize();
    parse = lp.apply(rawWords2);
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
    System.out.println(tdl);
    // for (TypedDependency tdl1 : tdl) {
    //   System.out.println(tdl1);        // the full dependency, e.g. nsubj(sentence-4, This-1)
    //   System.out.println(tdl1.gov());  // the governor, e.g. sentence-4
    //   System.out.println(tdl1.dep());  // the dependent, e.g. This-1
    //   System.out.println(tdl1.reln()); // the relation, e.g. nsubj
    // }
    System.out.println();

    // Part 3: print the tree and the collapsed typed dependencies
    TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
    tp.printTree(parse);
  }
}
Output:
Loading parser from serialized file D:\my download\Parser\Stanford Parser\stanford-parser-full\englishPCFG.ser.gz ... done [1.6 sec].
Loading parser from serialized file source/englishPCFG.ser.gz ... done [1.4 sec].
(ROOT
  (S
    (NP (DT This))
    (VP (VBZ is)
      (NP (DT an) (JJ easy) (NN sentence)))
    (. .)))
[nsubj(sentence-4, This-1), cop(sentence-4, is-2), det(sentence-4, another-3), root(ROOT-0, sentence-4)]
(ROOT
  (S
    (NP (DT This))
    (VP (VBZ is)
      (NP (DT another) (NN sentence)))
    (. .)))
nsubj(sentence-4, This-1)
cop(sentence-4, is-2)
det(sentence-4, another-3)
root(ROOT-0, sentence-4)
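
Each dependency line above has the shape relation(governor-index, dependent-index). When only the printed text is at hand (for example in a saved log), the three parts can be recovered with a regular expression. This is a minimal sketch under that assumption; DependencyLine is a hypothetical class, not part of Stanford Parser, and inside a program the TypedDependency methods gov(), dep() and reln() give the same information directly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class for one printed dependency line.
class DependencyLine {
    // Shape of one printed line: relation(governor-index, dependent-index)
    private static final Pattern LINE =
            Pattern.compile("^(\\S+?)\\((.+)-(\\d+), (.+)-(\\d+)\\)$");

    final String relation;
    final String governor;
    final int governorIndex;
    final String dependent;
    final int dependentIndex;

    private DependencyLine(Matcher m) {
        relation = m.group(1);
        governor = m.group(2);
        governorIndex = Integer.parseInt(m.group(3));
        dependent = m.group(4);
        dependentIndex = Integer.parseInt(m.group(5));
    }

    /** Parses one line such as "nsubj(sentence-4, This-1)". */
    static DependencyLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a dependency line: " + line);
        }
        return new DependencyLine(m);
    }

    public static void main(String[] args) {
        DependencyLine d = parse("nsubj(sentence-4, This-1)");
        System.out.println(d.relation + ": " + d.governor + " <- " + d.dependent);
    }
}
```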