Stanford Parser: A Detailed Usage Reference

Source: http://blog.csdn.net/dushenzhi/article/details/8194987

1. Download the package from the official Stanford site, http://nlp.stanford.edu/software/lex-parser.shtml, and unzip it.

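2. In Eclipse, create a new Java project. Add the two jars from the root of the unzipped directory, stanford-parser.jar and stanford-parser-2.0.4-models.jar (file names may differ between versions), to the project's referenced libraries, then copy ParserDemo.java from the same directory into the project's src folder.

Running the demo program prints the parse result for its sample sentence.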

 

The demo parses English. To test Chinese instead, make the following changes:

(1) Change the input to the Chinese text you want to test, already split into words:

    String[] sent = { "这", "是", "第一个", "测试", "句子", "。" };

(2) Load the Chinese parsing model:

    String grammar = args.length > 0 ? args[0] : "edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz";

(3) Switch the language pack in the source file:

    TreebankLanguagePack tlp = new ChineseTreebankLanguagePack(); // was: new PennTreebankLanguagePack();

The parser may then report that the retainTmpSubcategories option is not available. If so, comment that option out in the source:

    String[] options = { "-maxLength", "80" }; // removed: "-retainTmpSubcategories"

 

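The Stanford Parser can use a lot of memory when building parse trees, so increase the JVM heap that Eclipse gives the program: under Run > Run Configurations > Arguments > VM arguments, enter -Xms256M -Xmx800M. Choose the actual sizes according to your workload and the machine's capacity.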
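
To confirm that the VM arguments actually took effect, the running program can ask the JVM how much heap it is allowed to use. The sketch below relies only on the standard Runtime.getRuntime().maxMemory() call. The class name HeapCheck and the 500 MB threshold are illustrative assumptions, not part of the Stanford Parser, and the JVM may report slightly less than the value passed to -Xmx.

```java
// Hypothetical helper, not part of the Stanford Parser distribution.
final class HeapCheck {

    // Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes).
    static long bytesToMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Returns true when the given heap limit covers the required number of megabytes.
    static boolean hasEnoughHeap(long maxHeapBytes, long requiredMb) {
        return bytesToMegabytes(maxHeapBytes) >= requiredMb;
    }

    public static void main(String[] args) {
        // Upper bound on heap the JVM will attempt to use (controlled by -Xmx).
        long maxHeap = Runtime.getRuntime().maxMemory();
        long requiredMb = 500; // assumed threshold, adjust to your sentences and model

        System.out.println("Max heap: " + bytesToMegabytes(maxHeap) + " MB");
        if (!hasEnoughHeap(maxHeap, requiredMb)) {
            System.out.println("Heap looks small for long sentences; consider raising -Xmx.");
        }
    }
}
```

Calling such a check before loading the model makes a forgotten VM argument visible early, instead of surfacing later as an OutOfMemoryError in the middle of a parse.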

With long sentences, the parser may fail with the error "FactoredParser: exceeded MAX_ITEMS work limit [200000 items]; aborting."

To work around this, set MAX_ITEMS to a larger number in options. The example below uses 500000:

    String[] options = { "-maxLength", "140", "-MAX_ITEMS", "500000" };
    lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/", options);

This fix is based on http://blog.amelielee.com/archives/140.
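
The option lists above are flat arrays of alternating flag names and values, which makes a missing comma or a misplaced value easy to overlook. As a minimal sketch, the hypothetical helper below assembles such an array from typed arguments. ParserOptions and build are illustrative names and not part of the Stanford Parser API; only the flag strings themselves come from the examples in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Stanford Parser distribution.
final class ParserOptions {

    // Builds the flat flag/value array used as the parser's options argument.
    // retainTmpSubcategories should stay false for models that reject the flag.
    static String[] build(int maxLength, int maxItems, boolean retainTmpSubcategories) {
        List<String> opts = new ArrayList<>();
        opts.add("-maxLength");
        opts.add(Integer.toString(maxLength));
        opts.add("-MAX_ITEMS");
        opts.add(Integer.toString(maxItems));
        if (retainTmpSubcategories) {
            opts.add("-retainTmpSubcategories");
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = build(140, 500000, false);
        // Prints: -maxLength 140 -MAX_ITEMS 500000
        System.out.println(String.join(" ", options));
    }
}
```

The resulting array can be passed wherever the examples above pass options, for instance as the second argument to LexicalizedParser.loadModel.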



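For convenience, the Stanford Parser ships with a graphical interface. On Windows, double-click lexparser-gui.bat in the root of the unzipped directory (on Linux, run lexparser-gui.sh) to open it.

Click "Load File" to import the file you want to parse, or type the text directly into the large input box at the top, and pick the matching language under "Language". Click "Load Parser" to load a model file. Loading may take a few seconds; once the progress bar completes, the "Parse" button becomes available. Click it to parse the highlighted text in the input box. The resulting tree is displayed in the lower pane and can be saved to a file.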


The Stanford Parser can also be run from the command line via lexparser.bat (Windows) or lexparser.sh (Linux). For usage details, see the FAQ in the official documentation.

An online demo of the Stanford Parser is available at http://nlp.stanford.edu:8080/parser/index.jsp

For more information, see the official website of The Stanford Natural Language Processing Group.

About

A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as "phrases") and which words are the subject or object of a verb. Probabilistic parsers use knowledge of language gained from hand-parsed sentences to try to produce the most likely analysis of new sentences. These statistical parsers still make some mistakes, but commonly work rather well. Their development was one of the biggest breakthroughs in natural language processing in the 1990s. You can try out our parser online.

This package is a Java implementation of probabilistic natural language parsers, both highly optimized PCFG and lexicalized dependency parsers, and a lexicalized PCFG parser. The original version of this parser was mainly written by Dan Klein, with support code and linguistic grammar development by Christopher Manning. Extensive additional work (internationalization and language-specific modeling, flexible input/output, grammar compaction, lattice parsing, k-best parsing, typed dependencies output, user support, etc.) has been done by Roger Levy, Christopher Manning, Teg Grenager, Galen Andrew, Marie-Catherine de Marneffe, Bill MacCartney, Anna Rafferty, Spence Green, Huihsin Tseng, Pi-Chuan Chang, Wolfgang Maier, and Jenny Finkel.

The lexicalized probabilistic parser implements a factored product model, with separate PCFG phrase structure and lexical dependency experts, whose preferences are combined by efficient exact inference, using an A* algorithm. Or the software can be used simply as an accurate unlexicalized stochastic context-free grammar parser. Either of these yields a good performance statistical parsing system. A GUI is provided for viewing the phrase structure tree output of the parser.

As well as providing an English parser, the parser can be and has been adapted to work with other languages. A Chinese parser based on the Chinese Treebank, a German parser based on the Negra corpus and Arabic parsers based on the Penn Arabic Treebank are also included. The parser has also been used for other languages, such as Italian, Bulgarian, and Portuguese.

The parser provides Stanford Dependencies output as well as phrase structure trees. Typed dependencies are otherwise known as grammatical relations. This style of output is available only for English and Chinese. For more details, please refer to the Stanford Dependencies webpage.

The current version of the parser requires Java 6 (JDK1.6) or later. (You can also download an old version of the parser, version 1.4, which runs under JDK 1.4, or version 2.0 which runs under JDK 1.5, but those distributions are no longer supported.) The parser also requires a reasonable amount of memory (at least 100MB to run as a PCFG parser on sentences up to 40 words in length; typically around 500MB of memory to be able to parse similarly long typical-of-newswire sentences using the factored model).

The parser is available for download, licensed under the GNU General Public License (v2 or later). Source is included. The package includes components for command-line invocation, a Java parsing GUI, and a Java API. The parser code is dual licensed (in a similar manner to MySQL, etc.). Open source licensing is under the full GPL, which allows many free uses. For distributors of proprietary software, commercial licensing with a ready-to-sign agreement is available. If you don't need a commercial license, but would like to support maintenance of these tools, we welcome gift funding.

The download is a 54 MB zipped file (mainly consisting of included grammar data files). If you unpack the zip file, you should have everything needed. Simple scripts are included to invoke the parser on a Unix or Windows system. For another system, you merely need to similarly configure the classpath.