Stanford Parser (1)

leeharry

Article link: https://blog.csdn.net/leeharry/article/details/2145445

Have you successfully loaded the parser window by double-clicking "lexparser-gui.bat", "stanford-parser-2006-06-11.jar" or "stanford-parser.jar"? If not, you may need to check whether your JDK is correctly installed.
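
To check the JDK without opening the parser, you can look at what "java -version" reports. Below is a minimal Python sketch of that check; the function names are my own, and it only assumes that a "java" executable is on your PATH. The parser needs Java 5 or later, so a result of 5 or more is what you are looking for.

```python
import re
import shutil
import subprocess


def parse_java_major(version_text):
    """Extract the major Java version from `java -version` output.

    Old releases report "1.5.0" or "1.6.0" (the major version is the
    second number); newer ones report "9" or "17.0.2" (the major version
    is the first number). Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major():
    """Return the major version of the `java` found on PATH, or None."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its report to stderr, so read both streams.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)
```

If installed_java_major() returns None, Java is either not installed or not on your PATH.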

If the parser window loads, start by typing in ONLY one or two sentences as a test. If you want to open an existing text file, it must be encoded in UTF-8; try a text of only a few sentences first. Then load the parser file (englishFactored.ser.gz for English text; chineseFactored.ser.gz for Chinese text). Loading may take a while, because:

"The current version of the parser requires Java 5 (JDK1.5 or above). The parser also requires plenty of memory (a minimum of 100Mb to run as a PCFG parser on sentences up to 40 words in length; typically around 500Mb of memory to be able to parse similarly long typical-of-newswire sentences using the factored model)."

Once the parser file is loaded, click a sentence in the text window and it will be highlighted in yellow. Then click Parse; you should see the result in the output window within a second.
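
Since an existing text file has to be UTF-8 before the parser can open it, it may help to check the file first. Here is a small Python sketch; the function names are hypothetical, and the conversion helper assumes you already know the file's current encoding (for example "gbk" for many Chinese texts saved on Windows).

```python
def is_utf8(path):
    """Return True if the file at `path` decodes cleanly as UTF-8."""
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def convert_to_utf8(src, dst, source_encoding):
    """Re-encode a text file as UTF-8.

    `source_encoding` must be the file's real encoding, e.g. "gbk";
    this sketch does not try to guess it.
    """
    with open(src, "r", encoding=source_encoding) as reader:
        text = reader.read()
    with open(dst, "w", encoding="utf-8") as writer:
        writer.write(text)
```

Note that a plain ASCII file also passes is_utf8, because ASCII is a subset of UTF-8.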

If parsing fails, your computer may not have enough RAM (mine has 1 GB). Alternatively, run the parser from the command line with "lexparser.bat input.txt >output.txt".
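
If you would rather drive the command-line run from a script, the redirection can be sketched in Python as below. The helper name run_to_file is my own; it just runs whatever command you give it and saves the standard output, which is roughly what ">output.txt" does in the shell.

```python
import subprocess


def run_to_file(command, output_path):
    """Run `command` (a list of arguments) and write its stdout to a file.

    Roughly equivalent to `command > output_path` in a shell.
    Returns the exit code of the process.
    """
    with open(output_path, "wb") as out:
        completed = subprocess.run(command, stdout=out)
    return completed.returncode


# Hypothetical usage on Windows, run from the parser's folder:
# run_to_file(["cmd", "/c", "lexparser.bat", "input.txt"], "output.txt")
```

A non-zero return value usually means the command failed, so it is worth checking before you read the output file.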

How much memory do I need to parse very long sentences?

Memory usage by the parser depends on a number of factors:

Memory usage expands roughly with the square of the sentence length. You may wish to set a -maxLength and to skip long sentences.

The factored parser requires several times as much memory as just running the PCFG parser, since it runs 3 parsers.

The command-line version of the parser currently loads the whole of a file into memory before parsing any of it. If your file is extremely large, splitting it into multiple files and parsing them sequentially will reduce memory usage.

A 64-bit application requires more memory than a 32-bit application (Java uses lots of pointers).

A larger grammar or POS tag set requires more memory than a smaller one.
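
Two of the points above (skipping long sentences and splitting a large file) can also be handled before the text ever reaches the parser. The Python sketch below is one way to do it, not the parser's own -maxLength mechanism; it assumes one sentence per line and counts words by splitting on whitespace, and both function names are hypothetical.

```python
def split_by_length(sentences, max_words):
    """Separate sentences into (kept, skipped) by whitespace word count."""
    kept, skipped = [], []
    for sentence in sentences:
        target = kept if len(sentence.split()) <= max_words else skipped
        target.append(sentence)
    return kept, skipped


def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Each piece returned by chunk can then be written to its own file and parsed one file at a time.
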
Below are some statistics for 32-bit operation with the supplied englishPCFG and englishFactored grammars. We have parsed sentences as long as 234 words, but you need lots of RAM and patience.

Length (words)    PCFG      Factored
20                50 MB     250 MB
50                125 MB    600 MB
100               350 MB    2100 MB
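
To get a rough figure for lengths between the rows of the table, one option is to interpolate linearly between neighbouring rows. This is only my own back-of-the-envelope sketch in Python: the numbers are copied from the table, the function name is hypothetical, and it refuses to guess outside the 20 to 100 word range the table covers.

```python
# Figures copied from the table: (sentence length in words, memory in MB).
MEMORY_MB = {
    "PCFG": [(20, 50), (50, 125), (100, 350)],
    "Factored": [(20, 250), (50, 600), (100, 2100)],
}


def estimate_memory_mb(length, model):
    """Linearly interpolate the table for `model` ("PCFG" or "Factored").

    Raises ValueError for lengths outside the range the table covers,
    rather than extrapolating.
    """
    points = MEMORY_MB[model]
    if not points[0][0] <= length <= points[-1][0]:
        raise ValueError("length outside the range covered by the table")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= length <= x1:
            return y0 + (y1 - y0) * (length - x0) / (x1 - x0)
```

Treat the result as an order-of-magnitude guide only, since real usage also depends on the other factors listed above.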

Please make sure:

1. Put the parser program package in a folder whose name has no Chinese characters;

2. Start the program by double-clicking lexparser-gui.bat (it is quite easy to get an "out of memory" error if you start it from either of the two jar files);

3. Load chineseFactored.ser.gz for Chinese text, and do not do anything else while waiting for the parser to finish loading; otherwise you may get the "out of memory" error;

4. Under the Language tab, choose "Tokenized Simplified Chinese (utf-8)";

5. Input your Chinese sentence with a space between words, e.g. 赵 先生 是 个 大学 老师 。 他 很 喜欢 写 文章 。
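
Two items in this checklist are easy to verify with a few lines of Python: whether the install path contains non-ASCII (e.g. Chinese) characters, and whether a sentence already has spaces between words. This is only a sketch; the function names are hypothetical, and looks_tokenized merely checks that a space is present, not that the segmentation is correct.

```python
from pathlib import PurePath


def non_ascii_parts(path):
    """Return the components of `path` that contain non-ASCII characters."""
    return [part for part in PurePath(path).parts if not part.isascii()]


def looks_tokenized(sentence):
    """Return True if the sentence contains at least one internal space."""
    return " " in sentence.strip()
```

An empty list from non_ascii_parts means the folder path is safe to use for the parser package.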

Good luck, guys!

Quote:

Author: xujiajin

I tried the command-line mode; the error log said there was no "server", and something called JVM.dll was missing.

I'm using the latest version of the JDK (1.6.0), which can be downloaded from:
https://sdlc1e.sun.com/ECom/EComActi...0FF9BEF430941F

If the above link doesn't work, try to find it at this web site: http://java.sun.com/javase/downloads/index.jsp
Then click JDK 6 to go to the download page.

The file is about 53.16 MB in size for Windows Offline Installation, Multi-language version.

You may want to uninstall all previous versions of the JDK or JRE first, then install this latest version. After installation completes, copy the folder named "server" from C:/Program Files/Java/jdk1.6.0/jre/bin/ to C:/Program Files/Java/jre1.6.0/bin/.
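
If you prefer to script the copy, or just want to confirm that jvm.dll ended up in the right place, something like the following Python sketch should do. The function names are my own, and the folder locations are passed in as arguments, so use whatever paths your own installation has.

```python
import shutil
from pathlib import Path


def has_server_jvm(bin_dir):
    """Return True if `bin_dir` contains server/jvm.dll."""
    return (Path(bin_dir) / "server" / "jvm.dll").is_file()


def copy_server_folder(jdk_jre_bin, jre_bin):
    """Copy the "server" folder from the JDK's jre/bin into the JRE's bin.

    Does nothing if the target folder already exists.
    Returns the path of the target folder.
    """
    source = Path(jdk_jre_bin) / "server"
    target = Path(jre_bin) / "server"
    if not target.exists():
        shutil.copytree(source, target)
    return target
```

On Windows you may need to run the script with administrator rights, since Program Files is a protected location.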

Then you need to set your Java path as follows:

Right-click the My Computer icon and choose Properties, Advanced, Environment Variables. In the System Variables box, find Path, click Edit, and add the following line (typed entirely in English characters) to the end of the existing text:

;C:/Program Files/Java/jdk1.6.0/bin;C:/Program Files/Java/jre1.6.0/bin

Finally, reboot your machine to try the parser. The problem of "missing server jvm.dll" should be resolved.
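
After the reboot, you can confirm that both folders really made it into Path. The Python sketch below compares a list of required folders against a Path string; the function names are hypothetical, and the comparison ignores case, trailing slashes and the direction of the slashes, which is my own simplification of how Windows treats paths.

```python
def _normalize(entry):
    """Lower-case a path entry and unify slashes for a loose comparison."""
    return entry.strip().replace("\\", "/").rstrip("/").lower()


def missing_from_path(required, path_value, separator=";"):
    """Return the folders in `required` that do not appear in `path_value`."""
    present = {
        _normalize(entry)
        for entry in path_value.split(separator)
        if entry.strip()
    }
    return [folder for folder in required if _normalize(folder) not in present]
```

To check the live setting, pass os.environ.get("PATH", "") as path_value and os.pathsep as separator; an empty result means both folders are present.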
