Stanford Parser (1)

Have you successfully loaded the parser window by double-clicking "lexparser-gui.bat", "stanford-parser-2006-06-11.jar", or "stanford-parser.jar"? If not, check whether your JDK is correctly installed.

If the parser window loads, start by typing in ONLY one or two sentences as a test. If you want to open an existing text file, it must be encoded in UTF-8, and it is best to try a text of only a few sentences first. Then load the parser file (englishFactored.ser.gz for English text; chineseFactored.ser.gz for Chinese text). It may take a while to load, because:

"The current version of the parser requires Java 5 (JDK1.5 or above). The parser also requires plenty of memory (a minimum of 100Mb to run as a PCFG parser on sentences up to 40 words in length; typically around 500Mb of memory to be able to parse similarly long typical-of-newswire sentences using the factored model). "

Once the parser file is loaded, click a sentence in the text window; it will be highlighted in yellow. Then click Parse, and you should see the result in the output window in a second.
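Since an existing text must be in UTF-8 before the parser window can open it, it may help to check the file first. Below is a minimal Python sketch, not part of the parser package; the function names and the "gb18030" source encoding in the usage note are assumptions chosen for illustration.

```python
from pathlib import Path


def is_valid_utf8(path):
    """Return True if the file at `path` decodes cleanly as UTF-8."""
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def convert_to_utf8(src, dst, source_encoding):
    """Re-encode `src` (written in `source_encoding`) as UTF-8 into `dst`."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
```

If is_valid_utf8("input.txt") returns False for a Chinese text, a call such as convert_to_utf8("input.txt", "input-utf8.txt", "gb18030") may fix it, assuming the file really was saved in a GB encoding.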

If parsing still fails, maybe your computer's RAM is not big enough (mine is 1 GB). Alternatively, try it from the command line with "lexparser.bat input.txt >output.txt".
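If memory is the problem, it can be useful to launch the parser with an explicit heap limit instead of relying on the batch file. The Python sketch below only builds such a command line. It assumes the main class is edu.stanford.nlp.parser.lexparser.LexicalizedParser and that the grammar file comes before the input file; check the contents of lexparser.bat in your own copy, since these details are my assumptions and may differ between versions. The helper name build_parser_command is hypothetical.

```python
def build_parser_command(heap_mb, grammar, input_file, jar="stanford-parser.jar"):
    """Build a java command line that runs the parser with an explicit heap limit."""
    return [
        "java",
        f"-Xmx{heap_mb}m",  # maximum JVM heap size, in megabytes
        "-cp", jar,
        # Assumed main class; verify against lexparser.bat in your package.
        "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
        grammar,
        input_file,
    ]


if __name__ == "__main__":
    command = build_parser_command(500, "englishFactored.ser.gz", "input.txt")
    print(" ".join(command))
```

The resulting list can be passed to subprocess.run with stdout redirected to a file, which mirrors "lexparser.bat input.txt >output.txt".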

How much memory do I need to parse very long sentences?

Memory usage by the parser depends on a number of factors:

Memory usage expands roughly with the square of the sentence length. You may wish to set a -maxLength and to skip long sentences.

The factored parser requires several times as much memory as just running the PCFG parser, since it runs 3 parsers.

The command-line version of the parser currently loads the whole of a file into memory before parsing any of it. If your file is extremely large, splitting it into multiple files and parsing them sequentially will reduce memory usage.
A 64-bit application requires more memory than a 32-bit application (Java uses lots of pointers).

A larger grammar or POS tag set requires more memory than a smaller one.
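The advice above about splitting an extremely large file can be sketched in Python as follows. This is only an illustration: it assumes the input has one sentence per line, and the function name and the ".partNNN" naming scheme are made up for the example.

```python
from pathlib import Path


def split_file(path, lines_per_chunk, encoding="utf-8"):
    """Split `path` into numbered chunk files and return their paths in order."""
    src = Path(path)
    chunk_paths = []
    chunk = []

    def flush():
        # Write the buffered lines to the next numbered chunk file.
        if chunk:
            number = len(chunk_paths) + 1
            out = src.with_name(f"{src.stem}.part{number:03d}{src.suffix}")
            out.write_text("".join(chunk), encoding=encoding)
            chunk_paths.append(out)
            chunk.clear()

    with src.open(encoding=encoding) as handle:
        for line in handle:
            chunk.append(line)
            if len(chunk) >= lines_per_chunk:
                flush()
    flush()  # write any remaining lines
    return chunk_paths
```

Each chunk can then be parsed in turn, so only one small file is held in memory at a time.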
Below are some statistics for 32-bit operation with the supplied englishPCFG and englishFactored grammars. We have parsed sentences as long as 234 words, but you need lots of RAM and patience.

Length (words)   PCFG      Factored
20               50 MB     250 MB
50               125 MB    600 MB
100              350 MB    2100 MB
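For lengths between the measured points, a rough figure can be read off the table by linear interpolation. The Python sketch below does only that; it is my own approximation, not a formula from the parser documentation, and it deliberately returns None outside the 20 to 100 word range rather than extrapolating.

```python
# Figures copied from the table above: (length in words, PCFG MB, factored MB).
MEMORY_TABLE = [(20, 50, 250), (50, 125, 600), (100, 350, 2100)]


def estimate_memory_mb(length, factored=False):
    """Linearly interpolate the table; return None outside the measured range."""
    column = 2 if factored else 1
    for low, high in zip(MEMORY_TABLE, MEMORY_TABLE[1:]):
        if low[0] <= length <= high[0]:
            fraction = (length - low[0]) / (high[0] - low[0])
            return low[column] + fraction * (high[column] - low[column])
    return None


if __name__ == "__main__":
    print(estimate_memory_mb(75))                 # PCFG estimate for 75 words
    print(estimate_memory_mb(75, factored=True))  # factored estimate for 75 words
```

Because memory grows faster than linearly with sentence length, treat these numbers as a loose guide and leave some headroom when choosing a heap size.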

Please make sure:

1. Put the parser program package in a folder whose name has no Chinese characters;

2. Start the program by clicking lexparser-gui.bat (it's quite easy to get an "out of memory" error if you start it from either of the two jar files);

3. Load chineseFactored.ser.gz for Chinese text, and avoid doing anything else while waiting for the parser to load completely, otherwise you may get the "out of memory" error;

4. Under the Language tab, choose "Tokenized Simplified Chinese (utf-8)";

5. Input your Chinese sentence, leaving a space between words, e.g. 赵 先生 是 个 大学 老师 。 他 很 喜欢 写 文章 。
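Since the input has to be word-segmented already, with spaces between words, it can be handy to break a longer segmented text into single sentences before pasting them in. Here is a small Python sketch; the function name and the set of sentence-final marks are my assumptions, and it does not do the word segmentation itself.

```python
def split_tokenized_sentences(text, terminators=("。", "！", "？")):
    """Group whitespace-separated tokens into sentences, one string per sentence."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        if token in terminators:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep trailing words that lack a final terminator
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    sample = "赵 先生 是 个 大学 老师 。 他 很 喜欢 写 文章 。"
    for sentence in split_tokenized_sentences(sample):
        print(sentence)
```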

Good luck, guys!

Quote:
Posted by xujiajin
Tried the command mode; the error log said: no "server", something called JVM.dll was missing.
I'm using the latest version of the JDK (1.6.0), which can be downloaded from:
https://sdlc1e.sun.com/ECom/EComActi...0FF9BEF430941F

If the above link doesn't work, try to find it at this web site: http://java.sun.com/javase/downloads/index.jsp
Click JDK 6 to go to the download page.

The file is about 53.16 MB in size for Windows Offline Installation, Multi-language version.

You may want to uninstall all previous versions of the JDK or JRE first, then install this latest version. After installation completes, copy the folder named "server" from C:/Program Files/Java/jdk1.6.0/jre/bin/ to C:/Program Files/Java/jre1.6.0/bin/.

Then you need to set your Java path:

Right-click the My Computer icon and choose Properties, Advanced, Environment Variables. In the System Variables box, find Path, click Edit, and add the following line (all in English characters) to the end of the text there:

;C:/Program Files/Java/jdk1.6.0/bin;C:/Program Files/Java/jre1.6.0/bin

Finally, reboot your machine to try the parser. The problem of "missing server jvm.dll" should be resolved.
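To double-check the Path edit after rebooting, something like the Python sketch below can list the Java bin folders that the variable actually contains. It is only an illustration: the function name is hypothetical, and the matching rule (an entry that mentions "java" and ends in "bin") is a simplifying assumption.

```python
import os


def java_bin_dirs(path_value):
    """Return the entries of a ';'-separated Windows Path that look like Java bin folders."""
    entries = [entry.strip() for entry in path_value.split(";") if entry.strip()]
    return [
        entry
        for entry in entries
        if "java" in entry.lower() and entry.rstrip("/\\").lower().endswith("bin")
    ]


if __name__ == "__main__":
    # Inspect the Path of the current session.
    for entry in java_bin_dirs(os.environ.get("PATH", "")):
        print(entry)
```

If nothing is printed, the new entries were probably not saved, or the command window was opened before the change took effect.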