The ICTCLAS Java interface includes this method:
/**
 * Segments a string of Chinese text into words.
*/
public synchronized native String paragraphProcess(String sParagraph);
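In most cases this method segments the text passed to it without problems, but some inputs containing special characters trigger a fatal error, for example the following string: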
String str="[1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][下一页]";
The following error is printed:
A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x3ae6c4e4, pid=2804, tid=2756
#
# JRE version: 6.0_22-b04
# Java VM: Java HotSpot(TM) Client VM (17.1-b03 mixed mode windows-x86 )
# Problematic frame:
# C [ICTCLAS.dll+0xc4e4]
#
# An error report file with more information is saved as:
# D:\yourproject\hs_err_pid2804.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
# [error occurred during error reporting , id 0xc0000005]
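Cause: the error is raised in native code inside ICTCLAS.dll, not in Java, so a Java try/catch block cannot intercept it, and the JVM (Java Virtual Machine) is forcibly terminated.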
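Workaround: because the crash cannot be caught, it is best to preprocess the text before passing it to ICTCLAS for segmentation (for example, remove extra whitespace and avoid passing text that is too long).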
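
The preprocessing described above could be sketched roughly as follows. This is only an illustration, not part of ICTCLAS: the class name IctclasPreprocessor, its methods, and the 1000-character limit are all hypothetical choices, and the sketch does not guarantee that every crashing input is avoided.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ICTCLAS: cleans text before it is
// handed to paragraphProcess().
public class IctclasPreprocessor {

    // Assumed limit for illustration only; ICTCLAS does not document this value.
    static final int MAX_CHUNK_LENGTH = 1000;

    // Collapses runs of whitespace (including the full-width space U+3000)
    // into a single ASCII space and trims both ends.
    static String normalizeWhitespace(String text) {
        if (text == null) {
            return "";
        }
        return text.replaceAll("[\\s\\u3000]+", " ").trim();
    }

    // Splits text into pieces of at most maxLength chars so that no single
    // call receives an overly long string. Note: this cuts at fixed offsets,
    // so a word (or a surrogate pair) may be split at a chunk boundary.
    static List<String> splitIntoChunks(String text, int maxLength) {
        if (maxLength <= 0) {
            throw new IllegalArgumentException("maxLength must be positive");
        }
        List<String> chunks = new ArrayList<String>();
        for (int start = 0; start < text.length(); start += maxLength) {
            int end = Math.min(text.length(), start + maxLength);
            chunks.add(text.substring(start, end));
        }
        return chunks;
    }

    // Normalizes whitespace, then splits the result into bounded chunks.
    static List<String> prepare(String text) {
        return splitIntoChunks(normalizeWhitespace(text), MAX_CHUNK_LENGTH);
    }
}
```

Each string returned by prepare() would then be passed to paragraphProcess() one at a time. Cutting at a sentence boundary instead of a fixed offset would probably give better segmentation at the chunk edges.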