Problems using the IK analyzer after upgrading Lucene to 4.6.0 or later

After upgrading lucene-core from 4.5.1 to 4.7.0, the following code, which uses the IK analyzer, throws an exception:

IKAnalyzer analyzer = new IKAnalyzer(true);
StringReader reader = new StringReader(line);
TokenStream ts = analyzer.tokenStream("", reader);
CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);
// Fails here on Lucene 4.6.0+: incrementToken() is called without a prior reset()
while (ts.incrementToken()) {
    ...
}

Exception message:

java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.

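It turned out that the way TokenStream must be used changed starting with Lucene 4.6.0: reset() has to be called before incrementToken(). See the API docs at http://lucene.apache.org/core/4_6_0/core/index.html for details. The TokenStream Javadoc describes the workflow like this: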

The workflow of the new TokenStream API is as follows:

  1. Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
  2. The consumer calls reset().
  3. The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
  4. The consumer calls incrementToken() until it returns false consuming the attributes after each call.
  5. The consumer calls end() so that any end-of-stream operations can be performed.
  6. The consumer calls close() to release any resource when finished using the TokenStream.
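
To get a feel for why the exception fires, here is a small self-contained sketch of a stream that enforces the same call order. It does not use Lucene and is not how Lucene implements the check. ToyTokenStream, tokenize and failsWithoutReset are made-up names, whitespace splitting stands in for real analysis, and allowing reset() only once is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenStreamContractSketch {

    /** Toy stand-in for a token stream. Hypothetical; NOT Lucene's implementation. */
    static final class ToyTokenStream implements AutoCloseable {
        private enum State { CREATED, RESET, ENDED, CLOSED }

        private final String[] words;
        private State state = State.CREATED;
        private int next = 0;
        private String term;

        ToyTokenStream(String text) {
            // Whitespace splitting is only a placeholder for real tokenization.
            String trimmed = text.trim();
            this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        /** Step 2 of the workflow: must be called exactly once before consuming. */
        void reset() {
            if (state != State.CREATED) {
                throw new IllegalStateException("contract violation: reset() called multiple times");
            }
            next = 0;
            state = State.RESET;
        }

        /** Step 4: advances to the next token, or returns false when exhausted. */
        boolean incrementToken() {
            if (state != State.RESET) {
                throw new IllegalStateException("contract violation: reset() call missing");
            }
            if (next >= words.length) {
                return false;
            }
            term = words[next++];
            return true;
        }

        /** The current token text, valid after incrementToken() returned true. */
        String term() {
            return term;
        }

        /** Step 5: signals that the consumer is done iterating. */
        void end() {
            if (state != State.RESET) {
                throw new IllegalStateException("contract violation: end() called out of order");
            }
            state = State.ENDED;
        }

        /** Step 6: releases resources (nothing to release in this toy). */
        @Override
        public void close() {
            state = State.CLOSED;
        }
    }

    /** Consumes a stream in the documented order: reset, incrementToken loop, end, close. */
    public static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        try (ToyTokenStream ts = new ToyTokenStream(text)) {
            ts.reset();
            while (ts.incrementToken()) {
                out.add(ts.term());
            }
            ts.end();
        }
        return out;
    }

    /** Mimics the failing code: calls incrementToken() without reset(). */
    public static boolean failsWithoutReset(String text) {
        try (ToyTokenStream ts = new ToyTokenStream(text)) {
            ts.incrementToken();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("hello token stream")); // prints [hello, token, stream]
        System.out.println(failsWithoutReset("hello"));     // prints true
    }
}
```

The point of the sketch is only the ordering: skipping reset() leaves the stream in its initial state, so the first incrementToken() call is rejected instead of silently producing tokens.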

After changing the code as follows, it runs correctly:

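IKAnalyzer analyzer = new IKAnalyzer(true);
StringReader reader = new StringReader(line);
TokenStream ts = analyzer.tokenStream("", reader);
CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);
ts.reset();   // required before the first incrementToken() call
while (ts.incrementToken()) {
    ...
}
ts.end();     // step 5 of the workflow: end-of-stream operations
ts.close();   // step 6 of the workflow: release resources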

 

Reposted from: https://www.cnblogs.com/luzy/p/3663945.html
