Lucene ships with several analyzers out of the box; among them, smartcn provides relatively good support for Chinese word segmentation.
1. The standard analyzer: StandardAnalyzer
Before demonstrating the smartcn Chinese analyzer, let us first look at how Lucene's standard analyzer handles Chinese text. The required jars are lucene-core-5.5.5.jar under \lucene-5.5.5\core\ and lucene-analyzers-common-5.5.5.jar under \lucene-5.5.5\analysis\common\. Create a new test class TestLucene04:

package net.xxpsw.demo.lucene.test;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
public class TestLucene04 {
	/** Prints every token the given analyzer produces for a fixed sample sentence. */
	private void print(Analyzer analyzer) throws Exception {
		String text = "Lucene自带多种分词器,其中对中文分词支持比较好的是smartcn。";
		// Obtain a token stream over the sample text; "content" is just a field name.
		TokenStream tokenStream = analyzer.tokenStream("content", text);
		// CharTermAttribute exposes the text of the current token.
		CharTermAttribute attribute = tokenStream.addAttribute(CharTermAttribute.class);
		tokenStream.reset();
		// Advance through the stream, printing one token per line.
		while (tokenStream.incrementToken()) {
			System.out.println(attribute.toString());
		}
		tokenStream.end();
		tokenStream.close();
	}
}
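To see the standard analyzer's behavior end to end, the same token loop can be packaged as a self-contained sketch. The class name StandardAnalyzerDemo and the tokenize helper below are my own naming, not from the original article; the Lucene calls (StandardAnalyzer, tokenStream, CharTermAttribute) match the 5.5.x API used above. StandardAnalyzer splits CJK text into single-character tokens and lowercases Latin terms, which is exactly the weakness the article is about to contrast with smartcn.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StandardAnalyzerDemo {
	// Collects the tokens StandardAnalyzer produces for the given text.
	static List<String> tokenize(String text) throws Exception {
		List<String> tokens = new ArrayList<>();
		try (StandardAnalyzer analyzer = new StandardAnalyzer();
		     TokenStream stream = analyzer.tokenStream("content", text)) {
			CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
			stream.reset();
			while (stream.incrementToken()) {
				tokens.add(term.toString());
			}
			stream.end();
		}
		return tokens;
	}

	public static void main(String[] args) throws Exception {
		// Each Chinese character comes out as its own token, e.g. 自, 带, 多, ...
		// while "Lucene" is lowercased to "lucene" and "smartcn" is kept whole.
		System.out.println(tokenize("Lucene自带多种分词器,其中对中文分词支持比较好的是smartcn。"));
	}
}
```

Running main shows the problem clearly: the standard analyzer has no notion of Chinese words, so "分词器" is broken into the three unrelated tokens 分, 词, 器.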