Lucene 6.0 makes some small changes to how you inspect the tokens an analyzer produces. In early versions of Lucene it was done like this:
public void analyzeDemo(Analyzer analyzer, String text) throws Exception {
    TokenStream tokenStream = analyzer.tokenStream("content", new StringReader(text));
    for (Token token = new Token(); (token = tokenStream.next(token)) != null;) {
        System.out.println(token);
    }
}
In the latest version of Lucene it is done like this:
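import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzeDemo {
    /**
     * Prints the tokens that the analyzer produces for the given text.
     * @param analyzer the analyzer to test
     * @param text the text to tokenize
     */
    public void analyze(Analyzer analyzer, String text) {
        // try-with-resources closes the stream even if tokenization fails
        try (TokenStream tokenStream = analyzer.tokenStream("content", new StringReader(text))) {
            // addAttribute returns the attribute instance, so fetch it once before the loop
            CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset(); // must be called before the first incrementToken()
            while (tokenStream.incrementToken()) {
                System.out.println(charTermAttribute.toString());
            }
            tokenStream.end(); // must be called after the last token has been consumed
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        String text = "2小时前 - 谈起对中国人none的认同,侯汉廷认为,这与家庭和小时候的教育有很大关系。";
        Analyzer analyzer = new SmartChineseAnalyzer();
        AnalyzeDemo demo = new AnalyzeDemo();
        demo.analyze(analyzer, text);
    }
}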
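By swapping in different analyzers and comparing the tokens each one produces for the same text, you can pick the analyzer that works best.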
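The comparison can be sketched as a small harness that runs several tokenizers over the same text and prints the results side by side. This is only an illustration under stated assumptions: to keep it runnable without Lucene on the classpath, each tokenizer is modeled as a `Function<String, List<String>>`, and `perCharacter` and `bigrams` are hypothetical stand-ins (loosely modeled on unigram and bigram CJK tokenization), not Lucene analyzers. In practice each function would wrap an `Analyzer` and collect the terms with the `incrementToken()` loop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class AnalyzerComparison {

    // Hypothetical stand-in tokenizer: one token per non-whitespace character.
    static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Hypothetical stand-in tokenizer: overlapping pairs of adjacent characters.
    static List<String> bigrams(String text) {
        List<String> chars = perCharacter(text);
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < chars.size(); i++) {
            tokens.add(chars.get(i) + chars.get(i + 1));
        }
        return tokens;
    }

    // Runs every named tokenizer over the same text; results keep insertion order.
    static Map<String, List<String>> compare(
            Map<String, Function<String, List<String>>> tokenizers, String text) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (Map.Entry<String, Function<String, List<String>>> entry : tokenizers.entrySet()) {
            results.put(entry.getKey(), entry.getValue().apply(text));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Function<String, List<String>>> tokenizers = new LinkedHashMap<>();
        tokenizers.put("per-character", AnalyzerComparison::perCharacter);
        tokenizers.put("bigram", AnalyzerComparison::bigrams);

        // Print each tokenizer's name, token count, and tokens for the same input.
        compare(tokenizers, "家庭教育").forEach((name, tokens) ->
            System.out.println(name + " (" + tokens.size() + " tokens): "
                + String.join(" | ", tokens)));
    }
}
```

Printing the token count next to the tokens makes over-segmentation easy to spot: for the same input, the per-character stand-in yields four tokens and the bigram stand-in three.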