Lucene 3.0.2: analyzing tokens with the built-in analyzers

Original article: https://blog.csdn.net/zpf1217/article/details/5861623

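The code below shows how the four analyzers bundled with Lucene (WhitespaceAnalyzer, SimpleAnalyzer, StopAnalyzer and StandardAnalyzer) differ when they analyze the same text.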

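Note: starting with 2.9, Lucene introduced a new mechanism for tokens... The material I was using is the 2006 edition of Lucene in Action, so all of its code is the old style... how sad... I had to dig through a lot of API docs before I got it working with the new implementation...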

A new TokenStream API has been introduced with Lucene 2.9. This API has moved from being Token-based to Attribute-based. While Token still exists in 2.9 as a convenience class, the preferred way to store the information of a Token is to use AttributeImpls.
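To make the "Attribute-based" idea in the quote concrete, here is a minimal sketch of the producer side, assuming Lucene 3.0.x on the classpath. The classes UpperCaseDemoFilter and UpperCaseDemo are hypothetical names I made up for illustration. The filter never creates Token objects. It looks up the same TermAttribute instance the tokenizer writes into and changes its character buffer in place.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

// Hypothetical example filter (not part of Lucene): upper-cases every term in place.
final class UpperCaseDemoFilter extends TokenFilter {
    // Same TermAttribute instance the upstream tokenizer fills in,
    // because TokenFilter shares its attributes with its input stream
    private final TermAttribute termAtt;

    UpperCaseDemoFilter(TokenStream input) {
        super(input);
        termAtt = addAttribute(TermAttribute.class);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false; // upstream has no more tokens
        }
        // Modify the current term directly in the shared buffer
        char[] buffer = termAtt.termBuffer();
        int length = termAtt.termLength();
        for (int i = 0; i < length; i++) {
            buffer[i] = Character.toUpperCase(buffer[i]);
        }
        return true;
    }
}

public class UpperCaseDemo {
    // Hypothetical helper: collects the terms produced by the filter chain
    static List<String> tokens(String text) throws IOException {
        TokenStream stream = new UpperCaseDemoFilter(
                new WhitespaceTokenizer(new StringReader(text)));
        TermAttribute termAtt = stream.addAttribute(TermAttribute.class);
        List<String> result = new ArrayList<String>();
        while (stream.incrementToken()) {
            result.add(termAtt.term());
        }
        stream.end();
        stream.close();
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens("The quick brown fox")); // prints [THE, QUICK, BROWN, FOX]
    }
}
```

Because the consumer calls addAttribute on the filter, it gets back the very object the filter modified. That shared object is what replaces passing Token instances around in the old API.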

Here is the code:

```java
package org.apache.lucene.demo;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class AnalyseDemo {

    // Sample texts to run through every analyzer
    private static final String[] examples = {
        "The quick brown fox jumped over the lazy dogs",
        "XY&Z Corporation - xyz@yahoo.com"
    };

    // The four analyzers bundled with Lucene
    private static final Analyzer[] analyzers = new Analyzer[] {
        new WhitespaceAnalyzer(),
        new SimpleAnalyzer(),
        new StopAnalyzer(Version.LUCENE_29),
        new StandardAnalyzer(Version.LUCENE_29)
    };

    public static void main(String[] args) {
        for (String text : examples) {
            analyze(text);
        }
    }

    public static void analyze(String text) {
        System.out.println("analyzing : " + text);
        for (Analyzer analyzer : analyzers) {
            String name = analyzer.getClass().getName();
            System.out.println("full name: " + name);
            // Keep only the simple class name, e.g. "StopAnalyzer"
            name = name.substring(name.lastIndexOf('.') + 1);
            System.out.println("name: " + name);
            AnalyzerUtils.displayTokens(analyzer, text);
        }
    }
}

class AnalyzerUtils {

    public static void displayTokens(Analyzer analyzer, String text) {
        // "contents" is just a placeholder field name
        TokenStream stream = analyzer.tokenStream("contents", new StringReader(text));
        // New (2.9+) API: register the attribute once and keep a local reference;
        // each incrementToken() call updates its contents in place
        TermAttribute termAtt = stream.addAttribute(TermAttribute.class);
        try {
            while (stream.incrementToken()) {
                System.out.println(termAtt.term());
            }
            stream.end();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Release the stream's resources when done
            try {
                stream.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
```
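The demo above only reads TermAttribute. Other attributes can be registered on the same stream in exactly the same way. The sketch below, again assuming Lucene 3.0.x, also reports each token's character offsets, type and position increment. TokenDetailsDemo and describeTokens are hypothetical names of my own, not part of Lucene.

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.apache.lucene.util.Version;

public class TokenDetailsDemo {

    // Hypothetical helper: formats each token as "term[start-end,type,+posInc]"
    static String describeTokens(Analyzer analyzer, String text) throws IOException {
        TokenStream stream = analyzer.tokenStream("contents", new StringReader(text));
        // Register every attribute we want to read before consuming the stream
        TermAttribute term = stream.addAttribute(TermAttribute.class);
        OffsetAttribute offset = stream.addAttribute(OffsetAttribute.class);
        TypeAttribute type = stream.addAttribute(TypeAttribute.class);
        PositionIncrementAttribute posInc =
                stream.addAttribute(PositionIncrementAttribute.class);
        StringBuilder sb = new StringBuilder();
        try {
            while (stream.incrementToken()) {
                sb.append(term.term()).append('[')
                  .append(offset.startOffset()).append('-')
                  .append(offset.endOffset()).append(',')
                  .append(type.type()).append(",+")
                  .append(posInc.getPositionIncrement()).append("] ");
            }
            stream.end();
        } finally {
            stream.close();
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
        System.out.println(describeTokens(analyzer, "The quick brown fox"));
    }
}
```

With StandardAnalyzer, "The" is removed as a stop word, so the output starts at quick, whose offsets are 4-9 in the input string.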

I won't explain much; study it yourself, heh.

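PS: If you want to learn about search engines, the two books I'm reading right now are really good. One is Nutch+Lucene搜索引擎开发 (Nutch+Lucene Search Engine Development), which teaches you how to get started configuring a search engine. It's very detailed, and I got mine working, haha. The other is Lucene in Action, which has more specialized material and is a must-read if you want to dig deep into Lucene. It's just that my copy is the 2006 edition, how sad... I don't know whether there is a newer edition...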

PS2: Open source is so damn great!!!