Handling Chinese with Lucene and IKAnalyzer: Indexing and Search Examples [Continued]

Copyright notice: This is an original article by the author, licensed under the CC 4.0 BY-SA agreement. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/cesul/article/details/83808379
Versions: Lucene 3.0.2, IKAnalyzer 3.2.0
The previous post showed how to build an index for Chinese text with Lucene and IKAnalyzer; this one discusses how to search against that index.
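If you do not have the index from the previous post at hand, here is a minimal recap sketch that builds one. It is not the original post's code; the F:\indexDir path, the "text" field name, and the sample sentence are simply chosen to match what the Searcher below expects.

import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class Indexer {
    public static void main(String[] args) throws Exception {
        Directory directory = FSDirectory.open(new File("F:\\indexDir"));
        // IndexWriter with IKAnalyzer as the analyzer; create=true overwrites
        // any existing index, MaxFieldLength.LIMITED caps tokens per field
        IndexWriter writer = new IndexWriter(directory, new IKAnalyzer(),
                true, IndexWriter.MaxFieldLength.LIMITED);
        Document doc = new Document();
        // Sample content is illustrative; store and analyze the "text" field
        doc.add(new Field("text", "Lucene是一个全文检索引擎工具包",
                Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.optimize();
        writer.close();
        directory.close();
    }
}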
The search program (Searcher.java) uses IKQueryParser, a query parser bundled with IKAnalyzer, to parse the keyword and build the query.
Under the general retrieval model, the keyword itself participates in scoring as a kind of special Document, so how well the keyword is parsed directly affects the quality of the results. Since the author of IKAnalyzer "emphatically recommends" IKQueryParser over Lucene's built-in query parser, it is worth trying (see the comparison sketch after the Searcher class below).
import java.io.File;
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.wltea.analyzer.lucene.IKSimilarity;
import org.wltea.analyzer.lucene.IKQueryParser;

/** Search module */
public class Searcher {
    private File indexDir = new File("F:\\indexDir");

    public void search(String fieldName, String keyword) {
        Directory directory = null;
        IndexSearcher is = null;
        try {
            // Open the index and instantiate the searcher (read-only)
            directory = FSDirectory.open(indexDir);
            is = new IndexSearcher(directory, true);
            // Use the IKSimilarity scorer shipped with IKAnalyzer
            is.setSimilarity(new IKSimilarity());

            // Build the Query with IKQueryParser, IKAnalyzer's own query parser
            Query query = IKQueryParser.parse(fieldName, keyword);

            // Retrieve the 50 highest-scoring hits
            TopDocs topDocs = is.search(query, 50);
            System.out.println("Total hits: " + topDocs.totalHits);

            // Print results; iterate over scoreDocs.length rather than
            // totalHits, which may exceed the 50 documents actually returned
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;
            for (int i = 0; i < scoreDocs.length; i++) {
                Document targetDoc = is.doc(scoreDocs[i].doc);
                System.out.println("Score: " + scoreDocs[i].score
                        + "\tDocument: " + targetDoc.toString());
            }
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (is != null) {
                try { is.close(); }
                catch (IOException e) { e.printStackTrace(); }
            }
            if (directory != null) {
                try { directory.close(); }
                catch (IOException e) { e.printStackTrace(); }
            }
        }
    }

    public static void main(String[] args) {
        Searcher search = new Searcher();
        search.search("text", "lucene");
    }
}
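To see what the recommendation buys in practice, here is a minimal sketch that prints, side by side, the Query produced by Lucene's built-in QueryParser (driven by IKAnalyzer) and the one produced by IKQueryParser for the same keyword. The field name and sample keyword are illustrative. For Chinese text with ambiguous segmentations the two parsers generally build different query structures, which you can inspect directly in the output.

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;
import org.wltea.analyzer.lucene.IKQueryParser;

public class ParserComparison {
    public static void main(String[] args) throws Exception {
        String field = "text";             // illustrative field name
        String keyword = "中华人民共和国";   // illustrative keyword
        // Lucene's built-in parser, fed tokens by IKAnalyzer
        Analyzer analyzer = new IKAnalyzer();
        QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);
        Query luceneQuery = parser.parse(keyword);
        // IKAnalyzer's own query parser
        Query ikQuery = IKQueryParser.parse(field, keyword);
        System.out.println("QueryParser  : " + luceneQuery.toString(field));
        System.out.println("IKQueryParser: " + ikQuery.toString(field));
    }
}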