基于Lucene多索引进行索引和搜索

Lucene支持在多个索引目录中创建和存储索引,并通过ParallelMultiSearcher或MultiSearcher实现全局得分计算。博客内容介绍了如何将索引随机分布到多个目录,以及如何处理索引和搜索,确保搜索时能聚合所有索引数据进行匹配和得分计算。此外,还讨论了索引更新时防止数据重复的方法以及搜索实现的两种方式。
摘要由CSDN通过智能技术生成

Lucene支持创建多个索引目录,同时存储多个索引。我们可能担心的问题是,在索引的过程中,分散地存储到多个索引目录中,是否在搜索时能够得到全局的相关度计算得分,其实Lucene的ParallelMultiSearcher和MultiSearcher支持全局得分的计算,也就是说,虽然索引分布在多个索引目录中,在搜索的时候还会将全部的索引数据聚合在一起进行查询匹配和得分计算。


索引目录处理


下面我们通过将索引随机地分布到以a~z的26个目录中,并实现一个索引和搜索的程序,来验证一下Lucene得分的计算。

首先,实现一个用来构建索引目录以及处理搜索的工具类,代码如下所示:

package org.shirdrn.lucene;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.Similarity;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.shirdrn.lucene.MultipleIndexing.IndexWriterObj;

/**
 * Indexing accross multiple Lucene indexes.
 * 
 * @author shirdrn
 * @date   2011-12-12
 */
public class IndexHelper {
	
	private static WriterHelper writerHelper = null;
	private static SearcherHelper searcherHelper = null;
	
	public static WriterHelper newWriterHelper(String root, IndexWriterConfig indexConfig) {
		return WriterHelper.newInstance(root, indexConfig);
	}
	
	public static SearcherHelper newSearcherHelper(String root, IndexWriterConfig indexConfig) {
		return SearcherHelper.newInstance(root, indexConfig);
	}

	protected static class WriterHelper {
		private String alphabet = "abcdefghijklmnopqrstuvwxyz";
		private Lock locker = new ReentrantLock();
		private String indexRootDir = null;
		private IndexWriterConfig indexConfig;
		private Map<Character, IndexWriterObj> indexWriters = new HashMap<Character, IndexWriterObj>();
		private static Random random = new Random();
		private WriterHelper() {
			
		}
		private synchronized static WriterHelper newInstance(String root, IndexWriterConfig indexConfig) {
			if(writerHelper==null) {
				writerHelper = new WriterHelper();
				writerHelper.indexRootDir = root;
				writerHelper.indexConfig = indexConfig;
			}
			return writerHelper;
		}
		public IndexWriterObj selectIndexWriter() {
			int pos = random.nextInt(alphabet.length());
			char ch = alphabet.charAt(pos);
			String dir = new String(new char[] {ch});
			locker.lock();
			try {
				File path = new File(indexRootDir, dir);
				if(!path.exists()) {
					path.mkdir();
				}
				if(!indexWriters.containsKey(ch)) {
					IndexWriter indexWriter = new IndexWriter(FSDirectory.open(path), indexConfig.getAnalyzer(), MaxFieldLength.UNLIMITED);
					indexWriters.put(ch, new IndexWriterObj(indexWriter, dir));
				}
			} catch (CorruptIndexException e) {
				e.printStackTrace();
			} catch (LockObtainFailedException e) {
				e.printStackTrace();
			} catch (IOException e) {
				e.printStackTrace();
			} finally {
				locker.unlock();
			}
			return indexWriters.get(ch);
		}
		@SuppressWarnings("deprecation")
		public void closeAll(boolean autoOptimize) {
			Iterator<Map.Entry<Character, IndexWriterObj>> iter = indexWriters.entrySet().iterator();
			while(iter.hasNext()) {
				Map.Entry<Character, IndexWriterObj> entry = iter.next();
				try {
					if(autoOptimize) {
						entry.getValue().indexWriter.optimize();
					}
					entry.getValue().indexWriter.close();
				} catch (CorruptIndexException e) {
					e.printStackTrace();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
	}
	
	protected static class SearcherHelper {
		private List<IndexSearcher> searchers = new ArrayList<IndexSearcher>();
		private Similarity similarity = new DefaultSimilarity();
		private SearcherHelper() {
			
		}
		private synchronized static SearcherHelper newInstance(String root, IndexWriterConfig indexConfig) {
			if(searcherHelper==null) {
				searcherHelper = new SearcherHelper();
				if(indexConfig.getSimilarity()!=null) {
					searcherHelper.similarity = indexConfig.getSimilarity();
				}
				File indexRoot = new File(root);
				File[] files = indexRoot.listFiles();
				for(File f : files) {
					IndexSearcher searcher = null;
					try {
						searcher = new IndexSearcher(FSDirectory.open(f));
					} catch (CorruptIndexException e) {
						e.printStackTrace();
					} catch (IOException e) {
						e.printStackTrace();
					}
					if(searcher!=null) {
						searcher.setSimilarity(searcherHelper.similarity);
						searcherHelper.searchers.add(searcher);
					}
				}
			}
			return searcherHelper;
		}
		public void closeAll() {
			Iterator<IndexSearcher> iter = searchers.iterator();
			while(iter.hasNext()) {
				try {
					iter.next().close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
		public Searchable[] getSearchers() {
			Searchable[] a = new Searchable[searchers.size()];
			return searchers.toArray(a);
		}
	}
}
由于在索引的时候,同时打开了多个Directory实例,而每个Directory对应一个IndexWriter,我们通过记录a~z这26个字母为每个IndexWriter的名字,将IndexWriter和目录名称包裹在IndexWriterObj类的对象中,便于通过日志看到实际数据的分布。在进行Lucene Document构建的时候,将这个索引目录的名字(a~z字符中之一)做成一个Field。在索引的时候,值需要调用IndexHelper.WriterHelper的selectIndexWriter()方法,即可以自动选择对应的IndexWriter实例去进行索引。

在搜索的时候,通过IndexHelper.SearcherHelper工具来获取多个Searchable实例的数组,调用getSearchers()即可以获取到,提供给MultiSearcher构建搜索。


索引实现


我们的数据源,选择根据指定的查询条件直接从Mon

评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值