A problem with Clustor made it impossible to keep the indexes synchronized across the cluster. The first idea was to use RMI (for two machines) or UDP broadcast (for multiple machines) as communication middleware to push index updates around the cluster, but testing showed that approach to be comparatively slow. So we took a different route: merging the index files directly, which consolidates the indexes quickly and keeps synchronization time short. The code is as follows:
package com.pccw;

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class AdvancedTextFileIndexer {

    /**
     * Merges a small index into a larger index.
     *
     * @author Shane Zhao, merge Index in PCCW BJDEV
     * @param from the index directory to be merged into {@code to}
     * @param to   the index directory that {@code from} is merged into
     * @param sa   the analyzer used by the target IndexWriter
     */
    private static void mergeIndex(File from, File to, StandardAnalyzer sa) {
        IndexWriter indexWriter = null;
        try {
            System.out.println("Merging index files...");
            // Open the existing target index for appending (create == false)
            indexWriter = new IndexWriter(to, sa, false);
            // Relax the merge and buffer limits so the merge runs in as few passes as possible
            indexWriter.setMergeFactor(100000);
            indexWriter.setMaxFieldLength(Integer.MAX_VALUE);
            indexWriter.setMaxBufferedDocs(Integer.MAX_VALUE);
            indexWriter.setMaxMergeDocs(Integer.MAX_VALUE);
            // Add the source index, then optimize the merged result
            FSDirectory[] fs = { FSDirectory.getDirectory(from, false) };
            indexWriter.addIndexes(fs);
            indexWriter.optimize();
            System.out.println("Merge finished.");
        } catch (Exception e) {
            System.out.println("Error while merging indexes!");
            e.printStackTrace();
        } finally {
            try {
                if (indexWriter != null) {
                    indexWriter.close();
                }
            } catch (Exception e) {
                // ignore failures on close
            }
        }
    }

    public static void main(String[] args) {
        File from = new File("F:/web/faq/lucene/indexDir");
        File to = new File("F:/indexDir");
        mergeIndex(from, to, new StandardAnalyzer());
    }
}
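The code above targets the old Lucene 2.x API. On a recent Lucene release the same merge is expressed through IndexWriterConfig, addIndexes(Directory...) and forceMerge. The following is a minimal sketch, assuming Lucene 5.x or later; the class name ModernIndexMerger is hypothetical and the paths simply mirror the example above:

package com.pccw;

import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ModernIndexMerger {

    public static void main(String[] args) throws Exception {
        // Source and target directories mirror the example above; adjust to your environment.
        try (Directory from = FSDirectory.open(Paths.get("F:/web/faq/lucene/indexDir"));
             Directory to = FSDirectory.open(Paths.get("F:/indexDir"))) {

            IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer())
                    .setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);

            try (IndexWriter writer = new IndexWriter(to, cfg)) {
                writer.addIndexes(from); // copy the source index's segments into the target
                writer.forceMerge(1);    // optional: collapse into one segment (successor of optimize())
            }
        }
    }
}

Compared with the 2.x version, the writer and directories are closed automatically by try-with-resources, and the buffer/merge tuning calls are no longer needed for a one-off merge.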
In testing, merging two 150 MB index files took about 10-15 seconds, which is quite satisfactory.