Lucene in Practice

xi_han

Tags: lucene exception blog string full-text-search search

Article link: https://blog.csdn.net/xi_han/article/details/1814936

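Having just finished a project, I looked back over the whole development process and decided to write up the caching issues and how we used Lucene (full-text search).

Let's start with Lucene.

The service layer of the project was built with Spring + Hibernate, and Lucene handled full-text search. The Lucene version was 2.2, tokenization used MMAnalyzer from JE-Analysis 1.5.1, and index building was driven by a queue.

We first initialize the index path in the blog service; its value is set in the Spring configuration file.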

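The main requirement of the project was performance, so documents are indexed from a queue rather than inside the request. Request threads add to the queue while the indexing thread drains it, so the list is wrapped to make it thread-safe: List waitToIndexList = Collections.synchronizedList(new LinkedList());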
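As an illustration of the queue handoff described here, the following is a minimal sketch using java.util.concurrent.BlockingQueue instead of a plain List. This is not what the project used; the class IndexQueueSketch, its method names, and the list that stands in for the Lucene index are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a thread-safe handoff between request threads
// and a single indexing thread.
class IndexQueueSketch {
    private final BlockingQueue<String> waitToIndex = new LinkedBlockingQueue<String>();
    // Stands in for the Lucene index; only the indexing thread touches it.
    private final List<String> indexed = new ArrayList<String>();

    // Called by request threads; does not block because the queue is unbounded.
    void enqueue(String blogId) {
        waitToIndex.add(blogId);
    }

    // Called by the indexing thread. Waits up to timeoutMillis for work and
    // returns true if one item was consumed.
    boolean indexNext(long timeoutMillis) {
        try {
            String blogId = waitToIndex.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (blogId == null) {
                return false;
            }
            indexed.add(blogId); // real code would call indexWriter.addDocument(...) here
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            return false;
        }
    }

    List<String> indexedIds() {
        return indexed;
    }
}
```

With a blocking poll the indexing thread wakes up as soon as work arrives, so no fixed sleep interval is needed between checks.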

Creating the index:

IndexWriter indexWriter;

IndexSearcher indexSearcher;

Create the indexing thread: Thread indexThread = new Thread(this);

Instantiate the analyzer: MMAnalyzer analyzer = new MMAnalyzer();

Initialization:

public void init() {
  File rp = new File(indexPathRoot);
  if (!rp.exists()) {
   rp.mkdirs();
  }
  // If segments.gen is present an index already exists,
  // so open it instead of creating a new one.
  File segments = new File(indexPathRoot + File.separator
    + "segments.gen");
  boolean bCreate = true;
  if (segments.exists()) {
   bCreate = false;
  }
  try {
   indexWriter = new IndexWriter(indexPathRoot, analyzer, bCreate);
   indexSearcher = new IndexSearcher(indexPathRoot);
  } catch (Exception e) {
   logger.error("init indexWriter fail", e);
   return; // do not start the indexing thread without a writer
  }
  indexThread.start(); // start the indexing thread
}

public void run() {
  while (!indexThread.isInterrupted()) {
   if (!waitToIndexList.isEmpty()) {
    Blog blog = (Blog) waitToIndexList.remove(0);
    Document doc = new Document();
    // blogID is indexed as a single untokenized term; the text fields are analyzed
    doc.add(new Field("blogID", blog.getBlogID(), Field.Store.YES,
      Field.Index.UN_TOKENIZED));
    doc.add(new Field("title", blog.getTitle(), Field.Store.YES,
      Field.Index.TOKENIZED));
    doc.add(new Field("content", blog.getContent(),
      Field.Store.YES, Field.Index.TOKENIZED));
    doc.add(new Field("author", blog.getClientUser().getNickName(),
      Field.Store.YES, Field.Index.TOKENIZED));

    try {
     indexWriter.addDocument(doc);
     indexWriter.flush();
     // optimize() merges all segments, so it is expensive to call per document
     indexWriter.optimize();
    } catch (Exception e) {
     logger.error("create index error", e);
    }
   }

   try {
    Thread.sleep(50);
   } catch (InterruptedException e) {
    // restore the interrupt flag so the loop condition can end the thread
    Thread.currentThread().interrupt();
   }
  }
}
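The loop above flushes and optimizes after every document. One possible variation, sketched below, drains several queued items at once and pays the expensive step once per batch. This is an assumption about how the cost could be reduced, not something the project did; BatchIndexerSketch and the IndexSink interface are hypothetical stand-ins for the code around IndexWriter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: add a whole batch, then commit once.
class BatchIndexerSketch {
    // Stand-in for the index: add() is cheap, commit() is the expensive step.
    interface IndexSink {
        void add(String doc);
        void commit();
    }

    // Drains up to maxBatch queued items, adds each one, then commits once.
    // Returns the number of items written; an empty queue causes no commit.
    static int indexBatch(BlockingQueue<String> queue, IndexSink sink, int maxBatch) {
        List<String> batch = new ArrayList<String>();
        queue.drainTo(batch, maxBatch);
        for (String doc : batch) {
            sink.add(doc);
        }
        if (!batch.isEmpty()) {
            sink.commit();
        }
        return batch.size();
    }
}
```

The trade-off is latency: a document becomes searchable only after its batch is committed, so maxBatch should stay small if fresh results matter.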

Every time a new Blog object is created, we push it onto the waitToIndexList queue.

public void addBlog(Blog blog) {
  AddBlogTask saveBlogTask = new AddBlogTask(blog, blogDAO);
  asyncService.doTask(saveBlogTask);
  // update the cache
  String k = "KEY_BLOG" + blog.getBlogID();
  cacheService.put(k, blog);
  // queue the blog for indexing
  this.waitToIndexList.add(blog);
}

Searching:

public Hits searchBlogByLucene(String keyword) {
  // First check whether the result is already in the cache.
  Hits hits = (Hits) cacheService.get("BLOG_SEARCH_" + keyword);
  if (hits == null) {
   try {
    MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
      new String[] { "title", "content", "author" }, analyzer);
    Query query = queryParser.parse(keyword);
    hits = indexSearcher.search(query);
   } catch (Exception e) {
    logger.error("search " + keyword, e);
   }
   // On a cache miss, store the search result in the cache
   // (skipped when the search failed and hits is still null).
   if (hits != null) {
    cacheService.put("BLOG_SEARCH_" + keyword, hits);
   }
  }
  return hits;
} // Flyweight pattern
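The lookup order used here (cache first, search on a miss, then store the result) can be sketched independently of Lucene. The version below caches a plain list of IDs instead of a Hits object and never stores a failed search. SearchCacheSketch, IdSearcher, and invalidateAll are hypothetical names, and the ConcurrentHashMap merely stands in for the project's cacheService.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the cache lookup; none of these names come from the project.
class SearchCacheSketch {
    // Stand-in for the Lucene search; returns null when the search fails.
    interface IdSearcher {
        List<String> search(String keyword);
    }

    private final Map<String, List<String>> cache =
            new ConcurrentHashMap<String, List<String>>();
    private final IdSearcher searcher;

    SearchCacheSketch(IdSearcher searcher) {
        this.searcher = searcher;
    }

    // Returns cached IDs when present; otherwise searches and caches
    // the result only if the search succeeded.
    List<String> searchIds(String keyword) {
        String key = "BLOG_SEARCH_" + keyword;
        List<String> ids = cache.get(key);
        if (ids == null) {
            ids = searcher.search(keyword);
            if (ids != null) {
                cache.put(key, ids);
            }
        }
        return ids;
    }

    // Could be called after new documents are indexed so results do not go stale.
    void invalidateAll() {
        cache.clear();
    }
}
```

Two threads that miss on the same keyword at the same time may both run the search; for a read-mostly cache that duplication is usually acceptable.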

Search results:

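public List searchBlogs(String keyword, int off, int max) {
  Hits hits = searchBlogByLucene(keyword);
  if (hits == null) {
   // the search failed; return an empty result instead of throwing
   return Collections.EMPTY_LIST;
  }
  // collect the stored blogID of every hit
  String[] ids = new String[hits.length()];
  for (int i = 0; i < hits.length(); i++) {
   try {
    Document docTemp = hits.doc(i);
    ids[i] = docTemp.get("blogID");
   } catch (Exception e) {
    logger.error("read hit " + i + " for " + keyword, e);
   }
  }
  // load the requested page of Blog objects from the database
  List hitsList = blogDAO.getBlogsByBlogIDS(ids, off, max);
  return hitsList;
}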
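The method above hands every matching ID to the DAO together with off and max; how getBlogsByBlogIDS applies the paging is not shown. As one possible approach, the ID array could be sliced to the requested page before the database is queried. The helper below is a hypothetical sketch of that slicing, not the project's DAO logic.

```java
import java.util.Arrays;

// Hypothetical sketch: cut the requested page out of the full ID array.
class HitPagingSketch {
    // Returns at most max IDs starting at offset off.
    // Out-of-range arguments yield an empty array instead of an exception.
    static String[] page(String[] ids, int off, int max) {
        if (ids == null || off < 0 || max <= 0 || off >= ids.length) {
            return new String[0];
        }
        // Compare against the remaining length to avoid overflow in off + max.
        int end = (max > ids.length - off) ? ids.length : off + max;
        return Arrays.copyOfRange(ids, off, end);
    }
}
```

Slicing first keeps the database query small even when a keyword matches many documents.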

Size of the result set:

public int getSearchBlogsCount(String keyword) {
  Hits hits = searchBlogByLucene(keyword);
  if (hits != null) {
   return hits.length();
  } else {
   return 0;
  }
}