Lucene Index Creation and Search: Learning Lucene (Part 1)

ruanjianzhong

Category: Lucene. Tags: lucene, full-text search, C, C++, C#

Article link: https://blog.csdn.net/ruanjianzhong/article/details/83614411

Creating the index:

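// The imports assume the Lucene 2.x API that the rest of this post uses.
import java.io.File;
import java.io.IOException;
import java.util.Date;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.LockObtainFailedException;

public static void createSearch() throws CorruptIndexException, LockObtainFailedException, IOException
{
  // Directory that holds the text files to be indexed
  File fileDir = new File("C:\\lucenetest");
  // Directory where the index files are written
  File indexDir = new File("C:\\luceneindex");
  Analyzer luceneAnalyzer = new StandardAnalyzer();
  // true = create a new index, replacing any index already in indexDir
  IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, true);

  // listFiles() returns null when fileDir does not exist or is not a directory
  File[] textFiles = fileDir.listFiles();
  long startTime = new Date().getTime();

  // Add one Document per .txt file to the index
  for (int i = 0; textFiles != null && i < textFiles.length; i++)
  {
    if (textFiles[i].isFile() && textFiles[i].getName().endsWith(".txt"))
    {
      System.out.println("File " + textFiles[i].getCanonicalPath() + " is being indexed");
      String temp = FileReaderAll(textFiles[i].getCanonicalPath(), "GBK");
      System.out.println("temp= " + temp);
      Document document = new Document();
      // "path" is stored so it can be shown in results, but it is not searchable
      Field fieldPath = new Field("path", textFiles[i].getPath(), Field.Store.YES, Field.Index.NO);
      // "body" is stored, tokenized by the analyzer, and keeps term positions and offsets
      Field fieldBody = new Field("body", temp, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS_OFFSETS);

      document.add(fieldPath);
      document.add(fieldBody);
      indexWriter.addDocument(document);
    }
  }
  // optimize() merges the index segments to make searching faster
  indexWriter.optimize();
  indexWriter.close();

  // Report how long indexing took
  long endTime = new Date().getTime();
  System.out.println("Indexing took " + (endTime - startTime) + " ms for the documents in " + fileDir.getPath());
}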

==================================================================

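import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public static String FileReaderAll(String FileName, String charset) throws IOException
{
    BufferedReader read = new BufferedReader(new InputStreamReader(new FileInputStream(FileName), charset));
    StringBuilder temp = new StringBuilder();
    try
    {
      String line;
      while ((line = read.readLine()) != null)
      {
        // readLine() strips the line terminator; put one back so that the last word
        // of a line is not glued to the first word of the next line
        temp.append(line).append('\n');
      }
    }
    finally
    {
      // Close the reader even if reading fails
      read.close();
    }
    return temp.toString();
}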

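FileReaderAll reads the entire contents of one file into a single string, decoded with the given charset. createSearch() calls it once for every .txt file in the directory.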
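For comparison, on Java 7 or later the same whole-file read can be sketched with java.nio.file. This is only an illustration: the class name WholeFileReader and both method names are hypothetical, and the demo uses UTF-8 where the post uses "GBK".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of the original post.
class WholeFileReader {
    // Reads every byte of the file and decodes it with the named charset.
    // Line breaks inside the file are kept as they are.
    static String readAll(String fileName, String charsetName) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(fileName));
        return new String(bytes, Charset.forName(charsetName));
    }

    // Demo helper: writes text to a temporary file and reads it back.
    static String roundTrip(String text, String charsetName) {
        try {
            Path tmp = Files.createTempFile("lucene-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(Charset.forName(charsetName)));
                return readAll(tmp.toString(), charsetName);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("first line\nsecond line", "UTF-8"));
    }
}
```

Reading the bytes in one call avoids the line-by-line loop, at the cost of holding the whole file in memory, which the original helper does as well.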

====================================================================

Building the index: to index documents, Lucene provides five basic classes: Document, Field, IndexWriter, Analyzer, and Directory.

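Document:

A Document describes a document, which here can be an HTML page, an e-mail, or a text file. A Document object is made up of multiple Field objects. You can think of a Document as a record in a database, with each Field being one column of that record.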

Field: A Field object describes one attribute of a document. For example, the subject and the body of an e-mail can be described by two separate Field objects.
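The record-and-column analogy can be made concrete with a small model. The sketch below is not Lucene code: ToyDocument and ToyField are hypothetical names, and the two boolean flags only loosely imitate the Field.Store and Field.Index options passed in createSearch().

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a document is an ordered list of named fields.
class ToyDocument {
    // One named attribute of a document (hypothetical, not Lucene's Field).
    static final class ToyField {
        final String name;
        final String value;
        final boolean stored;   // the value can be read back from a result
        final boolean indexed;  // the value can be searched

        ToyField(String name, String value, boolean stored, boolean indexed) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    private final List<ToyField> fields = new ArrayList<>();

    void add(ToyField field) {
        fields.add(field);
    }

    // Returns the value of the first stored field with this name, or null.
    String get(String name) {
        for (ToyField f : fields) {
            if (f.stored && f.name.equals(name)) {
                return f.value;
            }
        }
        return null;
    }

    // Names of the fields that a search could match against.
    List<String> searchableFields() {
        List<String> names = new ArrayList<>();
        for (ToyField f : fields) {
            if (f.indexed) {
                names.add(f.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument();
        doc.add(new ToyField("path", "C:\\lucenetest\\a.txt", true, false));
        doc.add(new ToyField("body", "hello lucene", true, true));
        System.out.println(doc.get("path"));
        System.out.println(doc.searchableFields());
    }
}
```

As in the post's indexing code, "path" is kept only so it can be displayed, while "body" is both kept and searchable.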

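Analyzer: Before a document is indexed, its content must first be split into tokens, and this is the Analyzer's job. Analyzer is an abstract class with multiple implementations; choose the one that suits the language and the application. The Analyzer hands the tokenized content to the IndexWriter, which builds the index.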
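To give a feel for what tokenizing means, here is a minimal sketch of a tokenizer. It is an assumption-laden toy, not StandardAnalyzer: the class name ToyAnalyzer is hypothetical, there is no stop-word list, and the rule of emitting each CJK ideograph as its own token is a simplification chosen for this example rather than a verified description of Lucene's behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tokenizer (hypothetical), for illustration only.
class ToyAnalyzer {
    // Lower-cases letters and digits into word tokens and emits
    // every ideograph as a separate one-character token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isIdeographic(cp)) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                // Whitespace and punctuation end the current word
                flush(word, tokens);
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Moves the pending word, if any, into the token list.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        // \u4e2d\u6587 is the two-character Chinese word for "Chinese"
        System.out.println(tokenize("Hello, Lucene \u4e2d\u6587"));
    }
}
```

Whatever rules the analyzer applies at indexing time decide which terms exist in the index, so a search can only match terms the analyzer actually produced.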

IndexWriter: IndexWriter is the core class Lucene uses to create an index. Its job is to add Document objects to the index one by one.

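Directory: This class represents the location where a Lucene index is stored. It is an abstract class that, at the time of writing, has two implementations: FSDirectory, which represents an index stored in the file system, and RAMDirectory, which represents an index held in memory.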
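Conceptually, adding documents to an index means recording, for each term, which documents contain it. The following sketch shows that idea with a plain in-memory map. It is a toy under stated assumptions: ToyIndexWriter is a hypothetical class, tokens are split on whitespace only, and nothing here reflects how Lucene actually lays out its index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index kept in memory (hypothetical, not Lucene's IndexWriter).
class ToyIndexWriter {
    // term -> ids of the documents that contain it, in insertion order
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Adds one document body and returns the id assigned to it.
    int addDocument(String body) {
        int docId = nextDocId++;
        for (String token : body.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<Integer> docs = postings.computeIfAbsent(token, k -> new ArrayList<>());
            // Record each document at most once per term
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    // Ids of the documents containing the term, or an empty list.
    List<Integer> documentsContaining(String term) {
        return postings.getOrDefault(term, new ArrayList<>());
    }

    public static void main(String[] args) {
        ToyIndexWriter writer = new ToyIndexWriter();
        writer.addDocument("Lucene builds an index");
        writer.addDocument("search the lucene index index");
        System.out.println(writer.documentsContaining("lucene"));
    }
}
```

Keeping the map in memory is the simplest choice for a demo; storing the same structure on disk instead is the kind of difference the Directory abstraction is meant to hide.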

Searching the index:

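Searching with Lucene is just as convenient as building an index. In the previous part we built an index over the text documents in a directory; now we search that index to find the documents that contain a given keyword or phrase. Lucene provides several basic classes for this: IndexSearcher, Term, Query, TermQuery, and Hits. Each is described below.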

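Query: This is an abstract class with multiple implementations, such as TermQuery, BooleanQuery, and PrefixQuery. Its purpose is to wrap the query string entered by the user in a form that Lucene can understand.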
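Of the implementations named above, BooleanQuery is the one that combines other queries. The idea behind that can be sketched on plain sets of document ids: requiring every clause amounts to an intersection, and accepting any clause amounts to a union. The class ToyBooleanQuery and its method names are hypothetical and are not Lucene's API.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Toy combination of two result sets (hypothetical, not Lucene's BooleanQuery).
class ToyBooleanQuery {
    // Documents matched by both clauses.
    static Set<Integer> allOf(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Documents matched by at least one clause.
    static Set<Integer> anyOf(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> first = new TreeSet<>(Arrays.asList(0, 1, 2));
        Set<Integer> second = new TreeSet<>(Arrays.asList(1, 2, 3));
        System.out.println(allOf(first, second)); // prints [1, 2]
        System.out.println(anyOf(first, second)); // prints [0, 1, 2, 3]
    }
}
```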

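Term: Term is the basic unit of search. A Term object consists of two String values. A Term can be created with a single statement: Term term = new Term("fieldName", "queryWord"); The first argument names the Field to search in, and the second is the keyword to search for.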

TermQuery: TermQuery is a subclass of the abstract class Query and is the most basic query type that Lucene supports. A TermQuery is created as follows: TermQuery termQuery = new TermQuery(new Term("fieldName", "queryWord")); Its constructor takes a single argument, a Term object.

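IndexSearcher: IndexSearcher is used to search an index that has already been built. It opens the index read-only, so multiple IndexSearcher instances can work on the same index at the same time.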

Hits: Hits holds the results of a search.
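How these classes fit together can be sketched with a toy lookup. The names ToyTermSearch and ToyTerm are hypothetical, and the map stands in for a real index. One point the sketch is meant to show is that a term query compares the term text exactly as given: no analyzer runs on it, so the text has to match a token that was produced at indexing time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy term lookup (hypothetical, not Lucene's IndexSearcher).
class ToyTermSearch {
    // A (field, text) pair, mirroring the two strings a Term carries.
    static final class ToyTerm {
        final String field;
        final String text;

        ToyTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ToyTerm)) {
                return false;
            }
            ToyTerm other = (ToyTerm) o;
            return field.equals(other.field) && text.equals(other.text);
        }

        @Override
        public int hashCode() {
            return Objects.hash(field, text);
        }
    }

    private final Map<ToyTerm, List<Integer>> postings = new HashMap<>();

    // Registers already-tokenized text of one field for a document.
    void index(int docId, String field, String... tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(new ToyTerm(field, token), k -> new ArrayList<>()).add(docId);
        }
    }

    // Exact lookup: both the field name and the term text must match.
    List<Integer> search(ToyTerm term) {
        return postings.getOrDefault(term, new ArrayList<>());
    }

    public static void main(String[] args) {
        ToyTermSearch searcher = new ToyTermSearch();
        searcher.index(0, "body", "\u4e2d", "\u6587"); // two one-character tokens
        searcher.index(1, "body", "lucene");
        System.out.println(searcher.search(new ToyTerm("body", "\u4e2d")));
        System.out.println(searcher.search(new ToyTerm("body", "Lucene")));
    }
}
```

In the demo the second lookup finds nothing because "Lucene" with a capital letter is not the same term as the indexed "lucene".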

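import java.io.File;
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.FSDirectory;

public static void Search() throws CorruptIndexException, IOException, ParseException
{
      // Location of the index directory
      String filepath = "C:\\luceneindex";
      File indexDir = new File(filepath);
      // Check that the index exists before trying to open it
      if (!indexDir.exists())
      {
         System.out.println("The Lucene index does not exist.");
         return;
      }
      FSDirectory directory = FSDirectory.getDirectory(indexDir);
      IndexSearcher searcher = new IndexSearcher(directory);
      try
      {
         // Find documents whose "body" field contains the term "中"
         Term term = new Term("body", "中");
         TermQuery luceneQuery = new TermQuery(term);
         Hits hit = searcher.search(luceneQuery);
         for (int i = 0; i < hit.length(); i++)
         {
            // Print the stored path and body of each matching document
            Document document = hit.doc(i);
            System.out.println("File: " + document.get("path"));
            System.out.println(document.get("body"));
         }
      }
      finally
      {
         // Release the files held open by the searcher
         searcher.close();
      }
}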