Learning Lucene: How to Create an Index

woshixhw

Category: Search Engines. Tags: lucene, string, path, storage, file, class

Article link: https://blog.csdn.net/woshixhw/article/details/5933257

Original source: http://blog.sina.com.cn/s/blog_694448320100lye3.html

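1: Understanding the index creation process

Creating an index can be compared to compiling an anthology. An anthology contains many articles, and each article has a title, a body, an author name, a writing date, and so on.

First, give each article its title, body, writing date, and other information, so that each article is complete.

Then add each article to the book.

With that, the anthology is finished.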

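The index creation process is as follows:

1. Create the index writer, IndexWriter, which corresponds to the framework of the book.

2. Create a document object, Document, which corresponds to one article.

3. Create field objects, Field, which correspond to the different pieces of information in an article (title, body).

4. Add the Fields to the Document.

5. Add the Document to the IndexWriter.

6. Close the IndexWriter.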

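This boils down to three steps:

First: create Fields to wrap the different pieces of information in an article.

Second: group several Fields into one Document, which completes the wrapping of one article.

Third: group several Documents into one IndexWriter, that is, assemble the articles, which finally forms the index.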

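The concrete methods for creating an index are described below.

① There are many ways to create a Field; the following is the most common one.

Field field = new Field(field name, field content, storage mode, index mode);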

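Parameters:

The field name is the name given to the Field, similar to a column name in a database table.

The field content is the content of the Field, similar to a column value in a database table.

There are three storage modes:

not stored (Field.Store.NO), stored in full (Field.Store.YES), and stored compressed (Field.Store.COMPRESS)

Note: if the Field content is not too large, store it in full; otherwise use compressed storage.

There are four index modes:

not indexed (Field.Index.NO), indexed without tokenizing and without storing norms (Field.Index.NO_NORMS), indexed without tokenizing (Field.Index.UN_TOKENIZED), and tokenized and indexed (Field.Index.TOKENIZED)

Note: article titles and full text are usually searched by fuzzy matching, so fields that need fuzzy search use Field.Index.TOKENIZED. Author names are usually searched by exact match, so fields that need exact search use Field.Index.UN_TOKENIZED. Fields that only need to be displayed alongside the search results, and never need to be searched by content, use Field.Index.NO.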
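To make the difference between the index modes concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: ToyIndex, IndexMode, and every method below are hypothetical names invented for illustration, the tokenizer is a simple lowercase split rather than a real analyzer, and the author name "Zhang San" is made-up sample data. It only aims to show why a tokenized field matches single words, an untokenized field matches only its whole value, and an unindexed field can be displayed but not searched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: none of these names are Lucene APIs.
class ToyIndex {
    enum IndexMode { NO, UN_TOKENIZED, TOKENIZED }

    // "field:term" -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // stored field values per document (only fields added with store == true)
    private final List<Map<String, String>> stored = new ArrayList<>();

    // Starts a new, empty document and returns its id.
    int newDocument() {
        stored.add(new HashMap<>());
        return stored.size() - 1;
    }

    void addField(int docId, String name, String value, boolean store, IndexMode mode) {
        if (store) {
            stored.get(docId).put(name, value);
        }
        switch (mode) {
            case NO:
                break; // searchable by nothing; only retrievable if stored
            case UN_TOKENIZED:
                post(name, value, docId); // the whole value is one term
                break;
            case TOKENIZED:
                // crude stand-in for an analyzer: lowercase, split on non-alphanumerics
                for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                    if (!token.isEmpty()) {
                        post(name, token, docId);
                    }
                }
                break;
        }
    }

    private void post(String field, String term, int docId) {
        postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(docId);
    }

    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Collections.emptySet());
    }

    String storedValue(int docId, String field) {
        return stored.get(docId).get(field);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int doc = index.newDocument();
        index.addField(doc, "title", "i love china", true, IndexMode.TOKENIZED);
        index.addField(doc, "author", "Zhang San", true, IndexMode.UN_TOKENIZED);
        index.addField(doc, "time", "2007-05-31", true, IndexMode.NO);

        System.out.println(index.search("title", "love"));       // [0]  a single word matches
        System.out.println(index.search("author", "Zhang"));     // []   part of the value does not match
        System.out.println(index.search("author", "Zhang San")); // [0]  only the exact value matches
        System.out.println(index.search("time", "2007-05-31"));  // []   the field is not indexed
        System.out.println(index.storedValue(doc, "time"));      // 2007-05-31  but it is stored
    }
}
```

In this sketch, storing and indexing are independent choices, which mirrors how the storage mode and index mode are separate parameters of the Field constructor described above.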

② A Document is created as follows:

Document doc = new Document();

This creates an empty Document that contains no Fields.

To add a Field to the Document, just use the add method. For example:

doc.add(field);

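③ There are many ways to create an IndexWriter; the following is the most common one:

IndexWriter writer = new IndexWriter(index storage path, analyzer instance);

Index storage path: a path on the physical disk, such as d:/aa.

Analyzer instance: an analyzer is a lexical analyzer. There are analyzers for English, Chinese, and other languages, and you should choose one to suit the situation. Common analyzers include StandardAnalyzer (standard analyzer), CJKAnalyzer (bigram tokenizer), ChineseAnalyzer (Chinese analyzer), and FrenchAnalyzer (French analyzer).

A small example:

IndexWriter writer = new IndexWriter("D:/AA", new ChineseAnalyzer());

A newly constructed IndexWriter is an empty index writer. To add a Document to the index, use the addDocument method, for example:

writer.addDocument(doc);

Finally, don't forget to close the index writer.

writer.close();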

Now let's create a simple index:

//BasicIndexer.java
package tianen;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class BasicIndexer
{
    public static void main(String[] args) throws java.io.IOException
    {
        String indexPath = "index";

        // IndexWriter
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer());

        // Document
        Document doc = new Document();

        // Field - title
        String title = "i love china";
        Field field = new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED);
        // add field
        doc.add(field);

        // Field - content
        String content = "i love you, my mother land! ";
        field = new Field("content", content, Field.Store.YES, Field.Index.TOKENIZED);
        // add field
        doc.add(field);

        // add document
        writer.addDocument(doc);

        // close IndexWriter
        writer.close();

        // message
        System.out.println("Index Created!");
    }
}
Running it prints:

Index Created!

After it runs successfully, several new files appear in the index directory. These are the index files, which means the index was created.

That was a simple index. Next, let's create a slightly more complex one:

//ThreeIndexer.java
package tianen;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class ThreeIndexer
{
    public static void main(String[] args) throws java.io.IOException
    {
        String indexPath = "three";

        // create IndexWriter
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer());

        // create Document 1
        Document doc = new Document();

        // create Field - title
        String title = "i love china";
        Field field = new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED);
        // add field
        doc.add(field);

        // create Field - content
        String content = "i love you, my mother land! ";
        field = new Field("content", content, Field.Store.YES, Field.Index.TOKENIZED);
        // add field
        doc.add(field);

        // create Field - time (stored for display only, not indexed)
        String time = "2007-05-31";
        field = new Field("time", time, Field.Store.YES, Field.Index.NO);
        // add field
        doc.add(field);

        // add document
        writer.addDocument(doc);

        // create Document 2
        doc = new Document();

        // create Field - title
        title = "i love mom";
        field = new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED);
        // add field
        doc.add(field);

        // create Field - content
        content = "i love you, my mother! ";
        field = new Field("content", content, Field.Store.YES, Field.Index.TOKENIZED);
        // add field
        doc.add(field);

        // create Field - time
        time = "2007-05-31";
        field = new Field("time", time, Field.Store.YES, Field.Index.NO);
        // add field
        doc.add(field);

        // add document
        writer.addDocument(doc);

        // create Document 3
        doc = new Document();

        // create Field - title
        title = "i love xiaoyue";
        field = new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED);
        // add field
        doc.add(field);

        // create Field - content
        content = "i love you, my wife! ";
        field = new Field("content", content, Field.Store.YES, Field.Index.TOKENIZED);
        // add field
        doc.add(field);

        // create Field - time
        time = "2007-05-31";
        field = new Field("time", time, Field.Store.YES, Field.Index.NO);
        // add field
        doc.add(field);

        // add document
        writer.addDocument(doc);

        // close IndexWriter
        writer.close();

        // message
        System.out.println("Index Three Created!");
    }
}

Running it prints:

Index Three Created!

The index files are generated in the three directory.

The index was created successfully!

Next, let's create an index from files:

//FileIndexer.java
package tianen;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

import java.io.*;

import tool.FileText;

public class FileIndexer
{
    public static void main(String[] args) throws java.io.IOException
    {
        String indexPath = "file";

        // create IndexWriter
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer());

        // create Document 1
        Document doc = new Document();
        File f = new File("doc/黑帝.htm");

        // create Field - name
        String name = f.getName();
        Field field = new Field("name", name, Field.Store.YES, Field.Index.TOKENIZED);
        // add field
        doc.add(field);

        // create Field - content (text read from the file)
        String content = FileText.getText(f);
        field = new Field("content", content, Field.Store.YES, Field.Index.TOKENIZED);
        // add field
        doc.add(field);

        // create Field - path (stored for display only, not indexed)
        String path = f.getPath();
        field = new Field("path", path, Field.Store.YES, Field.Index.NO);
        // add field
        doc.add(field);

        // add document
        writer.addDocument(doc);

        // create Document 2
        doc = new Document();
        f = new File("doc/鲧.htm");

        // create Field - name
        name = f.getName();
        field = new Field("name", name, Field.Store.YES, Field.Index.TOKENIZED);
        // add field
        doc.add(field);

        // create Field - content
        content = FileText.getText(f);
        field = new Field("content", content, Field.Store.YES, Field.Index.TOKENIZED);
        // add field
        doc.add(field);

        // create Field - path
        path = f.getPath();
        field = new Field("path", path, Field.Store.YES, Field.Index.NO);
        // add field
        doc.add(field);

        // add document
        writer.addDocument(doc);

        // close IndexWriter
        writer.close();

        // message
        System.out.println("File Index Created!");
    }
}

Running it prints:

File Index Created!

The index files are generated in the file directory.

The file index was created successfully!
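The listing above calls FileText.getText(f) from the author's tool package, whose source is not shown in this article. As a rough idea of what such a helper could look like, here is a minimal sketch. Everything in it is an assumption: the class name FileTextSketch is hypothetical, the file is assumed to be UTF-8 unless another charset is passed, and the text is returned as-is without stripping HTML tags (the original helper may well behave differently for .htm files).

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical stand-in for the article's tool.FileText; not the author's code.
class FileTextSketch {

    // Reads the whole file as text, assuming UTF-8 encoding.
    static String getText(File f) throws IOException {
        return getText(f, StandardCharsets.UTF_8);
    }

    // Reads the whole file as text using the given charset.
    static String getText(File f, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(f.toPath());
        return new String(bytes, charset);
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file, then read it back.
        File tmp = File.createTempFile("filetext", ".htm");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(getText(tmp)); // prints <html>hello</html>
    }
}
```

Reading the whole file into memory keeps the sketch short; for very large files a streaming approach would be the safer choice.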

Now for something a bit more complex: creating an index for all the files in a directory:

//LoopIndexer.java
package tianen;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

import java.io.*;

import tool.FileText;
import tool.FileList;

public class LoopIndexer
{
    public static void main(String[] args) throws java.io.IOException
    {
        String indexPath = "loop";

        // IndexWriter
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer());

        // paths of all files in the doc directory
        String[] files = FileList.getFiles("doc");

        int num = files.length;

        for (int i = 0; i < num; i++)
        {
            Document doc = new Document();
            File f = new File(files[i]);

            // Field - name
            String name = f.getName();
            Field field = new Field("name", name, Field.Store.YES, Field.Index.TOKENIZED);
            // add field
            doc.add(field);

            // Field - content
            String content = FileText.getText(f);
            field = new Field("content", content, Field.Store.YES, Field.Index.TOKENIZED);
            // add field
            doc.add(field);

            // Field - path
            String path = f.getPath();
            field = new Field("path", path, Field.Store.YES, Field.Index.NO);
            // add field
            doc.add(field);

            System.out.println("File : " + name + " Indexed!");

            // add document
            writer.addDocument(doc);
        }

        // close IndexWriter
        writer.close();

        // message
        System.out.println("Loop Index Created!");
    }
}

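Running it prints one "File : <name> Indexed!" line for each file, followed by:

Loop Index Created!

The index files are generated in the loop directory.

This creates an index for every file in the doc directory.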
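LoopIndexer relies on FileList.getFiles("doc") from the author's tool package, which is also not shown here. The sketch below is one possible shape for such a helper, under stated assumptions: the class name FileListSketch is hypothetical, only regular files directly inside the directory are returned (no recursion into subdirectories), and the paths are sorted so the order is predictable. The original helper may differ on any of these points.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the article's tool.FileList; not the author's code.
class FileListSketch {

    // Returns the paths of the regular files directly inside dir, sorted by path.
    static String[] getFiles(String dir) {
        File[] entries = new File(dir).listFiles();
        if (entries == null) {
            // listFiles() returns null when dir is not a directory or cannot be read
            return new String[0];
        }
        List<String> paths = new ArrayList<>();
        for (File entry : entries) {
            if (entry.isFile()) { // skip subdirectories
                paths.add(entry.getPath());
            }
        }
        String[] result = paths.toArray(new String[0]);
        Arrays.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // List the given directory, or the current directory by default.
        String dir = args.length > 0 ? args[0] : ".";
        for (String path : getFiles(dir)) {
            System.out.println(path);
        }
    }
}
```

Because each returned string is a path built from the directory argument, it can be passed straight to new File(files[i]) as the loop in LoopIndexer does.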

The above covers only creating a Lucene index. For running searches, see my other article, 《lucene学习之执行搜索》 (Learning Lucene: Running Searches).
