Learning Lucene: Creating an Index

ysmbdjglww

Category: Lucene  Tags: Lucene

Article link: https://blog.csdn.net/Ysmbdjglww/article/details/83215307

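Lucene is a full-text search library. In full-text search, a program scans every word in a document, builds an index entry for each word, and records how many times and where the word occurs in the document. When a user runs a query, the lookup goes through that index, much like using a dictionary's index to find a character.

How Lucene works

Let's start with how Lucene works.

First, an index is created from the source files or data. The index is stored as binary files and is divided into a directory area and a data area.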

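For example, the text of each document is split into terms that go into the directory area.

How the text is split into terms depends on the analyzer in use.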
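
To make the idea concrete, here is a minimal sketch of an inverted index in plain Java. It is only an illustration of the principle described above (each term maps to the documents that contain it, with positions), not how Lucene actually lays out its index files. The class name `ToyInvertedIndex` and its `tokenize` method are made up for this example, and `tokenize` is a crude stand-in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index: term -> (document id -> positions of the term in that document).
// Illustration only; Lucene's on-disk format is far more elaborate.
public class ToyInvertedIndex {

    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Hypothetical stand-in for an analyzer: lowercase, then split on runs of
    // characters that are neither letters nor digits.
    public static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Record every term of the document together with its position.
    public void add(int docId, String text) {
        List<String> terms = tokenize(text);
        for (int pos = 0; pos < terms.size(); pos++) {
            postings.computeIfAbsent(terms.get(pos), k -> new TreeMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Look a term up: which documents contain it, and at which positions.
    public Map<Integer, List<Integer>> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeMap<>());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Lucene is a search library");
        index.add(2, "Search the index, then search again");
        System.out.println(index.lookup("search")); // prints {1=[3], 2=[0, 4]}
    }
}
```

The number of positions recorded for a document is the term's occurrence count in that document, which is the "how many times and where" information mentioned in the introduction.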

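Once the index has been created, it can be searched.

After the index exists, its documents can also be added, deleted, updated, and queried through the API.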
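
As a rough sketch of those add, update, delete, and query operations, the example below works against a throwaway index in a temporary directory. It assumes the Lucene 7.5.0 dependencies listed later in this article are on the classpath. The class name `IndexCrudSketch` and the field names `id` and `title` are invented for the illustration; the temporary directory is not cleaned up.

```java
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Sketch of add / update / delete / query on a temporary index.
// The "id" and "title" fields are hypothetical names chosen for this example.
public class IndexCrudSketch {

    // Build one document; StringField keeps the id as a single, untokenized term.
    static Document doc(String id, String title) {
        Document d = new Document();
        d.add(new StringField("id", id, Field.Store.YES));
        d.add(new TextField("title", title, Field.Store.YES));
        return d;
    }

    // Count the documents whose "title" contains the given (already lowercased) term.
    static int countByTitleTerm(Directory dir, String term) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return new IndexSearcher(reader).count(new TermQuery(new Term("title", term)));
        }
    }

    // Returns the match counts for "lucene" after the add, update, and delete steps.
    public static int[] run() throws Exception {
        Path tmp = Files.createTempDirectory("lucene-crud");
        try (Directory dir = FSDirectory.open(tmp);
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            // Add two documents
            writer.addDocument(doc("1", "Lucene basics"));
            writer.addDocument(doc("2", "Lucene indexing"));
            writer.commit();
            int afterAdd = countByTitleTerm(dir, "lucene");

            // Update: replaces every document whose id term is "2"
            writer.updateDocument(new Term("id", "2"), doc("2", "Solr indexing"));
            writer.commit();
            int afterUpdate = countByTitleTerm(dir, "lucene");

            // Delete the document whose id term is "1"
            writer.deleteDocuments(new Term("id", "1"));
            writer.commit();
            int afterDelete = countByTitleTerm(dir, "lucene");

            return new int[] {afterAdd, afterUpdate, afterDelete};
        }
    }

    public static void main(String[] args) throws Exception {
        for (int count : run()) {
            System.out.println(count);
        }
    }
}
```

Note that an update in Lucene is a delete followed by an add, which is why `updateDocument` takes a `Term` identifying the old document along with the full replacement document.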

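Features:

1. Because queries go through the index, retrieval is fast and the results are more accurate.
2. It generates a text summary, taken from the part of the text where the search terms occur most often.
3. It highlights the query terms in the results.
4. It supports term-based queries on tokenized text.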
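
Features 2 and 3 can be illustrated with a small plain-Java sketch. This is not Lucene's highlighter module; `ToyHighlighter`, `highlight`, and `bestSnippet` are hypothetical names, and the fixed-width sliding window is just one simple way to pick the passage with the most matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration of highlighting and summary snippets (not Lucene's highlighter).
public class ToyHighlighter {

    // Wrap every case-insensitive occurrence of term in <b>...</b>.
    public static String highlight(String text, String term) {
        if (term.isEmpty()) {
            return text;
        }
        Pattern p = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        return p.matcher(text).replaceAll("<b>$0</b>");
    }

    // Return the first window of `width` characters that contains the most
    // occurrences of term; texts shorter than the window are returned whole.
    public static String bestSnippet(String text, String term, int width) {
        if (term.isEmpty() || text.length() <= width) {
            return text;
        }
        Pattern p = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        int bestStart = 0;
        int bestCount = -1;
        for (int start = 0; start + width <= text.length(); start++) {
            Matcher m = p.matcher(text.substring(start, start + width));
            int count = 0;
            while (m.find()) {
                count++;
            }
            if (count > bestCount) {
                bestCount = count;
                bestStart = start;
            }
        }
        return text.substring(bestStart, bestStart + width);
    }

    public static void main(String[] args) {
        String text = "Lucene builds an index; lucene searches it.";
        // prints <b>Lucene</b> builds an index; <b>lucene</b> searches it.
        System.out.println(highlight(text, "lucene"));
    }
}
```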

How to create an index with Lucene

1. Configure the project before using Lucene

pom.xml configuration:

  <!-- Lucene dependencies -->
    <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>7.5.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-queryparser -->
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-queryparser</artifactId>
      <version>7.5.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-analyzers-common -->
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-common</artifactId>
      <version>7.5.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-analyzers-smartcn -->
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-smartcn</artifactId>
      <version>7.5.0</version>
    </dependency>

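2. Before searching with Lucene, create the index

Data stored in the index must be converted into Document objects, one Document per record. Each Document holds Field objects, one per field of the record, and the value of each field is stored in its Field object.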

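import java.io.File;
import java.io.FileReader;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Hello {

    // Searching requires an index, so we need a writer to build one
    // The IndexWriter instance
    private IndexWriter writer;

    /**
     * The index files in the index directory are structured as:
     * index --> segment --> document --> Field --> Term
     */
    /**
     * Constructor: instantiates the IndexWriter
     * @param indexDir directory the index is written to
     * @throws Exception
     */
    public Hello(String indexDir) throws Exception
    {
        // indexDir is the directory the index is written to
        Directory directory= FSDirectory.open(Paths.get(indexDir));
        Analyzer analyzer=new StandardAnalyzer();// standard analyzer
        IndexWriterConfig indexWriterConfig=new IndexWriterConfig(analyzer);
        // IndexWriterConfig is where the IndexWriter's settings are configured
        writer=new IndexWriter(directory,indexWriterConfig);
        // Every IndexWriter needs two things: the index directory it operates on and an analyzer
        // Use the same analyzer for indexing and for searching (IndexSearcher), otherwise search results are affected
    }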
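    public void close() throws Exception{
        writer.close();
    }
    /**
     * Indexes every file in the given directory
     * @param dataDir directory containing the files to index
     * @throws Exception
     */
    public int  index(String dataDir) throws Exception
    {
        // List the data directory, then iterate over its entries
        File  files[]=new File(dataDir).listFiles();
        if(files==null)
        {
            // listFiles() returns null when dataDir is missing or not a directory
            throw new IllegalArgumentException("Not a readable directory: "+dataDir);
        }
        for(File f:files)
        {
            if(f.isFile())
            {
                indexFile(f);// index each regular file; subdirectories are skipped
            }
        }
        // Return the number of documents in the index
        return writer.numDocs();
    }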
    /**
     * Indexes a single file
     */
    private  void  indexFile(File f) throws  Exception
    {
        System.out.println("Indexing file "+f.getCanonicalPath());
        Document document=getDocument(f);
        writer.addDocument(document);
    }
    /**
     * Builds the document and sets each of its fields
     * @param f the file to index
     * @throws Exception
     */
    private Document getDocument(File f) throws Exception
    {
        Document document=new Document();
        document.add(new TextField("contents",new FileReader(f)));// file contents: indexed but not stored
        document.add(new TextField("fileName",f.getName(), Field.Store.YES));// Field.Store.YES stores the file name in the index
        document.add(new TextField("fullPath",f.getCanonicalPath(), Field.Store.YES));// full path, also stored
        return document;
    }

    public static void main(String[] args)
    {
        String indexDir="D:\\java_learn\\lucene\\lucene4";
        String dataDir="D:\\java_learn\\lucene\\lucene4\\data";
        Hello hello=null;
        int numIndex=0;
        long start=System.currentTimeMillis();
        try {
            hello=new Hello(indexDir);
            numIndex=hello.index(dataDir);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // hello stays null if the constructor threw, so guard against a NullPointerException
            if (hello!=null) {
                try {
                    hello.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }

        long end=System.currentTimeMillis();
        System.out.println("Indexed "+numIndex+" files");
        System.out.println("Took "+(end-start)+" ms");
    }
}

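When the program finishes, the index is generated in the indexDir directory, D:\java_learn\lucene\lucene4. The D:\java_learn\lucene\lucene4\data directory only holds the files being indexed.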
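
With the index in place, a search against it might look like the following sketch. It assumes the Lucene 7.5.0 dependencies from the pom.xml above and reuses the field names `contents` and `fullPath` from the `Hello` class. The class name `HelloSearch`, its `search` method, and the query text are made up for this example. It uses `StandardAnalyzer` again so that query terms are analyzed the same way as the indexed text.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical companion to the Hello class: searches the index it built.
public class HelloSearch {

    // Returns the stored "fullPath" of up to maxHits documents whose
    // "contents" field matches the query text.
    public static List<String> search(String indexDir, String queryText, int maxHits) throws Exception {
        List<String> paths = new ArrayList<>();
        try (Directory directory = FSDirectory.open(Paths.get(indexDir));
             DirectoryReader reader = DirectoryReader.open(directory)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Same analyzer as at indexing time
            Analyzer analyzer = new StandardAnalyzer();
            Query query = new QueryParser("contents", analyzer).parse(queryText);
            TopDocs hits = searcher.search(query, maxHits);
            for (ScoreDoc scoreDoc : hits.scoreDocs) {
                // Load the stored fields of the matching document
                Document doc = searcher.doc(scoreDoc.doc);
                paths.add(doc.get("fullPath"));
            }
        }
        return paths;
    }

    public static void main(String[] args) throws Exception {
        String indexDir = "D:\\java_learn\\lucene\\lucene4";
        // "lucene" is only a sample query
        for (String path : search(indexDir, "lucene", 10)) {
            System.out.println(path);
        }
    }
}
```

Only `fileName` and `fullPath` can be read back from a hit, because `contents` was indexed without being stored.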
