A Simple Introduction to Using Lucene

罗罗的1024

Category: Middleware. Tags: lucene, java, search engine

Article link: https://blog.csdn.net/qq_42224683/article/details/110897327

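This article introduces the basics of Apache Lucene, an open-source full-text search library, including its core technique, the inverted index, and uses Java examples to show how to build an index and run queries. It covers the steps for using Lucene, its dependencies, and key components such as IKAnalyzer.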

Introduction

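Lucene is an open-source full-text search toolkit from the Apache Software Foundation. It began as a subproject of the Jakarta project and is now a top-level Apache project. It is not a complete search application but a library for building one: it provides a full query engine and indexing engine, plus text analysis components. (Early releases shipped analyzers for only two Western languages, English and German; current releases cover many more.) Lucene's goal is to give developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it.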

How it works

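Under the hood, Lucene is built on an inverted index: a structure that maps each term to the list of documents that contain it, so a query looks up terms directly instead of scanning every document.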
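To make the idea concrete, here is a minimal sketch of an inverted index in plain Java. It is not Lucene's implementation (Lucene's on-disk format, compression, and scoring are far more involved); the class name `InvertedIndexSketch`, its methods, and the naive tokenizer are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index (hypothetical, for illustration only; not Lucene code).
class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Naive tokenizer: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Records every term of the document under the given id.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of the documents containing the term (empty if none).
    SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(0, "log4j failed to start");
        index.add(1, "web server started");
        index.add(2, "Log4j configuration loaded");
        System.out.println(index.lookup("log4j"));   // prints [0, 2]
        System.out.println(index.lookup("started")); // prints [1]
    }
}
```

The lookup cost depends on the number of distinct terms and the length of one postings list, not on the total amount of text, which is why this structure suits full-text search.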

Getting started

Dependencies

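        <!-- Lucene core -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>7.6.0</version>
        </dependency>
        <!-- Query parser: turns query strings into queries against analyzed fields -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>7.6.0</version>
        </dependency>
        <!-- smartcn Chinese analyzer -->
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-smartcn</artifactId>
            <version>7.6.0</version>
        </dependency>
        <!-- IK Analyzer, a third-party Chinese analyzer (used in the examples) -->
        <dependency>
            <groupId>com.jianggujin</groupId>
            <artifactId>IKAnalyzer-lucene</artifactId>
            <version>8.0.0</version>
        </dependency>
        <!-- Apache Commons IO: provides the FileUtils class used by the indexing example.
             Not in the original list; the version here is an assumption. -->
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.11.0</version>
        </dependency>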

Creating the index

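Data location: the files to be indexed are in the directory given by datapath in the code below (D://logs in this example).
Java example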

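import org.apache.commons.io.FileUtils; // from Apache Commons IO, not Lucene
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
// Package name as commonly shipped by IK Analyzer builds; check it against your artifact
import org.wltea.analyzer.lucene.IKAnalyzer;

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;

/**
 * Builds the index.
 *
 * @author 罗罗
 */
public class LuceneDemo {
    // Directory containing the files to index
    private static final String datapath = "D://logs";
    // Directory where the index is stored
    private static final String dir = "D://index";

    public static void main(String[] args) throws IOException {
        // Analyzer (tokenizer)
        Analyzer analyzer = new IKAnalyzer();
        // IndexWriter configuration; CREATE overwrites any existing index
        IndexWriterConfig conf = new IndexWriterConfig(analyzer);
        conf.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

        // listFiles() returns null if the path is missing or not a directory
        File[] files = new File(datapath).listFiles();
        if (files == null) {
            throw new IOException("Cannot list files in " + datapath);
        }

        // try-with-resources closes the writer and the directory even on failure
        try (Directory directory = FSDirectory.open(Paths.get(dir));
             IndexWriter indexWriter = new IndexWriter(directory, conf)) {
            for (File file : files) {
                if (!file.isFile()) {
                    continue; // skip subdirectories
                }
                // One Document per file
                Document document = new Document();
                // filename: indexed as a single untokenized term, and stored
                document.add(new StringField("filename", file.getName(), Field.Store.YES));
                // content: tokenized by the analyzer, and stored (files are read as UTF-8)
                document.add(new TextField("content",
                        FileUtils.readFileToString(file, StandardCharsets.UTF_8), Field.Store.YES));
                // lastModified: indexed for numeric range queries, not stored
                document.add(new LongPoint("lastModified", file.lastModified()));
                // Add the document to the index writer
                indexWriter.addDocument(document);
            }
            // Commit the changes
            indexWriter.commit();
        }
    }
}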

After the program runs, the index files are written to the index directory (D://index).

Query example


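import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
// Package name as commonly shipped by IK Analyzer builds; check it against your artifact
import org.wltea.analyzer.lucene.IKAnalyzer;

import java.io.IOException;
import java.nio.file.Paths;

/**
 * Searches the index.
 */
public class SearchLucene {
    // Directory where the index is stored
    private static final String dir = "D://index";

    public static void main(String[] args) throws IOException, ParseException {
        // try-with-resources closes the reader and the directory when done
        try (Directory directory = FSDirectory.open(Paths.get(dir));
             IndexReader indexReader = DirectoryReader.open(directory)) {
            // Build the searcher on top of the reader
            IndexSearcher searcher = new IndexSearcher(indexReader);
            // The analyzer must be the same one used when the index was written
            Analyzer analyzer = new IKAnalyzer();
            // Parse query strings against the "content" field
            QueryParser parser = new QueryParser("content", analyzer);
            // Parse the keyword into a Query object
            Query query = parser.parse("log4j");
            // Fetch the top 10 results matching the query
            TopDocs topDocs = searcher.search(query, 10);
            // Total number of documents matching the query
            System.out.println(topDocs.totalHits);
            // Scored hits: each holds a document id and its score
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;
            for (ScoreDoc scoreDoc : scoreDocs) {
                // Id of the matching document
                int docId = scoreDoc.doc;
                // Load the stored fields of that document through the reader
                Document document = indexReader.document(docId);
                // Read a stored field from the document
                String filename = document.get("filename");
                System.out.println(filename);
            }
        }
    }
}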
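The comment in the search code says the analyzer must match the one used at indexing time. The reason is that the index only contains the terms the indexing analyzer produced, so a query analyzed differently can yield terms that were never indexed. The sketch below shows this with two toy "analyzers" written as plain Java functions. Everything in it (`AnalyzerMismatchSketch`, `LOWERCASING`, `CASE_PRESERVING`) is hypothetical and stands in for real Lucene analyzers; it is not the Lucene API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Toy demonstration (hypothetical, not Lucene code) of why index-time and
// query-time analysis must agree.
class AnalyzerMismatchSketch {
    // Both toy analyzers split on whitespace; only the first one lowercases.
    static final Function<String, List<String>> LOWERCASING =
            text -> Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    static final Function<String, List<String>> CASE_PRESERVING =
            text -> Arrays.asList(text.trim().split("\\s+"));

    // Builds term -> document ids using the given analyzer.
    static Map<String, Set<Integer>> index(List<String> docs,
                                           Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyzer.apply(docs.get(id))) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    // Analyzes the query and returns documents containing any of its terms.
    static Set<Integer> search(Map<String, Set<Integer>> postings, String query,
                               Function<String, List<String>> analyzer) {
        Set<Integer> result = new TreeSet<>();
        for (String term : analyzer.apply(query)) {
            result.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Log4j Started", "web server started");
        Map<String, Set<Integer>> postings = index(docs, LOWERCASING);
        // Same analyzer on both sides: "Log4j" becomes "log4j" and matches.
        System.out.println(search(postings, "Log4j", LOWERCASING));     // prints [0]
        // Different analyzer: "Log4j" stays capitalized and finds nothing.
        System.out.println(search(postings, "Log4j", CASE_PRESERVING)); // prints []
    }
}
```

With a Chinese analyzer such as IKAnalyzer the mismatch is about where word boundaries fall rather than letter case, but the effect is the same: the query terms fail to line up with the indexed terms.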

The output is as follows

"C:\Program Files\Java\jdk1.8.0_144\bin\java.exe" "-javaagent:D:\IntelliJ IDEA\IntelliJ IDEA 2018.2.4\lib\idea_rt.jar=63461:D:\IntelliJ IDEA\IntelliJ IDEA 2018.2.4\bin" -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdktx\5.3.1\spring-tx-5.3.1.jar" com.example.demo33.SearchLucene
加载扩展词典：ext.dic
加载扩展停止词典：stopword.dic
3
error.log
web.log
log4j2.log

Process finished with exit code 0
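In the run above, the printed total (3) happens to equal the number of file names listed, because fewer than 10 documents matched. In general the two differ: totalHits counts every match, while scoreDocs holds at most the N best-scoring ones. The sketch below illustrates that distinction with a toy ranking by raw term frequency. The names `TopNSketch`, `Hit`, `allHits`, and `topN` are hypothetical, and the scoring is a deliberate simplification: Lucene's actual default similarity is BM25, not a raw count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Toy top-N search (hypothetical, not Lucene code): rank by raw term frequency.
class TopNSketch {
    // A document id paired with its score, like a much-simplified ScoreDoc.
    static final class Hit {
        final int doc;
        final int score;

        Hit(int doc, int score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Counts how often the term occurs in the text (case-insensitive, whitespace split).
    static int termFrequency(String text, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals(wanted)) {
                count++;
            }
        }
        return count;
    }

    // All matching documents, best score first; ties are broken by document id.
    static List<Hit> allHits(List<String> docs, String term) {
        List<Hit> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            int tf = termFrequency(docs.get(id), term);
            if (tf > 0) {
                hits.add(new Hit(id, tf));
            }
        }
        hits.sort(Comparator.comparingInt((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.doc));
        return hits;
    }

    // Keeps only the first n hits.
    static List<Hit> topN(List<Hit> hits, int n) {
        return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "log4j log4j error",
                "web request",
                "log4j warning",
                "log4j log4j log4j init");
        List<Hit> all = allHits(docs, "log4j");
        System.out.println("total hits: " + all.size()); // prints total hits: 3
        for (Hit hit : topN(all, 2)) {
            // prints doc=3 score=3, then doc=0 score=2
            System.out.println("doc=" + hit.doc + " score=" + hit.score);
        }
    }
}
```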
