Building a Search Engine Based on Lucene

san_rx

Category: Java. Tags: lucene

Link to this article: https://blog.csdn.net/san_rx/article/details/70941721

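I. Fundamentals
1. Index concepts
Index building: data → tokenization → index creation
Search process: get the keywords → tokenization → look up the index → return results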
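The two pipelines above can be sketched with a toy inverted index in plain Java. This is only an illustration of the idea, not how Lucene is implemented: the class name TinyIndex, the regex tokenizer, and the "every query term must match" rule are all assumptions made for this sketch. The tokenizer splits on non-alphanumeric characters, so it only suits languages that separate words with spaces or punctuation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index, for illustration only.
class TinyIndex {
    // term -> ids of the documents that contain it (the inverted index)
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Tokenization: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index building: data -> tokenization -> index creation.
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Search: keywords -> tokenization -> index lookup -> results.
    // Assumption of this sketch: a document matches only if it contains every query term.
    List<String> search(String keywords) {
        Set<Integer> result = null;
        for (String term : tokenize(keywords)) {
            Set<Integer> ids = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        List<String> hits = new ArrayList<>();
        if (result != null) {
            for (int id : result) {
                hits.add(docs.get(id));
            }
        }
        return hits;
    }
}
```

Note that the same tokenize method is used on both sides: a query can only find a term if it is tokenized the same way the documents were.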
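2. Mathematical model of the index
Term weighting: every term in a document is assigned a weight that reflects how important the term is to that document.
Vector space model: each distinct term is one dimension of the space, and each document is a vector whose components are the weights of its terms.
Retrieval: the query keywords are mapped into the same space as a vector; relevance is measured by the angle between the query vector and each document vector, and a smaller angle (a larger cosine) means a better match.
3. Lucene's index file structure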
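As a minimal sketch of the angle computation, the following plain-Java code builds term vectors and compares them by cosine similarity. It assumes raw term counts as the weights and whitespace tokenization, both chosen only to keep the example short; real weighting schemes such as TF-IDF also discount terms that appear in many documents. The class name VectorSpace is hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only.
class VectorSpace {
    // Builds a sparse vector: one dimension per distinct term, weight = raw term count.
    static Map<String, Double> termVector(String text) {
        Map<String, Double> v = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                v.merge(t, 1.0, Double::sum);
            }
        }
        return v;
    }

    // cos(theta) = (a . b) / (|a| * |b|).
    // Close to 1.0 when the vectors point the same way, 0.0 when they share no terms.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }
}
```

For the query "lucene search", the document "lucene search engine" scores 2/√6 ≈ 0.82, while "search for cooking recipes online" scores 1/√10 ≈ 0.32, so the first document ranks higher.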
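II. Using Lucene
1. Creating an index
Define the analyzer (tokenizer)
Decide where the index files will be stored
Create an IndexWriter to write the index files
Extract the content and add it to the index
2. Searching documents by keyword
Open the storage location
Create the searcher
Run the keyword query
Close the reader and the other resources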
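The example from the official documentation, written against the Lucene 4.x API (note the Version arguments); assertEquals comes from JUnit:

    import static org.junit.Assert.assertEquals;

    import java.io.File;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    // Step 1: define the analyzer. LUCENE_CURRENT is deprecated;
    // real code should pin a concrete version constant instead.
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

    // Step 2: choose where the index lives. Here it is stored in memory:
    Directory directory = new RAMDirectory();
    // To store an index on disk, use this instead (4.x takes a java.io.File):
    //Directory directory = FSDirectory.open(new File("/tmp/testindex"));

    // Step 3: create the IndexWriter and add one document with a stored text field.
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
    IndexWriter iwriter = new IndexWriter(directory, config);
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
    iwriter.addDocument(doc);
    iwriter.close();

    // Now search the index:
    DirectoryReader ireader = DirectoryReader.open(directory);
    IndexSearcher isearcher = new IndexSearcher(ireader);
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "fieldname", analyzer);
    Query query = parser.parse("text");
    // No filter (null), at most 1000 hits:
    ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
    assertEquals(1, hits.length);
    // Iterate through the results:
    for (int i = 0; i < hits.length; i++) {
      Document hitDoc = isearcher.doc(hits[i].doc);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }
    ireader.close();
    directory.close();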

3. Analyzer: the available tokenizer implementations
CJKAnalyzer, KeywordAnalyzer, SimpleAnalyzer, StopAnalyzer, WhitespaceAnalyzer, StandardAnalyzer, IKAnalyzer (a third-party analyzer for Chinese)
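To make the differences between some of these analyzers concrete, here is a plain-Java imitation of their tokenization rules. It does not call Lucene and is not Lucene's implementation: the class AnalyzerSketch, its method names, and the abbreviated stop-word list are assumptions of this sketch, and StandardAnalyzer and IKAnalyzer are left out because their rules are considerably more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical imitation of analyzer behaviour, for illustration only.
class AnalyzerSketch {
    // Abbreviated stop-word list, an assumption of this sketch.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "be", "is", "the", "this", "to"));

    // Like WhitespaceAnalyzer: split on whitespace, keep case and punctuation.
    static List<String> whitespace(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Like SimpleAnalyzer: split at non-letters, then lowercase.
    static List<String> simple(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // Like StopAnalyzer: the SimpleAnalyzer tokens with stop words removed.
    static List<String> stop(String text) {
        List<String> out = simple(text);
        out.removeAll(STOP_WORDS);
        return out;
    }

    // Like KeywordAnalyzer: the whole input is a single token.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Like CJKAnalyzer on a run of CJK characters: overlapping two-character tokens.
    static List<String> cjkBigrams(String run) {
        List<String> out = new ArrayList<>();
        if (run.length() == 1) {
            out.add(run);
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            out.add(run.substring(i, i + 2));
        }
        return out;
    }
}
```

For the input "This is the text to be indexed.", whitespace keeps "This" and "indexed." unchanged, simple yields this, is, the, text, to, be, indexed, and stop leaves only text and indexed. For a CJK run such as 搜索引擎, cjkBigrams yields 搜索, 索引, 引擎.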
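4. Query: the available query types
Query parsers: QueryParser, MultiFieldQueryParser
Query classes: TermQuery, PrefixQuery, PhraseQuery, WildcardQuery, TermRangeQuery, NumericRangeQuery, BooleanQuery
Several other classes are also used when searching:
Collector is mainly used to gather search results and to apply custom sorting, filtering, and similar processing to them.
Filter expresses a screening condition: it specifies which documents are allowed to appear in the search results.
Sort specifies the sort order passed to the search method, comparable to ORDER BY in a database.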
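The division of labour between these three roles can be pictured with a plain-Java analogy built on streams. None of the code below uses Lucene classes: HitPipeline, Hit, and the year field are hypothetical names invented for this sketch, with a Predicate standing in for the filter, a Comparator for the sort order, and the final collection step for the collector.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical analogy, for illustration only; these are not Lucene classes.
class HitPipeline {
    static final class Hit {
        final int docId;
        final float score;
        final int year; // hypothetical stored field used for filtering

        Hit(int docId, float score, int year) {
            this.docId = docId;
            this.score = score;
            this.year = year;
        }
    }

    // filter: which hits may appear at all (the Filter role)
    // order:  how the surviving hits are ranked (the Sort role, like ORDER BY)
    // n:      how many hits are gathered into the result (the Collector role)
    static List<Hit> topN(List<Hit> hits, Predicate<Hit> filter, Comparator<Hit> order, int n) {
        return hits.stream()
                .filter(filter)
                .sorted(order)
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

A call such as topN(hits, h -> h.year >= 2015, (a, b) -> Float.compare(b.score, a.score), 2) keeps only hits from 2015 onward, orders them by descending score, and returns the best two.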