Lucene

Latest recommended article published on 2024-08-31 15:27:10

mr_phy

Tags: lucene, full-text search

Article link: https://blog.csdn.net/mr_phy/article/details/73885055


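Full-Text Search

Suppose a folder contains many files, such as plain-text notes, Word documents, and Excel spreadsheets, and we want to find the files that contain a given keyword. For example, entering "lucene" should return every file whose content contains the word "lucene". This is what is meant by full-text search.

An obvious approach is to build a mapping from each keyword to the files that contain it, which is commonly called an inverted index.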
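
To make the keyword-to-file mapping concrete, here is a minimal sketch in plain Java. The class name KeywordFileIndex, its methods, and the file names are all made up for illustration, and the word splitting is deliberately naive; it only shows the idea and is not how Lucene stores its index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each keyword to the names of the files containing it.
// Hypothetical illustration only, not Lucene's actual index format.
class KeywordFileIndex {
    private final Map<String, Set<String>> filesByKeyword = new HashMap<>();

    // Split the content into lowercase words and record the file under each word.
    void addFile(String fileName, String content) {
        for (String word : content.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                filesByKeyword.computeIfAbsent(word, k -> new TreeSet<>()).add(fileName);
            }
        }
    }

    // Return the files containing the keyword in sorted order (empty list if none).
    List<String> search(String keyword) {
        Set<String> files = filesByKeyword.getOrDefault(
                keyword.toLowerCase(Locale.ROOT), Collections.emptySet());
        return new ArrayList<>(files);
    }

    public static void main(String[] args) {
        KeywordFileIndex index = new KeywordFileIndex();
        index.addFile("notes.txt", "Lucene is a search library");
        index.addFile("report.doc", "Quarterly sales report");
        index.addFile("todo.txt", "Try lucene for full-text search");
        System.out.println(index.search("lucene")); // prints [notes.txt, todo.txt]
    }
}
```

A lookup is then a single map access instead of a scan over every file, which is the reason for building the index up front.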

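Lucene Architecture

- Create the index: use an IndexWriter to index the different files, and save the result in the location where the index files are stored.
- Query the index to find the documents related to a keyword.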

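// Targets the Lucene 4.x API, where these constructors still take a Version argument.
import java.io.File;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
// assertEquals is assumed to come from JUnit:
import static org.junit.Assert.assertEquals;

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

// Store the index in memory:
Directory directory = new RAMDirectory();
// To store an index on disk, use this instead (in 4.x, FSDirectory.open takes a File, not a String):
//Directory directory = FSDirectory.open(new File("/tmp/testindex"));
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
IndexWriter iwriter = new IndexWriter(directory, config);
Document doc = new Document();
String text = "This is the text to be indexed.";
doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
iwriter.addDocument(doc);
iwriter.close();

// Now search the index:
DirectoryReader ireader = DirectoryReader.open(directory);
IndexSearcher isearcher = new IndexSearcher(ireader);
// Parse a simple query that searches for "text":
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "fieldname", analyzer);
Query query = parser.parse("text");
ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
assertEquals(1, hits.length);
// Iterate through the results:
for (int i = 0; i < hits.length; i++) {
    Document hitDoc = isearcher.doc(hits[i].doc);
    assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
}
ireader.close();
directory.close();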

Creating the Index

Analyzer

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
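
An analyzer turns raw text into the terms that actually get indexed, and the same analyzer is later handed to the query parser so that queries are broken up the same way. As a rough sketch of that idea, the hypothetical ToyAnalyzer below lowercases the text, splits on non-alphanumeric characters, and drops a few stop words. The class, the splitting rule, and the stop-word list are all assumptions made for illustration; Lucene's StandardAnalyzer uses a more elaborate tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analyzer: lowercases text, splits on non-alphanumeric characters, drops stop words.
// Hypothetical simplification, not the behavior of Lucene's StandardAnalyzer.
class ToyAnalyzer {
    // Small illustrative stop-word list (an assumption, not Lucene's list).
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "to", "be"));

    // Return the terms that would be indexed for the given text.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("This is the text to be indexed."));
        // prints [this, text, indexed]
    }
}
```

With this toy version, the sample sentence is reduced to three terms, and a query for "text" matches because it is analyzed to the same term.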

Next, decide where the index files will be stored. Two options are shown here.

On-disk storage

Directory directory = FSDirectory.open(new File("/tmp/testindex"));

In-memory storage

Directory directory = new RAMDirectory();

Create an IndexWriter to write the index files

IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
IndexWriter iwriter = new IndexWriter(directory, config);

Extract the content and add it to the index

Document doc = new Document();
String text = "This is the text to be indexed.";
doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
iwriter.addDocument(doc);
iwriter.close();

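Line 1 creates a Document object, which is comparable to a row in a database table.

Line 2 is the string we are about to index.

Line 3 adds the string to the document as a field named "fieldname". Because TextField.TYPE_STORED is used, the original value is stored as well as indexed; to index without storing, use a different field type such as TextField.TYPE_NOT_STORED (see the official documentation for details).

Line 4 adds the document to the index being built.

Line 5 closes the IndexWriter, which commits the changes.

That is the whole index-creation process.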
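
The difference between indexing a field and storing it is easy to miss, so here is a small plain-Java sketch of it. The ToyFieldIndex class and its methods are hypothetical and only mimic the idea: an indexed value can be found by a search, but the original text can only be read back if it was also stored. Lucene's real data structures are different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy index separating "indexed" (searchable) from "stored" (retrievable) field values.
// Hypothetical illustration only; Lucene's real data structures differ.
class ToyFieldIndex {
    // term -> ids of the documents whose field text contains the term
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // document id -> original field text, kept only when the field is stored
    private final Map<Integer, String> storedValues = new HashMap<>();
    private int nextDocId = 0;

    // Index the text; keep the original value only if 'stored' is true.
    int addDocument(String text, boolean stored) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> ids = postings.computeIfAbsent(term, k -> new ArrayList<>());
            // Record each document at most once per term.
            if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                ids.add(docId);
            }
        }
        if (stored) {
            storedValues.put(docId, text);
        }
        return docId;
    }

    // Return the ids of the documents containing the term, in insertion order.
    List<Integer> search(String term) {
        return new ArrayList<>(postings.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.emptyList()));
    }

    // Return the original text, or null when the field was indexed but not stored.
    String get(int docId) {
        return storedValues.get(docId);
    }

    public static void main(String[] args) {
        ToyFieldIndex index = new ToyFieldIndex();
        index.addDocument("This is the text to be indexed.", true);
        index.addDocument("Some other text", false);
        System.out.println(index.search("text")); // prints [0, 1]
        System.out.println(index.get(1));         // prints null
    }
}
```

Both documents are found by the search, but only the first one can return its original text. This mirrors what happens in Lucene when hitDoc.get("fieldname") is called on a field that was indexed without being stored: the result is null.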

Keyword Query

Step 1: open the index at its storage location.

DirectoryReader ireader = DirectoryReader.open(directory);

Step 2: create the searcher.

IndexSearcher isearcher = new IndexSearcher(ireader);

Step 3: run a keyword query, much as you would with SQL.

QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "fieldname", analyzer);
Query query = parser.parse("text");
ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
assertEquals(1, hits.length);
for (int i = 0; i < hits.length; i++) {
    Document hitDoc = isearcher.doc(hits[i].doc);
    assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
}

Step 4: close the reader and the directory.

ireader.close();
directory.close();
