Tantivy: a full-text search engine library written in Rust

1. Overview

Tantivy is a full-text search engine library written in Rust and inspired by Apache Lucene.

If you are looking for an alternative to Elasticsearch or Apache Solr, check out Quickwit, a distributed search engine built on top of Tantivy.

Tantivy is closer to Apache Lucene than to Elasticsearch or Apache Solr: it is not an off-the-shelf search engine server, but a library that can be used to build one.

Tantivy performs very well in benchmarks:
[Figure: benchmark chart, image not available]

2. Features

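  • Full-text search
  • Configurable tokenizer (stemming available for 17 Latin languages), with third-party support for Chinese (tantivy-jieba and cang-jie), Japanese (lindera, Vaporetto, and tantivy-tokenizer-tiny-segmenter), and Korean (lindera + lindera-ko-dic-builder)
  • Fast (check out the 🐎 ✨ benchmark ✨ 🐎)
  • Tiny startup time (<10ms), well suited to command-line tools
  • BM25 scoring (the same as Lucene)
  • Natural query language (e.g. (michael AND jackson) OR "king of pop")
  • Phrase queries (e.g. "michael jackson")
  • Incremental indexing
  • Multithreaded indexing (indexing English Wikipedia reportedly takes under 3 minutes on a desktop machine)
  • Mmap directory
  • SIMD integer compression when the platform/CPU supports the SSE2 instruction set
  • Single-valued and multi-valued u64, i64, and f64 fast fields (the equivalent of doc values in Lucene)
  • &[u8] fast fields
  • Text, i64, u64, f64, date, ip, bool, and hierarchical facet fields
  • Compressed document store (LZ4, Zstd, None)
  • Range queries
  • Faceted search
  • Configurable indexing (optional term frequency and position indexing)
  • JSON fields
  • Aggregation collector: histogram, range buckets, average, and stats metrics
  • LogMergePolicy with deletes
  • Searcher warmer API
  • Cheesy logo with a horse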
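
To make the BM25 scoring item in the list above more concrete, here is a minimal sketch of the textbook BM25 formula in plain Rust, using only the standard library. It is not Tantivy's code: the function names and the parameter values k1 = 1.2 and b = 0.75 are assumptions chosen for illustration, and real implementations may differ in constant factors.

```rust
/// Commonly used BM25 parameters (assumed here, not read from Tantivy).
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Inverse document frequency: terms found in fewer documents weigh more.
/// `doc_count` is the number of documents in the index,
/// `doc_freq` is the number of documents that contain the term.
fn idf(doc_count: u64, doc_freq: u64) -> f64 {
    let n = doc_count as f64;
    let df = doc_freq as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one query term to one document's score.
/// `term_freq` is how often the term occurs in the document,
/// `doc_len` is the document length in tokens,
/// `avg_doc_len` is the average document length in the index.
fn bm25_term_score(
    term_freq: u32,
    doc_len: u32,
    avg_doc_len: f64,
    doc_count: u64,
    doc_freq: u64,
) -> f64 {
    let tf = term_freq as f64;
    // Length normalization: longer-than-average documents are penalized.
    let norm = 1.0 - B + B * (doc_len as f64 / avg_doc_len);
    idf(doc_count, doc_freq) * tf * (K1 + 1.0) / (tf + K1 * norm)
}
```

The score of a multi-term query is the sum of `bm25_term_score` over its terms. The term-frequency part saturates: a second occurrence of a term adds less than the first, which keeps long repetitive documents from dominating the ranking.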

Note: distributed search is out of scope for Tantivy, but if you are looking for this feature, check out Quickwit.
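
Two of the features listed above are related: phrase queries rely on the optional position index. The following toy sketch shows the idea with a positional inverted index in plain Rust. `PositionalIndex` and its methods are hypothetical names invented for this illustration; the whitespace tokenizer and the in-memory layout are simplifying assumptions and say nothing about how Tantivy stores its postings.

```rust
use std::collections::HashMap;

/// Toy positional inverted index: term -> doc id -> token positions.
#[derive(Default)]
struct PositionalIndex {
    postings: HashMap<String, HashMap<u32, Vec<u32>>>,
}

impl PositionalIndex {
    /// Splits on whitespace, lowercases, and records each token's position.
    fn add_document(&mut self, doc_id: u32, text: &str) {
        for (pos, token) in text.split_whitespace().enumerate() {
            self.postings
                .entry(token.to_lowercase())
                .or_default()
                .entry(doc_id)
                .or_default()
                .push(pos as u32);
        }
    }

    /// Returns the sorted ids of documents in which the phrase's terms
    /// occur at consecutive positions.
    fn phrase_query(&self, phrase: &str) -> Vec<u32> {
        let terms: Vec<String> = phrase
            .split_whitespace()
            .map(|t| t.to_lowercase())
            .collect();
        // Candidate documents are those containing the first term.
        let Some(first) = terms.first().and_then(|t| self.postings.get(t)) else {
            return Vec::new();
        };
        let mut hits: Vec<u32> = first
            .iter()
            .filter(|(doc_id, starts)| {
                // Some occurrence of the first term must be followed by
                // every remaining term at the matching offset.
                starts.iter().any(|&start| {
                    terms.iter().enumerate().skip(1).all(|(offset, term)| {
                        self.postings
                            .get(term)
                            .and_then(|docs| docs.get(*doc_id))
                            .map_or(false, |positions| {
                                positions.contains(&(start + offset as u32))
                            })
                    })
                })
            })
            .map(|(doc_id, _)| *doc_id)
            .collect();
        hits.sort_unstable();
        hits
    }
}
```

With documents "Michael Jackson king of pop" (id 0) and "Jackson met Michael" (id 1), the phrase "michael jackson" matches only document 0, while the single term "michael" matches both. Without stored positions, the index could only tell that both words occur somewhere in a document, which is why position indexing has to be enabled for phrase search.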

3. A small Tantivy example


use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
use tantivy::schema::*;
use tantivy::{doc, Index, IndexWriter, ReloadPolicy};
use tempfile::TempDir;

fn main() -> tantivy::Result<()> {
    // Create the index in a temporary directory.
    let index_path = TempDir::new()?;

    // Define the schema: "title" is indexed and stored, "body" is indexed only.
    let mut schema_builder = Schema::builder();
    schema_builder.add_text_field("title", TEXT | STORED);
    schema_builder.add_text_field("body", TEXT);
    let schema = schema_builder.build();

    let index = Index::create_in_dir(&index_path, schema.clone())?;

    // Create an index writer with a memory budget of 50 MB.
    let mut index_writer: IndexWriter = index.writer(50_000_000)?;

    let title = schema.get_field("title").unwrap();
    let body = schema.get_field("body").unwrap();

    // Build a document field by field.
    let mut old_man_doc = TantivyDocument::default();
    old_man_doc.add_text(title, "The Old Man and the Sea");
    old_man_doc.add_text(
        body,
        "He was an old man who fished alone in a skiff in the Gulf Stream and he had gone \
         eighty-four days now without taking a fish.",
    );
    index_writer.add_document(old_man_doc)?;

    // Build documents with the doc! macro.
    index_writer.add_document(doc!(
        title => "Of Mice and Men",
        body => "A few miles south of Soledad, the Salinas River drops in close to the hillside \
                 bank and runs deep and green. The water is warm too, for it has slipped twinkling \
                 over the yellow sands in the sunlight before reaching the narrow pool. On one \
                 side of the river the golden foothill slopes curve up to the strong and rocky \
                 Gabilan Mountains, but on the valley side the water is lined with trees—willows \
                 fresh and green with every spring, carrying in their lower leaf junctures the \
                 debris of the winter’s flooding; and sycamores with mottled, white, recumbent \
                 limbs and branches that arch over the pool"
    ))?;

    // A field may hold several values: this document has two titles.
    index_writer.add_document(doc!(
        title => "Frankenstein",
        title => "The Modern Prometheus",
        body => "You will rejoice to hear that no disaster has accompanied the commencement of an \
                 enterprise which you have regarded with such evil forebodings.  I arrived here \
                 yesterday, and my first task is to assure my dear sister of my welfare and \
                 increasing confidence in the success of my undertaking."
    ))?;

    // Documents become searchable only after a commit.
    index_writer.commit()?;

    // The reader reloads the index shortly after each commit.
    let reader = index
        .reader_builder()
        .reload_policy(ReloadPolicy::OnCommitWithDelay)
        .try_into()?;
    let searcher = reader.searcher();

    // Terms without a field prefix are searched in both "title" and "body".
    let query_parser = QueryParser::for_index(&index, vec![title, body]);

    // By default the terms are combined with OR.
    let query = query_parser.parse_query("sea whale")?;

    // Fetch the 10 highest-scoring documents and print them as JSON.
    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
    for (_score, doc_address) in top_docs {
        let retrieved_doc: TantivyDocument = searcher.doc(doc_address)?;
        println!("{}", retrieved_doc.to_json(&schema));
    }

    // Per-term boosts: `^20` and `^70` weight each clause's score.
    let query = query_parser.parse_query("title:sea^20 body:whale^70")?;

    // Take the best hit; unwrap() panics if nothing matches.
    let (_score, doc_address) = searcher
        .search(&query, &TopDocs::with_limit(1))?
        .into_iter()
        .next()
        .unwrap();

    // Explain how the score of that document was computed.
    let explanation = query.explain(&searcher, doc_address)?;
    println!("{}", explanation.to_pretty_json());

    Ok(())
}
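
To build the example above, the project needs the two external crates it imports, `tantivy` and `tempfile`. A minimal `Cargo.toml` dependency section might look like the sketch below. The version numbers are assumptions: the listing uses `TantivyDocument` and `ReloadPolicy::OnCommitWithDelay`, which suggests a fairly recent Tantivy release, so check crates.io for the versions that match your toolchain.

```toml
[dependencies]
# Assumed versions; adjust to the releases you actually target.
tantivy = "0.22"
tempfile = "3"
```

With these entries in place, `cargo run` should compile and run the listing, creating the index in a temporary directory that is removed when the program exits.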

GitHub: https://github.com/quickwit-oss/tantivy
