MongoDB Text Indexes: Text Search in MongoDB

吴雄辉

Tags: python, mongodb, mysql, indexing, nlp

Original article: https://medium.com/@varunb94/text-search-in-mongodb-34c1f70ab86d

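This article, a translation of the original linked above, explains how to create and use text indexes for text search in MongoDB and looks at ways to implement efficient text retrieval.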

Text search is a very common requirement in most applications, and you would expect most databases to support text search out of the box if you create an index on the field.

But when I tried to implement text search for my app, it turned out to be much more complex. After some research, I’ve uncovered three main ways to implement text search with MongoDB.

1. Create a Text Index

This is the first approach that you’ll find if you Google “full text search in mongo.” It’s the most efficient way to implement text search according to MongoDB’s documentation. As an example, consider the following data:

db.names.insertMany(
    [
      { _id: 1, name: "Army of Ants" },
      { _id: 2, name: "Army Ants" },
      { _id: 3, name: "Ant Man" },
      { _id: 4, name: "Armies" }
    ]
)

Now create a text index on the name field. The $text queries below only work on a collection that has one.

> db.names.createIndex({ name: "text" })

Now try the following queries:

> db.names.find({"$text": {"$search": "Army"}})
{ "_id" : 4, "name" : "Armies" }
{ "_id" : 2, "name" : "Army Ants" }
{ "_id" : 1, "name" : "Army of Ants" }
>
>
> db.names.find({"$text": {"$search": "Arm"}})

As you can see, a search for Army returns every document whose name field contains the exact word Army or a known variation of it, such as Armies. A search for Arm, however, returns nothing.

So our text search is smart enough to match Armies when we search for Army, but it does not do partial matching: Arm matches neither Army nor Armies.
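To make that behaviour concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not MongoDB's implementation: the helpers `toyStem`, `toyTokens`, and `toySearch` are hypothetical names invented for this illustration, and the two stemming rules are hand-written stand-ins for the language-aware stemming a real text index applies. The only point is that matching happens between whole, stemmed words, so a prefix such as Arm never equals the stored word.

```javascript
// Toy model only: MongoDB's text index uses language-specific stemming,
// not the two hand-written rules below. All helper names are hypothetical.

// Reduce a word to a crude "stem".
function toyStem(word) {
  const w = word.toLowerCase();
  if (w.endsWith("ies")) return w.slice(0, -3) + "y"; // armies -> army
  if (w.endsWith("s") && w.length > 3) return w.slice(0, -1); // ants -> ant
  return w;
}

// Split a string on whitespace and stem every token.
function toyTokens(text) {
  return text.split(/\s+/).filter(Boolean).map(toyStem);
}

// A document matches only when a whole stemmed token equals the stemmed term.
function toySearch(docs, term) {
  const wanted = toyStem(term);
  return docs.filter((doc) => toyTokens(doc.name).includes(wanted));
}

// Same sample data as the names collection above.
const names = [
  { _id: 1, name: "Army of Ants" },
  { _id: 2, name: "Army Ants" },
  { _id: 3, name: "Ant Man" },
  { _id: 4, name: "Armies" },
];

const armyIds = toySearch(names, "Army").map((d) => d._id);
const armIds = toySearch(names, "Arm").map((d) => d._id);

console.log(armyIds); // [ 1, 2, 4 ]
console.log(armIds); // []
```

In this sketch Armies is reduced to army and so matches the search term, while arm is not equal to any stored token. Getting Arm to match would need a different comparison, for example a prefix test, which whole-word matching does not perform.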

My product manager was hoping (or rather expecting) that if I searched for Arm, it would bring up the three results that came up when I searched for Army.

To solve this, I thought it would be a good idea to understand why the two documents did not match when I searched for Arm instead of Army.

As I suspected: tokenisation! The text index splits the data (in this case, Army Ants and Army of Ants) on whitespace into tokens. So Army Ants becomes [Army, Ants] and Army of Ants becomes
