MongoDB Text Index
Text search is a very common requirement in most applications, and you would expect most databases to support text search out of the box if you create an index on the field.
But when I tried to implement text search for my app, it turned out to be much more complex. After some research, I’ve uncovered three main ways to implement text search with MongoDB.
1. Create a Text Index
This is the first approach that you’ll find if you Google “full text search in mongo.” It’s the most efficient way to implement text search according to MongoDB’s documentation. As an example, consider the following data:
> db.names.insertMany([
    { _id: 1, name: "Army of Ants" },
    { _id: 2, name: "Army Ants" },
    { _id: 3, name: "Ant Man" },
    { _id: 4, name: "Armies" }
  ])
Now create a text index on the name field. Without it, a $text query will not run at all:
> db.names.createIndex({ name: "text" })
Now try the following queries:
> db.names.find({"$text": {"$search": "Army"}})
{ "_id" : 4, "name" : "Armies" }
{ "_id" : 2, "name" : "Army Ants" }
{ "_id" : 1, "name" : "Army of Ants" }
>
>
> db.names.find({"$text": {"$search": "Arm"}})
As you can see, if you search for Army, it returns all the documents that contain the exact word Army or any known variation of that word in the name field. But it returns nothing for Arm.
So, our text search is smart enough to match Armies when we search for Army, but not dumb enough to partially match Arm with Army or Armies.
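To make that behaviour concrete, here is a minimal JavaScript sketch of whole-token matching. It is only an illustration under stated assumptions: tokenize, toyStem and search are helpers made up for this example, and the stemming rules are a crude stand-in, not what MongoDB actually runs. The point is just that a query term is compared against whole (stemmed) tokens, never against prefixes of them.

```javascript
// Toy model of whole-token text matching. NOT MongoDB's implementation:
// every function below is a hypothetical helper for illustration only.

// Split a string into lowercase tokens on whitespace.
function tokenize(text) {
  return text.toLowerCase().trim().split(/\s+/).filter(Boolean);
}

// Crude stand-in for a real stemmer (assumption: just enough rules
// to map "army" and "armies" to the same root, and "ants" to "ant").
function toyStem(word) {
  if (word.endsWith("ies")) return word.slice(0, -3) + "i";
  if (word.length > 2 && word.endsWith("y")) return word.slice(0, -1) + "i";
  if (word.endsWith("s") && !word.endsWith("ss")) return word.slice(0, -1);
  return word;
}

// A document matches if any stemmed query term equals any stemmed token.
function search(docs, query) {
  const terms = tokenize(query).map(toyStem);
  return docs.filter((doc) => {
    const tokens = new Set(tokenize(doc.name).map(toyStem));
    return terms.some((term) => tokens.has(term));
  });
}

const docs = [
  { _id: 1, name: "Army of Ants" },
  { _id: 2, name: "Army Ants" },
  { _id: 3, name: "Ant Man" },
  { _id: 4, name: "Armies" },
];

console.log(search(docs, "Army").map((d) => d._id)); // [ 1, 2, 4 ]
console.log(search(docs, "Arm").map((d) => d._id));  // []
```

In this toy model, Army and Armies both reduce to the same root, so they match each other, while Arm reduces to a different root and matches nothing, which mirrors the results from the shell session above.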
My product manager was hoping (or rather expecting) that if I searched for Arm, it would bring up the three results that came up when I searched for Army.
To solve this, I thought it would be a good idea to understand why the two documents did not match when I searched for Arm instead of Army.
As I suspected, tokenisation! The text index breaks the data (in this case, Army Ants and Army of Ants) on white space into tokens. So Army Ants becomes [Army, Ants] and Army of Ants becomes