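All examples in this post are based on Elasticsearch 7.x.
This post introduces the basic syntax of full-text search in Elasticsearch. The compound queries (bool) and filters (filter) covered later all build on these basics.
At search time, a full-text query analyzes the input text into terms and then looks those terms up in the inverted index. By default, a document is returned as soon as it matches any one of the terms.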
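To make the "analyze, then look up in the inverted index" idea concrete, here is a minimal Python sketch. It is not how Elasticsearch is implemented: the `analyze` function is a crude stand-in for a real analyzer (lowercase, split on non-alphanumerics, no stemming), and the names `build_inverted_index` and `search_any` are made up for this illustration.

```python
import re
from collections import defaultdict

def analyze(text):
    """Crude stand-in for an analyzer: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search_any(index, query):
    """Return ids of documents that contain at least one query term."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits

# The "content" values of the three sample documents used in this post.
docs = {
    1: "Brown rabbits are commonly seen.",
    2: "My quick brown fox eats rabbits on a regular basis.",
    3: "I see a lot of barking dogs on the road.",
}
index = build_inverted_index(docs)
print(sorted(search_any(index, "brown rabbits")))  # prints [1, 2]
```

Because this toy analyzer does no stemming, a query for "dog" would not find document 3, which only contains "dogs".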
REST API
Adding sample data for full-text search
POST /blogs/_bulk
{"index": {}}
{"post_date": "2020-01-01", "title": "Quick brown rabbits", "content": "Brown rabbits are commonly seen.", "author_id": 11401}
{"index": {}}
{"post_date": "2020-01-02", "title": "Keeping pets healthy", "content": "My quick brown fox eats rabbits on a regular basis.", "author_id": 11402}
{"index": {}}
{"post_date": "2020-01-03", "title": "My dog barks", "content": "I see a lot of barking dogs on the road.", "author_id": 11403}
match_all
Returns every document in the index.
GET /blogs/_search
{
  "query": {
    "match_all": {}
  }
}
match
The match query can be written in three ways.
(1) or
GET /blogs/_search
{
  "query": {
    "match": {
      "content": "brown rabbits"
    }
  }
}
or, equivalently:
GET /blogs/_search
{
  "query": {
    "match": {
      "content": {
        "query": "brown rabbits",
        "operator": "or"
      }
    }
  }
}
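Each term produced by analysis is matched against the documents independently. A document is returned as long as at least one term matches.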
(2) and
GET /blogs/_search
{
  "query": {
    "match": {
      "content": {
        "query": "brown rabbits",
        "operator": "and"
      }
    }
  }
}
All of the analyzed terms must appear in the content field of the same document. They do not have to form a contiguous phrase.
(3) minimum_should_match
GET /blogs/_search
{
  "query": {
    "match": {
      "content": {
        "query": "Quick brown rabbits",
        "minimum_should_match": 2
      }
    }
  }
}
At least n of the analyzed terms must appear in the same document (here n = 2).
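The three variants differ only in how many query terms a document has to contain. The following Python sketch models just that matching rule, not scoring. The `analyze` and `match` functions are hypothetical helpers written for this post, and `minimum_should_match` is only supported here as a positive integer (Elasticsearch itself accepts more forms, such as percentages).

```python
import re

def analyze(text):
    """Crude stand-in for an analyzer: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match(field_text, query, operator="or", minimum_should_match=None):
    """Decide whether a field value satisfies a simplified match query."""
    doc_terms = set(analyze(field_text))
    query_terms = analyze(query)
    matched = sum(1 for term in query_terms if term in doc_terms)
    if minimum_should_match is not None:
        # At least n of the query terms must be present.
        return matched >= minimum_should_match
    if operator == "and":
        # Every query term must be present.
        return matched == len(query_terms)
    # Default "or": a single matching term is enough.
    return matched >= 1

contents = [
    "Brown rabbits are commonly seen.",
    "My quick brown fox eats rabbits on a regular basis.",
    "I see a lot of barking dogs on the road.",
]
for text in contents:
    print(
        match(text, "brown fox"),
        match(text, "brown fox", operator="and"),
        match(text, "Quick brown rabbits", minimum_should_match=2),
    )
```

For the sample data, only the second document contains both "brown" and "fox", so it is the only one that satisfies the "and" variant of that query.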
match_phrase
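By default, a phrase search requires the terms to be adjacent and in the same order as in the query. The slop parameter relaxes this by allowing a number of positions between the terms.
(1) Default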
GET /blogs/_search
{
  "query": {
    "match_phrase": {
      "title": "Quick brown"
    }
  }
}
slop defaults to 0.
(2) Setting slop manually
GET /blogs/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "Quick rabbits",
        "slop": 1
      }
    }
  }
}
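One way to think about slop is as the number of position moves needed to turn the query into the text found in the document. The sketch below is a simplified model of that idea and is an assumption on my part, not Lucene's actual implementation: each matched position is shifted by the term's offset in the query, and the phrase matches when the spread of the shifted positions is at most slop. `analyze` and `phrase_match` are hypothetical helper names.

```python
import re
from itertools import product

def analyze(text):
    """Crude stand-in for an analyzer: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_match(field_text, phrase, slop=0):
    """Simplified phrase matching with slop (a model, not Lucene's algorithm)."""
    doc_terms = analyze(field_text)
    query_terms = analyze(phrase)
    if not query_terms:
        return False
    # For every query term, collect the positions where it occurs in the field.
    positions = [
        [i for i, term in enumerate(doc_terms) if term == query_term]
        for query_term in query_terms
    ]
    if any(not p for p in positions):
        return False  # some query term does not occur at all
    for combo in product(*positions):
        if len(set(combo)) != len(combo):
            continue  # a repeated query term must not reuse one position
        # Shift each position by the term's offset in the query phrase.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False

title = "Quick brown rabbits"
print(phrase_match(title, "Quick brown"))            # adjacent terms, slop 0
print(phrase_match(title, "Quick rabbits"))          # one word in between, slop 0
print(phrase_match(title, "Quick rabbits", slop=1))  # one word in between, slop 1
```

In this model, "Quick rabbits" fails against the title "Quick brown rabbits" with the default slop of 0 and succeeds with slop 1, which is the behavior the two queries above are meant to show. Swapping two adjacent terms costs 2 in this model.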
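dis_max and tie_breaker
The previous examples all search a single field, but real-world searches often span several fields. How do we find the best-matching result in that case?
- dis_max uses the highest score among the individual field queries as the document's overall score.
- tie_breaker builds on dis_max: the scores of the other matching queries are multiplied by the tie_breaker factor and then added to the score chosen by dis_max.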
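The two bullets above boil down to a small formula: best score plus tie_breaker times the sum of the remaining scores. Here is a sketch of that arithmetic in Python. The function name `dis_max_score` and the per-field scores are invented for illustration; real scores come from Elasticsearch's relevance scoring.

```python
def dis_max_score(query_scores, tie_breaker=0.0):
    """Combine per-query scores the way dis_max is described above.

    query_scores holds the scores of the sub-queries that matched.
    """
    if not query_scores:
        return 0.0
    ordered = sorted(query_scores, reverse=True)
    best, others = ordered[0], ordered[1:]
    return best + tie_breaker * sum(others)

# Hypothetical scores for the title and content sub-queries of one document.
scores = [1.2, 0.5]
print(dis_max_score(scores))                   # only the best field counts
print(dis_max_score(scores, tie_breaker=0.7))  # other fields contribute 70%
```

With tie_breaker set to 0 the result is a pure "best field wins" score, and with 1.0 it becomes a plain sum of all matching sub-query scores.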
(1) dis_max
GET /blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": "brown fox"
          }
        },
        {
          "match": {
            "content": "brown fox"
          }
        }
      ]
    }
  }
}
(2) dis_max with tie_breaker
GET /blogs/_search
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.7,
      "queries": [
        {
          "match": {
            "title": "brown fox"
          }
        },
        {
          "match": {
            "content": "brown fox"
          }
        }
      ]
    }
  }
}
multi_match
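Besides dis_max and tie_breaker, multi_match can also be used to match the search text against multiple fields. It supports several matching strategies:
- best_fields
  A document is ranked higher when a single one of its fields matches as many of the terms as possible.
- most_fields
  A document is ranked higher when a term matches in as many of its fields as possible.
- cross_fields
  Searches for each term across multiple fields.
For example: doc1's field1 matches three terms, while doc2's field1 and field2 both match the same single term. Under the best_fields strategy, doc1 gets the higher relevance score; under the most_fields strategy, doc2 gets the higher relevance score.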
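The difference between the first two strategies can be sketched with a few lines of Python. This is a deliberately simplified model: best_fields is treated as "take the best field's score" and most_fields as "add up the scores of all fields", and every number below is a made-up per-field score rather than anything Elasticsearch would actually compute.

```python
# Hypothetical per-field scores, invented for illustration only.
# doc1: one field matches strongly; doc2: two fields each match weakly.
field_scores = {
    "doc1": {"field1": 1.0, "field2": 0.0},
    "doc2": {"field1": 0.6, "field2": 0.6},
}

def best_fields(scores):
    """Simplified best_fields: the single best field decides the score."""
    return max(scores.values())

def most_fields(scores):
    """Simplified most_fields: every matching field adds to the score."""
    return sum(scores.values())

for name, scores in field_scores.items():
    print(name, best_fields(scores), most_fields(scores))
```

Under these assumed scores, doc1 ranks first with best_fields and doc2 ranks first with most_fields, which mirrors the example above.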
(1) best_fields
GET /blogs/_search
{
  "query": {
    "multi_match": {
      "query": "Quick pets",
      "type": "best_fields",
      "fields": ["title", "content"]
    }
  }
}
(2) most_fields
GET /blogs/_se