6. How Elasticsearch Retrieves Documents

谁是谁的小确幸

Category: Elastic Stack. Tags: elasticsearch, term-level queries, full-text queries

Article link: https://blog.csdn.net/qq_29119581/article/details/114713597

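Continuing from the previous post, this article goes over the two ways ES retrieves documents: structured search and full-text search.

For structured search, the official documentation is the recommended reference: Term-level queries | Elasticsearch Guide [6.8] | Elastic

For full-text search, the official documentation is likewise recommended: Full text queries | Elasticsearch Guide [6.8] | Elastic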

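I. Structured search

1. Exact search: term/terms queries

A term query looks up a single exact value, while a terms query looks up several exact values. Note that the query value is not analyzed, so these queries work best on keyword fields (or numeric, date, and boolean fields). On a text field the indexed terms have already been analyzed, so an exact value may not match what is stored.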

GET mytest_index_0626/_search?pretty&track_total_hits=true
{
  "query": {
    "term": {
      "pkey": "15556905959_10011" // single exact value
    }
  }
}

GET mytest_index_0626/_search?pretty&track_total_hits=true
{
  "query": {
    "terms": {
      "pkey": ["15556905959_10011","15556905756_10010"] // multiple exact values
    }
  }
}

2. Exact search: bool filter queries

A bool query accepts the clauses must, must_not, should, and filter, which differ as follows:

must: every clause must match, equivalent to AND;

GET mytest_index_0626/_search?pretty&track_total_hits=true
{
  "query": {
    "bool": {
      "must": [
        {"term": {"status": "U"}},
        {"term": {"myObj.sign_no": "10010"}}
      ]
    }
  }
}

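must_not: none of the clauses may match, equivalent to NOT;

# No indexed document matches ①, so this must_not query returns every document whose flag is not 1 (here, all documents with flag=0)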
GET mytest_index_0626/_search?pretty&track_total_hits=true
{
  "query": {
    "bool": {
      "must_not": [
         {"term": {"flag": 1}} //①
      ]
    }
  }
}

should: at least one clause must match, equivalent to OR;

# No indexed document matches ①, one document matches ②, so this should query returns one result
GET mytest_index_0626/_search?pretty&track_total_hits=true
{
  "query": {
    "bool": {
      "should": [
        {"term": {"flag": 1}}, //①
        {"term": {"myObj.user_id": "15556905756"}} //②
      ]
    }
  }
}

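filter: every clause must match, but the clauses run in non-scoring filter mode.

# No indexed document matches ①, one document matches ②, so this filter query returns no results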
GET mytest_index_0626/_search?pretty&track_total_hits=true
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"flag": 1}}, //①
        {"term": {"myObj.user_id": "15556905756"}} //②
      ]
    }
  }
}
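
The four clauses can be compared side by side with a small in-memory sketch. This is not how Elasticsearch evaluates queries internally; `term`, `bool_match`, and `search` are hypothetical helpers, the two sample documents are made up, and scoring is ignored. It assumes the common default that at least one should clause has to match only when no must or filter clause is present.

```python
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, object]
Clause = Callable[[Doc], bool]


def term(field: str, value: object) -> Clause:
    """Hypothetical stand-in for a term clause: exact equality on one field."""
    return lambda doc: doc.get(field) == value


def bool_match(doc: Doc, must: Iterable[Clause] = (), must_not: Iterable[Clause] = (),
               should: Iterable[Clause] = (), filter: Iterable[Clause] = ()) -> bool:
    """Decide whether one document satisfies a bool query (scoring is ignored)."""
    required = list(must) + list(filter)
    optional = list(should)
    if not all(clause(doc) for clause in required):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # Without must/filter clauses, at least one should clause has to match.
    if optional and not required:
        return any(clause(doc) for clause in optional)
    return True


def search(docs: List[Doc], **clauses: Iterable[Clause]) -> List[Doc]:
    """Return the documents that satisfy the given bool clauses."""
    return [doc for doc in docs if bool_match(doc, **clauses)]


# Made-up sample data: no document has flag=1, one has the user_id below.
docs = [
    {"flag": 0, "myObj.user_id": "15556905756"},
    {"flag": 0, "myObj.user_id": "15556905959"},
]
flag_is_1 = term("flag", 1)
user_matches = term("myObj.user_id", "15556905756")

should_hits = search(docs, should=[flag_is_1, user_matches])
filter_hits = search(docs, filter=[flag_is_1, user_matches])
must_not_hits = search(docs, must_not=[flag_is_1])
print(len(should_hits), len(filter_hits), len(must_not_hits))
```

With this sample data the should query keeps one document, the filter query keeps none, and the must_not query keeps both, which mirrors the comments on the console examples above.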

3. Prefix search (prefix query)

GET mytest_index_0626/_search?pretty&track_total_hits=true
{
  "query": {
    "prefix": {
      "myObj.user_id": "155" // matches documents whose myObj.user_id starts with 155
    }
  }
}

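4. Fuzzy search (fuzzy query)

A fuzzy query generates every variation of the search term that lies within the maximum edit distance allowed by the fuzziness setting, then checks the term dictionary to find which of those variations actually exist in the index:

# The index contains one document with pkey=15556905954_10010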
GET mytest_index_0626/_search?pretty&track_total_hits=true
{
  "query": {
    "fuzzy": {
       //"pkey": "15556905954" // no matching result
       //"pkey": "15556905954_10" // still no matching result
         "pkey": "15556905954_1001"
    }
  }
}
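
Why only the last value matches becomes clearer once the edit distances are computed. The sketch below uses a plain Levenshtein distance (Elasticsearch additionally counts a transposition of two adjacent characters as a single edit by default, which makes no difference for these inputs) and assumes the default fuzziness of AUTO, which allows 0 edits for terms of 1-2 characters, 1 edit for 3-5 characters, and 2 edits for longer terms. The function names are my own.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    """Maximum edits allowed by fuzziness AUTO for a term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


INDEXED = "15556905954_10010"  # the pkey stored in the index

for query in ("15556905954", "15556905954_10", "15556905954_1001"):
    distance = levenshtein(query, INDEXED)
    matches = distance <= auto_fuzziness(query)
    print(f"{query}: distance={distance}, matches={matches}")
```

The three candidates are 6, 3, and 1 edits away from the indexed value, and only the last one falls within the 2 edits that AUTO permits for a term of this length.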

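5. Range search (range query)

Range queries are well suited to numeric fields (date fields work as well). The comparison keywords are:

gt: > greater than
lt: < less than
gte: >= greater than or equal to
lte: <= less than or equal to

# Query documents whose payment amount myObj.pay_amount falls in the range [20,50)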
GET mytest_index_0626/_search?pretty&track_total_hits=true
{
   "query": {
       "range": {
           "myObj.pay_amount": {
                "gte": "20",
                "lt": "50"
            }
        }  
    }
}

# Range query over the values of multiple fields
GET mytest_index_0626/_search?pretty&track_total_hits=true
{
  "query": {
        "bool": {
            "filter": [
                {
                  "range": {
                        "myObj.pay_amount": {
                            "gte": "20",
                            "lt": "50"
                        }
                    }
                },
                {
                  "range": {
                        "myObj.pay_amount1": {
                            "gte": "30",
                            "lte": "45"
                        }
                    }
                }
            ]
        }
    }
}
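
When such requests are assembled in application code rather than typed into the console, the nested body is easy to get wrong. A minimal sketch of building the same multi-field body as a Python dict follows; `range_clause` and `bool_filter` are hypothetical helpers of my own, not part of any Elasticsearch client, and the bounds are passed as numbers rather than strings.

```python
import json
from typing import Any, Dict, Iterable

ALLOWED_BOUNDS = {"gt", "gte", "lt", "lte"}


def range_clause(field: str, **bounds: Any) -> Dict[str, Any]:
    """Build one range clause, rejecting keywords other than gt/gte/lt/lte."""
    unknown = set(bounds) - ALLOWED_BOUNDS
    if unknown:
        raise ValueError(f"unsupported range keywords: {sorted(unknown)}")
    return {"range": {field: dict(bounds)}}


def bool_filter(clauses: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap clauses in a bool/filter query so that all of them must match."""
    return {"query": {"bool": {"filter": list(clauses)}}}


body = bool_filter([
    range_clause("myObj.pay_amount", gte=20, lt=50),    # [20, 50)
    range_clause("myObj.pay_amount1", gte=30, lte=45),  # [30, 45]
])
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the console as the request body or handed to whichever client library is in use.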

6. Wildcard search (wildcard query)

The wildcards are: * matches any character sequence (including the empty sequence), and ? matches any single character.

GET mytest_index_0626/_search?pretty&track_total_hits=true
{
  "query": {
    "wildcard": {
      "myObj.user_id": "155*"
    }
  }
}
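
The two wildcards map directly onto regular-expression constructs, which the following sketch illustrates. It only mimics the matching rule on a single term in memory; Elasticsearch itself evaluates wildcard queries against the term dictionary, and the helper names here are my own.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate * and ? into regex syntax and escape every other character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any character sequence, including empty
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern: str, term: str) -> bool:
    """A wildcard pattern has to cover the whole term, not just a part of it."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("155*", "15556905756"))  # * covers the remaining digits
print(wildcard_match("155*", "155"))          # * may also match nothing
print(wildcard_match("155?", "155"))          # ? needs exactly one character
```

Note that a pattern beginning with * or ? forces a scan over many terms, which is why leading wildcards are generally best avoided.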

7. Type search (type query)

The type query is no longer supported in 8.x.

# Return the documents of the specified type
GET mytest_index_0626/_search?pretty&track_total_hits=true
{
   "query": {
        "type": {
            "value":"_doc"
        }
    }
}

8. Exists search (exists query)

Find documents in which one or more fields exist:

GET mytest_index_0626/_search?pretty&track_total_hits=true
{
  "query": {
    "exists": {
       "field": "collection_url"
    }
  }
}

GET mytest_index_0626/_search?pretty&track_total_hits=true
{
  "query": {
    "bool": {
      "filter": [
        {"exists": {"field": "status"}},
        {"exists": {"field": "collection_url"}}
      ]
    }
  }
}

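Note that to find documents in which a field does not exist, the exists query can be wrapped in the must_not clause shown earlier.

In addition, if the exists query should also match a field whose value is null, null_value has to be set in the mapping. Taking the status field as an example: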

"status": {
    "type": "keyword",
    "null_value": "_null_"
}

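9. Regular expression search (regexp query)

Regexp searches that rely on the wildcard operators .*?+ can perform very poorly.

# Return documents whose collection_url matches the regular expression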
GET mytest_index_0626/_search?pretty&track_total_hits=true
{
    "query": {
        "regexp": {
            "collection_url":"https.*/hangzhou/"
        }
    }
}

10. ID search (ids query)

# Return the documents with the specified ids
GET mytest_index_0626/_search?pretty&track_total_hits=true
{
    "query": {
        "ids": {
            "type": "_doc",
            "values":["axEwmoEBTXtCd1dvuDqv","gxHImYEBTXtCd1dv3Dfj"]
        }
    }
}

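II. Full-text search

As we know, keyword fields are not analyzed, so when a full-text query targets a keyword field, the whole string has to match before a document is found. Text fields are the ones suited to full-text search: they are analyzed, and the specific analyzer matters, because different analyzers may break the same text string into different terms.

The examples below use the wheater_infos_2022 index, to which 12 weather records were added, covering four cities from June 1 to June 3.

1. Match search (match query)

The match query analyzes the query text before searching and accepts text, numeric, and date values.

# today_weather.temperature_range is a text field
# The original temperature_range string is "26-344℃"; the default standard analyzer is used here
# Match documents whose today_weather.temperature_range contains 26 or 44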
GET wheater_infos_2022/_search?pretty&track_total_hits=true
{
   "query": {
        "match": {
            "today_weather.temperature_range": "26 44"
        }
    }
}

# The shorthand above is equivalent to the expanded form below:
GET wheater_infos_2022/_search?pretty&track_total_hits=true
{
   "query": {
        "match": {
            "today_weather.temperature_range": {
                "query": "26 44",
                "operator": "or" // or is the default operator
            }
        }
    }
}
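
The effect of the operator parameter can be approximated in a few lines. The `tokenize` function below is a crude stand-in for the standard analyzer (it just lowercases and keeps runs of ASCII letters and digits), and `match` is a hypothetical helper that ignores scoring, so treat this as an illustration of the or/and rule only.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Crude stand-in for the standard analyzer: lowercase ASCII letter/digit runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(field_text: str, query: str, operator: str = "or") -> bool:
    """or: any query term occurs in the field; and: every query term occurs."""
    field_terms = set(tokenize(field_text))
    query_terms = tokenize(query)
    if not query_terms:
        return False
    hits = [term in field_terms for term in query_terms]
    return all(hits) if operator == "and" else any(hits)


stored = "26-344℃"  # tokenized here as ["26", "344"]
print(match(stored, "26 44"))                   # or: "26" is enough
print(match(stored, "26 44", operator="and"))   # and: "44" is missing
print(match(stored, "26 344", operator="and"))  # and: both terms present
```

Under these assumptions the query "26 44" matches with the default or operator only because of the term 26, since 44 is not one of the indexed terms.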

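For more parameters (such as fuzziness, zero_terms_query, and cutoff_frequency), see the official documentation: Match Query | Elasticsearch Guide [6.8] | Elastic

2. Phrase match search (match_phrase query)

The match_phrase query analyzes the text and builds a phrase query from the analyzed terms: it first parses the query string into a list of terms, then searches for those terms, and keeps only the documents that contain all of the search terms in the same relative positions as in the query.

Pay attention to what "the same positions" means here. If I change the "26 344" for today_weather.temperature_range into "344 26", the query returns no results.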

GET wheater_infos_2022/_search?pretty&track_total_hits=true
{
    "query": {
        "match_phrase": {
            "today_weather.temperature_range": "26 344" 
        }
    }
}
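
The position requirement can be sketched as a search for the query terms as a contiguous run inside the field's term list. As before, `analyze` is only a rough approximation of the standard analyzer and `phrase_match` is a hypothetical helper; options such as slop are left out.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Rough approximation of the standard analyzer: lowercase ASCII letter/digit runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match(field_text: str, phrase: str) -> bool:
    """True when the phrase terms occur consecutively and in order in the field."""
    field_terms = analyze(field_text)
    phrase_terms = analyze(phrase)
    n = len(phrase_terms)
    if n == 0:
        return False
    return any(field_terms[i:i + n] == phrase_terms
               for i in range(len(field_terms) - n + 1))


stored = "26-344℃"
print(phrase_match(stored, "26 344"))  # same order as in the stored text
print(phrase_match(stored, "344 26"))  # same terms, wrong order
```

Both queries contain exactly the same terms, yet only the first one lines up with the positions recorded for the stored text.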

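For more parameters (such as analyzer), see the official documentation: Match Phrase Query | Elasticsearch Guide [6.8] | Elastic

3. Phrase prefix match search (match_phrase_prefix query)

match_phrase_prefix works on the same principle as match_phrase, except that it allows the last term of the text to match as a prefix. Because the query text is analyzed, the trailing * in the example that follows is typically stripped by the analyzer and is not required.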

GET wheater_infos_2022/_search?pretty&track_total_hits=true
{
   "query": {
        "match_phrase_prefix": {
            "today_weather.temperature_range": "34*" 
        }
    }
}

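For more parameters (such as max_expansions), see the official documentation: Match Phrase Prefix Query | Elasticsearch Guide [6.8] | Elastic

4. Multi-field match search (multi_match query)

multi_match runs the same query against multiple fields. The default query type is best_fields (which generates a match query for each field); the types most_fields, cross_fields, phrase, and phrase_prefix are also supported.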

GET wheater_infos_2021/_search?pretty&track_total_hits=true
{
   "query": {
        "multi_match": {
            "query": "26 344",
            "operator": "and", // require every term; multi_match does not parse AND | OR | NOT in the query text
            "fields": [
                "today_weather.temperature",
                "today_weather.temperature_*",
                "today_weather.pm_value^2" // ^2 boosts the weight of this field
            ]
        }
    }
}

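For more parameters (such as type and tie_breaker), see the official documentation: Multi Match Query | Elasticsearch Guide [6.8] | Elastic

5. Query string search (query_string query)

query_string allows AND | OR | NOT conditions to be specified within a single query string and, like the multi_match query, supports searching multiple fields.

Note that query_string is intended for text fields. Against a keyword field each term is compared with the whole stored value, so queries like the ones here return empty results. On text fields the order of the terms does not matter and the terms do not need to be adjacent; for example, setting "query" to "26 AND 344" or to "344 AND 26" produces the same results.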

GET wheater_infos_2021/_search?pretty&track_total_hits=true
{
   "query": {
        "query_string": {
            "query": "26 AND 344", // query string; supports AND | OR | NOT conditions
            "fields": [
                "today_weather.temperature"
            ]
        }
    }
}


GET wheater_infos_2021/_search?pretty&track_total_hits=true
{
   "query": {
        "query_string": {
            "query": "26 AND 344", // query string; supports AND | OR | NOT conditions
            "fields": [
                "today_weather.temperature",
                "today_weather.temperature_*",
                "today_weather.pm_value^2" // ^2 boosts the weight of this field
            ]
        }
    }
}

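For more parameters, see the official documentation: Query String Query | Elasticsearch Guide [6.8] | Elastic

6. Simple query string search (simple_query_string query)

Unlike query_string, simple_query_string never throws an exception on invalid syntax; it simply discards the invalid parts of the query. It does not accept AND, OR, and NOT as connectors and supports the following operators instead:

+ means AND, equivalent to AND in query_string;
| means OR, equivalent to OR in query_string (the default);
- means negation, equivalent to NOT in query_string;
"" wraps the search terms to run them as a match_phrase query;
* at the end of a term means a prefix query;
( and ) indicate precedence.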

GET wheater_infos_2021/_search?pretty&track_total_hits=true
{
    "query": {
        "simple_query_string": {
            "query": "26 + 34", // query string; + is equivalent to AND in query_string
            "fields": [
                "today_weather.temperature",
                "today_weather.temperature_*",
                "today_weather.pm_value^2" // ^2 boosts the weight of this field
            ]
        }
    }
}

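For more parameters, see the official documentation: Simple Query String Query | Elasticsearch Guide [6.8] | Elastic

Finally

As ES versions keep evolving, the set of term-level queries and full-text queries changes as well. For example, compared with 6.8, the term-level queries in the latest 8.x releases no longer include the type query (the terms_set query, which this article does not cover, has been available since 6.x). On the full-text side, intervals query, match_bool_prefix query, and combined_fields query, among others, have been added since 6.8. So in day-to-day work, make sure to study and use these queries according to the ES version you are actually running, together with the official documentation.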