01. Search API Overview

夜月行者

Category: search query API. Tag: elasticsearch

Article link: https://blog.csdn.net/u013200380/article/details/109165443

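This article covers key features of the Elasticsearch search API: routing by user to speed up queries, the rules behind adaptive replica selection and how to turn it off, and the search timeout and cancellation mechanisms. It also discusses concurrency and parallelism controls, such as limiting the number of concurrent shard requests per node and using a cluster setting to reject searches that hit too many shards, and it ends by showing how to search multiple indices.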

1. Search API Introduction

Most search APIs are multi-index; the Explain API endpoints are the exception.

1. Routing

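When a search is executed, Elasticsearch picks the "best" copy of the data using the adaptive replica selection formula. You can also control which shards are searched by supplying a routing parameter. For example, when indexing tweets, the routing value can be the user name: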

POST /twitter/_doc?routing=kimchy
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

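This suits the case where documents are normally looked up by user name alone: a search that passes the same routing value is sent only to the relevant shard, which speeds up the query.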

POST /twitter/_search?routing=kimchy
{
    "query": {
        "bool" : {
            "must" : {
                "query_string" : {
                    "query" : "some query string here"
                }
            },
            "filter" : {
                "term" : { "user" : "kimchy" }
            }
        }
    }
}

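The routing parameter can hold several values, written as a comma-separated string. The search then hits the shards that those routing values map to.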
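To illustrate why a routing value narrows a search, here is a minimal Python sketch of the shard lookup. In simplified form, Elasticsearch picks the shard as hash(routing) modulo the number of primary shards. The sketch uses `zlib.crc32` as a stand-in for the hash Elasticsearch applies internally, so the shard numbers it produces will not match a real cluster. The helper names `shard_for` and `shards_for` and the shard count of 5 are assumptions made for illustration.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """Map one routing value to a shard number.

    Simplified rule: hash(routing) % num_primary_shards.
    crc32 is only a stand-in hash, so real shard numbers will differ.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def shards_for(routing_param, num_primary_shards):
    """Resolve a comma-separated routing parameter to the set of shards searched."""
    values = [v for v in routing_param.split(",") if v]
    return {shard_for(v, num_primary_shards) for v in values}


if __name__ == "__main__":
    # The same routing value always lands on the same shard,
    # at index time and at search time.
    print(shard_for("kimchy", 5))
    # Several routing values resolve to at most one shard per value.
    print(shards_for("kimchy,elasticsearch", 5))
```

Because the mapping is deterministic, a document indexed with `routing=kimchy` and a search sent with `routing=kimchy` meet on the same shard, and the remaining shards are never queried.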

2. How Elasticsearch selects a replica

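By default Elasticsearch uses adaptive replica selection. The coordinating node decides which target node's shard copy receives the request based on the following factors:

The response time of past requests between the coordinating node and the target node
The time past search requests took to execute on the target node (excluding the time spent passing the request between the coordinating node and the target node)
The queue size of the search thread pool on the target node

This behavior can be turned off as follows: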

PUT /_cluster/settings
{
    "transient": {
        "cluster.routing.use_adaptive_replica_selection": false
    }
}

Once adaptive replica selection is turned off, searches are sent to the shards of the index or indices in round-robin fashion across all copies of the data (primaries and replicas).
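The round-robin fallback can be pictured with a small Python sketch. It is a deliberate simplification and not Elasticsearch's internal code: the copy names are hypothetical, and `round_robin` is a helper invented for this example.

```python
from itertools import cycle


def round_robin(copies):
    """Return an iterator that hands out shard copies in turn, wrapping around forever."""
    return cycle(copies)


if __name__ == "__main__":
    # Hypothetical copies of one shard: the primary and two replicas.
    picker = round_robin(["primary", "replica-1", "replica-2"])
    for _ in range(4):
        # Prints primary, replica-1, replica-2, then primary again.
        print(next(picker))
```

Every copy gets the same share of requests regardless of how loaded its node is, which is the main difference from adaptive replica selection.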

3. Stats Groups

A search can be associated with stats groups, each of which maintains its own statistics aggregation. The statistics can later be retrieved through the indices stats API. For example, the following search request body associates the request with two different groups:

POST /_search
{
    "query" : {
        "match_all" : {}
    },
    "stats" : ["group1", "group2"]
}
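To read the per-group statistics back, the indices stats API accepts a `groups` query parameter. The sketch below only builds the request line and sends nothing. `stats_request` is a hypothetical helper, and the group names are the ones from the example above.

```python
from urllib.parse import urlencode


def stats_request(groups, index=None):
    """Build (method, path) for an indices stats call limited to the given search stats groups."""
    prefix = f"/{index}" if index else ""
    # safe="," keeps the comma-separated group list unescaped in the query string.
    query = urlencode({"groups": ",".join(groups)}, safe=",")
    return "GET", f"{prefix}/_stats/search?{query}"


if __name__ == "__main__":
    print(stats_request(["group1", "group2"]))
    print(stats_request(["group1"], index="twitter"))
```

The resulting method and path can be sent with any HTTP client or pasted into the console.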

3. Global Search Timeout

An individual search can set a timeout in the request body. Because search requests can originate from many sources, Elasticsearch also has a dynamic cluster-level setting for a global search timeout, which applies to every search request that does not set a timeout in its request body. Such requests are cancelled after the specified time using the mechanism described in the Search Cancellation section below, so the same caveats about timeout responsiveness apply.

The setting key is search.default_search_timeout, and it can be changed with the Cluster Update Settings API. By default there is no global timeout. Setting the value to -1 resets the global search timeout to no timeout.
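As a sketch of what such an update could look like, the Python helper below builds the request for the Cluster Update Settings API without sending it. The helper name `default_search_timeout_request` and the `30s` value are assumptions for illustration. `transient` is used only to mirror the earlier cluster settings example.

```python
import json


def default_search_timeout_request(timeout):
    """Build (method, path, body) that sets search.default_search_timeout."""
    body = {"transient": {"search.default_search_timeout": timeout}}
    return "PUT", "/_cluster/settings", json.dumps(body)


if __name__ == "__main__":
    # Apply a 30 second global timeout (example value).
    print(default_search_timeout_request("30s"))
    # Reset to no global timeout.
    print(default_search_timeout_request("-1"))
```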

4. Search Cancellation

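Searches can be cancelled using the standard task cancellation mechanism. By default, a running search checks whether it has been cancelled only at segment boundaries, so the smallest unit of checking is a segment and cancellation can be delayed by a large segment. Responsiveness can be improved by setting the dynamic cluster-level setting search.low_level_cancellation to true. The more frequent cancellation checks add overhead, however, which can be noticeable on large, fast-running search queries.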
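The task cancellation flow has two steps: list the running search tasks, then cancel one by its task id. The Python sketch below builds both request lines against the task management API without sending them. The helper names are hypothetical, and the task id in the usage lines is a made-up placeholder in the `node_id:task_number` format.

```python
def list_search_tasks_request():
    """Build (method, path) that lists running search tasks with details."""
    return "GET", "/_tasks?actions=*search&detailed"


def cancel_task_request(task_id):
    """Build (method, path) that cancels the task with the given id."""
    return "POST", f"/_tasks/{task_id}/_cancel"


if __name__ == "__main__":
    print(list_search_tasks_request())
    # Placeholder task id; a real one comes from the task list response.
    print(cancel_task_request("oTUltX4IQMOUUVeiohTt8A:12345"))
```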

5. Search concurrency and parallelism

By default, Elasticsearch does not reject any search request based on the number of shards it hits. Although Elasticsearch optimizes search execution on the coordinating node, a large number of shards can have a significant impact on CPU and memory. It is usually better to organize data into fewer, larger shards. To configure a soft limit, update the action.search.shard_count.limit cluster setting so that search requests hitting too many shards are rejected.

The request parameter max_concurrent_shard_requests controls the maximum number of concurrent shard requests the search API will execute per node for that request. Use it to keep a single request from overloading the cluster (for example, a default request hits all indices in the cluster, which can cause shard request rejections if the number of shards per node is high). The default value is 5.
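Both controls can be sketched as request builders in Python. Nothing is sent to a cluster. The helper names and the numbers 3 and 1000 are assumptions for illustration, and `transient` again mirrors the earlier cluster settings example.

```python
import json
from urllib.parse import urlencode


def search_path(index, max_concurrent_shard_requests):
    """Build a search path that caps concurrent shard requests per node for this request."""
    query = urlencode({"max_concurrent_shard_requests": max_concurrent_shard_requests})
    return f"/{index}/_search?{query}"


def shard_count_limit_request(limit):
    """Build (method, path, body) that sets the soft limit on shards hit by one search."""
    body = {"transient": {"action.search.shard_count.limit": limit}}
    return "PUT", "/_cluster/settings", json.dumps(body)


if __name__ == "__main__":
    print(search_path("twitter", 3))
    print(shard_count_limit_request(1000))
```

The first helper affects a single request, while the second changes a cluster-wide setting that applies to every search.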

6. Searching multiple indices with the search API

GET /twitter/_search?q=user:kimchy
GET /kimchy,elasticsearch/_search?q=tag:wow
GET /_all/_search?q=tag:wow
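The three requests above differ only in the index part of the path: one index, a comma-separated list of indices, or `_all`. A small Python sketch of that pattern follows. `multi_index_search_path` is a hypothetical helper, and falling back to `_all` for an empty list is a choice made for this example.

```python
from urllib.parse import urlencode


def multi_index_search_path(indices, q):
    """Build a URI search path over the given indices, or over _all when none are given."""
    target = ",".join(indices) if indices else "_all"
    # safe=":" keeps the field:value separator readable, as in the requests above.
    return f"/{target}/_search?{urlencode({'q': q}, safe=':')}"


if __name__ == "__main__":
    print(multi_index_search_path(["twitter"], "user:kimchy"))
    print(multi_index_search_path(["kimchy", "elasticsearch"], "tag:wow"))
    print(multi_index_search_path([], "tag:wow"))
```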
