Elasticsearch: Two Identical Searches Return Different Results

L.ZZ

Last modified 2023-02-23 17:25:44

First published 2020-11-20 16:47:49

Article link: https://blog.csdn.net/lijingjingchn/article/details/109853763

1. Cause

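The main cause is the presence of replicas. A primary shard and its replica shards can be out of step with each other, so the score computed for a document on the primary can differ from the score computed on a replica, and the same query can return different results depending on which copy serves it.
How do the primary and the replica end up different? One possible cause is that a user deleted some documents, after which the primary shard ran a merge while the replica shard did not. Deleted documents are only marked as deleted and still count toward the shard's statistics until a merge removes them, so in this situation the total document count on the primary and on the replica differs, the IDF value computed during scoring differs, and the final scores differ.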
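
To make the effect of the document count concrete, here is a minimal sketch that evaluates a BM25-style IDF term, ln(1 + (N - n + 0.5) / (n + 0.5)), for two copies of the same shard. It assumes the default BM25 similarity and that none of the deleted documents contained the term. The function name idf and the counts (1000 versus 900 documents, 10 of them containing the term) are made up for illustration.

```shell
# Illustrative only: BM25-style IDF for a term, given
#   $1 = N, total documents counted on this shard copy
#   $2 = n, documents on this copy that contain the term
idf() {
  awk -v N="$1" -v n="$2" 'BEGIN { printf "%.2f\n", log(1 + (N - n + 0.5) / (n + 0.5)) }'
}

# Hypothetical replica that has not merged yet: 100 deleted documents are still counted.
idf 1000 10

# Hypothetical primary after the merge: the deleted documents are gone.
idf 900 10
```

The two calls print 4.56 and 4.45, so the same term in the same document gets a different weight depending on which copy answers the query.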

2. Solution

The fix is to set the preference parameter on the query, which ensures that the same query is always routed to the same shard copies.

2.1 Parameters in older versions

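preference can be set to _primary or _replica (these are old-version values, removed in Elasticsearch 7.0), or to another custom value, so that the same query is always sent to the same shard copies.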

2.2 Parameters in newer versions

For versions after ES 7.4.x, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html

_only_local
Run the search only on shards on the local node.

_local
If possible, run the search on shards on the local node. If not, select shards using the default method.

_only_nodes:<node-id>,<node-id>
Run the search on only the specified node IDs. If suitable shards exist on more than one selected node, use shards on those nodes using the default method. If none of the specified nodes are available, select shards from any available node using the default method.

_prefer_nodes:<node-id>,<node-id>
If possible, run the search on the specified node IDs. If not, select shards using the default method.

_shards:<shard>,<shard>
Run the search only on the specified shards. This value can be combined with other preference values, but this value must come first. For example: _shards:2,3|_local

<custom-string>
Any string that does not start with _. If the cluster state and selected shards do not change, searches using the same value are routed to the same shards in the same order.

Examples:

# Return the indices and shards that a search request would be executed against.
# With Kerberos authentication enabled
curl --negotiate -u : -XGET "http://ip:9200/index_name/_search_shards?pretty"
# Without Kerberos authentication
curl -XGET "http://ip:9200/index_name/_search_shards?pretty"

# Use _prefer_nodes (with Kerberos authentication enabled)
curl --negotiate -u : -XGET "http://ip:9200/index_name/_search?pretty=true&preference=_prefer_nodes:xxxx" -H 'Content-Type: application/json' -d '{"explain":true,"query":{"match_all":{}}}'

# Use _prefer_nodes (without Kerberos authentication)
curl -XGET "http://ip:9200/index_name/_search?pretty=true&preference=_prefer_nodes:xxxx" -H 'Content-Type: application/json' -d '{"explain":true,"query":{"match_all":{}}}'
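
A custom string is often the simplest option, for example one value per user or session, so that a given user keeps hitting the same shard copies. The sketch below only builds the request URL and leaves the actual curl call commented out. The helper name search_url and the value session_abc123 are hypothetical, and ip:9200 and index_name are the same placeholders used in the examples above.

```shell
# Hypothetical helper: build a _search URL that carries a custom preference string.
#   $1 = host:port, $2 = index name, $3 = preference value (must not start with "_")
search_url() {
  case "$3" in
    _*) echo "custom preference must not start with _" >&2; return 1 ;;
  esac
  printf 'http://%s/%s/_search?pretty=true&preference=%s\n' "$1" "$2" "$3"
}

url=$(search_url "ip:9200" "index_name" "session_abc123")
echo "$url"

# Send the query once the placeholders are replaced with real values:
# curl -XGET "$url" -H 'Content-Type: application/json' -d '{"query":{"match_all":{}}}'
```

Reusing the same string for repeated queries should keep them on the same shard copies as long as the cluster state does not change. The helper rejects values starting with _ because those are reserved for the built-in preference values listed above.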
