_count and the hits.total.value of an unfiltered _search query do not match

Original article, last modified 2024-10-27 22:22:36 · 582 views

Licensed under CC 4.0 BY-SA

Tags: #elasticsearch

First published 2024-10-27 22:22:03

Today I ran into an odd behavior that I had never noticed before, so I am writing it down here:

Request:
POST doc_1/_search
{
  "query": {
    "match_all": {}
  }
}

Response:
{
  ......,
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    ......
  },
  ......
}

Request:
POST doc_1/_count

Response:
{
  "count" : 1700000,
  ......
}

In Elasticsearch, the `_count` API and the `_search` API behave differently:

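What the _count API returns:
- POST doc_1/_count returns count, the actual number of documents in the index that match the query. Here count is 1,700,000, so the doc_1 index really contains 1,700,000 documents.
What the _search API returns (hits.total.value):
- POST doc_1/_search {"query": {"match_all": {}}} runs with the default from and size (0 and 10), which only control which hits are returned. The cap on the total comes from track_total_hits: when it is not set explicitly it defaults to 10,000, so _search only counts hits accurately up to 10,000 in order to save work.
- In that case Elasticsearch returns a hits.total.value of 10,000 together with relation: "gte", meaning the total number of matching documents is greater than or equal to 10,000.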
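
To make the meaning of relation concrete, here is a minimal Python sketch that turns the hits.total object of a _search response into a sentence. The function name describe_total is my own invention for illustration and is not part of any Elasticsearch client. The sketch also assumes that hits.total is missing from the response when the request sets track_total_hits to false.

```python
from typing import Optional


def describe_total(total: Optional[dict]) -> str:
    """Turn the hits.total object of a _search response into a sentence."""
    if total is None:
        # Assumption: hits.total is omitted when track_total_hits is false.
        return "total not tracked"
    if total["relation"] == "eq":
        # "eq" means the value is the exact number of matches.
        return f"exactly {total['value']} matching documents"
    # "gte" means the value is only a lower bound on the number of matches.
    return f"at least {total['value']} matching documents"


# The hits.total object from the _search response shown earlier.
print(describe_total({"value": 10000, "relation": "gte"}))
```

For the response in this post it reports a lower bound, which is why the number cannot be compared directly with the count returned by _count.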

Solution

To get the actual document total, add the track_total_hits parameter to the search request, for example set to true:

POST doc_1/_search
{
  "query": {
    "match_all": {}
  },
  "track_total_hits": true
}
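
The same request can be wrapped in a small helper when the total is needed from application code. The following is only a sketch under stated assumptions: exact_total and the post callable are hypothetical names, where post stands for whatever function sends a POST with a JSON body to the cluster and returns the parsed response. fake_post is a canned stand-in so the example runs offline, and the number it returns is simply the count from this post. The size of 0 is my addition, since only the total is wanted here.

```python
from typing import Callable


def exact_total(post: Callable[[str, dict], dict], index: str) -> int:
    """Return the exact number of documents in `index` via _search.

    `post` is a hypothetical helper: it sends a POST request with a JSON
    body to the given path and returns the parsed JSON response.
    """
    body = {
        "query": {"match_all": {}},
        "track_total_hits": True,  # count every match instead of stopping at 10,000
        "size": 0,  # only the total is needed, so return no documents
    }
    total = post(f"/{index}/_search", body)["hits"]["total"]
    if total["relation"] != "eq":
        raise ValueError(f"total is only a lower bound: {total}")
    return total["value"]


def fake_post(path: str, body: dict) -> dict:
    """Canned stand-in for a real cluster so the sketch runs offline."""
    assert body["track_total_hits"] is True
    return {"hits": {"total": {"value": 1700000, "relation": "eq"}, "hits": []}}


print(exact_total(fake_post, "doc_1"))
```

Checking relation before trusting value guards against the case where the parameter was dropped somewhere along the way and the total silently became a lower bound again.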

Official explanation

track-total-hits

Generally the total hit count can’t be computed accurately without visiting all matches, which is costly for queries that match lots of documents. The track_total_hits parameter allows you to control how the total number of hits should be tracked. Given that it is often enough to have a lower bound of the number of hits, such as “there are at least 10000 hits”, the default is set to 10,000. This means that requests will count the total hit accurately up to 10,000 hits. It is a good trade off to speed up searches if you don’t need the accurate number of hits after a certain threshold.
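
The threshold behavior described in that passage can be modeled in a few lines of Python. This is a simplified model of the documented behavior, not Elasticsearch code: model_total is a hypothetical name, and it assumes that an integer track_total_hits acts as the limit up to which hits are counted exactly, that true removes the limit, and that false drops the total altogether.

```python
from typing import Optional, Union


def model_total(matches: int,
                track_total_hits: Union[bool, int] = 10000) -> Optional[dict]:
    """Model the hits.total object for a query with `matches` real matches."""
    if track_total_hits is False:
        # Assumption: no total is reported when tracking is disabled.
        return None
    if track_total_hits is True or matches <= track_total_hits:
        # Counting reached every match, so the value is exact.
        return {"value": matches, "relation": "eq"}
    # Counting stopped at the threshold, so the value is a lower bound.
    return {"value": track_total_hits, "relation": "gte"}


print(model_total(1700000))        # default threshold of 10,000
print(model_total(1700000, True))  # exact counting
```

With the 1,700,000 documents from this post, the default threshold gives the 10,000 and "gte" pair seen at the top, while true gives the same number that _count reported.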