ES-使用profile做性能分析_es profile-CSDN博客

本文链接：https://blog.csdn.net/wu892355227/article/details/129333304

该文详细解析了ElasticsearchAPI用于分析组件执行时间的返回结果，包括查询、重写时间、收集器、聚合和fetch阶段的耗时。它强调此API不测量网络延迟和队列等待时间。文中通过实例展示了查询的分解结构，如BooleanQuery和TermQuery的耗时，以及Lucene的内部操作如next_doc和score。同时，文章还涵盖了聚合和fetch阶段的性能详情，包括收集器的工作原理和不同类型的聚合操作。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

描述

用来查看各个组件执行时间的详细信息，但是注意，这个api不会用来测量网络延迟，请求在队列中的等待时间，以及协调节点合并各个分片响应时所花费的时间。

返回结果说明

总体的返回结构

{
   "profile": {
        "shards": [
           {
              "id": "[2aE02wS1R8q_QFnYu6vDVQ][my-index-000001][0]",   (1)
              "searches": [
                 {
                    "query": [...],                                   (2)
                    "rewrite_time": 51443,                                                  (3)
                    "collector": [...]                                                      (4)
                 }
              ],
              "aggregations": [...],                                                        (5)
              "fetch": {...}                                                                (6)
           }
        ]
     }
}

(1) 当前参与响应的分片，id是唯一的，格式是由节点名 + 索引名 + 分片组成

(2) query部分的耗时和其他的debug信息

(3) 累计重写的时间

(4) 每个收集器的名称和调用时间

(5) 聚合部分的消耗时间，调用次数，debug信息

(6) fetch部分的耗时和debug信息

注意：可以通过添加参数 ?human=true 使结果可读，默认的时间都是纳秒。比如像这样

query部分

"query": [
    {
       "type": "BooleanQuery",
       "description": "message:get message:search",
       "time_in_nanos": "11972972",
       "breakdown": {...},               
       "children": [
          {
             "type": "TermQuery",
             "description": "message:get",
             "time_in_nanos": "3801935",
             "breakdown": {...}
          },
          {
             "type": "TermQuery",
             "description": "message:search",
             "time_in_nanos": "205654",
             "breakdown": {...}
          }
       ]
    }
]

分析：从query的返回结构来看，这个查询被转换成了Lucene底层的带有两个termQuery的BooleanQuery。type是Lucene底层匹配到的类名，它和elasticsearch的别名是对应的。description描述的Lucene的解析文本，用来区分查询的各个部分。time_in_nanos 是BooleanQuery查询的总耗时，这里约为11毫秒，包含两个子查询的。breakdown部分给出的是更详细的时间花费信息。

breakdown

breakdown包含的信息如下

"breakdown": {
  "set_min_competitive_score_count": 0,
  "match_count": 5,
  "shallow_advance_count": 0,
  "set_min_competitive_score": 0,
  "next_doc": 39022,
  "match": 4456,
  "next_doc_count": 5,
  "score_count": 5,
  "compute_max_score_count": 0,
  "compute_max_score": 0,
  "advance": 84525,
  "advance_count": 1,
  "score": 37779,
  "build_scorer_count": 2,
  "create_weight": 4694895,
  "shallow_advance": 0,
  "create_weight_count": 1,
  "build_scorer": 7112295
}

字段信息：

create_weight：生成查询权重所消耗的时间，lucene要求为每个查询要生成一个weight对象，这个用来保存查询相关联的一些上下文信息。

build_scorer：生成打分器消耗的时间，打分器是为每个文档的匹配程度打分的机制。如果有启用缓存机制，那也可能包含连带的缓存时间。

next_doc：确定那个文档是下一个匹配文档所需要的时间

advance：advance 是低版本的next_doc，区别是需要额外的查询来确定匹配的下一个文档，比如识别，移动跳过之前的文档等。

match：某些查询，采用的是两阶段来确定匹配的文档。比如说短语查询，首先会先粗略的检查文档的存在，之后再此基础上会进行第二次更加细致的检查，像检查短语的顺序，距离等。这个记录的就是这种查询的消耗。如果没有这样两阶段的查询，那这个值就是0。

score：打分器打分所消耗的时间。

*_count：记录的是调用对应方法的次数，比如next_doc_count，意思是调用nextDoc()这个方法的次数，它的作用是用来帮助判断lucene是怎么选择查询的。

collectors

collector记录的是一些高级执行的详细信息，返回的结构是这个样子

"collector" : [
              {
                "name" : "CancellableCollector",
                "reason" : "search_cancelled",
                "time" : "54micros",
                "time_in_nanos" : 54091,
                "children" : [
                  {
                    "name" : "SimpleTopScoreDocCollector",
                    "reason" : "search_top_hits",
                    "time" : "28.8micros",
                    "time_in_nanos" : 28880
                  }
                ]
              }
            ]

这里看到了一个CancellableCollector包裹了一个SimpleTopScoreDocCollector，SimpleTopScoreDocCollector的作用默认是用来打分和排序的。reason是针对这个类的英文描述，time_in_nanos是总消耗的时间。要注意的是collector的时间是独立的，是单独于query的。下面是各个reason的说明

search_sorted：为文档打分和排序的收集器，这是最常见的。

search_count：只统计匹配文档数量的收集器，不会去拉取文档的数据，当size=0的时候会看到它。

search_terminated_after_count：在匹配到n个文档后停止搜索的收集器，在查询是指定terminated_after_count（实际测试没有发现这个参数的使用场景）参数后可以看到它。

search_min_score：仅返回得分大于n的收集器，在设置参数min_score是可以看到。

search_muilt：一个收集器包含了多个收集器，在一个查询中包含了，搜索，聚合，全局的聚合，post_filter 的时候会出现这种情况。

search_timeout：指定超时后停止执行的收集器，在设置timeout后会看到它。

aggregation：用来收集聚合部分文档的收集器，在name上看到的会是一个聚合的列表，就像这样

global_aggregation：针对全局查询范围聚合的收集器。

rewrite

rewrite指的是lucene查询或者子查询被优化改变的过程，比如BoolQuery->BoolQuery->TermQuery会被优化为TermQuery。rewrite_time记录的是总的rewrite时间。

注意：对于MultiTermQuery 类的查询，这类查询通常包括通配符查询，模糊查询，正则查询等。这一类的查询有与lucene的运行机制，通常不会返给你一个结构化的结果，这种如果不好分析，直接关注总的消耗时间就是了。

aggregation部分

返回的结构

{
  "profile": {
    "shards": [
      {
        "aggregations": [
          {
            "type": "NumericTermsAggregator",
            "description": "my_scoped_agg",
            "time_in_nanos": 79294,
            "breakdown": {
              "reduce": 0,
              "build_aggregation": 30885,
              "build_aggregation_count": 1,
              "initialize": 2623,
              "initialize_count": 1,
              "reduce_count": 0,
              "collect": 45786,
              "collect_count": 4,
              "build_leaf_collector": 18211,
              "build_leaf_collector_count": 1,
              "post_collection": 929,
              "post_collection_count": 1
            },
            "debug": {
              "total_buckets": 1,
              "result_strategy": "long_terms",
              "built_buckets": 1
            }
          },
          {
            "type": "GlobalAggregator",
            "description": "my_global_agg",
            "time_in_nanos": 104325,
            "breakdown": {
              "reduce": 0,
              "build_aggregation": 22470,
              "build_aggregation_count": 1,
              "initialize": 12454,
              "initialize_count": 1,
              "reduce_count": 0,
              "collect": 69401,
              "collect_count": 4,
              "build_leaf_collector": 8150,
              "build_leaf_collector_count": 1,
              "post_collection": 1584,
              "post_collection_count": 1
            },
            "debug": {
              "built_buckets": 1
            },
            "children": [
              {
                "type": "NumericTermsAggregator",
                "description": "my_level_agg",
                "time_in_nanos": 76876,
                "breakdown": {
                  "reduce": 0,
                  "build_aggregation": 13824,
                  "build_aggregation_count": 1,
                  "initialize": 1441,
                  "initialize_count": 1,
                  "reduce_count": 0,
                  "collect": 61611,
                  "collect_count": 4,
                  "build_leaf_collector": 5564,
                  "build_leaf_collector_count": 1,
                  "post_collection": 471,
                  "post_collection_count": 1
                },
                "debug": {
                  "total_buckets": 1,
                  "result_strategy": "long_terms",
                  "built_buckets": 1
                }
              }
            ]
          }
        ]
      }
    ]
  }
}

这里的type，description，time_in_nanos，breakdown同query部分的作用是一样的，这里不做赘述。不同的是返回了一个debug的信息，这个主要描述的是聚合底层的一些特性。

breakdown

"breakdown": {
  "reduce": 0,
  "build_aggregation": 30885,
  "build_aggregation_count": 1,
  "initialize": 2623,
  "initialize_count": 1,
  "reduce_count": 0,
  "collect": 45786,
  "collect_count": 4,
  "build_leaf_collector": 18211,
  "build_leaf_collector_count": 1
}

这里也是一样的，如build_leaf_collector 表示的是调用底层方法getLeafCollector()消耗的时间，需要注意的是reduce当前是保留的，它总是会返回0。

fetch部分

当一个查询需要召回文档的时候，它就会有fetch的部分，像这样：

GET /my-index-000001/_search?filter_path=profile.shards.fetch
{
  "profile": true,
  "query": {
    "term": {
      "user.id": {
        "value": "elkbee"
      }
    }
  }
}

返回的结构会是这个样子：

{
  "profile": {
    "shards": [
      {
        "fetch": {
          "type": "fetch",
          "description": "",
          "time_in_nanos": 660555,
          "breakdown": {
            "next_reader": 7292,
            "next_reader_count": 1,
            "load_stored_fields": 299325,
            "load_stored_fields_count": 5,
            "load_source": 3863,
            "load_source_count": 5
          },
          "debug": {
            "stored_fields": ["_id", "_routing", "_source"]
          },
          "children": [
            {
              "type": "FetchSourcePhase",
              "description": "",
              "time_in_nanos": 20443,
              "breakdown": {
                "next_reader": 745,
                "next_reader_count": 1,
                "process": 19698,
                "process_count": 5
              },
              "debug": {
                "fast_path": 4
              }
            }
          ]
        }
      }
    ]
  }
}

这里time_in_nanos表示的总的消耗时间，next_reader 作用可以对比next_doc，load_source表示加载source的时间，load_stored_fields 表示加载存储字段的时间。debug列出的是杂项的非计时信息。