Part 9: Elasticsearch Operations at the Cluster Level

coyote_xujie

Article link: https://blog.csdn.net/Wolf_xujie/article/details/131056811

1. Check cluster health

Gives a quick view of the cluster's health status and of how shards are currently being allocated and relocated.

GET _cluster/health

-------------------------Respond-----------------------------
{
  "cluster_name" : "test-jie",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 6,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 8,
  "active_shards" : 16,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

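Parameter description:

Response
- cluster_name: name of the cluster.
- status: health status of the cluster (green, yellow, or red).
- timed_out: if false, the response was returned within the period set by the timeout parameter (30s by default).
- number_of_nodes: number of nodes currently in the cluster.
- number_of_data_nodes: number of data nodes currently in the cluster.
- active_primary_shards: total number of active primary shards.
- active_shards: total number of active primary and replica shards.
- relocating_shards: number of shards currently being relocated.
- initializing_shards: number of shards currently initializing.
- unassigned_shards: number of shards that have not yet been allocated.
- delayed_unassigned_shards: number of shards whose allocation is delayed by timeout settings.
- number_of_pending_tasks: number of cluster-level tasks waiting to be executed (the task backlog).
- number_of_in_flight_fetch: number of unfinished fetches. "Fetch" here refers to the requests the master sends to data nodes to collect shard information while it makes allocation decisions.
- task_max_waiting_in_queue_millis: how long, in milliseconds, the earliest queued task has been waiting to be executed.
- active_shards_percent_as_number: percentage of active shards in the cluster, which can also be read as the progress of a cluster recovery.

Note: A large number_of_pending_tasks suggests that the master node is struggling to keep up with cluster tasks and is under heavy load.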
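As a rough illustration of how these fields might be checked in a script, the sketch below turns a health response into a list of warnings. The helper name summarize_health and the max_pending_tasks threshold are assumptions made for this example, not Elasticsearch defaults.

```python
# Hypothetical helper: the threshold below is an assumption, not an Elasticsearch default.
def summarize_health(health, max_pending_tasks=50):
    """Return human-readable warnings for a parsed GET _cluster/health response."""
    warnings = []
    if health["status"] != "green":
        warnings.append(f"status is {health['status']}")
    if health["unassigned_shards"] > 0:
        warnings.append(f"{health['unassigned_shards']} unassigned shard(s)")
    if health["number_of_pending_tasks"] > max_pending_tasks:
        warnings.append(f"{health['number_of_pending_tasks']} pending tasks on the master")
    return warnings


# Fields copied from the sample response above.
sample_health = {
    "status": "green",
    "unassigned_shards": 0,
    "number_of_pending_tasks": 0,
}

# A degraded cluster, invented for illustration.
degraded_health = {
    "status": "yellow",
    "unassigned_shards": 2,
    "number_of_pending_tasks": 120,
}

print(summarize_health(sample_health))
print(summarize_health(degraded_health))
```

The sample response from this section produces no warnings, while the invented degraded response produces one warning per failed check.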

2. View pending task details

GET /_cat/pending_tasks

-------------------------Respond-----------------------------
insertOrder timeInQueue priority source
       1685       855ms HIGH     update-mapping [foo][t]
       1686       843ms HIGH     update-mapping [foo][t]
       1693       753ms HIGH     refresh-mapping [foo][[t]]
       1688       816ms HIGH     update-mapping [foo][t]
       1689       802ms HIGH     update-mapping [foo][t]
       1690       787ms HIGH     update-mapping [foo][t]
       1691       773ms HIGH     update-mapping [foo][t]

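Parameter description:

Response
- insertOrder: order in which the task entered the queue.
- timeInQueue: how long the task has been waiting in the queue.
- priority: priority of the task. From highest to lowest: IMMEDIATE > URGENT > HIGH > NORMAL > LOW > LANGUID.
- source: origin of the task.
Request
- <format>: short version of the HTTP Accept header. Valid values include json, yaml, and so on.
- <h>: comma-separated list of column names to display.
- <help>: if true, the response includes help information. Defaults to false.
- <local>: if true, the request retrieves information from the local node only. Defaults to false, meaning the information is retrieved from the master node.
- <master_timeout>: time to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error. Defaults to 30s.
- <s>: comma-separated list of column names or column aliases used to sort the response, for example _cat/pending_tasks?s=timeInQueue:desc,insertOrder:desc.
- <time>: unit used to display time values.
- <v>: if true, the response includes column headings. Defaults to false.

Note: GET _cluster/health reports the size of the current backlog through "number_of_pending_tasks", while GET /_cat/pending_tasks shows exactly which tasks are waiting in the queue. Creating an index normally has priority URGENT and updating a mapping has priority HIGH. If mappings are updated frequently while data is being written under heavy load, pending tasks pile up quickly and put considerable pressure on the master. GET /_cluster/pending_tasks returns similar information to GET /_cat/pending_tasks: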
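To see which kind of task dominates the backlog, the text output of GET /_cat/pending_tasks can be grouped by priority and task type. The following is a minimal sketch; the function name count_pending_by_type is hypothetical, and it assumes the four-column layout shown above.

```python
from collections import Counter


def count_pending_by_type(cat_output):
    """Count queued tasks by (priority, task type) from _cat/pending_tasks text output."""
    counts = Counter()
    for line in cat_output.strip().splitlines():
        # Columns: insertOrder, timeInQueue, priority, source (source may contain spaces).
        parts = line.split(None, 3)
        if len(parts) < 4 or not parts[0].isdigit():
            continue  # skip the header row and malformed lines
        priority, source = parts[2], parts[3]
        task_type = source.split(" ", 1)[0]  # e.g. "update-mapping"
        counts[(priority, task_type)] += 1
    return counts


sample_cat_output = """
insertOrder timeInQueue priority source
       1685       855ms HIGH     update-mapping [foo][t]
       1686       843ms HIGH     update-mapping [foo][t]
       1693       753ms HIGH     refresh-mapping [foo][[t]]
"""

for (priority, task_type), n in count_pending_by_type(sample_cat_output).most_common():
    print(priority, task_type, n)
```

A backlog dominated by update-mapping entries, as in the sample, points at the frequent mapping updates described in the note.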

GET /_cluster/pending_tasks

-------------------------Respond-----------------------------
{
   "tasks": [
      {
         "insert_order": 101,
         "priority": "URGENT",
         "source": "create-index [foo_9], cause [api]",
         "time_in_queue_millis": 86,
         "time_in_queue": "86ms"
      },
      {
         "insert_order": 46,
         "priority": "HIGH",
         "source": "shard-started ([foo_2][1], node[tMTocMvQQgGCkj7QDHl3OA], [P], s[INITIALIZING]), reason [after recovery from shard_store]",
         "time_in_queue_millis": 842,
         "time_in_queue": "842ms"
      }
  ]
}

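Parameter description:

Response
- tasks: list of the pending tasks.
- insert_order: number indicating the order in which the task was inserted into the task queue.
- priority: priority of the task.
- source: description of the task, covering what it does and where it came from.
- time_in_queue_millis: how long the task has been waiting to be executed, in milliseconds.
- time_in_queue: how long the task has been waiting to be executed, in human-readable form (for example "86ms").
Request
- Query parameters <local>: if true, the request retrieves information from the local node only. Defaults to false, meaning the information is retrieved from the master node.
- Query parameters <master_timeout>: time to wait for a response from the master node. If no response is received before the timeout expires, the request fails and returns an error. Defaults to 30s (time units).

3. View cluster state and metadata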

GET /_cluster/state/<metrics>/<target>

GET /_cluster/state/metadata,routing_table

-------------------------Respond-----------------------------
{
  "cluster_name" : "test-jie",
  "cluster_uuid" : "UpaYjtZbQ1-40SXOxFJ1aw",
  "metadata" : {
    "cluster_uuid" : "UpaYjtZbQ1-40SXOxFJ1aw",
    "cluster_uuid_committed" : false,
    "cluster_coordination" : {
      "term" : 1,
      "last_committed_config" : [
        "mPoDzYzoTxqLMJgVPigFpQ",
        "pxz2eQt4TRmmgm1LH-Xy0A",
        "tHuo1RPETSOOJkjtLFSd1w"
      ],
      "last_accepted_config" : [
        "mPoDzYzoTxqLMJgVPigFpQ",
        "pxz2eQt4TRmmgm1LH-Xy0A",
        "tHuo1RPETSOOJkjtLFSd1w"
      ],
      "voting_config_exclusions" : [ ]
    },
    "templates" : { },
    "indices" : { },
    "index-graveyard" : {
      "tombstones" : [ ]
    }
  },
  "routing_table" : {
    "indices" : { }
  }
}

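Parameter description:

Request
- Path parameters <metrics>
  - _all: shows all metrics.
  - blocks: shows the blocks part of the response.
  - master_node: shows the elected master_node part of the response.
  - metadata: shows the metadata part of the response. If a comma-separated list of indices is supplied, the output contains metadata for those indices only.
  - nodes: shows the nodes part of the response.
  - routing_nodes: shows the routing_nodes part of the response. If a comma-separated list of indices is supplied, the output contains routing_nodes for those indices only.
  - routing_table: shows the routing_table part of the response.
  - version: shows the cluster state version.
- Path parameters <target>: comma-separated list of data streams, indices, and index aliases. The * wildcard is supported. To query all data streams and indices in the cluster, omit this parameter or use _all or *.
- Query parameters:
  - allow_no_indices: if true, a wildcard expression that matches no index is allowed. Defaults to true.
  - expand_wildcards: which kinds of indices a wildcard expression may match: open indices, closed indices, or both. Valid values are open, closed, none, and all.
  - flat_settings: if true, settings are returned in flat format. Defaults to false.
  - ignore_unavailable: if true, unavailable indices (closed or missing) are ignored.
  - local: if true, the request retrieves information from the local node only. Defaults to false, meaning the information is retrieved from the master node.
  - master_timeout: time to wait for a response from the master node. If no response is received before the timeout expires, the request fails and returns an error. Defaults to 30s (time units).
  - wait_for_metadata_version: waits until the metadata version is equal to or greater than the specified version.
  - wait_for_timeout: maximum time to wait for wait_for_metadata_version before the request times out.

Note: This API exposes very rich cluster-level metadata, including:

- The set of nodes in the cluster: node names, IPs, tcp/http ports, node attributes, and so on.
- All cluster-level settings.
- Information about the indices in the cluster, including their mappings and settings: index templates, shard routing information, snapshots, and so on.
- The location of every shard in the cluster.

Example 1: Use routing_table to get the detailed routing information of each shard of the myindex index.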

GET /_cluster/state/routing_table/myindex

-------------------------Respond-----------------------------
{
  "cluster_name" : "test-jie",
  "cluster_uuid" : "UpaYjtZbQ1-40SXOxFJ1aw",
  "routing_table" : {
    "indices" : {
      "myindex" : {
        "shards" : {
          "1" : [
            {
              "state" : "STARTED",
              "primary" : false,
              "node" : "Wf2vvHoLR06HjT94Jmty3A",
              "relocating_node" : null,
              "shard" : 1,
              "index" : "myindex",
              "allocation_id" : {
                "id" : "clhDgwqtTzuLY3kn-0QpVg"
              }
            },
            {
              "state" : "STARTED",
              "primary" : true,
              "node" : "eezg7elkSz2m-ldmw9YuUg",
              "relocating_node" : null,
              "shard" : 1,
              "index" : "myindex",
              "allocation_id" : {
                "id" : "x_ukq_dOQEuBRhogRbg3Fw"
              }
            }
          ],
          "0" : [
            {
              "state" : "STARTED",
              "primary" : true,
              "node" : "Wf2vvHoLR06HjT94Jmty3A",
              "relocating_node" : null,
              "shard" : 0,
              "index" : "myindex",
              "allocation_id" : {
                "id" : "uzGTFd7aR1qSqZvhB2fPfA"
              }
            },
            {
              "state" : "STARTED",
              "primary" : false,
              "node" : "eezg7elkSz2m-ldmw9YuUg",
              "relocating_node" : null,
              "shard" : 0,
              "index" : "myindex",
              "allocation_id" : {
                "id" : "1xzc2kB9RA2puUjdS1LYyQ"
              }
            }
          ]
        }
      }
    }
  }
}
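The nested routing_table structure is easier to read once flattened into one row per shard copy. The sketch below assumes the response has already been parsed into a Python dict; the helper name shard_locations is hypothetical and the sample is trimmed to shard 0 of the response above.

```python
def shard_locations(state, index):
    """Flatten routing_table into (shard, role, node, state) rows for one index."""
    shards = state["routing_table"]["indices"][index]["shards"]
    rows = []
    for shard_id, copies in shards.items():
        for copy in copies:
            role = "primary" if copy["primary"] else "replica"
            rows.append((int(shard_id), role, copy["node"], copy["state"]))
    # Sort on shard number and role only: node may be None for unassigned copies.
    return sorted(rows, key=lambda row: (row[0], row[1]))


# Trimmed from the sample response above (shard 0 only).
sample_state = {
    "routing_table": {
        "indices": {
            "myindex": {
                "shards": {
                    "0": [
                        {"state": "STARTED", "primary": True,
                         "node": "Wf2vvHoLR06HjT94Jmty3A", "shard": 0},
                        {"state": "STARTED", "primary": False,
                         "node": "eezg7elkSz2m-ldmw9YuUg", "shard": 0},
                    ]
                }
            }
        }
    }
}

for row in shard_locations(sample_state, "myindex"):
    print(row)
```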

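Example 2: Use routing_nodes to get per-node routing information for the cluster: the indices and shards allocated to each node, their allocation state, and other details.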

GET /_cluster/state/routing_nodes

-------------------------Respond-----------------------------
(Only part of the response is shown here.)
{
  "cluster_name" : "test-jie",
  "cluster_uuid" : "UpaYjtZbQ1-40SXOxFJ1aw",
  "routing_nodes" : {
    "unassigned" : [
      {
        "state" : "UNASSIGNED",
        "primary" : false,
        "node" : null,
        "relocating_node" : null,
        "shard" : 0,
        "index" : "tf_b_trade_index_prod",
        "recovery_source" : {
          "type" : "PEER"
        },
        "unassigned_info" : {
          "reason" : "REPLICA_ADDED",
          "at" : "2023-06-19T10:28:04.271Z",
          "delayed" : false,
          "allocation_status" : "no_attempt"
        }
      }
    ],
    "nodes" : {
      "Wf2vvHoLR06HjT94Jmty3A" : [
        {
          "state" : "STARTED",
          "primary" : false,
          "node" : "Wf2vvHoLR06HjT94Jmty3A",
          "relocating_node" : null,
          "shard" : 2,
          "index" : "test_06_09_002",
          "allocation_id" : {
            "id" : "ihDQXHhoQgS__wM4P99_Yw"
          }
        }
      ],
      "1vxkefLiR7CR5fTrfdYi4Q" : [
        {
          "state" : "STARTED",
          "primary" : false,
          "node" : "1vxkefLiR7CR5fTrfdYi4Q",
          "relocating_node" : null,
          "shard" : 0,
          "index" : "tf_b_trade_index_prod",
          "allocation_id" : {
            "id" : "WT6QioHqSx6c6d22nm17hw"
          }
        }
      ]
    }
  }
}

4. View cluster statistics

GET /_cluster/stats

GET /_cluster/stats/nodes/<node_filter>

-------------------------Respond-----------------------------
(Only the structure of the response is shown.)
{
  "_nodes": {
    "total": 6,
    "successful": 6,
    "failed": 0
  },
  "cluster_name": "test-jie",
  "cluster_uuid": "UpaYjtZbQ1-40SXOxFJ1aw",
  "timestamp": 1687422437426,
  "status": "yellow",
  "indices": {},
  "nodes": {}
}

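Parameter description:

Request
- Path parameters <node_filter>: comma-separated list of node filters that selects the nodes to include. Defaults to all nodes in the cluster.
- Query parameters <timeout>: time to wait for each node to respond. If a node does not respond before the timeout expires, its statistics are left out of the response and the node is counted in _nodes.failed. By default there is no timeout.

Note: This API returns statistics aggregated at the cluster level, for example the number of index shards, storage size, memory usage, and disk usage, together with monitoring information such as the number of nodes, node roles, attributes, JVM versions, memory usage, and CPU usage. The response is too large to reproduce here; see the official cluster stats API documentation for the full field list.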
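Because nodes that time out are left out of the aggregated numbers, it can be worth checking the _nodes header before trusting the rest of the response. A minimal sketch, with the hypothetical helper name stats_is_complete:

```python
def stats_is_complete(stats):
    """Return True when every node contributed to a GET /_cluster/stats response."""
    nodes = stats["_nodes"]
    return nodes["failed"] == 0 and nodes["successful"] == nodes["total"]


# Header fields copied from the sample response above.
sample_stats = {"_nodes": {"total": 6, "successful": 6, "failed": 0}, "status": "yellow"}

# One node timed out (invented for illustration).
partial_stats = {"_nodes": {"total": 6, "successful": 5, "failed": 1}, "status": "yellow"}

print(stats_is_complete(sample_stats))
print(stats_is_complete(partial_stats))
```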

5. Explain shard allocation

GET _cluster/allocation/explain

-------------------------Respond-----------------------------
{
  "index" : "tf_b_trade_index_prod",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "REPLICA_ADDED",
    "at" : "2023-06-19T10:28:04.271Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "1vxkefLiR7CR5fTrfdYi4Q",
      "node_name" : "test-jie-node-du99d-g8bz.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local",
      "transport_address" : "10.125.129.42:34866",
      "node_attributes" : {
        "ml.machine_memory" : "4294967296",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node matches cluster setting [cluster.routing.allocation.exclude] filters [_name:"test-jie-node-du99d-g8bz.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local"]"""
        },
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[tf_b_trade_index_prod][0], node[1vxkefLiR7CR5fTrfdYi4Q], [R], s[STARTED], a[id=WT6QioHqSx6c6d22nm17hw]]"
        }
      ]
    }
  ]
}

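Parameter description:

Request
- Query parameters
  - <include_disk_info>: if true, returns information about disk usage and shard sizes. Defaults to false.
  - <include_yes_decisions>: if true, YES decisions are also returned in node_allocation_decisions. Defaults to false.
- Request body
  - current_node: node ID or node name; only a shard currently located on that node is explained.
  - index: name of the index.
  - shard: ID of the shard to explain.
  - primary: if true, the explanation is for the primary copy of the given shard ID; if false, for a replica.
Response
- current_state: current state of the shard.
- unassigned_info.reason: why the shard originally became unassigned.
- can_allocate: whether the shard can be allocated.
- node_allocation_decisions.node_decision: whether the shard can be allocated to that particular node.
- node_allocation_decisions.deciders.decider: the decider that produced the no decision for the node.
- node_allocation_decisions.deciders.explanation: the reason behind the decider's no decision.
- configured_delay: the configured delay before a replica is reallocated after the node holding it leaves the cluster.
- remaining_delay: time remaining before the replica is allocated.
- node_allocation_decisions.store: information about the copy of the shard data held on node_allocation_decisions.node_id.
- can_remain_on_current_node: whether the shard may stay on its current node.
- can_remain_decisions: the deciders that prevent the shard from staying on its current node, each explained in can_remain_decisions.explanation.
- can_move_to_other_node: whether the shard may be moved to another node.
- can_rebalance_cluster: whether rebalancing is allowed in the cluster.
- can_rebalance_to_other_node: whether the shard can be rebalanced to another node.
- node_decision: why the shard cannot be rebalanced to a given node.

Note: This API is mainly used to find out why a shard is unassigned, which makes it very helpful when diagnosing an unhealthy cluster. If the cluster is green, that is, every shard has been allocated, calling the API without a request body returns a 400 error because there is no unassigned shard to explain.

Example 1: Explain why a specific shard of a specific index is unassigned.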

GET _cluster/allocation/explain
{
  "index":"tf_b_trade_index_prod",
  "shard": 0,
  "primary": false
}

-------------------------Respond-----------------------------
{
  "index" : "tf_b_trade_index_prod",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "REPLICA_ADDED",
    "at" : "2023-06-19T10:28:04.271Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "1vxkefLiR7CR5fTrfdYi4Q",
      "node_name" : "test-jie-node-du99d-g8bz.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local",
      "transport_address" : "10.125.129.42:34866",
      "node_attributes" : {
        "ml.machine_memory" : "4294967296",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node matches cluster setting [cluster.routing.allocation.exclude] filters [_name:"test-jie-node-du99d-g8bz.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local"]"""
        },
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[tf_b_trade_index_prod][0], node[1vxkefLiR7CR5fTrfdYi4Q], [R], s[STARTED], a[id=WT6QioHqSx6c6d22nm17hw]]"
        }
      ]
    },
    {
      "node_id" : "Wf2vvHoLR06HjT94Jmty3A",
      "node_name" : "test-jie-node-6wzhl-owr3.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local",
      "transport_address" : "10.125.128.45:34866",
      "node_attributes" : {
        "ml.machine_memory" : "4294967296",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[tf_b_trade_index_prod][0], node[Wf2vvHoLR06HjT94Jmty3A], [R], s[STARTED], a[id=_BN9sUDDSwaCXWBF6LLQEg]]"
        }
      ]
    },
    {
      "node_id" : "eezg7elkSz2m-ldmw9YuUg",
      "node_name" : "test-jie-node-48n88-7kn8.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local",
      "transport_address" : "10.125.128.58:34866",
      "node_attributes" : {
        "ml.machine_memory" : "4294967296",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[tf_b_trade_index_prod][0], node[eezg7elkSz2m-ldmw9YuUg], [P], s[STARTED], a[id=l9prWOymQcOufR0rzqhb9g]]"
        }
      ]
    }
  ]
}
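With many nodes the explain response gets long, so it can help to reduce it to the deciders that said NO on each node. The sketch below assumes the response has been parsed into a dict; blocking_deciders is a hypothetical helper and the sample is trimmed from the response above.

```python
def blocking_deciders(explain):
    """Map node_id -> names of the deciders that returned NO for that node."""
    result = {}
    for node in explain.get("node_allocation_decisions", []):
        blockers = [d["decider"] for d in node.get("deciders", []) if d["decision"] == "NO"]
        if blockers:
            result[node["node_id"]] = blockers
    return result


# Trimmed from the sample response above.
sample_explain = {
    "index": "tf_b_trade_index_prod",
    "shard": 0,
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_id": "1vxkefLiR7CR5fTrfdYi4Q",
            "deciders": [
                {"decider": "filter", "decision": "NO"},
                {"decider": "same_shard", "decision": "NO"},
            ],
        },
        {
            "node_id": "Wf2vvHoLR06HjT94Jmty3A",
            "deciders": [{"decider": "same_shard", "decision": "NO"}],
        },
    ],
}

for node_id, deciders in blocking_deciders(sample_explain).items():
    print(node_id, ", ".join(deciders))
```

In the sample, every node already holds a copy of the shard (same_shard), which suggests the index asks for more replicas than there are eligible nodes.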

6. Change shard allocation

POST /_cluster/reroute

-------------------------Respond-----------------------------
{
  "acknowledged" : true,
  "state" : {
    "cluster_uuid" : "UpaYjtZbQ1-40SXOxFJ1aw",
    "routing_table" : {
      "indices" : {
        ".kibana_security_session_1" : {
          "shards" : {
            "0" : [
              {
                "state" : "STARTED",
                "primary" : false,
                "node" : "Wf2vvHoLR06HjT94Jmty3A",
                "relocating_node" : null,
                "shard" : 0,
                "index" : ".kibana_security_session_1",
                "allocation_id" : {
                  "id" : "qoyfmdxwROWTY6SuUkPovQ"
                }
              },
              {
                "state" : "STARTED",
                "primary" : true,
                "node" : "eezg7elkSz2m-ldmw9YuUg",
                "relocating_node" : null,
                "shard" : 0,
                "index" : ".kibana_security_session_1",
                "allocation_id" : {
                  "id" : "yqLDaoYqT8m-qi4xOC15bg"
                }
              }
            ]
          }
        }
      }
    }
  }
}

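Parameter description:

Request
- Query parameters:
  - dry_run: if true, the request only simulates the commands and returns the resulting state.
  - explain: if true, the response explains why each command can or cannot be executed.
  - metric: limits the information returned. Defaults to everything except metadata. Valid options:
    - _all: shows all metrics.
    - blocks: shows the blocks part only.
    - master_node: shows the master_node part only.
    - metadata: shows the metadata part only. If a comma-separated list of indices is supplied, the output contains metadata for those indices only.
    - nodes: shows the nodes part only.
    - routing_table: shows the routing_table part only.
    - version: shows the cluster state version.
  - retry_failed: if true, retries allocation of the shards whose allocation previously failed.
- Request body:
  - commands
    - move: moves a shard from one node to another. index is the index name, shard is the shard number, from_node is the source node, and to_node is the target node.
    - cancel: cancels the allocation (or recovery) of a shard. index is the index name, shard is the shard number, and node is the node on which the allocation is cancelled. Cancelling an existing replica and letting it reinitialize through the standard recovery process is a way to force that replica to resync from the primary. By default only replica allocations can be cancelled; to cancel a primary, add the allow_primary flag.
    - allocate_replica: allocates an unassigned replica shard to a node. index is the index name, shard is the shard number, and node is the node to allocate the shard to. The allocation deciders (rebalancing rules, disk watermarks, and so on) are still taken into account.
    - allocate_stale_primary: allocates a primary shard to a node that holds a stale copy of it. index is the index name, shard is the shard number, node is the node to allocate the shard to, and accept_data_loss: true acknowledges the data loss. Using this command may lose data from the shard. If a node holding a good copy rejoins the cluster later, that good copy is deleted or overwritten by the stale copy. Because the operation is so risky, accept_data_loss: true must be stated explicitly.
    - allocate_empty_primary: allocates an empty primary shard to a node. index is the index name, shard is the shard number, node is the node to allocate the shard to, and accept_data_loss: true acknowledges the data loss. This command completely discards any data the shard previously held. If a node holding a good copy rejoins the cluster later, that copy is deleted as well. Because the operation is so risky, accept_data_loss: true must be stated explicitly.
    - Note: the commands that allocate primary shards should be used with great care, since primary allocation is normally handled fully automatically by Elasticsearch. Reasons why a primary cannot be allocated automatically include:
      - A new index was created, but no node satisfies the allocation deciders.
      - No up-to-date copy of the shard can be found on the current data nodes. To prevent data loss, the system does not automatically promote a stale copy to primary.

Note: This command lets us change the allocation of individual shards by hand: move a shard from one node to another, cancel an allocation, or allocate an unassigned shard to a specific node. Keep in mind that after processing any reroute command Elasticsearch rebalances as usual, as long as the cluster.routing.rebalance.enable setting permits it. For example, if the request moves a shard from node1 to node2, the cluster may move another shard from node2 back to node1 to stay balanced. For this reason both the ?dry_run query parameter and "dry_run": true in the request body make Elasticsearch calculate the cluster state that would result from the commands (including the rebalancing) and return it without actually applying any change.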
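One cautious workflow is to send every reroute as a dry run first and inspect the returned state before repeating it for real. The sketch below only builds the request path and JSON body; the helper names move_command and reroute_request and the node names node1 and node2 are hypothetical, and sending the request is left to whichever HTTP client is in use.

```python
import json


def move_command(index, shard, from_node, to_node):
    """Build one 'move' entry for the commands array of POST /_cluster/reroute."""
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}


def reroute_request(commands, dry_run=True):
    """Return (path, body) for a reroute call; dry_run defaults to True for safety."""
    path = "/_cluster/reroute"
    if dry_run:
        path += "?dry_run=true"
    return path, json.dumps({"commands": commands}, indent=2)


# node1 and node2 are placeholder node names.
path, body = reroute_request([move_command("myindex", 0, "node1", "node2")])
print("POST", path)
print(body)
```

Defaulting dry_run to True is a design choice of this sketch: a real change then requires an explicit dry_run=False.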

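Example 1: move. Move shard 0 of the myindex index from node test-jie-node-48n88-7kn8.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local to node test-jie-node-du99d-g8bz.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local. The same request also cancels the allocation of shard 1 on node test-jie-node-6wzhl-owr3.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local.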

POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "myindex", 
        "shard": 0,
        "from_node": "test-jie-node-48n88-7kn8.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local", 
        "to_node": "test-jie-node-du99d-g8bz.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local"
      }
    },
    {
      "cancel": {
        "index": "myindex", 
        "shard": 1,
        "node": "test-jie-node-6wzhl-owr3.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local"
      }
    }
  ]
}
-------------------------Respond-----------------------------
(Abridged response.)
{
  "acknowledged" : true,
  "state" : {
    "cluster_uuid" : "UpaYjtZbQ1-40SXOxFJ1aw",
    "version" : 1021,
    "state_uuid" : "UbVk9J2iQRiv16FScTh3IQ",
    "master_node" : "pxz2eQt4TRmmgm1LH-Xy0A",
    "blocks" : { },
    "nodes" : { },
    "routing_table" : { },
    "routing_nodes" : { },
    "security_tokens" : { }
  }
}

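After the move:
[Figure: screenshot taken after the move; image not included]

Example 2: allocate_stale_primary. When a primary shard cannot be allocated because the only remaining copy is stale (an older version), allocate_stale_primary forces that stale copy to be allocated as the primary. Some data will very likely be lost in the process. Here shard 0 of the myindex index is reallocated to node test-jie-node-6wzhl-owr3.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local. Note: before running the command, use GET <index>/_shard_stores?pretty&format=json&filter_path=indices.<index>.shards.<shard> to find the node that holds the stale copy of the shard, and only then run allocate_stale_primary against that node. The target must be the node that actually holds the stale copy, not any other node; otherwise the data is lost completely.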

POST /_cluster/reroute
{
    "commands" : [
        {
          "allocate_stale_primary" : {
              "index" : "myindex", 
              "shard" : 0,
              "node" : "test-jie-node-6wzhl-owr3.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local",
              "accept_data_loss": true
          }
        }
    ]
}
-------------------------Respond-----------------------------

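Example 3: allocate_empty_primary. Allocate a primary shard that contains no data to a node. In day-to-day operations it is common to find an index that has no replicas and whose files have been corrupted, so that its primary shard can no longer be allocated. If the customer can accept losing the data of one shard, allocate_empty_primary can allocate an empty primary and bring the cluster back to green. Here shard 0 of the myindex index is allocated to node test-jie-node-48n88-7kn8.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local, and all data in shard 0 of myindex is lost.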

POST _cluster/reroute
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "myindex",
        "shard": 0,
        "node": "test-jie-node-48n88-7kn8.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local",
        "accept_data_loss": true
      }
    }
  ]
}
-------------------------Respond-----------------------------

7. View and update cluster settings

# View cluster settings
GET /_cluster/settings?include_defaults&flat_settings

# Update cluster settings
PUT /_cluster/settings

-------------------------Respond-----------------------------
# Response to the view request
{
  "persistent" : { },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "node_concurrent_recoveries" : "8"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "80mb"
      }
    }
  }
}

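Parameter description:

Request
- Query parameters:
  - flat_settings: if true, settings are returned in flat format. Defaults to false.
  - include_defaults: if true, all default cluster settings are returned as well. Defaults to false.
  - master_timeout: time to wait for a response from the master node. If no response is received before the timeout expires, the request fails and returns an error. Defaults to 30s (time units).
  - timeout: time to wait for a response to the request. If no response is received before the timeout expires, the request fails and returns an error. Defaults to 30s (time units).

Note: This API updates cluster settings. An update can be persistent or transient. Persistent settings survive a full cluster restart; transient settings are lost after a full cluster restart. Either kind can be reset by assigning it the value null. Cluster settings are applied in the following order of precedence:

- transient cluster settings
- persistent cluster settings
- settings in the elasticsearch.yml configuration file

The official documentation recommends using this API for all cluster-wide settings and keeping elasticsearch.yml for local (node-level) configuration only, which guarantees that cluster settings are identical on every node. When different nodes define different values in their configuration files, the discrepancies are hard to notice.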
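The precedence order above can be written down as a simple lookup over the three layers. This is only an illustrative sketch: effective_setting is a hypothetical helper, each layer is assumed to be a flat dict (as returned with flat_settings), and the sample values are invented.

```python
def effective_setting(key, transient, persistent, yml):
    """Return (value, layer name) for the first layer that defines key, highest precedence first."""
    layers = (("transient", transient), ("persistent", persistent), ("elasticsearch.yml", yml))
    for name, layer in layers:
        if key in layer:
            return layer[key], name
    raise KeyError(f"{key} is not set explicitly; the built-in default applies")


# Invented sample values for illustration.
transient = {"indices.recovery.max_bytes_per_sec": "80mb"}
persistent = {
    "indices.recovery.max_bytes_per_sec": "40mb",
    "cluster.routing.allocation.node_concurrent_recoveries": "8",
}
yml = {"cluster.routing.allocation.node_concurrent_recoveries": "2"}

print(effective_setting("indices.recovery.max_bytes_per_sec", transient, persistent, yml))
print(effective_setting("cluster.routing.allocation.node_concurrent_recoveries",
                        transient, persistent, yml))
```

The transient value wins for the first key, and the persistent value wins over elasticsearch.yml for the second.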

Example 1: Set the concurrency of shard recovery and rebalancing and the maximum recovery transfer rate

PUT /_cluster/settings
{
    "persistent" : {
        "cluster.routing.allocation.node_concurrent_recoveries": "8",
        "cluster.routing.allocation.cluster_concurrent_rebalance": "8",
        "indices.recovery.max_bytes_per_sec": "80mb"
    },
    "transient" : {
        "cluster.routing.allocation.node_concurrent_recoveries": "8",
        "cluster.routing.allocation.cluster_concurrent_rebalance": "8",
        "indices.recovery.max_bytes_per_sec": "80mb"
    }
}
-------------------------Respond-----------------------------

Example 2: Set the cluster's disk usage watermarks

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.high":"95%",
    "cluster.routing.allocation.disk.watermark.low":"93%"
  },
  "transient": {
    "cluster.routing.allocation.disk.watermark.high":"95%",
    "cluster.routing.allocation.disk.watermark.low":"93%"
  }
}
-------------------------Respond-----------------------------

Example 3: Move all data off a node by excluding it from allocation

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "test-jie-node-48n88-7kn8.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local"
  },
  "transient": {
    "cluster.routing.allocation.exclude._name": "test-jie-node-48n88-7kn8.374423377486-test-vision-xx-test-es.dcos.xixian.unicom.local"
  }
}
-------------------------Respond-----------------------------

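Example 4: Set the minimum number of master-eligible nodes required to elect a master. Note that discovery.zen.minimum_master_nodes applies to Elasticsearch 6.x and earlier; from 7.0 onwards the setting is ignored.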

PUT _cluster/settings
{
  "persistent": {
    "discovery.zen.minimum_master_nodes": 2
  },
  "transient": {
    "discovery.zen.minimum_master_nodes": 2
  }
}
-------------------------Respond-----------------------------

Example 5: Make the cluster read-only (deleting indices is still allowed)

PUT _cluster/settings
{
   "transient": {
       "cluster.blocks.read_only_allow_delete": true
   }
}
-------------------------Respond-----------------------------

Example 6: Enable X-Pack monitoring data collection for the cluster

PUT _cluster/settings
{
  "persistent": {
    "xpack.monitoring.collection.enabled": true,
    "xpack.monitoring.collection.interval": "10s" // collection interval
  }
}
-------------------------Respond-----------------------------

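Example 7: Control whether destructive operations (such as deleting indices) may use wildcards. Setting action.destructive_requires_name to false, as here, allows wildcards; setting it to true requires explicit index names.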

PUT _cluster/settings
{
  "transient": {
    "action.destructive_requires_name": "false"
  }
}
-------------------------Respond-----------------------------

Example 8: Set the maximum number of shards each node in the cluster may hold

PUT _cluster/settings
{
  "transient": {
    "cluster.max_shards_per_node": 10000
  }
}
-------------------------Respond-----------------------------

Example 9: Set the timeout for dynamic mapping updates

PUT _cluster/settings
{
  "transient": {
    "indices.mapping.dynamic_timeout": "20s"
  }
}
-------------------------Respond-----------------------------

Example 10: Enable automatic index creation

PUT _cluster/settings
{
  "transient": {
    "action.auto_create_index": true
  }
}

-------------------------Respond-----------------------------

8. View cluster task details

GET /_tasks/<task_id>

GET /_tasks
-------------------------Respond-----------------------------
{
  "nodes" : {
    "oTUltX4IQMOUUVeiohTt8A" : {
      "name" : "H5dfFeA",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1:9300",
      "tasks" : {
        "oTUltX4IQMOUUVeiohTt8A:124" : {
          "node" : "oTUltX4IQMOUUVeiohTt8A",
          "id" : 124,
          "type" : "direct",
          "action" : "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis" : 1458585884904,
          "running_time_in_nanos" : 47402,
          "cancellable" : false,
          "parent_task_id" : "oTUltX4IQMOUUVeiohTt8A:123"
        },
        "oTUltX4IQMOUUVeiohTt8A:123" : {
          "node" : "oTUltX4IQMOUUVeiohTt8A",
          "id" : 123,
          "type" : "transport",
          "action" : "cluster:monitor/tasks/lists",
          "start_time_in_millis" : 1458585884904,
          "running_time_in_nanos" : 236042,
          "cancellable" : false
        }
      }
    }
  }
}

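Parameter description:

Request
- Path parameters <task_id>: ID of the task to return (node_id:task_number).
- Query parameters:
  - actions: comma-separated list of actions or wildcard expressions used to limit the actions returned.
  - detailed: if true, the response includes detailed information about shard recoveries. Defaults to false.
  - group_by: key used to group the tasks in the response. Valid values:
    - nodes: node ID
    - parents: parent task ID
    - none: do not group tasks
  - node_id: comma-separated list of node IDs or node names used to limit the information returned.
  - parent_task_id: ID of the parent task to filter on.
  - master_timeout: time to wait for a response from the master node. If no response is received before the timeout expires, the request fails and returns an error. Defaults to 30s (time units).
  - timeout: time to wait for a response to the request. If no response is received before the timeout expires, the request fails and returns an error. Defaults to 30s (time units).
  - wait_for_completion: if true, the request blocks until the operation completes. Defaults to false.

Note: Returns information about the tasks currently running on each node of the cluster.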
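A common use of this response is spotting tasks that have been running for a long time. The sketch below walks the nodes/tasks structure shown above; long_running_tasks is a hypothetical helper, and the sample is trimmed from the response in this section.

```python
def long_running_tasks(response, min_millis):
    """Return (task_id, action) for tasks running at least min_millis milliseconds."""
    threshold_nanos = min_millis * 1_000_000  # running_time_in_nanos is in nanoseconds
    found = []
    for node in response["nodes"].values():
        for task_id, task in node["tasks"].items():
            if task["running_time_in_nanos"] >= threshold_nanos:
                found.append((task_id, task["action"]))
    return found


# Trimmed from the sample response above.
sample_tasks = {
    "nodes": {
        "oTUltX4IQMOUUVeiohTt8A": {
            "tasks": {
                "oTUltX4IQMOUUVeiohTt8A:124": {
                    "action": "cluster:monitor/tasks/lists[n]",
                    "running_time_in_nanos": 47402,
                    "cancellable": False,
                },
                "oTUltX4IQMOUUVeiohTt8A:123": {
                    "action": "cluster:monitor/tasks/lists",
                    "running_time_in_nanos": 236042,
                    "cancellable": False,
                },
            }
        }
    }
}

print(long_running_tasks(sample_tasks, min_millis=0.1))
```

In practice the threshold would be seconds or minutes rather than a fraction of a millisecond; the small value here only makes the short-lived sample tasks show up.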

Example 1: Wait for the task with ID oTUltX4IQMOUUVeiohTt8A:12345 to complete, or give up after waiting 10s.

GET _tasks/oTUltX4IQMOUUVeiohTt8A:12345?wait_for_completion=true&timeout=10s

-------------------------Respond-----------------------------

Example 2: Cancel the task with ID oTUltX4IQMOUUVeiohTt8A:12345

POST _tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel

-------------------------Respond-----------------------------

Example 3: Cancel all reindex tasks running on node1 and node2

POST _tasks/_cancel?nodes=node1,node2&actions=*reindex

-------------------------Respond-----------------------------
