Troubleshooting an ES cluster in yellow status


馥影


Article link: https://blog.csdn.net/Ghost_chou/article/details/109879747


Background:

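The full-text search API in our project was taking more than 30 s to respond. Reviewing the API logic showed that most of that time was spent in the ES query, so we turned our attention to the ES cluster. Running the DSL generated by the API request directly in Kibana confirmed that the query itself was far too slow, so we started checking the health of the cluster.

Analyzing the cluster with ES commands produced the following findings: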

1. The cluster health status was yellow, and a large number of replica shards were unassigned:

{
  "cluster_name" : "cdb*",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : ***,
  "number_of_data_nodes" : ***,
  "active_primary_shards" : ***,
  "active_shards" : ***,
  "relocating_shards" : ***,
  "initializing_shards" : ***,
  "unassigned_shards" : 214, // <- note this value
  "delayed_unassigned_shards" : ***,
  "number_of_pending_tasks" : ***,
  "number_of_in_flight_fetch" : ***,
  "task_max_waiting_in_queue_millis" : ***
}
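The status field encodes a simple rule: green means every primary and replica is active, yellow means every primary is active but some replicas are not, and red means at least one primary is not active. A minimal sketch that applies this rule to a parsed `_cluster/health` response could look like the following; the function name `diagnose_health` is hypothetical, and the sample keeps only the non-redacted values from the output above.

```python
def diagnose_health(health: dict) -> str:
    """Interpret a parsed GET _cluster/health response."""
    status = health.get("status")
    unassigned = health.get("unassigned_shards", 0)
    initializing = health.get("initializing_shards", 0)
    counts = f"({unassigned} unassigned, {initializing} initializing)"
    if status == "green":
        # Every primary and every replica shard is active.
        return "green: all primary and replica shards active"
    if status == "yellow":
        # Primaries are fine, so data is readable, but redundancy is reduced.
        return f"yellow: all primaries active, some replicas not {counts}"
    if status == "red":
        # At least one primary is missing: some data cannot be searched.
        return f"red: at least one primary is not active {counts}"
    return f"unrecognized status: {status!r}"


# Only the values that were not redacted in the output above.
sample = {"cluster_name": "cdb*", "status": "yellow", "unassigned_shards": 214}
print(diagnose_health(sample))
```

For the sample this reports a yellow cluster with 214 unassigned shard copies, all of which must be replicas because every primary is active.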

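2. One node became unreachable for unknown reasons, which triggered shard recovery across the cluster: (1) all lost replica shards were reallocated to the remaining healthy nodes, and (2) the shards were rebalanced. The unassigned shards report NODE_LEFT as the reason: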

{
	"unassigned_info": {
		"reason": "NODE_LEFT",
		"at": "2020-11-20T03:12:16",
		"details": "node_left ***",
		"last_allocation_status": "no_attempt"
	}
}
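The `unassigned_info` above describes a single shard. To check how many of the unassigned shards share that reason, `GET _cat/shards?h=index,shard,prirep,state,unassigned.reason` lists one shard copy per line. A minimal sketch that tallies that text output, assuming the column order given in `h=` and no header row (no `v` parameter). The function name and the sample lines are hypothetical; only the index name `index_execution` comes from this post.

```python
from collections import Counter


def count_unassigned(cat_shards_text: str) -> Counter:
    """Tally unassigned shard copies by (primary/replica, reason).

    Expects the output of
    GET _cat/shards?h=index,shard,prirep,state,unassigned.reason
    """
    counts = Counter()
    for line in cat_shards_text.strip().splitlines():
        parts = line.split()
        # Assigned copies leave the reason column blank, so they have 4 fields.
        if len(parts) < 4 or parts[3] != "UNASSIGNED":
            continue
        kind = "primary" if parts[2] == "p" else "replica"
        reason = parts[4] if len(parts) > 4 else "UNKNOWN"
        counts[(kind, reason)] += 1
    return counts


# Hypothetical output for illustration only.
sample = """
index_execution 2 p STARTED
index_execution 2 r UNASSIGNED NODE_LEFT
index_execution 3 r UNASSIGNED NODE_LEFT
"""
print(count_unassigned(sample))
```

If nearly every unassigned copy is a replica with reason NODE_LEFT, the yellow status is explained by the lost node rather than by, for example, disk watermarks.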

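3. The shard recovery concurrency limits (for both the source and target nodes) were left at their defaults, so the available recovery slots stayed saturated and recovery was very slow: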

(cluster.routing.allocation.node_concurrent_incoming_recoveries=2, cluster.routing.allocation.node_concurrent_outgoing_recoveries=2)
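Both limits are dynamic cluster settings, and `cluster.routing.allocation.node_concurrent_recoveries` sets the incoming and outgoing limits together. The post does not say which value, if any, was applied, so the `4` below is purely hypothetical. This minimal sketch only builds the request body for `PUT _cluster/settings`; the helper name is hypothetical, and passing `None` serializes to JSON `null`, which resets the setting to its default once recovery is done.

```python
import json

VALID_SCOPES = ("persistent", "transient")


def recovery_settings_body(concurrent_recoveries, scope="persistent"):
    """Build a PUT _cluster/settings body for per-node recovery concurrency.

    concurrent_recoveries=None produces null, which resets the setting.
    """
    if scope not in VALID_SCOPES:
        raise ValueError(f"scope must be one of {VALID_SCOPES}")
    if concurrent_recoveries is not None and concurrent_recoveries < 1:
        raise ValueError("concurrent_recoveries must be >= 1 or None")
    key = "cluster.routing.allocation.node_concurrent_recoveries"
    return {scope: {key: concurrent_recoveries}}


# Hypothetical value: allow 4 concurrent recoveries per node.
print("PUT _cluster/settings")
print(json.dumps(recovery_settings_body(4), indent=2))
```

Higher values put more disk and network load on every node that is already serving queries. Recovery bandwidth is capped separately by `indices.recovery.max_bytes_per_sec`.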

Problem details (allocation explain output):
{
	"node_id": "***",
	"node_name": "mastersha",
	"transport_address": "***",
	"node_decision": "throttled",
	"deciders": [{
		"decider": "throttling",
		"decision": "THROTTLE",
		"explanation": "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
	}]
}

{
	"node_id": "***",
	"node_name": "master",
	"transport_address": ***,
	"node_decision": "no",
	"store": {
		"matching_sync_id": true
	},
	"deciders": [{
			"decider": "same_shard",
			"decision": "NO",
			"explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[index_execution][2], node[***], [P], s[STARTED], a[id=***]]"
		},
		{
			"decider": "throttling",
			"decision": "THROTTLE",
			"explanation": "reached the limit of outgoing shard recoveries [2] on the node [***] which holds the primary, cluster setting [cluster.routing.allocation.node_concurrent_outgoing_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
		}
	]
}
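In these two node decisions, `same_shard` says NO because a replica can never be placed on the node that already holds its primary, and `throttling` says THROTTLE because both the incoming and outgoing recovery limits of 2 are already in use. When the explain output lists many nodes, a short script can pull out only the blocking deciders. A minimal sketch, assuming the response of `GET _cluster/allocation/explain` for an unassigned shard, where per-node results sit under `node_allocation_decisions`. The function names are hypothetical, and the sample is a trimmed copy of the output above.

```python
def blocking_deciders(explain: dict) -> list:
    """Return (node_name, decider, decision) for every non-YES decider."""
    report = []
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") != "YES":
                report.append((node.get("node_name", "?"),
                               decider.get("decider"),
                               decider.get("decision")))
    return report


def throttled_nodes(explain: dict) -> list:
    """Names of nodes where allocation is only delayed by throttling."""
    return sorted({name for name, _, decision in blocking_deciders(explain)
                   if decision == "THROTTLE"})


# Trimmed version of the explain output shown above.
sample = {
    "index": "index_execution",
    "shard": 2,
    "primary": False,
    "node_allocation_decisions": [
        {"node_name": "mastersha", "node_decision": "throttled",
         "deciders": [{"decider": "throttling", "decision": "THROTTLE"}]},
        {"node_name": "master", "node_decision": "no",
         "deciders": [{"decider": "same_shard", "decision": "NO"},
                      {"decider": "throttling", "decision": "THROTTLE"}]},
    ],
}
for row in blocking_deciders(sample):
    print(row)
print(throttled_nodes(sample))
```

By default the explain API omits YES decisions (`include_yes_decisions=true` adds them), so filtering on non-YES mostly guards against that flag being set.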

Note:

Commands used for this ES performance analysis (Kibana console syntax):

GET _cat/health
GET _cluster/health
GET _cat/nodes
GET _cluster/health?level=indices
GET _cluster/health?level=shards
GET _cluster/allocation/explain
GET _cat/indices
GET _cluster/state
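The same checks can be run outside Kibana. A minimal sketch using only the Python standard library, assuming an unsecured cluster reachable over plain HTTP (for example `http://localhost:9200`, a hypothetical address). The helper names are hypothetical, and a secured cluster would also need authentication headers, which this sketch leaves out.

```python
import urllib.request

# The GET endpoints listed above, in the same order.
DIAGNOSTIC_PATHS = [
    "_cat/health",
    "_cluster/health",
    "_cat/nodes",
    "_cluster/health?level=indices",
    "_cluster/health?level=shards",
    "_cluster/allocation/explain",
    "_cat/indices",
    "_cluster/state",
]


def build_urls(base_url: str) -> list:
    """Join the cluster address with each diagnostic path."""
    base = base_url.rstrip("/")
    return [f"{base}/{path}" for path in DIAGNOSTIC_PATHS]


def fetch(url: str, timeout: float = 10.0) -> str:
    """Send a GET request and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def run_all(base_url: str, max_chars: int = 2000) -> None:
    """Print the (truncated) response of every diagnostic call."""
    for url in build_urls(base_url):
        print("###", url)
        try:
            # _cluster/state can be very large, so only the start is shown.
            print(fetch(url)[:max_chars])
        except OSError as exc:
            # Covers connection errors and HTTP errors, e.g. allocation/explain
            # may return an error when no shard is unassigned.
            print("request failed:", exc)
```

Usage: `run_all("http://localhost:9200")` prints each endpoint's response in turn.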
