Elasticsearch Index Health Is Yellow: Tracing the Cause and Fixing It

三里清渢

Published 2024-08-14 14:41:56

Tags: elasticsearch, big data, search engine

Article link: https://blog.csdn.net/tansenc/article/details/141188935

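While running my project today, I noticed that both the ES cluster health and the index health had turned yellow.
At first I assumed the Linux virtual machine had run out of disk space.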

I checked the status with the Elasticvue browser extension; Kibana works as well.
After I expanded the disk of the CentOS virtual machine in VMware, the cluster health was still yellow, so I looked at the ES logs and found a clue:

{"type": "server", "timestamp": "2024-08-14T03:05:39,901Z", "level": "INFO", 
"component": "o.e.c.r.a.AllocationService", "cluster.name": "docker-cluster", 
"node.name": "b029464c5838", "message": "Cluster health status changed from 
[RED] to [YELLOW] (reason: [shards started [[user][0]]]).", "cluster.uuid": 
"fw7MBctsRhKs7ZXVdwIuBA", "node.id": "kVueuBHqQ9uazalV9JNEfA"  }

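This entry only records the cluster health changing from red to yellow once shard [user][0] started, but it does suggest that the problem is related to shards.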

Next, I called the cluster health API, GET /_cluster/health.
It returned the following:

{
  "cluster_name": "docker-cluster",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 2,
  "active_shards": 2,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 2,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 50
}
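
The reading of this response can also be scripted. Below is a minimal Python sketch using only the standard library. The names `fetch_cluster_health` and `diagnose_health` and the address `http://localhost:9200` are illustrative assumptions, not part of the original setup; the sample values are copied from the response above.

```python
import json
import urllib.request


def fetch_cluster_health(base_url="http://localhost:9200"):
    """Call GET /_cluster/health and return the parsed JSON body.

    base_url is an assumption; point it at your own node.
    """
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
        return json.load(resp)


def diagnose_health(health):
    """Return a short, human-readable reading of a cluster health response."""
    status = health["status"]
    unassigned = health["unassigned_shards"]
    if status == "green":
        return "green: all primary and replica shards are assigned"
    if status == "yellow":
        # Yellow means every primary is assigned, so what is left is replicas.
        return f"yellow: all primaries are active, {unassigned} replica shard(s) unassigned"
    return f"red: at least one primary shard is not active ({unassigned} shard(s) unassigned)"


# Relevant fields copied from the response shown above.
sample = {
    "status": "yellow",
    "number_of_data_nodes": 1,
    "active_primary_shards": 2,
    "active_shards": 2,
    "unassigned_shards": 2,
}
print(diagnose_health(sample))
```

Against a live node, `diagnose_health(fetch_cluster_health())` would give the same kind of summary.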

Here **"unassigned_shards": 2** means that two shards are unassigned, so unassigned shards are the likely reason the cluster is yellow.
Calling GET /_cluster/allocation/explain shows why a shard is unassigned:

{
  "index": "user",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "REPLICA_ADDED",
    "at": "2024-08-14T03:54:24.668Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "kVueuBHqQ9uazalV9JNEfA",
      "node_name": "b029464c5838",
      "transport_address": "172.19.0.2:9300",
      "node_attributes": {
        "ml.machine_memory": "6067675136",
        "xpack.installed": "true",
        "transform.node": "true",
        "ml.max_open_jobs": "20",
        "ml.max_jvm_size": "536870912"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "a copy of this shard is already allocated to this node [[user][0], node[kVueuBHqQ9uazalV9JNEfA], [P], s[STARTED], a[id=Jj5rl1j_ROGqw3Fn4tjJxQ]]"
        }
      ]
    }
  ]
}

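"primary": false: this is a replica shard, not a primary shard. The primary shard handles write operations, while replica shards provide load balancing for reads and high availability.

"allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes": the shard cannot be allocated because no node is permitted to take it.

"explanation": "a copy of this shard is already allocated to this node [[user][0], node[kVueuBHqQ9uazalV9JNEfA], [P], s[STARTED], a[id=Jj5rl1j_ROGqw3Fn4tjJxQ]]": this is the detailed reason. The node already holds the primary copy ([P]) of this shard, in state STARTED, and the same_shard decider does not allow a second copy of the same shard on that node.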
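
Picking the blocking deciders out of a long allocation-explain response can be automated. The following is a small Python sketch; `blocking_deciders` is a hypothetical helper name, and the sample is a trimmed copy of the response above with the explanation text shortened.

```python
def blocking_deciders(explain):
    """List (node_name, decider, explanation) for every decider that answered NO."""
    blockers = []
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                blockers.append(
                    (node.get("node_name"), decider.get("decider"), decider.get("explanation"))
                )
    return blockers


# Trimmed copy of the allocation-explain response (explanation shortened).
sample = {
    "index": "user",
    "shard": 0,
    "primary": False,
    "current_state": "unassigned",
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_name": "b029464c5838",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "same_shard",
                    "decision": "NO",
                    "explanation": "a copy of this shard is already allocated to this node",
                }
            ],
        }
    ],
}

for node_name, decider, explanation in blocking_deciders(sample):
    print(f"{node_name}: {decider} -> {explanation}")
```

For this response the only blocker is `same_shard` on the single node, which matches the reading above.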

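In other words, the primary shard of this index is already started on this node, and its replica has nowhere else to go. Next, look at the settings of the user index:
GET /user/_settings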

{
  "user": {
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        },
        "number_of_shards": "1",
        "provided_name": "user",
        "creation_date": "1723366659772",
        "number_of_replicas": "1",
        "uuid": "JFWB2NceRouaAEAum7ZTzg",
        "version": {
          "created": "7120199"
        }
      }
    }
  }
}

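**"number_of_replicas": "1"** means each primary shard has one replica shard.
I am running only a single ES node, and the primary shard already lives on it, so the replica shard cannot be allocated to that same node. The fix is to set the number of replicas to 0: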

PUT /user/_settings
{
  "number_of_replicas": 0
}
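
The same settings update could be sent from a script instead of a console. This is a minimal sketch with Python's standard library; `build_replica_update` and the `http://localhost:9200` address are assumptions for illustration, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_replica_update(index, replicas, base_url="http://localhost:9200"):
    """Build (but do not send) a PUT /<index>/_settings request.

    base_url is an assumption; adjust it to your own node.
    """
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_replica_update("user", 0)
print(request.get_method(), request.full_url)

# To apply it against a running node, uncomment:
# with urllib.request.urlopen(request) as resp:
#     print(resp.read().decode("utf-8"))
```

Keeping the request construction separate from the send makes it easy to inspect the method, URL, and body before touching the cluster.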

After the change, the index status turns green.
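
Why this works comes down to simple arithmetic: the same_shard rule seen earlier means every copy of a shard (the primary plus each replica) needs a node of its own. The sketch below is a simplification that considers only that rule and ignores other allocation deciders such as disk watermarks; `unassignable_copies` is a hypothetical helper name.

```python
def unassignable_copies(number_of_shards, number_of_replicas, data_nodes):
    """Shard copies that cannot be placed when each copy of a shard needs its own node.

    Simplified model: only the same-shard rule is considered.
    """
    copies_per_shard = 1 + number_of_replicas  # primary + replicas
    missing_per_shard = max(0, copies_per_shard - data_nodes)
    return number_of_shards * missing_per_shard


# The user index on a single data node, before and after the settings change.
before = unassignable_copies(number_of_shards=1, number_of_replicas=1, data_nodes=1)
after = unassignable_copies(number_of_shards=1, number_of_replicas=0, data_nodes=1)
print(before, after)
```

Under the same model, adding a second data node while keeping one replica would also leave nothing unassigned.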