Elasticsearch replica shards stuck in UNASSIGNED
Check with the following command:
curl -s -XGET 'http://127.0.0.1:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
In the output, unassigned.reason is ALLOCATION_FAILED.
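The `grep` above only filters lines. Below is a minimal Python sketch that parses the same `_cat/shards` output and groups UNASSIGNED shards by `unassigned.reason`. It assumes the column order requested with `h=index,shard,prirep,state,unassigned.reason` and whitespace-separated rows. The helper name `unassigned_by_reason` and the sample rows are hypothetical. To see why one particular shard is unassigned, `GET _cluster/allocation/explain` returns the allocator's per-shard reasoning.

```python
from collections import defaultdict


def unassigned_by_reason(cat_output: str) -> dict:
    """Group UNASSIGNED rows of `_cat/shards?h=index,shard,prirep,state,unassigned.reason`."""
    groups = defaultdict(list)
    for line in cat_output.splitlines():
        cols = line.split()
        # Assigned shards print an empty unassigned.reason column, so they have only 4 fields.
        if len(cols) < 5 or cols[3] != "UNASSIGNED":
            continue
        index, shard, prirep, _state, reason = cols[:5]
        # prirep is "p" for a primary and "r" for a replica.
        groups[reason].append((index, int(shard), prirep))
    return dict(groups)


# Hypothetical sample output, shaped like the command above would print it.
sample = """\
index_2021_09_01 0 p STARTED
index_2021_09_01 1 p STARTED
index_2021_09_01 1 r UNASSIGNED ALLOCATION_FAILED
"""
print(unassigned_by_reason(sample))
```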
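A shard can end up UNASSIGNED for many reasons, and each case has to be analyzed on its own. In my case the problem was first triggered by disk usage exceeding 80%.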
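Disk-based shard allocation is controlled by the `cluster.routing.allocation.disk.watermark.low`, `.high` and `.flood_stage` settings. Their defaults are 85%, 90% and 95%, so check which values your own cluster actually uses. The per-node `disk.percent` column of `GET _cat/allocation?v` is the number to compare against them. Below is a small sketch of what each level means for a node. The thresholds are parameters rather than assumptions, and `watermark_state` is a hypothetical helper.

```python
def watermark_state(disk_percent: float, low: float = 85.0, high: float = 90.0,
                    flood_stage: float = 95.0) -> str:
    """Classify a node's disk usage (percent) against Elasticsearch disk watermarks."""
    if disk_percent > flood_stage:
        # ES applies index.blocks.read_only_allow_delete to indices with a shard on this node.
        return "flood_stage"
    if disk_percent > high:
        # ES tries to relocate shards away from this node.
        return "high"
    if disk_percent > low:
        # ES will not allocate further replica shards to this node.
        return "low"
    return "ok"


# A node at 82% is fine under the defaults but blocked if low is set to 80%.
print(watermark_state(82))
print(watermark_state(82, low=80.0))
```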
There are several ways to resolve it:
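1. Add disk space to the servers, then reallocate the UNASSIGNED replica shards.
For a single index, reallocation of its replica shards can be triggered by lowering and then raising its replica count, or it can be left to ES to adjust on its own.
Adjusting the replicas of a single index (to avoid unpredictable side effects, always target one specific index):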
PUT es_index_name01/_settings { "index" : { "number_of_replicas" : 0 }}
PUT es_index_name01/_settings { "index" : { "number_of_replicas" : 1 }}
2. Back up your data (or decide what can be deleted), then remove part of it from the cluster to free disk space.
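3. Scale out horizontally by adding data nodes; whether this is feasible depends on the cluster's shard count. See also: shard count settings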
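One hard constraint behind the "is scaling out feasible" question: Elasticsearch never places two copies of the same shard (a primary and its replica, or two replicas) on the same node. A shard with `number_of_replicas = R` therefore needs at least `R + 1` data nodes before every copy can be assigned. Below is a minimal sketch of that arithmetic alone, ignoring disk, memory and the other allocation deciders. `unassignable_replicas` is a hypothetical name.

```python
def unassignable_replicas(primaries: int, replicas: int, data_nodes: int) -> int:
    """Replica copies that cannot be placed because of the same-shard rule alone."""
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    # Each shard's copies need distinct nodes: 1 primary plus at most (data_nodes - 1) replicas.
    placeable_per_shard = min(replicas, data_nodes - 1)
    return primaries * (replicas - placeable_per_shard)


print(unassignable_replicas(primaries=5, replicas=1, data_nodes=1))  # 5
print(unassignable_replicas(primaries=5, replicas=1, data_nodes=2))  # 0
```

A non-zero result means those replicas stay UNASSIGNED no matter how much disk is free. A zero result does not guarantee allocation, because disk watermarks and circuit breakers can still block recovery.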
The detailed error I hit, as shown in es_head (some sensitive details replaced):
failed shard on node [AAAAAAAAAA]: failed recovery, failure RecoveryFailedException[[index_2021_09_01][1]: Recovery failed
from {es-test3}{bbBBBBBBBBB}{sdiowwoewe}{ml.machine_memory=16944689152, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
into {es-test1}{ccccccccccc}{ccccccddddd}{ml.machine_memory=16944689152, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}];
nested: RemoteTransportException[[es-test3][192.168.133.139:9300][internal:index/shard/recovery/start_recovery]];
nested: RecoveryEngineException[Phase[1] phase1 failed];
nested: RecoverFilesRecoveryException[Failed to transfer [66] files with total size of [871.5mb]];
nested: RemoteTransportException[[es-test1][192.168.133.138:9300][internal:index/shard/recovery/file_chunk]];
nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [7491823295/6.9gb], which is larger than the limit of [7491787161/6.9gb],
usages [request=1064/1kb, fielddata=599600/585.5kb, in_flight_requests=524437/512.1kb, accounting=7490698194/6.9gb]];
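The last nested exception is the actual cause: the parent circuit breaker on the receiving node (es-test1) rejected a recovery `file_chunk` transport request. The numbers in the message can be checked directly. Below is a minimal sketch that parses this one message. The regexes are written against the text above, not against a documented log format. The sketch shows three things:
- the usages add up exactly to the would-be total;
- the request overshot the limit by only 36,134 bytes;
- the `accounting` breaker holds almost all of the memory. That breaker tracks long-lived memory such as Lucene segments.

```python
import re

BREAKER_MSG = (
    "CircuitBreakingException[[parent] Data too large, data for [<transport_request>] "
    "would be [7491823295/6.9gb], which is larger than the limit of [7491787161/6.9gb], "
    "usages [request=1064/1kb, fielddata=599600/585.5kb, "
    "in_flight_requests=524437/512.1kb, accounting=7490698194/6.9gb]]"
)


def parse_breaker(msg: str) -> dict:
    """Extract byte counts from a CircuitBreakingException message like the one above."""
    would_be = int(re.search(r"would be \[(\d+)/", msg).group(1))
    limit = int(re.search(r"limit of \[(\d+)/", msg).group(1))
    # Only look inside "usages [...]" so other key=value pairs are not picked up.
    usages_text = re.search(r"usages \[(.*?)\]", msg).group(1)
    usages = {name: int(n) for name, n in re.findall(r"(\w+)=(\d+)/", usages_text)}
    return {"would_be": would_be, "limit": limit, "usages": usages}


info = parse_breaker(BREAKER_MSG)
overshoot = info["would_be"] - info["limit"]
share = info["usages"]["accounting"] / info["would_be"]
print(f"over the limit by {overshoot} bytes")            # over the limit by 36134 bytes
print(sum(info["usages"].values()) == info["would_be"])  # True
print(f"accounting share: {share:.4%}")
```

In other words, the node's heap was already nearly exhausted by accounting memory before recovery started, so even small recovery chunks tripped the breaker. Spreading shards over more data nodes lowers that per-node load, which is consistent with the fix described next.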
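Resolution: after checking, the cluster had more shards than data nodes, so I chose the third option and added data nodes. After scaling out, I manually adjusted the replicas of the failed shards, and the problem was finally resolved.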
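Why was a manual nudge still needed after scaling out? Elasticsearch retries a failed allocation only `index.allocation.max_retries` times (5 by default). After that, a shard stays UNASSIGNED with ALLOCATION_FAILED even once the cause is gone. There are two ways to nudge it:
- toggle `number_of_replicas` as in option 1;
- call `POST /_cluster/reroute?retry_failed=true`.

Below is a minimal standard-library sketch that only builds both requests. The helper names are hypothetical; the host and index come from the examples above. Sending the requests with `urllib.request.urlopen` requires a reachable cluster.

```python
import json
import urllib.request

ES_HOST = "http://127.0.0.1:9200"  # host used in the curl example above


def replica_settings_request(index: str, replicas: int,
                             host: str = ES_HOST) -> urllib.request.Request:
    """Build PUT /<index>/_settings that sets index.number_of_replicas."""
    # Refuse wildcards, comma lists and _all so only one concrete index is touched.
    if not index or index == "_all" or any(c in index for c in "*,"):
        raise ValueError("target one concrete index, not a wildcard or list")
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def retry_failed_request(host: str = ES_HOST) -> urllib.request.Request:
    """Build POST /_cluster/reroute?retry_failed=true; no request body is needed."""
    return urllib.request.Request(f"{host}/_cluster/reroute?retry_failed=true", method="POST")


# Way 1: drop the replicas to 0, then restore them to 1.
for req in (replica_settings_request("es_index_name01", n) for n in (0, 1)):
    print(req.get_method(), req.full_url, req.data.decode())
    # urllib.request.urlopen(req)  # uncomment to send it to a live cluster

# Way 2: ask the cluster to retry allocations that exhausted their retries.
retry = retry_failed_request()
print(retry.get_method(), retry.full_url)
```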
Writing this down for future reference.