【Elasticsearch】Data too large, data for which is larger than the limit of

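This article describes a 'Data too large' exception encountered on Elasticsearch 7.3.2 and analyzes its cause, which involves memory usage limits, percentage thresholds, and circuit breaker configuration. Setting indices.breaker.total.use_real_memory to false and performing a rolling restart of the cluster eliminated the exception and improved cluster stability under business load testing.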

1. Overview

Reference: Handling the Data too large exception in Elasticsearch

The following exception appeared in the logs of our production ES cluster, which runs Elasticsearch 7.3.2:

[2021-03-16T21:05:10,338][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [java-d-service-es-200-56-client-1] failed to execute on node [hsF4JzeAQ6mflJRGnJIKzQ]
org.elasticsearch.transport.RemoteTransportException: [data-es-group-online-200-67-2][10.110.200.67:9301][cluster:monitor/nodes/info[n]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [33093117638/30.8gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33093114144/30.8gb], new bytes reserved: [3494/3.4kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=3494/3.4kb, accounting=104564949/99.7mb]
 at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:342) ~[elasticsearch-7.3.2.jar:7.3.2]
 at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.3.2.jar:7.3.2]
 at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:173) [elasticsearch-7.3.2.jar:7.3.2]
 at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:121) [elasticsearch-7.3.2.jar:7.3.2]
 at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) [elasticsearch-7.3.2.jar:7.3.2]
 at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) [elasticsearch-7.3.2.jar:7.3.2]
 at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) [transport-netty4-client-7.3.2.jar:7.3.2]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) [netty-common-4.1.36.Final.jar:4.1.36.Final]
 at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.36.Final.jar:4.1.36.Final]
 at java.lang.Thread.run(Thread.java:835) [?:?]
[2021-03-16T21:05:11,203][INFO ][o.e.x.s.a.AuthenticationServi

I pulled the ES source code. The exception is thrown in org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService; the relevant code is:

public void checkParentLimit(long newBytesReserved, String label) throws CircuitBreakingException {
     final MemoryUsage memoryUsed = memoryUsed(newBytesReserved);
     long parentLimit = this.parentSettings.getLimit();
     if (memoryUsed.totalUsage > parentLimit) {
         this.parentTripCount.incrementAndGet();
         final StringBuilder message = new StringBuilder("[parent] Data too large, data for [" + label + "]" +
                 " would be [" + memoryUsed.totalUsage + "/" + new ByteSizeValue(memoryUsed.totalUsage) + "]" +
                 ", which is larger than the limit of [" +
                 parentLimit + "/" + new ByteSizeValue(parentLimit) + "]");
         if (this.trackRealMemoryUsage) {
             final long realUsage = memoryUsed.baseUsage;
             message.append(", real usage: [");
             message.append(realUsage);
             message.append("/");
             message.append(new ByteSizeValue(realUsage));
             message.append("], new bytes reserved: [");
             message.append(newBytesReserved);
             message.append("/");
             message.append(new ByteSizeValue(newBytesReserved));
             message.append("]");
         } else {
             message.append(", usages [");
             message.append(String.join(", ",
                 this.breakers.entrySet().stream().map(e -> {
                     final CircuitBreaker breaker = e.getValue();
                     final long breakerUsed = (long)(breaker.getUsed() * breaker.getOverhead());
                     return e.getKey() + "=" + breakerUsed + "/" + new ByteSizeValue(breakerUsed);
                 })
                     .collect(Collectors.toList())));
             message.append("]");
         }
         // derive durability of a tripped parent breaker depending on whether the majority of memory tracked by
         // child circuit breakers is categorized as transient or permanent.
         CircuitBreaker.Durability durability = memoryUsed.transientChildUsage >= memoryUsed.permanentChildUsage ?
             CircuitBreaker.Durability.TRANSIENT : CircuitBreaker.Durability.PERMANENT;
         throw new CircuitBreakingException(message.toString(), memoryUsed.totalUsage, parentLimit, durability);
     }
 }

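As the code shows, the breaker trips only when memoryUsed.totalUsage > parentLimit. The value of parentLimit comes from the setting indices.breaker.total.limit, whose default depends on indices.breaker.total.use_real_memory (default true): the limit defaults to 95% when use_real_memory is true and to 70% when it is false, as the following code shows: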

public static final Setting<Boolean> USE_REAL_MEMORY_USAGE_SETTING =
    Setting.boolSetting("indices.breaker.total.use_real_memory", true, Property.NodeScope);
 
public static final Setting<ByteSizeValue> TOTAL_CIRCUIT_BREAKER_LIMIT_SETTING =
    Setting.memorySizeSetting("indices.breaker.total.limit", settings -> {
        if (USE_REAL_MEMORY_USAGE_SETTING.get(settings)) {
            return "95%";
        } else {
            return "70%";
        }
    }, Property.Dynamic, Property.NodeScope);
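
To make the percentages concrete, here is a minimal sketch that resolves a percentage limit against a heap size. The 31 GiB heap is an assumption: it is not stated in this post, but it is the size for which 95% yields exactly the limit printed in the log above (31621696716 bytes). The class and method names are hypothetical, and the arithmetic is a simplification of how Elasticsearch resolves a percentage memory setting.

```java
public class ParentLimitSketch {

    // Simplified: resolve a percentage setting against the maximum heap size in bytes.
    static long limitBytes(long heapMaxBytes, double percent) {
        return (long) ((percent / 100.0) * heapMaxBytes);
    }

    public static void main(String[] args) {
        // Assumption: 31 GiB heap, inferred from the limit in the log, not stated in the post.
        long heap = 31L * 1024 * 1024 * 1024;

        // Default limit when indices.breaker.total.use_real_memory is true (95%).
        System.out.println(limitBytes(heap, 95.0)); // prints 31621696716

        // Default limit when indices.breaker.total.use_real_memory is false (70%).
        System.out.println(limitBytes(heap, 70.0)); // prints 23300197580
    }
}
```

Under that assumption, switching use_real_memory to false lowers the default parent limit from about 29.4gb to about 21.7gb, but the limit is then compared against a much smaller number, as the memoryUsed method shows.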

Next, look at memoryUsed.totalUsage. It is computed by a method of the same class:

private MemoryUsage memoryUsed(long newBytesReserved) {
       long transientUsage = 0;
       long permanentUsage = 0;
 
       for (CircuitBreaker breaker : this.breakers.values()) {
           long breakerUsed = (long)(breaker.getUsed() * breaker.getOverhead());
           if (breaker.getDurability() == CircuitBreaker.Durability.TRANSIENT) {
               transientUsage += breakerUsed;
           } else if (breaker.getDurability() == CircuitBreaker.Durability.PERMANENT) {
               permanentUsage += breakerUsed;
           }
       }
       if (this.trackRealMemoryUsage) {
           final long current = currentMemoryUsage();
           return new MemoryUsage(current, current + newBytesReserved, transientUsage, permanentUsage);
       } else {
           long parentEstimated = transientUsage + permanentUsage;
           return new MemoryUsage(parentEstimated, parentEstimated, transientUsage, permanentUsage);
       }
   }

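The value of trackRealMemoryUsage (taken from the setting indices.breaker.total.use_real_memory) determines whether the parent breaker is checked against the actual heap usage or against the memory reserved by the child circuit breakers. In the log above, real usage was 30.8gb while the child breakers together accounted for only about 99.7mb, so the trip was driven by overall heap occupancy rather than by the memory the child breakers had reserved. The official documentation describes the setting as follows: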

Static setting determining whether the parent breaker should take real memory usage into account (true) or only consider the amount that is reserved by child circuit breakers (false). Defaults to true
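
The two modes can be compared using the numbers from the log above. This is a minimal sketch with hypothetical class and method names, not the Elasticsearch implementation; the 70% limit of 23300197580 bytes is an assumption derived from a 31 GiB heap, which is inferred from the logged 95% limit rather than stated in this post.

```java
public class ParentBreakerCheckSketch {

    // use_real_memory = true: current heap usage plus the new reservation is compared to the limit.
    static boolean tripsWithRealMemory(long realUsage, long newBytesReserved, long parentLimit) {
        return realUsage + newBytesReserved > parentLimit;
    }

    // use_real_memory = false: only the sum of the child breakers' usage is compared to the limit.
    static boolean tripsWithChildEstimates(long[] childUsages, long parentLimit) {
        long sum = 0;
        for (long used : childUsages) {
            sum += used;
        }
        return sum > parentLimit;
    }

    public static void main(String[] args) {
        // Values copied from the exception message in the log.
        long realUsage = 33093114144L;        // real usage: 30.8gb
        long newBytesReserved = 3494L;        // new bytes reserved: 3.4kb
        long limitRealMemory = 31621696716L;  // logged limit: 29.4gb (95% default)
        // request, fielddata, in_flight_requests, accounting
        long[] childUsages = {0L, 0L, 3494L, 104564949L};

        // Assumption: 70% of an inferred 31 GiB heap.
        long limitChildEstimates = 23300197580L;

        System.out.println(tripsWithRealMemory(realUsage, newBytesReserved, limitRealMemory)); // prints true
        System.out.println(tripsWithChildEstimates(childUsages, limitChildEstimates));         // prints false
    }
}
```

With real memory tracking, 33093114144 + 3494 = 33093117638 bytes exceeds the limit, which matches the "would be" value in the exception. With child estimates only, the total is 104568443 bytes, far below the limit, so the same request would not have tripped the parent breaker.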

Summary: starting at 11:50 on March 17, 2021, I set indices.breaker.total.use_real_memory: false on the production DATA nodes and performed a rolling restart of the production cluster.
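
The setting is declared with Property.NodeScope only (no Property.Dynamic) in the code quoted earlier, so it is static: it has to be set in each node's configuration file and takes effect only after that node restarts, which is why a rolling restart was needed. A minimal sketch of the fragment, assuming the standard elasticsearch.yml file is used on each data node:

```yaml
# elasticsearch.yml on each data node (static setting, requires a node restart)
indices.breaker.total.use_real_memory: false
```

With this value, indices.breaker.total.limit falls back to its 70% default unless it is set explicitly.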

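Today is March 18, 2021. The configuration change was completed at noon yesterday, and at 18:30 yesterday evening we ran a business load test against the cluster; the exception did not appear. (Before the change, nodes dropped out of the cluster during load tests, and the resulting shard relocation turned the cluster yellow.)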
