[fielddata] Data too large, data for [_id] would be [13181907968/12.2gb]

Problem Description

An ES search_after query failed with the following error: a fielddata cache circuit-breaker exception.

org.frameworkset.elasticsearch.ElasticSearchException: {"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[fielddata] Data too large, data for [_id] would be [13181907968/12.2gb], which is larger than the limit of [13181806182/12.2gb]","bytes_wanted":13181907968,"bytes_limit":13181806182,"durability":"PERMANENT"},{"type":"circuit_breaking_exception","reason":"[fielddata] Data too large, data for [_id] would be [13187108998/12.2gb], which is larger than the limit of [13181806182/12.2gb]","bytes_wanted":13187108998,"bytes_limit":13181806182,"durability":"PERMANENT"},{"type":"circuit_breaking_exception","reason":"[fielddata] Data too large, data for [_id] would be [13182738143/12.2gb], which is larger than the limit of [13181806182/12.2gb]","bytes_wanted":13182738143,"bytes_limit":13181806182,"durability":"PERMANENT"},{"type":"circuit_breaking_exception","reason":"[fielddata] Data too large, data for [_id] would be [13183488574/12.2gb], which is larger than the limit of [13181806182/12.2gb]","bytes_wanted":13183488574,"bytes_limit":13181806182,"durability":"PERMANENT"},{"type":"circuit_breaking_exception","reason":"[fielddata] Data too large, data for [_id] would be [13185677559/12.2gb], which is larger than the limit of [13181806182/12.2gb]","bytes_wanted":13185677559,"bytes_limit":13181806182,"durability":"PERMANENT"},{"type":"circuit_breaking_exception","reason":"[fielddata] Data too large, data for [_id] would be [13185174477/12.2gb], which is larger than the limit of [13181806182/12.2gb]","bytes_wanted":13185174477,"bytes_limit":13181806182,"durability":"PERMANENT"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"mmyz-notice2","node":"RFo7x1R3Qjy3rsNaJgRx5w","reason":{"type":"exception","reason":"java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [13181907968/12.2gb], which is larger than the limit of [13181806182/12.2gb]]","caused_by":{"type":"execution_exception","reason":"execution_exception: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [13181907968/12.2gb], which is larger than the limit of [13181806182/12.2gb]]","caused_by":{"type":"circuit_breaking_exception","reason":"[fielddata] Data too large, data for [_id] would be [13181907968/12.2gb], which is larger than the limit of [13181806182/12.2gb]","bytes_wanted":13181907968,"bytes_limit":13181806182,"durability":"PERMANENT"}}}},{"shard":1,"index":"mmyz-notice2","node":"RFo7x1R3Qjy3rsNaJgRx5w","reason":{"type":"exception","reason":"java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [13187108998/12.2gb], which is larger than the limit of [13181806182/12.2gb]]","caused_by":{"type":"execution_exception","reason":"execution_exception: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [13187108998/12.2gb], which is larger than the limit of [13181806182/12.2gb]]","caused_by":{"type":"circuit_breaking_exception","reason":"[fielddata] Data too large, data for [_id] would be [13187108998/12.2gb], which is larger than the limit of [13181806182/12.2gb]","bytes_wanted":13187108998,"bytes_limit":13181806182,"durability":"PERMANENT"}}}},{"shard":2,"index":"mmyz-notice2","node":"f8S1zzPZRnS_wkBwidNB4Q","reason":{"type":"exception","reason":"java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] 
would be [13182738143/12.2gb], which is larger than the limit of [13181806182/12.2gb]]","caused_by":{"type":"execution_exception","reason":"execution_exception: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [13182738143/12.2gb], which is larger than the limit of [13181806182/12.2gb]]","caused_by":{"type":"circuit_breaking_exception","reason":"[fielddata] Data too large, data for [_id] would be [13182738143/12.2gb], which is larger than the limit of [13181806182/12.2gb]","bytes_wanted":13182738143,"bytes_limit":13181806182,"durability":"PERMANENT"}}}},{"shard":3,"index":"mmyz-notice2","node":"M6k9iyVfQL2xQgRz0eWLxg","reason":{"type":"exception","reason":"java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [13183488574/12.2gb], which is larger than the limit of [13181806182/12.2gb]]","caused_by":{"type":"execution_exception","reason":"execution_exception: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [13183488574/12.2gb], which is larger than the limit of [13181806182/12.2gb]]","caused_by":{"type":"circuit_breaking_exception","reason":"[fielddata] Data too large, data for [_id] would be [13183488574/12.2gb], which is larger than the limit of [13181806182/12.2gb]","bytes_wanted":13183488574,"bytes_limit":13181806182,"durability":"PERMANENT"}}}},{"shard":4,"index":"mmyz-notice2","node":"V8A-Jfe2RwK6_42oBlvpKQ","reason":{"type":"exception","reason":"java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [13185677559/12.2gb], which is larger than the limit of [13181806182/12.2gb]]","caused_by":{"type":"execution_exception","reason":"execution_exception: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [13185677559/12.2gb], which is larger than the limit of [13181806182/12.2gb]]","caused_by":{"type":"circuit_breaking_exception","reason":"[fielddata] Data too large, data for [_id] would be [13185677559/12.2gb], which is larger than the limit of [13181806182/12.2gb]","bytes_wanted":13185677559,"bytes_limit":13181806182,"durability":"PERMANENT"}}}},{"shard":5,"index":"mmyz-notice2","node":"NsPWLx1GQFKPRGX51Dk_mQ","reason":{"type":"exception","reason":"java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [13185174477/12.2gb], which is larger than the limit of [13181806182/12.2gb]]","caused_by":{"type":"execution_exception","reason":"execution_exception: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [13185174477/12.2gb], which is larger than the limit of [13181806182/12.2gb]]","caused_by":{"type":"circuit_breaking_exception","reason":"[fielddata] Data too large, data for [_id] would be [13185174477/12.2gb], which is larger than the limit of [13181806182/12.2gb]","bytes_wanted":13185174477,"bytes_limit":13181806182,"durability":"PERMANENT"}}}}],"caused_by":{"type":"circuit_breaking_exception","reason":"[fielddata] Data too large, data for [_id] would be [13181907968/12.2gb], which is larger than the limit of [13181806182/12.2gb]","bytes_wanted":13181907968,"bytes_limit":13181806182,"durability":"PERMANENT"}},"status":500}
	at org.frameworkset.elasticsearch.handler.BaseExceptionResponseHandler.handleException(BaseExceptionResponseHandler.java:77)
	at org.frameworkset.elasticsearch.handler.BaseExceptionResponseHandler.handleException(BaseExceptionResponseHandler.java:48)
	at org.frameworkset.elasticsearch.handler.ElasticSearchResponseHandler.handleResponse(ElasticSearchResponseHandler.java:61)
	at org.frameworkset.elasticsearch.handler.ElasticSearchResponseHandler.handleResponse(ElasticSearchResponseHandler.java:16)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:222)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:164)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:139)
	at org.frameworkset.spi.remote.http.HttpRequestUtil.sendBody(HttpRequestUtil.java:1285)
	at org.frameworkset.spi.remote.http.HttpRequestUtil.sendJsonBody(HttpRequestUtil.java:1264)
	at org.frameworkset.elasticsearch.client.RestSearchExecutorUtil._executeRequest(RestSearchExecutorUtil.java:103)
	at org.frameworkset.elasticsearch.client.RestSearchExecutor.executeRequest(RestSearchExecutor.java:242)
	at org.frameworkset.elasticsearch.client.ElasticSearchRestClient$5.execute(ElasticSearchRestClient.java:1244)
	at org.frameworkset.elasticsearch.client.ElasticSearchRestClient._executeHttp(ElasticSearchRestClient.java:899)
	at org.frameworkset.elasticsearch.client.ElasticSearchRestClient.executeRequest(ElasticSearchRestClient.java:1229)
	at org.frameworkset.elasticsearch.client.ElasticSearchRestClient.executeRequest(ElasticSearchRestClient.java:1212)
	at org.frameworkset.elasticsearch.client.RestClientUtil.searchList(RestClientUtil.java:2828)
	at org.frameworkset.elasticsearch.client.ConfigRestClientUtil.searchList(ConfigRestClientUtil.java:625)
	at net.yto.security.cipher.config.SecurityBulkProcessor2.searchAfter(SecurityBulkProcessor2.java:265)

Other queries failed in the same way.

After investigation, the cause was ES's default cache settings, which let the fielddata cache grow without ever evicting anything.

Concepts and Principles

  • The ES caches

First, a look at ES's caching mechanism. When executing queries, ES caches index data in memory (the JVM heap).

Within the ES JVM heap there are two thresholds: the eviction line and the circuit breaker. When cached data reaches the eviction line, part of it is automatically evicted to keep the cache within a safe range. The circuit breaker acts when a user is about to run a query: if the data already cached plus the data the query would need to cache reaches the breaker limit, a Data too large error is returned and the query is blocked. ES divides cached data into two categories, fielddata and everything else. Next we look at fielddata in detail, since it is the culprit behind this exception.

  • FieldData

The FieldData mentioned in ES configuration refers to field data. When sorting (sort) or aggregating (aggs), ES loads all of the data for the fields involved into memory (the JVM heap). This effectively caches the data and speeds up subsequent queries.
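
For illustration, a terms aggregation like the one below reads the whole column of stationCode values on every shard it touches (the index and field names are borrowed from the stats output shown later in this article; whether those values land in in-heap fielddata or come from on-disk doc_values depends on the field's mapping type):

GET index/_search
{
  "size": 0,
  "aggs": {
    "by_station": {
      "terms": { "field": "stationCode" }
    }
  }
}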

You can monitor how much memory fielddata is using, and whether any entries have been evicted, as follows:

Per index:

GET /_stats/fielddata?fields=*

Per node:

GET /_nodes/stats/indices/fielddata?fields=*

Or even per index, per node:

GET /_nodes/stats/indices/fielddata?level=indices&fields=*
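
The circuit-breaker side can be watched in the same way; per-breaker limits, estimated usage, and trip counts are exposed through node stats:

GET /_nodes/stats/breaker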

Cache settings

indices.fielddata.cache.size sets the size of the fielddata cache, either as a percentage of the heap or as an absolute value. When the cache reaches the configured size, it cleans itself automatically, evicting some fielddata to make room for new data. The default is unbounded (a sketch of bounding it follows below).
indices.fielddata.cache.expire sets how long an entry may go unaccessed before it is evicted; the default is -1, i.e. never. The expire setting is not recommended: evicting data by time costs a lot of performance, and the setting will be removed in an upcoming version.
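
A bounded fielddata cache would be configured like this, for example (the 20% figure is purely illustrative; this is a static node-level setting, so it requires a node restart):

# config/elasticsearch.yml -- static node setting, requires a restart
indices.fielddata.cache.size: 20%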

  • The circuit breaker

One point in the fielddata cache design raises a question: the fielddata size is only checked after the data has been loaded. What happens if the next query loads enough fielddata to push the cache past the available heap?

Unfortunately, it produces an OOM exception.

The circuit breaker exists to control cache loading: it estimates how much memory the current query will request and enforces a limit on it.

The breaker settings are as follows:

indices.breaker.fielddata.limit

The fielddata breaker limits the size of fielddata; by default it is 60% of the heap.

indices.breaker.request.limit

The request breaker estimates the size of the structures required to complete the other parts of a query; by default it limits them to 40% of the heap.

Breaker limits can be specified in config/elasticsearch.yml.
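
They can also be changed on a running cluster via the cluster settings API; a minimal sketch (the 40% value is illustrative, not a recommendation):

PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.fielddata.limit": "40%"
  }
}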

https://www.elastic.co/guide/cn/elasticsearch/guide/current/_limiting_memory_usage.html

  • search_after

Paginate search results | Elasticsearch Guide [7.13] | Elastic

search_after pagination uses the sort values of the last hit on the current page to retrieve the next page of hits.
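
A minimal sketch of the pattern, assuming the same index and sort as the DSL shown later (the search_after value is a placeholder for whatever sort value the previous page's last hit returned):

GET index/_search
{
  "size": 1000,
  "sort": [
    { "_id": "asc" }
  ],
  "search_after": ["<sort value of the previous page's last hit>"]
}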

Detailed Analysis

When the Data too large exception occurred, monitoring the cluster's fielddata for the _id field of a single index returned the following:

GET index/_stats/fielddata?fields=_id
{
  "_shards" : {
    "total" : 12,
    "successful" : 12,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "fielddata" : {
        "memory_size_in_bytes" : 3896965844,
        "evictions" : 0,
        "fields" : {
          "stationCode" : {
            "memory_size_in_bytes" : 28864
          },
          "_id" : {
            "memory_size_in_bytes" : 3896936980
          },
          "smsSupplier" : {
            "memory_size_in_bytes" : 0
          }
        }
      }
    },
    "total" : {
      "fielddata" : {
        "memory_size_in_bytes" : 8356891112,
        "evictions" : 0,
        "fields" : {
          "stationCode" : {
            "memory_size_in_bytes" : 28864
          },
          "_id" : {
            "memory_size_in_bytes" : 8356862248
          },
          "smsSupplier" : {
            "memory_size_in_bytes" : 0
          }
        }
      }
    }
  },
  "indices" : {
    "index" : {
      "uuid" : "SvEPsxfwSDmk1ryAruRAzg",
      "primaries" : {
        "fielddata" : {
          "memory_size_in_bytes" : 3896965844,
          "evictions" : 0,
          "fields" : {
            "stationCode" : {
              "memory_size_in_bytes" : 28864
            },
            "_id" : {
              "memory_size_in_bytes" : 3896936980
            },
            "smsSupplier" : {
              "memory_size_in_bytes" : 0
            }
          }
        }
      },
      "total" : {
        "fielddata" : {
          "memory_size_in_bytes" : 8356891112,
          "evictions" : 0,
          "fields" : {
            "stationCode" : {
              "memory_size_in_bytes" : 28864
            },
            "_id" : {
              "memory_size_in_bytes" : 8356862248
            },
            "smsSupplier" : {
              "memory_size_in_bytes" : 0
            }
          }
        }
      }
    }
  }
}

As you can see, memory_size_in_bytes had grown to 60% of the JVM heap (the available ceiling), while evictions was 0. Observed over a period of time, the per-field memory sizes never changed. From this we can infer that the cache was in a state where nothing could be effectively evicted.

So the Data too large exception was indeed caused by the fielddata cache size defaulting to unbounded.

And the data itself came from our DSL's sort on _id:

...
"track_total_hits": true,
"sort": [
   {"_id": "asc"}
]
...

The root cause was found: _id is not backed by doc_values, so sorting on it forces ES to build in-heap fielddata for the entire _id column of every shard it queries.

The dilemma now: the search must sort on _id, yet it can no longer execute; the cluster has no fielddata expiry policy configured, so fielddata stays resident in the cache indefinitely. How do we resolve this?

Solution Ideas

1. First, tune the DSL. Our scenario needs ES pagination, which is why search_after is used. There are three candidates: from+size, search_after, and scroll. After testing, from+size and scroll did not meet our needs and were ruled out. That left search_after; could we build the sort from fields other than _id? Unfortunately, that was ruled out as well. So the original DSL must stay; there is no room for optimization.

2. Adjust the production cache setting indices.fielddata.cache.size and the breaker setting indices.breaker.fielddata.limit. The two have to be weighed against each other (the cache size should sit below the breaker limit, so that eviction kicks in before the breaker trips), and changing a production cluster carries some risk. To be evaluated.

3. Add memory. This means a hardware upgrade and affects cluster stability. The heap is already 31G and normal usage is only 50~60%, so considering cost and integration effort it is unnecessary. Ruled out.

4. What else is left? Fielddata cannot expire on its own, but if we could proactively clear the fielddata cache, wouldn't that be a solution too?

It is. The cache can be cleared proactively:

POST index/_cache/clear
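
The clear-cache API can also target fielddata alone, leaving the node query cache and request cache untouched:

POST index/_cache/clear?fielddata=true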

But we also needed to consider: in production, does proactively clearing an index's caches hurt the performance of existing queries?

Checking the official documentation and verifying by test, dropping the existing fielddata cache and sacrificing some query performance is acceptable; it is a trade-off. After proactively clearing the index cache, we verified that the queries ran normally again.

Querying the fielddata stats again:

GET index/_stats/fielddata?fields=_id

The index's fielddata cache has been cleared to zero:

{
  "_shards" : {
    "total" : 12,
    "successful" : 12,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "fielddata" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0,
        "fields" : {
          "_id" : {
            "memory_size_in_bytes" : 0
          }
        }
      }
    },
    "total" : {
      "fielddata" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0,
        "fields" : {
          "_id" : {
            "memory_size_in_bytes" : 0
          }
        }
      }
    }
  },
  "indices" : {
    "index" : {
      "uuid" : "i84CMGeJReWZsNchvMHtPA",
      "primaries" : {
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0,
          "fields" : {
            "_id" : {
              "memory_size_in_bytes" : 0
            }
          }
        }
      },
      "total" : {
        "fielddata" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0,
          "fields" : {
            "_id" : {
              "memory_size_in_bytes" : 0
            }
          }
        }
      }
    }
  }
}

Problem solved!

References

"ElasticSearch: FieldData configuration, as seen from the [FIELDDATA] Data too large error", CSDN blog by 雷雨中的双桅船
