Elasticsearch field data memory control

While going through the elasticsearch logs today I noticed a lot of errors like the following:

breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [_type] would be larger than limit of [250895009/2.3gb].

Seeing this error, my first reaction was that we had run out of memory. After looking into it, that turned out to be true, except that the memory in question is the field data cache. Field data is loaded into memory when a query sorts or aggregates on a field, so that the field's values can be read quickly. When the data volume grows beyond what the allocated cache can hold, this error is thrown.
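
To make that concrete, here is a sketch of the kind of request that forces field data to be loaded; the index name my_index and the field user_id are made up for illustration:

# Hypothetical example: both the sort and the terms aggregation load
# the values of user_id into the fielddata cache.
curl -XGET 'http://localhost:9200/my_index/_search?pretty' -d '{
  "query": { "match_all": {} },
  "sort": [ { "user_id": { "order": "desc" } } ],
  "aggs": {
    "user_counts": { "terms": { "field": "user_id" } }
  }
}'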

The size of this cache can be controlled with settings; the official documentation describes them as follows:

indices.fielddata.cache.size
The max size of the field data cache, eg 30% of node heap space, or an absolute value, eg 12GB. Defaults to unbounded.

indices.fielddata.cache.expire
[experimental] This functionality is experimental and may be changed or removed completely in a future release. A time based setting that expires field data after a certain time of inactivity. Defaults to -1. For example, can be set to 5m for a 5 minute expiry.
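
As a quick illustration, these can be set in elasticsearch.yml; the values below are examples only, not recommendations:

# elasticsearch.yml (example values only)
indices.fielddata.cache.size: 30%      # cap the fielddata cache at 30% of the node heap
indices.fielddata.cache.expire: 5m     # evict entries idle for 5 minutes (experimental)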


Even with a cache size configured, memory can still run out. To avoid an OutOfMemoryError, elasticsearch provides a mechanism called the Circuit Breaker. My understanding is that, when a query runs, elasticsearch estimates how much memory it will need according to certain rules; if the estimate exceeds the configured limit, the query fails immediately with an error rather than risking an OutOfMemoryError.

This, too, can be controlled through settings.

There are two circuit breakers: the field data circuit breaker and the request circuit breaker. I have only ever run into the first one, never the second.

Overall limit:

indices.breaker.total.limit
Starting limit for overall parent breaker, defaults to 70% of JVM heap

field data circuit breaker:

indices.breaker.fielddata.limit
Limit for fielddata breaker, defaults to 60% of JVM heap

indices.breaker.fielddata.overhead
A constant that all field data estimations are multiplied with to determine a final estimation. Defaults to 1.03

request circuit breaker:

indices.breaker.request.limit
Limit for request breaker, defaults to 40% of JVM heap

indices.breaker.request.overhead
A constant that all request estimations are multiplied with to determine a final estimation. Defaults to 1
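
Written out in elasticsearch.yml with the defaults made explicit (example only, not a recommendation):

# elasticsearch.yml (defaults shown explicitly; example only)
indices.breaker.total.limit: 70%          # overall parent breaker
indices.breaker.fielddata.limit: 60%      # field data breaker
indices.breaker.fielddata.overhead: 1.03  # multiplier applied to fielddata estimations
indices.breaker.request.limit: 40%        # request breaker
indices.breaker.request.overhead: 1       # multiplier applied to request estimations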

Those are the main settings. In most cases the defaults are enough, but if you are handling a large amount of data you may want to adjust them.

Besides setting them in elasticsearch.yml, they can also be adjusted dynamically:

curl -XPUT localhost:9200/_cluster/settings -d '{
  "persistent" : {
    "indices.breaker.fielddata.limit" : "40%" 
  }
}'
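
To check that the change took effect, the cluster settings can be read back:

curl -XGET 'http://localhost:9200/_cluster/settings?pretty'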

In addition, if you need to monitor the nodes, you can check their stats with the following commands.

curl -XGET 'http://localhost:9200/_nodes/stats'
curl -XGET 'http://localhost:9200/_nodes/nodeId1,nodeId2/stats'

The official documentation describes it as follows:

By default, all stats are returned. You can limit this by combining any of indices, os, process, jvm, network, transport, http, fs, breaker and thread_pool. For example:

indices

Indices stats about size, document count, indexing and deletion times, search times, field cache size, merges and flushes

fs

File system information, data path, free disk space, read/write stats

http

HTTP connection information

jvm

JVM stats, memory pool information, garbage collection, buffer pools

network

TCP information

os

Operating system stats, load average, cpu, mem, swap

process

Process statistics, memory consumption, cpu usage, open file descriptors

thread_pool

Statistics about each thread pool, including current size, queue and rejected tasks

transport

Transport statistics about sent and received bytes in cluster communication

breaker

Statistics about the field data circuit breaker

# return just os
curl -XGET 'http://localhost:9200/_nodes/stats/os'
# return just os and process
curl -XGET 'http://localhost:9200/_nodes/stats/os,process'
# specific type endpoint
curl -XGET 'http://localhost:9200/_nodes/stats/process'
curl -XGET 'http://localhost:9200/_nodes/10.0.0.1/stats/process'

The all flag can be set to return all the stats.

Field data statistics

You can get information about field data memory usage on node level or on index level.

# Node Stats
curl -XGET 'http://localhost:9200/_nodes/stats/indices/?fields=field1,field2&pretty'

# Indices Stat
curl -XGET 'http://localhost:9200/_stats/fielddata/?fields=field1,field2&pretty'

# You can use wildcards for field names
curl -XGET 'http://localhost:9200/_stats/fielddata/?fields=field*&pretty'
curl -XGET 'http://localhost:9200/_nodes/stats/indices/?fields=field*&pretty'
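
Since breaker is one of the metrics listed above, the circuit breaker statistics can be pulled the same way:

# Circuit breaker statistics per node
curl -XGET 'http://localhost:9200/_nodes/stats/breaker?pretty'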

Search groups

You can get statistics about search groups for searches executed on this node.

# All groups with all stats
curl -XGET 'http://localhost:9200/_nodes/stats?pretty&groups=_all'

# Some groups from just the indices stats
curl -XGET 'http://localhost:9200/_nodes/stats/indices?pretty&groups=foo,bar'


Finally, a few closing notes.

Everything described above loads field data entirely into memory. Elasticsearch 1.x introduced a new option:

doc_values: doc_values is a feature introduced in Elasticsearch 1.0. For a field that enables it, fielddata is built on disk when documents are indexed, whereas previously fielddata could only live in memory. As requests cover more and more data, the in-memory approach easily triggers an OOM or a circuit breaker error such as:

ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException: Data too large, data for field [@timestamp] would be larger than limit of [639015321/609.4mb]]

doc_values can only be enabled on fields that are not analyzed (for string fields that means setting "index": "not_analyzed"; numeric and date fields are not analyzed by default).
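
As a sketch, a mapping that enables doc_values on a not_analyzed string field might look like this (the index test_index, the type logs, and the field names are made up for illustration):

# Hypothetical mapping: fielddata for these fields is built on disk at index time
curl -XPUT 'http://localhost:9200/test_index' -d '{
  "mappings": {
    "logs": {
      "properties": {
        "user":       { "type": "string", "index": "not_analyzed", "doc_values": true },
        "@timestamp": { "type": "date",   "doc_values": true }
      }
    }
  }
}'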

Although doc_values live on disk, the operating system's VFS cache means performance is not bad at all. According to the official benchmarks, after the optimizations in 1.4 it is only about 15% slower than in-memory fielddata, so with large data volumes enabling it is strongly recommended.
I have not really used it yet, though; as long as the first approach meets my needs I probably will not switch to the second, since speed comes first after all.