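Introduction:
Because Elasticsearch uses TTL, cluster nodes can run out of memory (OOM) when an index holds a very large number of documents and many of them expire at the same time. This post records the symptom, the analysis of the problem, and the steps taken to handle it.
1. Symptoms
Today our ES cluster hit an OOM. The ES version is 2.1.2, and the logs are as follows: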
[2017-06-21 11:10:12,250][WARN ][monitor.jvm ] [dm_172.23.41.93:20002] [gc][young][535][56] duration [1.4s], collections [1]/[2.3s], total [1.4s]/[10.6s], memory [9.5gb]->[8.5gb]/[15.8gb], all_pools {[young] [1.3gb]->[26.5mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [8gb]->[8.3gb]/[14.1gb]}
[2017-06-21 11:14:06,057][WARN ][monitor.jvm ] [dm_172.23.41.93:20002] [gc][young][766][83] duration [2.1s], collections [1]/[3.1s], total [2.1s]/[15.7s], memory [12.6gb]->[11.7gb]/[15.8gb], all_pools {[young] [1.3gb]->[30.1mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [11.1gb]->[11.5gb]/[14.1gb]}
[2017-06-21 11:18:20,668][WARN ][monitor.jvm ] [dm_172.23.41.93:20002] [gc][old][985][17] duration [35.9s], collections [1]/[36.2s], total [35.9s]/[38.1s], memory [15.6gb]->[14.2gb]/[15.8gb], all_pools {[young] [1.4gb]->[96.6mb]/[1.4gb]}{[survivor] [148mb]->[0b]/[191.3mb]}{[old] [14gb]->[14.1gb]/[14.1gb]}
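Memory grew very quickly: the 16 GB heap filled up within a few minutes of startup and could not be reclaimed. Restarting did not help; the node soon hit OOM again. After repeated restarts and repeated OOMs, it looked like the cluster could not recover.
Looking at the indices, one index holds 1,283,152,933 documents, with a total primary shard size of 996.1 GB. The index mapping is as follows: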
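To make the trend in the GC log easier to see, the old-generation figures can be pulled out of each `monitor.jvm` line. The following is a minimal sketch, assuming the log format matches the three lines shown above. The function names (`to_gb`, `old_gen_after_gc`) are hypothetical helpers, and the sample lines are shortened to the timestamp plus the `{[old] ...}` section.

```python
import re

# Matches the old-generation pool section of a monitor.jvm line, e.g.
# {[old] [8gb]->[8.3gb]/[14.1gb]}  (before -> after / pool maximum)
OLD_POOL = re.compile(
    r"\{\[old\] \[([\d.]+)(b|kb|mb|gb)\]->"
    r"\[([\d.]+)(b|kb|mb|gb)\]/\[([\d.]+)(b|kb|mb|gb)\]\}"
)
TIMESTAMP = re.compile(r"^\[([^\]]+)\]")

UNIT_TO_GB = {"b": 1 / 1024**3, "kb": 1 / 1024**2, "mb": 1 / 1024, "gb": 1.0}


def to_gb(value, unit):
    """Convert a size such as ('191.3', 'mb') to gigabytes."""
    return float(value) * UNIT_TO_GB[unit]


def old_gen_after_gc(line):
    """Return (used_after_gc_gb, pool_max_gb) for the old pool, or None."""
    m = OLD_POOL.search(line)
    if m is None:
        return None
    return to_gb(m.group(3), m.group(4)), to_gb(m.group(5), m.group(6))


# Shortened copies of the three log lines above (assumption: the middle
# of each line is omitted here, only the timestamp and old pool are kept).
SAMPLE_LINES = [
    "[2017-06-21 11:10:12,250] {[old] [8gb]->[8.3gb]/[14.1gb]}",
    "[2017-06-21 11:14:06,057] {[old] [11.1gb]->[11.5gb]/[14.1gb]}",
    "[2017-06-21 11:18:20,668] {[old] [14gb]->[14.1gb]/[14.1gb]}",
]

if __name__ == "__main__":
    for line in SAMPLE_LINES:
        stamp = TIMESTAMP.match(line).group(1)
        used, limit = old_gen_after_gc(line)
        print(f"{stamp}  old gen after GC: {used:.1f} / {limit:.1f} GB "
              f"({used / limit:.0%})")
```

Read this way, the log shows the old generation still holding 8.3 GB, then 11.5 GB, then the full 14.1 GB after each collection, within roughly eight minutes. Even the 35.9 s old-generation collection frees nothing, which matches the observation that the memory cannot be reclaimed.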
{
"cluster_name": "es-302",
"metadata": {
"indices": {
"msg_center_log": {
"settings": {
"index": {
"number_of_shards": "5",
"creation_date": "1490263243013",
"analysis": {
"char_filter": {
"extend_to_space": {
"mappings": [
"\"extend1\":\"=>\\b",
"\",\"venderId\":\"=>\\b",
"\",\"extend2\":\"=>\\b",
"\",\"type\"=>\\b"
],
"type": "mapping"
}
},
"analyzer": {
"my_analyzer": {
"filter": "lower