After getting the ELK stack installed and running, I was close to despair: even with 16 GB of RAM and 16 CPU cores it kept throwing errors.


1. logstash and elasticsearch throwing errors at the same time

logstash threw a flood of errors, most likely because elasticsearch was using too much heap and had not been tuned:

retrying failed action with response code: 503 {:level=>:warn}

too many attempts at sending event. dropping: 2016-06-16T05:44:54.464Z %{host} %{message} {:level=>:error}


elasticsearch also threw a flood of errors:

too many open files


The cause is that this value is too small: "max_file_descriptors" : 2048. Check it with:


# curl http://localhost:9200/_nodes/process\?pretty

{

  "cluster_name" : "elasticsearch",

  "nodes" : {

    "ZLgPzMqBRoyDFvxoy27Lfg" : {

      "name" : "Mass Master",

      "transport_address" : "inet[/192.168.153.200:9301]",

      "host" : "localhost",

      "ip" : "127.0.0.1",

      "version" : "1.6.0",

      "build" : "cdd3ac4",

      "http_address" : "inet[/192.168.153.200:9200]",

      "process" : {

        "refresh_interval_in_millis" : 1000,

        "id" : 943,

        "max_file_descriptors" : 2048,

        "mlockall" : true
      }
    }
  }
}




Solution:

Raise the open-file limit:

# ulimit -n 65535


To make the setting persistent, also add it to /etc/profile:

# vi /etc/profile


Then add the same line to the es startup script and restart elasticsearch:

# vi /home/elk/elasticsearch-1.6.0/bin/elasticsearch

ulimit -n 65535
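Changes made in shell profiles and startup scripts only cover the processes launched through them; an alternative sketch (assuming elasticsearch runs as a dedicated user named elk — adjust the user name to your setup) is to raise the limit system-wide in /etc/security/limits.conf:

```
# /etc/security/limits.conf — hypothetical entries; assumes the
# elasticsearch process runs as user "elk".
# nofile = maximum number of open file descriptors
elk    soft    nofile    65535
elk    hard    nofile    65535
```

With pam_limits enabled this applies to every session of that user and survives reboots without touching the startup script.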


Verify the new limit:

# curl http://localhost:9200/_nodes/process\?pretty

{

  "cluster_name" : "elasticsearch",

  "nodes" : {

    "_QXVsjL9QOGMD13Eb6t7Ag" : {

      "name" : "Ocean",

      "transport_address" : "inet[/192.168.153.200:9301]",

      "host" : "localhost",

      "ip" : "127.0.0.1",

      "version" : "1.6.0",

      "build" : "cdd3ac4",

      "http_address" : "inet[/192.168.153.200:9200]",

      "process" : {

        "refresh_interval_in_millis" : 1000,

        "id" : 1693,

        "max_file_descriptors" : 65535,

        "mlockall" : true

      }

    }
  }
}



2. out of memory errors


The tuned es configuration:

# egrep -v '^$|^#' /home/elk/elasticsearch-1.6.0/config/elasticsearch.yml 

bootstrap.mlockall: true

http.max_content_length: 2000mb

http.compression: true

index.cache.field.type: soft

index.cache.field.max_size: 50000

index.cache.field.expire: 10m



For bootstrap.mlockall: true to take effect, you also need to set:

# ulimit -l unlimited


# vi /etc/sysctl.conf

vm.max_map_count=262144

vm.swappiness = 1
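Edits to /etc/sysctl.conf do not take effect until the file is reloaded (with `sysctl -p`, as root). A quick read-only sketch of checking the values the kernel is actually using, assuming a Linux host:

```shell
# After editing /etc/sysctl.conf, apply it as root with: sysctl -p
# Read back the live values from /proc (no root needed):
cat /proc/sys/vm/max_map_count
cat /proc/sys/vm/swappiness
```

If the values printed here don't match /etc/sysctl.conf, the file was never reloaded.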


The limits now look like this:

# ulimit -a

core file size          (blocks, -c) 0

data seg size           (kbytes, -d) unlimited

scheduling priority             (-e) 0

file size               (blocks, -f) unlimited

pending signals                 (-i) 127447

max locked memory       (kbytes, -l) unlimited

max memory size         (kbytes, -m) unlimited

open files                      (-n) 65535

pipe size            (512 bytes, -p) 8

POSIX message queues     (bytes, -q) 819200

real-time priority              (-r) 0

stack size              (kbytes, -s) 10240

cpu time               (seconds, -t) unlimited

max user processes              (-u) 127447

virtual memory          (kbytes, -v) unlimited

file locks                      (-x) unlimited



Also raise the per-user process limit:

# vi /etc/security/limits.d/90-nproc.conf

*          soft    nproc     320000

root       soft    nproc     unlimited



3. es cluster status is yellow

es reports cluster health with three colors: green, yellow, and red.

green: all primary and replica shards are allocated

yellow: all primary shards are allocated, but not all replica shards are

red: not all primary shards are allocated


# curl -XGET http://localhost:9200/_cluster/health\?pretty

{

  "cluster_name" : "elasticsearch",

  "status" : "yellow",

  "timed_out" : false,

  "number_of_nodes" : 2,

  "number_of_data_nodes" : 1,

  "active_primary_shards" : 161,

  "active_shards" : 161,

  "relocating_shards" : 0,

  "initializing_shards" : 0,

  "unassigned_shards" : 161,

  "number_of_pending_tasks" : 0,

  "number_of_in_flight_fetch" : 0
}


Solution: set up an elasticsearch cluster (to be covered in the next post).
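Until a second data node joins, a common single-node workaround (a sketch, not from this post) is to set the replica count to 0 so the unassigned replica shards disappear and the cluster turns green:

```shell
# Hypothetical workaround for a single-node cluster: disable replicas
# on all existing indices so no replica shards remain unassigned.
curl -XPUT 'http://localhost:9200/_settings' -d '{
  "index" : { "number_of_replicas" : 0 }
}'
```

This trades redundancy for a green status, so it only makes sense while the cluster genuinely has one data node.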



4. kibana "not indexed" error

https://rafaelmt.net/en/2015/09/01/kibana-tutorial/#refresh-fields

kibana's index field list changes frequently as new events arrive, so kibana visualizations sometimes show a "not indexed" error:


Solution:

Open kibana, go to Settings, click Indices, select logstash-*, and click the refresh icon to reload the field list.