I. ELK Stack Practice Notes
Version used: 6.3.2
1) X-Pack, the ELK Stack authentication plugin
1. Enabling trial authentication
Enable the trial license (30-day trial):
curl -H "Content-Type:application/json" -XPOST http://localhost:9200/_xpack/license/start_trial?acknowledge=true
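To confirm the trial took effect, the license API can be queried (add -u elastic:<password> once the passwords below are set):

curl http://localhost:9200/_xpack/license?pretty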
elasticsearch.yml before the LDAP realm is added:
cluster.name: elkstacktest
node.name: l-elkstack1
path.data: /data/elasticsearch/data
path.logs: /data/elasticsearch/logs
bootstrap.system_call_filter: false
bootstrap.memory_lock: false
network.host: 10.0.32.8
http.port: 9200
http.cors.enabled: true
http.cors.allow-origin: "*"
xpack.security.enabled: true
node.master: true
node.data: true
thread_pool.search.size: 100
thread_pool.search.queue_size: 2000
thread_pool.search.min_queue_size: 2000
thread_pool.search.max_queue_size: 2000
Set passwords for the built-in reserved users:

/usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive
Initiating the setup of passwords for reserved users elastic,apm_system,kibana,logstash_system,beats_system,remote_monitoring_user.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y

Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [apm_system]:
Reenter password for [apm_system]:
Enter password for [kibana]:
Reenter password for [kibana]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Enter password for [remote_monitoring_user]:
Reenter password for [remote_monitoring_user]:

Changed password for user [apm_system]
Changed password for user [kibana]
Changed password for user [logstash_system]
Changed password for user [beats_system]
Changed password for user [remote_monitoring_user]
Changed password for user [elastic]
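A quick sanity check that authentication is now enforced, using the elastic password just set:

curl -u elastic http://10.0.32.8:9200/_cluster/health?pretty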
2) Filebeat, the log collector
1. Collecting logs: filebeat.yml
Regular expression for an IPv4 address:
^((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})(\.((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})){3}
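A quick shell check of the pattern (grep -P for Perl-style \d; the pattern above is anchored with ^ only, so append $ when a full-line match is wanted):

echo "10.0.32.8" | grep -P '^((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})(\.((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})){3}$'   # matches
echo "256.1.1.1" | grep -P '^((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})(\.((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})){3}$'   # no match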
Collecting logs from applications running in Tomcat instances:
############################# Filebeat ######################################
filebeat:
  inputs:
    - paths:
        - "/home/d/www/{{ project }}/logs/*.log"
        - "/home/d/www/{{ project }}/logs/catalina.out"
      exclude_files:
        - "localhost*.log"
      input_type: log
      document_type: {{ project }}
      fields:
        log_topic: {{ project }}
      enabled: True
      multiline:
        pattern: '^DEBUG|^INFO|^WARN|^ERROR|^[0-9]{4}-[0-9]{2}-[0-9]{2}'
        # pattern: '^#\sQuery\s[0-9]{0,}:'   # \s = a space
        match: after
        negate: true
      tail_files: true
      # ignore_older: 4h
  registry_file: /var/lib/filebeat/registry

queue:
  mem:
    events: 8192
    flush.min_events: 10
    flush.timeout: 10s

processors:
  - drop_fields:
      fields: ["beat","offset"]

output:
  kafka:
    enabled: true
    hosts: ["l-kafkacluster1.ops.com:9092","l-kafkacluster2.ops.com:9092","l-kafkacluster3.ops.com:9092","l-kafkacluster4.ops.com:9092","l-kafkacluster5.ops.com:9092"]
    topic: '%{[fields.log_topic]}'
    # partition.round_robin:
    #   reachable_only: false
    # worker: 2

############################# Logging #########################################
logging:
  files:
    path: /var/log/filebeat
    name: filebeat
    rotateeverybytes: 10485760 # = 10MB
    keepfiles: 3
  level: error
Collecting logs from standalone jar applications:
############################# Filebeat ######################################
filebeat:
  inputs:
    - paths:
        - "/home/d/{{ project }}/logs/*.log"
        - "/home/d/www/{{ project }}/logs/*.log"
      input_type: log
      document_type: {{ project }}
      fields:
        log_topic: {{ project }}
      enabled: True
      multiline:
        pattern: '^DEBUG|^INFO|^WARN|^ERROR|^[0-9]{4}-[0-9]{2}-[0-9]{2}'
        match: after
        negate: true
      tail_files: true
  registry_file: /var/lib/filebeat/registry

queue:
  mem:
    events: 8192
    flush.min_events: 10
    flush.timeout: 10s

processors:
  - drop_fields:
      fields: ["beat","offset"]

output:
  kafka:
    enabled: true
    hosts: ["l-kafkacluster1.ops.com:9092","l-kafkacluster2.ops.com:9092","l-kafkacluster3.ops.com:9092","l-kafkacluster4.ops.com:9092","l-kafkacluster5.ops.com:9092"]
    topic: '%{[fields.log_topic]}'
    # partition.round_robin:
    #   reachable_only: false
    # worker: 2

############################# Logging #########################################
logging:
  files:
    path: /var/log/filebeat
    name: filebeat
    rotateeverybytes: 10485760 # = 10MB
    keepfiles: 3
  level: error
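Before restarting Filebeat, the rendered configuration and the Kafka output can be validated with Filebeat's built-in test subcommands:

filebeat test config -c /etc/filebeat/filebeat.yml
filebeat test output -c /etc/filebeat/filebeat.yml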
elasticsearch.yml with LDAP authentication added
Alongside xpack.security.enabled: true, add the block below (the bind password can be left unconfigured):
xpack:
  security:
    authc:
      realms:
        ldap1:
          type: ldap
          order: 0
          url: "ldap://ldap.ops.com"
          bind_dn: "cn=admin,dc=ops,dc=com"
          bind_password: ops.com
          user_search:
            base_dn: "ou=技术中心,dc=ops,dc=com"
            filter: "(cn={0})"
          group_search:
            base_dn: "dc=daling,dc=com"
          unmapped_groups_as_roles: false
2. Pushing role mappings from the Kibana console, or via the ES API, to authorize users
Get the current user's authentication info:

curl -XGET -uwangzhenshan1 '10.0.32.8:9200/_xpack/security/_authenticate?pretty'

Create a superuser mapping (run in Kibana Dev Tools):

POST _xpack/security/role_mapping/mapping1
{
  "roles": [ "superuser" ],
  "enabled": true,
  "rules": {
    "field" : { "username" : "*" }
  },
  "metadata" : { "version" : 1 }
}

DELETE _xpack/security/role_mapping/superuser

Create ordinary users. Note: for users to view indices, the two mappings below must both be created:

DELETE _xpack/security/role_mapping/kibana_users

PUT _xpack/security/role_mapping/kibana_users
{
  "roles" : [ "kibana_user" ],
  "rules" : {
    "field" : { "dn": "*,ou=技术中心,dc=ops,dc=com" }
  },
  "enabled": true
}

PUT _xpack/security/role_mapping/ops_users
{
  "roles" : [ "ops_users" ],
  "rules" : {
    "field" : { "dn": "*,ou=技术中心,dc=ops,dc=com" }
  },
  "enabled": true
}
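The mappings that were pushed can be read back to verify them (Dev Tools):

GET _xpack/security/role_mapping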
3) Kafka, the message queue, and ZooKeeper, its cluster-management tool
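A minimal check that Filebeat's messages are reaching Kafka, assuming the stock Kafka CLI tools on a broker (ZooKeeper on its default port 2181; --zookeeper applies to pre-2.2 Kafka):

kafka-topics.sh --zookeeper l-kafkacluster1.ops.com:2181 --list
kafka-console-consumer.sh --bootstrap-server l-kafkacluster1.ops.com:9092 --topic ng --max-messages 5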
4) Logstash, the data processor
1. Notes on running multiple instances on one host
1. Edit the file /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/awesome_print-1.8.0/lib/awesome_print/inspector.rb
and comment out the line that raises the error.
2. When starting, be sure to specify a distinct --path.data for each instance.
Start an instance like this:
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash-tengine1.conf --path.data=/data/logstash/tengine/tengine1
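A second instance then only needs its own pipeline file and data path (hypothetical names following the same scheme):

/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash-tengine2.conf --path.data=/data/logstash/tengine/tengine2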
3. Managing instances with supervisor
Be sure to set the JAVA_HOME environment variable:
[program:logstash-tengine1]
user=root
environment=JAVA_HOME=/home/d/java/jdk1.8.0_144
command=/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash-tengine1.conf --path.data=/data/logstash/tengine/tengine1
directory=/
autostart=True
autorestart=True
redirect_stderr=True
stopsignal=INT
stopasgroup=True
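After adding the program block, reload supervisor and confirm the process is up (standard supervisorctl usage):

supervisorctl reread
supervisorctl update
supervisorctl status logstash-tengine1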
2. Changing the default configuration files
1. Log-processing pipelines: geolocating client IPs in nginx access logs with the GeoLite2-City database
Logstash pipeline for nginx access logs:
input {
  kafka {
    bootstrap_servers => ["l-kafkacluster1.ops.com:9092,l-kafkacluster2.ops.com:9092,l-kafkacluster3.ops.com:9092,l-kafkacluster4.ops.com:9092,l-kafkacluster5.ops.com:9092"]
    group_id => "elkstack-ali"
    topics => ["ng"]
    auto_offset_reset => "latest"
    decorate_events => true
    consumer_threads => 3            # number of consumer worker threads
    auto_commit_interval_ms => "300"
    codec => json
  }
}

filter {
  grok {
    match => { "message" => '%{IPV4:clientip} - - \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:%{NUMBER:bytes:int}|-) "(?<referrer>.*?)" %{QS:agent} "(?:%{IPV4:forwarded}|-)" "%{BASE10NUM:request_duration_front:float}" "(%{BASE10NUM:request_duration_back:float}|.*?)" "%{IPORHOST:domain}" "(%{IPORHOST:upstream_host}:%{POSINT:upstream_port}|.*?)" "(?<clientid>.*)" "(?<platform>.*)" "(?<IDFA>.*)" "(?<uid>.*)" "(?<version>.*)" "(?<xcrole>.*)" "(?<bundle>.*)" "(?<net>.*)" "(?<ut>.*)" "(?<app>.*)"' }
    overwrite => ["message"]
  }
  geoip {
    source => "clientip"
    database => "/etc/logstash/GeoLite2-City/GeoLite2-City.mmdb"
  }
  if [request] {
    ruby {
      init => "@kname = ['uri','url_args']"
      code => "
        new_event = LogStash::Event.new(Hash[@kname.zip(event.get('request').split('?'))])
        new_event.remove('@timestamp')
        event.append(new_event)
      "
    }
  }
  # date {
  #   timezone => "Asia/Shanghai"
  #   match => ["timestamp", "yyyy-MM-dd HH:mm:ss.SSS"]
  #   target => "@timestamp"
  # }
  # mutate {
  #   remove_field => ["@timestamp","fields.@timestamp"]
  # }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["elastic1.ops.com:9200"]
    index => "logstash-tengine34-access-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "changeme"
    template_overwrite => true
  }
}
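With the GeoLite2-City database, the geoip filter enriches each event with fields like these (an illustrative rubydebug excerpt, not captured output):

"geoip" => {
    "country_name" => "China",
    "city_name"    => "Beijing",
    "location"     => { "lat" => 39.9, "lon" => 116.4 }
}

Because the index name matches logstash-*, the default index template maps geoip.location as a geo_point, so Kibana map visualizations can use it directly.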
Handling garbled characters in logs from nginx versions below 1.11.8. Such nginx versions escape non-ASCII bytes in access logs as \xHH sequences; the ruby filters below decode them back to UTF-8:
input {
  kafka {
    bootstrap_servers => ["l-kafkacluster1.ops.bj2.daling.com:9092,l-kafkacluster2.ops.bj2.daling.com:9092,l-kafkacluster3.ops.bj2.daling.com:9092,l-kafkacluster4.ops.bj2.daling.com:9092,l-kafkacluster5.ops.bj2.daling.com:9092"]
    group_id => "elkstack-aliyun"
    topics => ["errorpush"]
    auto_offset_reset => "latest"
    decorate_events => true
    consumer_threads => 5
    auto_commit_interval_ms => "2000"
    codec => json
    # { escaped_hex_codes => true }
    # codec => plain { charset => "UTF-8" }
  }
}

filter {
  grok {
    match => { "message" => '%{IPV4:clientip} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} "(?<uid>.*)" "(?<platform>.*)" "(?<appversion>.*)" "(?<eventType>.*)" "(?<apiName>.*)" "(?<method>.*)" "(?<costTime>.*)" "(?<fullRequest>.*)" "(?<resDetail>.*)"' }
    overwrite => ["message"]
  }
  if [apiName] {
    ruby {
      code => "
        tempapiName = event.get('apiName').gsub(/\\x/,'%').gsub!(/%[a-fA-F0-9]{2}/) { |x| x = x[1..2].hex.chr }
        event.set('apiName',tempapiName)
      "
    }
  }
  if [resDetail] {
    ruby {
      code => "
        tempresDetail = event.get('resDetail').gsub(/\\x/,'%').gsub!(/%[a-fA-F0-9]{2}/) { |x| x = x[1..2].hex.chr }
        event.set('resDetail',tempresDetail)
      "
    }
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["elastic1.ops.bj3.daling.com:9200"]
    index => "logstash-errorpush-%{+YYYY.MM.dd}"
    template_overwrite => true
    # codec => plain { charset => "UTF-8" }
    user => "elastic"
    password => "Daling@Com"
  }
}
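A worked example of what the ruby decoding does, runnable in plain Ruby (the \xE4\xB8\xAD bytes are the UTF-8 encoding of 中):

s = "api \\xE4\\xB8\\xAD"                # the escaped value as it appears in the raw log
decoded = s.gsub(/\\x/, '%').gsub(/%[a-fA-F0-9]{2}/) { |m| m[1..2].hex.chr }
puts decoded.force_encoding('UTF-8')     # => api 中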
When a log directory holds several log files, split the source path to derive fields:
input {
  kafka {
    bootstrap_servers => ["l-kafkacluster1.ops.com:9092,l-kafkacluster2.ops.com:9092,l-kafkacluster3.ops.com:9092,l-kafkacluster4.ops.com:9092,l-kafkacluster5.ops.com:9092"]
    group_id => "elkstack-ali"
    topics => ["xcsale"]
    auto_offset_reset => "latest"
    decorate_events => true
    consumer_threads => 5
    auto_commit_interval_ms => "300"
    codec => json {}
  }
}

filter {
  mutate {
    split => ["source","/"]
    add_field => {
      "log_project"  => "%{[source][-3]}"
      "log_filename" => "%{[source][-1]}"
    }
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["elastic1.ops.com:9200"]
    index => "logstash-%{log_project}-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "changeme"
  }
}
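For example, a hypothetical source of "/home/d/www/myproject/logs/app.log" splits into
["", "home", "d", "www", "myproject", "logs", "app.log"]
so %{[source][-3]} resolves to "myproject" and %{[source][-1]} to "app.log".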
Logstash consuming multiple Kafka topics and producing a separate index per topic:
input {
  kafka {
    bootstrap_servers => "kafka-01:9092,kafka-02:9092,kafka-03:9092"
    topics_pattern => "elk-.*"
    consumer_threads => 5
    decorate_events => true
    codec => "json"
    auto_offset_reset => "latest"
    group_id => "logstash1"   # must be identical across a logstash cluster
  }
}

filter {
  ruby {
    code => "event.timestamp.time.localtime"
  }
  mutate {
    remove_field => ["beat"]
  }
  grok {
    match => { "message" => "\[(?<time>\d+-\d+-\d+\s\d+:\d+:\d+)\] \[(?<level>\w+)\] (?<thread>[\w|-]+) (?<class>[\w|\.]+) (?<lineNum>\d+):(?<msg>.+)" }
  }
}

output {
  elasticsearch {
    hosts => ["192.168.16.221:9200","192.168.16.251:9200","192.168.16.252:9200"]
    # index => "%{[fields][logtopic]}"   # match the topic field carried in the log itself; this drops the elk- prefix from the index name
    index => "%{[@metadata][topic]}-%{+YYYY-MM-dd}"
  }
  stdout { codec => rubydebug }
}
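Any of the pipeline files above can be syntax-checked before deployment with Logstash's stock flag:

/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash-tengine1.conf --config.test_and_exit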
2. Changing the logstash.yml defaults
The main changes are:
# This defaults to the number of the host's CPU cores.
pipeline.workers: 20
pipeline.output.workers: 10
#
# How many events to retrieve from inputs before sending to filters+workers
#
pipeline.batch.size: 500
#
# How long to wait in milliseconds while polling for the next event
# before dispatching an undersized batch to filters+outputs
#
pipeline.batch.delay: 50
3. jvm.options
-Xms90g
-Xmx90g
5) Elasticsearch, the inverted-index data store
1. The elasticsearch.yml configuration file
The main settings are as follows:
cluster.name: escluster-b
node.name: l-esb1
path.data: /data/elasticsearch/data
path.logs: /data/elasticsearch/logs
bootstrap.memory_lock: true
bootstrap.system_call_filter: false
network.host: 10.0.26.63
http.port: 9200
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping_timeout: 60s
discovery.zen.ping.unicast.hosts: ["10.0.26.63", "10.0.27.35", "10.0.27.65"]
thread_pool.search.size: 200
thread_pool.search.queue_size: 2000
thread_pool.search.min_queue_size: 2000
thread_pool.search.max_queue_size: 2000
thread_pool.bulk.queue_size: 1000
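Thread-pool pressure can then be watched via the _cat API; a climbing rejected count means the queues above are saturating:

curl -u elastic '10.0.26.63:9200/_cat/thread_pool/search?v&h=node_name,name,active,queue,rejected'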
Tuning parameter for the Aliyun ES cluster to improve write throughput:
thread_pool:
  write:
    queue_size: '5000'
6) Kibana, the query console
Change Kibana's query timeout.
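One way to do this, assuming the stock kibana.yml location, is to raise elasticsearch.requestTimeout (in milliseconds; the default is 30000):

# kibana.yml
# wait up to 90s for Elasticsearch responses before Kibana reports a timeout
elasticsearch.requestTimeout: 90000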