Elasticsearch
- Performance tuning (what changes when a field type is converted? performance impact?)
- Upgrading (the official site has videos, but they are in English)
- Data export: test https://github.com/taskrabbit/elasticsearch-dump
Logstash
- grok filter rules
- forwarding data to various different targets
- collecting data on Windows (IIS, System) with nxlog
Kibana
- Visualizations
Beats
- Collecting different types of logs with the Filebeat component
- Using the other Beats components
Overall
- Try an end-to-end deployment as a full project
- Naming of keys and indices from log collection through display, and the pitfalls involved
- GitHub uses Elasticsearch to query 130 billion lines of code
- Wikipedia uses Elasticsearch to provide full-text search with highlighted snippets
Filebeat: log and file data collection
Packetbeat: network data collection
Metricbeat: system and service metrics collection
Winlogbeat: Windows event log collection
Heartbeat: service availability monitoring (against a list of URLs)
These days Filebeat, one of the Beats components, is widely used instead of Logstash for client-side data collection. This section is excerpted from Zhihu.
- Compared with Logstash, Filebeat is:
very lightweight
single-purpose: it only collects and ships data
higher data-collection performance than Logstash
written in Go, as a replacement for logstash-forwarder
- Compared with Filebeat, Logstash is:
very heavy
multi-functional: it receives, processes, and outputs data
lower data-collection performance than Filebeat
written in Java, with plugins running on JRuby
Elasticsearch Deployment
Base environment setup
yum install -y java-1.8.0-openjdk    #OpenJDK 1.8
#append to /etc/security/limits.conf
echo " " >> /etc/security/limits.conf
echo "#elasticsearch memlock dinghe add 20170828 " >> /etc/security/limits.conf
echo "elasticsearch soft memlock unlimited" >> /etc/security/limits.conf
echo "elasticsearch hard memlock unlimited" >> /etc/security/limits.conf
Configure file descriptors
echo " " >> /etc/security/limits.conf
echo "#limit dinghe add 20170828 " >> /etc/security/limits.conf
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf
echo "#elasticsearch inti dinghe add 20170828" >> /etc/sysctl.conf
echo "vm.max_map_count = 262144" >> /etc/sysctl.conf
ELK installation (RPM install, strongly recommended)
rpm -ivh elasticsearch-5.5.0.rpm
#change the user's shell
usermod elasticsearch -s /bin/bash
#With the 5.5 RPM install, the Java environment variable may not be found (Oracle Java not found) and the memlock settings may not take effect; changing the user's shell, or using OpenJDK, fixes this
Configure Elasticsearch
mkdir /data/es-data
#for multiple disks: mkdir /data/es01 /data/es02
chown -R elasticsearch. /data/es-data
Edit the configuration
vim /etc/elasticsearch/elasticsearch.yml
cluster.name: es-cluster
node.name: master-1
path.data: /data/es-data/
network.host: 192.168.0.232
http.port: 9200
discovery.zen.ping.unicast.hosts: ["192.168.0.232", "192.168.0.231"]
Annotated configuration
#cluster name
cluster.name: es-cluster
#this node's name
node.name: master-1
#data directory
path.data: /data/es-data/
#log directory
path.logs: /app/logs/es
#bind IP
network.host: 192.168.0.232
#HTTP service port (9300 is the transport port used for cluster elections)
http.port: 9200
#discover cluster nodes via unicast, avoiding discovery failures caused by network jitter or cloud-provider network restrictions
discovery.zen.ping.unicast.hosts: ["192.168.0.232", "192.168.0.231"]
#tarball install startup: /app/es/bin/elasticsearch
systemctl restart elasticsearch
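After the restart, a quick sanity check over HTTP (using the node IP configured above):
curl http://192.168.0.232:9200/
#cluster health: status should be green or yellow
curl http://192.168.0.232:9200/_cluster/health?pretty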
Install the elasticsearch-head plugin
Search for "elasticsearch head" among the Chrome extensions
http://pan.baidu.com/s/1slSDvv3
cd /opt
git clone git://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head/
npm install grunt-cli --registry=https://registry.npm.taobao.org
#Note: grunt is a JavaScript build tool
npm install --registry=https://registry.npm.taobao.org
#the install tries to download phantomjs-2.1.1-linux-x86_64.tar.bz2; you can cancel that download
npm run start
Access the elasticsearch-head plugin
open http://localhost:9100/
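Note: the head plugin talks to ES directly from the browser, so on 5.x CORS usually also has to be enabled in elasticsearch.yml (restart ES afterwards); a minimal sketch:
http.cors.enabled: true
http.cors.allow-origin: "*"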
Error 1 (RPM handling): elasticsearch 5.x which: no java in
- Installing JDK 1.8.121 did not work; installing OpenJDK 1.8.131 succeeded
- Alternatively, change the elasticsearch user's shell: usermod elasticsearch -s /bin/bash
Error 2: bootstrap.memory_lock: true makes Elasticsearch fail to start (not fixed by the RPM install alone)
...(omitted)
# allow user 'elasticsearch' mlockall
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
...(omitted)
ERROR: bootstrap checks failed
memory locking requested for elasticsearch process but memory is not locked
...(omitted)
Fix:
#append to /etc/security/limits.conf
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
Error 3: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
echo "vm.max_map_count = 262144" >> /etc/sysctl.conf
Error 4: Elasticsearch 5.2 fails to start on CentOS 6.x
Error message:
ERROR: bootstrap checks failed
system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
Cause:
CentOS 6 does not support seccomp, while ES 5.2.0 defaults bootstrap.system_call_filter to true and checks it at startup; the failed check prevents ES from starting.
Fix:
Set bootstrap.system_call_filter to false in elasticsearch.yml, below the Memory settings:
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
See the related issue:
https://github.com/elastic/elasticsearch/issues/22
Error message
max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
max number of threads [1024] for user [elasticsearch] is too low, increase to at least [2048]
Fix
Edit the limits.conf file
#vim /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
#On CentOS 6.x, also edit /etc/security/limits.d/90-nproc.conf
change "soft nproc 1024" to "soft nproc 2048"
sysctl -p    #applies the sysctl.conf changes; the limits.conf changes take effect at the next login
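Verify from a fresh login session (sysctl -p does not apply limits.conf):
ulimit -n    #open files, expect 65536
ulimit -u    #max user processes, expect >= 2048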
Logstash Deployment
Base environment setup
Aliyun YUM repo configuration (omitted)
System-wide JDK 1.8 deployment
yum install -y java-1.8.0-openjdk
rpm -ivh /opt/logstash-5.5.2.rpm
#Logstash home directory
/usr/share/logstash/
Test Logstash
/usr/share/logstash/bin/logstash -e 'input {stdin {}} output {stdout{ }}'
...
13:46:31.584 [Api Webserver] INFO logstash.agent - Successfully started Logstash API endpoint {:port=>9600}
Type something; Logstash adds a timestamp and prints the event
ding
2017-08-29T05:46:35.339Z web-log.prod.ding ding
Note: the timestamp is UTC, not UTC+8; the browser converts it automatically
/usr/share/logstash/bin/logstash -e 'input {stdin {}} output {file{path => "/tmp/logstash-test-%{+YYYY.MM.dd}.log.tar.gz" gzip => true}}'
After typing some input, check the files under /tmp
/usr/share/logstash/bin/logstash -e 'input {stdin {}} output {elasticsearch{hosts=>["192.168.0.231:9200"] index => "logstash-test-%{+YYYY.MM.dd}"}}'
After typing some input, check Elasticsearch
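To confirm the index was created (assuming the ES node used in the command above):
curl 'http://192.168.0.231:9200/_cat/indices?v'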
Kibana Deployment
Base environment setup
Aliyun YUM repo configuration (omitted)
System-wide JDK 1.8 deployment
yum install -y java-1.8.0-openjdk
#Kibana's bin path is /usr/share/kibana/bin
rpm -ivh /opt/kibana-5.5.2-x86_64.rpm
vim /etc/kibana/kibana.yml
#set the local bind address and the Elasticsearch address and port
server.port: 5601
server.host: "192.168.0.230"
elasticsearch.url: "http://192.168.0.231:9200"
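Then start the service (the RPM ships a kibana systemd unit):
systemctl restart kibana
systemctl enable kibana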
http://192.168.0.230:5601/status
Add the index pattern
#fill in the index according to the Logstash deployment section above
logstash-test-*
X-Pack Deployment
X-Pack plugin installation
X-Pack can be installed on both Elasticsearch and Kibana; it enables basic authentication (username/password) across the stack
Once Kibana has basic authentication enabled, ES enables it as well
/usr/share/kibana/bin/kibana-plugin install x-pack
Or download the package to /opt and install from the local file
/usr/share/kibana/bin/kibana-plugin install file:///opt/x-pack-5.5.0.zip
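Elasticsearch is installed the same way with its own plugin tool; restart both services afterwards. On 5.x the default superuser is elastic with password changeme:
/usr/share/elasticsearch/bin/elasticsearch-plugin install x-pack
#or from the local file
/usr/share/elasticsearch/bin/elasticsearch-plugin install file:///opt/x-pack-5.5.0.zip
systemctl restart elasticsearch
systemctl restart kibana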
Filebeat Deployment
Install
rpm -ivh /tmp/filebeat-5.5.2-x86_64.rpm
Example configuration
filebeat.prospectors:
- input_type: log
#files to collect
paths:
- /var/log/messages
#drop lines starting with DBG and blank lines
exclude_lines: ["^DBG","^$"]
#document_type: abc
fields:
type: sys
fields_under_root: true
exclude_files: [".gz$"]
#output to a local file
output.file:
path: "/tmp/filebeat"
filename: filebeat
Note: the document_type option is deprecated as of 5.5 and removed entirely in 6.0; use fields instead, with fields_under_root to overwrite the original type field
Notes
#define custom fields
fields:
#field name and value
type: ddd
#whether a custom field overwrites an existing field with the same name
fields_under_root: true
Startup warning
WARN DEPRECATED: document_type is deprecated. Use fields instead.
Advanced ELK Configuration
Elasticsearch in Practice
Elasticsearch tuning
- Keep the data directory on its own disk partition
- Preferably RAID 10 or RAID 0
JVM configuration
/etc/elasticsearch/jvm.options
-Xms2g #heap size allocated at startup
-Xmx2g #maximum heap size at runtime
Set -Xms and -Xmx to the same value, and no more than half of the machine's physical RAM
Configure the Elasticsearch memory lock
/etc/elasticsearch/elasticsearch.yml
# lock the process memory to avoid using the swap partition
# on 5.x, enabling this also requires the startup option below, otherwise startup fails
bootstrap.memory_lock: true
/usr/lib/systemd/system/elasticsearch.service
#allow the process to lock an unlimited amount of memory
LimitMEMLOCK=infinity
After modifying elasticsearch.service, reload systemd
systemctl daemon-reload
systemctl restart elasticsearch.service
Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/setting-system-settings.html#systemd
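To confirm the lock actually took effect, the nodes API reports mlockall per node (true means the heap is locked):
curl 'http://192.168.0.232:9200/_nodes?filter_path=**.mlockall&pretty'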
Migrating Elasticsearch data to a new cluster
One ES server hostA and a new ES cluster of hostB and hostC; the goal is to migrate hostA's data to the new cluster, with no version change
The first approach considered was feasible, but hostA is to be decommissioned, so it was dropped
Copying the data directory over is feasible, with the following caveats
1. hostA's data can only be copied to one node of the new cluster, because the data directory stores the node identity; if hostB and hostC hold the same data, they conflict:
[2017-09-07T10:35:22,521][INFO ][o.e.d.z.ZenDiscovery ] [node-1] failed to send join request to master [{master-1}{DJsZJ1WURqGX_1veHrxVNw}{G-L07qVPQiu9F1_5XTUt4A}{192.168.0.231}{192.168.0.231:9300}], reason [RemoteTransportException[[master-1][192.168.0.231:9300][internal:discovery/zen/join]]; nested: IllegalArgumentException[can't add node {node-1}{DJsZJ1WURqGX_1veHrxVNw}{ll64l0GAQ-ebOrsMSGJ5Cg}{192.168.0.232}{192.168.0.232:9300}, found existing node {master-1}{DJsZJ1WURqGX_1veHrxVNw}{G-L07qVPQiu9F1_5XTUt4A}{192.168.0.231}{192.168.0.231:9300} with the same id but is a different node instance]; ]
2. The cluster must be fully stopped, then start the ES node that holds the data first
A failure case (ES warns when disk usage exceeds 85%)
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/disk-allocator.html
The check can be disabled in the ES config:
cluster.routing.allocation.disk.threshold_enabled: false
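The same check can also be toggled at runtime through the cluster settings API, without a restart (transient settings are lost on a full cluster restart):
curl -XPUT '192.168.0.231:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.disk.threshold_enabled": false }
}'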
Logstash in Practice
Read the pitfalls at the end of this section first
Logstash notes
#check a config file's syntax
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/system.conf -t
#start logstash in the foreground
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/system.conf
During testing, you can write the collected logs to a local file first, then send them to the real target once debugging is done
Collect logs and store them in a local file
#cat /etc/logstash/conf.d/system.conf
input{
file {
type => "systemlog"
path => "/var/log/messages"
start_position => "beginning"
stat_interval => "5"
}
}
output {
file {
path => "/tmp/systemlog.log"
}
}
Collecting the system messages log with Logstash
Permission error on the file
[2017-08-29T16:50:00,834][INFO ][logstash.pipeline ] Pipeline main started
[2017-08-29T16:50:00,852][WARN ][logstash.inputs.file ] failed to open /var/log/messages: Permission denied - /var/log/messages
[2017-08-29T16:50:00,886][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
#the system log defaults to mode 600, which triggers the error
chmod 644 /var/log/messages
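A narrower alternative to loosening the mode for everyone is a file ACL for just the service account (assuming the RPM-installed Logstash runs as the logstash user; logrotate recreates the file, so the ACL may need reapplying):
setfacl -m u:logstash:r /var/log/messages
getfacl /var/log/messages    #verify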
Add a Logstash config (/etc/logstash/conf.d/systemlog.conf)
#cat /etc/logstash/conf.d/systemlog.conf
input{
file {
type => "systemlog"
path => "/var/log/messages"
start_position => "beginning"
stat_interval => "5"
}
}
output{
elasticsearch{
hosts => ["192.168.0.231:9200"]
index => "logstash-systemlog-%{+YYYY.MM.dd}"
}
}
Collecting Nginx logs with Logstash
Config taken from jack.zhang's blog; thanks for sharing
log_format logstash_json '{"@timestamp":"$time_local",'
'"remote_addr":"$remote_addr",'
'"remote_user":"$remote_user",'
'"body_bytes_sent":"$body_bytes_sent",'
'"request_time":"$request_time",'
'"status":"$status",'
'"request":"$request",'
'"request_method":"$request_method",'
'"http_referrer":"$http_referer",'
'"body_bytes_sent":"$body_bytes_sent",'
'"http_x_forwarded_for":"$http_x_forwarded_for",'
'"http_user_agent":"$http_user_agent"}';
access_log /var/log/nginx/access.log logstash_json;
Sample JSON log
{
"@timestamp": "30/Aug/2017:10:18:59 +0800",
"remote_addr": "192.168.2.64",
"remote_user": "-",
"body_bytes_sent": "0",
"request_time": "0.000",
"status": "304",
"request": "GET /nginxweb/ HTTP/1.1",
"request_method": "GET",
"http_referrer": "-",
"http_x_forwarded_for": "-",
"http_user_agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
}
Add a Logstash config (/etc/logstash/conf.d/nginx.conf)
input {
  file {
    path => "/var/log/nginx/access.log"
    type => "nginx-accesslog"
    start_position => "beginning"
    #parse each line as JSON so the fields above are indexed individually
    codec => "json"
  }
}
output {
if [type] == "nginx-accesslog" {
elasticsearch {
hosts => ["192.168.0.231:9200"]
index => "nginx-accesslog-%{+YYYY.MM.dd}"
}
}
}
Collecting Tomcat access logs with Logstash
Config taken from jack.zhang's blog; thanks for sharing
Edit the access-log configuration near the end of tomcat/conf/server.xml
Change the log name prefix, the suffix, and the pattern
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
prefix="tomcat_access" suffix=".log"
pattern="{"client":"%h","client user":"%l", "authenticated":"%u", "access time":"%t", "method":"%r", "status":"%s","send bytes":"%b","Query?string":"%q","partner":"%{Referer}i","Agent version":"%{User-Agent}i"}"/>
Restart Tomcat and check the log format
{
"client": "192.168.2.64",
"client user": "-",
"authenticated": "-",
"access time": "[31/Aug/2017:09:48:29 +0800]",
"method": "GET /web/index.html HTTP/1.1",
"status": "304",
"send bytes": "-",
"Query?string": "",
"partner": "-",
"Agent version": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
}
Add a Logstash config (/etc/logstash/conf.d/tomcat.conf)
input{
file {
path => "/app/tomcat1/logs/tomcat_access.*.log"
type => "tomcatlog"
start_position => "beginning"
stat_interval => "5"
}
}
output{
if [type] == "tomcatlog" {
elasticsearch {
hosts => ["192.168.0.231:9200"]
index => "tomcatlog-%{+YYYY.MM.dd}"
}
}
}
Collecting HAProxy logs with rsyslog and sending them to Logstash
Configure HAProxy to log to local6
global
...(omitted)
log 127.0.0.1 local6 info
...(omitted)
Send all local6 logs to port 5555 on the Logstash host (in /etc/rsyslog.conf)
# Provides UDP syslog reception
$ModLoad imudp
$UDPServerRun 514
# Provides TCP syslog reception
$ModLoad imtcp
$InputTCPServerRun 514
local6.* @@192.168.0.230:5555
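Restart rsyslog so the rule takes effect, then add the receiving side to Logstash:
systemctl restart rsyslog    #CentOS 7; on CentOS 6: service rsyslog restart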
input {
syslog {
type => "rsyslog-haproxy-8888"
port => "5555"
}
}
output{
if [type] == "rsyslog-haproxy-8888" {
elasticsearch {
hosts => ["192.168.0.232:9200"]
index => "haproxy-%{+YYYY.MM.dd}"
}
}
}
Collecting multiple logs with different indices in a single Logstash config file
Add a Logstash config (/etc/logstash/conf.d/systemlog.conf)
input{
file {
#define the type, used by the output conditionals
type => "systemlog"
path => "/var/log/messages"
start_position => "beginning"
stat_interval => "5"
}
file {
path => "/var/log/lastlog"
#define the type, used by the output conditionals
type => "system-last"
start_position => "beginning"
stat_interval => "5"
}
}
output{
#route on the input type to a different index name
if [type] == "systemlog" {
elasticsearch{
hosts => ["192.168.0.231:9200"]
index => "logstash-systemlog-%{+YYYY.MM.dd}"
}
}
#route on the input type to a different index name
if [type] == "system-last" {
elasticsearch {
hosts => ["192.168.0.231:9200"]
index => "logstash-lastlog-%{+YYYY.MM.dd}"
}
}
}
Logstash receiving Beats data and forwarding it to different targets
input {
beats {
port => 5044
}
}
output {
if [type] == "web02-tomcat-info" {
redis {
host => ["192.168.0.106"]
data_type => "list"
db => "3"
key => "web02-tomcat-info"
port => "6400"
password => "123456"
batch => "true"
}
}
if [type] == "web02-tomcat-error" {
elasticsearch {
hosts => ["192.168.0.231:9200"]
index => "web02-tomcat-error"
}
}
}
Logstash consuming logs from Redis
input {
redis {
data_type => "list"
db => "3"
host => "192.168.0.106"
port => "6400"
key => "web02-tomcat-info"
password => "123456"
}
}
output {
if [type] == "web02-tomcat-info" {
elasticsearch {
hosts => ["192.168.0.231:9200"]
index => "tomcat-info-%{+YYYY.MM.dd}"
}
}
}
Pitfalls
Observed behavior: in the Logstash configs, if log A's config wraps its output in an if [type] conditional and log B's config does not, then B's index also receives A's data (and if a config C with if [type] exists, C's data is written into B as well). This is because Logstash merges every file under conf.d into a single pipeline, so each event passes through every output block unless a conditional filters it out.
Only type conditionals were tested here; other conditionals were not
Conclusion: if any config uses an if [type] conditional, every config must guard its outputs with one (see the sketch below)
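A minimal sketch of that convention, with every output guarded (the types a-log and b-log are hypothetical):
output {
  if [type] == "a-log" {
    elasticsearch { hosts => ["192.168.0.231:9200"] index => "a-log-%{+YYYY.MM.dd}" }
  }
  if [type] == "b-log" {
    elasticsearch { hosts => ["192.168.0.231:9200"] index => "b-log-%{+YYYY.MM.dd}" }
  }
}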
Sometimes even 777 turns out to be needed: with RPM-installed Nginx, 644 on the log directory and files still was not readable
Nginx (RPM) log permissions
With Nginx installed from yum, the log file is /var/log/nginx/access.log
/var/log/nginx/ has mode 700
/var/log/nginx/access.log has mode 644
With this layout the logs cannot be collected
Changing /var/log/nginx to 644 still did not help; only after changing it to 777 did collection work. (A directory needs the execute bit to be traversed, which is why 644 on a directory fails; 755 would be the minimal sensible mode.)
Filebeat in Practice
Configuration file
Read through the broken configs in the next section before writing your own
filebeat.prospectors:
#collect single-line logs
- input_type: log
paths:
- /var/log/messages
#drop blank lines and lines starting with DBG
exclude_lines: ["^DBG","^$"]
#skip files ending in .gz
exclude_files: [".gz$"]
#define extra fields
fields:
type: ddd
#overwrite the field with the same name
fields_under_root: true
#collect multi-line logs
- input_type: log
paths:
- /app/tomcat1/logs/Java_Error.log
fields:
type: web02-tomcat-error
fields_under_root: true
#pattern for lines that start with a timestamp
multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
#treat lines that do NOT match the pattern as continuations
multiline.negate: true
#append continuation lines after the matched line, until the next match starts a new event
multiline.match: after
#Note: Filebeat cannot send a given log to one specific output; if multiple outputs are defined, every event goes to all of them
#output to a file
output.file:
path: "/tmp/"
filename: filebeat
#output to Redis
output.redis:
hosts: ["192.168.0.106"]
db: "3"
port: "6400"
password: "noted"
key: "abc"
#output to Elasticsearch
output.elasticsearch:
hosts: ["192.168.0.230:9200"]
#output to Logstash
output.logstash:
hosts: ["192.168.0.230:5044"]
We usually route events on [type], so verify that the type field in the log was actually overwritten; otherwise the data will not be processed
Indentation error that prevents Filebeat from starting, with nothing written to the log
fields:
type: ddd
  fields_under_root: true
Indentation error: the string ddd does not overwrite the default type field; instead it just becomes a key/value inside fields
fields:
  type: ddd
  fields_under_root: true
Log output
{"@timestamp":"2017-09-05T06:00:02.210Z","beat":{"hostname":"web-log.prod.ding","name":"web-log.prod.ding","version":"5.5.2"},"fields":{"fields_under_root":true,"type":"ddd"},"input_type":"log","message":"Sep 5 14:00:01 web-log systemd: Stopping user-0.slice.","offset":359166,"source":"/var/log/messages","type":"log"}
I tried sending file A to Redis and file B directly to Logstash, but Filebeat requires the output to be configured once, globally
- input_type: log
paths:
- /app/tomcat1/logs/Java_Error.log
fields:
type: web02-tomcat-error
fields_under_root: true
multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
multiline.negate: true
multiline.match: after
output.logstash:
hosts: ["192.168.0.230:5044"]
Collecting data and forwarding it to Kafka (to be written up)
Introducing a Queue
Redis deployment (omitted)
- Acts as a log buffer; disable RDB snapshots and AOF to improve performance
Why introduce it
- As the data volume grows, Logstash writes become the bottleneck
- With Redis in between, data is buffered in Redis temporarily
- Logstash then processes the data asynchronously, easing the processing pressure
Data flow
- (Producer) Filebeat or Logstash writes data into Redis
- (Consumer) Logstash receives and processes the data, then writes it to ES
- Kibana views the data in ES
Monitor the queue length?
Get the length of a given key, as shown below
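The queues are Redis lists, so LLEN reports the backlog; a quick check using the Redis instance from the configs above:
redis-cli -h 192.168.0.106 -p 6400 -a 123456 -n 3 llen web02-tomcat-info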
Logstash consumes roughly 1,000 Redis log entries per second