- Collects, parses, and ships data in a lightweight way.
- The Beats platform is a family of single-purpose data shippers.
- They send data from hundreds or thousands of machines and systems to Logstash or Elasticsearch.
1. Installation and Deployment
tar zxvf filebeat-7.8.0-linux-x86_64.tar.gz
ln -s filebeat-7.8.0-linux-x86_64 filebeat
2. Configuration File
Documentation: https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html
cp filebeat.yml filebeat-backup.yml
vim filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /usr/local/nginx/logs/access.log
# Output to the console
output.console:
  enabled: true
- backoff: how long Filebeat waits before checking a file again for new content.
- tail_files: if true, Filebeat starts reading new files at the end instead of the beginning. When this option is combined with log rotation, the first log entries of a new file may be skipped.
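The effect of these two options can be sketched with a small offset-tracking loop. This is a simplified model for illustration, not Filebeat's actual implementation: the collector remembers a byte offset per file (like Filebeat's registry) and, on each check, reads only what was appended since last time.

```python
import os
import tempfile

def collect(path, state, tail_files=False):
    """Read newly appended lines from `path`, tracking a byte offset in `state`.
    On first sight of a file, tail_files=True starts at the current end of the
    file (skipping existing content), mimicking the option of the same name."""
    if path not in state:
        state[path] = os.path.getsize(path) if tail_files else 0
    with open(path) as f:
        f.seek(state[path])
        lines = f.readlines()
        state[path] = f.tell()   # remember where we stopped
    return lines

# Demo: with tail_files=True, pre-existing lines are skipped.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write("old line\n")
    path = f.name

state = {}
print(collect(path, state, tail_files=True))  # [] -- started at end of file
with open(path, "a") as f:
    f.write("new line\n")
print(collect(path, state))                   # ['new line\n']
```

In the real Filebeat, `backoff` is simply how long this loop sleeps between checks, and deleting the `data` registry directory (as shown in section 3) is what resets the remembered offsets.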
- Output to Elasticsearch

filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /usr/local/nginx/logs/access.log
# Output to Elasticsearch
output.elasticsearch:
  hosts: ["localhost:9200"]
3. Starting Filebeat
# Show the startup options
./filebeat --help
# Delete the registry (read offsets) from the previous run
cd data
rm -rf *
# Count the lines in the log
cd /usr/local/nginx/logs
cat access.log | wc -l
# Start in the foreground
./filebeat -e -c filebeat.yml
# Start in the background
vi startup.sh
#! /bin/bash
nohup /usr/local/filebeat/filebeat -e -c filebeat.yml >> /usr/local/filebeat/output.log 2>&1 &
chmod a+x startup.sh
4. Collecting Logs with Filebeat + Logstash
- Both Logstash and Filebeat can collect logs. Filebeat is lighter and uses fewer resources, but Logstash has a filter stage that can parse and analyze logs. A common architecture is therefore: Filebeat collects the logs and sends them to a message queue such as Redis or Kafka; Logstash then consumes from the queue, parses the events with its filters, and stores them in Elasticsearch.
- Architecture
  This architecture avoids the high resource usage of running Logstash on every server node: compared with Logstash, the CPU and memory footprint of Beats is almost negligible.
- Configuration files
# Filebeat config, filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /usr/local/nginx/logs/access.log
# Output to Logstash
output.logstash:
  enabled: true
  hosts: ["localhost:5044"]

# Logstash config, logstash.conf
# Uses the logstash-input-beats plugin,
# which listens on port 5044
input {
  beats {
    host => "0.0.0.0"
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "%{HTTPD_COMBINEDLOG}" }
  }
  date {
    match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}

# Logstash variant: remove unnecessary fields
input {
  beats {
    host => "0.0.0.0"
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "%{HTTPD_COMBINEDLOG}" }
  }
  date {
    match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
  mutate {
    remove_field => ["agent"]
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}
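What the grok pattern `%{HTTPD_COMBINEDLOG}` does can be approximated in Python with a named-group regular expression. This is a simplified stand-in for the real grok pattern, shown only to make the field extraction concrete:

```python
import re

# Rough equivalent of grok's %{HTTPD_COMBINEDLOG}: pulls the main fields
# of an Nginx/Apache "combined" access log line into a dict.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<httpversion>[^"]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('192.168.230.110 - - [29/Aug/2020:12:50:21 +0800] '
        '"GET /abc/abc2.txt HTTP/1.1" 404 555 "-" "Mozilla/5.0"')
event = COMBINED.match(line).groupdict()
print(event["response"], event["request"])  # 404 /abc/abc2.txt
```

The extracted `timestamp` group is what the `date` filter then parses into `@timestamp`.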
5. Collecting JSON-Formatted Log Data with Filebeat
- Change the Nginx log format to JSON
# Nginx access log (default format)
192.168.230.110 - - [29/Aug/2020:12:50:21 +0800] "GET /abc/abc2.txt HTTP/1.1" 404 555 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"

# Modify nginx.conf
log_format log_json '{"remote_addr":"$remote_addr", '
                    '"ident": "-", '
                    '"user": "$remote_user", '
                    '"timestamp": "$time_local",'
                    '"request": "$request", '
                    '"status": $status, '
                    '"bytes": $body_bytes_sent, '
                    '"referer": "$http_referer",'
                    '"agent": "$http_user_agent",'
                    '"x_forwarded":"$http_x_forwarded_for"'
                    ' }';
access_log logs/access-json.log log_json;

# Check that the configuration is valid
sbin/nginx -t -c conf/nginx.conf
Output:
nginx: the configuration file /usr/local/nginx/conf/nginx.conf syntax is ok
nginx: configuration file /usr/local/nginx/conf/nginx.conf test is successful

# Use multiple Nginx worker processes
worker_processes 4;

# Reload the configuration
sbin/nginx -s reload
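A quick way to sanity-check the `log_format` above is to substitute sample values for the Nginx variables and confirm that the resulting line parses as JSON. The values below are made up for illustration; note that `status` and `bytes` are deliberately left unquoted in the format so they arrive as numbers, not strings:

```python
import json

# The log_json format above, with Python placeholders in place of $variables
template = ('{{"remote_addr":"{remote_addr}", "ident": "-", "user": "{user}", '
            '"timestamp": "{time_local}","request": "{request}", '
            '"status": {status}, "bytes": {bytes}, "referer": "{referer}",'
            '"agent": "{agent}","x_forwarded":"{xff}" }}')

line = template.format(remote_addr="192.168.230.110", user="-",
                       time_local="29/Aug/2020:12:50:21 +0800",
                       request="GET /abc/abc2.txt HTTP/1.1",
                       status=404, bytes=555, referer="-",
                       agent="Mozilla/5.0", xff="-")

doc = json.loads(line)      # parses -- the format emits valid JSON
print(type(doc["status"]))  # <class 'int'>: unquoted => numeric field in ES
```

One caveat of this approach in real deployments: if a variable's value itself contains a double quote, the line is no longer valid JSON, which is why `$request` and `$http_user_agent` are the usual suspects when parsing fails.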
- Filebeat configuration
# Filebeat config, filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /usr/local/nginx/logs/access-json.log
# Output to Logstash
output.logstash:
  enabled: true
  hosts: ["localhost:5044"]
- Logstash configuration
input {
  beats {
    host => "0.0.0.0"
    port => 5044
  }
}
filter {
  json {
    source => "message"
    remove_field => ["agent"]
  }
  date {
    match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}
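The date filter's pattern `dd/MMM/yyyy:HH:mm:ss Z` corresponds to Python's strptime format `%d/%b/%Y:%H:%M:%S %z`, so what it computes for `@timestamp` can be checked directly:

```python
from datetime import datetime, timezone

# Logstash date pattern "dd/MMM/yyyy:HH:mm:ss Z" == strptime "%d/%b/%Y:%H:%M:%S %z"
ts = datetime.strptime("29/Aug/2020:12:50:21 +0800", "%d/%b/%Y:%H:%M:%S %z")
print(ts.astimezone(timezone.utc).isoformat())  # 2020-08-29T04:50:21+00:00
```

This is also why the `target => "@timestamp"` line matters: without it, Elasticsearch would index events by their arrival time rather than the time recorded in the log line.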
6. Collecting Multiple Logs with Filebeat at the Same Time
- Filebeat configuration
#filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /usr/local/nginx/logs/access-json.log
  fields:
    filetype: logjson
  fields_under_root: true
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /var/log/messages
  fields:
    filetype: logsystem
  fields_under_root: true
output.logstash:
  enabled: true
  hosts: ["localhost:5044"]
- fields: custom fields attached to each event.
- fields_under_root: if true, the custom fields become top-level fields in the output document instead of being nested under a fields key.
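The difference fields_under_root makes can be sketched on a plain dict standing in for the Filebeat event (a simplified model, not Filebeat's internal representation):

```python
def apply_fields(event, fields, under_root):
    """Attach custom fields to an event: either nested under a "fields" key
    (the default) or merged into the top level (fields_under_root: true)."""
    event = dict(event)  # don't mutate the caller's event
    if under_root:
        event.update(fields)
    else:
        event["fields"] = dict(fields)
    return event

base = {"message": "GET / 200"}
print(apply_fields(base, {"filetype": "logjson"}, under_root=False))
# {'message': 'GET / 200', 'fields': {'filetype': 'logjson'}}
print(apply_fields(base, {"filetype": "logjson"}, under_root=True))
# {'message': 'GET / 200', 'filetype': 'logjson'}
```

The Logstash conditionals in the next config test `[filetype]` at the top level, which is why `fields_under_root: true` is set here.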
- Logstash configuration
#logstash.conf
input {
  beats {
    host => "0.0.0.0"
    port => 5044
  }
}
filter {
  if [filetype] == "logjson" {
    json {
      source => "message"
      remove_field => ["agent","beat","offset","tags","prospector"]
    }
    date {
      match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
      target => "@timestamp"
    }
  }
}
output {
  if [filetype] == "logjson" {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
  } else if [filetype] == "logsystem" {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "msg-%{+YYYY.MM.dd}"
    }
  }
}
Note: the set of fields produced by the filter for a given index can only shrink over time, not grow, otherwise Elasticsearch reports an [ElasticSearch MapperParsingException object mapping](https://stackoverflow.com/questions/23605942/elasticsearch-mapperparsingexception-object-mapping) error.
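The routing done by the conditional output can be sketched as a function that maps an event to an index name, including the daily suffix that `%{+YYYY.MM.dd}` expands to (a toy model for illustration only):

```python
from datetime import date

def index_for(event, day=None):
    """Mimic the conditional output: pick an index prefix by the custom
    "filetype" field and append a daily suffix, e.g. nginx-2020.08.29."""
    day = day or date.today()
    prefix = {"logjson": "nginx", "logsystem": "msg"}.get(event.get("filetype"))
    if prefix is None:
        return None  # events with any other filetype are not indexed
    return f"{prefix}-{day:%Y.%m.%d}"

print(index_for({"filetype": "logjson"}, date(2020, 8, 29)))    # nginx-2020.08.29
print(index_for({"filetype": "logsystem"}, date(2020, 8, 29)))  # msg-2020.08.29
```

Daily indices like these are what make it cheap to expire old logs: dropping a day of data is a single index deletion rather than a document-by-document purge.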
7. Collecting Log Data with Filebeat + Redis + Logstash
- If Logstash goes down, Filebeat can no longer write data to it, and logs produced during the outage may never be collected. For this reason a message buffer such as Redis or Kafka is usually placed in between; Logstash then consumes the data from the buffer and writes it to Elasticsearch.
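The buffering role Redis plays here (with `datatype: list` below, Filebeat appends to one end of a Redis list and Logstash pops from the other, giving FIFO delivery) can be modelled with a plain deque. This is a toy model of the queue semantics, not the redis-py API:

```python
from collections import deque

buffer = deque()  # stands in for the Redis list stored under key "nginx"

def ship(event):
    """Filebeat side: append the event to the list."""
    buffer.append(event)

def consume():
    """Logstash side: pop from the opposite end, oldest first (FIFO)."""
    return buffer.popleft() if buffer else None

# While "Logstash is down", events simply accumulate in the buffer...
for i in range(3):
    ship(f"log line {i}")

# ...and are drained in order once it comes back.
print([consume() for _ in range(3)])
# ['log line 0', 'log line 1', 'log line 2']
```

The trade-off versus Kafka: a Redis list holds each message only until one consumer pops it, so it buffers against outages but does not allow several independent consumer groups to each read the full stream.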
- Install Redis
tar zxvf redis-5.0.11.tar.gz
make
make install

# Initialize Redis:
./utils/install_server.sh
Please select the redis port for this instance: [6379]
Selecting default: 6379
Please select the redis config file name [/etc/redis/6379.conf]
Selected default - /etc/redis/6379.conf
Please select the redis log file name [/var/log/redis_6379.log]
Selected default - /var/log/redis_6379.log
Please select the data directory for this instance [/var/lib/redis/6379]
Selected default - /var/lib/redis/6379
Please select the redis executable path [/usr/local/bin/redis-server]
Selected config:
Port           : 6379
Config file    : /etc/redis/6379.conf
Log file       : /var/log/redis_6379.log
Data dir       : /var/lib/redis/6379
Executable     : /usr/local/bin/redis-server
Cli Executable : /usr/local/bin/redis-cli
Is this ok? Then press ENTER to go on or Ctrl-C to abort.
Copied /tmp/6379.conf => /etc/init.d/redis_6379
Installing service...
Successfully added to chkconfig!
Successfully added to runlevels 345!
Starting Redis server...
Installation successful!

# Check the service registration
chkconfig --list

# Check redis-cli
[root@elk utils]# which redis-cli
/usr/local/bin/redis-cli

# Edit the configuration:
vi /etc/redis/6379.conf
bind 0.0.0.0
port 6379
daemonize yes
logfile /var/log/redis_6379.log
dir /var/lib/redis/6379

# Restart Redis
kill -9 16101
rm /var/run/redis_6379.pid
service redis_6379 start
systemctl enable redis_6379  # (or: chkconfig redis_6379 on)

# Enter the Redis CLI
redis-cli
- Filebeat configuration
#filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /usr/local/nginx/logs/access-json.log
  fields:
    filetype: nginxjson
  fields_under_root: true
# Output to Redis
output.redis:
  enabled: true
  hosts: ["127.0.0.1:6379"]
  key: nginx
  db: 0
  datatype: list
- Logstash configuration
input {
  redis {
    host => "127.0.0.1"
    port => 6379
    key => "nginx"
    data_type => "list"
    db => 0
  }
}
filter {
  json {
    source => "message"
    remove_field => ["agent","beat","offset","tags","prospector"]
  }
  date {
    match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}
8. Collecting Log Data with Filebeat + Kafka + Logstash
- Install Kafka
tar zxvf kafka_2.13-2.7.0.tgz
mv kafka_2.13-2.7.0 /usr/local
cd /usr/local
ln -s kafka_2.13-2.7.0 kafka
- Start ZooKeeper
  Kafka uses ZooKeeper, so a ZooKeeper server must be started first.
bin/zookeeper-server-start.sh config/zookeeper.properties
Background start:
vi start-zk.sh
#! /bin/bash
nohup /usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties >> /usr/local/kafka/zk-output.log 2>&1 &

chmod a+x start-zk.sh
- Basic Kafka configuration
vim config/server.properties
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://192.168.122.150:9092
- Start Kafka
bin/kafka-server-start.sh config/server.properties
Background start:
bin/kafka-server-start.sh -daemon config/server.properties
- Create a topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic fx-topic
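With `--partitions 1` every record goes to the single partition 0. With more partitions, Kafka's default behavior for keyed records is to hash the key modulo the partition count so that records with the same key stay on the same partition (Kafka's default partitioner uses murmur2; the crc32 below is only illustrative of the idea):

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Sketch of keyed partitioning: same key -> same partition.
    (Kafka actually hashes with murmur2; crc32 is used here for illustration.)"""
    return zlib.crc32(key) % num_partitions

# With --partitions 1, as above, every record lands in partition 0
print(pick_partition(b"any-key", 1))  # 0
# With more partitions, records sharing a key stay together,
# which preserves per-key ordering
print(pick_partition(b"host-a", 4) == pick_partition(b"host-a", 4))  # True
```

Ordering in Kafka is only guaranteed within a partition, so a single-partition topic like `fx-topic` here keeps all log lines in order at the cost of consumer parallelism.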
- List topics
bin/kafka-topics.sh --list --zookeeper localhost:2181
- Start a console producer
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic fx-topic
- Start a console consumer
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic fx-topic --from-beginning
- Filebeat configuration
https://www.elastic.co/guide/en/beats/filebeat/current/kafka-output.html
#filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  backoff: "1s"
  tail_files: false
  paths:
    - /usr/local/nginx/logs/access-json.log
  fields:
    filetype: nginxjson
  fields_under_root: true
# Output to Kafka
output.kafka:
  hosts: ["localhost:9092"]
  topic: fx-topic
  required_acks: 1
- Logstash configuration
input {
  kafka {
    bootstrap_servers => "127.0.0.1:9092"
    topics => "fx-topic"
    group_id => "logstash"
  }
}
filter {
  json {
    source => "message"
    remove_field => ["agent","beat","offset","tags","prospector"]
  }
  date {
    match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
}

group_id: the consumer group this Logstash instance belongs to. Different groups consume the topic independently and do not affect each other; they are isolated from one another.
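The group isolation described above (each group independently sees every message, because each group tracks its own read position in the topic) can be sketched with a toy model. This is an illustration of the offset bookkeeping, not the Kafka protocol:

```python
class Topic:
    """Toy model of consumer-group semantics: the topic is an append-only
    log, and each group keeps its own offset of how far it has read."""
    def __init__(self):
        self.log = []
        self.offsets = {}  # group_id -> next offset to read

    def produce(self, msg):
        self.log.append(msg)

    def poll(self, group_id):
        pos = self.offsets.get(group_id, 0)
        msgs = self.log[pos:]
        self.offsets[group_id] = len(self.log)  # commit the new offset
        return msgs

t = Topic()
t.produce("m1"); t.produce("m2")
print(t.poll("logstash"))  # ['m1', 'm2']
print(t.poll("audit"))     # ['m1', 'm2'] -- another group, unaffected
t.produce("m3")
print(t.poll("logstash"))  # ['m3'] -- only what this group hasn't seen yet
```

This is why a second Logstash started with the same `group_id => "logstash"` would share (split) the work, while one started with a different group_id would receive its own full copy of the stream.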