Background
The business goal is to analyze the logs that nginx and apache produce every day, monitor information such as URLs, IPs, and REST endpoints, and ship the data to an Elasticsearch service.
Comparison with Flume
- No duplicate consumption, and no data loss.
- Flume's main strength today is its HDFS support (my personal understanding).
Offline installation
First configure JAVA_HOME; Java 8 or later is required.
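A minimal sketch of the environment setup, assuming the JDK was unpacked to /usr/local/jdk1.8.0_144 (a placeholder path; adjust to your actual install):

export JAVA_HOME=/usr/local/jdk1.8.0_144   # assumed JDK location
export PATH=$JAVA_HOME/bin:$PATH
java -version   # should report 1.8 or later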
Standard input and output
bin/logstash -e 'input { stdin {} } output { stdout{} }'
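Each line typed at the console becomes one event. To see the structured event (with @timestamp, host, and message fields) instead of a flat line, swap in the rubydebug codec:

bin/logstash -e 'input { stdin {} } output { stdout { codec => rubydebug } }'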
File to standard output
First, under the Logstash directory, create the config file:
mkdir conf && touch conf/file-stdout.conf
vim conf/file-stdout.conf
input {
  file {
    path => "/home/bingo/data/test.log"
    start_position => "beginning"
    ignore_older => 0
  }
}
output {
  stdout {}
}
Finally, start it:
bin/logstash -f conf/file-stdout.conf
# Multiple files: path => "/home/bingo/data/*.log"
# Multiple directories: path => "/home/bingo/data/*/*.log"
# Parameter notes
start_position: defaults to end, i.e. parsing starts from the end of the file
ignore_older: by default, logs older than 24 hours are not parsed; 0 means no log is ignored however old it is
After running the command, the console prints the contents of the log file.
- This approach keeps monitoring the file for new input.
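One caveat worth knowing: the file input records its read position in a sincedb file, so start_position => "beginning" only applies the first time Logstash sees a file. When testing repeatedly against the same file, a common trick is to throw the bookmark away (a sketch; only sincedb_path is new here):

input {
  file {
    path => "/home/bingo/data/test.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"   # discard the read position; re-read from the top on every run
  }
}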
File to file
- Started the same way as "file to standard output"; only the config file differs:
touch file-file.conf
vim file-file.conf
input {
  file {
    path => "/home/connect/install/data/test.log"
    start_position => "beginning"
    ignore_older => 0
  }
}
output {
  file {
    path => "/home/connect/install/data/test1.log"
  }
  stdout {
    codec => rubydebug
  }
}
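Start it the same way and watch the output file; the file output writes events as JSON by default (one document per line in the plugin versions of this era), so tail is enough to verify:

bin/logstash -f conf/file-file.conf
tail -f /home/connect/install/data/test1.log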
Upstream to Elasticsearch
touch file-es.conf
vim file-es.conf
input {
  file {
    type => "flow"
    path => "/home/bingo/data/logstash/logs/*/*.txt"
    discover_interval => 5
    start_position => "beginning"
  }
}
output {
  if [type] == "flow" {
    elasticsearch {
      index => "flow-%{+YYYY.MM.dd}"
      hosts => ["master01:9200", "worker01:9200", "worker02:9200"]
    }
  }
}
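After starting with bin/logstash -f conf/file-es.conf, the daily indices can be checked directly against the cluster (any host from the list above works):

curl 'http://master01:9200/_cat/indices/flow-*?v'
curl 'http://master01:9200/flow-*/_search?size=1&pretty'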
Upstream to Kafka
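The original notes leave this section empty. A minimal sketch for shipping the same files into Kafka, assuming the Kafka 0.8-era output plugin that matches the zk_connect-style input used below (broker addresses and the topic name are placeholders):

input {
  file {
    path => "/home/bingo/data/logstash/logs/*/*.txt"
    start_position => "beginning"
  }
}
output {
  kafka {
    broker_list => "master01:9092,worker01:9092,worker02:9092"   # assumed broker addresses
    topic_id => "clm_bs_tracking_log_json"
    codec => "json"
  }
}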
Kafka to ES
touch kafka-es.conf
vim kafka-es.conf
input {
  kafka {
    zk_connect => "master01:2181,worker01:2181,worker02:2181"
    auto_offset_reset => "smallest"
    group_id => "bdes_clm_bs_tracking_log_json"
    topic_id => "clm_bs_tracking_log_json"
    consumer_threads => 2
    codec => "json"
    queue_size => 500
    fetch_message_max_bytes => 104857600
  }
}
output {
  elasticsearch {
    hosts => ["A:9900","B:9900","C:9900"]
    document_type => "bs_tracking_log"
    #document_id => "%{[mblnr]}%{[mjahr]}"
    flush_size => 102400
    index => "clm"
    timeout => 10
  }
}
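Start it with:

bin/logstash -f conf/kafka-es.conf

Note that consumer_threads => 2 only pays off if the topic has at least two partitions; threads beyond the partition count sit idle.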