How to start Logstash
# cd into the bin directory of the extracted Logstash archive
PS C:\Users\hs> cd D:\lihua\ELK\logstash-7.15.1-windows-x86_64\logstash-7.15.1\bin
# logstash -f specifies the path to the config file
PS D:\lihua\ELK\logstash-7.15.1-windows-x86_64\logstash-7.15.1\bin> .\logstash -f D:\lihua\iot\iot-engine\code\hx-iot-engine-starter\src\main\resources\logstash.conf
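Before starting the pipeline for real, it can help to validate the config file first. A minimal sketch using Logstash's standard CLI flags (same paths as above):

# check the config for syntax errors without starting the pipeline
PS D:\lihua\ELK\logstash-7.15.1-windows-x86_64\logstash-7.15.1\bin> .\logstash -f D:\lihua\iot\iot-engine\code\hx-iot-engine-starter\src\main\resources\logstash.conf --config.test_and_exit
# start normally, but reload the pipeline whenever the config file changes
PS D:\lihua\ELK\logstash-7.15.1-windows-x86_64\logstash-7.15.1\bin> .\logstash -f D:\lihua\iot\iot-engine\code\hx-iot-engine-starter\src\main\resources\logstash.conf --config.reload.automatic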
1. Terminology
(1) @metadata
A special field for storing content that you do not want to include in output events. For example, the @metadata field can be used to create temporary fields for use in conditionals.
Example:
filter {
  mutate { add_field => { "show" => "This data will be in the output" } }
  mutate { add_field => { "[@metadata][test]" => "Hello" } }
  mutate { add_field => { "[@metadata][no_show]" => "This data will not be in the output" } }
}
Logstash console output:
{
    "@timestamp" => 2016-06-30T02:46:48.565Z,
    # fields under @metadata never flow into the output; they are temporary
    # fields whose lifetime is limited to the filter stage
    "@metadata" => {
        "test" => "Hello",
        "no_show" => "This data will not be in the output"
    },
    "@version" => "1",
    "host" => "example.com",
    "show" => "This data will be in the output",
    "message" => "asdf"
}
Feel free to use the @metadata field whenever you need a temporary field but do not want it to appear in the final output.
Note: in mutate { add_field => { "[@metadata][test]" => "Hello" } }, the field's name is [@metadata][test]; when referencing it you cannot write [test], you must write the full [@metadata][test].
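As a further illustration, a value computed in the filter stage can steer the output stage without ever being indexed itself. A minimal sketch (the field name [@metadata][target_index] is made up for illustration; the host is the one used later in this article):

filter {
  # compute the destination index once, in a field that will never be indexed itself
  mutate { add_field => { "[@metadata][target_index]" => "logs-%{type}-%{+YYYY.MM}" } }
}
output {
  elasticsearch {
    hosts => ["192.168.1.83:9200"]
    # sprintf references work on @metadata just like on regular fields
    index => "%{[@metadata][target_index]}"
  }
}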
(2) field
An event attribute. For example, each event in an Apache access log has attributes such as a status code (200, 404), a request path ("/", "index.html"), an HTTP verb (GET, POST), a client IP address, and so on. Logstash uses the term "field" to refer to these attributes.
How fields are used in practice:
Creating (declaring) a field: add_field
- Value type is hash
- Default value is {}
- Purpose: adds fields to the event
Example:
input {
  file { add_field => { "show" => "This data will be in the output" } }
}
filter {
  mutate { add_field => { "show" => "this field will flow into the output" } }
  mutate { add_field => { "[@metadata][test]" => "Hello" } }
  mutate { add_field => { "[@metadata][no_show]" => "this field is temporary and will not flow into the output" } }
}
Notes:
1. All three plugin types (input, filter, output) can create fields, as long as the plugin in question offers the add_field configuration option.
2. If the field being created already exists, the field is converted to an array and the new value is inserted as an element, as the sketch below shows:
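A minimal sketch of this behavior (the field name foo is made up for illustration):

filter {
  mutate { add_field => { "foo" => "first" } }
  # "foo" already exists, so this second add_field turns it into an array
  mutate { add_field => { "foo" => "second" } }
}

The rubydebug console output would then show roughly:

"foo" => [
    [0] "first",
    [1] "second"
]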
Referencing (using) a field
Field references are usually wrapped in square brackets ([]), e.g. [fieldname]. When referring to a top-level field, you can omit the [] and just use the field name. To refer to a nested field, specify the full path to that field: [top-level field][nested field]
- Referencing a field in conditionals
See the official documentation for details.
In conditionals, reference a field as [fieldname]:
filter {
  # if the value of field foo is contained in field foobar
  if [foo] in [foobar] {
    # append the element "field in field" to the tags array
    mutate { add_tag => "field in field" }
  }
  if [foo] in "foo" {
    mutate { add_tag => "field in string" }
  }
  if "hello" in [greeting] {
    mutate { add_tag => "string in field" }
  }
  if [foo] in ["hello", "world", "foo"] {
    mutate { add_tag => "field in list" }
  }
  if [missing] in [alsomissing] {
    mutate { add_tag => "shouldnotexist" }
  }
  if !("foo" in ["hello", "world"]) {
    mutate { add_tag => "shouldexist" }
  }
}
Note: [foo] on its own tests whether the field foo exists, for example:
output {
  # if [loglevel] exists and its value is "ERROR"
  if [loglevel] and [loglevel] == "ERROR" {
    pagerduty { ... }
  }
}
- Referencing a field in string output
In string output, reference a field as %{fieldname}:
See the official documentation.
output {
  elasticsearch {
    hosts => ["192.168.1.83:9200"]
    index => "jmqttlogs-%{type}-%{logger}-%{loglevel}-%{+YYYY.MM}"
  }
  stdout { codec => rubydebug }
}
(3) field reference
A reference to an event field. Such a reference may appear in the output block or the filter block of a Logstash config file. Field references are usually wrapped in square brackets ([]), e.g. [fieldname]. When referring to a top-level field you can omit the [] and just use the field name. To refer to a nested field, specify the full path to that field: [top-level field][nested field]
You can think of this concept as essentially the same thing as a field.
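A minimal sketch of a nested reference, assuming the event carries an object field such as user => { "name" => ... } (the field names and the value "alice" are made up for illustration):

filter {
  # conditionals use the full bracket path, one pair of brackets per level
  if [user][name] == "alice" {
    # inside a sprintf string the full bracket path is required as well
    mutate { add_field => { "greeting" => "Hello %{[user][name]}" } }
  }
}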
(4) input plugin
A Logstash plugin that reads event data from a particular source. Input plugins are the first stage of the Logstash event-processing pipeline. Popular input plugins include file, syslog, redis, and beats.
input {
  # file input plugin
  file {
    # options provided by the plugin; see the official docs for the full list
    path => ["/jmqttlogs/*.log","/jmqttlogs/"]
    type => "test"
    exclude => ["brokerLog.log","remotingLog.log"]
  }
  # beats input plugin
  beats {
    # options provided by the plugin; see the official docs for the full list
    port => 5044
  }
  # tcp input plugin
  tcp {
    # options provided by the plugin; see the official docs for the full list
    port => 12345
    codec => json
  }
}
Logstash provides the following input plugins:
Official documentation
| Plugin | Description |
| --- | --- |
| azure_event_hubs | Receives events from Azure Event Hubs |
| beats | Receives events from the Elastic Beats framework |
| cloudwatch | Pulls events from the Amazon Web Services CloudWatch API |
| couchdb_changes | Streams events from CouchDB's _changes URI |
| dead_letter_queue | Reads events from Logstash's dead letter queue |
| elastic_agent | Receives events from the Elastic Agent framework (shares the logstash-input-beats codebase) |
| elasticsearch | Reads query results from an Elasticsearch cluster |
| exec | Captures the output of a shell command as an event |
| file | Streams events from files |
| ganglia | Reads Ganglia packets over UDP |
| gelf | Reads GELF-format messages from Graylog2 as events |
| generator | Generates random log events for test purposes |
| github | Reads events from a GitHub webhook |
| google_cloud_storage | Extracts events from files in a Google Cloud Storage bucket |
| google_pubsub | Consumes events from a Google Cloud PubSub service |
| graphite | Reads metrics from the graphite tool |
| heartbeat | Generates heartbeat events for testing |
| http | Receives events over HTTP or HTTPS |
| http_poller | Decodes the output of an HTTP API into events |
| imap | Reads mail from an IMAP server |
| irc | Reads events from an IRC server |
| java_generator | Generates synthetic log events |
| java_stdin | Reads events from standard input |
| jdbc | Creates events from JDBC data |
| jms | Reads events from a JMS broker |
| jmx | Retrieves metrics from remote Java applications over JMX |
| kafka | Reads events from a Kafka topic |
| kinesis | Receives events through an AWS Kinesis stream |
| log4j | Reads events over a TCP socket from a Log4j SocketAppender object |
| lumberjack | Receives events using the Lumberjack protocol |
| meetup | Captures the output of command line tools as an event |
| pipe | Streams events from a long-running command pipe |
| puppet_facter | Receives facts from a Puppet server |
| rabbitmq | Pulls events from a RabbitMQ exchange |
| redis | Reads events from a Redis instance |
| relp | Receives RELP events over a TCP socket |
| rss | Captures the output of command line tools as an event |
| s3 | Streams events from files in an S3 bucket |
| s3-sns-sqs | Reads logs from AWS S3 buckets using SQS |
| salesforce | Creates events based on a Salesforce SOQL query |
| snmp | Polls network devices using Simple Network Management Protocol (SNMP) |
| snmptrap | Creates events based on SNMP trap messages |
| sqlite | Creates events based on rows in an SQLite database |
| sqs | Pulls events from an Amazon Web Services Simple Queue Service queue |
| stdin | Reads events from standard input |
| stomp | Creates events received with the STOMP protocol |
| syslog | Reads syslog messages as events |
| tcp | Reads events from a TCP socket |
| twitter | Reads events from the Twitter Streaming API |
| udp | Reads events over UDP |
| unix | Reads events over a UNIX socket |
| varnishlog | Reads from the varnish cache shared memory log |
| websocket | Reads events from a websocket |
| wmi | Creates events based on the results of a WMI query |
| xmpp | Receives events over the XMPP/Jabber protocol |
Note: although the input, filter, and output plugins here are called plugins, they do not need to be installed separately; Logstash already bundles them.
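You can confirm which plugins are bundled with your installation using the logstash-plugin tool that ships in the same bin directory (the --group option filters by plugin type):

PS D:\lihua\ELK\logstash-7.15.1-windows-x86_64\logstash-7.15.1\bin> .\logstash-plugin list
PS D:\lihua\ELK\logstash-7.15.1-windows-x86_64\logstash-7.15.1\bin> .\logstash-plugin list --group input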
(5) filter plugin
A Logstash plugin that performs intermediate processing on events. Typically, filters act on the event data after it has been ingested by an input, mutating, enriching, and/or modifying it according to configured rules. Filters are often applied conditionally depending on characteristics of the event, as the sketch below shows. Popular filter plugins include grok, mutate, drop, clone, and geoip. The filter stage is optional.
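A minimal sketch of conditional filtering (the type value "test" follows the input example above; the DEBUG check is made up for illustration):

filter {
  # only parse events produced by the input that set type => "test"
  if [type] == "test" {
    grok { match => { "message" => "%{LOGLEVEL:loglevel}" } }
  }
  # discard debug noise entirely
  if [loglevel] == "DEBUG" {
    drop { }
  }
}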
Logstash provides the following filter plugins:
Official documentation
| Plugin | Description |
| --- | --- |
| age | Calculates the age of an event by subtracting the event timestamp from the current timestamp |
| aggregate | Aggregates information from several events originating with a single task |
| alter | Performs general alterations to fields that the mutate filter does not handle |
| bytes | Parses string representations of computer storage sizes, such as "123 MB" or "5.6gb", into their numeric value in bytes |
| cidr | Checks IP addresses against a list of network blocks |
| cipher | Applies or removes a cipher to an event |
| clone | Duplicates events |
| csv | Parses comma-separated value data into individual fields |
| date | Parses dates from fields to use as the Logstash timestamp for an event |
| de_dot | Computationally expensive filter that removes dots from a field name |
| dissect | Extracts unstructured event data into fields using delimiters |
| dns | Performs a standard or reverse DNS lookup |
| drop | Drops all events |
| elapsed | Calculates the elapsed time between a pair of events |
| elasticsearch | Copies fields from previous log events in Elasticsearch to current events |
| environment | Stores environment variables as metadata sub-fields |
| extractnumbers | Extracts numbers from a string |
| fingerprint | Fingerprints fields by replacing values with a consistent hash |
| geoip | Adds geographical information about an IP address |
| grok | Parses unstructured event data into fields |
| http | Provides integration with external web services/REST APIs |
| i18n | Removes special characters from a field |
| java_uuid | Generates a UUID and adds it to each processed event |
| jdbc_static | Enriches events with data pre-loaded from a remote database |
| jdbc_streaming | Enriches events with your database data |
| json | Parses JSON events |
| json_encode | Serializes a field to JSON |
| kv | Parses key-value pairs |
| memcached | Provides integration with external data in Memcached |
| metricize | Takes complex events containing a number of metrics and splits these up into multiple events, each holding a single metric |
| metrics | Aggregates metrics |
| mutate | Performs mutations on fields |
| prune | Prunes event data based on a list of fields to blacklist or whitelist |
| range | Checks that specified fields stay within given size or length limits |
| ruby | Executes arbitrary Ruby code |
| sleep | Sleeps for a specified time span |
| split | Splits multi-line messages, strings, or arrays into distinct events |
| syslog_pri | Parses the PRI (priority) field of a syslog message |
| threats_classifier | Enriches security logs with information about the attacker's intent |
| throttle | Throttles the number of events |
| tld | Replaces the contents of the default message field with whatever you specify in the configuration |
| translate | Replaces field contents based on a hash or YAML file |
| truncate | Truncates fields longer than a given length |
| urldecode | Decodes URL-encoded fields |
| useragent | Parses user agent strings into fields |
| uuid | Adds a UUID to events |
| wurfl_device_detection | Enriches logs with device information such as brand, model, OS |
| xml | Parses XML into fields |
(6) output plugin
A Logstash plugin that writes event data to a particular destination. Outputs are the final stage of the event pipeline. Popular output plugins include elasticsearch, file, graphite, and statsd.
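Since the output block also supports conditionals, events can be routed to different destinations. A minimal sketch (the error log path is made up for illustration; the host is the one used elsewhere in this article):

output {
  if [loglevel] == "ERROR" {
    # additionally keep errors in a local file
    file { path => "D:/lihua/logs/errors.log" }
  } else {
    elasticsearch { hosts => ["192.168.1.83:9200"] }
  }
}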
Logstash provides the following output plugins:
Official documentation
| Plugin | Description |
| --- | --- |
| app_search | Sends events to the Elastic App Search solution |
| boundary | Sends annotations to Boundary based on Logstash events |
| circonus | Sends annotations to Circonus based on Logstash events |
| cloudwatch | Aggregates and sends metric data to AWS CloudWatch |
| csv | Writes events to disk in a delimited format |
| datadog | Sends events to DataDogHQ based on Logstash events |
| datadog_metrics | Sends metrics to DataDogHQ based on Logstash events |
| dynatrace | Sends events to Dynatrace based on Logstash events |
| elastic_app_search | Sends events to the Elastic App Search solution |
| elastic_workplace_search | Sends events to the Elastic Workplace Search solution |
| elasticsearch | Stores logs in Elasticsearch |
| email | Sends email to a specified address when output is received |
| exec | Runs a command for a matching event |
| file | Writes events to files on disk |
| ganglia | Writes metrics to Ganglia's gmond |
| gelf | Generates GELF formatted output for Graylog2 |
| google_bigquery | Writes events to Google BigQuery |
| google_cloud_storage | Uploads log events to Google Cloud Storage |
| google_pubsub | Uploads log events to Google Cloud Pubsub |
| graphite | Writes metrics to Graphite |
| graphtastic | Sends metric data on Windows |
| http | Sends events to a generic HTTP or HTTPS endpoint |
| influxdb | Writes metrics to InfluxDB |
| irc | Writes events to IRC |
| java_stdout | Prints events to the STDOUT of the shell |
| juggernaut | Pushes messages to the Juggernaut websockets server |
| kafka | Writes events to a Kafka topic |
| librato | Sends metrics, annotations, and alerts to Librato based on Logstash events |
| loggly | Ships logs to Loggly |
| lumberjack | Sends events using the lumberjack protocol |
| metriccatcher | Writes metrics to MetricCatcher |
| mongodb | Writes events to MongoDB |
| nagios | Sends passive check results to Nagios |
| nagios_nsca | Sends passive check results to Nagios using the NSCA protocol |
| opentsdb | Writes metrics to OpenTSDB |
| pagerduty | Sends notifications based on preconfigured services and escalation policies |
| pipe | Pipes events to another program's standard input |
| rabbitmq | Pushes events to a RabbitMQ exchange |
| redis | Sends events to a Redis queue using the RPUSH command |
| redmine | Creates tickets using the Redmine API |
| riak | Writes events to the Riak distributed key/value store |
| riemann | Sends metrics to Riemann |
| s3 | Sends Logstash events to the Amazon Simple Storage Service |
| sink | Discards any events received |
| sns | Sends events to Amazon's Simple Notification Service |
| solr_http | Stores and indexes logs in Solr |
| sqs | Pushes events to an Amazon Web Services Simple Queue Service queue |
| statsd | Sends metrics using the statsd network daemon |
| stdout | Prints events to the standard output |
| stomp | Writes events using the STOMP protocol |
| syslog | Sends events to a syslog server |
| tcp | Writes events over a TCP socket |
| timber | Sends events to the Timber.io logging service |
| udp | Sends events over UDP |
| webhdfs | Sends Logstash events to HDFS using the webhdfs REST API |
| websocket | Publishes messages to a websocket |
| workplace_search | Sends events to the Elastic Workplace Search solution |
| xmpp | Posts events over XMPP |
| zabbix | Sends events to a Zabbix server |
(7) Other terms
For the remaining terms, see the official documentation.
2. A complete Logstash configuration example
Note: strip the comments before using this config file; in my setup it failed to load with them in place (non-ASCII characters in comments are a likely culprit). The comments below are only for explanation.
# input section
input {
  # file input plugin: the data source is a file, here .log log files
  file {
    # File paths must be absolute, not relative. One detail: if you need to
    # exclude files, you must also supply a directory path. This option is required.
    path => ["D:/lihua/javacode/jmqtt/iot-jmqtt/code/jmqttlogs/*.log","D:/lihua/javacode/jmqtt/iot-jmqtt/code/jmqttlogs/"]
    # files to exclude; works together with path
    exclude => ["brokerLog.log","remotingLog.log"]
    # type is just a field, not required; its value has no special meaning and can be set freely
    type => "test"
  }
}
# filter section: processes (parses) the data coming from input
filter {
  # the log-parsing plugin; its usage is covered in detail later
  grok {
    # the match rule; it can be built with the official online Grok debugger
    match => { "message" => "(?<timestamp>%{TIMESTAMP_ISO8601}) \[%{LOGLEVEL:loglevel}\] (?<logger>[A-Za-z0-9$_.]+) – %{GREEDYDATA:messagebody}$" }
  }
  # json parsing plugin: finds and parses JSON embedded in the log
  json {
    # which field to parse; after parsing, the JSON's properties become fields
    source => "messagebody"
  }
  # data transformation plugin, commonly used to transform field values, e.g. lowercasing
  mutate {
    # lowercase the given fields; note: Elasticsearch index names must not contain uppercase letters
    lowercase => [ "logger","loglevel" ]
    # remove unneeded fields; the counterpart of add_field, and like it, most plugins offer both options
    remove_field => ["path","timestamp"]
  }
}
# output section
output {
  # elasticsearch output plugin: stores the logs in Elasticsearch
  elasticsearch {
    # the Elasticsearch address
    hosts => ["192.168.1.83:9200"]
    # the target index; it is created if it does not exist. Index names must not contain uppercase letters.
    index => "jmqttlogs-%{type}-%{logger}-%{loglevel}-%{+YYYY.MM}"
  }
  # console output plugin: with this configured, the running Logstash console prints debug output
  stdout { codec => rubydebug }
}
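To see the whole pipeline in action, consider a hypothetical log line that the grok pattern above would match (the line and its values are made up for illustration):

2021-11-01 10:15:30.123 [INFO] org.jmqtt.broker.BrokerStartup – {"deviceId":"dev01"}

After grok, json, and mutate run, the event would carry roughly these fields: loglevel => "info" and logger => "org.jmqtt.broker.brokerstartup" (lowercased), messagebody => the raw JSON string, deviceId => "dev01" (lifted out of the JSON by the json filter), with path and timestamp removed. The elasticsearch output would then write it to an index named something like jmqttlogs-test-org.jmqtt.broker.brokerstartup-info-2021.11; note that the %{+YYYY.MM} part is rendered from the event's @timestamp, i.e. the ingestion time, since no date filter is configured here.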