Logstash Data Types and Basic Syntax

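This article covers the data types Logstash supports and its field reference syntax, and explains how to use conditionals to control the behavior of filter and output plugins.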


From: http://www.ttlsa.com/elk/elk-logstash-configuration-syntax/

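Logstash supports the following data types:

array
An array can hold a single string value or multiple string values.
path => [ "/var/log/messages", "/var/log/*.log" ]
path => "/data/mysql/mysql.log"
If the same setting is specified more than once, the values are appended to the array. In this example, the path array contains three string elements.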
boolean
A boolean must be either true or false. The true and false keywords are not enclosed in quotes.
ssl_enable => true
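bytes
A string that represents a number of bytes. Both SI units (k M G T P E Z Y) and binary units (Ki Mi Gi Ti Pi Ei Zi Yi) are supported. Binary units are base-1024 and SI units are base-1000. The value is case-insensitive, and whitespace between the number and the unit is accepted. If no unit is given, the value is a number of bytes.
my_bytes => "1113" # 1113 bytes
my_bytes => "10MiB" # 10485760 bytes
my_bytes => "100kib" # 102400 bytes
my_bytes => "180 mb" # 180000000 bytes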
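codec
The name of a Logstash codec, which describes how the data is encoded. Codecs can be used in both input and output sections. With a suitable codec on the input or output, no separate filter is needed to decode or encode the data.
codec => "json"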
hash
A collection of key-value pairs. Note that multiple key-value pairs are separated by whitespace, not commas.
match => {
  "field1" => "value1"
  "field2" => "value2"
  ...
}
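number
A valid numeric value, either floating point or integer.
port => 33
password
A string with a single value that is not logged or printed.
my_password => "password"
path
A string that represents a valid operating system path.
my_path => "/tmp/logstash"
string
A sequence of characters enclosed in double or single quotes. A quote of the same kind as the delimiter must be escaped with a backslash.
name => "Hello world"
name => 'It\'s a beautiful day'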
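
As a rough illustration of how several of these types appear together in one plugin block, the sketch below configures a file input. The paths and the source_type and env values are placeholders chosen for this example, not values from the original article.

```conf
input {
  file {
    # array: both patterns are watched
    path => [ "/var/log/messages", "/var/log/*.log" ]
    # string: one of the values the file input accepts
    start_position => "beginning"
    # number: how often (in seconds) to look for new files
    discover_interval => 15
    # codec: how incoming lines are decoded
    codec => "plain"
    # hash: pairs are separated by whitespace, not commas
    add_field => {
      "source_type" => "syslog"
      "env" => "test"
    }
  }
}
```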

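Field references

To use the value of a field in a Logstash configuration, write the field name in square brackets []; this is called a field reference. Pay attention to the field hierarchy. For a top-level field, the brackets can be omitted where a plugin option or sprintf expression expects a field name, although conditionals always require the bracket form. To refer to a nested field, specify the full path, for example [top-level field][nested field].

The following event has five top-level fields (agent, ip, request, response, ua) and three nested fields (status, bytes, os).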

{
  "agent": "Mozilla/5.0 (compatible; MSIE 9.0)",
  "ip": "192.168.24.44",
  "request": "/index.html",
  "response": {
    "status": 200,
    "bytes": 52353
  },
  "ua": {
    "os": "Windows 7"
  }
}

To reference the os field, specify [ua][os]. To reference a top-level field such as request, the field name alone is enough.
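
As a small sketch of how these references look inside a plugin, the mutate filter below operates on the sample event above. The target name client_os is made up for illustration.

```conf
filter {
  mutate {
    # nested field: the full path in brackets is required
    rename => { "[ua][os]" => "client_os" }
    # top-level field: the bare name is accepted here
    uppercase => [ "agent" ]
  }
}
```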

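sprintf format

The field reference syntax also works inside what Logstash calls the sprintf format, which lets you embed field values in other strings. For example: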

output {
  statsd {
    increment => "apache.%{[response][status]}"
  }
}

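Timestamps can be formatted as well: %{+FORMAT} renders the event's @timestamp using the given date pattern. For example:

output {
  file {
    path => "/var/log/%{type}.%{+yyyy.MM.dd.HH}"
  }
}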
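
A common place to combine both forms is an index name. The sketch below is one assumed setup, with a placeholder host and an index naming scheme invented for this example.

```conf
output {
  elasticsearch {
    # placeholder address, adjust for your cluster
    hosts => [ "localhost:9200" ]
    # field value plus the event's @timestamp: one index per type per day
    index => "%{type}-%{+yyyy.MM.dd}"
  }
}
```

If an event has no type field, the %{type} reference is left in the string unexpanded, so it is worth making sure the field is always set before relying on it in a name.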

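Conditionals

Use conditionals to have a filter or output process only certain events.

Logstash conditionals look and behave like those in programming languages. They support if, else if, and else statements and can be nested.

The syntax is: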

if EXPRESSION {
  ...
} else if EXPRESSION {
  ...
} else {
  ...
}

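The comparison operators are:

Equality and ordering: ==, !=, <, >, <=, >=
Regular expression: =~ (matches), !~ (does not match)
Inclusion: in, not in

The boolean operators are:

and, or, nand (not and), xor (exclusive or)

The unary operators are:

! (negation)
() (groups a compound expression), !() (negates the result of a compound expression)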
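
To make the syntax concrete, here is a minimal sketch that tags events shaped like the sample shown earlier according to their HTTP status. The tag names are arbitrary choices for this example, and the comparison assumes [response][status] holds a number.

```conf
filter {
  if [response][status] >= 500 {
    mutate { add_tag => [ "server_error" ] }
  } else if [response][status] >= 400 and [request] =~ /\.html$/ {
    # a client error on a request that ends in .html
    mutate { add_tag => [ "missing_page" ] }
  } else {
    mutate { add_tag => [ "ok" ] }
  }
}
```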

For example, the following mutate filter removes the secret field from events whose action field is login:

filter {
  if [action] == "login" {
    mutate { remove_field => [ "secret" ] }
  }
}

Multiple expressions can be combined in a single condition:

output {
  # Send production errors to pagerduty
  if [loglevel] == "ERROR" and [deployment] == "production" {
    pagerduty {
    ...
    }
  }
}

The in operator tests whether a value is contained in a field, a string, or a list:

filter {
  if [foo] in [foobar] {
    mutate { add_tag => "field in field" }
  }
  if [foo] in "foo" {
    mutate { add_tag => "field in string" }
  }
  if "hello" in [greeting] {
    mutate { add_tag => "string in field" }
  }
  if [foo] in ["hello", "world", "foo"] {
    mutate { add_tag => "field in list" }
  }
  if [missing] in [alsomissing] {
    mutate { add_tag => "shouldnotexist" }
  }
  if !("foo" in ["hello", "world"]) {
    mutate { add_tag => "shouldexist" }
  }
}

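The not in operator works the same way. For example, the following sends events to elasticsearch only when grok parsing succeeded:

output {
  if "_grokparsefailure" not in [tags] {
    elasticsearch { ... }
  }
}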

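Field references, the sprintf format, and conditionals depend on events and their fields, so they work only in filter and output blocks, not in input blocks.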
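
One way to work within this restriction, sketched below under assumed paths, is to set a static type on each input and branch on it later in the filter stage. The log path and the type value nginx are placeholders.

```conf
input {
  file {
    path => "/var/log/nginx/access.log"
    # a static value; no field reference is involved
    type => "nginx"
  }
}

filter {
  # the event exists by now, so conditionals are allowed
  if [type] == "nginx" {
    grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  }
}
```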

The @metadata field

Since Logstash 1.5 there is a special field called @metadata. Its contents are not part of the event at output time.

input { stdin { } }

filter {
  mutate { add_field => { "show" => "This data will be in the output" } }
  mutate { add_field => { "[@metadata][test]" => "Hello" } }
  mutate { add_field => { "[@metadata][no_show]" => "This data will not be in the output" } }
}

output {
  if [@metadata][test] == "Hello" {
    stdout { codec => rubydebug }
  }
}

Running this configuration produces:

$ bin/logstash -f ../test.conf
Logstash startup completed
asdf
{
       "message" => "asdf",
      "@version" => "1",
    "@timestamp" => "2015-03-18T23:09:29.595Z",
          "host" => "www.ttlsa.com",
          "show" => "This data will be in the output"
}

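The string "asdf" becomes the content of the message field. The condition on the nested test field in @metadata evaluated to true, yet the output shows neither the @metadata field nor its contents.

However, the rubydebug codec can display the contents of @metadata if you set metadata => true:

stdout { codec => rubydebug { metadata => true } }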

The output now looks like this:

$ bin/logstash -f ../test.conf
Logstash startup completed
asdf
{
       "message" => "asdf",
      "@version" => "1",
    "@timestamp" => "2015-03-18T23:10:19.859Z",
          "host" => "www.ttlsa.com",
          "show" => "This data will be in the output",
     "@metadata" => {
           "test" => "Hello",
        "no_show" => "This data will not be in the output"
    }
}

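The @metadata field and its subfields are now visible.

Note: only the rubydebug codec can display the contents of the @metadata field.

Use @metadata whenever you need a temporary field that should not appear in the final output. Perhaps the most common case is the date filter, which needs a temporary timestamp field. For example: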

input { stdin { } }
 
filter {
  grok { match => [ "message", "%{HTTPDATE:[@metadata][timestamp]}" ] }
  date { match => [ "[@metadata][timestamp]", "dd/MMM/yyyy:HH:mm:ss Z" ] }
}
 
output {
  stdout { codec => rubydebug }
}

Output:

$ bin/logstash -f ../test.conf
Logstash startup completed
02/Mar/2014:15:36:43 +0100
{
       "message" => "02/Mar/2014:15:36:43 +0100",
      "@version" => "1",
    "@timestamp" => "2014-03-02T14:36:43.000Z",
          "host" => "example.com"
}
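
Beyond the date filter, @metadata is also handy for carrying routing decisions from the filter stage to the output stage. The following is a sketch only: the target_index name, the index patterns, the type value, and the host are assumptions made for this example.

```conf
filter {
  if [type] == "nginx" {
    mutate { add_field => { "[@metadata][target_index]" => "nginx-%{+yyyy.MM.dd}" } }
  } else {
    mutate { add_field => { "[@metadata][target_index]" => "misc-%{+yyyy.MM.dd}" } }
  }
}

output {
  elasticsearch {
    # placeholder address, adjust for your cluster
    hosts => [ "localhost:9200" ]
    # the routing value is read here but never stored in the document
    index => "%{[@metadata][target_index]}"
  }
}
```

Keeping the index name in @metadata means the indexed documents do not carry an extra bookkeeping field.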