Learning Fluentd: tail (Input Plugin)

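This article introduces Fluentd's in_tail input plugin, explains in detail how to configure it to collect logs from text files, and gives example configurations for several log formats.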


tail (Input Plugin)

http://docs.fluentd.org/articles/in_tail

tail Input Plugin

The in_tail input plugin allows Fluentd to read events from the tail of text files. Its behavior is similar to the tail -F command.

Example Configuration

in_tail is included in Fluentd’s core. No additional installation process is required.

<source>
  type tail
  path /var/log/httpd-access.log
  pos_file /var/log/td-agent/httpd-access.log.pos
  tag apache.access
  format apache2
</source>

Please see the Config File article for the basic structure and syntax of the configuration file.

How it Works

When Fluentd is first configured with in_tail, it starts reading from the tail of the log, not the beginning.
Once the log is rotated, Fluentd starts reading the new file from the beginning. It keeps track of the current inode number.
If td-agent restarts, it resumes reading from the last position it read before the restart. This position is recorded in the position file specified by the pos_file parameter. (This is why pos_file is important; it is a must-have.)
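The three rules above (start at the tail on first run, restart from the beginning after rotation, resume from the recorded position after a restart) can be sketched in a few lines of Python. This is only an illustration, not Fluentd's implementation: the function name `read_new_lines` and the one-line `inode offset` position-file layout are assumptions made for this sketch and do not match the real pos_file format.

```python
import os
import tempfile


def read_new_lines(log_path, pos_path):
    """Return the complete lines added to log_path since the previous call.

    Hypothetical helper: the position file holds "<inode> <offset>".
    """
    stat = os.stat(log_path)
    if os.path.exists(pos_path):
        with open(pos_path) as f:
            saved_inode, offset = (int(x) for x in f.read().split())
        if saved_inode != stat.st_ino or offset > stat.st_size:
            offset = 0  # rotated (new inode) or truncated: read from the beginning
    else:
        offset = stat.st_size  # first run: start at the tail, skip existing content

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partial last line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()

    with open(pos_path, "w") as f:
        f.write(f"{stat.st_ino} {offset + end}")
    return lines


def demo():
    """Walk through first run, append, and rotation in a temporary directory."""
    results = []
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "app.log")
        pos = os.path.join(d, "app.log.pos")
        with open(log, "w") as f:
            f.write("first\n")
        results.append(read_new_lines(log, pos))  # existing content is skipped
        with open(log, "a") as f:
            f.write("second\n")
        results.append(read_new_lines(log, pos))  # only the appended line
        os.rename(log, log + ".1")                # rotate: a new file gets a new inode
        with open(log, "w") as f:
            f.write("third\n")
        results.append(read_new_lines(log, pos))  # new file read from the beginning
    return results
```

Calling `demo()` exercises all three cases in order. Without the position file, every restart would look like a first run, and lines written while the process was down would be skipped.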

Parameters

type (required)

The value must be tail.

path (required)

The paths to read. Multiple paths can be specified, separated by ‘,’. (This means you can collect several log files at once without defining another source.)

tag (required)

The tag of the event.

format (required)

The format of the log. It is the name of a template or a regexp surrounded by ‘/’.

The regexp must have at least one named capture (?<NAME>PATTERN). If the regexp has a capture named ‘time’, it is used as the time of the event. You can specify the time format using the time_format parameter. If the regexp has a capture named ‘tag’, the tag parameter + the captured tag is used as the tag of the event.

The following templates are supported:

regexp

A regexp can be given directly as the value of the format parameter. Fluentular is a great website for testing regexps for Fluentd configuration.

apache2

Reads Apache’s log file for the following fields: host, user, time, method, path, code, size, referer and agent. This template is analogous to the following configuration:

format /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
time_format %d/%b/%Y:%H:%M:%S %z
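To see what the apache2 template extracts, the regexp above can be tried outside Fluentd. The following is a rough Python sketch, not Fluentd code: Python spells named captures `(?P<name>...)` rather than `(?<name>...)`, the function name `parse_apache2` and the sample log line are made up for illustration, and returning the time separately from the other fields is a choice of this sketch.

```python
import re
from datetime import datetime

# The apache2 regexp from the configuration above, with Python-style named groups.
APACHE2 = re.compile(
    r'^(?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^ ]*) +\S*)?" (?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^\"]*)" "(?P<agent>[^\"]*)")?$'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # same value as time_format above


def parse_apache2(line):
    """Return (time, record) for an access-log line, or None if it does not match."""
    m = APACHE2.match(line)
    if m is None:
        return None
    record = m.groupdict()
    # The 'time' capture becomes the event time; the rest stay as string fields.
    time = datetime.strptime(record.pop("time"), TIME_FORMAT)
    return time, record


# Hypothetical sample line in Apache combined log format.
SAMPLE = ('192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] '
          '"GET / HTTP/1.1" 200 777 "-" "Opera/12.0"')

if __name__ == "__main__":
    print(parse_apache2(SAMPLE))
```

Note that every captured field, including code and size, is a string; the regexp alone does no type conversion.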

syslog

Reads syslog’s output file (e.g. /var/log/syslog) for the following fields: time, host, ident, and message. This template is analogous to the following configuration:

format /^(?<time>[^ ]* [^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?[^\:]*\: *(?<message>.*)$/
time_format %b %d %H:%M:%S
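The syslog regexp can be checked the same way. In this Python sketch the named captures are again rewritten as `(?P<name>...)`, and `parse_syslog` and the sample line are hypothetical. Because the syslog timestamp carries no year, the sketch takes the year as an explicit argument; how Fluentd itself fills in the year is not covered here.

```python
import re
from datetime import datetime

# The syslog regexp from the configuration above, with Python-style named groups.
SYSLOG = re.compile(
    r'^(?P<time>[^ ]* [^ ]* [^ ]*) (?P<host>[^ ]*) (?P<ident>[a-zA-Z0-9_\/\.\-]*)'
    r'(?:\[(?P<pid>[0-9]+)\])?[^\:]*\: *(?P<message>.*)$'
)


def parse_syslog(line, year):
    """Return (time, record) for a syslog line, or None if it does not match.

    The year is supplied by the caller (an assumption of this sketch),
    since the '%b %d %H:%M:%S' timestamp does not include one.
    """
    m = SYSLOG.match(line)
    if m is None:
        return None
    record = m.groupdict()
    time = datetime.strptime(f"{year} {record.pop('time')}", "%Y %b %d %H:%M:%S")
    return time, record


# Hypothetical sample line.
SAMPLE = "Feb 28 12:00:00 192.168.0.1 fluentd[11111]: [error] Syslog test"

if __name__ == "__main__":
    print(parse_syslog(SAMPLE, 2013))
```

The regexp also captures an optional pid in square brackets after the ident, in addition to the four fields listed above.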

tsv or csv

If you use the tsv or csv format, please also specify the keys parameter.

format tsv
keys key1, key2, key3
time_key key2

If you specify the time_key parameter, the value of that field is used as the timestamp of the record. By default, the time at which Fluentd reads the record is used.

format csv
keys key1, key2, key3
time_key key3
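As a rough illustration of how keys and time_key relate to a line of input, here is a small Python sketch. It is not how Fluentd is implemented: the function name `parse_values` is hypothetical, the time field is assumed to hold Unix epoch seconds, and the sketch leaves that field in the record.

```python
import csv
from datetime import datetime, timezone


def parse_values(line, keys, delimiter, time_key=None):
    """Map delimited values onto keys; optionally read the event time from time_key.

    Assumption for this sketch: the time field contains Unix epoch seconds.
    Returns (time, record); time is None when no time_key is given.
    """
    values = next(csv.reader([line], delimiter=delimiter))
    record = dict(zip(keys, values))
    time = None
    if time_key is not None:
        time = datetime.fromtimestamp(int(record[time_key]), tz=timezone.utc)
    return time, record


KEYS = ["key1", "key2", "key3"]

if __name__ == "__main__":
    # tsv with time_key key2, then csv with time_key key3, as in the configs above
    print(parse_values("foo\t1362009600\tbar", KEYS, "\t", time_key="key2"))
    print(parse_values("foo,bar,1362009600", KEYS, ",", time_key="key3"))
```

With `time_key=None` the caller would stamp the record with the current time instead, which mirrors the default described above.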

json

One JSON map per line. This is the most straightforward format :).

format json

The time_key parameter can also be specified.

format json
time_key key3

pos_file (highly recommended)

This parameter is highly recommended. Fluentd will record the position it last read into this file.

pos_file /var/log/td-agent/tmp/access.log.pos

time_format

The format of the time field. This parameter is required only if the format includes a ‘time’ capture and it cannot be parsed automatically. Please see Time#strftime for additional information.

rotate_wait

in_tail actually does a bit more than tail -F itself. When a file is rotated, some data may still need to be written to the old file rather than the new one.

in_tail takes care of this by keeping a reference to the old file (even after it has been rotated) for some time before transitioning completely to the new file. This helps prevent data destined for the old file from getting lost. By default, this time interval is 5 seconds.

The rotate_wait parameter accepts a single integer: the number of seconds this time interval should last.
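Why holding on to the old file helps can be shown with a short Python sketch. It assumes a POSIX filesystem, where an open file handle stays valid after the file is renamed; the function name `demo_rotation` is hypothetical and the code only imitates the situation, it is not taken from in_tail.

```python
import os
import tempfile


def demo_rotation():
    """Show that a handle opened before rotation still sees late writes (POSIX)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "app.log")
        writer = open(path, "ab")      # the application writing the log
        reader = open(path, "rb")      # the tailer's handle on the same file
        os.rename(path, path + ".1")   # rotation: the old file is renamed
        open(path, "ab").close()       # a new, empty file appears at the old path
        writer.write(b"late line\n")   # the application has not reopened its log yet
        writer.flush()
        late = reader.read()           # the old handle still receives the late data
        with open(path, "rb") as f:
            new = f.read()             # the new file has not received anything
        writer.close()
        reader.close()
        return late, new


if __name__ == "__main__":
    print(demo_rotation())
```

A tailer that switched to the new path immediately would never see the late line. Keeping the old handle open for rotate_wait seconds gives the writer time to finish and reopen its log.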

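About regular expressions: below is the regexp I use on my own machine to match the records collected from a .log file.

In the client-side Fluentd configuration file, fluent.conf:

.log data (source data):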
[2013-03-29 07:21:55.483292] router - pid=14615 tid=7a93 fid=5354 DEBUG -- Request body: {"host":"api.vcap.me","stats":[{"response_latency":0,"request_tags":"BAh7BjoOY29tcG9uZW50SSIUQ2xvdWRDb250cm9sbGVyBjoGRVQ=","response_codes":{"responses_2xx":2},"response_samples":2}]}

Matching regexp:

format /\[(?<time>.*)\] (?<name>[^ ]*) - (?<pid>[^ ]*) (?<tid>[^ ]*) (?<fid>[^ ]*) (?<level>[^ ]*) -- (?<info>[^ ].*)$/
time_format %Y-%m-%d %H:%M:%S

Query result from the MongoDB collection:

{ "_id" : ObjectId("516d31c415bb53374d000004"), "name" : "router", "pid" : "pid=14615", "tid" : "tid=7a93", "fid" : "fid=5354", "level" : "DEBUG", "info" : "Request body: {\"host\":\"api.vcap.me\",\"stats\":[{\"response_latency\":0,\"request_tags\":\"BAh7BjoOY29tcG9uZW50SSIUQ2xvdWRDb250cm9sbGVyBjoGRVQ=\",\"response_codes\":{\"responses_2xx\":2},\"response_samples\":2}]}", "time" : ISODate("2013-04-23T11:21:55Z") }
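The custom regexp can be checked against the log line without running Fluentd. The Python sketch below rewrites the named captures as `(?P<name>...)` and shortens the request body of the sample line; `parse_router_line` is a hypothetical name. Python's strptime rejects trailing characters, so the sketch adds `%f` for the fractional seconds, which differs from the time_format used in the configuration above.

```python
import re
from datetime import datetime

# The custom regexp from fluent.conf above, with Python-style named groups.
PATTERN = re.compile(
    r'\[(?P<time>.*)\] (?P<name>[^ ]*) - (?P<pid>[^ ]*) (?P<tid>[^ ]*) '
    r'(?P<fid>[^ ]*) (?P<level>[^ ]*) -- (?P<info>[^ ].*)$'
)

# Sample line from above, with the request body shortened for this sketch.
LINE = ('[2013-03-29 07:21:55.483292] router - pid=14615 tid=7a93 fid=5354 '
        'DEBUG -- Request body: {"host":"api.vcap.me","stats":[]}')


def parse_router_line(line):
    """Return (time, fields) for a router log line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # %f is added here because Python does not ignore the fractional seconds.
    time = datetime.strptime(fields.pop("time"), "%Y-%m-%d %H:%M:%S.%f")
    return time, fields


if __name__ == "__main__":
    print(parse_router_line(LINE))
```

The captured fields line up with the keys in the MongoDB document: pid, tid and fid keep their `pid=`, `tid=` and `fid=` prefixes because the captures take everything up to the next space. The greedy `.*` in the time capture still stops at the first `]` here, since only that bracket is followed by a space and the rest of the pattern; `[^\]]*` would express the same intent more directly.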