[Filebeat 6.1] Configuring Filebeat » Set up prospectors (Part 1): config file walkthrough

Preface:
The official docs are a bit of a slog to read, so I'm translating them here as personal notes.
https://www.elastic.co/guide/en/beats/filebeat/6.1/configuration-filebeat-modules.html

Set up prospectors

Filebeat modules provide the fastest getting started experience for common log formats. See Quick start for common log formats to learn how to get started with modules. Also see Specify which modules to run for information about enabling and configuring modules.

Filebeat uses prospectors to locate and process files. To configure Filebeat, you specify a list of prospectors in the filebeat.prospectors section of the filebeat.yml config file.

Each item in the list begins with a dash (-) and specifies prospector-specific configuration options, including the list of paths that are crawled to locate the files.

Here is a sample configuration:

filebeat.prospectors:
- type: log
  paths:
    - /var/log/apache/httpd-*.log

- type: log
  paths:
    - /var/log/messages
    - /var/log/*.log

Configuration options

type
  • log: Reads every line of the log file (default).
  • stdin: Reads standard input.
  • redis: Reads slow log entries from redis (experimental).
  • udp: Reads events over UDP. Also see max_message_size.
  • docker: Reads logs from Docker. Also see containers (experimental).

The value that you specify here is used as the type for each event published to Logstash and Elasticsearch.
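
For example, a minimal sketch of a UDP prospector; the listen address and size limit are illustrative values, not something this post specifies:

filebeat.prospectors:
- type: udp
  host: "localhost:8080"     # address and port to listen on (illustrative)
  max_message_size: 10KiB    # upper bound on a single UDP message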

paths

A list of glob-based paths that should be crawled and fetched. All patterns supported by Golang Glob are also supported here. For example, to fetch all files from a predefined level of subdirectories, the following pattern can be used: /var/log/*/*.log. This fetches all .log files from the subfolders of /var/log. It does not fetch log files from the /var/log folder itself. It is possible to recursively fetch all files in all subdirectories of a directory using the optional recursive_glob settings.

Filebeat starts a harvester for each file that it finds under the specified paths. You can specify one path per line. Each line begins with a dash (-).

recursive_glob.enabled

Enable expanding ** into recursive glob patterns. With this feature enabled, the rightmost ** in each path is expanded into a fixed number of glob patterns. For example: /foo/** expands to /foo, /foo/*, /foo/*/*, and so on. If enabled, it expands a single ** into an 8-level-deep * pattern.

This feature is enabled by default; set recursive_glob.enabled to false to disable it.
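
For example, a minimal sketch (the path is hypothetical) that picks up .log files at any depth under /var/log:

filebeat.prospectors:
- type: log
  paths:
    - /var/log/**/*.log        # ** expands to /var/log, /var/log/*, ..., 8 levels deep
  recursive_glob.enabled: true # the default; set to false to disable ** expansion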

encoding

The file encoding to use for reading files that contain international characters. See the encoding names recommended by the W3C for use in HTML5.

Here are some sample encodings from the W3C recommendation:

  plain, latin1, utf-8, utf-16be-bom, utf-16be, utf-16le, big5, gb18030, gbk, hz-gb-2312, euc-kr, euc-jp, iso-2022-jp, shift-jis, and so on

The plain encoding is special, because it does not validate or transform any input.
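
For example, a minimal sketch (the path is hypothetical) for reading logs written in GB18030:

filebeat.prospectors:
- type: log
  paths:
    - /var/log/myapp/*.log
  encoding: gb18030   # decode each line as GB18030 before processing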

exclude_lines

A list of regular expressions to match the lines that you want Filebeat to exclude. Filebeat drops any lines that match a regular expression in the list. By default, no lines are dropped.

If multiline is also specified, each multiline message is combined into a single line before the lines are filtered by exclude_lines.

The following example configures Filebeat to drop any lines that start with "DBG":

filebeat.prospectors:
- paths:
    - /var/log/myapp/*.log
  exclude_lines: ['^DBG']

See Regular expression support for a list of supported regexp patterns.

include_lines

A list of regular expressions to match the lines that you want Filebeat to include. Filebeat exports only the lines that match a regular expression in the list. By default, all lines are exported.

If multiline is also specified, each multiline message is combined into a single line before the lines are filtered by include_lines.

The following example configures Filebeat to export any lines that start with “ERR” or “WARN”:

filebeat.prospectors:
- paths:
    - /var/log/myapp/*.log
  include_lines: ['^ERR', '^WARN']

If both include_lines and exclude_lines are defined, Filebeat executes include_lines first and then executes exclude_lines. The order in which the two options are defined doesn't matter. The include_lines option will always be executed before the exclude_lines option, even if exclude_lines appears before include_lines in the config file.
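
For example, a minimal sketch (path and patterns are hypothetical) that keeps only ERR and WARN lines, then drops the ones mentioning a noisy subsystem:

filebeat.prospectors:
- paths:
    - /var/log/myapp/*.log
  include_lines: ['^ERR', '^WARN']   # applied first, regardless of option order
  exclude_lines: ['heartbeat']       # applied second, to the lines kept above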

exclude_files

A list of regular expressions to match the files that you want Filebeat to ignore. By default no files are excluded.

The following example configures Filebeat to ignore all the files that have a gz extension:

exclude_files: ['.gz$']

See Regular expression support for a list of supported regexp patterns.

tags

A list of tags that the Beat includes in the tags field of each published event. Tags make it easy to select specific events in Kibana or apply conditional filtering in Logstash. These tags will be appended to the list of tags specified in the general configuration.

Example:

filebeat.prospectors:
- paths: ["/var/log/app/*.json"]
  tags: ["json"]

fields

Optional fields that you can specify to add additional information to the output. For example, you might add fields that you can use for filtering log data. Fields can be scalar values, arrays, dictionaries, or any nested combination of these. By default, the fields that you specify here will be grouped under a fields sub-dictionary in the output document. To store the custom fields as top-level fields, set the fields_under_root option to true. If a duplicate field is declared in the general configuration, then its value will be overwritten by the value declared here.

filebeat.prospectors:
- paths: ["/var/log/app/*.log"]
  fields:
    app_id: query_engine_12

fields_under_root

If this option is set to true, the custom fields are stored as top-level fields in the output document instead of being grouped under a fields sub-dictionary. If the custom field names conflict with other field names added by Filebeat, then the custom fields overwrite the other fields.

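For example, a minimal sketch (the app_id value is hypothetical) that stores the custom field at the top level of each event:

filebeat.prospectors:
- paths: ["/var/log/app/*.log"]
  fields:
    app_id: query_engine_12
  fields_under_root: true   # events get a top-level app_id instead of fields.app_id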

processors

A list of processors to apply to the data generated by the prospector.

See Filter and enhance the exported data for information about specifying processors in your config.
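
For example, a minimal sketch (the field name is hypothetical) using the drop_fields processor to strip a field from every event this prospector produces:

filebeat.prospectors:
- type: log
  paths:
    - /var/log/myapp/*.log
  processors:
    - drop_fields:
        fields: ["debug_info"]   # remove this field before the event is published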

ignore_older

If this option is enabled, Filebeat ignores any files that were modified before the specified timespan. Configuring ignore_older can be especially useful if you keep log files for a long time. For example, if you want to start Filebeat, but only want to send the newest files and files from last week, you can configure this option.

You can use time strings like 2h (2 hours) and 5m (5 minutes). The default is 0, which disables the setting. Commenting out the config has the same effect as setting it to 0.

You must set ignore_older to be greater than close_inactive.

The files affected by this setting fall into two categories:

  • Files that were never harvested
  • Files that were harvested but weren't updated for longer than ignore_older

For files which were never seen before, the offset state is set to the end of the file. If a state already exist, the offset is not changed. In case a file is updated again later, reading continues at the set offset position.

The ignore_older setting relies on the modification time of the file to determine if a file is ignored. If the modification time of the file is not updated when lines are written to a file (which can happen on Windows), the ignore_older setting may cause Filebeat to ignore files even though content was added at a later time.

To remove the state of previously harvested files from the registry file, use the clean_inactive configuration option.

Before a file can be ignored by the prospector, it must be closed. To ensure a file is no longer being harvested when it is ignored, you must set ignore_older to a longer duration than close_inactive.

If a file that’s currently being harvested falls under ignore_older, the harvester will first finish reading the file and close it after close_inactive is reached. Then, after that, the file will be ignored.
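
For example, a minimal sketch (path and durations are hypothetical) that ignores files older than 24 hours while honoring the rule that ignore_older must be greater than close_inactive:

filebeat.prospectors:
- type: log
  paths:
    - /var/log/myapp/*.log
  ignore_older: 24h    # skip files whose modification time is older than 24 hours
  close_inactive: 5m   # must stay below ignore_older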

close_*

The close_* configuration options are used to close the harvester after a certain criteria or time. Closing the harvester means closing the file handler. If a file is updated after the harvester is closed, the file will be picked up again after scan_frequency has elapsed. However, if the file is moved or deleted while the harvester is closed, Filebeat will not be able to pick up the file again, and any data that the harvester hasn’t read will be lost.

close_inactive

When this option is enabled, Filebeat closes the file handle if a file has not been harvested for the specified duration. The counter for the defined period starts when the last log line was read by the harvester. It is not based on the modification time of the file. If the closed file changes again, a new harvester is started and the latest changes will be picked up after scan_frequency has elapsed.

We recommended that you set close_inactive to a value that is larger than the least frequent updates to your log files. For example, if your log files get updated every few seconds, you can safely set close_inactive to 1m. If there are log files with very different update rates, you can use multiple prospector configurations with different values.

Setting close_inactive to a lower value means that file handles are closed sooner. However this has the side effect that new log lines are not sent in near real time if the harvester is closed.

The timestamp for closing a file does not depend on the modification time of the file. Instead, Filebeat uses an internal timestamp that reflects when the file was last harvested. For example, if close_inactive is set to 5 minutes, the countdown for the 5 minutes starts after the harvester reads the last line of the file.

You can use time strings like 2h (2 hours) and 5m (5 minutes). The default is 5m.
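
For example, a minimal sketch (paths and durations are hypothetical) that uses separate prospectors for logs with very different update rates:

filebeat.prospectors:
- type: log
  paths:
    - /var/log/busy-app/*.log
  close_inactive: 1m    # frequently written logs: release file handles quickly
- type: log
  paths:
    - /var/log/quiet-app/*.log
  close_inactive: 30m   # rarely written logs: keep handles open longer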

close_renamed

Only use this option if you understand that data loss is a potential side effect.

When this option is enabled, Filebeat closes the file handler when a file is renamed. This happens, for example, when rotating files. By default, the harvester stays open and keeps reading the file because the file handler does not depend on the file name. If the close_renamed option is enabled and the file is renamed or moved in such a way that it’s no longer matched by the file patterns specified for the prospector, the file will not be picked up again. Filebeat will not finish reading the file.

WINDOWS: If your Windows log rotation system shows errors because it can’t rotate the files, you should enable this option.
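
For example, a minimal sketch (the path is hypothetical) for a Windows rotation scheme that needs file handles released on rename:

filebeat.prospectors:
- type: log
  paths:
    - C:\logs\myapp\*.log
  close_renamed: true   # close the handle on rename so the rotator can move the file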
