Filebeat filestream input configuration

莫为善

Last modified: 2024-04-22 10:01:07

First published: 2022-12-18 15:10:18

Original: https://www.elastic.co/guide/en/beats/filebeat/8.5/filebeat-input-filestream.html

Introduction

Use the filestream input to read lines from active log files.

It is the new, improved alternative to the log input.

It comes with various improvements to the existing log input:

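① Checking of close_* options happens out of band. Therefore, if an output is blocked, Filebeat can close the reader and avoid keeping too many files open.

② Detailed metrics are available for all files that match the paths configuration, regardless of the harvester_limit. This way, you can keep track of all files, even ones that are not actively read.

③ The order of parsers is configurable. So it is possible to parse JSON lines and then aggregate the contents into a multiline event.

④ Some position updates and metadata changes no longer depend on the publishing pipeline. If the pipeline is blocked, some changes are still applied to the registry.

⑤ Only the most recent updates are serialized to the registry. In contrast, the log input has to serialize the complete registry on each ACK from the outputs. This makes registry updates much quicker with this input.

⑥ The input ensures that only offset updates are written to the registry's append-only log. The log input writes the complete file state.

⑦ Stale entries can be removed from the registry, even if there is no active input.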

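To configure this input, specify a list of glob-based paths that must be crawled to locate and fetch the log lines.

Compared with the log input, there is little difference here: you still just configure the absolute paths.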

filebeat.inputs:
- type: filestream
  id: my-filestream-id
  paths:
    - /var/log/messages
    - /var/log/*.log

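WARNING: Each filestream input must have a unique ID.

Omitting or changing the filestream ID may cause data duplication. Without a unique ID, filestream is not able to correctly track the state of files.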

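You can apply additional configuration settings (such as fields, include_lines, exclude_lines and so forth) to the lines harvested from these files. The options that you specify are applied to all the files harvested by this input.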

To apply different configuration settings to different files, you need to define multiple input sections:

filebeat.inputs:
- type: filestream  # ①
  id: my-filestream-id
  paths:
    - /var/log/system.log
    - /var/log/wifi.log
- type: filestream  # ②
  id: apache-filestream-id
  paths:
    - "/var/log/apache2/*"
  fields:
    apache: true

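① Fetches lines from two files: system.log and wifi.log.

② Fetches lines from every file in the apache2 directory, and uses the fields configuration option to add a field called apache to the output.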

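Reading files on network shares and cloud providers

WARNING: Filebeat does not support reading from network shares and cloud providers.

However, one of the limitations of these data sources can be mitigated if you configure Filebeat adequately.

By default, Filebeat identifies files based on their inodes and device IDs. However, on network shares and cloud providers these values might change during the lifetime of the file. If this happens, Filebeat thinks the file is new and resends its whole content. To solve this problem you can configure the file_identity option. Possible values besides the default (native, which uses the inode and device ID) are path and inode_marker.

WARNING: Changing file_identity between runs may result in duplicated events in the output.

Selecting path instructs Filebeat to identify files based on their paths. This is a quick way to avoid rereading files if inode and device IDs might change. However, keep in mind that if the files are rotated (renamed), they will be reread and resubmitted.

The inode_marker option can be used if the inodes stay the same even if the device ID is changed.

If your files are rotated, choose this method instead of path whenever possible. You have to configure a marker file readable by Filebeat and set its location in the path option of inode_marker.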

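The content of this file must be unique to the device. You can put the UUID of the device or the mount point where the input is stored. Please note that you should not use this option on Windows, as file identifiers might be more volatile. The following example one-liner generates a hidden marker file for the selected mount point /logs:

$ lsblk -o MOUNTPOINT,UUID | grep /logs | awk '{print $2}' >> /logs/.filebeat-marker

To set the generated file as a marker for file_identity, you should configure the input the following way: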

filebeat.inputs:
- type: filestream
  id: my-filestream-id
  paths:
    - /logs/*.log
  file_identity.inode_marker.path: /logs/.filebeat-marker

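Reading rotated logs

When dealing with file rotation, avoid harvesting symbolic links. Instead, use the paths setting to point to the original file, and specify a pattern that matches the file you want to harvest and all of its rotated files. Also make sure your log rotation strategy prevents lost or duplicate messages. For more information, see Log rotation results in lost or duplicate events.

Furthermore, to avoid duplicate rotated log messages, do not use the path method for file_identity, or exclude the rotated files with the exclude_files option.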
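A minimal sketch of this setup, assuming a hypothetical application that writes /var/log/app/app.log, rotates it to app.log.1, app.log.2 and so on, and compresses older copies to .gz (the ID and paths are illustrative, not from the original). One glob covers the active file and its rotated copies, file_identity keeps the default native method rather than path, and the compressed archives are excluded:

```yaml
filebeat.inputs:
- type: filestream
  id: app-rotated-logs                         # hypothetical ID, unique among all inputs
  paths:
    - /var/log/app/app.log*                    # hypothetical path: app.log, app.log.1, app.log.2, ...
  file_identity.native: ~                      # default inode + device ID identity, not path
  prospector.scanner.exclude_files: ['\.gz$']  # skip compressed rotated copies
```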

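Prospector options

The prospector runs a file system watcher that looks for the files specified in the paths option. At the moment only simple file system scanning is supported.

id

A unique identifier for this filestream input. Each filestream input must have a unique ID.

WARNING: Changing the input ID may cause data duplication, because the state of the files is lost and they are read from the beginning again.

paths (much the same as in the log input)

A list of glob-based paths that will be crawled and fetched. All patterns supported by Go Glob are also supported here. For example, to fetch all files from a predefined level of subdirectories, the following pattern can be used: /var/log/*/*.log.

This fetches all .log files from the subfolders of /var/log. It does not fetch log files from the /var/log folder itself. It is possible to recursively fetch all files in all subdirectories of a directory using the optional recursive_glob setting.

Filebeat starts a harvester for each file that it finds under the specified paths. You can specify one path per line. Each line begins with a dash (-).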

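Scanner options

The scanner watches the configured paths. It scans the file system periodically and returns the file system events to the prospector.

prospector.scanner.recursive_glob

Enables expanding ** into recursive glob patterns. With this feature enabled, the rightmost ** in each path is expanded into a fixed number of glob patterns. For example, /foo/** expands to /foo, /foo/*, /foo/*/*, and so on. If enabled, it expands a single ** into an 8-level deep * pattern.

This feature is enabled by default. Set prospector.scanner.recursive_glob to false to disable it.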
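As a sketch, assuming a hypothetical /var/log/myapp tree with .log files at various depths, a single ** pattern can replace a list of per-level globs. Per the description above, it expands to /var/log/myapp/*.log, /var/log/myapp/*/*.log and so on, up to 8 levels deep:

```yaml
filebeat.inputs:
- type: filestream
  id: myapp-recursive                       # hypothetical ID
  paths:
    - /var/log/myapp/**/*.log               # hypothetical directory; ** is expanded level by level
  prospector.scanner.recursive_glob: true   # the default, shown here for clarity
```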

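prospector.scanner.exclude_files

A list of regular expressions to match the files that you want Filebeat to ignore. By default no files are excluded.

The following example configures Filebeat to ignore all the files that have a gz extension:

filebeat.inputs:
- type: filestream
  ...
  prospector.scanner.exclude_files: ['\.gz$']

prospector.scanner.include_files

A list of regular expressions to match the files that you want Filebeat to include. If a list of regexes is provided, only the files that are allowed by the patterns are harvested. By default no files are excluded.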

The following example configures Filebeat to exclude files that are not under /var/log:

filebeat.inputs:
- type: filestream
  ...
  prospector.scanner.include_files: ['^/var/log/.*']

NOTE: With include_files, patterns should begin with ^ for absolute paths.

See Regular expression support for a list of supported regexp patterns.

prospector.scanner.symlinks

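The symlinks option allows Filebeat to harvest symbolic links in addition to regular files. When harvesting symlinks, Filebeat opens and reads the original file even though it reports the path of the symlink.

When you configure a symlink for harvesting, make sure the original path is excluded. If a single input is configured to harvest both the symlink and the original file, Filebeat will detect the problem and only process the first file it finds. However, if two different inputs are configured (one to read the symlink and the other the original path), both paths will be harvested, causing Filebeat to send duplicate data and the inputs to overwrite each other's state.

The symlinks option can be useful if symlinks to the log files have additional metadata in the file name, and you want to process the metadata in Logstash. This is, for example, the case for Kubernetes log files.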
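A sketch for the Kubernetes case, assuming the usual node layout where /var/log/containers/*.log are symlinks into /var/log/pods (the ID is hypothetical). Because the glob does not match the original files under /var/log/pods, the original paths stay excluded as recommended above:

```yaml
filebeat.inputs:
- type: filestream
  id: kubernetes-container-logs     # hypothetical ID
  paths:
    - /var/log/containers/*.log     # symlinks; Filebeat opens the files they point to
  prospector.scanner.symlinks: true
  parsers:
    - container: ~                  # decode the docker/CRI log line format
```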

prospector.scanner.resend_on_touch

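If this option is enabled, a file is resent if its size has not changed but its modification time has changed to a later time than before. It is disabled by default to avoid accidentally resending files.

prospector.scanner.check_interval

How often Filebeat checks for new files in the paths that are specified for harvesting. For example, if you specify a glob like /var/log/*, the directory is scanned for files using the frequency specified by check_interval. Specify 1s to scan the directory as frequently as possible without causing Filebeat to scan too frequently. We do not recommend setting this value below 1s.

The default setting is 10s.

If you require log lines to be sent in near real time, do not use a very low check_interval; instead, adjust close.on_state_change.inactive so that the file handler stays open and constantly polls your files.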
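A sketch of that advice, assuming a hypothetical input for /var/log/app/*.log: the scan interval stays at its default, while a long close.on_state_change.inactive keeps each harvester (and its file handle) open so new lines are read as they arrive. The 1h value is an illustrative choice, not a recommendation from the original:

```yaml
filebeat.inputs:
- type: filestream
  id: near-realtime-logs                   # hypothetical ID
  paths:
    - /var/log/app/*.log                   # hypothetical path
  prospector.scanner.check_interval: 10s   # default; do not go below 1s
  close.on_state_change.inactive: 1h       # keep the file handle open between bursts of writes
```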

ignore_older

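If this option is enabled, Filebeat ignores any files that were modified before the specified timespan. Configuring ignore_older can be especially useful if you keep log files for a long time. For example, if you want to start Filebeat but only want to send the newest files and files from last week, you can configure this option.

You can use time strings like 2h (2 hours) and 5m (5 minutes). The default is 0, which disables the setting. Commenting out the config has the same effect as setting it to 0.

You must set ignore_older to be greater than close.on_state_change.inactive.

The files affected by this setting fall into two categories:

Files that were never harvested
Files that were harvested but weren't updated for longer than ignore_older

For files that were never seen before, the offset state is set to the end of the file. If a state already exists, the offset is reset to the size of the file. If a file is updated again later, reading continues at the set offset position.

The ignore_older setting relies on the modification time of the file to determine if a file is ignored. If the modification time of the file is not updated when lines are written to it (which can happen on Windows), the ignore_older setting may cause Filebeat to ignore files even though content was added at a later time.

To remove the state of previously harvested files from the registry file, use the clean_inactive configuration option.

Before a file can be ignored by Filebeat, the file must be closed. To ensure a file is no longer being harvested when it is ignored, you must set ignore_older to a longer duration than close.on_state_change.inactive.

If a file that is currently being harvested falls under ignore_older, the harvester first finishes reading the file and closes it after close.on_state_change.inactive is reached. Only then is the file ignored.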
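A sketch of the "newest files and last week's files" example above, assuming hypothetical paths. 168h is 7 * 24 hours, and close.on_state_change.inactive stays well below ignore_older as required:

```yaml
filebeat.inputs:
- type: filestream
  id: recent-logs-only                 # hypothetical ID
  paths:
    - /var/log/app/*.log               # hypothetical path
  ignore_older: 168h                   # 7 days; files modified earlier are skipped
  close.on_state_change.inactive: 5m   # default; must be shorter than ignore_older
```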

ignore_inactive

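If this option is enabled, Filebeat ignores every file that has not been updated since the selected time. Possible options are since_first_start and since_last_start.

The first option ignores every file that has not been updated since Filebeat was first started. It is useful when Filebeat may be restarted because of configuration changes or failures.

The second option tells Filebeat to read only from files that have been updated since it last started.

The files affected by this setting fall into two categories:

Files that were never harvested
Files that were harvested but weren't updated since the time selected by ignore_inactive

For files that were never seen before, the offset state is set to the end of the file. If a state already exists, the offset is not changed. If a file is updated again later, reading continues at the set offset position.

The setting relies on the modification time of the file to determine if a file is ignored. If the modification time of the file is not updated when lines are written to it (which can happen on Windows), the setting may cause Filebeat to ignore files even though content was added at a later time.

To remove the state of previously harvested files from the registry file, use the clean_inactive configuration option.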
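A sketch using since_last_start, assuming hypothetical paths; files not updated since Filebeat last started are skipped:

```yaml
filebeat.inputs:
- type: filestream
  id: skip-stale-files                 # hypothetical ID
  paths:
    - /var/log/app/*.log               # hypothetical path
  ignore_inactive: since_last_start    # or since_first_start
```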

close.*

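The close.* configuration options are used to close the harvester after certain criteria are met or a certain time has passed. Closing the harvester means closing the file handler. If a file is updated after the harvester is closed, the file is picked up again after prospector.scanner.check_interval has elapsed. However, if the file is moved or deleted while the harvester is closed, Filebeat is not able to pick up the file again, and any data that the harvester has not read is lost.

The close.on_state_change.* settings are applied asynchronously to reading from a file, meaning that if Filebeat is blocked because of a blocked output, a full queue, or another issue, a file that meets the closing condition is closed regardless.

close.on_state_change.inactive

When this option is enabled, Filebeat closes the file handle if a file has not been harvested for the specified duration. The counter for the defined period starts when the last log line was read by the harvester; it is not based on the modification time of the file. If the closed file changes again, a new harvester is started and the latest changes are picked up after prospector.scanner.check_interval has elapsed.

We recommend that you set close.on_state_change.inactive to a value that is larger than the longest interval between updates to your log files. For example, if your log files get updated every few seconds, you can safely set close.on_state_change.inactive to 1m. If there are log files with very different update rates, you can use multiple configurations with different values.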
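A sketch of the "multiple configurations" approach, assuming two hypothetical groups of files: one updated every few seconds and one updated roughly once an hour. Each input gets an inactive timeout longer than the gap between its files' updates:

```yaml
filebeat.inputs:
- type: filestream
  id: busy-logs                         # hypothetical ID; files written every few seconds
  paths:
    - /var/log/busy/*.log               # hypothetical path
  close.on_state_change.inactive: 1m
- type: filestream
  id: quiet-logs                        # hypothetical ID; files written about once an hour
  paths:
    - /var/log/quiet/*.log              # hypothetical path
  close.on_state_change.inactive: 2h    # longer than the typical gap between writes
```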

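Setting close.on_state_change.inactive to a lower value means that file handles are closed sooner. However, this has the side effect that new log lines are not sent in near real time if the harvester is closed.

The timestamp for closing a file does not depend on the modification time of the file. Instead, Filebeat uses an internal timestamp that reflects when the file was last harvested. For example, if close.on_state_change.inactive is set to 5 minutes, the 5-minute countdown starts after the harvester reads the last line of the file.

You can use time strings like 2h (2 hours) and 5m (5 minutes). The default is 5m.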

close.on_state_change.renamed

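WARNING: Only use this option if you understand that data loss is a potential side effect.

When this option is enabled, Filebeat closes the file handler when a file is renamed. This happens, for example, when rotating files. By default, the harvester stays open and keeps reading the file because the file handler does not depend on the file name. If the close.on_state_change.renamed option is enabled and the file is renamed or moved in such a way that it no longer matches the file patterns specified in paths, the file will not be picked up again, and Filebeat will not finish reading it.

Do not use this option when path-based file_identity is configured. Enabling it makes no sense there, because Filebeat cannot detect renames when the path name is used as the unique identifier.

WINDOWS: If your Windows log rotation system shows errors because it can't rotate the files, you should enable this option.

close.on_state_change.removed

When this option is enabled, Filebeat closes the harvester when a file is removed. Normally a file should only be removed after it has been inactive for the duration specified by close.on_state_change.inactive. However, if a file is removed early and you don't enable close.on_state_change.removed, Filebeat keeps the file open to make sure the harvester has completed. If this setting results in files that are not completely read because they are removed from disk too early, disable this option.

This option is enabled by default. If you disable this option, you must also disable clean_removed.

WINDOWS: If your Windows log rotation system shows errors because it can't rotate files, make sure this option is enabled.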
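A sketch for the Windows advice in the two sections above, assuming a hypothetical C:\logs\app directory (the ID is also hypothetical). Single quotes keep the backslashes literal in YAML:

```yaml
filebeat.inputs:
- type: filestream
  id: windows-app-logs                  # hypothetical ID
  paths:
    - 'C:\logs\app\*.log'               # hypothetical path
  close.on_state_change.renamed: true   # release the handle so the rotation tool can rename the file
  close.on_state_change.removed: true   # the default, shown explicitly
```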

close.reader.on_eof

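WARNING: Only use this option if you understand that data loss is a potential side effect.

When this option is enabled, Filebeat closes a file as soon as the end of the file is reached. This is useful when your files are only written once and not updated from time to time, for example when you write every single log event to a new file. This option is disabled by default.

Comment: this option is of limited use and only suits special scenarios. Our application logs are currently all append-only.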
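For the special case described above, where every event is written to its own file that is never appended to, a sketch could look like this (the ID and the /data/events directory are hypothetical):

```yaml
filebeat.inputs:
- type: filestream
  id: one-event-per-file          # hypothetical ID
  paths:
    - /data/events/*.log          # hypothetical path; each file is written once
  close.reader.on_eof: true       # close the file as soon as EOF is reached
```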

close.reader.after_interval

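WARNING: Only use this option if you understand that data loss is a potential side effect. Another side effect is that multiline events might not be completely sent before the timeout expires.

When this option is enabled, Filebeat gives every harvester a predefined lifetime. Regardless of where the reader is in the file, reading stops after the close.reader.after_interval period has elapsed. This option can be useful for older log files when you want to spend only a predefined amount of time on them. While close.reader.after_interval closes the file after the predefined timeout, if the file is still being updated, Filebeat starts a new harvester again per the defined prospector.scanner.check_interval, and the close.reader.after_interval timeout of this harvester starts counting down again.

This option is particularly useful in case the output is blocked, which makes Filebeat keep open file handlers even for files that were deleted from the disk. Setting close.reader.after_interval to 5m ensures that the files are periodically closed so they can be freed up by the operating system.

If you set close.reader.after_interval equal to ignore_older, the file will not be picked up if it is modified while the harvester is closed. This combination of settings normally leads to data loss, and the complete file is not sent.

When you use close.reader.after_interval for logs that contain multiline events, the harvester might stop in the middle of a multiline event, which means that only parts of the event will be sent. If the harvester is started again and the file still exists, only the second part of the event will be sent.

This option is set to 0 by default, which means it is disabled.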
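A sketch of the 5m setting mentioned above, assuming hypothetical paths. ignore_older is deliberately set to a different, much larger value, since making the two equal normally leads to data loss:

```yaml
filebeat.inputs:
- type: filestream
  id: bounded-harvest                 # hypothetical ID
  paths:
    - /var/log/app/*.log              # hypothetical path
  close.reader.after_interval: 5m     # each harvester stops after 5 minutes, even mid-file
  ignore_older: 24h                   # deliberately not equal to after_interval
```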

clean_*

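The clean_* options are used to clean up the state entries in the registry file. These settings help to reduce the size of the registry file and can prevent a potential inode reuse issue.

clean_inactive

WARNING: Only use this option if you understand that data loss is a potential side effect.

When this option is enabled, Filebeat removes the state of a file after the specified period of inactivity has elapsed. The state can only be removed if the file is already ignored by Filebeat (the file is older than ignore_older). The clean_inactive setting must be greater than ignore_older + prospector.scanner.check_interval to make sure that no states are removed while a file is still being harvested. Otherwise, the setting could result in Filebeat resending the full content constantly, because clean_inactive removes states for files that are still detected by Filebeat. If a file is updated or appears again, it is read from the beginning.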
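A worked example of this inequality, assuming hypothetical paths: with ignore_older at 48h and the default 10s check interval, clean_inactive must exceed 48h10s, so 72h leaves a comfortable margin:

```yaml
filebeat.inputs:
- type: filestream
  id: registry-cleanup                     # hypothetical ID
  paths:
    - /var/log/app/*.log                   # hypothetical path
  ignore_older: 48h
  prospector.scanner.check_interval: 10s
  # clean_inactive > ignore_older + check_interval = 48h + 10s
  clean_inactive: 72h
```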

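The clean_inactive configuration option is useful to reduce the size of the registry file, especially if a large number of new files are generated every day.

This configuration option is also useful to prevent Filebeat problems resulting from inode reuse on Linux. For more information, see Inode reuse causes Filebeat to skip lines.

NOTE: Every time a file is renamed, the file state is updated and the counter for clean_inactive starts at 0 again.

TIP: During testing, you might notice that the registry contains state entries that should be removed based on the clean_inactive setting. This happens because Filebeat doesn't remove the entries until it opens the registry again to read a different file. If you are testing the clean_inactive setting, make sure Filebeat is configured to read from more than one file, or the file state will never be removed from the registry.

clean_removed

When this option is enabled, Filebeat cleans files from the registry if they cannot be found on disk anymore under the last known name. This also means that files which were renamed after the harvester finished will be removed. This option is enabled by default.

If a shared drive disappears for a short period and appears again, all files will be read again from the beginning, because the states were removed from the registry file. In such cases, we recommend that you disable the clean_removed option.

You must disable this option if you also disable close.on_state_change.removed.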
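A sketch for the short-lived disappearance case, assuming a hypothetical share mounted at /mnt/share (keep in mind that Filebeat does not officially support network shares). The two options are disabled together, as required above:

```yaml
filebeat.inputs:
- type: filestream
  id: network-share-logs                 # hypothetical ID
  paths:
    - /mnt/share/logs/*.log              # hypothetical mount point
  close.on_state_change.removed: false   # keep the harvester if the file briefly vanishes
  clean_removed: false                   # must be disabled when the option above is disabled
```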

backoff.*

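The backoff options specify how aggressively Filebeat crawls open files for updates. You can use the default values in most cases.

backoff.init

The backoff.init option defines how long Filebeat waits for the first time before checking a file again after EOF is reached. The backoff intervals increase exponentially. The default is 2s. Thus, the file is checked after 2 seconds, then 4 seconds, then 8 seconds and so on, until it reaches the limit defined in backoff.max. Every time a new line appears in the file, the backoff value is reset to backoff.init.

backoff.max

The maximum time for Filebeat to wait before checking a file again after EOF is reached. After having backed off multiple times from checking the file, the wait time will never exceed backoff.max. With backoff.max set to 10s, this means that in the worst case, after Filebeat has backed off multiple times, a new line added to the log file is read within 10 seconds. The default is 10s.

Requirement: set backoff.max to be greater than or equal to backoff.init and less than or equal to prospector.scanner.check_interval (backoff.init <= backoff.max <= prospector.scanner.check_interval). If backoff.max needs to be higher, it is recommended to close the file handler instead and let Filebeat pick up the file again.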
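A sketch that satisfies backoff.init <= backoff.max <= prospector.scanner.check_interval, assuming hypothetical paths. With these values, the waits after EOF grow 2s, 4s, 8s and then stay capped at 10s until a new line resets them:

```yaml
filebeat.inputs:
- type: filestream
  id: backoff-example                      # hypothetical ID
  paths:
    - /var/log/app/*.log                   # hypothetical path
  backoff.init: 2s                         # first wait after EOF
  backoff.max: 10s                         # upper bound for the doubling waits
  prospector.scanner.check_interval: 10s   # 2s <= 10s <= 10s holds
```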

---------

file_identity

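Different file_identity methods can be configured to suit the environment where you are collecting log messages.

WARNING: Changing file_identity methods between runs may result in duplicated events in the output.

native

The default behaviour of Filebeat is to differentiate between files using their inodes and device IDs.

file_identity.native: ~

path

To identify files based on their paths, use this strategy.

WARNING: Only use this strategy if your log files are rotated to a folder outside the scope of your input, or not rotated at all. Otherwise you end up with duplicated events. (Author's note: our info.log is archived and emptied when it reaches 100M; this scenario is suitable for path.)

WARNING: This strategy does not support renaming files. If an input file is renamed, Filebeat will read it again if the new path matches the settings of the input. (Author's note: this means the file is read again.)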

file_identity.path: ~
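A sketch of the info.log scenario from the author's note above, assuming a hypothetical /data/app/logs/info.log whose archived copies are written outside the watched pattern (ID and path are illustrative):

```yaml
filebeat.inputs:
- type: filestream
  id: app-info-log                 # hypothetical ID
  paths:
    - /data/app/logs/info.log      # hypothetical path; archived copies are not matched
  file_identity.path: ~            # identify the file by its path
```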

inode_marker

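If the device ID changes from time to time, you must use this method to distinguish files. This option is not supported on Windows.
Set the location of the marker file the following way: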

file_identity.inode_marker.path: /logs/.filebeat-marker

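Log rotation

As log files are constantly written, they must be rotated and purged to prevent the logger application from filling up the disk. Rotation is done by an external application, so Filebeat needs information about how to cooperate with it.

When reading from rotating files, make sure the paths configuration includes both the active file and all rotated files.

By default, Filebeat is able to track files correctly in the following strategies:
* create: a new active file with a unique name is created on rotation
* rename: rotated files are renamed

However, in case of the copytruncate strategy, you should provide additional configuration to Filebeat.

rotation.external.strategy.copytruncate

WARNING: This functionality is in technical preview and may be changed or removed in a future release. Elastic will work on a best effort basis to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

If the log rotating application copies the contents of the active file and then truncates the original file, use these options to help Filebeat read files correctly.

Set the option suffix_regex so Filebeat can tell active and rotated files apart. There are two supported suffix types in the input: numeric and date.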

Numeric suffix

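If your rotated files have an incrementing index appended to the end of the filename, for example the active file apache.log and the rotated files apache.log.1, apache.log.2 and so on, use the following configuration.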

---
rotation.external.strategy.copytruncate:
  suffix_regex: \.\d$
---

Date suffix

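If the rotation date is appended to the end of the filename, for example the active file apache.log and the rotated files apache.log-20210526, apache.log-20210527 and so on, use the following configuration: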

---
rotation.external.strategy.copytruncate:
  suffix_regex: \-\d{8}$
  dateformat: -20060102
---
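Put together, a copytruncate input could look like the following sketch. The ID and the apache.log file name under /var/log/apache2 are hypothetical. The glob matches the active file and its rotated copies, as required above; note that \.\d$ only matches the single-digit suffixes .1 through .9:

```yaml
filebeat.inputs:
- type: filestream
  id: apache-copytruncate              # hypothetical ID
  paths:
    - /var/log/apache2/apache.log*     # apache.log plus apache.log.1, apache.log.2, ...
  rotation.external.strategy.copytruncate:
    suffix_regex: \.\d$                # numeric suffix, as in the example above
```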

encoding

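The file encoding to use for reading data that contains international characters. See the encoding names recommended by the W3C for use in HTML5.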
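For example, to read log files written in GBK (a common legacy encoding for Chinese text, and one of the sample encodings listed in the reference configuration), a sketch with a hypothetical ID and path:

```yaml
filebeat.inputs:
- type: filestream
  id: gbk-logs                 # hypothetical ID
  paths:
    - /var/log/legacy/*.log    # hypothetical GBK-encoded files
  encoding: gbk
```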

exclude_lines

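A list of regular expressions to match the lines that you want Filebeat to exclude. Filebeat drops any lines that match a regular expression in the list. By default, no lines are dropped. Empty lines are ignored.

The following example configures Filebeat to drop any lines that start with DBG.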

filebeat.inputs:
- type: filestream
  ...
  exclude_lines: ['^DBG']

include_lines

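A list of regular expressions to match the lines that you want Filebeat to include. Filebeat exports only the lines that match a regular expression in the list. By default, all lines are exported. Empty lines are ignored.

The following example configures Filebeat to export any lines that start with ERR or WARN: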

filebeat.inputs:
- type: filestream
  ...
  include_lines: ['^ERR', '^WARN']

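NOTE: If both include_lines and exclude_lines are defined, Filebeat executes include_lines first and then executes exclude_lines. The order in which the two options are defined doesn't matter.

The following example exports all log lines that contain sometext, except for lines that begin with DBG (debug messages):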

filebeat.inputs:
- type: filestream
  ...
  include_lines: ['sometext']
  exclude_lines: ['^DBG']

buffer_size

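The size in bytes of the buffer that each harvester uses when fetching a file. The default is 16384 bytes (16 KiB).

message_max_bytes

The maximum number of bytes that a single log message can have. All bytes after message_max_bytes are discarded and not sent. The default is 10MB (10485760).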
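A sketch that raises both limits for an application with very large events, assuming a hypothetical ID and path; the values are illustrative only:

```yaml
filebeat.inputs:
- type: filestream
  id: large-events              # hypothetical ID
  paths:
    - /var/log/app/*.log        # hypothetical path
  buffer_size: 65536            # 64 KiB = 64 * 1024 (default 16384)
  message_max_bytes: 20971520   # 20 MiB = 20 * 1024 * 1024 (default 10485760)
```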

parsers

This option expects a list of parsers that the log line has to go through.

Available parsers:

multiline
ndjson
container
syslog

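In this example, Filebeat reads multiline messages that consist of 3 lines and are encapsulated in single-line JSON objects. The multiline message is stored under the key msg.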

filebeat.inputs:
- type: filestream
  ...
  parsers:
    - ndjson:
        target: ""
        message_key: msg
    - multiline:
        type: count
        count_lines: 3

The available parser settings are described in detail below.

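parsers - multiline

Options that control how Filebeat deals with log messages that span multiple lines. See Multiline messages for more information about configuring multiline options.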
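A common case is a Java stack trace, where only the first line of an event starts with a date. A sketch using the pattern mode, assuming a hypothetical ID and path and a log format whose events begin with yyyy-MM-dd:

```yaml
filebeat.inputs:
- type: filestream
  id: java-app-logs                     # hypothetical ID
  paths:
    - /var/log/java-app/*.log           # hypothetical path
  parsers:
    - multiline:
        type: pattern
        pattern: '^\d{4}-\d{2}-\d{2}'   # a new event starts with a date such as 2022-12-18
        negate: true                    # lines that do NOT match the pattern...
        match: after                    # ...are appended to the preceding matching line
        max_lines: 500                  # default
        timeout: 5s                     # default
```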

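parsers - ndjson

These options make it possible for Filebeat to decode logs structured as JSON messages. Filebeat processes the logs line by line, so JSON decoding only works if there is one JSON object per message.

The decoding happens before line filtering. You can combine JSON decoding with filtering if you set the message_key option. This can be helpful in situations where the application logs are wrapped in JSON objects, like when using Docker.

Example: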

- ndjson:
    target: ""
    add_error_key: true
    message_key: log

target

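The name of the new JSON object that should contain the parsed key-value pairs. If you leave it empty, the new keys will go under the root.
overwrite_keys
Values from the decoded JSON object overwrite the fields that Filebeat normally adds (type, source, offset, etc.) in case of conflicts. Disable it if you want to keep previously added values.
expand_keys
If this setting is enabled, Filebeat recursively de-dots keys in the decoded JSON and expands them into a hierarchical object structure. For example, {"a.b.c": 123} would be expanded into {"a":{"b":{"c":123}}}. This setting should be enabled when the input is produced by an ECS logger.
add_error_key
If this setting is enabled, Filebeat adds "error.message" and "error.type: json" keys in case of JSON unmarshalling errors, or when a message_key is defined in the configuration but cannot be used.
message_key
An optional configuration setting that specifies a JSON key on which to apply the line filtering and multiline settings. If specified, the key must be at the top level in the JSON object and the value associated with the key must be a string, otherwise no filtering or multiline aggregation will occur.
document_id
An optional configuration setting that specifies the JSON key used to set the document ID. If configured, the field is removed from the original JSON document and stored in @metadata._id.
ignore_decoding_error
An optional configuration setting that specifies whether JSON decoding errors should be logged. If set to true, errors will not be logged. The default is false.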
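A sketch combining these options for logs from an ECS-style JSON logger, assuming a hypothetical ID and path and that the log text is stored under a key named message:

```yaml
filebeat.inputs:
- type: filestream
  id: ecs-json-logs                # hypothetical ID
  paths:
    - /var/log/app/*.json          # hypothetical path, one JSON object per line
  parsers:
    - ndjson:
        target: ""                 # decoded keys go under the root of the event
        overwrite_keys: true       # decoded values win over fields Filebeat adds
        expand_keys: true          # {"a.b.c": 123} becomes {"a":{"b":{"c":123}}}
        add_error_key: true        # add error.* fields when decoding fails
        message_key: message       # hypothetical key holding the log text
```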
container
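Use the container parser to extract information from container log files. It parses lines into common message lines, extracting timestamps too.

stream
Reads from the specified streams only: all, stdout or stderr. The default is all.
format
Use the given format when parsing logs: auto, docker or cri. The default is auto, which automatically detects the format. To disable autodetection, set any of the other options.

The following snippet configures Filebeat to read the stdout stream from all containers under the default Kubernetes logs path: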

  paths:
    - "/var/log/containers/*.log"
  parsers:
    - container:
        stream: stdout

syslog
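The syslog parser parses RFC 3164 and/or RFC 5424 formatted syslog messages.

The supported configuration options are:

format
(Optional) The syslog format to use: rfc3164 or rfc5424. To automatically detect the format from the log entries, set this option to auto. The default is auto.
timezone
(Optional) IANA time zone name (e.g. America/New_York) or a fixed time offset (e.g. +0200) to use when parsing syslog timestamps that do not contain a time zone. Local may be specified to use the machine's local time zone. Defaults to Local.
log_errors
(Optional) If true, the parser will log syslog parsing errors. Defaults to false.
add_error_key
(Optional) If this setting is enabled, the parser adds or appends to an error.message key with the parsing error that was encountered. Defaults to true.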

Example:

- syslog:
    format: rfc3164
    timezone: America/Chicago
    log_errors: true
    add_error_key: true

Timestamps

The RFC 3164 format accepts the following forms of timestamps:

Local timestamp (Mmm dd hh:mm:ss):

Jan 23 14:09:01
RFC-3339*:

2003-10-11T22:14:15Z
2003-10-11T22:14:15.123456Z
2003-10-11T22:14:15-06:00
2003-10-11T22:14:15.123456-06:00

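Note: The local timestamp (for example, Jan 23 14:09:01) that accompanies an RFC 3164 message lacks year and time zone information. The time zone is filled in from the timezone configuration option, and the year from the local time of the Filebeat system (taking the time zone into account). Because of this, it is possible for messages to appear in the future. For example, logs generated on December 31, 2021 and ingested on January 1, 2022 would be assigned the year 2022 instead of 2021.

The RFC 5424 format accepts the following forms of timestamps: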

RFC-3339:

2003-10-11T22:14:15Z
2003-10-11T22:14:15.123456Z
2003-10-11T22:14:15-06:00
2003-10-11T22:14:15.123456-06:00

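Formats marked with an asterisk (*) are a non-standard allowance.

include_message

Use the include_message parser to filter messages in the parsers pipeline. Messages that match the provided patterns are passed to the next parser; the others are dropped.

You should use include_message instead of include_lines if you would like to control when the filtering happens. include_lines runs after the parsers, while include_message runs in the parsers pipeline.

patterns

A list of regexp patterns to match.

This example shows you how to include messages that start with the string ERR or WARN: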

  paths:
    - "/var/log/containers/*.log"
  parsers:
    - include_message.patterns: ["^ERR", "^WARN"]

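Common options

The following configuration options are supported by all inputs.

enabled

Use the enabled option to enable and disable inputs. By default, enabled is set to true.

tags

A list of tags that Filebeat includes in the tags field of each published event. Tags make it easy to select specific events in Kibana or apply conditional filtering in Logstash. These tags will be appended to the list of tags specified in the general configuration.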

filebeat.inputs:
- type: filestream
  . . .
  tags: ["json"]

fields

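Optional fields that you can specify to add additional information to the output. For example, you might add fields that you can use for filtering log data. Fields can be scalar values, arrays, dictionaries, or any nested combination of these. By default, the fields that you specify here are grouped under a fields sub-dictionary in the output document. To store the custom fields as top-level fields, set the fields_under_root option to true. If a duplicate field is declared in the general configuration, its value is overwritten by the value declared here.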

filebeat.inputs:
- type: filestream
  . . .
  fields:
    app_id: query_engine_12

fields_under_root

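If this option is set to true, the custom fields are stored as top-level fields in the output document instead of being grouped under a fields sub-dictionary. If a custom field name conflicts with other field names added by Filebeat, the custom field overwrites the other fields.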
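A sketch reusing the app_id field from the fields example above, with fields_under_root enabled (the ID and path are hypothetical). The event then carries app_id at the top level rather than fields.app_id:

```yaml
filebeat.inputs:
- type: filestream
  id: query-engine-logs                # hypothetical ID
  paths:
    - /var/log/query-engine/*.log      # hypothetical path
  fields:
    app_id: query_engine_12
  fields_under_root: true              # app_id instead of fields.app_id
```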

processors

A list of processors to apply to the input data.

See Processors for information about specifying processors in your config.
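A sketch of input-level processors, assuming a hypothetical ID and path; add_tags and drop_fields are standard Beats processors, while the tag and the dropped field are illustrative choices:

```yaml
filebeat.inputs:
- type: filestream
  id: processed-logs                   # hypothetical ID
  paths:
    - /var/log/app/*.log               # hypothetical path
  processors:
    - add_tags:
        tags: [app]                    # hypothetical tag
    - drop_fields:
        fields: ["agent.ephemeral_id"]
        ignore_missing: true           # no error if the field is absent
```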

pipeline

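The ingest pipeline ID to set for the events generated by this input.

NOTE: The pipeline ID can also be configured in the Elasticsearch output, but this option usually results in simpler configuration files. If the pipeline is configured both in the input and output, the option from the input is used.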
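A sketch, assuming an ingest pipeline named my-app-pipeline already exists in Elasticsearch (the pipeline name, ID and path are hypothetical):

```yaml
filebeat.inputs:
- type: filestream
  id: piped-logs                 # hypothetical ID
  paths:
    - /var/log/app/*.log         # hypothetical path
  pipeline: my-app-pipeline      # hypothetical pipeline ID; used instead of the output's pipeline
```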

keep_null

If this option is set to true, fields with null values will be published in the output document. By default, keep_null is set to false.

index

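If present, this formatted string overrides the index for events from this input (for elasticsearch outputs), or sets the raw_index field of the event's metadata (for other outputs). This string can only refer to the agent name and version and the event timestamp; for access to dynamic fields, use output.elasticsearch.index or a processor.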

Example value: "%{[agent.name]}-myindex-%{+yyyy.MM.dd}" might expand to "filebeat-myindex-2019.11.01".
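Applied to an input, the example value above might look like this sketch (the ID and path are hypothetical):

```yaml
filebeat.inputs:
- type: filestream
  id: custom-index-logs          # hypothetical ID
  paths:
    - /var/log/app/*.log         # hypothetical path
  index: "%{[agent.name]}-myindex-%{+yyyy.MM.dd}"   # e.g. filebeat-myindex-2019.11.01
```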

publisher_pipeline.disable_host

By default, all events contain host.name. This option can be set to true to disable the addition of this field to all events. The default value is false.

Filestream input: original reference configuration

#--------------------------- Filestream input ----------------------------
- type: filestream

  # Unique ID among all inputs, an ID is required.
  id: my-filestream-id

  # Change to true to enable this input configuration.
  enabled: false

  # Paths that should be crawled and fetched. Glob based paths.
  # To fetch all ".log" files from a specific level of subdirectories
  # /var/log/*/*.log can be used.
  # For each file found under this path, a harvester is started.
  # Make sure no file is defined twice as this can lead to unexpected behaviour.
  paths:
    - /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*

  # Configure the file encoding for reading files with international characters
  # following the W3C recommendation for HTML5 (http://www.w3.org/TR/encoding).
  # Some sample encodings:
  #   plain, utf-8, utf-16be-bom, utf-16be, utf-16le, big5, gb18030, gbk,
  #    hz-gb-2312, euc-kr, euc-jp, iso-2022-jp, shift-jis, ...
  #encoding: plain


  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list. The include_lines is called before
  # exclude_lines. By default, no lines are dropped.
  # Line filtering happens after the parsers pipeline. If you would like to filter lines
  # before parsers, use include_message parser.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list. The include_lines is called before
  # exclude_lines. By default, all the lines are exported.
  # Line filtering happens after the parsers pipeline. If you would like to filter lines
  # before parsers, use include_message parser.
  #include_lines: ['^ERR', '^WARN']

  ### Prospector options

  # How often the input checks for new files in the paths that are specified
  # for harvesting. Specify 1s to scan the directory as frequently as possible
  # without causing Filebeat to scan too frequently. Default: 10s.
  #prospector.scanner.check_interval: 10s

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['\.gz$']

  # Include files. A list of regular expressions to match. Filebeat keeps only the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.include_files: ['^/var/log/.*']

  # Expand "**" patterns into regular glob patterns.
  #prospector.scanner.recursive_glob: true

  # If symlinks is enabled, symlinks are opened and harvested. The harvester is opening the
  # original for harvesting but will report the symlink name as source.
  #prospector.scanner.symlinks: false

  ### Parsers configuration

  #### JSON configuration

  #parsers:
    #- ndjson:
      # Decode JSON options. Enable this if your logs are structured in JSON.
      # JSON key on which to apply the line filtering and multiline settings. This key
      # must be top level and its value must be a string, otherwise it is ignored. If
      # no text key is defined, the line filtering and multiline features cannot be used.
      #message_key:

      # By default, the decoded JSON is placed under a "json" key in the output document.
      # If you enable this setting, the keys are copied to the top level of the output document.
      #keys_under_root: false

      # If keys_under_root and this setting are enabled, then the values from the decoded
      # JSON object overwrite the fields that Filebeat normally adds (type, source, offset, etc.)
      # in case of conflicts.
      #overwrite_keys: false

      # If this setting is enabled, then keys in the decoded JSON object will be recursively
      # de-dotted, and expanded into a hierarchical object structure.
      # For example, `{"a.b.c": 123}` would be expanded into `{"a":{"b":{"c":123}}}`.
      #expand_keys: false

      # If this setting is enabled, Filebeat adds an "error.message" and "error.key: json" key in case of JSON
      # unmarshaling errors or when a text key is defined in the configuration but cannot
      # be used.
      #add_error_key: false

  #### Filtering messages

  # You can filter messages in the parsers pipeline. Use this method if you would like to
  # include or exclude lines before they are aggregated into multiline or the JSON contents
  # are parsed.

  #parsers:
    #- include_message.patterns: ["WARN", "ERR"]

  #### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  #parsers:
    #- multiline:
      #type: pattern
      # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
      #pattern: ^\[

      # Defines if the pattern set under the pattern setting should be negated or not. Default is false.
      #negate: false

      # Match can be set to "after" or "before". It is used to define if lines should be appended to a pattern
      # that was (not) matched before or after or as long as a pattern is not matched based on negate.
      # Note: After is the equivalent to previous and before is the equivalent to next in Logstash
      #match: after

      # The maximum number of lines that are combined to one event.
      # In case there are more than max_lines the additional lines are discarded.
      # Default is 500
      #max_lines: 500

      # After the defined timeout, a multiline event is sent even if no new pattern was found to start a new event
      # Default is 5s.
      #timeout: 5s

      # Do not add new line character when concatenating lines.
      #skip_newline: false

  # To aggregate constant number of lines into a single event use the count mode of multiline.

  #parsers:
    #- multiline:
      #type: count

      # The number of lines to aggregate into a single event.
      #count_lines: 3

      # The maximum number of lines that are combined to one event.
      # In case there are more than max_lines the additional lines are discarded.
      # Default is 500
      #max_lines: 500

      # After the defined timeout, a multiline event is sent even if no new pattern was found to start a new event
      # Default is 5s.
      #timeout: 5s

      # Do not add new line character when concatenating lines.
      #skip_newline: false

  #### Parsing container events

  # You can parse container events with different formats from all streams.

  #parsers:
    #- container:
       # Source of container events. Available options: all, stdout, stderr.
       #stream: all

       # Format of the container events. Available options: auto, cri, docker, json-file 
       #format: auto

  ### Log rotation

  # When an external tool rotates the input files with copytruncate strategy
  # use this section to help the input find the rotated files.
  #rotation.external.strategy.copytruncate:
  # Regex that matches the rotated files.
  #  suffix_regex: \.\d$
  # If the rotated filename suffix is a datetime, set it here. 
  #  dateformat: -20060102

  ### State options

  # If a file's modification date is older than clean_inactive, its state is removed from the registry.
  # By default this is disabled.
  #clean_inactive: 0

  # Removes the state for file which cannot be found on disk anymore immediately
  #clean_removed: true

  # Method to determine if two files are the same or not. By default
  # the Beat considers two files the same if their inode and device id are the same.
  #file_identity.native: ~

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

  # Set to true to publish fields with null values in events.
  #keep_null: false

  # By default, all events contain `host.name`. This option can be set to true
  # to disable the addition of this field to all events. The default value is
  # false.
  #publisher_pipeline.disable_host: false

  # Ignore files which were modified more than the defined timespan in the past.
  # ignore_older is disabled by default, so no files are ignored by setting it to 0.
  # Time strings like 2h (2 hours), 5m (5 minutes) can be used.
  #ignore_older: 0

  # Ignore files that have not been updated since the selected event.
  # ignore_inactive is disabled by default, so no files are ignored by setting it to "".
  # Available options: since_first_start, since_last_start.
  #ignore_inactive: ""

  # Defines the buffer size every harvester uses when fetching the file
  #harvester_buffer_size: 16384

  # Maximum number of bytes a single log event can have
  # All bytes after max_bytes are discarded and not sent. The default is 10MB.
  # This is especially useful for multiline log messages which can get large.
  #message_max_bytes: 10485760

  # Characters which separate the lines. Valid values: auto, line_feed, vertical_tab, form_feed,
  # carriage_return, carriage_return_line_feed, next_line, line_separator, paragraph_separator,
  # null_terminator
  #line_terminator: auto

  # The ingest pipeline ID associated with this input. If this is set, it
  # overwrites the pipeline option from the Elasticsearch output.
  #pipeline:

  # Backoff values define how aggressively filebeat crawls new files for updates
  # The default values can be used in most cases. Backoff defines how long it is waited
  # to check a file again after EOF is reached. Default is 1s which means the file
  # is checked every second if new lines were added. This leads to a near real time crawling.
  # Every time a new line appears, backoff is reset to the initial value.
  #backoff.init: 1s

  # Max backoff defines what the maximum backoff time is. After having backed off multiple times
  # from checking the files, the waiting time will never exceed backoff.max, independent of the
  # backoff factor. With it set to 10s, in the worst case a new line added to a log file after
  # having backed off multiple times is read within a maximum of 10s.
  #backoff.max: 10s

  ### Harvester closing options

  # Close inactive closes the file handler after the predefined period.
  # The period starts when the last line of the file was read, not from the file ModTime.
  # Time strings like 2h (2 hours), 5m (5 minutes) can be used.
  #close.on_state_change.inactive: 5m

  # Close renamed closes a file handler when the file is renamed or rotated.
  # Note: Potential data loss. Make sure to read and understand the docs for this option.
  #close.on_state_change.renamed: false

  # When enabling this option, a file handler is closed immediately in case a file can't be found
  # any more. In case the file shows up again later, harvesting will continue at the last known position
  # after prospector.scanner.check_interval.
  #close.on_state_change.removed: true

  # Closes the file handler as soon as the harvester reaches the end of the file.
  # By default this option is disabled.
  # Note: Potential data loss. Make sure to read and understand the docs for this option.
  #close.reader.on_eof: false

  # Close timeout closes the harvester after the predefined time.
  # This is independent of whether the harvester has finished reading the file.
  # By default this option is disabled.
  # Note: Potential data loss. Make sure to read and understand the docs for this option.
  #close.reader.after_interval: 0