Telegraf directory_monitor plugin: monitoring a folder and parsing its files

Latest recommended article published on 2024-10-09 21:15:29

糕手慕辰

Category: Telegraf. Tags: linux, server

Article link: https://blog.csdn.net/weixin_44928129/article/details/141960215

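This plugin monitors the files in a directory, parses them, and moves each file to a specified directory once it has been parsed. I use it here to parse CSV files; other formats are left for you to explore.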

Here is the configuration; each option is explained below.

[[inputs.directory_monitor]]
  ## Override the measurement (table) name
  name_override = "four_base_test"
  ## The directory to monitor and read files from (including sub-directories if "recursive" is true).
  directory = "/mydata/Telegraf/input"
  #
  ## The directory to move finished files to (maintaining directory hierarchy from source).
  finished_directory = "/mydata/Telegraf/temp"
  #
  ## Setting recursive to true will make the plugin recursively walk the directory and process all sub-directories.
  recursive = false
  #
  ## The directory to move files to upon file error.
  ## If not provided, erroring files will stay in the monitored directory.
  error_directory = "/mydata/Telegraf/errorfile"
  #
  ## The amount of time a file is allowed to sit in the directory before it is picked up.
  ## This time can generally be low but if you choose to have a very large file written to the directory and it's potentially slow,
  ## set this higher so that the plugin will wait until the file is fully copied to the directory.
  directory_duration_threshold = "100ms"
  #
  ## A list of the only file names to monitor, if necessary. Supports regex. If left blank, all files are ingested.
  files_to_monitor = ["^.*\\.csv$"]
  #
  ## A list of files to ignore, if necessary. Supports regex.
  files_to_ignore = [".DS_Store"]
  #
  ## Maximum lines of the file to process that have not yet been written by the
  ## output. For best throughput set to the size of the output's metric_buffer_limit.
  ## Warning: setting this number higher than the output's metric_buffer_limit can cause dropped metrics.
  # max_buffered_metrics = 10000
  #
  ## The maximum amount of file paths to queue up for processing at once, before waiting until files are processed to find more files.
  ## Lowering this value will result in *slightly* less memory use, with a potential sacrifice in speed efficiency, if absolutely necessary.
  # file_queue_size = 100000
  #
  ## Name a tag containing the name of the file the data was parsed from.  Leave empty
  ## to disable. Be cautious when file name variation is high, as this can increase the cardinality
  ## significantly. Read more about cardinality here:
  ## https://docs.influxdata.com/influxdb/cloud/reference/glossary/#series-cardinality
  # file_tag = ""
  #
  ## Specify if the file can be read completely at once or if it needs to be read line by line (default).
  ## Possible values: "line-by-line", "at-once"
  # parse_method = "line-by-line"
  #
  ## The dataformat to be read from the files.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  data_format = "csv"
  ## CSV parser option: number of header rows at the top of each file (used for column names).
  csv_header_row_count = 1

The line name_override = "four_base_test" sets the measurement (table) name. It is a general input option, so other input plugins accept it as well.
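
As an illustration, the same option can be set on a different input. This is a minimal sketch, assuming the standard `inputs.cpu` plugin is in use; the name `host_cpu` is a made-up value for this example and does not come from the original setup.

```toml
# Hypothetical example: rename the measurement produced by another input plugin.
[[inputs.cpu]]
  ## Metrics from this input are written as "host_cpu" instead of the default "cpu".
  name_override = "host_cpu"
```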

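directory:
- Description: The source directory to monitor and read files from.
- Example: "/path/to/source/directory"
finished_directory:
- Description: The directory that files are moved to after they have been processed. The directory hierarchy of the source is preserved.
- Example: "/path/to/finished/directory"
recursive:
- Description: If set to true, the plugin walks the directory recursively and processes the files in all sub-directories.
- Default: false
- Example: true
error_directory:
- Description: The directory that files are moved to when processing them fails. If not set, failed files stay in the monitored directory.
- Example: "/path/to/error/directory"
directory_duration_threshold:
- Description: How long a file must sit in the directory before it is picked up, so that it is fully written first. The value can usually be low, but for large or slowly written files set it higher so the plugin waits until the copy has finished.
- Default: "50ms"
- Example: "100ms"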
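
For large files that are copied slowly, the threshold could be raised as in the sketch below. The value "5s" is an assumption chosen for illustration; the right value depends on how long your files take to arrive. The paths are the ones from the configuration above.

```toml
[[inputs.directory_monitor]]
  directory = "/mydata/Telegraf/input"
  finished_directory = "/mydata/Telegraf/temp"
  ## Assumed value: leave files alone for 5 seconds before picking them up,
  ## giving a slow copy time to finish.
  directory_duration_threshold = "5s"
  data_format = "csv"
  csv_header_row_count = 1
```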
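files_to_monitor:
- Description: Monitor only the file names that match the given regular expressions. If left empty, all files are ingested.
- Example: ["^.*\\.csv$"] (monitors files whose names end in .csv)
files_to_ignore:
- Description: A list of file names to ignore. Supports regular expressions.
- Example: [".DS_Store"] (ignores .DS_Store files)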
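
Because these entries are regular expressions, an unanchored pattern matches anywhere in the name, and an unescaped dot matches any character. The sketch below shows stricter patterns, assuming the expressions are matched against the file name as the plugin's comments state. The backslashes are doubled because TOML basic strings treat `\` as an escape character.

```toml
[[inputs.directory_monitor]]
  directory = "/mydata/Telegraf/input"
  finished_directory = "/mydata/Telegraf/temp"
  ## Only names that end in ".csv"; "report.csv.tmp" would not match.
  files_to_monitor = ["^.*\\.csv$"]
  ## Exactly ".DS_Store", with the dot escaped so it is taken literally.
  files_to_ignore = ["^\\.DS_Store$"]
  data_format = "csv"
  csv_header_row_count = 1
```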
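max_buffered_metrics:
- Description: The maximum number of lines from a file that may be processed but not yet written by the output. For best throughput, set it to the output's metric_buffer_limit. Setting it higher than metric_buffer_limit can cause dropped metrics.
- Default: 10000
- Example: 5000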
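
One way to keep the two values aligned is to set them side by side. This sketch assumes the buffer limit is configured once in the `[agent]` section and is not overridden on an individual output; the number 10000 is just the default used as an example.

```toml
[agent]
  ## Number of metrics each output may buffer before old ones are dropped.
  metric_buffer_limit = 10000

[[inputs.directory_monitor]]
  directory = "/mydata/Telegraf/input"
  finished_directory = "/mydata/Telegraf/temp"
  ## Kept equal to metric_buffer_limit, never higher, to avoid dropped metrics.
  max_buffered_metrics = 10000
  data_format = "csv"
  csv_header_row_count = 1
```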
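file_queue_size:
- Description: The maximum number of file paths to queue for processing at once. The plugin waits until queued files have been processed before it looks for more. A lower value uses slightly less memory but may reduce speed.
- Default: 100000
- Example: 50000
file_tag:
- Description: The name of a tag added to the parsed data; its value is the name of the file the data came from. Leave empty to disable. When file names vary widely, this can increase series cardinality significantly.
- Example: "filename"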
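
A minimal sketch of enabling the tag follows. The tag name `source_file` is a hypothetical choice for this example; any tag name works.

```toml
[[inputs.directory_monitor]]
  directory = "/mydata/Telegraf/input"
  finished_directory = "/mydata/Telegraf/temp"
  ## Hypothetical tag name: each metric carries the name of the file it was
  ## parsed from. Avoid this when file names are unique per upload, because
  ## every new name adds to the series cardinality.
  file_tag = "source_file"
  data_format = "csv"
  csv_header_row_count = 1
```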
parse_method:
- Description: Specifies whether the file is read line by line or completely at once. Possible values are "line-by-line" and "at-once".
- Default: "line-by-line"
- Example: "at-once"
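
Reading at once is useful when a file only makes sense as a whole, for instance a single JSON document spread over several lines. The sketch below assumes such JSON files; this is not the author's CSV setup, and the file pattern is an illustrative assumption.

```toml
[[inputs.directory_monitor]]
  directory = "/mydata/Telegraf/input"
  finished_directory = "/mydata/Telegraf/temp"
  ## Assumed pattern for this example: only pick up JSON files.
  files_to_monitor = ["^.*\\.json$"]
  ## Hand the whole file to the parser in one piece instead of line by line.
  parse_method = "at-once"
  data_format = "json"
```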
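data_format:
- Description: The data format of the files to read. Several formats are supported, and each has its own set of configuration options.
- Example: "influx" (the file contents follow the InfluxDB line protocol)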
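
For the CSV case used in this article, the format-specific options sit in the same plugin section. The sketch below uses options of Telegraf's CSV parser; the column names `device` and `ts` and the timestamp layout are assumptions about a hypothetical file, not taken from the author's data.

```toml
[[inputs.directory_monitor]]
  name_override = "four_base_test"
  directory = "/mydata/Telegraf/input"
  finished_directory = "/mydata/Telegraf/temp"
  data_format = "csv"
  ## The first row of each file holds the column names.
  csv_header_row_count = 1
  ## Field separator; a comma is the parser's default.
  csv_delimiter = ","
  ## Hypothetical column stored as a tag instead of a field.
  csv_tag_columns = ["device"]
  ## Hypothetical column holding the timestamp, with its layout written
  ## in Go reference-time notation.
  csv_timestamp_column = "ts"
  csv_timestamp_format = "2006-01-02 15:04:05"
```

If no timestamp column is configured, the metrics are stamped with the time at which the line was parsed.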