Pitfalls when collecting JSON data with Telegraf

Telegraf GitHub repository: https://github.com/influxdata/telegraf

Background

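We need to build a monitoring system that collects server information and some business data, with Telegraf as the single collector for both. The business data in this case is the response time of every API call made by users.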


  1. Design: in the gateway interceptor, write the details of every API call to a JSON file

  2. Telegraf collects the data

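    2.1 Use an inputs plugin. Telegraf's inputs support a rich set of formats; in the docs, go to inputs -> tail. The tail plugin suits files that are appended to: whenever new content is appended to the file, it is collected.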

2.2 The tail configuration; data_format specifies the format of the input file

# Stream a log file, like the tail -f command
[[inputs.tail]]
 ## files to tail.
 ## These accept standard unix glob matching rules, but with the addition of
 ## ** as a "super asterisk". ie:
 ##   "/var/log/**.log"  -> recursively find all .log files in /var/log
 ##   "/var/log/*/*.log" -> find all .log files with a parent dir in /var/log
 ##   "/var/log/apache.log" -> just tail the apache log file
 ##
 ## See https://github.com/gobwas/glob for more examples
 ##
 files = ["/var/mymetrics.out"]
 ## Read file from beginning.
 from_beginning = false
 ## Whether file is a named pipe
 pipe = false

 ## Method used to watch for file updates.  Can be either "inotify" or "poll".
 # watch_method = "inotify"

 ## Data format to consume.
 ## Each data format has its own unique set of configuration options, read
 ## more about them here:
 ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
 data_format = "influx"

2.3 Following https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md, look up the configuration for JSON input

[[inputs.file]]
  files = ["example"]

  ## Data format to consume.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  data_format = "json"

  ## When strict is true and a JSON array is being parsed, all objects within the
  ## array must be valid
  json_strict = true

  ## Query is a GJSON path that specifies a specific chunk of JSON to be
  ## parsed, if not specified the whole document will be parsed.
  ##
  ## GJSON query paths are described here:
  ##   https://github.com/tidwall/gjson/tree/v1.3.0#path-syntax
  json_query = ""

  ## Tag keys is an array of keys that should be added as tags.  Matching keys
  ## are no longer saved as fields. Supports wildcard glob matching.
  tag_keys = [
    "my_tag_1",
    "my_tag_2",
    "tags_*",
    "tag*"
  ]

  ## Array of glob pattern strings or booleans keys that should be added as string fields.
  json_string_fields = []

  ## Name key is the key to use as the measurement name.
  json_name_key = ""

  ## Time key is the key containing the time that should be used to create the
  ## metric.
  json_time_key = ""

  ## Time format is the time layout that should be used to interpret the json_time_key.
  ## The time must be `unix`, `unix_ms`, `unix_us`, `unix_ns`, or a time in the
  ## "reference time".  To define a different format, arrange the values from
  ## the "reference time" in the example to match the format you will be
  ## using.  For more information on the "reference time", visit
  ## https://golang.org/pkg/time/#Time.Format
  ##   ex: json_time_format = "Mon Jan 2 15:04:05 -0700 MST 2006"
  ##       json_time_format = "2006-01-02T15:04:05Z07:00"
  ##       json_time_format = "01/02/2006 15:04:05"
  ##       json_time_format = "unix"
  ##       json_time_format = "unix_ms"
  json_time_format = ""

  ## Timezone allows you to provide an override for timestamps that
  ## don't already include an offset
  ## e.g. 04/06/2016 12:41:45
  ##
  ## Default: "" which renders UTC
  ## Options are as follows:
  ##   1. Local               -- interpret based on machine localtime
  ##   2. "America/New_York"  -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
  ##   3. UTC                 -- or blank/unspecified, will return timestamp in UTC
  json_timezone = ""

Example

Config:

[[inputs.file]]
  files = ["example"]
  json_name_key = "name"
  tag_keys = ["my_tag_1"]
  json_string_fields = ["b_my_field"]
  data_format = "json"

Input:

{
    "a": 5,
    "b": {
        "c": 6,
        "my_field": "description"
    },
    "my_tag_1": "foo",
    "name": "my_json"
}

Output:

my_json,my_tag_1=foo a=5,b_c=6,b_my_field="description"
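
To make the example above easier to follow, here is a minimal Python sketch that mimics the documented behaviour: nested keys are flattened with `_`, `json_name_key` picks the measurement name, `tag_keys` become tags, and string values survive only when listed in `json_string_fields`. This is a simplified approximation, not Telegraf's actual implementation; the function names `flatten` and `to_line` are my own, key matching is exact (no glob support), and booleans are left out.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining key names with '_'."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def to_line(doc, name_key, tag_keys, string_fields):
    """Render one JSON document roughly the way the docs example shows it."""
    flat = flatten(doc)
    measurement = flat.pop(name_key)
    # Keys listed as tags are moved out of the fields.
    tags = {k: flat.pop(k) for k in tag_keys if k in flat}
    fields = []
    for key, value in flat.items():
        if isinstance(value, bool):
            continue  # booleans are not handled in this sketch
        if isinstance(value, str):
            # String values are kept only when explicitly listed.
            if key in string_fields:
                fields.append(f'{key}="{value}"')
        elif isinstance(value, (int, float)):
            fields.append(f"{key}={value}")
    tag_part = "".join(f",{k}={v}" for k, v in tags.items())
    return f"{measurement}{tag_part} {','.join(fields)}"


example = json.loads("""
{
    "a": 5,
    "b": {"c": 6, "my_field": "description"},
    "my_tag_1": "foo",
    "name": "my_json"
}
""")

print(to_line(example, "name", ["my_tag_1"], ["b_my_field"]))
```

Calling `to_line` with an empty `string_fields` list drops `b_my_field` from the result, which is the behaviour that matters later in this post.
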
  1. Following the documentation, I configured my telegraf.conf file

    [[inputs.tail]]
      files = ["/opt/applog/app-gateway/app-response/*.json"]
      watch_method = "poll"
      data_format = "json"
    
    
  2. Start Telegraf with sh start.sh


    Unfortunately, Telegraf did not collect any of the data I wanted. I had hit a pitfall!

    Troubleshooting:

    1. Run ps -ef | grep telegraf to check whether Telegraf is running [it is running]

    2. Check the log with tail -1000f usr/nohup.out [no errors in the log]

    2022-02-28T07:44:51Z I! Tags enabled: host=vm-osvm77983-app
    2022-02-28T07:44:51Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"vm-osvm77983-app", Flush Interval:10s
    2022-02-28T07:46:38Z I! Starting Telegraf 1.13.4
    2022-02-28T07:46:38Z I! Loaded inputs: tail
    2022-02-28T07:46:38Z I! Loaded aggregators:
    2022-02-28T07:46:38Z I! Loaded processors:
    2022-02-28T07:46:38Z I! Loaded outputs: kafka
    2022-02-28T07:46:38Z I! Tags enabled: host=vm-osvm77983-app
    2022-02-28T07:46:38Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"vm-osvm77983-app", Flush Interval:10s
    
    3. Check that the files path is valid: ls -l /opt/applog/app-gateway/app-response/ | grep ".json" | wc -l [there are 5 files under this path]
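
The same file check can be sketched in Python by expanding the glob from the `files` setting directly. This is only an approximation: Python's `glob` covers the plain `*.json` pattern used here, but it is not the glob library Telegraf uses and it does not implement Telegraf's `**` "super asterisk" the same way. The helper name `count_matching_files` is hypothetical.

```python
import glob


def count_matching_files(pattern):
    """Return how many existing paths match the given glob pattern."""
    return len(glob.glob(pattern))


if __name__ == "__main__":
    # Same pattern as the files setting in my telegraf.conf.
    print(count_matching_files("/opt/applog/app-gateway/app-response/*.json"))
```

A result of 0 would point to a wrong path or pattern rather than a parsing problem.
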


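I tried all sorts of ways to track down the problem and was stuck for a long time, until...

I plugged the example from the official documentation into my setup and, to my surprise, it worked. So the problem had to be in my [[inputs.tail]] configuration. Going through the options in the configuration one by one, I found that the comment on json_string_fields indicates that when the value of a key in the JSON is a string or boolean, that key has to be listed there. When it is not, nothing is reported in the log either.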

  ## Array of glob pattern strings or booleans keys that should be added as string fields.
  json_string_fields = []

Looking at the JSON that records the API response time, it does contain string values:

{
    "serviceName":"order-server",
    "apiPath":"v1/order/order-query",
    "apiAccessTime":1646035200736,
    "apiUsedTime":20,
    "year":2022,
    "month":2,
    "week":9,
    "day":59
}
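
A quick way to find which keys need attention is to list the ones whose values are strings or booleans. The sketch below does that for the record above; the helper name `string_valued_keys` is my own, and it only inspects top-level keys.

```python
import json

record = json.loads("""
{
    "serviceName": "order-server",
    "apiPath": "v1/order/order-query",
    "apiAccessTime": 1646035200736,
    "apiUsedTime": 20,
    "year": 2022,
    "month": 2,
    "week": 9,
    "day": 59
}
""")


def string_valued_keys(doc):
    """Return top-level keys whose values are strings or booleans."""
    return [key for key, value in doc.items() if isinstance(value, (str, bool))]


print(string_valued_keys(record))  # ['serviceName', 'apiPath']
```

These are the keys to consider for json_string_fields (or for tag_keys, if they should be tags instead of fields).
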

Solution: change the [[inputs.tail]] configuration as follows:

 [[inputs.tail]]
    files = ["/opt/applog/app-gateway/app-response/*.json"]
    watch_method = "poll"
    json_string_fields = ["serviceName","apiPath"]
    data_format = "json"
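
To verify a configuration like this, a new record has to be appended to a tailed file while Telegraf is running, since from_beginning = false means existing content is not re-read. Below is a rough sketch of what the writer on the gateway side might look like. It is an assumption on my part, not the real interceptor code: the function name `append_record` and the demo file path are hypothetical, only a subset of the fields is written, and each record goes on a single line because, as far as I understand, the tail input hands the parser one line at a time.

```python
import json
import os
import tempfile
import time


def append_record(path, service_name, api_path, used_ms):
    """Append one API call record to path as a single-line JSON object."""
    record = {
        "serviceName": service_name,
        "apiPath": api_path,
        "apiAccessTime": int(time.time() * 1000),  # access time in ms
        "apiUsedTime": used_ms,                    # response time in ms
    }
    # Compact separators keep the whole object on one line.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, separators=(",", ":")) + "\n")
    return record


if __name__ == "__main__":
    # Demo only: write to a temporary file instead of the real log directory.
    demo_path = os.path.join(tempfile.mkdtemp(), "app-response-demo.json")
    append_record(demo_path, "order-server", "v1/order/order-query", 20)
    with open(demo_path, encoding="utf-8") as f:
        print(f.read(), end="")
```
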

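Summary: when using a tool like this, always read the comments above every option in the official sample configuration carefully. Do not look only at the examples and overlook options that turn out to be required.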