Single-node Prometheus

Single-node setup:

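  1. Get familiar with the Zabbix server setup process
  2. Scraped data is stored in ./data by default, and by default every 2h of data is stored as one block; see https://www.ctolib.com/docs/sfile/prometheus-book/ha/prometheus-local-storage.html
  3. How does the alerting configuration take effect, and how do you find out what is wrong with the current alerting configuration?
    Why alerts did not take effect, and the main configuration points:
    • The contents of the rules file (labels and annotations) are all carried into the notification that is sent. The link configured for Slack lets the alert recipient click straight through from the Slack message to where the alert fired, or to any other page you want them to see
    • The username in the rule file may be written in Chinese
    • In the Slack section of alertmanager.yml, api_url is written without quotes here (YAML also accepts it quoted), and channel must be specified; otherwise the notification fails with the following error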
level=error ts=2018-10-19T08:42:36.63691218Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="cancelling noretry for \"slack\" due to unrecoverable error: unexpected status code 404"
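
One way to narrow down why an alerting change has not taken effect is to validate each file before reloading. The sketch below assumes promtool (shipped with Prometheus) and amtool (shipped with Alertmanager) are on the PATH; the function name check_configs is illustrative and not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: validate the three config files used in this post.
# Arguments: <prometheus.yml> <rules file> <alertmanager.yml>
check_configs() {
  # Validate the main Prometheus configuration file.
  promtool check config "$1" || return 1
  # Validate the rule file on its own (syntax and expressions).
  promtool check rules "$2" || return 1
  # Validate the Alertmanager configuration file.
  amtool check-config "$3" || return 1
  echo "all config files passed validation"
}
```

A call might look like `check_configs prometheus.yml rules/test.yml alertmanager.yml`. Once the files pass, both Prometheus and Alertmanager reload their configuration when sent SIGHUP (for example `kill -HUP <pid>`).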

Example:

# prometheus.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["localhost:9093"]

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/usr/local/prometheus-2.4.3/rules/test.yml" # either keep the file in the same directory as prometheus.yml or use an absolute path; other relative paths failed to load in this setup

scrape_configs:

  - job_name: 'prometheus'
    static_configs:
      - targets: ['127.0.0.1:9090']
        labels:
          instance: localhost

  - job_name: 'linux'
    static_configs:
      - targets: ['127.0.0.1:9100']
        labels:
          instance: node1

      - targets: ['172.18.2.28:9090']
        labels:
          instance: node2

      - targets: ['172.18.2.28:1234']
        labels:
          instance: node3



# rules/test.yml 
groups:
- name: test
  rules:

  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: page
    annotations:
      description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'
      summary: 'Instance {{ $labels.instance }} down'
      link: 'http://172.18.2.27:9090/alerts'
      color: "#D00000"  # color shown in the notification; #D00000 is red
      username: "刘蓉"


# alertmanager.yml


global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'lori_liurong@163.com'
  smtp_auth_username: 'lori_liurong@163.com'
  smtp_auth_password: 'liurong199686'
  smtp_require_tls: false

route:
  group_by: ['ip','id','type']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 2h  # interval before re-sending an alert that has already been sent successfully
  receiver: 'liurong'

receivers:
  - name: 'liurong'
    email_configs:
      - to: 'lori_liurong@163.com'
        headers: { Subject: "[WARN] 报警邮件test" }

    slack_configs:
      - send_resolved: true
        api_url: https://hooks.slack.com/services/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  # Slack webhook URL; each channel has its own webhook URL
        channel: "#test-alermanager"
        text: "{{ range .Alerts }} {{ .Annotations.description}}\n {{end}} @{{ .CommonAnnotations.username}} <{{.CommonAnnotations.link}}| click here>"
        title: "{{.CommonAnnotations.summary}}"
        title_link: "{{.CommonAnnotations.link}}"
        color: "{{.CommonAnnotations.color}}"

When the alerting rules are evaluated, the alerts that currently have a problem are listed. Detailed explanation: http://blog.51cto.com/xujpxm/2055970

  1. Log output
    Where can I find Prometheus logs?
    https://github.com/prometheus/prometheus/issues/2363

    Start Prometheus from a script and specify the log output path in that script
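
Prometheus writes its log to stderr, so a start script can capture it by redirecting output to a file. A minimal sketch, assuming Prometheus is installed in /usr/local/prometheus-2.4.3 (as the rules path above suggests); the function name, the PROM_HOME and PROM_LOG variables, and the log file location are illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical start helper: launches Prometheus in the background and
# appends everything it logs (stdout and stderr) to a file.
start_prometheus() {
  prom_home="${PROM_HOME:-/usr/local/prometheus-2.4.3}"   # assumed install dir
  prom_log="${PROM_LOG:-$prom_home/prometheus.log}"       # assumed log path
  nohup "$prom_home/prometheus" \
    --config.file="$prom_home/prometheus.yml" \
    --storage.tsdb.path="$prom_home/data" \
    >> "$prom_log" 2>&1 &
  echo "$!"   # print the PID so the caller can stop or signal it later
}
```

After calling `start_prometheus`, the log can be followed with `tail -f` on the chosen log file.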
