Deploying and configuring vmalert and Alertmanager with Helm

vmalert

1. Build vmalert from source

git clone https://github.com/VictoriaMetrics/VictoriaMetrics
cd VictoriaMetrics
make vmalert

The built binary is placed in the VictoriaMetrics/bin directory.

2. Add alert.rules

vim alert.rules

# Example rule
groups:
    - name: test-rule
      rules:
      - alert: HostStatus
        expr: up == 0
        for: 2m
        labels:
          status: warning
        annotations:
          summary: "{{$labels.instance}}: server is down"
          description: "{{$labels.instance}}: server is down"
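
To make the for: 2m clause concrete, here is a minimal Python sketch of how a rule evaluator might move an alert from pending to firing. It is not vmalert's actual implementation: the function name alert_states, the 30-second evaluation interval, and the sample series are assumptions made for illustration.

```python
def alert_states(matches, interval_s, for_s):
    """Return the alert state after each rule evaluation.

    matches:    list of booleans, True when the expression (e.g. up == 0) returned a result
    interval_s: seconds between evaluations (assumed value, not taken from the article)
    for_s:      the rule's `for` duration in seconds
    """
    states = []
    active_for = None  # seconds the condition has held continuously; None while it does not hold
    for matched in matches:
        if not matched:
            active_for = None
            states.append("inactive")
            continue
        active_for = 0 if active_for is None else active_for + interval_s
        states.append("firing" if active_for >= for_s else "pending")
    return states


# The target goes down and stays down for five evaluations, 30 s apart, with for: 2m.
print(alert_states([False, True, True, True, True, True], interval_s=30, for_s=120))
```

In this model the alert stays pending until the condition has held for the full two minutes, and any evaluation where the expression returns nothing resets the timer.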

Deploying Prometheus Operator with Helm

When installed with Helm, Prometheus and Alertmanager are configured in the same file.

1. Add the Helm repositories

# Alibaba Cloud
helm repo add aliyuncs https://apphub.aliyuncs.com

# Official
helm repo add stable https://kubernetes-charts.storage.googleapis.com

2. Search for the prometheus-operator chart

helm search repo prometheus-operator

NAME                            CHART VERSION    APP VERSION    DESCRIPTION                                       
aliyuncs/prometheus-operator    8.7.0            0.35.0         Provides easy monitoring definitions for Kubern...
stable/prometheus-operator      8.13.7           0.38.1         Provides easy monitoring definitions for Kubern...

3. Install

helm install mypro aliyuncs/prometheus-operator

4. Check the release

# helm list

NAME     NAMESPACE    REVISION    UPDATED                                    STATUS      CHART                        APP VERSION
mypro    default      1           2020-06-09 09:32:37.091220013 +0800 CST    deployed    prometheus-operator-8.7.0    0.35.0     

# helm status mypro

NAME: mypro
LAST DEPLOYED: Tue Jun  9 09:32:37 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
The Prometheus Operator has been installed. Check its status by running:
  kubectl --namespace default get pods -l "release=mypro"

Visit https://github.com/coreos/prometheus-operator for instructions on how
to create & configure Alertmanager and Prometheus instances using the Operator.

# kubectl --namespace default get pods -l "release=mypro"

NAME                                                  READY   STATUS    RESTARTS   AGE
mypro-grafana-f5b868868-8ckgs                         2/2     Running   0          55m
mypro-prometheus-node-exporter-dg6w4                  1/1     Running   0          55m
mypro-prometheus-node-exporter-x9l4b                  1/1     Running   0          55m
mypro-prometheus-operator-operator-5b458d4659-p7t4l   2/2     Running   0          55m

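5. Configure Ingress for browser access

The manifests below use the extensions/v1beta1 Ingress API, which was removed in Kubernetes 1.22; newer clusters need networking.k8s.io/v1 instead.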

cat grafana-ingress.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-grafana
spec:
  rules:
  - host: grafana.com
    http:
      paths:
      - backend:
          serviceName: mypro-grafana
          servicePort: 80

cat prometheus-ingress.yaml 

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-prometheus
spec:
  rules:
  - host: prometheus.com
    http:
      paths:
      - backend:
          serviceName: mypro-prometheus-operator-prometheus
          servicePort: 9090

cat alertmanager-ingress.yaml 

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-alertmanager
spec:
  rules:
  - host: alertmanager.com
    http:
      paths:
      - backend:
          serviceName: mypro-prometheus-operator-alertmanager
          servicePort: 9093

6. Check the Ingress resources

# kubectl get ingress

NAME                   CLASS    HOSTS              ADDRESS         PORTS   AGE
ingress-alertmanager   <none>   alertmanager.com   x.x.x.x         80      29m
ingress-grafana        <none>   grafana.com        x.x.x.x         80      32m
ingress-prometheus     <none>   prometheus.com     x.x.x.x         80      30m

7. Access from a browser

Open the following URLs:

http://grafana.com 
http://prometheus.com 
http://alertmanager.com

8. Modify the Alertmanager alerting configuration

(1) Create the Alertmanager configuration file

# cat alertmanger_config.yaml

global:
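  resolve_timeout: 5m # time after which an alert is treated as resolved if it stops being reported; default 5m

templates:    # paths to the email templates; relative paths and globs such as template/*.tmpl also work
  - '/usr/local/alertmanager/template/default.tmpl'
# Routing tree
route:
  group_by: [alertname]  # labels used to group alerts
  receiver: ops_notify   # default receiver
  group_wait: 30s        # how long to wait before sending the first notification for a new group
  group_interval: 60s    # how long to wait before notifying about new alerts added to an existing group
  repeat_interval: 1h    # how long to wait before re-sending a notification (Alertmanager's default is 4h)
  routes:

  - receiver: ops_notify  # receiver for basic alerts
    group_wait: 10s
    match_re:
      alertname: InstanceAliveAlert|DiskUsageAlert   # matched against the alert names defined in the rules

  - receiver: info_notify  # receiver for informational alerts
    group_wait: 10s
    match_re:
      alertname: MemoryUsageAlert|CPUUsageAlert|DirectorySizeAlert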

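# Receiver for basic alerts
receivers:
- name: ops_notify
  webhook_configs:
  - url: http://localhost:8060/dingtalk/webhook1/send
    send_resolved: true  # also notify when the alert is resolved
#    message: '{{ template "wechat.default.message" . }}'

# Receiver for informational alerts
- name: info_notify
  webhook_configs:
  - url: http://localhost:8060/dingtalk/webhook1/send
    send_resolved: true
#    message: '{{ template "wechat.default.message" . }}'

# An inhibition rule mutes alerts that match one set of matchers (target_match)
# while an alert that matches another set of matchers (source_match) is firing.
# Both alerts must carry the same values for the labels listed in equal.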
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
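
The routing and inhibition behaviour configured above can be sketched in Python. This is a simplified model, not Alertmanager's code. It assumes a flat list of child routes where the first match wins, match_re patterns anchored to the whole label value, and that a label missing from both alerts counts as equal. The English alert names and the helper names pick_receiver and is_inhibited are illustrative stand-ins, not names taken from a real rule set.

```python
import re

DEFAULT_RECEIVER = "ops_notify"
# (receiver, match_re) pairs mirroring the child routes; the alert names are illustrative.
ROUTES = [
    ("ops_notify", {"alertname": "InstanceAliveAlert|DiskUsageAlert"}),
    ("info_notify", {"alertname": "MemoryUsageAlert|CPUUsageAlert|DirectorySizeAlert"}),
]
INHIBIT_RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}


def pick_receiver(labels):
    """Return the receiver of the first child route whose regexes all match."""
    for receiver, match_re in ROUTES:
        if all(re.fullmatch(rx, labels.get(name, "")) for name, rx in match_re.items()):
            return receiver
    return DEFAULT_RECEIVER  # no child route matched, fall back to the root receiver


def is_inhibited(target, firing_alerts, rule=INHIBIT_RULE):
    """True if some firing source alert mutes `target` under `rule`."""
    def matches(labels, matcher):
        return all(labels.get(k) == v for k, v in matcher.items())

    if not matches(target, rule["target_match"]):
        return False
    return any(
        matches(source, rule["source_match"])
        and all(source.get(k, "") == target.get(k, "") for k in rule["equal"])
        for source in firing_alerts
    )


critical = {"alertname": "DiskUsageAlert", "instance": "node1", "severity": "critical"}
warning = {"alertname": "DiskUsageAlert", "instance": "node1", "severity": "warning"}
print(pick_receiver({"alertname": "CPUUsageAlert"}))   # handled by the second child route
print(pick_receiver({"alertname": "SomethingElse"}))   # falls back to the default receiver
print(is_inhibited(warning, [critical]))               # warning is muted while critical fires
```

Under these assumptions, a warning for the same alertname and instance is suppressed as long as the critical alert is firing, which keeps one incident from producing two notifications.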

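(2) Base64-encode the file. The result must be a single line with no whitespace, so disable line wrapping (-w 0 with GNU base64) or strip the newlines afterwards, for example with tr -d '\n'.

base64 -w 0 alertmanger_config.yaml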

(3) Replace the alertmanager.yaml value in the Secret with the Base64 string produced above

kubectl edit secret alertmanager-mypro-prometheus-operator-alertmanager
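
As an alternative to pasting the string by hand, the encoding step can be scripted. The sketch below uses only the Python standard library to produce the single-line Base64 value and a JSON merge patch for the alertmanager.yaml key. The helper name build_secret_patch and the sample config text are hypothetical, and feeding the result to kubectl patch is an assumption about your workflow, not something the steps above do.

```python
import base64
import json


def build_secret_patch(config_text, key="alertmanager.yaml"):
    """Return a JSON merge patch that sets `key` in a Secret's data to the encoded config."""
    # b64encode returns one unbroken line, so no post-processing is needed.
    encoded = base64.b64encode(config_text.encode("utf-8")).decode("ascii")
    return json.dumps({"data": {key: encoded}})


if __name__ == "__main__":
    # Sample content only; in practice read alertmanger_config.yaml from disk.
    sample = "global:\n  resolve_timeout: 5m\n"
    print(build_secret_patch(sample))
```

The printed JSON could then be passed to kubectl patch secret alertmanager-mypro-prometheus-operator-alertmanager --type merge -p '<json>', which avoids the interactive editor.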

9. Add custom alerting rules

prometheus-operator lets you add custom rules dynamically through PrometheusRule resources.

(1) Look up the rule label selector of the Prometheus resource

kubectl get prometheus mypro-prometheus-operator-prometheus -o jsonpath='{.spec.ruleSelector}'; echo
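
The selector returned by this command decides which PrometheusRule objects the operator loads into Prometheus. A minimal Python sketch of matchLabels semantics follows. It ignores matchExpressions, the function name selects is hypothetical, and the selector value shown is an assumed example rather than output captured from the cluster.

```python
def selects(selector, labels):
    """True if every key/value pair in selector['matchLabels'] is present in labels."""
    wanted = selector.get("matchLabels", {})
    return all(labels.get(k) == v for k, v in wanted.items())


# Assumed selector; check the real one with the kubectl command above.
rule_selector = {"matchLabels": {"app": "prometheus-operator", "release": "mypro"}}

rule_labels = {"app": "prometheus-operator", "release": "mypro", "prometheus": "test-example"}
print(selects(rule_selector, rule_labels))           # extra labels on the rule do not matter
print(selects(rule_selector, {"release": "other"}))  # wrong release and no app label
```

A PrometheusRule whose labels fail this check is accepted by the API server but never shows up in the Prometheus rule files, so a mismatch here is the first thing to rule out when a new rule does not appear.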

(2) Create a custom PrometheusRule

# cat PrometheusRule.yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: prometheus-operator   # must match a label in the Prometheus ruleSelector; a Prometheus resource you create yourself has to select the labels set here
    release: mypro             # must match a label in the Prometheus ruleSelector; a Prometheus resource you create yourself has to select the labels set here
    prometheus: test-example
  name: test-load1-prometheusrule
spec:
  groups:
  - name: test-load-1
    rules:
    - alert: test-load-1
      expr: node_load1 > 1
      for: 2m
      labels:
        team: node
      annotations:
        summary: "{{$labels.instance}}: load 1 >1"
        description: "{{$labels.instance}}: job {{$labels.job}} test, load is greater than 1"

# Apply
kubectl apply -f PrometheusRule.yaml

(3) Log in to the pod and verify

# kubectl exec -it prometheus-mypro-prometheus-operator-prometheus-0 sh

kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-mypro-prometheus-operator-prometheus-0 -n default' to see all of the containers in this pod.
/prometheus $ ls /etc/prometheus/rules/prometheus-mypro-prometheus-operator-prometheus-rulefiles-0/default-test-load1-prometheusrule.yaml
/etc/prometheus/rules/prometheus-mypro-prometheus-operator-prometheus-rulefiles-0/default-test-load1-prometheusrule.yaml
/prometheus $ cat /etc/prometheus/rules/prometheus-mypro-prometheus-operator-prometheus-rulefiles-0/default-test-load1-prometheusrule.yaml
groups:
- name: test-load-1
  rules:
  - alert: test-load-1
    annotations:
      description: '{{$labels.instance}}: job {{$labels.job}} test, load is greater than 1'
      summary: '{{$labels.instance}}: load 1 >1'
    expr: node_load1 > 1
    for: 2m
    labels:
      team: node
