Enabling the KubeSphere Alerting System and Deploying the ElastAlert2 Log Alerting System

KubeSphere alerting system

Deploying the ElastAlert2 log alerting system

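1. Download the chart package

Add the elastalert2 repository to your Helm configuration:
$ helm repo add elastalert2 https://jertel.github.io/elastalert2/
Search the repository for elastalert2 charts:
$ helm search repo elastalert2

Download the chart to the local machine. This walkthrough uses chart version 2.19.0 (app version 2.19.0).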

$ helm pull elastalert2/elastalert2 --version=2.19.0

# Unpack the chart archive
$ tar xf elastalert2-2.19.0.tgz

# Edit values.yaml (keep a backup first)
$ cd elastalert2
$ cp values.yaml values.yaml.bak
$ vim values.yaml

# View the configuration without blank lines and comments
$ grep -Ev "$^|#" values.yaml

# List the cluster's storage classes
$ kubectl get sc
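
The grep filter above hides every line that contains a # anywhere, so a setting followed by a trailing comment disappears from the output as well. A minimal sketch of a stricter filter that drops only blank lines and whole-line comments; the function name show_effective_config and the sample file are made up for illustration and are not part of the chart:

```shell
# Hypothetical helper: print only lines that are neither blank
# nor whole-line comments.
show_effective_config() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Illustrative sample, not the real chart values.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# whole-line comment

elasticsearch:
  port: 9200  # inline comment, this line is kept
EOF

show_effective_config "$sample"
```

Against the real file this would be called as show_effective_config values.yaml.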

2. Edit the configuration file

elasticsearch:
  # elasticsearch endpoint e.g. (svc.namespace||svc)
  host: opensearch-cluster-master
  # elasticsearch port
  port: 9200
  # whether or not to connect to es_host using TLS
  useSsl: "true"
  # Username if authenticating to ES with basic auth
  username: "admin"
  # Password if authenticating to ES with basic auth
  password: "admin"
  # Specifies an existing secret to be used for the ES username/password
  credentialsSecret: ""
  # The key in elasticsearch.credentialsSecret that stores the ES username
  credentialsSecretUsernameKey: ""
  # The key in elasticsearch.credentialsSecret that stores the ES password
  credentialsSecretPasswordKey: ""
  # whether or not to verify TLS certificates
  verifyCerts: "false"
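
The values above keep the username and password in plain text. The chart also exposes credentialsSecret and its two key settings, so the credentials could live in a Kubernetes Secret instead. A minimal sketch that only renders such a Secret manifest; the secret name elastalert2-es-credentials, the key names username and password, and the function name are assumptions chosen for illustration, not chart defaults:

```shell
# Hypothetical helper: render a Secret manifest holding the ES credentials.
# Usage: render_es_secret <secret-name> <namespace> <username> <password>
render_es_secret() {
  # Secret data values must be base64-encoded; tr removes any line wrapping.
  user_b64=$(printf '%s' "$3" | base64 | tr -d '\n')
  pass_b64=$(printf '%s' "$4" | base64 | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $1
  namespace: $2
type: Opaque
data:
  username: $user_b64
  password: $pass_b64
EOF
}

render_es_secret elastalert2-es-credentials kubesphere-logging-system admin admin
```

The output could be piped to kubectl apply -f -. values.yaml would then set credentialsSecret to the secret name, credentialsSecretUsernameKey to username and credentialsSecretPasswordKey to password, with the plain-text username and password left empty; check the chart's README before relying on this.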

DingTalk alert rule configuration

rules:
  ding_talk: |-
    ---
    name: dingtalk-alert     # rule name, must be unique
    type: any                # the any type matches every hit returned by the filter
    index: ks-*              # index pattern to search
    num_events: 1
    timeframe:
      minutes: 1   # intended threshold: at least 1 event within 1 minute (these two options are used by type: frequency)
    filter:
    - query:
        query_string:
          query: "ERROR"  # query_string syntax (key:value also works); matches error logs
    alert:
    - "dingtalk"  # DingTalk alerter
    dingtalk_msgtype: "text"             # message type
    dingtalk_access_token: "......"  # DingTalk access token
    alert_text_type: alert_text_only
    alert_text: |  # each {} is filled, in order, from alert_text_args below
     Log monitoring
     Error detected!!!
     time:{}
     log:{}
     num_hits:{}
     num_matches:{}
     kubernetes.pod_name:{}
     kubernetes.namespace_name:{}
     kubernetes.container_name:{}
     kubernetes.container_image:{}
    alert_text_args:
    - "@timestamp"
    - log
    - num_hits
    - num_matches
    - kubernetes.pod_name
    - kubernetes.namespace_name
    - kubernetes.container_name
    - kubernetes.container_image
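
Each {} in alert_text is filled positionally from alert_text_args, so the two counts have to match. A small sketch that checks this on a standalone rule file; the function name and the sample rule are illustrative, and the parsing is deliberately naive: it assumes alert_text_args is the last key in the file and that its items start with "- " at the beginning of the line:

```shell
# Hypothetical helper: compare the number of {} placeholders with the
# number of alert_text_args entries in a rule file.
# Naive parsing: assumes alert_text_args is the last key in the file.
check_alert_args() {
  placeholders=$(grep -oF '{}' "$1" | wc -l | tr -d ' ')
  args=$(sed -n '/^alert_text_args:/,$p' "$1" | grep -c '^- ')
  if [ "$placeholders" -eq "$args" ]; then
    echo "ok: $placeholders placeholders, $args args"
  else
    echo "mismatch: $placeholders placeholders, $args args"
    return 1
  fi
}

# Illustrative two-field rule fragment, not the full rule shown above.
rule=$(mktemp)
cat > "$rule" <<'EOF'
alert_text: |
  time:{}
  log:{}
alert_text_args:
- "@timestamp"
- log
EOF

check_alert_args "$rule"
```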

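Alternatively, Helm can pick up rules from a local folder before the chart is deployed to the Kubernetes cluster. Place the rule files under the elastalert2/rules directory.

Then adjust the configuration: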

rootRulesFolder: "rules"
enabledRules: ["deadman_slack", "deadman_pagerduty"]

# specifies the rules volume to be used
rulesVolumeName: "rules"

# additional rule configurations e.g. (http://elastalert2.readthedocs.io/en/latest/)
rules: 
  # ding_talk: |-
  #   ---
  #   name: dingtalk-alert     # rule name, must be unique
  #   type: any                # the any type matches every hit returned by the filter
  #   index: ks-*              # index pattern to search
  #   num_events: 1
  #   timeframe:
  #     minutes: 1   # intended threshold: at least 1 event within 1 minute (these two options are used by type: frequency)
  #   filter:
  #   - query:
  #       query_string:
  #         query: "ERROR"  # query_string syntax (key:value also works); matches error logs
  #   alert:
  #   - "dingtalk"  # DingTalk alerter
  #   dingtalk_msgtype: "text"             # message type
  #   dingtalk_access_token: ""  # DingTalk access token
  #   alert_text_type: alert_text_only
  #   alert_text: |  # each {} is filled, in order, from alert_text_args below
  #    Log monitoring
  #    Error detected!!!
  #    time:{}
  #    log:{}
  #    num_hits:{}
  #    num_matches:{}
  #    kubernetes.pod_name:{}
  #    kubernetes.namespace_name:{}
  #    kubernetes.container_name:{}
  #    kubernetes.container_image:{}
  #   alert_text_args:
  #   - "@timestamp"
  #   - log
  #   - num_hits
  #   - num_matches
  #   - kubernetes.pod_name
  #   - kubernetes.namespace_name
  #   - kubernetes.container_name
  #   - kubernetes.container_image

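By default, the rules folder must sit at the root of the chart directory. Setting rootRulesFolder overrides the rules and secretRulesName values. The rule files are read only when the chart is deployed (installed) to the cluster.
For rule options, see the official documentation: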
https://elastalert2.readthedocs.io/en/latest/

3. Install elastalert2

# Run from the parent directory of elastalert2
$ helm install elastalert2-helm elastalert2/ -n kubesphere-logging-system
# Upgrade the release after changing the configuration
$ helm upgrade elastalert2-helm elastalert2/ -n kubesphere-logging-system

4. Check the deployed elastalert2

$ kubectl describe pod -n kubesphere-logging-system
$ kubectl get pod -n kubesphere-logging-system
$ helm -n kubesphere-logging-system list
$ kubectl -n kubesphere-logging-system get pods -l app.kubernetes.io/name=elastalert2

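5. Modifying rules after deployment

Log in to KubeSphere, open the deployed elastalert pod, go to Resource Status -> Volumes -> rules, then edit the YAML or the settings and save.

Enable the alerting system as described in the official documentation: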

https://kubesphere.io/zh/docs/v3.4/pluggable-components/alerting/

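Problems encountered and solutions

1. Errors on the cluster management page after enabling the alerting system

Inspecting the pods shows that thanos-ruler-kubesphere keeps restarting.

This is a bug in alerting in KubeSphere 3.4.x. Under "CRDs" (custom resource definitions), search for "ThanosRuler", then change
alertmanagersUrl:
- 'dnssrv+http://alertmanager-operated.kubesphere-monitoring-system.svc:9093'
to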
- 'http://alertmanager-operated.kubesphere-monitoring-system.svc:9093'
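
The edit amounts to dropping the dnssrv+ scheme prefix from each Alertmanager URL. A sketch of that transformation on a saved copy of the manifest; the function name and the sample snippet are illustrative, and applying the result back to the cluster is still left to the console or kubectl:

```shell
# Hypothetical helper: remove the dnssrv+ prefix from http(s) URLs
# in a manifest file and print the result.
strip_dnssrv() {
  sed -E 's#dnssrv\+(https?://)#\1#g' "$1"
}

# Illustrative fragment of the ThanosRuler spec.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
alertmanagersUrl:
- 'dnssrv+http://alertmanager-operated.kubesphere-monitoring-system.svc:9093'
EOF

strip_dnssrv "$manifest"
```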

2. Unable to log in to KubeSphere after uninstalling the alerting system

The logs point to a certificate authority problem:

W0812 07:06:18.190907       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
W0812 07:06:18.192455       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
W0812 07:06:18.204313       1 metricsserver.go:238] Metrics API not available.
W0812 07:06:18.204335       1 cache.go:64] In-memory cache will be used, this may cause data inconsistencies when running with multiple replicas.
I0812 07:06:18.204484       1 interface.go:50] start helm repo informer
I0812 07:06:18.576824       1 apiserver.go:428] Start cache objects
W0812 07:06:19.778709       1 reflector.go:424] pkg/client/informers/externalversions/factory.go:129: failed to list *v2beta1.Receiver: conversion webhook for notification.kubesphere.io/v2beta2, Kind=Receiver failed: Post "https://notification-manager-webhook.kubesphere-monitoring-system.svc:443/convert?timeout=30s": x509: certificate signed by unknown authority
E0812 07:06:19.778791       1 reflector.go:140] pkg/client/informers/externalversions/factory.go:129: Failed to watch *v2beta1.Receiver: failed to list *v2beta1.Receiver: conversion webhook for notification.kubesphere.io/v2beta2, Kind=Receiver failed: Post "https://notification-manager-webhook.kubesphere-monitoring-system.svc:443/convert?timeout=30s": x509: certificate signed by unknown authority
W0812 07:06:20.661502       1 reflector.go:424] pkg/client/informers/externalversions/factory.go:129: failed to list *v2beta1.Receiver: conversion webhook for notification.kubesphere.io/v2beta2, Kind=Receiver failed: Post "https://notification-manager-webhook.kubesphere-monitoring-system.svc:443/convert?timeout=30s": x509: certificate signed by unknown authority

Solution

# Read the CA bundle from the validating webhook configuration
$ caBundle=$(kubectl get validatingWebhookConfiguration notification-manager-validating-webhook -o jsonpath='{.webhooks[0].clientConfig.caBundle}')
# In each CRD below, set spec.conversion.webhook.clientConfig.caBundle to the value of $caBundle
$ kubectl edit crd configs.notification.kubesphere.io
$ kubectl edit crd receivers.notification.kubesphere.io

References:
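
Instead of pasting the value by hand in kubectl edit, the same field can in principle be set with a JSON merge patch. A sketch that only builds the patch document; the function name is made up, the value passed in below is a placeholder rather than a real CA bundle, and whether a merge patch is appropriate for your cluster is an assumption to verify first:

```shell
# Hypothetical helper: build a JSON merge patch that sets the
# conversion webhook caBundle of a CRD.
build_ca_patch() {
  printf '{"spec":{"conversion":{"webhook":{"clientConfig":{"caBundle":"%s"}}}}}\n' "$1"
}

# Placeholder value; in practice this would be the $caBundle read earlier.
build_ca_patch "PLACEHOLDER_BASE64"
```

The patch could then be passed to kubectl, for example kubectl patch crd receivers.notification.kubesphere.io --type merge -p "$(build_ca_patch "$caBundle")", and likewise for configs.notification.kubesphere.io.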
https://ask.kubesphere.io/forum/d/23373-341-thanos-ruler-kubesphere-feng-kuang-zhong-qi
https://ask.kubesphere.io/forum/d/2200-kubesphere-monitoring-system-alertmanager-main
