Monitoring middleware and servers with Prometheus and DingTalk alerts

I. Setting up Prometheus
$ wget https://github.com/prometheus/prometheus/releases/download/v2.12.0/prometheus-2.12.0.linux-amd64.tar.gz
$ tar xf prometheus-2.12.0.linux-amd64.tar.gz && cd prometheus-2.12.0.linux-amd64
$ nohup ./prometheus --config.file=prometheus.yml --web.enable-lifecycle &


$ vim prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
   - "/data/prometheus/rule.yml"

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    static_configs:
    - targets: ['localhost:9090']
# Host monitoring (node_exporter)
  - job_name: 'OS'
    static_configs:
    - targets: ['172.31.xxx:9100']
      labels:
        instance: 'web'

    - targets: ['172.xxx:9100']
      labels:
        instance: 'db'
# MySQL monitoring (mysqld_exporter)
  - job_name: 'MySQL'
    static_configs:
    - targets: ['172.31.xxx:9104']
      labels:
        instance: 'db-mysql'

# Redis monitoring (redis_exporter)
  - job_name: 'Redis'
    static_configs:
    - targets: ['172.31.xxx:9121']
      labels:
        instance: 'db-redis'
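
Before restarting or reloading Prometheus with the configuration above, it can help to validate the file first. The following is a minimal sketch, assuming promtool sits next to the prometheus binary (it ships in the same release tarball). The function name and the PROMTOOL variable are illustrative helpers, not part of Prometheus.

```shell
# Sketch: validate a Prometheus config file before (re)starting the server.
# Assumption: promtool is in the current directory unless PROMTOOL points elsewhere.
validate_prometheus_config() {
    config_file="${1:-prometheus.yml}"
    promtool_bin="${PROMTOOL:-./promtool}"
    if [ ! -x "$promtool_bin" ]; then
        echo "promtool not found at $promtool_bin" >&2
        return 2
    fi
    # "check config" parses the file and also checks the rule files it references.
    "$promtool_bin" check config "$config_file"
}
```

Calling validate_prometheus_config prometheus.yml from the Prometheus directory should report a problem if /data/prometheus/rule.yml is missing or malformed, since that path is listed under rule_files.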

 

II. node_exporter, mysqld_exporter, and redis_exporter
//node_exporter
$ wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
$ tar xf node_exporter-0.18.1.linux-amd64.tar.gz && cd node_exporter-0.18.1.linux-amd64
$ nohup ./node_exporter &

//mysqld_exporter
$ wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.12.1/mysqld_exporter-0.12.1.linux-amd64.tar.gz
$ tar xf mysqld_exporter-0.12.1.linux-amd64.tar.gz &&cd mysqld_exporter-0.12.1.linux-amd64
# First create a monitoring user in MySQL
mysql> CREATE USER 'mysql_monitor'@'localhost' identified by 'mysql_monitor';
mysql> GRANT REPLICATION CLIENT, PROCESS ON *.* TO 'mysql_monitor'@'localhost';
mysql> GRANT SELECT ON performance_schema.* TO 'mysql_monitor'@'localhost';
Create a .my.cnf file in the mysqld_exporter directory:
$ vim .my.cnf
[client]
port=3306
user=mysql_monitor
password=mysql_monitor
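# Pass the config file when starting the exporter
$ nohup ./mysqld_exporter --config.my-cnf=.my.cnf &
To monitor more than one MySQL instance, start another exporter with its own .cnf file and listen address:
$ nohup ./mysqld_exporter --config.my-cnf=.mycntr.cnf --web.listen-address=172.31.243.198:9105 &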

//redis_exporter
$ wget https://github.com/oliver006/redis_exporter/releases/download/v0.30.0/redis_exporter-v0.30.0.linux-amd64.tar.gz
$ tar xf redis_exporter-v0.30.0.linux-amd64.tar.gz  && cd redis_exporter

# Specify the Redis server address and the exporter's own listen address
$ nohup  ./redis_exporter -redis.addr=172.31.243.198:6379 -web.listen-address 0.0.0.0:9121  &
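
Once the exporters are running, a quick check of each /metrics endpoint confirms that Prometheus will have something to scrape. This is a rough sketch: the two function names are made up for illustration, and it assumes curl is installed and that the exporters serve metrics on the default /metrics path.

```shell
# Build the metrics URL for an exporter listening on host $1, port $2.
metrics_url() {
    printf 'http://%s:%s/metrics\n' "$1" "$2"
}

# Return 0 if the exporter answers within 5 seconds and exposes a metric
# whose name starts with $3; return non-zero otherwise.
exporter_has_metric() {
    url="$(metrics_url "$1" "$2")"
    curl -fsS --max-time 5 "$url" | grep -q "^$3"
}
```

For example, exporter_has_metric 127.0.0.1 9100 node_load15 checks node_exporter, exporter_has_metric 127.0.0.1 9104 mysql_up checks mysqld_exporter, and exporter_has_metric 127.0.0.1 9121 redis_up checks redis_exporter. Replace 127.0.0.1 with the address each exporter actually listens on.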

 

 
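III. Alertmanager
$ wget https://github.com/prometheus/alertmanager/releases/download/v0.15.2/alertmanager-0.15.2.linux-amd64.tar.gz
$ tar xf alertmanager-0.15.2.linux-amd64.tar.gz && cd alertmanager-0.15.2.linux-amd64
# Alertmanager looks for alertmanager.yml by default, so name the config file explicitly
$ nohup ./alertmanager --config.file=alertmanager.yaml &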
-------------------------------------------
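cat rule.yaml
# Adjust to your environment; each expr is a PromQL expression.
# prometheus.yml above loads /data/prometheus/rule.yml, so save this file at that path (or update rule_files to match).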
groups:
    - name: UnicornServerStatus
      rules:
      - alert: InstanceStatus
        expr: up == 0
        for: 2m
        labels:
          status: warning
        annotations:
          summary: "{{$labels.instance}}: has been down"
          description: "{{$labels.instance}}: job {{$labels.job}} has been down"
    - name: base-monitor-rule
      rules:
      # Metric names below follow the node_exporter >= 0.16 naming (for example node_cpu_seconds_total),
      # which is what the 0.18.1 release installed above exposes.
      - alert: NodeCpuUsage
        expr: (100 - (avg by (instance) (rate(node_cpu_seconds_total{job=~".*",mode="idle"}[2m])) * 100)) > 99
        for: 15m
        labels:
          service_name: unicornServer
          level: warning
        annotations:
          description: "{{$labels.instance}}: CPU usage is above 99% (current value is: {{ $value }})"
      - alert: NodeMemUsage
        expr: avg by (instance) ((1 - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes) * 100) > 90
        for: 15m
        labels:
          service_name: unicornServer
          level: warning
        annotations:
          description: "{{$labels.instance}}: MEM usage is above 90% (current value is: {{ $value }})"
      - alert: NodeDiskUsage
        expr: (1 - node_filesystem_free_bytes{fstype!="rootfs",mountpoint!="",mountpoint!~"/(run|var|sys|dev).*"} / node_filesystem_size_bytes) * 100 > 80
        for: 2m
        labels:
          service_name: unicornServer
          level: warning
        annotations:
          description: "{{$labels.instance}}: Disk usage is above 80% (current value is: {{ $value }})"
      - alert: NodeLoad15
        expr: avg by (instance) (node_load15{}) > 100
        for: 2m
        labels:
          service_name: unicornServer
          level: warning
        annotations:
          description: "{{$labels.instance}}: Load15 is above 100 (current value is: {{ $value }})"
      - alert: NodeAgentStatus
        expr: avg by (instance) (up{}) == 0
        for: 2m
        labels:
          service_name: unicornServer
          level: warning
        annotations:
          description: "{{$labels.instance}}: Node Agent is down (current value is: {{ $value }})"
      - alert: NodeProcsBlocked
        expr: avg by (instance) (node_procs_blocked{}) > 100
        for: 2m
        labels:
          service_name: unicornServer
          level: warning
        annotations:
          description: "{{$labels.instance}}: Node Blocked Procs detected! (current value is: {{ $value }})"
      # The *_total counter names are the node_exporter >= 0.16 names.
      - alert: NodeTransmitRate
        expr: avg by (instance) (floor(irate(node_network_transmit_bytes_total{device="eth0"}[2m]) / 1024 / 1024)) > 100
        for: 2m
        labels:
          service_name: unicornServer
          level: warning
        annotations:
          description: "{{$labels.instance}}: Node Transmit Rate is above 100MB/s (current value is: {{ $value }})"
      - alert: NodeReceiveRate
        expr: avg by (instance) (floor(irate(node_network_receive_bytes_total{device="eth0"}[2m]) / 1024 / 1024)) > 100
        for: 2m
        labels:
          service_name: unicornServer
          level: warning
        annotations:
          description: "{{$labels.instance}}: Node Receive Rate is above 100MB/s (current value is: {{ $value }})"
      - alert: NodeDiskReadRate
        expr: avg by (instance) (floor(irate(node_disk_read_bytes_total{}[2m]) / 1024 / 1024)) > 50
        for: 2m
        labels:
          service_name: unicornServer
          level: warning
        annotations:
          description: "{{$labels.instance}}: Node Disk Read Rate is above 50MB/s (current value is: {{ $value }})"
      - alert: NodeDiskWriteRate
     # - alert: Disk write rate is normal
        expr: avg by (instance) (floor(irate(node_disk_written_bytes_total{}[2m]) / 1024 / 1024)) > 50
        for: 2m
        labels:
          service_name: unicornServer
          level: warning
        annotations:
          description: "{{$labels.instance}}: Node Disk Write Rate is above 50MB/s (current value is: {{ $value }})"
      #    description: "{{$labels.instance}}: Disk write rate is below 50MB/s (current value is: {{ $value }})"

-------------------------------------------
cat alertmanager.yaml

global:
  resolve_timeout: 5m
receivers:
- name: "dingding.webhook"
  webhook_configs:
  - url: 'http://127.0.0.1:8060/dingtalk/ops_dingding/send'
    send_resolved: true
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'dingding.webhook'
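inhibit_rules:
  # Note: this rule matches on a `severity` label, while the alert rules in rule.yaml set `level` and `status`,
  # so it has no effect until alerts carry a `severity` label.
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']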

 

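IV. DingTalk
1. Add a custom robot to the DingTalk group and note its webhook URL (it contains the access_token).
2. Install prometheus-webhook-dingtalk:
// Download a release from https://github.com/timonwong/prometheus-webhook-dingtalk/releases/
$ tar xf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
// Start it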
$ ./prometheus-webhook-dingtalk --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=571db1bc165ffcec8d0a33246ce950xxxxxcbab4eef9b2e885a1d80b2"

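// Then edit the webhook url line in alertmanager.yaml so that it points at this service. The profile name (ops_dingding) must match the path segment in the url, and 8060 is the webhook service's default listen port.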
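
To confirm the whole chain (Alertmanager, the webhook service, DingTalk) without waiting for a real incident, a hand-made alert can be posted to Alertmanager. The sketch below assumes Alertmanager listens on 127.0.0.1:9093 and still serves the v1 alerts API, which the 0.15.2 release used here does; much newer releases only offer /api/v2/alerts. The function names, the ALERTMANAGER_URL variable, and the alert name are illustrative.

```shell
# Build a minimal alert payload: a JSON array with one alert.
# $1 is the alert name, $2 the instance label; both are placeholders.
test_alert_payload() {
    printf '[{"labels":{"alertname":"%s","instance":"%s","level":"warning"},"annotations":{"description":"test alert, please ignore"}}]\n' "$1" "$2"
}

# Post the payload to Alertmanager (assumed address unless ALERTMANAGER_URL is set).
send_test_alert() {
    am_url="${ALERTMANAGER_URL:-http://127.0.0.1:9093}"
    test_alert_payload "${1:-PipelineTest}" "${2:-manual-test}" |
        curl -fsS -X POST -H 'Content-Type: application/json' \
            --data-binary @- "$am_url/api/v1/alerts"
}
```

Running send_test_alert should produce a message in the DingTalk group shortly after the group_wait of 10s configured in alertmanager.yaml. If nothing arrives, the logs of Alertmanager and prometheus-webhook-dingtalk are the first place to look.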

 

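V. Grafana

Add a Prometheus data source pointing at http://<Prometheus server IP>:9090

Redis dashboard template ID: 11692
MySQL dashboard template ID: 6239
Node dashboard template ID: 8919

$ curl -X POST http://localhost:9090/-/reload    # reload Prometheus after editing its config files (requires --web.enable-lifecycle)

If you run into any problem, be sure to check the logs of each component.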
 
 
 
 
 
