[Monitoring] Common Prometheus monitoring and alerting configuration for traditional (non-Kubernetes) environments

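This monitoring stack is simple, but it looks complicated if you don't understand the flow. First learn the order in which the pieces are configured. Once you understand the overall architecture, split the configuration into parts and look up each part online as you go. The real obstacle is not knowing how each layer is configured, and therefore not knowing where to start. Skim a few books on the subject to understand the flow, then work through the configuration in order, raising questions and solving them one at a time; by the time you have answered them, the setup works. Once the basic path works end to end, refine the configuration. Practice makes perfect: if you haven't built it 20 times, don't say you've learned it.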

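When learning something new, avoid perfectionism. First get a simplified version working end to end; this does a lot for your confidence. Then study each technical point in depth, and finally, drawing on problems you meet in production, consider how well each feature fits your environment, so that you arrive at a configuration that suits your company.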

[Figure: simplified architecture diagram]

Server information:

Node    IP address   Services
node01  10.10.8.62   grafana, prometheus, alertmanager, node_exporter, mysqld_exporter, MariaDB (test)
node02  10.10.8.63   node_exporter, MariaDB (test, scraped through the mysqld_exporter on node01)

Create a dedicated user and group

groupadd monitor
useradd -MN -s /sbin/nologin monitor  -g monitor

grafana

Install

node01

#wget https://dl.grafana.com/oss/release/grafana-10.4.0.linux-amd64.tar.gz

cd /home/zcsadmin/
tar xf grafana-10.4.0.linux-amd64.tar.gz  -C /usr/local/
mv /usr/local/grafana-v10.4.0/  /usr/local/grafana
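The download, extract and rename steps repeat for every component in this article, and the extracted directory name is not always predictable (Grafana's tarball unpacks to grafana-v10.4.0, the others to name-version.linux-amd64). A minimal sketch of a helper that reads the top-level directory from the archive instead of hard-coding it; the function name is hypothetical, and it assumes the tarball contains exactly one top-level directory:

```shell
# Hypothetical helper: unpack a release tarball into a parent directory and
# rename its single top-level directory to a fixed name (e.g. /usr/local/grafana).
# Assumes exactly one top-level directory whose name does not start with "./".
extract_release() {
  local tarball="$1" parent="$2" name="$3"
  local top
  # The first entry's leading path component is the top-level directory,
  # e.g. "grafana-v10.4.0" or "prometheus-2.50.1.linux-amd64".
  top="$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)"
  if [ -z "$top" ]; then
    echo "cannot determine top-level directory of $tarball" >&2
    return 1
  fi
  # Refuse to overwrite an existing installation.
  if [ -e "$parent/$name" ]; then
    echo "$parent/$name already exists; refusing to overwrite" >&2
    return 1
  fi
  tar -xzf "$tarball" -C "$parent" || return 1
  mv "$parent/$top" "$parent/$name"
}
```

For example: extract_release /home/zcsadmin/grafana-10.4.0.linux-amd64.tar.gz /usr/local grafana.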
Configure

node01

mkdir -p  /usr/local/grafana/data/{log,plugins,socket}

cp /usr/local/grafana/conf/defaults.ini /usr/local/grafana/conf/granfana.ini

chown -R monitor:monitor /usr/local/grafana/

sed -i 's#socket = /tmp/grafana.sock#socket = data/socket/grafana.sock#g' /usr/local/grafana/conf/granfana.ini
sed -i 's#en-US#zh-CN#g' /usr/local/grafana/conf/granfana.ini
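The second sed above replaces every occurrence of en-US in the file, not only the language setting. A sketch that anchors both edits to their keys instead; it assumes the copied defaults.ini has `socket = ...` and `default_language = ...` lines starting at column 0 (check your version's file, and check which language codes your Grafana version accepts). The helper name is hypothetical:

```shell
# Hypothetical helper: set the Unix socket path and UI language in a Grafana ini,
# touching only the lines that start with those keys.
patch_grafana_ini() {
  local ini="$1" socket_path="$2" lang="$3"
  [ -f "$ini" ] || { echo "no such file: $ini" >&2; return 1; }
  # "^socket = " does not match socket_gid / socket_mode, and other lines that
  # merely contain "en-US" are left untouched.
  sed -i \
    -e "s#^socket = .*#socket = ${socket_path}#" \
    -e "s#^default_language = .*#default_language = ${lang}#" \
    "$ini"
}
```

For example: patch_grafana_ini /usr/local/grafana/conf/granfana.ini data/socket/grafana.sock zh-CN.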
Start

node01

cat >/usr/lib/systemd/system/grafana.service<<'EOF'
[Unit]
Description=Grafana
After=network.target 


[Service]
User=monitor
Group=monitor
Environment="GRAFANA_HOME=/usr/local/grafana"
ExecStart=/usr/local/grafana/bin/grafana-server   --config=/usr/local/grafana/conf/granfana.ini --homepath=/usr/local/grafana
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl restart grafana
systemctl status  grafana
systemctl enable  grafana

Default username/password: admin/admin (Grafana prompts you to change it at the first login).

prometheus

A collection of alerting rules. Don't write monitoring rules by hand; adapt these instead:

https://github.com/samber/awesome-prometheus-alerts#-rules

https://samber.github.io/awesome-prometheus-alerts/

Install

node01

#wget  https://github.com/prometheus/prometheus/releases/download/v2.50.1/prometheus-2.50.1.linux-amd64.tar.gz

cd /home/zcsadmin/
tar xf prometheus-2.50.1.linux-amd64.tar.gz  -C /usr/local/
mv /usr/local/prometheus-2.50.1.linux-amd64/ /usr/local/prometheus
cd /usr/local/prometheus
Configure

node01

cat >/usr/local/prometheus/prometheus.yml<<'EOF'
global:

  scrape_interval: 15s # How often targets are scraped; 15s here (default 1m). 10-60s is a typical range
  evaluation_interval: 15s # How often Prometheus evaluates rules; 15s here (default 1m)

alerting:
  alertmanagers:
    - static_configs:    # Alertmanager addresses, set statically here; service discovery also works
      - targets:         # More than one address can be listed
        - 10.10.8.62:9093

# Alerting rule files
rule_files:
  - "rules/*.yml"

scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Alertmanager
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['localhost:9093']

  # Linux host metrics
  - job_name: 'node-exporter'
    static_configs:
      - targets:
        - 'localhost:9100'


  # MySQL (mysqld_exporter multi-target mode)
  - job_name: 'mysqld-exporter'
    metrics_path: /probe   # mysqld_exporter's multi-target endpoint, used together with ?target=
    static_configs:
      - targets:
        - localhost:3306
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        # host:port of mysqld_exporter
        replacement: localhost:9104

EOF
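The three relabel_configs rules in the mysqld-exporter job are what let one exporter serve many MySQL instances: the listed target becomes the ?target= URL parameter and the instance label, and the scrape address is replaced by the exporter itself. The sketch below replays those rules in shell to show the resulting request. It is an illustration, not Prometheus code, and it assumes the /probe path that mysqld_exporter documents for multi-target scraping; the function name is hypothetical:

```shell
# Illustration only: apply the three relabel rules of the mysqld-exporter job to
# one static target and print the instance label and the URL that gets scraped.
simulate_probe_url() {
  local address="$1" exporter="$2" auth_module="${3:-}"
  local param_target instance query
  # Rule 1: copy __address__ into __param_target, which becomes ?target=...
  param_target="$address"
  # Rule 2: copy __param_target into the instance label shown in Grafana
  instance="$param_target"
  # Rule 3: overwrite __address__ with the exporter's host:port
  address="$exporter"
  # Optional params.auth_module is sent as an extra query parameter.
  # Values are shown unencoded; Prometheus percent-encodes them on the wire.
  if [ -n "$auth_module" ]; then
    query="auth_module=${auth_module}&target=${param_target}"
  else
    query="target=${param_target}"
  fi
  printf 'instance=%s url=http://%s/probe?%s\n' "$instance" "$address" "$query"
}
```

For example, simulate_probe_url 10.10.8.63:3306 localhost:9104 prints instance=10.10.8.63:3306 url=http://localhost:9104/probe?target=10.10.8.63:3306, which is why the Grafana instance label shows the MySQL address rather than the exporter's.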

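# Directory for alerting rule files
mkdir /usr/local/prometheus/rules

chown -R monitor:monitor /usr/local/prometheus/
# The unit below sets neither --storage.tsdb.path nor WorkingDirectory, so the default
# relative path data/ resolves to /data (Alertmanager's default data/ ends up there too)
mkdir -p /data
chown -R monitor:monitor /data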

Check the configuration

node01

/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
Start

node01

cat >/usr/lib/systemd/system/prometheus.service<<'EOF'
[Unit]
Description=Prometheus
After=network.target

[Service]
Type=simple
User=monitor
Group=monitor
ExecStart=/usr/local/prometheus/prometheus \
  --config.file "/usr/local/prometheus/prometheus.yml" \
  --web.listen-address "0.0.0.0:9090" \
  --storage.tsdb.retention.time=1095d \
  --web.enable-lifecycle
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl restart prometheus
systemctl status prometheus
systemctl enable prometheus

node01

# Check the configuration
/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
# Reload the configuration (requires --web.enable-lifecycle)
curl -X POST http://127.0.0.1:9090/-/reload
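The reload endpoint only answers with an HTTP status, and curl on its own does not fail on 4xx/5xx responses. A small wrapper that treats anything other than 200 as a failure; the function name is hypothetical, and the optional user/password arguments are for after basic auth is turned on in the security section:

```shell
# Hypothetical helper: POST to /-/reload and fail unless Prometheus answers 200.
reload_prometheus() {
  local base_url="${1:-http://127.0.0.1:9090}" user="${2:-}" pass="${3:-}"
  local -a auth=()
  if [ -n "$user" ]; then
    auth=(-u "${user}:${pass}")
  fi
  local code
  # -s: no progress bar; -o /dev/null: discard the body; -w: print only the status code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    "${auth[@]}" -X POST "${base_url}/-/reload")" || {
    echo "reload request to ${base_url} failed" >&2
    return 1
  }
  if [ "$code" != "200" ]; then
    echo "reload returned HTTP ${code}" >&2
    return 1
  fi
}
```

Usage: reload_prometheus http://127.0.0.1:9090 && echo reloaded; with basic auth enabled, reload_prometheus http://127.0.0.1:9090 admin admintest. Running promtool check config first keeps a broken file from being reported only as a failed reload.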

Add Prometheus as a data source in Grafana

alertmanager

Install

node01

#wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz

cd /home/zcsadmin/
tar xf alertmanager-0.27.0.linux-amd64.tar.gz -C /usr/local
mv /usr/local/alertmanager-0.27.0.linux-amd64/ /usr/local/alertmanager
Configure

node01

cat  >/usr/local/alertmanager/alertmanager.yml<<'EOF'
global:
  resolve_timeout: 5m

  # Email
  smtp_smarthost: 'mail.test.com:25'
  smtp_from: 'test@test.com'
  smtp_auth_username: 'test@test.com'
  smtp_auth_password: 'test@!QAZ'
  smtp_require_tls: false

  # WeCom (WeChat Work)
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_corp_id: 'ww2edb882dtest93222'      # WeCom corp ID

# Routing tree
route:
  # group_by: ['alertname'] # group by alert name; if unset, all alerts on this route share one group
  group_wait: 1s # wait before the first notification of a new group (default 30s)
  group_interval: 1s # wait before notifying about new alerts added to an existing group (default 5m)
  repeat_interval: 1h # wait before re-sending a notification that is still firing (default 4h)
  receiver: 'email_wechat'

# Receivers
receivers:
- name: 'email_wechat'
  # Email
  email_configs:
  - to: 'duyuhang@inmyshow.com'
    html: '{{ template "email.html" . }}'
    send_resolved: true

  # WeCom
  wechat_configs:
  - send_resolved: true
    api_secret: 'x7NQ305cPcR1dsdsHDSnW9oU_ioOaGqdsdsdsdsds6Oy4M'
    agent_id: '10000034'   # AgentId shown in the WeCom admin console
    message: '{{ template "wechat.message" . }}'
    to_party: '57'
    to_user: '@all'

# Template files
templates:
- '/usr/local/alertmanager/templates/*.tmpl'

# Inhibition rules
#inhibit_rules:
#- source_match:
#    severity: 'critical'
#  target_match:
#    severity: 'warning'
#  equal: ['alertname', 'dev', 'instance']
EOF
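Before any real alert fires, you can test email and WeCom delivery by pushing a hand-made alert into Alertmanager's v2 API (POST /api/v2/alerts). A sketch under these assumptions: the helper names and label values are made up, the values contain no characters that need JSON escaping, and because no endsAt is sent, Alertmanager marks the alert resolved after resolve_timeout (5m here), which also exercises send_resolved:

```shell
# Hypothetical helper: print a one-alert JSON array in the shape the v2 API accepts.
# Labels drive routing; annotations feed summary/description in the templates.
build_test_alert() {
  local name="$1" severity="$2" instance="$3"
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},"annotations":{"summary":"test alert %s","description":"manually injected test alert"}}]\n' \
    "$name" "$severity" "$instance" "$name"
}

# Hypothetical helper: send the test alert; -f makes curl fail on HTTP errors.
send_test_alert() {
  local am_url="${1:-http://127.0.0.1:9093}"
  shift
  build_test_alert "$@" |
    curl -s -f --max-time 5 -H 'Content-Type: application/json' \
      -X POST --data-binary @- "${am_url}/api/v2/alerts"
}
```

Usage: send_test_alert http://10.10.8.62:9093 TestAlert warning 10.10.8.63:9100, then check the mailbox and WeCom.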

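Creating the WeCom (企业微信) application that sends the messages is not covered here; search for a guide. Note that wechat_configs uses an application's agent_id and api_secret.

The application must have a trusted IP configured (the address Alertmanager calls the API from): https://blog.csdn.net/weixin_45385457/article/details/132278442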

Email template

node01

# Notification templates
mkdir /usr/local/alertmanager/templates
cat >/usr/local/alertmanager/templates/email.tmpl<<'EOF'
{{ define "email.html" }}
{{ range .Alerts }}
Summary: {{ .Annotations.summary }} <br>
Host: {{ .Labels.instance }} <br>
Source: prometheus_alert <br>
Severity: {{ .Labels.severity }} <br>
Alert name: {{ .Labels.alertname }} <br>
Details: {{ .Annotations.description }} <br>
Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
{{ end }}
{{ end }}
EOF
WeCom notification template

node01

cat >/usr/local/alertmanager/templates/wechat.tmpl<<'EOF'
{{ define "wechat.message" }}
{{- /* Only the first firing and the first resolved alert are shown, to keep messages short */ -}}
{{- if gt (len .Alerts.Firing) 0 -}}
{{- range $index, $alert := .Alerts.Firing -}}
{{- if eq $index 0 }}
FIRING: {{ .Labels.instance }} {{ .Annotations.summary }}
Status: {{ .Status }}
Severity: {{ .Labels.severity }}
Alert name: {{ .Labels.alertname }}
Host: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
{{- end }}
{{- end }}
{{- end }}

{{- if gt (len .Alerts.Resolved) 0 -}}
{{- range $index, $alert := .Alerts.Resolved -}}
{{- if eq $index 0 }}
RESOLVED: {{ .Labels.instance }} {{ .Annotations.summary }}
Alert name: {{ .Labels.alertname }}
Status: {{ .Status }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
Resolved at: {{ .EndsAt.Format "2006-01-02 15:04:05" }}
{{- if gt (len $alert.Labels.instance) 0 }}
Instance: {{ $alert.Labels.instance }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}
EOF
chown -R monitor:monitor /usr/local/alertmanager/
Start

node01

cat >/usr/lib/systemd/system/alertmanager.service<<'EOF'
[Unit]
Description=alertmanager
After=network.target 

[Service]
User=monitor
Group=monitor
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
chown -R monitor:monitor /usr/local/alertmanager/
systemctl daemon-reload
systemctl restart alertmanager
systemctl status  alertmanager
systemctl enable  alertmanager
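systemctl status can report a unit as active before the service is actually listening. A sketch of a wait loop using bash's /dev/tcp pseudo-device (hypothetical helper; needs bash, not sh). A connect to a filtered remote port can block for the kernel's TCP timeout, so it is best used against local ports such as 9093 here:

```shell
# Hypothetical helper: wait up to <timeout> seconds for host:port to accept TCP connections.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}"
  local deadline=$(( SECONDS + timeout ))
  while :; do
    # Opening /dev/tcp/host/port attempts a TCP connect; the subshell closes it again.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "${host}:${port} not reachable after ${timeout}s" >&2
      return 1
    fi
    sleep 1
  done
}
```

Usage: systemctl restart alertmanager && wait_for_port 127.0.0.1 9093 30 && echo "alertmanager is listening". The same call works for Grafana (3000) and Prometheus (9090).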

Configure the data source in Grafana

node_exporter

Install it on every server you want to monitor.

node_exporter handles Linux host monitoring: add it to the Prometheus configuration and import a dashboard into Grafana.

Install

node01 node02

cd /home/zcsadmin/
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz

tar xf node_exporter-1.7.0.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/node_exporter-1.7.0.linux-amd64  /usr/local/node_exporter
Start

node01 node02

cat >/usr/lib/systemd/system/node_exporter.service<<'EOF'
[Unit]
Description=node_exporter
After=network.target 

[Service]
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl restart node_exporter
systemctl status  node_exporter
systemctl enable  node_exporter
Configure

Grafana dashboard to import (ID 1860):

https://grafana.com/grafana/dashboards/1860-node-exporter-full/

Alerting rules

node01 (the rule files live on the Prometheus server)

cd  /usr/local/prometheus/rules && \
wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/master/dist/rules/host-and-hardware/node-exporter.yml

# Reload the configuration
curl -X POST http://127.0.0.1:9090/-/reload
Verify

node01 node02

curl 'http://localhost:9100/metrics' |grep cpu
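grep cpu prints a lot of lines. To check that the numbers make sense, a sketch that counts the CPUs node_exporter reports, based on the distinct cpu label values of node_cpu_seconds_total (hypothetical helper; compare the result with nproc on the same host):

```shell
# Hypothetical helper: read node_exporter /metrics text on stdin and print the CPU count.
cpu_count_from_metrics() {
  grep '^node_cpu_seconds_total{' |
    sed -n 's/.*cpu="\([^"]*\)".*/\1/p' |  # keep only the cpu label value
    sort -u |                               # one line per distinct CPU
    wc -l |
    tr -d ' '                               # BSD wc pads the number with spaces
}
```

Usage: curl -s http://localhost:9100/metrics | cpu_count_from_metrics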

mysqld_exporter

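It does not need to be installed on every monitored server. The procedure is:

  1. Install mysqld_exporter on the Prometheus server.
  2. Create one shared credentials file that mysqld_exporter uses to connect to MySQL.
  3. In each MySQL instance to be monitored, create the matching account. Note: the account must be allowed to connect from the Prometheus server.
  4. Open the firewall so the Prometheus server can reach each MySQL instance (port 3306 in this setup).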
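Step 2 relies on mysqld_exporter reading credentials from a my.cnf-style file; per its README, a scrape job can select a section other than [client] through the auth_module parameter. A sketch that appends one section per credential set and keeps the file at mode 600, since it holds plaintext passwords. The helper name and section names are illustrative; if mysqld_exporter runs as monitor, also chown the file:

```shell
# Hypothetical helper: append a [section] with user/password to a mysqld_exporter cnf file.
add_exporter_cnf_section() {
  local file="$1" section="$2" user="$3" pass="$4"
  (
    umask 077  # if the file is new, create it readable by its owner only
    printf '[%s]\nuser = %s\npassword = %s\n' "$section" "$user" "$pass" >> "$file"
  )
  chmod 600 "$file"  # also tighten an existing file
}
```

Usage: add_exporter_cnf_section /usr/local/mysqld_exporter/config.my.cnf client exporter exportertest. Calling it twice with the same section appends a duplicate, so write each section once.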
Install

node01

#wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.1/mysqld_exporter-0.15.1.linux-amd64.tar.gz

cd /home/zcsadmin/
tar xf mysqld_exporter-0.15.1.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/mysqld_exporter-0.15.1.linux-amd64  /usr/local/mysqld_exporter
Start

node01

cat >/usr/lib/systemd/system/mysqld_exporter.service<<'EOF'
[Unit]
Description=mysqld_exporter
After=network.target 

[Service]
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter/config.my.cnf
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl restart mysqld_exporter
systemctl status  mysqld_exporter
systemctl enable  mysqld_exporter
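Configure

Install a test MySQL server (MariaDB)

node01 node02

# The "mariadb" package is only the client; the server (and the mariadb unit) comes from mariadb-server
yum install -y mariadb-server
systemctl start mariadb 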

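Create the exporter account in each monitored MySQL instance

node01 node02

# Create the account; the host part is the address mysqld_exporter connects from (node01)
create user exporter@'10.10.8.62' identified by 'exportertest';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'10.10.8.62';
# On node01 the exporter reaches its local instance via localhost:3306, so that instance also
# needs the account for the loopback host, e.g. exporter@'localhost' ('127.0.0.1' with skip_name_resolve)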
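MySQL matches accounts on both user and source host, so every host the exporter connects from needs its own account. A sketch that prints the statements for a list of source hosts, ready to pipe into the mysql client. The helper name is hypothetical; it uses plain CREATE USER, which errors if the account already exists, because older MariaDB releases such as 5.5 do not support CREATE USER IF NOT EXISTS:

```shell
# Hypothetical helper: print CREATE USER and GRANT statements for each source host.
exporter_grants_sql() {
  local user="$1" pass="$2"
  shift 2
  local host
  for host in "$@"; do
    printf "CREATE USER '%s'@'%s' IDENTIFIED BY '%s';\n" "$user" "$host" "$pass"
    printf "GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO '%s'@'%s';\n" "$user" "$host"
  done
}
```

Usage on node01: exporter_grants_sql exporter exportertest 10.10.8.62 localhost | mysql -uroot -p. On node02 only 10.10.8.62 is needed.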

node01

# Credentials file mysqld_exporter uses to connect to MySQL
cat >/usr/local/mysqld_exporter/config.my.cnf<<'EOF'
[client]
user = exporter
password = exportertest
EOF

node01

cat >/usr/local/prometheus/prometheus.yml<<'EOF'
global:

  scrape_interval: 15s # How often targets are scraped; 15s here (default 1m). 10-60s is a typical range
  evaluation_interval: 15s # How often Prometheus evaluates rules; 15s here (default 1m)

alerting:
  alertmanagers:
    - static_configs:    # Alertmanager addresses, set statically here; service discovery also works
      - targets:         # More than one address can be listed
        - 10.10.8.62:9093

# Alerting rule files
rule_files:
  - "rules/*.yml"


scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Alertmanager
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['localhost:9093']

  # Linux host metrics
  - job_name: 'node-exporter'
    static_configs:
      - targets:
        - 'localhost:9100'
        - '10.10.8.63:9100'

  # MySQL (mysqld_exporter multi-target mode)
  - job_name: 'mysqld-exporter'
    metrics_path: /probe
    params:
      # Optional: selects the config.my.cnf section holding the credentials (default "client").
      # The config.my.cnf above only has [client], so that is the section used here.
      auth_module: [client]
    static_configs:
      - targets:
        - localhost:3306
        - 10.10.8.63:3306 # one line per MySQL instance; add new instances below

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        # host:port of mysqld_exporter
        replacement: localhost:9104
EOF

Alerting rules

node01

cd  /usr/local/prometheus/rules && \
wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/master/dist/rules/mysql/mysqld-exporter.yml

# Fix ownership
chown -R monitor:monitor /usr/local/prometheus/

# Check the configuration
/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml

# Reload the configuration
curl -X POST http://127.0.0.1:9090/-/reload

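In Grafana, import dashboard ID 7362.

Verify

node01

curl -s 'http://localhost:9104/metrics' | grep mysql
# node02 has no exporter of its own; its instance is queried through node01's exporter via /probe
curl -s 'http://localhost:9104/probe?target=10.10.8.63:3306' | grep mysql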
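grep mysql returns hundreds of lines; the one that answers "can the exporter reach this instance?" is mysql_up (1 = connected, 0 = not). A sketch that extracts it from the exporter output (hypothetical helper name):

```shell
# Hypothetical helper: read mysqld_exporter output on stdin and print the mysql_up value.
# Exits non-zero if the metric is missing entirely.
mysql_up_value() {
  awk '$1 == "mysql_up" { print $2; found = 1 } END { if (!found) exit 1 }'
}
```

Usage: for t in localhost:3306 10.10.8.63:3306; do printf '%s ' "$t"; curl -s "http://localhost:9104/probe?target=$t" | mysql_up_value; done. A 0 usually points at the account host, the password in config.my.cnf, or the firewall.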

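Service discovery

A traditional environment doesn't need service discovery, and it isn't very convenient there; listing targets in the configuration file is enough. If you do want it, file-based discovery is a reasonable choice. For dynamic environments such as Kubernetes, look into service discovery tools such as Consul (Prometheus also has native Kubernetes discovery).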
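If you do try file-based discovery, Prometheus reads targets from JSON or YAML files listed under file_sd_configs and picks up changes to those files without a reload. A sketch that writes such a file atomically (write a temp file, then rename it, so Prometheus never reads a half-written file); the path, the env label and the helper name are assumptions:

```shell
# Hypothetical helper: write a file_sd target file with one target group.
# A matching scrape job could look like:
#   - job_name: 'node-exporter'
#     file_sd_configs:
#       - files: ['/usr/local/prometheus/targets/*.json']
write_file_sd() {
  local file="$1" env_label="$2"
  shift 2
  local tmp="${file}.tmp" first=1 t
  {
    printf '[{"targets":['
    for t in "$@"; do
      if [ "$first" -eq 1 ]; then first=0; else printf ','; fi
      printf '"%s"' "$t"
    done
    printf '],"labels":{"env":"%s"}}]\n' "$env_label"
  } > "$tmp" && mv "$tmp" "$file"  # rename is atomic within one filesystem
}
```

Usage: write_file_sd /usr/local/prometheus/targets/nodes.json production 10.10.8.62:9100 10.10.8.63:9100. Adding a host then means rewriting this file instead of editing prometheus.yml.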

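Security

Enable HTTPS in Grafana
mkdir  /usr/local/grafana/certificate
cd /usr/local/grafana/certificate

# Self-signed certificate; press Enter at every prompt
openssl req -newkey rsa:2048 -nodes -keyout key.pem -x509 -days 3650 -out certificate.pem
# Grafana runs as monitor, so it must be able to read the key
chown -R monitor:monitor /usr/local/grafana/certificate
# Set these in the [server] section:
vim /usr/local/grafana/conf/granfana.ini
protocol = https
cert_file = /usr/local/grafana/certificate/certificate.pem
cert_key = /usr/local/grafana/certificate/key.pem
systemctl restart grafana.service 
systemctl status  grafana.service 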
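The same [server] edits can be made without vim. A sketch using a sed address range that only touches lines between [server] and the next section header; it assumes the ini already contains protocol, cert_file and cert_key lines in that section, as Grafana's defaults.ini does. The helper name is hypothetical:

```shell
# Hypothetical helper: switch Grafana to HTTPS by editing keys inside [server] only.
grafana_enable_https() {
  local ini="$1" cert="$2" key="$3"
  # The range starts at [server] and ends at the next line beginning with "[",
  # so a key with the same name in another section is left alone.
  sed -i "/^\[server\]/,/^\[/{
s#^protocol = .*#protocol = https#
s#^cert_file =.*#cert_file = ${cert}#
s#^cert_key =.*#cert_key = ${key}#
}" "$ini"
}
```

Usage: grafana_enable_https /usr/local/grafana/conf/granfana.ini /usr/local/grafana/certificate/certificate.pem /usr/local/grafana/certificate/key.pem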
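Basic authentication for Prometheus

After enabling it, update the Prometheus data source in Grafana to use basic auth with the same credentials.

Generate a bcrypt password hash with htpasswd

# Install htpasswd
yum install httpd-tools -y

# Run this; the password used here is admintest
htpasswd -nBC 12 '' | tr -d ':\n'
New password:  
Re-type new password: 

# Resulting hash
$2y$12$NHyeXrePI1gUx/kAHLNfn.H6sizsTgIer/ishuh/cdczmntUJ3Ywm

Configure the web user and password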

cat >/usr/local/prometheus/web-config.yml<<'EOF'
basic_auth_users:
    admin: $2y$12$NHyeXrePI1gUx/kAHLNfn.H6sizsTgIer/ishuh/cdczmntUJ3Ywm
EOF

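Add basic_auth to the Prometheus configuration. With authentication enabled, Prometheus's own /metrics endpoint also requires credentials, so the self-scrape job needs them; the reload call likewise becomes curl -u admin:admintest -X POST http://127.0.0.1:9090/-/reload.

vim /usr/local/prometheus/prometheus.yml
scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    basic_auth:
      username: admin         # username: admin
      password: admintest     # password: admintest (the plaintext password, not the hash)
    static_configs:
      - targets: ['localhost:9090']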

Update the systemd unit to load web-config.yml

cat >/usr/lib/systemd/system/prometheus.service<<'EOF'
[Unit]
Description=Prometheus
After=network.target

[Service]
Type=simple
User=monitor
Group=monitor
ExecStart=/usr/local/prometheus/prometheus \
  --config.file "/usr/local/prometheus/prometheus.yml" \
  --web.listen-address "0.0.0.0:9090" \
  --web.config.file=/usr/local/prometheus/web-config.yml \
  --storage.tsdb.retention.time=1095d \
  --web.enable-lifecycle 
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl restart prometheus
systemctl status prometheus
systemctl enable prometheus

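Using and organizing labels

Labels can be attached to targets in the scrape configuration:

vim /usr/local/prometheus/prometheus.yml
- job_name: 'example'
  static_configs:
    - targets: ['server:9100']
      labels:  # added to every series scraped from these targets
        environment: 'production'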

In practice:

In alerting rule files, use labels to set each alert's severity:

vim /usr/local/prometheus/rules/test.yml
groups:
- name: example-alerts
  rules:
  - alert: HighHttpRequests
    # http_requests_total is a counter, so alert on its per-second rate rather than the ever-growing total
    expr: rate(http_requests_total{job="example", instance="example-instance"}[5m]) > 100
    for: 5m
    labels:
      severity: critical  # Alertmanager routes on the value of this label
    annotations:
      summary: "High HTTP Requests"
      description: "The HTTP request rate is high on example-instance"
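for: 5m means the expression must be true at every rule evaluation for 5 minutes before the alert moves from pending to firing; a single false evaluation resets it. The sketch below is a simplified simulation of that state machine, not Prometheus code: it ignores keep_firing_for and measures time only from evaluation timestamps. The function name is hypothetical:

```shell
# Illustration only: print the alert state after each evaluation.
# Args: evaluation interval (s), "for" duration (s), then one result per
# evaluation (1 = expression true, 0 = false).
simulate_for() {
  local interval="$1" for_s="$2"
  shift 2
  local t=0 active_since=-1 v
  local -a out=()
  for v in "$@"; do
    if [ "$v" -eq 1 ]; then
      # The first true evaluation starts the pending period.
      if [ "$active_since" -lt 0 ]; then active_since=$t; fi
      if [ $(( t - active_since )) -ge "$for_s" ]; then
        out+=(firing)
      else
        out+=(pending)
      fi
    else
      # Any false evaluation resets the alert.
      active_since=-1
      out+=(inactive)
    fi
    t=$(( t + interval ))
  done
  echo "${out[*]}"
}
```

For example, simulate_for 15 30 1 1 1 0 prints "pending pending firing inactive". With the rule above (for: 5m) and evaluation_interval: 15s, a brief spike that lasts a minute never leaves the pending state.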

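In Alertmanager, the route tree decides which receiver gets an alert: child routes under routes match on labels such as severity, and anything that matches no child route falls back to the top-level receiver. group_by only controls how alerts are batched into notifications.

vim  /usr/local/alertmanager/alertmanager.yml
route:
  group_by: ['alertname', 'severity']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email_wechat'        # default receiver (defined earlier)
  routes:
  - matchers:
    - severity="critical"
    receiver: 'sms-critical'

receivers:
# keep the existing 'email_wechat' receiver in this list as well
- name: 'sms-critical'
  webhook_configs:
  - url: 'https://your-sms-provider/api/send'   # Alertmanager posts its own JSON payload; most SMS APIs need an adapter in between
    send_resolved: true
    http_config:
      bearer_token: 'your-bearer-token'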
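To see how routing and grouping differ, a simplified simulation (not Alertmanager code). It assumes a route tree where severity=critical goes to sms-critical and everything else falls through to a default receiver named email_wechat: each alert first picks a receiver, then alerts with the same alertname and severity are batched into one notification group. Input is one "alertname severity instance" line per alert; the function name is hypothetical:

```shell
# Illustration only: print "receiver alertname severity count" per notification group.
route_and_group() {
  awk '{
    receiver = ($2 == "critical") ? "sms-critical" : "email_wechat"
    n[receiver " " $1 " " $2]++   # group key: receiver + group_by labels
  }
  END { for (k in n) print k, n[k] }' | sort
}
```

For example, two HostHighCpuLoad warnings from different hosts and one MysqlDown critical produce two groups: one email_wechat notification covering both CPU alerts, and one sms-critical notification.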

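Summary

When running Prometheus in production, make full use of labels: define different alerting rules for machines in different environments and roles, so that a flood of alerts does not cause real problems to be missed. Keep security under strict control to prevent information leaks.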
