Grafana Deployment and Usage

Reference: https://grafana.com/docs/grafana/latest/installation

Article link: https://www.cnblogs.com/suim1218/p/11358678.html

Here Grafana is deployed with Docker. For installing Docker itself, refer to the official documentation.

1. Pull the image

# docker pull grafana/grafana-enterprise:8.3.0

1.1 Create the data directory on the host

# mkdir -p /opt/grafana-data/etc

1.2 Start a temporary Grafana container

# docker run -d --rm --name grafana grafana/grafana-enterprise:8.3.0

1.3 Check the UID of the grafana user

# docker exec grafana cat /etc/passwd

Output (relevant line):

grafana:x:472:0:Linux User,,,:/home/grafana:/sbin/nologin 

So the grafana user's UID is 472.
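The UID is the third colon-separated field of a passwd entry. As a small illustration, this Python sketch extracts it from the output line shown above; the function name `uid_of` is hypothetical and not part of any Docker or Grafana tooling.

```python
from typing import Optional


def uid_of(passwd_text: str, username: str) -> Optional[int]:
    """Return the UID of `username` in /etc/passwd-formatted text, or None if absent."""
    for line in passwd_text.splitlines():
        # Each entry is name:password:UID:GID:GECOS:home:shell
        fields = line.strip().split(":")
        if len(fields) >= 3 and fields[0] == username:
            return int(fields[2])
    return None


entry = "grafana:x:472:0:Linux User,,,:/home/grafana:/sbin/nologin"
print(uid_of(entry, "grafana"))  # prints 472
```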

1.4 Copy out the configuration file

# docker cp grafana:/etc/grafana/grafana.ini /opt/grafana-data/etc/ 

# chown -R 472 /opt/grafana-data

1.5 Stop the temporary container (it is removed automatically because it was started with --rm)

# docker stop grafana

2. Start Grafana

2.1 Start from the command line

# docker run -d --user 472 --name grafana -p 3000:3000 -v /opt/grafana-data/etc:/etc/grafana/ -v /opt/grafana-data/:/var/lib/grafana grafana/grafana-enterprise:8.3.0

2.2 Start with docker-compose

docker-compose.yaml

cat > docker-compose.yaml << EOF
version: '2'
services:
  grafana:
    image: grafana/grafana-enterprise:8.3.0
    user: '472'
    ports:
      - "3000:3000"
    volumes:
      - /opt/grafana-data/etc:/etc/grafana
      - /opt/grafana-data/:/var/lib/grafana
EOF

# docker-compose up -d
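Grafana can only write to its data directory if the files are owned by the UID the container runs as (the UID found in step 1.3). The Python sketch below is one way to check this on the host before starting; the function `wrong_owner` and the constant `GRAFANA_UID` are hypothetical helpers, not part of Grafana.

```python
import os
from typing import List

GRAFANA_UID = 472  # UID of the grafana user found in step 1.3


def wrong_owner(path: str, uid: int = GRAFANA_UID) -> List[str]:
    """List every directory and file under `path` whose owner UID differs from `uid`."""
    mismatched = []
    for root, _dirs, files in os.walk(path):
        # os.walk visits each directory exactly once as `root`
        for p in [root] + [os.path.join(root, name) for name in files]:
            if os.lstat(p).st_uid != uid:
                mismatched.append(p)
    return mismatched


if __name__ == "__main__":
    data_dir = "/opt/grafana-data"
    if not os.path.isdir(data_dir):
        print(f"{data_dir} does not exist")
    else:
        bad = wrong_owner(data_dir)
        print(f"everything is owned by UID {GRAFANA_UID}" if not bad else "\n".join(bad))
```

If any paths are listed, running `chown -R 472 /opt/grafana-data` again fixes them.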

3. Access Grafana in a browser

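Open a browser and go to http://192.168.229.139:3000 (use your own host's IP address). The default username and password are both admin; Grafana prompts you to change the password after the first login.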

4. Add Prometheus as a data source in Grafana


In the URL field, enter the address of the Prometheus server, for example http://<prometheus-host>:9090.

5. Create a dashboard


6. Add a CPU usage panel

Enter the following in the Metrics field:

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
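`node_cpu_seconds_total{mode="idle"}` counts, per CPU, the seconds spent idle, so its per-second rate is the idle fraction (0 to 1). `irate()` computes that rate from the last two samples in the `[5m]` window; averaging by instance and subtracting from 100 gives the busy percentage. The Python sketch below reproduces only this arithmetic, using hypothetical sample values; it is not how Prometheus is implemented internally.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def irate(samples: List[Sample]) -> float:
    """Per-second rate from the last two samples, as PromQL irate() does."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    # A smaller value means the counter was reset; the new value is then the increase.
    delta = v2 - v1 if v2 >= v1 else v2
    return delta / (t2 - t1)


def cpu_usage_percent(idle_by_cpu: Dict[str, List[Sample]]) -> float:
    """100 - avg(irate(idle seconds)) * 100 for one instance."""
    idle_fractions = [irate(samples) for samples in idle_by_cpu.values()]
    return 100 - sum(idle_fractions) / len(idle_fractions) * 100


# Hypothetical idle counters for a 2-CPU host, scraped 15 s apart.
idle = {
    "cpu0": [(0.0, 1000.0), (15.0, 1013.5)],  # idle 13.5 s of 15 s -> 0.9
    "cpu1": [(0.0, 2000.0), (15.0, 2012.0)],  # idle 12 s of 15 s  -> 0.8
}
print(round(cpu_usage_percent(idle), 2))  # prints 15.0
```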


7. Add a memory usage panel

Enter the following in the Metrics field:

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes ))* 100
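node_exporter derives the `node_memory_*_bytes` metrics from `/proc/meminfo`, where most values are reported in kB (KiB) and converted to bytes. The Python sketch below parses meminfo-style text and applies the same formula as the query; `parse_meminfo` and `memory_usage_percent` are hypothetical names, and the sample values are made up.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into bytes (the kernel reports most fields in kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        amount = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            amount *= 1024  # kB in /proc/meminfo means KiB
        values[key.strip()] = amount
    return values


def memory_usage_percent(mem: Dict[str, int]) -> float:
    """(1 - MemAvailable / MemTotal) * 100, mirroring the panel query."""
    return (1 - mem["MemAvailable"] / mem["MemTotal"]) * 100


sample = """MemTotal:        8000000 kB
MemFree:         1000000 kB
MemAvailable:    2000000 kB"""
print(memory_usage_percent(parse_meminfo(sample)))  # prints 75.0
```

On a Linux host, `open("/proc/meminfo").read()` can be passed to `parse_meminfo` instead of the sample string.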


8. Add a disk usage panel

Enter the following in the Metrics field:

(1-(node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"})) * 100
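The `fstype=~"ext4|xfs"` matcher keeps only real ext4 and XFS filesystems. PromQL regex matchers are fully anchored, so the pattern must match the whole label value. The Python sketch below mimics that with `re.fullmatch` and computes the same percentage per mount point; the helper names and filesystem samples are hypothetical.

```python
import re
from typing import Dict, List


def label_matches(pattern: str, value: str) -> bool:
    """PromQL =~ matchers are fully anchored, so use fullmatch rather than search."""
    return re.fullmatch(pattern, value) is not None


def disk_usage_percent(filesystems: List[dict]) -> Dict[str, float]:
    """(1 - free / size) * 100 for every filesystem whose fstype matches ext4|xfs."""
    usage = {}
    for fs in filesystems:
        if label_matches("ext4|xfs", fs["fstype"]):
            usage[fs["mountpoint"]] = (1 - fs["free_bytes"] / fs["size_bytes"]) * 100
    return usage


# Hypothetical filesystem samples; tmpfs is dropped by the fstype matcher.
filesystems = [
    {"mountpoint": "/", "fstype": "ext4", "size_bytes": 100, "free_bytes": 25},
    {"mountpoint": "/data", "fstype": "xfs", "size_bytes": 200, "free_bytes": 150},
    {"mountpoint": "/run", "fstype": "tmpfs", "size_bytes": 50, "free_bytes": 50},
]
print(disk_usage_percent(filesystems))  # prints {'/': 75.0, '/data': 25.0}
```

Because of the anchoring, a pattern such as `ext` would not match `ext4`; write `ext.*` if a prefix match is intended.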


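9. Add a network outbound traffic panel

Enter the following in the Metrics field (ens33 is the network interface on this host; substitute your own):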

irate(node_network_transmit_bytes_total{device='ens33'}[5m])
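The query returns bytes per second. `irate()` uses only the last two samples in the window, so it reacts quickly to bursts, whereas `rate()` averages over the whole window. The Python sketch below contrasts the two on a hypothetical transmit counter; `window_average` is a simplification, since Prometheus's `rate()` also extrapolates to the window boundaries, and both helpers assume no counter reset.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, node_network_transmit_bytes_total)


def irate(samples: List[Sample]) -> float:
    """Bytes/s between the last two samples; assumes no counter reset."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    return (v2 - v1) / (t2 - t1)


def window_average(samples: List[Sample]) -> float:
    """Bytes/s from first to last sample; rate() additionally extrapolates to the window edges."""
    (t1, v1), (t2, v2) = samples[0], samples[-1]
    return (v2 - v1) / (t2 - t1)


# Hypothetical ens33 transmit counter with a burst in the last interval.
samples = [(0.0, 0.0), (15.0, 150_000.0), (30.0, 300_000.0), (45.0, 3_300_000.0)]
print(irate(samples) / 1024)           # KB/s: 195.3125, driven by the final burst
print(window_average(samples) / 1024)  # KB/s: about 71.6, smoothed over 45 s
```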


10. View the complete dashboard


Prometheus alerting rules

#rules.linux.yml
groups:
- name: Node-Alert
  rules:
  - alert: Instance-Down # alert name
    expr: up == 0
    for: 1m # how long the condition must hold before the alert fires
    labels:
      severity: warning
    annotations: # alert details
      summary: "Instance {{$labels.instance}} down"
      description: "{{$labels.instance}}: job {{$labels.job}} has been down for more than 1 minute."

  - alert: "HighMemoryUsage"
    expr: round(100- node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes*100) > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: memory usage too high"
      description: "{{ $labels.instance }}: current usage {{ $value }}%"

  - alert: "HighCpuUsage"
    expr: round(100 - ((avg by (instance,job)(irate(node_cpu_seconds_total{mode="idle",instance!~'bac-.*'}[5m]))) *100)) > 85
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: CPU usage too high"
      description: "{{ $labels.instance }}: current usage {{ $value }}%"

  - alert: "HighDiskUsage"
    expr: round(100-100*(node_filesystem_avail_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"})) > 80
    for: 15s
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: disk usage too high"
      description: "{{ $labels.instance }}: disk {{$labels.mountpoint}} usage {{ $value }}%"

  - alert: "LowPartitionSpace"
    expr: round(node_filesystem_avail_bytes{fstype=~"ext4|xfs",instance!~"testnode",mountpoint!~"/boot.*"}/1024/1024/1024) < 10
    for: 15s
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: partition space too low"
      description: "{{ $labels.instance }}: partition {{$labels.mountpoint}} has {{ $value }}GB left"

  - alert: "HighNetworkOutboundRate"
    expr: round(irate(node_network_transmit_bytes_total{instance!~"data.*",device!~'tap.*|veth.*|br.*|docker.*|vir.*|lo.*|vnet.*'}[1m])/1024) > 2048
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: network outbound rate too high"
      description: "{{ $labels.instance }}: current rate {{ $value }}KB/s"
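The `for:` field means an alert first becomes pending when its expression is true and only fires once the expression has stayed true for that long; if it turns false in between, the timer resets. The Python sketch below replays a sequence of rule evaluations to show this; the 15 s evaluation interval is an assumption, and the sketch ignores details such as how Prometheus keeps resolved alerts around.

```python
from typing import List, Optional, Tuple


def alert_states(evaluations: List[Tuple[float, bool]], for_seconds: float) -> List[str]:
    """Replay (timestamp, expr_is_true) evaluations of one series and return its state after each."""
    states = []
    active_since: Optional[float] = None
    for ts, condition in evaluations:
        if not condition:
            active_since = None  # condition cleared: the alert resets
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts  # first evaluation where the expression holds
        states.append("firing" if ts - active_since >= for_seconds else "pending")
    return states


# Rule with `for: 1m`, evaluated every 15 s (assumed evaluation_interval).
evals = [(0, True), (15, True), (30, True), (45, True), (60, True)]
print(alert_states(evals, 60))
# prints ['pending', 'pending', 'pending', 'pending', 'firing']
```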

alertmanager.yml

global:
  resolve_timeout: 5m
  smtp_smarthost: "smtp.163.com:25"
  smtp_from: "wsl.********@163.com"
  smtp_auth_username: "wsl.********@163.com"
  smtp_auth_password: "HW**********KGM"
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  #receiver: 'web.hook'
  receiver: 'mail'
receivers:
#- name: 'web.hook'
#  webhook_configs:
#  - url: 'http://127.0.0.1:5001/'
- name: "mail"
  email_configs:
  - to: "645******08@qq.com"
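With `group_by: ['alertname']`, Alertmanager batches all alerts that share an alert name into one notification. `group_wait` (10s) delays the first notification for a new group to collect more alerts, `group_interval` (10s) sets how soon a changed group is notified again, and `repeat_interval` (1m) controls how often a still-firing group is re-sent. The Python sketch below shows only the grouping step; the alert names and instances are hypothetical, and treating a missing label as an empty string is a simplification.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

GroupKey = Tuple[Tuple[str, str], ...]


def group_alerts(alerts: List[Dict[str, str]], group_by: List[str]) -> Dict[GroupKey, List[Dict[str, str]]]:
    """Bucket alerts by the values of the `group_by` labels; each bucket becomes one notification."""
    groups: Dict[GroupKey, List[Dict[str, str]]] = defaultdict(list)
    for labels in alerts:
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "Instance-Down", "instance": "10.0.0.1:9100"},
    {"alertname": "Instance-Down", "instance": "10.0.0.2:9100"},
    {"alertname": "HighDiskUsage", "instance": "10.0.0.1:9100"},
]
for key, members in group_alerts(alerts, ["alertname"]).items():
    print(key, len(members))
# prints:
# (('alertname', 'Instance-Down'),) 2
# (('alertname', 'HighDiskUsage'),) 1
```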

Email configuration

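Here we use email for alerting. First edit Grafana's configuration file /etc/grafana/grafana.ini (with the Docker setup above, this is /opt/grafana-data/etc/grafana.ini on the host), find the [smtp] section, and change it as follows: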

[smtp]
;enabled = false
enabled = true
;host = localhost:25
host = smtp.exmail.qq.com:25
;user =
user = notice@wzlinux.com
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
;password =
password = ********
;cert_file =
;key_file =
;skip_verify = false
;from_address = admin@grafana.localhost
from_address = notice@wzlinux.com
from_name = Grafana
# EHLO identity in SMTP dialog (defaults to instance_name)
;ehlo_identity = dashboard.example.com
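In this file, lines starting with `;` or `#` are comments, so only the uncommented lines take effect. The Python sketch below uses the standard `configparser` module to list the effective `[smtp]` settings as a quick sanity check; it is only an approximation of how Grafana reads its ini file, and the function name `effective_smtp` is hypothetical.

```python
import configparser
from typing import Dict


def effective_smtp(ini_text: str) -> Dict[str, str]:
    """Return the [smtp] keys that are actually set; lines starting with ';' or '#' are comments."""
    parser = configparser.ConfigParser(
        interpolation=None,  # Grafana values may contain % or ${VAR}; keep them verbatim
        strict=False,        # tolerate repeated keys instead of raising
        delimiters=("=",),   # only '=' separates key and value, so host:port stays intact
    )
    # Prepend a dummy section in case the file has keys before its first [section].
    parser.read_string("[__top__]\n" + ini_text)
    return dict(parser["smtp"]) if parser.has_section("smtp") else {}


snippet = """
[smtp]
;enabled = false
enabled = true
;host = localhost:25
host = smtp.exmail.qq.com:25
from_name = Grafana
"""
print(effective_smtp(snippet))
# prints {'enabled': 'true', 'host': 'smtp.exmail.qq.com:25', 'from_name': 'Grafana'}
```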

After saving the changes, restart Grafana (for the docker run deployment above: docker restart grafana).

Notification channels

Log in to Grafana, create a notification channel, and send a test notification to check that the email arrives.

  • Name - Enter a name for this channel. It will be displayed when users add notifications to alert rules.
  • Type - Select the channel type. Refer to the List of supported notifiers for details.
  • Default (send on all alerts) - When selected, this option sends a notification on this channel for all alert rules.
  • Include Image - See Enable images in notifications for details.
  • Disable Resolve Message - When selected, this option disables the resolve message [OK] that is sent when the alerting state returns to false.
  • Send reminders - When this option is checked, additional notifications (reminders) are sent for triggered alerts. You can specify how often reminders are sent using a number of seconds (s), minutes (m) or hours (h), for example 30s, 3m, 5m or 1h.
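The reminder interval is a number followed by a unit. The Python sketch below converts the forms listed above into seconds; it only covers the s/m/h units mentioned in this list, is not Grafana's own parser, and `reminder_interval_seconds` is a hypothetical name.

```python
import re

SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600}


def reminder_interval_seconds(text: str) -> int:
    """Convert a reminder interval such as '30s', '3m' or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    return int(match.group(1)) * SECONDS_PER_UNIT[match.group(2)]


for example in ["30s", "3m", "5m", "1h"]:
    print(example, reminder_interval_seconds(example))
# prints 30s 30, 3m 180, 5m 300, 1h 3600 (one per line)
```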

Create alerts

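Grafana alerting does not support alert queries that use template variables, so we need a dashboard without template variables. You can pick one from the official dashboard library; I used dashboard ID 5984, which you can import as well.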

I made minor changes to the layout and the data source, mainly adjusting the CPU load panel; the other panels work with their defaults.


I have tested this setup and all the alerts work correctly. The PromQL queries are listed below.

CPU:

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Set the legend to {{instance}}.

Memory:

100*(node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes) / node_memory_MemTotal_bytes

Set the legend to {{instance}}.
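This memory query treats all of Buffers and Cached as free, while the earlier memory panel uses MemAvailable, the kernel's own estimate (available since Linux 3.14) of memory usable by new workloads without swapping, which accounts for cache that cannot be reclaimed. The two can therefore disagree. The Python sketch below evaluates both formulas on hypothetical `/proc/meminfo` values.

```python
from typing import Dict


def used_percent_legacy(m: Dict[str, int]) -> float:
    """100 * (MemTotal - MemFree - Buffers - Cached) / MemTotal, as in the query above."""
    return 100 * (m["MemTotal"] - m["MemFree"] - m["Buffers"] - m["Cached"]) / m["MemTotal"]


def used_percent_available(m: Dict[str, int]) -> float:
    """(1 - MemAvailable / MemTotal) * 100, as in the earlier memory panel."""
    return (1 - m["MemAvailable"] / m["MemTotal"]) * 100


# Hypothetical /proc/meminfo values in kB (the unit cancels in both ratios).
mem = {"MemTotal": 8_000_000, "MemFree": 1_000_000, "Buffers": 200_000,
       "Cached": 2_800_000, "MemAvailable": 3_200_000}
print(used_percent_legacy(mem))               # prints 50.0
print(round(used_percent_available(mem), 6))  # prints 60.0
```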

Storage:

100 - 100 * (node_filesystem_avail_bytes / node_filesystem_size_bytes)

Set the legend to {{instance}} - {{mountpoint}}.
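The storage query is a ratio of two byte counts, so any unit conversion cancels out only if the numerator and denominator are scaled by the same factor. Dividing one by 1000 twice and the other by 1024 twice skews the ratio by 1024²/1000² (about 1.049). The Python sketch below compares consistent and mixed scaling on a hypothetical 100 GiB filesystem with 50 GiB available.

```python
def usage_percent(avail_bytes: float, size_bytes: float) -> float:
    """100 - 100 * avail / size; both operands in bytes, so no unit conversion is needed."""
    return 100 - 100 * (avail_bytes / size_bytes)


def usage_percent_mixed_units(avail_bytes: float, size_bytes: float) -> float:
    """Same ratio, but dividing avail by 1000*1000 and size by 1024*1024 (inconsistent scaling)."""
    return 100.0 - 100 * ((avail_bytes / 1000 / 1000) / (size_bytes / 1024 / 1024))


GiB = 1024 ** 3
print(usage_percent(50 * GiB, 100 * GiB))                        # prints 50.0
print(round(usage_percent_mixed_units(50 * GiB, 100 * GiB), 4))  # prints 47.5712
```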
