I. Prometheus
https://prometheus.io/docs/prometheus/latest/querying/functions/#deriv
https://p8s.io/docs/promql/metric-type/
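1. node_exporter: monitor only specific mount points (paths)
--collector.filesystem.ignored-mount-points="^/(dev|proc|sys|var/lib/docker/.+)($|/)"
Regexp of mount points to ignore for the filesystem collector. Mount points that match are skipped, so only the remaining ones are monitored. Newer node_exporter releases rename this flag to --collector.filesystem.mount-points-exclude.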
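Before restarting node_exporter it can help to check which mount points the regexp would actually drop. The following is a minimal sketch, assuming the flag value is matched as an unanchored regexp search; the sample mount points are made up for illustration.

```python
import re

# Regexp copied from the flag above. The sample mount points are hypothetical.
IGNORED = re.compile(r"^/(dev|proc|sys|var/lib/docker/.+)($|/)")


def is_ignored(mount_point: str) -> bool:
    """Return True if the regexp matches, i.e. the mount point would be skipped."""
    return IGNORED.search(mount_point) is not None


if __name__ == "__main__":
    samples = [
        "/",
        "/dev/shm",
        "/proc",
        "/sys/fs/cgroup",
        "/var/lib/docker/overlay2/abc/merged",
        "/var/lib/docker",
        "/data",
        "/devices",
    ]
    for mp in samples:
        state = "ignored" if is_ignored(mp) else "monitored"
        print(f"{mp:40} {state}")
```

Note that /var/lib/docker itself and /devices are not matched: the pattern requires a sub-path under /var/lib/docker, and "dev" must be followed by "/" or the end of the string.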
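2. See which values $labels contains
Put {{ $labels }} into an annotation to dump every label attached to the alert, as in the description field of this rule file: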
cat prometheus-2.20.1.linux-amd64/rules/availability.yml
groups:
- name: availability
  rules:
  - alert: AvailabilityFailing
    expr: probe_success == 0
    for: 1m
    labels:
      severity: critical
      level: 3
    annotations:
      description: "Access from {{ $labels.job }} to {{ $labels.instance }} is failing. ({{ $labels }})"
3. List of open-source exporters
https://prometheus.io/docs/instrumenting/exporters/
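4. Common PromQL
1. For a cumulative counter, get its per-second rate of change within a time window (irate uses only the last two samples in the window)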
irate(elasticsearch_indices_search_query_total{cluster_name="fqlcollectiones",name="es_XX.11.19.23"}[1m])
2. Exclude specific values (!~ is a negative regex match)
elasticsearch_os_load5{cluster_name !~ "fqlafplatformes|fqlbankes|fqlcollectiones"} > 15
3. Filter by the values of specific labels
irate(elasticsearch_indices_indexing_index_total{cluster_name="$cluster_name",instance="$instance"}[$interval])
irate(elasticsearch_indices_search_query_total{cluster="fqlcollectiones",host="XX.11.19.23",instance="XX.11.19.22:19317"}[1m])
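The irate() queries in this list return a per-second rate computed from the last two samples inside the range, not the total change over the range (increase() is the function for that). A minimal sketch of that arithmetic, assuming samples arrive as (timestamp, value) pairs; the sample values are invented and this is not Prometheus' actual code:

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def irate(samples: Sequence[Sample]) -> float:
    """Per-second rate based on the last two samples in the window."""
    if len(samples) < 2:
        raise ValueError("irate needs at least two samples in the window")
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must be strictly increasing")
    # If the counter went down, assume it was reset and restarted from zero,
    # so the whole last value counts as the increase.
    delta = v1 if v1 < v0 else v1 - v0
    return delta / (t1 - t0)


if __name__ == "__main__":
    # Hypothetical 15s scrapes of a search counter within a [1m] window.
    window = [(0.0, 100.0), (15.0, 130.0), (30.0, 190.0), (45.0, 205.0)]
    print(irate(window))  # only the samples at t=30 and t=45 are used
```

Because only two samples are used, irate reacts quickly to spikes but is noisy; for alerting on longer trends, rate() over the same window is usually the calmer choice.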
CPU alert (1-minute load above 3x the number of CPU cores)
node_load1 > (count without(cpu, mode) (node_cpu_seconds_total{mode="system"})) * 3
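The count without(cpu, mode) part of the CPU expression works as a core counter: with mode fixed to "system" there is one series per CPU, so dropping the cpu and mode labels and counting leaves one number per instance. A rough sketch of that grouping with invented hosts and load values:

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Labels = Dict[str, str]
GroupKey = Tuple[Tuple[str, str], ...]


def count_without(series: Iterable[Labels], drop: Set[str]) -> Counter:
    """Group label sets by every label not in `drop` and count each group."""
    groups: Counter = Counter()
    for labels in series:
        key: GroupKey = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        groups[key] += 1
    return groups


# Hypothetical node_cpu_seconds_total{mode="system"} series: 4 cores and 2 cores.
series = [{"instance": "host-a:9100", "cpu": str(i), "mode": "system"} for i in range(4)]
series += [{"instance": "host-b:9100", "cpu": str(i), "mode": "system"} for i in range(2)]

cores = count_without(series, drop={"cpu", "mode"})

# Hypothetical node_load1 values, keyed the same way as the grouped result.
load1 = {
    (("instance", "host-a:9100"),): 13.5,
    (("instance", "host-b:9100"),): 4.0,
}
firing = sorted(key[0][1] for key, load in load1.items() if load > cores[key] * 3)

if __name__ == "__main__":
    print(firing)
```

Here host-a has a threshold of 12 and host-b a threshold of 6, so only host-a would alert.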
Disk alert (percentage of space used; the last two expressions fire above 80%)
(1 - node_filesystem_avail_bytes{fstype=~"ext.?|xfs", owner="aaronyu;XXX"} / node_filesystem_size_bytes{fstype=~"ext.?|xfs", owner="aaronyu;XXX"}) * 100
(1 - node_filesystem_avail_bytes{fstype=~"ext.?|xfs", owner=~".*aaronyu.*", instanceName=~".*es.*"} / node_filesystem_size_bytes{fstype=~"ext.?|xfs", owner=~".*aaronyu.*", instanceName=~".*es.*"}) * 100 > 80
(1 - node_filesystem_avail_bytes{fstype=~"ext.?|xfs", owner=~".*aaronyu.*", instanceName=~".*es.*", instance!~"10.11.18.15:9100"} / node_filesystem_size_bytes{fstype=~"ext.?|xfs", owner=~".*aaronyu.*", instanceName=~".*es.*", instance!~"10.11.18.15:9100"}) * 100 > 80
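The disk expressions all share the same arithmetic: one minus the available fraction, times 100, compared against 80. A small worked sketch with made-up sizes, just to make the threshold concrete:

```python
GIB = 1024 ** 3
THRESHOLD_PERCENT = 80


def used_percent(avail_bytes: float, size_bytes: float) -> float:
    """Mirror of (1 - avail / size) * 100 from the PromQL expressions."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return (1 - avail_bytes / size_bytes) * 100


# Hypothetical (avail, size) pairs per mount point.
filesystems = {
    "/": (50 * GIB, 400 * GIB),
    "/data": (300 * GIB, 1000 * GIB),
}

firing = {
    mount: used_percent(avail, size)
    for mount, (avail, size) in filesystems.items()
    if used_percent(avail, size) > THRESHOLD_PERCENT
}

if __name__ == "__main__":
    for mount, pct in firing.items():
        print(f"{mount} is {pct:.1f}% full")
```

With these numbers only "/" exceeds the threshold, at 87.5% used.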
5. Grafana advanced configuration
https://blog.csdn.net/qq_34556414/article/details/123689279
x. DiDi Nightingale (n9e)
x. Zabbix
zabbix_get -s ${ONE_IP} -p 10050 -k "system.run[ls /]"
zabbix_get -s ${ONE_IP} -p 10050 -k "zabbix[host,agent,available]"
zabbix_get -s ${ONE_IP} -p 10050 -k "system.cpu.load[all,avg1]"
zabbix_get -s ${ONE_IP} -p 10050 -k "agent.hostname"
zabbix_get -s ${ONE_IP} -p 10050 -k "agent.version"
zabbix_get -s ${ONE_IP} -p 10050 -k "agent.ping"
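When the same keys need to be checked against several agents, the zabbix_get calls above can be wrapped in a short script. This is only a sketch: the host address is a placeholder, the helper names are hypothetical, and it uses nothing beyond the -s, -p and -k options shown above.

```python
import shutil
import subprocess
from typing import List, Optional

# Item keys taken from the commands above.
KEYS = [
    "agent.ping",
    "agent.version",
    "agent.hostname",
    "system.cpu.load[all,avg1]",
]


def build_cmd(host: str, key: str, port: int = 10050) -> List[str]:
    """Build the argv list; no shell is involved, so brackets need no quoting."""
    return ["zabbix_get", "-s", host, "-p", str(port), "-k", key]


def query(host: str, key: str, timeout: float = 5.0) -> Optional[str]:
    """Run zabbix_get and return its output, or None if the binary is missing."""
    if shutil.which("zabbix_get") is None:
        return None
    result = subprocess.run(
        build_cmd(host, key), capture_output=True, text=True, timeout=timeout
    )
    return result.stdout.strip()


if __name__ == "__main__":
    host = "192.0.2.10"  # placeholder address, replace with a real agent IP
    for key in KEYS:
        print(" ".join(build_cmd(host, key)))
```

The main block only prints the commands; call query(host, key) once zabbix_get is installed and the agent allows connections from the machine running the script.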