5. Extensions
Strategies for scaling Prometheus performance later on
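1. Official federation: Federation allows a Prometheus server to scrape selected time series from another Prometheus server.
Federation covers two different use cases: 1) cross-service federation, which pulls data from one Prometheus server into the server of another service; 2) hierarchical federation, which separates global and local Prometheus servers.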
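To illustrate the hierarchical case, the sketch below builds the URL that a global Prometheus would scrape on a local one. The `/federate` endpoint and the repeated `match[]` parameter are taken from the Prometheus federation documentation; the host `local-prom:9090` and the selectors are placeholders chosen for illustration, not values from these notes.

```python
from urllib.parse import urlencode


def federate_url(base_url: str, selectors: list[str]) -> str:
    """Build a /federate URL that selects series via repeated match[] parameters."""
    # doseq=True emits one match[] parameter per selector in the list.
    query = urlencode({"match[]": selectors}, doseq=True)
    return f"{base_url.rstrip('/')}/federate?{query}"


if __name__ == "__main__":
    # Hypothetical local server and selectors, for illustration only.
    print(federate_url("http://local-prom:9090", ['{job="prometheus"}', '{__name__=~"job:.*"}']))
```

In a real setup the same selectors would normally go under `params: 'match[]'` in a scrape job on the global server, with `metrics_path: /federate`.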
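2. Thanos: Open source, highly available Prometheus setup with long term storage capabilities.
An add-on that gives Prometheus high availability through cross-cluster federation, unlimited cross-cluster storage, and global querying. Notably, it is open source. Link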
Probing tools and scripts
Network probing tool: SmokePing
3. Change the dashboard's default time range
In the Grafana dashboard JSON file:
"time": {
  "from": "now-6h",
  "to": "now"
}
4. Change the login timeout
Set in the [auth] section of grafana.ini (2M means two months):
login_maximum_inactive_lifetime_duration = 2M
login_maximum_lifetime_duration = 2M
token_rotation_interval_minutes = 1000
Adding authentication to Prometheus and Alertmanager
Step 1: generate the password hash:
[root@iZ2zef0llgs69lx3vc9rfgZ prometheus-2.27.1.linux-amd64]# cat ../a.py
import getpass
import bcrypt
password = getpass.getpass("password: ")
hashed_password = bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt())
print(hashed_password.decode())
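Before pasting the output of the script above into web.yml, it can help to confirm that the string at least has the layout of a bcrypt hash. The sketch below uses only the standard library; the helper name `looks_like_bcrypt` is made up for this example, and the check is purely about shape (prefix, two-digit cost, 53 characters of salt and digest). It does not verify any password.

```python
import re

# Layout of a bcrypt hash: $2a$ / $2b$ / $2y$, a two-digit cost,
# then 53 characters from the bcrypt base64 alphabet (salt + digest).
BCRYPT_RE = re.compile(r"\$2[aby]\$\d{2}\$[./A-Za-z0-9]{53}")


def looks_like_bcrypt(value: str) -> bool:
    """Return True if the string is shaped like a bcrypt hash (no password check)."""
    return BCRYPT_RE.fullmatch(value) is not None


if __name__ == "__main__":
    # Hash taken from the web.yml shown in these notes.
    sample = "$2b$12$Y1AmEmEipc3HOx/kVIkVAusFKE3WziWUzzVV.fOMfl1PkpZixW7x."
    print(looks_like_bcrypt(sample))
    print(looks_like_bcrypt("admin123!"))
```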
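Step 2: Prometheus. Authentication is added in two places: web.yml protects access to Prometheus itself, and basic_auth in prometheus.yml supplies the username and password Prometheus uses when it talks to Alertmanager.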
[root@iZ2zef0llgs69lx3vc9rfgZ prometheus-2.27.1.linux-amd64]# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 127.0.0.1:9093
      basic_auth:
        username: admin
        password: admin123!
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - ./rules/*.yml
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9100']
[root@iZ2zef0llgs69lx3vc9rfgZ prometheus-2.27.1.linux-amd64]# cat web.yml
basic_auth_users:
  admin: $2b$12$Y1AmEmEipc3HOx/kVIkVAusFKE3WziWUzzVV.fOMfl1PkpZixW7x.
[root@iZ2zef0llgs69lx3vc9rfgZ prometheus-2.27.1.linux-amd64]# ./promtool check config prometheus.yml # check the main configuration
[root@iZ2zef0llgs69lx3vc9rfgZ prometheus-2.27.1.linux-amd64]# ./promtool check web-config web.yml
web.yml SUCCESS
Start: nohup ./prometheus --web.enable-lifecycle --web.enable-admin-api --web.config.file=web.yml &>./a.log &
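Once Prometheus runs with web.yml, every HTTP call to it needs an Authorization header, including the reload endpoint that --web.enable-lifecycle exposes. The sketch below prepares such a request with the standard library but does not send it. The address 127.0.0.1:9090 (the Prometheus default port) and the reuse of the admin / admin123! credentials are assumptions; these notes only show the hash, not the password behind it.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_reload_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated POST to /-/reload."""
    req = urllib.request.Request(f"{base_url.rstrip('/')}/-/reload", data=b"", method="POST")
    req.add_header("Authorization", basic_auth_header(username, password))
    return req


if __name__ == "__main__":
    # Assumed address and credentials; adjust to the real deployment.
    req = build_reload_request("http://127.0.0.1:9090", "admin", "admin123!")
    # urllib.request.urlopen(req) would send it; omitted so the sketch runs without a server.
    print(req.get_method(), req.full_url)
```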
Step 3: add authentication to Alertmanager
Start:
nohup ./alertmanager --config.file=alertmanager.yml --web.config.file=./web.yml &>/dev/null &
Configuration file:
[root@iZ2zef0llgs69lx3vc9rfgZ prometheus-2.27.1.linux-amd64]# cat ../alertmanager-0.22.2.linux-amd64/web.yml
basic_auth_users:
  admin: $2b$12$Y1AmEmEipc3HOx/kVIkVAusFKE3WziWUzzVV.fOMfl1PkpZixW7x.
5. Customize the Prometheus DingTalk alert template
- Edit the prometheus-webhook-dingtalk configuration
## Request timeout
# timeout: 5s
## Customizable templates path
templates:
  - contrib/templates/mytemplate.tmpl
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=$YOUR_TOKEN
    # secret for signature
    secret: SEC000000000000000000000
    message:
      text: '{{ template "_ding.link.content" . }}'
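The `secret` field above is used to sign each request sent to the robot. prometheus-webhook-dingtalk does this internally; the sketch below only shows what the signing step looks like, based on my reading of DingTalk's custom-robot documentation (HMAC-SHA256 over "timestamp\nsecret", keyed with the secret, then Base64 and URL encoding). The function name `dingtalk_sign` is made up for this example, and the secret is the placeholder from the configuration.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def dingtalk_sign(secret: str, timestamp_ms: int) -> str:
    """Compute the URL-encoded signature for a robot that has signing enabled."""
    string_to_sign = f"{timestamp_ms}\n{secret}".encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    # Base64 output may contain '+', '/' and '=', so it is URL-encoded for the query string.
    return urllib.parse.quote_plus(base64.b64encode(digest))


if __name__ == "__main__":
    ts = int(time.time() * 1000)  # timestamp in milliseconds
    sign = dingtalk_sign("SEC000000000000000000000", ts)
    # These two parameters are appended to the webhook URL after access_token.
    print(f"&timestamp={ts}&sign={sign}")
```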
- Edit the template file (mytemplate.tmpl)
[root@iZyz800ony0blg7zox9h4gZ prometheus-webhook-dingtalk-1.4.0.linux-amd64]# cat contrib/templates/mytemplate.tmpl
{{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
{{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}
{{ define "__text_alert_list" }}{{ range . }}
**Labels**
{{ range .Labels.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Annotations**
{{ range .Annotations.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }})
{{ end }}{{ end }}
{{ define "___text_alert_list" }}{{ range . }}
---
**Alert:** {{ .Labels.alertname }}
**Started at:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**Details:** {{ .Annotations.description }}
{{ end }}
{{ end }}
{{ define "___text_alertresovle_list" }}{{ range . }}
---
**Alert:** {{ .Labels.alertname }}
**Started at:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**Ended at:** {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
**Details:** {{ .Annotations.description }}
{{ end }}
{{ end }}
{{/* Default */}}
{{ define "_default.title" }}{{ template "__subject" . }}{{ end }}
{{ define "_default.content" }} [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ if gt (len .Alerts.Firing) 0 -}}
**========Alert firing========**
{{ template "___text_alert_list" .Alerts.Firing }}
{{- end }}
{{ if gt (len .Alerts.Resolved) 0 -}}
![Recovery icon](https://duojia-lemei.oss-cn-beijing.aliyuncs.com/OK.jpg)
**========Alert resolved========**
{{ template "___text_alertresovle_list" .Alerts.Resolved }}
{{- end }}
{{- end }}
{{/* Legacy */}}
{{ define "legacy.title" }}{{ template "__subject" . }}{{ end }}
{{ define "legacy.content" }} [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ template "__text_alert_list" .Alerts.Firing }}
{{- end }}
{{/* Following names for compatibility */}}
{{ define "_ding.link.title" }}{{ template "_default.title" . }}{{ end }}
{{ define "_ding.link.content" }}{{ template "_default.content" . }}{{ end }}
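One way to try out the template is to push a hand-made alert into Alertmanager and watch what arrives in DingTalk. The sketch below only builds the JSON body for Alertmanager's `POST /api/v2/alerts` endpoint; the label values and the function name `build_test_alert` are invented for illustration, and it assumes Alertmanager already routes alerts to the webhook1 target.

```python
import json
from datetime import datetime, timezone


def build_test_alert(alertname: str, description: str) -> str:
    """Return a JSON body for POST /api/v2/alerts containing one firing alert."""
    alert = {
        # alertname feeds the alert title; description feeds the details line.
        "labels": {"alertname": alertname, "severity": "warning"},
        "annotations": {"description": description},
        # Timezone-aware UTC timestamp in ISO 8601 / RFC 3339 form.
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps([alert])


if __name__ == "__main__":
    print(build_test_alert("DemoAlert", "disk usage above 90%"))
```

Because Alertmanager was started with web.yml, the request that carries this body also needs the Basic Authorization header for the admin user.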