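1. Alert template
To explain Alertmanager alert templates, we use the template from the previous article, "Configuring Prometheus and Alertmanager to send alerts to WeChat Work", as the example: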
[root@centos74 home]# cat /usr/local/prometheus/alertmanager/wechat.tmpl
{{ define "wechat.default.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{- range $index, $alert := .Alerts.Firing -}}
======== Alert firing ========
Alert name: {{ $alert.Labels.alertname }}
Severity: {{ $alert.Labels.severity }}
Instance: {{ $alert.Labels.instance }} {{ $alert.Labels.device }}
Details: {{ $alert.Annotations.summary }}
Alert time: {{ $alert.StartsAt.Format "2006-01-02 15:04:05" }}
========== END ==========
{{- end }}
{{- end }}
{{- if gt (len .Alerts.Resolved) 0 -}}
{{- range $index, $alert := .Alerts.Resolved -}}
======== Alert resolved ========
Alert name: {{ $alert.Labels.alertname }}
Severity: {{ $alert.Labels.severity }}
Instance: {{ $alert.Labels.instance }}
Details: {{ $alert.Annotations.summary }}
Alert time: {{ $alert.StartsAt.Format "2006-01-02 15:04:05" }}
Recovery time: {{ $alert.EndsAt.Format "2006-01-02 15:04:05" }}
========== END ==========
{{- end }}
{{- end }}
{{- end }}
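To see how `.Alerts.Firing` and `.Alerts.Resolved` split one notification group, here is a minimal Go sketch. The `KV`, `Alert`, `Alerts`, and `Data` types are simplified local stand-ins written for this illustration, not the real Alertmanager types, and the template is a shortened version of the one above.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// KV, Alert, Alerts, and Data are hypothetical stand-ins that only mimic
// the shape of the data Alertmanager passes to a template.
type KV map[string]string

type Alert struct {
	Status string
	Labels KV
}

type Alerts []Alert

// filter returns the alerts whose Status equals status.
func (as Alerts) filter(status string) []Alert {
	var out []Alert
	for _, a := range as {
		if a.Status == status {
			out = append(out, a)
		}
	}
	return out
}

// Firing and Resolved are callable from a template as .Alerts.Firing
// and .Alerts.Resolved.
func (as Alerts) Firing() []Alert   { return as.filter("firing") }
func (as Alerts) Resolved() []Alert { return as.filter("resolved") }

type Data struct{ Alerts Alerts }

const src = `{{range .Alerts.Firing}}[firing] {{.Labels.alertname}} {{.Labels.instance}}
{{end}}{{range .Alerts.Resolved}}[resolved] {{.Labels.alertname}} {{.Labels.instance}}
{{end}}`

// render executes the shortened template against d.
func render(d Data) string {
	tmpl := template.Must(template.New("wechat").Parse(src))
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, d); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	d := Data{Alerts: Alerts{
		{Status: "firing", Labels: KV{"alertname": "HighMemoryUsage", "instance": "host-a:9100"}},
		{Status: "resolved", Labels: KV{"alertname": "HighMemoryUsage", "instance": "host-b:9100"}},
	}}
	fmt.Print(render(d))
}
```

Each alert is printed once, under the heading that matches its status. Ranging over plain `.Alerts` inside both sections would instead print every alert under both headings whenever a group contains firing and resolved alerts at the same time.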
2. Main syntax
Templates are based on Go's text/template package: https://golang.org/pkg/text/template/
2.1 Text and spaces
By default, all text between actions is copied verbatim when the template
is executed.
However, to aid in formatting template source code, if an action's left
delimiter (by default "{{") is followed immediately by a minus sign and
ASCII space character ("{{- "), all trailing white space is trimmed from
the immediately preceding text. Similarly, if the right delimiter ("}}")
is preceded by a space and minus sign (" -}}"), all leading white space
is trimmed from the immediately following text. In these trim markers,
the ASCII space must be present; "{{-3}}" parses as an action containing
the number -3.
For instance, when executing the template whose source is
"{{23 -}} < {{- 45}}"
the generated output would be
"23<45"
For this trimming, the definition of white space characters is the same
as in Go: space, horizontal tab, carriage return, and newline.
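The trimming rule can be checked with a few lines of Go. This is a minimal sketch; the `render` helper is a name introduced here for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render parses src as a text/template and executes it with nil data.
func render(src string) string {
	tmpl := template.Must(template.New("demo").Parse(src))
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, nil); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	// With trim markers the spaces around "<" are removed.
	fmt.Printf("%q\n", render("{{23 -}} < {{- 45}}")) // "23<45"
	// Without trim markers the text between actions is copied verbatim.
	fmt.Printf("%q\n", render("{{23}} < {{45}}")) // "23 < 45"
}
```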
2.2 Actions
Here is the list of actions. "Arguments" and "pipelines" are evaluations
of data, defined in detail in the corresponding sections that follow.
{{/* a comment */}}
{{- /* a comment with white space trimmed from preceding and following text */ -}}
    A comment; discarded. May contain newlines.
    Comments do not nest and must start and end at the
    delimiters, as shown here.

{{pipeline}}
    The default textual representation (the same as would be
    printed by fmt.Print) of the value of the pipeline is copied
    to the output.

{{if pipeline}} T1 {{end}}
    If the value of the pipeline is empty, no output is generated;
    otherwise, T1 is executed. The empty values are false, 0, any
    nil pointer or interface value, and any array, slice, map, or
    string of length zero.
    Dot is unaffected.

{{if pipeline}} T1 {{else}} T0 {{end}}
    If the value of the pipeline is empty, T0 is executed;
    otherwise, T1 is executed. Dot is unaffected.

{{if pipeline}} T1 {{else if pipeline}} T0 {{end}}
    To simplify the appearance of if-else chains, the else action
    of an if may include another if directly; the effect is exactly
    the same as writing
        {{if pipeline}} T1 {{else}}{{if pipeline}} T0 {{end}}{{end}}

{{range pipeline}} T1 {{end}}
    The value of the pipeline must be an array, slice, map, or channel.
    If the value of the pipeline has length zero, nothing is output;
    otherwise, dot is set to the successive elements of the array,
    slice, or map and T1 is executed. If the value is a map and the
    keys are of basic type with a defined order, the elements will be
    visited in sorted key order.

{{range pipeline}} T1 {{else}} T0 {{end}}
    The value of the pipeline must be an array, slice, map, or channel.
    If the value of the pipeline has length zero, dot is unaffected and
    T0 is executed; otherwise, dot is set to the successive elements
    of the array, slice, or map and T1 is executed.

{{template "name"}}
    The template with the specified name is executed with nil data.

{{template "name" pipeline}}
    The template with the specified name is executed with dot set
    to the value of the pipeline.

{{block "name" pipeline}} T1 {{end}}
    A block is shorthand for defining a template
        {{define "name"}} T1 {{end}}
    and then executing it in place
        {{template "name" pipeline}}
    The typical use is to define a set of root templates that are
    then customized by redefining the block templates within.

{{with pipeline}} T1 {{end}}
    If the value of the pipeline is empty, no output is generated;
    otherwise, dot is set to the value of the pipeline and T1 is
    executed.

{{with pipeline}} T1 {{else}} T0 {{end}}
    If the value of the pipeline is empty, dot is unaffected and T0
    is executed; otherwise, dot is set to the value of the pipeline
    and T1 is executed.
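The following Go sketch exercises three of the actions listed above (if/else, range/else, and with/else) against a small data value. The `data` struct and its field names are made up for this example and do not correspond to anything in Alertmanager.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// data is a hypothetical input type used only for this illustration.
type data struct {
	Severity  string
	Instances []string
	Owner     string
}

const src = `{{if .Severity}}severity={{.Severity}}{{else}}severity=unknown{{end}}
{{range .Instances}}- {{.}}
{{else}}(no instances)
{{end}}{{with .Owner}}owner={{.}}{{else}}owner=unassigned{{end}}`

// render executes the template above against d.
func render(d data) string {
	tmpl := template.Must(template.New("actions").Parse(src))
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, d); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	// Non-empty Severity and Instances take the T1 branches;
	// the empty Owner string takes the else branch of "with".
	fmt.Println(render(data{
		Severity:  "warning",
		Instances: []string{"a:9100", "b:9100"},
	}))
	fmt.Println("---")
	// The zero value is "empty" everywhere, so every else branch runs.
	fmt.Println(render(data{}))
}
```

Inside `range` and `with`, dot is rebound to the current element or pipeline value, which is why `{{.}}` prints the instance or the owner rather than the whole `data` value.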
3. Alert data structure
The main fields of an alert are as follows:
Name | Type | Notes |
---|---|---|
Status | string | Defines whether or not the alert is resolved or currently firing. |
Labels | KV | A set of labels to be attached to the alert. |
Annotations | KV | A set of annotations for the alert. |
StartsAt | time.Time | The time the alert started firing. If omitted, the current time is assigned by the Alertmanager. |
EndsAt | time.Time | Only set if the end time of an alert is known. Otherwise set to a configurable timeout period from the time since the last alert was received. |
GeneratorURL | string | A backlink which identifies the causing entity of this alert. |
3.1 Labels
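Labels are the labels shown for the alert in the Prometheus web UI, for example the alertname, instance, and severity labels used in the template above.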
3.2 Annotations
Annotations are the annotations field that the user defines in the alerting rule:
groups:
- name: node_health
  rules:
  - alert: HighMemoryUsage
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: High memory usage
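As a rough illustration of how the rule's labels and annotations surface in a template, the Go sketch below builds the two maps by hand and reads them with `.Labels.<name>` and `.Annotations.<name>`. The `KV` and `Alert` types are local stand-ins, and the instance value is a placeholder address chosen for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// KV and Alert are hypothetical stand-ins for the label and annotation
// maps that a template sees; they are not the real Alertmanager types.
type KV map[string]string

type Alert struct {
	Labels      KV
	Annotations KV
}

const src = `{{.Labels.alertname}} [{{.Labels.severity}}] {{.Annotations.summary}}`

// render executes the one-line template above against a.
func render(a Alert) string {
	tmpl := template.Must(template.New("kv").Parse(src))
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, a); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	a := Alert{
		// alertname comes from the rule's "alert" field and severity from
		// its "labels" block; instance is a placeholder value.
		Labels: KV{
			"alertname": "HighMemoryUsage",
			"severity":  "warning",
			"instance":  "192.0.2.10:9100",
		},
		// summary comes from the rule's "annotations" block.
		Annotations: KV{"summary": "High memory usage"},
	}
	fmt.Println(render(a)) // HighMemoryUsage [warning] High memory usage
}
```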
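3.3 StartsAt and EndsAt
StartsAt is the time the alert started firing, and EndsAt is the time the alert was resolved.
If we use $alert.StartsAt directly in the alert template, the time is printed in the following format:
Alert time: 2020-12-16 22:35:33.676515606 +0800 CST
This timestamp follows the time zone of our machine, so the machine must be set to the time zone we want.
Note that tzselect only helps you pick a time zone name and does not change the system setting by itself; on a systemd-based system such as CentOS 7, the zone can be set with, for example, timedatectl set-timezone Asia/Shanghai. After changing it, it is best to reboot the system and check the timestamps in /var/log/messages to confirm the time is what we expect, so that the alert times we receive match our local time zone.
Our alerts do not need nanosecond precision or the time zone suffix, so the time has to be formatted, which is why the template calls $alert.StartsAt.Format. The string "2006-01-02 15:04:05" is the layout. Go layouts are written in terms of a fixed reference time (Mon Jan 2 15:04:05 MST 2006), so the numbers in the layout must be exactly these values; components can be reordered or dropped, but other dates or times cannot be substituted. Treat it as a magic value, a quirk of Go's design.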
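For email alerts, the following form is commonly used:
Alert time: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
Here Add 28800e9 adds 8 hours to the timestamp; 28800e9 is 8 hours expressed in nanoseconds (8 × 3600 × 10^9). This shifts a UTC time to Beijing time (UTC+8). The shift is only appropriate when StartsAt arrives in UTC; if the timestamp already carries the +0800 offset, as in the sample output above, adding another 8 hours would show a time 8 hours ahead of the real one.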