Prometheus Source Code Study (5): notifier

疯琴


Article link: https://blog.csdn.net/qq_35753140/article/details/115976877

Contents

The notifier module
The Alert struct
The Manager struct
- Manager.Run()
- Manager.sendAll()
Takeaways

The notifier module

The notifier module sends alert notifications to Alertmanager.

Its main structs are:

Alert
Manager
Options
alertMetrics
alertmanagerLabels
alertmanagerSet

The Alert struct

Its fields match the fields of each entry in the "alerts" list of an Alertmanager alert JSON payload.

// Alert is a generic representation of an alert in the Prometheus eco-system.
type Alert struct {
	// Label value pairs for purpose of aggregation, matching, and disposition
	// dispatching. This must minimally include an "alertname" label.
	Labels labels.Labels `json:"labels"`

	// Extra key/value information which does not define alert identity.
	Annotations labels.Labels `json:"annotations"`

	// The known time range for this alert. Both ends are optional.
	// If EndsAt is missing, the resolve_timeout configured in Alertmanager
	// applies; Prometheus normally sets EndsAt.
	StartsAt     time.Time `json:"startsAt,omitempty"`
	EndsAt       time.Time `json:"endsAt,omitempty"`
	GeneratorURL string    `json:"generatorURL,omitempty"`
}

The Manager struct

// Manager is responsible for dispatching alert notifications to an
// alert manager service.
type Manager struct {
	// Queue of alerts waiting to be sent.
	queue []*Alert
	// Options.
	opts *Options

	// Metrics about sending alert notifications.
	metrics *alertMetrics

	// Signal channel that triggers sending. NewManager creates it with a
	// buffer of 1, so one pending signal can be stored.
	more   chan struct{}
	mtx    sync.RWMutex
	ctx    context.Context
	cancel func()

	// The Alertmanagers that receive the alerts.
	alertmanagers map[string]*alertmanagerSet
	logger        log.Logger
}

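Manager.Run()

The more channel of Manager serves as a signal channel: alerts are dispatched only when a signal arrives, otherwise Run blocks and waits. Alerts are sent in batches of at most 64 (the maxBatchSize constant).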

// Run dispatches notifications continuously.
func (n *Manager) Run(tsets <-chan map[string][]*targetgroup.Group) {

	for {
		select {
		case <-n.ctx.Done():
			return
		// The discovered targets changed: reload the Alertmanager sets.
		case ts := <-tsets:
			n.reload(ts)
		case <-n.more:
		}
		// Fetch the next batch of alerts.
		alerts := n.nextBatch()

		// If sending failed for every Alertmanager, count the alerts as dropped.
		if !n.sendAll(alerts...) {
			n.metrics.dropped.Add(float64(len(alerts)))
		}
		// If the queue still has items left, kick off the next iteration.
		if n.queueLen() > 0 {
			n.setMore()
		}
	}
}
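
Run relies on nextBatch() to cut the queue into batches. The following is a minimal sketch of that batching step, assuming a plain slice of ints as the queue and a free function instead of a method. The real method works on []*Alert and presumably holds the Manager's mutex while it touches the queue.

```go
package main

import "fmt"

// maxBatchSize is the batch limit mentioned in the text.
const maxBatchSize = 64

// nextBatch removes up to maxBatchSize items from the front of the queue and
// returns them together with the remaining queue.
func nextBatch(queue []int) (batch, rest []int) {
	n := len(queue)
	if n > maxBatchSize {
		n = maxBatchSize
	}
	// Copy so that the batch does not share memory with the remaining queue.
	batch = append(make([]int, 0, n), queue[:n]...)
	return batch, queue[n:]
}

func main() {
	queue := make([]int, 150)
	for len(queue) > 0 {
		var batch []int
		batch, queue = nextBatch(queue)
		fmt.Println(len(batch)) // 64, then 64, then 22
	}
}
```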

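Manager.sendAll()

Sends the alerts to all currently configured Alertmanagers. Returns true if sending succeeded for at least one Alertmanager.

Return true if alerts is empty.
Record the start time of the function.
Declare two byte slices, v1Payload and v2Payload, that hold the serialized alerts. They are declared up front as a cache, so the alerts are serialized at most once per API version instead of once per loop iteration.
Take the read lock and read the set of Alertmanagers (amSets).
Declare a WaitGroup so the function returns only after sending to every Alertmanager has finished.
Loop over each ams in amSets. Each ams is the set of Alertmanagers from one service discovery configuration and may contain several Alertmanagers.
Serialize alerts into a payload according to the API version of ams.
Loop over each am in ams, start a goroutine that calls sendOne() to send the payload, and record metrics according to success or failure.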

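Takeaways

The zero time (January 1, year 1, 00:00:00 UTC) marks an alert as unresolved; time.Time has an IsZero() method to test for it.
The Namespace, Subsystem, and Name fields of prometheus.SummaryOpts are joined with underscores to form the metric name.
Alerts are queued and taken from the queue in batches for processing, which increases throughput.
The more channel of the Manager object indicates whether there are alerts waiting to be processed.
Manager.ApplyConfig(), Manager.reload(), Manager.Alertmanagers(), Manager.DroppedAlertmanagers(), and Manager.sendAll() hold the lock while reading or writing the Manager's amSets.
After sending the POST request to Alertmanager, Manager.sendOne() reads and discards the contents of resp.Body before closing it.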

io.Copy(ioutil.Discard, resp.Body)
resp.Body.Close()

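The reason for reading and discarding is explained in a comment in the net/http source: if the body is not read to completion, the HTTP client may not reuse the HTTP/1.x keep-alive connection.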

    // Body represents the response body.
    //
    // The response body is streamed on demand as the Body field
    // is read. If the network connection fails or the server
    // terminates the response, Body.Read calls return an error.
    //
    // The http Client and Transport guarantee that Body is always
    // non-nil, even on responses without a body or responses with
    // a zero-length body. It is the caller's responsibility to
    // close Body. The default HTTP client's Transport may not
    // reuse HTTP/1.x "keep-alive" TCP connections if the Body is
    // not read to completion and closed.

Manager.setMore(): sends a signal; if an earlier signal has not been consumed yet, it returns without blocking.

// setMore signals that the alert queue has items.
func (n *Manager) setMore() {
	// If we cannot send on the channel, it means the signal already exists
	// and has not been consumed yet.
	select {
	case n.more <- struct{}{}:
	default:
	}
}
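
The effect of the select with a default branch is easiest to see in isolation. The sketch below assumes the channel is created with a capacity of 1, as in make(chan struct{}, 1); the notifier type and newNotifier are hypothetical names used only here.

```go
package main

import "fmt"

// notifier is a hypothetical, minimal holder for the signal channel.
type notifier struct {
	more chan struct{}
}

func newNotifier() *notifier {
	// Capacity 1: one pending signal can be stored even if nobody is receiving yet.
	return &notifier{more: make(chan struct{}, 1)}
}

// setMore records that work is pending. It never blocks: if a signal is
// already buffered, the new one is dropped because the consumer will wake up anyway.
func (n *notifier) setMore() {
	select {
	case n.more <- struct{}{}:
	default:
	}
}

func main() {
	n := newNotifier()
	for i := 0; i < 3; i++ {
		n.setMore()
	}
	fmt.Println(len(n.more)) // 1: three calls collapse into one pending signal
}
```

Collapsing repeated signals is safe here because the signal carries no data. The consumer only needs to know that the queue is non-empty, and it checks the queue length again after each batch.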
