Writing a simple exporter in Go

Article link: https://blog.csdn.net/qq_30614345/article/details/130670970


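Many scenarios call for custom monitoring metrics, and in those cases we need to write our own exporter.

For writing an exporter in Go, see the Prometheus documentation:

https://prometheus.io/docs

API reference:

https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#section-readme

GitHub repository:

https://github.com/prometheus/client_golang

Below we use the client_golang library to write a minimal exporter.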

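1. Basic Prometheus metric types

1.1 Counter (a cumulative metric that only increases)

A Counter is a cumulative metric whose value can only increase monotonically, or be reset to zero when the process restarts. For example, a counter can represent the number of requests served.

1.2 Gauge (a measurement that can go up and down)

A Gauge is the simplest metric type: it holds a single value that can go up or down, or be set to an arbitrary value. Gauges are therefore typically used for current state, such as the current temperature or current memory usage, and also for counts that can both rise and fall.

1.3 Histogram (a distribution counted into configurable buckets)

A Histogram records how observed values (sizes or durations) are distributed over a set of configured ranges.

Take HTTP response times distributed over 0-100ms, 100-200ms, 200-300ms, and >300ms. A Histogram automatically creates three kinds of time series:

The total number of events (_count): for example, 2 HTTP requests have occurred so far.

The sum of all observed values (_sum): for example, the 2 HTTP requests took 150ms in total.

The number of observations per bucket (_bucket): for example, 1 request in 0-100ms, 1 request in 100-200ms, and 0 in the others. Prometheus exposes these counts cumulatively, so the bucket with le="200" reports 2.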
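The three series families described above can be modelled in a few lines of plain Go. This is only a standard-library sketch of the idea, not how client_golang is implemented; the type `histogram`, its methods, and the metric name `request_duration_ms` are made up for illustration. It records the two requests from the example (assumed here to take 40ms and 110ms) and prints cumulative bucket counts.

```go
package main

import (
	"fmt"
	"math"
)

// histogram is a toy model of what a Prometheus histogram exposes:
// cumulative bucket counts, a sum, and a count.
type histogram struct {
	upperBounds []float64 // sorted upper bounds, with +Inf appended
	buckets     []uint64  // buckets[i] counts observations <= upperBounds[i]
	sum         float64
	count       uint64
}

func newHistogram(upperBounds ...float64) *histogram {
	bounds := append(append([]float64{}, upperBounds...), math.Inf(1))
	return &histogram{upperBounds: bounds, buckets: make([]uint64, len(bounds))}
}

// observe records one value. Every bucket whose upper bound is >= v is
// incremented, which is what makes the counts cumulative.
func (h *histogram) observe(v float64) {
	for i, ub := range h.upperBounds {
		if v <= ub {
			h.buckets[i]++
		}
	}
	h.sum += v
	h.count++
}

func main() {
	h := newHistogram(100, 200, 300) // bucket bounds in milliseconds
	h.observe(40)
	h.observe(110)
	for i, ub := range h.upperBounds {
		fmt.Printf("request_duration_ms_bucket{le=%q} %d\n", fmt.Sprint(ub), h.buckets[i])
	}
	fmt.Printf("request_duration_ms_sum %g\n", h.sum)
	fmt.Printf("request_duration_ms_count %d\n", h.count)
}
```

Because the buckets are cumulative, the number of requests between 100ms and 200ms is the difference between two neighbouring buckets rather than a value stored directly.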

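1.4 Summary (statistics on the data distribution)

A Summary is similar to a Histogram: both count events or record their sizes, and both capture how the values are distributed.

Both expose an event count (_count) and a sum of values (_sum), so the _count and _sum time series of either type can be used to compute the same things, such as the average.

Both can also describe the distribution of samples, such as the median or any n-quantile. The difference is where the quantile is computed: for a Histogram it is calculated on the server side with the histogram_quantile function, while for a Summary the quantiles are defined and calculated directly on the client. As a result, a Summary is cheaper to query with PromQL, whereas a Histogram costs more at query time but uses fewer resources on the client.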
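To make "quantiles calculated on the client" concrete, here is a deliberately naive sketch that keeps all samples and picks a quantile by the nearest-rank method. It is an illustration only: client_golang's Summary uses a streaming estimator with bounded error instead of sorting every sample, and the function name `quantile` and the sample values are assumptions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile returns the q-quantile (0 <= q <= 1) of samples using the
// nearest-rank method. It sorts a copy, so the input is left untouched.
func quantile(samples []float64, q float64) float64 {
	if len(samples) == 0 {
		return math.NaN()
	}
	sorted := append([]float64{}, samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(q * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// Hypothetical response times in milliseconds.
	latencies := []float64{120, 80, 95, 300, 110, 70, 90, 105, 85, 100}
	for _, q := range []float64{0.5, 0.9, 0.99} {
		fmt.Printf("quantile=%g value=%g\n", q, quantile(latencies, q))
	}
}
```

The cost of this work is paid inside the instrumented process, which is why a Summary is cheap to query but more expensive to maintain on the client.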

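2. Writing the exporter

2.1 Gauge type

Characteristics of a Gauge:

1. It can go up or down freely, with no fixed range.

2. It can be set to any value, unlike a Counter, which can only increase.

3. It can represent an instantaneous value, or a running total that may also decrease.

4. It can represent the state of a single entity, for example the CPU usage of one server.

5. It can represent the overall state of several entities, for example the CPU usage of a whole cluster.

Using a Gauge (method names as in client_golang):

1. Set() sets the Gauge to a given value.

2. Inc() and Dec() increase or decrease the Gauge by 1.

3. Add() and Sub() increase or decrease the Gauge by a given amount.

4. SetToCurrentTime() sets the Gauge to the current Unix time in seconds.

5. A Gauge has no Observe() method. Observe() belongs to Histogram and Summary, where it records a sample value.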
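The method set listed above can be imitated with the standard library alone. The sketch below stores a float64 as its bit pattern in a uint64 and updates it with compare-and-swap, so it is safe to call from many goroutines. The type `gauge` and its `Value` method are made up for this illustration and are not the client_golang API.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"sync/atomic"
)

// gauge holds a float64 as its IEEE 754 bit pattern so it can be
// updated atomically without a mutex.
type gauge struct {
	bits uint64
}

// Set overwrites the current value.
func (g *gauge) Set(v float64) { atomic.StoreUint64(&g.bits, math.Float64bits(v)) }

// Add changes the value by delta (which may be negative), retrying the
// compare-and-swap until no other goroutine interfered.
func (g *gauge) Add(delta float64) {
	for {
		old := atomic.LoadUint64(&g.bits)
		updated := math.Float64bits(math.Float64frombits(old) + delta)
		if atomic.CompareAndSwapUint64(&g.bits, old, updated) {
			return
		}
	}
}

func (g *gauge) Sub(delta float64) { g.Add(-delta) }
func (g *gauge) Inc()              { g.Add(1) }
func (g *gauge) Dec()              { g.Add(-1) }

// Value returns the current value.
func (g *gauge) Value() float64 { return math.Float64frombits(atomic.LoadUint64(&g.bits)) }

func main() {
	var g gauge
	g.Set(10)

	// 50 goroutines each increment the gauge once.
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Inc()
		}()
	}
	wg.Wait()

	g.Sub(20)
	g.Dec()
	fmt.Println(g.Value())
}
```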

package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Gauge type
	// # HELP our_company_blob_storage_ops_queued Number of blob storage operations waiting to be processed
	// # TYPE our_company_blob_storage_ops_queued gauge
	// our_company_blob_storage_ops_queued 112
	opsQueued = prometheus.NewGauge(prometheus.GaugeOpts{
		Namespace: "our_company",
		Subsystem: "blob_storage",
		Name:      "ops_queued",
		Help:      "Number of blob storage operations waiting to be processed",
	})
	// # HELP job_in_queue Current number of jobs in the queue
	// # TYPE job_in_queue gauge
	// job_in_queue{job_type="testjob"} 3
	jobsInQueue = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "job_in_queue",
		Help: "Current number of jobs in the queue",
	}, []string{"job_type"})
)

func init() {
	prometheus.MustRegister(opsQueued, jobsInQueue)
}

func main() {
	jobsInQueue.WithLabelValues("testjob").Add(3)
	// Labels can also be attached with With, which takes a prometheus.Labels map:
	// jobsInQueue.With(prometheus.Labels{"job_type": "testjob"}).Inc()
	go func() {
		for {
			// Add 4 every second.
			opsQueued.Add(4)
			// To set a fixed value instead, use:
			// opsQueued.Set(4)
			time.Sleep(time.Second)
		}
	}()
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}

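2.2 Counter type

Characteristics of a Counter:

1. A Counter can only increase; it cannot be decreased.

2. Its value is a non-negative number (a float64 in client_golang, so not necessarily an integer).

3. Its value grows over time and never goes down while the process is running.

4. Its value resets to 0 when the instrumented process (the exporter) restarts, not when Prometheus restarts.

5. It can be incremented from multiple goroutines at the same time without additional locking.

6. It can be pushed to a Pushgateway, which is intended for short-lived jobs that Prometheus cannot scrape directly.

Using a Counter:

1. Define a Counter object in the program; it starts at 0.

2. Whenever something needs to be counted, call the Counter's Inc() method (or Add() with a non-negative value).

3. Register the Counter and expose it over HTTP so that Prometheus can collect it.

4. Configure Prometheus to scrape the exporter's /metrics endpoint; the metric itself does not have to be declared on the Prometheus side.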
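The "only increases" rule can be sketched with the standard library as follows. The type `counter` is hypothetical and exists only for this illustration. One deliberate difference: client_golang's Counter.Add panics when given a negative value, while this sketch returns an error so the behaviour is easy to observe.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// counter is a toy monotonic counter that is safe for concurrent use.
type counter struct {
	mu    sync.Mutex
	value float64
}

// Add increases the counter by delta and rejects negative deltas,
// because a counter must never go down.
func (c *counter) Add(delta float64) error {
	if delta < 0 {
		return errors.New("counter cannot decrease")
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.value += delta
	return nil
}

// Inc increases the counter by 1.
func (c *counter) Inc() { _ = c.Add(1) }

// Value returns the current total.
func (c *counter) Value() float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.value
}

func main() {
	var c counter

	// 100 goroutines each count one completed task.
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc()
		}()
	}
	wg.Wait()

	if err := c.Add(-1); err != nil {
		fmt.Println("rejected:", err)
	}
	fmt.Println("total:", c.Value())
}
```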

package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"log"
	"net/http"
	"time"
)

var (
	// Counter type
	// # HELP worker_pool_completed_tasks_total Total number of tasks completed.
	// # TYPE worker_pool_completed_tasks_total counter
	// worker_pool_completed_tasks_total 17
	taskCounter = prometheus.NewCounter(prometheus.CounterOpts{
		Subsystem: "worker_pool",
		Name:      "completed_tasks_total",
		Help:      "Total number of tasks completed.",
	})
)

func init() {
	prometheus.MustRegister(taskCounter)
}

func main() {
	go func() {
		for {
			// Increment by 1 every second.
			taskCounter.Inc()
			time.Sleep(time.Second)
		}
	}()
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}

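2.3 Summary type

A Summary is a Prometheus metric type that records the sum, count, and quantiles of a set of samples. It suits metrics with a wide range of values, such as durations or request sizes.

A Summary exposes the following series:

1. sum: the sum of the sample values (suffix _sum).

2. count: the number of samples (suffix _count).

3. quantile: the configured quantiles (label quantile).

sum and count are always present, while the quantiles are optional.

Keep the following in mind when using a Summary:

1. Every Summary records the sum and count of all samples, so these values change over time.

2. A Summary can track several quantiles, for example 50%, 90%, 95%, and 99%.

3. A Summary can be given a time window over which the quantiles are calculated (MaxAge in SummaryOpts).

4. A Summary can be given a sample buffer size (BufCap in SummaryOpts), which affects memory use.

5. A Summary can carry a set of labels to distinguish different instances.

In short, Summary is a very useful metric type that helps us understand the performance and health of a system.

A Summary needs its quantile objectives to be specified. Each entry maps a target quantile to its allowed absolute error, as follows: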

package main

import (
	"fmt"
	"github.com/golang/protobuf/proto"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	dto "github.com/prometheus/client_model/go"
	"log"
	"math"
	"net/http"
)

var (
	// Summary type; the quantile objectives must be provided
	// # HELP pond_temperature_celsius The temperature of the frog pond.
	// # TYPE pond_temperature_celsius summary
	// pond_temperature_celsius{quantile="0.5"} 31.1
	// pond_temperature_celsius{quantile="0.9"} 41.3
	// pond_temperature_celsius{quantile="0.99"} 41.9
	// pond_temperature_celsius_sum 29969.50000000001
	// pond_temperature_celsius_count 1000
	tempSummary = prometheus.NewSummary(prometheus.SummaryOpts{
		Name:       "pond_temperature_celsius",
		Help:       "The temperature of the frog pond.",
		Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
	})

	// # HELP rpc_durations_seconds RPC latency distributions.
	// # TYPE rpc_durations_seconds summary
	// rpc_durations_seconds{error_code="400",service="normal",quantile="0.5"} 31.1
	// rpc_durations_seconds{error_code="400",service="normal",quantile="0.9"} 41.3
	// rpc_durations_seconds{error_code="400",service="normal",quantile="0.99"} 41.9
	// rpc_durations_seconds_sum{error_code="400",service="normal"} 29969.50000000001
	// rpc_durations_seconds_count{error_code="400",service="normal"} 1000
	rpcDurations = prometheus.NewSummaryVec(
		prometheus.SummaryOpts{
			Name:       "rpc_durations_seconds",
			Help:       "RPC latency distributions.",
			Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001}},
		[]string{"service", "error_code"},
	)
)

func init() {
	prometheus.MustRegister(tempSummary, rpcDurations)
}

func main() {
	go func() {
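		// Simulate temperature observations.
		for i := 0; i < 1000; i++ {
			tempSummary.Observe(30 + math.Floor(120*math.Sin(float64(i)*0.1))/10)
		}
		// For demonstration only: inspect the summary's state through its Write method
		// (normally only used internally by Prometheus).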
		metric := &dto.Metric{}
		tempSummary.Write(metric)
		/*
			summary: <
			  sample_count: 1000
			  sample_sum: 29969.50000000001
			  quantile: <
			    quantile: 0.5
			    value: 31.1
			  >
			  quantile: <
			    quantile: 0.9
			    value: 41.3
			  >
			  quantile: <
			    quantile: 0.99
			    value: 41.9
			  >
			>
		*/
		fmt.Println(proto.MarshalTextString(metric))
	}()
	go func() {
		// Simulate RPC duration observations (reusing the same synthetic values).
		for i := 0; i < 1000; i++ {
			rpcDurations.WithLabelValues("normal", "400").Observe(30 + math.Floor(120*math.Sin(float64(i)*0.1))/10)
		}
	}()
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
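The Objectives map in the program above pairs each target quantile with an allowed absolute error in rank. For example, 0.5: 0.05 means the reported median may actually be anything between the 0.45 and 0.55 quantile. The sketch below turns that tolerance into a range of ranks for a given sample count; the helper `rankBounds` and the use of rounding are assumptions made for illustration, not part of client_golang.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// rankBounds returns the lowest and highest 1-based rank that still
// satisfies an objective with target quantile q and absolute rank
// error epsilon, over n samples.
func rankBounds(q, epsilon float64, n int) (lo, hi int) {
	lo = int(math.Round((q - epsilon) * float64(n)))
	hi = int(math.Round((q + epsilon) * float64(n)))
	if lo < 1 {
		lo = 1
	}
	if hi > n {
		hi = n
	}
	return lo, hi
}

func main() {
	// Same objectives as in the Summary example.
	objectives := map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001}

	// Sort the quantiles so the output order is stable.
	quantiles := make([]float64, 0, len(objectives))
	for q := range objectives {
		quantiles = append(quantiles, q)
	}
	sort.Float64s(quantiles)

	for _, q := range quantiles {
		lo, hi := rankBounds(q, objectives[q], 1000)
		fmt.Printf("quantile %g: any sample ranked %d..%d of 1000 is acceptable\n", q, lo, hi)
	}
}
```

A tighter error for the high quantiles is a common choice, because the tail is usually where the interesting latencies are.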

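2.4 Histogram type

A Histogram is a Prometheus metric type for measuring how data is distributed. It divides the data into a series of buckets, each covering a range of values, and each bucket has a counter for the number of observations that fall into it. In Prometheus, the bucket series of a Histogram carry the suffix _bucket and an le label holding the bucket's upper bound.

Histograms are typically used for continuous data such as request latency or response size. For example, to measure the request latency of a web application we can choose bucket upper bounds of 0.1s, 0.5s, 1s, 5s, 10s, and 30s. Each bucket then counts the requests whose latency is at or below its bound.

A Histogram also has two other important counters: sum and count. sum records the total of all observed values and count records the number of observations. From these two we can calculate the average and other statistics.

In Prometheus, the histogram_quantile function calculates a given quantile. It operates on the _bucket series, usually wrapped in rate(): for example, histogram_quantile(0.9, rate(my_histogram_bucket[5m])) estimates the 90th percentile of request latency over the last 5 minutes.

In short, Histogram is a very useful metric type that shows how data is distributed, which helps us monitor and optimize application performance.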

A Histogram needs its bucket boundaries to be specified, as follows:

package main

import (
	"fmt"
	"github.com/golang/protobuf/proto"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	dto "github.com/prometheus/client_model/go"
	"log"
	"math"
	"net/http"
)

var (
	// Histogram type; the bucket boundaries must be provided
	// # HELP pond_temperature_histogram_celsius The temperature of the frog pond.
	// # TYPE pond_temperature_histogram_celsius histogram
	// pond_temperature_histogram_celsius_bucket{le="20"} 192
	// pond_temperature_histogram_celsius_bucket{le="25"} 366
	// pond_temperature_histogram_celsius_bucket{le="30"} 501
	// pond_temperature_histogram_celsius_bucket{le="35"} 638
	// pond_temperature_histogram_celsius_bucket{le="40"} 816
	// pond_temperature_histogram_celsius_bucket{le="+Inf"} 1000
	// pond_temperature_histogram_celsius_sum 29969.50000000001
	// pond_temperature_histogram_celsius_count 1000
	tempsHistogram = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:        "pond_temperature_histogram_celsius",
		Help:        "The temperature of the frog pond.",
		ConstLabels: nil,
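		// 5 buckets, each 5 degrees Celsius wide, starting at 20.
		Buckets: prometheus.LinearBuckets(20, 5, 5),
		// Equivalent to:
		// Buckets:     []float64{20, 25, 30, 35, 40},
		// Writing out a list of buckets by hand is tedious, so the library provides helpers:
		// LinearBuckets creates an arithmetic sequence.
		// ExponentialBuckets creates a geometric sequence.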
	})
)

func init() {
	prometheus.MustRegister(tempsHistogram)
}

func main() {
	go func() {
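		// Simulate temperature observations.
		for i := 0; i < 1000; i++ {
			tempsHistogram.Observe(30 + math.Floor(120*math.Sin(float64(i)*0.1))/10)
		}
		// For demonstration only: inspect the histogram's state through its Write method
		// (normally only used internally by Prometheus).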
		metric := &dto.Metric{}
		tempsHistogram.Write(metric)
		/*
		histogram: <
		  sample_count: 1000
		  sample_sum: 29969.50000000001
		  bucket: <
		    cumulative_count: 192
		    upper_bound: 20
		  >
		  bucket: <
		    cumulative_count: 366
		    upper_bound: 25
		  >
		  bucket: <
		    cumulative_count: 501
		    upper_bound: 30
		  >
		  bucket: <
		    cumulative_count: 638
		    upper_bound: 35
		  >
		  bucket: <
		    cumulative_count: 816
		    upper_bound: 40
		  >
		>
		*/
		fmt.Println(proto.MarshalTextString(metric))
	}()
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
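To see what the two bucket helpers mentioned in the program's comments produce, here is a standard-library re-creation of the same arithmetic and geometric sequences. The lower-case functions `linearBuckets` and `exponentialBuckets` are written for this illustration and skip the argument validation that the real helpers perform.

```go
package main

import "fmt"

// linearBuckets returns count upper bounds starting at start,
// each width apart (an arithmetic sequence).
func linearBuckets(start, width float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start += width
	}
	return buckets
}

// exponentialBuckets returns count upper bounds starting at start,
// each factor times the previous one (a geometric sequence).
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

func main() {
	fmt.Println(linearBuckets(20, 5, 5))      // same arguments as the pond example
	fmt.Println(exponentialBuckets(1, 2, 8))  // doubling bounds, e.g. for sizes
}
```

Linear buckets fit values that stay within a known range, while exponential buckets cover several orders of magnitude with few buckets.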

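3. Custom collectors

If the four built-in metric types (Counter, Gauge, Histogram, Summary) do not meet our needs, we can define our own. Just implement the methods of the Collector interface and then call MustRegister.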

func MustRegister(cs ...Collector) {
    DefaultRegisterer.MustRegister(cs...)
}

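type Collector interface {
    // Describe sends the descriptors of all metrics this Collector may produce.
    // Duplicate descriptors sent by the same Collector are ignored,
    // but two different Collectors must not send the same descriptor.
    Describe(chan<- *Desc)

    // The registry calls Collect to do the actual work of gathering values
    // and receives the collected metrics through the channel.
    // Every metric sent must match a descriptor from Describe. Collect may be called
    // concurrently, so the implementation must be safe for concurrent use.
    Collect(chan<- Metric)
}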
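The key property of this pattern is that values are produced at scrape time: the registry calls Collect and drains a channel, so the collector can read the current state on every request. The following standard-library sketch shows only that flow. The types `sample`, `collector`, `queueCollector` and the function `gather` are simplified stand-ins invented for this example, not the client_golang types.

```go
package main

import "fmt"

// sample is a stand-in for a collected metric value.
type sample struct {
	name  string
	value float64
}

// collector mirrors the Collect half of the interface: push the
// current values into the channel and return.
type collector interface {
	Collect(ch chan<- sample)
}

// queueCollector reads the queue depth whenever it is asked to collect.
type queueCollector struct {
	depth func() float64
}

func (q queueCollector) Collect(ch chan<- sample) {
	ch <- sample{name: "queue_depth", value: q.depth()}
}

// gather plays the role of the registry: it runs every collector and
// drains the channel into a slice.
func gather(collectors ...collector) []sample {
	ch := make(chan sample)
	go func() {
		for _, c := range collectors {
			c.Collect(ch)
		}
		close(ch)
	}()
	var out []sample
	for s := range ch {
		out = append(out, s)
	}
	return out
}

func main() {
	depth := 3.0
	c := queueCollector{depth: func() float64 { return depth }}

	fmt.Println(gather(c)) // value read at the first "scrape"
	depth = 7
	fmt.Println(gather(c)) // the second "scrape" sees the new value
}
```

Nothing has to be updated between scrapes, which makes this approach a good fit for exporters that translate the state of another system into metrics.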

3.1 Example 1

package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	log "github.com/sirupsen/logrus"
	"net/http"
)

type fooCollector struct {
	fooMetric *prometheus.Desc
	barMetric *prometheus.Desc
}

func newFooCollector() *fooCollector {
	return &fooCollector{
		// # HELP foo_metric Shows whether a foo has occurred in our cluster
		// # TYPE foo_metric counter
		// foo_metric 1
		fooMetric: prometheus.NewDesc("foo_metric",
			"Shows whether a foo has occurred in our cluster",
			nil, nil,
		),
		// # HELP bar_metric Shows whether a bar has occurred in our cluster
		// # TYPE bar_metric counter
		// bar_metric 1
		barMetric: prometheus.NewDesc("bar_metric",
			"Shows whether a bar has occurred in our cluster",
			nil, nil,
		),
	}
}

func (collector *fooCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- collector.fooMetric
	ch <- collector.barMetric
}

func (collector *fooCollector) Collect(ch chan<- prometheus.Metric) {
	var metricValue float64
	// Placeholder condition; a real collector would check the actual system state here.
	if 1 == 1 {
		metricValue = 1
	}
	ch <- prometheus.MustNewConstMetric(collector.fooMetric, prometheus.CounterValue, metricValue)
	ch <- prometheus.MustNewConstMetric(collector.barMetric, prometheus.CounterValue, metricValue)
}

func main() {

	foo := newFooCollector()
	prometheus.MustRegister(foo)

	http.Handle("/metrics", promhttp.Handler())
	log.Info("Beginning to serve on port :9090")
	log.Fatal(http.ListenAndServe(":9090", nil))
}

Start the program and visit http://127.0.0.1:9090/metrics:

# HELP bar_metric Shows whether a bar has occurred in our cluster
# TYPE bar_metric counter
bar_metric 1
# HELP foo_metric Shows whether a foo has occurred in our cluster
# TYPE foo_metric counter
foo_metric 1
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 6
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.18.4"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.614816e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.614816e+06
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 6668
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 0
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 1.444472e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.614816e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 2.400256e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.630208e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 16185
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 2.400256e+06
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 4.030464e+06
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 0
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 16185
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 9344
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16352
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 28968
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 32640
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.194304e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.010924e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 163840
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 163840
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 6.70536e+06
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 8
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.03125
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.6777216e+07
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 99
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 8.47872e+06
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.684024205e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.4680064e+07
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 0
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

3.2 Example 2

package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	log "github.com/sirupsen/logrus"
	"math/rand"
	"net/http"
	"os"
)

// ClusterManager is a metrics collector for one cluster.
// Each cluster has its own Zone (the cluster name); the other two fields hold the metric descriptors.
type ClusterManager struct {
	Zone         string
	OOMCountDesc *prometheus.Desc
	RAMUsageDesc *prometheus.Desc
}

// ReallyExpensiveAssessmentOfTheSystemState returns data keyed by host name:
// the OOM crash count and the RAM usage of each host.
func (c *ClusterManager) ReallyExpensiveAssessmentOfTheSystemState() (
	oomCountByHost map[string]int, ramUsageByHost map[string]float64,
) {
	oomCountByHost = map[string]int{
		"foo.example.org": int(rand.Int31n(1000)),
		"bar.example.org": int(rand.Int31n(1000)),
	}
	ramUsageByHost = map[string]float64{
		"foo.example.org": rand.Float64() * 100,
		"bar.example.org": rand.Float64() * 100,
	}
	return
}

// Describe implements the Collector interface by sending the metric descriptors to the channel.
func (c *ClusterManager) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.OOMCountDesc
	ch <- c.RAMUsageDesc
}

// Collect runs the scrape function and sends the results to the channel, binding each value
// to its descriptor and its metric type (one Counter and one Gauge).
func (c *ClusterManager) Collect(ch chan<- prometheus.Metric) {
	oomCountByHost, ramUsageByHost := c.ReallyExpensiveAssessmentOfTheSystemState()
	for host, oomCount := range oomCountByHost {
		ch <- prometheus.MustNewConstMetric(
			c.OOMCountDesc,
			prometheus.CounterValue,
			float64(oomCount),
			host,
		)
	}
	for host, ramUsage := range ramUsageByHost {
		ch <- prometheus.MustNewConstMetric(
			c.RAMUsageDesc,
			prometheus.GaugeValue,
			ramUsage,
			host,
		)
	}
}

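// NewClusterManager creates the struct and its metric descriptors. NewDesc takes the metric name,
// the help text (shown as a comment above the metric), the variable label names,
// and the constant labels.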
func NewClusterManager(zone string) *ClusterManager {
	return &ClusterManager{
		Zone: zone,
		OOMCountDesc: prometheus.NewDesc(
			"clustermanager_oom_crashes_total",
			"Number of OOM crashes.",
			[]string{"host"},
			prometheus.Labels{"zone": zone},
		),
		RAMUsageDesc: prometheus.NewDesc(
			"clustermanager_ram_usage_bytes",
			"RAM usage as reported to the cluster manager.",
			[]string{"host"},
			prometheus.Labels{"zone": zone},
		),
	}
}

func main() {
	workerDB := NewClusterManager("db")
	workerCA := NewClusterManager("ca")
	reg := prometheus.NewPedanticRegistry()
	reg.MustRegister(workerDB)
	reg.MustRegister(workerCA)
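	// prometheus.Gatherers is a list of gatherers whose results are merged into one result set.
	// Adding prometheus.DefaultGatherer would also include the Go runtime metrics in the output.
	// reg is the registry created above, which holds our custom collectors.
	// With prometheus.DefaultGatherer commented out, only our own metrics are exposed.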
	gatherers := prometheus.Gatherers{
		//prometheus.DefaultGatherer,
		reg,
	}
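	// promhttp.HandlerFor() takes the Gatherers object and returns an http.Handler.
	// The handler's ServeHTTP method receives HTTP requests and writes the response.
	// promhttp.HandlerOpts controls error handling: with ContinueOnError, a failure while
	// gathering is logged and the remaining metrics are still served.
	h := promhttp.HandlerFor(gatherers,
		promhttp.HandlerOpts{
			// Gathering errors are reported through logrus's standard logger.
			ErrorLog:      log.StandardLogger(),
			ErrorHandling: promhttp.ContinueOnError,
		})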
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		h.ServeHTTP(w, r)
	})
	log.Infoln("Start server at :9090")
	if err := http.ListenAndServe(":9090", nil); err != nil {
		log.Errorf("Error occur when start server %v", err)
		os.Exit(1)
	}
}

Start the program and visit http://127.0.0.1:9090/metrics:

# HELP clustermanager_oom_crashes_total Number of OOM crashes.
# TYPE clustermanager_oom_crashes_total counter
clustermanager_oom_crashes_total{host="bar.example.org",zone="ca"} 318
clustermanager_oom_crashes_total{host="bar.example.org",zone="db"} 887
clustermanager_oom_crashes_total{host="foo.example.org",zone="ca"} 81
clustermanager_oom_crashes_total{host="foo.example.org",zone="db"} 81
# HELP clustermanager_ram_usage_bytes RAM usage as reported to the cluster manager.
# TYPE clustermanager_ram_usage_bytes gauge
clustermanager_ram_usage_bytes{host="bar.example.org",zone="ca"} 15.651925473279125
clustermanager_ram_usage_bytes{host="bar.example.org",zone="db"} 43.771418718698015
clustermanager_ram_usage_bytes{host="foo.example.org",zone="ca"} 6.563701921747622
clustermanager_ram_usage_bytes{host="foo.example.org",zone="db"} 66.45600532184905
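When checking output like this in a test or a script, it helps to split each sample line into its series and its value. The sketch below does that for the simple cases shown above. It is a simplification under stated assumptions: the helper `parseSample` is made up for this example, it assumes there is no trailing timestamp, and it does not interpret the label block. For real parsing, a proper parser of the exposition format should be used.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSample splits one sample line of the text exposition format into
// the series (name plus optional labels) and its value. Comment lines
// (# HELP, # TYPE) and blank lines are reported as not ok.
func parseSample(line string) (series string, value float64, ok bool) {
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return "", 0, false
	}
	// The value is the text after the last space.
	i := strings.LastIndex(line, " ")
	if i < 0 {
		return "", 0, false
	}
	v, err := strconv.ParseFloat(line[i+1:], 64)
	if err != nil {
		return "", 0, false
	}
	return strings.TrimSpace(line[:i]), v, true
}

func main() {
	output := `# HELP clustermanager_oom_crashes_total Number of OOM crashes.
# TYPE clustermanager_oom_crashes_total counter
clustermanager_oom_crashes_total{host="bar.example.org",zone="ca"} 318
go_memstats_alloc_bytes 1.614816e+06`

	for _, line := range strings.Split(output, "\n") {
		if series, value, ok := parseSample(line); ok {
			fmt.Printf("%s => %g\n", series, value)
		}
	}
}
```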