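Extending node_exporter: collecting custom metrics
Background and requirements
For business reasons, we need to collect the number of SSH connections on a server to see how many IPs are logged in. Note that the server uses the default SSH port, 22.
node_exporter can collect custom metrics in two ways: through the textfile collector (--collector.textfile.directory), with a shell script producing the metrics, or by modifying the node_exporter source code to add a new Collector. This article summarizes the second approach.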
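For comparison, the first approach can be sketched in a few lines. The textfile collector reads files ending in .prom from the configured directory, so a periodic job only has to write the metric in the Prometheus text format. The sketch below is written in Go rather than shell; the metric name ssh_connect_number, the file name ssh.prom and the helper names are assumptions made for illustration, and the count is passed in rather than measured.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// formatSSHMetric renders one gauge sample in the Prometheus text format.
func formatSSHMetric(port string, count int) string {
	const name = "ssh_connect_number" // hypothetical metric name for this sketch
	return fmt.Sprintf("# HELP %s Established SSH connections.\n# TYPE %s gauge\n%s{ssh_port=%q} %d\n",
		name, name, name, port, count)
}

// writeTextfile writes the metric to <dir>/ssh.prom through a temporary file
// and a rename, so a scrape never sees a half-written file.
func writeTextfile(dir, port string, count int) (string, error) {
	tmp, err := os.CreateTemp(dir, "ssh.prom.*")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly after a successful rename
	if _, err := tmp.WriteString(formatSSHMetric(port, count)); err != nil {
		tmp.Close()
		return "", err
	}
	// Temporary files are created owner-only; make the file readable by the exporter.
	if err := tmp.Chmod(0o644); err != nil {
		tmp.Close()
		return "", err
	}
	if err := tmp.Close(); err != nil {
		return "", err
	}
	final := filepath.Join(dir, "ssh.prom")
	return final, os.Rename(tmp.Name(), final)
}

func main() {
	// A throwaway directory stands in for the real textfile directory.
	dir, err := os.MkdirTemp("", "textfile")
	if err != nil {
		fmt.Println("mkdir:", err)
		return
	}
	defer os.RemoveAll(dir)
	path, err := writeTextfile(dir, "22", 8)
	if err != nil {
		fmt.Println("write:", err)
		return
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("read:", err)
		return
	}
	fmt.Print(string(data))
}
```

In a real setup the directory would be the one passed to --collector.textfile.directory, and the program would be run on a schedule.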
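Implementation steps
- Download the node_exporter source code
- Create a new collector .go file
- Define a new collector struct for collecting SSH connection data
- Write an Update() method that collects the SSH connection data; every request to :9100/metrics (for example with curl) calls Update() once to fetch a fresh value
- Register the new collector with Prometheus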
Download the node_exporter source code
git clone https://github.com/prometheus/node_exporter.git
Create the new file ssh_connnect_number.go in the collector directory
# Check that ssh_connnect_number.go has been created
(base) node_exporter(master) ✗: tree collector |grep ssh
├── ssh_connnect_number.go
(base) node_exporter(master) ✗: ls collector/ssh_connnect_number.go
collector/ssh_connnect_number.go
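Explanation of the collected fields
To make the rest easier to follow, here is the collected result first, followed by the meaning of each field.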
# HELP node_sshconnectnumber_ssh_connect_number ssh connnect number of personaldev user
# TYPE node_sshconnectnumber_ssh_connect_number gauge
node_sshconnectnumber_ssh_connect_number{ssh_port="22"} 8
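- The first two lines are the HELP and TYPE lines. The HELP text comes from the descriptor created with prometheus.NewDesc; the type, gauge, comes from the prometheus.GaugeValue passed when the metric is created in Update().
A gauge is a metric whose sample value can change arbitrarily, going up as well as down. Gauges are typically used for values such as temperature or memory usage, and also for "totals" that can rise and fall at any time, such as the current number of concurrent requests.
For details, see the documentation: https://prometheus.fuckcloudnative.io/di-er-zhang-gai-nian/metric_types
- The last line is the collected sample. In its metric name:
node is the namespace
sshconnectnumber is the subsystem
ssh_connect_number is the name part of the fully qualified name (FQName)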
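The way the three parts combine into the metric name can be shown with a small standard-library sketch. buildFQName below is a hypothetical stand-in written for this article; it mirrors how prometheus.BuildFQName joins the non-empty parts with underscores, and it is not the library function itself.

```go
package main

import (
	"fmt"
	"strings"
)

// buildFQName joins the non-empty parts with "_".
// An empty name yields an empty result, as with prometheus.BuildFQName.
func buildFQName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	// namespace "node", subsystem "sshconnectnumber", name "ssh_connect_number"
	fmt.Println(buildFQName("node", "sshconnectnumber", "ssh_connect_number"))
}
```

Running it prints node_sshconnectnumber_ssh_connect_number, the metric name seen in the output above.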
Define a new collector struct and write a constructor
// Constant used as the subsystem part of the metric name
const (
	sshConnectNumberSubsystem = "sshconnectnumber"
)
// sshConnectNumberCollector collects the number of SSH connections
type sshConnectNumberCollector struct {
	sshconnectnumber *prometheus.Desc
	logger           log.Logger
}
// Constructor for the new collector type
func NewsshConnectNumberCollector(logger log.Logger) (Collector, error) {
	return &sshConnectNumberCollector{
		// Descriptor holding the metric's name, HELP text and label names
		sshconnectnumber: prometheus.NewDesc(
			// Build the name from namespace, subsystem and name; the next argument is the HELP text
			prometheus.BuildFQName(namespace, sshConnectNumberSubsystem, "ssh_connect_number"),
			"ssh connnect number of personaldev user",
			[]string{
				"ssh_port",
			},
			nil,
		),
		logger: logger,
	}, nil
}
Implement the Update() method of the Collector interface
The collector must implement the Update() method of the Collector interface; otherwise the collected value cannot be exposed.
// Update implements the Collector interface: it counts established SSH connections and sends the value to ch
func (s *sshConnectNumberCollector) Update(ch chan<- prometheus.Metric) error {
	// Use a shell pipeline to count the established connections on port 22
	sshPort := "22"
	command := fmt.Sprintf("lsof -i:%s |grep ESTABLISHED |wc -l", sshPort)
	cmd := exec.Command("/bin/bash", "-c", command)
	outputBytes, err := cmd.Output()
	if err != nil {
		return err
	}
	// Strip surrounding spaces and the trailing newline from the output
	outputString := strings.TrimSpace(string(outputBytes))
	// Convert to a number
	value, err := strconv.Atoi(outputString)
	if err != nil {
		return fmt.Errorf("parsing ssh connection count %q: %w", outputString, err)
	}
	// Send the result to ch
	ch <- prometheus.MustNewConstMetric(
		s.sshconnectnumber,
		prometheus.GaugeValue,
		float64(value),
		sshPort,
	)
	return nil
}
Note: the source of the Collector interface is as follows
// Collector is the interface a collector has to implement.
type Collector interface {
	// Get new metrics and expose them via prometheus registry.
	Update(ch chan<- prometheus.Metric) error
}
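The Update() method above leaves the filtering and counting to grep and wc inside a bash pipeline. As a possible alternative, the same counting can be done in Go on the raw lsof output, which removes the dependency on /bin/bash and the string-to-int parsing. This is a minimal sketch: countEstablished is a hypothetical helper, and the sample text is made-up input shaped like lsof output, not a real capture.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countEstablished counts the lines that contain "ESTABLISHED",
// which is what `grep ESTABLISHED | wc -l` does in the shell pipeline.
func countEstablished(lsofOutput string) int {
	count := 0
	scanner := bufio.NewScanner(strings.NewReader(lsofOutput))
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), "ESTABLISHED") {
			count++
		}
	}
	return count
}

func main() {
	// Illustrative input only: a header, one listening socket, two established connections.
	sample := "COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME\n" +
		"sshd 100 root 3u IPv4 0x1 0t0 TCP *:ssh (LISTEN)\n" +
		"sshd 201 root 4u IPv4 0x2 0t0 TCP 10.0.0.5:ssh->10.0.0.9:51234 (ESTABLISHED)\n" +
		"sshd 202 root 4u IPv4 0x3 0t0 TCP 10.0.0.5:ssh->10.0.0.7:40022 (ESTABLISHED)\n"
	fmt.Println(countEstablished(sample))
}
```

In the collector, the input would come from something like exec.Command("lsof", "-i:22").Output(). One caveat under that assumption: lsof may exit with a non-zero status when nothing matches, so the caller would probably want to treat that case as a count of zero rather than as a failure.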
Register the constructor with Prometheus
// Register the custom collector
func init() {
	registerCollector("sshconnectnumber", defaultEnabled, NewsshConnectNumberCollector)
}
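Why a call in init() is enough can be illustrated with a simplified model of a factory registry. The code below is not node_exporter's implementation: the Collector interface is reduced to a float64 channel, registerCollector takes no enabled-by-default argument, and sshCollector returns a fixed value. It only shows the pattern of registering a constructor under a name at program start and building the collectors later.

```go
package main

import (
	"fmt"
	"sort"
)

// Collector is a reduced stand-in; the real Update takes a chan<- prometheus.Metric.
type Collector interface {
	Update(ch chan<- float64) error
}

// factories maps a collector name to its constructor (simplified model).
var factories = map[string]func() (Collector, error){}

func registerCollector(name string, factory func() (Collector, error)) {
	factories[name] = factory
}

// sshCollector is a dummy collector that reports a fixed value.
type sshCollector struct{}

func (sshCollector) Update(ch chan<- float64) error {
	ch <- 8 // fixed sample value for this sketch
	return nil
}

// init runs before main, so the collector is registered just by being compiled in.
func init() {
	registerCollector("sshconnectnumber", func() (Collector, error) {
		return sshCollector{}, nil
	})
}

// collectAll builds every registered collector and gathers one value from each.
func collectAll() (map[string]float64, error) {
	values := make(map[string]float64)
	for name, factory := range factories {
		collector, err := factory()
		if err != nil {
			return nil, err
		}
		ch := make(chan float64, 1)
		if err := collector.Update(ch); err != nil {
			return nil, err
		}
		values[name] = <-ch
	}
	return values, nil
}

func main() {
	values, err := collectAll()
	if err != nil {
		fmt.Println("collect:", err)
		return
	}
	names := make([]string, 0, len(values))
	for name := range values {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("collector=%s value=%v\n", name, values[name])
	}
}
```

In node_exporter itself, registerCollector also defines a command-line flag for the collector, so a collector registered with defaultEnabled can be switched off with --no-collector.sshconnectnumber.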
Full code
package collector
import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"github.com/go-kit/log"
	"github.com/prometheus/client_golang/prometheus"
)
/*
The collected result looks like this:
# HELP node_sshconnectnumber_ssh_connect_number ssh connnect number of personaldev user
# TYPE node_sshconnectnumber_ssh_connect_number gauge
node_sshconnectnumber_ssh_connect_number{ssh_port="22"} 8
*/
// Constant used as the subsystem part of the metric name
const (
	sshConnectNumberSubsystem = "sshconnectnumber"
)
// sshConnectNumberCollector collects the number of SSH connections
type sshConnectNumberCollector struct {
	sshconnectnumber *prometheus.Desc
	logger           log.Logger
}
// Register the custom collector
func init() {
	registerCollector("sshconnectnumber", defaultEnabled, NewsshConnectNumberCollector)
}
// Constructor for the new collector type
func NewsshConnectNumberCollector(logger log.Logger) (Collector, error) {
	return &sshConnectNumberCollector{
		// Descriptor holding the metric's name, HELP text and label names
		sshconnectnumber: prometheus.NewDesc(
			// Build the name from namespace, subsystem and name; the next argument is the HELP text
			prometheus.BuildFQName(namespace, sshConnectNumberSubsystem, "ssh_connect_number"),
			"ssh connnect number of personaldev user",
			[]string{
				"ssh_port",
			},
			nil,
		),
		logger: logger,
	}, nil
}
// Update implements the Collector interface: it counts established SSH connections and sends the value to ch
func (s *sshConnectNumberCollector) Update(ch chan<- prometheus.Metric) error {
	// Use a shell pipeline to count the established connections on port 22
	sshPort := "22"
	command := fmt.Sprintf("lsof -i:%s |grep ESTABLISHED |wc -l", sshPort)
	cmd := exec.Command("/bin/bash", "-c", command)
	outputBytes, err := cmd.Output()
	if err != nil {
		return err
	}
	// Strip surrounding spaces and the trailing newline from the output
	outputString := strings.TrimSpace(string(outputBytes))
	// Convert to a number
	value, err := strconv.Atoi(outputString)
	if err != nil {
		return fmt.Errorf("parsing ssh connection count %q: %w", outputString, err)
	}
	// Send the result to ch
	ch <- prometheus.MustNewConstMetric(
		s.sshconnectnumber,
		prometheus.GaugeValue,
		float64(value),
		sshPort,
	)
	return nil
}
Build the binary and test it
Note: the test here was run on a local Mac
- Build the new binary
(base) node_exporter(master) ✗: pwd
/Users/80280051/Documents/go/src/node_exporter
(base) node_exporter(master) ✗: make build
>> building binaries
/Users/80280051/Documents/go/bin/promu --config .promu-cgo.yml build --prefix /Users/80280051/Documents/go/src/node_exporter
> node_exporter
(base) node_exporter(master) ✗: ls -la node_exporter
-rwxr-xr-x 1 80280051 ADC\Domain Users 17875472 Jul 15 22:12 node_exporter
(base) node_exporter(master) ✗:
- Start the binary
(base) node_exporter(master) ✗: ./node_exporter
ts=2024-07-15T14:13:31.632Z caller=node_exporter.go:193 level=info msg="Starting node_exporter" version="(version=1.8.1, branch=master, revision=c0c1a8c57241071c651a67a9e51cb233faf2d539)"
ts=2024-07-15T14:13:31.632Z caller=node_exporter.go:194 level=info msg="Build context" build_context="(go=go1.21.5, platform=darwin/amd64, user=80280051@PM80280051, date=20240715-14:12:01, tags=unknown)"
ts=2024-07-15T14:13:31.634Z caller=filesystem_common.go:111 level=info collector=filesystem msg="Parsed flag --collector.filesystem.mount-points-exclude" flag=^/(dev)($|/)
ts=2024-07-15T14:13:31.634Z caller=filesystem_common.go:113 level=info collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag=^devfs$
ts=2024-07-15T14:13:31.634Z caller=node_exporter.go:111 level=info msg="Enabled collectors"
ts=2024-07-15T14:13:31.634Z caller=node_exporter.go:118 level=info collector=boottime
ts=2024-07-15T14:13:31.634Z caller=node_exporter.go:118 level=info collector=cpu
ts=2024-07-15T14:13:31.634Z caller=node_exporter.go:118 level=info collector=diskstats
ts=2024-07-15T14:13:31.634Z caller=node_exporter.go:118 level=info collector=filesystem
ts=2024-07-15T14:13:31.634Z caller=node_exporter.go:118 level=info collector=loadavg
ts=2024-07-15T14:13:31.634Z caller=node_exporter.go:118 level=info collector=meminfo
ts=2024-07-15T14:13:31.634Z caller=node_exporter.go:118 level=info collector=netdev
ts=2024-07-15T14:13:31.634Z caller=node_exporter.go:118 level=info collector=os
ts=2024-07-15T14:13:31.634Z caller=node_exporter.go:118 level=info collector=powersupplyclass
ts=2024-07-15T14:13:31.634Z caller=node_exporter.go:118 level=info collector=sshconnectnumber
ts=2024-07-15T14:13:31.634Z caller=node_exporter.go:118 level=info collector=temperature
ts=2024-07-15T14:13:31.635Z caller=node_exporter.go:118 level=info collector=textfile
ts=2024-07-15T14:13:31.635Z caller=node_exporter.go:118 level=info collector=thermal
ts=2024-07-15T14:13:31.635Z caller=node_exporter.go:118 level=info collector=time
ts=2024-07-15T14:13:31.635Z caller=node_exporter.go:118 level=info collector=uname
ts=2024-07-15T14:13:31.636Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9100
ts=2024-07-15T14:13:31.636Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9100
ts=2024-07-15T14:13:31.634Z caller=node_exporter.go:118 level=info collector=sshconnectnumber
This line shows that the newly added collector=sshconnectnumber is enabled.
- Query the result with curl
(base) ➜ ~ curl 127.0.0.1:9100/metrics |grep ssh
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 38191 0 38191 0 0 382k 0 --:--:-- --:--:-- --:--:-- 384k
node_scrape_collector_duration_seconds{collector="sshconnectnumber"} 0.093141649 # time taken to collect the data, in seconds
node_scrape_collector_success{collector="sshconnectnumber"} 1 # 1 means the data was collected successfully; 0 means collection failed
# HELP node_sshconnectnumber_ssh_connect_number ssh connnect number of personaldev user
# TYPE node_sshconnectnumber_ssh_connect_number gauge
node_sshconnectnumber_ssh_connect_number{ssh_port="22"} 2 # this is the data we want
(base) ➜ ~
The result shows that the value for sshconnectnumber was collected successfully.
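The same check can be done programmatically instead of with curl and grep. The sketch below scans Prometheus text-format output for one metric and returns its value. findSample is a hypothetical helper; it assumes label values contain no spaces, which holds for ssh_port="22". The sample text reuses the lines shown above rather than fetching from a live exporter.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// findSample returns the value of the first sample line for the given metric name.
// It skips comment lines and metrics that merely share the same prefix.
func findSample(body, metric string) (float64, bool) {
	scanner := bufio.NewScanner(strings.NewReader(body))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") || !strings.HasPrefix(line, metric) {
			continue
		}
		// The name must be followed by a label set or by the value.
		rest := line[len(metric):]
		if rest == "" || (rest[0] != '{' && rest[0] != ' ') {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		value, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			continue
		}
		return value, true
	}
	return 0, false
}

func main() {
	body := "# HELP node_sshconnectnumber_ssh_connect_number ssh connnect number of personaldev user\n" +
		"# TYPE node_sshconnectnumber_ssh_connect_number gauge\n" +
		"node_sshconnectnumber_ssh_connect_number{ssh_port=\"22\"} 2\n"
	value, ok := findSample(body, "node_sshconnectnumber_ssh_connect_number")
	fmt.Println(value, ok)
}
```

Against a running exporter, body would be the response of an HTTP GET to 127.0.0.1:9100/metrics.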