Static service discovery
Service discovery is configured in the Prometheus prometheus.yml configuration file.
[root@localhost prometheus-2.3.2.linux-amd64]# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'test'
    static_configs:
    - targets:
      - 192.168.64.169:9100
scrape_configs — the main configuration block for scraping and service discovery.
static_configs — static service discovery. job_name is the label you filter on in the Prometheus web UI; a job named prometheus is added by default to monitor whether Prometheus itself is healthy. A job is a collection of targets that serve the same purpose.
The configuration above defines a job named test, whose metrics can be filtered in PromQL with the following query:
node_cpu_seconds_total{job="test"}
The metrics path (metrics_path) has a default value in the configuration and can be changed. The targets list corresponds to the Targets page in the Prometheus web UI, which records each monitored node and its state.
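The same label filter can also be run against the Prometheus HTTP API (the /api/v1/query instant-query endpoint). A minimal sketch, assuming Prometheus listens on localhost:9090 as configured above; the request URL is only built and encoded here, not actually sent:

```python
from urllib.parse import urlencode

# The PromQL filter on the job label defined in scrape_configs
promql = 'node_cpu_seconds_total{job="test"}'

# Instant-query endpoint of the Prometheus HTTP API
base = "http://localhost:9090/api/v1/query"
url = base + "?" + urlencode({"query": promql})
print(url)
```

Against a running server, fetching this URL returns a JSON body whose data.result field lists every matching series.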
The static service discovery configuration looks like this:
# metrics_path defaults to '/metrics'
- job_name: 'test'
  static_configs:
  - targets:
    - 192.168.64.169:9100
DNS service discovery (I am less familiar with this)
DNS-based target discovery in Prometheus
DNS service discovery relies on querying A, AAAA, or SRV DNS records.
1. Discovery based on SRV records
scrape_configs:
- job_name: 'webapp'
  dns_sd_configs:
  - names: ['_prometheus._tcp.shhnwangjian.com']
Note: _prometheus is the service name, _tcp is the protocol, and shhnwangjian.com is the domain.
2. Discovery based on A records
This requires editing the BIND named zone configuration to create the A record.
- job_name: 'webapp'
  dns_sd_configs:
  - names: ['ops.shhnwangjian.cn']
    type: A
    port: 9090
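Both styles of record above live in the authoritative zone file. A hypothetical BIND zone fragment (all host names, IPs, and SOA values here are made-up examples) showing an A record plus an SRV record that points scrapes at port 9100 of that host:

```
; example zone fragment for shhnwangjian.com -- values are illustrative only
$TTL 86400
@                IN SOA ns1.shhnwangjian.com. admin.shhnwangjian.com. (
                        2021010101 3600 900 604800 86400 )
                 IN NS  ns1.shhnwangjian.com.
ns1              IN A   192.168.64.1
web1             IN A   192.168.64.169
; SRV format: _service._proto  class SRV priority weight port target
_prometheus._tcp IN SRV 0 0 9100 web1.shhnwangjian.com.
```

After reloading named, the records can be verified with dig (e.g. dig SRV _prometheus._tcp.shhnwangjian.com).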
File service discovery
Preparation
[root@localhost prometheus-2.3.2.linux-amd64]# mkdir target
[root@localhost prometheus-2.3.2.linux-amd64]# cd target/
[root@localhost target]# vim nodes-only.yaml
[root@localhost target]# cat nodes-only.yaml
- targets:
  - localhost:9090
  labels:
    app: prometheus
    job: prometheus
- targets:
  - 192.168.64.169:9100
  labels:
    app: node-exporter
    job: test
- job_name: node
  file_sd_configs:
  - files:
    - targets/nodes-*.yaml
    refresh_interval: 2m # reload the files every two minutes
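In practice the target files are usually generated by a script from an inventory or CMDB rather than edited by hand; Prometheus accepts both .yaml and .json files in file_sd_configs. A minimal sketch (the output file name and host list are examples) writing the JSON equivalent of nodes-only.yaml above:

```python
import json

# Example inventory; in practice this would come from a CMDB or an API.
nodes = [
    {"targets": ["localhost:9090"],
     "labels": {"app": "prometheus", "job": "prometheus"}},
    {"targets": ["192.168.64.169:9100"],
     "labels": {"app": "node-exporter", "job": "test"}},
]

# Write a file matching the targets/nodes-*.* glob used by file_sd_configs.
with open("nodes-generated.json", "w") as f:
    json.dump(nodes, f, indent=2)
```

Prometheus notices the rewritten file on its own (at the latest after refresh_interval), so no reload is needed.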
Static service discovery has a major drawback: if you modify the targets in prometheus.yml, the web UI does not pick up the modified configuration in real time. File-based service discovery depends on no extra components, and its refresh_interval fixes this slow-to-respond behavior of static service discovery.
Consul service discovery
Register a large number of nodes in Consul and let Prometheus read the registration information.
Download: https://www.consul.io/downloads/
Install consul
unzip consul_1.9.2_linux_amd64.zip -d /usr/local/bin/
Start it in developer mode; the port is 8500.
mkdir /consul/data -pv
consul agent -dev -ui -data-dir=/consul/data -config-dir=/etc/consul/ -client=0.0.0.0
Write the corresponding JSON file under /etc/consul/:
{
  "services": [
    {
      "id": "node_exporter01",
      "name": "node01",
      "address": "192.168.64.169",
      "port": 9100,
      "tags": ["nodes"],
      "checks": [{
        "http": "http://192.168.64.146:9100/metrics",
        "interval": "5s"
      }]
    },
    {
      "id": "node_exporter02",
      "name": "node02",
      "address": "192.168.64.169",
      "port": 9100,
      "tags": ["nodes"],
      "checks": [{
        "http": "http://192.168.64.168:9100/metrics",
        "interval": "5s"
      }]
    }
  ]
}
Here name is what is shown in the Consul UI, and tags is what Prometheus filters on. (JSON does not allow comments, so such notes must not appear in the real file.)
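Besides dropping JSON files into the config directory, a service can also be registered through the Consul agent's HTTP API (PUT /v1/agent/service/register). A minimal sketch; note that the API payload uses capitalized keys, unlike the lowercase keys of the config-file format above. The agent address is an assumption, and the request is only constructed here, not sent:

```python
import json
from urllib import request

# Payload for /v1/agent/service/register -- capitalized keys, unlike
# the lowercase keys used in JSON files under the config directory.
payload = {
    "ID": "node_exporter01",
    "Name": "node01",
    "Address": "192.168.64.169",
    "Port": 9100,
    "Tags": ["nodes"],
    "Check": {
        "HTTP": "http://192.168.64.169:9100/metrics",
        "Interval": "5s",
    },
}

# Assumed local dev-mode agent on port 8500; sending requires a running agent.
req = request.Request(
    "http://127.0.0.1:8500/v1/agent/service/register",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
# request.urlopen(req)  # uncomment against a live Consul agent
```

A service registered this way shows up immediately, with no consul reload required.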
In prometheus.yml, the discovery mechanism becomes consul_sd_configs.
The configuration is as follows:
- job_name: "test"
  consul_sd_configs:
  - server: "consul UI address (ip:port)"
    tags: # the tags defined in the JSON files under /etc/consul/
    - "nodes"
    refresh_interval: 2m # reload the configuration every 2m
Then, in the consul installation path, run:
consul reload
Kubernetes service discovery uses the Kubernetes API to treat the objects of resource types such as Node, Service, Endpoints, Pod, and Ingress as targets, and keeps watching those resources for changes. There is quite a lot to it, so it will be covered in the next post.