Prometheus (Part 8): Monitoring Elasticsearch with Prometheus, and Common APIs

暮雨浅夏

Published 2024-08-29 16:17:51

Column: Monitoring; tags: prometheus, elasticsearch, jenkins

Original link: https://blog.csdn.net/dl_11/article/details/141680508

1 Monitoring Elasticsearch with Prometheus

There are two ways to get Elasticsearch monitoring metrics into Prometheus:

Expose the metrics directly from ES (monitoring module)
Collect the metrics with the Elasticsearch Exporter for Prometheus

1.1 Exposing metrics directly from ES

The main steps for exposing the metrics directly from ES are:
1. Enable monitoring in Elasticsearch by adding the monitoring settings to its configuration file (elasticsearch.yml):

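xpack.monitoring.collection.enabled: true  # enable collection of monitoring data
http.cors.enabled: true                    # enable cross-origin (CORS) requests
http.cors.allow-origin: "*"                # allow any origin; CORS only matters to browser-based clients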

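After Elasticsearch is restarted, its monitoring APIs are enabled. Note, however, that stock Elasticsearch does not serve metrics in Prometheus format: the /_prometheus/metrics path used below is provided by the community elasticsearch-prometheus-exporter plugin, which has to be installed on the nodes in a version matching Elasticsearch.
2. Configure Prometheus to scrape Elasticsearch
Add an Elasticsearch job to the Prometheus configuration file: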

scrape_configs:
- job_name: 'elasticsearch'
  metrics_path: "/_prometheus/metrics"
  static_configs:
  - targets: 
    - "es-master:9200"     # address of the Elasticsearch master node

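3. After Prometheus's first scrape, the Elasticsearch metrics show up in the Prometheus console, for example (exact names depend on the plugin and its version):

es_process_cpu_seconds_total # CPU time
es_jvm_memory_bytes_committed # committed JVM memory
es_indices_indexing_index_total # number of indexing operations
es_nodes_fs_total_bytes # total filesystem size on the node
etc.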
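To check which of these metrics a node actually exposes, its text-format output can be reduced to a list of metric names. This is a minimal sketch; list_metric_names and list_es_metrics are hypothetical helpers (not part of Prometheus or the plugin), and the es_ prefix is assumed from the examples above.

```shell
#!/usr/bin/env bash
# list_metric_names: read Prometheus text-format output on stdin and print
# the distinct metric names. "# HELP"/"# TYPE" comment lines and blank lines
# are skipped; labels ({...}) and sample values are stripped.
list_metric_names() {
  grep -v '^#' \
    | grep -v '^[[:space:]]*$' \
    | sed -E 's/[{[:space:]].*$//' \
    | sort -u
}

# list_es_metrics [PREFIX]: same, but keep only names starting with PREFIX
# (default es_, assumed from the metric examples above).
list_es_metrics() {
  local prefix="${1:-es_}"
  list_metric_names | grep "^${prefix}" || true
}
```

For example, curl -s http://es-master:9200/_prometheus/metrics | list_es_metrics lists the es_* metrics served by that node.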

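4. Define alerting rules on the metrics
When a key metric crosses a threshold, Prometheus can raise an alert (sending the notification additionally requires Alertmanager), for example: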

groups:
- name: elasticsearch 
  rules:
  - alert: ElasticsearchNodeDown
    expr: up{job="elasticsearch", instance="es-master:9200"} == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Elasticsearch master node is down"
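Prometheus only evaluates such a rule when it sits in a rule file listed under rule_files in prometheus.yml. A minimal sketch that writes the rule above to disk and validates it with promtool check rules when promtool is installed; the file name elasticsearch_rules.yml and the helpers write_es_rules and check_rules are hypothetical.

```shell
#!/usr/bin/env bash
# write_es_rules DIR: write the ElasticsearchNodeDown rule to
# DIR/elasticsearch_rules.yml (hypothetical file name) and print the path.
write_es_rules() {
  local dir="$1"
  local file="${dir}/elasticsearch_rules.yml"
  cat > "$file" <<'EOF'
groups:
- name: elasticsearch
  rules:
  - alert: ElasticsearchNodeDown
    expr: up{job="elasticsearch", instance="es-master:9200"} == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Elasticsearch master node is down"
EOF
  echo "$file"
}

# check_rules FILE: validate FILE with promtool; returns 2 if promtool is missing.
check_rules() {
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found; skipping validation" >&2
    return 2
  fi
  promtool check rules "$1"
}
```

The file then goes under rule_files in prometheus.yml, followed by a reload of Prometheus. Dropping the instance matcher, i.e. up{job="elasticsearch"} == 0, would make the same rule fire for any Elasticsearch target rather than only the master.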

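1.2 Collecting metrics with the Prometheus Elasticsearch Exporter

1. Overview

Elasticsearch Exporter is a standalone program written in Go that queries Elasticsearch for metrics and exposes them to Prometheus. It listens on port 9114 by default.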

2. Installation

# Download and extract
wget https://github.com/prometheus-community/elasticsearch_exporter/releases/download/v1.6.0/elasticsearch_exporter-1.6.0.linux-amd64.tar.gz
tar -zxf elasticsearch_exporter-1.6.0.linux-amd64.tar.gz -C /usr/local/

# Create a systemd unit file
cat > /lib/systemd/system/elasticsearch_exporter.service << EOF
[Unit]
Description=elasticsearch_exporter
After=syslog.target network.target
[Service]
Type=simple
RemainAfterExit=no
WorkingDirectory=/usr/local/elasticsearch_exporter-1.6.0.linux-amd64/
User=root
Group=root
ExecStart=/usr/local/elasticsearch_exporter-1.6.0.linux-amd64/elasticsearch_exporter --es.all --es.indices --collector.clustersettings --es.node=elk --es.indices_settings --es.shards --es.snapshots --es.timeout=5s --web.listen-address ":9114" --web.telemetry-path "/metrics" --es.ssl-skip-verify --es.clusterinfo.interval=5m --es.uri https://elastic:password@192.168.92.100:9200
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
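### Parameters
--es.all: query stats for all nodes in the cluster (overrides --es.node)
--es.indices: query stats for all indices in the cluster
--es.indices_settings: query the settings of all indices in the cluster
--collector.clustersettings: export cluster-settings stats (since v1.6.0 this flag replaces --es.cluster_settings)
--es.node: name of the node whose metrics are queried
--es.shards: query stats for all indices in the cluster, including shard-level stats
--es.snapshots: export stats about cluster snapshots
--es.timeout=5s: timeout for fetching stats from Elasticsearch
--web.listen-address ":9114": address and port to listen on; the default is :9114
--es.uri http://user:password@IP:port: address (host and port) of the Elasticsearch node to connect to
--web.telemetry-path "/metrics": path under which metrics are exposed
--es.ssl-skip-verify: skip TLS certificate verification when connecting to Elasticsearch
--es.clusterinfo.interval=5m: how often the cluster info used for cluster labels is refreshed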
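The port that Prometheus scrapes has to match the exporter's --web.listen-address. A minimal sketch for reading that port back from the installed unit file; exporter_port is a hypothetical helper, and it assumes the flag appears on a single ExecStart= line, falling back to the exporter's default of 9114 when the flag is absent.

```shell
#!/usr/bin/env bash
# exporter_port UNIT_FILE: print the port from --web.listen-address on the
# unit's ExecStart= line (accepts both "--flag VALUE" and "--flag=VALUE",
# with or without quotes); print 9114 if the flag is not set.
exporter_port() {
  local unit="$1" addr
  addr=$(grep '^ExecStart=' "$unit" \
    | grep -oE -- '--web\.listen-address[= ]+"?[^" ]*' \
    | sed -E 's/^--web\.listen-address[= ]+"?//')
  if [ -z "$addr" ]; then
    echo 9114
  else
    # Keep only what follows the last colon, e.g. ":9114" -> "9114".
    echo "${addr##*:}"
  fi
}
```

With the unit above, exporter_port /lib/systemd/system/elasticsearch_exporter.service prints 9114.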


# Start
systemctl daemon-reload
systemctl start elasticsearch_exporter.service

# Check the exported metrics (the exporter listens on port 9114)
curl http://192.168.92.100:9114/metrics
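Right after systemctl start, the exporter may need a moment before it answers. A minimal polling sketch with curl; wait_for_metrics is a hypothetical helper, and its default of 10 attempts at 2-second intervals is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# wait_for_metrics URL [TRIES] [DELAY_SECONDS]
# Poll URL until it answers with a non-error HTTP status, or give up after TRIES attempts.
wait_for_metrics() {
  local url="$1" tries="${2:-10}" delay="${3:-2}" i
  for ((i = 1; i <= tries; i++)); do
    # -f: treat HTTP errors as failures; --max-time bounds each attempt.
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "up: $url"
      return 0
    fi
    # No sleep after the last failed attempt.
    if (( i < tries )); then
      sleep "$delay"
    fi
  done
  echo "not reachable after $tries attempts: $url" >&2
  return 1
}
```

For example, wait_for_metrics http://192.168.92.100:9114/metrics waits up to roughly 20 seconds. Running systemctl enable elasticsearch_exporter.service as well makes the exporter start at boot.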

3. Prometheus configuration

# prometheus.yml (add under scrape_configs)
  - job_name: 'elasticsearch'
    scrape_interval: 15s
    scrape_timeout: 15s
    metrics_path: /metrics
    static_configs:
      - targets: ['192.168.92.100:9114']

systemctl restart prometheus
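Restarting Prometheus with a broken prometheus.yml stops it from scraping. A minimal sketch that validates the file with promtool check config first and only restarts when the check passes; check_and_restart is a hypothetical helper, and the PROMTOOL and RESTART_CMD overrides are assumptions added so the commands can be swapped out (for example in tests).

```shell
#!/usr/bin/env bash
# check_and_restart CONFIG
# Validate the Prometheus configuration and restart only if the check passes.
# Defaults assume promtool on PATH and a systemd service named prometheus.
check_and_restart() {
  local config="$1"
  local promtool="${PROMTOOL:-promtool}"
  local restart_cmd="${RESTART_CMD:-systemctl restart prometheus}"
  if ! "$promtool" check config "$config"; then
    echo "config check failed for $config; not restarting" >&2
    return 1
  fi
  # Intentionally unquoted so the command string splits into program and arguments.
  $restart_cmd
}
```

For example: check_and_restart /etc/prometheus/prometheus.yml (the path depends on the installation). If Prometheus runs with --web.enable-lifecycle, curl -X POST http://localhost:9090/-/reload reloads the configuration without a restart; sending SIGHUP to the process does the same.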

2 Common Prometheus APIs

The Prometheus HTTP API comes up often in day-to-day work; the most common calls are listed below.

2.1 Queries

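# Base path of the HTTP API; every endpoint below lives under /api/v1
curl http://localhost:9090/api/v1

# Range query: evaluate an expression over a time window (RFC 3339 timestamps)
curl 'http://localhost:9090/api/v1/query_range?query=up&start=2023-12-10T20:10:30.781Z&end=2023-12-11T20:11:00.781Z&step=15s'

# List all values of the job label, i.e. every job
curl http://127.0.0.1:9090/api/v1/label/job/values

# Range query over a 40-second window (Unix timestamps) at a 10s step
curl 'http://127.0.0.1:9090/api/v1/query_range?query=my_job&start=1607999428.447&end=1607999468.447&step=10s'

# Instant query: the current value
curl 'http://127.0.0.1:9090/api/v1/query?query=network_traffic_input'

# Series metadata: find the series matching one or more selectors
curl -g 'http://localhost:9090/api/v1/series?match[]=up&match[]=process_start_time_seconds{job="prometheus"}'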
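Expressions containing spaces, + signs or other special characters need percent-encoding in a query string; curl -g only stops curl from treating {} and [] as glob patterns. A minimal sketch that encodes an expression, builds a query_range URL, and queries the last N seconds using Unix timestamps; urlencode, range_url and last_range are hypothetical helper names, and urlencode assumes ASCII input.

```shell
#!/usr/bin/env bash
# urlencode STRING: percent-encode every character except the unreserved
# set A-Z a-z 0-9 . ~ _ - (ASCII input assumed).
urlencode() {
  local s="$1" out="" c i
  for ((i = 0; i < ${#s}; i++)); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# range_url BASE EXPR START END STEP: build a query_range URL with EXPR encoded.
range_url() {
  local base="$1" expr="$2" start="$3" end="$4" step="$5"
  printf '%s/api/v1/query_range?query=%s&start=%s&end=%s&step=%s\n' \
    "$base" "$(urlencode "$expr")" "$start" "$end" "$step"
}

# last_range BASE EXPR SECONDS STEP: range query over the last SECONDS seconds, ending now.
last_range() {
  local base="$1" expr="$2" window="$3" step="$4" now
  now=$(date +%s)
  curl -s "$(range_url "$base" "$expr" "$((now - window))" "$now" "$step")"
}
```

For example, last_range http://localhost:9090 'up{job="elasticsearch"}' 600 15s fetches the last 10 minutes of the up series for the Elasticsearch job. Alternatively, curl -G --data-urlencode 'query=...' lets curl do the encoding.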

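2.2 Deletion

The TSDB admin endpoints below only work when Prometheus is started with the --web.enable-admin-api flag.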

# Delete the data of every series matching a label selector
curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={instance="172.18.0.2:8300"}'

# Delete the pushgateway job's data within a time range
curl -X POST -g 'http://192.168.92.100:9090/api/v1/admin/tsdb/delete_series?match[]={job="pushgateway"}&start=2023-02-26T00:00:00Z&end=2023-12-11T00:00:00Z'

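# Delete a metric's data (here: up from the pushgateway job, from start onward). This only marks the data as deleted with tombstones; the data itself stays on disk until a later compaction removes it, or until it is cleared explicitly with the clean_tombstones endpoint.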
curl -X PUT -g 'http://127.0.0.1:9090/api/v1/admin/tsdb/delete_series?match[]=up{job="pushgateway"}&start=2023-08-01T00:00:00.000Z'

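# Delete the data of a single instance
curl -X PUT -g 'http://127.0.0.1:9090/api/v1/admin/tsdb/delete_series?match[]={instance="192.168.92.100:9100"}'
# Delete the data of all instances in a time range (use .+ rather than .*: Prometheus rejects a selector whose matchers all match the empty string)
curl -X PUT -g 'http://127.0.0.1:9090/api/v1/admin/tsdb/delete_series?match[]={instance=~".+"}&start=2023-02-26T00:00:00Z&end=2023-12-12T03:30:00Z'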

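# Remove data already deleted via delete_series from disk and clean up the existing tombstones
curl -X POST http://192.168.92.100:9090/api/v1/admin/tsdb/clean_tombstones

# Legacy Prometheus 1.x series-deletion endpoint; Prometheus 2.x answers it with a "not implemented" error, so use delete_series there
curl -XDELETE -g 'http://localhost:9090/api/v1/series?match[]=up&match[]=process_start_time_seconds{job="prometheus"}'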
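The delete and clean-up calls can be wrapped so that curl encodes the selector instead of it being typed into the URL by hand. A minimal sketch; delete_series and clean_tombstones are hypothetical helper names, and setting CURL=echo prints the request instead of sending it. Passing -X POST together with -G sends a POST whose parameters travel in the URL query string, like the examples above.

```shell
#!/usr/bin/env bash
# Helpers around the TSDB admin API (requires --web.enable-admin-api).
# Set CURL=echo to print a request instead of sending it.

# delete_series BASE MATCHER [START] [END]
# Mark the data of series matching MATCHER (optionally limited to START..END) as deleted.
delete_series() {
  local base="$1" matcher="$2" start="${3:-}" end="${4:-}"
  # -X POST with -G: POST request, parameters URL-encoded into the query string.
  local args=(-s -X POST -G "${base}/api/v1/admin/tsdb/delete_series"
              --data-urlencode "match[]=${matcher}")
  if [ -n "$start" ]; then
    args+=(--data-urlencode "start=${start}")
  fi
  if [ -n "$end" ]; then
    args+=(--data-urlencode "end=${end}")
  fi
  "${CURL:-curl}" "${args[@]}"
}

# clean_tombstones BASE: physically remove data already marked as deleted.
clean_tombstones() {
  "${CURL:-curl}" -s -X POST "$1/api/v1/admin/tsdb/clean_tombstones"
}
```

For example, delete_series http://192.168.92.100:9090 '{instance="192.168.92.100:9100"}' followed by clean_tombstones http://192.168.92.100:9090 deletes that instance's data and then frees the disk space.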

2.3 Registering services

Register a service with Consul:
curl -X PUT -d '{"id": "test1", "name": "test1", "address": "192.24.17.156", "port": 9500, "tags": ["dev"], "checks": [{"http": "http://192.24.17.156:9500/", "interval": "5s"}]}' http://192.24.17.156:8500/v1/agent/service/register
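Prometheus can discover services registered this way through consul_sd_configs. A minimal sketch that builds the same kind of registration body for an exporter, with the HTTP check pointed at its /metrics path; consul_payload and consul_register are hypothetical helper names, and the values are inserted without JSON escaping, so they must not contain quotes.

```shell
#!/usr/bin/env bash
# consul_payload ID HOST PORT TAG: print a Consul service-registration body
# (same fields as the example above) with an HTTP check on http://HOST:PORT/metrics.
# Values are not JSON-escaped: they must not contain quotes or backslashes.
consul_payload() {
  local id="$1" host="$2" port="$3" tag="$4"
  printf '{"id": "%s", "name": "%s", "address": "%s", "port": %d, "tags": ["%s"], "checks": [{"http": "http://%s:%d/metrics", "interval": "5s"}]}\n' \
    "$id" "$id" "$host" "$port" "$tag" "$host" "$port"
}

# consul_register CONSUL_ADDR ID HOST PORT TAG: send the body to the agent API.
consul_register() {
  local consul="$1"; shift
  curl -s -X PUT -d "$(consul_payload "$@")" "${consul}/v1/agent/service/register"
}
```

For example, consul_register http://192.24.17.156:8500 es-exporter 192.168.92.100 9114 dev would register the Elasticsearch Exporter from section 1.2.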