Nacos Monitoring Center (Part 1): Configuring Prometheus + Grafana

zhang24360

Last modified: 2022-06-23 10:44:35

Category: Nacos source code analysis. Tags: java, spring cloud, programming languages

First published: 2022-06-19 18:03:03

Article link: https://blog.csdn.net/zhang24360/article/details/125357297

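Configuration

Exposing metrics data

Nacos cluster setup guide: "Nacos supports three deployment modes"

After setting up the Nacos cluster, expose the metrics data in the application.properties configuration file on every node of the cluster.

In practice this just means uncommenting the relevant line (in the default Nacos application.properties this is management.endpoints.web.exposure.include=*).

Visit 10.128.198.200:8845/nacos/actuator/prometheus and confirm that the metrics data is returned.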
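The same check can be scripted instead of done in a browser. The following is a minimal sketch, not part of the original setup: the function names and the SAMPLE text are made up for illustration, and it only assumes that the endpoint returns the standard Prometheus text format.

```python
import urllib.request


def fetch_metrics(url, timeout=5.0):
    """Download the raw Prometheus text exposition from an actuator endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at '{' (start of labels) or at the first whitespace.
        names.add(line.split("{", 1)[0].split()[0])
    return names


def exposes_nacos_metrics(exposition_text):
    """True if at least one metric name starts with 'nacos_'."""
    return any(name.startswith("nacos_") for name in metric_names(exposition_text))


# Hypothetical sample of what a node might return (values are invented).
SAMPLE = """\
# TYPE nacos_monitor gauge
nacos_monitor{module="config",name="configCount",} 12.0
system_cpu_usage 0.03
"""
```

Against a live node, the call would be exposes_nacos_metrics(fetch_metrics("http://10.128.198.200:8845/nacos/actuator/prometheus")), repeated for each node in the cluster.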

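Setting up Prometheus to scrape Nacos metrics

Download the Prometheus version you want to install from https://prometheus.io/download/

The version I chose here is:

Edit the prometheus.yml configuration file so that it scrapes the Nacos metrics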

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
      
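  - job_name: "nacos-cluster"
    scrape_interval: 60s
    # Nacos exposes its metrics under the actuator path, not the default '/metrics'.
    metrics_path: '/nacos/actuator/prometheus'
    static_configs:
      # One entry per cluster node.
      - targets: ["10.128.198.200:8845","10.128.198.200:8846","10.128.198.200:8848"]
        labels:
          # Caution: this replaces the default per-target `instance` label, so all
          # three nodes report under the same label set and cannot be told apart.
          instance: nacos cluster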

Start the Prometheus service

prometheus.exe --config.file=prometheus.yml

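Verify the configuration

Open this address:

http://10.128.198.200:9090/targets

If the nacos-cluster targets are listed there (healthy targets show the state UP), the configuration succeeded.

Alternatively, open

http://10.128.198.200:9090/graph

There you can see the data Prometheus has scraped. If searching for nacos_monitor in the search bar returns Nacos data, scraping is working.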
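Both checks can also be run against the Prometheus HTTP API (/api/v1/targets and /api/v1/query) rather than the web UI. This is a rough sketch under the assumption that Prometheus is reachable at the address used in this post. The helper names are invented for illustration.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://10.128.198.200:9090"  # address used in this post


def api_get(path, params=None, base=PROMETHEUS, timeout=5.0):
    """GET a Prometheus HTTP API path and return the decoded JSON body."""
    url = base + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def down_targets(targets_payload, job):
    """Scrape URLs of active targets in `job` whose health is not 'up'."""
    active = targets_payload["data"]["activeTargets"]
    return [t["scrapeUrl"] for t in active
            if t["labels"].get("job") == job and t["health"] != "up"]


def sample_values(query_payload):
    """Map each series' `name` label to its current value (instant query).

    Series that share the same `name` label overwrite each other here.
    """
    values = {}
    for series in query_payload["data"]["result"]:
        _timestamp, value = series["value"]  # value arrives as a string
        values[series["metric"].get("name", "")] = float(value)
    return values
```

Typical use would be down_targets(api_get("/api/v1/targets"), "nacos-cluster"), which should return an empty list when every node is being scraped, and sample_values(api_get("/api/v1/query", {"query": "nacos_monitor"})) to see the current Nacos gauges.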

At this point the Nacos metrics are being collected, but I still need a dashboard to display them.

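Setting up Grafana to visualize the metrics

Install Grafana on the same machine as Prometheus.

Reference: Install on Windows | Grafana documentation: http://docs.grafana.org/installation/windows/

Open Grafana (using Google Chrome):

The username and password are both admin.

http://10.128.198.200:3000/login

Configure the Prometheus data source

Enter the Prometheus address as the URL, select GET as the HTTP method, then click "Save & Test".

Configure a data source for Nacos

You have now added two data sources; click save to store them.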
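For a repeatable setup, the Prometheus data source could also be created through Grafana's HTTP API (POST /api/datasources) instead of the UI. The sketch below makes several assumptions that are not from the original post: the helper names are hypothetical, the default admin/admin login is still in place, and the data source name `prometheus` is chosen to match the dashboard template.

```python
import base64
import json
import urllib.request


def prometheus_datasource(name, url, http_method="GET"):
    """Build the JSON body for a Prometheus data source (server-side proxy access)."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": "proxy",
        "jsonData": {"httpMethod": http_method},
    }


def create_datasource(grafana_url, payload, user="admin", password="admin"):
    """POST the payload to Grafana using HTTP basic auth; returns the JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5.0) as resp:
        return json.load(resp)
```

For the environment in this post, the call would be create_datasource("http://10.128.198.200:3000", prometheus_datasource("prometheus", "http://10.128.198.200:9090")).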

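Import the Nacos Grafana monitoring template

Here you need to download MySQL_Overview.json from the CSDN download resource "Nacos grafana monitoring template" (the template specified on the official site, used to monitor the various Nacos metrics; see the Nacos website for details), then upload it with the Upload .json File button on the import page to import it.

You will be taken to this screen

Click the settings button in the upper right corner

Return to the home page; the results are displayed, so the setup succeeded.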

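Problems encountered

"Nacos grafana displays empty" (StarJava's CSDN blog): While setting up Shoulder-Platform and following the official Nacos tutorial to monitor Nacos, the data was not displayed correctly and the dashboard was empty. Checking revealed two causes. Incorrect data source: the data source name in the [official Nacos monitoring template](https://github.com/nacos-group/nacos-template) is `prometheus`, whereas Grafana's default Prometheus data source name is `Prometheus` (capital P); because the names do not match, it will not… https://blog.csdn.net/qq_35425070/article/details/108114911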
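One way around the name mismatch described above is to name the Grafana data source `prometheus`. Another is to rewrite the template before importing it. A minimal sketch of the second option follows. It assumes the template stores the data source as a plain string in `datasource` fields, as older dashboard exports do, and the function name is made up for illustration.

```python
import json


def rename_datasource(node, old, new):
    """Return a copy of a dashboard JSON tree with `datasource: old` set to `new`.

    Only string-valued `datasource` fields equal to `old` are changed.
    """
    if isinstance(node, dict):
        return {
            key: (new if key == "datasource" and value == old
                  else rename_datasource(value, old, new))
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_datasource(item, old, new) for item in node]
    return node


def rename_in_file(src_path, dst_path, old="prometheus", new="Prometheus"):
    """Read a dashboard JSON file, rename the data source, write the result."""
    with open(src_path, encoding="utf-8") as src:
        dashboard = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(rename_datasource(dashboard, old, new), dst, indent=2)
```

The rewritten file can then be uploaded through the same Upload .json File button.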

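Nacos monitoring is divided into three modules:

nacos monitor shows the core monitoring items
nacos detail shows how the metrics change over time
nacos alert contains the alert items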

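Meaning of the Nacos metrics

jvm metrics

Metric	Meaning
system_cpu_usage	CPU usage
system_load_average_1m	System load (1-minute average)
jvm_memory_used_bytes	Memory used, in bytes, covering all memory areas
jvm_memory_max_bytes	Maximum memory, in bytes, covering all memory areas
jvm_gc_pause_seconds_count	Number of GCs, covering all GC types
jvm_gc_pause_seconds_sum	Time spent in GC, covering all GC types
jvm_threads_daemon	Number of threads (daemon threads)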

Nacos monitoring metrics

Metric	Meaning
http_server_requests_seconds_count	Number of HTTP requests, broken down by (url, method, code)
http_server_requests_seconds_sum	Total HTTP request time, broken down by (url, method, code)
nacos_timer_seconds_sum	Nacos config horizontal notification time
nacos_timer_seconds_count	Nacos config horizontal notification count
nacos_monitor{name='longPolling'}	Nacos config number of long-polling connections
nacos_monitor{name='configCount'}	Nacos config number of configurations
nacos_monitor{name='dumpTask'}	Nacos config backlog of dump-to-disk tasks
nacos_monitor{name='notifyTask'}	Nacos config backlog of horizontal notification tasks
nacos_monitor{name='getConfig'}	Nacos config read count
nacos_monitor{name='publish'}	Nacos config write count
nacos_monitor{name='ipCount'}	Nacos naming number of IPs
nacos_monitor{name='domCount'}	Nacos naming number of domains (1.x versions)
nacos_monitor{name='serviceCount'}	Nacos naming number of domains (2.x versions)
nacos_monitor{name='failedPush'}	Nacos naming number of failed pushes
nacos_monitor{name='avgPushCost'}	Nacos naming average push time
nacos_monitor{name='leaderStatus'}	Nacos naming role status
nacos_monitor{name='maxPushCost'}	Nacos naming maximum push time
nacos_monitor{name='mysqlhealthCheck'}	Nacos naming MySQL health check count
nacos_monitor{name='httpHealthCheck'}	Nacos naming HTTP health check count
nacos_monitor{name='tcpHealthCheck'}	Nacos naming TCP health check count
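The `_sum` and `_count` pairs in these tables are meant to be read together: dividing the growth of the sum by the growth of the count over an interval gives the average duration per event. In PromQL this is commonly written as rate(http_server_requests_seconds_sum[5m]) / rate(http_server_requests_seconds_count[5m]). The arithmetic can be sketched as follows, with a hypothetical helper name and invented sample numbers:

```python
def average_duration(sum_start, count_start, sum_end, count_end):
    """Average seconds per event between two scrapes of a *_sum / *_count pair.

    Returns None when no new events were recorded in the interval.
    """
    events = count_end - count_start
    if events <= 0:
        return None
    return (sum_end - sum_start) / events


# Invented example: 60 new requests that took 3.0 seconds in total.
avg = average_duration(sum_start=10.0, count_start=100, sum_end=13.0, count_end=160)
```

The same calculation applies to nacos_timer_seconds_sum / nacos_timer_seconds_count and to the jvm_gc_pause_seconds pair.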

Nacos exception metrics

Metric	Meaning
nacos_exception_total{name='db'}	Database exception
nacos_exception_total{name='configNotify'}	Nacos config horizontal notification failure
nacos_exception_total{name='unhealth'}	Health check exception between Nacos config servers
nacos_exception_total{name='disk'}	Nacos naming disk write exception
nacos_exception_total{name='leaderSendBeatFailed'}	Nacos naming leader heartbeat send exception
nacos_exception_total{name='illegalArgument'}	Illegal request argument
nacos_exception_total{name='nacos'}	Nacos internal error while handling a request (read/write failure, no permission, invalid argument)
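Since these metrics carry the `_total` suffix, they are presumably cumulative counters, so the useful signal is how much each one grew between two scrapes rather than its absolute value. A small sketch of that comparison follows, assuming counter semantics. The function names and sample values are hypothetical.

```python
def counter_increase(previous, current):
    """Increase of a cumulative counter between two scrapes.

    A drop in value is treated as a counter reset (for example after a
    restart), in which case the current value is taken as the increase.
    """
    if current < previous:
        return current
    return current - previous


def new_exceptions(before, after):
    """Per-name increases of exception counters, keeping only names that grew."""
    grown = {}
    for name, current in after.items():
        delta = counter_increase(before.get(name, 0.0), current)
        if delta > 0:
            grown[name] = delta
    return grown


# Invented example: `db` grew by 3 and `nacos` appeared for the first time.
changes = new_exceptions({"db": 2.0, "disk": 1.0},
                         {"db": 5.0, "disk": 1.0, "nacos": 1.0})
```

An alert rule in the nacos alert module would express the same idea on the Prometheus side.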

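client metrics

Metric	Meaning
nacos_monitor{name='subServiceCount'}	Number of subscribed services
nacos_monitor{name='pubServiceCount'}	Number of published services
nacos_monitor{name='configListenSize'}	Number of configurations being listened to
nacos_client_request_seconds_count	Number of requests, broken down by (url, method, code)
nacos_client_request_seconds_sum	Total request time, broken down by (url, method, code)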