A First Look at Prometheus, Node Exporter and Grafana on CentOS

Latest recommended article published 2024-04-02 19:50:14

Leon.Sun.

Column: Operations. Tags: centos, server operations

Article link: https://blog.csdn.net/qq_34256673/article/details/118447527

Installing Prometheus Server

1. Find the latest Prometheus Server package at https://prometheus.io/download/.

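2. Download it with a download manager such as Xunlei (Thunder), since curl can be slow, and then copy it to the server from a USB drive or by some other means.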

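Alternatively, download it directly on the server with the following command, where 2.28.1 is the version number; adjust it to the release you need: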

curl -LO https://github.com/prometheus/prometheus/releases/download/v2.28.1/prometheus-2.28.1.linux-amd64.tar.gz
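If you script the download, the version can be kept in one place. The sketch below assumes the URL pattern used by the curl commands in this article (the same pattern works for node_exporter) and also checks the tarball against the sha256sums.txt file that Prometheus releases publish next to the archives; confirm that file exists on the release page for your version. The function names are illustrative.

```shell
#!/bin/sh
# Build the GitHub release URL for a Prometheus-project tarball, following the
# pattern of the download commands in this article (assumed, not an official API):
#   https://github.com/prometheus/<project>/releases/download/v<ver>/<project>-<ver>.linux-<arch>.tar.gz
release_url() {
  ru_project=$1
  ru_version=$2
  ru_arch=${3:-amd64}   # architecture used throughout this article
  echo "https://github.com/prometheus/${ru_project}/releases/download/v${ru_version}/${ru_project}-${ru_version}.linux-${ru_arch}.tar.gz"
}

# Check a downloaded file (in the current directory) against a checksum list
# in "<sha256>  <filename>" format, such as a release's sha256sums.txt.
# Returns non-zero if the file is not listed or the hash does not match.
verify_sha256() {
  vs_file=$1
  vs_sums=$2
  vs_line=$(awk -v f="$vs_file" '$2 == f' "$vs_sums")
  [ -n "$vs_line" ] || return 1
  printf '%s\n' "$vs_line" | sha256sum -c - >/dev/null 2>&1
}

# Example (network access required):
#   V=2.28.1
#   curl -LO "$(release_url prometheus "$V")"
#   curl -LO "https://github.com/prometheus/prometheus/releases/download/v${V}/sha256sums.txt"
#   verify_sha256 "prometheus-${V}.linux-amd64.tar.gz" sha256sums.txt && echo "checksum OK"
```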

3. Move the downloaded package to a suitable location, then extract it:

tar -xzf prometheus-2.28.1.linux-amd64.tar.gz

4. After extraction, the extracted directory prometheus-2.28.1.linux-amd64 contains the default Prometheus configuration file, prometheus.yml:

[root@localhost prometheus-2.28.1.linux-amd64]# more prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

As a time-series database, Prometheus stores the data it collects as files on local disk. The default storage path is data/, so first create this directory manually:

mkdir -p data

You can also change the local storage path with the --storage.tsdb.path="data/" flag.
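For a layout that does not depend on the current directory, both the configuration file and the data directory can be passed explicitly with the --config.file and --storage.tsdb.path flags. In the sketch below, the function name, the PROM_BIN override and the example paths are assumptions for illustration; with PROM_BIN=echo it only prints the arguments it would pass.

```shell
#!/bin/sh
# Start Prometheus with an explicit data directory and config file.
# PROM_BIN defaults to ./prometheus (the binary in the extracted directory);
# set PROM_BIN=echo to print the arguments instead of starting the server.
start_prometheus() {
  sp_data=${1:-data/}
  sp_config=${2:-prometheus.yml}
  mkdir -p "$sp_data"   # create the storage directory up front, as above
  "${PROM_BIN:-./prometheus}" \
    --config.file="$sp_config" \
    --storage.tsdb.path="$sp_data"
}

# Example with hypothetical paths:
#   start_prometheus /var/lib/prometheus /etc/prometheus/prometheus.yml
```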

Start the Prometheus server. By default it loads prometheus.yml from the current directory:

./prometheus

If everything is working, you will see output like this:

[root@localhost prometheus-2.28.1.linux-amd64]# ./prometheus
level=info ts=2021-07-03T15:17:09.788Z caller=main.go:389 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2021-07-03T15:17:09.788Z caller=main.go:443 msg="Starting Prometheus" version="(version=2.28.1, branch=HEAD, revision=b0944590a1c9a6b35dc5a696869f75f422b107a1)"
level=info ts=2021-07-03T15:17:09.788Z caller=main.go:448 build_context="(go=go1.16.5, user=root@2915dd495090, date=20210701-15:20:10)"
level=info ts=2021-07-03T15:17:09.788Z caller=main.go:449 host_details="(Linux 4.18.0-80.el8.x86_64 #1 SMP Tue Jun 4 09:19:46 UTC 2019 x86_64 localhost.localdomain (none))"
level=info ts=2021-07-03T15:17:09.788Z caller=main.go:450 fd_limits="(soft=1024, hard=4096)"
level=info ts=2021-07-03T15:17:09.788Z caller=main.go:451 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2021-07-03T15:17:09.789Z caller=web.go:541 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2021-07-03T15:17:09.791Z caller=main.go:824 msg="Starting TSDB ..."
level=info ts=2021-07-03T15:17:09.792Z caller=tls_config.go:191 component=web msg="TLS is disabled." http2=false
level=info ts=2021-07-03T15:17:09.794Z caller=head.go:780 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2021-07-03T15:17:09.794Z caller=head.go:794 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=3.148µs
level=info ts=2021-07-03T15:17:09.794Z caller=head.go:800 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2021-07-03T15:17:09.794Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2021-07-03T15:17:09.794Z caller=head.go:860 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=16.788µs wal_replay_duration=148.297µs total_replay_duration=184.408µs
level=info ts=2021-07-03T15:17:09.795Z caller=main.go:851 fs_type=XFS_SUPER_MAGIC
level=info ts=2021-07-03T15:17:09.795Z caller=main.go:854 msg="TSDB started"
level=info ts=2021-07-03T15:17:09.795Z caller=main.go:981 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2021-07-03T15:17:09.796Z caller=main.go:1012 msg="Completed loading of configuration file" filename=prometheus.yml totalDuration=1.239266ms remote_storage=2.094µs web_handler=534ns query_engine=1.043µs scrape=864.745µs scrape_sd=30.923µs notify=38.37µs notify_sd=15.864µs rules=1.41µs
level=info ts=2021-07-03T15:17:09.796Z caller=main.go:796 msg="Server is ready to receive web requests."

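5. Once startup has finished, the Prometheus web UI is available at http://localhost:9090 (the log above shows the server listening on 0.0.0.0:9090, i.e. on all interfaces).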
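To check the server from a script rather than a browser, Prometheus 2.x answers on the /-/healthy and /-/ready endpoints. The polling helper below is a sketch; its name, the default retry count and the one-second interval are arbitrary choices.

```shell
#!/bin/sh
# Poll Prometheus's readiness endpoint until it responds with a success status
# or the retry budget runs out. Usage: wait_for_ready [base_url] [tries]
wait_for_ready() {
  wr_base=${1:-http://localhost:9090}
  wr_tries=${2:-30}
  wr_i=1
  while [ "$wr_i" -le "$wr_tries" ]; do
    # -f makes curl fail on HTTP error statuses returned before the server is ready
    if curl -fsS --max-time 2 "$wr_base/-/ready" >/dev/null 2>&1; then
      echo "ready: $wr_base"
      return 0
    fi
    if [ "$wr_i" -lt "$wr_tries" ]; then
      sleep 1
    fi
    wr_i=$((wr_i + 1))
  done
  echo "not ready after $wr_tries attempt(s): $wr_base" >&2
  return 1
}

# Example: start the server in the background, then wait for it.
#   ./prometheus > prometheus.log 2>&1 &
#   wait_for_ready http://localhost:9090 30
```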

Collecting host metrics with Node Exporter

Installing Node Exporter

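1. Node Exporter is also written in Go and has no third-party dependencies: just download it, extract it and run it. Get the latest node exporter binary package from https://prometheus.io/download/. As before, you can download it with Xunlei (curl may be slow) and copy it to the server from a USB drive or by other means, or download it directly on the server with the commands below, where 1.1.2 is the version number; adjust it as needed.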

Download:
curl -LO https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz
Extract:
tar -xzf node_exporter-1.1.2.linux-amd64.tar.gz
Move node_exporter into /usr/local/bin:
cd node_exporter-1.1.2.linux-amd64/
mv node_exporter /usr/local/bin/

2. Run node exporter in the background

Start it in the background:
cd /usr/local/bin/
./node_exporter &
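A process started with & is tied to the login session and is not restarted after a reboot. A common alternative on CentOS is a systemd unit. The minimal sketch below assumes the binary path used above (/usr/local/bin/node_exporter), runs the exporter as root for simplicity (a dedicated unprivileged user is preferable in production), and takes an optional target directory so the file can be previewed before it is installed. The function name is illustrative.

```shell
#!/bin/sh
# Write a minimal systemd unit for node_exporter.
# Usage: write_node_exporter_unit [unit_dir]   (default: /etc/systemd/system)
write_node_exporter_unit() {
  wu_dir=${1:-/etc/systemd/system}
  mkdir -p "$wu_dir"
  cat > "$wu_dir/node_exporter.service" <<'EOF'
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# As root, after stopping the copy started with &:
#   write_node_exporter_unit
#   systemctl daemon-reload
#   systemctl enable node_exporter
#   systemctl start node_exporter
```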

3. Once it has started, check the listening ports: node_exporter should now be listening on port 9100.
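The port check can be scripted. CentOS ships ss (from the iproute package), so the sketch below reads the output of ss -ltn on standard input and reports whether anything listens on a given port. It assumes the default ss -ltn column layout (local address in the fourth column); reading from stdin is a choice made so the function can also be tried on saved output. The function name is illustrative.

```shell
#!/bin/sh
# Exit 0 if the "ss -ltn" listing on stdin shows a listener on port $1.
# Usage: ss -ltn | port_listening 9100 && echo "node_exporter is listening"
port_listening() {
  awk -v port="$1" '
    NR > 1 {
      # Column 4 is the local address: 0.0.0.0:9100, *:9100, [::]:9100, ...
      n = split($4, parts, ":")
      if (parts[n] == port) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}
```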

Visiting http://localhost:9100/ (the exporter listens on 0.0.0.0:9100, i.e. on all interfaces) shows the Node Exporter landing page.

Getting to know Node Exporter metrics

Visit http://localhost:9100/metrics to see all the monitoring data that node exporter currently collects from this host.

Each metric is preceded by HELP and TYPE lines, as in this excerpt:

# HELP node_memory_Slab_bytes Memory information field Slab_bytes.
# TYPE node_memory_Slab_bytes gauge
node_memory_Slab_bytes 3.00969984e+08
# HELP node_memory_SwapCached_bytes Memory information field SwapCached_bytes.
# TYPE node_memory_SwapCached_bytes gauge
node_memory_SwapCached_bytes 0
# HELP node_memory_SwapFree_bytes Memory information field SwapFree_bytes.
# TYPE node_memory_SwapFree_bytes gauge
node_memory_SwapFree_bytes 2.147479552e+09
# HELP node_memory_SwapTotal_bytes Memory information field SwapTotal_bytes.
# TYPE node_memory_SwapTotal_bytes gauge
node_memory_SwapTotal_bytes 2.147479552e+09
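With hundreds of metrics on the page, it can help to list only the TYPE declarations. The sketch below reads the text format shown above on standard input and prints each metric name with its declared type, optionally restricted to names that start with a given prefix. The function name is illustrative.

```shell
#!/bin/sh
# Print "<metric> <type>" for every "# TYPE" line in Prometheus text format,
# keeping only metric names that start with the optional prefix $1.
# Usage: curl -s http://localhost:9100/metrics | metric_types node_memory_Swap
metric_types() {
  awk -v prefix="${1:-}" '
    $1 == "#" && $2 == "TYPE" && (prefix == "" || index($3, prefix) == 1) {
      print $3, $4
    }
  '
}
```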

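HELP explains what a metric means, and TYPE gives its data type. For example, node_cpu (named node_cpu_seconds_total since Node Exporter 0.16) records the total CPU time spent in each mode, such as idle time on cpu0. CPU time only ever increases, and accordingly its type is counter, which matches what the metric means. By contrast, node_load1 reflects the host's load average over the last minute. Load rises and falls with resource usage, so node_load1 describes the current state and may go up or down; its type is gauge, again consistent with what it measures.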
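The counter/gauge distinction matters when querying. A counter's raw value only grows, so in PromQL it is usually wrapped in rate(), for example rate(node_cpu_seconds_total{mode="idle"}[5m]), while a gauge such as node_load1 is read directly. The sketch below computes the same kind of per-second rate by hand from two samples, to show what rate() measures; it is a simplified illustration (no extrapolation), and the function name is hypothetical.

```shell
#!/bin/sh
# Per-second rate of a counter between two samples: v1 at time t1 and v2 at
# time t2 (seconds). A drop in value is treated as a counter reset, so the
# increase is counted from zero (a simplified version of what rate() does).
# Usage: counter_rate v1 t1 v2 t2
counter_rate() {
  awk -v v1="$1" -v t1="$2" -v v2="$3" -v t2="$4" 'BEGIN {
    if (t2 <= t1) { print "t2 must be later than t1" > "/dev/stderr"; exit 1 }
    inc = (v2 >= v1) ? v2 - v1 : v2
    printf "%.2f\n", inc / (t2 - t1)
  }'
}

# counter_rate 1000 0 1012 15   # prints 0.80: this CPU was idle 80% of the time
# counter_rate 1000 0 6 15      # prints 0.40: counter reset, counted from zero
```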

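Depending on the host's operating system, the page may also include metrics such as:

node_boot_time: system boot time
node_cpu: CPU usage
node_disk_*: disk I/O
node_filesystem_*: filesystem usage
node_load1: system load
node_memory_*: memory usage
node_network_*: network bandwidth
node_time: current system time
go_*: Go runtime metrics of node exporter itself
process_*: metrics about the node exporter process itself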

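Collecting monitoring data from Node Exporter

To let Prometheus Server fetch metrics from this node exporter, update the Prometheus configuration: edit prometheus.yml and add the following under scrape_configs (the last 4 lines are the new content):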

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  # Scrape metrics from the node exporter
  - job_name: 'node'
    static_configs:
    - targets: ['localhost:9100']
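The same configuration can be written and checked in one step. The sketch below writes a prometheus.yml with the two jobs above (it overwrites the target file, so point it at a copy if you want to keep the original comments) and, if promtool is available, validates it with promtool check config. promtool is included in the Prometheus tarball; the function name is illustrative.

```shell
#!/bin/sh
# Write a prometheus.yml that scrapes Prometheus itself and the local node
# exporter, then validate it when promtool is available.
# Usage: write_prometheus_config ./prometheus.yml
write_prometheus_config() {
  cat > "$1" <<'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Scrape metrics from the node exporter
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
EOF
  # Validate only if promtool is in the current directory or on PATH
  if [ -x ./promtool ]; then
    ./promtool check config "$1"
  elif command -v promtool >/dev/null 2>&1; then
    promtool check config "$1"
  fi
}
```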

Restart Prometheus Server.
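A full restart is not strictly required: Prometheus re-reads its configuration file when it receives SIGHUP, or on a POST to /-/reload if it was started with --web.enable-lifecycle. The sketch below sends SIGHUP; looking up the PID with pgrep -x prometheus assumes a single process named prometheus, and a PID can be passed explicitly instead. The function name is illustrative.

```shell
#!/bin/sh
# Ask a running Prometheus to reload prometheus.yml without restarting it.
# Usage: reload_prometheus [pid]   (default: first process named "prometheus")
reload_prometheus() {
  rp_pid=${1:-$(pgrep -x prometheus | head -n 1)}
  if [ -z "$rp_pid" ]; then
    echo "no prometheus process found" >&2
    return 1
  fi
  kill -HUP "$rp_pid"
}

# Alternative when Prometheus was started with --web.enable-lifecycle:
#   curl -X POST http://localhost:9090/-/reload
# Either way, the server log should show "Completed loading of configuration file".
```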

Open http://localhost:9090 to reach Prometheus Server. Enter "up" in the expression box and click Execute to see the results.

Expression Browser

If Prometheus is scraping node exporter successfully, the results look like this:

up{instance="localhost:9090", job="prometheus"}
	1
up{instance="localhost:9100", job="node"}
	1

Here "1" means the target is up (its last scrape succeeded) and "0" means it is down.
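The same check can be run from the command line through the HTTP API endpoint /api/v1/query, which returns JSON. In the sketch below, the PROM_URL variable and the function name are assumptions; the second example uses a comparison so that only targets whose up value is 0 are returned.

```shell
#!/bin/sh
# Run an instant PromQL query against the Prometheus HTTP API and print the
# raw JSON response. PROM_URL defaults to the local server used above.
# Usage: prom_query '<PromQL expression>'
prom_query() {
  curl -fsS --max-time 5 -G "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# prom_query up                     # one result per target, value "1" or "0"
# prom_query 'up{job="node"} == 0'  # only node targets that are currently down
```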

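Installing Grafana

1. Download Grafana: find the package matching your system at https://grafana.com/grafana/download. wget is another good download tool. I still prefer Xunlei, because downloading from the command line is always slow for me, but on a cloud host the command line is the more suitable option.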

wget https://dl.grafana.com/oss/release/grafana-8.0.4-1.x86_64.rpm

Then install it with administrator privileges:

sudo yum install grafana-8.0.4-1.x86_64.rpm

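Since I installed Grafana from the RPM package, its configuration file is /etc/grafana/grafana.ini and a separate custom.ini is not used; the Grafana init.d script specifies this path with the --config parameter. There is no need to change the configuration for now, so first enable the service to start at boot: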

systemctl enable grafana-server

Then start the service:

systemctl start grafana-server

Check its status with:

systemctl status grafana-server

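Check which ports the service is listening on (netstat is part of the net-tools package, which may not be installed by default; ss -antp shows the same information):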

netstat -antp

Port 3000 is now being listened on. Open http://localhost:3000 to reach the Grafana home page; the default username and password are both admin.

tcp        0      0 192.168.116.129:56632   34.120.177.193:443      ESTABLISHED 12239/grafana-serve     
tcp6       0      0 :::3000                 :::*                    LISTEN      12239/grafana-serve
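Besides checking the listening port, you can ask Grafana itself whether it is healthy: GET /api/health needs no login and returns a small JSON document whose database field is "ok" when the server and its database are working. The function name and the GRAFANA_URL variable are assumptions for this sketch.

```shell
#!/bin/sh
# Exit 0 if Grafana's /api/health endpoint reports "database": "ok".
# GRAFANA_URL defaults to the local server installed above.
grafana_healthy() {
  gh_body=$(curl -fsS --max-time 5 "${GRAFANA_URL:-http://localhost:3000}/api/health") || return 1
  printf '%s\n' "$gh_body" | grep -q '"database": *"ok"'
}

# grafana_healthy && echo "Grafana is up"
```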

After logging in, the Grafana home page is displayed.