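1. Deploying Prometheus on the Prometheus server
Prometheus is built on a time-series database (TSDB), so it depends heavily on accurate system time, and the host clock must be kept synchronized. Before installing Prometheus, synchronize the time over NTP: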
[root@test ~]# yum install ntpdate -y
[root@test ~]# ntpdate ntp1.aliyun.com
[root@test ~]# hwclock -w
[root@test ~]# crontab -e
* * * * * /usr/sbin/ntpdate ntp1.aliyun.com > /dev/null 2>&1 && /usr/sbin/hwclock -w > /dev/null 2>&1
Disable the firewall and SELinux
[root@test ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
[root@test ~]# systemctl stop firewalld
[root@test ~]# systemctl disable firewalld
[root@test ~]# setenforce 0
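Because `setenforce 0` only lasts until the next reboot, it can be worth confirming that the persistent setting in /etc/selinux/config was actually changed. The following is a minimal sketch; the helper name `selinux_config_mode` is hypothetical and not part of any SELinux tooling.

```shell
#!/usr/bin/env bash
# selinux_config_mode [FILE]
# Print the mode set on the SELINUX= line of FILE (default: /etc/selinux/config).
selinux_config_mode() {
    local file="${1:-/etc/selinux/config}"
    # /^SELINUX=/ matches only the uncommented SELINUX= line, not SELINUXTYPE=.
    awk -F= '/^SELINUX=/ {print $2}' "$file"
}

# Example:
#   selinux_config_mode     # persistent setting, applied at next boot
#   getenforce              # runtime state after "setenforce 0"
```

If the first command does not print `disabled`, the sed substitution did not match and the file should be checked by hand.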
Download and install
The binary package is ready to use as soon as it is downloaded and extracted. Official site: https://prometheus.io/
[root@test ~]# wget https://github.com/prometheus/prometheus/releases/download/v2.30.1/prometheus-2.30.1.linux-amd64.tar.gz
[root@test ~]# ls
prometheus-2.30.1.linux-amd64.tar.gz
[root@test ~]# tar -xf prometheus-2.30.1.linux-amd64.tar.gz -C /usr/local
[root@test ~]# mv /usr/local/prometheus-2.30.1.linux-amd64/ /usr/local/prometheus/
Start
[root@test ~]# cd /usr/local/prometheus/
# Check the version and build information
[root@test prometheus]# ./prometheus --version
prometheus, version 2.30.1 (branch: HEAD, revision: fafb309d4027b050c917362d7d2680c5ad6f6e9e)
build user: root@36ab67e1b043
build date: 20210928-09:41:36
go version: go1.17.1
platform: linux/amd64
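# Start in the foreground
[root@test prometheus]# ./prometheus
# When started without arguments, it implicitly uses --config.file=prometheus.yml, i.e. the prometheus.yml file in the current directory
# To start in the background: nohup ./prometheus & (output is written to nohup.out in the current directory)
# Or: nohup ./prometheus &> /PATH/FILENAME &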
level=info ts=2021-09-29T10:58:37.550Z caller=main.go:400 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2021-09-29T10:58:37.550Z caller=main.go:438 msg="Starting Prometheus" version="(version=2.30.1, branch=HEAD, revision=fafb309d4027b050c917362d7d2680c5ad6f6e9e)"
level=info ts=2021-09-29T10:58:37.550Z caller=main.go:443 build_context="(go=go1.17.1, user=root@36ab67e1b043, date=20210928-09:41:36)"
level=info ts=2021-09-29T10:58:37.550Z caller=main.go:444 host_details="(Linux 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 test (none))"
level=info ts=2021-09-29T10:58:37.550Z caller=main.go:445 fd_limits="(soft=1024, hard=4096)"
level=info ts=2021-09-29T10:58:37.550Z caller=main.go:446 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2021-09-29T10:58:37.552Z caller=web.go:541 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2021-09-29T10:58:37.552Z caller=main.go:822 msg="Starting TSDB ..."
level=warn ts=2021-09-29T10:58:37.552Z caller=db.go:683 component=tsdb msg="A TSDB lockfile from a previous execution already existed. It was replaced" file=/usr/local/prometheus/data/lock
level=info ts=2021-09-29T10:58:37.553Z caller=tls_config.go:191 component=web msg="TLS is disabled." http2=false
level=info ts=2021-09-29T10:58:37.554Z caller=head.go:466 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2021-09-29T10:58:37.554Z caller=head.go:500 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=3.947µs
level=info ts=2021-09-29T10:58:37.554Z caller=head.go:506 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2021-09-29T10:58:37.556Z caller=head.go:577 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=2
level=info ts=2021-09-29T10:58:37.558Z caller=head.go:577 component=tsdb msg="WAL segment loaded" segment=1 maxSegment=2
level=info ts=2021-09-29T10:58:37.559Z caller=head.go:577 component=tsdb msg="WAL segment loaded" segment=2 maxSegment=2
level=info ts=2021-09-29T10:58:37.559Z caller=head.go:583 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=26.379µs wal_replay_duration=4.976829ms total_replay_duration=5.017724ms
level=info ts=2021-09-29T10:58:37.561Z caller=main.go:849 fs_type=XFS_SUPER_MAGIC
level=info ts=2021-09-29T10:58:37.561Z caller=main.go:852 msg="TSDB started"
level=info ts=2021-09-29T10:58:37.561Z caller=main.go:979 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2021-09-29T10:58:37.602Z caller=main.go:1016 msg="Completed loading of configuration file" filename=prometheus.yml totalDuration=41.166168ms db_storage=1.422µs remote_storage=14.496µs web_handler=531ns query_engine=1.442µs scrape=40.625426ms scrape_sd=84.755µs notify=23.322µs notify_sd=7.193µs rules=2.375µs
level=info ts=2021-09-29T10:58:37.602Z caller=main.go:794 msg="Server is ready to receive web requests."
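The last log line above signals that the server is ready. When Prometheus is started in the background from a script, the same condition can be checked by polling its `/-/ready` endpoint instead of reading the log. The sketch below assumes `curl` is installed; the function name `wait_until_ready` and the 30-second default are illustrative choices only.

```shell
#!/usr/bin/env bash
# wait_until_ready URL [TIMEOUT_SECONDS]
# Poll URL once per second until it answers without an HTTP error,
# or give up after TIMEOUT_SECONDS attempts (default: 30).
wait_until_ready() {
    local url="$1" timeout="${2:-30}" elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # -f: treat HTTP errors (e.g. 503 while starting) as failure
        # -s: silent; -o /dev/null: discard the response body
        if curl -fs -o /dev/null --max-time 2 "$url"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Example (assumes the default listen address 0.0.0.0:9090):
#   nohup ./prometheus > prometheus.log 2>&1 &
#   wait_until_ready http://localhost:9090/-/ready 30 && echo "Prometheus is ready"
```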
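# By default, Prometheus also scrapes itself, so the host it runs on appears as a monitored target without deploying node_exporter (note that these are metrics about the Prometheus process itself, not host-level metrics)
# IP address of this host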
[root@test ~]# ifconfig | awk 'NR==2 {print $2}'
192.168.126.120
In the default configuration, the Prometheus server itself is already set up as a monitored target.
You can see what Prometheus collects about itself with an HTTP GET request:
[root@test ~]# curl localhost:9090/metrics
# Or open http://192.168.126.120:9090/metrics in a browser
# This returns all the metrics currently exposed by this instance, as a large number of key/value metric lines
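Since the /metrics output is long, it often helps to pull out a single value. The sketch below filters the text output with awk; the helper name `metric_value` is hypothetical, and it only handles series without labels (lines of the form `name value`).

```shell
#!/usr/bin/env bash
# metric_value NAME
# Read metrics text from stdin and print the value of the
# label-less series whose name is exactly NAME.
metric_value() {
    local name="$1"
    # "# HELP" and "# TYPE" lines have "#" as their first field, so they never match.
    awk -v name="$name" '$1 == name {print $2}'
}

# Example (assumes Prometheus is listening on localhost:9090):
#   curl -s localhost:9090/metrics | metric_value go_goroutines
```

Series that carry labels, such as `name{label="x"} 1`, are skipped on purpose; matching those would need a more careful parser.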
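Prometheus UI is the visualization and management interface built into Prometheus. Once Prometheus has started successfully, it is available at http://IP:9090. Through it you can easily inspect the current Prometheus configuration, the state of the scrape jobs, and so on. The web console in the Prometheus UI can run and debug any PromQL query, and the Graph panel lets you query monitoring data in real time with PromQL.
You can also use PromQL (Prometheus Query Language) in the web console to look up the value of a specific key among the collected metrics.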
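The same kind of PromQL lookup can also be done from the command line through the HTTP API endpoint `/api/v1/query`. This is a minimal sketch that assumes `curl` is installed; the wrapper name `promql_query` is hypothetical.

```shell
#!/usr/bin/env bash
# promql_query BASE_URL EXPR
# Run an instant PromQL query through the HTTP API and print the JSON response.
promql_query() {
    local base_url="$1" expr="$2"
    # -G sends the url-encoded data as a GET query string; -f fails on HTTP errors.
    curl -fsG --max-time 5 "${base_url}/api/v1/query" --data-urlencode "query=${expr}"
}

# Example (assumes Prometheus is listening on localhost:9090):
#   promql_query http://localhost:9090 'up'
#   promql_query http://localhost:9090 'go_goroutines'
```

Passing the expression through `--data-urlencode` avoids having to escape braces and quotes in the URL by hand.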
View monitored targets
The Prometheus configuration file explained
# The configuration file is in YAML format
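# This section holds Prometheus's global configuration, such as the scrape interval and scrape timeout (these values can usually be overridden by each job's own configuration)
global:
# Default scrape interval; valid units are ms, s, m, h, d, w, y. For example, 15s scrapes every 15 seconds; the default is 1 minute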
[ scrape_interval: <duration> | default = 1m ]
# Default scrape timeout
[ scrape_timeout: <duration> | default = 10s ]
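# Default interval for evaluating rules (how often the rules are evaluated); defaults to 1 minute
# Example: given a rule that raises an alert when memory usage exceeds 70%, Prometheus checks memory usage against that rule once per interval configured here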
[<