Deploying Prometheus
Download and install
Download URL
https://prometheus.io/download/
Create the prometheus user
useradd prometheus -s /sbin/nologin
Create the Prometheus data directory
mkdir -p /data/prometheus
chown prometheus:prometheus /data/prometheus
Extract and deploy
tar -zxf prometheus-2.25.0.linux-amd64.tar.gz
mv prometheus-2.25.0.linux-amd64 /usr/local/prometheus
Configuration file
vim /usr/local/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['192.168.2.150:9090']
  - job_name: 'node'
    static_configs:
      - targets:
          - '192.168.2.151:9100'
          - '192.168.2.152:9100'
Configuration notes
scrape_interval: how often targets are scraped. evaluation_interval: how often alerting/recording rules are evaluated.
Create the systemd service unit
cat > /usr/lib/systemd/system/prometheus.service << 'EOF'
[Unit]
Description=Prometheus
After=network.target
[Service]
Type=simple
Environment="GOMAXPROCS=4"
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
ExecStart=/usr/local/prometheus/prometheus \
--config.file=/usr/local/prometheus/prometheus.yml \
--storage.tsdb.path=/data/prometheus \
--storage.tsdb.retention=30d \
--web.console.libraries=/usr/local/prometheus/console_libraries \
--web.console.templates=/usr/local/prometheus/consoles \
--web.listen-address=0.0.0.0:9090 \
--web.read-timeout=5m \
--web.max-connections=10 \
--query.max-concurrency=20 \
--query.timeout=2m \
--web.enable-lifecycle
PrivateTmp=true
PrivateDevices=true
ProtectHome=true
NoNewPrivileges=true
LimitNOFILE=infinity
ReadWriteDirectories=/data/prometheus
ProtectSystem=full
SyslogIdentifier=prometheus
Restart=always
[Install]
WantedBy=multi-user.target
EOF
Startup flag notes
- web.read-timeout: maximum time to wait before closing an idle request (keeps idle connections from tying up resources)
- web.max-connections: maximum number of simultaneous connections (caps the connections used when serving data, avoiding wasted resources)
- storage.tsdb.retention: how long monitoring data is kept (around 15 days is typical in production)
- storage.tsdb.path: where monitoring data is stored (defaults to a data/ directory under the current working directory if unset)
- query.timeout: user-facing limit that force-terminates oversized slow queries
- query.max-concurrency: user-facing limit on the number of concurrent queries
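Because the unit file passes --web.enable-lifecycle, a running server can reload prometheus.yml over HTTP instead of being restarted. A minimal sketch, using the listen address configured above (the final curl needs a running server, so it is left commented out):

```shell
# Build the reload endpoint from the listen address configured above.
PROM_ADDR="192.168.2.150:9090"
RELOAD_URL="http://${PROM_ADDR}/-/reload"
echo "reload endpoint: ${RELOAD_URL}"
# Trigger the reload against a running server:
# curl -X POST "${RELOAD_URL}"
```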
Start Prometheus
systemctl daemon-reload
systemctl start prometheus
systemctl enable prometheus
Starting from the command line
/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus --storage.tsdb.retention=30d --web.console.libraries=/usr/local/prometheus/console_libraries --web.console.templates=/usr/local/prometheus/consoles --web.listen-address=0.0.0.0:9090 --web.read-timeout=5m --web.max-connections=10 --query.max-concurrency=20 --query.timeout=2m --web.enable-lifecycle
node-exporter
https://prometheus.io/
Extract and deploy
tar -zxf node_exporter-1.1.2.linux-amd64.tar.gz
mv node_exporter-1.1.2.linux-amd64 /usr/local/node_exporter
Create the systemd service unit
cat > /usr/lib/systemd/system/node-exporter.service << 'EOF'
[Unit]
Description=This is prometheus node exporter
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/node_exporter/node_exporter
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
Start node_exporter
systemctl daemon-reload
systemctl start node-exporter
systemctl enable node-exporter
Calculation methods
Metric types
counter: a monotonically increasing value, e.g. cumulative CPU time, accumulated uptime, or total request count
gauge: a value with no monotonic pattern that can go up or down, e.g. bandwidth or request volume sampled at a point in time
Common functions
rate() function
rate() is made for counter metrics: given a time range, it returns the counter's average per-second increase over that range.
rate(node_cpu_seconds_total{mode="user"}[5m])
increase() function
increase() also targets continuously growing counters; it returns the total increase over the given time range.
increase(node_cpu_seconds_total[1m])
Difference between rate() and increase()
rate(): per-second increase over a range. increase(): total increase over a range.
When to use which
rate() suits high scrape frequencies; increase() suits low scrape frequencies.
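The relationship between the two can be checked with toy numbers: over a window of W seconds, rate() is simply increase() divided by W. A small sketch with made-up counter samples:

```shell
# Hypothetical counter samples at the start and end of a 5m (300s) window.
start=1000
end=1600
window=300
increase=$((end - start))                   # what increase() would report
rate=$(awk -v d="$increase" -v w="$window" 'BEGIN{printf "%.1f", d/w}')
echo "increase=${increase} rate=${rate}/s"  # increase=600 rate=2.0/s
```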
sum() function
Wrapping an expression in sum() adds all the matching series together.
by (instance)
Used together with sum() to split the aggregated result back out by host.
sum(increase(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)
instance splits by host; you can also split with by (cluster_name).
node_exporter only labels series by host (instance).
To split by cluster_name, you need to define a custom label.
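A hypothetical sketch of such a custom label: static_configs entries in prometheus.yml accept a labels block, so a cluster_name label (value made up here) can be attached to each target and then used with by (cluster_name):

```yaml
  - job_name: 'node'
    static_configs:
      - targets:
          - '192.168.2.151:9100'
          - '192.168.2.152:9100'
        labels:
          cluster_name: 'web'   # hypothetical cluster label
```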
topk() function
Returns the top N series by value.
topk(3, count_netstat_wait_connections)
topk() can be used on gauge metrics directly, and on counter metrics as well (though the counter must be wrapped in rate() or similar for the result to be meaningful).
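As a shell analogy for what topk(3, …) keeps from a set of sample values:

```shell
# Keep the three largest of five sample values, mimicking topk(3, ...).
printf '%s\n' 5 9 1 7 3 | sort -nr | head -3
```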
node_exporter query examples:
CPU usage
(1-((sum(increase(node_cpu_seconds_total{mode="idle"}[5m])) by (instance))/ (sum(increase(node_cpu_seconds_total[5m])) by (instance))))*100
Breakdown:
node_cpu_seconds_total: cumulative CPU time in seconds; mode="idle": idle CPU time; by (instance): split per host. In words: (1 - (sum of idle CPU time) / (sum of total CPU time)) * 100
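Plugging hypothetical numbers into that formula: if one host accumulated 270 idle CPU-seconds out of 300 total over the window, usage is (1 - 270/300) * 100 = 10%:

```shell
# Hypothetical 5m increases, in CPU-seconds, for a single instance.
idle=270
total=300
usage=$(awk -v i="$idle" -v t="$total" 'BEGIN{printf "%.0f", (1 - i/t) * 100}')
echo "cpu usage: ${usage}%"   # cpu usage: 10%
```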
Percentage of CPU time spent waiting on disk I/O
((sum(increase(node_cpu_seconds_total{mode="iowait"}[5m])) by (instance))/(sum(increase(node_cpu_seconds_total[5m])) by (instance)))*100
pushgateway
Download
https://prometheus.io/
Extract and deploy
tar -zxf pushgateway-1.4.0.linux-amd64.tar.gz
mv pushgateway-1.4.0.linux-amd64 /usr/local/pushgateway
Create the systemd service unit
cat > /usr/lib/systemd/system/pushgateway.service << 'EOF'
[Unit]
Description=This is prometheus pushgateway
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/pushgateway/pushgateway
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
Start pushgateway
systemctl daemon-reload
systemctl start pushgateway
systemctl enable pushgateway
Sample script
#!/bin/bash
instance_name=$(hostname -f | cut -d'.' -f1)
if [ "${instance_name}" == "localhost" ]; then
  echo "Must be an FQDN hostname"
  exit 1
fi
label="count_netstat_wait_connections"
count_netstat_wait_connections=$(netstat -antp | grep -i wait | wc -l)
echo "$label: ${count_netstat_wait_connections}"
echo "$label ${count_netstat_wait_connections}" | curl --data-binary @- http://192.168.2.150:9091/metrics/job/pushgateway1/instance/${instance_name}
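The push body is just Prometheus text exposition format, so several samples can go in one request. A sketch with made-up values (the second metric name, count_netstat_estab_connections, is hypothetical, introduced only for illustration):

```shell
# Assemble a multi-metric payload, one "<metric> <value>" pair per line.
PUSH_URL="http://192.168.2.150:9091/metrics/job/pushgateway1/instance/$(hostname -s)"
payload=$(printf '%s\n' \
  "count_netstat_wait_connections 12" \
  "count_netstat_estab_connections 34")   # hypothetical sample values
echo "$payload"
# Push it to a running Pushgateway:
# echo "$payload" | curl --data-binary @- "$PUSH_URL"
```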
Add the pushgateway job to prometheus.yml
  - job_name: 'pushgateway'
    static_configs:
      - targets:
          - '192.168.2.150:9091'
Deploying Grafana
Download URL
https://grafana.com/grafana/download
Extract and install
tar -zxf grafana-7.4.3.linux-amd64.tar.gz
mv grafana-7.4.3 /usr/local/grafana
Create the Grafana configuration file
cd /usr/local/grafana/conf; cp defaults.ini grafana.ini
Create the required directories
mkdir -pv /data/grafana/data
mkdir -pv /data/grafana/log
Edit the configuration file
[paths]
data = /data/grafana/data
temp_data_lifetime = 24h
logs = /data/grafana/log
plugins = /data/grafana/plugins
provisioning = conf/provisioning
[server]
protocol = http
http_addr =
http_port = 3000
domain = 192.168.2.150
[smtp]
enabled = true
host = smtp.sina.com:465
user = wangshui898@sina.com
password = <mailbox authorization code>
cert_file =
key_file =
skip_verify = false
from_address = wangshui898@sina.com
from_name = Grafana
ehlo_identity =
startTLS_policy =
Create the systemd service unit
cat > /usr/lib/systemd/system/grafana-server.service << 'EOF'
[Unit]
Description=This is grafana-server
After=network.target
[Service]
Type=simple
WorkingDirectory=/usr/local/grafana
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
ExecStart=/usr/local/grafana/bin/grafana-server --config=/usr/local/grafana/conf/grafana.ini --pidfile=/data/grafana/log/grafana-server.pid
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
Start Grafana
systemctl daemon-reload
systemctl start grafana-server
systemctl enable grafana-server
Login URL
<server IP>:3000
Default username / password:
admin
admin
Alerting configuration
Useful plugins
./grafana-cli plugins install grafana-piechart-panel
Common monitoring queries
CPU usage [1m]
(1-((sum(increase(node_cpu_seconds_total{mode="idle"}[1m])) by (instance))/ (sum(increase(node_cpu_seconds_total[1m])) by (instance))))*100
Breakdown:
(1 - (sum of idle CPU time) / (sum of total CPU time)) * 100
CPU disk I/O wait load [1m]
((sum(increase(node_cpu_seconds_total{mode="iowait"}[1m])) by (instance))/(sum(increase(node_cpu_seconds_total[1m])) by (instance)))*100
Breakdown:
(sum of iowait CPU time) / (sum of total CPU time) * 100
Memory usage
(1-((node_memory_Buffers_bytes + node_memory_Cached_bytes + node_memory_MemFree_bytes)/node_memory_MemTotal_bytes))*100
Available memory = free memory + buffers + cached
buffers and cached are occupied, but can be released quickly when new allocations arrive
Memory usage = used memory / total memory, i.e. (1 - available / total) * 100
Since CentOS 7, the kernel's MemAvailable field reports the actually available memory directly
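Plugging hypothetical numbers into the formula above (think of the units as MB): total=4000, free=1000, buffers=500, cached=500 gives available=2000 and usage (1 - 2000/4000) * 100 = 50%:

```shell
# Hypothetical values standing in for the node_memory_* metrics above.
total=4000; free=1000; buffers=500; cached=500
used_pct=$(awk -v t="$total" -v f="$free" -v b="$buffers" -v c="$cached" \
  'BEGIN{printf "%.0f", (1 - (f + b + c) / t) * 100}')
echo "memory usage: ${used_pct}%"   # memory usage: 50%
```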
Disk capacity usage
Method 1:
(1-(node_filesystem_free_bytes/node_filesystem_size_bytes))*100
Breakdown:
(1 - free space / total capacity) * 100
Disk read/write throughput [1m]
((rate(node_disk_read_bytes_total[1m])+rate(node_disk_written_bytes_total[1m]))/1024/1024)>0
Dividing by 1024 twice converts bytes/s to MB/s
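For example, a hypothetical reading of 5242880 bytes/s divided by 1024 twice is exactly 5 MB/s:

```shell
# Convert a hypothetical bytes-per-second reading to MB/s.
bps=5242880
mbps=$(awk -v b="$bps" 'BEGIN{printf "%.0f", b/1024/1024}')
echo "${mbps} MB/s"   # 5 MB/s
```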
Network bandwidth
rate(node_network_transmit_bytes_total[1m])
TCP connections in WAIT states [1m]
Use pushgateway with the sample script shown in the pushgateway section above.
Query:
count_netstat_wait_connections
File descriptor usage
(node_filefd_allocated/node_filefd_maximum) * 100
Network connectivity monitoring
Via pushgateway
Script
#!/bin/bash
instance_name=$(hostname -f | cut -d'.' -f1)
if [ "${instance_name}" == "localhost" ]; then
  echo "Must be an FQDN hostname"
  exit 1
fi
# Run ping once and reuse its summary line, which looks like:
# "100 packets transmitted, 98 received, 2% packet loss, time 403ms"
ping_out=$(timeout 3 ping -q -A -s 500 -W 1000 -c 100 192.168.2.150 | grep transmitted)
lspk=$(echo "$ping_out" | awk '{print $6}')
rrt=$(echo "$ping_out" | awk '{print $10}')
value_lspk=$(echo "$lspk" | sed "s/%//g")
value_rrt=$(echo "$rrt" | sed "s/ms//g")
echo "lost_packet: ${value_lspk}"
echo "lost_packet ${value_lspk}" | curl --data-binary @- http://192.168.2.150:9091/metrics/job/pushgateway1/instance/${instance_name}
echo "rrt: ${value_rrt}"
echo "rrt ${value_rrt}" | curl --data-binary @- http://192.168.2.150:9091/metrics/job/pushgateway1/instance/${instance_name}
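The awk field numbers in the script map onto ping's summary line like this (the sample line below is made up; field 6 is the loss percentage and field 10 the total run time):

```shell
# A sample ping(8) summary line and the fields the script extracts.
line="100 packets transmitted, 98 received, 2% packet loss, time 403ms"
loss=$(echo "$line" | awk '{print $6}' | sed 's/%//g')
total_time=$(echo "$line" | awk '{print $10}' | sed 's/ms//g')
echo "loss=${loss}% total_time=${total_time}ms"   # loss=2% total_time=403ms
```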
Disk space usage percentage
((node_filesystem_size_bytes{mountpoint="/"}-node_filesystem_free_bytes{mountpoint="/"})/node_filesystem_size_bytes{mountpoint="/"})*100
Disk space usage amount
- Pie chart
Used capacity
(node_filesystem_size_bytes{mountpoint="/",fstype="xfs",instance="192.168.2.151:9100"}-node_filesystem_free_bytes{mountpoint="/",fstype="xfs",instance="192.168.2.151:9100"})/1024/1024/1024
Free capacity
(node_filesystem_free_bytes{mountpoint="/",fstype="xfs",instance="192.168.2.151:9100"})/1024/1024/1024
Total disk capacity
(node_filesystem_size_bytes{mountpoint="/",fstype="xfs",instance="192.168.2.151:9100"})/1024/1024/1024
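The node_filesystem_* metrics are in bytes, so the three /1024 divisions above yield GiB; for instance a hypothetical 53687091200-byte filesystem is exactly 50 GiB:

```shell
# Convert a hypothetical filesystem size in bytes to GiB.
size_bytes=53687091200
gib=$(awk -v b="$size_bytes" 'BEGIN{printf "%.0f", b/1024/1024/1024}')
echo "total: ${gib} GiB"   # total: 50 GiB
```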