004 - Monitoring Linux Servers

Latest recommended article published on 2023-11-19 15:59:58

提笔书几行？

Category: Operations. Tags: server, linux, operations

Article link: https://blog.csdn.net/CChenheng/article/details/129353350

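Environment:

Operating system: CentOS 7 (release 2009)
Make sure the network connection and the yum repositories are working.

Monitoring host	11.0.1.137
Monitored host	11.0.1.134
node_exporter version	v1.5.0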

Steps:

1. How it works

node_exporter exposes the host's metrics at **ip:9100/metrics**, and Prometheus scrapes that endpoint.
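
As a rough illustration of what that endpoint returns: the metrics are plain text, one sample per line, in the form `name value` or `name{labels} value`. The sketch below assumes that format and defines a small helper (the name `metric_value` is made up for this example) that pulls the value of an unlabelled metric such as `node_load1` out of the text.

```shell
#!/usr/bin/env bash
# metric_value NAME: read Prometheus text-format metrics on stdin and
# print the value of the first unlabelled sample called NAME.
# Hypothetical helper for illustration, not part of node_exporter.
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Typical use against the monitored host (requires network access):
#   curl -s http://11.0.1.134:9100/metrics | metric_value node_load1
```

Lines starting with `# HELP` and `# TYPE` are skipped automatically because their first field is `#`, not the metric name.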

2. Install node_exporter on the monitored host

:::tips
Exporter list: https://prometheus.io/docs/instrumenting/exporters
:::

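Getting node_exporter:

Download the package from the official site: https://prometheus.io/download/

Or download the package from GitHub:

1. Open the exporter list
https://prometheus.io/docs/instrumenting/exporters/

2. Follow the link to the GitHub repository
https://github.com/prometheus/node_exporter

3. Download the package from the releases page

Install node_exporter: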

mkdir -p /opt/node-exporter
cd /opt/node-exporter/
wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
tar xzvf node_exporter-1.5.0.linux-amd64.tar.gz
mv node_exporter-1.5.0.linux-amd64 node_exporter-1.5.0

# Test run in the foreground (stop with Ctrl+C)
cd node_exporter-1.5.0/
./node_exporter
.........

ts=2023-02-28T15:07:53.817Z caller=tls_config.go:232 level=info msg="Listening on" address=[::]:9100
ts=2023-02-28T15:07:53.817Z caller=tls_config.go:235 level=info msg="TLS is disabled." http2=false address=[::]:9100

Access test: http://11.0.1.134:9100/

Configure service management (systemd)

# vi /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
[Service]
ExecStart=/opt/node-exporter/node_exporter-1.5.0/node_exporter
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target

# Reload systemd, start the service, and enable it at boot
systemctl daemon-reload
systemctl start node_exporter
systemctl enable node_exporter

3. Configure the monitored host in Prometheus (on the monitoring host)

vi prometheus.yml
# Add the following job under scrape_configs
  - job_name: "exp"
    static_configs:
      - targets: ["11.0.1.134:9100"]

# Check the configuration; output like the following means it is valid
./promtool check config ./prometheus.yml
Checking ./prometheus.yml
 SUCCESS: ./prometheus.yml is valid prometheus config file syntax

# Hot-reload the configuration
## Format: kill -HUP <pid>
ps -ef |grep prometheus
root      10306      1  0 20:30 ?        00:00:05 /opt/monitor/prometheus/prometheus --config.file=/opt/monitor/prometheus/prometheus.yml
root      10709   1931  0 23:19 pts/1    00:00:00 grep --color=auto prometheus
kill -HUP 10306
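
Copying the PID from the `ps` output by hand is easy to get wrong. As a minimal sketch, assuming `pgrep` (from procps) is installed and the process is named exactly `prometheus`, the lookup and the signal can be combined in one helper. The function name `send_hup` is hypothetical.

```shell
#!/usr/bin/env bash
# send_hup NAME: send SIGHUP to every process whose command name is
# exactly NAME, so the PID does not have to be copied from ps output.
# Hypothetical helper; relies on pgrep from procps.
send_hup() {
  local pids
  pids=$(pgrep -x "$1") || { echo "no process named $1" >&2; return 1; }
  # $pids is left unquoted on purpose: it may hold several PIDs.
  kill -HUP $pids
}

# On the monitoring host this would be called as:
#   send_hup prometheus
```

The helper returns a non-zero status when no matching process exists, which makes it safer to use in scripts than a hard-coded PID.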

Page test: confirm in the Prometheus web UI (Status > Targets) that the new target shows as UP.

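4. Display the data in Grafana:

Prerequisite: Prometheus is already added as a data source.

Import a dashboard by JSON or by ID.

Dashboards are available from the official site: https://grafana.com/grafana/dashboards/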

5. Enable HTTP basic authentication on the metrics endpoint

1. On the monitored host

cd /opt/node-exporter/node_exporter-1.5.0/
# Enable HTTP basic authentication:
vi config.yml

basic_auth_users:
  prometheus: $2y$12$8uSetX/PDmYcBOFGRYxBauz8KaCZhHsZz0yf7GWn8DCxVlWMfB5nW
# Format: username: bcrypt hash of the password

## Generate the hash above with the following commands:
yum install httpd-tools -y
htpasswd -nBC 12 '' | tr -d ':\n'
New password:
Re-type new password:
$2y$12$8uSetX/PDmYcBOFGRYxBauz8KaCZhHsZz0yf7GWn8DCxVlWMfB5nW
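
A bcrypt hash has a fixed shape: a `$2a$`, `$2b$` or `$2y$` prefix, a two-digit cost, another `$`, then 53 characters (22 of salt and 31 of digest), for 60 characters in total. A stray or missing character from copy and paste is enough to make every login fail, so it can be worth checking the string before putting it into config.yml. The following is a minimal sketch. The function name `is_bcrypt_hash` is made up for this example, and it only checks the shape of the string, not whether it matches a given password.

```shell
#!/usr/bin/env bash
# is_bcrypt_hash STRING: succeed if STRING has the shape of a bcrypt hash:
# "$2a$", "$2b$" or "$2y$", a two-digit cost, "$", then 53 characters
# from the bcrypt alphabet (60 characters in total).
is_bcrypt_hash() {
  printf '%s\n' "$1" | grep -Eq '^\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}$'
}

hash='$2y$12$8uSetX/PDmYcBOFGRYxBauz8KaCZhHsZz0yf7GWn8DCxVlWMfB5nW'
if is_bcrypt_hash "$hash"; then
  echo "looks like a bcrypt hash"
else
  echo "not a bcrypt hash"
fi
```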

# vi /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
[Service]
ExecStart=/opt/node-exporter/node_exporter-1.5.0/node_exporter --web.config.file=/opt/node-exporter/node_exporter-1.5.0/config.yml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure

[Install]
WantedBy=multi-user.target
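
# Reload systemd and restart the service
systemctl daemon-reload
systemctl restart node_exporter

# Check the service status
systemctl status node_exporter.service

Reload the page, which should now ask for a username and password: http://11.0.1.134:9100/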
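
The same check can be done from the command line. The sketch below assumes `curl` is installed and that the password behind the hash is `123456`, as used in the Prometheus configuration in this article. The function names `describe_code` and `check_auth` are hypothetical.

```shell
#!/usr/bin/env bash
# describe_code CODE: translate the HTTP status code returned by the
# exporter into a short diagnosis. Hypothetical helper for illustration.
describe_code() {
  case "$1" in
    200) echo "ok: credentials accepted" ;;
    401) echo "unauthorized: missing or wrong credentials" ;;
    000) echo "no response: exporter unreachable" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# check_auth URL [USER:PASSWORD]: request URL with curl and describe the result.
# curl prints 000 as the status code when it cannot connect at all.
check_auth() {
  local code
  if [ -n "${2:-}" ]; then
    code=$(curl -s -o /dev/null -w '%{http_code}' -u "$2" "$1")
  else
    code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
  fi
  describe_code "$code"
}

# Example (requires network access to the monitored host):
#   check_auth http://11.0.1.134:9100/metrics                    # expected: unauthorized
#   check_auth http://11.0.1.134:9100/metrics prometheus:123456  # expected: ok
```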

2. On the monitoring host

vi prometheus.yml

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "exp"
    basic_auth:
      username: prometheus
      password: "123456" # the plain-text password that was hashed with htpasswd
    static_configs:
      - targets: ["11.0.1.134:9100"]

Check the configuration and the scrape status

./promtool check config ./prometheus.yml

Checking ./prometheus.yml
 SUCCESS: ./prometheus.yml is valid prometheus config file syntax


# Hot-reload the configuration
## Format: kill -HUP <pid>
ps -ef |grep prometheus
root      10803      1  0 Feb28 ?        00:00:04 /opt/monitor/prometheus/prometheus --config.file=/opt/monitor/prometheus/prometheus.yml
kill -HUP 10803

6. Monitor system services

Configure on the monitored host

# vi /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
[Service]
ExecStart=/opt/node-exporter/node_exporter-1.5.0/node_exporter --web.config.file=/opt/node-exporter/node_exporter-1.5.0/config.yml --collector.systemd --collector.systemd.unit-include=(sshd|network).service
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure

[Install]
WantedBy=multi-user.target



# Reload systemd and restart the service
systemctl daemon-reload
systemctl restart node_exporter

# Check that the service is running normally
systemctl status node_exporter.service
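
Once the systemd collector is enabled, the exporter should report one `node_systemd_unit_state` sample per unit and state, with the value 1 for the state the unit is currently in. As a minimal sketch, assuming that metric name and its `name` and `state` labels, the helper below (the name `unit_is_active` is made up for this example) checks the metrics text for an active unit.

```shell
#!/usr/bin/env bash
# unit_is_active UNIT: read metrics text on stdin and succeed if the
# systemd collector reports UNIT in state "active" with value 1.
# Hypothetical helper; UNIT is used as part of a regular expression.
unit_is_active() {
  grep -Eq "^node_systemd_unit_state\{[^}]*name=\"$1\"[^}]*state=\"active\"[^}]*\} 1$"
}

# Example (requires network access; credentials as configured in step 5):
#   curl -s -u prometheus:123456 http://11.0.1.134:9100/metrics \
#     | unit_is_active sshd.service && echo "sshd is active"
```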