Purpose
Run node_exporter on every physical host to collect that host's current system-state metrics.
Deploy Prometheus to collect the monitoring state of the physical hosts.
Data is collected only by Prometheus pulling from node_exporter.
Use M3DB as the remote storage backend for Prometheus metrics.
Software downloads
node_exporter-0.18.1.linux-amd64.tar.gz
prometheus-2.10.0.linux-amd64.tar.gz
Other Prometheus component downloads
grafana-6.2.3
node_exporter installation
Unless you have special requirements, simply extract the tarball and run the node_exporter binary.
By default node_exporter listens on port 9100.
You can verify that node_exporter is working with curl http://host_ip:9100/metrics.
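The check above can be scripted as a small sketch; "host_ip" is a placeholder from the text and must be replaced with the real host address:

```shell
#!/bin/sh
# Sketch: verify a node_exporter instance is serving metrics.
# HOST is a placeholder -- substitute the monitored machine's real IP.
HOST="host_ip"
URL="http://${HOST}:9100/metrics"

# -s: silent, -f: fail on HTTP errors, --max-time: don't hang on a dead host.
# A healthy node_exporter returns Prometheus text-format metrics whose names
# start with "node_" (e.g. node_cpu_seconds_total).
if curl -sf --max-time 5 "$URL" | grep -q '^node_'; then
    echo "node_exporter OK on $HOST"
else
    echo "node_exporter NOT reachable on $HOST"
fi
```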
node-exporter.service
[Unit]
Description=prometheus node exporter
After=syslog.target network.target
[Service]
Type=simple
KillSignal=SIGINT
ExecStart=/apps/svr/node_exporter/node_exporter
ExecStop=/bin/kill -s SIGINT $MAINPID
RemainAfterExit=yes
PIDFile=/run/node_exporter.pid
[Install]
WantedBy=multi-user.target
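To install the unit above, a sketch like the following can be used; OUT defaults to the current directory for a dry run, and on a real host it should be /etc/systemd/system (with the systemctl commands run as root):

```shell
#!/bin/sh
# Sketch: write out the node-exporter.service unit shown above.
# OUT defaults to "." for staging; on a real host use OUT=/etc/systemd/system.
OUT="${OUT:-.}"

# Quoted heredoc ('EOF') so $MAINPID is written literally, not expanded.
cat > "$OUT/node-exporter.service" <<'EOF'
[Unit]
Description=prometheus node exporter
After=syslog.target network.target

[Service]
Type=simple
KillSignal=SIGINT
ExecStart=/apps/svr/node_exporter/node_exporter
ExecStop=/bin/kill -s SIGINT $MAINPID
RemainAfterExit=yes
PIDFile=/run/node_exporter.pid

[Install]
WantedBy=multi-user.target
EOF

# Then activate it (as root on the real host):
#   systemctl daemon-reload
#   systemctl enable --now node-exporter
```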
Prometheus configuration
Reference prometheus.yml:
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['prometheus_ip:9090']  # address of this Prometheus instance
  - job_name: 'hypervisor_host1'         # host1 to be monitored
    static_configs:
      - targets: ['hypervisor_ip1:9100']
  - job_name: 'hypervisor_host2'         # host2 to be monitored
    static_configs:
      - targets: ['hypervisor_ip2:9100']
  - job_name: 'm3db-node'                # the M3DB cluster to be monitored
    static_configs:
      - targets: ['m3db-host1_ip:9400','m3db-host2_ip:9400','m3db-host3_ip:9400']

# Prometheus reads data back from m3query
remote_read:
  - url: "http://m3query:port/api/v1/prom/remote/read"
    read_recent: true

# Prometheus writes its data to m3coordinator
remote_write:
  - url: "http://m3coordinator:port/api/v1/prom/remote/write"
By default Prometheus listens on port 9090.
Startup command:
./prometheus --log.level=debug --config.file=prometheus.yml
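Before and after starting, the setup can be sanity-checked with a sketch like this; PROM_DIR assumes the install path used in the service file below, and the curl falls through harmlessly if Prometheus is not yet up:

```shell
#!/bin/sh
# Sketch: sanity checks around starting Prometheus.
# PROM_DIR is an assumed install path -- adjust to your layout.
PROM_DIR="${PROM_DIR:-/apps/svr/prometheus}"

# 1. Validate prometheus.yml with promtool, which ships in the same tarball
#    as the prometheus binary.
if [ -x "$PROM_DIR/promtool" ]; then
    "$PROM_DIR/promtool" check config "$PROM_DIR/prometheus.yml"
else
    echo "promtool not found in $PROM_DIR (skipping config check)"
fi

# 2. Once running, Prometheus answers health probes on port 9090:
curl -sf --max-time 5 http://localhost:9090/-/healthy || echo "Prometheus not (yet) reachable"
```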
prometheus.service
[Unit]
Description=prometheus service
After=syslog.target network.target
[Service]
Type=simple
KillSignal=SIGINT
ExecStart=/apps/svr/prometheus/prometheus --config.file /apps/svr/prometheus/prometheus.yml --query.max-concurrency=300
ExecStop=/bin/kill -s SIGINT $MAINPID
RemainAfterExit=yes
PIDFile=/run/prometheus.pid
LimitNOFILE=6000000
[Install]
WantedBy=multi-user.target
grafana
Change the Grafana listening port in conf/defaults.ini:
http_port = 80
Unless you have special requirements, start it directly:
bin/grafana-server -config conf/defaults.ini
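Once Grafana is up, Prometheus can be registered as a data source through Grafana's HTTP API instead of the UI. A sketch, assuming the default admin:admin login and the "prometheus_ip" placeholder from the config above:

```shell
#!/bin/sh
# Sketch: add Prometheus as a Grafana data source via the HTTP API.
# GRAFANA and the URL inside PAYLOAD are placeholders -- adjust to your hosts.
GRAFANA="http://localhost:80"
PAYLOAD='{"name":"Prometheus","type":"prometheus","url":"http://prometheus_ip:9090","access":"proxy","isDefault":true}'

# POST to /api/datasources; falls through with a message if Grafana is not up.
curl -sf --max-time 5 -u admin:admin \
     -H 'Content-Type: application/json' \
     -d "$PAYLOAD" \
     "$GRAFANA/api/datasources" || echo "Grafana not reachable at $GRAFANA"
```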
grafana.service
[Unit]
Description=Grafana instance
Documentation=http://docs.grafana.org
Wants=network-online.target
After=network-online.target
[Service]
Type=notify
Restart=on-failure
WorkingDirectory=/apps/svr/grafana/
RuntimeDirectory=grafana
RuntimeDirectoryMode=0750
#ExecStart="/apps/svr/grafana/bin/grafana-server -config=/apps/svr/grafana/conf/defaults.ini -pidfile=/tmp/grafana-server.pid"
ExecStart=/bin/bash -c "/apps/svr/grafana-6.1.6/bin/grafana-server -homepath=/apps/svr/grafana-6.1.6/ -config=/apps/svr/grafana-6.1.6/conf/defaults.ini -pidfile=/apps/svr/grafana-6.1.6/tmp/grafana-server.pid"
LimitNOFILE=10000
TimeoutStopSec=20
[Install]
WantedBy=multi-user.target