Deploying Prometheus + Grafana Monitoring

Prometheus

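  • Prometheus Server: scrapes metrics, stores them as time series data, and provides a query interface
  • Client Library: client libraries for instrumenting application code
  • Push Gateway: holds metrics for a short time; mainly used by temporary, short-lived jobs
  • Exporters: collect metrics from existing third-party services and expose them as metrics endpoints
  • Web UI: a simple web console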

System environment

CentOS Linux release 7.6.1810 (Core)

Extract

# Download: https://prometheus.io/download/
tar xf prometheus-2.34.0.linux-amd64.tar.gz

Edit the configuration file

# Rename to "prometheus" so the path matches the systemd unit below (/home/wxw/prometheus)
mv prometheus-2.34.0.linux-amd64 prometheus
cd prometheus && mkdir data
#====== prometheus.yml ======#
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  
alerting:
  alertmanagers:
  - static_configs:
    - targets:
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

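Register as a systemd service

# Writing under /usr/lib/systemd requires root; the quoted 'EOF' keeps the backslashes literal
sudo tee /usr/lib/systemd/system/prometheus.service > /dev/null << 'EOF'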
[Unit]
Description=The Prometheus 2 monitoring system and time series database.
Documentation=https://prometheus.io
After=network.target
[Service]
User=wxw
ExecStart=/home/wxw/prometheus/prometheus \
--storage.tsdb.path=/home/wxw/prometheus/data \
--config.file=/home/wxw/prometheus/prometheus.yml
Restart=on-failure
StartLimitInterval=1
RestartSec=3
[Install]
WantedBy=multi-user.target
EOF

Back up the configuration file

cp prometheus.yml prometheus.yml.bak

Synchronize the clock

Prometheus depends on accurate time; clock drift can lead to unexpected query results.

sudo yum -y install ntp
sudo ntpdate time1.aliyun.com

Start

sudo systemctl daemon-reload
sudo systemctl start prometheus
# nohup ./prometheus --config.file=prometheus.yml &

Access

sudo firewall-cmd --zone=public --permanent --add-port=9090/tcp
sudo firewall-cmd --reload
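
As a quick check after opening the port, the sketch below polls the Prometheus /-/ready endpoint until it answers. The function name wait_for_prometheus, the default URL, and the attempt count are illustrative assumptions and not part of the original setup.

```shell
# Poll the Prometheus readiness endpoint until it answers or we give up.
# $1: base URL (assumed default: http://localhost:9090)
# $2: maximum number of attempts (assumed default: 10)
wait_for_prometheus() {
  local base_url="${1:-http://localhost:9090}"
  local attempts="${2:-10}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # /-/ready answers with HTTP 200 once Prometheus is ready to serve traffic
    if curl --silent --fail --max-time 2 --output /dev/null "${base_url}/-/ready"; then
      echo "Prometheus is ready at ${base_url}"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Prometheus did not become ready at ${base_url}" >&2
  return 1
}
```

Calling wait_for_prometheus with no arguments on the server checks the local instance; passing http://<server-ip>:9090 from another machine also confirms that the firewall rule took effect.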

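node_exporter for host monitoring

tar xf node_exporter-1.3.1.linux-amd64.tar.gz
# Rename to "node_exporter" so the path matches the systemd unit below (/home/wxw/node_exporter)
mv node_exporter-1.3.1.linux-amd64 node_exporter
cd node_exporter
# Add the systemd service with the following content
sudo vim /usr/lib/systemd/system/node_exporter.service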
[Unit]
Description=node_exporter
[Service]
User=wxw
ExecStart=/home/wxw/node_exporter/node_exporter \
--web.disable-exporter-metrics \
--log.level=error
[Install]
WantedBy=multi-user.target
# Start
sudo systemctl daemon-reload
sudo systemctl start node_exporter
# nohup ./node_exporter --web.listen-address=":9100" --web.disable-exporter-metrics &

Configure the scrape job on the server

vim prometheus.yml

  - job_name: 'host-monitor'
    scrape_interval: 10s
    static_configs:
      - targets: ['192.168.3.201:9100']
        labels:
          instance: node1
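
Before restarting Prometheus, it can help to confirm that the exporter is actually serving metrics. The helper below is a minimal sketch: it reads Prometheus text-format output on stdin and prints the value of one unlabeled metric. The name metric_value is hypothetical, and the helper does not handle metrics that carry labels.

```shell
# Print the value of an unlabeled metric from Prometheus text-format input on stdin.
# $1: exact metric name, for example node_load1
metric_value() {
  # HELP/TYPE comment lines start with "#", so their first field never matches the name
  awk -v name="$1" '$1 == name { print $2; exit }'
}
```

For example, piping the output of curl -s http://192.168.3.201:9100/metrics into metric_value node_load1 should print the host's one-minute load average if the exporter is reachable.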

Restart Prometheus

sudo systemctl restart prometheus

Grafana

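Configure the yum repository and install

# Writing under /etc/yum.repos.d requires root
sudo tee /etc/yum.repos.d/grafana.repo > /dev/null << 'EOF'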
[grafana]
name=grafana
baseurl=https://mirrors.aliyun.com/grafana/yum/rpm
repo_gpgcheck=0
enabled=1
gpgcheck=0
EOF
sudo yum makecache
sudo yum install grafana

Start

sudo systemctl enable --now grafana-server
# nohup ./grafana-server &
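
Grafana still needs to be told where Prometheus lives. One way, sketched here as an assumption and not as a step from the original walkthrough, is a data source provisioning file. The function name prometheus_datasource_yaml and the default URL are illustrative.

```shell
# Print a Grafana data source provisioning file that points at Prometheus.
# $1: Prometheus URL as seen from the Grafana host (assumed default: http://localhost:9090)
prometheus_datasource_yaml() {
  local prom_url="${1:-http://localhost:9090}"
  cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${prom_url}
    isDefault: true
EOF
}
```

With an RPM install, the output would typically be written with prometheus_datasource_yaml | sudo tee /etc/grafana/provisioning/datasources/prometheus.yaml > /dev/null, followed by sudo systemctl restart grafana-server. Grafana listens on port 3000 by default, so that port also has to be opened in the firewall for remote access.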

Alertmanager

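Install

tar xf alertmanager-0.24.0.linux-amd64.tar.gz
# Rename to "alertmanager" so the path matches the systemd unit below (/home/wxw/alertmanager)
mv alertmanager-0.24.0.linux-amd64 alertmanager
cd alertmanager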

Email alerts

Back up the original file
cp alertmanager.yml alertmanager.yml.bak
Configure alertmanager.yml
#====== alertmanager.yml ======#
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'wangxiwen95@163.com'
  smtp_auth_username: 'wangxiwen95@163.com'
  smtp_auth_password: 'HEWFSVKTHHBUIAZR'
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: 'wangxiwen95@163.com'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
Check the configuration file
$ ./amtool check-config alertmanager.yml
Checking 'alertmanager.yml'  SUCCESS
Found:
 - global config
 - route
 - 1 inhibit rules
 - 1 receivers
 - 0 templates
Start
mkdir ./data
sudo vim /usr/lib/systemd/system/alertmanager.service
# Unit file content; --storage.path points at the data directory created above
[Unit]
Description=alertmanager System
Documentation=https://prometheus.io/docs/alerting/latest/alertmanager/
[Service]
User=wxw
ExecStart=/home/wxw/alertmanager/alertmanager \
--config.file=/home/wxw/alertmanager/alertmanager.yml \
--storage.path=/home/wxw/alertmanager/data
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl start alertmanager
# ./alertmanager --config.file=alertmanager.yml &
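
Once Alertmanager is running, the receiver can be exercised without waiting for a real outage by posting a hand-made alert to its v2 API. This is a minimal sketch: the function names, the TestAlert label values, and the default localhost:9093 address are assumptions chosen for illustration.

```shell
# Build a one-element JSON array in the shape accepted by POST /api/v2/alerts.
# All label and annotation values are made up for the test.
# $1: value for the instance label (assumed default: node1)
test_alert_payload() {
  local instance="${1:-node1}"
  printf '[{"labels":{"alertname":"TestAlert","severity":"warning","instance":"%s"},"annotations":{"summary":"Manual test alert for %s"}}]\n' "$instance" "$instance"
}

# POST the payload to Alertmanager.
# $1: Alertmanager base URL (assumed default: http://localhost:9093)
# $2: value for the instance label
send_test_alert() {
  local am_url="${1:-http://localhost:9093}"
  local instance="${2:-node1}"
  test_alert_payload "$instance" |
    curl --silent --show-error --fail --max-time 5 \
      -H 'Content-Type: application/json' \
      --data-binary @- "${am_url}/api/v2/alerts"
}
```

Running send_test_alert on the Alertmanager host should produce a notification once group_wait has elapsed, provided the SMTP settings are valid.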
Configure prometheus.yml
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

# Alert rule files
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "rules/*.yml"
Write the rules

cd prometheus && mkdir rules && cd rules

#====== host_monitor.yml ======#
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up == 0
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 15 seconds"
Check the configuration file
$ cd .. && ./promtool check config prometheus.yml
Checking prometheus.yml
  SUCCESS: 1 rule files found
 SUCCESS: prometheus.yml is valid prometheus config file syntax

Checking rules/host_monitor.yml
  SUCCESS: 1 rules found
Improve the alert template
Create a template file

cd alertmanager && vim email.tmpl

{{ define "email.to.html" }}
{{ if gt (len .Alerts.Firing) 0 }}{{ range .Alerts.Firing }}
@Alert
Alerting program: prometheus_alert <br>
Severity: {{ .Labels.severity }}<br>
Alert type: {{ .Labels.alertname }} <br>
Affected host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }} <br>
Triggered at: {{ .StartsAt }} <br>
{{ end }}
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}{{ range .Alerts.Resolved }}
@Resolved:
Affected host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Resolved at: {{ .EndsAt }} <br>
{{ end }}
{{ end }}
{{ end }}
Update the configuration to use the template
#====== alertmanager.yml ======#
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'wangxiwen95@163.com'
  smtp_auth_username: 'wangxiwen95@163.com'
  smtp_auth_password: 'HEWFSVKTHHBUIAZR'
  smtp_require_tls: false

# Enable templates
templates:
  - '/home/wxw/alertmanager/email.tmpl'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: 'wangxiwen95@163.com'
    html: '{{ template "email.to.html" . }}'  # send using the template
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
Check the configuration and restart
$ ./amtool check-config alertmanager.yml
$ sudo systemctl restart alertmanager

Enterprise WeChat (企业微信) alerts

Modify alertmanager.yml
#====== alertmanager.yml ======#
global:
  resolve_timeout: 5m

# Enable templates
templates:
  - '/home/wxw/alertmanager/wechat.tmpl'

# Enterprise WeChat alerts
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'wechat'
receivers:
- name: 'wechat'
  wechat_configs:
  - corp_id: 'wwa274450828cc9189'
    # to_party: '1'
    to_user: 'WangXiWen'
    agent_id: '1000002'
    api_secret: 'yTaolZ_bwq0sRc6YeSD_qEcGM4RFh8O12DnphNjy26Y'
    send_resolved: true
    message: '{{ template "wechat.tmpl" . }}'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
Modify the template

vim wechat.tmpl

{
   