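Prometheus Monitoring
Introduction
Prometheus is an open-source monitoring and alerting system and time-series database (TSDB), originally developed at SoundCloud. It is written in Go and was inspired by Google's Borgmon monitoring system.
Installation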
wget https://github.com/prometheus/prometheus/releases/download/v2.18.1/prometheus-2.18.1.linux-amd64.tar.gz
tar xzvf prometheus-2.18.1.linux-amd64.tar.gz
mv prometheus-2.18.1.linux-amd64 /usr/local/prometheus
Builds for other platforms can be downloaded from: https://prometheus.io/download/
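Verify the installation
cd /usr/local/prometheus
./prometheus --version
Sample output (captured from an older 1.6.3 build; a 2.18.1 install reports "version 2.18.1" on the first line):
prometheus, version 1.6.3 (branch: master, revision: c580b60c67f2c5f6b638c3322161bcdf6d68d7fc)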
build user: root@e54b06e0b22f
build date: 20170519-08:00:43
go version: go1.8.
Create a systemd service
vim /etc/systemd/system/prometheus.service
[Unit]
Description=prometheus
After=network.target
[Service]
Type=simple
User=root
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus
Restart=on-failure
[Install]
WantedBy=multi-user.target
$ sudo systemctl daemon-reload
$ sudo systemctl start prometheus
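$ sudo systemctl enable prometheus
Verify that Prometheus started successfully (the sample output below comes from an older 1.x setup, so its command line still shows the 1.x flags):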
systemctl status prometheus
● prometheus.service - prometheus
Loaded: loaded (/etc/systemd/system/prometheus.service; disabled; vendor preset: enabled)
Active: active (running) since Mon 2017-05-22 11:13:36 CST; 18s ago
Main PID: 9175 (prometheus)
Tasks: 9
Memory: 15.8M
CPU: 207ms
CGroup: /system.slice/prometheus.service
└─9175 /usr/local/prometheus/prometheus -config.file=/usr/local/prometheus/prometheus.yml -storage.local.path=/var/lib/prometheus
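Access the built-in web UI
Prometheus ships with a simple built-in web UI for running expression queries and viewing alerts, the Prometheus configuration, and the status of scrape targets (exporters). By default it is served at http://ip:9090.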
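The same expression queries can also be run programmatically through the Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch using only the Python standard library: it builds the query URL and parses the JSON response. The base URL and the helper names are hypothetical.

```python
# Sketch: instant queries against the Prometheus HTTP API (stdlib only).
# The base URL and helper names are hypothetical.
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, expr: str) -> str:
    """Return the /api/v1/query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def parse_instant_vector(body: str) -> list:
    """Turn an instant-query JSON response into (labels, value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector, got %s" % data["resultType"])
    # Each "value" is [unix_timestamp, "sample value as a string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(base_url: str, expr: str, timeout: float = 5.0) -> list:
    """Run the query against a live Prometheus and parse the result."""
    with urllib.request.urlopen(build_query_url(base_url, expr), timeout=timeout) as resp:
        return parse_instant_vector(resp.read().decode("utf-8"))
```

For example, `instant_query("http://localhost:9090", "up")` returns one `(labels, value)` pair per scrape target, where the value is 1.0 for targets that are up.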
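mtail Log Monitoring
Introduction
- mtail is a log processor written by Google SREs, released under the Apache 2.0 license, and implemented in Go. It is designed specifically to extract metrics from application logs and export them to a time-series database.
- mtail works by running "programs", which define log-matching patterns and specify which metrics to create and update when a line matches. It works well with Prometheus, exposing any metric for scraping, and can also be configured to send metrics to tools such as collectd, StatsD, or Graphite.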
Installation
wget https://github.com/google/mtail/releases/download/v3.0.0-rc29/mtail_v3.0.0-rc29_linux_amd64
mv mtail_v3.0.0-rc29_linux_amd64 mtail
chmod 0755 mtail
sudo cp mtail /usr/local/bin
Verify the installation
cd /usr/local/bin/
mtail --version
Using mtail
sudo mkdir /etc/mtail
sudo touch /etc/mtail/line_count.mtail
sudo vim /etc/mtail/line_count.mtail
Write a program; more examples are available at:
https://github.com/google/mtail/tree/master/examples
(to be updated)
# Count every log line: /$/ matches the end of each line.
counter line_count
/$/ {
  line_count++
}
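To make the program's behavior concrete, here is a rough Python emulation of line_count.mtail. It is an illustration only, not how mtail executes programs: mtail applies each pattern to every new log line, and `/$/` matches the end of any line, so the counter increases once per line. `count_lines` is a hypothetical helper.

```python
# Rough emulation of line_count.mtail for illustration; mtail itself
# compiles programs and tails files. count_lines is a hypothetical helper.
import re

# /$/ matches the (empty) end of any line, so the action runs once per line.
END_OF_LINE = re.compile(r"$")


def count_lines(lines) -> int:
    """Return the value line_count would reach after reading these lines."""
    line_count = 0                      # counter line_count
    for line in lines:
        if END_OF_LINE.search(line):    # /$/ { ... }
            line_count += 1             # line_count++
    return line_count
```

Even an empty line matches `/$/`, so blank lines in the log are counted too.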
Run mtail (by default it serves Prometheus metrics on port 3903 at /metrics)
sudo mtail --progs /etc/mtail --logs '/var/log/*.log'
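Once mtail is running, its counters can be read back in the Prometheus text format, which is also what Prometheus scrapes. The Python sketch below parses simple `name{labels} value` lines and can fetch a URL such as `http://localhost:3903/metrics` (hypothetical host). It skips `#` comment lines and leaves label escape sequences undecoded; the exact labels mtail attaches to each metric may vary by version, and the helper names are hypothetical.

```python
# Sketch: read a metric back from mtail's Prometheus endpoint.
# Handles only simple `name{labels} value` lines; host and helpers are hypothetical.
import re
import urllib.request

SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str, metric: str) -> list:
    """Return (labels, value) pairs for `metric` from exposition-format text."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if m and m.group(1) == metric:
            # Label values are kept raw: escape sequences are not decoded.
            labels = dict(LABEL_RE.findall(m.group(2) or ""))
            samples.append((labels, float(m.group(3))))
    return samples


def fetch_metric(url: str, metric: str, timeout: float = 5.0) -> list:
    """Fetch e.g. http://localhost:3903/metrics and extract one metric."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_samples(resp.read().decode("utf-8"), metric)
```

For example, `fetch_metric("http://localhost:3903/metrics", "line_count")` should return one entry per mtail program that defines `line_count`.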
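Configure Prometheus to scrape the mtail endpoints (append this job to the existing scrape_configs list in prometheus.yml)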
vim /usr/local/prometheus/prometheus.yml
  - job_name: 'mtail-moniter'
    static_configs:
      - targets: ['ip1:3903', 'ip2:3903']
        labels:
          instance: log-dev
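After editing, validate the configuration with the bundled tool:
cd /usr/local/prometheus
./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 0 rule files found
Restart the service after changing the configuration so it picks up the new settings: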
sudo systemctl restart prometheus
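Alertmanager (email alerts)
Introduction
Alerting in the Prometheus architecture is split into two independent parts. Prometheus holds the alerting rules (AlertRule) and evaluates them periodically; when a rule's condition is met, it sends an alert to Alertmanager. Alertmanager then deduplicates, groups, and routes the alerts to receivers such as email.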
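The rule-evaluation half can be pictured as a small state machine: while a rule's expression returns a result, the alert is pending until the condition has held for the rule's `for` duration, and only then does it become firing and get sent to Alertmanager. The Python sketch below is a simplified model for a single series; the names are hypothetical, and it ignores resend and resolve handling.

```python
# Simplified model of one alert's lifecycle: inactive -> pending -> firing.
# Names are hypothetical; resend/resolve handling is left out.
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    state: str = "inactive"
    active_since: Optional[float] = None


def evaluate(st: AlertState, condition_true: bool, now: float, for_seconds: float) -> AlertState:
    """Advance the alert state after one rule evaluation at time `now`."""
    if not condition_true:
        return AlertState()  # condition cleared: back to inactive
    since = st.active_since if st.active_since is not None else now
    if now - since >= for_seconds:
        return AlertState("firing", since)  # firing alerts go to Alertmanager
    return AlertState("pending", since)


def run(results, interval: float, for_seconds: float) -> list:
    """Evaluate a sequence of condition results spaced `interval` seconds apart."""
    st, history = AlertState(), []
    for i, cond in enumerate(results):
        st = evaluate(st, cond, i * interval, for_seconds)
        history.append(st.state)
    return history
```

For example, `run([True, True, True, False], 30, 60)` yields `['pending', 'pending', 'firing', 'inactive']`: with evaluations 30 s apart, the condition has to hold for a full 60 s before the alert fires.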
Download and install the package
The latest Alertmanager release can be downloaded from the Prometheus website: https://prometheus.io/download/
wget https://github.com/prometheus/alertmanager/releases/download/v0.20.0/alertmanager-0.20.0.linux-amd64.tar.gz
tar xvf alertmanager-0.20.0.linux-amd64.tar.gz
mv alertmanager-0.20.0.linux-amd64 /usr/local/alertmanager
Create a systemd service
vi /etc/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=root
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
[Install]
WantedBy=multi-user.target
Configuring alertmanager.yml
vim /usr/local/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.exmail.qq.com:465'  # SMTP server (host:port)
  smtp_from: 'xxx'                          # sender address
  smtp_auth_username: 'xxx'                 # SMTP login (mailbox address)
  smtp_auth_password: 'xxx'                 # mailbox password or authorization code
  smtp_require_tls: false
templates:
  - 'template/*.tmpl'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'mail'
receivers:
  - name: 'mail'
    email_configs:
      - to: 'narang.sun@wetax.com.cn'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
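The inhibit rule in this configuration mutes `warning` alerts while a `critical` alert with the same `alertname`, `dev`, and `instance` labels is firing. Below is a simplified Python model of that check; alerts are plain label dictionaries and the function names are hypothetical. A label missing on both alerts compares as equal here, in line with Alertmanager treating a missing label like an empty one.

```python
# Simplified model of the inhibit rule above; alerts are plain label dicts.
# Function names are hypothetical.
SOURCE_MATCH = {"severity": "critical"}
TARGET_MATCH = {"severity": "warning"}
EQUAL = ["alertname", "dev", "instance"]


def matches(alert: dict, matchers: dict) -> bool:
    """Exact-match every matcher label against the alert's labels."""
    return all(alert.get(name) == value for name, value in matchers.items())


def is_inhibited(target: dict, firing: list) -> bool:
    """True if some firing critical alert mutes this warning alert."""
    if not matches(target, TARGET_MATCH):
        return False
    return any(
        matches(source, SOURCE_MATCH)
        # A label missing on both sides compares as equal (None == None).
        and all(source.get(label) == target.get(label) for label in EQUAL)
        for source in firing
    )
```

So a warning for `instance: a` is suppressed by a critical alert with the same alertname on `instance: a`, but a warning for `instance: b` still goes out.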
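For WeChat Work (企业微信) notifications, see: https://www.cnblogs.com/longcnblogs/p/9620733.html
After editing, validate the file with the bundled tool:
cd /usr/local/alertmanager
./amtool check-config alertmanager.yml
If the file is valid, amtool prints SUCCESS together with a summary of what it found (global config, route, inhibit rules, receivers, templates).
Start Alertmanager
systemctl daemon-reload
systemctl start alertmanager.service
systemctl enable alertmanager.service
systemctl status alertmanager.service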
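Define alerting rules in Prometheus
Edit the Prometheus configuration file prometheus.yml and add the following:
rule_files:
  - /usr/local/prometheus/rules/*.rules
Enable alerting by pointing Prometheus at Alertmanager:
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
In /usr/local/prometheus/rules/, create the alert rule file mtail-log.rules with the following content:
(to be updated)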
groups:
  - name: hostStatsAlert
    rules:
      # Fires when a scrape target has been unreachable for 1 minute.
      - alert: instanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
      # Fires when line_count has grown compared with one minute earlier.
      - alert: mtailLog
        expr: line_count - line_count offset 1m > 0
        for: 20s
        labels:
          severity: page
        annotations:
          summary: "New log lines on {{ $labels.instance }}"
          description: "{{ $labels.instance }}: line_count grew by {{ $value }} over the last minute"
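The `mtailLog` expression subtracts the value `line_count` had one minute earlier from its current value and keeps the result only if it is positive. The simplified Python model below evaluates that for one series, where each selector takes the latest sample at or before the evaluation time within Prometheus' default 5-minute lookback; the helper names and sample data are hypothetical. It also shows a caveat: if mtail restarts and the counter resets, the difference goes negative and the rule stays silent. This is one reason `increase(line_count[1m]) > 0`, which compensates for counter resets, is a common alternative.

```python
# Simplified model of `line_count - line_count offset 1m > 0` for one series.
# samples maps unix timestamps (s) to counter values; helper names are hypothetical.
from typing import Dict, Optional

LOOKBACK = 300.0  # Prometheus' default lookback delta (5m)


def value_at(samples: Dict[float, float], t: float) -> Optional[float]:
    """Latest sample at or before t, ignoring samples older than the lookback."""
    recent = [ts for ts in samples if t - LOOKBACK < ts <= t]
    return samples[max(recent)] if recent else None


def mtail_log_fires(samples: Dict[float, float], t: float, offset: float = 60.0) -> bool:
    """True if the expression returns a result (the condition holds) at time t."""
    current, earlier = value_at(samples, t), value_at(samples, t - offset)
    if current is None or earlier is None:
        return False  # a side with no data produces no result
    return current - earlier > 0  # `> 0` drops zero and negative differences
```

With samples every 15 s, a jump from 10 to 12 at t=75 makes the condition true at t=75, while a reset from 50 to 3 does not.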
After editing, validate the configuration with the bundled tool:
cd /usr/local/prometheus
./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 1 rule files found
Restart Prometheus
sudo systemctl restart prometheus
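Overall testing
Prometheus
mtail
Check that the expected metrics are being collected.
Alertmanager
Check that the rules are loaded.
Alerting: trigger any rule and you should receive an email.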
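To exercise the email path without waiting for a real rule to fire, a synthetic alert can be posted straight to Alertmanager. This Python sketch targets Alertmanager's v2 API (`POST /api/v2/alerts`, available in 0.20.0); the host `localhost:9093`, the label values, and the helper names are assumptions. Because the route groups by `alertname` with `group_wait: 10s`, the email should be sent shortly after the alert is accepted.

```python
# Sketch: push a synthetic alert to Alertmanager to test email delivery.
# Host, label values, and helper names are hypothetical.
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(base_url: str, alertname: str = "TestAlert") -> urllib.request.Request:
    """Build a POST /api/v2/alerts request carrying one alert."""
    alerts = [{
        "labels": {"alertname": alertname, "severity": "warning", "instance": "log-dev"},
        "annotations": {"summary": "manual test alert"},
        "startsAt": datetime.now(timezone.utc).isoformat(),  # RFC 3339 timestamp
    }]
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Call `send(build_test_alert("http://localhost:9093"))`; a 200 status means Alertmanager accepted the alert and will route it to the `mail` receiver.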
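Grafana Dashboards
Introduction
Grafana is an open-source program for visualizing large volumes of measurement data. It provides a powerful and elegant way to create, share, and explore data; dashboards display data from your various metric data sources.
Grafana is most commonly used for Internet infrastructure and application analytics, but it is also used in other fields such as industrial sensors, home automation, and process control. Grafana supports pluggable panels and extensible data sources, and already supports Graphite, InfluxDB, OpenTSDB, Elasticsearch, Prometheus, and more.
Installing Grafana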
wget https://dl.grafana.com/oss/release/grafana-6.7.3-1.x86_64.rpm
sudo yum install grafana-6.7.3-1.x86_64.rpm
Start Grafana
systemctl start grafana-server
Check that Grafana started successfully
$ systemctl status grafana-server
● grafana-server.service - Grafana instance
Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; masked; vendor preset: enabled)
Active: active (running) since Mon 2017-05-22 14:57:29 CST; 49min ago
Docs: http://docs.grafana.org
Main PID: 21735 (grafana-server)
CGroup: /system.slice/grafana-server.service
└─21735 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile= cfg:default.paths.logs=/var/log/grafana cfg:default.paths.data=/var/lib/grafana cfg:default.paths.plugins=/var/lib/grafana/plugins
Access Grafana
Open the Grafana web UI at http://ip:3000 (default username/password: admin/admin).
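Before building dashboards, Grafana needs Prometheus as a data source. Besides the web UI, this can be done through Grafana's HTTP API (`POST /api/datasources`). The Python sketch below only builds the request. It assumes Grafana at `http://localhost:3000` with the default admin/admin account and Prometheus at `http://localhost:9090`; the helper name is hypothetical.

```python
# Sketch: register Prometheus as a Grafana data source via the HTTP API.
# URLs, credentials, and the helper name are assumptions.
import base64
import json
import urllib.request


def build_datasource_request(grafana_url: str, prometheus_url: str,
                             user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build a POST /api/datasources request using basic auth."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend queries Prometheus for the browser
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Basic " + token},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(build_datasource_request("http://localhost:3000", "http://localhost:9090"))`. If the default password has been changed, pass the real credentials instead.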
For other configuration options, see https://grafana.com/docs/grafana/latest/
References
prometheus && alertmanager && grafana
https://www.hi-linux.com/posts/25047.html
https://counter2015.com/2020/04/13/grafana-monitor-2/
https://prometheus.io/docs/alerting/configuration/
https://yunlzheng.gitbook.io/prometheus-book/
https://cloud.tencent.com/developer/article/1556769
https://www.cnblogs.com/longcnblogs/p/9620733.html
https://blog.52itstyle.vip/archives/1984/
https://www.kancloud.cn/huyipow/prometheus/525003
mtail