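Preparing a Prometheus lab environment. ---- A real hands-on walkthrough that took a great deal of effort. Please respect the author; plagiarism is prohibited.
# Prometheus server
# Data-collecting exporters (Redis, RabbitMQ, PostgreSQL)
# Grafana, the graphical monitoring and visualization interface
# Alertmanager, the alerting component
1. First, install the Prometheus server
# Deploy under the /data directory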
cd /data
# Download the Prometheus release tarball
wget https://github.com/prometheus/prometheus/releases/download/v2.27.1/prometheus-2.27.1.linux-amd64.tar.gz
1.2 Extract the downloaded Prometheus tarball and rename the directory
tar -zxvf prometheus-2.27.1.linux-amd64.tar.gz
mv prometheus-2.27.1.linux-amd64 prometheus
1.3 Then enter the prometheus directory
cd prometheus
1.4 Start Prometheus
# Start with the default configuration file; the default port is 9090
nohup /data/prometheus/prometheus --config.file="/data/prometheus/prometheus.yml" &
# To stop Prometheus: pkill prometheus
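Because the server is started in the background with nohup, a quick readiness check helps confirm it is actually serving before moving on. The sketch below polls Prometheus's `/-/ready` endpoint with curl. The helper name `wait_ready`, the retry count, and the `127.0.0.1:9090` address are assumptions for illustration.

```shell
# wait_ready URL [ATTEMPTS]: poll URL once per second until it answers with
# an HTTP success status, or give up after ATTEMPTS tries (default 30).
# Hypothetical helper, not part of Prometheus.
wait_ready() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failure; -s: quiet; --max-time: never hang
    if curl -fs --max-time 2 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Assumed address: Prometheus on this host with the default port.
if wait_ready "http://127.0.0.1:9090/-/ready" 3; then
  echo "Prometheus is ready"
else
  echo "Prometheus did not become ready; check nohup.out"
fi
```

If the check fails, the output that nohup captured (nohup.out in the directory where the command was run) is usually the first place to look.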
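1.5 Once Prometheus has started, open port 9090 in the security group
1.6 Open http://<server-IP>:9090 in a browser to reach the Prometheus main page
1.7 By default only the local host is monitored: click Status, then Targets, and only the local host is listed
1.8 Viewing the metrics
# Prometheus exposes its own metrics at http://<server-IP>:9090/metrics
2 Exporters: collecting data
2.1 Monitor PostgreSQL with Prometheus and configure prometheus.yml
Download and extract postgres_exporter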
wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.8.0/postgres_exporter_v0.8.0_linux-amd64.tar.gz
tar -xf postgres_exporter_v0.8.0_linux-amd64.tar.gz
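postgres_exporter reads its connection string from the `DATA_SOURCE_NAME` environment variable. A minimal sketch of the start step follows, assuming a monitoring user named `monitor` and a local PostgreSQL on port 5432 (both placeholders). The file name `postgre_data_source` mirrors the one sourced by the init script in the appendix. The sketch writes to a scratch directory so it can be tried safely.

```shell
# Scratch directory so the sketch can run anywhere; in the layout used in this
# walkthrough the file would live at /data/monitoring/conf/postgre_data_source.
conf_dir=${CONF_DIR:-$(mktemp -d)}

# Placeholder credentials (assumptions): replace user, password, host, database.
cat > "$conf_dir/postgre_data_source" <<'EOF'
export DATA_SOURCE_NAME="postgresql://monitor:CHANGE_ME@127.0.0.1:5432/postgres?sslmode=disable"
EOF

# Load the connection string into the environment.
. "$conf_dir/postgre_data_source"

# Start the exporter only if the binary is present in the current directory
# (run this from the extracted release directory).
if [ -x ./postgres_exporter ]; then
  nohup ./postgres_exporter --web.listen-address=:9187 > postgres_exporter.log 2>&1 &
  echo "postgres_exporter started, pid $!"
else
  echo "postgres_exporter binary not found here; skipping start"
fi
```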
2.1.2 Once postgres_exporter is installed and running
# After http://<server-IP>:9187/metrics responds, configure prometheus.yml
vi prometheus.yml
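The entry to add is a scrape job under the existing `scrape_configs:` key. A minimal sketch, assuming the exporter runs on the same host as Prometheus; the job name `postgresql` and the helper name `postgres_scrape_job` are illustrative choices:

```shell
# Print the scrape job for postgres_exporter, indented so it can be pasted
# under `scrape_configs:`. The target defaults to the local exporter.
postgres_scrape_job() {
  target=${1:-127.0.0.1:9187}
  cat <<EOF
  - job_name: 'postgresql'
    static_configs:
      - targets: ['$target']
EOF
}

postgres_scrape_job "127.0.0.1:9187"
```

The indentation matters: the job must sit at the same level as the other `- job_name:` entries already in the file.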
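2.1.3 Restart Prometheus, then open Status > Targets in the Prometheus UI; the setup worked if the new target shows UP
# Recommended Grafana dashboard ID for PostgreSQL: 9628
2.2 Monitor Redis with Prometheus and configure prometheus.yml
Download and extract redis_exporter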
wget https://github.com/oliver006/redis_exporter/releases/download/v1.23.1/redis_exporter-v1.23.1.linux-amd64.tar.gz
tar -xf redis_exporter-v1.23.1.linux-amd64.tar.gz
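redis_exporter can take its connection details from the `REDIS_ADDR` and `REDIS_PASSWORD` environment variables. A minimal start sketch, assuming a local Redis on port 6379 without a password and assuming the tarball extracted into a directory named after the archive:

```shell
# Assumed connection details; an empty password means Redis has no requirepass set.
REDIS_ADDR=${REDIS_ADDR:-redis://127.0.0.1:6379}
REDIS_PASSWORD=${REDIS_PASSWORD:-}
export REDIS_ADDR REDIS_PASSWORD

# Assumed location of the binary after extraction; adjust if yours differs.
bin=./redis_exporter-v1.23.1.linux-amd64/redis_exporter

if [ -x "$bin" ]; then
  nohup "$bin" --web.listen-address=:9121 > redis_exporter.log 2>&1 &
  echo "redis_exporter started, pid $!"
else
  echo "redis_exporter binary not found at $bin; skipping start"
fi
```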
2.2.1 Once redis_exporter is installed and running
# After http://<server-IP>:9121/metrics responds, configure prometheus.yml
vi prometheus.yml
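One way to make this edit without opening an editor is to append the job to the end of the file. The sketch below works on a scratch copy and assumes that `scrape_configs` is the last section of the file, as it is in the default prometheus.yml. The stand-in file contents and the promtool path are assumptions.

```shell
work=${WORK_DIR:-$(mktemp -d)}

# Minimal stand-in for the default prometheus.yml (yours will contain more).
cat > "$work/prometheus.yml" <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
EOF

# Append the Redis job; valid only because scrape_configs is the last section.
cat >> "$work/prometheus.yml" <<'EOF'
  - job_name: 'redis'
    static_configs:
      - targets: ['127.0.0.1:9121']
EOF

# Validate the result if promtool is available (it ships in the Prometheus tarball).
if [ -x /data/prometheus/promtool ]; then
  /data/prometheus/promtool check config "$work/prometheus.yml"
fi

cat "$work/prometheus.yml"
```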
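2.2.3 Restart Prometheus, then open Status > Targets in the Prometheus UI; the setup worked if the new target shows UP
# Recommended Grafana dashboard ID for Redis: 10819
2.3 Monitor RabbitMQ with Prometheus and configure prometheus.yml
Change to the package directory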
cd /data/monitoring/packages
Download and extract rabbitmq_exporter
wget https://github.com/kbudde/rabbitmq_exporter/releases/download/v0.29.0/rabbitmq_exporter-0.29.0.linux-amd64.tar.gz
tar -xf rabbitmq_exporter-0.29.0.linux-amd64.tar.gz
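rabbitmq_exporter is configured through environment variables such as `RABBIT_URL`, `RABBIT_USER`, `RABBIT_PASSWORD`, and `PUBLISH_PORT`. A minimal sketch of an environment file and start step, assuming the RabbitMQ management API on its default port 15672 and a placeholder monitoring user. The file name matches the one sourced by the init script in the appendix.

```shell
conf_dir=${CONF_DIR:-$(mktemp -d)}

# Placeholder values (assumptions): management API URL and a monitoring user.
cat > "$conf_dir/rabbitmq_data_source_0.29.0" <<'EOF'
export RABBIT_URL="http://127.0.0.1:15672"
export RABBIT_USER="monitor"
export RABBIT_PASSWORD="CHANGE_ME"
export PUBLISH_PORT="9419"
EOF

# Load the settings into the environment.
. "$conf_dir/rabbitmq_data_source_0.29.0"

# Start the exporter only if the binary is present in the current directory.
if [ -x ./rabbitmq_exporter ]; then
  nohup ./rabbitmq_exporter >> rabbitmq_exporter.log 2>&1 &
  echo "rabbitmq_exporter started, pid $!"
else
  echo "rabbitmq_exporter binary not found here; skipping start"
fi
```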
2.3.1 Once rabbitmq_exporter is installed and running
# After http://<server-IP>:9419/metrics responds, configure prometheus.yml
vi prometheus.yml
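With RabbitMQ added, the `scrape_configs` section holds one job per exporter plus Prometheus itself. The sketch below prints what that section could look like. The ports are the exporter defaults used in this walkthrough; the host address, job names, and the helper `scrape_job` are assumptions.

```shell
# Print one scrape job; arguments: job name, host:port target.
scrape_job() {
  printf "  - job_name: '%s'\n" "$1"
  printf "    static_configs:\n"
  printf "      - targets: ['%s']\n" "$2"
}

# Assumed host: all exporters run next to Prometheus.
host=127.0.0.1
{
  echo "scrape_configs:"
  scrape_job prometheus "localhost:9090"
  scrape_job postgresql "$host:9187"
  scrape_job redis "$host:9121"
  scrape_job rabbitmq "$host:9419"
}
```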
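2.3.2 Restart Prometheus, then open Status > Targets in the Prometheus UI; the setup worked if the new target shows UP
# Recommended Grafana dashboard ID for RabbitMQ: 2121
3 Install Grafana, the graphical monitoring interface
3.1 Create the Grafana yum repository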
vim /etc/yum.repos.d/grafana.repo
https://grafana.com/docs/grafana/latest/installation/rpm/
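The repository file itself is short. The sketch below writes one to a scratch location so the content can be reviewed first. The repository and key URLs are given as an assumption and have changed over time, so confirm them against the official page linked above before copying the file into place.

```shell
repo_file=${REPO_FILE:-$(mktemp)}

# Repository definition; verify baseurl and gpgkey against the official docs.
cat > "$repo_file" <<'EOF'
[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
EOF

cat "$repo_file"
# To apply it for real (as root): cp "$repo_file" /etc/yum.repos.d/grafana.repo
```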
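3.2 Then install it
yum install grafana -y
systemctl start grafana-server # start Grafana
systemctl stop grafana-server # stop Grafana
systemctl status grafana-server # check Grafana status
systemctl enable grafana-server # start Grafana at boot
3.3 Access Grafana
# Open http://<server-IP>:3000/ (default port)
# If the port conflicts with another service, change it in the configuration file
vi /etc/grafana/grafana.ini
# Then restart: systemctl restart grafana-server
# Access URL after the port change (9091 in this setup): http://<server>:9091; default login admin/admin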
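The setting in question is `http_port` in the `[server]` section, which ships commented out with a leading semicolon. A minimal sketch of the change using sed follows. It works on a scratch excerpt rather than the real file, and the three-line excerpt is an assumption about what the relevant part of grafana.ini looks like.

```shell
work=${WORK_DIR:-$(mktemp -d)}

# Stand-in for the relevant part of /etc/grafana/grafana.ini (assumed excerpt).
cat > "$work/grafana.ini" <<'EOF'
[server]
;protocol = http
;http_port = 3000
EOF

# Uncomment http_port and set it to 9091; the result goes to a new file so the
# original stays untouched.
sed -E 's/^[;#]?[[:space:]]*http_port[[:space:]]*=.*/http_port = 9091/' \
  "$work/grafana.ini" > "$work/grafana.ini.new"

cat "$work/grafana.ini.new"
```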
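3.4 Next, configure a Prometheus data source so that Grafana can read data from Prometheus
# Grafana console > Configuration > Data Sources
3.5 Add a Prometheus data source
3.6 Configure the Prometheus data source
# The finished configuration contains only the local Prometheus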
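As an optional alternative to the UI steps, Grafana can also load data sources from provisioning files at startup. The sketch below writes such a file to a scratch directory. The data source name and the `http://localhost:9090` URL are assumptions that match a Prometheus running on the same host.

```shell
prov_dir=${PROV_DIR:-$(mktemp -d)}

# Data source definition in Grafana's provisioning format.
cat > "$prov_dir/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF

cat "$prov_dir/prometheus.yaml"
# To apply it for real (as root): copy the file into
# /etc/grafana/provisioning/datasources/ and restart grafana-server.
```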
3.7 Configure Grafana dashboards
# To import dashboards, refer to the official site
3.8 Get the ID of a Grafana dashboard
# Official Grafana dashboard catalog
https://grafana.com/grafana/dashboards?search=CPU
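3.9 Enter the dashboard ID you obtained into Import
3.10 Select the Prometheus data source you created
# After the import, the dashboard is displayed
4 Install Alertmanager, the alerting component
4.1 Download Alertmanager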
wget https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz
4.2 Extract Alertmanager
tar -zxvf alertmanager-0.22.2.linux-amd64.tar.gz
4.3 Rename the Alertmanager directory
mv alertmanager-0.22.2.linux-amd64 alertmanager
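4.4 Configure WeCom (WeChat Work) alerting
# Before editing alertmanager.yml, collect the integration credentials from the WeCom admin console at https://work.weixin.qq.com/
# corp_id: My Company > scroll to the bottom (Company ID)
# agent_id and api_secret: App Management > self-built apps
# Important: after creating the app, add the server's IP to the IP allowlist at the bottom of the app page; otherwise no alert messages will be delivered
# to_user: add the recipients to the app's visible scope
# Note down the account IDs of the members who should receive alerts (they are needed in the configuration)
# Now configure alertmanager.yml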
cd alertmanager
vi alertmanager.yml
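global:
  resolve_timeout: 5m
templates: # notification templates
  - './template/wechat.tmpl'
route: # alert routing policy
  group_by: ['alertname'] # labels to group alerts by
  group_wait: 10s # wait 10s after a group's first alert so that alerts in the same group are sent together
  group_interval: 10s # wait before notifying about new alerts added to a group that was already notified
  repeat_interval: 1m # wait before re-sending an alert that is still firing; limits how often the same notification is sent (1 minute here for testing)
  receiver: 'wechat' # default receiver
receivers:
  - name: 'wechat'
    wechat_configs:
      - send_resolved: true
        agent_id: '' # agentId of the self-built app
        to_user: '' # ID of the member(s) who receive alerts
        api_secret: '' # secret of the self-built app
        corp_id: '' # company ID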
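4.5 Configure the alert notification template
4.5.1 Create a template directory under alertmanager; all templates go in this directory
mkdir template
4.5.2 Enter template and create a file with the .tmpl suffix
cd template
vim wechat.tmpl
4.5.3 Write the notification template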
{{ define "wechat.default.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{- range $index, $alert := .Alerts.Firing -}}
=== Alert firing ===
Alert name: {{ $alert.Labels.alertname }}
Severity: {{ $alert.Labels.severity }}
Started at: {{ $alert.StartsAt.Local.Format "2006-01-02 15:04:05" }}
Instance: {{ $alert.Labels.instance }} {{ $alert.Labels.device }}
Details: {{ $alert.Annotations.description }}
=== END ===
{{- end }}
{{- end }}
{{- if gt (len .Alerts.Resolved) 0 -}}
{{- range $index, $alert := .Alerts.Resolved -}}
=== Alert resolved ===
Alert name: {{ $alert.Labels.alertname }}
Severity: {{ $alert.Labels.severity }}
Instance: {{ $alert.Labels.instance }}
Started at: {{ $alert.StartsAt.Local.Format "2006-01-02 15:04:05" }}
Resolved at: {{ $alert.EndsAt.Local.Format "2006-01-02 15:04:05" }}
Details: {{ $alert.Annotations.description }}
=== END ===
{{- end }}
{{- end }}
{{- end }}
4.5.4 Example of an alert message as delivered in WeCom
4.6 Connect Alertmanager to Prometheus
vi prometheus.yml
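The connection is made with a top-level `alerting:` section in prometheus.yml that points at the Alertmanager address. A minimal sketch, assuming Alertmanager runs on the same host on its default port 9093; the helper name `alerting_block` is illustrative:

```shell
# Print the alerting section; the target defaults to a local Alertmanager.
alerting_block() {
  target=${1:-localhost:9093}
  cat <<EOF
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['$target']
EOF
}

alerting_block "localhost:9093"
```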
4.7 Configure Prometheus alert rules
vi prometheus.yml
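Prometheus finds rule files through the top-level `rule_files:` list, where relative paths are resolved against the directory of prometheus.yml. A minimal sketch that prints the section, assuming the rules live in a `rules` directory next to the config file; the helper name `rule_files_block` is illustrative:

```shell
# Print the rule_files section for a given glob (default: every .yml under rules/).
rule_files_block() {
  printf 'rule_files:\n'
  printf '  - "%s"\n' "${1:-rules/*.yml}"
}

rule_files_block
```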
4.7.1 Create a directory named rules under prometheus
mkdir rules
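4.7.2 Create the alert rule file
4.7.3 Write the alert rules for the application
vi rules/postgresql.yml
# For alert rules, see the reference collection below (the groups and rules keys are required)
# https://awesome-prometheus-alerts.grep.to/rules#host-and-hardware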
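To make the required `groups` and `rules` structure concrete, here is a minimal sketch of a rule file with a single rule that fires when postgres_exporter reports `pg_up == 0`. The rule name, the one-minute `for` duration, and the severity value are assumptions. The `severity` label and `description` annotation are included because the WeCom template reads them. The file is written to a scratch directory, and the promtool path is an assumption.

```shell
work=${WORK_DIR:-$(mktemp -d)}
mkdir -p "$work/rules"

# The quoted 'EOF' keeps the shell from expanding {{ $labels.instance }}.
cat > "$work/rules/postgresql.yml" <<'EOF'
groups:
  - name: postgresql
    rules:
      - alert: PostgresqlDown
        expr: pg_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: "PostgreSQL instance {{ $labels.instance }} is down"
EOF

# Validate the file if promtool is available (it ships in the Prometheus tarball).
if [ -x /data/prometheus/promtool ]; then
  /data/prometheus/promtool check rules "$work/rules/postgresql.yml"
fi

cat "$work/rules/postgresql.yml"
```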
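4.7.4 Start Alertmanager
nohup ./alertmanager --config.file='alertmanager.yml' &
./amtool check-config alertmanager.yml # validate the Alertmanager configuration
./promtool check rules rules/*.yml # validate the alert rules (promtool is in the prometheus directory)
4.7.8 Restart Prometheus
# Check alert status: Prometheus main page > Alerts
# The alert message was received successfully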
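To exercise the WeCom delivery path without waiting for a real failure, a hand-made alert can be posted to Alertmanager's v2 API. The sketch below builds a payload carrying the labels and annotation that the template reads, and posts it only if Alertmanager answers on the assumed address `127.0.0.1:9093`. The helper name `test_alert_payload` and the alert values are illustrative.

```shell
# Build a one-alert JSON payload; arguments: alertname, severity, instance.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},' "$1" "$2" "$3"
  printf '"annotations":{"description":"manual test alert"}}]\n'
}

payload=$(test_alert_payload TestAlert warning "127.0.0.1:9187")
echo "$payload"

# Post it only if Alertmanager is reachable on the assumed default address.
am=http://127.0.0.1:9093
if curl -fs --max-time 2 "$am/-/healthy" >/dev/null 2>&1; then
  curl -fs -XPOST -H 'Content-Type: application/json' -d "$payload" "$am/api/v2/alerts"
else
  echo "Alertmanager not reachable at $am; payload not sent"
fi
```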
5 Appendix: exporter init scripts (used to start the exporters at boot)
Appendix 1. postgres_exporter init script
#!/bin/bash
# chkconfig: 35 90 3
# description: postgres_exporter server manage.
appName=/data/monitoring/postgres_exporter
appPort=9187
isWhliePid=/var/run/postgres_exporterWhile.pid
runPaths=/data/monitoring

# Report whether the exporter is running, based on a process-list match.
function status(){
    myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
    if [ "$myPid" != "" ]; then
        echo $myPid
        echo "Started postgres_exporter. pid = $myPid"
    else
        echo "Stopped postgres_exporter."
    fi
}

# Start the exporter and wait until it listens on $appPort.
function start(){
    myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
    if [ ! -z "$myPid" ]; then
        echo "A postgres_exporter process exists; retrying in 3s, please wait..."
        sleep 3
        myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
        if [ ! -z "$myPid" ]; then
            echo "Ignoring start command: postgres_exporter is already running"
            exit 1
        fi
    fi
    echo "Starting postgres_exporter..."
    #su - postgres_exporter -c "nohup $runPaths --web.listen-address=:$appPort > /dev/null 2>&1 &"
    # Load the connection settings for the exporter
    source $runPaths/conf/postgre_data_source
    nohup $runPaths/postgres_exporter --web.listen-address=:$appPort > $runPaths/logs/postgres_exporter.log 2>&1 &
    printf "Waiting for postgres_exporter..."
    while true;
    do
        lsof -i:$appPort|grep LISTEN > /dev/null 2>&1
        if [[ $? == 0 ]] ; then break; fi;
        # Give up if the process died before it started listening
        myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
        if [ -z "$myPid" ]; then
            echo -e "\nFailed to start postgres_exporter."
            exit 1
        fi
        printf ".";
        sleep 1;
    done
    echo
    myPid=`lsof -i:$appPort|grep LISTEN|awk '{print $2}'`
    echo "running: PID:$myPid"
}

# Stop the exporter and wait until the process is gone.
function stop(){
    echo "Stopping postgres_exporter..."
    myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
    if [ "$myPid" != "" ]; then
        echo 0 > $isWhliePid
        kill $myPid
    fi
    printf "Waiting for postgres_exporter..."
    while true;
    do
        myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
        if [ "$myPid" = "" ] ; then break; fi;
        printf ".";
        sleep 1;
    done
    echo
}

case "$1" in
    start)
        start
        status
        ;;
    stop)
        stop
        status
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    status)
        status
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status}"
        exit 1
        ;;
esac
Appendix 2. redis_exporter init script
#!/bin/bash
# chkconfig: 35 90 3
# description: redis_exporter server manage.
appName=/data/monitoring/redis_exporter
appPort=9121
isWhliePid=/var/run/redis_exporterWhile.pid
runPaths=/data/monitoring

# Report whether the exporter is running, based on a process-list match.
function status(){
    myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
    if [ "$myPid" != "" ]; then
        echo "Started redis_exporter. pid = $myPid"
    else
        echo "Stopped redis_exporter."
    fi
}

# Start the exporter and wait until it listens on $appPort.
function start(){
    myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
    if [ ! -z "$myPid" ]; then
        echo "A redis_exporter process exists; retrying in 3s, please wait..."
        sleep 3
        myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
        if [ ! -z "$myPid" ]; then
            echo "Ignoring start command: redis_exporter is already running"
            exit 1
        fi
    fi
    echo "Starting redis_exporter..."
    #su - postgres_exporter -c "nohup $runPaths --web.listen-address=:$appPort > /dev/null 2>&1 &"
    nohup $runPaths/redis_exporter --web.listen-address=:$appPort > $runPaths/logs/redis_exporter.log 2>&1 &
    printf "Waiting for redis_exporter..."
    while true;
    do
        lsof -i:$appPort|grep LISTEN > /dev/null 2>&1
        if [[ $? == 0 ]] ; then break; fi;
        # Give up if the process died before it started listening
        myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
        if [ -z "$myPid" ]; then
            echo -e "\nFailed to start redis_exporter."
            exit 1
        fi
        printf ".";
        sleep 1;
    done
    echo
    myPid=`lsof -i:$appPort|grep LISTEN|awk '{print $2}'`
    echo "running: PID:$myPid"
}

# Stop the exporter and wait until the process is gone.
function stop(){
    echo "Stopping redis_exporter..."
    myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
    if [ "$myPid" != "" ]; then
        echo 0 > $isWhliePid
        kill $myPid
    fi
    printf "Waiting for redis_exporter..."
    while true;
    do
        myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
        if [ "$myPid" = "" ] ; then break; fi;
        printf ".";
        sleep 1;
    done
    echo
}

case "$1" in
    start)
        start
        status
        ;;
    stop)
        stop
        status
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    status)
        status
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status}"
        exit 1
        ;;
esac
Appendix 3. rabbitmq_exporter init script
#!/bin/bash
# chkconfig: 35 90 3
# description: rabbitmq_exporter server manage.
appName=/data/monitoring/rabbitmq_exporter
appPort=9419
isWhliePid=/var/run/rabbitmq_exporterWhile.pid
runPaths=/data/monitoring

# Report whether the exporter is running, based on a process-list match.
function status(){
    myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
    if [ "$myPid" != "" ]; then
        echo "Started rabbitmq_exporter. pid = $myPid"
    else
        echo "Stopped rabbitmq_exporter."
    fi
}

# Start the exporter and wait until it listens on $appPort.
function start(){
    myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
    if [ ! -z "$myPid" ]; then
        echo "A rabbitmq_exporter process exists; retrying in 3s, please wait..."
        sleep 3
        myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
        if [ ! -z "$myPid" ]; then
            echo "Ignoring start command: rabbitmq_exporter is already running"
            exit 1
        fi
    fi
    echo "Starting rabbitmq_exporter..."
    #su - postgres_exporter -c "nohup $runPaths --web.listen-address=:$appPort > /dev/null 2>&1 &"
    # Load the connection settings for the exporter
    source $runPaths/conf/rabbitmq_data_source_0.29.0
    #nohup $runPaths/rabbitmq_exporter -config-file $runPaths/conf/rabbitmq_data_source >> $runPaths/logs/rabbitmq_exporter.log 2>&1 &
    nohup $runPaths/rabbitmq_exporter >> $runPaths/logs/rabbitmq_exporter.log 2>&1 &
    printf "Waiting for rabbitmq_exporter..."
    while true;
    do
        lsof -i:$appPort|grep LISTEN > /dev/null 2>&1
        if [[ $? == 0 ]] ; then break; fi;
        # Give up if the process died before it started listening
        myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
        if [ -z "$myPid" ]; then
            echo -e "\nFailed to start rabbitmq_exporter."
            exit 1
        fi
        printf ".";
        sleep 1;
    done
    echo
    myPid=`lsof -i:$appPort|grep LISTEN|awk '{print $2}'`
    echo "running: PID:$myPid"
}

# Stop the exporter and wait until the process is gone.
function stop(){
    echo "Stopping rabbitmq_exporter..."
    myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
    if [ "$myPid" != "" ]; then
        echo 0 > $isWhliePid
        kill $myPid
    fi
    printf "Waiting for rabbitmq_exporter..."
    while true;
    do
        myPid=$(ps -ef | grep "$appName" | grep -v grep | awk '{print $2}')
        if [ "$myPid" = "" ] ; then break; fi;
        printf ".";
        sleep 1;
    done
    echo
}

case "$1" in
    start)
        start
        status
        ;;
    stop)
        stop
        status
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    status)
        status
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status}"
        exit 1
        ;;
esac