Background: the original plan was to monitor the airflow celery worker processes with process-exporter, but once airflow worker concurrency reached 2000, the process-exporter metrics lagged by more than 5 minutes and Prometheus scrapes timed out. The approach below instead writes the worker process count to a file with the ps command and lets node-exporter export the metrics from that file.
Environment: Ubuntu 16.04, node-exporter 1.0.1
Steps:
1. Download node-exporter, extract the archive, and move the binary to /usr/local/bin/node_exporter.
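A minimal sketch of step 1, assuming the linux-amd64 release tarball (the URL and archive name follow the standard node_exporter release naming; adjust them to the release you actually use):
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz
tar -xzf node_exporter-1.0.1.linux-amd64.tar.gz
mv node_exporter-1.0.1.linux-amd64/node_exporter /usr/local/bin/node_exporter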
2. Create the systemd service and start node_exporter with the --collector.textfile.directory flag, so that it collects metrics from every file ending in .prom under /home/hadoop/airflow/logs/node_exporter/.
# cat /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter \
--collector.textfile.directory="/home/hadoop/airflow/logs/node_exporter" \
--collector.ntp \
--web.listen-address=0.0.0.0:9100 \
--collector.filesystem.ignored-mount-points='^/(dev|proc|sys|var/lib/docker/.+)($|/)' \
--collector.filesystem.ignored-fs-types='^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$'
SyslogIdentifier=node_exporter
Restart=always
[Install]
WantedBy=multi-user.target
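Before starting the service, the textfile directory has to exist (node_exporter only reads it, it does not create it). A sketch of the remaining setup commands, using the same path configured in the unit file above:
mkdir -p /home/hadoop/airflow/logs/node_exporter
systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter
systemctl status node_exporter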
3. Edit /home/hadoop/airflow/logs/node_exporter/worker_num.prom
Rules for writing Prometheus text-based format (*.prom) files:
- Every line must end with a newline character (\n); empty lines are ignored.
- A line starting with # that is not followed by HELP or TYPE is treated as a comment.
- In a line starting with # HELP, the first field is the metric name and everything after it is the metric's description.
- In a line starting with # TYPE, the first field is the metric name and the second field is the metric type; valid types are counter, gauge, histogram, summary, or untyped.
- A metric name may have only one TYPE line, and it must appear before that metric's samples; if no TYPE is given, the metric type defaults to untyped.
# cat /home/hadoop/airflow/logs/node_exporter/worker_num.prom
# HELP worker_num airflow worker number
# TYPE worker_num gauge
worker_num 2001
4. Check the worker_num metric
# curl -s 127.0.0.1:9100/metrics | grep worker_num
node_textfile_mtime_seconds{file="worker_num.prom"} 1.618836422e+09
# HELP worker_num airflow worker number
# TYPE worker_num gauge
worker_num 2001
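Once Prometheus scrapes this node_exporter target (port 9100), the same metric can also be read back through the Prometheus HTTP API. A sketch, assuming the Prometheus server is reachable at prometheus:9090 (a placeholder hostname):
curl -s 'http://prometheus:9090/api/v1/query?query=worker_num'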
5. Collect the worker count on a schedule and write it to /home/hadoop/airflow/logs/node_exporter/worker_num
The worker_num collection script:
# cat /home/hadoop/airflow/sh/worker_exporter.sh
#!/bin/bash
# Count the running celery worker processes (excluding the grep itself)
worker_num=$(ps aux | grep celery | grep -v grep | wc -l)
# Write to a temp file first, then rename; mv on the same filesystem is atomic,
# so node_exporter never reads a half-written .prom file
echo "# HELP worker_num airflow worker number" > /home/hadoop/airflow/logs/node_exporter/worker_num
echo "# TYPE worker_num gauge" >> /home/hadoop/airflow/logs/node_exporter/worker_num
echo "worker_num $worker_num" >> /home/hadoop/airflow/logs/node_exporter/worker_num
mv /home/hadoop/airflow/logs/node_exporter/worker_num /home/hadoop/airflow/logs/node_exporter/worker_num.prom
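Before wiring the script into cron, it can be run once by hand and the result checked end to end (same paths as above):
bash /home/hadoop/airflow/sh/worker_exporter.sh
cat /home/hadoop/airflow/logs/node_exporter/worker_num.prom
curl -s 127.0.0.1:9100/metrics | grep worker_num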
6. Cron job
# crontab -l
*/1 * * * * /bin/bash /home/hadoop/airflow/sh/worker_exporter.sh
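Because the script rewrites the file every minute, the node_textfile_mtime_seconds metric from step 4 doubles as a freshness check: if the file's mtime falls much more than 60 seconds behind the current time, the cron job has stopped updating it. A quick manual check:
curl -s 127.0.0.1:9100/metrics | grep 'node_textfile_mtime_seconds{file="worker_num.prom"}'
date +%s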