Slurm 20.02.3: Collecting Slurm Cluster Job Data with InfluxDB 1.8.0 (No. 3-1)

Author: 潍在杭的Hp0e

Article link: https://blog.csdn.net/xuecangqiuye/article/details/107610262

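Originally, Slurm could only collect profiling data through HDF5. HDF5 is not real-time: the data is only aggregated once a Slurm job step has finished.

Back in 2016, Slurm added support for collecting profiling data with InfluxDB (see the official Slurm documentation).

This article describes how to collect profiling data with InfluxDB in Slurm 20.02.3.

The Slurm-side installation and configuration for InfluxDB will be covered later, together with the other Slurm components.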

1 Environment

1 management node running slurmctld (192.168.0.211)

2 compute nodes running slurmd (192.168.0.218, 192.168.0.128)

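2 Installing InfluxDB

Installing InfluxDB is straightforward. You only need to install it on the Slurm management node. Alternatively, you can install it on a separate machine outside the Slurm cluster.

In this article it is installed on the Slurm management node (192.168.0.211).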

[root@cm-wsy-c16m32d200-1 ~]# yum install wget -y
[root@cm-wsy-c16m32d200-1 ~]# wget https://dl.influxdata.com/influxdb/releases/influxdb-1.8.0.x86_64.rpm
[root@cm-wsy-c16m32d200-1 ~]# yum localinstall -y influxdb-1.8.0.x86_64.rpm
[root@cm-wsy-c16m32d200-1 ~]# systemctl start influxdb
[root@cm-wsy-c16m32d200-1 ~]# systemctl enable influxdb
[root@cm-wsy-c16m32d200-1 ~]# influx --version
InfluxDB shell version: 1.8.0

# Create the database
[root@cm-wsy-c16m32d200-1 slurm]# influx
Connected to http://localhost:8086 version 1.8.0
InfluxDB shell version: 1.8.0
> create database slurm_job_status
> exit
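You can also create the same database through InfluxDB's HTTP API, which helps when you script the setup. The sketch below is illustrative only. It assumes InfluxDB 1.x's `/query` endpoint on the default port 8086 with authentication disabled. The helper names `build_create_db_request` and `create_database` are hypothetical.

```python
import urllib.parse
import urllib.request


def build_create_db_request(db_name: str,
                            base_url: str = "http://localhost:8086") -> urllib.request.Request:
    """Build a POST /query request equivalent to `create database <db_name>` in the influx shell."""
    # Double-quote the identifier so InfluxQL accepts names with special characters
    query = f'CREATE DATABASE "{db_name}"'
    body = urllib.parse.urlencode({"q": query}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/query",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def create_database(db_name: str, base_url: str = "http://localhost:8086",
                    timeout: float = 5.0) -> int:
    """Send the request built above and return the HTTP status code."""
    req = build_create_db_request(db_name, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

On the management node, `create_database("slurm_job_status")` sends the request. Schema-changing statements such as CREATE DATABASE must be sent with POST, which is why the request is built that way.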

The default InfluxDB configuration is sufficient.
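With the default configuration, the HTTP API listens on port 8086, as the `influx` shell output above shows. The compute nodes will presumably need to reach this port as well. Running a quick reachability check from each compute node is therefore worthwhile. This is a minimal sketch that assumes InfluxDB 1.x's `/ping` endpoint, which answers `204 No Content` when the server is up. `influxdb_is_up` is a hypothetical helper.

```python
import urllib.request


def influxdb_is_up(base_url: str = "http://localhost:8086", timeout: float = 2.0) -> bool:
    """Return True if an InfluxDB 1.x server answers GET /ping with 204 No Content."""
    url = base_url.rstrip("/") + "/ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 204
    except OSError:
        # urllib.error.URLError and HTTPError are OSError subclasses, so this
        # covers refused connections, DNS failures, timeouts and HTTP error codes
        return False
```

From a compute node, call it with the management node's address, e.g. `influxdb_is_up("http://192.168.0.211:8086")`.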

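2.1 Installing the InfluxDB GUI InfluxDBStudio-0.2.0 (optional)

InfluxDBStudio 0.2.0 is a fairly old release, but it still works. You can also use Grafana to visualize InfluxDB.

After downloading, extract the archive and double-click the .exe to run it.

Once InfluxDB Studio is open, create a new connection to the InfluxDB server. Slurm creates the measurements (tables) automatically, so there is no need to create them by hand.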
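Without a GUI, you can list the measurements Slurm has created through the same HTTP API. This sketch assumes InfluxDB 1.x's JSON response layout for `SHOW MEASUREMENTS` (`results` → `series` → `values`). The helper names are hypothetical. The measurement names in the test data are placeholders, not the names Slurm actually writes.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def show_measurements_url(db: str, base_url: str = "http://localhost:8086") -> str:
    """URL for a GET /query that runs SHOW MEASUREMENTS against one database."""
    params = urllib.parse.urlencode({"db": db, "q": "SHOW MEASUREMENTS"})
    return f"{base_url.rstrip('/')}/query?{params}"


def measurement_names(response_text: str) -> List[str]:
    """Extract measurement names from an InfluxDB 1.x /query JSON response."""
    payload = json.loads(response_text)
    names: List[str] = []
    for result in payload.get("results", []):
        if "error" in result:
            raise RuntimeError(result["error"])
        # A database without data yields a result with no "series" key
        for series in result.get("series", []):
            # SHOW MEASUREMENTS returns a single "name" column, so each row is [name]
            names.extend(row[0] for row in series.get("values", []))
    return names


def list_measurements(db: str = "slurm_job_status",
                      base_url: str = "http://localhost:8086") -> List[str]:
    """Query the server and return the measurement names."""
    with urllib.request.urlopen(show_measurements_url(db, base_url), timeout=5) as resp:
        return measurement_names(resp.read().decode("utf-8"))
```

Until a profiled job has run, `list_measurements()` is expected to return an empty list.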

3 Modifying the Slurm configuration files

Two configuration files need to be changed: slurm.conf and acct_gather.conf.

slurm.conf

AcctGatherProfileType=acct_gather_profile/influxdb
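A small checker can confirm that the setting actually made it into the slurm.conf in use, for example that it was not left commented out. This is a sketch that assumes one parameter per line and case-insensitive parameter names, as documented in slurm.conf(5). `parse_key_values` and `influxdb_profiling_enabled` are hypothetical helpers, not part of Slurm.

```python
from typing import Dict


def parse_key_values(text: str) -> Dict[str, str]:
    """Parse Key=Value lines into a dict with lower-cased keys.

    Blank lines and '#' comments are skipped. Lines holding several pairs
    (e.g. NodeName=...) are stored under their first key only, which is
    enough for checking single-valued parameters.
    """
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip().lower()] = value.strip()
    return conf


def influxdb_profiling_enabled(slurm_conf_text: str) -> bool:
    """True if AcctGatherProfileType selects the InfluxDB profiling plugin."""
    conf = parse_key_values(slurm_conf_text)
    return conf.get("acctgatherprofiletype", "").lower() == "acct_gather_profile/influxdb"
```

Typical use is `influxdb_profiling_enabled(open("/etc/slurm/slurm.conf").read())`. The path is an assumption; use your installation's configuration directory.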

acct_gather.conf

ProfileInfluxDBRTPolicy=autogen
ProfileInfluxDBDefault=ALL
ProfileInfluxDBDatabase=slurm_job_status
ProfileInfluxDBHost=cm-wsy-c16m32d200-1:8086
ProfileInfluxDBPass=123456
ProfileInfluxDBUser=root
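With InfluxDB's default configuration, HTTP authentication is disabled (`auth-enabled = false` in the `[http]` section of influxdb.conf). The server therefore does not check ProfileInfluxDBUser and ProfileInfluxDBPass; they only matter once authentication is turned on.

The sketch below sanity-checks acct_gather.conf before you copy it to the nodes. Treating only ProfileInfluxDBHost and ProfileInfluxDBDatabase as required is an assumption, since without them the plugin has nowhere to write. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: the minimum keys the plugin needs to know where to send data
REQUIRED_KEYS = ("profileinfluxdbhost", "profileinfluxdbdatabase")


def parse_key_values(text: str) -> Dict[str, str]:
    """Parse Key=Value lines (lower-cased keys), skipping blanks and '#' comments."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            conf[key.strip().lower()] = value.strip()
    return conf


def split_host(value: str) -> Tuple[str, int]:
    """Split 'host:port', the ProfileInfluxDBHost format used in this article."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    return host, int(port)


def check_acct_gather_conf(text: str) -> List[str]:
    """Return the problems found; an empty list means all checks passed."""
    conf = parse_key_values(text)
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in conf]
    if "profileinfluxdbhost" in conf:
        try:
            split_host(conf["profileinfluxdbhost"])
        except ValueError as exc:
            problems.append(str(exc))
    return problems
```

Running it on the configuration above should return an empty list.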

acct_gather.conf must be present on every compute node.
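Because the file has to exist on each compute node, you can script the copy from the management node. This is a minimal sketch that assumes passwordless root SSH from the management node and a configuration directory of /etc/slurm. Both are assumptions; adjust them to your build. `build_copy_commands` and `distribute` are hypothetical helpers. The node list comes from the environment in section 1.

```python
import shlex
import subprocess
from typing import List, Sequence

# Compute nodes from section 1 of this article
COMPUTE_NODES = ["192.168.0.218", "192.168.0.128"]
# Assumption: Slurm's configuration directory is /etc/slurm
CONF_PATH = "/etc/slurm/acct_gather.conf"


def build_copy_commands(nodes: Sequence[str], src: str = CONF_PATH,
                        dest: str = CONF_PATH, user: str = "root") -> List[List[str]]:
    """One scp command per node, copying the local file to the same path remotely."""
    return [["scp", src, f"{user}@{node}:{dest}"] for node in nodes]


def distribute(nodes: Sequence[str] = COMPUTE_NODES, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one node at a time."""
    for cmd in build_copy_commands(nodes):
        if dry_run:
            # Show the shell-quoted command instead of running it
            print(shlex.join(cmd))
        else:
            # check=True stops at the first node where scp fails
            subprocess.run(cmd, check=True)
```

Run `distribute()` first to review the commands, then `distribute(dry_run=False)` to copy the file.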