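1. Install munge
munge is the authentication service that Slurm uses internally. Install it together with its library and development packages before building Slurm:
yum -y install munge munge-libs munge-devel
Configure (generate the key):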
/usr/sbin/create-munge-key
Start:
systemctl enable munge --now
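Before moving on, it can help to confirm that munge is actually able to issue and decode a credential. The sketch below wraps the usual `munge -n | unmunge` round trip in a helper; the function name `munge_roundtrip` and the return code 127 for a missing binary are illustrative choices, not part of the original steps.

```shell
#!/bin/sh
# munge_roundtrip (hypothetical helper): encode an empty credential with
# `munge -n` and decode it with `unmunge` on the same host.
munge_roundtrip() {
    # Bail out early if the tools are not installed.
    if ! command -v munge >/dev/null 2>&1 || ! command -v unmunge >/dev/null 2>&1; then
        echo "munge/unmunge not found in PATH" >&2
        return 127
    fi
    # unmunge exits non-zero if the credential cannot be decoded.
    if munge -n | unmunge >/dev/null; then
        echo "munge credential round trip OK"
    else
        echo "munge round trip failed; is the munge service running?" >&2
        return 1
    fi
}
```

Calling `munge_roundtrip` right after `systemctl enable munge --now` should print the OK message if the key and the daemon are in place.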
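2. Install pmix
PMIx provides the process-management interface that MPI processes use to communicate with the launcher.
First install its two dependency libraries: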
yum install -y hwloc-devel libevent-devel
Download the latest release from https://github.com/openpmix/openpmix/releases
Unpack it, change into the source directory, then build and install:
./configure && make install -j $(nproc)
3. Build slurm
Download the latest source package:
https://github.com/SchedMD/slurm/releases/tag
Unpack it, change into the source directory, then build and install:
./configure && make install -j $(nproc)
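4. Configure slurm
The doc/html directory of the source tree contains a generated configuration page (configurator.html). Copy it to your local machine, open it in a browser, and fill in the fields:
ClusterName, SlurmctldHost, NodeName: set all three to the output of hostname
CPUs, Sockets, CoresPerSocket: take from lscpu
RealMemory: take from free -m (total memory, in MB)
SlurmUser: root for convenience; in production, use a less privileged user
Default MPI Type: optional; PMIx is selected here
Process Tracking: LinuxProc
Resource Selection: Cons_res
SelectTypeParameters: CR_Memory
Job Accounting Gather: Linux
Then click Submit to generate a configuration file,
and save it as /usr/local/etc/slurm.conf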
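The node values above can also be collected with a small script instead of being read off by eye. This is a minimal sketch, assuming the English field names that lscpu prints under LC_ALL=C and the "Mem:" row of free -m; the helper names `lscpu_field` and `node_line` are hypothetical.

```shell
#!/bin/sh
# lscpu_field KEY: read lscpu-style "Key: value" text on stdin and print
# the value of the first line whose key matches KEY exactly.
lscpu_field() {
    awk -F: -v key="$1" '$1 == key { gsub(/^[ \t]+/, "", $2); print $2; exit }'
}

# node_line: print a NodeName line for this host using the same sources
# as the manual steps (hostname, lscpu, free -m).
node_line() {
    cpu_info=$(LC_ALL=C lscpu)
    cpus=$(printf '%s\n' "$cpu_info" | lscpu_field 'CPU(s)')
    sockets=$(printf '%s\n' "$cpu_info" | lscpu_field 'Socket(s)')
    cores=$(printf '%s\n' "$cpu_info" | lscpu_field 'Core(s) per socket')
    # Second column of the "Mem:" row is total memory in MB.
    mem_mb=$(LC_ALL=C free -m | awk '/^Mem:/ { print $2 }')
    printf 'NodeName=%s CPUs=%s Sockets=%s CoresPerSocket=%s RealMemory=%s\n' \
        "$(hostname)" "$cpus" "$sockets" "$cores" "$mem_mb"
}
```

Running `node_line` prints one line whose values can be pasted into the form or compared against the generated slurm.conf. On machines with hyper-threading, ThreadsPerCore (lscpu's "Thread(s) per core") may also be needed so that the counts stay consistent.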
In the etc/ directory of the source tree, install the systemd unit files:
chmod 755 *.service && cp *.service /etc/systemd/system
Start:
systemctl enable slurmctld --now
systemctl enable slurmd --now
Check the partition status:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
test* up infinite 1 idle spack
Verify with an MPI hello-world program:
$ cat > hello.cpp <<'EOF'
#include "mpi.h"
#include <iostream>

int main(int argc, char* argv[])
{
    int rank;  // index of this process within MPI_COMM_WORLD
    int size;  // total number of processes in the job
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    std::cout << "Hello world from process " << rank << " of " << size << std::endl;
    MPI_Finalize();
    return 0;
}
EOF
$ mpicxx hello.cpp -o hello
$ srun -p test --mpi=pmi2 -n 4 ./hello
Hello world from process 1 of 4
Hello world from process 2 of 4
Hello world from process 0 of 4
Hello world from process 3 of 4
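The same check can be run as a batch job rather than interactively. The sketch below writes a small job script that reuses the partition name, task count, and srun options from the command above; the file names hello.sbatch and hello-%j.out are illustrative, and submitting through sbatch is an addition to the original steps.

```shell
#!/bin/sh
# Write a batch script equivalent to: srun -p test --mpi=pmi2 -n 4 ./hello
cat > hello.sbatch <<'EOF'
#!/bin/sh
#SBATCH --partition=test
#SBATCH --ntasks=4
#SBATCH --output=hello-%j.out
srun --mpi=pmi2 ./hello
EOF
# Submit from the directory that contains ./hello with:
#   sbatch hello.sbatch
```

After the job finishes, the four "Hello world" lines should appear in the output file, where %j is replaced by the job ID.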