Deploying Slurm on Ubuntu

I. Basic server setup

Server:

192.168.101.150 master

Clients:

192.168.101.151 node1

192.168.101.152 node2

1. Edit /etc/hosts (add the three entries above on every machine)

vim /etc/hosts
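
As a minimal sketch, the three entries listed above could also be appended with a small idempotent helper instead of editing by hand. The names `add_host_entry` and `HOSTS_FILE` are illustrative assumptions, not part of any tool; the sketch writes to a scratch file unless `HOSTS_FILE` is set.

```shell
#!/usr/bin/env bash
# Sketch: append "IP hostname" pairs to a hosts file only if the hostname is missing.
# HOSTS_FILE and add_host_entry are illustrative names (assumptions, not from the original post).
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

add_host_entry() {
    local ip="$1" name="$2"
    # Skip when a non-comment line already maps this hostname.
    if awk -v n="$name" '!/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$HOSTS_FILE"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$HOSTS_FILE"
}

add_host_entry 192.168.101.150 master
add_host_entry 192.168.101.151 node1
add_host_entry 192.168.101.152 node2

cat "$HOSTS_FILE"
```

Running it as root with `HOSTS_FILE=/etc/hosts` on master, node1 and node2 would apply the same entries everywhere; running it twice adds nothing new.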

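II. Configure NFS

1. Server configuration

1.1 Install the NFS server

apt-get install nfs-kernel-server

1.2 Add the NFS shared directory on the server (the /nfs directory must already exist; create it with mkdir -p /nfs if needed)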

vim /etc/exports

/nfs *(insecure,rw,sync,no_root_squash)

1.3 Restart and enable the NFS service

systemctl restart nfs-kernel-server && systemctl enable nfs-kernel-server

1.4 Re-export and verify the share

exportfs -rv

2. Client configuration

2.1 Install the NFS client

sudo apt-get install nfs-common

2.2 Verify the export and mount it on the client

mkdir /nfs

showmount -e 192.168.101.150 && mount -t nfs 192.168.101.150:/nfs /nfs

2.3 Mount automatically at boot

vim /etc/fstab

192.168.101.150:/nfs /nfs nfs defaults 0 0
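
A typo in /etc/fstab only shows up at the next boot, so it can help to check the entry right after adding it. The sketch below looks up a mount point in an fstab-style file; `fstab_lookup` is an illustrative helper name, and the demo runs on a scratch file rather than the real /etc/fstab.

```shell
#!/usr/bin/env bash
# Sketch: look up a mount point in an fstab-style file and print its source and type.
# fstab_lookup is an illustrative helper name, not a standard command.
fstab_lookup() {
    local file="$1" mountpoint="$2"
    # fstab fields: <source> <mount point> <type> <options> <dump> <pass>
    awk -v mp="$mountpoint" '
        !/^[[:space:]]*#/ && NF >= 4 && $2 == mp { print $1, $3; found = 1 }
        END { exit !found }' "$file"
}

# Demo on a scratch file containing the line from this guide.
demo=$(mktemp)
echo '192.168.101.150:/nfs /nfs nfs defaults 0 0' > "$demo"
fstab_lookup "$demo" /nfs
```

On a client, `fstab_lookup /etc/fstab /nfs` followed by `mount -a` (which mounts every fstab entry that is not mounted yet) exercises the new line without rebooting.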

III. Install Munge

1. Install Munge on the server and on the clients

apt update

apt install munge

IV. Configure NIS

1. Server configuration

1.1 Install nis

sudo apt install nis

1.2 Edit /etc/default/nis

vim /etc/default/nis

NISSERVER=true

NISCLIENT=false

1.3 Configure /etc/yp.conf

vim /etc/yp.conf

domain nis.example.com server 192.168.101.150

1.4 Set the NIS domain name on the running host

ypdomainname nis.example.com

1.5 Configure /etc/defaultdomain

vim /etc/defaultdomain

nis.example.com

1.6 Configure /etc/nsswitch.conf

vim /etc/nsswitch.conf

passwd: files [NOTFOUND=return] nis

shadow: files [NOTFOUND=return] nis

group: files [NOTFOUND=return] nis

1.7 Initialize the NIS maps

/usr/lib/yp/ypinit -m

1.8 Start the NIS services

service ypserv start

service ypbind start

2. Client configuration

2.1 Install nis

sudo apt install nis

2.2 Set the NIS domain name

nisdomainname nis.example.com

2.3 Configure /etc/defaultdomain

vim /etc/defaultdomain

nis.example.com

2.4 Configure /etc/default/nis

vim /etc/default/nis

NISSERVER=false

NISCLIENT=true

2.5 Configure /etc/yp.conf

vim /etc/yp.conf

domain nis.example.com server 192.168.101.150

2.6 Configure /etc/nsswitch.conf

vim /etc/nsswitch.conf

passwd: files nis

group: files nis

shadow: files nis

gshadow: files

hosts: files dns nis
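
To confirm which databases actually consult NIS after editing the file, a small parser can help. This is a sketch only: `nss_uses_nis` is an illustrative helper name, and the demo runs against a scratch copy of the client lines shown above.

```shell
#!/usr/bin/env bash
# Sketch: report whether a database in an nsswitch.conf-style file lists the "nis" source.
# nss_uses_nis is an illustrative helper name.
nss_uses_nis() {
    local file="$1" db="$2"
    awk -v db="$db" '
        { sub(/#.*/, "") }                       # drop comments
        $1 == (db ":") {
            for (i = 2; i <= NF; i++) if ($i == "nis") found = 1
        }
        END { exit !found }' "$file"
}

# Demo on a scratch file with the client settings from this guide.
demo=$(mktemp)
printf '%s\n' 'passwd: files nis' 'group: files nis' 'shadow: files nis' \
              'gshadow: files' 'hosts: files dns nis' > "$demo"
for db in passwd group shadow gshadow; do
    if nss_uses_nis "$demo" "$db"; then
        echo "$db: nis enabled"
    else
        echo "$db: nis not listed"
    fi
done
```

On a real client the call would be `nss_uses_nis /etc/nsswitch.conf passwd`; once ypbind is bound, `getent passwd` should then list the NIS accounts alongside the local ones.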

2.7 Restart the service (on the server, first restart ypserv with systemctl restart ypserv)

sudo service ypbind restart

2.8 Test with yptest

yptest

V. Deploy Munge

1. Generate the key on master

create-munge-key

2. Copy the key to node1 and node2

scp /etc/munge/munge.key root@node1:/etc/munge/

scp /etc/munge/munge.key root@node2:/etc/munge/

3. Fix ownership and permissions (all nodes)

chown -R munge:munge /var/{lib,log,run}/munge

chown -R munge:munge /etc/munge

chmod 711 /var/lib/munge

chmod 700 /var/log/munge

chmod 755 /var/run/munge

chmod 700 /etc/munge

chmod 400 /etc/munge/munge.key
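
The modes set above can be verified afterwards with a short check. The sketch below relies on GNU `stat -c '%a'`, as shipped on Ubuntu; `check_mode` is an illustrative helper name, and the demo uses scratch paths so that it runs without root.

```shell
#!/usr/bin/env bash
# Sketch: compare a path's octal mode with the expected value (GNU stat).
# check_mode is an illustrative helper name.
check_mode() {
    local path="$1" expected="$2" actual
    actual=$(stat -c '%a' "$path") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "OK   $path $actual"
    else
        echo "BAD  $path $actual (expected $expected)"
        return 1
    fi
}

# Demo on scratch paths, using two of the modes from this guide.
demo_dir=$(mktemp -d)
touch "$demo_dir/munge.key"
chmod 400 "$demo_dir/munge.key"
chmod 700 "$demo_dir"
check_mode "$demo_dir" 700
check_mode "$demo_dir/munge.key" 400
```

On a real node, run as root, the calls would be `check_mode /etc/munge 700`, `check_mode /etc/munge/munge.key 400`, `check_mode /var/lib/munge 711`, `check_mode /var/log/munge 700` and `check_mode /var/run/munge 755`.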

4. Check that the munge uid and gid are identical on all nodes

id munge
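
Comparing the `id munge` output by eye across three machines is error-prone, so the comparison can be scripted. In this sketch `check_ids_consistent` is an illustrative helper that reads "host uid gid" lines on stdin; the numeric values in the demo are made up.

```shell
#!/usr/bin/env bash
# Sketch: read "host uid gid" lines on stdin and fail if any host differs from the first one.
# check_ids_consistent is an illustrative helper name.
check_ids_consistent() {
    awk '
        NR == 1 { uid = $2; gid = $3 }
        $2 != uid || $3 != gid {
            printf "MISMATCH %s uid=%s gid=%s (expected %s/%s)\n", $1, $2, $3, uid, gid
            bad = 1
        }
        END { if (!bad) print "munge uid/gid consistent"; exit bad + 0 }'
}

# Demo with made-up values.
printf '%s\n' 'master 112 118' 'node1 112 118' 'node2 112 118' | check_ids_consistent

# Real input could be collected over ssh, assuming root ssh access to each host:
#   for h in master node1 node2; do
#       echo "$h $(ssh "root@$h" 'id -u munge; id -g munge' | xargs)"
#   done | check_ids_consistent
```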

5. Restart munge

systemctl restart munge

systemctl status munge
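
A running service does not yet prove that every node holds the same key. One way to check is to encode a credential on one node and decode it on another with `munge -n` and `unmunge`, both of which ship with the munge package. The wrapper name `munge_roundtrip` is an illustrative assumption, and the cross-node form assumes ssh access to the target.

```shell
#!/usr/bin/env bash
# Sketch: encode a credential with munge and decode it locally or on another node.
# munge_roundtrip is an illustrative helper name.
munge_roundtrip() {
    local target="${1:-}"
    if ! command -v munge >/dev/null 2>&1; then
        echo "munge is not installed" >&2
        return 2
    fi
    if [ -z "$target" ]; then
        munge -n | unmunge                   # local check
    else
        munge -n | ssh "$target" unmunge     # cross-node check: succeeds only if both nodes share the key
    fi
}

# Usage on master:
#   munge_roundtrip          # decode locally
#   munge_roundtrip node1    # decode on node1
#   munge_roundtrip node2    # decode on node2
```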

VI. Deploy Slurm

1. Install Slurm

sudo apt install slurm-wlm slurm-wlm-doc -y

2. Configure Slurm (the full file content below can be pasted in as-is)

vim /etc/slurm-llnl/slurm.conf

ClusterName=cool   # cluster name

ControlMachine=master

#ControlAddr=

#BackupController=

#BackupAddr=

#

MailProg=/usr/bin/s-nail

SlurmUser=root

#SlurmdUser=root

SlurmctldPort=6817

SlurmdPort=6818

AuthType=auth/munge

#JobCredentialPrivateKey=

#JobCredentialPublicCertificate=

StateSaveLocation=/var/spool/slurmctld

SlurmdSpoolDir=/var/spool/slurmd

SwitchType=switch/none

MpiDefault=none

SlurmctldPidFile=/var/run/slurmctld.pid

SlurmdPidFile=/var/run/slurmd.pid

ProctrackType=proctrack/pgid

#PluginDir=

#FirstJobId=

ReturnToService=0

#MaxJobCount=

#PlugStackConfig=

#PropagatePrioProcess=

#PropagateResourceLimits=

#PropagateResourceLimitsExcept=

#Prolog=

#Epilog=

#SrunProlog=

#SrunEpilog=

#TaskProlog=

#TaskEpilog=

#TaskPlugin=

#TrackWCKey=no

#TreeWidth=50

#TmpFS=

#UsePAM=

#

# TIMERS

SlurmctldTimeout=300

SlurmdTimeout=300

InactiveLimit=0

MinJobAge=300

KillWait=30

Waittime=0

#

# SCHEDULING

SchedulerType=sched/backfill

#SchedulerAuth=

#SelectType=select/linear

#PriorityType=priority/multifactor

#PriorityDecayHalfLife=14-0

#PriorityUsageResetPeriod=14-0

#PriorityWeightFairshare=100000

#PriorityWeightAge=1000

#PriorityWeightPartition=10000

#PriorityWeightJobSize=1000

#PriorityMaxAge=1-0

#

# LOGGING

SlurmctldDebug=info

SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log

SlurmdDebug=info

SlurmdLogFile=/var/log/slurm-llnl/slurmd.log

JobCompType=jobcomp/none

#JobCompLoc=

#

# ACCOUNTING

#JobAcctGatherType=jobacct_gather/linux

#JobAcctGatherFrequency=30

#

#AccountingStorageType=accounting_storage/slurmdbd

#AccountingStorageHost=

#AccountingStorageLoc=

#AccountingStoragePass=

#AccountingStorageUser=

#

# COMPUTE NODES

PartitionName=master Nodes=master,node1,node2 Default=NO MaxTime=INFINITE State=UP

#NodeName=master State=UNKNOWN

NodeName=master Sockets=2 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN

NodeName=node1 Sockets=2 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN

NodeName=node2 Sockets=2 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN

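Notes on the NodeName lines above (explanation only, not part of slurm.conf):

Sockets is the number of physical CPUs (sockets) in the server.

CoresPerSocket is the number of cores per CPU.

ThreadsPerCore is 2 if hyper-threading is enabled and 1 if it is not.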
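
The three values can be read from `lscpu` rather than guessed. The sketch below turns lscpu output into a NodeName line; `nodename_line` is an illustrative helper name, and it assumes English lscpu labels, so lscpu should be run with `LC_ALL=C`.

```shell
#!/usr/bin/env bash
# Sketch: turn lscpu output (read from stdin) into a slurm.conf NodeName line.
# nodename_line is an illustrative helper name; it assumes English lscpu labels.
nodename_line() {
    local name="$1"
    awk -F: -v name="$name" '
        {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)     # newer lscpu versions indent some labels
            gsub(/^[ \t]+|[ \t]+$/, "", val)
        }
        key == "Socket(s)"          { s = val }
        key == "Core(s) per socket" { c = val }
        key == "Thread(s) per core" { t = val }
        END {
            if (s == "" || c == "" || t == "") exit 1
            printf "NodeName=%s Sockets=%s CoresPerSocket=%s ThreadsPerCore=%s State=UNKNOWN\n", name, s, c, t
        }'
}

# Demo with sample values matching the nodes in this guide.
printf '%s\n' 'Thread(s) per core:    1' 'Core(s) per socket:    1' 'Socket(s):             2' \
    | nodename_line node1
```

On a real node the call would be `LC_ALL=C lscpu | nodename_line "$(hostname -s)"`. Slurm can also report this itself: `slurmd -C` prints the hardware configuration it detects in slurm.conf format.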

3. Copy the configuration file to node1 and node2

scp /etc/slurm-llnl/slurm.conf root@node1:/etc/slurm-llnl

scp /etc/slurm-llnl/slurm.conf root@node2:/etc/slurm-llnl
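
With more than a couple of nodes, one scp line per node invites copy-paste mistakes, so a loop is a safer pattern. This is a sketch: `NODES`, `CONF`, `DRY_RUN` and `push_conf` are illustrative names, and with the default `DRY_RUN=1` the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: copy slurm.conf to every compute node in a loop.
# NODES, CONF, DRY_RUN and push_conf are illustrative names; DRY_RUN=1 only prints the commands.
NODES="${NODES:-node1 node2}"
CONF="${CONF:-/etc/slurm-llnl/slurm.conf}"
DRY_RUN="${DRY_RUN:-1}"

push_conf() {
    local node
    for node in $NODES; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "scp $CONF root@$node:$CONF"
        else
            scp "$CONF" "root@$node:$CONF" || return 1
        fi
    done
}

push_conf
```

Setting `DRY_RUN=0` performs the copy. Since all nodes are expected to run with the same slurm.conf, comparing `md5sum /etc/slurm-llnl/slurm.conf` on each node afterwards is a cheap sanity check.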

4. Start the services

sudo systemctl enable slurmctld --now   # master only

sudo systemctl enable slurmd --now      # all nodes

5. Check the cluster with sinfo

sinfo
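
The sinfo output can also be filtered for nodes that cannot accept jobs. The sketch below reads "node state" pairs as printed by `sinfo -N -h -o '%N %t'`; `unavailable_nodes` is an illustrative helper name, and the demo input is made up.

```shell
#!/usr/bin/env bash
# Sketch: read "node state" pairs and list nodes that cannot take jobs.
# unavailable_nodes is an illustrative helper name; a trailing "*" marks a non-responding node.
unavailable_nodes() {
    awk '$2 ~ /^(down|drain|drng|fail|unk)/ || $2 ~ /\*$/ { print $1, $2; bad = 1 }
         END { exit bad + 0 }'
}

# Demo with made-up states.
printf '%s\n' 'master idle' 'node1 idle' 'node2 down*' | unavailable_nodes || true

# Real usage on master:
#   sinfo -N -h -o '%N %t' | unavailable_nodes
```

Because the slurm.conf above sets ReturnToService=0, a node that has been marked down stays down until an administrator resumes it, for example with `scontrol update NodeName=node2 State=RESUME`.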

6. Test with a batch script

#!/bin/bash

#SBATCH -J FIRE

#SBATCH -p master

#SBATCH -N 1

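#SBATCH -n 32
# Note: with -N 1, -n must not exceed the CPUs of a single node (2 per node in the slurm.conf above); adjust to your hardware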

#SBATCH -o %j.loop

#SBATCH -e %j.loop

#SBATCH --comment=WRF



echo "SLURM_JOB_NODELIST=${SLURM_JOB_NODELIST}"

echo "SLURM_NODELIST=${SLURM_NODELIST}"



time srun --mpi=pmi2 ./fire_openmpi_slurm 960000

date
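
Submitting the script and following the job could look like the sketch below. `job_id_from_sbatch` is an illustrative helper, `test.sh` is a hypothetical file name for the script above, and 123 is a made-up job ID.

```shell
#!/usr/bin/env bash
# Sketch: pull the job ID out of sbatch's "Submitted batch job <id>" message.
# job_id_from_sbatch is an illustrative helper name.
job_id_from_sbatch() {
    awk '/^Submitted batch job [0-9]+/ { print $4; found = 1 } END { exit !found }'
}

# Demo with a sample message.
echo 'Submitted batch job 123' | job_id_from_sbatch

# Real usage on master (test.sh is a hypothetical file name):
#   jobid=$(sbatch test.sh | job_id_from_sbatch)
#   squeue -j "$jobid"      # queue state of the job
#   cat "$jobid.loop"       # stdout and stderr, per the -o/-e lines in the script
```

`sbatch --parsable test.sh` is an alternative that prints the job ID without the surrounding text.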
