Slurm Architecture

a slurmd daemon runs on each compute node, and a central slurmctld daemon runs on a management node


node, compute resource in Slurm

partition, a logical (possibly overlapping) set of nodes

job, an allocation of resources assigned to a user for a specified amount of time

job steps, sets of tasks within a job


every command below accepts --help and has a man page (e.g.  sbatch --help / man sbatch)

sacct, reports job and job step accounting information about active or completed jobs

salloc, allocates resources for a job in real time (typically allocates resources and spawns a shell, from which srun commands launch parallel tasks)

sattach, attaches standard input, output, and error plus signal capabilities to a currently running job or job step; can attach and detach multiple times

sbatch, submits a job script for later execution; the script usually contains one or more srun commands to launch parallel tasks

sbcast, transfers a file from local disk to local disk on the nodes allocated to a job; useful for diskless compute nodes or when local copies perform better than a shared file system

scancel, cancel a pending or running job

scontrol, administrative tool to view and modify Slurm state; many of its commands can only be run as root

sinfo, reports the state of partitions and nodes managed by Slurm, with a variety of filtering, sorting, and formatting options

smap, graphically reports state information for jobs, partitions, and nodes managed by Slurm

squeue, reports the state of jobs and job steps (with filtering and formatting options similar to sinfo); by default lists running jobs in priority order, then pending jobs

srun, submits a job for execution or initiates job steps in real time; options include e.g. max/min node count, processor count, specific nodes to use or not use, and specific node characteristics (memory, disk space, certain required features)

strigger, sets, gets, or views event triggers (e.g. a node going down or a job approaching its time limit)

sview, GUI to get and update state information for jobs, partitions, and nodes managed by Slurm
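
The commands above combine into a typical job lifecycle. The following is a minimal sketch, assuming a batch script named example_job.sh (a hypothetical name) and Slurm client tools on PATH; when they are missing it only prints what it would do. sacct output also depends on accounting being configured at the site.

```shell
#!/bin/sh
# Hypothetical script name; replace with a real batch script.
script=example_job.sh

if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print just the job ID
    # (followed by ";cluster" on multi-cluster setups, hence the cut).
    jobid=$(sbatch --parsable "$script" | cut -d';' -f1)
    squeue -j "$jobid"     # state of the job while it is pending or running
    sacct -j "$jobid"      # accounting record for the job and its steps
    scancel "$jobid"       # cancel the job once it is no longer needed
    lifecycle_status="submitted"
else
    echo "Slurm client tools not found; would run: sbatch $script"
    lifecycle_status="skipped"
fi
```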


a simple way: a single command line, e.g.  srun -N3 -l  ../mpi/example
(runs example on three nodes; -l labels each line of output with its task number)

a common way: submit a script:

 * options can be supplied in the script by prefixing each with "#SBATCH" at the beginning of the script (before any commands to be executed). Options supplied on the command line override any options specified within the script.
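
A minimal sketch of the script-plus-override pattern described above. The file name, time limit, and the hostname job step are illustrative assumptions; the sbatch call is skipped when Slurm is not installed.

```shell
#!/bin/sh
# Write a small batch script; name and values are illustrative only.
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=00:05:00
#SBATCH --output=example_job.out
# The first command ends the #SBATCH header; later #SBATCH lines are ignored.
srun hostname
EOF

# The command-line value (--nodes=2) takes precedence over --nodes=1 in the script.
submit_cmd="sbatch --nodes=2 example_job.sh"
if command -v sbatch >/dev/null 2>&1; then
    $submit_cmd
else
    echo "would run: $submit_cmd"
fi
```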

another way:  create a resource allocation and spawn job steps within that allocation

 * salloc creates a resource allocation and typically starts a shell within that allocation; one or more job steps can then be executed in it.
 Slurm doesn't automatically migrate executables or data files to the nodes allocated to a job. The files must exist either on local disk or in some global file system (e.g. NFS). sbcast can be used to transfer files to local storage on the allocated nodes using Slurm's hierarchical communications.
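
The allocation workflow can be sketched as follows, assuming a local executable ./my_program and a destination /tmp/my_program (both hypothetical paths). Here salloc runs a short command string instead of an interactive shell, so the sketch can run unattended.

```shell
#!/bin/sh
# Hypothetical paths: a local executable and its per-node destination.
src=./my_program
dest=/tmp/my_program

# Inside the allocation: copy the file to local disk on every allocated
# node, then launch it as a job step.
inner="sbcast $src $dest && srun $dest"

if command -v salloc >/dev/null 2>&1; then
    salloc -N2 sh -c "$inner"
else
    echo "would run: salloc -N2 sh -c '$inner'"
fi
```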

CPU Management Steps  (important)

step1: selection of nodes in slurm.conf

SelectType   select/linear | select/cons_res
SelectTypeParameters  CR_CPU | CR_CPU_Memory | CR_Core | CR_Core_Memory | CR_Socket | CR_Socket_Memory
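
As an illustration, the two parameters might appear in slurm.conf as below. This is a sketch written to a scratch file (slurm.conf.fragment, a hypothetical name), not a complete configuration; the plugin name follows these notes, so check the slurm.conf man page of the installed release before using it.

```shell
#!/bin/sh
# Write an illustrative fragment to a scratch file (not a full slurm.conf).
cat > slurm.conf.fragment <<'EOF'
# Allocate individual cores and treat memory as a consumable resource.
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
EOF

# Show the two selection settings.
grep '^SelectType' slurm.conf.fragment
```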

srun/salloc/sbatch command line options ::
-B --extra-node-info  <sockets[:cores[:threads]]> ,  restricts node selection to nodes with a specified layout of sockets, cores and threads

-C --constraint <list> , restrict node selection to nodes with specified attributes

--contiguous N/A , restrict node selection to contiguous nodes

--cores-per-socket  <cores>  restrict node selection to nodes with at least the specified number of cores per socket

-c, --cpus-per-task <ncpus> ,  controls the number of CPUs allocated per task

--exclusive  N/A ,  prevents sharing of allocated nodes with other jobs; suballocates CPUs to job steps

-F, --nodefile <node file>  file containing a list of specific nodes to be selected for the job (salloc and sbatch only)

--hint  compute_bound | memory_bound | [no]multithread,   additional controls on the allocation of CPU resources

--mincpus  <n>   controls the minimum number of CPUs allocated per node

-N, --nodes  <minnodes[-maxnodes]>   minimum (and optionally maximum) number of nodes allocated to the job

-n, --ntasks  <numbers>   number of tasks to be created for the job

--ntasks-per-core  <number>  maximum number of tasks per allocated core

--ntasks-per-socket <number>  maximum number of tasks per allocated socket

--ntasks-per-node <number>  maximum number of tasks per allocated node

-O, --overcommit N/A,  allows fewer CPUs to be allocated than the number of tasks

-p, --partition  <partition_names>, which partition is used for the job

-s, --share  N/A, allow sharing of allocated nodes with other jobs

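--sockets-per-node <sockets>  restrict node selection to nodes with at least the specified number of sockets

--threads-per-core <threads>  restrict node selection to nodes with at least the specified number of threads per core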

-w, --nodelist <host1,host2, ... or filename>, list of specific nodes to be allocated to the job

-x, --exclude <host1,host2, ... or filename>, list of specific nodes to be excluded from the job's allocation

-Z, --no-allocate  N/A,  bypass normal allocation (privileged option, available to SlurmUser and root only)
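
Several of these options are usually combined in one request. A minimal sketch, assuming a partition named debug and using hostname as a stand-in program; the counts are illustrative only.

```shell
#!/bin/sh
# Request: 2 nodes, 8 tasks in total, at most 4 tasks per node,
# 2 CPUs per task, and no sharing of the nodes with other jobs.
opts="--partition=debug --nodes=2 --ntasks=8 --ntasks-per-node=4 --cpus-per-task=2 --exclusive"

if command -v srun >/dev/null 2>&1; then
    # $opts is left unquoted on purpose so it splits into separate options.
    srun $opts hostname
else
    echo "would run: srun $opts hostname"
fi
```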

step2: allocation of CPUs from the selected nodes; the slurm.conf parameters are the same as in step1

step3: distribution of tasks to the selected nodes

each task is distributed to only one node, but more than one task may be distributed to each node.

in slurm.conf  MaxTasksPerNode <number>, max number of tasks that a job step can spawn on a single node

srun/salloc/sbatch options in this step:

--distribution, -m  block | cyclic | arbitrary | plane=<options> [:block|cyclic]

--ntasks-per-core <number> 

--ntasks-per-socket <number> 

--ntasks-per-node <number>

-r, --relative N/A  which node is used for a job step
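
The difference between block and cyclic distribution can be sketched with a small model. This is a simplification, not Slurm's actual placement code: it assumes the task count divides evenly by the node count and that every node has exactly ntasks/nnodes CPUs. Real placement also depends on the CPUs allocated on each node. The function names are hypothetical.

```shell
#!/bin/sh
# Simplified model of task-to-node distribution (see assumptions above).
ntasks=4
nnodes=2
per_node=$((ntasks / nnodes))

# block: consecutive tasks share a node.
block_node()  { echo $(( $1 / per_node )); }
# cyclic: consecutive tasks go to consecutive nodes, round-robin.
cyclic_node() { echo $(( $1 % nnodes )); }

task=0
while [ "$task" -lt "$ntasks" ]; do
    echo "task $task: block -> node $(block_node "$task"), cyclic -> node $(cyclic_node "$task")"
    task=$((task + 1))
done
```

On a cluster, the corresponding requests would be srun -N2 -n4 -m block -l hostname and srun -N2 -n4 -m cyclic -l hostname; comparing the labeled output shows which node each task landed on.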

step 4:  optional distribution and binding of tasks to CPUs within a node
Slurm distributes and binds each task to a specified subset of the allocated CPUs on the node
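
One way to observe binding is to have each task print the CPUs it is allowed to run on. A minimal sketch, assuming Linux nodes and a task plugin that supports binding (for example task/affinity) enabled on the cluster; older Slurm releases spell the option --cpu_bind. Without Slurm, the sketch prints the unbound affinity of a local process instead.

```shell
#!/bin/sh
# Command each task runs: print the CPUs this process may use (Linux only).
show_affinity="grep Cpus_allowed_list /proc/self/status"

if command -v srun >/dev/null 2>&1; then
    # Bind each of the 4 tasks to cores; -l labels lines with the task ID.
    srun --ntasks=4 --cpu-bind=cores -l $show_affinity
elif [ -r /proc/self/status ]; then
    # No Slurm available: show the affinity of an ordinary local process.
    $show_affinity
else
    echo "would run: srun --ntasks=4 --cpu-bind=cores -l $show_affinity"
fi
```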


#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --constraint=CPU-L5520 
#SBATCH --partition=debug
#SBATCH --time=00:00:10
#SBATCH --mail-type=END
#SBATCH --output=core8.out


