SLURM

General information

SLURM is the job scheduler used on all our clusters. Although different versions are installed on different groups of clusters, the basic commands are the same. Here is a list of them (a typical submission workflow is sketched after the table):

Command    Information                                                  Usage
sbatch     Submit a job script to SLURM                                 sbatch $jobscript
srun       Run a parallel job without a job script                      srun --ntasks=$NTASKS --cpus-per-task=$CPT
salloc     Allocate nodes to allow direct access                        salloc --nodes=$nNodes
sinfo      View information about SLURM nodes and partitions            sinfo
squeue     View information about jobs in the SLURM scheduling queue    squeue
scancel    Cancel a job                                                 scancel $jobID
sacct      Display accounting data of jobs                              sacct -u $user
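
As an illustration, a typical submission workflow with these commands looks like the following sketch; job.sh and $jobID are placeholders for your own job script and the ID printed by sbatch.

# Submit a job script; sbatch prints the ID of the new job
sbatch job.sh

# Check the state of your jobs in the scheduling queue
squeue -u $USER

# Cancel a job that is no longer needed, using its job ID
scancel $jobID

# Once the job has finished, inspect its accounting data
sacct -u $USER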

SLURM examples

Preliminary information

SLURM allows you to set different options to specify the resources that will be allocated and, potentially, used by the submitted job. Here is a list of the most common ones (a combined job-script header is sketched after the list):

# To specify the number of processes (~MPI ranks) to be spawned
--ntasks=$NTASKS
 
# To specify how many cores each task will use (~ Number of OpenMP/OmpSs threads)
--cpus-per-task=$CPT
 
# To specify the maximum execution time of your job.
# The job will be killed after this amount of time (days are optional)
--time=[D-]HH:MM:SS
 
# To specify the name of the job. Useful for identifying it at squeue/sinfo/sacct output
-J $jobName
 
# To specify the files where standard output and standard error will be redirected.
# If -e is not specified, stderr is redirected to stdout.
# If -o is not specified, the output file is placed in the working directory
# with the name slurm-%j.out, where %j is the job ID.
-o out/file-%j.out
-e err/file-%j.err
 
# To specify the partition where the job will be submitted.
# Each cluster has its own partitions.
--partition=$partitionName
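
As an illustration, these options can be combined in the header of a job script as in the following sketch; the partition, job name, file paths and binary are placeholders to be adapted to your cluster.

#!/bin/bash

#SBATCH --ntasks=4               # 4 processes (~MPI ranks)
#SBATCH --cpus-per-task=2        # 2 cores per task (~OpenMP/OmpSs threads)
#SBATCH --time=01:00:00          # job killed after 1 hour
#SBATCH -J example_job           # placeholder job name
#SBATCH -o out/example-%j.out    # placeholder stdout file
#SBATCH -e err/example-%j.err    # placeholder stderr file
#SBATCH --partition=thunder      # example partition; each cluster has its own

srun ./your_binary               # placeholder binary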

Job Dependencies

As a user, you can specify dependencies that a job must fulfill before it starts executing. These dependencies can be on other jobs or on a future point in time. The following dependencies are allowed (a submission sketch follows the list):

  • Between jobs
    • After job begin
      --dependency=after:$jobID
      • This job can begin execution after the specified jobs have begun execution.
    • After job finish
      --dependency=afterany:$jobID
      • This job can begin execution after the specified jobs have terminated.
    • After job fail
      --dependency=afternotok:$jobID
      • This job can begin execution after the specified jobs have terminated in some failed state (non-zero exit code, node failure, timed out, etc.).
    • After job finish successfully
      --dependency=afterok:$jobID
      • This job can begin execution after the specified jobs have successfully executed (ran to completion with an exit code of zero).
    • Singleton
      --dependency=singleton
      • This job can begin execution after any previously submitted job sharing the same job name and user has terminated.
  • Time dependency
    • Start time
      --begin=$time
      • Submit the batch script to the SLURM controller immediately, as usual, but tell the controller to defer the allocation of the job until the specified time.
      • The time format can be one of the following:
        • --begin=16:00
        • --begin=now+4hour
        • --begin=now+60 (seconds by default)
        • --begin=2010-01-20T12:34:00
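
As a sketch of how these dependencies are used at submission time, the following commands chain two jobs and defer a third one; the script names are placeholders (--parsable makes sbatch print only the job ID).

# Submit the first job and capture its job ID
jobID=$(sbatch --parsable first_step.sh)

# Submit a second job that starts only after the first one finishes successfully
sbatch --dependency=afterok:${jobID} second_step.sh

# Submit a third job whose allocation is deferred until 16:00
sbatch --begin=16:00 third_step.sh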

OpenMP

#!/bin/bash
 
#SBATCH --partition=thunder
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --output=omp-%j.out
#SBATCH --error=omp-%j.err
#SBATCH --time=10:00
 
export OMP_NUM_THREADS=8
 
srun ./omp_binary

OmpSs

#!/bin/bash
 
#SBATCH --partition=thunder
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --output=ompss-%j.out
#SBATCH --error=ompss-%j.err
#SBATCH --time=10:00
 
export NX_ARGS="--smp-workers=8"
 
srun ./ompss_binary

MPI

#!/bin/bash
 
#SBATCH --partition=thunder
#SBATCH --ntasks=48
#SBATCH --cpus-per-task=1
#SBATCH --output=mpi-%j.out
#SBATCH --error=mpi-%j.err
#SBATCH --time=10:00
 
srun ./mpi_binary

OpenCL/CUDA

#!/bin/bash
 
#SBATCH --partition=jetson-tx
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --output=cuda-%j.out
#SBATCH --error=cuda-%j.err
#SBATCH --time=10:00
#SBATCH --gres=gpu
 
srun ./cuda_binary

MPI+{OmpSs/OpenMP}

#!/bin/bash
 
#SBATCH --partition=thunder
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=48
#SBATCH --output=mpi_omp-%j.out
#SBATCH --error=mpi_omp-%j.err
#SBATCH --time=10:00
 
export OMP_NUM_THREADS=48
 
srun ./mpi_omp_binary

MPI+OpenMP with CPU binding

The following code shows an example, on the Thunder cluster, of how to bind different MPI ranks to different cores. For more information, check the man page of the srun command.

#!/bin/bash
 
#SBATCH --partition=thunder
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=48
#SBATCH --output=mpi_omp-%j.out
#SBATCH --error=mpi_omp-%j.err
#SBATCH --time=10:00
 
export OMP_NUM_THREADS=48
 
# This will map rank 0 to one socket (cores 0-47) and rank 1 to the other (cores 48-95)
maskCPU0=0x000000000000ffffffffffff
maskCPU1=0xffffffffffff000000000000
srun --cpu_bind=verbose,mask_cpu:${maskCPU0},${maskCPU1} ./mpi_omp_binary
 
# This would map rank 0 to even cores, while rank 1 to odd cores
maskCPU0=0x555555555555555555555555
maskCPU1=0xaaaaaaaaaaaaaaaaaaaaaaaa
srun --cpu_bind=verbose,mask_cpu:${maskCPU0},${maskCPU1} ./mpi_omp_binary
 
# This would map tasks to sockets. If the number of tasks differs from the number
# of allocated sockets, it could result in sub-optimal binding.
srun --cpu_bind=verbose,sockets ./mpi_omp_binary