\documentclass[10pt]{article}
\usepackage[margin=3cm]{geometry}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{eurosym}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{color}
\pagestyle{plain}
\usepackage{hyperref}
\begin{document}
\title{SLURM}
\author{ }
\date{2024-04-20}
\maketitle

\section{General information}

\subsection{SLURM commands}

\begin{center}
\renewcommand{\arraystretch}{1.5}
\setlength{\tabcolsep}{2pt}
\begin{tabular}{|p{0.12\linewidth}|p{0.35\linewidth}|p{0.45\linewidth}|}
\hline
{\bf Command} & {\bf Information} & {\bf Usage} \\ \hline
sbatch & Submit a job script to SLURM & sbatch [options] job\_script.sh \\ \hline
srun & Run a parallel job without a job script & srun [options] executable \\ \hline
salloc & Allocate nodes to SSH to & salloc -w mb-1,mb-2,...,mb-N -p mb \\ \hline
sinfo & View information about SLURM nodes and partitions & sinfo -p mb, sinfo -N -n mb-N \\ \hline
squeue & View information about jobs in the SLURM scheduling queue & squeue [-p mb] \\ \hline
scancel & Cancel a job & scancel -u \$user --state=PD, scancel \$JobID \\ \hline
sacct & Display accounting data of jobs & sacct -u \$user -o field1,field2,...,fieldN \\ \hline
\end{tabular}
\end{center}

More information about these commands is available via {\bf man}.

\section{SLURM Mont-Blanc power monitor plugin}

\bigskip
\hrule
\medskip
\noindent {\bf IMPORTANT NOTE:} The SLURM power monitor plugin computes the energy-to-solution only of commands executed via {\bf srun}. This means that, even for serial applications, if you need energy measurements you {\bf must} use {\bf srun} to execute the application, no matter whether it is run from the command line or inside a job script submitted via {\bf sbatch}.
\medskip
\hrule
\bigskip

\subsection{In which cases is the energy-to-solution computed?}

A SLURM job is composed of one or more job steps, and each job step is a command executed via {\bf srun}. Answering the question: the SLURM plugin computes the energy-to-solution {\bf of the job steps only}, reporting one value per job step. The following table lists all the known cases:

\begin{center}
\renewcommand{\arraystretch}{1.5}
\setlength{\tabcolsep}{2pt}
\begin{tabular}{|p{0.121495\linewidth}|p{0.233645\linewidth}|p{0.616822\linewidth}|}
\hline
{\bf Command} & {\bf Executed where?} & {\bf Energy-to-solution available?} \\ \hline
srun & At the command line & YES \\ \hline
srun & Inside a job script & YES \\ \hline
sbatch & Submitted & YES (but only for commands executed with srun in the job script) \\ \hline
salloc & At the command line & YES \\ \hline
\end{tabular}
\end{center}

In summary, SLURM can only provide the energy-to-solution of binaries executed with {\bf srun}. If you need the energy-to-solution of the rest of the job script, you have to compute it manually.

\section{SLURM job scripts}

\subsection{Preliminary information}

The number of allocated nodes is computed by SLURM using one of the following formulas, depending on the options set (a job script sketch combining these options follows the list):
\begin{itemize}
\item N = ntasks * cpus-per-task
\item N = ntasks * ntasks-per-core
\item N = ntasks * ntasks-per-socket
\item N = ntasks * ntasks-per-node
\end{itemize}
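For example, the following minimal job script sketch combines the allocation options above with the {\bf srun} requirement of the power monitor plugin from the previous section; the job name, executable names, and concrete values are placeholders, not site-mandated settings:

\begin{verbatim}
#!/bin/bash
#SBATCH --job-name=example      # placeholder name
#SBATCH --partition=mb          # Mont-Blanc partition (see sinfo -p mb)
#SBATCH --ntasks=8              # number of tasks to launch
#SBATCH --cpus-per-task=2       # allocation follows ntasks and cpus-per-task
#SBATCH --time=00:30:00         # mandatory wall-clock limit (see below)

# Job step executed via srun: the power monitor plugin reports its
# energy-to-solution.
srun ./my_app

# Executed without srun: no energy-to-solution is reported for this part,
# so it would have to be measured manually.
./post_process.sh
\end{verbatim}

Submitted via {\bf sbatch job\_script.sh}, such a job reports one energy-to-solution value for the {\bf srun} job step only.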
Also, one can explicitly specify how many nodes to allocate with the {\bf -N} or {\bf --nodes} option:

\begin{verbatim}
-N, --nodes=<minnodes[-maxnodes]>
    Request that a minimum of minnodes nodes be allocated to this job.
    A maximum node count may also be specified with maxnodes. If only
    one number is specified, this is used as both the minimum and
    maximum node count. The partition's node limits supersede those of
    the job. If a job's node limits are outside of the range permitted
    for its associated partition, the job will be left in a PENDING
    state. This permits possible execution at a later time, when the
    partition limit is changed. If a job node limit exceeds the number
    of nodes configured in the partition, the job will be rejected.
    Note that the environment variable SLURM_NNODES will be set to the
    count of nodes actually allocated to the job. See the ENVIRONMENT
    VARIABLES section for more information. If -N is not specified,
    the default behavior is to allocate enough nodes to satisfy the
    requirements of the -n and -c options. The job will be allocated
    as many nodes as possible within the range specified and without
    delaying the initiation of the job. The node count specification
    may include a numeric value followed by a suffix of "k"
    (multiplies numeric value by 1,024) or "m" (multiplies numeric
    value by 1,048,576).
\end{verbatim}

{\bf NOTE:} It is mandatory to set the wall-clock limit in the job script by using the {\bf --time} option.

\begin{verbatim}
-t, --time=