\documentclass[10pt]{article}
\usepackage[margin=3cm]{geometry}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{eurosym}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{color}
\pagestyle{plain}
\usepackage{hyperref}

\begin{document}

\title{Mont-Blanc Prototype}
\author{ }
\date{2024-04-25}
\maketitle

Currently we have deployed two versions of the cluster, one for production and another one for testing.

\begin{itemize}
  \item Mont-Blanc production
  \begin{itemize}
    \item 7 chassis
    \item 15 login nodes
    \item 930 compute nodes
  \end{itemize}
  \item Mont-Blanc testing
  \begin{itemize}
    \item 1 chassis
    \item 1 login node
    \item 134 compute nodes
  \end{itemize}
\end{itemize}

\section{FAQ}

\subsection{How do I submit a job to the prototype?}

\subsection{Which software is installed on the Mont-Blanc prototype?}

\subsection{How can I get power data from my jobs?}

\textbf{Warning: The Mont-Blanc Power Accounting plugin for SLURM is not working properly at the moment. The energy reported by the \texttt{sacct} command could be incorrect.}

It depends on the kind of data you want. If you only care about the energy-to-solution of your job, it is as simple as running:

\begin{verbatim}
sacct -j ${JOBID} -o jobid,jobname,partition,alloccpus,state,exitcode,consumedenergy
\end{verbatim}

\subsection{Why is the \texttt{sacct} command reporting 4.29M as consumed energy for my jobs?}

When the consumed energy field shows 4.29M, the computation of the energy-to-solution has failed. The most probable cause is that the power traces from some nodes are missing because of another issue with the power database.

\subsection{Why is the \texttt{sacct} command reporting 0 as consumed energy for my jobs?}

It depends on where the 0 J of consumed energy is reported.
For example:

\begin{verbatim}
       JobID    JobName  Partition  AllocCPUS   NNodes      State ExitCode ConsumedEnergy
------------ ---------- ---------- ---------- -------- ---------- -------- --------------
219999       power_mon+ mb-priori+          2        1  COMPLETED      0:0
219999.batch      batch                     1        1  COMPLETED      0:0              0
219999.0     dense-mat+                     2        1  COMPLETED      0:0             28
220025       power_mon+ mb-priori+          2        1  COMPLETED      0:0
220025.batch      batch                     1        1  COMPLETED      0:0              0
220025.0     dense-mat+                     2        1  COMPLETED      0:0             71
220025.1     dense-mat+                     2        1  COMPLETED      0:0             51
220030       power_mon+ mb-priori+          2        1  COMPLETED      0:0
220030.batch      batch                     1        1  COMPLETED      0:0              0
\end{verbatim}

In this output, job 219999 reports 28 because its job script ran the binary with the \texttt{srun} command. Note that only job step 0 (JobID 219999.0) reports the data, while the \texttt{batch} job step reports 0. The next job (JobID 220025) ran two job steps, and the consumed energy is reported for each of them. Finally, the last job (JobID 220030) does not report any consumed energy because its job script did not use the \texttt{srun} command to run the application.

\subsection{Is there any user-level configuration for MPI jobs?}

\subsection{How can I get a Paraver trace of my application?}

\subsection{How can I run my benchmark ensuring that no other job is running on the cluster at the same time?}

\subsection{I am facing some weird behavior and errors with the filesystem. Is there anything I can do to improve the overall performance of Lustre?}

\subsection{Who should I contact if I have a question or a technical issue?}

\end{document}