Tracing manual for Mont-Blanc prototype

NOTE: This guide explains how to generate a Paraver trace with the Extrae tracing tool using the LD_PRELOAD method. Other methods are available, but they are not covered here.

MPI{+PROG_MOD} applications

Prepare your binary

This step is only needed for C/C++ applications.

Because of some bugs in the libunwind library used by Extrae, the first step is to recompile the application with the following flags. Otherwise, the MPI application is more likely to hit a segmentation fault during execution.

-funwind-tables -g
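As an illustration, the flags can be passed to the MPI compiler wrapper as sketched below. The file names mpi_app.c and mpi_binary are hypothetical, and mpicc is assumed to be the wrapper provided by the loaded MPI module. The snippet prints the compile line and only runs it when both the wrapper and the source file are present.

```shell
#!/bin/bash
# Hypothetical file names; replace them with your own source and binary.
SRC=mpi_app.c
BIN=mpi_binary

# The two flags this guide asks for.
TRACE_FLAGS="-funwind-tables -g"

# Assemble the compile line; mpicc is assumed to be the MPI C compiler wrapper.
COMPILE_CMD="mpicc ${TRACE_FLAGS} -o ${BIN} ${SRC}"
echo "${COMPILE_CMD}"

# Only compile when the wrapper and the source file are actually present.
if command -v mpicc >/dev/null 2>&1 && [ -f "${SRC}" ]; then
    ${COMPILE_CMD}
fi
```

If the application is built with a Makefile, adding the same two flags to its C/C++ compiler flags should have the same effect.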

Prepare your job script

Now we need to modify our job script to specify that we want to generate a trace with Extrae. First, load the module of the MPI implementation we want to use. Once that is done, load the Extrae module.

druiz@mb-login-12:~$ module load openmpi/1.10.0
load openmpi/1.10.0 (PATH, MANPATH, LD_LIBRARY_PATH)
druiz@mb-login-12:~$ module load extrae
load extrae/3.2.1 (PATH, LD_LIBRARY_PATH, C_INCLUDE_PATH, EXTRAE_HOME)

At this point the environment variable $EXTRAE_HOME should be set to the path of the Extrae installation to use.
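A quick sanity check can catch a missing module before the job is submitted. The following is a minimal sketch; the function name check_extrae_home is hypothetical, and the only assumption about the installation layout is the lib folder that trace.sh relies on.

```shell
#!/bin/bash
# Hypothetical helper: fail early if the extrae module did not set EXTRAE_HOME.
check_extrae_home() {
    if [ -z "${EXTRAE_HOME:-}" ]; then
        echo "EXTRAE_HOME is not set; run 'module load extrae' first" >&2
        return 1
    fi
    # trace.sh preloads libraries from this folder, so it has to exist.
    if [ ! -d "${EXTRAE_HOME}/lib" ]; then
        echo "No lib folder under ${EXTRAE_HOME}" >&2
        return 1
    fi
    echo "Using Extrae at ${EXTRAE_HOME}"
}
```

Calling check_extrae_home right after loading the modules prints the installation path, or an error message on stderr when something is missing.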

Now we need to copy the required files to the folder where our job script is located. This step depends on the programming model we are using. Note that the exact folders can also change depending on the programming model.

druiz@mb-login-12:~/job$ cp $EXTRAE_HOME/share/example/${PROGRAMMING_MODEL}/extrae.xml .
druiz@mb-login-12:~/job$ cp $EXTRAE_HOME/share/example/${PROGRAMMING_MODEL}/ld-preload/trace.sh .

Here ${PROGRAMMING_MODEL} can be one of the following:

  • MPI
  • MPI+OMP
  • MPI+OMPSS
  • MPI+OPENCL
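The two copy commands can be wrapped in a small helper that takes the programming model as an argument. This is only a sketch: the function name copy_extrae_templates is hypothetical, it assumes the folder layout shown above, and the chmod step is an addition so that the copied trace.sh can be launched directly.

```shell
#!/bin/bash
# Hypothetical helper: copy extrae.xml and trace.sh for a given programming model.
# Usage: copy_extrae_templates MODEL [DESTINATION]
copy_extrae_templates() {
    model="$1"
    dest="${2:-.}"
    src="${EXTRAE_HOME}/share/example/${model}"

    # The exact folders can differ between models, so check before copying.
    if [ ! -d "${src}" ]; then
        echo "No example folder for '${model}' at ${src}" >&2
        return 1
    fi

    cp "${src}/extrae.xml" "${dest}/" &&
        cp "${src}/ld-preload/trace.sh" "${dest}/" &&
        chmod +x "${dest}/trace.sh"   # added so that 'srun ./trace.sh' can execute it
}
```

For an MPI-only application this would be called as copy_extrae_templates MPI from the job folder.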

Now edit the trace.sh file. Since extrae.xml is located in the same folder, make sure that the $EXTRAE_CONFIG_FILE variable points to the correct path.

#!/bin/bash
 
export EXTRAE_HOME=/apps/extrae/3.2.1/openmpi/1.10.0
export EXTRAE_CONFIG_FILE=./extrae.xml
 
# Example for MPI only application, set only one
export LD_PRELOAD=${EXTRAE_HOME}/lib/libmpitrace.so # For C apps
#export LD_PRELOAD=${EXTRAE_HOME}/lib/libmpitracef.so # For Fortran apps
 
## Run the desired program, keeping each argument exactly as it was passed
"$@"
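The choice between the two libraries in trace.sh can also be made explicit instead of commenting lines in and out. The sketch below uses a hypothetical function name, select_mpi_trace_lib, and covers only the two MPI-only libraries named in the script above; other programming models use other libraries that are not listed here.

```shell
#!/bin/bash
# Hypothetical helper: print the tracing library for an MPI-only application.
# Usage: select_mpi_trace_lib c|fortran
select_mpi_trace_lib() {
    case "$1" in
        c)       echo "${EXTRAE_HOME}/lib/libmpitrace.so" ;;   # C apps
        fortran) echo "${EXTRAE_HOME}/lib/libmpitracef.so" ;;  # Fortran apps
        *)
            echo "unknown language: $1 (expected c or fortran)" >&2
            return 1
            ;;
    esac
}
```

Inside trace.sh this could be used as export LD_PRELOAD=$(select_mpi_trace_lib c), placed after EXTRAE_HOME has been exported.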

This guide does not cover the different options available in the extrae.xml file. For those, check the commented examples provided at $EXTRAE_HOME/share/example/${PROGRAMMING_MODEL}.

The last step is to modify our job script. The only change needed is in the srun command, where we add the execution of the trace.sh script.

#!/bin/bash
 
#SBATCH --partition=mb
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --out=mpi-%j.out
#SBATCH --err=mpi-%j.err
#SBATCH --time=10:00
 
srun ./trace.sh ./mpi_binary

Submit your job

No special effort is needed at this point; just submit your job as usual.

sbatch job.sh
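To follow the job after submission it helps to keep its id. The sketch below assumes the usual sbatch success message of the form "Submitted batch job <id>"; the function name job_id_from_sbatch is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: extract the job id from the line sbatch prints on success.
job_id_from_sbatch() {
    # The message is assumed to end with the job id, so keep the last word.
    echo "${1##* }"
}

# Typical use on the cluster (left commented out, since it needs Slurm):
#   out=$(sbatch job.sh)
#   squeue -j "$(job_id_from_sbatch "${out}")"
```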

Once the job finishes, the trace should have been generated. By default it takes the same name as the application binary and is written to the folder from which the application was executed. The trace consists of three files with the extensions .prv, .row and .pcf.
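A short check can confirm that all three files are present before the trace is opened in Paraver. This is a minimal sketch with a hypothetical function name, check_trace_files; the base name is assumed to be the binary name, following the default described above.

```shell
#!/bin/bash
# Hypothetical helper: verify that the .prv, .row and .pcf files exist.
# Usage: check_trace_files BASENAME   (for example: check_trace_files mpi_binary)
check_trace_files() {
    base="$1"
    missing=0
    for ext in prv row pcf; do
        if [ ! -f "${base}.${ext}" ]; then
            echo "missing ${base}.${ext}" >&2
            missing=1
        fi
    done
    # Returns 0 when all three files are present, 1 otherwise.
    return "${missing}"
}
```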

Manual merge of the trace

NOTE: Extrae merges the trace automatically at the end of the execution, and this almost always works fine. For the cases where it does not, disable the merge tag in extrae.xml and follow the steps below.
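The merge tag can be switched off by hand in an editor, or with a small edit like the one sketched here. The sketch assumes that the tag in your extrae.xml starts with <merge enabled="yes", so check your copy first; the function name disable_extrae_merge is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: set enabled="no" on the merge tag of an Extrae XML file.
# Usage: disable_extrae_merge [FILE]   (defaults to ./extrae.xml)
disable_extrae_merge() {
    xml="${1:-extrae.xml}"
    tmp="${xml}.tmp.$$"
    # Flip only the enabled attribute of the merge tag; everything else is kept.
    sed 's/<merge enabled="yes"/<merge enabled="no"/' "${xml}" > "${tmp}" &&
        mv "${tmp}" "${xml}"
}
```

If the tag is written differently in your file, the command leaves the file unchanged, so it is worth confirming the result with grep merge extrae.xml.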

The trace can be merged serially or in parallel. We strongly suggest running the merger in parallel, since merging can take a long time for big traces.

The best approach is to put the parallel merge inside your job script. This way the merge can use the same number of processors as the traced run. It can, however, also be done with a different number of nodes.

The following example shows how to merge the trace in the same job script used to generate it.

#!/bin/bash
 
#SBATCH --partition=mb
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --out=mpi-%j.out
#SBATCH --err=mpi-%j.err
#SBATCH --time=10:00
 
srun ./trace.sh ./mpi_binary
 
# Make sure the intermediate files are synced
sync
sleep 5s
 
# Merge the trace in parallel (TRACE_NAME must be set to the desired output name)
srun mpimpi2prv -f TRACE.mpits -o ${TRACE_NAME}.prv
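For comparison, the serial and parallel merges differ only in the merger binary and in how it is launched. The sketch below builds the command line for either case; the function name merge_command is hypothetical, and it assumes that mpi2prv, the serial merger shipped with Extrae, accepts the same -f and -o options as mpimpi2prv.

```shell
#!/bin/bash
# Hypothetical helper: print the merge command for a given number of tasks.
# Usage: merge_command NTASKS OUTPUT_BASENAME
merge_command() {
    ntasks="$1"
    out="$2"
    if [ "${ntasks}" -gt 1 ]; then
        # Parallel merge: launch the MPI merger through srun.
        echo "srun -n ${ntasks} mpimpi2prv -f TRACE.mpits -o ${out}.prv"
    else
        # Serial merge: run the non-MPI merger directly.
        echo "mpi2prv -f TRACE.mpits -o ${out}.prv"
    fi
}
```

The printed line can be inspected first and then executed, for example with eval "$(merge_command 64 mpi_binary)" inside the job script.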