Mont-Blanc Prototype

The Mont-Blanc prototype is located at the BSC facilities.

Cluster information

Setup

Use the sinfo command to check node availability.
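For example (the mb partition name is taken from the job scripts later on this page; adjust it if you use a different partition):

 sinfo -p mb      # node availability in the mb partition
 sinfo -N -l      # detailed per-node listing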

  • Login nodes: ARM-based machines that provide access to the cluster.
    • Ubuntu 14.04, Kernel 3.11.0
    • 15 login nodes
      * Users are assigned to one of them at random when connecting to the cluster
  • Blades: the cluster has 63 blades, each containing 15 SDB nodes.
    • The blades are interconnected using a 10GbE network.
  • Nodes: you can find more information about the SDB here.
    • Kernels 3.11.0 and 3.11.0-bsc_opencl+

Storage

Each node has a 15 GB root partition; the /home and /apps folders are mounted over the network (Lustre) from the file servers and are shared among all the nodes.
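To check the free space on the local root partition and on the shared Lustre mounts, a quick df suffices:

 df -h / /home /apps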

Power monitoring

Getting the energy-to-solution of your job

Retrieving the energy-to-solution is straightforward since SLURM keeps track of it. You can query the SLURM accounting database using the sacct command. An example output of sacct looks like this:

 user@mb-login-12:~$ sacct -o JobID,ConsumedEnergy
        JobID ConsumedEnergy
 ------------ --------------
 74427
 74427.batch               0
 74427.0               4.29M
 74438
 74438.batch               0
 74438.0                 160

You can see that job 74438 consumed 160 Joules. Please be aware that if an error occurs during the measurement, you will see 4.29M (as in job 74427). We know that this is not very useful, but it is the result of some odd measurement internals combined with a bug in SLURM.
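To query a single job directly, sacct accepts the -j option (the job ID below is just an example; use your own):

 sacct -j 74438 -o JobID,ConsumedEnergy,Elapsed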

Getting a full power trace

To retrieve a power trace for your application, you need to know the nodes on which your application was executed as well as the application's start and end times. One way to achieve this is to augment your job script as outlined below.

Example Job Script
 #!/bin/bash
 #
 # Example SLURM job script for power monitoring
 # based on a standard hybrid MPI+OpenMP job
 #
 #SBATCH --partition=mb
 #SBATCH --ntasks=2
 #SBATCH --cpus-per-task=2
 #SBATCH --out=omp-%j.out
 #SBATCH --err=omp-%j.err
 
 export OMP_NUM_THREADS=2
 
 # Print the node list
 echo "Running on nodes: `scontrol show hostname $SLURM_JOB_NODELIST`"
 
 # Print the start time
 echo -n "Execution starts at: "
 date +%s
 
 # Run my application
 srun my_application
 
 # Print the end time
 echo -n "Execution stops at: "
 date +%s
Retrieve the Power Trace

To retrieve the power trace as a comma-separated values (CSV) file, you need the dcdbquery tool. It becomes available with:

 module load power_monitor

The dcdbquery tool has the following syntax:

 dcdbquery [-r] [-l] [-h <hostname>] <Sensor 1> [<Sensor 2> ...] <Start> <End>
 
 where
 <hostname> - the name of the database server. Use mb.mont.blanc for this.
 <Sensor n> - the name of one or more sensors (see below)
 <Start>    - start of time series
 <End>      - end of time series

A sensor name consists of the node name and the type of sensor (PWR stands for power consumption). For example, if your job runs on node mb-237, the name of the associated power sensor is mb-237-PWR.

Start and End times can be supplied in two formats:

  • Human readable: supply them as 'yyyy-mm-dd hh:mm:ss' (with quotes), e.g. '2015-04-16 15:38:29'
  • Unix epoch: corresponding to the output of 'date +%s'

By default, times are interpreted to be in UTC! Use the '-l' option to dcdbquery to switch interpretation of <Start> and <End> as well as the generated output to your local timezone.
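If you need to convert between the two formats, GNU date (assumed to be available on the login nodes) handles both directions:

 date -d '2015-04-16 15:38:29' +%s             # human readable (local time) -> epoch
 date -u -d @1429178521 '+%Y-%m-%d %H:%M:%S'   # epoch -> human readable (UTC)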

If the -r option is specified, the generated output contains the raw internal timestamps (nanoseconds since UNIX Epoch) instead of the human readable ISO format.

Example:

 dcdbquery -h mb.mont.blanc -r mb-915-PWR 1429178521 1429178527
 Sensor,Time,Value
 mb-915-PWR,1429178521580000000,8458
 mb-915-PWR,1429178522600000000,9462
 mb-915-PWR,1429178523620000000,8994
 mb-915-PWR,1429178525650000000,7475
 mb-915-PWR,1429178526670000000,7472
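If you only need a quick summary rather than a plot, a short awk one-liner computes the average power over the trace (a sketch, assuming the raw-timestamp CSV above is saved as trace.csv, with power reported in mW in the third column):

 tail -n +2 trace.csv | awk -F, '{ sum += $3; n++ } END { if (n) printf "average: %.1f mW over %d samples\n", sum/n, n }'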
Creating nice plots

The CSV output of dcdbquery can be interpreted by many popular applications (Excel, OpenOffice Calc, etc.). The following script uses some bash scripting fun to generate a nice plot of your application's power trace based on the CSV output of dcdbquery (please run dcdbquery with the -r option!):

#!/bin/bash
 
if [ "$#" -lt "1" ]; then
	echo "Usage: plot.sh <filename> [outfile.png]"
	exit
fi
 
# Check that file exists
if [ ! -e $1 ]; then
	echo "File $1 does not exist."
	exit
fi
 
# Check that tmp doesn't exist
if [ -e tmp ]; then
	echo "tmp directory already exists. Aborting."
	exit
fi
 
# Get length of file
L=`cat $1 | wc -l`
L=$(($L-1))
 
# Check for the contained sensor names
S=`tail -n $L $1 | awk -F "," '{printf("%s\n", $1)}' | uniq`
echo "Found sensors:"
echo "$S"
 
# Create a file for each sensor
mkdir -p tmp
while read -r i; do
	echo "Creating tmp/$i".dat
	grep "$i" $1 | sort > "tmp/$i".dat
done <<< "$S"
 
# Find the smallest time stamp
U="2999999999999999999"
while read -r i; do
	T=`head -n1 tmp/${i}.dat | awk -F "," '{print $2}'`
	echo "tmp/${i}.dat starts at: $T"
	if [ "$T" -lt "$U" ]; then
		U="$T"
	fi
done <<< "$S"
echo "Plot starts at: $U"
 
# Write new data files with updated time stamps
while read -r i; do
	cat tmp/${i}.dat | awk -F "," '{printf("%s,%f,%s\n",$1,($2-'$U')/1000000000,$3)}' > tmp/${i}.dat2
done <<< "$S"
 
# Generate Plot File (general stuff)
cat > tmp/plot.gplt << EOF
# GNUPlot file for visualizing dcdbqueries
 
set title 'MontBlanc Power Consumption'
set xlabel "Time (sec)"
set ylabel "Power (mW)"
 
set border linewidth 1.5
 
set datafile separator ","
EOF
 
# Generate Plot File (output config)
if [ "$#" -eq "2" ]; then
	cat>> tmp/plot.gplt << EOF
 
set terminal png
set output '$2'
EOF
fi
 
# Generate Plot File (plot data)
cat>> tmp/plot.gplt << EOF
plot \\
EOF
 
while read -r i; do
	echo "'tmp/${i}.dat2' using 2:3 title '$i' with lines, \\">> tmp/plot.gplt
done <<< "$S"
cat>> tmp/plot.gplt << EOF
 
EOF
 
# Generate Plot File (wait if interactive)
if [ "$#" -eq "1" ]; then
	cat>> tmp/plot.gplt << EOF
pause -1
EOF
fi
 
# Show Plot
gnuplot tmp/plot.gplt
 
# Delete tmp dir
rm -r tmp
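Saved as plot.sh, the script can be used as follows (file names are examples; the CSV must have been produced with dcdbquery -r as noted above):

 chmod +x plot.sh
 ./plot.sh trace.csv              # interactive plot (press Enter to close)
 ./plot.sh trace.csv power.png    # write the plot to power.png instead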

Login information

Account

If you want to get access to the cluster, please send an e-mail to hca.sysadmin@bsc.es including the following information:

  • Full name
  • Institution
  • Intended use

Please do not share your account with other people. If several users from your institution require access to the ARM clusters, please ask for one account per user.

In exchange for access to the ARM clusters at BSC, we may ask you to share your performance results to help us improve the cluster configuration.

Login

You can access the prototype cluster using ssh:

  • Log into the Mont-Blanc cluster login node with your user:
    • ssh user@mb.bsc.es

At this point you are on one of the login nodes (an ARM-based machine). From here you can compile your applications (software is available under /apps) or run them through the job scheduler (SLURM).

Change Password

To change your password on the Mont-Blanc prototype, you must use the following link.

Software available

Location

  • All software is located under the /apps folder.

Software

  • GNU Bison
  • CLooG
  • FLEX
  • GMP Library
  • ISL
  • libunwind
  • GNU MPC
  • GNU MPFR
  • PAPI
  • GNU Compiler Suite
  • Environment modules
  • Boost
  • Runtime
    • MPICH
    • OpenMPI
    • OpenCL Full Profile
    • OmpSs (stable and development)
    • Open-MX
  • Scientific libraries
    • FFTW
    • HDF5
    • ATLAS
    • clBLAS
  • Development Tools
    • Extrae
    • Allinea DDT
    • Scalasca
    • LTTNG
  • Frameworks
    • GASNet
  • SLURM script example for an MPI application:
#!/bin/sh
#SBATCH --ntasks=$NTASKS
#SBATCH --cpus-per-task=$NCPU_TASK
#SBATCH --partition=$PARTITION
#SBATCH --job-name=$JOB_NAME
#SBATCH --error=err/$JOB_NAME-%j.err
#SBATCH --output=out/$JOB_NAME-%j.out
#SBATCH --workdir=/path/to/binaries
 
srun ./$PROG

NOTE: to run an OpenCL application you must add #SBATCH --gres=gpu to your job script.

This script must be launched using the command sbatch jobscript.sh from an mb-login-$N node. You can look at some job script examples here.
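As a concrete illustration, a filled-in version of the template above might look like the following (task counts, job name, working directory and binary name are placeholders to adapt; the err/ and out/ directories must already exist):

#!/bin/sh
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2
#SBATCH --partition=mb
#SBATCH --job-name=my_mpi_app
#SBATCH --error=err/my_mpi_app-%j.err
#SBATCH --output=out/my_mpi_app-%j.out
#SBATCH --workdir=/path/to/binaries

srun ./my_mpi_app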

Environment Modules

The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles.

Each modulefile contains the information needed to configure the shell for an application. Once the Modules package is initialized, the environment can be modified on a per-module basis using the module command which interprets modulefiles. Typically modulefiles instruct the module command to alter or set shell environment variables such as PATH, MANPATH, etc. modulefiles may be shared by many users on a system and users may have their own collection to supplement or replace the shared modulefiles.

Modules can be loaded and unloaded dynamically and atomically, in a clean fashion. Modules are useful for managing different versions of applications. Modules can also be bundled into metamodules that load an entire suite of different applications.

A complete list of available modules can be obtained by running the command module avail on one of the login nodes, which produces output like the following.

druiz@mb-login-1:~$ module avail
 
------------------------------------------------ /apps/modules/3.2.10/Modules/default/modulefiles/compilers -------------------------------------------------
gcc/4.8.2                                  mpich/3.1.4_omx                            openmpi/1.6.4_experimental_tracing_debug
gcc/4.9.0                                  ompss/14.10                                openmpi/1.6.4_experimental_tracing_gunwind
gcc/4.9.1                                  ompss/15.02                                openmpi/1.8.3
gcc/4.9.2                                  ompss/15.04(default)                       openmpi/1.8.3_omxTesting
gcc/5.1.0                                  ompss/git-gcc-4.8.4                        openmpi/1.8.7
gcc/5.2.0(default)                         openmpi/1.10.0                             openmpi/1.8.8(default)
mpich/3.1.3                                openmpi/1.6.4                              openmpi/1.8.8_omx
mpich/3.1.4(default)                       openmpi/1.6.4_experimental_tracing
 
-------------------------------------------------- /apps/modules/3.2.10/Modules/default/modulefiles/tools ---------------------------------------------------
allineaDDT/latest(default)            extrae/3.2.0                          papi/5.4.1(default)                   scalasca/2.2.1(default)
extrae/2.5.0                          lttng/2.6.2(default)                  perf/3.11.0-bsc_opencl+               scalasca/2.2.1_externCube
extrae/3.0.1                          papi/5.3.0                            perf/3.11.0-bsc_opencl_dvfs+(default) scons/2.3.6(default)
extrae/3.1.0(default)                 papi/5.4.0                            power_monitor/power_monitor(default)
 
------------------------------------------------ /apps/modules/3.2.10/Modules/default/modulefiles/libraries -------------------------------------------------
atlas/3.11.27                 clBLAS/latest(default)        fortranCL/0.1alpha4(default)  liburcu/0.8.7(default)        scalapack/2.0.2(default)
atlas/3.11.31_lapack(default) clFFT/latest(default)         hdf5/1.8.13(default)          opencl/1.1.0(default)         vtk/6.1.0(default)
boost/1.56.0                  fftw/2.1.5                    hdf5/1.8.13_parallel          opengl/2.4.0(default)
boost/1.58.0(default)         fftw/3.3.4(default)           lapack/3.5.0(default)         petsc/3.5.3(default)
 
------------------------------------------------ /apps/modules/3.2.10/Modules/default/modulefiles/frameworks ------------------------------------------------
GASNet/1.24.2(default)

Note that several versions are provided for some of the modules, such as gcc. The following examples show how to load, unload, or obtain information about the available modules. Several modules can be specified at once.

druiz@mb-login-1:~$ module help gcc/4.9.2
 
load gcc/4.9.2 (PATH, MANPATH, LD_LIBRARY_PATH)
 
druiz@mb-login-1:~$ module list
 
Currently Loaded Modulefiles:
  1) atlas/3.11.31_lapack   5) hdf5/1.8.13            9) extrae/3.0.1
  2) lapack/3.5.0           6) opencl/1.1.0          10) gcc/4.9.2
  3) boost/1.56.0           7) papi/5.4.0
  4) fftw/3.3.4             8) openmpi/1.8.3
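
Loading and unloading works in the same way (the versions below are taken from the listing above; adjust them to whatever module avail currently reports):

 module load gcc/4.9.2 openmpi/1.8.3    # load specific versions
 module unload openmpi                  # unload a module
 module switch gcc/4.9.2 gcc/5.2.0      # replace one loaded version with another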

Loading modules at login

If you want some modules to be loaded automatically when you log in to the Mont-Blanc cluster, you can add the following line to your .bashrc script.

module load gcc openmpi
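
If you prefer to pin specific versions rather than the defaults (the versions shown are examples from the listing above), list them explicitly:

 module load gcc/5.2.0 openmpi/1.8.8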