MPI User Parameters

General Information

Eager and Rendezvous

Eager and Rendezvous are the two protocols used for MPI transmissions; which one is used depends on the size of the message to be sent. Eager suits smaller messages, since it offers lower latency at the cost of lower bandwidth. Rendezvous, on the other hand, provides higher bandwidth but increases latency due to the synchronization it requires.

Eager protocol

This protocol is used when the sender can assume that the message will fit in the receive buffers of the receiver (i.e. small messages). This way, no synchronization is needed between the two processes.

Rendezvous protocol

When the sender must assume that the message will not fit in the receive buffers of the receiver (i.e. large messages), the rendezvous protocol is used. As one can imagine, a synchronization between the sender and the receiver is needed before the data is transferred.
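
The crossover between the two protocols can be observed empirically by timing a ping-pong exchange across message sizes. A minimal sketch, assuming a ping-pong benchmark binary is available (./pingpong below is a placeholder for any such benchmark, e.g. osu_latency from the OSU micro-benchmarks, which reports latency for a range of message sizes):

# Two processes exchange messages of increasing size: small messages should
# show the low-latency eager path, large messages the higher-bandwidth
# rendezvous path
mpirun -np 2 ./pingpong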

OpenMPI

Several parameters can be tuned for OpenMPI. For a complete list, execute the following on the Mont-Blanc prototype:

ompi_info --param all all --level 9
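
If the full list is too long to browse, the query can be narrowed down. For example (standard ompi_info usage; the components available may differ per installation):

# List only the parameters of the TCP BTL component
ompi_info --param btl tcp --level 9

# Or search the full list for a specific parameter
ompi_info --param all all --level 9 | grep btl_tcp_eager_limit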

Each one of these parameters can be modified in three different ways:

  • Adding the parameter to ~/.openmpi/mca-params.conf
  • Modifying the environment variable OMPI_MCA_param_name
  • Using the option --mca with mpirun

The following parameters may be worth studying, since they can improve the performance of your application:

  • btl_tcp_rndv_eager_limit
    • Size (in bytes, including header) of “phase 1” fragment sent for all large messages (must be >= 0 and <= eager_limit) (integer, size in bytes)
  • btl_tcp_eager_limit
    • Maximum size (in bytes, including header) of “short” messages (integer, size in bytes)
  • oob_base_enable_module_progress_threads
    • Whether to independently progress OOB messages for each interface (0: disabled, 1: enabled)
  • oob_tcp_listen_mode=listen_thread
    • Start up a separate thread dedicated to responding to connection requests

There is no universal rule for setting these parameters: a configuration that improves one application may degrade another. It is up to the user to know their application and, with that knowledge, test and decide which configuration suits it best.
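
Since the optimal values are application dependent, a small sweep over candidate values can help narrow them down. A minimal sketch (the tested values and ./mpi_binary are placeholders):

#!/bin/bash
# Hypothetical sweep over the eager limit; ./mpi_binary is a placeholder.
# btl_tcp_rndv_eager_limit must stay <= btl_tcp_eager_limit, so both are
# set to the same value here
for limit in 65536 131072 262144 524288; do
    echo "== eager limit: $limit bytes =="
    mpirun --mca btl_tcp_eager_limit $limit \
           --mca btl_tcp_rndv_eager_limit $limit \
           ./mpi_binary
done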

How to set these parameters

  • Adding the parameter to ~/.openmpi/mca-params.conf
#############################################
# MCA options at ~/.openmpi/mca-params.conf #
#############################################

# BTL params
btl_tcp_rndv_eager_limit=524288
btl_tcp_eager_limit=524288

# PML params (only for use with 1.8.8_omx)
#pml = cm

# OOB params
oob_tcp_listen_mode = listen_thread
oob_base_enable_module_progress_threads = false

# DEBUG params
#orte_report_launch_progress = true
#oob_base_verbose = 10
#plm_base_verbose = 5
#rml_base_verbose = 10
#pml_base_verbose = 256

# Other stuff
#coll_tuned_use_dynamic_rules = true

NOTE: the options set here are for educational purposes only.
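
To check that the values from the file are actually picked up, they can be queried back with ompi_info (a quick sanity check; the exact output format depends on the OpenMPI version):

ompi_info --param btl tcp --level 9 | grep eager_limit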

  • Modifying the environment variable OMPI_MCA_param_name
#!/bin/bash
 
#SBATCH --partition=mb
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --out=mpi-%j.out
#SBATCH --err=mpi-%j.err
#SBATCH --time=10:00
 
export OMPI_MCA_btl_tcp_rndv_eager_limit=524288
export OMPI_MCA_btl_tcp_eager_limit=524288
export OMPI_MCA_oob_tcp_listen_mode=listen_thread
 
srun ./mpi_binary

NOTE: the options set here are for educational purposes only.

  • Using the option --mca with mpirun
#!/bin/bash
 
#SBATCH --partition=mb
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --out=mpi-%j.out
#SBATCH --err=mpi-%j.err
#SBATCH --time=10:00
 
mpirun --mca btl_tcp_rndv_eager_limit 524288 \
       --mca btl_tcp_eager_limit 524288 \
       --mca oob_tcp_listen_mode listen_thread \
       ./mpi_binary

NOTE: the options set here are for educational purposes only.

MPICH

Like OpenMPI, MPICH allows the user to set different parameters that modify the behavior of the runtime. To see the complete list, execute the following on the Mont-Blanc prototype:

mpivars
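
Since mpivars prints plain text, standard tools can be used to filter its output for the variables of interest, for example:

mpivars | grep -i eager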

In this case though, the only way to set these parameters is via environment variables.

  • Modifying the environment variable MPIR_CVAR_PARAM_NAME

The following parameters may be worth studying, since they can improve the performance of your application:

  • MPIR_CVAR_ASYNC_PROGRESS
    • If set to true, MPICH will initiate an additional thread to make asynchronous progress on all communication operations including point-to-point, collective, one-sided operations and I/O.
  • MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE
    • This cvar controls the message size at which CH3 switches from eager to rendezvous mode (a sweep sketch is shown after this list).
  • MPIR_CVAR_ENABLE_SMP_${COLLECTIVE}
    • Enable the different SMP-aware collectives. The supported ones are the following:
      • REDUCE: Enable SMP aware reduce.
      • BCAST: Enable SMP aware broadcast (See also: MPIR_CVAR_MAX_SMP_BCAST_MSG_SIZE, which sets the maximum message size for which SMP-aware broadcast is used).
      • BARRIER: Enable SMP aware barrier.
      • COLLECTIVES: Enable SMP aware collective communication.
      • ALLREDUCE: Enable SMP aware allreduce.
  • MPIR_CVAR_ALLTOALL_THROTTLE
    • Maximum number of irecvs/isends posted at a time in some alltoall algorithms. Setting it to 0 causes all irecvs/isends to be posted at once.
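
As an illustration, MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE can be explored with a small sweep, as mentioned above. A minimal sketch (the threshold values and ./mpi_binary are placeholders; srun propagates the environment to the MPI processes):

#!/bin/bash
# Hypothetical sweep over the CH3 eager/rendezvous threshold;
# ./mpi_binary is a placeholder for your application or benchmark
for threshold in 65536 131072 262144 524288; do
    echo "== eager max msg size: $threshold bytes =="
    MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE=$threshold srun ./mpi_binary
done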

As with the OpenMPI parameters, there is no universal rule: a configuration that improves one application may degrade another. It is up to the user to know their application and, with that knowledge, test and decide which configuration suits it best.

How to set these parameters

  • Modifying the environment variable MPIR_CVAR_PARAM_NAME
#!/bin/bash
 
#SBATCH --partition=mb
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --out=mpi-%j.out
#SBATCH --err=mpi-%j.err
#SBATCH --time=10:00
 
export MPIR_CVAR_ASYNC_PROGRESS=0
export MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE=524288
export MPIR_CVAR_ENABLE_SMP_COLLECTIVES=1
 
srun ./mpi_binary

NOTE: the options set here are for educational purposes only.
