\documentclass[10pt]{article}
\usepackage[margin=3cm]{geometry}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{eurosym}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{color}
\pagestyle{plain}
\usepackage{hyperref}

\begin{document}

\title{MPI User Parameters}
\author{ }
\date{2024-04-23}
\maketitle

\section{General Information}

\subsection{Eager and Rendezvous}

Eager and Rendezvous are the two protocols used for MPI transmissions; which one is chosen depends on the size of the message to be sent. Eager suits smaller messages, since it offers lower latency at the cost of lower bandwidth. On the other hand, Rendezvous provides higher bandwidth but increases latency because of the synchronization it requires.

\subsubsection{Eager protocol}

This protocol is used when the sender can assume that the message fits in the receive buffers of the receiver (i.e.\ small messages). This way, no synchronization is needed between the two processes.

\subsubsection{Rendezvous protocol}

When the sender can assume that the message will not fit in the receive buffers of the receiver (i.e.\ big messages), the Rendezvous protocol is used. As one can imagine, a synchronization between the sender and the receiver is needed.

\section{OpenMPI}

Several parameters can be tuned for OpenMPI. For a complete list, execute the following on the Mont-Blanc prototype:

\begin{verbatim}
ompi_info --param all all --level 9
\end{verbatim}

Each of these parameters can be modified in three different ways:

\begin{itemize}
\item Adding the parameter to \texttt{\string~/.openmpi/mca-params.conf}
\item Modifying the environment variable \texttt{OMPI\_MCA\_param\_name}
\item Using the option \texttt{--mca} with \texttt{mpirun}
\end{itemize}

The following parameters could be interesting to study, since tuning them can improve the performance of your application:

\begin{itemize}
\item \texttt{btl\_tcp\_rndv\_eager\_limit}
  \begin{itemize}
  \item Size (in bytes, including the header) of the ``phase 1'' fragment sent for all large messages (must be $\geq 0$ and $\leq$ \texttt{eager\_limit}) (integer, size in bytes)
  \end{itemize}
\item \texttt{btl\_tcp\_eager\_limit}
  \begin{itemize}
  \item Maximum size (in bytes, including the header) of ``short'' messages (integer, size in bytes)
  \end{itemize}
\item \texttt{oob\_base\_enable\_module\_progress\_threads}
  \begin{itemize}
  \item Whether to independently progress OOB messages for each interface (0: disabled, 1: enabled)
  \end{itemize}
\item \texttt{oob\_tcp\_listen\_mode=listen\_thread}
  \begin{itemize}
  \item Start a separate thread dedicated to responding to connection requests
  \end{itemize}
\end{itemize}

There is no universal rule for any of these parameters: one application may perform better with a given setting while another may lose performance with it. Users must know their application and, with this knowledge, test and decide which configuration is best for it.
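For instance, the current default values of the TCP BTL limits listed above can be inspected with \texttt{ompi\_info}. This is a sketch only; the exact output varies between OpenMPI versions:

\begin{verbatim}
# Query only the TCP BTL component and filter for the eager limits
ompi_info --param btl tcp --level 9 | grep eager
\end{verbatim}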
\subsection{How to set these parameters}

\begin{itemize}
\item Adding the parameter to \texttt{\string~/.openmpi/mca-params.conf}
\end{itemize}

\begin{verbatim}
#############################################
# MCA options at ~/.openmpi/mca-params.conf #
#############################################

# BTL params
btl_tcp_rndv_eager_limit=524288
btl_tcp_eager_limit=524288

# PML params (only for using with 1.8.8_omx)
#pml = cm

# OOB params
oob_tcp_listen_mode = listen_thread
oob_base_enable_module_progress_threads = false

# DEBUG params
#orte_report_launch_progress = true
#oob_base_verbose = 10
#plm_base_verbose = 5
#rml_base_verbose = 10
#pml_base_verbose = 256

# Other stuff
#coll_tuned_use_dynamic_rules = true
\end{verbatim}

{\bf NOTE:} the options set here are for educational purposes only.

\begin{itemize}
\item Modifying the environment variable \texttt{OMPI\_MCA\_param\_name}
\end{itemize}

\begin{verbatim}
#!/bin/bash
#SBATCH --partition=mb
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --out=mpi-%j.out
#SBATCH --err=mpi-%j.err
#SBATCH --time=10:00

export OMPI_MCA_btl_tcp_rndv_eager_limit=524288
export OMPI_MCA_btl_tcp_eager_limit=524288
export OMPI_MCA_oob_tcp_listen_mode=listen_thread

srun ./mpi_binary
\end{verbatim}

{\bf NOTE:} the options set here are for educational purposes only.

\begin{itemize}
\item Using the option \texttt{--mca} with \texttt{mpirun}
\end{itemize}

\begin{verbatim}
#!/bin/bash
#SBATCH --partition=mb
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --out=mpi-%j.out
#SBATCH --err=mpi-%j.err
#SBATCH --time=10:00

mpirun --mca btl_tcp_rndv_eager_limit 524288 \
       --mca btl_tcp_eager_limit 524288 \
       --mca oob_tcp_listen_mode listen_thread ./mpi_binary
\end{verbatim}

{\bf NOTE:} the options set here are for educational purposes only.

\section{MPICH}

As with OpenMPI, MPICH allows the user to set different parameters to modify the behavior of the runtime. To see a complete list of these, execute the following on the Mont-Blanc prototype:

\begin{verbatim}
mpivars
\end{verbatim}

In this case, though, the only way to set these parameters is via environment variables:

\begin{itemize}
\item Modifying the environment variable \texttt{MPIR\_CVAR\_PARAM\_NAME}
\end{itemize}

The following parameters could be interesting to study, since tuning them can improve the performance of your application:

\begin{itemize}
\item \texttt{MPIR\_CVAR\_ASYNC\_PROGRESS}
  \begin{itemize}
  \item If set to true, MPICH will initiate an additional thread to make asynchronous progress on all communication operations, including point-to-point, collective, one-sided operations and I/O.
  \end{itemize}
\item \texttt{MPIR\_CVAR\_CH3\_EAGER\_MAX\_MSG\_SIZE}
  \begin{itemize}
  \item This cvar controls the message size at which CH3 switches from eager to rendezvous mode.
  \end{itemize}
\item \texttt{MPIR\_CVAR\_ENABLE\_SMP\_\$\{COLLECTIVE\}}
  \begin{itemize}
  \item Enables a given SMP-aware collective. The supported ones are the following:
    \begin{itemize}
    \item \texttt{REDUCE}: enable SMP-aware reduce.
    \item \texttt{BCAST}: enable SMP-aware broadcast (see also \texttt{MPIR\_CVAR\_MAX\_SMP\_BCAST\_MSG\_SIZE}, which sets the maximum message size for which SMP-aware broadcast is used).
    \item \texttt{BARRIER}: enable SMP-aware barrier.
    \item \texttt{COLLECTIVES}: enable SMP-aware collective communication.
    \item \texttt{ALLREDUCE}: enable SMP-aware allreduce.
    \end{itemize}
  \end{itemize}
\item \texttt{MPIR\_CVAR\_ALLTOALL\_THROTTLE}
  \begin{itemize}
  \item Maximum number of irecvs/isends posted at a time in some alltoall algorithms. Setting it to 0 causes all irecvs/isends to be posted at once.
  \end{itemize}
\end{itemize}
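As a quick way to try one of these cvars, it can also be set for a single run by prefixing the launch command. A minimal sketch, assuming an interactive allocation and the same \texttt{mpi\_binary} used in the examples below:

\begin{verbatim}
# Set the eager/rendezvous threshold for this run only
MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE=131072 srun -n 4 ./mpi_binary
\end{verbatim}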
As with OpenMPI, there is no universal rule for any of these parameters: one application may perform better with a given setting while another may lose performance with it. Users must know their application and, with this knowledge, test and decide which configuration is best for it.

\subsection{How to set these parameters}

\begin{itemize}
\item Modifying the environment variable \texttt{MPIR\_CVAR\_PARAM\_NAME}
\end{itemize}

\begin{verbatim}
#!/bin/bash
#SBATCH --partition=mb
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --out=mpi-%j.out
#SBATCH --err=mpi-%j.err
#SBATCH --time=10:00

export MPIR_CVAR_ASYNC_PROGRESS=0
export MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE=524288
export MPIR_CVAR_ENABLE_SMP_COLLECTIVES=1

srun ./mpi_binary
\end{verbatim}

{\bf NOTE:} the options set here are for educational purposes only.
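Since the best values are application dependent, a simple approach is to sweep over several candidate values and compare runtimes. The following is a sketch only, assuming the same SLURM setup and the hypothetical \texttt{mpi\_binary} used above:

\begin{verbatim}
#!/bin/bash
#SBATCH --partition=mb
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --time=60:00

# Try several eager/rendezvous thresholds and time each run
for limit in 65536 131072 262144 524288; do
    export MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE=$limit
    echo "CH3 eager max msg size: $limit"
    time srun ./mpi_binary
done
\end{verbatim}

\end{document}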