likwid-mpirun(1) A tool to start and monitor MPI applications with LIKWID


likwid-mpirun [-hvdOm] [-n number_of_processes] [-hostfile filename] [-nperdomain number_of_processes_in_domain] [-pin expression] [-s mask] [-omp omptype] [-mpi mpitype] [-g eventset] [--] executable [arguments]


likwid-mpirun is a command line application that wraps the vendor-specific mpirun tool and adds calls to likwid-perfctr(1) to the execution string. The user-given application is run and measured, and the results are returned to the starting node.


-h, --help
prints a help message to standard output, then exits
-v, --version
prints version information to standard output, then exits
-d, --debug
prints debug messages to standard output
-n,-np,--n,--np <number_of_processes>
specifies how many MPI processes should be started
-hostfile <filename>
specifies the nodes to schedule the MPI processes on. If not given, the environment variables PBS_NODEFILE, LOADL_HOSTFILE and SLURM_HOSTFILE are checked.
-nperdomain <number_of_processes_in_domain>
specifies the processes per affinity domain (see likwid-pin for info about affinity domains)
-pin <expression>
specifies the pinning for hybrid execution (see likwid-pin for info about affinity domains)
-s, --skip <mask>
Specify skip mask as HEX number. For each set bit the corresponding thread is skipped.
-omp <omptype>
enables hybrid setup. LIKWID tries to determine the OpenMP type automatically; this option overrides the detection. The only possible values are intel and gnu
-mpi <mpitype>
specifies the MPI implementation that should be used by the wrapper. Possible values are intelmpi, openmpi and mvapich2
-m, --marker
activates the Marker API for the executed MPI processes
-O
prints output as CSV instead of ASCII tables
--
stops parsing of arguments for likwid-mpirun; all arguments after -- are passed to the underlying MPI implementation
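As a sketch of the -- pass-through (the flag after -- is an OpenMPI option used purely as an illustration; the exact set of accepted MPI options depends on the chosen mpitype):

```shell
# Pass --report-bindings through to the underlying OpenMPI mpirun
likwid-mpirun -np 16 -mpi openmpi ./myApp -- --report-bindings
```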


For standard application:
likwid-mpirun -np 32 ./myApp

Will run 32 MPI processes; each host is filled with as many processes as specified by its ppn (processes per node) entry

With pinning:
likwid-mpirun -np 32 -nperdomain S:2 ./myApp

Will start 32 MPI processes with 2 processes per socket.

For hybrid runs:
likwid-mpirun -np 32 -pin M0:0-3_M1:0-3 ./myApp

Will start 32 MPI processes with 2 processes per node. The OpenMP threads of the first process on each node are pinned to cores 0-3 in NUMA domain 0 (M0). The OpenMP threads of the second process are pinned to cores 0-3 in NUMA domain 1 (M1)
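A hybrid run can also be combined with measurement; the following is a sketch in which the group name MEM_DP and the explicit -omp gnu are illustrative values, not requirements:

```shell
# Two processes per node (one per socket), four OpenMP threads each,
# GNU OpenMP runtime forced, measuring the (architecture-dependent) MEM_DP group
likwid-mpirun -np 4 -pin S0:0-3_S1:0-3 -omp gnu -g MEM_DP ./myApp
```

Available performance groups differ per architecture; likwid-perfctr -a lists them on the target machine.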


When measuring Uncore events, a single pin expression must not cover multiple sockets, e.g. S0:0-1_S0:2@S1:2. This starts two processes, each running on two CPUs. But since the first CPU of the second expression is on socket 0, which is already handled by S0:0-1, the second MPI process gets an event set that does not contain Uncore counters, although the second part of its expression (S1:2) would measure the Uncore counters on socket 1.
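To stay within this restriction, keep each sub-expression of the pin string on a single socket. A minimal sketch (the group name MEM is illustrative):

```shell
# Each sub-expression stays on one socket, so both processes
# can be assigned the Uncore counters of their own socket
likwid-mpirun -np 2 -pin S0:0-1_S1:0-1 -g MEM -m ./myApp
```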


Written by Thomas Roehl <[email protected]>.