likwid-mpirun(1) A tool to start and monitor MPI applications with LIKWID

SYNOPSIS

likwid-mpirun [-hvdOm] [-n number_of_processes] [-hostfile filename] [-nperdomain number_of_processes_in_domain] [-pin expression] [-omp omptype] [-mpi mpitype] [-g eventset] [--]

DESCRIPTION

likwid-mpirun is a command line application that wraps the vendor-specific mpirun tool and adds calls to likwid-perfctr(1) to the execution string. The user-given application is run and measured, and the results are returned to the starting node.

OPTIONS

-h,--help
prints a help message to standard output, then exits
-v,--version
prints version information to standard output, then exits
-d,--debug
prints debug messages to standard output
-n,-np,--n,--np <number_of_processes>
specifies how many MPI processes should be started
-hostfile <filename>
specifies the nodes to schedule the MPI processes on. If not given, the environment variables PBS_NODEFILE, LOADL_HOSTFILE and SLURM_HOSTFILE are checked.
-nperdomain <number_of_processes_in_domain>
specifies the processes per affinity domain (see likwid-pin for info about affinity domains)
-pin <expression>
specifies the pinning for hybrid execution (see likwid-pin for info about affinity domains)
-s, --skip <mask>
specifies the skip mask as a hexadecimal number. For each set bit, the corresponding thread is skipped.
-omp <omptype>
enables hybrid setup. LIKWID tries to determine the OpenMP type automatically. The only possible values are intel and gnu.
-mpi <mpitype>
specifies the MPI implementation that should be used by the wrapper. Possible values are intelmpi, openmpi and mvapich2
-m,--marker
activates the Marker API for the executed MPI processes
-O
prints output in CSV format instead of ASCII tables
--
stops parsing arguments for likwid-mpirun, so that options for the underlying MPI implementation can be given after --.
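
To illustrate how the hexadecimal skip mask of the -s option can be read, the following sketch lists the thread indices a given mask would skip. It is not part of LIKWID: the helper name skipped_threads is hypothetical, and the mapping of the least significant bit to thread 0 is an assumption made for this example.

```shell
# Illustrative helper, not part of LIKWID.
# Assumption: bit 0 (least significant) corresponds to thread 0.
# Usage: skipped_threads <hex_mask> <number_of_threads>
skipped_threads() {
    mask=$(( $1 ))          # arithmetic expansion accepts hex such as 0x5
    nthreads=$2
    i=0
    out=""
    while [ "$i" -lt "$nthreads" ]; do
        # Test whether bit i is set in the mask
        if [ $(( ($mask >> $i) & 1 )) -eq 1 ]; then
            out="$out${out:+ }$i"
        fi
        i=$(( $i + 1 ))
    done
    printf '%s\n' "$out"
}

skipped_threads 0x5 4       # prints: 0 2
```

Under this assumption the mask 0x5 (binary 0101) skips threads 0 and 2, while threads 1 and 3 are not skipped.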

EXAMPLE

1.
For standard application:
likwid-mpirun -np 32 ./myApp

Will run 32 MPI processes; each host is filled with as many processes as specified by ppn.

2.
With pinning:
likwid-mpirun -np 32 -nperdomain S:2 ./myApp

Will start 32 MPI processes with 2 processes per socket.

3.
For hybrid runs:
likwid-mpirun -np 32 -pin M0:0-3_M1:0-3 ./myApp

Will start 32 MPI processes with 2 processes per node. The OpenMP threads of the first process are pinned to cores 0-3 in NUMA domain 0 (M0). The OpenMP threads of the second process are pinned to the first four cores in NUMA domain 1 (M1).
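
The hybrid example above implies that the pin expression is split at each underscore into one sub-expression per MPI process on a node. A minimal sketch of that split follows; the helper name describe_pin is hypothetical and the code only mirrors the behaviour described in the example, not the parser used by likwid-mpirun.

```shell
# Illustrative helper, not part of LIKWID.
# Splits a -pin expression at '_' and prints one line per
# node-local MPI process with the sub-expression assigned to it.
describe_pin() {
    idx=0
    printf '%s\n' "$1" | tr '_' '\n' | while IFS= read -r expr; do
        printf 'local process %d -> %s\n' "$idx" "$expr"
        idx=$(( $idx + 1 ))
    done
}

describe_pin 'M0:0-3_M1:0-3'
# prints:
# local process 0 -> M0:0-3
# local process 1 -> M1:0-3
```

With two sub-expressions per node, the 32 processes of the example are spread over 16 nodes.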

BUGS

When measuring Uncore events it is not possible to select a CPU pin expression that covers multiple sockets, e.g. S0:0-1_S0:2@S1:2. This runs two processes, each running on two CPUs. But since the first CPU of the second expression is on socket 0, which is already handled by S0:0-1, the second MPI process gets an event set that does not contain Uncore counters, although the second part of the second expression would measure the Uncore counters on socket 1.
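
The limitation can be modelled with a short sketch that looks only at the leading socket of each sub-expression, as described above. This is a simplified model for illustration, not the code used by likwid-mpirun; the helper name uncore_owners is hypothetical.

```shell
# Simplified model of the behaviour described above; not LIKWID's code.
# Only the leading domain of each '_'-separated sub-expression is
# inspected. The first process that starts on a socket is assumed to
# measure that socket's Uncore counters; later ones get none.
uncore_owners() {
    seen=""
    printf '%s\n' "$1" | tr '_' '\n' | while IFS= read -r expr; do
        first=${expr%%:*}       # leading domain, e.g. S0
        case " $seen " in
            *" $first "*)
                printf '%s: no Uncore counters\n' "$expr"
                ;;
            *)
                printf '%s: Uncore counters on %s\n' "$expr" "$first"
                seen="$seen $first"
                ;;
        esac
    done
}

uncore_owners 'S0:0-1_S0:2@S1:2'
# prints:
# S0:0-1: Uncore counters on S0
# S0:2@S1:2: no Uncore counters
```

In this model socket 1 is never measured, because no sub-expression starts on it. An expression such as S0:0-1_S1:0-1, in which every process stays on its own socket, does not show the problem.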

AUTHOR

Written by Thomas Roehl <[email protected]>.