SYNOPSIS
likwid-mpirun [-hvdOm] [-n <number_of_processes>] [-hostfile <filename>] [-nperdomain <number_of_processes_in_domain>] [-pin <expression>] [-s <skip_mask>] [-omp <omptype>] [-mpi <mpitype>] [-g <eventset>] [--] <executable> [arguments]
DESCRIPTION
likwid-mpirun is a command line application that wraps the vendor-specific mpirun tool and adds calls to likwid-perfctr(1) to the execution string. The user-given application is run and measured, and the results are returned to the starting node.
OPTIONS
- -h
- prints a help message to standard output, then exits
- -v
- prints version information to standard output, then exits
- -d
- prints debug messages to standard output
- -n,-np,--n,--np <number_of_processes>
- specifies how many MPI processes should be started
- -hostfile <filename>
- specifies the nodes to schedule the MPI processes on. If not given, the environment variables PBS_NODEFILE, LOADL_HOSTFILE and SLURM_HOSTFILE are checked (a sample hostfile is shown after this list).
- -nperdomain <number_of_processes_in_domain>
- specifies the processes per affinity domain (see likwid-pin(1) for information about affinity domains)
- -pin <expression>
- specifies the pinning for hybrid execution (see likwid-pin(1) for information about affinity domains)
- -s, --skip <mask>
- specifies the skip mask as a hexadecimal number; for each set bit, the corresponding thread is skipped during pinning (see the sketch after this list)
- -omp <omptype>
- enables hybrid setup. Likwid tries to determine the OpenMP type automatically; the only possible values are intel and gnu
- -mpi <mpitype>
- specifies the MPI implementation that should be used by the wrapper. Possible values are intelmpi, openmpi and mvapich2
- -m
- activates the Marker API for the executed MPI processes
- -O
- prints output in CSV format instead of ASCII tables
- --
- stops parsing of arguments for likwid-mpirun; all options given after -- are passed on to the underlying MPI implementation
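A hostfile passed with -hostfile can presumably be written in the style of a PBS_NODEFILE, i.e. one hostname per line, with a hostname repeated once for every process slot on that node. A minimal sketch, assuming two hosts named node01 and node02 with two slots each:
node01
node01
node02
node02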
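Because every set bit in the skip mask skips the thread with the corresponding index, a mask of 0x1 skips the first thread of each process during pinning. A minimal sketch, assuming a hybrid run of a placeholder application myApp with two MPI processes:
likwid-mpirun -np 2 -pin S0:0-3_S1:0-3 -s 0x1 ./myApp
This can be useful when the OpenMP runtime spawns an additional management thread that should not be pinned.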
EXAMPLES
- For a standard application:
- likwid-mpirun -np 32 ./myApp
Will run 32 MPI processes; each host is filled with as many processes as given by ppn.
- With pinning:
- likwid-mpirun -np 32 -nperdomain S:2 ./myApp
Will start 32 MPI processes with 2 processes per socket.
- For hybrid runs:
- likwid-mpirun -np 32 -pin M0:0-3_M1:0-3 ./myApp
Will start 32 MPI processes with 2 processes per node. The OpenMP threads of the first process are pinned to cores 0-3 in NUMA domain 0 (M0), and the OpenMP threads of the second process are pinned to the first four cores in NUMA domain 1 (M1).
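- For performance measurement (a sketch; it assumes the MEM performance group exists on the target system and that myApp is instrumented with the Marker API):
- likwid-mpirun -np 32 -g MEM -m ./myApp
Will start 32 MPI processes and let likwid-perfctr(1) measure the MEM event set in the marked code regions of each process.
- For a hybrid run with explicitly selected toolchains (a sketch; gnu and intelmpi are just two of the documented values for -omp and -mpi):
- likwid-mpirun -np 4 -pin S0:0-3_S1:0-3 -omp gnu -mpi intelmpi ./myApp
Will start 4 MPI processes, two per node, pin their OpenMP threads to the first four cores of socket 0 and socket 1 respectively, and select the GNU OpenMP runtime and the Intel MPI implementation instead of relying on automatic detection.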
BUGS
When measuring Uncore events it is not possible to select a CPU pin expression that covers multiple sockets, e.g. S0:0-1_S0:2@S1:2. This runs two processes, each on two CPUs. But since the first CPU of the second expression is on socket 0, which is already handled by S0:0-1, the second MPI process gets an event set that does not contain Uncore counters, although the second part of the second expression would measure the Uncore counters on socket 1.
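As a workaround, the pinning can be chosen so that every per-process expression stays within a single socket; a sketch with a placeholder application myApp:
likwid-mpirun -np 2 -pin S0:0-1_S1:0-1 ./myApp
Here the second process is the only one running on socket 1, so it can be assigned that socket's Uncore counters.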