g_tune_pme(1) time mdrun as a function of PME nodes to optimize settings

SYNOPSIS

g_tune_pme -p perf.out -err errors.log -so tuned.tpr -s topol.tpr -o traj.trr -x traj.xtc -cpi state.cpt -cpo state.cpt -c confout.gro -e ener.edr -g md.log -dhdl dhdl.xvg -field field.xvg -table table.xvg -tablep tablep.xvg -tableb table.xvg -rerun rerun.xtc -tpi tpi.xvg -tpid tpidist.xvg -ei sam.edi -eo sam.edo -j wham.gct -jo bam.gct -ffout gct.xvg -devout deviatie.xvg -runav runaver.xvg -px pullx.xvg -pf pullf.xvg -mtx nm.mtx -dn dipole.ndx -bo bench.trr -bx bench.xtc -bcpo bench.cpt -bc bench.gro -be bench.edr -bg bench.log -beo bench.edo -bdhdl benchdhdl.xvg -bfield benchfld.xvg -btpi benchtpi.xvg -btpid benchtpid.xvg -bjo bench.gct -bffout benchgct.xvg -bdevout benchdev.xvg -brunav benchrnav.xvg -bpx benchpx.xvg -bpf benchpf.xvg -bmtx benchn.mtx -bdn bench.ndx -[no]h -[no]version -nice int -xvg enum -np int -npstring enum -nt int -r int -max real -min real -npme enum -fix int -upfac real -downfac real -ntpr int -four real -steps step -resetstep int -simsteps step -[no]launch -deffnm string -ddorder enum -[no]ddcheck -rdd real -rcon real -dlb enum -dds real -gcom int -[no]v -[no]compact -[no]seppot -pforce real -[no]reprod -cpt real -[no]cpnum -[no]append -maxh real -multi int -replex int -reseed int -[no]ionize

DESCRIPTION

For a given number -np or -nt of processors/threads, this program systematically times mdrun with various numbers of PME-only nodes and determines which setting is fastest. It will also test whether performance can be enhanced by shifting load from the reciprocal to the real space part of the Ewald sum. Simply pass your .tpr file to g_tune_pme together with other options for mdrun as needed.

Which executables are used can be set in the environment variables MPIRUN and MDRUN. If these are not present, 'mpirun' and 'mdrun' will be used as defaults. Note that for certain MPI frameworks you need to provide a machine- or hostfile. This can also be passed via the MPIRUN variable, e.g.

export MPIRUN="/usr/local/mpirun -machinefile hosts"
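As a minimal sketch, both variables can be set before calling g_tune_pme. The paths and the hostfile name below are placeholders chosen for illustration, not defaults of the program; only the fallback names 'mpirun' and 'mdrun' come from the text above.

```shell
# Hypothetical install locations and hostfile name; adjust to your system.
export MPIRUN="/usr/local/bin/mpirun -machinefile hosts"
export MDRUN="/usr/local/gromacs/bin/mdrun_mpi"

# Print what would be used, falling back to the documented defaults
# ('mpirun' and 'mdrun') when a variable is unset or empty.
echo "mpirun command: ${MPIRUN:-mpirun}"
echo "mdrun command:  ${MDRUN:-mdrun}"
```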

Please call g_tune_pme with the normal options you would pass to mdrun and add -np for the number of processors to perform the tests on, or -nt for the number of threads. You can also add -r to repeat each test several times to get better statistics.

g_tune_pme can test various real space / reciprocal space workloads for you. With -ntpr you control how many extra .tpr files will be written with enlarged cutoffs and correspondingly smaller Fourier grids. Typically, the first test (number 0) uses the settings from the input .tpr file; the last test (number ntpr) has its cutoffs multiplied by, and at the same time its Fourier grid dimensions divided by, the scaling factor -upfac (default 1.2). The remaining .tpr files have roughly equally spaced values between these extremes. Note that you can set -ntpr to 1 if you just want to find the optimal number of PME-only nodes; in that case your input .tpr file will remain unchanged.
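The scaling described above can be illustrated with a small calculation. This is only a sketch: it assumes exact linear spacing between 1.0 and the upper factor (the text says "about equally-spaced"), it assumes the file count includes the unmodified input, and the function name scaling_table, the 1.0 nm cutoff and the 0.12 nm Fourier spacing are hypothetical example values.

```shell
# scaling_table NTPR UPFAC RCOULOMB SPACING
# Prints one line per .tpr file: index, scaling factor,
# scaled cutoff (nm), scaled Fourier spacing (nm).
scaling_table() {
    awk -v n="$1" -v upfac="$2" -v rc="$3" -v sp="$4" 'BEGIN {
        for (i = 0; i < n; i++) {
            # Linear interpolation between 1.0 and upfac (assumed rule).
            f = (n > 1) ? 1.0 + i * (upfac - 1.0) / (n - 1) : 1.0
            # Dividing the grid dimensions by f is equivalent to
            # multiplying the grid spacing by f.
            printf "%d %.2f %.3f %.3f\n", i, f, rc * f, sp * f
        }
    }'
}

scaling_table 3 1.2 1.0 0.12
```

With these example inputs the first line corresponds to the unchanged input settings (factor 1.00) and the last line to the full upper factor (1.20, cutoff 1.200 nm, spacing 0.144 nm).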

For the benchmark runs, the default of 1000 time steps should suffice for most MD systems. The dynamic load balancing needs about 100 time steps to adapt to local load imbalances; therefore, the time step counters are by default reset after 100 steps. For large systems (>1M atoms) you may have to set -resetstep to a higher value. From the 'DD' load imbalance entries in the md.log output file you can tell after how many steps the load is sufficiently balanced. Example call:

g_tune_pme -np 64 -s protein.tpr -launch

After calling mdrun several times, detailed performance information is available in the output file perf.out. Note that during the benchmarks a couple of temporary files are written (options -b*); these are automatically deleted after each test.

If you want the simulation to be started automatically with the optimized parameters, use the command line option -launch.
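The difference between a benchmark-only call and one that also starts the production run can be sketched as follows. The helper tune_cmd is hypothetical and only prints the command line instead of executing it; the file name protein.tpr and the repeat count are example values.

```shell
# tune_cmd NP TPR REPEATS LAUNCH
# Compose (print, do not run) a g_tune_pme command line.
# LAUNCH: "yes" appends -launch, anything else omits it.
tune_cmd() {
    cmd="g_tune_pme -np $1 -s $2 -r $3"
    if [ "$4" = "yes" ]; then
        cmd="$cmd -launch"
    fi
    echo "$cmd"
}

tune_cmd 64 protein.tpr 3 no    # benchmark only
tune_cmd 64 protein.tpr 3 yes   # benchmark, then start the real run
```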

FILES

-p perf.out Output
 Generic output file 

-err errors.log Output
 Log file 

-so tuned.tpr Output
 Run input file: tpr tpb tpa 

-s topol.tpr Input
 Run input file: tpr tpb tpa 

-o traj.trr Output
 Full precision trajectory: trr trj cpt 

-x traj.xtc Output, Opt.
 Compressed trajectory (portable xdr format) 

-cpi state.cpt Input, Opt.
 Checkpoint file 

-cpo state.cpt Output, Opt.
 Checkpoint file 

-c confout.gro Output
 Structure file: gro g96 pdb etc. 

-e ener.edr Output
 Energy file 

-g md.log Output
 Log file 

-dhdl dhdl.xvg Output, Opt.
 xvgr/xmgr file 

-field field.xvg Output, Opt.
 xvgr/xmgr file 

-table table.xvg Input, Opt.
 xvgr/xmgr file 

-tablep tablep.xvg Input, Opt.
 xvgr/xmgr file 

-tableb table.xvg Input, Opt.
 xvgr/xmgr file 

-rerun rerun.xtc Input, Opt.
 Trajectory: xtc trr trj gro g96 pdb cpt 

-tpi tpi.xvg Output, Opt.
 xvgr/xmgr file 

-tpid tpidist.xvg Output, Opt.
 xvgr/xmgr file 

-ei sam.edi Input, Opt.
 ED sampling input 

-eo sam.edo Output, Opt.
 ED sampling output 

-j wham.gct Input, Opt.
 General coupling stuff 

-jo bam.gct Output, Opt.
 General coupling stuff 

-ffout gct.xvg Output, Opt.
 xvgr/xmgr file 

-devout deviatie.xvg Output, Opt.
 xvgr/xmgr file 

-runav runaver.xvg Output, Opt.
 xvgr/xmgr file 

-px pullx.xvg Output, Opt.
 xvgr/xmgr file 

-pf pullf.xvg Output, Opt.
 xvgr/xmgr file 

-mtx nm.mtx Output, Opt.
 Hessian matrix 

-dn dipole.ndx Output, Opt.
 Index file 

-bo bench.trr Output
 Full precision trajectory: trr trj cpt 

-bx bench.xtc Output
 Compressed trajectory (portable xdr format) 

-bcpo bench.cpt Output
 Checkpoint file 

-bc bench.gro Output
 Structure file: gro g96 pdb etc. 

-be bench.edr Output
 Energy file 

-bg bench.log Output
 Log file 

-beo bench.edo Output, Opt.
 ED sampling output 

-bdhdl benchdhdl.xvg Output, Opt.
 xvgr/xmgr file 

-bfield benchfld.xvg Output, Opt.
 xvgr/xmgr file 

-btpi benchtpi.xvg Output, Opt.
 xvgr/xmgr file 

-btpid benchtpid.xvg Output, Opt.
 xvgr/xmgr file 

-bjo bench.gct Output, Opt.
 General coupling stuff 

-bffout benchgct.xvg Output, Opt.
 xvgr/xmgr file 

-bdevout benchdev.xvg Output, Opt.
 xvgr/xmgr file 

-brunav benchrnav.xvg Output, Opt.
 xvgr/xmgr file 

-bpx benchpx.xvg Output, Opt.
 xvgr/xmgr file 

-bpf benchpf.xvg Output, Opt.
 xvgr/xmgr file 

-bmtx benchn.mtx Output, Opt.
 Hessian matrix 

-bdn bench.ndx Output, Opt.
 Index file 

OTHER OPTIONS

-[no]h no
 Print help info and quit

-[no]version no
 Print version info and quit

-nice int 0
 Set the nicelevel

-xvg enum xmgrace
 xvg plot formatting: xmgrace, xmgr or none

-np int 1
 Number of nodes to run the tests on (must be > 2 for separate PME nodes)

-npstring enum -np
 Specify the number of processors to $MPIRUN using this string: -np, -n or none

-nt int 1
 Number of threads to run the tests on (turns MPI & mpirun off)

-r int 2
 Repeat each test this often

-max real 0.5
 Max fraction of PME nodes to test with

-min real 0.25
 Min fraction of PME nodes to test with

-npme enum auto
 Benchmark all possible values for -npme or just the subset that is expected to perform well: auto, all or subset

-fix int -2
 If >= -1, do not vary the number of PME-only nodes; instead, use this fixed value and only vary rcoulomb and the PME grid spacing.

-upfac real 1.2
 Upper limit for rcoulomb scaling factor (Note that rcoulomb upscaling results in fourier grid downscaling)

-downfac real 1
 Lower limit for rcoulomb scaling factor

-ntpr int 0
 Number of .tpr files to benchmark. Create this many files with scaling factors ranging from 1.0 to -upfac. If < 1, automatically choose the number of .tpr files to test

-four real 0
 Use this fourierspacing value instead of the grid found in the .tpr input file. (Spacing applies to a scaling factor of 1.0 if multiple .tpr files are written)

-steps step 1000
 Take timings for this many steps in the benchmark runs

-resetstep int 100
 Let dlb equilibrate this many steps before timings are taken (reset cycle counters after this many steps)

-simsteps step -1
 If non-negative, perform this many steps in the real run (overwrites nsteps from .tpr, adds .cpt steps)

-[no]launch no
 Launch the real simulation after optimization

-deffnm string
 Set the default filename for all file options at launch time

-ddorder enum interleave
 DD node order: interleave, pp_pme or cartesian

-[no]ddcheck yes
 Check for all bonded interactions with DD

-rdd real 0
 The maximum distance for bonded interactions with DD (nm); 0 means determine it from the initial coordinates

-rcon real 0
 Maximum distance for P-LINCS (nm); 0 means estimate it

-dlb enum auto
 Dynamic load balancing (with DD): auto, no or yes

-dds real 0.8
 Minimum allowed dlb scaling of the DD cell size

-gcom int -1
 Global communication frequency

-[no]v no
 Be loud and noisy

-[no]compact yes
 Write a compact log file

-[no]seppot no
 Write separate V and dVdl terms for each interaction type and node to the log file(s)

-pforce real -1
 Print all forces larger than this (kJ/mol nm)

-[no]reprod no
 Try to avoid optimizations that affect binary reproducibility

-cpt real 15
 Checkpoint interval (minutes)

-[no]cpnum no
 Keep and number checkpoint files

-[no]append yes
 Append to previous output files when continuing from checkpoint instead of adding the simulation part number to all file names (for launch only)

-maxh real -1
 Terminate after 0.99 times this time (hours)

-multi int 0
 Do multiple simulations in parallel

-replex int 0
 Attempt replica exchange every # steps

-reseed int -1
 Seed for replica exchange; -1 means generate a seed

-[no]ionize no
 Do a simulation including the effect of an X-ray bombardment on your system
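
As a rough illustration of the -min and -max options listed above, the fractions can be turned into a range of PME-only node counts for a given -np. This is a sketch under stated assumptions: the helper pme_range is hypothetical, and truncating to whole nodes is an assumption; the page does not say how g_tune_pme rounds or which values inside the range it actually tests.

```shell
# pme_range NP MINFRAC MAXFRAC
# Prints the smallest and largest PME-only node counts implied by
# the given fractions of the total node count.
pme_range() {
    awk -v np="$1" -v lo="$2" -v hi="$3" 'BEGIN {
        # Truncation to whole nodes is an assumption of this sketch.
        printf "%d %d\n", int(np * lo), int(np * hi)
    }'
}

pme_range 64 0.25 0.5
```

For 64 nodes and the default fractions 0.25 and 0.5, this gives a range of 16 to 32 PME-only nodes.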