SYNOPSIS
g_tune_pme -p perf.out -err errors.log -so tuned.tpr -s topol.tpr -o traj.trr -x traj.xtc -cpi state.cpt -cpo state.cpt -c confout.gro -e ener.edr -g md.log -dhdl dhdl.xvg -field field.xvg -table table.xvg -tablep tablep.xvg -tableb table.xvg -rerun rerun.xtc -tpi tpi.xvg -tpid tpidist.xvg -ei sam.edi -eo sam.edo -j wham.gct -jo bam.gct -ffout gct.xvg -devout deviatie.xvg -runav runaver.xvg -px pullx.xvg -pf pullf.xvg -mtx nm.mtx -dn dipole.ndx -bo bench.trr -bx bench.xtc -bcpo bench.cpt -bc bench.gro -be bench.edr -bg bench.log -beo bench.edo -bdhdl benchdhdl.xvg -bfield benchfld.xvg -btpi benchtpi.xvg -btpid benchtpid.xvg -bjo bench.gct -bffout benchgct.xvg -bdevout benchdev.xvg -brunav benchrnav.xvg -bpx benchpx.xvg -bpf benchpf.xvg -bmtx benchn.mtx -bdn bench.ndx -[no]h -[no]version -nice int -xvg enum -np int -npstring enum -nt int -r int -max real -min real -npme enum -fix int -upfac real -downfac real -ntpr int -four real -steps step -resetstep int -simsteps step -[no]launch -deffnm string -ddorder enum -[no]ddcheck -rdd real -rcon real -dlb enum -dds real -gcom int -[no]v -[no]compact -[no]seppot -pforce real -[no]reprod -cpt real -[no]cpnum -[no]append -maxh real -multi int -replex int -reseed int -[no]ionize
DESCRIPTION
For a given number -np or -nt of processors/threads, this program systematically times mdrun with various numbers of PME-only nodes and determines which setting is fastest. It will also test whether performance can be enhanced by shifting load from the reciprocal to the real space part of the Ewald sum. Simply pass your .tpr file to g_tune_pme together with other options for mdrun as needed.
Which executables are used can be set in the environment variables MPIRUN and MDRUN. If these are not present, 'mpirun' and 'mdrun' will be used as defaults. Note that for certain MPI frameworks you need to provide a machine- or hostfile. This can also be passed via the MPIRUN variable, e.g.
export MPIRUN="/usr/local/mpirun -machinefile hosts"
Please call g_tune_pme with the normal options you would pass to mdrun and add -np for the number of processors to perform the tests on, or -nt for the number of threads. You can also add -r to repeat each test several times to get better statistics.
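A minimal sketch of such a call, assuming a hypothetical run input file protein.tpr, 16 processors and 3 repeats per test. The helper name tune_cmd is illustrative and not part of GROMACS; it only prints the command line so that it can be checked before it is run.

```shell
#!/bin/sh
# tune_cmd TPR NPROCS REPEATS
# Print a g_tune_pme command line built from the documented -s, -np and -r
# options. The function name and argument order are illustrative only.
tune_cmd() {
    printf 'g_tune_pme -np %s -r %s -s %s\n' "$2" "$3" "$1"
}

# protein.tpr is a placeholder file name.
tune_cmd protein.tpr 16 3
# prints: g_tune_pme -np 16 -r 3 -s protein.tpr
```

For a thread-based run, -np would be replaced by -nt in the printed command.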
g_tune_pme can test various real space / reciprocal space workloads for you. With -ntpr you control how many extra .tpr files will be written with enlarged cutoffs and correspondingly smaller fourier grids. Typically, the first test (number 0) will be with the settings from the input .tpr file; the last test (number ntpr) will have cutoffs multiplied by (and at the same time fourier grid dimensions divided by) the scaling factor -upfac (default 1.2). The remaining .tpr files will have about equally-spaced values in between these extremes. Note that you can set -ntpr to 1 if you just want to find the optimal number of PME-only nodes; in that case your input .tpr file will remain unchanged.
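To illustrate the spacing of the scaling factors, the sketch below lists factors between a lower and an upper limit. It assumes exactly linear spacing, whereas the text above only promises "about equally-spaced" values, so the numbers g_tune_pme actually uses may differ. The function name scaling_factors is hypothetical.

```shell
#!/bin/sh
# scaling_factors N LOW HIGH
# Print N linearly spaced scaling factors from LOW to HIGH, one per line.
# Linear spacing is an assumption made for illustration.
scaling_factors() {
    awk -v n="$1" -v lo="$2" -v hi="$3" 'BEGIN {
        # A single file keeps the lower limit, i.e. the unscaled input.
        if (n < 2) { printf "%.3f\n", lo; exit }
        for (i = 0; i < n; i++)
            printf "%.3f\n", lo + i * (hi - lo) / (n - 1)
    }'
}

# Three files between 1.0 and the default upper factor of 1.2.
scaling_factors 3 1.0 1.2
```

Each factor multiplies the cutoffs and divides the fourier grid dimensions, so a factor of 1.1 would turn a 1.0 nm cutoff into 1.1 nm.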
For the benchmark runs, the default of 1000 time steps should suffice for most MD systems. The dynamic load balancing needs about 100 time steps to adapt to local load imbalances, therefore the time step counters are by default reset after 100 steps. For large systems (>1M atoms) you may have to set -resetstep to a higher value. From the 'DD' load imbalance entries in the md.log output file you can tell after how many steps the load is sufficiently balanced. Example call:
g_tune_pme -np 64 -s protein.tpr -launch
After calling mdrun several times, detailed performance information is available in the output file perf.out. Note that during the benchmarks, a couple of temporary files are written (options -b*); these are deleted automatically after each test.
If you want the simulation to be started automatically with the optimized parameters, use the command line option -launch.
FILES
-p perf.out
Output
Generic output file
-err errors.log
Output
Log file
-so tuned.tpr
Output
Run input file: tpr tpb tpa
-s topol.tpr
Input
Run input file: tpr tpb tpa
-o traj.trr
Output
Full precision trajectory: trr trj cpt
-x traj.xtc
Output, Opt.
Compressed trajectory (portable xdr format)
-cpi state.cpt
Input, Opt.
Checkpoint file
-cpo state.cpt
Output, Opt.
Checkpoint file
-c confout.gro
Output
Structure file: gro g96 pdb etc.
-e ener.edr
Output
Energy file
-g md.log
Output
Log file
-dhdl dhdl.xvg
Output, Opt.
xvgr/xmgr file
-field field.xvg
Output, Opt.
xvgr/xmgr file
-table table.xvg
Input, Opt.
xvgr/xmgr file
-tablep tablep.xvg
Input, Opt.
xvgr/xmgr file
-tableb table.xvg
Input, Opt.
xvgr/xmgr file
-rerun rerun.xtc
Input, Opt.
Trajectory: xtc trr trj gro g96 pdb cpt
-tpi tpi.xvg
Output, Opt.
xvgr/xmgr file
-tpid tpidist.xvg
Output, Opt.
xvgr/xmgr file
-ei sam.edi
Input, Opt.
ED sampling input
-eo sam.edo
Output, Opt.
ED sampling output
-j wham.gct
Input, Opt.
General coupling stuff
-jo bam.gct
Output, Opt.
General coupling stuff
-ffout gct.xvg
Output, Opt.
xvgr/xmgr file
-devout deviatie.xvg
Output, Opt.
xvgr/xmgr file
-runav runaver.xvg
Output, Opt.
xvgr/xmgr file
-px pullx.xvg
Output, Opt.
xvgr/xmgr file
-pf pullf.xvg
Output, Opt.
xvgr/xmgr file
-mtx nm.mtx
Output, Opt.
Hessian matrix
-dn dipole.ndx
Output, Opt.
Index file
-bo bench.trr
Output
Full precision trajectory: trr trj cpt
-bx bench.xtc
Output
Compressed trajectory (portable xdr format)
-bcpo bench.cpt
Output
Checkpoint file
-bc bench.gro
Output
Structure file: gro g96 pdb etc.
-be bench.edr
Output
Energy file
-bg bench.log
Output
Log file
-beo bench.edo
Output, Opt.
ED sampling output
-bdhdl benchdhdl.xvg
Output, Opt.
xvgr/xmgr file
-bfield benchfld.xvg
Output, Opt.
xvgr/xmgr file
-btpi benchtpi.xvg
Output, Opt.
xvgr/xmgr file
-btpid benchtpid.xvg
Output, Opt.
xvgr/xmgr file
-bjo bench.gct
Output, Opt.
General coupling stuff
-bffout benchgct.xvg
Output, Opt.
xvgr/xmgr file
-bdevout benchdev.xvg
Output, Opt.
xvgr/xmgr file
-brunav benchrnav.xvg
Output, Opt.
xvgr/xmgr file
-bpx benchpx.xvg
Output, Opt.
xvgr/xmgr file
-bpf benchpf.xvg
Output, Opt.
xvgr/xmgr file
-bmtx benchn.mtx
Output, Opt.
Hessian matrix
-bdn bench.ndx
Output, Opt.
Index file
OTHER OPTIONS
-[no]h no
Print help info and quit
-[no]version no
Print version info and quit
-nice int 0
Set the nicelevel
-xvg enum xmgrace
xvg plot formatting: xmgrace, xmgr or none
-np int 1
Number of nodes to run the tests on (must be > 2 for separate PME nodes)
-npstring enum -np
Specify the number of processors to $MPIRUN using this string: -np, -n or none
-nt int 1
Number of threads to run the tests on (turns MPI & mpirun off)
-r int 2
Repeat each test this often
-max real 0.5
Max fraction of PME nodes to test with
-min real 0.25
Min fraction of PME nodes to test with
-npme enum auto
Benchmark all possible values for -npme or just the subset that is expected to perform well: auto, all or subset
-fix int -2
If >= -1, do not vary the number of PME-only nodes; instead use this fixed value and only vary rcoulomb and the PME grid spacing.
-upfac real 1.2
Upper limit for rcoulomb scaling factor (Note that rcoulomb upscaling results in fourier grid downscaling)
-downfac real 1
Lower limit for rcoulomb scaling factor
-ntpr int 0
Number of .tpr files to benchmark. Create this many files with scaling factors ranging from 1.0 to -upfac. If < 1, automatically choose the number of .tpr files to test
-four real 0
Use this fourierspacing value instead of the grid found in the .tpr input file. (Spacing applies to a scaling factor of 1.0 if multiple .tpr files are written)
-steps step 1000
Take timings for this many steps in the benchmark runs
-resetstep int 100
Let dlb equilibrate this many steps before timings are taken (reset cycle counters after this many steps)
-simsteps step -1
If non-negative, perform this many steps in the real run (overwrites nsteps from .tpr, add .cpt steps)
-[no]launch no
Launch the real simulation after optimization
-deffnm string
Set the default filename for all file options at launch time
-ddorder enum interleave
DD node order: interleave, pp_pme or cartesian
-[no]ddcheck yes
Check for all bonded interactions with DD
-rdd real 0
The maximum distance for bonded interactions with DD (nm); 0 means determine from the initial coordinates
-rcon real 0
Maximum distance for P-LINCS (nm); 0 means estimate
-dlb enum auto
Dynamic load balancing (with DD): auto, no or yes
-dds real 0.8
Minimum allowed dlb scaling of the DD cell size
-gcom int -1
Global communication frequency
-[no]v no
Be loud and noisy
-[no]compact yes
Write a compact log file
-[no]seppot no
Write separate V and dVdl terms for each interaction type and node to the log file(s)
-pforce real -1
Print all forces larger than this (kJ/mol nm)
-[no]reprod no
Try to avoid optimizations that affect binary reproducibility
-cpt real 15
Checkpoint interval (minutes)
-[no]cpnum no
Keep and number checkpoint files
-[no]append yes
Append to previous output files when continuing from checkpoint instead of adding the simulation part number to all file names (for launch only)
-maxh real -1
Terminate after 0.99 times this time (hours)
-multi int 0
Do multiple simulations in parallel
-replex int 0
Attempt replica exchange every # steps
-reseed int -1
Seed for replica exchange; -1 means generate a seed
-[no]ionize no
Do a simulation including the effect of an X-ray bombardment on your system
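As a worked example of the -min and -max options listed above, the sketch below converts the two fractions into a range of PME-only node counts for a given -np. Truncating towards zero is an assumption made here; the manual does not state how g_tune_pme rounds, and the function name pme_range is hypothetical.

```shell
#!/bin/sh
# pme_range NP MINFRAC MAXFRAC
# Print the smallest and largest PME-only node counts implied by the
# -min and -max fractions. Truncation is an illustrative assumption.
pme_range() {
    awk -v np="$1" -v lo="$2" -v hi="$3" 'BEGIN {
        printf "%d %d\n", int(np * lo), int(np * hi)
    }'
}

# 64 nodes with the default fractions -min 0.25 and -max 0.5.
pme_range 64 0.25 0.5
# prints: 16 32
```

With these defaults, a 64-node run would be benchmarked with somewhere between 16 and 32 PME-only nodes.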