Understanding the GROMACS workflow

Edan Scriven

GROMACS is a suite of programs. Most of them handle one stage of preparing a system for simulation, and one of them performs the simulation itself. The power and flexibility of GROMACS come at the price of ease of use. This simple guide shows the steps necessary to run a GROMACS simulation, and gives a broad overview of what each of the component programs does.

A (currently non-functional) workflow of the GROMACS system is available for Kepler, and can be downloaded here. Kepler workflows are .xml files, and you may need to right-click and select Save As... to download the file. See the project page on exploring the potential of Kepler for managing workflows for more information.

All of the commands in the GROMACS suite have a built-in help system. Type the command name followed by -h (for example, pdb2gmx -h) to access this help.

The steps to creating a GROMACS simulation

GROMACS accepts Protein Data Bank (http://www.pdb.org) .pdb files, and can convert them to the GROMACS native format with the pdb2gmx command. pdb2gmx takes the .pdb file and outputs two files: a .gro file, which stores the x-y-z co-ordinates of the atoms, and a .top file, which stores the atomic masses, charges, and bonds.

Once these files have been created, it is necessary to specify the “box” in which the atoms will reside. This is done by running the editconf command. editconf takes the .gro file and the box dimensions you specify, and writes a .gro file with the box dimensions on its last line.
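To make the role of that last line concrete, here is a minimal Python sketch that reads the box dimensions back out of the text of a .gro file. It assumes a rectangular box (exactly three numbers on the final line, in nanometres). The function name read_box and the two-atom sample file are made up for illustration and are not part of GROMACS.

```python
def read_box(gro_text):
    """Return the box dimensions (x, y, z) in nm from the text of a .gro file.

    Assumes a rectangular box, i.e. exactly three numbers on the last line.
    """
    lines = [line for line in gro_text.splitlines() if line.strip()]
    fields = lines[-1].split()
    if len(fields) != 3:
        raise ValueError("expected three box lengths on the last line")
    return tuple(float(field) for field in fields)


# Hypothetical two-atom file: title, atom count, atom lines, then the box line.
SAMPLE_GRO = """Two-atom example
    2
    1ALA      N    1   1.000   1.200   1.500
    1ALA     CA    2   1.100   1.300   1.500
   3.00000   3.00000   3.00000
"""

print(read_box(SAMPLE_GRO))
```

For the sample above, the function returns the three box lengths that editconf would have written, here a 3 nm cube.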

Now that the box dimensions have been specified, it is possible to add more than one of the molecules found in the .pdb file, and to fill the rest of the box with solvent molecules. Both of these operations are handled by the genbox command. genbox outputs a new .gro file that contains all of the atoms added in this step, and a new .top file that includes the topology data for the solvent molecules.
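The three preparation steps so far can be sketched as a list of command lines. The Python below only builds and prints them; nothing is run. The flags are the usual ones for these tools, but check each against command -h for your version. Every file name (protein.pdb, conf.gro, boxed.gro, solvated.gro, topol.top), the 3 nm cubic box, and the spc216.gro water box are assumptions chosen for illustration.

```python
def preparation_commands(pdb="protein.pdb", box_nm=(3.0, 3.0, 3.0)):
    """Build (but do not run) the command lines for the preparation steps."""
    box = [str(length) for length in box_nm]
    return [
        # Convert the .pdb file into co-ordinates (.gro) and topology (.top).
        ["pdb2gmx", "-f", pdb, "-o", "conf.gro", "-p", "topol.top"],
        # Write the chosen box dimensions into a new .gro file.
        ["editconf", "-f", "conf.gro", "-o", "boxed.gro", "-box", *box],
        # Fill the remaining space with solvent and update the topology.
        ["genbox", "-cp", "boxed.gro", "-cs", "spc216.gro",
         "-o", "solvated.gro", "-p", "topol.top"],
    ]


for command in preparation_commands():
    print(" ".join(command))
```

Note that the output file of each step is the input of the next. pdb2gmx also asks interactively which force field to use, so the first command is not fully unattended as written.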

In some circumstances, it may be necessary to introduce ions into the solution, either to neutralise the total charge in the system, or to “screen out” the electrostatic interactions between the solute molecule and its periodic images, or both.
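As a small worked example of neutralising the total charge, the Python sketch below works out how many ions are needed. It assumes monovalent ions (such as Na+ and Cl-) and a net charge that is a whole number apart from rounding error; the function name counter_ions is hypothetical.

```python
def counter_ions(net_charge):
    """Return (n_positive, n_negative) monovalent ions that cancel net_charge."""
    total = round(net_charge)  # summed charges are often e.g. -2.9999
    if total > 0:
        return (0, total)      # positive system: add negative ions
    return (-total, 0)         # negative (or neutral) system: add positive ions


print(counter_ions(-3))
```

A solute with a net charge of -3 therefore needs three positive ions and no negative ones. Adding further ions in equal positive and negative numbers, to screen electrostatic interactions, leaves the system neutral.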

[Note on periodic images: GROMACS by default uses periodic boundary conditions, which simply means that if an atom leaves the box on one face, it wraps around to appear on the opposite face. The forces between atoms are also calculated “across the periodic boundary”, meaning that the box is stacked beside and on top of itself, and thus an atom may interact with its copy in the neighbouring box. These neighbouring boxes and the atoms they contain are called periodic images.]
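The wrap-around and the interaction across the boundary can both be sketched in a few lines of Python. This is an illustration of the idea for a rectangular box only, not the code GROMACS uses; the function names wrap and minimum_image_distance are made up here.

```python
def wrap(position, box):
    """Map a position back into the box [0, L) along each axis."""
    return tuple(x % length for x, length in zip(position, box))


def minimum_image_distance(a, b, box):
    """Distance between a and the nearest periodic image of b."""
    total = 0.0
    for xa, xb, length in zip(a, b, box):
        d = xa - xb
        d -= length * round(d / length)  # shift by a whole number of box lengths
        total += d * d
    return total ** 0.5


box = (3.0, 3.0, 3.0)  # box lengths in nm
print(wrap((3.2, -0.1, 1.5), box))
print(minimum_image_distance((0.1, 1.0, 1.0), (2.9, 1.0, 1.0), box))
```

In the second call the two atoms sit 2.8 nm apart inside the box, but only about 0.2 nm apart across the periodic boundary, and it is the shorter distance that matters for the forces.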

Ions are added to the system with the genion utility. However, genion does not work directly on .gro files. You must first translate the text-based .gro file into a binary .tpr file with the grompp utility (grompp is short for GROMACS pre-processor). genion then outputs another .gro file and another .top file, to reflect the new atoms introduced (the ions). Each ion is placed where a solvent molecule used to be, and genion removes that solvent molecule.
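The order of the two commands matters, so here is a minimal Python sketch that builds them as command lines without running anything. The file names (ions.mdp, solvated.gro, ions.tpr, ionised.gro, topol.top) are assumptions, and the -np and -nn flags (number of positive and negative ions) should be checked against genion -h for your version.

```python
def ion_commands(n_positive, n_negative):
    """Build (but do not run) the grompp and genion command lines."""
    return [
        # grompp first: genion reads the binary .tpr, not the .gro file.
        ["grompp", "-f", "ions.mdp", "-c", "solvated.gro",
         "-p", "topol.top", "-o", "ions.tpr"],
        # genion swaps solvent molecules for ions and writes a new .gro file.
        ["genion", "-s", "ions.tpr", "-o", "ionised.gro", "-p", "topol.top",
         "-np", str(n_positive), "-nn", str(n_negative)],
    ]


for command in ion_commands(3, 0):
    print(" ".join(command))
```

genion asks interactively which group of molecules the ions should replace; choose the solvent group.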

Once the ions are in place, it is time to start running simulations. There are three types of simulation, although the first two are optional. All three types are executed the same way: first grompp, then mdrun. mdrun is the main computational engine of GROMACS, and is the program actually responsible for moving the atoms around according to the laws of physics, as specified in the molecular dynamics parameter (.mdp) file. grompp takes the .gro, .top and .mdp files, and produces a .tpr file, which is the input to mdrun. mdrun outputs a large binary file (.trr) containing the state of the system at regular time intervals, and also outputs a .gro file which contains the state of the system at the last time step. (This .gro file can be used to continue the simulation later.)
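The grompp-then-mdrun pattern, and the way each stage's final .gro file feeds the next stage, can be sketched as follows. The Python only builds the command lines. The stage names (em, pr, md), the matching .mdp file names, and ionised.gro as the starting structure are assumptions for illustration.

```python
def stage_commands(stage, start_gro):
    """Return the grompp and mdrun command lines for one simulation stage."""
    tpr = f"{stage}.tpr"
    grompp = ["grompp", "-f", f"{stage}.mdp", "-c", start_gro,
              "-p", "topol.top", "-o", tpr]
    # -o names the trajectory (.trr); -c names the final-step .gro file.
    mdrun = ["mdrun", "-s", tpr, "-o", f"{stage}.trr", "-c", f"{stage}.gro"]
    return [grompp, mdrun]


def workflow(start_gro="ionised.gro", stages=("em", "pr", "md")):
    """Chain the stages so each starts from the previous stage's last frame."""
    commands = []
    current = start_gro
    for stage in stages:
        commands.extend(stage_commands(stage, current))
        current = f"{stage}.gro"
    return commands


for command in workflow():
    print(" ".join(command))
```

Dropping a stage name from the stages tuple skips that optional simulation; the remaining stages still chain together.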

The first simulation is the “energy minimisation” simulation. It nudges the atoms in the solute molecules only, until the bond lengths and angles are in their minimum potential energy configuration (ignoring the other atoms in the system). GROMACS minimises every atom unless the others are frozen in the .mdp file, so restricting this step to the solute has to be set up there. Doing this is a good way to check that the bonds are stable (the molecule will fly apart if something is wrong), and it speeds up the full simulation run later on, since the solute's positions are already optimised.
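To show what “nudging the atoms” means, here is a toy steepest-descent minimisation of a single harmonic bond between two atoms on a line. It is a sketch of the idea only: the bond length r0, the force constant k and the fixed step size are arbitrary assumptions, whereas GROMACS works with the whole force field and adapts its step size.

```python
def minimise_bond(x1, x2, r0=0.15, k=1000.0, step=1e-4, tol=1e-6,
                  max_steps=10000):
    """Steepest descent for two atoms on a line joined by a harmonic bond.

    Energy: V = 0.5 * k * (r - r0)**2, with r = x2 - x1.
    """
    for _ in range(max_steps):
        r = x2 - x1
        force_on_2 = -k * (r - r0)  # the force on atom 1 is equal and opposite
        if abs(force_on_2) < tol:
            break                   # forces are negligible: minimum reached
        x1 -= step * force_on_2     # move each atom a small way along its force
        x2 += step * force_on_2
    return x1, x2


x1, x2 = minimise_bond(0.0, 0.25)
print(x2 - x1)
```

Starting from a stretched bond of 0.25 nm, the separation settles very close to r0 = 0.15 nm. With too large a step the separation would overshoot and grow instead, which is the toy version of a molecule that flies apart.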

The second simulation is the “position restraints” simulation. This one holds the solute molecules in place, and allows the solvent and ions to “relax” into their minimum potential energy positions. This allows the solvent to fill the box more evenly (skipping this step can result in pockets of vacuum at the start of your full simulation). Again, the primary function of this step is to optimise the initial conditions for the full simulation run.
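In GROMACS the solute is held in place by harmonic restraints rather than being rigidly fixed: each restrained atom pays an energy penalty that grows with the square of its distance from a reference position. The Python sketch below shows that penalty for one atom. The function name is made up, and the force constant of 1000 kJ/mol/nm^2 is an assumed typical value.

```python
def restraint_energy(position, reference, k=1000.0):
    """Harmonic penalty (kJ/mol) for an atom displaced from its reference."""
    squared_distance = sum((x - r) ** 2 for x, r in zip(position, reference))
    return 0.5 * k * squared_distance


print(restraint_energy((0.5, 0.0, 0.0), (0.0, 0.0, 0.0)))
```

An atom at its reference position pays no penalty, so the solute keeps its shape while the unrestrained solvent and ions are free to move around it.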

The third simulation is the “full” simulation, and is the one in which the full molecular dynamics is calculated. This is the simulation which will take a lot of time (the optional simulations take hardly any time in comparison). When running GROMACS on a high-performance computer, this is the step which will be parallelised. For more information on running GROMACS in parallel, see the project page on parallel GROMACS.