Exploration of the capabilities of Kepler for workflow representation
Edan Scriven
Kepler (http://www.kepler-project.org) is a program for visual workflow design and implementation. A workflow for GROMACS was developed in Kepler, to explore the potential for the software to describe and manage the GROMACS data flow process.
Kepler is powerful: it can control the flow of data from various sources (e.g. files, databases, and files on remote filesystems) and directly manipulate that data internally. However, the most recent version (Alpha 8) seemed to have difficulty executing external commands. Unfortunately, this capability is essential to the implementation of the GROMACS workflow (shown below) in Kepler. If it were working, it would be possible to set up and run GROMACS from within Kepler, drastically reducing the complexity involved in designing and managing GROMACS simulations compared to the command-line interface. The workflow could be further extended to allow for the transfer of data and the execution of commands on remote machines (particularly high-performance machines).
It is also possible to extend the functionality of Kepler by creating custom “actors” to perform arbitrary tasks. This requires a sound knowledge of Java, as Kepler is implemented entirely in Java. The workflow itself is stored in an .xml file, which can be manipulated or visualised using any suitably configured XML renderer.
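Because the workflow is stored as XML, it can also be inspected with ordinary tools outside Kepler. The following is a minimal sketch, assuming the file follows the nested `entity` layout of Ptolemy II's MoML format, on which Kepler is built. The sample document and its class names are placeholders for illustration only, not the contents of the actual GROMACS workflow file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified workflow file. The element and attribute
# names (entity, property, name, class, value) follow the MoML convention;
# the class names are placeholders, not real Kepler classes.
SAMPLE_WORKFLOW = """\
<entity name="gromacs_workflow" class="example.CompositeActor">
  <entity name="input_pdb" class="example.StringConstant">
    <property name="value" value="extended_lac21.pdb"/>
  </entity>
  <entity name="pdb2gmx" class="example.CompositeActor">
    <entity name="run_command" class="example.ExternalExecution"/>
  </entity>
</entity>
"""


def list_actors(xml_text):
    """Return (path, class) pairs for every nested entity, in document order."""
    root = ET.fromstring(xml_text)
    found = []

    def walk(element, prefix):
        # Only direct children are visited here; recursion handles nesting,
        # so actors inside a hierarchy get a path that shows their parent.
        for child in element.findall("entity"):
            path = f"{prefix}/{child.get('name')}"
            found.append((path, child.get("class")))
            walk(child, path)

    walk(root, root.get("name"))
    return found


for path, cls in list_actors(SAMPLE_WORKFLOW):
    print(f"{path}  ({cls})")
```

A real workflow file would be read with `ET.parse(filename)` instead of the inline string; the traversal itself would stay the same under the stated assumption.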
The basic functional unit of the GROMACS workflow in Kepler consists of an actor that executes an external program and a string variable that passes the arguments to that command.

Figure 1: The process needed to run the external program pdb2gmx with arguments. The arguments start with the string “pdb2gmx parameters”, which are substituted with strings from the three input ports.
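Outside Kepler, the same pattern (substitute strings into an argument template, then hand the result to an external program) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the workflow's actual configuration: the placeholder names, the output file names, and the choice of `-f`, `-o` and `-p` as the three substituted arguments are hypothetical.

```python
import shlex
import subprocess
from string import Template

# Hypothetical template: the source only says that the argument string holds
# placeholders which are replaced by the strings from three input ports.
PDB2GMX_TEMPLATE = Template("pdb2gmx -f $input_pdb -o $output_gro -p $topology")


def build_command(template, **ports):
    """Substitute the port values into the template and split it into argv."""
    return shlex.split(template.substitute(**ports))


def run_command(argv):
    """Run the external program and return its combined console output."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr, as a terminal would show it
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


argv = build_command(
    PDB2GMX_TEMPLATE,
    input_pdb="extended_lac21.pdb",
    output_gro="lac21.gro",  # assumed output name
    topology="lac21.top",    # assumed output name
)
print(argv)
# run_command(argv) would execute pdb2gmx, provided GROMACS is installed.
```

Keeping the command construction separate from its execution means the substituted arguments can be checked without GROMACS being present.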
This entire functional unit can be collected into a single actor, called a hierarchy (the actor with a red border below). This turns the whole process into a “black box” module, consisting only of the inputs and outputs.

Figure 2: The pdb2gmx hierarchical actor, with inputs shown. The output is a text box that displays the usual console output of pdb2gmx.
Each of the hierarchical actors encapsulates the string manipulations necessary to set up the arguments to each of the GROMACS commands, and to run those commands. The parameters on the left are string literals, and can be edited to suit the specific application. (This is the workflow for the lac21 molecule.) These strings are passed in to the execution actors for substitution into the command line arguments. The flow of execution is from the top down. Each of the “output” boxes is a text window that displays what each command would normally display in a terminal window, and is not essential to the workflow. The very first input is “extended_lac21.pdb”, and the final output is “md.trr” (not shown).
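The top-down chain of hierarchical actors can be approximated as an ordered list of command templates that share one set of editable string parameters. The sketch below is an assumption-laden illustration: only `extended_lac21.pdb` and `md.trr` come from the workflow described here, while the intermediate file names and the particular three steps shown are hypothetical stand-ins for the full workflow. The `execute` argument plays the role of the execution actor, and the default used here only echoes each command rather than running it.

```python
import shlex
from string import Template

# Editable string parameters, like the literals on the left of the workflow.
# Only input_pdb and trajectory are taken from the text; the rest are assumed.
PARAMS = {
    "input_pdb": "extended_lac21.pdb",
    "structure": "lac21.gro",
    "topology": "lac21.top",
    "md_settings": "md.mdp",
    "run_input": "md.tpr",
    "trajectory": "md.trr",
}

# One entry per hierarchical actor, listed in top-to-bottom execution order.
# This is an assumed subset of steps, not the complete workflow.
STEPS = [
    ("pdb2gmx", Template("pdb2gmx -f $input_pdb -o $structure -p $topology")),
    ("grompp", Template("grompp -f $md_settings -c $structure -p $topology -o $run_input")),
    ("mdrun", Template("mdrun -s $run_input -o $trajectory")),
]


def run_workflow(steps, params, execute):
    """Build each command from the shared parameters and pass it to execute."""
    outputs = {}
    for name, template in steps:
        argv = shlex.split(template.substitute(params))
        # The returned text corresponds to the optional "output" boxes.
        outputs[name] = execute(argv)
    return outputs


def dry_run(argv):
    """Stand-in executor: report the command instead of running it."""
    return " ".join(argv)


for name, text in run_workflow(STEPS, PARAMS, dry_run).items():
    print(f"[{name}] {text}")
```

To run the commands for real, `dry_run` could be replaced by a function that calls `subprocess.run` and returns the captured output.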
[Note: The lac21 molecule is a fragment of the lac repressor protein. The .pdb file was obtained from Adam Fairley's Mathematics Honours research thesis, titled “Molecular Dynamics Simulations of the Self-Assembly of the Lac21 and Lac28 peptides for Bio-Nano Applications”, in partnership with Prof. Anton Middelberg of the Australian Institute for Bioengineering & Nanotechnology. An electronic copy of this thesis is available for download here.]
The .pdb file mentioned in the workflow can be downloaded here. The entire GROMACS workflow can be downloaded here. You may need to right-click and select Save As... to download the file.

Figure 3: The entire GROMACS workflow. The order in which each hierarchical actor executes is from top to bottom.
