Exploration of the capabilities of gridMathematica on the Altix ia64 HPC machines
Mathematica (http://www.wolfram.com) is a powerful and flexible problem-solving environment targeted to the physical sciences. This document explores the capabilities of an add-on called gridMathematica (aka the Parallel Computing Toolkit).
Mathematica separates its functional components into two modules: a front end that manages the user interface, and a kernel that performs all the computations in a notebook. The kernel may run on a different machine from the front end (a remote kernel), and this functionality has been extended to allow one notebook to be executed by several kernels (usually one per processor). This functionality is known as gridMathematica, and allows certain Mathematica computations to be performed in parallel. In our experience, this parallelisation applies only to list operations (e.g. linear algebra).
In other words, there are significant limitations on exactly which operations can be performed in parallel. Functions such as Solve[ ] and FindRoot[ ], along with most other built-in Mathematica functions, are not themselves parallelised: sending a Solve[ ] command to a Mathematica session with two kernels won't make Solve[ ] run twice as fast. You can, however, run Solve[ ] on each kernel, perhaps with different arguments, and have the calls run simultaneously. Operations on lists, vectors, matrices, etc. can be parallelised by a process known as domain decomposition, in which the data is split into pieces and each piece is handed to a different kernel.
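To make the idea of domain decomposition concrete, here is a minimal sketch in Python rather than Mathematica. It is not the gridMathematica implementation: the function names (split_rows, multiply_block, decomposed_matmul) are hypothetical, and Python threads merely stand in for slave kernels to show the structure of the computation, not its speed. The left-hand matrix is split into row blocks, each block is multiplied by the full right-hand matrix independently, and the partial results are concatenated.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(matrix, n_blocks):
    """Split a list-of-rows matrix into at most n_blocks contiguous row blocks."""
    n_rows = len(matrix)
    base, extra = divmod(n_rows, n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        # The first `extra` blocks take one additional row each.
        size = base + (1 if i < extra else 0)
        if size:
            blocks.append(matrix[start:start + size])
        start += size
    return blocks


def multiply_block(block, b):
    """Multiply one row block by the full matrix b (the work one kernel would do)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in block]


def decomposed_matmul(a, b, n_workers=2):
    """Compute the product a.b by handing each row block of a to a separate worker."""
    blocks = split_rows(a, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(lambda blk: multiply_block(blk, b), blocks))
    # Concatenate the partial results in their original row order.
    return [row for part in partial for row in part]


a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
product = decomposed_matmul(a, b, n_workers=2)
```

Note that every worker needs the whole of b as well as its own block of a, so the data shipped to the workers grows with their number. This is one plausible source of the communication overhead discussed in this document.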
Here is a Mathematica .nb file that demonstrates a simple routine to perform a matrix multiplication in parallel, using both gridMathematica's automatic domain decomposition, and a hand-coded domain decomposition routine.
The following graph shows the time taken to multiply two matrices, as a function of the number of available slave kernels. The timings were obtained with the (faster) manual domain decomposition routine. As you can see, the computation was fastest with two kernels. The degradation of performance for more than two kernels in this case is most likely due to the significant communication overhead inherent in all but the most trivial gridMathematica operations.
Figure: execution time for a matrix multiplication in gridMathematica
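One way to see why adding kernels can make things slower is a toy cost model, sketched below in Python. This is purely an illustrative assumption, not a fit to the measurements in the figure: it supposes that the compute time divides evenly across kernels while the communication cost grows linearly with their number. The function modelled_time and both constants are hypothetical.

```python
def modelled_time(n_kernels, compute_time, overhead_per_kernel):
    """Toy model: compute work divides across kernels, communication cost grows with them."""
    return compute_time / n_kernels + overhead_per_kernel * n_kernels


# Hypothetical constants chosen only for illustration, not measured values.
COMPUTE_TIME = 8.0
OVERHEAD_PER_KERNEL = 2.0

# Modelled run time for 1 to 6 kernels.
times = {p: modelled_time(p, COMPUTE_TIME, OVERHEAD_PER_KERNEL) for p in range(1, 7)}

# Kernel count with the smallest modelled time.
best = min(times, key=times.get)
```

Under this model the run time has a minimum near the square root of compute_time / overhead_per_kernel, so a large communication cost relative to the computation pushes the optimum towards a small number of kernels.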
Unfortunately, this communication overhead becomes a bigger problem as the operation becomes more complex. Only embarrassingly parallel procedures (i.e. procedures that do not require the slave kernels to share data with each other) can be parallelised with any significant improvement in performance.
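As an illustration of what an embarrassingly parallel procedure looks like, the following Python sketch solves the same equation for several different constants, in the spirit of running Solve[ ] on each kernel with different arguments. It is a sketch under stated assumptions rather than gridMathematica code: the function cube_root_by_bisection is hypothetical, and Python threads stand in for slave kernels to show the structure only. The key property is that no task needs any data from another task.

```python
from concurrent.futures import ThreadPoolExecutor


def cube_root_by_bisection(c, tol=1e-12):
    """Solve x**3 - c = 0 for c >= 0 by bisection; needs no data from any other task."""
    lo, hi = 0.0, max(1.0, c)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mid ** 3 < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Each constant defines an independent problem; the only communication is
# sending the argument to a worker and collecting its result.
constants = [1.0, 8.0, 27.0, 64.0]
with ThreadPoolExecutor(max_workers=4) as pool:
    roots = list(pool.map(cube_root_by_bisection, constants))
```

Because each task receives one number and returns one number, the communication cost stays small compared with the work done, which is the situation in which parallel kernels are expected to pay off.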