**Exploration of the capabilities of gridMathematica on the Altix ia64 HPC machines**

Mathematica (http://www.wolfram.com) is a powerful and flexible problem-solving environment targeted at the physical sciences. This document explores the capabilities of an add-on called gridMathematica (also known as the Parallel Computing Toolkit).

Mathematica separates its functional components into two modules:
a front end that manages the user interface, and a kernel that
performs all the computations in a notebook. The kernel may run on a
different machine from the front end (a *remote kernel*), and this
functionality has been extended to allow one notebook to be executed
by several kernels (usually one per processor). This functionality is
known as gridMathematica, and allows certain Mathematica computations
to be performed in parallel. In our experience, this parallelisation
applies in practice only to list operations (e.g. linear algebra).

That is to say, there are some significant limitations on exactly
what operations can be performed in parallel. First of all, functions
like **Solve[ ]**, **FindRoot[ ]**, and indeed most other Mathematica
functions are not themselves parallelised: sending a **Solve[ ]**
command to a Mathematica session with two kernels won't make
**Solve[ ]** run twice as fast. However, you can run **Solve[ ]**
on each kernel, perhaps with different arguments, and have those
calls run simultaneously. Operations on lists, vectors, matrices,
etc. can be parallelised by a process known as *domain decomposition*.
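The simultaneous-**Solve[ ]** pattern can be sketched as follows. This is a minimal sketch assuming the Parallel Computing Toolkit has been loaded and slave kernels launched; `ParallelMap` distributes one list element, and hence one independent **Solve[ ]** call, to each available kernel.

```
(* Each polynomial is solved on a different slave kernel; the
   individual Solve[] calls still run serially on their kernel. *)
polys = {x^2 - 2, x^3 - x - 1, x^4 + x - 3};
ParallelMap[Solve[# == 0, x] &, polys]
```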

Here is a Mathematica .nb file that demonstrates a simple routine to perform a matrix multiplication in parallel, using both gridMathematica's automatic domain decomposition and a hand-coded domain decomposition routine.
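The hand-coded decomposition can be sketched in a few lines; this is a minimal illustration of the idea, not the exact routine in the notebook. Matrix multiplication decomposes naturally by rows: each slave kernel multiplies rows of the first matrix by the whole second matrix, and the master rejoins the results.

```
(* Hand-coded domain decomposition for a.b, assuming slave
   kernels are running and the toolkit's ParallelMap is
   available. Each row of a is shipped to a kernel along
   with b; batching several rows per message would reduce
   the communication overhead. *)
parallelDot[a_, b_] := ParallelMap[#.b &, a]
```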

The following graph shows the time taken to multiply two matrices as a function of the number of available slave kernels. As you can see, performing the operation on two kernels produced the fastest computation. The degradation in performance beyond two kernels is most likely due to the significant communication overhead inherent in all but the most trivial gridMathematica operations. These timings used the (faster) hand-coded domain decomposition routine.

Figure: execution time for a matrix multiplication in gridMathematica

Unfortunately, this communication overhead grows as the operation
becomes more complex. Only embarrassingly parallel procedures
(i.e. procedures that do not require the slave kernels to share
data with each other) can be parallelised with any significant
improvement in performance.
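As an illustration of an embarrassingly parallel task, each kernel in the sketch below estimates pi from its own batch of random points and returns a single number, so no data is shared between slave kernels and the communication cost is negligible. This assumes the toolkit's `ParallelTable` is available; `estimatePi` is our own illustrative helper.

```
(* One independent Monte Carlo batch per table entry; the
   slave kernels never need to talk to each other. *)
estimatePi[n_] :=
  4.0 Count[Table[Random[]^2 + Random[]^2, {n}], _?(# <= 1 &)]/n
Mean[ParallelTable[estimatePi[10000], {8}]]
```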