Slowly but surely more and more MATLAB
functions are becoming able to take advantage of multi-core
processors. For example, in MATLAB 2009b,
functions such as sort, bsxfun,
filter and erf (among others)
gained the ability to spread the computational load across several
processor cores. This is good news because if
your code uses these functions, and if you have a multi-core
processor, then you will get faster execution times without having
to modify your program. This
kind of parallelism is called implicit
parallelism because it doesn’t require any special commands in
order to take advantage of multiple cores – MATLAB just does it
automagically. Faster code for free!
For many people though, this isn’t enough and they find
themselves needing to use explicit
parallelism where it becomes the programmer’s responsibility to
tell MATLAB exactly how to spread the work between the multiple
processing cores available. The standard way of
doing this in MATLAB is to use the Parallel
Computing Toolbox but, unlike packages such as Mathematica and Maple, this functionality is going
to cost you extra.
There is another way though…
I’ve recently been working with David Szotten, a
postgraduate student at Manchester University, on writing parallel
mex files for MATLAB using OpenMP and C. Granted, it’s not as easy as using the Parallel Computing Toolbox
but it does work and the results can be blisteringly fast since we
are working in C. It’s also not quite as
difficult as we originally thought it might be.
There is one golden rule you need to observe:
Never, ever call any of the mex API functions inside the
portion of your code that you are trying to parallelise using
OpenMP. Don’t try to interact with MATLAB at all
during the parallel portion of your code.
The reason for the golden rule is that, at the time of
writing at least (MATLAB 2009b), the mex API functions are not thread-safe, and that includes the
printf function, since printf is
defined to be mexPrintf in the
mex.h header file. Stick to the
golden rule and you’ll be fine. Stray from
the golden rule, however, and you’ll find
dragons. Trust me on this!
Let’s start with an example: find the sum of foo(x) where foo
is a moderately complicated function (here, foo(x) = sin(x^3)/cos(x+1)) and x is a large array of
values. We have implemented such an example in
the mex function mex_sum_openmp.
I assume you are running MATLAB 2009b on 32-bit Ubuntu Linux
version 9.04 – just like me. If your setup
differs from this then you may need to modify the following
instructions accordingly.
Before you start, close all currently running
instances of MATLAB.
Download and unzip the file mex_sum_openmp.zip to give you
mex_sum_openmp.c
Open up a bash terminal and enter export
OMP_NUM_THREADS=2 (replace 2 with however many cores your
machine has)
Start MATLAB from this terminal by entering
matlab.
Inside MATLAB, navigate to the directory containing
mex_sum_openmp.c
Inside MATLAB, enter the following to compile the .c file
mex mex_sum_openmp.c CFLAGS="$CFLAGS -fopenmp" LDFLAGS="$LDFLAGS -fopenmp"
Inside MATLAB, test the code by entering
x=rand(1,9999999); %create an array of random x values
mex_sum_openmp(x) %calculate the sum of foo(x(i))
If all goes well, this calculation will be done in parallel and
you will be rewarded with a single number. Time
the calculation as follows.
tic;mex_sum_openmp(x);toc
My dual core laptop did this in about 0.6 seconds on
average. To see how much faster this is compared
to single core mode just repeat all of the steps above but start
off with export OMP_NUM_THREADS=1 (You won’t need
to recompile the mex function). On doing this, my
laptop did it in 1.17 seconds and so the 2 core mode is almost
exactly 2 times faster.
Let’s take a look at the code.
#include "mex.h"
#include <math.h>
#include <omp.h>

double spawn_threads(double x[], int cols)
{
    double sum = 0;
    int count;

    #pragma omp parallel for shared(x, cols) private(count) reduction(+: sum)
    for(count = 0; count < cols; count++)
    {
        sum += sin(x[count]*x[count]*x[count])/cos(x[count]+1.0);
    }
    return sum;
}
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    double sum;

    if(nrhs != 1) {
        mexErrMsgTxt("One input required.");
    } else if(nlhs > 1) {
        mexErrMsgTxt("Too many output arguments.");
    }

    int rows = mxGetM(prhs[0]);
    int cols = mxGetN(prhs[0]);
    if(rows > 1){
        mexErrMsgTxt("Input data has too many rows. Just the one please.");
    }

    double *x = mxGetPr(prhs[0]);
    sum = spawn_threads(x, cols);

    plhs[0] = mxCreateDoubleMatrix(1, 1, mxREAL);
    double *y = mxGetPr(plhs[0]);
    y[0] = sum;
}
I’m not going to go through this line by line as I assume that
you know how to program in OpenMP and also know how to write basic
mex files. The key point to notice is that I don’t have any
parallel code inside mexFunction(), which serves as
MATLAB’s entry point into the C code. Likewise, I don’t have any
mex functions inside my spawn_threads() function –
the part that is actually parallelised using OpenMP.
That’s pretty much it – the golden rule in action. I hope you
find this example useful and thanks again to David
Szotten who helped me figure all of this out. Comments welcomed.