[Repost] Parallel MATLAB with OpenMP mex files

Slowly but surely, more and more MATLAB functions are becoming able to take advantage of multi-core processors. For example, in MATLAB 2009b, functions such as sort, bsxfun, filter and erf (among others) gained the ability to spread the computational load across several processor cores. This is good news because if your code uses these functions, and if you have a multi-core processor, then you will get faster execution times without having to modify your program. This kind of parallelism is called implicit parallelism because it doesn't require any special commands to take advantage of multiple cores – MATLAB just does it automagically. Faster code for free!

For many people, though, this isn't enough, and they find themselves needing explicit parallelism, where it becomes the programmer's responsibility to tell MATLAB exactly how to spread the work between the available processing cores. The standard way of doing this in MATLAB is to use the Parallel Computing Toolbox but, unlike packages such as Mathematica and Maple, this functionality is going to cost you extra.

There is another way, though…

I've recently been working with David Szotten, a postgraduate student at Manchester University, on writing parallel mex files for MATLAB using OpenMP and C. Granted, it's not as easy as using the Parallel Computing Toolbox, but it does work and the results can be blisteringly fast since we are working in C. It's also not quite as difficult as we originally thought it might be.

There is one golden rule you need to observe: never, ever call any of the mex API functions inside the portion of your code that you are trying to parallelise using OpenMP. Don't try to interact with MATLAB at all during the parallel portion of your code.

The reason for the golden rule is that, at the time of writing at least (MATLAB 2009b), the mex API functions are not thread-safe – and that includes printf, since printf is defined to be mexPrintf in the mex.h header file. Stick to the golden rule and you'll be fine. Stray from it, however, and you'll find dragons. Trust me on this!

Let's start with an example: find the sum of foo(x), where foo is a moderately complicated function and x is a large array of values. We have implemented such an example in the mex function mex_sum_openmp.

I assume you are running MATLAB 2009b on 32-bit Ubuntu Linux 9.04 – just like me. If your setup differs from this then you may need to modify the following instructions accordingly.

Before you start, close all currently running instances of MATLAB.

1. Download and unzip the file mex_sum_openmp.zip to give you mex_sum_openmp.c.
2. Open up a bash terminal and enter export OMP_NUM_THREADS=2 (replace 2 with however many cores your machine has).
3. Start MATLAB from this terminal by entering matlab.
4. Inside MATLAB, navigate to the directory containing mex_sum_openmp.c.
5. Inside MATLAB, enter the following to compile the .c file:

mex mex_sum_openmp.c CFLAGS="$CFLAGS -fopenmp" LDFLAGS="$LDFLAGS -fopenmp"

6. Inside MATLAB, test the code by entering:

x=rand(1,9999999); % create an array of random x values
mex_sum_openmp(x)  % calculate the sum of foo(x(i))

If all goes well, this calculation will be done in parallel and you will be rewarded with a single number. Time the calculation as follows:

tic;mex_sum_openmp(x);toc

My dual-core laptop did this in about 0.6 seconds on average. To see how much faster this is compared to single-core mode, just repeat all of the steps above but start off with export OMP_NUM_THREADS=1 (you won't need to recompile the mex function). On doing this, my laptop took 1.17 seconds, so the two-core mode is almost exactly 2 times faster.

Let’s take a look at the code.

#include "mex.h"
#include <math.h>
#include <omp.h>

double spawn_threads(double x[], int cols)
{
    double sum = 0;
    int count;
    #pragma omp parallel for shared(x, cols) private(count) reduction(+: sum)
    for (count = 0; count < cols; count++)
    {
        sum += sin(x[count]*x[count]*x[count])/cos(x[count]+1.0);
    }
    return sum;
}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    double sum;

    if (nrhs != 1) {
        mexErrMsgTxt("One input required.");
    } else if (nlhs > 1) {
        mexErrMsgTxt("Too many output arguments.");
    }

    int rows = mxGetM(prhs[0]);
    int cols = mxGetN(prhs[0]);

    if (rows > 1) {
        mexErrMsgTxt("Input data has too many rows. Just the one please.");
    }

    double *x = mxGetPr(prhs[0]);
    sum = spawn_threads(x, cols);

    plhs[0] = mxCreateDoubleMatrix(1, 1, mxREAL);
    double *y = mxGetPr(plhs[0]);
    y[0] = sum;
}

I'm not going to go through this line by line as I assume that you know how to program with OpenMP and also know how to write basic mex files. The key point to notice is that I don't have any parallel code inside mexFunction(), which serves as MATLAB's entry point into the C code. Likewise, I don't have any mex functions inside my spawn_threads() function – the part that is actually parallelised using OpenMP.

That's pretty much it – the golden rule in action. I hope you find this example useful, and thanks again to David Szotten, who helped me figure all of this out. Comments welcomed.
