OpenMP parallelization slower: parallelizing C++ code with OpenMP actually makes it run slower

While trying to parallelize some C++ code with OpenMP, the execution time grew from 6.956 s to over 3 minutes. The problems are a data race and, most likely, nested parallelism. To fix this, avoid the data hazard, consider `#pragma omp task` instead of `#pragma omp for`, and create a single parallel region around the recursive calls. The updated code below shows how to use tasks and avoid the race, but note that because of task-creation overhead it may still yield no speedup; rewriting the algorithm as an iterative version before parallelizing is recommended.

I have the following code that I want to parallelize:

int ncip(int dim, double R)
{
    int i;
    int r = (int)floor(R);
    if (dim == 1)
    {
        return 1 + 2*r;
    }

    int n = ncip(dim-1, R); // last coord 0
    #pragma omp parallel for
    for (i = 1; i <= r; ++i)
    {
        n += 2*ncip(dim-1, sqrt(R*R - i*i)); // last coord +- i
    }
    return n;
}

The program's execution time without OpenMP is 6.956 s; when I try to parallelize the for loop, the execution time is greater than 3 minutes (and only that short because I killed it myself). What am I doing wrong when parallelizing this code?

Second attempt:

int ncip(int dim, double R)
{
    int i;
    int r = (int)floor(R);
    if (dim == 1)
    {
        return 1 + 2*r;
    }

    #pragma omp parallel
    {
        int n = ncip(dim-1, R); // last coord 0
        #pragma omp for reduction(+:n)
        for (i = 1; i <= r; ++i)
        {
            n += 2*ncip(dim-1, sqrt(R*R - i*i)); // last coord +- i
        }
    }
    return n;
}

Solution

You are doing that wrong!

(1) There is a data race on the variable n. If you want to parallelize code that writes to the same memory location, you must use reduction (on the for), atomic, or critical to avoid the data hazard.

(2) You probably have nested parallelism enabled, so the program creates a new parallel region every time the function ncip is called. This is likely the main problem. For recursive functions I advise you to create just one parallel region and then use pragma omp task.

Do not parallelize with #pragma omp for; try #pragma omp task instead. Look at this example:

int ncip(int dim, double R)
{
    ...
    #pragma omp task
    ncip(XX, XX);
    #pragma omp taskwait
    ...
}

int main(int argc, char *argv[])
{
    #pragma omp parallel
    {
        #pragma omp single
        ncip(XX, XX);
    }
    return 0;
}

UPDATE:

// Detailed version (without omp for and without data races)

int ncip(int dim, double R)
{
    int n, r = (int)floor(R);
    if (dim == 1) return 1 + 2*r;

    n = ncip(dim-1, R); // last coord 0
    for (int i = 1; i <= r; ++i)
    {
        // shared(n) is required: a function-local variable referenced
        // inside a task defaults to firstprivate, so without it each
        // task would atomically update its own private copy of n.
        #pragma omp task shared(n)
        {
            int aux = 2*ncip(dim-1, sqrt(R*R - i*i)); // last coord +- i
            #pragma omp atomic
            n += aux;
        }
    }
    #pragma omp taskwait
    return n;
}

PS: You won't get a speedup from this, because the overhead of creating a task is bigger than the work a single task does. The best thing you can do is rewrite this algorithm as an iterative version, and then try to parallelize it.
