OpenMP parallelization slower: parallelizing C++ code with OpenMP actually makes it run slower

While trying to parallelize some C++ code with OpenMP, the execution time grew from 6.956 s to over 3 minutes. The problems are a data race and, most likely, nested parallelism. To fix this, avoid the data hazard, consider `#pragma omp task` instead of `#pragma omp for`, and create a single parallel region around the recursive calls. The updated code below shows how to use tasks and avoid the race, but note that because of task-creation overhead it may still yield no speedup; rewriting the algorithm as an iterative version before parallelizing is recommended.

I have the following code that I want to parallelize:

int ncip(int dim, double R)
{
    int i;
    int r = (int)floor(R);
    if (dim == 1)
    {
        return 1 + 2*r;
    }

    int n = ncip(dim-1, R); // last coord 0
    #pragma omp parallel for
    for (i = 1; i <= r; ++i)
    {
        n += 2*ncip(dim-1, sqrt(R*R - i*i)); // last coord +- i
    }
    return n;
}

The program's execution time without OpenMP is 6.956 s; when I try to parallelize the for loop, the execution time is greater than 3 minutes (and only that short because I killed it myself). What am I doing wrong when parallelizing this code?

Second attempt:

int ncip(int dim, double R)
{
    int i;
    int r = (int)floor(R);
    if (dim == 1)
    {
        return 1 + 2*r;
    }

    #pragma omp parallel
    {
        int n = ncip(dim-1, R); // last coord 0
        #pragma omp for reduction(+:n)
        for (i = 1; i <= r; ++i)
        {
            n += 2*ncip(dim-1, sqrt(R*R - i*i)); // last coord +- i
        }
    }
    return n;
}

Solution

You are doing that wrong!

(1) There is a data race on the variable n. If you want to parallelize code that writes to the same memory location, you must use reduction (on the for), atomic, or critical to avoid the data hazard.

(2) You probably have nested parallelism enabled, so the program creates a new parallel region every time the function ncip is called. This is likely the main problem. For recursive functions I advise you to create just one parallel region and then use pragma omp task.

Do not parallelize with #pragma omp for; try #pragma omp task instead. Look at this example:

int ncip(int dim, double R)
{
    ...
    #pragma omp task
    ncip(XX, XX);
    #pragma omp taskwait
    ...
}

int main(int argc, char *argv[])
{
    #pragma omp parallel
    {
        #pragma omp single
        ncip(XX, XX);
    }
    return 0;
}

UPDATE:

// Detailed version (without omp for and without data races)

int ncip(int dim, double R)
{
    int n, r = (int)floor(R);
    if (dim == 1) return 1 + 2*r;

    n = ncip(dim-1, R); // last coord 0
    for (int i = 1; i <= r; ++i)
    {
        // shared(n) is required: a function-local variable referenced
        // inside a task defaults to firstprivate, so without it each
        // task would atomically update its own private copy of n.
        #pragma omp task shared(n)
        {
            int aux = 2*ncip(dim-1, sqrt(R*R - i*i)); // last coord +- i
            #pragma omp atomic
            n += aux;
        }
    }
    #pragma omp taskwait
    return n;
}

PS: You won't get a speedup from this, because the overhead of creating a task is bigger than the work a single task does. The best thing you can do is rewrite this algorithm as an iterative version, and then try to parallelize it.
