This code is slower with OpenMP. Without OpenMP I get about 10 s; with OpenMP I get about 40 s. What is happening? Thank you very much, friends!
for (i=2; i<nnoib-2; i++){
    #pragma omp parallel for
    for (j=2; j<nnojb-2; j++) {
        C[i][j]= absi[i]*absj[j]*
            (2.0f*B[i][j] + absi[i]*absj[j]*
                (VEL[i][j]*VEL[i][j]*fat*
                    (16.0f*(B[i][j-1]+B[i][j+1]+B[i-1][j]+B[i+1][j])
                     -1.0f*(B[i][j-2]+B[i][j+2]+B[i-2][j]+B[i+2][j])
                     -60.0f*B[i][j]
                    )-A[i][j]));
        c2 = (abs(C[i][j]) > Amax[i][j]);
        if (c2) {
            Amax[i][j] = abs(C[i][j]);
            Ttra[i][j] = t;
        }
    }
}
Solution
Just because you're using OpenMP doesn't mean your program will run faster. A couple of things can be happening here:
There is a cost associated with spawning each thread, and if a thread is given only a small amount of computation, spawning it can take more time than the computation itself.
By default, OpenMP will spawn the maximum number of threads supported by your CPU. On CPUs that support 2 or more threads per core, those threads will compete for each core's resources. Using omp_get_max_threads() you can see how many threads a parallel region will use by default (omp_get_num_threads() returns 1 when called outside a parallel region). I recommend trying to run your code with half that value, set via omp_set_num_threads().
Did you confirm the results are the same with and without OpenMP? As written, the variables j and c2 are shared between threads, which is a data race. You should declare them private to each thread:
#pragma omp parallel for private(j,c2)
I wanted to add another thing: before attempting any parallelization, you should make sure that the code is already optimized.
Depending on your compiler, compiler flags and the complexity of the instruction, the compiler may or may not optimize your code:
// avoid recomputing nnoib-2 every iteration
int t_nnoib = nnoib - 2;
for (i = 2; i < t_nnoib; ++i) {
    // avoid recomputing nnojb-2 every iteration
    int t_nnojb = nnojb - 2;
    // avoid reloading absi[i] on every iteration of the j loop
    float t_absi = absi[i];
    for (j = 2; j < t_nnojb; ++j) {
        C[i][j] = t_absi * absj[j] *
            (2.0f*B[i][j] + t_absi * absj[j] *
                (VEL[i][j] * VEL[i][j] * fat *
                    (16.0f * (B[i][j-1] + B[i][j+1] + B[i-1][j] + B[i+1][j])
                     -1.0f * (B[i][j-2] + B[i][j+2] + B[i-2][j] + B[i+2][j])
                     -60.0f * B[i][j]
                    ) - A[i][j]));
        // c2 is unnecessary: test the condition directly.  Also use
        // fabsf() from <math.h>: abs() takes an int and would truncate.
        if (fabsf(C[i][j]) > Amax[i][j]) {
            Amax[i][j] = fabsf(C[i][j]);
            Ttra[i][j] = t;
        }
    }
}
It may not seem like much, but it can have a huge impact on your code. The compiler will try to place local variables in registers (which have much faster access times). Keep in mind that you can't apply this technique indefinitely, since you have a limited number of registers, and overusing it will cause your code to suffer from register spilling.
In the case of the array absi, hoisting the load means the value lives in a register instead of being re-read from cache on every iteration of the j loop. The general idea of this technique is to move out to the outer loop any array access that doesn't depend on the inner loop's variable.