Why is my code slower with OpenMP, and how should it be parallelized?

This code is slower with OpenMP. Without OpenMP I get about 10 s; with OpenMP I get about 40 s. What is happening? Thank you very much, friends!

```c
for (i = 2; i < nnoib-2; i++) {

    #pragma omp parallel for
    for (j = 2; j < nnojb-2; j++) {

        C[i][j] = absi[i]*absj[j]*
            (2.0f*B[i][j] + absi[i]*absj[j]*
             (VEL[i][j]*VEL[i][j]*fat*
              (16.0f*(B[i][j-1]+B[i][j+1]+B[i-1][j]+B[i+1][j])
               -1.0f*(B[i][j-2]+B[i][j+2]+B[i-2][j]+B[i+2][j])
               -60.0f*B[i][j]
              ) - A[i][j]));

        c2 = (abs(C[i][j]) > Amax[i][j]);
        if (c2) {
            Amax[i][j] = abs(C[i][j]);
            Ttra[i][j] = t;
        }
    }
}
```

Solution

Just because you're using OpenMP doesn't mean your program will run faster. A couple of things can be happening here:

There is a cost associated with spawning each thread, and if a thread is spawned to do only a small amount of computation, spawning it will take longer than the computation itself. Here the `#pragma` sits on the inner loop, so a thread team is forked and joined once per row.

By default, OpenMP will spawn as many threads as your CPU supports. On CPUs that support 2 or more hardware threads per core, those threads end up competing for each core's resources. Calling omp_get_max_threads() outside a parallel region tells you the default team size. I recommend trying your code with half that value, set via omp_set_num_threads().
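A minimal sketch of that advice (the helper names pick_num_threads/configure_threads are illustrative, not part of any API). The `_OPENMP` guards let the same file compile and run serially when OpenMP is not enabled:

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Heuristic: use half the default team size, i.e. roughly one thread
 * per physical core on CPUs with 2 hardware threads per core. */
static int pick_num_threads(void) {
#ifdef _OPENMP
    int def = omp_get_max_threads();  /* default number of threads */
    return def > 1 ? def / 2 : 1;
#else
    return 1;  /* compiled without OpenMP: run serially */
#endif
}

static void configure_threads(void) {
    int n = pick_num_threads();
#ifdef _OPENMP
    omp_set_num_threads(n);          /* applies to subsequent parallel regions */
#endif
    printf("using %d thread(s)\n", n);
}
```

Try a few values around that heuristic and time each run; the best team size depends on the machine.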

Did you confirm the results were the same with and without OpenMP? There is a data race on the variables j and c2, which are shared by default. You should declare them private to each thread:

```c
#pragma omp parallel for private(j,c2)
```
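Putting both fixes together, here is a minimal sketch (the grid size N and the wrapper function update are illustrative; the real bounds are nnoib/nnojb). Moving the pragma to the outer loop spawns the thread team once instead of once per row, and declaring the loop variables inside the loops makes them private automatically. It also uses fabsf() rather than abs(), since abs() truncates its float argument to int:

```c
#include <math.h>

#define N 16   /* illustrative grid size; the real code uses nnoib/nnojb */

/* Parallelize the OUTER loop: one fork/join per call, not per row.
 * i and j are declared in the for statements, so each thread gets
 * its own copies without a private() clause. */
static void update(float C[N][N], float B[N][N], float A[N][N],
                   float VEL[N][N], float absi[N], float absj[N],
                   float Amax[N][N], float Ttra[N][N], float fat, float t) {
    #pragma omp parallel for
    for (int i = 2; i < N - 2; ++i) {
        for (int j = 2; j < N - 2; ++j) {
            C[i][j] = absi[i]*absj[j]*
                (2.0f*B[i][j] + absi[i]*absj[j]*
                 (VEL[i][j]*VEL[i][j]*fat*
                  (16.0f*(B[i][j-1]+B[i][j+1]+B[i-1][j]+B[i+1][j])
                   -1.0f*(B[i][j-2]+B[i][j+2]+B[i-2][j]+B[i+2][j])
                   -60.0f*B[i][j]) - A[i][j]));
            if (fabsf(C[i][j]) > Amax[i][j]) {  /* fabsf, not abs, for floats */
                Amax[i][j] = fabsf(C[i][j]);
                Ttra[i][j] = t;
            }
        }
    }
}
```

Each thread writes only its own rows of C, Amax, and Ttra, so no further synchronization is needed.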

I want to add one more thing: before attempting any parallelization, you should make sure the code is already optimized.

Depending on your compiler, compiler flags and the complexity of the instruction, the compiler may or may not optimize your code:

```c
// hoist nnoib-2 and nnojb-2 out of the loop conditions:
// neither depends on i or j
int t_nnoib = nnoib - 2;
int t_nnojb = nnojb - 2;

for (i = 2; i < t_nnoib; ++i) {

    // load absi[i] once per row instead of on every j iteration
    float t_absi = absi[i];

    for (j = 2; j < t_nnojb; ++j) {

        C[i][j] = t_absi * absj[j] *
            (2.0f*B[i][j] + t_absi * absj[j] *
             (VEL[i][j] * VEL[i][j] * fat *
              (16.0f * (B[i][j-1] + B[i][j+1] + B[i-1][j] + B[i+1][j])
               -1.0f * (B[i][j-2] + B[i][j+2] + B[i-2][j] + B[i+2][j])
               -60.0f * B[i][j]
              ) - A[i][j]));

        // c2 was a useless variable; also use fabsf() from <math.h>,
        // since abs() truncates a float to int
        if (fabsf(C[i][j]) > Amax[i][j]) {
            Amax[i][j] = fabsf(C[i][j]);
            Ttra[i][j] = t;
        }
    }
}
```

It may not seem like much, but it can have a huge impact on your code. The compiler will try to place local variables in registers, which have much faster access times. Keep in mind that you can't apply this technique indefinitely, since you have a limited number of registers; abusing it will cause your code to suffer from register spilling.

In the case of the array absi, this also avoids the system having to keep a piece of that array in cache throughout the execution of the j loop. The general idea of the technique is to hoist out of the inner loop any array access that doesn't depend on the inner loop's variable.
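The same idea in miniature (the function and array names here are hypothetical, just to isolate the pattern): anything that depends only on i is loaded or computed once per row, outside the j loop.

```c
#include <stddef.h>

/* Hoist row-invariant values out of the inner loop: scale[i] and the
 * row base pointers depend only on i, so compute them once per row. */
static void scale_rows(float *out, const float *in, const float *scale,
                       size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; ++i) {
        const float s = scale[i];             /* hoisted: invariant in j */
        const float *row_in  = in  + i * cols; /* hoisted index arithmetic */
        float       *row_out = out + i * cols;
        for (size_t j = 0; j < cols; ++j)
            row_out[j] = s * row_in[j];
    }
}
```

An optimizing compiler can often do this hoisting itself, but writing it explicitly guarantees it and makes the data reuse obvious when you read the loop.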
