Please credit the source when reposting: http://blog.csdn.net/zhongkejingwang/article/details/40018735
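Using OpenMP to optimize C/C++ code is convenient and simple. The code that needs parallel processing is usually some time-consuming for loop, so this post focuses on how OpenMP is applied to for loops. In my view, mastering what is covered here is enough for most purposes; if you want to learn more about OpenMP, you can look up further material online.
A craftsman who wants to do good work must first sharpen his tools. If you have not set up an OpenMP development environment yet, see the post "OpenMP并行程序设计——Eclipse开发环境的搭建" (OpenMP Parallel Programming: Setting Up an Eclipse Development Environment).
First, how do we make a piece of code run in parallel? OpenMP uses the parallel directive to mark a parallel section of the code, in the form: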
#pragma omp parallel
{
// every thread executes the code inside the braces
}
Take the following code, for example:
- #include <iostream>
- #include "omp.h"
- using namespace std;
- int main(int argc, char **argv) {
- // Set the number of threads (usually no more than the number of CPU cores); here 4 threads run the parallel section
- omp_set_num_threads(4);
- #pragma omp parallel
- {
- cout << "Hello" << ", I am Thread " << omp_get_thread_num() << endl;
- }
- }
The code above produces the following output (the order of the lines can differ from run to run):
- Hello, I am Thread 1
- Hello, I am Thread 0
- Hello, I am Thread 2
- Hello, I am Thread 3
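The for directive:
The for directive distributes the iterations of a for loop among the threads. This requires that the iterations have no data dependencies on one another.
It can be written in two forms: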
(1)#pragma omp parallel for
for()
(2)#pragma omp parallel
{// note: the opening brace must start on a new line
#pragma omp for
for()
}
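Note: in the second form, do not put another parallel directive inside the parallel block. Code like the following still compiles, but instead of sharing the loop among the existing threads it opens a nested parallel region in every thread, so it should not be written this way: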
#pragma omp parallel
{
#pragma omp parallel for
for()
}
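The first form applies only to the for loop that immediately follows it, whereas the second form allows several for directives inside one parallel block. The example programs below show what to watch out for when parallelizing for loops.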
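Suppose we do not use the for directive and simply put a parallel directive in front of the for loop (printf is used instead of cout here to keep the output from getting garbled):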
- #include <iostream>
- #include <stdio.h>
- #include "omp.h"
- using namespace std;
- int main(int argc, char **argv) {
- // Set the number of threads (usually no more than the number of CPU cores); here 4 threads run the parallel section
- omp_set_num_threads(4);
- #pragma omp parallel
- for (int i = 0; i < 2; i++)
- //cout << "i = " << i << ", I am Thread " << omp_get_thread_num() << endl;
- printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
- }
The output is:
- i = 0, I am Thread 0
- i = 0, I am Thread 1
- i = 1, I am Thread 0
- i = 1, I am Thread 1
- i = 0, I am Thread 2
- i = 1, I am Thread 2
- i = 0, I am Thread 3
- i = 1, I am Thread 3
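The output shows that without the for directive every thread executes the entire for loop. The for directive is therefore used to split the loop and distribute its iterations among the threads as evenly as possible. After changing the parallel code to this: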
- #pragma omp parallel for
- for (int i = 0; i < 6; i++)
- printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
the output becomes:
- i = 4, I am Thread 2
- i = 2, I am Thread 1
- i = 0, I am Thread 0
- i = 1, I am Thread 0
- i = 3, I am Thread 1
- i = 5, I am Thread 3
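Now the whole for loop is split up and executed in parallel. In the code above, parallel and for are combined into one directive, which applies only to the for loop that immediately follows it; the parallel region ends as soon as the loop ends.
The code above can also be written like this: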
- #pragma omp parallel
- {
- #pragma omp for
- for (int i = 0; i < 6; i++)
- printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
- }
Nesting a second parallel directive inside the block, as in the following variant, does not split the loop:
- #pragma omp parallel
- {
- #pragma omp parallel for
- for (int i = 0; i < 6; i++)
- printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
- }
With the nested variant, each of the 4 threads runs all 6 iterations by itself. Every line reports thread 0 because each nested region in this run consists of a single thread, which is what happens when nested parallelism is inactive:
- i = 0, I am Thread 0
- i = 0, I am Thread 0
- i = 1, I am Thread 0
- i = 1, I am Thread 0
- i = 2, I am Thread 0
- i = 2, I am Thread 0
- i = 3, I am Thread 0
- i = 3, I am Thread 0
- i = 4, I am Thread 0
- i = 4, I am Thread 0
- i = 5, I am Thread 0
- i = 5, I am Thread 0
- i = 0, I am Thread 0
- i = 1, I am Thread 0
- i = 0, I am Thread 0
- i = 2, I am Thread 0
- i = 1, I am Thread 0
- i = 3, I am Thread 0
- i = 2, I am Thread 0
- i = 4, I am Thread 0
- i = 3, I am Thread 0
- i = 5, I am Thread 0
- i = 4, I am Thread 0
- i = 5, I am Thread 0
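Of course, the two ways of writing the for directive differ. Suppose there is some code between two for loops that must be run by only one thread. With the first form nothing special is needed: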
- #pragma omp parallel for
- for (int i = 0; i < 6; i++)
- printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
- // Code between the two for loops; it is run by thread 0, i.e. the master thread
- printf("I am Thread %d\n", omp_get_thread_num());
- #pragma omp parallel for
- for (int i = 0; i < 6; i++)
- printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
The output is:
- i = 0, I am Thread 0
- i = 2, I am Thread 1
- i = 1, I am Thread 0
- i = 3, I am Thread 1
- i = 4, I am Thread 2
- i = 5, I am Thread 3
- I am Thread 0
- i = 4, I am Thread 2
- i = 2, I am Thread 1
- i = 5, I am Thread 3
- i = 0, I am Thread 0
- i = 3, I am Thread 1
- i = 1, I am Thread 0
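Every statement inside a block marked parallel is executed by all of the threads, so to have the code between the two for loops run by only one thread, mark it with the single or master directive. master runs the code on the master thread (thread 0); single runs it on one thread of the team, and which one is unspecified. Note that single ends with an implicit barrier while master has none, so with master the other threads may enter the second loop before the master block has finished; add #pragma omp barrier after it if the second loop depends on that code. The code above can therefore be written like this: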
- #pragma omp parallel
- {
- #pragma omp for
- for (int i = 0; i < 6; i++)
- printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
- #pragma omp master
- {
- // This code is run by the master thread
- printf("I am Thread %d\n", omp_get_thread_num());
- }
- #pragma omp for
- for (int i = 0; i < 6; i++)
- printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
- }
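That covers the usage of parallel and for. Next comes data synchronization during parallel execution, a problem that every kind of multithreaded programming runs into.
To illustrate the data synchronization problem, let's start with an example: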
- #include <iostream>
- #include "omp.h"
- using namespace std;
- int main(int argc, char **argv) {
- int n = 100000;
- int sum = 0;
- omp_set_num_threads(4);
- #pragma omp parallel
- {
- #pragma omp for
- for (int i = 0; i < n; i++) {
- {
- sum += 1;
- }
- }
- }
- cout << " sum = " << sum << endl;
- }
Running it several times gives a different result each time:
- First run: sum = 58544
- Second run: sum = 77015
- Third run: sum = 78423
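The expected result is 100000, but sum += 1 is an unsynchronized read-modify-write on a shared variable, so updates made by different threads overwrite one another. The fixes are as follows.
Method 1: mark the code that updates the shared variable as a critical section
The code is changed as follows: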
- #pragma omp parallel
- {
- #pragma omp for
- for (int i = 0; i < n; i++) {
- {
- #pragma omp critical
- sum += 1;
- }
- }
- }
- cout << " sum = " << sum << endl;
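Method 2: give each thread its own private copy of sum, and add the copies together when the loop finishes
The parallel code is changed as follows: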
- #pragma omp parallel
- {
- #pragma omp for reduction(+:sum)
- for (int i = 0; i < n; i++) {
- {
- sum += 1;
- }
- }
- }
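Method 3: let each thread accumulate into its own slot of an array and add the slots up afterwards (this approach seems less elegant)
The code is changed as follows: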
- int n = 100000;
- int sum[4] = { 0 };
- omp_set_num_threads(4);
- #pragma omp parallel
- {
- #pragma omp for
- for (int i = 0; i < n; i++) {
- {
- sum[omp_get_thread_num()] += 1;
- }
- }
- }
- cout << " sum = " << sum[0] + sum[1] + sum[2] + sum[3] << endl;
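That covers data synchronization. In the code above no schedule was specified, so how the iterations are divided among the threads is left to the implementation (in the runs shown earlier, each thread received a contiguous block of roughly equal size). What if you want to control explicitly how the loop is handed out to the threads, chunk by chunk? That is what the schedule clause is for. The code below demonstrates its use: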
- #include <iostream>
- #include "omp.h"
- #include <stdio.h>
- using namespace std;
- int main(int argc, char **argv) {
- int n = 12;
- omp_set_num_threads(4);
- #pragma omp parallel
- {
- #pragma omp for schedule(static, 3)
- for (int i = 0; i < n; i++) {
- {
- printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
- }
- }
- }
- }
The output is as follows:
- i = 6, I am Thread 2
- i = 3, I am Thread 1
- i = 7, I am Thread 2
- i = 4, I am Thread 1
- i = 8, I am Thread 2
- i = 5, I am Thread 1
- i = 0, I am Thread 0
- i = 9, I am Thread 3
- i = 1, I am Thread 0
- i = 10, I am Thread 3
- i = 2, I am Thread 0
- i = 11, I am Thread 3
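OK, that is essentially everything about parallelizing for loops. One more useful directive is barrier: it sets up a barrier inside the parallel block that no thread may pass until all threads have reached it. It is typically used when a later step of the parallel work depends on results produced by an earlier step.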
Simple, isn't it?