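Among the OpenMP samples there is a fairly typical test case, cpp_compiler_options_openmp.cpp, which demonstrates
a reduction inside a for loop: #pragma omp parallel for reduction(+:sum) private(x)
I also wrote a multithreaded version of my own, targeting the Intel Core 2 Duo CPU.
There are many ways to compute pi; this one is well suited to benchmarking. Principle: http://wenku.baidu.com/view/3287baacdd3383c4bb4cd2ed.html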
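The linked write-up is not reproduced here. Reading the C code below, the method looks like the midpoint rule applied to a standard integral for pi. In this sketch N stands for num_steps, and x_i is a label introduced here for the i-th midpoint:

```latex
\pi = \int_0^1 \frac{4}{1+x^2}\,dx
    \approx \frac{1}{N}\sum_{i=1}^{N}\frac{4}{1+x_i^2},
\qquad x_i = \frac{i-\tfrac12}{N}
```

The factor 1/N is the variable step in the code, and each term of the sum is one loop iteration.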
C code:
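double test1(int num_steps) {
    int i;
    double x, pi, sum = 0.0, step;
    step = 1.0 / (double) num_steps;      /* width of each subinterval of [0, 1] */
    for (i = 1; i <= num_steps; i++) {
        x = (i - 0.5) * step;             /* midpoint of the i-th subinterval */
        sum = sum + 4.0 / (1.0 + x*x);    /* accumulate 4/(1+x^2) */
    }
    pi = step * sum;                      /* scale the sum by the interval width */
    return pi;
}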
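For comparison, the reduction pragma quoted at the top can be attached to the same loop. This is a minimal sketch, not the actual Intel sample: pi_omp is a hypothetical name, and the assert is an added precondition check. If OpenMP is not enabled at compile time, the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>

/* Midpoint-rule estimate of pi with an OpenMP sum reduction.
   pi_omp is a hypothetical name chosen for this sketch. */
double pi_omp(int num_steps) {
    int i;
    double x, sum = 0.0, step;

    assert(num_steps > 0);                /* added precondition, not in the original */
    step = 1.0 / (double) num_steps;

    /* Each thread gets its own x and its own partial sum;
       the partial sums are added together when the loop ends. */
    #pragma omp parallel for reduction(+:sum) private(x)
    for (i = 1; i <= num_steps; i++) {
        x = (i - 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

Because each thread adds up its own share of the terms, the additions happen in a different order than in the serial loop, so the last few digits of the result can differ.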
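Compiled with MS VC++ 6.0, this gives an FPU single-threaded version.
Compiled with Intel C++, it gives an SSE single-threaded version.
Intel C++ combined with OpenMP produces an SSE 3-thread version (two compute threads plus one main thread).
I rewrote the SSE single-threaded version produced by Intel C++ into a 3-thread version like the one above.
Processor: Intel Core(TM)2 Duo CPU E8500 @ 3.16GHz, Windows 7 32-bit.
Performance: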
MS VC               For 1000000000 steps, pi = 3.141592653589971, 6506 milliseconds  (1 thread)
Intel C++           For 1000000000 steps, pi = 3.141592653589763, 3307 milliseconds  (1 thread)
OpenMP + Intel C++  For 1000000000 steps, pi = 3.141592653589738, 1684 milliseconds  (3 threads)
my mtTest.exe       For 1000000000 steps, pi = 3.141592653589738, 1606 milliseconds  (3 threads)
It is easy to extend this to more threads.
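One way to picture that extension is to cut the index range 1..num_steps into contiguous chunks, one per compute thread, and let the main thread add the partial sums. The sketch below shows only the partitioning and the final combine. It is an assumption about how the work could be divided, not a transcription of the assembly listing; pi_chunked, partial_sum and workers are hypothetical names. The chunks are evaluated one after another so the example does not depend on any threading API.

```c
#include <assert.h>

/* Sum of 4/(1+x^2) over the midpoints with indices first..last (inclusive).
   Returns 0 for an empty range (first > last). */
static double partial_sum(int first, int last, double step) {
    double sum = 0.0;
    for (int i = first; i <= last; i++) {
        double x = (i - 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum;
}

/* Hypothetical helper: splits the range into `workers` chunks.
   In a threaded build each partial_sum call would run in its own thread. */
double pi_chunked(int num_steps, int workers) {
    assert(num_steps > 0 && workers > 0);
    const double step = 1.0 / (double) num_steps;
    double total = 0.0;
    for (int w = 0; w < workers; w++) {
        /* The chunks tile 1..num_steps exactly; long long avoids overflow. */
        int first = (int) ((long long) num_steps * w / workers) + 1;
        int last  = (int) ((long long) num_steps * (w + 1) / workers);
        total += partial_sum(first, last, step);   /* combine step (main thread) */
    }
    return step * total;
}
```

With workers set to 2 this corresponds to two compute threads plus a main thread that combines their results. Since floating-point addition is not associative, a chunked sum can differ from the single-loop sum in the last digits.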
;>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
;*--==--* fasm multiple threads.
;*--==--* By G-Spider
;*--==--* fasm mtTest.asm mtTest.exe
;>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
format PE console
entry start
include 'win32a.inc'
THREAD_PRIORITY_TIME_CRITICAL = 0fh
THREAD_PRIORITY_HIGHEST = 02h
CREATE_SUSPENDED = 04h
INFINITE = -1
;i = 1; i <= num_steps; i&