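I have already read several other questions on this topic.
However, they still did not solve my problem.
I wrote the code below, and both the pthread version and the omp version come out slower than the serial version. I am very confused.
Compiled in the following environment: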
Ubuntu 12.04 64bit 3.2.0-60-generic
g++ (Ubuntu 4.8.1-2ubuntu1~12.04) 4.8.1
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Vendor ID: AuthenticAMD
CPU family: 18
Model: 1
Stepping: 0
CPU MHz: 800.000
BogoMIPS: 3593.36
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
NUMA node0 CPU(s): 0,1
Compile command:
g++ -std=c++11 ./eg001.cpp -fopenmp
#include <cmath>
#include <cstdio>
#include <ctime>
#include <pthread.h>
#define NUM_THREADS 5
const int sizen = 256000000;

struct Data {
    double * pSinTable;
    long tid;
};

void * compute(void * p) {
    Data * pDt = (Data *)p;
    const int start = sizen * pDt->tid/NUM_THREADS;
    const int end = sizen * (pDt->tid + 1)/NUM_THREADS;
    for(int n = start; n < end; ++n) {
        pDt->pSinTable[n] = std::sin(2 * M_PI * n / sizen);
    }
    pthread_exit(nullptr);
}

int main()
{
    double * sinTable = new double[sizen];
    pthread_t threads[NUM_THREADS];
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    clock_t start, finish;
    start = clock();
    int rc;
    Data dt[NUM_THREADS];
    for(int i = 0; i < NUM_THREADS; ++i) {
        dt[i].pSinTable = sinTable;
        dt[i].tid = i;
        rc = pthread_create(&threads[i], &attr, compute, &dt[i]);
    }//for
    pthread_attr_destroy(&attr);
    for(int i = 0; i < NUM_THREADS; ++i) {
        rc = pthread_join(threads[i], nullptr);
    }//for
    finish = clock();
    printf("from pthread: %lf\n", (double)(finish - start)/CLOCKS_PER_SEC);

    delete sinTable;
    sinTable = new double[sizen];
    start = clock();
    #pragma omp parallel for
    for(int n = 0; n < sizen; ++n)
        sinTable[n] = std::sin(2 * M_PI * n / sizen);
    finish = clock();
    printf("from omp: %lf\n", (double)(finish - start)/CLOCKS_PER_SEC);

    delete sinTable;
    sinTable = new double[sizen];
    start = clock();
    for(int n = 0; n < sizen; ++n)
        sinTable[n] = std::sin(2 * M_PI * n / sizen);
    finish = clock();
    printf("from serial: %lf\n", (double)(finish - start)/CLOCKS_PER_SEC);

    delete sinTable;
    pthread_exit(nullptr);
    return 0;
}
Output:
from pthread: 21.150000
from omp: 20.940000
from serial: 20.800000
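For reference, the way compute() splits the index range among threads can be written as a small helper. This is only a sketch: Chunk and chunk_for are hypothetical names that do not appear in the code above, and widening the arithmetic to 64 bits is an assumption of the sketch. With sizen = 256000000, a 32-bit product sizen * (tid + 1) would exceed the int range once NUM_THREADS reaches 9, and long is only 32 bits wide with some compilers.

```cpp
#include <cassert>
#include <cstdint>

// Half-open index range [begin, end) handled by one worker thread.
struct Chunk {
    std::int64_t begin;
    std::int64_t end;
};

// Hypothetical helper (not part of the original code): same formula as
// compute(), with the product carried out in 64-bit arithmetic.
Chunk chunk_for(std::int64_t total, int tid, int num_threads) {
    assert(num_threads > 0 && tid >= 0 && tid < num_threads);
    Chunk c;
    c.begin = total * tid / num_threads;        // first index of this chunk
    c.end = total * (tid + 1) / num_threads;    // one past the last index
    return c;
}
```

With sizen = 256000000 and NUM_THREADS = 5, chunk_for(sizen, 1, 5) covers the indices [51200000, 102400000), and consecutive chunks meet without gaps or overlap because the end of one chunk is computed by the same expression as the beginning of the next.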
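I wondered whether the problem was in my code, so I did the same thing with pthread.
However, I was completely wrong, and now I wonder whether this could be an Ubuntu problem with OpenMP/pthread.
A friend of mine also has an AMD CPU and Ubuntu 12.04 and ran into the same problem there, so I have some reason to believe the problem is not limited to me.
If anyone has the same problem, or has any clue about it, thanks in advance.
If the code is not enough, I can run a benchmark and paste the results here: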
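New information:
I ran the code on Windows with VS2012 (without the pthread version).
I used 1/10 of sizen, because Windows would not let me allocate such a large chunk of memory. The results are: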
from omp: 1.004
from serial: 1.420
from FreeNickName: 735 (this one is the improvement suggested by @FreeNickName)
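Does this indicate that it may be a problem with the Ubuntu OS?
The problem was solved by using the omp_get_wtime function, which is portable across operating systems. See Hristo Iliev's answer.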
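The reason the timer matters: on Linux, clock() reports the processor time consumed by the whole process, so the time of all running threads is added together, while omp_get_wtime reports elapsed wall-clock time. As far as I know, clock() in the Microsoft runtime returns elapsed wall-clock time instead, which would explain why the Windows run showed a speed-up. Below is a minimal sketch that measures the same loop both ways. Timing and time_sin_table are hypothetical names, and std::chrono::steady_clock stands in for omp_get_wtime so that the sketch also builds without -fopenmp (the pragma is then ignored and the loop runs serially).

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <ctime>
#include <vector>

// CPU seconds (std::clock) and wall-clock seconds for one run of the loop.
struct Timing {
    double cpu_seconds;
    double wall_seconds;
};

// Hypothetical helper: fills `table` with one period of sine and times the
// loop with both clocks.
Timing time_sin_table(std::vector<double>& table) {
    const long n = static_cast<long>(table.size());
    const double two_pi = 2.0 * std::acos(-1.0);  // avoids relying on M_PI

    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();

    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        table[i] = std::sin(two_pi * i / n);

    const auto wall_end = std::chrono::steady_clock::now();
    const std::clock_t cpu_end = std::clock();

    Timing t;
    t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    t.wall_seconds = std::chrono::duration<double>(wall_end - wall_start).count();
    return t;
}
```

When built with -fopenmp on a dual-core Linux machine, cpu_seconds would be expected to stay close to the serial figure while wall_seconds drops, which is the pattern seen in the numbers of this question.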
Some tests on the points debated with FreeNickName.
(Sorry, I have to test this on Ubuntu, because the Windows machine belongs to one of my friends.)
–1– Changed delete to delete[] (but without memset) (-std=c++11 -fopenmp):
from pthread: 13.491405
from omp: 13.023099
from serial: 20.665132
from FreeNickName: 12.022501
–2– With memset right after new (-std=c++11 -fopenmp):
from pthread: 13.996505
from omp: 13.192444
from serial: 19.882127
from FreeNickName: 12.541723
–3– With memset right after new (-std=c++11 -fopenmp -march=native -O2):
from pthread: 11.886978
from omp: 11.351801
from serial: 17.002865
from FreeNickName: 11.198779
–4– With memset right after new, and FreeNickName's version placed before the OMP version (-std=c++11 -fopenmp -march=native -O2):
from pthread: 11.831127
from FreeNickName: 11.571595
from omp: 11.932814
from serial: 16.976979
–5– With memset right after new, FreeNickName's version placed before the OMP version, and NUM_THREADS set to 5 instead of 2 (my machine is dual-core):
from pthread: 9.451775
from FreeNickName: 9.385366
from omp: 11.854656
from serial: 16.960101
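Tests 1 and 2 above involve two allocation details: memory obtained with new[] has to be released with delete[], and calling memset right after new writes to every page once, so the cost of the operating system backing those pages with real memory presumably lands outside the timed loops. A minimal sketch of both, assuming a hypothetical helper make_touched_table that is not part of the code in this question:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: allocates n doubles and writes every byte once, so the
// pages are already backed by real memory before any timed loop uses them.
std::unique_ptr<double[]> make_touched_table(std::size_t n) {
    // unique_ptr<double[]> releases the array with delete[], not delete.
    std::unique_ptr<double[]> table(new double[n]);
    std::memset(table.get(), 0, n * sizeof(double));
    return table;
}
```

A call such as auto sinTable = make_touched_table(sizen); would replace each new/memset/delete[] triple, and sinTable.get() yields the raw pointer for code that expects a double *.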