Performance problem with unsigned in a pthread parallel program

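Below is code that computes PI in parallel with pthread. The serial version takes a little over 12 seconds for 800 million iterations. On a dual-core laptop, running the program below with 8 threads took over 15 seconds, rather than roughly half the serial time as I had expected, which is baffling. This problem has puzzled me for more than a day. Here is the program: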

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <sys/time.h>
#include <pthread.h>

#define MAX_N_THREADS 20
static long num_steps=800000000; 
double step = 1.0 / (double)num_steps;
double pi;
unsigned long length = 0;
double g_sum = 0;
pthread_mutex_t mtSum;

void* work(void* p)
{
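    struct timeval tstart, tend;
    gettimeofday(&tstart, NULL);
    // the thread index was passed by value inside the pointer; go through long
    // so the cast does not lose precision on 64-bit targets
    int ithread = (int)(long)p;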
    unsigned long start = ithread * length;
    unsigned long end = length + start;

    printf("thread %d start, start = %lu, end = %lu, g_sum = %lf\n", ithread, start, end, g_sum);
    unsigned long i = start; 
    double x = 0.0, sum = 0.0;
    for ( ; i < end; i++) {
        x = (i + 0.5)*step;
        sum = sum + 4.0/(1.0 + x*x);
    } 
    gettimeofday(&tend, NULL);
    double tcost = tend.tv_sec - tstart.tv_sec + (double)(tend.tv_usec-tstart.tv_usec)/1000000.0; 
    printf("thread %d calculate end, cost %10.6f seconds\n", ithread, tcost);

    pthread_mutex_lock(&mtSum);
    g_sum += sum;
    pthread_mutex_unlock(&mtSum);

    gettimeofday(&tend, NULL);
    tcost = tend.tv_sec - tstart.tv_sec + (double)(tend.tv_usec-tstart.tv_usec)/1000000.0;
    printf("thread %d exit, sum = %lf, g_sum = %lf, cost %10.6f seconds\n", ithread, sum, g_sum, tcost);
    //pthread_exit((void*)0);
    return ((void*)NULL);
}

int main(int argc, char* argv[])
{
    double x, sum = 0.0;
    struct timeval start, end;
    double tcost;
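    unsigned int nthreads = 1;   // default to one thread if -p= is not given
    
    for (int i = 1; i < argc; i++) {
        char *ts = strstr(argv[i], "-p=");
        if (ts == NULL)
            continue;
        sscanf(ts,"-p=%u", &nthreads);
    } 
    // keep the count within the fixed-size threads[] array
    if (nthreads < 1 || nthreads > MAX_N_THREADS)
        nthreads = 1;

    printf("nthreads = %u\n", nthreads);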

    gettimeofday(&start, NULL);

    pthread_t threads[MAX_N_THREADS];
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    pthread_mutex_init(&mtSum, NULL);

    
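    // note: num_steps / nthreads is integer division, so ceil() has no effect here;
    // if num_steps is not a multiple of nthreads, the remainder steps are dropped
    length = (unsigned long)ceil(num_steps / nthreads);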
    for (int i = 0; i < nthreads; i++){
        // pass the thread index by value; widen to long before the pointer cast
        int rc = pthread_create(&threads[i], &attr, work, (void*)(long)i);
        //int rc = pthread_create(&threads[i], NULL, work, (void*)i);
        if (rc) {
            printf("ERROR; return code from pthread_create(%d) is %d\n",i, rc); 
            exit(-1); 
        }
    }

    void *status;
    for (int i = 0; i < nthreads; i++) 
    { 
       //pthread_join(threads[i], &status); 
       pthread_join(threads[i], NULL); 
    }

    pi = step * g_sum;


    gettimeofday(&end, NULL);
    tcost = end.tv_sec - start.tv_sec + (double)(end.tv_usec-start.tv_usec)/1000000.0;
    printf("Pi = %12.9f, cost %10.6f seconds\n", pi, tcost);

    pthread_mutex_destroy(&mtSum);
    //pthread_exit(NULL);
    return 0;
}

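The problem lies in the loop in the work thread. When the loop bounds start and end are both of type unsigned long, the program takes over 15 seconds; after changing them to long, it takes just over 7 seconds (close to the predicted value). Why is that? For now I still have no answer.
Comparing the assembly of the two versions with vimdiff shows three differences in total. The left side is the unsigned version, the right side is the signed version.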
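
One way to narrow this down is to time the loop on its own, in a single thread, once with a signed index and once with an unsigned index. The sketch below is not from the original program: partial_sum and time_partial_sum are hypothetical helper names, and it assumes a C++11 compiler. A likely suspect, which this post does not confirm, is the conversion of i to double in (i + 0.5): on x86-64 without AVX-512 there is a single instruction for converting a signed 64-bit integer to double, but not for an unsigned one, so compilers typically emit a longer sequence for the unsigned case.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not in the original post): the same midpoint-rule loop
// as work(), with the index type as a template parameter so the signed and
// unsigned variants can be compared under identical conditions.
template <typename Index>
double partial_sum(Index start, Index end, double step) {
    assert(start <= end);
    double sum = 0.0;
    for (Index i = start; i < end; i++) {
        double x = (i + 0.5) * step;   // i is converted to double here
        sum += 4.0 / (1.0 + x * x);
    }
    return sum;
}

// Hypothetical helper: runs the loop over [0, n) with step 1/n, stores the
// raw sum in *result, and returns the elapsed wall-clock time in seconds.
template <typename Index>
double time_partial_sum(Index n, double *result) {
    auto t0 = std::chrono::steady_clock::now();
    *result = partial_sum<Index>(0, n, 1.0 / (double)n);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling time_partial_sum<long>(800000000, &r) and time_partial_sum<unsigned long>(800000000, &r) from main, built with the same compiler flags as the original program, would show whether the gap appears without pthread at all. If it does, the slowdown comes from the index type rather than from threading.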

Can anyone tell me why?