OpenMP Multithreaded Parallelism Examples

Begoniaish

Last modified 2024-03-16 10:46:38

First published 2024-02-20 09:58:15

Article link: https://blog.csdn.net/Begoniaish/article/details/136183365

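This article shows how to write a parallel Hello World program with OpenMP, gives code examples for computing PI and multiplying matrices in parallel, and highlights data consistency, synchronization, and the effect of thread binding on performance.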

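1. Implement a Hello World program with OpenMP. Each thread prints "`Hello World, This is thread %d out of %d`", showing the current thread number and the total number of threads, and the program prints its execution time.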

#include <stdio.h>
#include <omp.h>

int main() {
    double start_time, end_time;

    start_time = omp_get_wtime();

    #pragma omp parallel
    {
        // Declared inside the parallel region, so every thread has its own
        // copy and no two threads write to the same variable.
        int num_threads = omp_get_num_threads();
        int thread_id = omp_get_thread_num();

        printf("Hello World, This is thread %d out of %d\n", thread_id, num_threads);
    }

    end_time = omp_get_wtime();

    printf("Execution time: %f seconds\n", end_time - start_time);

    return 0;
}

monkeycode@ln0:~/training_system/chenzhaoxiang/OpenMP$ ./Helloworld
Hello World, This is thread 4 out of 64
Hello World, This is thread 42 out of 64
Hello World, This is thread 3 out of 64
Hello World, This is thread 6 out of 64
Hello World, This is thread 5 out of 64
Hello World, This is thread 8 out of 64
Hello World, This is thread 61 out of 64
Hello World, This is thread 62 out of 64
Hello World, This is thread 0 out of 64
Hello World, This is thread 9 out of 64
Hello World, This is thread 58 out of 64
Hello World, This is thread 60 out of 64
Hello World, This is thread 59 out of 64
Hello World, This is thread 12 out of 64
Hello World, This is thread 2 out of 64
Hello World, This is thread 13 out of 64
Hello World, This is thread 10 out of 64
Hello World, This is thread 11 out of 64
Hello World, This is thread 17 out of 64
Hello World, This is thread 1 out of 64
Hello World, This is thread 15 out of 64
Hello World, This is thread 19 out of 64
Hello World, This is thread 16 out of 64
Hello World, This is thread 14 out of 64
Hello World, This is thread 22 out of 64
Hello World, This is thread 21 out of 64
Hello World, This is thread 20 out of 64
Hello World, This is thread 25 out of 64
Hello World, This is thread 27 out of 64
Hello World, This is thread 28 out of 64
Hello World, This is thread 31 out of 64
Hello World, This is thread 33 out of 64
Hello World, This is thread 26 out of 64
Hello World, This is thread 32 out of 64
Hello World, This is thread 34 out of 64
Hello World, This is thread 29 out of 64
Hello World, This is thread 36 out of 64
Hello World, This is thread 30 out of 64
Hello World, This is thread 35 out of 64
Hello World, This is thread 24 out of 64
Hello World, This is thread 23 out of 64
Hello World, This is thread 37 out of 64
Hello World, This is thread 40 out of 64
Hello World, This is thread 39 out of 64
Hello World, This is thread 41 out of 64
Hello World, This is thread 44 out of 64
Hello World, This is thread 18 out of 64
Hello World, This is thread 43 out of 64
Hello World, This is thread 46 out of 64
Hello World, This is thread 45 out of 64
Hello World, This is thread 38 out of 64
Hello World, This is thread 48 out of 64
Hello World, This is thread 47 out of 64
Hello World, This is thread 50 out of 64
Hello World, This is thread 51 out of 64
Hello World, This is thread 54 out of 64
Hello World, This is thread 52 out of 64
Hello World, This is thread 56 out of 64
Hello World, This is thread 7 out of 64
Hello World, This is thread 57 out of 64
Hello World, This is thread 53 out of 64
Hello World, This is thread 63 out of 64
Hello World, This is thread 55 out of 64
Hello World, This is thread 49 out of 64
Execution time: 0.005046 seconds

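2. Rewrite the parallel PI program so that the partial sums computed by all threads are accumulated into the shared variable pi, and print the run time and the value of pi. Run the OpenMP program with several threads and check whether a problem occurs. If it does, what causes it?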

#include <stdio.h>
#include <omp.h>

static long num_steps = 100000;
double step;

int main() {
    double pi, sum = 0.0;
    double start_time, end_time;

    step = 1.0 / (double)num_steps;

    start_time = omp_get_wtime();

    // reduction(+:sum) gives every thread a private partial sum and
    // adds the partial sums together when the loop ends.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        // x is declared inside the loop, so it is private to each thread.
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }

    pi = step * sum;

    end_time = omp_get_wtime();

    printf("pi = %.16f\n", pi);
    printf("Elapsed time: %.4f seconds\n", end_time - start_time);

    return 0;
}

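This program uses the #pragma omp parallel for directive to parallelize the for loop, so that several threads compute partial sums at the same time. For thread safety it uses the reduction clause, which accumulates the partial sums computed by the individual threads.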

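Multithreaded OpenMP programs can run into problems such as data races and deadlocks. In this example there is no race on sum: the reduction clause gives each thread a private copy of sum and combines the copies when the loop ends. Without reduction, every thread would execute sum += ... on the same shared variable, the unsynchronized read-modify-write operations would overwrite one another, and the computed pi would be wrong and could differ between runs. The same applies to any other variable written inside the loop, such as the temporary x: it must be private to each thread, either declared inside the loop body or listed in a private clause. When several threads really do need to update shared data, use a synchronization mechanism such as critical, atomic, or a lock to avoid data races, and acquire locks in a consistent order to avoid deadlock.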

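Even a correctly synchronized program can lose performance, for example to context switches between threads and to synchronization overhead. An OpenMP program should therefore be tested and tuned thoroughly, both to confirm that it runs correctly and to get the best performance out of it.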

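3. Design a program that uses OpenMP to parallelize matrix multiplication. Given two matrices A and B, each of size 1024*1024, compute their product C.

Requirements:

1. Use the loop-construct features: for-loop parallelization, the reduction clause for reduction variables, the schedule clause for loop scheduling, and the collapse clause for nested loops.

2. Parallelize with OpenMP to speed up the matrix multiplication.

3. Consider memory consistency, so that the data stays correct during the parallel computation.

4. Optional: implement thread affinity by binding threads to specific CPU cores.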

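#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define SIZE 1024

// Static storage: three 1024x1024 int matrices take 12 MB in total, which
// would overflow a typical 8 MB default stack if they were locals in main.
static int A[SIZE][SIZE], B[SIZE][SIZE], C[SIZE][SIZE];

int main() {
    // Initialize matrices A and B
    for (int i = 0; i < SIZE; i++) {
        for (int j = 0; j < SIZE; j++) {
            A[i][j] = i + j;
            B[i][j] = i - j;
        }
    }
    double start_time = omp_get_wtime();
    // Compute the matrix product in parallel with OpenMP.
    // collapse(2) merges the i and j loops into a single iteration space of
    // SIZE*SIZE independent iterations, divided statically among the threads.
    #pragma omp parallel for collapse(2) schedule(static)
    for (int i = 0; i < SIZE; i++) {
        for (int j = 0; j < SIZE; j++) {
            // Each (i, j) pair is handled by exactly one thread, so a local
            // accumulator suffices and no nested parallel region is needed.
            int sum = 0;
            #pragma omp simd reduction(+:sum)
            for (int k = 0; k < SIZE; k++) {
                sum += A[i][k] * B[k][j];
            }
            C[i][j] = sum;
        }
    }
    double end_time = omp_get_wtime();

    printf("Elapsed time: %.4f seconds\n", end_time - start_time);
    // Print the result (optional)
    /*
    for (int i = 0; i < SIZE; i++) {
        for (int j = 0; j < SIZE; j++) {
            printf("%d ", C[i][j]);
        }
        printf("\n");
    }
    */

    return 0;
}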

The runs below compare the elapsed time with proc_bind() set to master, close, and spread, and with no proc_bind clause at all.

//master
monkeycode@ln0:~/training_system/chenzhaoxiang/OpenMP$ vi 1024x1024.c
monkeycode@ln0:~/training_system/chenzhaoxiang/OpenMP$ gcc -fopenmp 1024x1024.c -o 1024x1024
monkeycode@ln0:~/training_system/chenzhaoxiang/OpenMP$ ./1024x1024
Elapsed time: 1.6774 seconds
//close
monkeycode@ln0:~/training_system/chenzhaoxiang/OpenMP$ vi 1024x1024.c
monkeycode@ln0:~/training_system/chenzhaoxiang/OpenMP$ gcc -fopenmp 1024x1024.c -o 1024x1024
monkeycode@ln0:~/training_system/chenzhaoxiang/OpenMP$ ./1024x1024              
Elapsed time: 1.6787 seconds
//spread
monkeycode@ln0:~/training_system/chenzhaoxiang/OpenMP$ vi 1024x1024.c 
monkeycode@ln0:~/training_system/chenzhaoxiang/OpenMP$ gcc -fopenmp 1024x1024.c -o 1024x1024
monkeycode@ln0:~/training_system/chenzhaoxiang/OpenMP$ ./1024x1024
Elapsed time: 1.8293 seconds
//proc_bind not set
monkeycode@ln0:~/training_system/chenzhaoxiang/OpenMP$ vi 1024x1024.c
monkeycode@ln0:~/training_system/chenzhaoxiang/OpenMP$ gcc -fopenmp 1024x1024.c -o 1024x1024
monkeycode@ln0:~/training_system/chenzhaoxiang/OpenMP$ ./1024x1024
Elapsed time: 1.7215 seconds
