【云计算实验】基于OpenMP、CUDA、OpenCL、GPU并行计算实验

最新推荐文章于 2024-08-01 14:00:00 发布

zoetu

最新推荐文章于 2024-08-01 14:00:00 发布

阅读量3.5k

点赞数 5

分类专栏： # 云计算文章标签：云计算深度学习计算机视觉

本文链接：https://blog.csdn.net/qq_43800119/article/details/124492946

版权

云计算专栏收录该内容

9 篇文章 2 订阅

订阅专栏

前言

本文是之前做云计算实验整理的内容，借博客保存一下！
使用不同方法对算法加速还是很有意思的！

实验题目

自选一张图片，按照实验指南说明在jetson05节点上基于OpenMP和CUDA对图片进行边缘提取实验，记录梯度向量幅度的最小值和最大值，比较串行算法和并行算法的运行时间，并提交处理后的边缘提取结果图片。试一下：如果编译时开启优化选项（比如选择“-O3”级别的优化），串行算法和并行算法的运行时间分别有什么变化。
使用提供的opencl-examples源码在你自己的计算机上进行基于OpenCL的GPU并行算法实验，记录你的实验环境参数（包括CPU和GPU相关参数等，可从实验程序日志中获取）以及各个算法的串行版本和GPU并行版本的运行时间，并进行简单的对比分析。

1. 基于OpenMP和CUDA对图片进行边缘提取实验

1.1 处理图片

在这里插入图片描述

1.2 基于OpenMP

1）串行算法

a. 运行命令

zz@jetson05:~/examples$ g++ -fopenmp -o sobel sobel.cpp -lfreeimage 
zz@jetson05:~/examples$ ./sobel IBM_Blue_Gene_P_supercomputer.jpg
Filtering of input image start ...
the minimum value: 0
the maximum value: 1052.84
The total time for execution is 7.66395s

b. 运行截图

在这里插入图片描述

c. 图片处理结果

在这里插入图片描述

2）并行算法

a. 运行命令

zz@jetson05:~/examples$ g++ -fopenmp -o sobel_omp sobel_omp.cpp -lfreeimage
zz@jetson05:~/examples$ ./sobel_omp IBM_Blue_Gene_P_supercomputer.jpg
Filtering of input image start ...
the minimum value: 0
the maximum value: 1052.84
actual threads number: 8
The total time for execution is 2.20083s

b. 运行截图
在这里插入图片描述

c. 图片处理结果
在这里插入图片描述

3）线程数从8改成16

a. 修改代码
在这里插入图片描述

zz@jetson05:~/examples$ g++ -fopenmp -o sobel_omp16 sobel_omp.cpp -lfreeimage
zz@jetson05:~/examples$ ./sobel_omp16 IBM_Blue_Gene_P_supercomputer_out.jpg
Filtering of input image start ...
the minimum value: 0
the maximum value: 1001.81
actual threads number: 16
The total time for execution is 1.84612s

b. 运行截图及结果

在这里插入图片描述

4）编译算法时开启优化选项

a. 串行算法
选择“-O3”级别的优化

运行命令

zz@jetson05:~/examples$ g++ -fopenmp -O3 sobel sobel.cpp -lfreeimage
zz@jetson05:~/examples$ ./sobel IBM_Blue_Gene_P_supercomputer.jpg
Filtering of input image start ...
the minimum value: 0
the maximum value: 1052.84
The total time for execution is 7.65937s

运行截图及结果
结果：运行时间和1)差不多

在这里插入图片描述
b. 并行算法
选择“-O3”级别的优化

运行命令

zz@jetson05:~/examples$ g++ -fopenmp -O3 sobel_omp16 sobel_omp.cpp -lfreeimage 
zz@jetson05:~/examples$ ./sobel_omp16 IBM_Blue_Gene_P_supercomputer_out.jpg
Filtering of input image start ...
the minimum value: 0
the maximum value: 1001.81
actual threads number: 16
The total time for execution is 0.741739s

运行截图及结果
如下图所示，运行时间为0.741739s 比 3)运行结果1.84612s快

1.3 基于CUDA

1) 运行命令

zz@jetson05:~/examples$ nvcc -o sobel_cuda sobel_cuda.cu -lfreeimage
zz@jetson05:~/examples$ ./sobel_cuda IBM_Blue_Gene_P_supercomputer.jpg
1 CUDA devices found!
CUDA device 0: NVIDIA Tegra X1
  multi processor count: 1
  shared memory per block: 48 KB
  max threads per block: 1024
  max threads per multi processor: 2048
  max warps per multi processor: 64
Use CUDA device 0
the minimum value: 0
the maximum value: 1059.25
The total time for execution is: 0.184939s