CUDA Convolution


金鳞本鲤


Article link: https://blog.csdn.net/weixin_43906500/article/details/112469916


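This post shows how to use CUDA to accelerate a 2D convolution. It times the same computation on the CPU and on the GPU to compare the two. The code consists of a CUDA C++ host part and a device (GPU) part.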

(Summary generated automatically by CSDN.)
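For the CPU/GPU timing comparison, one thing to watch is that `clock()` is specified to return processor time, which is not necessarily the wall-clock time spent waiting on the GPU. Below is a minimal sketch of a wall-clock timer based on `std::chrono::steady_clock`. The helper name `elapsedMs` is hypothetical and does not come from the original code. It also assumes the timed callable only returns once its work has finished, which for a GPU call means the device has been synchronized.

```cpp
#include <chrono>

// Hypothetical helper (not part of the original post): run `work` once and
// return the elapsed wall-clock time in milliseconds.
template <typename F>
double elapsedMs(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();  // assumed to block until the work is really done
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

It could be used as `double cpuMs = elapsedMs([&] { /* CPU loops */ });` and likewise around the call to `mulWithCuda`. For an input as small as 10 x 10, the GPU figure is likely to be dominated by memory transfers and launch overhead rather than by the convolution itself.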

C++ code

#include <iostream>

#include <stdio.h>
#include <stdlib.h> // declares rand() and srand()
#include <time.h>

extern "C" int mulWithCuda(float* c,float* a,float* b, int size, int kernelSize);

int main()
{
   int size = 10;

   float* a = (float*)malloc(size * size * sizeof(float));
   float* c = (float*)malloc(size * size * sizeof(float));
   float* d = (float*)malloc(size * size * sizeof(float));
   srand(time(NULL));
   for (int row = 0; row < size; ++row)
   {
       for (int col = 0; col < size; ++col)
       {
           a[col + row * size] = (float)rand() / (RAND_MAX / 10);
           c[row * size + col] = 0;
       }
   }

   int kernelSize = 5;
   float* b = (float*)malloc(kernelSize * kernelSize * sizeof(float));
   for (int i = 0; i < kernelSize * kernelSize; ++i)
   {
       b[i] = 1;
   }

   clock_t start = clock();
   for (int row = 0; row < size; ++row)
   {
       for (int col = 0; col < size; ++col)
       {
           for (int i = 0; i < kernelSize; ++i)
           {
               for (int j = 0; j < kernelSize; ++j)
               {
                   float v = 0;
                   // Shift to the top-left corner of the kernel window so that
                   // elements can be addressed by the offsets (i, j)
                   int curRow = row - kernelSize / 2 + i;
                   int curCol = col - kernelSize / 2 + j;
                   if (curRow >= 0 && curCol >= 0 &&
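The listing breaks off inside the bounds check, so the rest of the CPU loop is not available here. As a reference point only, the following is a minimal sketch of a complete CPU convolution in the same spirit, not the author's code. It assumes a square row-major input, an odd kernel size, and zero padding (out-of-range neighbours are skipped). The function name `convolveCpu` and the use of `std::vector` are hypothetical. The kernel is not flipped, which makes no difference for the all-ones kernel used above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): "same"-size 2D convolution
// of a size x size row-major image with a kernelSize x kernelSize kernel.
// Neighbours that fall outside the image are treated as zero (assumption).
std::vector<float> convolveCpu(const std::vector<float>& image, int size,
                               const std::vector<float>& kernel, int kernelSize)
{
    assert(kernelSize % 2 == 1);  // odd size, so the kernel has a centre element
    assert(image.size() == static_cast<std::size_t>(size) * size);
    assert(kernel.size() == static_cast<std::size_t>(kernelSize) * kernelSize);

    const int half = kernelSize / 2;
    std::vector<float> out(image.size(), 0.0f);
    for (int row = 0; row < size; ++row) {
        for (int col = 0; col < size; ++col) {
            float sum = 0.0f;  // accumulator for one output pixel
            for (int i = 0; i < kernelSize; ++i) {
                for (int j = 0; j < kernelSize; ++j) {
                    // Image coordinates under kernel element (i, j)
                    const int curRow = row - half + i;
                    const int curCol = col - half + j;
                    if (curRow >= 0 && curCol >= 0 &&
                        curRow < size && curCol < size) {
                        sum += image[curRow * size + curCol] *
                               kernel[i * kernelSize + j];
                    }
                }
            }
            out[row * size + col] = sum;
        }
    }
    return out;
}
```

With an all-ones 5 x 5 kernel on a 10 x 10 input of ones, this sketch yields 25 for interior outputs and 9 at the corners, which is a convenient sanity check when comparing against a GPU result.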
