CUDA Notes (4): Something Before Programming (1)

喵小醉

Article link: https://blog.csdn.net/github_38221244/article/details/73484616

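I learned CUDA C programming from the book 《GPU高性能CUDA实战》. The walkthroughs here use the samples that NVIDIA ships, and where their API usage differs from the book I will point it out briefly.
Before we start programming, we need a basic picture of the GPU. GPU stands for graphics processing unit, but my main reason for using CUDA is to apply the GPU's computing power to general-purpose parallel computation. We call the CPU together with system memory the host, and the GPU together with its own memory the device.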
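
To make the host/device split concrete, here is a minimal sketch of a host-to-device-to-host round trip. It is not taken from the book or from the NVIDIA sample: the function name `round_trip_ok` and the buffer names `h_in`, `h_out`, and `d_buf` are hypothetical, chosen only for illustration. It assumes a working CUDA device and uses only the runtime calls `cudaMalloc`, `cudaMemcpy`, and `cudaFree`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the NVIDIA sample): fills a host buffer
// with `value`, copies it to the device and back, and returns true if the
// data survives the round trip unchanged.
bool round_trip_ok(int n, int value)
{
    const size_t nbytes = static_cast<size_t>(n) * sizeof(int);

    // Host buffers live in ordinary system memory.
    int *h_in  = static_cast<int *>(malloc(nbytes));
    int *h_out = static_cast<int *>(malloc(nbytes));
    if (h_in == NULL || h_out == NULL)
    {
        free(h_in);
        free(h_out);
        return false;
    }
    for (int i = 0; i < n; i++)
    {
        h_in[i]  = value;
        h_out[i] = 0;
    }

    // The device buffer lives in GPU memory; the host must not dereference it.
    int *d_buf = NULL;
    bool ok = cudaMalloc((void **)&d_buf, nbytes) == cudaSuccess;

    // Host -> device, then device -> host.
    ok = ok && cudaMemcpy(d_buf, h_in, nbytes, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(h_out, d_buf, nbytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    // Verify on the host that every element came back intact.
    for (int i = 0; ok && i < n; i++)
        if (h_out[i] != value)
            ok = false;

    cudaFree(d_buf); // passing a null pointer here is a no-op
    free(h_in);
    free(h_out);
    return ok;
}
```

Compiled with `nvcc`, a call such as `round_trip_ok(1024, 26)` from `main` should return `true` on a machine with a usable GPU. The sketch uses plain `malloc` for the host side to keep the contrast with device memory obvious; the NVIDIA sample analyzed in this post allocates its host buffer with `cudaMallocHost` instead.
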
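Let's take a closer look at the earlier sample and go over some common APIs and concepts before getting into parallel programming. (This section covers only memory allocation, deallocation, and data transfer.)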
// includes, system
#include <stdio.h>

// includes CUDA Runtime
#include <cuda_runtime.h>

// includes, project
#include <helper_cuda.h>
#include <helper_functions.h> // helper utility functions

   ...
// Returns true if all n elements of data equal x; prints the first mismatch otherwise
bool correct_output(int *data, const int n, const int x)
{
    for (int i = 0; i < n; i++)
        if (data[i] != x)
        {
            printf("Error! data[%d] = %d, ref = %d\n", i, data[i], x);
            return false;
        }

    return true;
}

int main(int argc, char *argv[])
{
    int devID;
    cudaDeviceProp deviceProps;///mark

    printf("[%s] - Starting...\n", argv[0]);

    // This will pick the best possible CUDA capable device
    devID = findCudaDevice(argc, (const char **)argv);

    // get device name
    checkCudaErrors(cudaGetDeviceProperties(&deviceProps, devID));
    printf("CUDA device [%s]\n", deviceProps.name);

    int n = 16 * 1024 * 1024;      // 16M elements
    int nbytes = n * sizeof(int);  // 64 MiB when sizeof(int) == 4
    int value = 26;

    // allocate page-locked (pinned) host memory and zero it
    int *a = 0;
    checkCudaErrors(cudaMallocHost((void **)&a, nbytes));
    memset(a, 0, nbytes);

    // allocate device memory