CUDA (7): A simple sample that introduces how to program in CUDA

林微

Article link: https://blog.csdn.net/Canhui_WANG/article/details/51584862

Abstract

This article introduces the basics of CUDA programming through a minimal sample that adds two integers on the GPU.

1. ".cu" files

CUDA C code is written in a source file in your project with the ".cu" extension. The ".cu" extension tells Visual Studio that the file is to be compiled by NVCC, the CUDA compiler.

2. Headers and Libraries

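To call the CUDA runtime API functions, include the "cuda_runtime.h" header and link against the "cudart.lib" library. Note that "cuda.h" is the header of the lower-level driver API, which this sample does not use. When a ".cu" file is compiled by NVCC, the runtime header and library are normally pulled in automatically.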
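As a quick check that the header and the library are set up, the following minimal sketch queries the runtime version and the number of devices. It assumes the file has a ".cu" extension and is compiled by NVCC; the helper name queryRuntime is made up for this illustration.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical helper: asks the CUDA runtime for its version and the
// number of devices. Returns the first error, or cudaSuccess.
cudaError_t queryRuntime(int *runtimeVersion, int *deviceCount) {
	cudaError_t err = cudaRuntimeGetVersion(runtimeVersion);
	if (err != cudaSuccess) return err;
	return cudaGetDeviceCount(deviceCount);
}

int main(void) {
	int version = 0;
	int count = 0;

	cudaError_t err = queryRuntime(&version, &count);
	if (err != cudaSuccess) {
		// The header compiled and the library linked, but no usable device or driver.
		printf("CUDA runtime error: %s\n", cudaGetErrorString(err));
		return 1;
	}

	printf("CUDA runtime version %d, %d device(s) found\n", version, count);
	return 0;
}
```

If this file compiles and links, the header and library paths are configured; if it also runs without an error message, a device and driver are available.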

3. Host, Device and Kernels

Host: This is the CPU.

Device: This is the GPU.

Kernel: This is a function that is called by the host and runs on the device.
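To make the three terms concrete, here is a minimal sketch in which the host launches a kernel that prints from the device. The kernel name helloFromDevice is an assumption for illustration, and the sketch relies on device-side printf, which needs a GPU of compute capability 2.0 or later.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Kernel: launched by the host, executed on the device.
__global__ void helloFromDevice(void) {
	printf("Hello from the device\n");
}

int main(void) {
	// This line runs on the host (CPU).
	printf("Hello from the host\n");

	// Launch the kernel with 1 block of 1 thread.
	helloFromDevice<<<1, 1>>>();

	// Kernel launches are asynchronous: wait for the device to finish
	// so that its output is flushed before the program exits.
	cudaDeviceSynchronize();
	return 0;
}
```

The call to cudaDeviceSynchronize matters here because nothing else in the program waits for the kernel; without it the program may exit before the device output appears.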

4. Function Qualifiers

__global__: This marks a kernel, a function that is called by the host and runs on the device. It must return void.

__device__: This marks a function that is called by the device and runs on the device.

__host__: (or no qualifier) This marks a function that is called by the host and runs on the host.
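The qualifiers can be seen side by side in the following sketch. All function names are made up for illustration, error checking is left out to keep it short, and the combined form __host__ __device__ (which compiles one function for both sides) is an addition that is not in the list above.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// __device__: callable only from code that runs on the device.
__device__ int squareOnDevice(int x) {
	return x * x;
}

// __host__ __device__: compiled twice, once for the host and once for the device.
__host__ __device__ int addOne(int x) {
	return x + 1;
}

// __global__: a kernel. Called by the host, runs on the device, returns void.
__global__ void compute(int *out, int x) {
	*out = addOne(squareOnDevice(x));
}

// No qualifier: an ordinary host function (same as __host__).
int computeOnDevice(int x) {
	int *devOut = NULL;
	int result = 0;
	cudaMalloc((void **)&devOut, sizeof(int));
	compute<<<1, 1>>>(devOut, x);
	// cudaMemcpy waits for the kernel to finish before copying.
	cudaMemcpy(&result, devOut, sizeof(int), cudaMemcpyDeviceToHost);
	cudaFree(devOut);
	return result;
}

int main(void) {
	printf("device result: %d, host addOne: %d\n", computeOnDevice(3), addOne(9));
	return 0;
}
```

Calling squareOnDevice directly from main would be rejected by the compiler, since a __device__ function cannot be called from host code.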

5. Device Memory Management

Data must be copied back and forth between the host and the device because, in this basic programming model, a kernel cannot directly access ordinary host memory (system RAM) and the host cannot dereference device pointers. The following CUDA runtime API functions manage device memory.

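cudaMalloc(void **devPointer, size_t bytes): This CUDA function allocates the given number of bytes in the device's global memory and stores the address in *devPointer.

cudaFree(void *devPointer): This CUDA function releases memory on the device previously allocated by a cudaMalloc call. Note that it takes the pointer itself, not its address.

cudaMemcpy(void *dst, const void *src, size_t bytes, cudaMemcpyHostToDevice / cudaMemcpyDeviceToHost): This CUDA function copies the given number of bytes from src to dst. The last argument gives the direction of the copy between host and device memory.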
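Each of these functions returns a cudaError_t, which is worth checking. The following sketch copies a small array to the device and back while checking every call; the helper name roundTrip and the array contents are assumptions made for this illustration.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical helper: copies n ints to the device and back again.
// Returns the first error encountered, or cudaSuccess.
cudaError_t roundTrip(const int *src, int *dst, size_t n) {
	int *devBuf = NULL;
	size_t bytes = n * sizeof(int);

	// Allocate global memory on the device.
	cudaError_t err = cudaMalloc((void **)&devBuf, bytes);
	if (err != cudaSuccess) return err;

	// Host -> device, then device -> host.
	err = cudaMemcpy(devBuf, src, bytes, cudaMemcpyHostToDevice);
	if (err == cudaSuccess) {
		err = cudaMemcpy(dst, devBuf, bytes, cudaMemcpyDeviceToHost);
	}

	// Always release the device memory, even if a copy failed.
	cudaError_t freeErr = cudaFree(devBuf);
	return (err != cudaSuccess) ? err : freeErr;
}

int main(void) {
	int in[4] = {1, 2, 3, 4};
	int out[4] = {0, 0, 0, 0};

	cudaError_t err = roundTrip(in, out, 4);
	if (err != cudaSuccess) {
		printf("CUDA error: %s\n", cudaGetErrorString(err));
		return 1;
	}

	printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);
	return 0;
}
```

Returning the error code instead of aborting is just one possible design; many projects wrap the same check in a macro.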

6. Procedures

Next, the sample performs the following steps:

a. Create two integers. (Host)

b. Allocate device memory for these two integers and for their sum. (Device)

c. Copy these two integers to the device memory. (Host)

d. Call the kernel to add these two integers together. (Device)

e. Copy the result back to the host memory. (Device)

f. Print out the result. (Host)

g. Free the device memory. (Device)

7. The CUDA code, run in Visual Studio 2012

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>

// Kernel: runs on the device and stores a[0] + b[0] in c[0].
__global__ void Add(int *a, int *b, int *c){
	c[0] = a[0]+b[0];
}

int main(void) {  

	// Declare variables
	int addResult;
	int *devicePointerA;
	int *devicePointerB;
	int *devicePointerC;
	
	//Create two integers. (Host)
	int a = 5;
	int b = 9;

	// Allocate device memory for these two integers and their add result. (Device)
	cudaMalloc(&devicePointerA, sizeof(int));
	cudaMalloc(&devicePointerB, sizeof(int));
	cudaMalloc(&devicePointerC, sizeof(int));

	// Copy these two integers to the device memory. (Host)
	cudaMemcpy(devicePointerA, &a, sizeof(int), cudaMemcpyHostToDevice);
	cudaMemcpy(devicePointerB, &b, sizeof(int), cudaMemcpyHostToDevice);

	// Call the kernel to add these two integers together. (Device)
	Add<<<1, 1>>>(devicePointerA, devicePointerB, devicePointerC);

	// Copy the result back to the host memory. (Device)
	cudaMemcpy(&addResult, devicePointerC, sizeof(int), cudaMemcpyDeviceToHost);

	// Print out the result. (Host)
	printf("The add result is %d \n", addResult);

	// Free the device memory. (Device)
	cudaFree(devicePointerA);
	cudaFree(devicePointerB);
	cudaFree(devicePointerC);

	return 0;
}
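The sample above ignores the return values of the CUDA calls. As a hedged sketch of how the same addition could be written with error checking, the variant below checks every call, including the kernel launch. The names AddChecked and addOnDevice are hypothetical, and using one device buffer for all three integers is a simplification chosen for this sketch, not the layout of the original code.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void AddChecked(const int *a, const int *b, int *c) {
	c[0] = a[0] + b[0];
}

// Hypothetical helper: adds two ints on the device.
// Returns the first error encountered, or cudaSuccess.
cudaError_t addOnDevice(int a, int b, int *result) {
	int *dev = NULL; // dev[0] = a, dev[1] = b, dev[2] = sum
	int host[2] = {a, b};

	cudaError_t err = cudaMalloc((void **)&dev, 3 * sizeof(int));
	if (err != cudaSuccess) return err;

	err = cudaMemcpy(dev, host, 2 * sizeof(int), cudaMemcpyHostToDevice);
	if (err == cudaSuccess) {
		AddChecked<<<1, 1>>>(dev, dev + 1, dev + 2);
		err = cudaGetLastError(); // reports launch errors
	}
	if (err == cudaSuccess) {
		err = cudaDeviceSynchronize(); // reports errors raised while the kernel ran
	}
	if (err == cudaSuccess) {
		err = cudaMemcpy(result, dev + 2, sizeof(int), cudaMemcpyDeviceToHost);
	}

	cudaFree(dev);
	return err;
}

int main(void) {
	int sum = 0;
	cudaError_t err = addOnDevice(5, 9, &sum);
	if (err != cudaSuccess) {
		printf("CUDA error: %s\n", cudaGetErrorString(err));
		return 1;
	}
	printf("The add result is %d\n", sum);
	return 0;
}
```

A kernel launch does not return an error code itself, which is why the sketch calls cudaGetLastError right after the launch and cudaDeviceSynchronize before reading the result.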

8. The test result

With a = 5 and b = 9, the program prints "The add result is 14".