CUDA(7)之A poor simple sample to perfectly introduce how to program in CUDA

Abstract

This article talks about how to program in CUDA.

 

1. ".CU" files

CUDA C code is written into a text document in your project with the ".CU" extension. The ".CU" extension tells Visual Studio that the file is to be compiled by NVCC, the CUDA compiler.

 

2. Headers and Libraries

You must include the "cuda.h" header and the "cudart.lib" file as well in order to call the CUDA Api functions.

 

3. Host, Device and Kernels

Host: This is the CPU.

Device: This is the GPU.

Kernels: This is a function called by the CPU, runs on the GPU.

 

4. Function Qualifiers

__global__: This means a function called by host, runs on device.

__device__: This means a function called by device, runs on device.

__host__: (or not qualifier) This means a function called by host, runs on host.

 

5. Device Memory Management

Data must be copied back and forth between the host and the device since the GPU cannot see the system RAM. So, you may need to know the following CUDA API functions for device memory management.

cudaMalloc(void **devPointer, size bytes): This CUDA function reserves memory in the devices on global memory.

cudaFree(void **devPointer): This CUDA function releases memory on the device previously allocated by a cudaMalloc call.

cudaMemcpy(void *des, void *src, size bytes, cudaMemcpyDeviceToHost/ cudaMemcpyHostToDevice): This CUDA function copies to and from the device and host memory.

 

6. Procedures

Fine, next, we are going to do the following things:

a. Create two integers. (Host)

b. Allocate device memory for these two integers. (Device)

c. Copy these two integers to the device memory. (Host)

d. Call the kernel to add these two integers together. (Device)

e. Copy the result back to the host memory. (Device)

f. Print out the result. (Host)

g. Free the device memory. (Device)

 

7. The CUDA codes run on the Visual Studio 2012

 

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <iostream>

using namespace std;

__global__ void Add(int *a, int *b, int *c){
	c[0] = a[0]+b[0];
}

int main(void) {  

	// Declare variables
	int addResult;
	int *devicePointerA;
	int *devicePointerB;
	int *devicePointerC;
	
	//Create two integers. (Host)
	int a = 5;
	int b = 9;

	// Allocate device memory for these two integers and their add result. (Device)
	cudaMalloc(&devicePointerA, sizeof(int));
	cudaMalloc(&devicePointerB, sizeof(int));
	cudaMalloc(&devicePointerC, sizeof(int));

	// Copy these two integers to the device memory. (Host)
	cudaMemcpy(devicePointerA, &a, sizeof(int), cudaMemcpyHostToDevice);
	cudaMemcpy(devicePointerB, &b, sizeof(int), cudaMemcpyHostToDevice);

	// Call the kernel to add these two integers together. (Device)
	Add<<<1, 1>>>(devicePointerA, devicePointerB, devicePointerC);

	// Copy the result back to the host memory. (Device)
	cudaMemcpy(&addResult, devicePointerC, sizeof(int), cudaMemcpyDeviceToHost);

	// Print out the result. (Host)
	printf("The add result is %d \n", addResult);

	// Free the device memory. (Device)
	cudaFree(devicePointerA);
	cudaFree(devicePointerB);

    return 0;  
}  

 

8. The test result

 

 

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值