Heterogeneous Parallel Programming Assignment MP1: Vector Addition


The goal of MP1: get familiar with the CUDA API and the related setup by implementing vector addition.

The assignment mainly covers the following areas; add your own code below each "//@@" marker.

Allocate device memory
Copy host memory to device
Initialize thread block and kernel grid dimensions
Invoke CUDA kernel
Copy results from device to host
Free device memory
Write the CUDA kernel

Vector addition itself needs no further explanation; here is my source code:

// MP 1
#include <wb.h>

__global__ void vecAdd(float *in1, float *in2, float *out, int len) {
    //@@ Insert code to implement vector addition here
    // Get our global thread ID
    int id = blockIdx.x * blockDim.x + threadIdx.x;
 
    // Make sure we do not go out of bounds
    if (id < len)
        out[id] = in1[id] + in2[id];
}

int main(int argc, char ** argv) {
    wbArg_t args;
    int inputLength;
    float * hostInput1;
    float * hostInput2;
    float * hostOutput;
    float * deviceInput1;
    float * deviceInput2;
    float * deviceOutput;

    args = wbArg_read(argc, argv);

    wbTime_start(Generic, "Importing data and creating memory on host");
    hostInput1 = (float *) wbImport(wbArg_getInputFile(args, 0), &inputLength);
    hostInput2 = (float *) wbImport(wbArg_getInputFile(args, 1), &inputLength);
    hostOutput = (float *) malloc(inputLength * sizeof(float));
    wbTime_stop(Generic, "Importing data and creating memory on host");

    wbLog(TRACE, "The input length is ", inputLength);

    wbTime_start(GPU, "Allocating GPU memory.");
    //@@ Allocate GPU memory here
    size_t bytes = inputLength * sizeof(float);
    cudaMalloc((void **) &deviceInput1, bytes);
    cudaMalloc((void **) &deviceInput2, bytes);
    cudaMalloc((void **) &deviceOutput, bytes);

    wbTime_stop(GPU, "Allocating GPU memory.");

    wbTime_start(GPU, "Copying input memory to the GPU.");
    //@@ Copy memory to the GPU here
    cudaMemcpy(deviceInput1, hostInput1, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(deviceInput2, hostInput2, bytes, cudaMemcpyHostToDevice);

    wbTime_stop(GPU, "Copying input memory to the GPU.");
    
    //@@ Initialize the grid and block dimensions here
    // Number of threads in each thread block
    int blockSize = 1024;
    // Number of thread blocks in grid
    int gridSize = (int)ceil((float)inputLength/blockSize);

    
    wbTime_start(Compute, "Performing CUDA computation");
    //@@ Launch the GPU Kernel here
    vecAdd<<<gridSize, blockSize>>>(deviceInput1, deviceInput2, deviceOutput, inputLength);
  
    // cudaThreadSynchronize() is deprecated; use cudaDeviceSynchronize()
    cudaDeviceSynchronize();
    wbTime_stop(Compute, "Performing CUDA computation");
    
    wbTime_start(Copy, "Copying output memory to the CPU");
    //@@ Copy the GPU memory back to the CPU here
    cudaMemcpy(hostOutput, deviceOutput, bytes, cudaMemcpyDeviceToHost);

    wbTime_stop(Copy, "Copying output memory to the CPU");

    wbTime_start(GPU, "Freeing GPU Memory");
    //@@ Free the GPU memory here
    cudaFree(deviceInput1);
    cudaFree(deviceInput2);
    cudaFree(deviceOutput);

    wbTime_stop(GPU, "Freeing GPU Memory");

    wbSolution(args, hostOutput, inputLength);

    free(hostInput1);
    free(hostInput2);
    free(hostOutput);

    return 0;
}

As in the previous post, "Heterogeneous Parallel Programming Assignment MP0: Initial Lab Tour with Device Query", the required support files come from https://github.com/ashwin/coursera-heterogeneous. Build the GenDataMP1 executable from src/GenDataMP1.cpp, then use it to generate vectors of length 10000. It produces three vector files at once, A, B, and C, where C = A + B.

Then run the MP1 program:

$ optirun ./MP1 vecA.txt vecB.txt vecC.txt

The output:

[Generic] 0.0193745920 Importing data and creating memory on host
Trace main::32 The input length is 10000
[GPU    ] 0.0597575680 Allocating GPU memory.
[GPU    ] 0.0000576000 Copying input memory to the GPU.
[Compute] 0.0000394240 Performing CUDA computation
[Copy   ] 0.0000514560 Copying output memory to the CPU
[GPU    ] 0.0001185280 Freeing GPU Memory
All tests passed!

Once local testing passes, copy the code into the online submission system, which runs it against 6 datasets. All tests passed; the assignment is done.
