CUDA on Platform 学习笔记5--错误检测与事件

CV温故知新

已于 2022-07-10 17:04:20 修改

阅读量743

点赞数

分类专栏： CUDA教程文章标签：深度学习人工智能

于 2022-07-08 14:42:54 首次发布

本文链接：https://blog.csdn.net/lansebingxuan/article/details/125677107

版权

CUDA教程专栏收录该内容

6 篇文章 1 订阅

订阅专栏

本文是参加2022 CUDA on Platform 线上训练营学习笔记，感谢NVIDIA各位老师的精彩讲解!

CUDA运行时的错误检测函数

为什么要使用错误检测函数？
CUDA通常有两类错误：
1）CUDA操作后立刻返回的瞬时错误，常见的有cudaMalloc的参数错误（如要求分配100GB显存分配失败），和kernel的启动配置错误（如要求启动1025个线程的block），错误都会立刻返回的。例如下面代码的错误类型1处，kernel的启动配置错误应立即返回CPU；
2）CUDA操作正在进行中，未来将会发生的错误，例如下面代码的错误类型2处异步错误。前面提过，CUDA是异构计算，CPU和GPU各自独立执行任务，CPU端调用核函数后继续执行主线程的程序，无法确认GPU端是否执行正确，因此再下一次的同步调用的时候，返回之前的异步错误，这样CPU端可以了解CUDA核函数是否成功执行。

__global__void hello()
{
  printf("thread_id (%d,%d) block_id (%d, %d) coordinate (%d,%d)\n"
    "global index %2d \n", threadIdx.x, threadIdx.y, blockIdx.x, blockIdx.y, ix, iy, idx);
}
int main()
{
    hello<<<1, 1025>>>(); //错误，这里不能大于1024
    //错误检测函数,错误类型1
    cudaError_t error = cudaGetLastError();
    if (error != cudaSuccess)
    {
        printf("CUDA error:%s\n",cudaGetErrorString(err));
        exit(-1);
    }
    printf("I am the CPU:Hello World!\n");
    cudaError_t err = cudaDeviceSynchronize();
    //错误类型2
    if (err != cudaSuccess)
    {
        printf("cudaDeviceSynchronize error:%s\n",cudaGetErrorString(err));
        exit(-1);
    }
}

如上面的示例代码，若没有中间的错误检测函数，主程序输出
I am the CPU:Hello World!，主程序并不知道CUDA出错，而增加错误检测后，主程序获知CUDA出错原因，并退出程序，方便调试。
2. 几种检测函数
CUDA有下面几个错误检测函数，其中第三个第四个都是返回最新的error，区别是对于一些可恢复的错误(例如切换参数重新调用一些API)，第三个可以reset成success进行挽救，这样程序可以继续往下执行，第四个获取到error并不修改或者删除它，这样后面的其他错误处理过程还可以继续得到这个错误。
注意，对于一些不可恢复的错误(例如kernel访存原因挂掉，此时CUDA Context将会失效），此时立刻终止是最好的选择，不能reset

__host____device__const char* cudaGetErrorName (cudaError_t error)
//Returns the string representation of an error code enum name
//返回错误的名称

__host____device__const char* cudaGetErrorString (cudaError_t error)
//Returns the description string for an error code
//返回错误的描述

__host____device__cudaError_t cudaGetLastError (void)
//Returns the last error from a runtime call 
//返回最新的error

__host____device__cudaError_t cudaPeekAtLastError(void)
//Returns the last error from a runtime call
//返回最新的error

使用方法示例
下面是一个使用方法示例，需要时直接调用CHECK即可。注意，该宏检测到错误后打印错误信息就退出了，因此若想reset上述的可恢复错误，不能直接调用。

//这里是一个预处理命令，作用是确保当前文件在一个编译单元中不被重复包含
#pragma once
#include <stdio.h>
#define CHECK(call)
do                                                                          \
{																			\
    const cudaError_t error_code = call;									\
    if (error_code != cudaSuccess)											\
    {																		\
        printf("CUDA Error:\n");											\
        printf("    File:  %s\n",__FILE__);									\
        printf("    Line:  %d\n",__LINE__);									\
        printf("    Eror code:  %d\n",error_code);							\
        printf("    Eror text:  %s\n",cudaGetErrorString(error_code));		\
        exit(1);															\
    }																		\
} while (0);

CHECK(cudaMemcpy(d_b, h_b, sizeof(int)*n*k,cudaMemcpyHostToDevice));

CUDA中的事件

使用CUDA时，通常我们希望对程序进行计时，以便对比加速前后效果，虽然CPU计时器也可以使用，但由于GPU异步执行，计时可能不准确，因此最好使用CUDA event。
CUDA event本质是一个GPU时间戳，这个时间戳是在用户指定的位置记录。由于GPU本身支持记录时间戳，因此就避免了当使用CPU定时器来统计GPU执行时间时可能遇到的诸多问题，结果更准确。

常用函数

__host__cudaError_t cudaEventCreate(cudaEvent_t* event)
//Creates an event object.
__host__device__cudaError_t  cudaEventCreateWithFlags(cudaEvent_t* event, unsigned int flags)
//Creates an event object with the specified flags.
__host__device__cudaError_t  cudaEventDestroy(cudaEvent_t event)
//Destroys an event object.
__host__cudaError_t cudaEventElapsedTime(float* ms, cudaEvent_t start, cudaEvent_t end)
//Computes the elapsed time between events.
__host__cudaError_t cudaEventQuery(cudaEvent_t event)
//Queries an event's status.
__host__device__cudaError_t  cudaEventRecord(cudaEvent_t event, cudaStream_t stream=0)
//Records an event.
__host__cudaError_t cudaEventRecordWithFlags(cudaEvent_t event, cudaStream_t stream=0, unsigned int flags=0)
//Records an event.
__host__cudaError_t cudaEventSynchronize(cudaEvent_t event)
//Waits for an event to complete.

使用方法
CUDA的event使用通常有以下步骤。

声明:
cudaEvent_t event;
创建:
cudaEvent_t cudaEventCreate(cudaEvent_t * event);
销毁:
cudaEvent_t cudaEventDestroy(cudaEvent_t event);
添加事件到当前执行流:
cudastream相当于一条条执行队列，比如调用一次CUDA通常有数据拷贝，核函数调用，结果返回，这样一个完整的队列可以成为一个stream，可以将event插入到不同的流中。
cudaEvent_t cudaEventRecord(cudaEvent_t event, cudaStream_t stream=0);
cudaEventRecord()视为一条记录当前时间的语句，并且把这条语句放入GPU的未完成队列中。因为直到GPU执行完了在调用cudaEventRecord()之前的所有语句时，事件才会被记录下来。且仅当GPU完成了之前的工作并且记录了stop事件后，才能安全地读取stop时间值。
等待事件完成，设立flag:
cudaEvent_t cudaEventSynchronize(cudaEvent_t event);//阻塞
cudaError_t cudaEventQuery(cudaEvent_t event);//非阻塞
阻塞是调用可能被卡住，不能继续往下执行，直到特定的状态满足才能继续（例如事件已经完成记录），非阻塞是指调用者查询了一下event的状态（完成还是未完成），不会卡住影响执行。
对两个已经发生/完成的事件的时刻值start和end做差，得到时间间隔。
cudaError_t cudaEventElapsedTime(float* ms, cudaEvent_t start, cudaEvent_t stop);