The code is organized into the following steps:
- Query GPU info
- Create the tensor descriptor
- Allocate unified memory for x and initialize it
- Create the sigmoid activation descriptor
- Run the activation forward pass, i.e. the sigmoid computation
- Destroy the resources
#include <iostream>
#include <cuda_runtime.h>
#include <cudnn.h>

/**
 * Minimal example to apply sigmoid activation on a tensor
 * using cuDNN.
 **/
int main(int argc, char** argv)
{
    // Get GPU info
    int numGPUs;
    cudaGetDeviceCount(&numGPUs);
    std::cout << "Found " << numGPUs << " GPUs." << std::endl;
    cudaSetDevice(0); // use GPU0
    int device;
    struct cudaDeviceProp devProp;
    cudaGetDevice(&device);
    cudaGetDeviceProperties(&devProp, device);
    std::cout << "Compute capability: " << devProp.major << "." << devProp.minor << std::endl;

    cudnnHandle_t handle_;
    cudnnCreate(&handle_);
    std::cout << "Created cuDNN handle" << std::endl;

    // Create the tensor descriptor
    cudnnDataType_t dtype = CUDNN_DATA_FLOAT;
    cudnnTensorFormat_t format = CUDNN_TENSOR_NCHW;
    int n = 1, c = 1, h = 1, w = 10;
    int NUM_ELEMENTS = n * c * h * w;
    cudnnTensorDescriptor_t x_desc;
    cudnnCreateTensorDescriptor(&x_desc);
    cudnnSetTensor4dDescriptor(x_desc, format, dtype, n, c, h, w);

    // Create the tensor.
    // Allocate unified memory so the buffer is accessible
    // from both the CPU and the GPU.
    float *x;
    cudaMallocManaged(&x, NUM_ELEMENTS * sizeof(float));
    for (int i = 0; i < NUM_ELEMENTS; i++) x[i] = i * 1.00f;
    std::cout << "Original array: ";
    for (int i = 0; i < NUM_ELEMENTS; i++) std::cout << x[i] << " ";

    // Create the activation function descriptor
    float alpha[1] = {1.0f};
    float beta[1] = {0.0f};
    cudnnActivationDescriptor_t sigmoid_activation;
    cudnnActivationMode_t mode = CUDNN_ACTIVATION_SIGMOID;
    cudnnNanPropagation_t prop = CUDNN_NOT_PROPAGATE_NAN;
    cudnnCreateActivationDescriptor(&sigmoid_activation);
    cudnnSetActivationDescriptor(sigmoid_activation, mode, prop, 0.0f);

    // Run the forward pass: x = alpha * sigmoid(x) + beta * x (in place)
    cudnnActivationForward(
        handle_,
        sigmoid_activation,
        alpha,
        x_desc,
        x,
        beta,
        x_desc,
        x
    );

    // The forward call is asynchronous with respect to the host;
    // synchronize before reading the unified-memory buffer on the CPU.
    cudaDeviceSynchronize();

    // Destroy the resources
    cudnnDestroyActivationDescriptor(sigmoid_activation);
    cudnnDestroyTensorDescriptor(x_desc);
    cudnnDestroy(handle_);
    std::cout << std::endl << "Destroyed cuDNN handle." << std::endl;

    std::cout << "New array: ";
    for (int i = 0; i < NUM_ELEMENTS; i++) std::cout << x[i] << " ";
    std::cout << std::endl;
    cudaFree(x);
    return 0;
}
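The example above skips error checking to stay minimal; in practice every cuDNN call returns a cudnnStatus_t that should be verified. Below is a minimal sketch of a checking macro (the CHECK_CUDNN name is my own, not part of cuDNN) that uses cudnnGetErrorString() to turn the status into a readable message. Assuming the cuDNN headers and library are installed, the example can be built with something like nvcc sigmoid_example.cu -lcudnn.

#include <cstdio>
#include <cstdlib>
#include <cudnn.h>

// Hypothetical helper macro: abort with a readable message
// if a cuDNN call does not return CUDNN_STATUS_SUCCESS.
#define CHECK_CUDNN(call)                                        \
    do {                                                         \
        cudnnStatus_t status = (call);                           \
        if (status != CUDNN_STATUS_SUCCESS) {                    \
            std::fprintf(stderr, "cuDNN error at %s:%d: %s\n",   \
                         __FILE__, __LINE__,                     \
                         cudnnGetErrorString(status));           \
            std::exit(EXIT_FAILURE);                             \
        }                                                        \
    } while (0)

// Usage: CHECK_CUDNN(cudnnCreate(&handle_));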
cudnnActivationDescriptor_t
cudnnActivationDescriptor_t is a pointer to an opaque structure holding the description of an activation operation. cudnnCreateActivationDescriptor() is used to create one instance, and cudnnSetActivationDescriptor() must be used to initialize this instance.
cudnnCreateActivationDescriptor()
This function creates an activation descriptor object by allocating the memory needed to hold its opaque structure. For more information, see cudnnActivationDescriptor_t.
cudnnStatus_t cudnnCreateActivationDescriptor(
cudnnActivationDescriptor_t *activationDesc)
cudnnSetActivationDescriptor()
This function initializes a previously created generic activation descriptor object.
cudnnStatus_t cudnnSetActivationDescriptor(
cudnnActivationDescriptor_t activationDesc,
cudnnActivationMode_t mode,
cudnnNanPropagation_t reluNanOpt,
double coef)
activationDesc
Input/Output. Handle to a previously created activation descriptor.
mode
Input. Enumerant to specify the activation mode.
reluNanOpt
Input. Enumerant to specify the NaN propagation mode.
coef
Input. Floating point number. When the activation mode (see cudnnActivationMode_t) is set to CUDNN_ACTIVATION_CLIPPED_RELU, this input specifies the clipping threshold; and when the activation mode is set to CUDNN_ACTIVATION_RELU, this input specifies the upper bound.
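As an illustration of coef, the sketch below configures a clipped ReLU with a threshold of 6.0 (a ReLU6-style activation) and reads the settings back with cudnnGetActivationDescriptor(); the descriptor name relu6_desc is made up for this example.

cudnnActivationDescriptor_t relu6_desc;
cudnnCreateActivationDescriptor(&relu6_desc);
// coef = 6.0 is the clipping threshold for CUDNN_ACTIVATION_CLIPPED_RELU
cudnnSetActivationDescriptor(relu6_desc, CUDNN_ACTIVATION_CLIPPED_RELU,
                             CUDNN_NOT_PROPAGATE_NAN, 6.0);

// Read the configuration back to verify it
cudnnActivationMode_t m;
cudnnNanPropagation_t nan_opt;
double coef;
cudnnGetActivationDescriptor(relu6_desc, &m, &nan_opt, &coef); // coef == 6.0

cudnnDestroyActivationDescriptor(relu6_desc);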
cudnnActivationMode_t
CUDNN_ACTIVATION_SIGMOID
Selects the sigmoid function.
CUDNN_ACTIVATION_RELU
Selects the rectified linear function.
CUDNN_ACTIVATION_TANH
Selects the hyperbolic tangent function.
CUDNN_ACTIVATION_CLIPPED_RELU
Selects the clipped rectified linear function.
CUDNN_ACTIVATION_ELU
Selects the exponential linear function.
CUDNN_ACTIVATION_IDENTITY
Selects the identity function, intended for bypassing the activation step in cudnnConvolutionBiasActivationForward(). (The cudnnConvolutionBiasActivationForward() function must use CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM.) Does not work with cudnnActivationForward() or cudnnActivationBackward().
CUDNN_ACTIVATION_SWISH
Selects the swish function.
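Switching the example to a different activation is just a matter of changing the mode passed to cudnnSetActivationDescriptor(). A small sketch, reusing the sigmoid_activation descriptor from the example above:

// Reconfigure the same descriptor for tanh instead of sigmoid;
// coef is ignored for CUDNN_ACTIVATION_TANH.
cudnnSetActivationDescriptor(sigmoid_activation, CUDNN_ACTIVATION_TANH,
                             CUDNN_NOT_PROPAGATE_NAN, 0.0);

// Note: CUDNN_ACTIVATION_IDENTITY can be set on a descriptor, but per the
// restriction quoted above it does not work with cudnnActivationForward(),
// so expect a non-success status from the forward call in that case.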