On cv::cuda::GpuMat and PtrStepSz


On the host side you generally work with GpuMat; in a .cu file, the kernel's caller function takes PtrStepSz (or PtrStep) parameters, and the kernel itself can use PtrStepSz as well. This is how host-to-device parameter passing is accomplished.
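
For instance, a minimal sketch of the pattern (invertKernel and invertCaller are hypothetical names, assuming an 8-bit single-channel image):

#include <opencv2/core/cuda.hpp>

// Hypothetical kernel: invert an 8-bit single-channel image in place.
__global__ void invertKernel(cv::cuda::PtrStepSz<uchar> img)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < img.cols && y < img.rows)
        img(y, x) = 255 - img(y, x);   // PtrStep::operator()(y, x) resolves the pitch
}

// Caller compiled by nvcc; the GpuMat argument converts implicitly to PtrStepSz.
void invertCaller(cv::cuda::GpuMat& img)
{
    const dim3 block(32, 8);
    const dim3 grid((img.cols + block.x - 1) / block.x,
                    (img.rows + block.y - 1) / block.y);
    invertKernel<<<grid, block>>>(img);   // implicit GpuMat -> PtrStepSz<uchar>
    cudaDeviceSynchronize();
}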

PtrStepSz is essentially a stripped-down version of GpuMat. But reading GpuMat's source carefully (sources\modules\core\src\cuda\gpu_mat.cu), only ::data is allocated on the device, i.e. device memory obtained through cudaMallocPitch (a simplified sketch of this pitched allocation follows the conversion operators below), while the other members such as rows, cols, and step all live in host memory. Now look at the conversion from GpuMat to PtrStepSz (PtrStep):

template <class T> inline
GpuMat::operator PtrStepSz<T>() const
{
    return PtrStepSz<T>(rows, cols, (T*)data, step);
}

template <class T> inline
GpuMat::operator PtrStep<T>() const
{
    return PtrStep<T>((T*)data, step);
}
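
As for the pitched allocation mentioned above, a simplified sketch of what GpuMat's default allocator does internally (the real code lives in gpu_mat.cu; allocPitched is a made-up helper):

#include <cuda_runtime.h>

// Allocate a rows x cols single-byte-channel image. The driver chooses the
// pitch (step, in bytes), padding each row so that rows start aligned.
unsigned char* allocPitched(int rows, int cols, size_t& step)
{
    void* data = nullptr;
    cudaMallocPitch(&data, &step, cols * sizeof(unsigned char), rows);
    // step >= cols: row y starts at (unsigned char*)data + y * step,
    // which is exactly the arithmetic PtrStep::ptr() performs (see below).
    return static_cast<unsigned char*>(data);
}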

The construction of PtrStepSz merely copies these members one by one, so they should likewise end up in host memory. How, then, can a kernel read rows, cols, and step at all?

See OpenCV's official description: https://docs.opencv.org/2.4/modules/gpu/doc/data_structures.html

Lightweight class encapsulating pitched memory on a GPU and passed to nvcc-compiled code (CUDA kernels). Typically, it is used internally by OpenCV and by users who write device code. You can call its members from both host and device code.

Now look at the definition of PtrStepSz (PtrStep) in include\opencv2\core\cuda_types.hpp:

#ifdef __CUDACC__
    #define __CV_CUDA_HOST_DEVICE__ __host__ __device__ __forceinline__
#else
    #define __CV_CUDA_HOST_DEVICE__
#endif

...

template <typename T> struct PtrStep : public DevPtr<T>
{
    __CV_CUDA_HOST_DEVICE__ PtrStep() : step(0) {}
    __CV_CUDA_HOST_DEVICE__ PtrStep(T* data_, size_t step_) : DevPtr<T>(data_), step(step_) {}

    size_t step;

    __CV_CUDA_HOST_DEVICE__       T* ptr(int y = 0)       { return (      T*)( (      char*)DevPtr<T>::data + y * step); }
    __CV_CUDA_HOST_DEVICE__ const T* ptr(int y = 0) const { return (const T*)( (const char*)DevPtr<T>::data + y * step); }

    __CV_CUDA_HOST_DEVICE__       T& operator ()(int y, int x)       { return ptr(y)[x]; }
    __CV_CUDA_HOST_DEVICE__ const T& operator ()(int y, int x) const { return ptr(y)[x]; }
};

template <typename T> struct PtrStepSz : public PtrStep<T>
{
    __CV_CUDA_HOST_DEVICE__ PtrStepSz() : cols(0), rows(0) {}
    __CV_CUDA_HOST_DEVICE__ PtrStepSz(int rows_, int cols_, T* data_, size_t step_)
        : PtrStep<T>(data_, step_), cols(cols_), rows(rows_) {}

    template <typename U>
    explicit PtrStepSz(const PtrStepSz<U>& d) : PtrStep<T>((T*)d.data, d.step), cols(d.cols), rows(d.rows){}

    int cols;
    int rows;
};
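
Note that step is expressed in bytes, which is why ptr() casts through char* before doing the pointer arithmetic. A short device-side sketch (rowSum is a hypothetical kernel):

#include <opencv2/core/cuda.hpp>

__global__ void rowSum(cv::cuda::PtrStepSz<float> src, float* out)
{
    const int y = blockIdx.x * blockDim.x + threadIdx.x;
    if (y >= src.rows) return;
    const float* row = src.ptr(y);   // (const float*)((const char*)data + y * step)
    float s = 0.f;
    for (int x = 0; x < src.cols; ++x)
        s += row[x];                 // the same element as src(y, x)
    out[y] = s;
}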

The member functions of PtrStep here all carry the __host__ __device__ __forceinline__ prefix, meaning they can be called from both host and device code. When the project is built, nvcc (the device-side compiler) and the host compiler (e.g. MSVC or gcc) each emit their own copy, so calling the same member function on the host and on the device actually executes different object code. The member variables rows, cols, and step, however, need no special treatment: a PtrStepSz is passed to the kernel by value, and at launch time the CUDA runtime copies the entire struct, i.e. the data pointer together with rows, cols, and step, into the kernel's parameter space, which is accessible from device code. Inside the kernel, reading rows, cols, or step simply reads that launch-time copy; no explicit cudaMemcpy of the header is ever needed.
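
To see that this is ordinary by-value kernel parameter passing, here is a standalone sketch with no OpenCV involved (Dims and printDims are made-up names):

#include <cstdio>
#include <cuda_runtime.h>

struct Dims { int rows, cols; };    // a plain struct living in host memory

__global__ void printDims(Dims d)   // received by value, just like a PtrStepSz
{
    // d.rows and d.cols are readable here because the CUDA runtime copied
    // the whole struct into the kernel's parameter space at launch time.
    printf("rows=%d cols=%d\n", d.rows, d.cols);
}

int main()
{
    Dims d{480, 640};                // built in host memory
    printDims<<<1, 1>>>(d);          // struct copied to the device automatically
    cudaDeviceSynchronize();
    return 0;
}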


On __host__ __device__

https://stackoverflow.com/questions/33218522/cuda-host-device-variables

https://forums.developer.nvidia.com/t/what-is-host-device/26709

Sometimes the same functionality is needed in both the host and the device portions of CUDA code. To avoid code duplication, CUDA allows such functions to carry both host and device attributes, which means the compiler places one copy of that function into the host compilation flow (to be compiled by the host compiler, e.g. gcc or MSVC), and a second copy into the device compilation flow (to be compiled with NVIDIA’s CUDA compiler).

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#host

The __device__ and __host__ execution space specifiers can be used together however, in which case the function is compiled for both the host and the device. The __CUDA_ARCH__ macro introduced in Application Compatibility can be used to differentiate code paths between host and device:

__host__ __device__ void func()
{
#if __CUDA_ARCH__ >= 800
   // Device code path for compute capability 8.x
#elif __CUDA_ARCH__ >= 700
   // Device code path for compute capability 7.x
#elif __CUDA_ARCH__ >= 600
   // Device code path for compute capability 6.x
#elif __CUDA_ARCH__ >= 500
   // Device code path for compute capability 5.x
#elif __CUDA_ARCH__ >= 300
   // Device code path for compute capability 3.x
#elif !defined(__CUDA_ARCH__) 
   // Host code path
#endif
}

In code, you can therefore use the __CUDA_ARCH__ macro to tell which side is being compiled: it is defined (to the target compute capability) only during the device compilation pass, and undefined when the host compiler processes the same function.
