PixelShuffle Explained, with a CUDA Implementation

BigerBang

Column: Ultra-HD Video Processing Technology | Tags: deep learning, computer vision, convolutional neural network, cuda

Article link: https://blog.csdn.net/BigerBang/article/details/108551305

1. Background

1.1 Origin of PixelShuffle

The PixelShuffle operation comes from the paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network, where it is called the periodic shuffling operator. PyTorch implements it as

torch.nn.PixelShuffle(upscale_factor)

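1.2 Why implement it in CUDA

When converting a model that contains PixelShuffle to TensorRT, large feature maps cause trouble: for example, going from $(1, 64 \times 4, 1088, 1920)$ to $(1, 64, 1088 \times 2, 1920 \times 2)$ ran out of memory on a GPU with 32 GB. So the operation had to be pulled out and implemented separately in CUDA.
The CUDA implementation also turned out to be slightly faster than PyTorch's.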

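2. What is PixelShuffle

First, the figure from the paper:
[Figure: the efficient sub-pixel convolutional neural network (ESPCN) from the paper]
In the figure, the colored part that goes from the $r^2$ channels to the high-resolution image is the PixelShuffle step.
PixelShuffle can be viewed as a special reshape: it moves pixels from the channel dimension into the height and width dimensions, which upsamples the feature map. That makes it useful in super-resolution (SR) and other tasks that need to enlarge feature maps.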
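To make the "special reshape" view concrete, here is a minimal host-side sketch (plain C++ host code, which compiles as part of a .cu file). It treats the contiguous NCHW input as a 6-D array (N, C, r, r, H, W) and writes it out in (N, C, H, r, W, r) order, which is exactly the memory layout of (N, C, H·r, W·r). The function name `pixel_shuffle_reshape_ref` is hypothetical, and this is only meant to illustrate the view/permute/reshape equivalence, not to describe how PyTorch implements the operator internally.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference: PixelShuffle written as reshape + permute.
// Input  (N, C*r*r, H, W) viewed as 6-D (N, C, r, r, H, W), index (n, c, i, j, y, x).
// Output (N, C, H*r, W*r) viewed as 6-D (N, C, H, r, W, r), index (n, c, y, i, x, j).
std::vector<float> pixel_shuffle_reshape_ref(const std::vector<float>& in,
                                             int N, int C, int r, int H, int W)
{
    assert(in.size() == std::size_t(N) * C * r * r * H * W);
    std::vector<float> out(in.size());
    for (int n = 0; n < N; ++n)
        for (int c = 0; c < C; ++c)
            for (int i = 0; i < r; ++i)          // sub-pixel row offset
                for (int j = 0; j < r; ++j)      // sub-pixel column offset
                    for (int y = 0; y < H; ++y)
                        for (int x = 0; x < W; ++x) {
                            // Flat offset in the (N, C, r, r, H, W) view of the input.
                            std::size_t src =
                                ((((std::size_t(n) * C + c) * r + i) * r + j) * H + y) * W + x;
                            // Flat offset in the (N, C, H, r, W, r) view == (N, C, H*r, W*r).
                            std::size_t dst =
                                ((((std::size_t(n) * C + c) * H + y) * r + i) * W + x) * r + j;
                            out[dst] = in[src];   // only the axis order changes
                        }
    return out;
}
```

For instance, with N = 1, C = 1, r = 2, H = 1, W = 2 and input values 0 through 7, the two output rows come out as 0 2 1 3 and 4 6 5 7, which matches what torch.nn.PixelShuffle(2) gives for torch.arange(8).reshape(1, 4, 1, 2).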

Input shape:
$(N, C \times r \times r, H, W)$
Output shape:
$(N, C, r \times H, r \times W)$
where r is the upscaling factor.
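The shape rule implies two constraints worth checking before launching anything: the input channel count must be a multiple of $r^2$, and the element count is unchanged because PixelShuffle only rearranges data. A minimal sketch, where the `Shape4` struct and `pixel_shuffle_output_shape` are hypothetical helpers rather than any library API:

```cuda
#include <cassert>
#include <cstddef>

// Hypothetical helper type: an NCHW shape.
struct Shape4 { int n, c, h, w; };

// Output shape of PixelShuffle: (N, C*r*r, H, W) -> (N, C, r*H, r*W).
// The input channel count must be a multiple of r*r.
Shape4 pixel_shuffle_output_shape(Shape4 in, int r)
{
    assert(r > 0 && in.c % (r * r) == 0);
    Shape4 out{in.n, in.c / (r * r), in.h * r, in.w * r};
    // Pure rearrangement: the element count must not change.
    assert(std::size_t(out.n) * out.c * out.h * out.w ==
           std::size_t(in.n) * in.c * in.h * in.w);
    return out;
}
```

For the example in section 1.2, `pixel_shuffle_output_shape({1, 256, 1088, 1920}, 2)` gives {1, 64, 2176, 3840}. Each of those tensors holds 534,773,760 elements, which is about 1 GiB when stored as FP16 (the type the kernel below uses).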

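3. CUDA implementation of PixelShuffle

First, the paper's definition of the operation:

$\mathcal{PS}(T)_{x,y,c} = T_{\lfloor x/r \rfloor,\, \lfloor y/r \rfloor,\, C \cdot r \cdot \mathrm{mod}(y, r) + C \cdot \mathrm{mod}(x, r) + c}$

Here y and x are the output coordinates along the rH and rW dimensions. The spatial part is easy to follow: the output pixel reads from position $(\lfloor y/r \rfloor, \lfloor x/r \rfloor)$ of the input, and $\mathrm{mod}(y, r)$, $\mathrm{mod}(x, r)$ pick which of the $r^2$ sub-pixel channels to use.
The channel expression, however, only agrees with PyTorch when C = 1. The paper places c in the fastest-varying position, so the $r^2$ sub-pixel copies of one output channel sit C channels apart, whereas torch.nn.PixelShuffle stores them in $r^2$ consecutive channels starting at $c \cdot r^2$. Both are valid rearrangements; they simply order the channels differently, and the two orders coincide when C = 1.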
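The C = 1 observation can be checked mechanically. The sketch below evaluates the paper's channel index $C \cdot r \cdot \mathrm{mod}(y, r) + C \cdot \mathrm{mod}(x, r) + c$ next to PyTorch's $c \cdot r^2 + r \cdot \mathrm{mod}(y, r) + \mathrm{mod}(x, r)$, reading y along rH and x along rW. All function names are hypothetical; the `__host__ __device__` qualifiers only allow the same helpers to be called from a kernel as well as from host code. Since the channel index depends only on $\mathrm{mod}(y, r)$ and $\mathrm{mod}(x, r)$, one $r \times r$ output tile covers every case.

```cuda
#include <cassert>
#include <vector>

// Source channel for output (c, y, x) per the paper's formula,
// with y along the rH axis and x along the rW axis; C = output channel count.
__host__ __device__ inline int paper_source_channel(int c, int y, int x, int C, int r)
{
    return C * r * (y % r) + C * (x % r) + c;
}

// Source channel in PyTorch's layout (the one the kernel in this post uses).
__host__ __device__ inline int torch_source_channel(int c, int y, int x, int r)
{
    return c * r * r + r * (y % r) + (x % r);
}

// Counts positions (c, y, x) inside one r x r output tile where the two
// conventions read different input channels.
int count_channel_mismatches(int C, int r)
{
    int mismatches = 0;
    for (int c = 0; c < C; ++c)
        for (int y = 0; y < r; ++y)
            for (int x = 0; x < r; ++x)
                if (paper_source_channel(c, y, x, C, r) != torch_source_channel(c, y, x, r))
                    ++mismatches;
    return mismatches;
}

// True if the paper's mapping reads each of the C*r*r input channels exactly once.
bool paper_mapping_is_permutation(int C, int r)
{
    std::vector<int> hits(C * r * r, 0);
    for (int c = 0; c < C; ++c)
        for (int y = 0; y < r; ++y)
            for (int x = 0; x < r; ++x) {
                int ch = paper_source_channel(c, y, x, C, r);
                assert(ch >= 0 && ch < C * r * r);
                ++hits[ch];
            }
    for (int h : hits)
        if (h != 1) return false;
    return true;
}
```

With C = 1 and r = 3, `count_channel_mismatches` returns 0; with C = 4 and r = 2 it returns 12 of the 16 positions, even though `paper_mapping_is_permutation(4, 2)` is true, so the paper's mapping is a consistent layout, just not PyTorch's.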
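Since implementing the paper's formula as written did not reproduce PyTorch's output, I derived the formula myself (h, w, c index the output, c being the output channel):
$PS(T)_{h, w, c} = T_{\lfloor h/r \rfloor,\, \lfloor w/r \rfloor,\, c \cdot r^2 + r \cdot (h \bmod r) + (w \bmod r)}$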
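A worked instance of this mapping, using the same channel expression the kernel computes (`ic = r*(h_i%r) + (w_i%r) + c_i*r*r`), with r = 2 and output element (h, w, c) = (3, 2, 1):

```latex
\begin{aligned}
&r = 2,\quad (h, w, c) = (3, 2, 1) \\
&\lfloor h/r \rfloor = \lfloor 3/2 \rfloor = 1,\qquad \lfloor w/r \rfloor = \lfloor 2/2 \rfloor = 1 \\
&c \cdot r^2 + r \cdot (h \bmod r) + (w \bmod r) = 1 \cdot 4 + 2 \cdot 1 + 0 = 6 \\
&\Rightarrow\; PS(T)_{3,2,1} = T_{1,1,6}
\end{aligned}
```

In PyTorch indexing this says `output[0, 1, 3, 2] == input[0, 6, 1, 1]`.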
The CUDA code:

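```cuda
#include <cuda_fp16.h>

// PixelShuffle for batch size N = 1 on contiguous NCHW half tensors.
// w, h, c are the OUTPUT width, height and channel count (w = r*W_in, h = r*H_in).
// input_c_stride  = H_in * W_in, input_h_stride  = W_in
// output_c_stride = h * w,       output_h_stride = w
__global__ void pixel_shuffle_kernel(const half *x, half *z, int r, int w, int h, int c,
                                     int input_c_stride, int input_h_stride,
                                     int output_c_stride, int output_h_stride)
{
    // One thread per output element (c_i, h_i, w_i).
    const int w_i = blockIdx.x * blockDim.x + threadIdx.x;
    const int h_i = blockIdx.y * blockDim.y + threadIdx.y;
    const int c_i = blockIdx.z * blockDim.z + threadIdx.z;
    const bool withinXbounds = w_i < w;
    const bool withinYbounds = h_i < h;
    const bool withinCbounds = c_i < c;
    if (withinXbounds && withinYbounds && withinCbounds) {
        // Source channel: c * r^2 + r * (h mod r) + (w mod r)
        long long ic = r * (h_i % r) + (w_i % r) + (long long)c_i * r * r;
        long long iw = w_i / r;   // source column
        long long ih = h_i / r;   // source row
        long long index = ic * input_c_stride + ih * input_h_stride + iw;
        // 64-bit output offset too (long is only 32 bits on Windows), so large maps cannot overflow
        z[(long long)c_i * output_c_stride + (long long)h_i * output_h_stride + w_i] = x[index];
    }
}
```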
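A minimal host-side sketch for launching the kernel and checking it against a CPU reference, assuming a CUDA-capable GPU, the CUDA runtime API, and contiguous NCHW FP16 tensors with N = 1. The function name `check_pixel_shuffle`, the 32×8 block shape, and the 0, 1, 2, ... fill pattern are my own choices, not part of the original code; the kernel is repeated (with 64-bit index arithmetic) so the block compiles on its own.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Copy of the kernel above (N = 1, NCHW, half), repeated so this file compiles on its own.
__global__ void pixel_shuffle_kernel(const half *x, half *z, int r, int w, int h, int c,
                                     int input_c_stride, int input_h_stride,
                                     int output_c_stride, int output_h_stride)
{
    const int w_i = blockIdx.x * blockDim.x + threadIdx.x;
    const int h_i = blockIdx.y * blockDim.y + threadIdx.y;
    const int c_i = blockIdx.z * blockDim.z + threadIdx.z;
    if (w_i < w && h_i < h && c_i < c) {
        long long ic = (long long)c_i * r * r + r * (h_i % r) + (w_i % r);
        long long in_index = ic * input_c_stride + (long long)(h_i / r) * input_h_stride + w_i / r;
        z[(long long)c_i * output_c_stride + (long long)h_i * output_h_stride + w_i] = x[in_index];
    }
}

// Hypothetical test harness: runs the kernel on a (1, C*r*r, H, W) input filled with
// 0, 1, 2, ... and compares it with a CPU reference. Returns the number of mismatching
// output elements, or -1 if a CUDA call fails.
int check_pixel_shuffle(int C, int r, int H, int W)
{
    const int out_h = H * r, out_w = W * r;
    const std::size_t n = std::size_t(C) * r * r * H * W;  // same element count in and out
    // Keep the test small: integers up to 2048 are exact (and here distinct) in FP16.
    assert(n <= 2048);

    std::vector<half> h_in(n), h_out(n);
    for (std::size_t i = 0; i < n; ++i) h_in[i] = __float2half(float(i));

    half *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(half)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(half)) != cudaSuccess) { cudaFree(d_in); return -1; }

    cudaError_t err = cudaMemcpy(d_in, h_in.data(), n * sizeof(half), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // x -> output width, y -> output height, z -> output channel.
        dim3 block(32, 8, 1);
        dim3 grid((out_w + block.x - 1) / block.x, (out_h + block.y - 1) / block.y, C);
        pixel_shuffle_kernel<<<grid, block>>>(d_in, d_out, r, out_w, out_h, C,
                                              H * W, W,              // input strides
                                              out_h * out_w, out_w); // output strides
        err = cudaGetLastError();
    }
    // cudaMemcpy waits for the kernel on the default stream before copying back.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, n * sizeof(half), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    // CPU reference: out[c][y][x] = in[c*r*r + r*(y%r) + x%r][y/r][x/r]
    int mismatches = 0;
    for (int c = 0; c < C; ++c)
        for (int y = 0; y < out_h; ++y)
            for (int x = 0; x < out_w; ++x) {
                std::size_t src = (std::size_t(c * r * r + r * (y % r) + x % r) * H + y / r) * W + x / r;
                std::size_t dst = (std::size_t(c) * out_h + y) * out_w + x;
                if (__half2float(h_out[dst]) != __half2float(h_in[src])) ++mismatches;
            }
    return mismatches;
}
```

The grid maps x to output width, y to output height and z to output channel, which is why the `w`, `h`, `c` arguments are the output dimensions and the strides follow directly from the contiguous layout. Supporting N > 1 would need either a batch index in the kernel or one launch per image with offset pointers.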

I have verified that this kernel produces the same results as torch.nn.PixelShuffle.