PixelShuffle in PyTorch

This post covers the background and usage of the torch.nn.PixelShuffle() layer in PyTorch.

Reference:
PixelShuffle in PyTorch

1 Background

The PixelShuffle layer, also known as the sub-pixel convolution layer, is an upsampling layer for super-resolution (SR) reconstruction introduced in the paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. This ESPCN paper describes the sub-pixel convolution layer as extracting feature maps with $stride=\frac{1}{r}$, where $r$ is the SR upscaling factor. Although it is called a convolution, it uses no learnable parameters at all. Its principle is simple: it rearranges the pixels of the input feature map. In other words, the sub-pixel convolution layer carries the name of convolution but performs no multiply-accumulate operations; it is merely another way of organizing features:
(Figure: the ESPCN network architecture; the final layer is the sub-pixel convolution layer.)
As shown in the figure above, the last layer is the sub-pixel convolution layer. It takes the pixels at the same spatial location across the channels of an input feature map of shape $(batch, r^2C, H, W)$ and rearranges them into a small patch of the output feature map; traversing the whole input feature map yields the final output image. Viewed as a whole, it behaves like a convolution with stride $\frac{1}{r}$: the kernel is applied not at whole-pixel positions but at sub-pixel positions, hence the name sub-pixel convolution layer. The final output in the figure has shape $(batch, 1, rH, rW)$.
Note:

  1. For more on the ESPCN network and the sub-pixel convolution layer, see my separate paper notes on ESPCN.
  2. Sub-pixel convolution is an implicit convolution; it contains no learnable parameters.

In short, the PixelShuffle layer rearranges the pixels of the input feature map into a higher-resolution feature map; it is an upsampling method. Formally:
$(batch, r^2C, H, W) \to (batch, C, rH, rW)$, where $r$ is the upsampling factor ($r=3$ and $C=1$ in the figure above, giving an output of shape $(batch, 1, rH, rW)$).
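This shape transformation can be checked directly with torch.nn.functional.pixel_shuffle; the sizes below (batch 2, $C=1$, $H=W=4$, $r=3$) are illustrative choices, not taken from the paper:

```python
import torch
import torch.nn.functional as F

r = 3
x = torch.randn(2, r * r * 1, 4, 4)   # (batch, r^2*C, H, W) = (2, 9, 4, 4)
y = F.pixel_shuffle(x, r)             # (batch, C, r*H, r*W) = (2, 1, 12, 12)
print(y.shape)                        # torch.Size([2, 1, 12, 12])

# The r x r output patch at spatial position (0, 0) is filled from the
# r^2 input channels at (0, 0), in channel order:
assert torch.equal(y[0, 0, :r, :r].flatten(), x[0, :, 0, 0])
```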

2 Usage

Let's look at how PyTorch implements the core of the ESPCN paper, the sub-pixel sampling layer.
Calling convention: the layer's only argument is the upsampling factor $r$:

torch.nn.PixelShuffle(upscale_factor=3)

The internal source is as follows:
(Figure: source listing of torch.nn.PixelShuffle.)
The source consists of just three methods:
① __init__(): stores the upsampling factor.
② forward(): calls F.pixel_shuffle(), which performs the pixel rearrangement behind the sub-pixel convolution layer; its input is the feature map to be rearranged.
③ extra_repr(): prints the value of the upsampling factor $r$.
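Paraphrasing those three methods, a simplified sketch looks like the following (not the verbatim PyTorch source, which adds type annotations and docstrings):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyPixelShuffle(nn.Module):
    """Simplified sketch of torch.nn.PixelShuffle's three methods."""

    def __init__(self, upscale_factor):
        super().__init__()
        self.upscale_factor = upscale_factor           # (1) store the factor r

    def forward(self, x):
        # (2) delegate the actual pixel rearrangement to F.pixel_shuffle
        return F.pixel_shuffle(x, self.upscale_factor)

    def extra_repr(self):
        return f'upscale_factor={self.upscale_factor}'  # (3) shown by print(module)

# Behaves identically to the built-in layer:
x = torch.arange(36.).view(1, 4, 3, 3)
print(torch.equal(MyPixelShuffle(2)(x), nn.PixelShuffle(2)(x)))  # True
```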

Note:

  1. torch.nn.PixelShuffle() contains no learnable parameters $W, b$, unlike an ordinary convolutional layer, even though sub-pixel convolution is also a form of feature extraction.
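A quick way to confirm this is to count parameters of a PixelShuffle layer against those of an ordinary convolution (the convolution's sizes here are arbitrary illustrative choices):

```python
import torch.nn as nn

ps = nn.PixelShuffle(2)
conv = nn.Conv2d(4, 4, kernel_size=3)   # an ordinary conv layer, for contrast

print(sum(p.numel() for p in ps.parameters()))    # 0: no W or b at all
print(sum(p.numel() for p in conv.parameters()))  # 148 = 4*4*3*3 weights + 4 biases
```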

2.1 Example code

Now let's try it out.
We set the upsampling factor $r=2$ and build an input feature map of shape $(1, 4, 3, 3)$, simulating a batch of 1 with $r^2 \cdot 1 = 4$ channels and height and width 3:

```python
import torch
import torch.nn as nn

r = 2  # upsampling factor
ps = nn.PixelShuffle(r)
x = torch.arange(4 * 9).view(1, 1 * (r ** 2), 3, 3)
print(f'input is \n {x}, and size is \n {x.size()}')
y = ps(x)  # sub-pixel sampling
print(f'output is \n {y}, and size is \n {y.size()}')
print(f'upscale_factor is {ps.extra_repr()}')
```

2.2 Result

The final output is as follows:
(Figure: console output of the script above; the $(1, 4, 3, 3)$ input tensor is rearranged into a $(1, 1, 6, 6)$ output tensor.)
It is easy to see that torch.nn.PixelShuffle() performs exactly the sub-pixel sampling process described in the ESPCN paper.
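Under the hood, the shuffle reduces to a view/permute/reshape; a minimal sketch that reproduces nn.PixelShuffle for the $(1, 4, 3, 3)$ example above:

```python
import torch
import torch.nn as nn

r, C, H, W = 2, 1, 3, 3
x = torch.arange(4 * 9).view(1, C * r * r, H, W)

manual = (x.view(1, C, r, r, H, W)        # split channels into (C, r, r)
           .permute(0, 1, 4, 2, 5, 3)     # interleave: (N, C, H, r, W, r)
           .reshape(1, C, H * r, W * r))  # fold into (N, C, rH, rW)

assert torch.equal(manual, nn.PixelShuffle(r)(x))
print(manual.shape)   # torch.Size([1, 1, 6, 6])
```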

3 Pixel Shuffle in Deep-Learning Super-Resolution

In deep-learning-based image super-resolution (SR), pixel shuffle plays a crucial role as an upsampling method that increases spatial resolution while remaining computationally efficient. The core idea is to rearrange elements from a low-resolution feature map into a higher-resolution one without introducing additional parameters. Mathematically, a tensor of shape $[C, H, W]$ (channels, height, width) is transformed into one of shape $[\frac{C}{r^2}, rH, rW]$, where $r$ is the scaling factor that determines how much larger the output is than the input.

The following code implements a sub-pixel convolution block with PyTorch's `pixel_shuffle`:

```python
import torch.nn as nn

class SubPixelConvolution(nn.Module):
    def __init__(self, num_channels, upscale_factor=2):
        super(SubPixelConvolution, self).__init__()
        # Expand the channel count by r^2 so PixelShuffle can fold it into space
        self.conv = nn.Conv2d(num_channels,
                              num_channels * (upscale_factor ** 2),
                              kernel_size=3, stride=1, padding=1)
        self.shuffle = nn.PixelShuffle(upscale_factor)

    def forward(self, x):
        out = self.conv(x)
        out = self.shuffle(out)
        return out
```

This module first applies a convolution, then rearranges the expanded channels with nn.PixelShuffle, effectively expanding a low-resolution input into a high-resolution output at inference time. Compared with upsampling techniques such as nearest-neighbor or bilinear interpolation, this approach can perform better because the mapping between pixels is learned from training data rather than dictated by fixed rules; and since the shuffle itself adds no learnable weights beyond the convolution, memory overhead stays minimal.
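As a quick sanity check on a module like the one above (the class is restated so the snippet is self-contained; the channel count and input size are arbitrary choices):

```python
import torch
import torch.nn as nn

class SubPixelConvolution(nn.Module):
    def __init__(self, num_channels, upscale_factor=2):
        super().__init__()
        self.conv = nn.Conv2d(num_channels,
                              num_channels * (upscale_factor ** 2),
                              kernel_size=3, stride=1, padding=1)
        self.shuffle = nn.PixelShuffle(upscale_factor)

    def forward(self, x):
        return self.shuffle(self.conv(x))

sr = SubPixelConvolution(num_channels=3, upscale_factor=2)
lr = torch.randn(1, 3, 16, 16)   # a 16x16 "low-resolution" RGB input
print(sr(lr).shape)              # torch.Size([1, 3, 32, 32])
```

Channel count is preserved while height and width are doubled, which is exactly the $(batch, r^2C, H, W) \to (batch, C, rH, rW)$ mapping from Section 1.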