Comparison of Upsampling Methods for the One-class Model

The One-class algorithm consists of an encoding stage and a decoding stage. During decoding, the feature map has to be turned back into an image.

The upsampling method One-class uses is ConvTranspose2d. The company wanted to improve One-class, so I was asked to investigate whether ConvTranspose2d could be replaced with another upsampling method.

After some research, I found the following upsampling methods.

Overview of upsampling methods:
1. Transposed convolution
2. PixelShuffle
3. Upsample
4. UpsamplingNearest2d
5. UpsamplingBilinear2d

Transposed convolution
Call: nn.ConvTranspose2d(inner_ndf, cngf, 4, 2, 1, bias=False)
Parameters: input channels, output channels, kernel size, stride, padding
This layer turns torch.Size([4, 1536, 4, 4]) into torch.Size([4, 768, 8, 8]).
Output size: out = (n - 1) × stride - 2 × padding + k, where n is the input spatial size and k the kernel size.
Here: (4 - 1) × 2 - 2 × 1 + 4 = 6 - 2 + 4 = 8
Advantage: height and width can be upscaled by different factors (see the sketch below).
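A minimal standalone sketch (not the One-class code itself; the channel sizes 1536 and 768 are carried over from the example above) confirming the shape calculation:

import torch
import torch.nn as nn

# inner_ndf = 1536 input channels, cngf = 768 output channels (assumed here)
up = nn.ConvTranspose2d(1536, 768, kernel_size=4, stride=2, padding=1, bias=False)
x = torch.randn(4, 1536, 4, 4)
print(up(x).shape)  # torch.Size([4, 768, 8, 8]): (4-1)*2 - 2*1 + 4 = 8

# kernel_size/stride/padding also accept (h, w) tuples, so height and
# width can be upscaled by different factors:
up_hw = nn.ConvTranspose2d(1536, 768, kernel_size=(4, 3), stride=(2, 1), padding=1, bias=False)
print(up_hw(x).shape)  # torch.Size([4, 768, 8, 4])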

PixelShuffle
Call: nn.PixelShuffle(upscale_factor)
Parameter: upscale_factor, the scaling factor.
Drawback: height and width must share the same scale factor.
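PixelShuffle(r) rearranges (N, C·r², H, W) into (N, C, H·r, W·r), so a 1×1 convolution usually supplies the extra channels first. A minimal sketch (channel sizes assumed from the network dump further below):

import torch
import torch.nn as nn

# The 1x1 conv produces 768 * 2 * 2 = 3072 channels for PixelShuffle to fold
conv = nn.Conv2d(1536, 768 * 2 * 2, kernel_size=1, bias=False)
shuffle = nn.PixelShuffle(2)
x = torch.randn(4, 1536, 4, 4)
print(shuffle(conv(x)).shape)  # torch.Size([4, 768, 8, 8])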

Upsample
Call: nn.Upsample(scale_factor, mode)
Parameters: the scale factor and the interpolation mode.
Drawback: height and width must share the same scale factor.
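A minimal sketch of nn.Upsample on the same assumed feature map as above:

import torch
import torch.nn as nn

x = torch.randn(4, 1536, 4, 4)
# Upsample is pure interpolation: no learnable weights, channels unchanged.
up_nearest = nn.Upsample(scale_factor=2, mode='nearest')
up_bilinear = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
print(up_nearest(x).shape)   # torch.Size([4, 1536, 8, 8])
print(up_bilinear(x).shape)  # torch.Size([4, 1536, 8, 8])
# Note: newer PyTorch releases also accept a per-axis tuple here,
# e.g. nn.Upsample(scale_factor=(2, 1), mode='nearest').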

UpsamplingNearest2d
Call: nn.UpsamplingNearest2d(scale_factor)
Parameter: scale_factor, the scaling factor.
Drawback: height and width must share the same scale factor (a combined sketch for this wrapper and the next follows below).

UpsamplingBilinear2d
Call: nn.UpsamplingBilinear2d(scale_factor)
Parameter: scale_factor, the scaling factor.
Drawback: height and width must share the same scale factor.
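Both classes are fixed-mode shortcuts for nn.Upsample; a minimal combined sketch, with the same assumed shapes as above:

import torch
import torch.nn as nn

x = torch.randn(4, 1536, 4, 4)
# Fixed-mode wrappers around nn.Upsample; the PyTorch docs point to
# nn.functional.interpolate as the preferred modern API.
nearest = nn.UpsamplingNearest2d(scale_factor=2)
bilinear = nn.UpsamplingBilinear2d(scale_factor=2)  # uses align_corners=True
print(nearest(x).shape)   # torch.Size([4, 1536, 8, 8])
print(bilinear(x).shape)  # torch.Size([4, 1536, 8, 8])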

After finishing the survey, I swapped each method into the model and ran the experiments.

Input: 256×256

Output: 256×256

The original network:

NetG(
  (encoder): Encoder(
    (main): Sequential(
      (initial-conv-3-48): Conv2d(3, 48, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (initial-relu-48): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-48-96-conv): Conv2d(48, 96, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-96-batchnorm): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-96-relu): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-96-192-conv): Conv2d(96, 192, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-192-batchnorm): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-192-relu): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-192-384-conv): Conv2d(192, 384, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-384-batchnorm): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-384-relu): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-384-768-conv): Conv2d(384, 768, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-768-batchnorm): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-768-relu): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-768-1536-conv): Conv2d(768, 1536, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-1536-batchnorm): BatchNorm2d(1536, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-1536-relu): LeakyReLU(negative_slope=0.2, inplace)
    )
  )
  (decoder): Decoder(
    (main): Sequential(
      (initial-1536-768-convt): ConvTranspose2d(1536, 768, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (initial-768-batchnorm): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (initial-768-relu): ReLU(inplace)
      (pyramid-768-384-convt): ConvTranspose2d(768, 384, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-384-relu): ReLU(inplace)
      (pyramid-384-192-convt): ConvTranspose2d(384, 192, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-192-relu): ReLU(inplace)
      (pyramid-192-96-convt): ConvTranspose2d(192, 96, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-96-relu): ReLU(inplace)
      (pyramid-96-48-convt): ConvTranspose2d(96, 48, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-48-relu): ReLU(inplace)
      (final-48-3-convt): ConvTranspose2d(48, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (final-3-sigmoid): Sigmoid()
    )
  )
)

The network structure using Upsample:

NetG(
  (encoder): Encoder(
    (main): Sequential(
      (initial-conv-3-48): Conv2d(3, 48, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (initial-relu-48): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-48-96-conv): Conv2d(48, 96, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-96-batchnorm): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-96-relu): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-96-192-conv): Conv2d(96, 192, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-192-batchnorm): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-192-relu): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-192-384-conv): Conv2d(192, 384, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-384-batchnorm): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-384-relu): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-384-768-conv): Conv2d(384, 768, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-768-batchnorm): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-768-relu): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-768-1536-conv): Conv2d(768, 1536, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-1536-batchnorm): BatchNorm2d(1536, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-1536-relu): LeakyReLU(negative_slope=0.2, inplace)
    )
  )
  (decoder): Decoder(
    (main): Sequential(
      (initial-1536-768-convt): Conv2d(1536, 768, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (initial-768-Upsample): Upsample(scale_factor=2, mode=nearest)
      (initial-768-batchnorm): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (initial-768-relu): ReLU(inplace)
      (pyramid-768-384-convt): Conv2d(768, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (pyramid-768-Upsample): Upsample(scale_factor=2, mode=nearest)
      (pyramid-384-relu): ReLU(inplace)
      (pyramid-384-192-convt): Conv2d(384, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (pyramid-384-Upsample): Upsample(scale_factor=2, mode=nearest)
      (pyramid-192-relu): ReLU(inplace)
      (pyramid-192-96-convt): Conv2d(192, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (pyramid-192-Upsample): Upsample(scale_factor=2, mode=nearest)
      (pyramid-96-relu): ReLU(inplace)
      (pyramid-96-48-convt): Conv2d(96, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (pyramid-96-Upsample): Upsample(scale_factor=2, mode=nearest)
      (pyramid-48-relu): ReLU(inplace)
      (final-48-3-convt): Conv2d(48, 3, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (final-48-Upsample): Upsample(scale_factor=2, mode=nearest)
      (final-3-3-convt): Conv2d(3, 3, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (final-3-sigmoid): Sigmoid()
    )
  )
)
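A hedged sketch of the stage pattern visible in this printout (the helper name upsample_block is hypothetical, not from the original code; BatchNorm is omitted for brevity):

import torch.nn as nn

def upsample_block(in_ch, out_ch):
    # A 1x1 conv changes the channel count, then nearest-neighbor
    # Upsample doubles the spatial size, as in the printout above.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.Upsample(scale_factor=2, mode='nearest'),
        nn.ReLU(inplace=True),
    )

stage = upsample_block(1536, 768)  # first stage: 1536 -> 768 channels, 4x4 -> 8x8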

The network structure using PixelShuffle:

NetG(
  (encoder): Encoder(
    (main): Sequential(
      (initial-conv-3-48): Conv2d(3, 48, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (initial-relu-48): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-48-96-conv): Conv2d(48, 96, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-96-batchnorm): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-96-relu): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-96-192-conv): Conv2d(96, 192, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-192-batchnorm): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-192-relu): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-192-384-conv): Conv2d(192, 384, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-384-batchnorm): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-384-relu): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-384-768-conv): Conv2d(384, 768, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-768-batchnorm): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-768-relu): LeakyReLU(negative_slope=0.2, inplace)
      (pyramid-768-1536-conv): Conv2d(768, 1536, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
      (pyramid-1536-batchnorm): BatchNorm2d(1536, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pyramid-1536-relu): LeakyReLU(negative_slope=0.2, inplace)
    )
  )
  (decoder): Decoder(
    (main): Sequential(
      (initial-1536-768-convt): Conv2d(1536, 3072, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (initial-768-PixelShuffle): PixelShuffle(upscale_factor=2)
      (initial-768-batchnorm): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (initial-768-relu): ReLU(inplace)
      (pyramid-768-384-convt): Conv2d(768, 1536, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (pyramid-768-PixelShuffle): PixelShuffle(upscale_factor=2)
      (pyramid-384-relu): ReLU(inplace)
      (pyramid-384-192-convt): Conv2d(384, 768, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (pyramid-384-PixelShuffle): PixelShuffle(upscale_factor=2)
      (pyramid-192-relu): ReLU(inplace)
      (pyramid-192-96-convt): Conv2d(192, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (pyramid-192-PixelShuffle): PixelShuffle(upscale_factor=2)
      (pyramid-96-relu): ReLU(inplace)
      (pyramid-96-48-convt): Conv2d(96, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (pyramid-96-PixelShuffle): PixelShuffle(upscale_factor=2)
      (pyramid-48-relu): ReLU(inplace)
      (final-48-12-convt): Conv2d(48, 12, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (final-48-PixelShuffle): PixelShuffle(upscale_factor=2)
      (final-3-sigmoid): Sigmoid()
    )
  )
)
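Analogously, a hedged sketch of one PixelShuffle decoder stage (the helper name pixelshuffle_block is hypothetical):

import torch.nn as nn

def pixelshuffle_block(in_ch, out_ch, r=2):
    # The 1x1 conv emits out_ch * r^2 channels, which PixelShuffle folds
    # into an r-times larger map with out_ch channels.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch * r * r, kernel_size=1, bias=False),
        nn.PixelShuffle(r),
        nn.ReLU(inplace=True),
    )

stage = pixelshuffle_block(1536, 768)  # matches Conv2d(1536, 3072, 1) + PixelShuffle(2)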

Experimental results:

Both alternatives, Upsample and PixelShuffle, performed worse than ConvTranspose2d.
