Comparing Transposed Convolution and Max Unpooling for Upsampling in Neural Networks

haohulala

Category: Computer Vision

Article link: https://blog.csdn.net/haohulala/article/details/107547588

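Upsampling in convolutional neural networks (CNNs) is commonly done in one of two ways: the transposed convolution used in FCN, or the max unpooling used in SegNet.
Let's start with transposed convolution. For background on how it works, see this article:
https://blog.csdn.net/lanadeus/article/details/82534425
Recall that the input and output sizes of a convolution layer are related by the following formula (when the division is not exact, the result is rounded down)
$n_{out} = \frac{n_{in}-kernel+2*padding} {stride}+1$
Swapping the positions of $ n_{in} $ and $ n_{out} $ gives the size relation for a transposed convolution
$n_{in} = \frac{n_{out}-kernel+2*padding} {stride}+1$
Solving for $ n_{out} $ gives
$n_{out} = stride*(n_{in}-1)+kernel-2*padding$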
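
The two size formulas can be checked with a few lines of plain Python. This is a minimal sketch: the helper names `conv_out_size` and `conv_transpose_out_size` are made up for illustration and are not part of PyTorch, and the transposed formula assumes no dilation and no `output_padding`.

```python
def conv_out_size(n_in, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension (rounded down)."""
    return (n_in - kernel + 2 * padding) // stride + 1


def conv_transpose_out_size(n_in, kernel, stride=1, padding=0):
    """Output length of a transposed convolution (no dilation, no output_padding)."""
    return stride * (n_in - 1) + kernel - 2 * padding


# A 3x3 convolution with stride 1 and no padding shrinks 4 to 2.
print(conv_out_size(4, kernel=3))                                 # 2
# kernel=4, stride=2, padding=1 doubles the size: 2 -> 4.
print(conv_transpose_out_size(2, kernel=4, stride=2, padding=1))  # 4

# With stride > 1 the rounding makes the mapping many-to-one:
# inputs of size 7 and 8 both give 4, and the transposed formula returns 7.
print(conv_out_size(7, 3, stride=2, padding=1))                   # 4
print(conv_out_size(8, 3, stride=2, padding=1))                   # 4
print(conv_transpose_out_size(4, 3, stride=2, padding=1))         # 7
```

The last three lines show why swapping $ n_{in} $ and $ n_{out} $ is only exact when the division has no remainder; PyTorch's `ConvTranspose2d` has an `output_padding` argument to choose between the possible output sizes in that case.
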
The code below shows what a transposed convolution does in practice

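import numpy as np
import torch
import torch.nn.functional as f
from torch import nn
from torch.autograd import Variable  # deprecated since PyTorch 0.4; plain tensors behave the same

# A random single-channel 4x4 input in (batch, channel, height, width) layout
a = np.random.normal(size=(1, 1, 4, 4))
print(a.shape)
print(a)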

(1, 1, 4, 4)
[[[[-0.6183562   1.08084338 -0.37976044  1.72667035]
   [ 0.93381504 -0.01241052  0.76949068  1.27797714]
   [-0.89348575  0.533843   -1.38686784  0.47704014]
   [ 0.8346226  -1.00254449  0.36778208  0.28855805]]]]

First, use a convolution to shrink the matrix

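# 3x3 convolution, stride 1, no padding: 4x4 -> 2x2
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0)
# NumPy produces float64; Conv2d weights are float32, so convert
b = torch.from_numpy(a).float()
output = conv(b)
print(output.shape)
print(output)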

torch.Size([1, 1, 2, 2])
tensor([[[[ 0.3280,  0.0288],
          [-0.6246,  0.6161]]]], grad_fn=<ThnnConv2DBackward>)

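After the convolution the matrix has shrunk to 2x2. Next we use a transposed convolution to bring it back to 4x4. From
$n_{out} = stride*(n_{in}-1)+kernel-2*padding$
we see that, to make the output twice the size of the input, we can choose kernel=4, padding=1, stride=2, since then $n_{out} = 2*(n_{in}-1)+4-2*1 = 2*n_{in}$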

# kernel=4, stride=2, padding=1 doubles the spatial size: 2x2 -> 4x4
conv_trans = nn.ConvTranspose2d(1, 1, kernel_size=4, stride=2, padding=1)
output_trans = conv_trans(output)
print(output_trans.shape)
print("Before the transposed convolution")
print(output)
print("After the transposed convolution")
print(output_trans)
print("Original matrix")
print(b)

torch.Size([1, 1, 4, 4])
Before the transposed convolution
tensor([[[[ 0.3280,  0.0288],
          [-0.6246,  0.6161]]]], grad_fn=<ThnnConv2DBackward>)
After the transposed convolution
tensor([[[[-0.1257, -0.1235, -0.1065, -0.1172],
          [-0.0543,  0.1394, -0.2212, -0.2056],
          [-0.1654, -0.2676, -0.1478, -0.1226],
          [ 0.0318, -0.2095, -0.1181, -0.0288]]]],
       grad_fn=<ThnnConvTranspose2DBackward>)
Original matrix
tensor([[[[-0.6184,  1.0808, -0.3798,  1.7267],
          [ 0.9338, -0.0124,  0.7695,  1.2780],
          [-0.8935,  0.5338, -1.3869,  0.4770],
          [ 0.8346, -1.0025,  0.3678,  0.2886]]]])
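
It may help to recall what "transposed" means here: with a shared kernel, the transposed convolution is the matrix transpose (adjoint) of the convolution, not its inverse. The sketch below checks this numerically using the functional API; the tensors `w`, `x` and `y` are random values made up for the illustration, not the ones used elsewhere in this post.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
w = torch.randn(1, 1, 3, 3)  # one shared 3x3 kernel
x = torch.randn(1, 1, 4, 4)  # lives in the "large" 4x4 space
y = torch.randn(1, 1, 2, 2)  # lives in the "small" 2x2 space

conv_x = F.conv2d(x, w)              # 4x4 -> 2x2
convT_y = F.conv_transpose2d(y, w)   # 2x2 -> 4x4

# Adjoint property: <conv(x), y> equals <x, conv_transpose(y)>
lhs = (conv_x * y).sum()
rhs = (x * convT_y).sum()
print(torch.allclose(lhs, rhs, atol=1e-4))

# Applying the transposed convolution after the convolution restores
# the shape of x, but not its values.
roundtrip = F.conv_transpose2d(conv_x, w)
print(roundtrip.shape)
print(torch.allclose(roundtrip, x, atol=1e-4))
```

The first check should print `True` and the last one `False`: the shape comes back, the values do not. In a network the transposed convolution also has its own learned weights, which moves it even further from being an inverse.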

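The result has the same size as the original matrix, but its values are different: the transposed convolution has its own randomly initialized weights and is not the inverse of the convolution, so it restores only the shape. Next, let's look at upsampling with max unpooling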

# return_indices=True also returns where each maximum came from
array, idx = f.max_pool2d(b, kernel_size=2, stride=2, return_indices=True)
print(array)
print(idx)

tensor([[[[1.0808, 1.7267],
          [0.8346, 0.4770]]]])
tensor([[[[ 1,  3],
          [12, 11]]]])

As we can see, f.max_pool2d returns two tensors: the max pooling result, and the index of each value that max pooling selected.
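
Each index is a flat offset into the 4x4 input plane of its channel, so row and column can be recovered with a division by the width. The plain-Python sketch below decodes the four indices printed above and scatters the pooled values back into a zero-filled grid, which is conceptually what max unpooling does; the variable names are illustrative and the values are copied (rounded) from the output above.

```python
height, width = 4, 4                       # size of the plane that was pooled
flat_indices = [1, 3, 12, 11]              # from the idx tensor above
pooled = [1.0808, 1.7267, 0.8346, 0.4770]  # from the pooled tensor above

# Flat offset -> (row, column)
positions = [divmod(i, width) for i in flat_indices]
print(positions)  # [(0, 1), (0, 3), (3, 0), (2, 3)]

# Unpooling: start from zeros and put every maximum back where it came from
unpooled = [[0.0] * width for _ in range(height)]
for value, (row, col) in zip(pooled, positions):
    unpooled[row][col] = value

for row in unpooled:
    print(row)
```
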

# Put each pooled maximum back at its recorded position; all other entries become 0
output_unpool = f.max_unpool2d(array, idx, kernel_size=2, stride=2)
print(output_unpool.shape)
print("Result of max unpooling")
print(output_unpool)
print("Result of the transposed convolution")
print(output_trans)
print("Original matrix")
print(b)

torch.Size([1, 1, 4, 4])
Result of max unpooling
tensor([[[[0.0000, 1.0808, 0.0000, 1.7267],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.4770],
          [0.8346, 0.0000, 0.0000, 0.0000]]]])
Result of the transposed convolution
tensor([[[[-0.1257, -0.1235, -0.1065, -0.1172],
          [-0.0543,  0.1394, -0.2212, -0.2056],
          [-0.1654, -0.2676, -0.1478, -0.1226],
          [ 0.0318, -0.2095, -0.1181, -0.0288]]]],
       grad_fn=<ThnnConvTranspose2DBackward>)
Original matrix
tensor([[[[-0.6184,  1.0808, -0.3798,  1.7267],
          [ 0.9338, -0.0124,  0.7695,  1.2780],
          [-0.8935,  0.5338, -1.3869,  0.4770],
          [ 0.8346, -1.0025,  0.3678,  0.2886]]]])

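The example above shows the difference between the two upsampling methods. Both restore the matrix to its original shape, but neither reproduces the original values: max unpooling puts each pooled maximum back at its original position and fills every other entry with zeros, while the transposed convolution produces a dense output that depends on its weights.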
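
One more practical difference is that a transposed convolution has weights to learn, while max unpooling has none and relies only on the indices saved by the pooling layer. The sketch below uses the module forms `nn.MaxPool2d` and `nn.MaxUnpool2d` to show this; the helper `n_params` and the random input are assumptions made for the illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

up_conv = nn.ConvTranspose2d(1, 1, kernel_size=4, stride=2, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)


def n_params(module):
    """Total number of learnable values in a module."""
    return sum(p.numel() for p in module.parameters())


print(n_params(up_conv))  # 17: a 4x4 kernel plus one bias
print(n_params(unpool))   # 0: nothing to learn

x = torch.randn(1, 1, 4, 4)
pooled, idx = pool(x)
restored = unpool(pooled, idx)

# The values at the saved positions are exactly the pooled maxima
print(torch.equal(restored.flatten()[idx.flatten()], pooled.flatten()))
```

The last line should print `True`. Flattening `restored` and indexing it with `idx` works here only because the batch and channel sizes are both 1.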
