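Pooling Layers in Detail

Convolutional Neural Networks

A convolutional neural network (CNN) is a deep learning model used mainly for image recognition, speech recognition, and natural language processing. A CNN is built from components such as convolutional layers, pooling layers, and fully connected layers.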

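A pooling layer shrinks the spatial size of a feature map while keeping its most salient features. The two common variants are max pooling (Max Pooling), which keeps the largest value in each window, and average pooling (Average Pooling), which keeps the mean of each window.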
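To make the difference between the two variants concrete, here is a minimal sketch on a hand-written 4x4 feature map. The tensor values are made up for illustration; `F.max_pool2d` and `F.avg_pool2d` are the functional forms of the corresponding PyTorch layers.

```python
import torch
import torch.nn.functional as F

# A single 4x4 feature map, shaped (N=1, C=1, H=4, W=4).
x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 10., 13., 14.],
                  [11., 12., 15., 16.]]).reshape(1, 1, 4, 4)

# Non-overlapping 2x2 windows (stride defaults to the kernel size).
max_out = F.max_pool2d(x, kernel_size=2)  # largest value of each window
avg_out = F.avg_pool2d(x, kernel_size=2)  # mean value of each window

print(max_out.squeeze())  # [[4, 8], [12, 16]]
print(avg_out.squeeze())  # [[2.5, 6.5], [10.5, 14.5]]
```

Both outputs are 2x2: the spatial size is halved while the batch and channel dimensions are unchanged.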

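CNNs are usually trained with backpropagation: the network parameters are adjusted step by step to minimize a loss function so that the model fits the training data better. In practice, CNNs have become a mainstream model in areas such as image classification, object detection, and speech recognition.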
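As a side note on how pooling takes part in backpropagation, the sketch below (with made-up, tie-free input values) checks two things: a max pooling layer has no trainable parameters, and the gradient of its output flows back only to the position that held the maximum in each window.

```python
import torch
from torch import nn

pool = nn.MaxPool2d(kernel_size=2)
num_params = len(list(pool.parameters()))
print(num_params)  # 0: nothing to learn in a pooling layer

x = torch.tensor([[1., 2., 0., 3.],
                  [0., 1., 2., 4.],
                  [1., 2., 1., 0.],
                  [5., 2., 3., 1.]], requires_grad=True)

# MaxPool2d expects (N, C, H, W); reshape keeps the graph connected to x.
out = pool(x.reshape(1, 1, 4, 4))
out.sum().backward()

# Gradient is 1 at each window's maximum and 0 everywhere else.
print(x.grad)
```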

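Pooling Layer

This article focuses on MaxPool2d.

The main formula:

MaxPool2d applies a 2D max pooling over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size (N, C, H, W), output (N, C, H_out, W_out), and kernel_size (kH, kW) can be precisely described as: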

$$
\begin{aligned}
out(N_i, C_j, h, w) ={} & \max_{m=0,\ldots,kH-1} \max_{n=0,\ldots,kW-1} \\
& \mathrm{input}(N_i, C_j, \mathrm{stride}[0] \times h + m, \mathrm{stride}[1] \times w + n)
\end{aligned}
$$
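The formula can be transcribed almost literally into Python loops. The following is a minimal sketch for the simplest case only (no padding, no dilation, floor rounding); `naive_max_pool2d` is a hypothetical helper written for this illustration, not part of PyTorch, and it is checked against `F.max_pool2d`.

```python
import torch
import torch.nn.functional as F

def naive_max_pool2d(x, kernel_size, stride):
    """Direct translation of the formula; no padding, no dilation."""
    kH, kW = kernel_size
    sH, sW = stride
    N, C, H, W = x.shape
    H_out = (H - kH) // sH + 1
    W_out = (W - kW) // sW + 1
    out = torch.empty(N, C, H_out, W_out, dtype=x.dtype)
    for h in range(H_out):
        for w in range(W_out):
            # Window covers rows stride[0]*h + m and columns stride[1]*w + n.
            window = x[:, :, sH * h: sH * h + kH, sW * w: sW * w + kW]
            out[:, :, h, w] = window.amax(dim=(2, 3))
    return out

torch.manual_seed(0)
x = torch.randn(2, 3, 7, 9)
ref = F.max_pool2d(x, kernel_size=(3, 2), stride=(2, 1))
mine = naive_max_pool2d(x, kernel_size=(3, 2), stride=(2, 1))

print(mine.shape)              # torch.Size([2, 3, 3, 8])
print(torch.equal(mine, ref))  # True
```

The loop version is far slower than the built-in operator and is meant only to show which input elements each output element looks at.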

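If padding is non-zero, the input is implicitly padded with negative infinity on both sides for padding number of points. dilation controls the spacing between the kernel points. It is harder to describe, but the conv_arithmetic README (https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what dilation does.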
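Two small experiments may help here; both use made-up inputs. The first pools an all-negative tensor to show that the implicit padding behaves like negative infinity rather than zero (explicit zero padding via `F.pad` is shown only for contrast). The second uses an increasing 5x5 ramp so that the effect of dilation=2 is easy to read off.

```python
import torch
import torch.nn.functional as F

# 1) Padding: every input value is negative.
x = torch.tensor([[-1., -2.],
                  [-3., -4.]]).reshape(1, 1, 2, 2)

# Implicit padding of MaxPool2d: padded cells can never win the max.
implicit = F.max_pool2d(x, kernel_size=2, stride=1, padding=1)

# Explicit zero padding for contrast: the zeros win in every border window.
zero_padded = F.max_pool2d(F.pad(x, (1, 1, 1, 1), value=0.0),
                           kernel_size=2, stride=1)

print(implicit.squeeze())     # [[-1, -1, -2], [-1, -1, -2], [-3, -3, -4]]
print(zero_padded.squeeze())  # [[0, 0, 0], [0, -1, 0], [0, 0, 0]]

# 2) Dilation: a 2x2 kernel with dilation=2 samples offsets 0 and 2,
#    so it spans a 3x3 area while reading only 4 values.
y = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)
dilated = F.max_pool2d(y, kernel_size=2, stride=1, dilation=2)

print(dilated.squeeze())      # [[12, 13, 14], [17, 18, 19], [22, 23, 24]]
```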

Parameters:

Parameters exposed by the framework:

torch.nn.MaxPool2d(kernel_size, 
                   stride=None, 
                   padding=0, 
                   dilation=1, 
                   return_indices=False, 
                   ceil_mode=False)

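torch.nn.MaxPool2d is a pooling layer that downsamples a 2D image tensor. Its output is a tensor whose shape depends on the input shape, the pooling window size, the stride, padding, dilation, and ceil_mode. If return_indices is True, the layer also returns the indices of the maximum values alongside the output.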
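A short sketch of what return_indices gives back, using a made-up 4x4 input with no ties inside any window. The indices refer to positions in the flattened H x W plane of each channel, which is the form `nn.MaxUnpool2d` consumes.

```python
import torch
from torch import nn

x = torch.tensor([[1., 2., 0., 3.],
                  [0., 1., 2., 4.],
                  [1., 2., 1., 0.],
                  [5., 2., 3., 1.]]).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2)

out, idx = pool(x)           # with return_indices=True the layer returns a pair
restored = unpool(out, idx)  # maxima go back to their positions, the rest is 0

print(out.squeeze())       # [[2, 4], [5, 3]]
print(idx.squeeze())       # [[1, 7], [12, 14]]
print(restored.squeeze())
```

Unpooling does not recover the original input: every value that was not a window maximum is lost and comes back as zero.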

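Parameter meanings (name: meaning):

kernel_size: Size of the pooling window. An int gives a square window; a tuple (kH, kW) gives a rectangular window, e.g. (2, 3).

stride: Step of the pooling window. An int uses the same step for height and width; a tuple gives (height step, width step), e.g. (2, 3). Defaults to kernel_size when left as None.

padding: Number of implicit padding points added on both sides of each spatial dimension. The padded cells act as negative infinity, not zero, so they never become the maximum. An int pads height and width equally; a tuple gives (height padding, width padding), e.g. (1, 2).

dilation: Spacing between the elements of the kernel. An int uses the same spacing for height and width; a tuple gives (height spacing, width spacing), e.g. (2, 3).

return_indices: Whether to also return the indices of the maximum values. Useful for torch.nn.MaxUnpool2d later.

ceil_mode: When True, the output size is rounded up (ceil); when False, it is rounded down (floor).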
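The effect of ceil_mode is easiest to see on a small input. The sketch below pools a 5x5 tensor with a 3x3 window and the default stride (equal to the kernel size), once with each setting.

```python
import torch
from torch import nn

x = torch.tensor([[1, 2, 0, 3, 1],
                  [0, 1, 2, 3, 1],
                  [1, 2, 1, 0, 0],
                  [5, 2, 3, 1, 1],
                  [2, 1, 0, 1, 1]], dtype=torch.float32).reshape(1, 1, 5, 5)

# floor: only the one complete 3x3 window (top-left) is used.
floor_out = nn.MaxPool2d(kernel_size=3, ceil_mode=False)(x)

# ceil: partial windows on the right and bottom edges are kept as well.
ceil_out = nn.MaxPool2d(kernel_size=3, ceil_mode=True)(x)

print(floor_out.squeeze())  # 2
print(ceil_out.squeeze())   # [[2, 3], [5, 1]]
```

With ceil_mode=False the leftover two rows and two columns are dropped; with ceil_mode=True they form three extra, smaller windows.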

Example code:

import torch
import torchvision
from torch import nn
from torch.nn import MaxPool2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# CIFAR-10 test split, converted to float tensors in [0, 1]
dataset = torchvision.datasets.CIFAR10("../data",
                                       train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=True)

dataloader = DataLoader(dataset, batch_size=64)

# Small hand-written 5x5 example
input = torch.tensor([[1, 2, 0, 3, 1],
                     [0, 1, 2, 3, 1],
                     [1, 2, 1, 0, 0],
                     [5, 2, 3, 1, 1],
                     [2, 1, 0, 1, 1]], dtype=torch.float32)

# MaxPool2d expects (N, C, H, W), so reshape to (1, 1, 5, 5)
input = torch.reshape(input, (-1, 1, 5, 5))

class Touch(nn.Module):
    def __init__(self):
        super().__init__()
        # stride defaults to kernel_size, so the 3x3 windows do not overlap
        self.maxpool1 = MaxPool2d(kernel_size=3, ceil_mode=False)

    def forward(self, input):
        return self.maxpool1(input)


touch = Touch()

# With ceil_mode=False the 5x5 input yields a single value (2.)
print(touch(input))

# Log directory; pass the same path to `tensorboard --logdir`
writer = SummaryWriter("../../logs")

step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_images("input", imgs, step)
    output = touch(imgs)  # (N, 3, 32, 32) -> (N, 3, 10, 10)
    writer.add_images("output", output, step)

    step = step + 1

writer.close()
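To check the shapes produced by the loop without downloading the dataset, a random tensor can stand in for one batch. This is only a sketch: `imgs` below is random data with the shape of a CIFAR-10 batch (64 RGB images of 32x32 pixels), not real images.

```python
import torch
from torch import nn

# Random stand-in for one CIFAR-10 batch.
imgs = torch.rand(64, 3, 32, 32)

floor_pool = nn.MaxPool2d(kernel_size=3, ceil_mode=False)
ceil_pool = nn.MaxPool2d(kernel_size=3, ceil_mode=True)

floor_shape = floor_pool(imgs).shape
ceil_shape = ceil_pool(imgs).shape

print(floor_shape)  # torch.Size([64, 3, 10, 10])
print(ceil_shape)   # torch.Size([64, 3, 11, 11])
```

Pooling leaves the channel count at 3, which is why the pooled batch can still be logged with add_images as ordinary RGB images.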

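Pass the generated log directory to TensorBoard for visualization (--logdir must point to the directory that was given to SummaryWriter):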

tensorboard --logdir=logs

Visualization:

[Figure: TensorBoard view of the logged "input" and "output" image grids]

Framework source code:

class MaxPool2d(_MaxPoolNd):
    r"""Applies a 2D max pooling over an input signal composed of several input planes.

    In the simplest case, the output value of the layer with input size :math:`(N, C, H, W)`,
    output :math:`(N, C, H_{out}, W_{out})` and :attr:`kernel_size` :math:`(kH, kW)`
    can be precisely described as:

    .. math::
        \begin{aligned}
            out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\
                                    & \text{input}(N_i, C_j, \text{stride[0]} \times h + m,
                                                   \text{stride[1]} \times w + n)
        \end{aligned}

    If :attr:`padding` is non-zero, then the input is implicitly padded with negative infinity on both sides
    for :attr:`padding` number of points. :attr:`dilation` controls the spacing between the kernel points.
    It is harder to describe, but this `link`_ has a nice visualization of what :attr:`dilation` does.

    Note:
        When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding
        or the input. Sliding windows that would start in the right padded region are ignored.

    The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:

        - a single ``int`` -- in which case the same value is used for the height and width dimension
        - a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
          and the second `int` for the width dimension

    Args:
        kernel_size: the size of the window to take a max over
        stride: the stride of the window. Default value is :attr:`kernel_size`
        padding: Implicit negative infinity padding to be added on both sides
        dilation: a parameter that controls the stride of elements in the window
        return_indices: if ``True``, will return the max indices along with the outputs.
                        Useful for :class:`torch.nn.MaxUnpool2d` later
        ceil_mode: when True, will use `ceil` instead of `floor` to compute the output shape

    Shape:
        - Input: :math:`(N, C, H_{in}, W_{in})` or :math:`(C, H_{in}, W_{in})`
        - Output: :math:`(N, C, H_{out}, W_{out})` or :math:`(C, H_{out}, W_{out})`, where

          .. math::
              H_{out} = \left\lfloor\frac{H_{in} + 2 * \text{padding[0]} - \text{dilation[0]}
                    \times (\text{kernel\_size[0]} - 1) - 1}{\text{stride[0]}} + 1\right\rfloor

          .. math::
              W_{out} = \left\lfloor\frac{W_{in} + 2 * \text{padding[1]} - \text{dilation[1]}
                    \times (\text{kernel\_size[1]} - 1) - 1}{\text{stride[1]}} + 1\right\rfloor

    Examples::

        >>> # pool of square window of size=3, stride=2
        >>> m = nn.MaxPool2d(3, stride=2)
        >>> # pool of non-square window
        >>> m = nn.MaxPool2d((3, 2), stride=(2, 1))
        >>> input = torch.randn(20, 16, 50, 32)
        >>> output = m(input)

    .. _link:
        https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
    """

    kernel_size: _size_2_t
    stride: _size_2_t
    padding: _size_2_t
    dilation: _size_2_t

    def forward(self, input: Tensor):
        return F.max_pool2d(input, self.kernel_size, self.stride,
                            self.padding, self.dilation, ceil_mode=self.ceil_mode,
                            return_indices=self.return_indices)
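The output shape formula in the docstring can be checked numerically. The sketch below covers only the floor case (ceil_mode=False); `pooled_size` is a hypothetical helper written for this illustration and is compared against what nn.MaxPool2d actually returns for a few arbitrary settings.

```python
import torch
from torch import nn

def pooled_size(size_in, kernel_size, stride, padding=0, dilation=1):
    """Output length along one spatial dimension, floor mode only."""
    numerator = size_in + 2 * padding - dilation * (kernel_size - 1) - 1
    return numerator // stride + 1

# (size_in, kernel_size, stride, padding, dilation)
configs = [(50, 3, 2, 0, 1),
           (32, 3, 3, 0, 1),
           (32, 3, 2, 1, 1),
           (32, 3, 1, 1, 2)]

results = []
for size_in, k, s, p, d in configs:
    layer = nn.MaxPool2d(k, stride=s, padding=p, dilation=d)
    actual = layer(torch.zeros(1, 1, size_in, size_in)).shape[-1]
    results.append((pooled_size(size_in, k, s, p, d), actual))

print(results)  # [(24, 24), (10, 10), (16, 16), (30, 30)]

# The docstring example: (20, 16, 50, 32) with a (3, 2) window and stride (2, 1).
m = nn.MaxPool2d((3, 2), stride=(2, 1))
doc_shape = m(torch.randn(20, 16, 50, 32)).shape
print(doc_shape)  # torch.Size([20, 16, 24, 31])
```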

