Pooling Layers Explained

Latest recommended article published on 2024-08-18 20:03:35

修修爪哇

Article link: https://blog.csdn.net/ss20211121/article/details/136636722

Convolutional Neural Networks

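A convolutional neural network (CNN) is a deep learning model used mainly for image recognition, speech recognition, and natural language processing. A CNN is built from components such as convolutional layers, pooling layers, and fully connected layers.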
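
As a rough illustration of how these three kinds of layers fit together, here is a minimal PyTorch sketch. The class name TinyCNN, the layer sizes, and the 32x32 RGB input are assumptions chosen for the example; they do not come from the original article.

```python
import torch
from torch import nn


class TinyCNN(nn.Module):
    """Hypothetical toy network: convolution -> pooling -> fully connected."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # A 3x3 convolution with padding=1 keeps the 32x32 spatial size
        self.conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
        # 2x2 max pooling halves height and width: 32x32 -> 16x16
        self.pool = nn.MaxPool2d(kernel_size=2)
        # The fully connected layer maps the flattened features to class scores
        self.fc = nn.Linear(8 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv(x))
        x = self.pool(x)
        x = torch.flatten(x, start_dim=1)
        return self.fc(x)


model = TinyCNN()
batch = torch.randn(4, 3, 32, 32)  # four random stand-in "images"
logits = model(batch)
print(tuple(logits.shape))  # (4, 10)
```

The pooling layer is the only one of the three without learnable weights; it just shrinks the feature map before the fully connected layer.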

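A pooling layer reduces the size of the feature map while retaining the main feature information of the image. The two common pooling methods are max pooling and average pooling.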
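
To make the difference between the two methods concrete, the following sketch applies both to the same 4x4 tensor with a 2x2 window. The input values are made up for illustration.

```python
import torch
from torch import nn

# One sample, one channel, 4x4 pixels (values chosen only for illustration)
x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [0., 0., 1., 1.],
                  [0., 4., 1., 1.]]).reshape(1, 1, 4, 4)

max_out = nn.MaxPool2d(kernel_size=2)(x)  # keeps the largest value in each 2x2 window
avg_out = nn.AvgPool2d(kernel_size=2)(x)  # keeps the mean of each 2x2 window

print(max_out[0, 0].tolist())  # [[4.0, 8.0], [4.0, 1.0]]
print(avg_out[0, 0].tolist())  # [[2.5, 6.5], [1.0, 1.0]]
```

Both layers shrink the 4x4 map to 2x2; max pooling keeps the strongest response in each window, while average pooling smooths it.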

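CNNs are usually trained with backpropagation: the network parameters are adjusted repeatedly to minimize a loss function so that the model fits the training data better. In practice, CNNs have become a mainstream model in areas such as image classification, object detection, and speech recognition.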

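Pooling Layers

This article focuses on MaxPool2d.

The core formula:

MaxPool2d applies a 2D max pooling over an input signal composed of several input planes.

In the simplest case, the output value of a layer with input size (N, C, H, W), output (N, C, H_out, W_out), and kernel_size (kH, kW) can be described precisely as: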

$\begin{aligned} out(N_{i},C_{j},h,w)& =\max_{m=0,\ldots,kH-1}\max_{n=0,\ldots,kW-1} \\ &\mathrm{input}(N_i,C_j,\mathrm{stride}[0]\times h+m,\mathrm{stride}[1]\times w+n) \end{aligned}$
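
The formula can be read as two nested loops over the output positions with a max over the window. The sketch below implements it in plain Python for a single 2D plane, assuming no padding and no dilation; the function name max_pool2d_naive is hypothetical and is not part of PyTorch.

```python
def max_pool2d_naive(plane, kernel_size, stride):
    """Apply the formula to one 2D plane (a list of rows); no padding, no dilation."""
    kH, kW = kernel_size
    sH, sW = stride
    H, W = len(plane), len(plane[0])
    # Number of window positions that fit entirely inside the plane
    H_out = (H - kH) // sH + 1
    W_out = (W - kW) // sW + 1
    out = []
    for h in range(H_out):
        row = []
        for w in range(W_out):
            # max over m = 0..kH-1, n = 0..kW-1 of input[stride[0]*h + m][stride[1]*w + n]
            row.append(max(plane[sH * h + m][sW * w + n]
                           for m in range(kH) for n in range(kW)))
        out.append(row)
    return out


plane = [[1, 2, 0, 3, 1],
         [0, 1, 2, 3, 1],
         [1, 2, 1, 0, 0],
         [5, 2, 3, 1, 1],
         [2, 1, 0, 1, 1]]

print(max_pool2d_naive(plane, kernel_size=(3, 3), stride=(1, 1)))
# [[2, 3, 3], [5, 3, 3], [5, 3, 3]]
print(max_pool2d_naive(plane, kernel_size=(3, 3), stride=(3, 3)))
# [[2]]
```

With stride 1 the 3x3 windows overlap and produce a 3x3 output; with stride 3 only one window fits in the 5x5 plane.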

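If padding is non-zero, the input is implicitly padded with negative infinity on both sides for padding number of points. dilation controls the spacing between the kernel points. It is harder to describe, but the visualization at https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md shows what dilation does.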
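
Both behaviours can be observed on small tensors. The sketch below uses made-up inputs: an all-negative tensor to show that the padding never wins the max, and an increasing ramp to show which elements a dilated window samples.

```python
import torch
from torch import nn

# All-negative input: if the padding were zeros, the border outputs would be 0
x = torch.full((1, 1, 2, 2), -1.0)
padded = nn.MaxPool2d(kernel_size=2, stride=1, padding=1)(x)
print(padded[0, 0].tolist())
# [[-1.0, -1.0, -1.0], [-1.0, -1.0, -1.0], [-1.0, -1.0, -1.0]]

# Values 0..24 laid out row by row in a 5x5 plane
y = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)
# kernel_size=2 with dilation=2 samples rows/columns h, h+2 and w, w+2
dilated = nn.MaxPool2d(kernel_size=2, stride=1, dilation=2)(y)
print(dilated[0, 0].tolist())
# [[12.0, 13.0, 14.0], [17.0, 18.0, 19.0], [22.0, 23.0, 24.0]]
```

In the dilated case each output is the element two rows down and two columns right of the window origin, because that is the largest of the four sampled values in an increasing ramp.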

Parameters:

Parameters exposed by the framework:

torch.nn.MaxPool2d(kernel_size, 
                   stride=None, 
                   padding=0, 
                   dilation=1, 
                   return_indices=False, 
                   ceil_mode=False)

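torch.nn.MaxPool2d is a pooling layer that downsamples a 2D image tensor. Its output is a tensor whose shape depends on the input shape, the pooling window size, the stride, padding, dilation, and ceil_mode. If return_indices is True, the layer also returns the indices of the maximum values.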
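
As a small sketch of what return_indices gives back, the example below pools a made-up 4x4 tensor and then feeds the indices to torch.nn.MaxUnpool2d, which the PyTorch docstring names as their intended consumer. The indices are positions in the flattened H x W plane of each channel.

```python
import torch
from torch import nn

x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [0., 0., 1., 1.],
                  [0., 4., 1., 2.]]).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2, return_indices=True)
values, indices = pool(x)
print(values[0, 0].tolist())   # [[4.0, 8.0], [4.0, 2.0]]
print(indices[0, 0].tolist())  # [[5, 7], [13, 15]]

# MaxUnpool2d puts each max back at its recorded position and fills the rest with zeros
restored = nn.MaxUnpool2d(kernel_size=2)(values, indices)
print(restored[0, 0].tolist())
# [[0.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 8.0], [0.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 2.0]]
```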

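Parameter meanings:

Parameter	Meaning
kernel_size	Size of the pooling window. An int gives a square window; a tuple gives a rectangular window, e.g. (2, 3).
stride	Step of the pooling window along height and width. An int uses the same step in both directions; a tuple (height, width) sets them separately, e.g. (2, 3). Defaults to kernel_size.
padding	Amount of implicit negative-infinity padding (not zeros) added on both sides of the input. An int pads height and width by the same amount; a tuple (height, width) sets them separately, e.g. (1, 2).
dilation	Spacing between the elements of the pooling window. An int uses the same spacing in both directions; a tuple (height, width) sets them separately, e.g. (2, 3).
return_indices	Whether to also return the indices of the maximum values.
ceil_mode	If True, the output size is rounded up; if False, it is rounded down.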
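
The effect of ceil_mode is easiest to see on the 5x5 matrix that the article's example code defines. The sketch below runs the same kernel_size=3 layer with both settings; the variable names are chosen for this example only.

```python
import torch
from torch import nn

x = torch.tensor([[1, 2, 0, 3, 1],
                  [0, 1, 2, 3, 1],
                  [1, 2, 1, 0, 0],
                  [5, 2, 3, 1, 1],
                  [2, 1, 0, 1, 1]], dtype=torch.float32).reshape(1, 1, 5, 5)

# stride defaults to kernel_size, so windows start at rows/columns 0 and 3
floor_out = nn.MaxPool2d(kernel_size=3, ceil_mode=False)(x)
ceil_out = nn.MaxPool2d(kernel_size=3, ceil_mode=True)(x)

# ceil_mode=False drops windows that do not fit completely
print(floor_out[0, 0].tolist())  # [[2.0]]
# ceil_mode=True keeps the partial windows at the right and bottom edges
print(ceil_out[0, 0].tolist())   # [[2.0, 3.0], [5.0, 1.0]]
```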

Example code:

import torch
import torchvision
from torch import nn
from torch.nn import MaxPool2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# CIFAR10 test split, with each image converted to a float tensor
dataset = torchvision.datasets.CIFAR10("../data",
                                       train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=True)

dataloader = DataLoader(dataset, batch_size=64)

# A small 5x5 input that can be checked by hand
input = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]], dtype=torch.float32)

# MaxPool2d expects (N, C, H, W): one sample, one channel, 5x5
input = torch.reshape(input, (-1, 1, 5, 5))

class Touch(nn.Module):
    def __init__(self):
        super(Touch, self).__init__()
        # stride defaults to kernel_size, so the 3x3 windows do not overlap
        self.maxpool1 = MaxPool2d(kernel_size=3, ceil_mode=False)

    def forward(self, input):
        output = self.maxpool1(input)
        return output


touch = Touch()

# With ceil_mode=False only the top-left 3x3 window fits, so the output holds a single value
print(touch(input))

writer = SummaryWriter("../../logs")

step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_images("input", imgs, step)
    # Each 32x32 image is pooled down to 10x10
    output = touch(imgs)
    writer.add_images("output", output, step)

    step = step + 1

writer.close()

Pass the generated log files to TensorBoard for visualization, pointing --logdir at the directory that the SummaryWriter wrote to:

tensorboard --logdir=logs

Visualization:

[Figure: TensorBoard view of the "input" and "output" image grids]

Framework source code:

class MaxPool2d(_MaxPoolNd):
    r"""Applies a 2D max pooling over an input signal composed of several input planes.

    In the simplest case, the output value of the layer with input size :math:`(N, C, H, W)`,
    output :math:`(N, C, H_{out}, W_{out})` and :attr:`kernel_size` :math:`(kH, kW)`
    can be precisely described as:

    .. math::
        \begin{aligned}
            out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\
                                    & \text{input}(N_i, C_j, \text{stride[0]} \times h + m,
                                                   \text{stride[1]} \times w + n)
        \end{aligned}

    If :attr:`padding` is non-zero, then the input is implicitly padded with negative infinity on both sides
    for :attr:`padding` number of points. :attr:`dilation` controls the spacing between the kernel points.
    It is harder to describe, but this `link`_ has a nice visualization of what :attr:`dilation` does.

    Note:
        When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding
        or the input. Sliding windows that would start in the right padded region are ignored.

    The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:

        - a single ``int`` -- in which case the same value is used for the height and width dimension
        - a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
          and the second `int` for the width dimension

    Args:
        kernel_size: the size of the window to take a max over
        stride: the stride of the window. Default value is :attr:`kernel_size`
        padding: Implicit negative infinity padding to be added on both sides
        dilation: a parameter that controls the stride of elements in the window
        return_indices: if ``True``, will return the max indices along with the outputs.
                        Useful for :class:`torch.nn.MaxUnpool2d` later
        ceil_mode: when True, will use `ceil` instead of `floor` to compute the output shape

    Shape:
        - Input: :math:`(N, C, H_{in}, W_{in})` or :math:`(C, H_{in}, W_{in})`
        - Output: :math:`(N, C, H_{out}, W_{out})` or :math:`(C, H_{out}, W_{out})`, where

          .. math::
              H_{out} = \left\lfloor\frac{H_{in} + 2 * \text{padding[0]} - \text{dilation[0]}
                    \times (\text{kernel\_size[0]} - 1) - 1}{\text{stride[0]}} + 1\right\rfloor

          .. math::
              W_{out} = \left\lfloor\frac{W_{in} + 2 * \text{padding[1]} - \text{dilation[1]}
                    \times (\text{kernel\_size[1]} - 1) - 1}{\text{stride[1]}} + 1\right\rfloor

    Examples::

        >>> # pool of square window of size=3, stride=2
        >>> m = nn.MaxPool2d(3, stride=2)
        >>> # pool of non-square window
        >>> m = nn.MaxPool2d((3, 2), stride=(2, 1))
        >>> input = torch.randn(20, 16, 50, 32)
        >>> output = m(input)

    .. _link:
        https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
    """

    kernel_size: _size_2_t
    stride: _size_2_t
    padding: _size_2_t
    dilation: _size_2_t

    def forward(self, input: Tensor):
        return F.max_pool2d(input, self.kernel_size, self.stride,
                            self.padding, self.dilation, ceil_mode=self.ceil_mode,
                            return_indices=self.return_indices)
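
The Shape formula in the docstring can be checked with a few lines of plain Python. The helper below is a sketch for a single dimension; its name pool_output_size is hypothetical, and the extra ceil_mode adjustment is my reading of the docstring note that windows starting in the right padded region are ignored.

```python
def pool_output_size(size_in, kernel_size, stride=None, padding=0,
                     dilation=1, ceil_mode=False):
    """Output length along one dimension, following the docstring's Shape formula."""
    if stride is None:
        stride = kernel_size  # same default as MaxPool2d
    numerator = size_in + 2 * padding - dilation * (kernel_size - 1) - 1
    if ceil_mode:
        out = -(-numerator // stride) + 1  # integer ceiling division
        # Drop a last window that would start in the right padding
        if (out - 1) * stride >= size_in + padding:
            out -= 1
    else:
        out = numerator // stride + 1  # integer floor division
    return out


# 5x5 example matrix with kernel_size=3
print(pool_output_size(5, 3))                  # 1
print(pool_output_size(5, 3, ceil_mode=True))  # 2
# 32x32 CIFAR10 images with kernel_size=3
print(pool_output_size(32, 3))                 # 10
# Docstring example: input 50x32, kernel (3, 2), stride (2, 1)
print(pool_output_size(50, 3, stride=2), pool_output_size(32, 2, stride=1))  # 24 31
```

Using integer division avoids floating-point rounding in the floor and ceiling steps.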