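Convolutional Neural Networks
A convolutional neural network (CNN) is a deep learning model used mainly for image recognition, speech recognition, and natural language processing. A CNN is built from components such as convolutional layers, pooling layers, and fully connected layers.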
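A pooling layer reduces the spatial size of a feature map while retaining its most salient features. The common variants are max pooling (Max Pooling) and average pooling (Average Pooling).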
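As a rough illustration of the difference between the two pooling modes, the sketch below pools a small 4×4 feature map with non-overlapping 2×2 windows in plain Python. The helper names `pool2d` and `average` and the sample values are made up for this example; this is not how any framework implements pooling.

```python
def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def pool2d(grid, k, reduce_fn):
    """Pool a 2D list with non-overlapping k x k windows (stride equals k)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - k + 1, k):
        out_row = []
        for c in range(0, cols - k + 1, k):
            # Collect the k*k values covered by the window whose corner is (r, c).
            window = [grid[r + i][c + j] for i in range(k) for j in range(k)]
            out_row.append(reduce_fn(window))
        out.append(out_row)
    return out


feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 2],
    [0, 2, 9, 1],
    [4, 2, 3, 3],
]

max_pooled = pool2d(feature_map, 2, max)      # keeps the strongest response
avg_pooled = pool2d(feature_map, 2, average)  # smooths each window

print(max_pooled)  # [[7, 4], [4, 9]]
print(avg_pooled)  # [[4.0, 2.0], [2.0, 4.0]]
```

Both results are 2×2, so the feature map shrinks by the same factor either way; only the value kept per window differs.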
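CNNs are usually trained with backpropagation: the network parameters are adjusted iteratively to minimize a loss function so that the model fits the training data better. In practice, CNNs have become a mainstream model for tasks such as image classification, object detection, and speech recognition.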
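Pooling Layer
This article focuses on MaxPool2d.
Main mathematical formula:
MaxPool2d applies 2D max pooling over an input signal composed of several input planes.
In the simplest case, the output value of a layer with input size (N, C, H, W), output (N, C, H_out, W_out), and kernel_size (kH, kW) can be precisely described as: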
$$
\begin{aligned}
out(N_i, C_j, h, w) ={} & \max_{m=0,\ldots,kH-1} \max_{n=0,\ldots,kW-1} \\
& \mathrm{input}(N_i, C_j, \mathrm{stride}[0] \times h + m, \mathrm{stride}[1] \times w + n)
\end{aligned}
$$
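To make the formula concrete, here is a minimal pure-Python sketch that evaluates it for a single plane, that is, for one fixed pair (N_i, C_j). It assumes no padding, dilation 1, and floor rounding of the output size. The function name `max_pool2d_plane` is hypothetical, and the 5×5 matrix is the one defined in this article's example code.

```python
def max_pool2d_plane(plane, kernel_size, stride=None):
    """Max-pool one H x W plane (a list of lists).

    Assumes no padding, dilation 1, and floor rounding of the output size.
    """
    kh, kw = kernel_size
    # As in torch.nn.MaxPool2d, the stride falls back to the kernel size.
    sh, sw = stride if stride is not None else kernel_size
    h_in, w_in = len(plane), len(plane[0])
    h_out = (h_in - kh) // sh + 1
    w_out = (w_in - kw) // sw + 1
    out = []
    for h in range(h_out):
        row = []
        for w in range(w_out):
            # out(h, w) = max over m < kH, n < kW of input(sh*h + m, sw*w + n)
            row.append(max(plane[sh * h + m][sw * w + n]
                           for m in range(kh) for n in range(kw)))
        out.append(row)
    return out


plane = [
    [1, 2, 0, 3, 1],
    [0, 1, 2, 3, 1],
    [1, 2, 1, 0, 0],
    [5, 2, 3, 1, 1],
    [2, 1, 0, 1, 1],
]

print(max_pool2d_plane(plane, (3, 3)))          # [[2]]
print(max_pool2d_plane(plane, (3, 3), (1, 1)))  # [[2, 3, 3], [5, 3, 3], [5, 3, 3]]
```

With the default stride of 3, only one full 3×3 window fits into the 5×5 plane, so the output collapses to a single value. With stride 1 the windows overlap and the output is 3×3.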
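If padding is non-zero, the input is implicitly padded with negative infinity on both sides for padding number of points. dilation controls the spacing between the kernel points. It is hard to describe in words, but the conv_arithmetic README (https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what dilation does.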
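Padding and dilation are easier to see in one dimension. The sketch below is a simplified 1D analogue written for this article (the function `max_pool1d` is hypothetical, not a PyTorch call) and assumes floor rounding of the output length. It pads with negative infinity and steps through each window in increments of `dilation`.

```python
NEG_INF = float("-inf")


def max_pool1d(values, kernel_size, stride, padding=0, dilation=1):
    """1D max pooling with implicit -inf padding and dilation (floor mode)."""
    padded = [NEG_INF] * padding + list(values) + [NEG_INF] * padding
    # Number of input positions spanned by one (possibly dilated) window.
    span = dilation * (kernel_size - 1) + 1
    n_out = (len(padded) - span) // stride + 1
    return [
        max(padded[stride * i + dilation * m] for m in range(kernel_size))
        for i in range(n_out)
    ]


signal = [-5.0, -2.0, -7.0, -1.0]

# The -inf padding can never win the max, so negative inputs survive.
padded_result = max_pool1d(signal, kernel_size=2, stride=2, padding=1)
print(padded_result)  # [-5.0, -2.0, -1.0]

# With dilation=2, a size-2 window compares elements two positions apart.
dilated_result = max_pool1d(signal, kernel_size=2, stride=1, dilation=2)
print(dilated_result)  # [-5.0, -1.0]
```

Had the padding been zeros, the zeros would have won the first and last windows of the padded call, which is why max pooling pads with negative infinity instead.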
Parameters:
Parameters that can be set in the framework:
torch.nn.MaxPool2d(kernel_size,
                   stride=None,
                   padding=0,
                   dilation=1,
                   return_indices=False,
                   ceil_mode=False)
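torch.nn.MaxPool2d is a pooling layer that downsamples a 2D image tensor. Its output is a tensor whose shape depends on the input shape, the pooling window size, the stride, padding, dilation, and ceil_mode. If return_indices is True, the layer additionally returns the indices of the maximum values.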
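What "indices of the maximum values" means can be sketched in plain Python. The example assumes that each index is the flattened position (row × W + col) of the maximum within its H×W plane, which is my understanding of what PyTorch reports, so treat that as an assumption rather than a specification. The function name and the sample plane are invented for illustration, and the values are all distinct so that tie-breaking never comes into play.

```python
def max_pool2d_with_indices(plane, k, stride):
    """Return (values, indices) for k x k max pooling over one H x W plane.

    Assumption: an index is the flattened position row * W + col of the
    maximum inside the input plane.
    """
    h_in, w_in = len(plane), len(plane[0])
    h_out = (h_in - k) // stride + 1
    w_out = (w_in - k) // stride + 1
    values, indices = [], []
    for h in range(h_out):
        val_row, idx_row = [], []
        for w in range(w_out):
            # Pair every value in the window with its flattened position.
            candidates = [
                (plane[stride * h + m][stride * w + n],
                 (stride * h + m) * w_in + (stride * w + n))
                for m in range(k) for n in range(k)
            ]
            best_val, best_idx = max(candidates, key=lambda c: c[0])
            val_row.append(best_val)
            idx_row.append(best_idx)
        values.append(val_row)
        indices.append(idx_row)
    return values, indices


sample = [
    [1, 8, 3, 4],
    [5, 6, 7, 2],
    [9, 10, 11, 12],
    [16, 14, 15, 13],
]

values, indices = max_pool2d_with_indices(sample, k=2, stride=2)
print(values)   # [[8, 7], [16, 15]]
print(indices)  # [[1, 6], [12, 14]]
```

Keeping the positions alongside the values is what makes an unpooling step possible later: each maximum can be written back to the place it came from.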
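Parameter meanings:
Parameter | Meaning |
---|---|
kernel_size | Size of the pooling window. An int gives a square window; a tuple gives a rectangular window, e.g. (2, 3). |
stride | Step of the pooling window along the height and width dimensions. An int applies the same step to both; a tuple (height, width) sets them separately, e.g. (2, 3). Defaults to kernel_size. |
padding | Number of implicit padding points added on both sides of the input. For max pooling the padding value is negative infinity, not 0. An int applies the same amount to height and width; a tuple (height, width) sets them separately, e.g. (1, 2). |
dilation | Spacing between the elements of the pooling window. An int uses the same spacing in both dimensions; a tuple (height, width) sets them separately, e.g. (2, 3). |
return_indices | Whether to also return the indices of the maximum values. Useful for torch.nn.MaxUnpool2d later. |
ceil_mode | When True, the output size is rounded up (ceil); when False, it is rounded down (floor). |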
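How these parameters combine into an output size can be sketched with a small calculator for one spatial dimension. It follows the shape formula quoted in the MaxPool2d docstring, plus the docstring's note that with ceil_mode=True a window starting in the right padding is ignored. The function name `pool_output_size` is hypothetical, and this is a sketch of the arithmetic, not PyTorch's own code.

```python
def pool_output_size(in_size, kernel_size, stride=None, padding=0,
                     dilation=1, ceil_mode=False):
    """Output length along one spatial dimension of a max-pooling layer."""
    if stride is None:
        stride = kernel_size  # stride defaults to kernel_size
    numerator = in_size + 2 * padding - dilation * (kernel_size - 1) - 1
    if ceil_mode:
        out = -(-numerator // stride) + 1  # integer ceiling division
        # A last window that would start in the right padding is dropped.
        if (out - 1) * stride >= in_size + padding:
            out -= 1
    else:
        out = numerator // stride + 1      # integer floor division
    return out


# The 5x5 matrix from this article with kernel_size=3 (stride defaults to 3).
print(pool_output_size(5, 3))                  # 1
print(pool_output_size(5, 3, ceil_mode=True))  # 2

# A 32-pixel CIFAR-10 image side with kernel_size=3.
print(pool_output_size(32, 3))                 # 10

# The docstring example: 50 x 32 input, kernel (3, 2), stride (2, 1).
print(pool_output_size(50, 3, stride=2), pool_output_size(32, 2, stride=1))  # 24 31
```

With ceil_mode=True the 5×5 input yields a 2×2 output because partial windows at the right and bottom edges are kept; with the default floor rounding they are discarded.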
Example code:
import torch
import torchvision
from torch import nn
from torch.nn import MaxPool2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# CIFAR-10 test split; ToTensor() yields float tensors of shape (3, 32, 32)
dataset = torchvision.datasets.CIFAR10("../data",
                                       train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=True)
dataloader = DataLoader(dataset, batch_size=64)

# Small 5x5 test input reshaped to (N, C, H, W) = (1, 1, 5, 5).
# Handy for checking the layer by hand; it is not used in the loop below.
input = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]], dtype=torch.float32)
input = torch.reshape(input, (-1, 1, 5, 5))


class Touch(nn.Module):
    def __init__(self):
        super(Touch, self).__init__()
        # stride defaults to kernel_size, so the 3x3 windows do not overlap
        self.maxpool1 = MaxPool2d(kernel_size=3, ceil_mode=False)

    def forward(self, input):
        output = self.maxpool1(input)
        return output


touch = Touch()

# Log every batch before and after pooling so TensorBoard can show both
writer = SummaryWriter("../../logs")
step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_images("input", imgs, step)
    output = touch(imgs)  # (B, 3, 32, 32) -> (B, 3, 10, 10)
    writer.add_images("output", output, step)
    step = step + 1

writer.close()
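Pass the generated log files to TensorBoard for visualization. The --logdir argument must point to the directory given to SummaryWriter (../../logs in the code above), so adjust the path to your current working directory: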
tensorboard --logdir=logs
Visualization:
Framework source code:
class MaxPool2d(_MaxPoolNd):
    r"""Applies a 2D max pooling over an input signal composed of several input planes.

    In the simplest case, the output value of the layer with input size :math:`(N, C, H, W)`,
    output :math:`(N, C, H_{out}, W_{out})` and :attr:`kernel_size` :math:`(kH, kW)`
    can be precisely described as:

    .. math::
        \begin{aligned}
            out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\
                                    & \text{input}(N_i, C_j, \text{stride[0]} \times h + m,
                                                   \text{stride[1]} \times w + n)
        \end{aligned}

    If :attr:`padding` is non-zero, then the input is implicitly padded with negative infinity on both sides
    for :attr:`padding` number of points. :attr:`dilation` controls the spacing between the kernel points.
    It is harder to describe, but this `link`_ has a nice visualization of what :attr:`dilation` does.

    Note:
        When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding
        or the input. Sliding windows that would start in the right padded region are ignored.

    The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:

        - a single ``int`` -- in which case the same value is used for the height and width dimension
        - a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
          and the second `int` for the width dimension

    Args:
        kernel_size: the size of the window to take a max over
        stride: the stride of the window. Default value is :attr:`kernel_size`
        padding: Implicit negative infinity padding to be added on both sides
        dilation: a parameter that controls the stride of elements in the window
        return_indices: if ``True``, will return the max indices along with the outputs.
                        Useful for :class:`torch.nn.MaxUnpool2d` later
        ceil_mode: when True, will use `ceil` instead of `floor` to compute the output shape

    Shape:
        - Input: :math:`(N, C, H_{in}, W_{in})` or :math:`(C, H_{in}, W_{in})`
        - Output: :math:`(N, C, H_{out}, W_{out})` or :math:`(C, H_{out}, W_{out})`, where

          .. math::
              H_{out} = \left\lfloor\frac{H_{in} + 2 * \text{padding[0]} - \text{dilation[0]}
                    \times (\text{kernel\_size[0]} - 1) - 1}{\text{stride[0]}} + 1\right\rfloor

          .. math::
              W_{out} = \left\lfloor\frac{W_{in} + 2 * \text{padding[1]} - \text{dilation[1]}
                    \times (\text{kernel\_size[1]} - 1) - 1}{\text{stride[1]}} + 1\right\rfloor

    Examples::

        >>> # pool of square window of size=3, stride=2
        >>> m = nn.MaxPool2d(3, stride=2)
        >>> # pool of non-square window
        >>> m = nn.MaxPool2d((3, 2), stride=(2, 1))
        >>> input = torch.randn(20, 16, 50, 32)
        >>> output = m(input)

    .. _link:
        https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
    """

    kernel_size: _size_2_t
    stride: _size_2_t
    padding: _size_2_t
    dilation: _size_2_t

    def forward(self, input: Tensor):
        return F.max_pool2d(input, self.kernel_size, self.stride,
                            self.padding, self.dilation, ceil_mode=self.ceil_mode,
                            return_indices=self.return_indices)