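Convolutional Neural Networks
A convolutional neural network (CNN) is a deep learning model used mainly for image recognition, speech recognition, and natural language processing. A CNN is built from convolutional layers, pooling layers, and fully connected layers.
The convolutional layer is the core of a CNN. It slides filters (also called kernels) over the input image and computes a convolution at each position, which extracts feature information from the image. Because a layer holds several different filters, it can extract several different kinds of features, such as edges, textures, and shapes.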
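As a rough illustration of how a single filter picks out one kind of feature, the sketch below applies a hand-written Sobel kernel to a toy image with `torch.nn.functional.conv2d`. The toy image and the choice of a Sobel kernel are assumptions made for this example; in a trained CNN the filter values are learned rather than written by hand.

```python
import torch
import torch.nn.functional as F

# Toy 1x1x6x6 "image": left half dark (0), right half bright (1).
image = torch.zeros(1, 1, 6, 6)
image[..., 3:] = 1.0

# Hand-written Sobel kernel that responds to horizontal intensity changes,
# i.e. vertical edges. Shape: (out_channels, in_channels, kH, kW).
sobel_x = torch.tensor([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]]).reshape(1, 1, 3, 3)

# No padding, so the 6x6 input shrinks to a 4x4 feature map.
edges = F.conv2d(image, sobel_x)
print(edges[0, 0])
```

The feature map is non-zero only in the columns whose 3x3 window straddles the dark-to-bright boundary, which is what "extracting an edge feature" means in practice.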
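CNNs are usually trained with backpropagation: an optimizer minimizes a loss function by repeatedly adjusting the network parameters so that the model fits the training data better. In practice, CNNs have become a mainstream model for image classification, object detection, and speech recognition.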
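A single training step can be sketched as follows. The tiny network, the random stand-in batch, the SGD optimizer, and the learning rate are all assumptions chosen to keep the example short; they are not taken from the article.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Tiny CNN: one convolutional layer, one pooling layer, one fully connected layer.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),             # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Random stand-in batch: 16 RGB images of 32x32 pixels and 16 class labels.
images = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, 10, (16,))

weight_before = model[0].weight.detach().clone()

optimizer.zero_grad()                   # clear gradients from any previous step
loss = loss_fn(model(images), labels)   # forward pass and loss
loss.backward()                         # backpropagation fills .grad on every parameter
optimizer.step()                        # adjust the parameters using those gradients

print(loss.item())
```

Repeating the last four statements over many batches is the usual training loop.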
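Convolutional layer
This article focuses on Conv2d, the most commonly used variant, which operates on two-dimensional images.
The main formula:
Here ⋆ is the valid 2D cross-correlation operator, N is the batch size, C denotes the number of channels, H is the height of the input plane in pixels, and W is its width in pixels.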
$$\mathrm{out}(N_i, C_{\mathrm{out}_j}) = \mathrm{bias}(C_{\mathrm{out}_j}) + \sum_{k=0}^{C_{\mathrm{in}}-1} \mathrm{weight}(C_{\mathrm{out}_j}, k) \star \mathrm{input}(N_i, k)$$
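To make the formula concrete, the following sketch recomputes the output of a small `nn.Conv2d` one term at a time: for each sample and each output channel it starts from the bias and adds the cross-correlation of every input channel with the matching kernel slice. The layer sizes are arbitrary values picked for the example, and the explicit loops are for illustration only, not for speed.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3)
x = torch.randn(4, 3, 8, 8)  # (N, C_in, H, W)

with torch.no_grad():
    reference = conv(x)  # shape (4, 2, 6, 6)
    manual = torch.zeros_like(reference)
    for i in range(x.shape[0]):               # batch index N_i
        for j in range(conv.out_channels):    # output channel C_out_j
            # Start from bias(C_out_j), broadcast over the output plane.
            acc = torch.zeros(reference.shape[-2:]) + conv.bias[j]
            for k in range(conv.in_channels):  # sum over input channels k
                plane = x[i, k][None, None]             # input(N_i, k) as 1x1xHxW
                kernel = conv.weight[j, k][None, None]  # weight(C_out_j, k) as 1x1x3x3
                acc = acc + F.conv2d(plane, kernel)[0, 0]
            manual[i, j] = acc

max_diff = (manual - reference).abs().max().item()
print(max_diff)
```

The two results agree up to floating-point rounding, which confirms that the layer computes a sum of per-channel cross-correlations plus a bias.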
Parameters:
The constructor arguments listed in the official PyTorch documentation:
torch.nn.Conv2d(in_channels,
                out_channels,
                kernel_size,
                stride=1,
                padding=0,
                dilation=1,
                groups=1,
                bias=True,
                padding_mode='zeros',
                device=None,
                dtype=None)
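Meaning of each parameter:
Parameter | Meaning |
---|---|
in_channels | Number of channels in the input, i.e. the depth of the input image (3 for an RGB image). |
out_channels | Number of output channels, i.e. the number of filters in the layer; this sets the depth of the output feature map. |
kernel_size | Size of the convolution kernel: an integer or a tuple (H, W), where H and W are the kernel height and width. |
stride | Stride of the convolution: an integer or a tuple (S_H, S_W), where S_H and S_W are the strides in the height and width directions. Default: 1. |
padding | Amount of padding added to the input: an integer, a tuple (P_H, P_W) giving the padding in the height and width directions, or the string 'valid' or 'same'. Default: 0. |
dilation | Dilation rate of the kernel: an integer or a tuple. Default: 1. A value greater than 1 increases the spacing between kernel elements, which enlarges the receptive field of the convolution. |
groups | Number of blocked connections from input channels to output channels: an integer, default 1. Both in_channels and out_channels must be divisible by groups. With groups=1, every output channel is computed from all input channels. With groups equal to in_channels, each input channel is convolved with its own set of filters (a depthwise convolution). |
bias | Whether to add a learnable bias. Default: True. If set to False, no bias term is added to the output. |
padding_mode | Padding mode: 'zeros', 'reflect', 'replicate', or 'circular'. Default: 'zeros', which pads with zeros. |
device | Device on which the layer's parameters are allocated (CPU or GPU). |
dtype | Data type of the layer's parameters. |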
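The effect of these parameters on tensor shapes can be checked directly. The sketch below compares the output-size formula from the PyTorch documentation with what `nn.Conv2d` actually produces, and prints the weight shape for several `groups` values. The helper name `conv2d_out_size` and the specific configurations are assumptions made for this example.

```python
import torch
from torch import nn

def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Hypothetical helper: output length along one spatial dimension,
    floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

x = torch.randn(1, 3, 32, 32)
configs = [
    dict(kernel_size=3),                       # 32 -> 30
    dict(kernel_size=3, stride=2),             # 32 -> 15
    dict(kernel_size=3, padding=1),            # 32 -> 32
    dict(kernel_size=3, dilation=2),           # 32 -> 28
    dict(kernel_size=5, stride=2, padding=2),  # 32 -> 16
]
predicted_sizes, actual_sizes = [], []
for cfg in configs:
    y = nn.Conv2d(3, 6, **cfg)(x)
    predicted_sizes.append(conv2d_out_size(32, **cfg))
    actual_sizes.append(y.shape[-1])
    print(cfg, tuple(y.shape))

# groups splits the input channels: each filter sees in_channels / groups of them,
# so the weight shape is (out_channels, in_channels / groups, kH, kW).
weight_shapes = {g: tuple(nn.Conv2d(6, 12, 3, groups=g).weight.shape) for g in (1, 3, 6)}
print(weight_shapes)
```

With `groups=6` each filter has a single input channel, which is the depthwise case described in the table.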
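Example code:
import torch
import torchvision
from torch import nn
from torch.nn import Conv2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# Load the CIFAR10 test split (downloaded automatically if it is missing)
dataset = torchvision.datasets.CIFAR10("../data",
                                       train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=True)
# Wrap the dataset in a DataLoader that yields batches of 64 images
dataloader = DataLoader(dataset, batch_size=64)

# Define the Touch model
class Touch(nn.Module):
    def __init__(self):
        super(Touch, self).__init__()
        # 2D convolution: 3 input channels (RGB), 6 output channels, 3x3 kernel
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x

# Instantiate the Touch model
touch = Touch()
# Log the convolution inputs and outputs so that TensorBoard can display them
writer = SummaryWriter("../../logs")
# Initialize the step counter
step = 0
# Write the convolution result of every batch to the log
for data in dataloader:
    imgs, targets = data
    output = touch(imgs)
    # For a full batch: imgs is [64, 3, 32, 32], output is [64, 6, 30, 30]
    print(imgs.shape)
    print(output.shape)
    writer.add_images("input", imgs, step)
    # add_images cannot render 6-channel images, so fold the extra channels
    # into the batch dimension to get 3-channel images
    output = torch.reshape(output, (-1, 3, 30, 30))
    writer.add_images("output", output, step)
    # Advance the step counter
    step = step + 1
# Flush pending events and close the log file
writer.close()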
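The shapes in the loop above can be reproduced without downloading CIFAR10. This sketch assumes a random tensor with the same shape as one full batch (64 RGB images of 32x32 pixels) in place of the real data.

```python
import torch
from torch import nn

# Same layer configuration as conv1 in the Touch model.
conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

# Random stand-in for one full CIFAR10 batch.
imgs = torch.randn(64, 3, 32, 32)

with torch.no_grad():
    output = conv1(imgs)
print(output.shape)    # torch.Size([64, 6, 30, 30])

# Folding 6 channels into 3 doubles the batch dimension: 64 * 6 / 3 = 128.
reshaped = torch.reshape(output, (-1, 3, 30, 30))
print(reshaped.shape)  # torch.Size([128, 3, 30, 30])
```

A 3x3 kernel without padding removes one pixel from each border, which is why 32x32 becomes 30x30. The reshape only regroups values for display, so each logged "image" combines three unrelated feature maps as if they were RGB channels.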
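Point TensorBoard at the log directory to view the results (adjust the path so that it resolves to the directory passed to SummaryWriter):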
tensorboard --logdir=logs
Visualization:
Framework source code:
class Conv2d(_ConvNd):
    __doc__ = r"""Applies a 2D convolution over an input signal composed of several input
    planes.
In the simplest case, the output value of the layer with input size
:math:`(N, C_{\text{in}}, H, W)` and output :math:`(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})`
can be precisely described as:
.. math::
\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) +
\sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)
where :math:`\star` is the valid 2D `cross-correlation`_ operator,
:math:`N` is a batch size, :math:`C` denotes a number of channels,
:math:`H` is a height of input planes in pixels, and :math:`W` is
width in pixels.
""" + r"""
This module supports :ref:`TensorFloat32<tf32_on_ampere>`.
On certain ROCm devices, when using float16 inputs this module will use :ref:`different precision<fp16_on_mi200>` for backward.
* :attr:`stride` controls the stride for the cross-correlation, a single
number or a tuple.
* :attr:`padding` controls the amount of padding applied to the input. It
can be either a string {{'valid', 'same'}} or an int / a tuple of ints giving the
amount of implicit padding applied on both sides.
* :attr:`dilation` controls the spacing between the kernel points; also
known as the à trous algorithm. It is harder to describe, but this `link`_
has a nice visualization of what :attr:`dilation` does.
{groups_note}
The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:
- a single ``int`` -- in which case the same value is used for the height and width dimension
- a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
and the second `int` for the width dimension
Note:
{depthwise_separable_note}
Note:
{cudnn_reproducibility_note}
Note:
``padding='valid'`` is the same as no padding. ``padding='same'`` pads
the input so the output has the shape as the input. However, this mode
doesn't support any stride values other than 1.
Note:
This module supports complex data types i.e. ``complex32, complex64, complex128``.
Args:
in_channels (int): Number of channels in the input image
out_channels (int): Number of channels produced by the convolution
kernel_size (int or tuple): Size of the convolving kernel
stride (int or tuple, optional): Stride of the convolution. Default: 1
padding (int, tuple or str, optional): Padding added to all four sides of
the input. Default: 0
padding_mode (str, optional): ``'zeros'``, ``'reflect'``,
``'replicate'`` or ``'circular'``. Default: ``'zeros'``
dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
groups (int, optional): Number of blocked connections from input
channels to output channels. Default: 1
bias (bool, optional): If ``True``, adds a learnable bias to the
output. Default: ``True``
""".format(**reproducibility_notes, **convolution_notes) + r"""
Shape:
- Input: :math:`(N, C_{in}, H_{in}, W_{in})` or :math:`(C_{in}, H_{in}, W_{in})`
- Output: :math:`(N, C_{out}, H_{out}, W_{out})` or :math:`(C_{out}, H_{out}, W_{out})`, where
.. math::
H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0]
\times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor
.. math::
W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1]
\times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor
Attributes:
weight (Tensor): the learnable weights of the module of shape
:math:`(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}},`
:math:`\text{kernel\_size[0]}, \text{kernel\_size[1]})`.
The values of these weights are sampled from
:math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
bias (Tensor): the learnable bias of the module of shape
(out_channels). If :attr:`bias` is ``True``,
then the values of these weights are
sampled from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
:math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
Examples:
>>> # With square kernels and equal stride
>>> m = nn.Conv2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> # non-square kernels and unequal stride and with padding and dilation
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = m(input)
.. _cross-correlation:
https://en.wikipedia.org/wiki/Cross-correlation
.. _link:
https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
"""
    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: _size_2_t,
        stride: _size_2_t = 1,
        padding: Union[str, _size_2_t] = 0,
        dilation: _size_2_t = 1,
        groups: int = 1,
        bias: bool = True,
        padding_mode: str = 'zeros',  # TODO: refine this type
        device=None,
        dtype=None
    ) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        kernel_size_ = _pair(kernel_size)
        stride_ = _pair(stride)
        padding_ = padding if isinstance(padding, str) else _pair(padding)
        dilation_ = _pair(dilation)
        super().__init__(
            in_channels, out_channels, kernel_size_, stride_, padding_, dilation_,
            False, _pair(0), groups, bias, padding_mode, **factory_kwargs)

    def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]):
        if self.padding_mode != 'zeros':
            return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
                            weight, bias, self.stride,
                            _pair(0), self.dilation, self.groups)
        return F.conv2d(input, weight, bias, self.stride,
                        self.padding, self.dilation, self.groups)

    def forward(self, input: Tensor) -> Tensor:
        return self._conv_forward(input, self.weight, self.bias)
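The two branches of `_conv_forward` can be checked from the outside using only public functions. In the sketch below, a layer with `padding_mode='reflect'` is compared against an explicit `F.pad` followed by an unpadded `F.conv2d`, and a zero-padded layer is compared against `F.conv2d` with its built-in padding. The layer sizes are arbitrary, and the pad widths `(1, 1, 1, 1)` are written out by hand for `padding=1` instead of reading the private attribute.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8)

with torch.no_grad():
    # Non-zero padding mode: pad explicitly, then convolve with padding 0.
    reflect_conv = nn.Conv2d(3, 4, kernel_size=3, padding=1, padding_mode='reflect')
    module_out = reflect_conv(x)
    # F.pad takes widths as (left, right, top, bottom) for the last two dimensions.
    padded = F.pad(x, (1, 1, 1, 1), mode='reflect')
    manual_out = F.conv2d(padded, reflect_conv.weight, reflect_conv.bias,
                          stride=1, padding=0)

    # Zero padding: F.conv2d applies the padding itself.
    zeros_conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
    zeros_module_out = zeros_conv(x)
    zeros_manual_out = F.conv2d(x, zeros_conv.weight, zeros_conv.bias,
                                stride=1, padding=1)

print(torch.allclose(module_out, manual_out, atol=1e-5))
print(torch.allclose(zeros_module_out, zeros_manual_out, atol=1e-5))
```

This split exists because `F.conv2d` only pads with zeros; every other padding mode has to be applied to the input before the convolution runs.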