CV Basic Algorithms 01 - AlexNet

Dongyang's study notes. Persistence is victory!

AlexNet is of epoch-making significance:

  • It opened the era in which convolutional neural networks dominate computer vision
  • It accelerated the real-world deployment of computer vision applications

Network Architecture

This corresponds to Section 3 of the paper.

ReLU Nonlinearity

The standard way to model a neuron's output f as a function of its input x is f(x) = tanh(x) or f(x) = (1 + e^(-x))^(-1). In terms of training time with gradient descent, these saturating nonlinearities are much slower than the non-saturating nonlinearity f(x) = max(0, x). Following Nair and Hinton [20], neurons with this nonlinearity are called Rectified Linear Units (ReLUs). Deep convolutional neural networks with ReLUs train several times faster than their equivalents with tanh units. Figure 1 demonstrates this with the number of iterations required to reach 25% training error on CIFAR-10 for a particular four-layer convolutional network. The plot shows that with traditional saturating neuron models it would not have been possible to experiment with such large networks in this work.

[Figure 1 from the paper: iterations needed to reach 25% training error on CIFAR-10, ReLU vs. tanh]
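
To make the saturating vs. non-saturating point concrete, here is a minimal PyTorch sketch (not from the paper) comparing the gradients of tanh and ReLU at a large positive input:

import torch

x = torch.tensor(5.0, requires_grad=True)

torch.tanh(x).backward()
print(x.grad)        # ~1.8e-4: tanh has saturated, the gradient nearly vanishes

x.grad = None        # reset before the second backward pass
torch.relu(x).backward()
print(x.grad)        # 1.0: ReLU does not saturate for positive inputs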

Training on Multiple GPUs

Multi-GPU training: the original network was split across two GPUs because a single GPU of the time did not have enough memory for the whole model; the PyTorch implementation below runs on a single device.

Local Response Normalization (dropped)

LRN is omitted from the official PyTorch implementation.

Overlapping Pooling (plus adaptive average pooling in the PyTorch implementation)

The official PyTorch implementation keeps the overlapping max pooling (kernel_size=3, stride=2) in the feature extractor and additionally applies AdaptiveAvgPool2d((6, 6)) before the classifier, so the network accepts inputs of varying sizes.
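
A small sketch of the pooling pieces mentioned above (the 13 × 13 input size is just an example, matching AlexNet's conv5 output): overlapping max pooling uses a window larger than its stride, and AdaptiveAvgPool2d((6, 6)) produces a 6 × 6 map for any input size.

import torch
import torch.nn as nn

x = torch.randn(1, 256, 13, 13)                      # e.g. AlexNet's conv5 output

overlap = nn.MaxPool2d(kernel_size=3, stride=2)      # overlapping: kernel > stride
print(overlap(x).shape)                              # torch.Size([1, 256, 6, 6])

non_overlap = nn.MaxPool2d(kernel_size=2, stride=2)  # traditional non-overlapping
print(non_overlap(x).shape)                          # torch.Size([1, 256, 6, 6])

adaptive = nn.AdaptiveAvgPool2d((6, 6))              # fixed 6x6 output...
print(adaptive(torch.randn(1, 256, 9, 9)).shape)     # ...for any input size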

Official PyTorch implementation

import torch
import torch.nn as nn


class AlexNet(nn.Module):

    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            # conv 1
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), # paper: 96 output channels, torchvision: 64
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # overlapping pooling (kernel > stride)

            # conv 2
            nn.Conv2d(64, 192, kernel_size=5, padding=2), # paper: 256, torchvision: 192
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),

            # conv 3
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # conv 4
            nn.Conv2d(384, 256, kernel_size=3, padding=1), # paper: 384, torchvision: 256
            nn.ReLU(inplace=True),
            # conv 5
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))  # fixed 6x6 spatial output for any input size
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),

            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),

            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
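
A quick sanity check of the model above (a sketch using the imports from the listing, assuming a single 224 × 224 RGB input):

model = AlexNet(num_classes=1000)
x = torch.randn(1, 3, 224, 224)        # dummy batch: one RGB image
with torch.no_grad():
    out = model(x)
print(out.shape)                       # torch.Size([1, 1000])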

As can be seen, AlexNet consists of 5 convolutional layers and 3 fully connected layers, illustrated in the figure below:

[Figure: AlexNet architecture, 5 convolutional layers followed by 3 fully connected layers]

Reducing Overfitting

This corresponds to Section 4 of the paper.

Data Preprocessing / Data Augmentation

  1. The first form of data augmentation consists of generating image translations and horizontal reflections. We do this by extracting random 224 × 224 patches (and their horizontal reflections) from the 256 × 256 images and training our network on these extracted patches. This increases the size of our training set by a factor of 2048, though the resulting training examples are, of course, highly interdependent. Without this scheme, our network suffers from substantial overfitting, which would have forced us to use much smaller networks. At test time, the network makes a prediction by extracting five 224 × 224 patches (the four corner patches and the center patch) as well as their horizontal reflections (hence ten patches in all), and averaging the predictions made by the network’s softmax layer on the ten patches.

    In short: random 224 × 224 crops (and their horizontal reflections) of the 256 × 256 images multiply the training set by a factor of 2048 (32 × 32 possible crop positions × 2 for the flip), although the resulting examples are highly correlated. At test time, the softmax predictions over ten patches (the four corners, the center, and their reflections) are averaged.

  2. The second form of data augmentation consists of altering the intensities of the RGB channels in training images. Specifically, we perform PCA on the set of RGB pixel values throughout the ImageNet training set. To each training image, we add multiples of the found principal components, with magnitudes proportional to the corresponding eigenvalues times a random variable drawn from a Gaussian with mean zero and standard deviation 0.1. Therefore to each RGB image pixel we add the quantity [p1, p2, p3][α1λ1, α2λ2, α3λ3]^T, where p_i and λ_i are the i-th eigenvector and eigenvalue of the 3 × 3 covariance matrix of RGB pixel values, respectively, and α_i is the aforementioned random variable. Each α_i is drawn only once for all the pixels of a particular training image until that image is used for training again, at which point it is re-drawn. This scheme approximately captures an important property of natural images, namely, that object identity is invariant to changes in the intensity and color of the illumination. This scheme reduces the top-1 error rate by over 1%.

    In short: run PCA on the RGB pixel values of the whole training set, then add [p1, p2, p3][α1λ1, α2λ2, α3λ3]^T to every pixel of each training image, where p_i and λ_i are the eigenvectors and eigenvalues of the 3 × 3 RGB covariance matrix and each α_i ~ N(0, 0.1²) is drawn once per image per epoch. This "lighting" jitter models the fact that object identity is invariant to changes in the intensity and color of the illumination, and it reduces the top-1 error rate by over 1%. A rough code sketch of both augmentations follows this list.
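
A minimal sketch of the two augmentations in PyTorch (not the authors' code): random 224 × 224 crops with horizontal flips via torchvision transforms, plus a small custom PCA "lighting" transform. The eigenvalues/eigenvectors below are the commonly quoted ImageNet RGB statistics and should be treated as assumptions, as should the exact resize policy.

import torch
from torchvision import transforms

class PCALighting:
    """AlexNet-style PCA color jitter (sketch): add alpha_i * lambda_i along
    each RGB principal component p_i, with alpha_i ~ N(0, 0.1) per image."""
    # Commonly quoted ImageNet RGB eigenvalues/eigenvectors (assumed values).
    eigval = torch.tensor([0.2175, 0.0188, 0.0045])
    eigvec = torch.tensor([[-0.5675,  0.7192,  0.4009],
                           [-0.5808, -0.0045, -0.8140],
                           [-0.5836, -0.6948,  0.4203]])

    def __init__(self, alphastd=0.1):
        self.alphastd = alphastd

    def __call__(self, img):                      # img: float tensor (3, H, W)
        alpha = torch.randn(3) * self.alphastd    # drawn once per image
        rgb_shift = self.eigvec @ (alpha * self.eigval)
        return img + rgb_shift.view(3, 1, 1)

train_transform = transforms.Compose([
    transforms.Resize(256),              # shorter side to 256
    transforms.CenterCrop(256),          # central 256x256 patch, as in the paper
    transforms.RandomCrop(224),          # random 224x224 patch
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    PCALighting(alphastd=0.1),
])

At test time, transforms.TenCrop(224) yields the four corner patches, the center patch, and their reflections, whose predictions can then be averaged.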

Dropout

Combining the predictions of many different models is a very successful way to reduce test errors [1, 3], but it appears to be too expensive for big neural networks that already take several days to train. There is, however, a very efficient version of model combination that only costs about a factor of two during training. The recently-introduced technique, called “dropout” [10], consists of setting to zero the output of each hidden neuron with probability 0.5. The neurons which are “dropped out” in this way do not contribute to the forward pass and do not participate in back-propagation. So every time an input is presented, the neural network samples a different architecture, but all these architectures share weights. This technique reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons. It is, therefore, forced to learn more robust features that are useful in conjunction with many different random subsets of the other neurons. At test time, we use all the neurons but multiply their outputs by 0.5, which is a reasonable approximation to taking the geometric mean of the predictive distributions produced by the exponentially-many dropout networks.

In short: during training, each hidden neuron's output is set to zero with probability 0.5; dropped neurons contribute neither to the forward pass nor to back-propagation, so every input effectively trains a different architecture, and all these architectures share weights. This discourages complex co-adaptations and forces neurons to learn features that are robust in combination with many random subsets of other neurons. At test time all neurons are used and their outputs are multiplied by 0.5, a reasonable approximation to the geometric mean of the predictive distributions of the exponentially many dropout networks.

Dropout is effective: it is used in the first two fully connected layers (see the classifier in the code above). Without dropout the network exhibits substantial overfitting; with dropout, roughly twice as many iterations are needed to converge. (Why apply dropout to the first two fully connected layers? They contain the vast majority of the parameters and are therefore the layers most prone to overfitting.)
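
A minimal sketch (not from the paper) of the test-time scaling discussed above versus PyTorch's behavior: nn.Dropout uses "inverted dropout", scaling the surviving activations by 1/(1-p) during training, so no multiplication by 0.5 is needed at test time.

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.ones(1, 8)
drop = nn.Dropout(p=0.5)

drop.train()
print(drop(x))   # roughly half the entries are zeroed, survivors scaled to 2.0

drop.eval()
print(drop(x))   # identity: all ones, no extra scaling required at test time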

Training Details

This corresponds to Section 5 of the paper.

Weight decay is useful and necessary: the paper found that the small weight decay (0.0005) was not merely a regularizer but was important for the model to learn, reducing the training error.

[Equation from Section 5 of the paper: the SGD weight-update rule with momentum 0.9, weight decay 0.0005, and learning rate ε]
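
A rough PyTorch sketch of the training setup described in Section 5 (batch size 128, momentum 0.9, weight decay 0.0005, learning rate initialized at 0.01 and divided by 10 when the validation error stops improving). ReduceLROnPlateau stands in for the paper's manual learning-rate drops, and AlexNet refers to the class defined above.

import torch

model = AlexNet(num_classes=1000)              # class defined earlier
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
# Stand-in for the paper's manual policy of dividing the learning rate by 10
# whenever the validation error stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                       factor=0.1)
# per epoch, after validation: scheduler.step(val_loss)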

Innovations (Improvements)

  • ReLU to speed up the training of large networks (the commonly used sigmoid is dropped in favor of the non-saturating ReLU, which reduces vanishing gradients and makes the model converge faster)
  • LRN to improve the generalization of large networks (dropped in the official PyTorch implementation; it seems to contribute little)
  • Overlapping pooling to improve accuracy (overlapping pooling helps; the official PyTorch implementation keeps it and adds adaptive average pooling before the classifier)
  • Random cropping, horizontal flipping, and PCA color jittering to increase data diversity (when resizing, the shorter side is rescaled to 256 first, then the central 256 × 256 patch is cropped)
  • Dropout to reduce overfitting (note: in the paper's formulation the outputs are multiplied by 0.5 at test time; PyTorch's nn.Dropout uses inverted dropout, so no test-time scaling is needed)

Insights (Reflections and Outlook)

  1. Depth and width determine a network's capacity. (Depth matters.)
    Their capacity can be controlled by varying their depth and breadth.(1 Introduction p2)

  2. More powerful GPUs and more data can further improve model performance
    All of our experiments suggest that our results can be improved simply by waiting for faster GPUs and bigger
    datasets to become available. (1 Introduction p5)

  3. Image rescaling detail: rescale the shorter side first, then crop (avoids distorting the image and losing pixel information)
    Given a rectangular image, we first rescaled the image such that the shorter side was of length 256, and then
    cropped out the central 256×256 patch from the resulting image.(2 Dataset p3)

  4. ReLU does not require input normalization to prevent saturation, which implies that sigmoid/tanh activations do need normalized inputs
    ReLUs have the desirable property that they do not require input normalization to prevent them from
    saturating(3.3 LRN p1)

  5. The convolution kernels learn frequency-, orientation-, and color-selective features (learned by the first convolutional layer and visualized in the paper)
    The network has learned a variety of frequency- and orientation-selective kernels, as well as various colored
    blobs.(6.1 p1)

  6. Similar images have "close" high-level features (worth reflecting on)
    If two images produce feature activation vectors with a small Euclidean separation, we can say that the higher levels of the neural network consider them to be similar.(6.1 p3)

  7. Image retrieval can be based on high-level features and should outperform retrieval on raw pixels (follows from the previous point; see the sketch after this list)
    This should produce a much better image retrieval method than applying autoencoders to the raw pixels.(6.1 p4)

  8. Depth is useful and necessary!! (depth really is important for achieving our results!!!)
    It is notable that our network’s performance degrades if a single convolutional layer is removed. So the depth really is important for achieving our results(7 Discussion p1)

  9. Using video data, which adds temporal (sequential) structure, may lead to new breakthroughs
    Ultimately we would like to use very large and deep convolutional nets on video sequences.(7 Discussion p2)
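
A minimal sketch of points 6 and 7 (hypothetical code, assuming the AlexNet class defined above): drop the final classification layer, use the 4096-dimensional activation of the last hidden layer as an image descriptor, and compare images by Euclidean distance.

import torch
import torch.nn as nn

model = AlexNet(num_classes=1000)
model.eval()                                   # dropout becomes a no-op

# Everything up to (and including) the last hidden 4096-d layer.
feature_net = nn.Sequential(
    model.features,
    model.avgpool,
    nn.Flatten(1),
    *list(model.classifier.children())[:-1],   # drop the final Linear(4096, 1000)
)

with torch.no_grad():
    f1 = feature_net(torch.randn(1, 3, 224, 224))
    f2 = feature_net(torch.randn(1, 3, 224, 224))

# A small Euclidean distance means the network "considers" the images similar.
print(torch.dist(f1, f2).item())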

Results

[Figure: ImageNet error rates reported in the paper]
