AlexNet Paper Notes

Latest recommended article published on 2022-12-31 17:42:15

catOneTwo


Article link: https://blog.csdn.net/weixin_38673554/article/details/104529978


Title: ImageNet Classification with Deep Convolutional Neural Networks (2012)

Link: paper


1 Introduction

object recognition with CNN

task: object recognition

To learn about thousands of objects from millions of images, we need a model with a large learning capacity

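To recognize thousands of object categories from a large image collection, we need a model with a large learning capacity. Compared with standard feedforward neural networks with similarly sized layers, CNNs have far fewer connections and parameters, so they are easier to train.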

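CNNs had long been too expensive to apply at scale to high-resolution images. Fortunately, current GPUs paired with a highly optimized implementation of 2D convolution are powerful enough to train large CNNs, and recent large labeled datasets such as ImageNet contain enough examples to train such models without severe overfitting.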

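Contributions

Trained one of the largest CNNs to date on the subsets of ImageNet used in ILSVRC-2010 and ILSVRC-2012, achieving the best results reported on these datasets at the time.
Wrote a highly optimized GPU implementation of 2D convolution and released the code publicly.
Introduced several new and unusual features that improve performance and reduce training time (Section 3).
Used several effective techniques to prevent overfitting (Section 4).
Our final network contains five convolutional and three fully-connected layers (removing any single convolutional layer leads to worse performance).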

2 The Dataset

ImageNet is a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories.

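That is, over 15 million labeled images in about 22,000 categories. ILSVRC uses a subset of ImageNet with roughly 1000 images in each of 1000 categories: about 1.2 million training images, 50,000 validation images, and 150,000 testing images.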

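The ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) is an annual competition held since 2010. ILSVRC-2010 is the only version for which the test-set labels are available, so most experiments use this version.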

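ImageNet images come in varying resolutions, while the network needs a fixed input size, so every image is first down-sampled to 256 × 256: the image is rescaled so that its shorter side is 256, and then the central 256 × 256 patch is cropped out. Apart from subtracting the mean activity over the training set from each pixel, no other preprocessing is done, so we trained our network on the (centered) raw RGB values of the pixels.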
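The resize-and-crop rule can be sketched as follows. This is a minimal illustration, assuming images are described only by their width and height; the helper names `resize_then_center_crop_box` and `subtract_mean` are hypothetical, and the sketch computes geometry and the per-pixel mean subtraction without actually resampling pixels.

```python
def resize_then_center_crop_box(width, height, target=256):
    """Return the scaled (width, height) and the (left, top, right, bottom) crop box.

    Rescales so the shorter side equals `target` while keeping the aspect ratio,
    then takes the central target x target patch.
    """
    scale = target / min(width, height)
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)


def subtract_mean(image, mean_image):
    """Subtract the per-pixel training-set mean; both are H x W x 3 nested lists."""
    return [
        [[p - m for p, m in zip(px, mpx)] for px, mpx in zip(row, mrow)]
        for row, mrow in zip(image, mean_image)
    ]


size, box = resize_then_center_crop_box(500, 375)
print(size, box)  # (341, 256) (42, 0, 298, 256)
```

A landscape 500 × 375 image is scaled to 341 × 256, and the crop drops 42 columns on the left and 43 on the right.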

3 The Architecture

The network has 8 learned layers: 5 convolutional and 3 fully-connected.

Below are the network's novel or unusual features.

3.1 ReLU Nonlinearity

Deep convolutional neural networks with ReLUs train several times faster than their equivalents with tanh units.

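The standard way to model a neuron's output is f(x) = tanh(x) or the sigmoid f(x) = (1 + e^(-x))^(-1). These saturating nonlinearities train much more slowly with gradient descent than the non-saturating ReLU f(x) = max(0, x): in the paper's experiment, a four-layer CNN with ReLUs reaches a 25% training error rate on CIFAR-10 six times faster than an equivalent network with tanh neurons.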
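The saturation argument can be made concrete by comparing gradients. This is a small illustrative sketch (the function names are our own): the tanh derivative 1 - tanh(x)^2 collapses toward 0 for large inputs, while the ReLU derivative stays at 1 for any positive input.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Gradient is 1 for positive inputs and 0 otherwise (0 is chosen at x == 0).
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2, which shrinks toward 0 as |x| grows (saturation).
    return 1.0 - math.tanh(x) ** 2


for x in (0.5, 2.0, 5.0):
    print(f"x={x}: relu'={relu_grad(x):.4f}  tanh'={tanh_grad(x):.4f}")
```

Small gradients in saturated tanh units slow down every weight update that flows through them, which is one intuition for the faster ReLU training reported above.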

3.2 Training on Multiple GPUs

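GPUs speed up training, but a single GTX 580 GPU has only 3 GB of memory, which limits the size of network it can train, so the net is spread across two GPUs.

The parallelization scheme puts half of the kernels (or neurons) on each GPU, with one additional trick: the GPUs communicate only in certain layers.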

The two-GPU net takes slightly less time to train than the one-GPU net.

3.3 Local Response Normalization

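ReLUs do not require input normalization to prevent them from saturating, but the authors found that a local normalization scheme still aids generalization: response normalization reduces the top-1 and top-5 error rates by 1.4% and 1.2%, respectively.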
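Below is a sketch of the local response normalization formula from Section 3.3 of the paper, applied across channels at a single spatial position, with the paper's constants k = 2, n = 5, α = 10⁻⁴, β = 0.75. The function name is hypothetical, and this pure-Python version normalizes one pixel's channel vector rather than a whole feature map.

```python
def local_response_norm(activations, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """LRN across channels at one spatial position (x, y).

    `activations` holds a^0 .. a^(N-1), the ReLU outputs of the N kernel maps
    at that position. Each output is
        b^i = a^i / (k + alpha * sum_{j=max(0, i-n//2)}^{min(N-1, i+n//2)} (a^j)^2) ** beta
    """
    N = len(activations)
    half = n // 2
    out = []
    for i, a_i in enumerate(activations):
        lo = max(0, i - half)
        hi = min(N - 1, i + half)
        sq_sum = sum(activations[j] ** 2 for j in range(lo, hi + 1))
        out.append(a_i / (k + alpha * sq_sum) ** beta)
    return out


print(local_response_norm([1.0, 100.0, 1.0]))
```

A large activation in a neighbouring channel inflates the denominator, so strongly responding kernels suppress their neighbours (a form of lateral inhibition).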

3.4 Overlapping Pooling

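Traditionally, the neighborhoods summarized by adjacent pooling units do not overlap: the stride s equals the window size z.

AlexNet instead uses overlapping pooling with s = 2 and z = 3. The authors observed during training that models with overlapping pooling find it slightly more difficult to overfit, and the scheme reduces the top-1 and top-5 error rates by 0.4% and 0.3% compared with the non-overlapping scheme s = 2, z = 2.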
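A minimal sketch of 2-D max pooling on nested lists contrasts the overlapping (z = 3, s = 2) and non-overlapping (z = 2, s = 2) settings; `max_pool_2d` is a hypothetical helper, not code from the paper.

```python
def max_pool_2d(fmap, size=3, stride=2):
    """Max pooling over a 2-D list; windows overlap when stride < size."""
    h, w = len(fmap), len(fmap[0])
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    return [
        [
            max(fmap[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


fmap = [[r * 5 + c for c in range(5)] for r in range(5)]
print(max_pool_2d(fmap))                    # [[12, 14], [22, 24]]  overlapping
print(max_pool_2d(fmap, size=2, stride=2))  # [[6, 8], [16, 18]]    non-overlapping
```

With z = 3 and s = 2, column 2 and row 2 of the input fall into two neighbouring windows, which is exactly the overlap; a 55 × 55 map is reduced to 27 × 27.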

3.5 Overall Architecture

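The overall network consists of 5 convolutional layers followed by 3 fully-connected layers.

Output: the output of the last FC layer is fed to a 1000-way softmax, which produces a distribution over the 1000 class labels.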

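Intermediate layers:

The kernels of the 2nd, 4th, and 5th convolutional layers connect only to those kernel maps in the previous layer that reside on the same GPU; the kernels of the 3rd convolutional layer connect to all kernel maps in the 2nd layer.
Neurons in the fully-connected layers connect to all neurons in the previous layer.
Response-normalization layers follow the 1st and 2nd convolutional layers; max-pooling layers follow both response-normalization layers and the 5th convolutional layer.
ReLU is applied to the output of every convolutional and fully-connected layer.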
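The same-GPU connectivity explains the kernel depths of the convolutional layers: a kernel that only sees the maps on its own GPU receives half of the previous layer's maps as input. A minimal sketch, assuming the two-GPU split can be modelled as grouped convolution with 2 groups; the `LAYERS` table and `kernel_depth` helper are hypothetical.

```python
# Hypothetical table: (layer, input maps, output kernels, groups).
# groups=2 models "connect only to kernel maps on the same GPU".
LAYERS = [
    ("conv1", 3, 96, 1),     # input is the RGB image
    ("conv2", 96, 256, 2),   # same-GPU connections only
    ("conv3", 256, 384, 1),  # connects to all maps of conv2
    ("conv4", 384, 384, 2),
    ("conv5", 384, 256, 2),
]


def kernel_depth(in_channels, groups):
    """Number of input maps each kernel sees under a grouped (per-GPU) split."""
    if in_channels % groups != 0:
        raise ValueError("input maps must split evenly across groups")
    return in_channels // groups


for name, cin, cout, g in LAYERS:
    print(name, cout, "kernels, depth", kernel_depth(cin, g))
```

The computed depths (3, 48, 256, 192, 192) match the kernel sizes listed in the paper, for example 5 × 5 × 48 rather than 5 × 5 × 96 for the second layer.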

Input: a 224×224×3 image

Convolutional layers:

Layer 1: 96 kernels of size 11×11×3, with a stride of 4 pixels
Layer 2: 256 kernels of size 5×5×48
Layer 3: 384 kernels of size 3×3×256
Layer 4: 384 kernels of size 3×3×192
Layer 5: 256 kernels of size 3×3×192
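The feature-map sizes can be traced with the usual output-size formula floor((W - F + 2P) / S) + 1. A caveat: with a 224-pixel input, an 11×11 kernel and stride 4 give (224 - 11) / 4 + 1 = 54.25, which is not an integer, so many reimplementations assume a 227-pixel input instead. The sketch below follows that assumption; the paper does not list padding, so the pad values (0, 2, 1, 1, 1) and the stride of 1 for layers 2 to 5 are also assumptions, while the pooling (z = 3, s = 2) and its placement follow the paper.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv or pooling window: floor((W - F + 2P) / S) + 1."""
    return (size - kernel + 2 * pad) // stride + 1


def alexnet_spatial_sizes(input_size=227):
    """Trace the feature-map side length through the layers (padding is assumed)."""
    steps = [
        ("conv1", 11, 4, 0),  # 96 kernels 11x11x3, stride 4, assumed pad 0
        ("pool1", 3, 2, 0),   # overlapping max pool z=3, s=2
        ("conv2", 5, 1, 2),   # assumed stride 1, pad 2 (keeps the size)
        ("pool2", 3, 2, 0),
        ("conv3", 3, 1, 1),   # assumed stride 1, pad 1 (keeps the size)
        ("conv4", 3, 1, 1),
        ("conv5", 3, 1, 1),
        ("pool5", 3, 2, 0),
    ]
    sizes = {}
    s = input_size
    for name, kernel, stride, pad in steps:
        s = conv_out(s, kernel, stride, pad)
        sizes[name] = s
    return sizes


print(alexnet_spatial_sizes())  # pool5 -> 6, i.e. 6x6x256 = 9216 inputs to the first FC layer
```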

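Fully-connected layers:

The first two FC layers have 4096 neurons each; the last has 1000, one per class, and feeds the softmax.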
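Counting parameters from the kernel shapes listed above shows where the paper's figure of roughly 60 million parameters comes from. This sketch assumes the first FC layer receives 6×6×256 = 9216 inputs (the 227-pixel arithmetic used in common reimplementations) and that the last FC layer has 1000 units, one per class; the `CONV`/`FC` tables and `count_params` are hypothetical.

```python
# (name, fan_in per unit, number of units); weights = fan_in * units, plus one bias per unit.
CONV = [
    ("conv1", 11 * 11 * 3, 96),
    ("conv2", 5 * 5 * 48, 256),
    ("conv3", 3 * 3 * 256, 384),
    ("conv4", 3 * 3 * 192, 384),
    ("conv5", 3 * 3 * 192, 256),
]
FC = [
    ("fc6", 6 * 6 * 256, 4096),  # assumed 6x6x256 input from the last pooling layer
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),         # one unit per class
]


def count_params(layers):
    """Sum of weights (fan_in * units) and biases (units) over the given layers."""
    return sum(fan_in * units + units for _, fan_in, units in layers)


total = count_params(CONV) + count_params(FC)
print(f"{total:,}")  # 60,965,224
print(f"FC share: {count_params(FC) / total:.1%}")
```

Under these assumptions the fully-connected layers hold about 96% of all parameters, which helps explain why the overfitting countermeasures (especially dropout) target them.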

4 Reducing Overfitting

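Two primary strategies are used to reduce overfitting, which is a real risk for a network with 60 million parameters.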

4.1 Data Augmentation

The easiest and most common method to reduce overfitting on image data is to artificially enlarge the dataset using label-preserving transformations.

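Data augmentation reduces overfitting by applying label-preserving transformations to the given images to generate additional training images.

The first form generates image translations and horizontal reflections: random 224×224 patches (and their horizontal reflections) are extracted from the 256×256 images for training, and at test time the network averages its predictions over ten patches (the four corner and the center 224×224 patches, plus their reflections). The second form alters the intensities of the RGB channels: PCA is performed on the set of RGB pixel values of the training set, and multiples of the principal components, with magnitudes proportional to the corresponding eigenvalues times a random variable drawn from a Gaussian with mean 0 and standard deviation 0.1, are added to each training image.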
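Both forms of augmentation can be sketched in a few lines. This is an illustration under stated assumptions: images are nested lists of pixels, the PCA eigenvalues/eigenvectors of the RGB values are assumed to be precomputed elsewhere, and the helper names `random_crop_and_flip` and `pca_color_shift` are hypothetical.

```python
import random


def random_crop_and_flip(image, crop=224, rng=random):
    """Take a random crop x crop patch from an H x W image (nested lists of pixels)
    and mirror it horizontally with probability 0.5."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop)    # randint is inclusive on both ends
    left = rng.randint(0, w - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]
    return patch


def pca_color_shift(pixel, eigvals, eigvecs, alphas):
    """Add sum_i alpha_i * lambda_i * p_i to one RGB pixel.

    eigvecs[i] is the i-th principal component p_i (an RGB 3-vector) and
    eigvals[i] its eigenvalue lambda_i, both assumed precomputed from the
    training set; the same alphas are used for every pixel of one image.
    """
    shift = [
        sum(alphas[i] * eigvals[i] * eigvecs[i][c] for i in range(3))
        for c in range(3)
    ]
    return [p + s for p, s in zip(pixel, shift)]


rng = random.Random(0)
alphas = [rng.gauss(0.0, 0.1) for _ in range(3)]  # drawn once per image, std 0.1
```

The alphas are redrawn each time an image is used for training, so the network sees a different color perturbation of the same picture in each epoch.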

4.2 Dropout

set to zero the output of each hidden neuron with probability 0.5.

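The output of each hidden neuron is set to 0 with probability 0.5. Neurons that are "dropped out" in this way contribute nothing to the forward pass and do not participate in backpropagation.

Thus every time an input is presented, the network samples a different architecture, but all these architectures share weights. This reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons. At test time all neurons are used, but their outputs are multiplied by 0.5. Dropout is applied in the first two fully-connected layers and roughly doubles the number of iterations required to converge.

This forces the network to learn more robust features.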
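A minimal sketch of dropout as the paper describes it: random zeroing during training and a fixed 0.5 scaling at test time. `dropout_forward` is a hypothetical helper working on a plain list of neuron outputs.

```python
import random


def dropout_forward(outputs, p_drop=0.5, train=True, rng=random):
    """Dropout in the paper's formulation.

    Training: each output is zeroed with probability p_drop, so that neuron
    contributes nothing to the forward pass (and, in a full implementation,
    receives no gradient in the backward pass).
    Testing: every neuron is kept, but its output is multiplied by (1 - p_drop),
    which is 0.5 in the paper, so expected activations match training.
    """
    if not train:
        return [o * (1.0 - p_drop) for o in outputs]
    return [0.0 if rng.random() < p_drop else o for o in outputs]


rng = random.Random(0)
print(dropout_forward([1.0, 2.0, 3.0, 4.0], rng=rng))  # a random subset zeroed
print(dropout_forward([2.0, 4.0], train=False))        # [1.0, 2.0]
```

Many modern frameworks use the equivalent "inverted" variant instead: they divide the kept outputs by (1 - p_drop) during training and leave the test-time outputs unscaled.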

5 Details of learning

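Training setup

learning strategy:
SGD
batch size = 128
momentum = 0.9
weight decay = 0.0005 (this small amount of weight decay is not merely a regularizer: it also reduces the model's training error)
initialization:
weights in every layer are drawn from a zero-mean Gaussian with standard deviation 0.01
biases in the 2nd, 4th, and 5th convolutional layers and in the fully-connected hidden layers are initialized to the constant 1, and biases in the remaining layers to the constant 0 (giving the ReLUs positive inputs accelerates the early stages of learning)
learning rate:
We used an equal learning rate for all layers, which we adjusted manually throughout training.
The learning rate was initialized at 0.01 and divided by 10 whenever the validation error rate stopped improving; it was reduced three times before training terminated.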
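The paper's update rule combines momentum and weight decay as v ← 0.9·v − 0.0005·ε·w − ε·⟨∂L/∂w⟩ and w ← w + v, where ε is the learning rate and ⟨∂L/∂w⟩ is the gradient averaged over the batch. Below is a minimal sketch over flat lists of weights; the helper names are hypothetical, and real training would compute the gradients by backpropagation.

```python
import random


def init_weights(n, std=0.01, rng=random):
    """Zero-mean Gaussian initialization with standard deviation 0.01."""
    return [rng.gauss(0.0, std) for _ in range(n)]


def sgd_step(weights, velocity, grads, lr, momentum=0.9, weight_decay=0.0005):
    """One update of:
        v <- momentum * v - weight_decay * lr * w - lr * grad
        w <- w + v
    `grads` is the gradient averaged over the batch (128 examples in the paper).
    """
    new_v = [momentum * v - weight_decay * lr * w - lr * g
             for w, v, g in zip(weights, velocity, grads)]
    new_w = [w + v for w, v in zip(weights, new_v)]
    return new_w, new_v


w, v = sgd_step([1.0], [0.0], [0.5], lr=0.01)
print(w, v)
```

Note that the weight-decay term is scaled by the learning rate, so dividing ε by 10 also weakens the decay by the same factor.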

6 Results

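Datasets:
Results are reported on the ILSVRC-2010 and ILSVRC-2012 datasets, as well as on the Fall 2009 version of ImageNet.

Results are compared by error rate, computed in two ways: top-1 and top-5.

top-1: the label with the highest probability is taken as the prediction; the prediction is correct if that label is the true label, otherwise it is wrong.
top-5: the prediction is correct if the true label appears among the five labels with the highest probabilities, otherwise it is wrong.

Because top-5 is more lenient, the top-5 error rate is never higher than the top-1 error rate.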
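Both metrics are instances of a top-k error, sketched below. `top_k_error` is a hypothetical helper; the toy example uses only 3 classes, so k = 2 stands in for the paper's k = 5.

```python
def top_k_error(prob_rows, labels, k):
    """Fraction of examples whose true label is not among the k highest-scoring classes."""
    wrong = 0
    for probs, label in zip(prob_rows, labels):
        top_k = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]
        if label not in top_k:
            wrong += 1
    return wrong / len(labels)


probs = [
    [0.1, 0.6, 0.3],  # highest: class 1
    [0.5, 0.2, 0.3],  # highest: class 0, second: class 2
]
labels = [1, 2]
print(top_k_error(probs, labels, 1))  # 0.5
print(top_k_error(probs, labels, 2))  # 0.0
```

The second example is wrong under top-1 but correct under top-2, which shows why a larger k can only lower the error.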

7 Discussion

Our results show that a large, deep convolutional neural network is capable of achieving record-breaking results on a highly challenging dataset using purely supervised learning.

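The CNN achieved record-breaking results with purely supervised learning; the authors did not use any unsupervised pre-training, even though they expect it would help.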