AlexNet
ImageNet Classification with Deep Convolutional Neural Networks
Contents
Section 1 & 2
Current problem
- Current networks perform relatively well on small datasets like MNIST.
- The immense complexity of the object recognition task means that even a dataset as large as ImageNet is not big enough on its own; the model therefore needs lots of prior knowledge to compensate for the data it does not have.
- The model itself must have large learning capacity.
Authors’ work
- Trained one of the largest convolutional neural networks to date on the subsets of ImageNet used in the ILSVRC-2010 and ILSVRC-2012 competitions.
- Wrote a highly optimized GPU implementation of 2D convolution to make training feasible.
- Used several new and unusual features that speed up training and improve performance, detailed in Section 3.
- Used several effective techniques for preventing overfitting.
Section 3
Overall architecture
Contains 8 learned layers
- 5 convolutional layers
- 3 fully-connected layers
- the output of the last fully-connected layer feeds a 1000-way softmax
Notes:
- The 1st and 2nd convolutional layers are each followed by an LRN (local response normalization) layer.
- Each LRN layer, as well as the 5th convolutional layer, is followed by a max-pooling layer.
- The architecture diagram is split into two parallel streams, one per GPU; the streams communicate only at certain layers. A single-stream sketch follows.
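A minimal single-stream sketch of this architecture in PyTorch (an assumption; the paper uses a custom two-GPU implementation, so the split kernel counts are merged here):

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    """Layer sizes follow Section 3 of the paper."""
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),     # conv1
            nn.ReLU(inplace=True),
            # LRN constants from the paper (note: PyTorch scales alpha by 1/size)
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),          # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2),   # conv2
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),  # conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),  # conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),  # conv5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),                   # fc6
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),                          # fc7
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),                   # fc8 -> softmax via the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

# A 227x227 input makes the convolution arithmetic work out exactly
# (the paper states 224x224, a well-known discrepancy).
model = AlexNet()
print(model(torch.randn(1, 3, 227, 227)).shape)  # torch.Size([1, 1000])
```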
Novel and unusual features
ReLU Nonlinearity
In terms of training time, a non-saturating activation function $f(x)=\max(0,x)$ (the ReLU) trains several times faster than saturating functions such as $f(x)=\tanh(x)$ or $f(x)=\frac{1}{1+e^{-x}}$.
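A quick illustration of the saturation issue, as a minimal PyTorch sketch (my example, not from the paper): for inputs of large magnitude the gradient of tanh is nearly zero, so learning stalls, while ReLU passes a gradient of 1 for any positive input.

```python
import torch

x = torch.tensor([5.0, -5.0], requires_grad=True)

torch.tanh(x).sum().backward()
print(x.grad)    # tensor([0.0002, 0.0002]): tanh saturates at both ends

x.grad = None    # clear gradients before the second backward pass
torch.relu(x).sum().backward()
print(x.grad)    # tensor([1., 0.]): no saturation on the positive side
```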
Training on Multiple GPUs
A single GPU at the time (a GTX 580 with 3 GB of memory) could not hold a network this large, so the authors spread it across two GPUs, putting half of the kernels on each.
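A toy sketch of this model parallelism (hypothetical layer sizes, and it falls back to CPU so it runs without two GPUs): each device holds half of a layer's kernels, and only designated layers exchange activations across devices, mirroring the paper's communication pattern.

```python
import torch
import torch.nn as nn

# Fall back to CPU so the sketch runs on machines without two GPUs.
dev0 = torch.device("cuda:0" if torch.cuda.device_count() >= 2 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() >= 2 else "cpu")

class SplitConv(nn.Module):
    """One convolutional layer with half of its kernels on each device."""
    def __init__(self, in_ch: int, out_ch: int, communicate: bool):
        super().__init__()
        self.half_a = nn.Conv2d(in_ch, out_ch // 2, 3, padding=1).to(dev0)
        self.half_b = nn.Conv2d(in_ch, out_ch // 2, 3, padding=1).to(dev1)
        self.communicate = communicate  # only some layers talk across GPUs

    def forward(self, xa, xb):
        ya = torch.relu(self.half_a(xa))
        yb = torch.relu(self.half_b(xb))
        if self.communicate:
            # Communicating layer: each stream receives both halves.
            ya = torch.cat([ya, yb.to(dev0)], dim=1)
            yb = ya.to(dev1)
        return ya, yb

x = torch.randn(1, 16, 8, 8)
layer = SplitConv(16, 32, communicate=True)
ya, yb = layer(x.to(dev0), x.to(dev1))
print(ya.shape)  # torch.Size([1, 32, 8, 8]) after concatenation
```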
Local Response Normalization
LRN for short: a scheme that amplifies large responses and damps smaller ones, creating competition for big activities among neurons computed from different kernels, which reduces error rates. Note, however, that LRN was later reported to bring no benefit in Very Deep Convolutional Networks for Large-Scale Image Recognition (the VGG paper).
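For reference, the paper normalizes the activity $a^{i}_{x,y}$ of kernel $i$ at position $(x,y)$ over $n$ adjacent kernel maps (out of $N$ total):

$$b^{i}_{x,y} = a^{i}_{x,y} \Bigg/ \left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left( a^{j}_{x,y} \right)^{2} \right)^{\beta}$$

with $k=2$, $n=5$, $\alpha=10^{-4}$, and $\beta=0.75$ chosen on a validation set. (PyTorch's `nn.LocalResponseNorm` implements a close variant that additionally divides $\alpha$ by $n$.)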
Overlapping Pooling
The pooling windows overlap: the paper uses kernel size $z=3$ with stride $s=2$, so adjacent windows share pixels. Compared with traditional non-overlapping pooling ($z=s=2$), this reduces the top-1 and top-5 error rates by 0.4% and 0.3%.
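A small PyTorch comparison (illustrative input size, not from the paper): with a 13×13 feature map, both schemes yield a 6×6 output, but the $z=3$, $s=2$ windows overlap their neighbours by one pixel in each direction.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 13, 13)
overlapping = nn.MaxPool2d(kernel_size=3, stride=2)  # z=3 > s=2: windows overlap
traditional = nn.MaxPool2d(kernel_size=2, stride=2)  # z=s=2: no overlap
print(overlapping(x).shape, traditional(x).shape)
# torch.Size([1, 1, 6, 6]) torch.Size([1, 1, 6, 6])
```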
Section 4
This part introduces techniques that prevent overfitting.
Data Augmentation
In short, artificially enlarging the dataset.
- Extract random 224×224 patches from the 256×256 training images, and train on these patches together with their horizontal reflections.
- Alter the intensities of the RGB channels by adding random multiples of their principal components (a sketch of both follows).
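A sketch of both augmentations using torchvision (an assumption; the paper generates crops on the CPU while the GPU trains, and computes PCA over the RGB values of the whole training set, which is approximated per image here):

```python
import torch
from torchvision import transforms

def pca_color_jitter(img: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Fancy PCA: add random multiples of the RGB principal components."""
    flat = img.reshape(3, -1)
    cov = torch.cov(flat)                  # 3x3 covariance of RGB values
    eigvals, eigvecs = torch.linalg.eigh(cov)
    alpha = torch.randn(3) * sigma         # paper: alpha ~ N(0, 0.1)
    shift = eigvecs @ (alpha * eigvals)    # sum of weighted eigenvectors
    return img + shift.view(3, 1, 1)

train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),            # random 224x224 patches
    transforms.RandomHorizontalFlip(),     # horizontal reflections
    transforms.ToTensor(),
    transforms.Lambda(pca_color_jitter),   # RGB intensity alteration
])
```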
Dropout
Randomly zero the output of each hidden neuron with probability 0.5 during training; the paper applies this to the first two fully-connected layers. Dropped neurons take no part in the forward or backward pass, which discourages complex co-adaptations between neurons.
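A minimal PyTorch illustration (note a small implementation difference: the paper multiplies the outputs by 0.5 at test time, whereas PyTorch uses inverted dropout and rescales the surviving activations during training; the two are equivalent in expectation):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # zero each element with probability 0.5
drop.train()
x = torch.ones(8)
print(drop(x))             # about half the entries zeroed, survivors scaled to 2.0
drop.eval()
print(drop(x))             # identity at evaluation time
```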
Section 5 & 6 & 7
Training details, results, and the closing discussion. In the end, the authors argue that depth really matters: removing any single convolutional layer degrades performance, so a deeper and larger network really does count.