Convolutional Neural Network Basics
The role of the ConvNet is to reduce the images into a form which is easier to process, without losing features which are critical for getting a good prediction.
Today we'll take a broad look at three kinds of layers: the convolution layer, the pooling layer, and the fully connected layer.
1. Convolution Layer
1.1 Parameters:
Kernel Size, Stride, Padding, Input & Output Channels
- Kernel/Filter
As in the animation above: the Kernel/Filter here is a 3x3x1 matrix, and a different kernel is chosen for each channel.
Kernel/Filter, K =
    1  0  0
    1 -1 -1
    1  0 -1
Kernel size:
Kernels are usually odd-sized (3x3, 5x5, 7x7, 9x9) for symmetry, so the kernel has a center cell.
They are typically small but stacked deep.
- Stride
The kernel above has stride length = 1 (non-strided); with stride = 2 it looks like the figure below:
(blue is the input, the shaded area is the kernel, cyan is the output)
- Padding
For animated diagrams of the various padding options, see here.
a) Same Padding
(图:SAME padding: 5x5x1 image is padded with 0s to create a 6x6x1 image)
When we augment the 5x5x1 image into a 6x6x1 image and then apply the 3x3x1 kernel over it, we find that the convolved matrix turns out to be of dimensions 5x5x1. Hence the name — Same Padding.
b) Valid Padding
If we perform the same operation without padding, we are presented with a matrix which has the dimensions of the kernel (3x3x1) itself — Valid Padding.
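Both padding modes can be sanity-checked with the standard output-size formula (a minimal sketch; the helper name `conv_output_size` is my own, and it assumes symmetric padding of P cells on each side, whereas the figure above pads the 5x5 image asymmetrically to 6x6):

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution: floor((W - K + 2P) / S) + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

# Valid padding: a 5x5 input with a 3x3 kernel gives a 3x3 output
print(conv_output_size(5, 3, padding=0))   # 3
# Same padding (symmetric, padding=1): the 5x5 spatial size is preserved
print(conv_output_size(5, 3, padding=1))   # 5
```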
1.2 Do the math:
· multiply, then add up:
Simply put (see the figure below): each cell in the yellow grid is multiplied by the corresponding cell in the green grid, and the products are summed up; the result is the convolved feature.
(图:Convoluting a 5x5x1 image with a 3x3x1 kernel to get a 3x3x1 convolved feature)
For details, see 2.2 Convolution Matrix on this page.
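The multiply-then-add step can be sketched in plain NumPy (a toy illustration; the image and kernel values are made up, not the ones in the figure):

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Valid (no-padding) 2D convolution: slide the kernel, multiply
    element-wise with the window underneath it, then sum."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(window * kernel)   # multiply, then add up
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
kernel = np.ones((3, 3))                           # toy 3x3 kernel
print(convolve2d(image, kernel).shape)             # (3, 3)
```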
2. Transposed Convolutions
Also called deconvolutions or fractionally strided convolutions.
Plenty of people resent the name "deconvolution", as the figure shows:
For why it is called deconvolution, see Zeiler.
2.1 Why use it?
The need for up-sampling: for example, going from a low-resolution image to a high-resolution one. Transposed convolutions are a good way to do this.
Reference
2.2 Intuition:
- Convolution Operation
A quick review of the convolution operation:
input: 4x4
stride: 1
kernel: 3x3
padding: none
output: 2x2
Here, 9 numbers are reduced to 1 number: the convolution is a many-to-one mapping.
- Going Backward
Now suppose we want:
input: 2x2
output: 4x4
That is, 1 number becomes 9 numbers, a one-to-many mapping.
First we need to understand the Convolution Matrix and the Transposed Convolution Matrix:
- Convolution Matrix
As in the figure below, reshape the 3x3 kernel into a 4x16 matrix, and flatten the 4x4 input matrix into a 16x1 column vector.
Then matrix-multiply the 4x16 convolution matrix with the 16x1 input (a 16-dimensional column vector) to get a 4x1 matrix;
reshaping that 4x1 matrix gives the 2x2 output matrix:
Why build a Convolution Matrix?
With the convolution matrix, you can go from 16 (4x4) to 4 (2x2) because the convolution matrix is 4x16. Then, if you have a 16x4 matrix, you can go from 4 (2x2) to 16 (4x4).
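This unrolling can be checked numerically. Below is a sketch that builds a 4x16 convolution matrix from a toy 3x3 kernel (the helper `conv_as_matrix` and the kernel values are my own, assuming a 4x4 input with no padding and stride 1):

```python
import numpy as np

def conv_as_matrix(kernel, input_size=4):
    """Unroll a square kernel into a convolution matrix C such that
    C @ x.flatten() equals the valid (no-padding, stride-1)
    convolution of an input_size x input_size image x."""
    k = kernel.shape[0]
    out_size = input_size - k + 1          # 2 for a 4x4 input, 3x3 kernel
    C = np.zeros((out_size * out_size, input_size * input_size))
    for i in range(out_size):              # output row
        for j in range(out_size):          # output column
            for a in range(k):
                for b in range(k):
                    # output position (i, j) reads input pixel (i+a, j+b)
                    C[i * out_size + j, (i + a) * input_size + (j + b)] = kernel[a, b]
    return C

kernel = np.arange(1, 10, dtype=float).reshape(3, 3)   # toy 3x3 kernel
x = np.arange(16, dtype=float).reshape(4, 4)           # toy 4x4 input
C = conv_as_matrix(kernel)                             # shape (4, 16)
y = (C @ x.flatten()).reshape(2, 2)                    # 4x1 result -> 2x2
print(C.shape, y.shape)                                # (4, 16) (2, 2)
```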
- Transposed Convolution Matrix
If we want:
input: 2x2
output: 4x4
We need a 16x4 matrix, and it must preserve the relationship of 1 number mapping to 9 numbers:
- Transpose the convolution matrix C (4x16) to CT (16x4).
- Matrix-multiply CT (16x4) with a column vector (4x1) to generate an output matrix (16x1)
- The transposed matrix connects 1 value to 9 values in the output.
Reshaping the result gives the 4x4 matrix:
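The three steps above can be verified with a small NumPy sketch (the kernel values are arbitrary, not the ones in the figure):

```python
import numpy as np

# Build a toy 4x16 convolution matrix C for a 3x3 kernel over a 4x4 input
# (rows = the 2x2 output positions, columns = the 16 input pixels).
kernel = np.arange(1, 10, dtype=float).reshape(3, 3)
C = np.zeros((4, 16))
for i in range(2):
    for j in range(2):
        for a in range(3):
            for b in range(3):
                C[i * 2 + j, (i + a) * 4 + (j + b)] = kernel[a, b]

v = np.array([1.0, 2.0, 3.0, 4.0])    # a 2x2 input, flattened to 4x1
up = (C.T @ v).reshape(4, 4)          # 16x4 times 4x1 -> 16x1 -> 4x4
print(up)

# Each row of C (column of C.T) has 9 nonzero weights, so each of the
# 4 input values is spread over 9 output cells: the 1-to-9 relationship.
print(np.count_nonzero(C[0]))         # 9
```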
Note: the actual weight values in the matrix do not have to come from the original convolution matrix. What's important is that the weight layout is the transpose of that of the convolution matrix. here
In TensorFlow (1.x argument names), this operation is:
tf.nn.conv2d_transpose(
    value,              # input tensor, shape [batch, height, width, in_channels] for 'NHWC'
    filter,             # kernel tensor, shape [height, width, output_channels, in_channels]
    output_shape,       # 1-D tensor giving the desired output shape
    strides,            # stride of the sliding window for each dimension of the input
    padding='SAME',     # 'SAME' or 'VALID'
    data_format='NHWC',
    name=None
)
3. Pooling Layer
In all cases, pooling helps to make the representation become approximately invariant to small translations of the input. Invariance to translation means that if we translate the input by a small amount, the values of most of the pooled outputs do not change. — Page 342, Deep Learning, 2016.
Put plainly, pooling simplifies things; it amounts to downsampling:
Benefit 1: it decreases the computational power required to process the data, through dimensionality reduction.
Benefit 2: it is useful for extracting dominant features that are rotationally and positionally invariant, which keeps training of the model effective.
Two kinds of pooling:
3.1 Max pooling
Literally what it sounds like: take max() over each window, as in the figure.
3.2 Average pooling
Literally what it sounds like: take average() over each window, as in the figure.
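Both pooling operations can be sketched with a single NumPy helper (a toy example; the function `pool2d` and the input values are my own):

```python
import numpy as np

def pool2d(x, size=2, stride=2, op=np.max):
    """Apply op (np.max or np.mean) over each (size x size) window."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = op(x[i*stride:i*stride+size, j*stride:j*stride+size])
    return out

x = np.array([[1., 3., 2., 4.],
              [5., 7., 6., 8.],
              [9., 2., 1., 0.],
              [3., 4., 5., 6.]])
print(pool2d(x, op=np.max))    # max pooling:     [[7. 8.] [9. 6.]]
print(pool2d(x, op=np.mean))   # average pooling: [[4. 5.] [4.5 3.]]
```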
4. Fully Connected Layer
Basically, a FC layer looks at what high level features most strongly correlate to a particular class and has particular weights so that when you compute the products between the weights and the previous layer, you get the correct probabilities for the different classes.
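A minimal sketch of what such a layer computes: flatten the features, multiply by a weight matrix, add a bias, and squash the scores through softmax to get class probabilities (all names and values below are made up for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))    # shift by max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
features = rng.standard_normal(16)     # flattened high-level features
W = rng.standard_normal((3, 16))       # one weight row per class
b = np.zeros(3)

probs = softmax(W @ features + b)      # scores -> class probabilities
print(probs.sum())                     # sums to 1 (up to floating point)
```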
5. CNN Architectures
5.1 Classic network architectures
- LeNet-5
See: Gradient-based learning applied to document recognition
- AlexNet
See: ImageNet Classification with Deep Convolutional Neural Networks
- VGG 16
See: Very Deep Convolutional Networks for Large-Scale Image Recognition