Convolutional Neural Networks
Andrew Ng
Computer Vision Problems
- Image Classification
- Object Detection
- Neural Style Transfer
Vertical edge detection
filter (usually of odd size)
$\begin{pmatrix} 1 & 0 & -1 \\ 1 & 0 & -1 \\ 1 & 0 & -1 \end{pmatrix}$ (vertical edge detection filter)
convolution operation
*
bright pixels on the left and dark pixels on the right
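The edge-detection example above can be sketched with a minimal NumPy cross-correlation; the helper `conv2d_valid` is an illustrative name, not from the lecture:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid cross-correlation (no kernel flip), as used in deep learning."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

# 6x6 image: bright pixels (10) on the left half, dark pixels (0) on the right
image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)

# 3x3 vertical edge detection filter
vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]], dtype=float)

result = conv2d_valid(image, vertical)
# Each 4x4 output row is [0, 30, 30, 0]: the bright band down the
# middle marks the vertical edge in the input.
print(result)
```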
Padding
$(n \times n) * (f \times f) = (n - f + 1) \times (n - f + 1)$
- output will shrink
- pixels in the corners are used only once, so we lose information near the edge of the image
The fix for both of these problems: padding
- with an additional border of one pixel all around the edges
- pad with zeros by convention
- so the output will be $(n + 2p - f + 1) \times (n + 2p - f + 1)$
Valid Convolution: No paddings (p = 0)
Same Convolution: Pad so that output size is the same as the input size
f is usually odd
- makes the Same Convolution padding symmetric: $p = (f - 1)/2$ is an integer
- the filter has a central pixel
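Same-Convolution padding can be sketched with NumPy's `np.pad`; zero padding is assumed, per the convention above:

```python
import numpy as np

def pad_same(image, f):
    """Zero-pad so that a stride-1 convolution with an odd f x f filter
    preserves the spatial size: p = (f - 1) // 2 on every side."""
    p = (f - 1) // 2
    return np.pad(image, p, mode="constant"), p

image = np.ones((6, 6))
padded, p = pad_same(image, 5)
# n + 2p - f + 1 = 6 + 4 - 5 + 1 = 6: output size equals input size
print(padded.shape)  # (10, 10)
```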
Strided Convolution
- stride: the number of positions the filter moves at each step
- output: $\left( \left\lfloor \frac{n - f}{s} \right\rfloor + 1 \right) \times \left( \left\lfloor \frac{n - f}{s} \right\rfloor + 1 \right)$
- the filter must lie entirely within the image (plus the padding region)
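The strided output-size formula can be checked in a few lines; the helper name is an assumption:

```python
import math

def conv_output_size(n, f, p=0, s=1):
    """Output side length of a strided convolution:
    floor((n + 2p - f) / s) + 1."""
    return math.floor((n + 2 * p - f) / s) + 1

# 7x7 input, 3x3 filter, stride 2, no padding -> 3x3 output
print(conv_output_size(7, 3, p=0, s=2))  # 3
# Same Convolution check: 6x6 input, f=3, p=1, s=1 -> 6x6 output
print(conv_output_size(6, 3, p=1, s=1))  # 6
```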
Cross-correlation VS. Convolution
- in mathematics, (true) convolution first flips the filter both horizontally and vertically (a 180° rotation) before the sum of products
- in ML we usually skip the flipping operation; strictly speaking the operation is cross-correlation, but by convention we call it convolution
- (true) convolution is associative
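The difference can be sketched with `np.flip`, which rotates the kernel 180° when applied to both axes; with an asymmetric kernel the two operations disagree (helper names are illustrative):

```python
import numpy as np

def cross_correlate(image, kernel):
    """Slide the kernel without flipping (the 'convolution' of deep learning)."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.empty((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

def true_convolve(image, kernel):
    """Mathematical convolution: rotate the kernel 180 degrees first."""
    return cross_correlate(image, np.flip(kernel))

rng = np.random.default_rng(0)
image = rng.standard_normal((5, 5))
asym = np.array([[1., 2.], [3., 4.]])  # not symmetric under 180° rotation

cc = cross_correlate(image, asym)
cv = true_convolve(image, asym)
# the two results differ because flipping changes an asymmetric kernel
print(np.allclose(cc, cv))  # False
```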
Convolution on RGB images
height, width, channels(depth)
the image and the filter must have the same number of channels
$(n \times n \times n_c) * (f \times f \times n_c) \to (n - f + 1) \times (n - f + 1) \times n_c'$
$n_c$: number of channels; $n_c'$: number of filters
detects $n_c'$ features
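A minimal multi-channel sketch (loop-based for clarity, not efficiency; the helper name is an assumption): each of the $n_c'$ filters spans all input channels and produces one output channel.

```python
import numpy as np

def conv_multichannel(image, filters):
    """image: (n, n, n_c); filters: (n_f, f, f, n_c) -> (n-f+1, n-f+1, n_f)."""
    n, _, n_c = image.shape
    n_f, f, _, _ = filters.shape
    out = np.zeros((n - f + 1, n - f + 1, n_f))
    for k in range(n_f):                    # one output channel per filter
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # sum over height, width, AND all input channels
                out[i, j, k] = np.sum(image[i:i+f, j:j+f, :] * filters[k])
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6, 3))       # RGB image, n_c = 3
filters = rng.standard_normal((2, 3, 3, 3))  # n_c' = 2 filters of 3x3x3
out = conv_multichannel(image, filters)
print(out.shape)  # (4, 4, 2)
```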
Example of a layer
- add a bias to the output, then apply a non-linear activation function
- less prone to overfitting: the number of parameters depends only on the filter size, not on the input image size
- the output of the previous layer is the input to this layer
- notation
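The overfitting point follows from the parameter count of a conv layer, which is easy to verify; the 3×3×3, 10-filter sizes below are one illustrative configuration:

```python
def conv_layer_params(f, n_c_in, n_c_out):
    """Each of the n_c_out filters has f*f*n_c_in weights plus one bias."""
    return (f * f * n_c_in + 1) * n_c_out

# 10 filters of size 3x3x3: (3*3*3 + 1) * 10 = 280 parameters,
# regardless of how large the input image is
print(conv_layer_params(3, 3, 10))  # 280
```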
A simple convolution network example
Convolutional Layer (Conv)
Pooling Layer (Pool)
Fully Connected Layer (FC)
Pooling layer
- reduce the size of representation to speed up computation and make some of the features it detects a bit more robust
- no parameters to learn, just a fixed function, has no weights
- finally, flatten the pooled result into a column vector
Max pooling
- break into different regions
- output the maximum value of each region
- max-pooling hyperparameters: $f = 2, s = 2$ (often)
- usually does not use any padding
- a feature extracted in some region is preserved in the output
- if this feature is detected anywhere in the region, keep a high number
- max pooling is computed independently for each channel
Average pooling
- output the average of each region
- max pooling is used much more often than average pooling
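Both pooling modes can be sketched in one helper (single-channel for brevity; `pool2d` is an illustrative name). Note there are no weights to learn, only the fixed reduce function:

```python
import numpy as np

def pool2d(image, f=2, s=2, mode="max"):
    """Apply a fixed function (max or mean) to each f x f region."""
    n = image.shape[0]
    m = (n - f) // s + 1
    out = np.empty((m, m))
    reduce_fn = np.max if mode == "max" else np.mean
    for i in range(m):
        for j in range(m):
            out[i, j] = reduce_fn(image[i*s:i*s+f, j*s:j*s+f])
    return out

x = np.array([[1., 3., 2., 1.],
              [2., 9., 1., 1.],
              [1., 3., 2., 3.],
              [5., 6., 1., 2.]])
print(pool2d(x, mode="max"))  # [[9. 2.] [6. 3.]]
print(pool2d(x, mode="avg"))  # [[3.75 1.25] [3.75 2.  ]]
```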
Neural network example
digit recognition
$f = 2, s = 2$ halves the input height and width
two conventions for counting conv layers
- a convolutional layer and a pooling layer together count as one layer; alternatively, they count as two separate layers
- when counting network layers, usually only layers with weights are counted
- the flattened pooling output is fully connected to the units of the FC layer (every flattened unit connects to every FC unit)
do not invent your own hyperparameter settings; look at what works in the literature
as depth increases, height and width decrease while the number of channels increases
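The shrinking height/width and growing channel count can be traced through a small LeNet-5-style stack; the specific layer sizes below follow a common textbook configuration and are illustrative, not the exact lecture network:

```python
def conv_out(n, f, p, s):
    """floor((n + 2p - f) / s) + 1, the standard output-size formula."""
    return (n + 2 * p - f) // s + 1

shape = (32, 3)  # (spatial size, channels) of a 32x32x3 input image
trace = [shape]
for kind, f, s, n_c in [("conv", 5, 1, 6), ("pool", 2, 2, 6),
                        ("conv", 5, 1, 16), ("pool", 2, 2, 16)]:
    n, _ = shape
    shape = (conv_out(n, f, 0, s), n_c)
    trace.append(shape)

# flatten the final volume into the column vector fed to the FC layers
flattened = trace[-1][0] ** 2 * trace[-1][1]
print(trace)      # [(32, 3), (28, 6), (14, 6), (10, 16), (5, 16)]
print(flattened)  # 400
```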
Why convolution
- parameter sharing and sparsity of connections
- a feature detector that is useful in one part of the image applies to the whole image (parameter sharing)
- each output value depends on only a portion of the input (sparse connections)
- good at capturing translation invariance
- even if the image is shifted by a few pixels, it still has features similar to the original
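Parameter sharing makes the count concrete. The comparison below pits a conv layer against a fully connected layer producing the same output shape; the 32×32×3 input and 6 filters of 5×5 are an illustrative example:

```python
# Conv layer: 32x32x3 input -> 28x28x6 output with 6 filters of 5x5x3.
# Each filter has 5*5*3 weights plus one bias, shared across all positions.
conv_params = (5 * 5 * 3 + 1) * 6

# Fully connected alternative: every input unit to every output unit
# (weights only, ignoring biases).
fc_params = (32 * 32 * 3) * (28 * 28 * 6)

print(conv_params)  # 456
print(fc_params)    # 14450688 (about 14 million)
```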