1 Basic Knowledge
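Convolution is an element-wise multiply followed by a sum (a dot product). INPUT[width*height*depth/channel]
filter[width_*height_*depth/channel]
The depth of the filter matches the depth of INPUT.
The number of feature maps produced equals the number of filters.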
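A minimal sketch of the points above, assuming NumPy, stride 1, and no padding. The helper name conv_valid and the 5*5*3 input with two 3*3*3 filters are illustrative choices, not part of the notes. Each output value is one element-wise multiply and sum over a patch, and the output holds one feature map per filter.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5, 3))           # input: height 5, width 5, depth 3
filters = rng.standard_normal((2, 3, 3, 3))  # 2 filters, each 3*3 with depth 3


def conv_valid(x, filters):
    """Naive stride-1, no-padding convolution (hypothetical helper)."""
    k, f, _, d = filters.shape
    assert d == x.shape[2], "filter depth must match input depth"
    out_h = x.shape[0] - f + 1
    out_w = x.shape[1] - f + 1
    out = np.zeros((out_h, out_w, k))
    for n in range(k):                # one feature map per filter
        for i in range(out_h):
            for j in range(out_w):
                patch = x[i:i + f, j:j + f, :]            # spans the full depth
                out[i, j, n] = np.sum(patch * filters[n])  # multiply, then sum
    return out


feature_maps = conv_valid(x, filters)
print(feature_maps.shape)  # (3, 3, 2): the last axis equals the number of filters
```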
2 Local Connectivity
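hyperparameter: receptive field (the filter size): the spatial extent of the local region of the input volume that each neuron connects to. In a convolution, the connectivity is local in space (along width and height) but always spans the full depth of the input volume.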
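To make the effect of local connectivity concrete, here is a small sketch that counts the weights of one neuron. The function name and the 32*32*3 input with a 5*5 receptive field are assumed example values, not taken from the notes.

```python
def weights_per_neuron(field_size, input_depth):
    """Weights of one neuron: local in width/height, full along depth."""
    return field_size * field_size * input_depth


# Assumed example: a 5*5 receptive field on a 32*32*3 input.
local = weights_per_neuron(5, 3)
full = 32 * 32 * 3  # a neuron connected to every input value instead
print(local, full)  # 75 3072
```

The neuron sees only a 5*5 window spatially but all 3 channels, so it needs 75 weights (plus a bias) rather than 3072.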
3 Spatial arrangement(Output volume)
three hyperparameters:
(1)depth:corresponds to the number of filters(each learning to look for something different in the input)
(2)stride
(3)zero-padding:used to preserve the spatial size of the input volume so that the input and output width and height are the same.
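A short sketch of size-preserving zero-padding, assuming NumPy and stride 1. Under that assumption, padding with P=(F-1)/2 zeros on each side keeps the output width and height equal to the input's. The 5*5*3 input and F=3 are illustrative values.

```python
import numpy as np

x = np.ones((5, 5, 3))  # assumed example input: height 5, width 5, depth 3
F, S = 3, 1             # filter size and stride (stride 1 is assumed here)
P = (F - 1) // 2        # padding that preserves the spatial size when S == 1

# Pad height and width with zeros; leave the depth axis untouched.
x_padded = np.pad(x, ((P, P), (P, P), (0, 0)), mode="constant")

out_h = (x.shape[0] - F + 2 * P) // S + 1
out_w = (x.shape[1] - F + 2 * P) // S + 1
print(x_padded.shape)  # (7, 7, 3)
print(out_h, out_w)    # 5 5
```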
Realization of Convolutional Neural Network
- Input W1*H1*D1
- hyperparameters:the number of filters K,spatial extent F,stride S,zero-padding P
- Output W2*H2*D2:
W2=(W1-F+2P)/S+1
H2=(H1-F+2P)/S+1
D2=K
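The output-size formula can be sketched as a small function. The name conv_output_shape is hypothetical, and the width is assumed to follow the same formula as the height. The input depth D1 does not appear because it only fixes the filter depth, not the output shape.

```python
def conv_output_shape(w1, h1, k, f, s, p):
    """Return (W2, H2, D2) for a CONV layer; raise if the filter does not tile."""
    if (w1 - f + 2 * p) % s != 0 or (h1 - f + 2 * p) % s != 0:
        raise ValueError("filter does not fit the input evenly for this stride")
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    return w2, h2, k  # D2 equals the number of filters K


print(conv_output_shape(227, 227, k=96, f=11, s=4, p=0))  # (55, 55, 96)
```

Raising on a non-integer result is a design choice of this sketch; it flags hyperparameter combinations where the filter positions do not fit the input evenly.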
Implementation as Matrix Multiplication(from CS231n)
- Forward: stretch the input image into columns in an operation called im2col.
1、X_col
input image[227*227*3]
filter[11*11*3],stride=4 => stretch each block into a column vector of size 11*11*3=363
output (227-11)/4+1=55 => 55*55=3025
X_col is the im2col matrix of size [363*3025] (363: the size of each receptive field, 3025: the number of output map locations)
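A minimal, loop-based sketch of building X_col, assuming NumPy and no padding. The im2col function here is a hypothetical helper written for illustration, not an optimized library routine. Each receptive field is flattened into one column.

```python
import numpy as np


def im2col(x, f, s):
    """Stretch every f*f*depth receptive field of x into one column."""
    h, w, d = x.shape
    out_h = (h - f) // s + 1
    out_w = (w - f) // s + 1
    cols = np.empty((f * f * d, out_h * out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * s:i * s + f, j * s:j * s + f, :]
            cols[:, i * out_w + j] = patch.reshape(-1)  # one column per location
    return cols


x = np.zeros((227, 227, 3), dtype=np.float32)  # placeholder image
X_col = im2col(x, f=11, s=4)
print(X_col.shape)  # (363, 3025)
```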
2、W_row
The weights of the CONV layer are stretched out into rows.
The number of filters is 96.
W_row is a matrix of size [96*363]
3、The result of the convolution is a single matrix multiply (dot product): np.dot(W_row,X_col)=output [96*3025]
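The three steps can be checked end to end on a small case. This is a sketch assuming NumPy; the im2col helper, the 7*7*3 input, and the four 3*3*3 filters with stride 2 are illustrative choices kept small so that a naive loop can confirm the result.

```python
import numpy as np


def im2col(x, f, s):
    """Hypothetical helper: one column per receptive field."""
    h, w, d = x.shape
    out_h, out_w = (h - f) // s + 1, (w - f) // s + 1
    cols = [x[i * s:i * s + f, j * s:j * s + f, :].reshape(-1)
            for i in range(out_h) for j in range(out_w)]
    return np.stack(cols, axis=1), out_h, out_w


rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7, 3))
filters = rng.standard_normal((4, 3, 3, 3))  # K=4 filters of size 3*3*3
S = 2

X_col, out_h, out_w = im2col(x, f=3, s=S)    # [27*9]
W_row = filters.reshape(4, -1)               # [4*27], one row per filter
out = np.dot(W_row, X_col)                   # [4*9]
out = out.reshape(4, out_h, out_w)           # back to one 3*3 map per filter

# Reference: direct sliding-window convolution.
naive = np.zeros_like(out)
for n in range(4):
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * S:i * S + 3, j * S:j * S + 3, :]
            naive[n, i, j] = np.sum(patch * filters[n])

print(np.allclose(out, naive))  # True
```

Flattening the filters and the patches in the same (height, width, depth) order is what makes each row-by-column product equal to one convolution output.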
downside:uses a lot of memory,since some values in the input volume are replicated multiple times in X_col.
benefit:there are many efficient implementations of matrix multiplication.
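The memory cost can be estimated from the sizes used in the example. This is a back-of-the-envelope sketch that counts stored values only and ignores data type and any implementation-specific buffers.

```python
input_values = 227 * 227 * 3  # values in the input volume
x_col_values = 363 * 3025     # values in X_col after im2col

ratio = x_col_values / input_values
print(input_values)           # 154587
print(x_col_values)           # 1098075
print(f"{ratio:.2f}")         # 7.10
```

With an 11*11 filter and stride 4 the receptive fields overlap, so X_col stores roughly seven times as many values as the input itself.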