# [Deep Learning] Implementing and Understanding Convolutional Neural Networks

CS231n - Assignment 2 - Q4 - ConvNet on CIFAR-10

# Convolutional Neural Network Structure

$x_j^l=f_{relu}(pool(\sum_{i\in M_j}x_i^{l-1}\ast w_{ij}^l+b_{j}^l))$

## Naive (Unaccelerated) Implementation and Understanding of the Convolutional Layer

(In practice, convolutional layers are almost always implemented with accelerated algorithms; what follows is the original, naive version.)

### Convolutional Layer Elements

### Forward Computation of the Convolutional Layer

x denotes the image matrix and w the filter matrix. At each position, the filter is multiplied elementwise with an equally sized window of the image and the products are summed; that is the convolution operation. (An animated figure showing this sliding-window process originally appeared here.)
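As a minimal sketch of this window-by-window multiply-and-sum (the 4×4 input and the all-ones filter are made-up illustrative values, not part of the assignment):

```python
import numpy as np

# A tiny 4x4 single-channel "image" and a 3x3 filter (illustrative values).
image = np.arange(16, dtype=float).reshape(4, 4)
filt = np.ones((3, 3))  # an all-ones filter simply sums the window

# Slide the filter with stride 1: the output is (4-3+1) x (4-3+1) = 2x2.
out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        window = image[i:i+3, j:j+3]       # same-sized image window
        out[i, j] = np.sum(window * filt)  # elementwise product, then sum

print(out)  # [[45. 54.] [81. 90.]]
```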

### Code for the Convolutional Layer Forward Pass

```python
import numpy as np

def conv_forward_naive(x, w, b, conv_param):
    """
    A naive implementation of the forward pass for a convolutional layer.

    The input consists of N data points, each with C channels, height H and
    width W. We convolve each input with F different filters, where each filter
    spans all C channels and has height HH and width WW.

    Input:
    - x: Input data of shape (N, C, H, W)
    - w: Filter weights of shape (F, C, HH, WW)
    - b: Biases, of shape (F,)
    - conv_param: A dictionary with the following keys:
      - 'stride': The number of pixels between adjacent receptive fields in the
        horizontal and vertical directions.
      - 'pad': The number of pixels that will be used to zero-pad the input.

    Returns a tuple of:
    - out: Output data, of shape (N, F, H', W') where H' and W' are given by
      H' = 1 + (H + 2 * pad - HH) / stride
      W' = 1 + (W + 2 * pad - WW) / stride
    - cache: (x, w, b, conv_param)
    """
    ###########################################################################
    # TODO: Implement the convolutional forward pass.                         #
    # Hint: you can use the function np.pad for padding.                      #
    ###########################################################################
    N, C, H, W = x.shape
    F, C, HH, WW = w.shape
    stride = conv_param['stride']
    pad = conv_param['pad']

    # Zero-pad only the two spatial axes of the input.
    padded_x = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
                      mode='constant', constant_values=0)

    # Compute the output size and allocate a zero-filled placeholder.
    new_H = 1 + int((H + 2 * pad - HH) / stride)
    new_W = 1 + int((W + 2 * pad - WW) / stride)
    out = np.zeros([N, F, new_H, new_W])

    # Convolution begins.
    for n in range(N):
        for f in range(F):
            # Temporarily allocate a (new_H, new_W) matrix pre-filled with the
            # bias (i.e. the bias term b[f] is added up front).
            conv_newH_newW = np.ones([new_H, new_W]) * b[f]
            for c in range(C):
                for i in range(new_H):
                    for j in range(new_W):
                        conv_newH_newW[i, j] += np.sum(
                            padded_x[n, c, i*stride:i*stride+HH,
                                           j*stride:j*stride+WW] * w[f, c, :, :])
            out[n, f] = conv_newH_newW
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    cache = (x, w, b, conv_param)
    return out, cache
```
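The `np.pad` hint deserves a quick illustration: it pads only the two spatial axes of the `(N, C, H, W)` array, leaving the batch and channel axes untouched. A sketch with made-up sizes:

```python
import numpy as np

x = np.ones((2, 3, 4, 4))  # (N, C, H, W), illustrative values
pad = 1

# Pad H and W with one ring of zeros; N and C get (0, 0), i.e. no padding.
padded_x = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
                  mode='constant', constant_values=0)

print(padded_x.shape)  # (2, 3, 6, 6)
```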


### My Understanding of the Convolutional Layer

#### 1. Integrating a pixel's information across channels and its neighborhood

Seen this way, "convolution" might be more aptly rendered in Chinese as "卷和" (roll-and-sum); in fact, the "积" in "卷积" comes from "积分" (integration).


#### 2. First, an over-the-top story

(The setup, for context: each slap from your girlfriend raises a bump on your face that fades over time, and the bumps from successive slaps superpose, so the total swelling at any moment is a sum of decaying responses.) Now suppose she slaps even harder and more often, until you can no longer tell the intervals apart. The sum then turns into an integral, and that is where the word "convolution" comes from.

With that explanation, the meaning of the convolutional layer is obviously clear now... yeah, sure... I've almost convinced myself...


#### 3. The concept of a template in image processing

In image processing, a filter is often called a template. For example, the 3×3 averaging template below replaces each pixel with the mean of its 3×3 neighborhood, which smooths (blurs) the image:

$\frac{1}{9} \begin{bmatrix} 1&1&1\\ 1&1&1\\ 1&1&1 \end{bmatrix}$
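Applied to a small image, the averaging template blurs each pixel toward the mean of its 3×3 neighborhood. A sketch with illustrative values:

```python
import numpy as np

# 3x3 averaging template: each output pixel is the mean of its neighborhood.
template = np.ones((3, 3)) / 9.0

# A small 5x5 image with a bright 3x3 block in the middle (illustrative).
image = np.array([[0, 0, 0, 0, 0],
                  [0, 9, 9, 9, 0],
                  [0, 9, 9, 9, 0],
                  [0, 9, 9, 9, 0],
                  [0, 0, 0, 0, 0]], dtype=float)

# Slide the template over the valid region (no padding): 3x3 output.
smoothed = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        smoothed[i, j] = np.sum(image[i:i+3, j:j+3] * template)

print(smoothed)  # the hard edges are softened toward the mean
```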

### Backward Pass of the Convolutional Layer

$x_j^l=f_{relu}(pool(\sum_{i\in M_j}x_i^{l-1}\ast w_{ij}^l+b_{j}^l))$

$f_{relu}(\cdot)$ is the ReLU activation function and $pool(\cdot)$ denotes the pooling operation; x is the input data (the image pixel matrix), w is the weight (the filter, i.e. the convolution kernel), and b is the bias.

$g(x\ast w+b)=g(out),\qquad out=x\ast w+b$

$\frac{\partial g}{\partial x}=\frac{\partial g}{\partial out}\cdot\frac{\partial out}{\partial x},\qquad \frac{\partial g}{\partial w}=\frac{\partial g}{\partial out}\cdot\frac{\partial out}{\partial w},\qquad \frac{\partial g}{\partial b}=\frac{\partial g}{\partial out}\cdot\frac{\partial out}{\partial b}$

Here $\frac{\partial out}{\partial x}=w$, $\frac{\partial out}{\partial w}=x$, and $\frac{\partial out}{\partial b}=1$, which is exactly what the backward code accumulates.
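For scalar x, w, and b, this chain rule is easy to verify numerically with a centered difference (a sketch; the values and the choice $g(out)=out^2$ are arbitrary):

```python
# Scalar sketch: out = x * w + b, with g(out) = out ** 2.
x, w, b = 3.0, 2.0, 1.0
out = x * w + b           # 7.0
dg_dout = 2 * out         # g'(out) for g(out) = out ** 2

# Chain rule: dg/dx = dg/dout * dout/dx = dg/dout * w, and likewise for w, b.
dg_dx = dg_dout * w       # 28.0
dg_dw = dg_dout * x       # 42.0
dg_db = dg_dout * 1.0     # 14.0

# Centered-difference check on dg/dx.
h = 1e-5
num_dg_dx = (((x + h) * w + b) ** 2 - ((x - h) * w + b) ** 2) / (2 * h)
print(dg_dx, num_dg_dx)   # the two values agree to ~1e-5
```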

### Code for the Convolutional Layer Backward Pass

```python
def conv_backward_naive(dout, cache):
    """
    A naive implementation of the backward pass for a convolutional layer.

    Inputs:
    - dout: Upstream derivatives.
    - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive

    Returns a tuple of:
    - dx: Gradient with respect to x
    - dw: Gradient with respect to w
    - db: Gradient with respect to b
    """
    ###########################################################################
    # TODO: Implement the convolutional backward pass.                        #
    ###########################################################################
    # Unpack the cached data.
    x, w, b, conv_param = cache
    stride = conv_param['stride']
    pad = conv_param['pad']
    F, C, HH, WW = w.shape
    N, C, H, W = x.shape
    N, F, new_H, new_W = dout.shape

    # Replay the convolution: first pad x, exactly as in the forward pass.
    padded_x = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
                      mode='constant', constant_values=0)
    padded_dx = np.zeros_like(padded_x)
    dw = np.zeros_like(w)
    db = np.zeros_like(b)

    for n in range(N):      # n-th image
        for f in range(F):  # f-th filter
            for i in range(new_H):
                for j in range(new_W):
                    db[f] += dout[n, f, i, j]  # d(out)/d(b) = 1, uncontroversial
                    dw[f] += padded_x[n, :, i*stride:HH+i*stride,
                                            j*stride:WW+j*stride] * dout[n, f, i, j]
                    padded_dx[n, :, i*stride:HH+i*stride,
                                    j*stride:WW+j*stride] += w[f] * dout[n, f, i, j]
    # Un-pad: strip the zero padding to recover dx.
    dx = padded_dx[:, :, pad:pad+H, pad:pad+W]
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx, dw, db
```
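A backward pass like this is normally validated against a numerical gradient. Below is a sketch of the standard centered-difference checker (the helper name `numerical_gradient` is made up, not the CS231n API), sanity-checked on a function whose gradient is known in closed form:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered-difference gradient of a scalar-valued f at array x."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        fp = f(x)               # f(x + h) at this coordinate
        x[idx] = old - h
        fm = f(x)               # f(x - h) at this coordinate
        x[idx] = old            # restore the original value
        grad[idx] = (fp - fm) / (2 * h)
        it.iternext()
    return grad

# Sanity check on f(x) = sum(x**2), whose gradient is exactly 2x.
x = np.random.randn(2, 3)
g = numerical_gradient(lambda t: np.sum(t ** 2), x)
print(np.max(np.abs(g - 2 * x)))  # should be tiny
```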


## Naive (Unaccelerated) Implementation and Understanding of the Pooling Layer

### Forward Computation of Max Pooling

```python
def max_pool_forward_naive(x, pool_param):
    """
    A naive implementation of the forward pass for a max pooling layer.

    Inputs:
    - x: Input data, of shape (N, C, H, W)
    - pool_param: dictionary with the following keys:
      - 'pool_height': The height of each pooling region
      - 'pool_width': The width of each pooling region
      - 'stride': The distance between adjacent pooling regions

    Returns a tuple of:
    - out: Output data
    - cache: (x, pool_param)
    """
    ###########################################################################
    # TODO: Implement the max pooling forward pass                            #
    ###########################################################################
    # Unpack the data.
    N, C, H, W = x.shape
    pool_height = pool_param['pool_height']
    pool_width  = pool_param['pool_width']
    pool_stride = pool_param['stride']
    new_H = 1 + int((H - pool_height) / pool_stride)
    new_W = 1 + int((W - pool_width) / pool_stride)

    out = np.zeros([N, C, new_H, new_W])
    for n in range(N):
        for c in range(C):
            for i in range(new_H):
                for j in range(new_W):
                    out[n, c, i, j] = np.max(
                        x[n, c, i*pool_stride:i*pool_stride+pool_height,
                                j*pool_stride:j*pool_stride+pool_width])
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    cache = (x, pool_param)
    return out, cache
```
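The heart of the loop above is just `np.max` over each window. A standalone sketch with a 4×4 input, 2×2 pooling, and stride 2 (illustrative values):

```python
import numpy as np

x = np.array([[ 1,  2,  5,  6],
              [ 3,  4,  7,  8],
              [ 9, 10, 13, 14],
              [11, 12, 15, 16]], dtype=float)

pool, stride = 2, 2
new_H = 1 + (x.shape[0] - pool) // stride
new_W = 1 + (x.shape[1] - pool) // stride

# Each output cell is the maximum of its 2x2 window.
out = np.zeros((new_H, new_W))
for i in range(new_H):
    for j in range(new_W):
        out[i, j] = np.max(x[i*stride:i*stride+pool, j*stride:j*stride+pool])

print(out)  # [[ 4.  8.] [12. 16.]]
```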


### Backward Pass of Max Pooling

```python
def max_pool_backward_naive(dout, cache):
    """
    A naive implementation of the backward pass for a max pooling layer.

    Inputs:
    - dout: Upstream derivatives
    - cache: A tuple of (x, pool_param) as in the forward pass.

    Returns:
    - dx: Gradient with respect to x
    """
    ###########################################################################
    # TODO: Implement the max pooling backward pass                           #
    ###########################################################################
    # Unpack the cached data.
    x, pool_param = cache
    N, C, H, W = x.shape
    pool_height = pool_param['pool_height']
    pool_width  = pool_param['pool_width']
    pool_stride = pool_param['stride']
    new_H = 1 + int((H - pool_height) / pool_stride)
    new_W = 1 + int((W - pool_width) / pool_stride)
    dx = np.zeros_like(x)
    for n in range(N):
        for c in range(C):
            for i in range(new_H):
                for j in range(new_W):
                    window = x[n, c, i*pool_stride:i*pool_stride+pool_height,
                                     j*pool_stride:j*pool_stride+pool_width]
                    # Route the upstream gradient to the max position;
                    # accumulate with += in case pooling windows overlap.
                    dx[n, c, i*pool_stride:i*pool_stride+pool_height,
                             j*pool_stride:j*pool_stride+pool_width] += \
                        (window == np.max(window)) * dout[n, c, i, j]
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx
```
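The `(window == np.max(window))` mask routes the upstream gradient to the position of the maximum and zeros it everywhere else. In isolation (illustrative values):

```python
import numpy as np

window = np.array([[1.0, 3.0],
                   [2.0, 0.5]])
upstream = 5.0

# Boolean mask: True only where the maximum sits.
mask = (window == np.max(window))
dwindow = mask * upstream

print(dwindow)  # [[0. 5.] [0. 0.]]
```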


### Convolutional "sandwich" layers

The "sandwich" convolutional layer is terminology I picked up from Stanford's CS231n: it simply bundles several operations into one commonly used composite pattern. The conv-ReLU-pool (C-R-P) combination mentioned throughout this post can be viewed as one such sandwich layer. In practice, convolutional networks usually skip the low-level implementations and work directly with these composite operations.
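A sketch of such a sandwich on a toy single-channel input, composing a valid-mode correlation, ReLU, and 2×2 max pooling (all helper names and sizes here are illustrative, not the CS231n API):

```python
import numpy as np

def relu(a):
    return np.maximum(0, a)

def max_pool_2x2(a):
    # Non-overlapping 2x2 max pooling via reshape (assumes even H and W).
    H, W = a.shape
    return a.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

def conv_valid(img, filt):
    # Toy "conv": valid-mode correlation of one filter over one channel.
    H, W = img.shape
    HH, WW = filt.shape
    out = np.zeros((H - HH + 1, W - WW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+HH, j:j+WW] * filt)
    return out

img = np.random.randn(6, 6)
filt = np.random.randn(3, 3)

# The C-R-P sandwich: conv -> ReLU -> pool as one composite forward step.
out = max_pool_2x2(relu(conv_valid(img, filt)))
print(out.shape)  # (2, 2): 6x6 -> 4x4 conv map -> 2x2 pooled map
```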