# Convolutional Neural Network Structure

$x_j^l = pool\left(f_{relu}\left(\sum_{i \in M_j} x_i^{l-1} \ast w_{ij}^l + b_j^l\right)\right)$

## Naive (Unaccelerated) Implementation and Understanding of the Convolutional Layer

(In practice, convolutional layers almost always use accelerated implementations; what follows is the original, unoptimized version.)

### Forward Computation of the Convolutional Layer

Here x denotes the image matrix and w the filter matrix. Each filter takes dot products with patches of the image, and the dot-product results are arranged into the output matrix; that is the convolution process. (The original post illustrated this with an animated figure.)
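As a concrete sketch with made-up numbers: a single output element is just the dot product of a filter-sized image patch with the filter.

```python
import numpy as np

# Toy 3x3 patch and 3x3 filter (values are illustrative only).
image = np.array([[1., 2., 0.],
                  [0., 1., 3.],
                  [4., 0., 1.]])
filt = np.array([[0., 1., 0.],
                 [1., 1., 1.],
                 [0., 1., 0.]])

# One convolution output element: elementwise product, then sum.
out_00 = np.sum(image * filt)
```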

### Code Implementation of the Convolutional Layer Forward Pass

```python
def conv_forward_naive(x, w, b, conv_param):
    """
    A naive implementation of the forward pass for a convolutional layer.

    The input consists of N data points, each with C channels, height H and
    width W. We convolve each input with F different filters, where each filter
    spans all C channels and has height HH and width WW.

    Input:
    - x: Input data of shape (N, C, H, W)
    - w: Filter weights of shape (F, C, HH, WW)
    - b: Biases, of shape (F,)
    - conv_param: A dictionary with the following keys:
      - 'stride': The number of pixels between adjacent receptive fields in the
        horizontal and vertical directions.
      - 'pad': The number of pixels that will be used to zero-pad the input.

    Returns a tuple of:
    - out: Output data, of shape (N, F, H', W') where H' and W' are given by
      H' = 1 + (H + 2 * pad - HH) / stride
      W' = 1 + (W + 2 * pad - WW) / stride
    - cache: (x, w, b, conv_param)
    """
    ###########################################################################
    # TODO: Implement the convolutional forward pass.                         #
    ###########################################################################
    N, C, H, W = x.shape          # N samples, C channels, height H, width W
    F, C, HH, WW = w.shape        # F filters, each C channels, height HH, width WW
    stride = conv_param['stride'] # step size of the filter
    pad = conv_param['pad']       # zero-padding width

    # Zero-pad the input along the spatial dimensions only.
    padded_x = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
                      mode='constant', constant_values=0)

    ## Compute the output size and allocate a zero-filled placeholder
    new_H = 1 + int((H + 2 * pad - HH) / stride)
    new_W = 1 + int((W + 2 * pad - WW) / stride)
    out = np.zeros([N, F, new_H, new_W])

    ## Convolution
    for n in range(N):
        for f in range(F):
            ## Temporary (new_H, new_W) result matrix pre-filled with the bias b[f]
            conv_newH_newW = np.ones([new_H, new_W]) * b[f]
            for c in range(C):
                for i in range(new_H):
                    for j in range(new_W):
                        conv_newH_newW[i, j] += np.sum(
                            padded_x[n, c, i * stride: i * stride + HH,
                                     j * stride: j * stride + WW] * w[f, c, :, :])
            out[n, f] = conv_newH_newW
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    cache = (x, w, b, conv_param)
    return out, cache
```
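The zero-padding step the forward pass relies on, and the output-size formula from the docstring, can be checked in isolation (the sizes below are made up):

```python
import numpy as np

# Made-up sizes: N=2 images, C=3 channels, 8x8 spatial extent.
x = np.zeros((2, 3, 8, 8))
pad, stride, HH = 1, 1, 3

# Pad only the two spatial axes, leaving N and C untouched.
padded_x = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
                  mode='constant', constant_values=0)

# Output height from the docstring formula: H' = 1 + (H + 2*pad - HH) / stride.
new_H = 1 + (8 + 2 * pad - HH) // stride
```

With pad=1, stride=1, and a 3×3 filter, the output keeps the 8×8 spatial size ("same" padding).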

### A Personal Understanding of the Convolutional Layer

#### 1. Integrating a pixel's cross-channel and neighborhood information

Put this way, "convolution" (卷积) might be better rendered in Chinese as 卷和, "rolled sum"; the 积 ("product") in the name actually refers to integration.

#### 2. First, a rather over-the-top story

If your girlfriend hits harder and faster still, until you can no longer distinguish the time intervals at all, then the sum becomes an integral. Convolution "with integration" is exactly the convolution operation taught in the university course Probability Theory and Mathematical Statistics.

With that, the convolutional layer is clearly explained... yeah, sure... I've even convinced myself...

#### 3. The concept of templates in image processing

$\frac{1}{9}\ast \left[\begin{array}{ccc}1& 1& 1\\ 1& 1& 1\\ 1& 1& 1\end{array}\right]$
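Applied to an image, this template replaces each pixel with the mean of its 3×3 neighborhood, i.e. a simple blur. A toy patch (values made up) shows a single bright pixel being spread out:

```python
import numpy as np

# The 1/9 averaging template from above.
template = np.ones((3, 3)) / 9.0

# A patch with one bright pixel in the center.
patch = np.array([[0., 0., 0.],
                  [0., 9., 0.],
                  [0., 0., 0.]])

# The template output at the center: the mean of the 3x3 neighborhood.
blurred = np.sum(patch * template)
```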

### Backward Pass of the Convolutional Layer

$x_j^l = pool\left(f_{relu}\left(\sum_{i \in M_j} x_i^{l-1} \ast w_{ij}^l + b_j^l\right)\right)$

Here $f_{relu}(\cdot)$ is the ReLU activation, $pool(\cdot)$ denotes the pooling operation, x is the pixel matrix, w the filter (convolution kernel), and b the bias. When writing the convolutional layer's backward pass we can temporarily ignore the pooling layer and the activation function; let $g(\cdot)$ stand for everything that follows the convolutional layer. The backward derivative with respect to x then simplifies to:
$\left\{\begin{array}{l} g(x \ast w + b) = g(out) \\ out = x \ast w + b \\ \dfrac{\partial g}{\partial x} = \dfrac{\partial g}{\partial out} \ast \dfrac{\partial out}{\partial x} \end{array}\right.$
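The chain rule above can be verified numerically in the scalar case. A minimal sketch with made-up numbers, taking $g(out) = out^2$ as an arbitrary stand-in for everything after the convolution:

```python
# Scalar stand-ins for x, w, b (values are illustrative only).
x, w, b = 2.0, 3.0, 1.0
g = lambda v: (v * w + b) ** 2   # an arbitrary g applied after out = x*w + b

out = x * w + b                  # 7.0
dg_dout = 2 * out                # derivative of out**2 with respect to out
dg_dx = dg_dout * w              # chain rule: dg/dout * dout/dx

# Centered-difference check of the same derivative.
h = 1e-6
numeric = (g(x + h) - g(x - h)) / (2 * h)
```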

### Code Implementation of the Convolutional Layer Backward Pass

```python
def conv_backward_naive(dout, cache):
    """
    A naive implementation of the backward pass for a convolutional layer.

    Inputs:
    - dout: Upstream derivatives.
    - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive

    Returns a tuple of:
    - dx: Gradient with respect to x
    - dw: Gradient with respect to w
    - db: Gradient with respect to b
    """
    ###########################################################################
    # TODO: Implement the convolutional backward pass.                        #
    ###########################################################################
    # Unpack the cached data
    x, w, b, conv_param = cache
    stride = conv_param['stride']
    pad = conv_param['pad']
    F, C, HH, WW = w.shape
    N, C, H, W = x.shape
    N, F, new_H, new_W = dout.shape

    # Mimic the forward convolution: first re-create the padded input.
    padded_x = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
                      mode='constant', constant_values=0)
    padded_dx = np.zeros_like(padded_x)
    dw = np.zeros_like(w)
    db = np.zeros_like(b)

    for n in range(N):      # n-th image
        for f in range(F):  # f-th filter
            for i in range(new_H):
                for j in range(new_W):
                    db[f] += dout[n, f, i, j]  # d(out)/db is 1, so accumulate dout
                    dw[f] += padded_x[n, :, i*stride : HH + i*stride, j*stride : WW + j*stride] * dout[n, f, i, j]
                    padded_dx[n, :, i*stride : HH + i*stride, j*stride : WW + j*stride] += w[f] * dout[n, f, i, j]
    # Strip the padding to recover dx.
    dx = padded_dx[:, :, pad:pad + H, pad:pad + W]
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx, dw, db
```

## Naive (Unaccelerated) Implementation and Understanding of the Pooling Layer

### Forward Computation of Max Pooling

```python
def max_pool_forward_naive(x, pool_param):
    """
    A naive implementation of the forward pass for a max pooling layer.

    Inputs:
    - x: Input data, of shape (N, C, H, W)
    - pool_param: dictionary with the following keys:
      - 'pool_height': The height of each pooling region
      - 'pool_width': The width of each pooling region
      - 'stride': The distance between adjacent pooling regions

    Returns a tuple of:
    - out: Output data
    - cache: (x, pool_param)
    """
    ###########################################################################
    # TODO: Implement the max pooling forward pass                            #
    ###########################################################################
    # Unpack the parameters
    N, C, H, W = x.shape
    pool_height = pool_param['pool_height']  # pooling window height
    pool_width  = pool_param['pool_width']   # pooling window width
    pool_stride = pool_param['stride']       # step size
    new_H = 1 + int((H - pool_height) / pool_stride)  # output height
    new_W = 1 + int((W - pool_width) / pool_stride)   # output width

    out = np.zeros([N, C, new_H, new_W])
    for n in range(N):
        for c in range(C):
            for i in range(new_H):
                for j in range(new_W):
                    out[n, c, i, j] = np.max(
                        x[n, c, i*pool_stride : i*pool_stride + pool_height,
                          j*pool_stride : j*pool_stride + pool_width])
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    cache = (x, pool_param)
    return out, cache
```
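For the common non-overlapping case, the loop above can be cross-checked against a reshape-based shortcut on a toy single-channel input (sizes made up):

```python
import numpy as np

# A single 4x4 channel with values 0..15.
x = np.arange(16.0).reshape(4, 4)

# 2x2 max pooling with stride 2: split each axis into (blocks, 2) and
# take the max over the within-block axes.
out = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
```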

### Backward Pass of Max Pooling

```python
def max_pool_backward_naive(dout, cache):
    """
    A naive implementation of the backward pass for a max pooling layer.

    Inputs:
    - dout: Upstream derivatives
    - cache: A tuple of (x, pool_param) as in the forward pass.

    Returns:
    - dx: Gradient with respect to x
    """
    ###########################################################################
    # TODO: Implement the max pooling backward pass                           #
    ###########################################################################
    # Unpack the parameters
    x, pool_param = cache
    N, C, H, W = x.shape
    pool_height = pool_param['pool_height']
    pool_width  = pool_param['pool_width']
    pool_stride = pool_param['stride']
    new_H = 1 + int((H - pool_height) / pool_stride)
    new_W = 1 + int((W - pool_width) / pool_stride)

    dx = np.zeros_like(x)
    for n in range(N):
        for c in range(C):
            for i in range(new_H):
                for j in range(new_W):
                    # Route the upstream gradient to the max position of each window.
                    window = x[n, c, i * pool_stride: i * pool_stride + pool_height,
                               j * pool_stride: j * pool_stride + pool_width]
                    dx[n, c, i * pool_stride: i * pool_stride + pool_height,
                       j * pool_stride: j * pool_stride + pool_width] = \
                        (window == np.max(window)) * dout[n, c, i, j]
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx
```
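The routing in the loop above sends each upstream gradient only to the arg-max position of its window. The mask idea in isolation (toy values):

```python
import numpy as np

# One 2x2 pooling window; 3.0 is the max element.
window = np.array([[1., 3.],
                   [2., 0.]])

# Boolean mask: True only where the window attains its max.
mask = (window == np.max(window))

# Multiply by the upstream gradient for this output element (made up: 5.0).
dx_window = mask * 5.0
```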

### Convolutional “sandwich” layers

The “sandwich” layer is a coinage from Stanford's CS231n: it simply bundles several operations into one commonly used pattern. The conv-ReLU-pool (C-R-P) combination discussed earlier can be viewed as one such sandwich layer. In practice, convolutional networks are usually built directly from these composite operations rather than from the low-level implementations.
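The sandwich pattern can be sketched end to end on a single channel. The `conv2d` and `maxpool2d` below are compact stand-ins (stride-1 convolution, non-overlapping 2×2 pooling), not the naive layer functions in this post:

```python
import numpy as np

def conv2d(img, k):
    """Valid stride-1 2D convolution (cross-correlation) on one channel."""
    H, W = img.shape
    KH, KW = k.shape
    out = np.zeros((H - KH + 1, W - KW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + KH, j:j + KW] * k)
    return out

def maxpool2d(a):
    """Non-overlapping 2x2 max pooling."""
    H, W = a.shape
    return a.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

def conv_relu_pool(img, k):
    """The sandwich: convolution, then ReLU, then max pooling, in one call."""
    return maxpool2d(np.maximum(0, conv2d(img, k)))

# Toy 6x6 image and an identity kernel (passes the center pixel through).
img = np.arange(36.0).reshape(6, 6)
k = np.array([[0., 0., 0.],
              [0., 1., 0.],
              [0., 0., 0.]])
out = conv_relu_pool(img, k)
```

Chaining the three steps in one function mirrors how frameworks expose fused layer utilities: the caller never touches the intermediates.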

## Spatial Batch Normalization (SBN)

BN (Batch Normalization) was proposed, at bottom, to prevent vanishing gradients during training. By normalizing the activations to a consistent mean and variance, BN keeps activations that would otherwise shrink from collapsing toward zero. The BN layer in a CNN is slightly different, however: we need a small modification to obtain the "SBN" variant better suited to convolutional feature maps.
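The reshaping trick used by the implementation that follows can be seen in isolation. A sketch with random data (shapes made up): every spatial position is treated as a sample, so statistics are computed per channel over N, H, and W together:

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(2, 3, 4, 4)                  # (N, C, H, W)

# Move channels last, then flatten all sample/spatial axes: (N*H*W, C).
x_flat = x.transpose(0, 2, 3, 1).reshape(-1, 3)

# One mean and variance per channel, shared across all spatial positions.
mu = x_flat.mean(axis=0)
var = x_flat.var(axis=0)
x_norm = (x_flat - mu) / np.sqrt(var + 1e-5)

# Restore the original (N, C, H, W) layout.
out = x_norm.reshape(2, 4, 4, 3).transpose(0, 3, 1, 2)
```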

```python
def spatial_batchnorm_forward(x, gamma, beta, bn_param):
    """
    Computes the forward pass for spatial batch normalization.

    Inputs:
    - x: Input data of shape (N, C, H, W)
    - gamma: Scale parameter, of shape (C,)
    - beta: Shift parameter, of shape (C,)
    - bn_param: Dictionary with the following keys:
      - mode: 'train' or 'test'; required
      - eps: Constant for numeric stability
      - momentum: Constant for running mean / variance. momentum=0 means that
        old information is discarded completely at every time step, while
        momentum=1 means that new information is never incorporated. The
        default of momentum=0.9 should work well in most situations.
      - running_mean: Array of shape (C,) giving running mean of features
      - running_var: Array of shape (C,) giving running variance of features

    Returns a tuple of:
    - out: Output data, of shape (N, C, H, W)
    - cache: Values needed for the backward pass
    """
    out, cache = None, None
    ###########################################################################
    # TODO: Implement the forward pass for spatial batch normalization.       #
    # HINT: You can implement spatial batch normalization using the vanilla   #
    # version of batch normalization defined above. Your implementation should#
    # be very short; ours is less than five lines.                            #
    ###########################################################################
    N, C, H, W = x.shape
    # Treat every spatial position as a sample: (N, C, H, W) -> (N*H*W, C).
    x_new = x.transpose(0, 2, 3, 1).reshape(N * H * W, C)
    out, cache = batchnorm_forward(x_new, gamma, beta, bn_param)
    out = out.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################

    return out, cache
```

```python
def spatial_batchnorm_backward(dout, cache):
    """
    Computes the backward pass for spatial batch normalization.

    Inputs:
    - dout: Upstream derivatives, of shape (N, C, H, W)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient with respect to inputs, of shape (N, C, H, W)
    - dgamma: Gradient with respect to scale parameter, of shape (C,)
    - dbeta: Gradient with respect to shift parameter, of shape (C,)
    """
    dx, dgamma, dbeta = None, None, None
    ###########################################################################
    # TODO: Implement the backward pass for spatial batch normalization.      #
    # HINT: You can implement spatial batch normalization using the vanilla   #
    # version of batch normalization defined above. Your implementation should#
    # be very short; ours is less than five lines.                            #
    ###########################################################################
    N, C, H, W = dout.shape
    # Apply the same reshape as the forward pass to the upstream gradient.
    dout_new = dout.transpose(0, 2, 3, 1).reshape(N * H * W, C)
    dx, dgamma, dbeta = batchnorm_backward(dout_new, cache)
    dx = dx.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################

    return dx, dgamma, dbeta
```


Normalization matters because raw features can differ by many orders of magnitude, as in a matrix like:

$\left[\begin{array}{ccc}1 & 0.0000000001 & \cdots \\ 13 & 9876543210 & \cdots \\ \vdots & \vdots & \ddots \end{array}\right]$

### Advice on Writing Neural Networks

1. After you write a new neural network, the first thing to do is sanity-check the loss function. With a cross-entropy loss, random weights and no regularization term should give an initial loss of roughly $-\ln(1/C)$ for $C$ classes; after adding the regularization term, the loss should go up.
2. Once the loss looks reasonable, use gradient checking to verify that your backward pass is correct. (You can verify each layer separately with small artificial data and a handful of neurons.)
3. Training a neural network often runs into overfitting, mainly caused by a small training set. Overfitting yields a very small training error but a very high validation error.
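The gradient check in step 2 can be sketched as a centered-difference helper; `numeric_grad` is an illustrative name, and the test function $f(x) = \sum x^2$ is made up:

```python
import numpy as np

def numeric_grad(f, x, h=1e-5):
    """Centered-difference numerical gradient of scalar-valued f at x."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        fp = f(x)               # f(x + h) along this coordinate
        x[idx] = old - h
        fm = f(x)               # f(x - h) along this coordinate
        x[idx] = old            # restore the original value
        grad[idx] = (fp - fm) / (2 * h)
        it.iternext()
    return grad

# Check on f(x) = sum(x**2), whose analytic gradient is 2x.
x = np.array([[1.0, -2.0], [0.5, 3.0]])
analytic = 2 * x
numeric = numeric_grad(lambda v: np.sum(v ** 2), x)
```

In practice you compare the relative error between the analytic and numeric gradients for each layer's dx, dw, and db.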

## Closing: More Thoughts on Neural Networks

### Bionics and Neural Networks

"Build artificial-intelligence algorithms from the perspective of anthropology and bionics, not merely from mathematical formulas" is an intriguing idea, and it might even be the next revolution in artificial intelligence.
