【DL】Forward and Backward Propagation of a CNN (Manual Python Implementation)

Forward and Backward Propagation of a Convolutional Layer

Notes

This post implements only the forward and backward pass of a single convolutional layer (with a ReLU activation), supporting multi-channel input and multi-channel output. My understanding of deep learning used to stop at the ready-made APIs in PyTorch; this is my first attempt to do without import torch, so please point out any mistakes you find.
To represent tensors more conveniently, we use the NumPy package.

Forward Propagation

Principle

$$a^{l}=\sigma\left(z^{l}\right)=\sigma\left(a^{l-1} * W^{l}+b^{l}\right)$$

Here $\sigma$ is the activation function (ReLU in this post); $a^{l-1}$ is the input of the convolutional layer, $a^{l}$ is its output after the activation, $z^{l}$ is the output before the activation, $W^{l}$ is the convolution kernel, and $b^{l}$ is the bias of the layer.
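As a small worked illustration of the shapes involved (an example added here, not from the original post): for one sample with a single-channel $3\times 3$ input, a $2\times 2$ kernel, stride $1$ and no padding, the pre-activation output is $2\times 2$:

$$z^{l}_{i,j}=\sum_{p=0}^{1}\sum_{q=0}^{1} a^{l-1}_{i+p,\,j+q}\,W^{l}_{p,q}+b^{l},\qquad i,j\in\{0,1\}$$

In general the output height is $O_h = 1 + \frac{I_h + 2P_h - K_h}{S_h}$ (and analogously for the width), which is exactly the formula used in the code below.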

Code

The forward pass of the ReLU activation keeps the input unchanged when it is greater than or equal to zero and sets it to zero when it is negative.

def relu(z):
    # note: z is modified in place and also returned
    z[z<0] = 0
    return z
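A quick toy check of the behaviour (assuming numpy has been imported as np, as in the complete code at the end):

relu(np.array([-2., 0., 3.]))   # -> array([0., 0., 3.])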

The code for the forward pass of the convolutional layer (with the ReLU activation) is shown below.
$X$, $W$ and $b$ correspond to $a^{l-1}$, $W^{l}$ and $b^{l}$ respectively; the output $Out$ corresponds to $a^{l}$.

def conv_forward(X, W, b, stride=(1,1), padding=(0,0)):
    # number of samples, number of channels, height, width  (input X)
    m, c, Ih, Iw = X.shape
    # number of filters, number of channels, height, width  (kernel W)
    f, _, Kh, Kw = W.shape
    # size of stride and padding
    Sw, Sh = stride
    Pw, Ph = padding
    # calculate the height and width of the output
    Oh = int(1 + (Ih + 2 * Ph - Kh) / Sh)
    Ow = int(1 + (Iw + 2 * Pw - Kw) / Sw)
    # pre-allocate output Out
    # number of samples, number of channels, height, width  (output Out)
    Out = np.zeros([m, f, Oh, Ow])
    # zero-pad the input
    X_pad = np.zeros((m, c, Ih + 2 * Ph, Iw + 2 * Pw))
    X_pad[:, :, Ph:Ph+Ih, Pw:Pw+Iw] = X

    # multi in (c channels), multi out (f channels)
    # loop over the f filters, i.e. over the output channels
    for n in range(Out.shape[1]):
        # each filter handles one multi-in, single-out convolution
        for i in range(Out.shape[2]):
            for j in range(Out.shape[3]):
                # the m samples are processed in parallel:
                # window (m,c,Kh,Kw) * kernel (c,Kh,Kw) -> (m,c,Kh,Kw), summed over axes (1,2,3) -> (m,)
                Out[:, n, i, j] = np.sum(X_pad[:, :, i*Sh : i*Sh+Kh, j*Sw : j*Sw+Kw] * W[n, :, :, :], axis=(1, 2, 3))
        # the bias is added per output channel
        Out[:, n, :, :] += b[n]
    # ReLU forward, applied in place to the whole pre-activation output
    relu(Out)
    return Out
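A quick shape sanity check (a toy example added for illustration; the sizes are arbitrary):

X = np.random.randn(2, 3, 5, 5)   # m=2 samples, c=3 channels, 5x5 inputs
W = np.random.randn(4, 3, 3, 3)   # f=4 filters with 3x3 kernels
b = np.zeros(4)
Out = conv_forward(X, W, b, stride=(1,1), padding=(1,1))
print(Out.shape)                  # (2, 4, 5, 5), since Oh = 1 + (5 + 2*1 - 3)/1 = 5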

Backward Propagation

Principle

We want to use backpropagation to compute the partial derivatives of the loss function $J(W,b)$ with respect to $z^{l}$, $W^{l}$ and $b^{l}$. We write
$$\delta^{l}=\frac{\partial J(W, b)}{\partial z^{l}}$$

$$\delta^{l-1}=\delta^{l} * \operatorname{rot180}\left(W^{l}\right) \odot \sigma^{\prime}\left(z^{l-1}\right)$$
Here $\odot$ denotes the Hadamard product: for two vectors of the same dimension, $A=(a_{1},a_{2},\dots,a_{n})^{T}$ and $B=(b_{1},b_{2},\dots,b_{n})^{T}$, we have $A\odot B=(a_{1}b_{1},a_{2}b_{2},\dots,a_{n}b_{n})^{T}$. $\sigma^{\prime}(z^{l-1})$ is the derivative of ReLU, which is zero where its argument is negative and one otherwise. Why the kernel is rotated by 180 degrees is explained in the article at reference link 1.
$$\frac{\partial J(W, b)}{\partial W^{l}}=a^{l-1} * \delta^{l}$$

$$\frac{\partial J(W, b)}{\partial b^{l}}=\sum_{u, v}\left(\delta^{l}\right)_{u, v}$$
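The ReLU derivative and the bias gradient above translate almost directly into NumPy. The lines below are a small sketch added for illustration; z and delta are placeholder arrays (delta is assumed to have the shape (m, f, Oh, Ow) used in the code below):

# sigma'(z): 1 where z > 0, 0 elsewhere (the ReLU derivative)
relu_mask = (z > 0).astype(z.dtype)
# a Hadamard product is simply element-wise multiplication of two same-shaped arrays: a * b
# dJ/db: sum delta over the samples and the spatial positions u, v -> one value per output channel
db = np.sum(delta, axis=(0, 2, 3))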

Code

The partial derivatives of the loss $J(W,b)$ with respect to $z^{l}$, $a^{l-1}$, $W^{l}$, $b^{l}$ and $z^{l-1}$ are denoted dz, dx, dw, db and dz0 respectively.

def conv_backward(dz, X, W, b, stride=(1,1), padding=(0,0)):
    """
    dz: gradient with respect to z (the pre-activation output of this layer)
    dz0: gradient with respect to z of the previous convolutional layer
    dx: gradient with respect to x
    dw: gradient with respect to w
    db: gradient with respect to b
    """
    m, f, _, _ = dz.shape
    m, c, Ih, Iw = X.shape
    _, _, Kh, Kw = W.shape
    Sw, Sh = stride
    Pw, Ph = padding

    dx, dw = np.zeros_like(X), np.zeros_like(W)
    X_pad = np.pad(X, [(0,0), (0,0), (Ph,Ph), (Pw,Pw)], 'constant')
    dx_pad = np.pad(dx, [(0,0), (0,0), (Ph,Ph), (Pw,Pw)], 'constant')
    # dJ/db: sum dz over the samples and the spatial positions
    db = np.sum(dz, axis=(0,2,3))

    for k in range(dz.shape[0]):
        for i in range(dz.shape[2]):
            for j in range(dz.shape[3]):
                # the input window that produced Out[k, :, i, j] in the forward pass
                x_window = X_pad[k, :, i * Sh : i * Sh + Kh, j * Sw : j * Sw + Kw]
                for n in range(f):
                    # accumulate the kernel gradient, shape (f, c, Kh, Kw)
                    dw[n] += x_window * dz[k, n, i, j]
                    # scatter the gradient back onto the padded input window;
                    # the rot180 of the formula is implicit in this window-wise accumulation
                    dx_pad[k, :, i * Sh : i * Sh + Kh, j * Sw : j * Sw + Kw] += W[n] * dz[k, n, i, j]

    # remove the padding
    dx = dx_pad[:, :, Ph:Ph+Ih, Pw:Pw+Iw]
    # ReLU backward: X = relu(z^{l-1}), so sigma'(z^{l-1}) is 1 where X > 0 and 0 elsewhere
    dz0 = dx * (X > 0)

    return dx, dw, db, dz0
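As a sanity check of the gradients, dw can be compared against a finite-difference estimate. The snippet below is a minimal sketch added for illustration (not part of the original derivation); it uses the toy loss L = sum(conv_forward(X, W, b)), for which dL/dz is simply the ReLU mask (Out > 0):

np.random.seed(0)
X = np.random.randn(2, 3, 5, 5)
W = np.random.randn(4, 3, 3, 3)
b = np.random.randn(4)

Out = conv_forward(X, W, b)
dz = (Out > 0).astype(float)                 # dL/dz for L = sum(relu(z))
dx, dw, db, dz0 = conv_backward(dz, X, W, b)

# numerical gradient of L with respect to one kernel weight
eps = 1e-6
W_pert = W.copy()
W_pert[0, 0, 0, 0] += eps
num_dw = (np.sum(conv_forward(X, W_pert, b)) - np.sum(Out)) / eps
print(num_dw, dw[0, 0, 0, 0])                # the two values should be very close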

Complete Code

import numpy as np

def relu(z):
    # note: z is modified in place and also returned
    z[z<0] = 0
    return z

def conv_forward(X, W, b, stride=(1,1), padding=(0,0)):
    # number of samples, number of channels, height, width  (input X)
    m, c, Ih, Iw = X.shape
    # number of filters, number of channels, height, width  (kernel W)
    f, _, Kh, Kw = W.shape
    # size of stride and padding
    Sw, Sh = stride
    Pw, Ph = padding
    # calculate the height and width of the output
    Oh = int(1 + (Ih + 2 * Ph - Kh) / Sh)
    Ow = int(1 + (Iw + 2 * Pw - Kw) / Sw)
    # pre-allocate output Out
    # number of samples, number of channels, height, width  (output Out)
    Out = np.zeros([m, f, Oh, Ow])
    # zero-pad the input
    X_pad = np.zeros((m, c, Ih + 2 * Ph, Iw + 2 * Pw))
    X_pad[:, :, Ph:Ph+Ih, Pw:Pw+Iw] = X

    # multi in (c channels), multi out (f channels)
    # loop over the f filters, i.e. over the output channels
    for n in range(Out.shape[1]):
        # each filter handles one multi-in, single-out convolution
        for i in range(Out.shape[2]):
            for j in range(Out.shape[3]):
                # the m samples are processed in parallel:
                # window (m,c,Kh,Kw) * kernel (c,Kh,Kw) -> (m,c,Kh,Kw), summed over axes (1,2,3) -> (m,)
                Out[:, n, i, j] = np.sum(X_pad[:, :, i*Sh : i*Sh+Kh, j*Sw : j*Sw+Kw] * W[n, :, :, :], axis=(1, 2, 3))
        # the bias is added per output channel
        Out[:, n, :, :] += b[n]
    # ReLU forward, applied in place to the whole pre-activation output
    relu(Out)
    return Out

def conv_backward(dz, X, W, b, stride=(1,1), padding=(0,0)):
    """
    dz: gradient with respect to z (the pre-activation output of this layer)
    dz0: gradient with respect to z of the previous convolutional layer
    dx: gradient with respect to x
    dw: gradient with respect to w
    db: gradient with respect to b
    """
    m, f, _, _ = dz.shape
    m, c, Ih, Iw = X.shape
    _, _, Kh, Kw = W.shape
    Sw, Sh = stride
    Pw, Ph = padding

    dx, dw = np.zeros_like(X), np.zeros_like(W)
    X_pad = np.pad(X, [(0,0), (0,0), (Ph,Ph), (Pw,Pw)], 'constant')
    dx_pad = np.pad(dx, [(0,0), (0,0), (Ph,Ph), (Pw,Pw)], 'constant')
    # dJ/db: sum dz over the samples and the spatial positions
    db = np.sum(dz, axis=(0,2,3))

    for k in range(dz.shape[0]):
        for i in range(dz.shape[2]):
            for j in range(dz.shape[3]):
                # the input window that produced Out[k, :, i, j] in the forward pass
                x_window = X_pad[k, :, i * Sh : i * Sh + Kh, j * Sw : j * Sw + Kw]
                for n in range(f):
                    # accumulate the kernel gradient, shape (f, c, Kh, Kw)
                    dw[n] += x_window * dz[k, n, i, j]
                    # scatter the gradient back onto the padded input window;
                    # the rot180 of the formula is implicit in this window-wise accumulation
                    dx_pad[k, :, i * Sh : i * Sh + Kh, j * Sw : j * Sw + Kw] += W[n] * dz[k, n, i, j]

    # remove the padding
    dx = dx_pad[:, :, Ph:Ph+Ih, Pw:Pw+Iw]
    # ReLU backward: X = relu(z^{l-1}), so sigma'(z^{l-1}) is 1 where X > 0 and 0 elsewhere
    dz0 = dx * (X > 0)

    return dx, dw, db, dz0
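To train with these two functions, one runs the forward pass, obtains dz from the loss, calls conv_backward, and updates the parameters. A hypothetical single gradient-descent step might look like the following sketch (lr and dz are placeholders that are not defined in this post):

Out = conv_forward(X, W, b)
dx, dw, db, dz0 = conv_backward(dz, X, W, b)   # dz comes from the layers above
lr = 0.01                                      # hypothetical learning rate
W -= lr * dw
b -= lr * db
# dz0 is passed on as the upstream gradient for the previous layer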

References

  1. https://www.cnblogs.com/pinard/p/6494810.html
  2. https://blog.csdn.net/qq_38585359/article/details/102658211?spm=1001.2101.3001.6650.1&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7ETopBlog-1.topblog&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7ETopBlog-1.topblog&utm_relevant_index=2