Deep Learning: How Backpropagation Works Through the sigmoid and ReLU Activation Functions
1. The sigmoid function
- Formula of the sigmoid function

$$\sigma(z)=\frac{1}{1+e^{-z}}\tag{1}$$

- Python implementation of sigmoid
```python
import numpy as np

def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy

    Arguments:
    Z -- numpy array of any shape

    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    A = 1 / (1 + np.exp(-Z))
    cache = Z

    return A, cache
```
- Note: Z is needed during backpropagation, so it is stored in `cache` first.
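As a quick sanity check (a minimal sketch; the input values below are made up for illustration), calling `sigmoid` on a small array returns both the activation and the cached `Z`:

```python
import numpy as np

Z = np.array([[-1.0, 0.0, 2.0]])   # hypothetical pre-activation values
A, cache = sigmoid(Z)

print(A)        # approximately [[0.2689, 0.5, 0.8808]]
print(cache)    # the original Z, kept for the backward pass
```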
2. How backpropagation works for the sigmoid function
- Derivative of the sigmoid function

$$\sigma'(z)=\sigma(z)*(1-\sigma(z))\tag{2}$$

- How backpropagation works for the sigmoid function
In layer $l$ of a neural network, the forward pass is computed as follows:

$$Z^{[l]}=W^{[l]}A^{[l-1]} + b^{[l]}\tag{3}$$

$$A^{[l]} = \sigma(Z^{[l]})\tag{4}$$

where (3) is the linear part and (4) is the activation part, with sigmoid as the activation function.
During backpropagation, when we reach layer $l$, the following layer has already provided $dA^{[l]}$ (that is, $\frac{\partial \mathcal{L}}{\partial A^{[l]}}$, where $\mathcal{L}$ is the cost function).
The current layer then needs to compute $dZ^{[l]}$ (that is, $\frac{\partial \mathcal{L}}{\partial Z^{[l]}}$) via the chain rule:

$$dZ^{[l]} = \frac{\partial \mathcal{L} }{\partial Z^{[l]}} = \frac{\partial \mathcal{L} }{\partial A^{[l]}} * \frac{\partial A^{[l]} }{\partial Z^{[l]}} = dA * \sigma'(Z^{[l]}) = dA * \sigma(Z^{[l]})*(1-\sigma(Z^{[l]}))\tag{5}$$

where $*$ denotes element-wise multiplication.
The implementation therefore looks like this:
- Python implementation of sigmoid backpropagation
```python
def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache

    s = 1 / (1 + np.exp(-Z))
    dZ = dA * s * (1 - s)

    assert (dZ.shape == Z.shape)

    return dZ
```
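To convince ourselves that formula (5) and `sigmoid_backward` agree, we can compare the analytic gradient against a finite-difference estimate. This is a minimal sketch; the random test values, `eps`, and the expected error size are arbitrary choices for illustration:

```python
import numpy as np

np.random.seed(0)
Z = np.random.randn(3, 4)    # hypothetical pre-activation values
dA = np.random.randn(3, 4)   # hypothetical upstream gradient
eps = 1e-7

# analytic gradient from formula (5): dZ = dA * sigma(Z) * (1 - sigma(Z))
dZ = sigmoid_backward(dA, Z)

# central finite-difference estimate of d(sigmoid)/dZ, times the upstream gradient
A_plus, _ = sigmoid(Z + eps)
A_minus, _ = sigmoid(Z - eps)
dZ_numeric = dA * (A_plus - A_minus) / (2 * eps)

print(np.max(np.abs(dZ - dZ_numeric)))   # should be very small, roughly 1e-8 or below
```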
3. The ReLU function
- Formula of the ReLU function

$$relu(z) = \begin{cases} z & z > 0 \\ 0 & z \leq 0 \end{cases}\tag{6}$$

which is equivalent to

$$relu(z) = \max(0, z)\tag{7}$$

- Python implementation of relu
```python
def relu(Z):
    """
    Implement the RELU function.

    Arguments:
    Z -- Output of the linear layer, of any shape

    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    A = np.maximum(0, Z)

    assert(A.shape == Z.shape)

    cache = Z
    return A, cache
```
- Note: Z is needed during backpropagation, so it is stored in `cache` first.
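Again as a quick sanity check (a minimal sketch with made-up inputs), negative entries are zeroed out while positive entries pass through unchanged:

```python
import numpy as np

Z = np.array([[-3.0, 0.0, 1.5, 4.0]])   # hypothetical pre-activation values
A, cache = relu(Z)

print(A)        # [[0.  0.  1.5 4. ]]
print(cache)    # the original Z, kept for the backward pass
```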
4. How backpropagation works for the ReLU function
- Derivative of the ReLU function

$$relu'(z) = \begin{cases} 1 & z > 0 \\ 0 & z \leq 0 \end{cases}\tag{8}$$

(Strictly speaking, relu is not differentiable at $z = 0$; the convention here assigns the derivative 0 at that point.)

- How backpropagation works for the ReLU function
As with sigmoid, the forward pass is computed as follows:

$$Z^{[l]}=W^{[l]}A^{[l-1]} + b^{[l]}\tag{9}$$

$$A^{[l]} = relu(Z^{[l]})\tag{10}$$

During backpropagation, $dZ^{[l]}$ is computed as:

$$dZ^{[l]} = \frac{\partial \mathcal{L} }{\partial Z^{[l]}} = \frac{\partial \mathcal{L} }{\partial A^{[l]}} * \frac{\partial A^{[l]} }{\partial Z^{[l]}} = dA * relu'(Z^{[l]}) = \begin{cases} dA_{ij} & Z_{ij} > 0 \\ 0 & Z_{ij} \leq 0 \end{cases}\tag{11}$$
The implementation therefore looks like this:
- Python implementation of relu backpropagation
```python
def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    dZ = np.array(dA, copy=True)  # copy dA so the upstream gradient is not modified in place

    # Where z <= 0, the derivative of relu is 0, so set dz to 0 as well.
    dZ[Z <= 0] = 0

    assert (dZ.shape == Z.shape)

    return dZ
```
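Putting the pieces together, here is a minimal sketch of one forward and backward step through a single ReLU layer, following equations (9)-(11). The layer sizes, the random inputs, and the upstream gradient `dA` are all made up for illustration:

```python
import numpy as np

np.random.seed(1)

A_prev = np.random.randn(4, 5)     # hypothetical activations from layer l-1 (4 units, 5 examples)
W = np.random.randn(3, 4) * 0.01   # hypothetical weights for layer l (3 units)
b = np.zeros((3, 1))

# Forward pass: equations (9) and (10)
Z = np.dot(W, A_prev) + b
A, cache = relu(Z)

# Backward pass: suppose the next layer handed us dA (made up here)
dA = np.random.randn(*A.shape)
dZ = relu_backward(dA, cache)      # equation (11)

# dZ keeps dA where Z > 0 and is zero where Z <= 0
print(np.allclose(dZ, dA * (Z > 0)))   # True
```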