Deep Learning: Backpropagation Through the sigmoid and relu Activation Functions

1. The sigmoid function

  • sigmoid formula
    $$\sigma(z) = \frac{1}{1 + e^{-z}} \tag{1}$$
  • sigmoid in Python
import numpy as np

def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy
    
    Arguments:
    Z -- numpy array of any shape
    
    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    
    A = 1/(1+np.exp(-Z))
    cache = Z

    return A, cache
  • Note: Z is needed again during backpropagation, so it is stored in cache here.
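
As a quick check of the forward pass, the function above can be exercised directly (a minimal sketch; the input values are arbitrary and the sigmoid defined above is assumed to be in scope):

import numpy as np

Z = np.array([[-1.0, 0.0, 2.0]])
A, cache = sigmoid(Z)
print(A)      # approximately [[0.2689 0.5    0.8808]]
print(cache)  # the original Z, returned for use in the backward pass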

2. Backpropagation through the sigmoid function

  • Derivative of sigmoid
    $$\sigma'(z) = \sigma(z)\,(1 - \sigma(z)) \tag{2}$$
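    For completeness, equation (2) follows directly from the chain rule applied to equation (1):
    $$\sigma'(z) = \frac{d}{dz}\left(1 + e^{-z}\right)^{-1} = \frac{e^{-z}}{(1 + e^{-z})^{2}} = \frac{1}{1 + e^{-z}} \cdot \left(1 - \frac{1}{1 + e^{-z}}\right) = \sigma(z)\,(1 - \sigma(z))$$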

  • How sigmoid backpropagation works
    In layer $l$ of the network, the forward pass computes:
    $$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]} \tag{3}$$
    $$A^{[l]} = \sigma(Z^{[l]}) \tag{4}$$

    Here (3) is the linear part and (4) is the activation part, with sigmoid as the activation function.
    During backpropagation, by the time layer $l$ is reached, the following layer has already supplied $dA^{[l]}$ (that is, $\frac{\partial \mathcal{L}}{\partial A^{[l]}}$, where $\mathcal{L}$ is the cost function).
    The current layer needs to compute $dZ^{[l]}$ (that is, $\frac{\partial \mathcal{L}}{\partial Z^{[l]}}$); by the chain rule:
    $$dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial Z^{[l]}} = \frac{\partial \mathcal{L}}{\partial A^{[l]}} * \frac{\partial A^{[l]}}{\partial Z^{[l]}} = dA^{[l]} * \sigma'(Z^{[l]}) = dA^{[l]} * \sigma(Z^{[l]}) * (1 - \sigma(Z^{[l]})) \tag{5}$$
    where $*$ denotes element-wise multiplication. The implementation follows directly:

  • sigmoid backward pass in Python

def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.
    
    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently
    
    Returns:
    dZ -- Gradient of the cost with respect to Z
    """

    Z = cache

    s = 1/(1+np.exp(-Z))   # recompute sigmoid(Z)
    dZ = dA * s * (1-s)    # equation (5): dA * sigmoid'(Z)

    assert (dZ.shape == Z.shape)

    return dZ
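
To convince yourself that equation (5) is implemented correctly, the analytic gradient can be compared against a central-difference estimate (a minimal sketch, assuming the sigmoid and sigmoid_backward functions defined above are in scope; the shapes, seed, and epsilon are arbitrary choices):

import numpy as np

np.random.seed(0)
Z = np.random.randn(3, 4)    # pre-activation values
dA = np.random.randn(3, 4)   # incoming gradient dL/dA
eps = 1e-7

# analytic gradient from the backward pass above (cache is simply Z)
dZ_analytic = sigmoid_backward(dA, Z)

# numerical gradient: dL/dZ ≈ dA * (sigmoid(Z + eps) - sigmoid(Z - eps)) / (2 * eps)
A_plus, _ = sigmoid(Z + eps)
A_minus, _ = sigmoid(Z - eps)
dZ_numeric = dA * (A_plus - A_minus) / (2 * eps)

print(np.max(np.abs(dZ_analytic - dZ_numeric)))  # should be on the order of 1e-9 or smaller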

3. The relu function

  • relu formula
    $$relu(z) = \begin{cases} z & z > 0 \\ 0 & z \leq 0 \end{cases} \tag{6}$$
    which is equivalent to
    $$relu(z) = \max(0, z) \tag{7}$$
  • relu in Python
def relu(Z):
    """
    Implement the RELU function.
    
    Arguments:
    Z -- Output of the linear layer, of any shape
    
    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- returns Z as well, stored for computing the backward pass efficiently
    """

    A = np.maximum(0,Z)

    assert(A.shape == Z.shape)
    
    cache = Z 
    return A, cache
  • Note: Z is needed again during backpropagation, so it is stored in cache here.
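
A quick check of the relu forward pass (a minimal sketch; the input values are arbitrary and the relu defined above is assumed to be in scope):

import numpy as np

Z = np.array([[-2.0, 0.0, 3.0],
              [ 1.5, -0.5, 0.0]])
A, cache = relu(Z)
print(A)      # [[0.  0.  3. ], [1.5 0.  0. ]] -- negative and zero entries map to 0
print(cache)  # the original Z, returned for use in the backward pass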

4. Backpropagation through the relu function

  • Derivative of relu
    $$relu'(z) = \begin{cases} 1 & z > 0 \\ 0 & z \leq 0 \end{cases} \tag{8}$$
    (Strictly speaking, relu is not differentiable at $z = 0$; by convention the derivative there is taken to be 0.)

  • How relu backpropagation works
    As with sigmoid, the forward pass computes:
    $$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]} \tag{9}$$
    $$A^{[l]} = relu(Z^{[l]}) \tag{10}$$
    During backpropagation, $dZ^{[l]}$ is computed as:
    $$dZ^{[l]} = \frac{\partial \mathcal{L}}{\partial Z^{[l]}} = \frac{\partial \mathcal{L}}{\partial A^{[l]}} * \frac{\partial A^{[l]}}{\partial Z^{[l]}} = dA^{[l]} * relu'(Z^{[l]}) = \begin{cases} dA_{ij} & Z_{ij} > 0 \\ 0 & Z_{ij} \leq 0 \end{cases} \tag{11}$$
    In other words, each gradient entry passes through unchanged where $Z_{ij} > 0$ and is zeroed elsewhere. The implementation follows directly:

  • relu backward pass in Python

def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.
    
    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    
    Z = cache
    dZ = np.array(dA, copy=True)  # copy dA so the incoming gradient is not modified in place
    
    # Where Z <= 0, relu'(Z) = 0, so the corresponding gradient entries are zeroed.
    dZ[Z <= 0] = 0
    
    assert (dZ.shape == Z.shape)

    return dZ
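
A small demonstration of the masking behaviour in equation (11) (a minimal sketch, assuming the relu_backward defined above is in scope; the values are arbitrary):

import numpy as np

Z = np.array([[-2.0, 0.0, 3.0],
              [ 1.5, -0.5, 0.0]])
dA = np.array([[0.1, 0.2, 0.3],
               [0.4, 0.5, 0.6]])
dZ = relu_backward(dA, Z)   # cache is simply Z
print(dZ)   # [[0.  0.  0.3], [0.4 0.  0. ]] -- gradients pass through only where Z > 0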