Deep Learning: How Backpropagation Works Through the sigmoid and ReLU Activation Functions
1. The sigmoid function
- Formula of the sigmoid function

$$\sigma(z)=\frac{1}{1+e^{-z}}\tag{1}$$

- Python implementation of sigmoid
```python
import numpy as np

def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy

    Arguments:
    Z -- numpy array of any shape

    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    A = 1 / (1 + np.exp(-Z))
    cache = Z

    return A, cache
```
- Note: Z is needed during backpropagation, so it is stored in `cache` first.
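As a quick sanity check (a minimal sketch; the input values below are made up for illustration), calling `sigmoid` on a small array returns both the activation and the cached `Z`:

```python
import numpy as np

Z = np.array([[-1.0, 0.0, 2.0]])   # hypothetical pre-activation values
A, cache = sigmoid(Z)

print(A)        # approximately [[0.2689, 0.5, 0.8808]]
print(cache)    # the original Z, kept for the backward pass
```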
2. How backpropagation works for the sigmoid function
- Derivative of the sigmoid function

$$\sigma'(z)=\sigma(z)*(1-\sigma(z))\tag{2}$$

- How backpropagation works for the sigmoid function
In layer $l$ of a neural network, the forward pass is computed as follows:

$$Z^{[l]}=W^{[l]}A^{[l-1]} + b^{[l]}\tag{3}$$

$$A^{[l]} = \sigma(Z^{[l]})\tag{4}$$

where (3) is the linear part and (4) is the activation part, with sigmoid as the activation function.
During backpropagation, when we reach layer $l$, the following layer has already provided $dA^{[l]}$ (that is, $\frac{\partial \mathcal{L}}{\partial A^{[l]}}$, where $\mathcal{L}$ is the cost function).
The current layer then needs to compute $dZ^{[l]}$ (that is, $\frac{\partial \mathcal{L}}{\partial Z^{[l]}}$) via the chain rule:

$$dZ^{[l]} = \frac{\partial \mathcal{L} }{\partial Z^{[l]}} = \frac{\partial \mathcal{L} }{\partial A^{[l]}} * \frac{\partial A^{[l]} }{\partial Z^{[l]}} = dA * \sigma'(Z^{[l]}) = dA * \sigma(Z^{[l]})*(1-\sigma(Z^{[l]}))\tag{5}$$

where $*$ denotes element-wise multiplication.
The implementation therefore looks like this:
- Python implementation of sigmoid backpropagation
```python
def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache

    s = 1 / (1 + np.exp(-Z))
    dZ = dA * s * (1 - s)

    assert (dZ.shape == Z.shape)

    return dZ
```
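To convince ourselves that formula (5) and `sigmoid_backward` agree, we can compare the analytic gradient against a finite-difference estimate. This is a minimal sketch; the random test values, `eps`, and the expected error size are arbitrary choices for illustration:

```python
import numpy as np

np.random.seed(0)
Z = np.random.randn(3, 4)    # hypothetical pre-activation values
dA = np.random.randn(3, 4)   # hypothetical upstream gradient
eps = 1e-7

# analytic gradient from formula (5): dZ = dA * sigma(Z) * (1 - sigma(Z))
dZ = sigmoid_backward(dA, Z)

# central finite-difference estimate of d(sigmoid)/dZ, times the upstream gradient
A_plus, _ = sigmoid(Z + eps)
A_minus, _ = sigmoid(Z - eps)
dZ_numeric = dA * (A_plus - A_minus) / (2 * eps)

print(np.max(np.abs(dZ - dZ_numeric)))   # should be very small, roughly 1e-8 or below
```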
3. The ReLU function
- Formula of the ReLU function

$$relu(z) = \begin{cases} z & z > 0 \\ 0 & z \leq 0 \end{cases}\tag{6}$$

which is equivalent to

$$relu(z) = \max(0, z)\tag{7}$$

- Python implementation of relu
```python
def relu(Z):
    """
    Implement the RELU function.

    Arguments:
    Z -- Output of the linear layer, of any shape

    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """
    A = np.maximum(0, Z)

    assert(A.shape == Z.shape)

    cache = Z
    return A, cache
```
- Note: Z is needed during backpropagation, so it is stored in `cache` first.
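Again as a quick sanity check (a minimal sketch with made-up inputs), negative entries are zeroed out while positive entries pass through unchanged:

```python
import numpy as np

Z = np.array([[-3.0, 0.0, 1.5, 4.0]])   # hypothetical pre-activation values
A, cache = relu(Z)

print(A)        # [[0.  0.  1.5 4. ]]
print(cache)    # the original Z, kept for the backward pass
```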
4. How backpropagation works for the ReLU function
- Derivative of the ReLU function

$$relu'(z) = \begin{cases} 1 & z > 0 \\ 0 & z \leq 0 \end{cases}\tag{8}$$

(Strictly speaking, relu is not differentiable at $z = 0$; the convention here assigns the derivative 0 at that point.)

- How backpropagation works for the ReLU function
As with sigmoid, the forward pass is computed as follows:

$$Z^{[l]}=W^{[l]}A^{[l-1]} + b^{[l]}\tag{9}$$

$$A^{[l]} = relu(Z^{[l]})\tag{10}$$

During backpropagation, $dZ^{[l]}$ is computed as:

$$dZ^{[l]} = \frac{\partial \mathcal{L} }{\partial Z^{[l]}} = \frac{\partial \mathcal{L} }{\partial A^{[l]}} * \frac{\partial A^{[l]} }{\partial Z^{[l]}} = dA * relu'(Z^{[l]}) = \begin{cases} dA_{ij} & Z_{ij} > 0 \\ 0 & Z_{ij} \leq 0 \end{cases}\tag{11}$$
The implementation therefore looks like this:
- Python implementation of relu backpropagation
```python
def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    dZ = np.array(dA, copy=True)  # copy dA so the upstream gradient is not modified in place

    # Where z <= 0, the derivative of relu is 0, so set dz to 0 as well.
    dZ[Z <= 0] = 0

    assert (dZ.shape == Z.shape)

    return dZ
```
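Putting the pieces together, here is a minimal sketch of one forward and backward step through a single ReLU layer, following equations (9)-(11). The layer sizes, the random inputs, and the upstream gradient `dA` are all made up for illustration:

```python
import numpy as np

np.random.seed(1)

A_prev = np.random.randn(4, 5)     # hypothetical activations from layer l-1 (4 units, 5 examples)
W = np.random.randn(3, 4) * 0.01   # hypothetical weights for layer l (3 units)
b = np.zeros((3, 1))

# Forward pass: equations (9) and (10)
Z = np.dot(W, A_prev) + b
A, cache = relu(Z)

# Backward pass: suppose the next layer handed us dA (made up here)
dA = np.random.randn(*A.shape)
dZ = relu_backward(dA, cache)      # equation (11)

# dZ keeps dA where Z > 0 and is zero where Z <= 0
print(np.allclose(dZ, dA * (Z > 0)))   # True
```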