Note: These are my study notes from Andrew Ng's Deep Learning course; I will update them as my learning progresses.
Shallow Neural Networks
What's a Neural Network?
1. Representing the Computation of a Multi-Layer Neural Network
A superscript in square brackets, [l], denotes the layer a parameter belongs to; a superscript in parentheses, (i), indexes the training example.
# Computation of a two-layer network, looping over the m training examples
for i = 1 to m:
$$z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$$
$$a^{[1](i)} = \sigma(z^{[1](i)})$$
$$z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$$
$$a^{[2](i)} = \sigma(z^{[2](i)})$$
Note that the input to layer 2 is the activation $a^{[1](i)}$, not $x^{(i)}$.
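To make the loop concrete, here is a minimal numpy sketch of the unvectorized computation. The layer sizes (2 inputs, 3 hidden units, 1 output) and the `sigmoid` helper are assumptions chosen for illustration, not values fixed by the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed toy sizes: n_x = 2 inputs, n_h = 3 hidden units, m = 5 examples
n_x, n_h, m = 2, 3, 5
rng = np.random.default_rng(0)

W1 = rng.standard_normal((n_h, n_x)) * 0.01   # W^[1]: (n_h, n_x)
b1 = np.zeros((n_h, 1))                       # b^[1]: (n_h, 1)
W2 = rng.standard_normal((1, n_h)) * 0.01     # W^[2]: (1, n_h)
b2 = np.zeros((1, 1))                         # b^[2]: (1, 1)

X = rng.standard_normal((n_x, m))             # m training examples as columns

# Unvectorized: process one example x^(i) at a time
for i in range(m):
    x_i = X[:, i:i+1]          # slice keeps the column shape (n_x, 1)
    z1 = W1 @ x_i + b1         # z^[1](i)
    a1 = sigmoid(z1)           # a^[1](i)
    z2 = W2 @ a1 + b2          # z^[2](i): input is a^[1](i), not x^(i)
    a2 = sigmoid(z2)           # a^[2](i), the prediction for example i
```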
$$X = \left[ \begin{matrix} | & | & & | \\ x^{(1)} & x^{(2)} & \cdots & x^{(m)} \\ | & | & & | \end{matrix} \right] \tag{1}$$
$$A^{[1]} = \left[ \begin{matrix} | & | & & | \\ a^{[1](1)} & a^{[1](2)} & \cdots & a^{[1](m)} \\ | & | & & | \end{matrix} \right] \tag{2}$$
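A small sketch of what equations (1) and (2) mean in numpy terms: each training example is a column, and stacking the columns side by side gives the matrix (the toy values here are assumptions):

```python
import numpy as np

# Three toy examples, each a column vector x^(i) with two features
x1 = np.array([[1.0], [2.0]])
x2 = np.array([[3.0], [4.0]])
x3 = np.array([[5.0], [6.0]])

# Stacking columns side by side gives X with shape (n_x, m) = (2, 3);
# column i of X is exactly x^(i), matching equation (1)
X = np.hstack([x1, x2, x3])
print(X.shape)   # (2, 3)
```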
2. Activation Functions
(1). Sigmoid Activation Function
$$g(z) = \frac{1}{1+e^{-z}}$$
$$g'(z) = g(z) \times [1-g(z)]$$
(2). Tanh Activation Function
$$g(z) = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$$
$$g'(z) = 1-g(z)^{2}$$
(3). ReLU Activation Function
$$g(z) = \max(0, z)$$
$$g'(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \geq 0 \end{cases} \tag{1. ReLU and its derivative}$$
$$g(z) = \max(0.01 \cdot z,\ z)$$
$$g'(z) = \begin{cases} 0.01 & \text{if } z < 0 \\ 1 & \text{if } z \geq 0 \end{cases} \tag{2. Leaky ReLU and its derivative}$$
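These formulas translate directly into numpy. The following is a sketch; the 0.01 Leaky ReLU slope comes from the formula above, exposed as an `alpha` parameter of my own naming:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)             # g'(z) = g(z)[1 - g(z)]

def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2     # g'(z) = 1 - g(z)^2

def relu(z):
    return np.maximum(0.0, z)

def relu_prime(z):
    return (z >= 0).astype(float)    # 0 for z < 0, 1 for z >= 0

def leaky_relu(z, alpha=0.01):
    return np.maximum(alpha * z, z)

def leaky_relu_prime(z, alpha=0.01):
    return np.where(z < 0, alpha, 1.0)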
3. Gradient Descent for a Neural Network
- Forward propagation
$$Z^{[1]} = W^{[1]} X + b^{[1]}$$
$$A^{[1]} = g^{[1]}(Z^{[1]})$$
$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$$
$$A^{[2]} = g^{[2]}(Z^{[2]}) = \sigma(Z^{[2]}) \tag{A, Z, X are all vectorized}$$
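Vectorized over all m examples at once, the forward pass is a few lines of numpy. The toy shapes, the `sigmoid` helper, and the choice of tanh for g^[1] are assumptions for this sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed toy shapes: 2 features, 3 hidden units, 5 examples
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 2)) * 0.01, np.zeros((3, 1))
W2, b2 = rng.standard_normal((1, 3)) * 0.01, np.zeros((1, 1))
X = rng.standard_normal((2, 5))

# Vectorized forward propagation over all m examples at once
Z1 = W1 @ X + b1      # (3, 5); b1 broadcasts across the m columns
A1 = np.tanh(Z1)      # g^[1] taken to be tanh (an assumed choice)
Z2 = W2 @ A1 + b2     # (1, 5)
A2 = sigmoid(Z2)      # g^[2] = sigma, as in the equations above
```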
- Back propagation
$$dZ^{[2]} = A^{[2]} - Y$$
$$dW^{[2]} = \frac{1}{m}\, dZ^{[2]} A^{[1]T}$$
$$db^{[2]} = \frac{1}{m}\, \text{np.sum}(dZ^{[2]},\ \text{axis}=1,\ \text{keepdims=True})$$
$$dZ^{[1]} = W^{[2]T} dZ^{[2]} \ast g'^{[1]}(Z^{[1]}) \quad (\ast \text{ is the element-wise product})$$
$$dW^{[1]} = \frac{1}{m}\, dZ^{[1]} X^{T}$$
$$db^{[1]} = \frac{1}{m}\, \text{np.sum}(dZ^{[1]},\ \text{axis}=1,\ \text{keepdims=True})$$
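Putting both passes together, here is a hedged sketch of one complete gradient descent step. The toy data, the tanh hidden layer (so g'^[1](Z^[1]) = 1 - A^[1]^2), and the learning rate are all assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_x, n_h, m = 2, 3, 5                        # assumed toy sizes
W1, b1 = rng.standard_normal((n_h, n_x)) * 0.01, np.zeros((n_h, 1))
W2, b2 = rng.standard_normal((1, n_h)) * 0.01, np.zeros((1, 1))
X = rng.standard_normal((n_x, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)   # toy binary labels
alpha = 0.1                                  # assumed learning rate

# Forward propagation (g^[1] = tanh, g^[2] = sigmoid)
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)

# Back propagation, mirroring the six equations above
dZ2 = A2 - Y
dW2 = (1.0 / m) * dZ2 @ A1.T
db2 = (1.0 / m) * np.sum(dZ2, axis=1, keepdims=True)
dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)         # element-wise product with g'^[1](Z1)
dW1 = (1.0 / m) * dZ1 @ X.T
db1 = (1.0 / m) * np.sum(dZ1, axis=1, keepdims=True)

# Gradient descent parameter update
W1 -= alpha * dW1; b1 -= alpha * db1
W2 -= alpha * dW2; b2 -= alpha * db2
```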
4. Random Initialization of Parameters
import numpy as np

# Shapes assume 2 inputs, 2 hidden units, and 1 output unit
W1 = np.random.randn(2, 2) * 0.01  # 0.01 is a scaling factor (not a learning rate):
                                   # small initial weights keep z where the sigmoid/tanh
                                   # gradients are large enough for gradient descent to work
b1 = np.zeros((2, 1))              # biases may start at zero; the random W breaks symmetry
W2 = np.random.randn(1, 2) * 0.01  # second-layer weights (the original repeated W1 here)
b2 = 0                             # a plain scalar works: numpy broadcasting expands it
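Why random values rather than zeros: if W^[1] starts at all zeros, every hidden unit computes the same output and receives the same gradient update, so the units never differentiate. A tiny sketch of that symmetry (toy input assumed):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Zero-initialized first layer: both hidden units are identical
W1, b1 = np.zeros((2, 2)), np.zeros((2, 1))
x = np.array([[0.5], [-1.2]])
a1 = sigmoid(W1 @ x + b1)
print(a1.ravel())   # both entries equal: the units compute the same function
```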