Deep Neural Networks
Forward
Input: the total number of layers $L$; the weight matrices $W^l$ of every hidden and output layer (indexed from 2); the bias vectors $b^l$; the input vector $x$.
Output: the output-layer activation $a^L$.
- Initialize $a^1 = x$.
- For $l = 2$ to $L$: $a^l = \sigma(z^l) = \sigma(W^l a^{l-1} + b^l)$.
- The final activation $a^L$ is the output.
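The forward pass above can be sketched in a few lines of NumPy. The sigmoid activation and the layer sizes in the example are illustrative assumptions, not part of the algorithm itself:

```python
import numpy as np

def sigma(z):
    # Sigmoid activation -- an illustrative choice for the generic sigma
    return 1.0 / (1.0 + np.exp(-z))

def forward(W, b, x):
    """Forward pass: W[k], b[k] hold the parameters of layers 2..L, in order."""
    a = x  # a^1 = x
    for Wl, bl in zip(W, b):
        a = sigma(Wl @ a + bl)  # a^l = sigma(W^l a^{l-1} + b^l)
    return a  # a^L

# Tiny example: L = 3 (input size 3, one hidden layer of 4, output size 2)
rng = np.random.default_rng(0)
W = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
b = [rng.standard_normal((4, 1)), rng.standard_normal((2, 1))]
x = rng.standard_normal((3, 1))
aL = forward(W, b, x)
print(aL.shape)  # prints (2, 1)
```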
Back Propagation
$$J(W,b,x,y) = \frac{1}{2}\|a^L - y\|_2^2$$
$$\delta^L = \frac{\partial J(W,b,x,y)}{\partial z^L} = (a^L - y)\odot \sigma'(z^L)$$
$$\delta^{l} = \frac{\partial J(W,b,x,y)}{\partial z^l} = \left(\frac{\partial z^{l+1}}{\partial z^{l}}\right)^T \frac{\partial J(W,b,x,y)}{\partial z^{l+1}} = \left(\frac{\partial z^{l+1}}{\partial z^{l}}\right)^T \delta^{l+1}$$
$$z^{l+1} = W^{l+1}a^{l} + b^{l+1} = W^{l+1}\sigma(z^l) + b^{l+1}$$
$$\delta^{l} = \left(\frac{\partial z^{l+1}}{\partial z^{l}}\right)^T \frac{\partial J(W,b,x,y)}{\partial z^{l+1}} = (W^{l+1})^T \delta^{l+1} \odot \sigma'(z^l)$$
$$\frac{\partial J(W,b,x,y)}{\partial W^l} = \delta^{l}(a^{l-1})^T$$
$$\frac{\partial J(W,b,x,y)}{\partial b^l} = \delta^{l}$$

The symbol $\odot$ denotes the Hadamard product (element-wise multiplication).
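The error recursion and the two gradient formulas above translate directly into code. Here is a minimal single-sample sketch, assuming the sigmoid activation and the squared-error loss defined above; all function and variable names are illustrative:

```python
import numpy as np

def sigma(z):
    # Sigmoid activation (assumed choice of sigma)
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    # Derivative of the sigmoid: sigma'(z) = sigma(z)(1 - sigma(z))
    s = sigma(z)
    return s * (1.0 - s)

def backward(W, b, x, y):
    """One forward/backward pass for a single sample (x, y).
    Returns the gradient lists dJ/dW^l and dJ/db^l for layers 2..L."""
    # Forward pass, caching z^l and a^l
    a, zs, acts = x, [], [x]
    for Wl, bl in zip(W, b):
        z = Wl @ a + bl
        zs.append(z)
        a = sigma(z)
        acts.append(a)
    # Output layer: delta^L = (a^L - y) ⊙ sigma'(z^L)
    delta = (acts[-1] - y) * sigma_prime(zs[-1])
    grads_W, grads_b = [], []
    for l in range(len(W) - 1, -1, -1):
        grads_W.insert(0, delta @ acts[l].T)  # dJ/dW^l = delta^l (a^{l-1})^T
        grads_b.insert(0, delta)              # dJ/db^l = delta^l
        if l > 0:
            # delta^l = (W^{l+1})^T delta^{l+1} ⊙ sigma'(z^l)
            delta = (W[l].T @ delta) * sigma_prime(zs[l - 1])
    return grads_W, grads_b
```

A finite-difference check on any single weight is a quick way to convince yourself the recursion is wired up correctly.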
Input: the total number of layers $L$; the number of neurons in each hidden and output layer; the activation function $\sigma$; the loss function; the learning rate $\alpha$; the maximum number of iterations MAX; the stopping threshold $\epsilon$; and $m$ training samples
$$\{(x_1,y_1), (x_2,y_2), \ldots, (x_m,y_m)\}$$
Output: the weight matrices $W$ and bias vectors $b$ of each hidden and output layer.
- Initialize the weight matrix $W$ and bias vector $b$ of each hidden and output layer with random values.
- For $iter = 1$ to MAX:
  - For $i = 1$ to $m$:
    - Set the DNN input $a^{i,1} = x_i$.
    - For $l = 2$ to $L$, compute $a^{i,l} = \sigma(z^{i,l}) = \sigma(W^l a^{i,l-1} + b^l)$.
    - Compute the output-layer error $\delta^{i,L}$ from the loss function.
    - For $l = L-1$ down to $2$, back-propagate $\delta^{i,l} = (W^{l+1})^T \delta^{i,l+1} \odot \sigma'(z^{i,l})$.
  - For $l = 2$ to $L$, update the layer-$l$ parameters $W^l, b^l$: $W^l = W^l - \alpha \sum\limits_{i=1}^m \delta^{i,l}(a^{i,l-1})^T$ and $b^l = b^l - \alpha \sum\limits_{i=1}^m \delta^{i,l}$.
  - If the changes in all $W, b$ are smaller than the stopping threshold $\epsilon$, exit the iteration loop.
- Output the weight matrices $W$ and bias vectors $b$ of each hidden and output layer.
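Putting the whole procedure together, the training loop can be sketched as batch gradient descent in NumPy. This is a minimal sketch assuming sigmoid activations, the squared-error loss, and column-vector samples; the initialization scale and all names are illustrative:

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    s = sigma(z)
    return s * (1.0 - s)

def train(samples, sizes, alpha=0.5, max_iter=1000, eps=1e-6, seed=0):
    """sizes = [n_1, ..., n_L] neurons per layer;
    samples = [(x_i, y_i), ...] as column vectors."""
    rng = np.random.default_rng(seed)
    # Random initialization of W and b (scale 0.5 is an arbitrary choice)
    W = [rng.standard_normal((m, n)) * 0.5 for n, m in zip(sizes, sizes[1:])]
    b = [np.zeros((m, 1)) for m in sizes[1:]]
    for _ in range(max_iter):
        grads_W = [np.zeros_like(Wl) for Wl in W]
        grads_b = [np.zeros_like(bl) for bl in b]
        for x, y in samples:
            # Forward pass, caching z^{i,l} and a^{i,l}
            a, zs, acts = x, [], [x]
            for Wl, bl in zip(W, b):
                z = Wl @ a + bl
                zs.append(z)
                a = sigma(z)
                acts.append(a)
            # Backward pass, accumulating the per-sample gradients
            delta = (acts[-1] - y) * sigma_prime(zs[-1])
            for l in range(len(W) - 1, -1, -1):
                grads_W[l] += delta @ acts[l].T
                grads_b[l] += delta
                if l > 0:
                    delta = (W[l].T @ delta) * sigma_prime(zs[l - 1])
        # Gradient-descent update; stop once every change falls below eps
        max_change = 0.0
        for l in range(len(W)):
            W[l] -= alpha * grads_W[l]
            b[l] -= alpha * grads_b[l]
            max_change = max(max_change,
                             np.abs(alpha * grads_W[l]).max(),
                             np.abs(alpha * grads_b[l]).max())
        if max_change < eps:
            break
    return W, b
```

Note that the update sums gradients over all $m$ samples before each step, exactly as in the algorithm above; for large datasets one would normally switch to mini-batches.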