MLP Backpropagation
This note derives the gradient of the final loss $E$ with respect to the weights $W_{ij}$ of the second-to-last layer.
For the output-layer weights $W_{jk}$:

$$\frac{\partial E}{\partial W_{jk}}=(O_{k}-t_{k})O_{k}(1-O_{k})O^{J}_{j}$$
Let

$$\delta^{K}_{k}=(O_{k}-t_{k})O_{k}(1-O_{k})$$

There is one $\delta^{K}_{k}$ per output node $k$, so $|K|$ values in total.
Then

$$\frac{\partial E}{\partial W_{jk}}=\delta^{K}_{k}O^{J}_{j}$$
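The output-layer result can be sketched in NumPy as an outer product between the hidden activations and the error terms. This is a minimal sketch under assumptions that go beyond the text at this point: sigmoid output units, the loss $E=\frac{1}{2}\sum_{k}(O_k-t_k)^2$, no bias terms, and a weight matrix stored with shape `(J, K)`. The function name `output_layer_grad` and the random inputs are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def output_layer_grad(O_j, W_jk, t):
    """Return (delta_k, dE/dW_jk) for sigmoid outputs and squared-error loss."""
    x_k = O_j @ W_jk                          # pre-activations of the output nodes, shape (K,)
    O_k = sigmoid(x_k)                        # output activations
    delta_k = (O_k - t) * O_k * (1.0 - O_k)   # one error term per output node
    grad_W_jk = np.outer(O_j, delta_k)        # entry [j, k] equals O_j * delta_k
    return delta_k, grad_W_jk

# Illustrative values only: 3 hidden nodes, 2 output nodes.
rng = np.random.default_rng(0)
O_j = rng.random(3)
W_jk = rng.normal(size=(3, 2))
t = np.array([0.0, 1.0])

delta_k, grad_W_jk = output_layer_grad(O_j, W_jk, t)
```

Here `delta_k` has one entry per output node and `grad_W_jk` has the same shape as `W_jk`, which is what a gradient-based weight update would need.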
For $W_{ij}$, expand $E$:
$$\frac{\partial E}{\partial W_{ij}}=\frac{\partial }{\partial W_{ij}}\frac{1}{2}\sum_{k\in K}(O_{k}-t_{k})^{2}$$

$$\frac{\partial E}{\partial W_{ij}}=\sum_{k\in K}(O_{k}-t_{k})\frac{\partial }{\partial W_{ij}}O_{k}$$

$$\frac{\partial E}{\partial W_{ij}}=\sum_{k\in K}(O_{k}-t_{k})\frac{\partial }{\partial W_{ij}}\sigma(x_{k})$$

$$\frac{\partial E}{\partial W_{ij}}=\sum_{k\in K}(O_{k}-t_{k})\sigma(x_{k})(1-\sigma(x_{k}))\frac{\partial x_{k}}{\partial W_{ij}}$$
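The last step relies on the sigmoid identity $\sigma'(x)=\sigma(x)(1-\sigma(x))$. As a quick sanity check, the sketch below compares that closed form against a central finite difference; the helper name `sigmoid_prime`, the sample points, and the step size are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    """Closed-form derivative: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-4.0, 4.0, 9)   # a few sample points
eps = 1e-6                      # finite-difference step (an assumption)
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)

# Largest disagreement between the numeric and closed-form derivatives.
max_abs_diff = np.max(np.abs(numeric - sigmoid_prime(x)))
```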
Since $\sigma(x_{k})=O_{k}$, and $x_{k}=\sum_{j}W_{jk}O_{j}$ depends on $W_{ij}$ only through $O_{j}$, the chain rule gives

$$\frac{\partial E}{\partial W_{ij}}=\sum_{k\in K}(O_{k}-t_{k})O_{k}(1-O_{k})\frac{\partial x_{k}}{\partial O_{j}}\cdot\frac{\partial O_{j}}{\partial W_{ij}}$$

With $\frac{\partial x_{k}}{\partial O_{j}}=W_{jk}$:

$$\frac{\partial E}{\partial W_{ij}}=\sum_{k\in K}(O_{k}-t_{k})O_{k}(1-O_{k})W_{jk}\frac{\partial O_{j}}{\partial W_{ij}}$$

The factor $\frac{\partial O_{j}}{\partial W_{ij}}$ does not depend on $k$, so it can be moved outside the sum:

$$\frac{\partial E}{\partial W_{ij}}=\frac{\partial O_{j}}{\partial W_{ij}}\sum_{k\in K}(O_{k}-t_{k})O_{k}(1-O_{k})W_{jk}$$

Applying the sigmoid derivative to $O_{j}=\sigma(x_{j})$:

$$\frac{\partial E}{\partial W_{ij}}=O_{j}(1-O_{j})\frac{\partial x_{j}}{\partial W_{ij}}\sum_{k\in K}(O_{k}-t_{k})O_{k}(1-O_{k})W_{jk}$$

With $\frac{\partial x_{j}}{\partial W_{ij}}=O_{i}$:

$$\frac{\partial E}{\partial W_{ij}}=O_{j}(1-O_{j})O_{i}\sum_{k\in K}(O_{k}-t_{k})O_{k}(1-O_{k})W_{jk}$$
Writing the output-layer error term as $\delta_{k}=(O_{k}-t_{k})O_{k}(1-O_{k})$, this becomes

$$\frac{\partial E}{\partial W_{ij}}=O_{i}O_{j}(1-O_{j})\sum_{k\in K}\delta_{k}W_{jk}$$
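The hidden-layer gradient can be sketched the same way, using the output-layer term $\delta_k=(O_k-t_k)O_k(1-O_k)$ inside the sum over $k$. This is a sketch under assumptions: sigmoid activations in both layers, no bias terms, and weight matrices stored with shapes `(I, J)` and `(J, K)`. The function name `hidden_layer_grad` and the random inputs are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_layer_grad(O_i, W_ij, W_jk, t):
    """Return dE/dW_ij for a two-layer sigmoid network with squared-error loss."""
    O_j = sigmoid(O_i @ W_ij)                        # hidden activations, shape (J,)
    O_k = sigmoid(O_j @ W_jk)                        # output activations, shape (K,)
    delta_k = (O_k - t) * O_k * (1.0 - O_k)          # output-layer error terms
    # For each hidden node j: O_j (1 - O_j) * sum_k delta_k W_jk
    delta_j = O_j * (1.0 - O_j) * (W_jk @ delta_k)
    return np.outer(O_i, delta_j)                    # entry [i, j] equals O_i * delta_j

# Illustrative values only: 4 inputs, 3 hidden nodes, 2 output nodes.
rng = np.random.default_rng(1)
O_i = rng.random(4)
W_ij = rng.normal(size=(4, 3))
W_jk = rng.normal(size=(3, 2))
t = np.array([1.0, 0.0])

grad_W_ij = hidden_layer_grad(O_i, W_ij, W_jk, t)
```

The matrix-vector product `W_jk @ delta_k` evaluates the sum over $k$ for every hidden node at once, so no explicit loop is needed.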
Summary:
For an output layer node $k\in K$:

$$\frac{\partial E}{\partial W_{jk}}=O_{j}\delta_{k}$$
Where
$$\delta_{k}=(O_{k}-t_{k})O_{k}(1-O_{k})$$
For a hidden layer node $j\in J$:

$$\frac{\partial E}{\partial W_{ij}}=O_{i}\delta_{j}$$
Where
$$\delta_{j}=O_{j}(1-O_{j})\sum_{k\in K}\delta_{k}W_{jk}$$
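One way to check the two summary formulas together is a numerical gradient check. The sketch below assumes a two-layer sigmoid network without bias terms, the loss $E=\frac{1}{2}\sum_{k}(O_k-t_k)^2$, and the output-layer error term $\delta_k$ inside the hidden-layer sum. The names `forward`, `loss`, `backprop`, and `numeric_grad`, as well as the layer sizes and step size, are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(O_i, W_ij, W_jk):
    O_j = sigmoid(O_i @ W_ij)   # hidden activations
    O_k = sigmoid(O_j @ W_jk)   # output activations
    return O_j, O_k

def loss(O_i, W_ij, W_jk, t):
    _, O_k = forward(O_i, W_ij, W_jk)
    return 0.5 * np.sum((O_k - t) ** 2)

def backprop(O_i, W_ij, W_jk, t):
    """Return (dE/dW_ij, dE/dW_jk) using the summary formulas."""
    O_j, O_k = forward(O_i, W_ij, W_jk)
    delta_k = (O_k - t) * O_k * (1.0 - O_k)            # output layer
    delta_j = O_j * (1.0 - O_j) * (W_jk @ delta_k)     # hidden layer
    return np.outer(O_i, delta_j), np.outer(O_j, delta_k)

def numeric_grad(f, W, eps=1e-6):
    """Central-difference gradient of f() with respect to W, perturbed in place."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + eps
        up = f()
        W[idx] = old - eps
        down = f()
        W[idx] = old                                   # restore the original weight
        grad[idx] = (up - down) / (2 * eps)
    return grad

# Illustrative values only: 4 inputs, 3 hidden nodes, 2 output nodes.
rng = np.random.default_rng(0)
O_i = rng.random(4)
W_ij = rng.normal(size=(4, 3))
W_jk = rng.normal(size=(3, 2))
t = np.array([0.0, 1.0])

grad_ij, grad_jk = backprop(O_i, W_ij, W_jk, t)
f = lambda: loss(O_i, W_ij, W_jk, t)
err_ij = np.max(np.abs(grad_ij - numeric_grad(f, W_ij)))
err_jk = np.max(np.abs(grad_jk - numeric_grad(f, W_jk)))
```

If the formulas are implemented consistently, `err_ij` and `err_jk` should both be very small compared with the gradient entries themselves.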