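Derivation of the Backpropagation Algorithm
Neural network structure
The network used throughout has three layers: layer 1 with 3 neurons (the inputs), layer 2 with 4 neurons, and layer 3 with 2 neurons (the outputs).
1. Total input z and output a of a neuron
To make the partial derivatives easier to compute, we introduce a total input $z$ and an output $a$ for every neuron. A superscript is a layer index, not a power: $w^l_{jk}$ is the weight from neuron $k$ of layer $l-1$ to neuron $j$ of layer $l$, and $b^l_j$ is the bias of neuron $j$ of layer $l$.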
1.1. Layer-2 input $z^2$
$$
\begin{aligned}
z^2_1 &= w^2_{11}a^1_1 + w^2_{12}a^1_2 + w^2_{13}a^1_3 + b^2_1 \\
z^2_2 &= w^2_{21}a^1_1 + w^2_{22}a^1_2 + w^2_{23}a^1_3 + b^2_2 \\
z^2_3 &= w^2_{31}a^1_1 + w^2_{32}a^1_2 + w^2_{33}a^1_3 + b^2_3 \\
z^2_4 &= w^2_{41}a^1_1 + w^2_{42}a^1_2 + w^2_{43}a^1_3 + b^2_4
\end{aligned}
$$
In matrix form:
$$
\begin{bmatrix} z^2_1 \\ z^2_2 \\ z^2_3 \\ z^2_4 \end{bmatrix}
=
\begin{bmatrix}
w^2_{11} & w^2_{12} & w^2_{13} \\
w^2_{21} & w^2_{22} & w^2_{23} \\
w^2_{31} & w^2_{32} & w^2_{33} \\
w^2_{41} & w^2_{42} & w^2_{43}
\end{bmatrix}
\begin{bmatrix} a^1_1 \\ a^1_2 \\ a^1_3 \end{bmatrix}
+
\begin{bmatrix} b^2_1 \\ b^2_2 \\ b^2_3 \\ b^2_4 \end{bmatrix}
$$
$$
z^2 = w^2 a^1 + b^2
$$
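The matrix form of the layer input can be checked with a few lines of plain Python. This is a minimal sketch: the helper name `affine` and all numeric values are made up for illustration, and nested lists stand in for matrices so that no external library is needed.

```python
def affine(w, a, b):
    """Return z = w a + b, where w is a list of rows, a the input vector, b the bias vector."""
    return [sum(w_jk * a_k for w_jk, a_k in zip(row, a)) + b_j
            for row, b_j in zip(w, b)]

# Hypothetical parameters: 4 neurons in layer 2, 3 activations coming from layer 1.
w2 = [[0.5, 0.25, 0.0],
      [0.0, 1.0, 0.0],
      [-1.0, 0.5, 0.25],
      [2.0, 0.0, -1.0]]
b2 = [0.5, -0.5, 0.0, 1.0]
a1 = [1.0, 2.0, 4.0]

z2 = affine(w2, a1, b2)
print(z2)  # one entry per layer-2 neuron
```

Each entry of `z2` is one row of the component-wise equations: a dot product of a weight row with `a1`, plus the matching bias.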
1.2. Layer-2 output $a^2$
$$
\begin{aligned}
a^2_1 &= \sigma(z^2_1) \\
a^2_2 &= \sigma(z^2_2) \\
a^2_3 &= \sigma(z^2_3) \\
a^2_4 &= \sigma(z^2_4)
\end{aligned}
$$
In matrix form, with $\sigma$ applied to each component:
$$
\begin{bmatrix} a^2_1 \\ a^2_2 \\ a^2_3 \\ a^2_4 \end{bmatrix}
= \sigma \left( \begin{bmatrix} z^2_1 \\ z^2_2 \\ z^2_3 \\ z^2_4 \end{bmatrix} \right)
$$
$$
a^2 = \sigma(z^2)
$$
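The text does not say which activation function $\sigma$ is. The sketch below assumes the logistic sigmoid, a common choice, and applies it to each component of a made-up input vector; `sigmoid` and `activate` are illustrative names.

```python
import math

def sigmoid(z):
    """Logistic function: an assumed choice for sigma, which the text leaves unspecified."""
    return 1.0 / (1.0 + math.exp(-z))

def activate(z_vec):
    """Apply sigma to every component of a layer's input vector."""
    return [sigmoid(z) for z in z_vec]

z2 = [0.0, 1.5, -1.5, 4.0]  # hypothetical layer-2 inputs
a2 = activate(z2)
print(a2)
```

Because $\sigma$ acts on one component at a time, $a^2$ always has the same length as $z^2$.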
1.3. Layer-3 input $z^3$
$$
\begin{aligned}
z^3_1 &= w^3_{11}a^2_1 + w^3_{12}a^2_2 + w^3_{13}a^2_3 + w^3_{14}a^2_4 + b^3_1 \\
z^3_2 &= w^3_{21}a^2_1 + w^3_{22}a^2_2 + w^3_{23}a^2_3 + w^3_{24}a^2_4 + b^3_2
\end{aligned}
$$
In matrix form:
$$
\begin{bmatrix} z^3_1 \\ z^3_2 \end{bmatrix}
=
\begin{bmatrix}
w^3_{11} & w^3_{12} & w^3_{13} & w^3_{14} \\
w^3_{21} & w^3_{22} & w^3_{23} & w^3_{24}
\end{bmatrix}
\begin{bmatrix} a^2_1 \\ a^2_2 \\ a^2_3 \\ a^2_4 \end{bmatrix}
+
\begin{bmatrix} b^3_1 \\ b^3_2 \end{bmatrix}
$$
$$
z^3 = w^3 a^2 + b^3
$$
1.4. Layer-3 output $a^3$
$$
\begin{aligned}
a^3_1 &= \sigma(z^3_1) \\
a^3_2 &= \sigma(z^3_2)
\end{aligned}
$$
In matrix form:
$$
\begin{bmatrix} a^3_1 \\ a^3_2 \end{bmatrix}
= \sigma \left( \begin{bmatrix} z^3_1 \\ z^3_2 \end{bmatrix} \right)
$$
$$
a^3 = \sigma(z^3)
$$
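Sections 1.1 to 1.4 together describe one forward pass through the network. A minimal sketch, assuming a logistic sigmoid for $\sigma$ and using made-up parameters (the helper `layer` is an illustrative name, not something from the text):

```python
import math

def sigmoid(z):
    """Assumed activation; the text only calls it sigma."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(w, b, a_prev):
    """Return (z, a) for one layer: z = w a_prev + b and a = sigma(z)."""
    z = [sum(w_jk * a_k for w_jk, a_k in zip(row, a_prev)) + b_j
         for row, b_j in zip(w, b)]
    return z, [sigmoid(z_j) for z_j in z]

# Hypothetical parameters for the 3-4-2 network.
w2 = [[0.5, 0.25, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.5, 0.25], [2.0, 0.0, -1.0]]
b2 = [0.5, -0.5, 0.0, 1.0]
w3 = [[1.0, -1.0, 0.5, 0.0], [0.0, 0.5, -0.5, 1.0]]
b3 = [0.0, 0.25]

a1 = [1.0, 2.0, 4.0]        # network input
z2, a2 = layer(w2, b2, a1)  # sections 1.1 and 1.2
z3, a3 = layer(w3, b3, a2)  # sections 1.3 and 1.4
print(a3)
```

Keeping both `z` and `a` for every layer matters later: the backward pass needs $\sigma'(z^l)$ and $a^{l-1}$.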
2. Cost function
Assume we use the quadratic cost function. The cost for a single sample with target output $y = (y_1, y_2)$ is then:
$$
C = \frac{1}{2} \left( (a^3_1 - y_1)^2 + (a^3_2 - y_2)^2 \right)
$$
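As a quick numeric illustration of the quadratic cost, here is a small sketch with made-up outputs and targets (`quadratic_cost` is an illustrative name):

```python
def quadratic_cost(a_out, y):
    """Return C = 1/2 * sum_j (a_j - y_j)^2 for a single sample."""
    return 0.5 * sum((a_j - y_j) ** 2 for a_j, y_j in zip(a_out, y))

a3 = [0.75, 0.25]  # hypothetical network outputs
y = [1.0, 0.0]     # hypothetical targets
C = quadratic_cost(a3, y)
print(C)  # 0.5 * (0.0625 + 0.0625) = 0.0625
```

The factor $\frac{1}{2}$ cancels the 2 that appears when the square is differentiated, which is why $\frac{\partial C}{\partial a^3_j}$ is simply $a^3_j - y_j$.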
3. Partial derivative of the cost with respect to the layer-3 input, $\frac{\partial C}{\partial z^3}$
$$
\begin{aligned}
\frac{\partial C}{\partial z^3_1} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} = (a^3_1 - y_1)\, \sigma'(z^3_1) \\
\frac{\partial C}{\partial z^3_2} &= \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} = (a^3_2 - y_2)\, \sigma'(z^3_2)
\end{aligned}
$$
In matrix form, where $\odot$ denotes the element-wise (Hadamard) product of two vectors and $\sigma'$ is applied to each component:
$$
\begin{bmatrix} \frac{\partial C}{\partial z^3_1} \\ \frac{\partial C}{\partial z^3_2} \end{bmatrix}
= \begin{bmatrix} (a^3_1 - y_1)\, \sigma'(z^3_1) \\ (a^3_2 - y_2)\, \sigma'(z^3_2) \end{bmatrix}
= \left( \begin{bmatrix} a^3_1 \\ a^3_2 \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \right)
\odot \sigma' \left( \begin{bmatrix} z^3_1 \\ z^3_2 \end{bmatrix} \right)
$$
$$
\frac{\partial C}{\partial z^3} = (a^3 - y) \odot \sigma'(z^3)
$$
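One way to gain confidence in this formula is to compare it with a finite-difference estimate of the same derivative. The sketch below assumes a logistic sigmoid, for which $\sigma'(z) = \sigma(z)(1 - \sigma(z))$; the values of `z3` and `y` are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative of the logistic sigmoid (the assumed sigma)."""
    s = sigmoid(z)
    return s * (1.0 - s)

def cost_from_z3(z3, y):
    """Quadratic cost written as a function of the layer-3 inputs."""
    return 0.5 * sum((sigmoid(z_j) - y_j) ** 2 for z_j, y_j in zip(z3, y))

def output_delta(z3, y):
    """dC/dz^3 = (a^3 - y) * sigma'(z^3), component by component."""
    return [(sigmoid(z_j) - y_j) * sigmoid_prime(z_j) for z_j, y_j in zip(z3, y)]

z3 = [0.3, -1.2]  # hypothetical layer-3 inputs
y = [1.0, 0.0]    # hypothetical targets
delta3 = output_delta(z3, y)

# Central-difference estimate of each partial derivative.
eps = 1e-6
numeric = []
for j in range(len(z3)):
    plus = list(z3)
    plus[j] += eps
    minus = list(z3)
    minus[j] -= eps
    numeric.append((cost_from_z3(plus, y) - cost_from_z3(minus, y)) / (2 * eps))

print(delta3)
print(numeric)
```

The two printed vectors should agree to many decimal places; a large gap would point to a mistake in the analytic formula or in the code.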
3.1. Partial derivative of the cost C with respect to the layer-3 biases, $\frac{\partial C}{\partial b^3}$
$$
\begin{aligned}
\frac{\partial C}{\partial b^3_1} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial b^3_1} = (a^3_1 - y_1)\, \sigma'(z^3_1) \cdot 1 \\
\frac{\partial C}{\partial b^3_2} &= \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial b^3_2} = (a^3_2 - y_2)\, \sigma'(z^3_2) \cdot 1
\end{aligned}
$$
In matrix form, with $\odot$ the element-wise product:
$$
\begin{bmatrix} \frac{\partial C}{\partial b^3_1} \\ \frac{\partial C}{\partial b^3_2} \end{bmatrix}
= \left( \begin{bmatrix} a^3_1 \\ a^3_2 \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \right)
\odot \sigma' \left( \begin{bmatrix} z^3_1 \\ z^3_2 \end{bmatrix} \right)
$$
$$
\frac{\partial C}{\partial b^3} = \frac{\partial C}{\partial z^3} = (a^3 - y) \odot \sigma'(z^3)
$$
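Since $\frac{\partial z^3_j}{\partial b^3_j} = 1$, nudging a bias should change the cost at exactly the rate given by $\frac{\partial C}{\partial z^3_j}$. The following sketch checks that numerically; it assumes a logistic sigmoid and uses made-up parameters and activations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w3, b3, a2, y):
    """Quadratic cost of the output layer for fixed layer-2 activations a2."""
    total = 0.0
    for row, b_j, y_j in zip(w3, b3, y):
        z_j = sum(w_jk * a_k for w_jk, a_k in zip(row, a2)) + b_j
        total += 0.5 * (sigmoid(z_j) - y_j) ** 2
    return total

def bias_gradient(w3, b3, a2, y):
    """dC/db^3, which equals dC/dz^3 because dz^3_j / db^3_j = 1."""
    grad = []
    for row, b_j, y_j in zip(w3, b3, y):
        z_j = sum(w_jk * a_k for w_jk, a_k in zip(row, a2)) + b_j
        a_j = sigmoid(z_j)
        grad.append((a_j - y_j) * a_j * (1.0 - a_j))  # sigma'(z) = a (1 - a)
    return grad

# Hypothetical values.
w3 = [[1.0, -1.0, 0.5, 0.0], [0.0, 0.5, -0.5, 1.0]]
b3 = [0.0, 0.25]
a2 = [0.8, 0.6, 0.7, 0.3]
y = [1.0, 0.0]

grad_b3 = bias_gradient(w3, b3, a2, y)

# Central-difference estimate obtained by perturbing one bias at a time.
eps = 1e-6
numeric = []
for j in range(len(b3)):
    b_plus = list(b3)
    b_plus[j] += eps
    b_minus = list(b3)
    b_minus[j] -= eps
    numeric.append((cost(w3, b_plus, a2, y) - cost(w3, b_minus, a2, y)) / (2 * eps))

print(grad_b3, numeric)
```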
3.2. Partial derivative of the cost C with respect to the layer-3 weights, $\frac{\partial C}{\partial w^3}$
$$
\begin{aligned}
\frac{\partial C}{\partial w^3_{11}} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial w^3_{11}} = (a^3_1 - y_1)\, \sigma'(z^3_1)\, a^2_1 \\
\frac{\partial C}{\partial w^3_{12}} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial w^3_{12}} = (a^3_1 - y_1)\, \sigma'(z^3_1)\, a^2_2 \\
\frac{\partial C}{\partial w^3_{13}} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial w^3_{13}} = (a^3_1 - y_1)\, \sigma'(z^3_1)\, a^2_3 \\
\frac{\partial C}{\partial w^3_{14}} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial w^3_{14}} = (a^3_1 - y_1)\, \sigma'(z^3_1)\, a^2_4 \\[1ex]
\frac{\partial C}{\partial w^3_{21}} &= \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial w^3_{21}} = (a^3_2 - y_2)\, \sigma'(z^3_2)\, a^2_1 \\
\frac{\partial C}{\partial w^3_{22}} &= \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial w^3_{22}} = (a^3_2 - y_2)\, \sigma'(z^3_2)\, a^2_2 \\
\frac{\partial C}{\partial w^3_{23}} &= \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial w^3_{23}} = (a^3_2 - y_2)\, \sigma'(z^3_2)\, a^2_3 \\
\frac{\partial C}{\partial w^3_{24}} &= \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial w^3_{24}} = (a^3_2 - y_2)\, \sigma'(z^3_2)\, a^2_4
\end{aligned}
$$
In matrix form, the gradient is the product of a $2 \times 1$ column vector and a $1 \times 4$ row vector ($\odot$ is the element-wise product):
$$
\begin{aligned}
\begin{bmatrix}
\frac{\partial C}{\partial w^3_{11}} & \frac{\partial C}{\partial w^3_{12}} & \frac{\partial C}{\partial w^3_{13}} & \frac{\partial C}{\partial w^3_{14}} \\
\frac{\partial C}{\partial w^3_{21}} & \frac{\partial C}{\partial w^3_{22}} & \frac{\partial C}{\partial w^3_{23}} & \frac{\partial C}{\partial w^3_{24}}
\end{bmatrix}
&= \begin{bmatrix}
(a^3_1 - y_1)\sigma'(z^3_1)a^2_1 & (a^3_1 - y_1)\sigma'(z^3_1)a^2_2 & (a^3_1 - y_1)\sigma'(z^3_1)a^2_3 & (a^3_1 - y_1)\sigma'(z^3_1)a^2_4 \\
(a^3_2 - y_2)\sigma'(z^3_2)a^2_1 & (a^3_2 - y_2)\sigma'(z^3_2)a^2_2 & (a^3_2 - y_2)\sigma'(z^3_2)a^2_3 & (a^3_2 - y_2)\sigma'(z^3_2)a^2_4
\end{bmatrix} \\
&= \begin{bmatrix} (a^3_1 - y_1)\sigma'(z^3_1) \\ (a^3_2 - y_2)\sigma'(z^3_2) \end{bmatrix}
\begin{bmatrix} a^2_1 & a^2_2 & a^2_3 & a^2_4 \end{bmatrix} \\
&= \left( \left( \begin{bmatrix} a^3_1 \\ a^3_2 \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \right) \odot \sigma' \left( \begin{bmatrix} z^3_1 \\ z^3_2 \end{bmatrix} \right) \right)
\begin{bmatrix} a^2_1 & a^2_2 & a^2_3 & a^2_4 \end{bmatrix} \\
\frac{\partial C}{\partial w^3} &= \frac{\partial C}{\partial z^3} \cdot (a^2)^T
\end{aligned}
$$
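The product of a column vector with a row vector is an outer product: entry $(j, k)$ of the result is the $j$-th entry of the column times the $k$-th entry of the row. A small sketch with made-up numbers (`outer` is an illustrative helper):

```python
def outer(delta, a_prev):
    """Return the matrix delta (a_prev)^T as a list of rows."""
    return [[d_j * a_k for a_k in a_prev] for d_j in delta]

delta3 = [0.5, -0.25]      # hypothetical dC/dz^3, one entry per output neuron
a2 = [1.0, 2.0, 4.0, 8.0]  # hypothetical layer-2 activations

grad_w3 = outer(delta3, a2)
for row in grad_w3:
    print(row)
```

The result has 2 rows and 4 columns, the same shape as $w^3$, so every weight gets its own partial derivative.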
4. Partial derivative of the cost with respect to the layer-2 input, $\frac{\partial C}{\partial z^2}$
Each layer-2 neuron feeds both output neurons, so the chain rule sums over the two paths from $z^2_k$ to $C$:
$$
\begin{aligned}
\frac{\partial C}{\partial z^2_1}
&= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial a^2_1} \frac{\partial a^2_1}{\partial z^2_1}
 + \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial a^2_1} \frac{\partial a^2_1}{\partial z^2_1} \\
&= (a^3_1 - y_1)\sigma'(z^3_1)\, w^3_{11}\, \sigma'(z^2_1) + (a^3_2 - y_2)\sigma'(z^3_2)\, w^3_{21}\, \sigma'(z^2_1) \\
\frac{\partial C}{\partial z^2_2}
&= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial a^2_2} \frac{\partial a^2_2}{\partial z^2_2}
 + \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial a^2_2} \frac{\partial a^2_2}{\partial z^2_2} \\
&= (a^3_1 - y_1)\sigma'(z^3_1)\, w^3_{12}\, \sigma'(z^2_2) + (a^3_2 - y_2)\sigma'(z^3_2)\, w^3_{22}\, \sigma'(z^2_2) \\
\frac{\partial C}{\partial z^2_3}
&= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial a^2_3} \frac{\partial a^2_3}{\partial z^2_3}
 + \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial a^2_3} \frac{\partial a^2_3}{\partial z^2_3} \\
&= (a^3_1 - y_1)\sigma'(z^3_1)\, w^3_{13}\, \sigma'(z^2_3) + (a^3_2 - y_2)\sigma'(z^3_2)\, w^3_{23}\, \sigma'(z^2_3) \\
\frac{\partial C}{\partial z^2_4}
&= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial a^2_4} \frac{\partial a^2_4}{\partial z^2_4}
 + \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial a^2_4} \frac{\partial a^2_4}{\partial z^2_4} \\
&= (a^3_1 - y_1)\sigma'(z^3_1)\, w^3_{14}\, \sigma'(z^2_4) + (a^3_2 - y_2)\sigma'(z^3_2)\, w^3_{24}\, \sigma'(z^2_4)
\end{aligned}
$$
In matrix form, where $\odot$ denotes the element-wise (Hadamard) product:
$$
\begin{aligned}
\begin{bmatrix} \frac{\partial C}{\partial z^2_1} \\ \frac{\partial C}{\partial z^2_2} \\ \frac{\partial C}{\partial z^2_3} \\ \frac{\partial C}{\partial z^2_4} \end{bmatrix}
&= \begin{bmatrix}
(a^3_1 - y_1)\sigma'(z^3_1)\, w^3_{11}\, \sigma'(z^2_1) + (a^3_2 - y_2)\sigma'(z^3_2)\, w^3_{21}\, \sigma'(z^2_1) \\
(a^3_1 - y_1)\sigma'(z^3_1)\, w^3_{12}\, \sigma'(z^2_2) + (a^3_2 - y_2)\sigma'(z^3_2)\, w^3_{22}\, \sigma'(z^2_2) \\
(a^3_1 - y_1)\sigma'(z^3_1)\, w^3_{13}\, \sigma'(z^2_3) + (a^3_2 - y_2)\sigma'(z^3_2)\, w^3_{23}\, \sigma'(z^2_3) \\
(a^3_1 - y_1)\sigma'(z^3_1)\, w^3_{14}\, \sigma'(z^2_4) + (a^3_2 - y_2)\sigma'(z^3_2)\, w^3_{24}\, \sigma'(z^2_4)
\end{bmatrix} \\
&= \left(
\begin{bmatrix} w^3_{11} & w^3_{21} \\ w^3_{12} & w^3_{22} \\ w^3_{13} & w^3_{23} \\ w^3_{14} & w^3_{24} \end{bmatrix}
\begin{bmatrix} (a^3_1 - y_1)\sigma'(z^3_1) \\ (a^3_2 - y_2)\sigma'(z^3_2) \end{bmatrix}
\right) \odot \sigma' \left( \begin{bmatrix} z^2_1 \\ z^2_2 \\ z^2_3 \\ z^2_4 \end{bmatrix} \right) \\
\frac{\partial C}{\partial z^2} &= \left( (w^3)^T \cdot \frac{\partial C}{\partial z^3} \right) \odot \sigma'(z^2)
\end{aligned}
$$
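The backward step from layer 3 to layer 2 can be sketched directly from the matrix form: multiply by the transposed weight matrix, then scale each component by $\sigma'(z^2_k)$. The code assumes a logistic sigmoid and made-up numbers; `backpropagate_delta` is an illustrative name.

```python
import math

def sigmoid_prime(z):
    """Derivative of the logistic sigmoid (the assumed sigma)."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def backpropagate_delta(w_next, delta_next, z):
    """Return ((w_next)^T delta_next) multiplied component by component with sigma'(z)."""
    back = [sum(w_next[j][k] * delta_next[j] for j in range(len(delta_next)))
            for k in range(len(z))]
    return [back_k * sigmoid_prime(z_k) for back_k, z_k in zip(back, z)]

w3 = [[1.0, -1.0, 0.5, 0.0],
      [0.0, 0.5, -0.5, 1.0]]  # hypothetical 2 x 4 weight matrix
delta3 = [0.5, -0.25]         # hypothetical dC/dz^3
z2 = [0.0, 0.0, 0.0, 0.0]     # chosen so that sigma'(z) = 0.25 everywhere

delta2 = backpropagate_delta(w3, delta3, z2)
print(delta2)
```

Column $k$ of $w^3$ holds exactly the weights leaving layer-2 neuron $k$, which is why the transpose appears: the sum over $j$ walks down that column.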
4.1. Partial derivative of the cost with respect to the layer-2 biases
As in layer 3, $\frac{\partial z^2_j}{\partial b^2_j} = 1$, so:
$$
\frac{\partial C}{\partial b^2} = \frac{\partial C}{\partial z^2}
$$
4.2. Partial derivative of the cost with respect to the layer-2 weights
As in layer 3, $\frac{\partial z^2_j}{\partial w^2_{jk}} = a^1_k$, so:
$$
\frac{\partial C}{\partial w^2} = \frac{\partial C}{\partial z^2} \cdot (a^1)^T
$$
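Both layer-2 gradients follow from $\frac{\partial C}{\partial z^2}$ and the layer-1 activations alone. A short sketch with made-up values shows the resulting shapes (`layer_gradients` is an illustrative helper):

```python
def layer_gradients(delta, a_prev):
    """Return (dC/dw, dC/db) for one layer from its dC/dz and the previous activations."""
    grad_w = [[d_j * a_k for a_k in a_prev] for d_j in delta]  # delta (a_prev)^T
    grad_b = list(delta)                                       # dC/db = dC/dz
    return grad_w, grad_b

delta2 = [0.125, -0.5, 0.25, 0.0]  # hypothetical dC/dz^2
a1 = [1.0, 2.0, 4.0]               # hypothetical layer-1 activations (the inputs)

grad_w2, grad_b2 = layer_gradients(delta2, a1)
print(len(grad_w2), len(grad_w2[0]))  # 4 rows, 3 columns, like w^2
```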
5. Simplified formulas
Writing $L$ for the output layer and $\odot$ for the element-wise (Hadamard) product, the results above generalize to:
$$
\begin{aligned}
\text{Let } \delta^l &= \frac{\partial C}{\partial z^l} \\
\delta^L &= \frac{\partial C}{\partial a^L} \odot \sigma'(z^L) \\
\delta^l &= \left( (w^{l+1})^T \cdot \delta^{l+1} \right) \odot \sigma'(z^l) \\
\frac{\partial C}{\partial b^l} &= \delta^l \\
\frac{\partial C}{\partial w^l} &= \delta^l \cdot (a^{l-1})^T
\end{aligned}
$$
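The four formulas are enough to write a complete backward pass. The sketch below is one possible arrangement, not a reference implementation: it assumes a logistic sigmoid (so $\sigma'(z) = a(1 - a)$), the quadratic cost from section 2, and made-up parameters, and it compares every weight gradient with a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, biases, x):
    """Return the activations of every layer, starting with the input x."""
    activations = [x]
    for w, b in zip(weights, biases):
        a_prev = activations[-1]
        z = [sum(w_jk * a_k for w_jk, a_k in zip(row, a_prev)) + b_j
             for row, b_j in zip(w, b)]
        activations.append([sigmoid(z_j) for z_j in z])
    return activations

def cost(weights, biases, x, y):
    a_out = forward(weights, biases, x)[-1]
    return 0.5 * sum((a - t) ** 2 for a, t in zip(a_out, y))

def backprop(weights, biases, x, y):
    """Return (grad_w, grad_b), each a list with one entry per layer."""
    activations = forward(weights, biases, x)
    # delta^L = (a^L - y) * sigma'(z^L), using sigma'(z) = a (1 - a).
    delta = [(a - t) * a * (1.0 - a) for a, t in zip(activations[-1], y)]
    grad_w, grad_b = [], []
    for i in reversed(range(len(weights))):
        a_prev = activations[i]
        grad_b.insert(0, delta)                                       # dC/db^l = delta^l
        grad_w.insert(0, [[d * a for a in a_prev] for d in delta])    # delta^l (a^{l-1})^T
        if i > 0:
            # delta^{l-1} = ((w^l)^T delta^l) * sigma'(z^{l-1})
            delta = [sum(weights[i][j][k] * delta[j] for j in range(len(delta)))
                     * a_prev[k] * (1.0 - a_prev[k])
                     for k in range(len(a_prev))]
    return grad_w, grad_b

def numeric_grad_w(weights, biases, x, y, i, j, k, eps=1e-6):
    """Central-difference estimate for one weight; the weight is restored afterwards."""
    original = weights[i][j][k]
    weights[i][j][k] = original + eps
    c_plus = cost(weights, biases, x, y)
    weights[i][j][k] = original - eps
    c_minus = cost(weights, biases, x, y)
    weights[i][j][k] = original
    return (c_plus - c_minus) / (2 * eps)

# Hypothetical parameters for the 3-4-2 network: weights[0] is w^2, weights[1] is w^3.
weights = [[[0.5, 0.25, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.5, 0.25], [2.0, 0.0, -1.0]],
           [[1.0, -1.0, 0.5, 0.0], [0.0, 0.5, -0.5, 1.0]]]
biases = [[0.5, -0.5, 0.0, 1.0], [0.0, 0.25]]
x, y = [1.0, 2.0, 4.0], [1.0, 0.0]

grad_w, grad_b = backprop(weights, biases, x, y)
max_error = max(abs(grad_w[i][j][k] - numeric_grad_w(weights, biases, x, y, i, j, k))
                for i in range(len(weights))
                for j in range(len(weights[i]))
                for k in range(len(weights[i][j])))
print(max_error)
```

The loop runs from the last layer to the first, so each $\delta^l$ is computed once and reused for both the bias and weight gradients of its layer. Nested lists keep the sketch free of dependencies; an array library would express the same steps as matrix operations.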