Derivation of the Backpropagation Algorithm

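Network structure

[Figure: a three-layer network with 3 neurons in the first (input) layer, 4 neurons in the second layer, and 2 neurons in the third (output) layer.]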

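1. Total input z and output a of each neuron

To make the partial derivatives easier to compute, we introduce two quantities for every neuron: its total (weighted) input $z$ and its output $a$. Superscripts denote the layer and subscripts the neuron within that layer, so $w^l_{jk}$ is the weight from neuron $k$ in layer $l-1$ to neuron $j$ in layer $l$.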

1.1. Input to the second layer, $z^2$

$$
\begin{aligned}
z^2_1 &= w^2_{11}a^1_1 + w^2_{12}a^1_2 + w^2_{13}a^1_3 + b^2_1 \\
z^2_2 &= w^2_{21}a^1_1 + w^2_{22}a^1_2 + w^2_{23}a^1_3 + b^2_2 \\
z^2_3 &= w^2_{31}a^1_1 + w^2_{32}a^1_2 + w^2_{33}a^1_3 + b^2_3 \\
z^2_4 &= w^2_{41}a^1_1 + w^2_{42}a^1_2 + w^2_{43}a^1_3 + b^2_4
\end{aligned}
$$

In matrix form:

$$
\begin{bmatrix} z^2_1 \\ z^2_2 \\ z^2_3 \\ z^2_4 \end{bmatrix}
=
\begin{bmatrix}
w^2_{11} & w^2_{12} & w^2_{13} \\
w^2_{21} & w^2_{22} & w^2_{23} \\
w^2_{31} & w^2_{32} & w^2_{33} \\
w^2_{41} & w^2_{42} & w^2_{43}
\end{bmatrix}
\begin{bmatrix} a^1_1 \\ a^1_2 \\ a^1_3 \end{bmatrix}
+
\begin{bmatrix} b^2_1 \\ b^2_2 \\ b^2_3 \\ b^2_4 \end{bmatrix}
$$

$$
z^2 = w^2 a^1 + b^2
$$
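As a quick sanity check, the four scalar equations and the matrix form can be compared numerically. The sketch below uses NumPy with random placeholder values; the variable names (`w2`, `b2`, `a1`) are illustrative and not part of the original derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
# w2[j, k] (zero-based) plays the role of w^2_{(j+1)(k+1)}:
# the weight from neuron k+1 in layer 1 to neuron j+1 in layer 2.
w2 = rng.normal(size=(4, 3))
b2 = rng.normal(size=4)
a1 = rng.normal(size=3)

# Element by element, mirroring the four scalar equations.
z2_loop = np.array([
    sum(w2[j, k] * a1[k] for k in range(3)) + b2[j]
    for j in range(4)
])

# Matrix form: z^2 = w^2 a^1 + b^2.
z2 = w2 @ a1 + b2

print(np.allclose(z2, z2_loop))
```

Both computations should agree up to floating-point rounding; the matrix form is simply the compact way of writing the same four sums.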

1.2. Output of the second layer, $a^2$

$$
\begin{aligned}
a^2_1 &= \sigma(z^2_1) \\
a^2_2 &= \sigma(z^2_2) \\
a^2_3 &= \sigma(z^2_3) \\
a^2_4 &= \sigma(z^2_4)
\end{aligned}
$$

In matrix form, with the activation function $\sigma$ applied element-wise:

$$
\begin{bmatrix} a^2_1 \\ a^2_2 \\ a^2_3 \\ a^2_4 \end{bmatrix}
= \sigma \left( \begin{bmatrix} z^2_1 \\ z^2_2 \\ z^2_3 \\ z^2_4 \end{bmatrix} \right)
$$

$$
a^2 = \sigma(z^2)
$$

1.3. Input to the third layer, $z^3$

$$
\begin{aligned}
z^3_1 &= w^3_{11}a^2_1 + w^3_{12}a^2_2 + w^3_{13}a^2_3 + w^3_{14}a^2_4 + b^3_1 \\
z^3_2 &= w^3_{21}a^2_1 + w^3_{22}a^2_2 + w^3_{23}a^2_3 + w^3_{24}a^2_4 + b^3_2
\end{aligned}
$$

In matrix form:

$$
\begin{bmatrix} z^3_1 \\ z^3_2 \end{bmatrix}
=
\begin{bmatrix}
w^3_{11} & w^3_{12} & w^3_{13} & w^3_{14} \\
w^3_{21} & w^3_{22} & w^3_{23} & w^3_{24}
\end{bmatrix}
\begin{bmatrix} a^2_1 \\ a^2_2 \\ a^2_3 \\ a^2_4 \end{bmatrix}
+
\begin{bmatrix} b^3_1 \\ b^3_2 \end{bmatrix}
$$

$$
z^3 = w^3 a^2 + b^3
$$

1.4. Output of the third layer, $a^3$

$$
\begin{aligned}
a^3_1 &= \sigma(z^3_1) \\
a^3_2 &= \sigma(z^3_2)
\end{aligned}
$$

In matrix form:

$$
\begin{bmatrix} a^3_1 \\ a^3_2 \end{bmatrix}
= \sigma \left( \begin{bmatrix} z^3_1 \\ z^3_2 \end{bmatrix} \right)
$$

$$
a^3 = \sigma(z^3)
$$
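Taken together, the equations of this section describe a complete forward pass. A minimal NumPy sketch follows. The text leaves $\sigma$ unspecified, so the logistic sigmoid used here is an assumption, and the function name `forward`, the random weights, and the example input are all illustrative.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; an assumed choice for sigma, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(a1, w2, b2, w3, b3):
    """Forward pass through the 3-4-2 network; returns every z and a."""
    z2 = w2 @ a1 + b2   # shape (4,)
    a2 = sigmoid(z2)    # shape (4,)
    z3 = w3 @ a2 + b3   # shape (2,)
    a3 = sigmoid(z3)    # shape (2,)
    return z2, a2, z3, a3

rng = np.random.default_rng(0)
w2, b2 = rng.normal(size=(4, 3)), rng.normal(size=4)
w3, b3 = rng.normal(size=(2, 4)), rng.normal(size=2)
a1 = np.array([0.5, -1.0, 2.0])   # arbitrary example input

z2, a2, z3, a3 = forward(a1, w2, b2, w3, b3)
print(a3)
```

Returning the intermediate `z` and `a` values, rather than only the final output, is deliberate: the derivatives derived later reuse all of them.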

2. Cost function

Assume we use the quadratic cost function. For a single sample with target $y = (y_1, y_2)^T$, the cost is:

$$
C = \frac{1}{2} \left( (a^3_1 - y_1)^2 + (a^3_2 - y_2)^2 \right)
$$
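For concreteness, the per-sample quadratic cost can be evaluated in a couple of lines. This is a small sketch with made-up output and target values; `quadratic_cost` is an illustrative name.

```python
import numpy as np

def quadratic_cost(a3, y):
    """C = 1/2 * sum_j (a3_j - y_j)^2 for a single sample."""
    return 0.5 * np.sum((a3 - y) ** 2)

a3 = np.array([0.8, 0.3])   # made-up network output
y = np.array([1.0, 0.0])    # made-up target

cost = quadratic_cost(a3, y)
print(cost)   # approximately 0.5 * (0.04 + 0.09) = 0.065
```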

3. Partial derivative of the cost with respect to the third-layer input, $\frac{\partial C}{\partial z^3}$

By the chain rule:

$$
\begin{aligned}
\frac{\partial C}{\partial z^3_1} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} = (a^3_1 - y_1)\, \sigma^{\prime}(z^3_1) \\
\frac{\partial C}{\partial z^3_2} &= \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} = (a^3_2 - y_2)\, \sigma^{\prime}(z^3_2)
\end{aligned}
$$

In matrix form, where $\odot$ denotes the element-wise (Hadamard) product of two vectors:

$$
\begin{bmatrix} \frac{\partial C}{\partial z^3_1} \\ \frac{\partial C}{\partial z^3_2} \end{bmatrix}
=
\begin{bmatrix} (a^3_1 - y_1)\, \sigma^{\prime}(z^3_1) \\ (a^3_2 - y_2)\, \sigma^{\prime}(z^3_2) \end{bmatrix}
=
\left( \begin{bmatrix} a^3_1 \\ a^3_2 \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \right)
\odot \sigma^{\prime} \left( \begin{bmatrix} z^3_1 \\ z^3_2 \end{bmatrix} \right)
$$

$$
\frac{\partial C}{\partial z^3} = (a^3 - y) \odot \sigma^{\prime}(z^3)
$$
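A derivative formula like this one can be checked against central finite differences. The sketch below assumes a logistic $\sigma$ (the text does not fix the activation) and uses made-up values for $z^3$ and $y$; the helper names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def cost_from_z3(z3, y):
    """Quadratic cost viewed as a function of the third-layer input."""
    return 0.5 * np.sum((sigmoid(z3) - y) ** 2)

z3 = np.array([0.4, -1.2])   # made-up third-layer input
y = np.array([1.0, 0.0])     # made-up target

# Analytic result: (a^3 - y) times sigma'(z^3), element by element.
delta3 = (sigmoid(z3) - y) * sigmoid_prime(z3)

# Central finite differences, one coordinate of z^3 at a time.
eps = 1e-6
numeric = np.array([
    (cost_from_z3(z3 + eps * e, y) - cost_from_z3(z3 - eps * e, y)) / (2 * eps)
    for e in np.eye(2)
])

print(np.max(np.abs(delta3 - numeric)))   # should be tiny
```

Note that `*` on NumPy arrays is the element-wise product, which is what the vector formula requires; a matrix product between two column vectors would not even be defined here.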

3.1. Partial derivative of the cost C with respect to the third-layer biases, $\frac{\partial C}{\partial b^3}$

Since $z^3_j$ depends on $b^3_j$ with coefficient 1, we have $\frac{\partial z^3_j}{\partial b^3_j} = 1$:

$$
\begin{aligned}
\frac{\partial C}{\partial b^3_1} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial b^3_1} = (a^3_1 - y_1)\, \sigma^{\prime}(z^3_1) \cdot 1 \\
\frac{\partial C}{\partial b^3_2} &= \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial b^3_2} = (a^3_2 - y_2)\, \sigma^{\prime}(z^3_2) \cdot 1
\end{aligned}
$$

In matrix form ($\odot$ is the element-wise product):

$$
\begin{bmatrix} \frac{\partial C}{\partial b^3_1} \\ \frac{\partial C}{\partial b^3_2} \end{bmatrix}
=
\left( \begin{bmatrix} a^3_1 \\ a^3_2 \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \right)
\odot \sigma^{\prime} \left( \begin{bmatrix} z^3_1 \\ z^3_2 \end{bmatrix} \right)
$$

$$
\frac{\partial C}{\partial b^3} = \frac{\partial C}{\partial z^3} = (a^3 - y) \odot \sigma^{\prime}(z^3)
$$
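The claim that the bias gradient equals the gradient with respect to $z^3$ can be verified numerically. The following sketch assumes a logistic $\sigma$ and uses random placeholder weights and second-layer outputs; all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(b3, w3, a2, y):
    """Quadratic cost as a function of the third-layer biases."""
    return 0.5 * np.sum((sigmoid(w3 @ a2 + b3) - y) ** 2)

rng = np.random.default_rng(3)
w3, b3 = rng.normal(size=(2, 4)), rng.normal(size=2)
a2 = rng.uniform(size=4)    # stand-in for the second-layer outputs
y = np.array([1.0, 0.0])    # made-up target

a3 = sigmoid(w3 @ a2 + b3)
# For the logistic function, sigma'(z) = sigma(z) * (1 - sigma(z)).
delta3 = (a3 - y) * a3 * (1.0 - a3)
grad_b3 = delta3            # because dz^3_j / db^3_j = 1

# Central finite differences over each bias.
eps = 1e-6
numeric_b3 = np.array([
    (cost(b3 + eps * e, w3, a2, y) - cost(b3 - eps * e, w3, a2, y)) / (2 * eps)
    for e in np.eye(2)
])

print(np.max(np.abs(grad_b3 - numeric_b3)))   # should be tiny
```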

3.2. Partial derivative of the cost C with respect to the third-layer weights, $\frac{\partial C}{\partial w^3}$

The weight $w^3_{jk}$ only affects $z^3_j$, and $\frac{\partial z^3_j}{\partial w^3_{jk}} = a^2_k$. For the weights into the first output neuron:

$$
\begin{aligned}
\frac{\partial C}{\partial w^3_{11}} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial w^3_{11}} = (a^3_1 - y_1)\, \sigma^{\prime}(z^3_1)\, a^2_1 \\
\frac{\partial C}{\partial w^3_{12}} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial w^3_{12}} = (a^3_1 - y_1)\, \sigma^{\prime}(z^3_1)\, a^2_2 \\
\frac{\partial C}{\partial w^3_{13}} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial w^3_{13}} = (a^3_1 - y_1)\, \sigma^{\prime}(z^3_1)\, a^2_3 \\
\frac{\partial C}{\partial w^3_{14}} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial w^3_{14}} = (a^3_1 - y_1)\, \sigma^{\prime}(z^3_1)\, a^2_4
\end{aligned}
$$

For the weights into the second output neuron:

$$
\begin{aligned}
\frac{\partial C}{\partial w^3_{21}} &= \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial w^3_{21}} = (a^3_2 - y_2)\, \sigma^{\prime}(z^3_2)\, a^2_1 \\
\frac{\partial C}{\partial w^3_{22}} &= \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial w^3_{22}} = (a^3_2 - y_2)\, \sigma^{\prime}(z^3_2)\, a^2_2 \\
\frac{\partial C}{\partial w^3_{23}} &= \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial w^3_{23}} = (a^3_2 - y_2)\, \sigma^{\prime}(z^3_2)\, a^2_3 \\
\frac{\partial C}{\partial w^3_{24}} &= \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial w^3_{24}} = (a^3_2 - y_2)\, \sigma^{\prime}(z^3_2)\, a^2_4
\end{aligned}
$$

In matrix form, the gradient is the outer product of a column vector and a row vector ($\odot$ is the element-wise product):

$$
\begin{aligned}
\begin{bmatrix}
\frac{\partial C}{\partial w^3_{11}} & \frac{\partial C}{\partial w^3_{12}} & \frac{\partial C}{\partial w^3_{13}} & \frac{\partial C}{\partial w^3_{14}} \\
\frac{\partial C}{\partial w^3_{21}} & \frac{\partial C}{\partial w^3_{22}} & \frac{\partial C}{\partial w^3_{23}} & \frac{\partial C}{\partial w^3_{24}}
\end{bmatrix}
&=
\begin{bmatrix}
(a^3_1 - y_1) \sigma^{\prime}(z^3_1) a^2_1 & (a^3_1 - y_1) \sigma^{\prime}(z^3_1) a^2_2 & (a^3_1 - y_1) \sigma^{\prime}(z^3_1) a^2_3 & (a^3_1 - y_1) \sigma^{\prime}(z^3_1) a^2_4 \\
(a^3_2 - y_2) \sigma^{\prime}(z^3_2) a^2_1 & (a^3_2 - y_2) \sigma^{\prime}(z^3_2) a^2_2 & (a^3_2 - y_2) \sigma^{\prime}(z^3_2) a^2_3 & (a^3_2 - y_2) \sigma^{\prime}(z^3_2) a^2_4
\end{bmatrix} \\
&=
\begin{bmatrix} (a^3_1 - y_1) \sigma^{\prime}(z^3_1) \\ (a^3_2 - y_2) \sigma^{\prime}(z^3_2) \end{bmatrix}
\begin{bmatrix} a^2_1 & a^2_2 & a^2_3 & a^2_4 \end{bmatrix} \\
&=
\left( \left( \begin{bmatrix} a^3_1 \\ a^3_2 \end{bmatrix} - \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \right) \odot \sigma^{\prime} \left( \begin{bmatrix} z^3_1 \\ z^3_2 \end{bmatrix} \right) \right)
\begin{bmatrix} a^2_1 & a^2_2 & a^2_3 & a^2_4 \end{bmatrix} \\
\frac{\partial C}{\partial w^3} &= \frac{\partial C}{\partial z^3} \cdot (a^2)^T
\end{aligned}
$$
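In code, the column-times-row product above is an outer product. The sketch below builds the $2 \times 4$ weight gradient with `np.outer` and compares every entry against a central finite difference. It assumes a logistic $\sigma$ and random placeholder values; the names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w3, b3, a2, y):
    """Quadratic cost as a function of the third-layer weights."""
    return 0.5 * np.sum((sigmoid(w3 @ a2 + b3) - y) ** 2)

rng = np.random.default_rng(1)
w3, b3 = rng.normal(size=(2, 4)), rng.normal(size=2)
a2 = rng.uniform(size=4)    # stand-in for the second-layer outputs
y = np.array([1.0, 0.0])    # made-up target

a3 = sigmoid(w3 @ a2 + b3)
# For the logistic function, sigma'(z) = sigma(z) * (1 - sigma(z)).
delta3 = (a3 - y) * a3 * (1.0 - a3)
grad_w3 = np.outer(delta3, a2)   # delta^3 (a^2)^T, shape (2, 4)

# Central finite differences, perturbing one weight at a time.
eps = 1e-6
numeric = np.zeros_like(w3)
for j in range(2):
    for k in range(4):
        step = np.zeros_like(w3)
        step[j, k] = eps
        numeric[j, k] = (cost(w3 + step, b3, a2, y)
                         - cost(w3 - step, b3, a2, y)) / (2 * eps)

print(np.max(np.abs(grad_w3 - numeric)))   # should be tiny
```

The gradient has the same shape as $w^3$ itself, which is a useful check when implementing the formula: entry `[j, k]` is `delta3[j] * a2[k]`.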

4. Partial derivative of the cost with respect to the second-layer input, $\frac{\partial C}{\partial z^2}$

Each second-layer neuron feeds both third-layer neurons, so the chain rule sums over the two paths from $z^2_k$ to the cost:

$$
\begin{aligned}
\frac{\partial C}{\partial z^2_1} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial a^2_1} \frac{\partial a^2_1}{\partial z^2_1} + \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial a^2_1} \frac{\partial a^2_1}{\partial z^2_1} \\
&= (a^3_1 - y_1) \sigma^{\prime}(z^3_1) w^3_{11} \sigma^{\prime}(z^2_1) + (a^3_2 - y_2) \sigma^{\prime}(z^3_2) w^3_{21} \sigma^{\prime}(z^2_1) \\
\frac{\partial C}{\partial z^2_2} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial a^2_2} \frac{\partial a^2_2}{\partial z^2_2} + \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial a^2_2} \frac{\partial a^2_2}{\partial z^2_2} \\
&= (a^3_1 - y_1) \sigma^{\prime}(z^3_1) w^3_{12} \sigma^{\prime}(z^2_2) + (a^3_2 - y_2) \sigma^{\prime}(z^3_2) w^3_{22} \sigma^{\prime}(z^2_2) \\
\frac{\partial C}{\partial z^2_3} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial a^2_3} \frac{\partial a^2_3}{\partial z^2_3} + \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial a^2_3} \frac{\partial a^2_3}{\partial z^2_3} \\
&= (a^3_1 - y_1) \sigma^{\prime}(z^3_1) w^3_{13} \sigma^{\prime}(z^2_3) + (a^3_2 - y_2) \sigma^{\prime}(z^3_2) w^3_{23} \sigma^{\prime}(z^2_3) \\
\frac{\partial C}{\partial z^2_4} &= \frac{\partial C}{\partial a^3_1} \frac{\partial a^3_1}{\partial z^3_1} \frac{\partial z^3_1}{\partial a^2_4} \frac{\partial a^2_4}{\partial z^2_4} + \frac{\partial C}{\partial a^3_2} \frac{\partial a^3_2}{\partial z^3_2} \frac{\partial z^3_2}{\partial a^2_4} \frac{\partial a^2_4}{\partial z^2_4} \\
&= (a^3_1 - y_1) \sigma^{\prime}(z^3_1) w^3_{14} \sigma^{\prime}(z^2_4) + (a^3_2 - y_2) \sigma^{\prime}(z^3_2) w^3_{24} \sigma^{\prime}(z^2_4)
\end{aligned}
$$

In matrix form ($\odot$ is the element-wise product):

$$
\begin{aligned}
\begin{bmatrix} \frac{\partial C}{\partial z^2_1} \\ \frac{\partial C}{\partial z^2_2} \\ \frac{\partial C}{\partial z^2_3} \\ \frac{\partial C}{\partial z^2_4} \end{bmatrix}
&=
\begin{bmatrix}
(a^3_1 - y_1) \sigma^{\prime}(z^3_1) w^3_{11} \sigma^{\prime}(z^2_1) + (a^3_2 - y_2) \sigma^{\prime}(z^3_2) w^3_{21} \sigma^{\prime}(z^2_1) \\
(a^3_1 - y_1) \sigma^{\prime}(z^3_1) w^3_{12} \sigma^{\prime}(z^2_2) + (a^3_2 - y_2) \sigma^{\prime}(z^3_2) w^3_{22} \sigma^{\prime}(z^2_2) \\
(a^3_1 - y_1) \sigma^{\prime}(z^3_1) w^3_{13} \sigma^{\prime}(z^2_3) + (a^3_2 - y_2) \sigma^{\prime}(z^3_2) w^3_{23} \sigma^{\prime}(z^2_3) \\
(a^3_1 - y_1) \sigma^{\prime}(z^3_1) w^3_{14} \sigma^{\prime}(z^2_4) + (a^3_2 - y_2) \sigma^{\prime}(z^3_2) w^3_{24} \sigma^{\prime}(z^2_4)
\end{bmatrix} \\
&=
\left(
\begin{bmatrix}
w^3_{11} & w^3_{21} \\
w^3_{12} & w^3_{22} \\
w^3_{13} & w^3_{23} \\
w^3_{14} & w^3_{24}
\end{bmatrix}
\begin{bmatrix} (a^3_1 - y_1) \sigma^{\prime}(z^3_1) \\ (a^3_2 - y_2) \sigma^{\prime}(z^3_2) \end{bmatrix}
\right)
\odot \sigma^{\prime} \left( \begin{bmatrix} z^2_1 \\ z^2_2 \\ z^2_3 \\ z^2_4 \end{bmatrix} \right) \\
\frac{\partial C}{\partial z^2} &= \left( (w^3)^T \cdot \frac{\partial C}{\partial z^3} \right) \odot \sigma^{\prime}(z^2)
\end{aligned}
$$
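The step from the third layer back to the second can be sketched in NumPy in three ways that should all agree: the explicit sum over both output neurons, the transposed-matrix form, and a finite-difference estimate. A logistic $\sigma$ and random placeholder values are assumed; the names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def cost_from_z2(z2, w3, b3, y):
    """Quadratic cost viewed as a function of the second-layer input."""
    a3 = sigmoid(w3 @ sigmoid(z2) + b3)
    return 0.5 * np.sum((a3 - y) ** 2)

rng = np.random.default_rng(2)
w3, b3 = rng.normal(size=(2, 4)), rng.normal(size=2)
z2 = rng.normal(size=4)     # made-up second-layer input
y = np.array([1.0, 0.0])    # made-up target

z3 = w3 @ sigmoid(z2) + b3
delta3 = (sigmoid(z3) - y) * sigmoid_prime(z3)

# Scalar form: sum over the two output neurons, then multiply by sigma'(z^2_k).
delta2_sum = np.array([
    sum(delta3[j] * w3[j, k] for j in range(2)) * sigmoid_prime(z2[k])
    for k in range(4)
])

# Matrix form: ((w^3)^T delta^3), element-wise times sigma'(z^2).
delta2 = (w3.T @ delta3) * sigmoid_prime(z2)

# Central finite differences over each coordinate of z^2.
eps = 1e-6
numeric = np.array([
    (cost_from_z2(z2 + eps * e, w3, b3, y)
     - cost_from_z2(z2 - eps * e, w3, b3, y)) / (2 * eps)
    for e in np.eye(4)
])

print(np.max(np.abs(delta2 - numeric)))   # should be tiny
```

The transpose appears because the forward pass maps 4 values to 2 with $w^3$, while the backward pass maps the 2 output errors back onto 4 hidden neurons.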

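4.1. Partial derivative of the cost with respect to the second-layer biases

As in 3.1, $\frac{\partial z^2_j}{\partial b^2_j} = 1$, so:

$$
\frac{\partial C}{\partial b^2} = \frac{\partial C}{\partial z^2}
$$

4.2. Partial derivative of the cost with respect to the second-layer weights

As in 3.2, $\frac{\partial z^2_j}{\partial w^2_{jk}} = a^1_k$, so:

$$
\frac{\partial C}{\partial w^2} = \frac{\partial C}{\partial z^2} \cdot (a^1)^T
$$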
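The second-layer gradients can be sketched the same way as the third-layer ones. The example below computes both and spot-checks a single weight, $w^2_{32}$, against a central finite difference. It assumes a logistic $\sigma$ and random placeholder parameters; all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w2, b2, w3, b3, a1, y):
    """Quadratic cost of the full 3-4-2 network on one sample."""
    a2 = sigmoid(w2 @ a1 + b2)
    a3 = sigmoid(w3 @ a2 + b3)
    return 0.5 * np.sum((a3 - y) ** 2)

rng = np.random.default_rng(4)
w2, b2 = rng.normal(size=(4, 3)), rng.normal(size=4)
w3, b3 = rng.normal(size=(2, 4)), rng.normal(size=2)
a1 = np.array([0.5, -1.0, 2.0])   # arbitrary example input
y = np.array([1.0, 0.0])          # made-up target

a2 = sigmoid(w2 @ a1 + b2)
a3 = sigmoid(w3 @ a2 + b3)
# For the logistic function, sigma'(z) = sigma(z) * (1 - sigma(z)).
delta3 = (a3 - y) * a3 * (1.0 - a3)
delta2 = (w3.T @ delta3) * a2 * (1.0 - a2)

grad_b2 = delta2                  # dC/db^2 = dC/dz^2
grad_w2 = np.outer(delta2, a1)    # dC/dw^2 = dC/dz^2 (a^1)^T, shape (4, 3)

# Spot-check w^2_{32}, which is index [2, 1] with zero-based indexing.
eps = 1e-6
step = np.zeros_like(w2)
step[2, 1] = eps
numeric = (cost(w2 + step, b2, w3, b3, a1, y)
           - cost(w2 - step, b2, w3, b3, a1, y)) / (2 * eps)

print(grad_w2[2, 1], numeric)     # the two values should nearly coincide
```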

5. Simplified formulas

Let $\delta^l$ denote the partial derivative of the cost with respect to the input of layer $l$, and let $L$ be the last layer. With $\odot$ as the element-wise product, the results above generalize to:

$$
\begin{aligned}
\delta^l &= \frac{\partial C}{\partial z^l} \\
\delta^L &= \frac{\partial C}{\partial a^L} \odot \sigma^{\prime}(z^L) \\
\delta^l &= \left( (w^{l+1})^T \cdot \delta^{l+1} \right) \odot \sigma^{\prime}(z^l) \\
\frac{\partial C}{\partial b^l} &= \delta^l \\
\frac{\partial C}{\partial w^l} &= \delta^l \cdot (a^{l-1})^T
\end{aligned}
$$
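The four formulas translate almost line by line into a backpropagation routine for any number of layers. What follows is a minimal NumPy sketch, not a definitive implementation: it assumes a logistic $\sigma$ and the quadratic cost, handles one sample at a time, and the names `backprop` and `cost` are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def cost(weights, biases, x, y):
    """Quadratic cost of the network on a single sample."""
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return 0.5 * np.sum((a - y) ** 2)

def backprop(weights, biases, x, y):
    """Return (grad_w, grad_b), one array per layer, for one sample."""
    # Forward pass, storing every z^l and a^l.
    a = x
    activations, zs = [a], []
    for w, b in zip(weights, biases):
        z = w @ a + b
        a = sigmoid(z)
        zs.append(z)
        activations.append(a)

    grad_w = [None] * len(weights)
    grad_b = [None] * len(biases)

    # Output layer: delta^L = dC/da^L (element-wise) sigma'(z^L).
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    grad_b[-1] = delta
    grad_w[-1] = np.outer(delta, activations[-2])

    # Earlier layers: delta^l = ((w^{l+1})^T delta^{l+1}) (element-wise) sigma'(z^l).
    for i in range(len(weights) - 2, -1, -1):
        delta = (weights[i + 1].T @ delta) * sigmoid_prime(zs[i])
        grad_b[i] = delta
        grad_w[i] = np.outer(delta, activations[i])
    return grad_w, grad_b

# Usage on the 3-4-2 network from the derivation, with random parameters.
rng = np.random.default_rng(5)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [rng.normal(size=4), rng.normal(size=2)]
x = np.array([0.5, -1.0, 2.0])   # arbitrary example input
y = np.array([1.0, 0.0])         # made-up target

grad_w, grad_b = backprop(weights, biases, x, y)
print([g.shape for g in grad_w], [g.shape for g in grad_b])
```

In this sketch `weights[0]` corresponds to $w^2$ and `activations[0]` to $a^1$, so the list index is offset by two from the layer superscript for weights and by one for activations. Each returned gradient has the same shape as the parameter it belongs to.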
