Diagram
Notation
$$
\begin{aligned}
\boldsymbol{y}^{0}: &\ \text{input, } \boldsymbol{y}^{0} \in \mathbb{R}^{s_0 \times 1} \\
\boldsymbol{z}^{l}: &\ \text{weighted input (pre-activation) of layer } l,\ \boldsymbol{z}^{l} \in \mathbb{R}^{s_l \times 1} \\
\boldsymbol{y}^{l}: &\ \text{output (activation) of layer } l,\ \boldsymbol{y}^{l} \in \mathbb{R}^{s_l \times 1} \\
\sigma: &\ \text{activation function} \\
s_l: &\ \text{dimension of the vectors } \boldsymbol{y}^{l} \text{ and } \boldsymbol{z}^{l} \text{ in layer } l \\
\boldsymbol{t}: &\ \text{ground-truth (target) value} \\
L: &\ \text{total number of layers} \\
f^l_{i}: &\ \text{denotes } \frac{\partial y^l_i}{\partial z^l_i} \\
\boldsymbol{I}(i): &\ \text{column vector with 1 in row } i \text{ and 0 elsewhere} \\
\delta^l_i: &\ \text{denotes } \frac{\partial E}{\partial y^l_i} \\
\boldsymbol{\delta}^l: &\ \text{denotes } \frac{\partial E}{\partial \boldsymbol{y}^l}, \text{ that is, }
\begin{pmatrix} \delta^l_1, \delta^l_2, \cdots, \delta^l_{s_l} \end{pmatrix}
\end{aligned}
$$
These quantities are related as follows:
$$
\begin{aligned}
\boldsymbol{z}^{l} &= \boldsymbol{w}^{l}\boldsymbol{y}^{l-1} \\
z^{l}_i &= \sum_{j=1}^{s_{l-1}} w^{l}_{ij}\, y^{l-1}_j \\
\boldsymbol{y}^{l} &= \sigma(\boldsymbol{z}^{l}) \\
y^{l}_i &= \sigma(z^l_i)
\end{aligned}
$$
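The layer relations can be sketched as a short forward pass. This is only an illustration: the logistic function stands in for $\sigma$ (the text does not fix a particular activation), and the network shape and weights are made up.

```python
import math

def sigmoid(v):
    """Logistic activation, used here only as an example choice of sigma."""
    return 1.0 / (1.0 + math.exp(-v))

def layer_forward(w, y_prev):
    """One layer: z_i = sum_j w[i][j] * y_prev[j], then y_i = sigma(z_i)."""
    z = [sum(w_ij * y_j for w_ij, y_j in zip(row, y_prev)) for row in w]
    y = [sigmoid(z_i) for z_i in z]
    return z, y

def forward(weights, y0):
    """Run all layers; returns the lists [z^1..z^L] and [y^0..y^L]."""
    zs, ys = [], [y0]
    for w in weights:
        z, y = layer_forward(w, ys[-1])
        zs.append(z)
        ys.append(y)
    return zs, ys

# Hypothetical 2-3-1 network with made-up weights.
weights = [
    [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]],  # w^1 has shape 3 x 2
    [[0.3, -0.1, 0.2]],                      # w^2 has shape 1 x 3
]
zs, ys = forward(weights, [1.0, 2.0])
```

Each weight matrix is stored row by row, so `w[i][j]` plays the role of $w^l_{ij}$ and the row count equals $s_l$.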
Notes on matrix derivatives
Notation
$\boldsymbol{y}$: column vector, $\boldsymbol{y} \in \mathbb{R}^{n \times 1}$
$\boldsymbol{x}$: column vector, $\boldsymbol{x} \in \mathbb{R}^{m \times 1}$
$f(\boldsymbol{x})$: real-valued scalar function, written $f: \mathbb{R}^m \to \mathbb{R}$
Formulas
$$
\begin{aligned}
\frac{\partial \boldsymbol{y}^{\mathrm{T}}}{\partial \boldsymbol{x}} &=
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_1} \\
\vdots & & \vdots \\
\frac{\partial y_1}{\partial x_m} & \cdots & \frac{\partial y_n}{\partial x_m}
\end{pmatrix} \in \mathbb{R}^{m \times n} \\
\frac{\partial \boldsymbol{y}^{\mathrm{T}}}{\partial \boldsymbol{y}} &= \mathbf{E}_{n \times n} \\
\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}} &=
\left[ \frac{\partial f(\boldsymbol{x})}{\partial x_1}, \cdots, \frac{\partial f(\boldsymbol{x})}{\partial x_m} \right]^{\mathrm{T}}
\end{aligned}
$$
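As a quick sanity check of the layout convention for $\partial \boldsymbol{y}^{\mathrm{T}} / \partial \boldsymbol{x}$, the sketch below estimates that matrix by central differences for a linear map $\boldsymbol{y} = A\boldsymbol{x}$, where the result should be $A^{\mathrm{T}}$. The matrix `A`, the evaluation point, and the helper names are illustrative assumptions.

```python
def jacobian_T(func, x, eps=1e-6):
    """Approximate d(y^T)/dx by central differences.

    Row i holds the partial derivatives of every y_j with respect to x_i,
    so the result has shape m x n.
    """
    rows = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += eps
        xm = list(x)
        xm[i] -= eps
        yp, ym = func(xp), func(xm)
        rows.append([(a - b) / (2 * eps) for a, b in zip(yp, ym)])
    return rows

# Hypothetical linear map y = A x with n = 2 outputs and m = 3 inputs.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

def linear(x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

J = jacobian_T(linear, [0.5, -1.0, 2.0])
A_T = [list(col) for col in zip(*A)]  # transpose of A, shape 3 x 2
```

Comparing `J` with `A_T` entry by entry shows that, under this convention, differentiating a row vector by a column vector yields the transpose of the usual Jacobian.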
Derivation
Error definition
$$
\begin{aligned}
E &= \frac{1}{m}\sum_{p=1}^{m} E_p \\
E_p &= \frac{1}{2}\left\lVert \boldsymbol{y}^L - \boldsymbol{t} \right\rVert^2 \\
&= \frac{1}{2}\sum_{i=1}^{s_L}(y^L_i - t_i)^2
\end{aligned}
$$
Here $m$ is the number of samples. To keep the derivation simple, let $m = 1$.
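The error definition translates directly into code. The following is a minimal sketch with made-up outputs and targets; the function names are illustrative.

```python
def sample_error(y_out, t):
    """E_p = 1/2 * sum_i (y_i^L - t_i)^2 for one sample."""
    return 0.5 * sum((y - ti) ** 2 for y, ti in zip(y_out, t))

def mean_error(outputs, targets):
    """E = (1/m) * sum_p E_p over m samples."""
    m = len(outputs)
    return sum(sample_error(y, t) for y, t in zip(outputs, targets)) / m

# Made-up network outputs and targets for m = 2 samples.
E = mean_error([[0.8, 0.1], [0.3, 0.9]], [[1.0, 0.0], [0.0, 1.0]])
```

With $m = 1$, `mean_error` reduces to `sample_error`, which is the case used in the rest of the derivation.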
Computing $\frac{\partial E}{\partial w^L_{ij}}$
Preliminary results
$$
\begin{aligned}
\frac{\partial z_k^l}{\partial w^l_{ij}} &=
\begin{cases}
y^{l-1}_j & k = i \\
0 & k \ne i
\end{cases} \\
\frac{\partial \boldsymbol{z}^l}{\partial w^l_{ij}} &=
\left[ \frac{\partial z_1^l}{\partial w_{ij}^l}, \cdots, \frac{\partial z_{s_l}^l}{\partial w_{ij}^l} \right]^{\mathrm{T}} \in \mathbb{R}^{s_l \times 1} \\
&= \boldsymbol{I}(i)\, y_j^{l-1} \\
\\
\frac{\partial \boldsymbol{y}^l}{\partial (\boldsymbol{z}^{l})^{\mathrm{T}}} &=
\begin{pmatrix}
\frac{\partial y_1^l}{\partial z_1^l} & \cdots & \frac{\partial y_1^l}{\partial z_{s_l}^l} \\
\vdots & & \vdots \\
\frac{\partial y_{s_l}^l}{\partial z_1^l} & \cdots & \frac{\partial y_{s_l}^l}{\partial z_{s_l}^l}
\end{pmatrix} \in \mathbb{R}^{s_l \times s_l} \\
&=
\begin{pmatrix}
f^l_1 & & & \\
& f^l_2 & & \\
& & \ddots & \\
& & & f^l_{s_l}
\end{pmatrix} \\
\\
\frac{\partial \boldsymbol{z}^{l}}{\partial (\boldsymbol{y}^{l-1})^{\mathrm{T}}} &=
\begin{pmatrix}
\frac{\partial z_1^l}{\partial y_1^{l-1}} & \cdots & \frac{\partial z_1^l}{\partial y_{s_{l-1}}^{l-1}} \\
\vdots & & \vdots \\
\frac{\partial z_{s_l}^l}{\partial y_1^{l-1}} & \cdots & \frac{\partial z_{s_l}^l}{\partial y_{s_{l-1}}^{l-1}}
\end{pmatrix} \\
&=
\begin{pmatrix}
w_{11}^l & \cdots & w_{1 s_{l-1}}^l \\
\vdots & & \vdots \\
w_{s_l 1}^l & \cdots & w_{s_l s_{l-1}}^l
\end{pmatrix} \in \mathbb{R}^{s_l \times s_{l-1}} \\
\\
\frac{\partial \boldsymbol{y}^l}{\partial \boldsymbol{y}^{l-1}} &=
\frac{\partial \boldsymbol{y}^l}{\partial \boldsymbol{z}^{l}} \cdot \frac{\partial \boldsymbol{z}^l}{\partial \boldsymbol{y}^{l-1}} \\
&= \frac{\partial \boldsymbol{y}^l}{\partial (\boldsymbol{z}^{l})^{\mathrm{T}}} \cdot
\frac{\partial (\boldsymbol{z}^{l})^{\mathrm{T}}}{\partial \boldsymbol{z}^{l}} \cdot
\frac{\partial \boldsymbol{z}^{l}}{\partial (\boldsymbol{y}^{l-1})^{\mathrm{T}}} \cdot
\frac{\partial (\boldsymbol{y}^{l-1})^{\mathrm{T}}}{\partial \boldsymbol{y}^{l-1}} \\
&= \frac{\partial \boldsymbol{y}^l}{\partial (\boldsymbol{z}^{l})^{\mathrm{T}}} \cdot
\frac{\partial \boldsymbol{z}^{l}}{\partial (\boldsymbol{y}^{l-1})^{\mathrm{T}}} \\
&=
\begin{pmatrix}
f^l_1 w_{11}^l & \cdots & f^l_1 w_{1 s_{l-1}}^l \\
\vdots & & \vdots \\
f^l_{s_l} w_{s_l 1}^l & \cdots & f^l_{s_l} w_{s_l s_{l-1}}^l
\end{pmatrix} \in \mathbb{R}^{s_l \times s_{l-1}} \\
\\
\frac{\partial E}{\partial (\boldsymbol{y}^L)^{\mathrm{T}}} &= [\, y^L_1 - t_1, \cdots, y^L_{s_L} - t_{s_L} \,] \\
\\
\frac{\partial \boldsymbol{z}^l}{\partial y^{l-1}_i} &= [\, w^l_{1i}, w^l_{2i}, \cdots, w^l_{s_l i} \,]^{\mathrm{T}} \\
\\
\frac{\partial \boldsymbol{y}^l}{\partial y^{l-1}_i} &=
\frac{\partial \boldsymbol{y}^l}{\partial \boldsymbol{z}^l} \cdot \frac{\partial \boldsymbol{z}^l}{\partial y^{l-1}_i} \\
&=
\begin{pmatrix}
f^l_1 & & & \\
& f^l_2 & & \\
& & \ddots & \\
& & & f^l_{s_l}
\end{pmatrix} \cdot
\begin{pmatrix}
w^l_{1i} \\ w^l_{2i} \\ \vdots \\ w^l_{s_l i}
\end{pmatrix} \\
&=
\begin{pmatrix}
f^l_1 w^l_{1i} \\ f^l_2 w^l_{2i} \\ \vdots \\ f^l_{s_l} w^l_{s_l i}
\end{pmatrix}
\end{aligned}
$$
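The layer-to-layer Jacobian, whose entry $(k, i)$ is $f^l_k w^l_{ki}$, can be checked numerically. The sketch below assumes the logistic function for $\sigma$, so that $f^l_k = y^l_k (1 - y^l_k)$. The weights and inputs are made up.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer(w, y_prev):
    """y^l = sigma(w^l y^{l-1}), with sigma assumed to be the logistic function."""
    return [sigmoid(sum(a * b for a, b in zip(row, y_prev))) for row in w]

def analytic_jacobian(w, y_prev):
    """Entry (k, i) is f_k^l * w_ki^l, where f_k^l = y_k (1 - y_k) for the logistic sigma."""
    y = layer(w, y_prev)
    return [[y_k * (1.0 - y_k) * w_ki for w_ki in row] for y_k, row in zip(y, w)]

def numeric_jacobian(w, y_prev, eps=1e-6):
    """Central-difference estimate of d y_k^l / d y_i^{l-1}."""
    n_out, n_in = len(w), len(y_prev)
    jac = [[0.0] * n_in for _ in range(n_out)]
    for i in range(n_in):
        up = list(y_prev)
        up[i] += eps
        dn = list(y_prev)
        dn[i] -= eps
        y_up, y_dn = layer(w, up), layer(w, dn)
        for k in range(n_out):
            jac[k][i] = (y_up[k] - y_dn[k]) / (2 * eps)
    return jac

# Made-up 2 x 3 weight matrix and previous-layer output.
w = [[0.2, -0.4, 0.1], [0.5, 0.3, -0.6]]
y_prev = [0.7, 0.2, 0.9]
max_gap = max(
    abs(a - n)
    for row_a, row_n in zip(analytic_jacobian(w, y_prev), numeric_jacobian(w, y_prev))
    for a, n in zip(row_a, row_n)
)
```

A small `max_gap` indicates that the closed-form matrix and the finite-difference estimate agree for this example.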
Solution
$$
\begin{aligned}
\frac{\partial E}{\partial w^L_{ij}} &=
\frac{\partial E}{\partial \boldsymbol{y}^L} \cdot \frac{\partial \boldsymbol{y}^L}{\partial w^L_{ij}} \\
&= \frac{\partial E}{\partial (\boldsymbol{y}^L)^{\mathrm{T}}} \cdot
\frac{\partial (\boldsymbol{y}^L)^{\mathrm{T}}}{\partial \boldsymbol{y}^L} \cdot
\frac{\partial \boldsymbol{y}^L}{\partial (\boldsymbol{z}^L)^{\mathrm{T}}} \cdot
\frac{\partial \boldsymbol{z}^L}{\partial w^L_{ij}} \\
&=
\begin{pmatrix} y^L_1 - t_1, \cdots, y^L_{s_L} - t_{s_L} \end{pmatrix} \cdot
\begin{pmatrix}
f^L_1 & & \\
& \ddots & \\
& & f^L_{s_L}
\end{pmatrix} \cdot
\boldsymbol{I}(i)\, y_j^{L-1} \\
&= (y^L_i - t_i)\, f^L_i\, y^{L-1}_j \\
\\
\frac{\partial E}{\partial w^{L-1}_{ij}} &=
\frac{\partial E}{\partial \boldsymbol{y}^L} \cdot
\frac{\partial \boldsymbol{y}^L}{\partial \boldsymbol{y}^{L-1}} \cdot
\frac{\partial \boldsymbol{y}^{L-1}}{\partial w^{L-1}_{ij}} \\
&=
\begin{pmatrix} y^L_1 - t_1, \cdots, y^L_{s_L} - t_{s_L} \end{pmatrix} \cdot
\begin{pmatrix}
f^L_1 w_{11}^L & \cdots & f^L_1 w_{1 s_{L-1}}^L \\
\vdots & & \vdots \\
f^L_{s_L} w_{s_L 1}^L & \cdots & f^L_{s_L} w_{s_L s_{L-1}}^L
\end{pmatrix} \cdot
f^{L-1}_i\, \boldsymbol{I}(i)\, y_j^{L-2} \\
&= \sum_{k=1}^{s_L} (y_k^L - t_k)\, f_k^L\, w^L_{ki}\, f^{L-1}_i\, y^{L-2}_j
\end{aligned}
$$
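The two gradients can be verified against finite differences on a small network. The sketch below assumes $L = 2$ and a logistic $\sigma$, and includes the activation-derivative factors $f^L_i$ and $f^{L-1}_i$ that the chain rule through $\boldsymbol{z}$ produces. All weights, inputs, and targets are made up.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(w1, w2, y0):
    """Two-layer network (L = 2): returns y^1 and y^2."""
    y1 = [sigmoid(sum(a * b for a, b in zip(row, y0))) for row in w1]
    y2 = [sigmoid(sum(a * b for a, b in zip(row, y1))) for row in w2]
    return y1, y2

def error(w1, w2, y0, t):
    _, y2 = forward(w1, w2, y0)
    return 0.5 * sum((y - ti) ** 2 for y, ti in zip(y2, t))

def grad_last(w1, w2, y0, t, i, j):
    """dE/dw^L_ij = (y_i^L - t_i) * f_i^L * y_j^{L-1}."""
    y1, y2 = forward(w1, w2, y0)
    return (y2[i] - t[i]) * y2[i] * (1 - y2[i]) * y1[j]

def grad_prev(w1, w2, y0, t, i, j):
    """dE/dw^{L-1}_ij = sum_k (y_k^L - t_k) f_k^L w^L_ki * f_i^{L-1} * y_j^{L-2}."""
    y1, y2 = forward(w1, w2, y0)
    back = sum((y2[k] - t[k]) * y2[k] * (1 - y2[k]) * w2[k][i] for k in range(len(y2)))
    return back * y1[i] * (1 - y1[i]) * y0[j]

def numeric(w1, w2, y0, t, layer, i, j, eps=1e-6):
    """Central-difference estimate of dE/dw_ij for layer 1 or 2."""
    def shifted(delta):
        a = [row[:] for row in w1]
        b = [row[:] for row in w2]
        (a if layer == 1 else b)[i][j] += delta
        return error(a, b, y0, t)
    return (shifted(eps) - shifted(-eps)) / (2 * eps)

# Made-up 2-3-2 network, input, and target.
w1 = [[0.1, -0.3], [0.4, 0.2], [-0.2, 0.5]]
w2 = [[0.3, -0.1, 0.2], [-0.4, 0.6, 0.1]]
y0, t = [1.0, 0.5], [1.0, 0.0]

gap_last = abs(grad_last(w1, w2, y0, t, 1, 2) - numeric(w1, w2, y0, t, 2, 1, 2))
gap_prev = abs(grad_prev(w1, w2, y0, t, 2, 0) - numeric(w1, w2, y0, t, 1, 2, 0))
```

Both gaps should be tiny if the closed-form expressions match the numerical derivative of $E$.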
An alternative formulation
$$
\begin{aligned}
\frac{\partial E}{\partial w^L_{ij}} &=
\frac{\partial E}{\partial y^L_i} \cdot \frac{\partial y^L_i}{\partial w^L_{ij}} \\
&= (y^L_i - t_i)\, f^L_i\, y_j^{L-1} \\
\\
\frac{\partial E}{\partial w^{L-1}_{ij}} &=
\frac{\partial E}{\partial \boldsymbol{y}^L} \cdot
\frac{\partial \boldsymbol{y}^L}{\partial y^{L-1}_i} \cdot
\frac{\partial y^{L-1}_i}{\partial w^{L-1}_{ij}} \\
&=
\begin{pmatrix} y^L_1 - t_1, \cdots, y^L_{s_L} - t_{s_L} \end{pmatrix} \cdot
\begin{pmatrix}
f^L_1 w^L_{1i} \\ f^L_2 w^L_{2i} \\ \vdots \\ f^L_{s_L} w^L_{s_L i}
\end{pmatrix} \cdot
f^{L-1}_i\, y_j^{L-2} \\
&= \sum_{k=1}^{s_L} (y_k^L - t_k)\, f_k^L\, w^L_{ki}\, f^{L-1}_i\, y^{L-2}_j \\
\\
\delta^{l-1}_i &= \frac{\partial E}{\partial y^{l-1}_i} \\
&= \frac{\partial E}{\partial \boldsymbol{y}^{l}} \cdot \frac{\partial \boldsymbol{y}^{l}}{\partial y^{l-1}_i} \\
&=
\begin{pmatrix} \delta^l_1, \delta^l_2, \cdots, \delta^l_{s_l} \end{pmatrix} \cdot
\begin{pmatrix}
f^l_1 w^l_{1i} \\ f^l_2 w^l_{2i} \\ \vdots \\ f^l_{s_l} w^l_{s_l i}
\end{pmatrix} \\
&= \sum_{k=1}^{s_l} \delta^l_k\, f^l_k\, w^l_{ki} \\
\\
\frac{\partial E}{\partial w^L_{ij}} &=
\frac{\partial E}{\partial y^L_i} \cdot \frac{\partial y^L_i}{\partial w^L_{ij}} \\
&= \delta^L_i\, f^L_i\, y_j^{L-1} \\
&= (y^L_i - t_i)\, f^L_i\, y_j^{L-1} \\
\\
\frac{\partial E}{\partial w^{l}_{ij}} &=
\frac{\partial E}{\partial y^l_i} \cdot \frac{\partial y^{l}_i}{\partial w^{l}_{ij}} \\
&= \delta^{l}_i\, f^l_i\, y_j^{l-1} \\
&= \left( \sum_{k=1}^{s_{l+1}} \delta^{l+1}_k\, f^{l+1}_k\, w^{l+1}_{ki} \right) f^l_i\, y_j^{l-1}
\qquad (l < L)
\end{aligned}
$$
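The $\delta$ recursion gives a compact backward pass for any number of layers. The sketch below is one possible implementation under stated assumptions: $\sigma$ is the logistic function, $\delta^L = \boldsymbol{y}^L - \boldsymbol{t}$, $\delta^{l-1}_i = \sum_k \delta^l_k f^l_k w^l_{ki}$, and $\partial E / \partial w^l_{ij} = \delta^l_i f^l_i y^{l-1}_j$. The network and data are made up, and the result is compared with finite differences.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(weights, y0):
    """Return [y^0, y^1, ..., y^L]."""
    ys = [y0]
    for w in weights:
        ys.append([sigmoid(sum(a * b for a, b in zip(row, ys[-1]))) for row in w])
    return ys

def backprop(weights, y0, t):
    """Return one gradient matrix per layer, computed with the delta recursion."""
    ys = forward(weights, y0)
    delta = [y - ti for y, ti in zip(ys[-1], t)]      # delta^L = y^L - t
    grads = [None] * len(weights)
    for idx in range(len(weights) - 1, -1, -1):       # idx = l - 1
        y, y_prev, w = ys[idx + 1], ys[idx], weights[idx]
        f = [v * (1.0 - v) for v in y]                # f_i^l for the logistic sigma
        # dE/dw^l_ij = delta_i^l * f_i^l * y_j^{l-1}
        grads[idx] = [[delta[i] * f[i] * y_prev[j] for j in range(len(y_prev))]
                      for i in range(len(y))]
        # delta_i^{l-1} = sum_k delta_k^l * f_k^l * w^l_ki
        delta = [sum(delta[k] * f[k] * w[k][i] for k in range(len(y)))
                 for i in range(len(y_prev))]
    return grads

def error(weights, y0, t):
    return 0.5 * sum((y - ti) ** 2 for y, ti in zip(forward(weights, y0)[-1], t))

def numeric_grad(weights, y0, t, layer, i, j, eps=1e-6):
    """Central-difference estimate of dE/dw_ij for a 0-based layer index."""
    def shifted(d):
        ws = [[row[:] for row in w] for w in weights]
        ws[layer][i][j] += d
        return error(ws, y0, t)
    return (shifted(eps) - shifted(-eps)) / (2 * eps)

# Made-up 2-3-3-2 network, input, and target.
weights = [
    [[0.1, -0.3], [0.4, 0.2], [-0.2, 0.5]],
    [[0.3, -0.1, 0.2], [-0.4, 0.6, 0.1], [0.2, 0.2, -0.5]],
    [[0.7, -0.2, 0.3], [-0.1, 0.4, 0.5]],
]
y0, t = [1.0, 0.5], [1.0, 0.0]
grads = backprop(weights, y0, t)
max_gap = max(
    abs(grads[l][i][j] - numeric_grad(weights, y0, t, l, i, j))
    for l, w in enumerate(weights)
    for i in range(len(w))
    for j in range(len(w[0]))
)
```

Because each $\delta^{l-1}$ is built from $\delta^l$, the backward pass visits every layer once, instead of re-expanding the full chain of Jacobians for each weight.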