1. $\frac{\partial e}{\partial W^{l}}$ and $\frac{\partial e}{\partial B^{l}}$
Loss function:
$$e=g\left(net^{l}\right)$$
Net input of layer $l$:
$$net^{l}=W^{l}\cdot O^{l-1}+B^{l}$$
Differential of the loss function:
$$de=\mathrm{tr}\left(\left(\frac{\partial e}{\partial net^{l}}\right)^{T}\cdot dnet^{l}\right)$$
Differential of the net input (with $O^{l-1}$ held fixed):
$$dnet^{l}=dW^{l}\cdot O^{l-1}+dB^{l}$$
Substituting the differential of the net input into the differential of the loss, then using the cyclic property of the trace, $\mathrm{tr}(ABC)=\mathrm{tr}(CAB)$, on the first term:
$$\begin{aligned}
de&=\mathrm{tr}\left(\left(\frac{\partial e}{\partial net^{l}}\right)^{T}\cdot \left(dW^{l}\cdot O^{l-1}+dB^{l}\right)\right)\\
&=\mathrm{tr}\left(\left(\frac{\partial e}{\partial net^{l}}\right)^{T}\cdot dW^{l}\cdot O^{l-1}+\left(\frac{\partial e}{\partial net^{l}}\right)^{T}\cdot dB^{l}\right)\\
&=\mathrm{tr}\left(O^{l-1}\cdot \left(\frac{\partial e}{\partial net^{l}}\right)^{T}\cdot dW^{l}+\left(\frac{\partial e}{\partial net^{l}}\right)^{T}\cdot dB^{l}\right)
\end{aligned}$$
Matching each term against the general form $de=\mathrm{tr}\left(\left(\frac{\partial e}{\partial X}\right)^{T}\cdot dX\right)$ gives both partial derivatives.

Partial derivative of the loss with respect to the layer-$l$ weights $W^{l}$:
$$\begin{aligned}
\frac{\partial e}{\partial W^{l}}&=\left(O^{l-1}\cdot \left(\frac{\partial e}{\partial net^{l}}\right)^{T}\right)^{T}\\
&=\frac{\partial e}{\partial net^{l}}\cdot \left(O^{l-1}\right)^{T}
\end{aligned}$$
Partial derivative of the loss with respect to the layer-$l$ bias $B^{l}$:
$$\frac{\partial e}{\partial B^{l}}=\frac{\partial e}{\partial net^{l}}$$
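The two results can be checked numerically. The sketch below is not part of the derivation: it assumes a concrete loss, the squared error $e=\frac{1}{2}\lVert net^{l}-t\rVert^{2}$, so that $\frac{\partial e}{\partial net^{l}}=net^{l}-t$. It represents column vectors as plain Python lists, and the function names are made up for illustration. It compares $\frac{\partial e}{\partial net^{l}}\cdot\left(O^{l-1}\right)^{T}$ against central finite differences.

```python
# Check de/dW = delta . o_prev^T and de/dB = delta with finite differences.
# Assumption: g is the squared error e = 0.5 * ||net - target||^2,
# so delta = de/dnet = net - target. Vectors are plain lists.

def net_input(W, o_prev, b):
    """net = W . o_prev + b, with W given as a list of rows."""
    return [sum(w * o for w, o in zip(row, o_prev)) + bi
            for row, bi in zip(W, b)]

def loss(W, o_prev, b, target):
    net = net_input(W, o_prev, b)
    return 0.5 * sum((n - t) ** 2 for n, t in zip(net, target))

def analytic_grads(W, o_prev, b, target):
    net = net_input(W, o_prev, b)
    delta = [n - t for n, t in zip(net, target)]        # de/dnet
    grad_W = [[d * o for o in o_prev] for d in delta]   # delta . o_prev^T
    grad_b = list(delta)                                # de/dB = de/dnet
    return grad_W, grad_b

def numeric_grad_W(W, o_prev, b, target, eps=1e-6):
    """Central finite differences, one weight at a time."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            orig = W[i][j]
            W[i][j] = orig + eps
            plus = loss(W, o_prev, b, target)
            W[i][j] = orig - eps
            minus = loss(W, o_prev, b, target)
            W[i][j] = orig                              # restore the weight
            grad[i][j] = (plus - minus) / (2 * eps)
    return grad

# Arbitrary example values: 3 inputs feeding 2 units.
W = [[0.2, -0.5, 0.1], [0.7, 0.3, -0.4]]
o_prev = [1.0, 2.0, -1.0]
b = [0.1, -0.2]
target = [0.5, 0.0]

grad_W, grad_b = analytic_grads(W, o_prev, b, target)
num_W = numeric_grad_W(W, o_prev, b, target)
max_err = max(abs(a - n)
              for row_a, row_n in zip(grad_W, num_W)
              for a, n in zip(row_a, row_n))
print("max |analytic - numeric| for dW:", max_err)
```

The analytic gradient has the same shape as $W^{l}$ (one row per unit, one column per input), which is what the outer product $\frac{\partial e}{\partial net^{l}}\cdot\left(O^{l-1}\right)^{T}$ implies.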
2. Recurrence formula for $\frac{\partial e}{\partial net^{l}}$
Loss function:
$$e=g\left(net^{l+1}\right)$$
Net input of layer $l+1$:
$$net^{l+1}=W^{l+1}\cdot O^{l}+B^{l+1}$$
Output of layer $l$:
$$O^{l}=f\left(net^{l}\right)$$
Differential of the loss function:
$$de=\mathrm{tr}\left(\left(\frac{\partial e}{\partial net^{l+1}}\right)^{T}\cdot dnet^{l+1}\right)$$
Differential of the net input (with $W^{l+1}$ and $B^{l+1}$ held fixed):
$$dnet^{l+1}=W^{l+1}\cdot dO^{l}$$
Differential of the output ($f$ is applied elementwise):
$$dO^{l}=f^{'}\left(net^{l}\right)\odot dnet^{l}$$
Substituting both differentials into $de$, then moving the elementwise product onto the left factor (for a row vector $r$ and column vectors $u$, $v$: $r\cdot\left(u\odot v\right)=\left(r\odot u^{T}\right)\cdot v$):
$$\begin{aligned}
de&=\mathrm{tr}\left(\left(\frac{\partial e}{\partial net^{l+1}}\right)^{T}\cdot W^{l+1}\cdot \left(f^{'}\left(net^{l}\right)\odot dnet^{l}\right)\right)\\
&=\mathrm{tr}\left(\left(\left(\left(\frac{\partial e}{\partial net^{l+1}}\right)^{T}\cdot W^{l+1}\right)\odot \left(f^{'}\left(net^{l}\right)\right)^{T}\right)\cdot dnet^{l}\right)
\end{aligned}$$
Recurrence relating the partial derivative of the loss with respect to the net input of layer $l$ to that of layer $l+1$:
$$\begin{aligned}
\frac{\partial e}{\partial net^{l}}&=\left(\left(\left(\frac{\partial e}{\partial net^{l+1}}\right)^{T}\cdot W^{l+1}\right)\odot \left(f^{'}\left(net^{l}\right)\right)^{T}\right)^{T}\\
&=\left(\left(\frac{\partial e}{\partial net^{l+1}}\right)^{T}\cdot W^{l+1}\right)^{T}\odot f^{'}\left(net^{l}\right)\\
&=\left(\left(W^{l+1}\right)^{T}\cdot \frac{\partial e}{\partial net^{l+1}}\right)\odot f^{'}\left(net^{l}\right)
\end{aligned}$$
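The recurrence can be sanity-checked on a small two-layer network. The sketch below makes several assumptions that the derivation does not: $f$ is the logistic sigmoid, the loss is the squared error on $net^{2}$, and all function names and numeric values are illustrative. It computes $\frac{\partial e}{\partial net^{1}}$ through the recurrence and compares it with finite differences on $B^{1}$, which works because section 1 gives $\frac{\partial e}{\partial B^{l}}=\frac{\partial e}{\partial net^{l}}$.

```python
import math

# Assumptions for this sketch: f is the logistic sigmoid and
# e = 0.5 * ||net2 - target||^2. Neither is fixed by the derivation.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, x, b):
    """W . x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def forward(W1, b1, W2, b2, x):
    net1 = affine(W1, x, b1)
    o1 = [sigmoid(n) for n in net1]
    net2 = affine(W2, o1, b2)
    return net1, o1, net2

def loss(W1, b1, W2, b2, x, target):
    _, _, net2 = forward(W1, b1, W2, b2, x)
    return 0.5 * sum((n - t) ** 2 for n, t in zip(net2, target))

def delta1_by_recurrence(W1, b1, W2, b2, x, target):
    _, o1, net2 = forward(W1, b1, W2, b2, x)
    delta2 = [n - t for n, t in zip(net2, target)]      # de/dnet2
    # (W2^T . delta2): column j of W2 dotted with delta2
    back = [sum(W2[i][j] * delta2[i] for i in range(len(W2)))
            for j in range(len(W2[0]))]
    f_prime = [o * (1.0 - o) for o in o1]               # sigmoid derivative
    return [bk * fp for bk, fp in zip(back, f_prime)]   # elementwise product

def delta1_numeric(W1, b1, W2, b2, x, target, eps=1e-5):
    """Central differences on b1, using de/dB1 = de/dnet1."""
    grad = []
    for k in range(len(b1)):
        orig = b1[k]
        b1[k] = orig + eps
        plus = loss(W1, b1, W2, b2, x, target)
        b1[k] = orig - eps
        minus = loss(W1, b1, W2, b2, x, target)
        b1[k] = orig                                    # restore the bias
        grad.append((plus - minus) / (2 * eps))
    return grad

# Arbitrary example values: 2 inputs, 3 hidden units, 2 outputs.
W1 = [[0.1, -0.3], [0.4, 0.2], [-0.5, 0.6]]
b1 = [0.05, -0.1, 0.2]
W2 = [[0.3, -0.2, 0.5], [-0.4, 0.1, 0.25]]
b2 = [0.0, 0.1]
x = [1.0, -2.0]
target = [0.2, -0.3]

delta1 = delta1_by_recurrence(W1, b1, W2, b2, x, target)
delta1_fd = delta1_numeric(W1, b1, W2, b2, x, target)
max_err = max(abs(a - n) for a, n in zip(delta1, delta1_fd))
print("max |recurrence - numeric|:", max_err)
```

Applying the same function layer by layer, from the output layer backwards, yields every $\frac{\partial e}{\partial net^{l}}$, and the formulas of section 1 then give the weight and bias gradients.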