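Contents
- I. Formulas for a single input channel and a single output channel
- II. Formulas for multiple input channels and multiple output channels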
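This article derives the error backpropagation formulas for the convolutional layer of a convolutional neural network.
A convolutional layer consists of a convolution followed by an activation. In general, a convolutional layer has several input channels and several output channels, and each layer contains a number of convolution kernels. This article therefore first gives the formulas for the single-input, single-output case and then the formulas for the multi-input, multi-output case. Two convolution modes are discussed: valid and full.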
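I. Formulas for a single input channel and a single output channel
In the single-input, single-output case, the kernel $K^{l}$, the input $O^{l-1}$, and the output $net^{l}$ are all two-dimensional matrices.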
1. $net^{l}=K^{l}*_{valid}O^{l-1}+B^{l}$
Kernel $K^{l}\in\mathbb{R}^{n\times n}$,
input $O^{l-1}\in\mathbb{R}^{m\times m}$,
output $net^{l}\in\mathbb{R}^{(m-n+1)\times(m-n+1)}$.
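The text does not write $*_{valid}$ out entry by entry. One reading that is consistent with the gradient formulas derived in this section is the true convolution, in which the kernel is flipped. Under that assumption, and with 1-based indices $u,v$ (index names introduced here only for illustration), the forward pass reads:

$$net^{l}_{u,v}=\sum_{a=1}^{n}\sum_{b=1}^{n}K^{l}_{a,b}\,O^{l-1}_{u+n-a,\;v+n-b}+B^{l}_{u,v},\qquad 1\le u,v\le m-n+1.$$

For example, with $n=2$ and $m=3$ the first entry is $net^{l}_{1,1}=K^{l}_{1,1}O^{l-1}_{2,2}+K^{l}_{1,2}O^{l-1}_{2,1}+K^{l}_{2,1}O^{l-1}_{1,2}+K^{l}_{2,2}O^{l-1}_{1,1}+B^{l}_{1,1}$, so every index of $O^{l-1}$ stays inside $1,\dots,m$ and no padding is needed.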
1.1. $\frac{\partial e}{\partial K^{l}}$ and $\frac{\partial e}{\partial B^{l}}$
Loss function: $$e=g\left(net^{l}\right)$$
Net input of layer $l$: $$net^{l}=K^{l}*_{valid}O^{l-1}+B^{l}$$
Differential of the loss function: $$de=tr\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot dnet^{l}\right)$$
Differential of the net input: $$dnet^{l}=dK^{l}*_{valid}O^{l-1}+dB^{l}$$
Substituting the differential of the net input into the differential of the loss:
$$\begin{aligned}
de&=tr\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot \left(dK^{l}*_{valid}O^{l-1}+dB^{l}\right)\right)\\
&=tr\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot \left(dK^{l}*_{valid}O^{l-1}\right)+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)\\
&=tr\left(\left(\frac{\partial e}{\partial net^{l}}^{T}*_{valid}\left(O^{l-1}\right)_{rot}^{T}\right)\cdot dK^{l}+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)
\end{aligned}$$
Partial derivative of the loss with respect to the layer-$l$ kernel $K^{l}$:
$$\begin{aligned}
\frac{\partial e}{\partial K^{l}}&=\left(\frac{\partial e}{\partial net^{l}}^{T}*_{valid}\left(O^{l-1}\right)_{rot}^{T}\right)^{T}\\
&=\frac{\partial e}{\partial net^{l}}*_{valid}O^{l-1}_{rot}
\end{aligned}$$
Partial derivative of the loss with respect to the layer-$l$ bias $B^{l}$: $$\frac{\partial e}{\partial B^{l}}=\frac{\partial e}{\partial net^{l}}$$
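The two results for valid mode can be checked numerically. The sketch below is not from the original text: it assumes that $*$ is the true (flipped) convolution, that $O_{rot}$ is a 180-degree rotation, and it picks the loss $e=\frac{1}{2}\sum (net^{l})^{2}$ so that $\frac{\partial e}{\partial net^{l}}=net^{l}$. The helpers `conv2d` and `numeric_grad` are illustrative names.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution (one operand flipped); square arrays assumed."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    k = b.shape[0]
    if mode == "full":
        a = np.pad(a, k - 1)                 # zero-pad every side by k - 1
    flipped = b[::-1, ::-1]
    size = a.shape[0] - k + 1
    out = np.empty((size, size))
    for r in range(size):
        for c in range(size):
            out[r, c] = np.sum(a[r:r + k, c:c + k] * flipped)
    return out

def numeric_grad(f, X, eps=1e-5):
    """Central finite differences of the scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        G[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return G

def loss(K, O, B):
    net = conv2d(K, O, "valid") + B
    return 0.5 * np.sum(net ** 2)            # assumed loss, so de/dnet = net

rng = np.random.default_rng(0)
n, m = 3, 6
K = rng.standard_normal((n, n))
O = rng.standard_normal((m, m))
B = rng.standard_normal((m - n + 1, m - n + 1))

delta = conv2d(K, O, "valid") + B                # de/dnet for the assumed loss
grad_K = conv2d(delta, O[::-1, ::-1], "valid")   # de/dnet *_valid O_rot
grad_B = delta                                   # de/dB = de/dnet

ok_K = np.allclose(grad_K, numeric_grad(lambda X: loss(X, O, B), K), atol=1e-6)
ok_B = np.allclose(grad_B, numeric_grad(lambda X: loss(K, O, X), B), atol=1e-6)
print(ok_K, ok_B)
```

The analytic gradient has the kernel's shape $n\times n$, because a valid convolution of an $(m-n+1)$-sized array with an $m$-sized array leaves $m-(m-n+1)+1=n$ entries per side.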
1.2. Recurrence formula for $\frac{\partial e}{\partial net^{l}}$
Loss function: $$e=g(net^{l+1})$$
Net input of layer $l+1$: $$net^{l+1}=K^{l+1}*_{valid}O^{l}+B^{l+1}$$
Output of layer $l$: $$O^{l}=f(net^{l})$$
Differential of the loss function: $$de=tr\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot dnet^{l+1}\right)$$
Differential of the net input: $$dnet^{l+1}=K^{l+1}*_{valid}dO^{l}$$
Differential of the output: $$dO^{l}=f^{'}(net^{l})\odot dnet^{l}$$
Substituting both differentials into the differential of the loss:
$$\begin{aligned}
de&=tr\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot \left(K^{l+1}*_{valid}\left(f^{'}(net^{l})\odot dnet^{l}\right)\right)\right)\\
&=tr\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{rot}^{T}\right)\cdot \left(f^{'}(net^{l})\odot dnet^{l}\right)\right)\\
&=tr\left(\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{rot}^{T}\right)\odot \left(f^{'}(net^{l})\right)^{T}\right)\cdot dnet^{l}\right)
\end{aligned}$$
Therefore:
$$\begin{aligned}
\frac{\partial e}{\partial net^{l}}&=\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{rot}^{T}\right)\odot \left(f^{'}(net^{l})\right)^{T}\right)^{T}\\
&=\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{rot}^{T}\right)^{T}\odot f^{'}(net^{l})\\
&=\left(\frac{\partial e}{\partial net^{l+1}}*_{full}K^{l+1}_{rot}\right)\odot f^{'}(net^{l})
\end{aligned}$$
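The recurrence can be checked in the same numerical way. This is a sketch under assumptions that are not in the original: the activation is taken to be $f=\tanh$, the loss is $e=\frac{1}{2}\sum (net^{l+1})^{2}$, and $*$ is the true (flipped) convolution. The helper names are illustrative.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution (one operand flipped); square arrays assumed."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    k = b.shape[0]
    if mode == "full":
        a = np.pad(a, k - 1)                 # zero-pad every side by k - 1
    flipped = b[::-1, ::-1]
    size = a.shape[0] - k + 1
    out = np.empty((size, size))
    for r in range(size):
        for c in range(size):
            out[r, c] = np.sum(a[r:r + k, c:c + k] * flipped)
    return out

def numeric_grad(f, X, eps=1e-5):
    """Central finite differences of the scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        G[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return G

def loss(net_l, K_next, B_next):
    O_l = np.tanh(net_l)                               # assumed activation
    net_next = conv2d(K_next, O_l, "valid") + B_next   # valid forward pass
    return 0.5 * np.sum(net_next ** 2)                 # assumed loss

rng = np.random.default_rng(1)
n, m = 3, 6
net_l = rng.standard_normal((m, m))
K_next = rng.standard_normal((n, n))
B_next = rng.standard_normal((m - n + 1, m - n + 1))

delta_next = conv2d(K_next, np.tanh(net_l), "valid") + B_next   # de/dnet^{l+1}
f_prime = 1.0 - np.tanh(net_l) ** 2                              # tanh'(net^l)
# Recurrence: (de/dnet^{l+1} *_full K_rot^{l+1}) ⊙ f'(net^l)
delta_l = conv2d(delta_next, K_next[::-1, ::-1], "full") * f_prime

ok = np.allclose(delta_l,
                 numeric_grad(lambda X: loss(X, K_next, B_next), net_l),
                 atol=1e-6)
print(ok)
```

The full convolution restores the $m\times m$ shape of $net^{l}$: an $(m-n+1)$-sized error map convolved in full mode with an $n$-sized kernel has $(m-n+1)+n-1=m$ entries per side.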
2. $net^{l}=K^{l}*_{full}O^{l-1}+B^{l}$
Kernel $K^{l}\in\mathbb{R}^{n\times n}$,
input $O^{l-1}\in\mathbb{R}^{m\times m}$,
output $net^{l}\in\mathbb{R}^{(m+n-1)\times(m+n-1)}$.
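As with valid mode, the text does not write $*_{full}$ out entry by entry. Assuming the true (flipped) convolution with zero padding, and with 1-based indices $u,v$ introduced here only for illustration, the forward pass would read:

$$net^{l}_{u,v}=\sum_{a=1}^{n}\sum_{b=1}^{n}K^{l}_{a,b}\,O^{l-1}_{u-a+1,\;v-b+1}+B^{l}_{u,v},\qquad 1\le u,v\le m+n-1,$$

where any entry of $O^{l-1}$ whose index falls outside $1,\dots,m$ is taken to be zero. The corner entry, for instance, reduces to $net^{l}_{1,1}=K^{l}_{1,1}O^{l-1}_{1,1}+B^{l}_{1,1}$, since every other term touches the padding.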
2.1. $\frac{\partial e}{\partial K^{l}}$ and $\frac{\partial e}{\partial B^{l}}$
Loss function: $$e=g\left(net^{l}\right)$$
Net input of layer $l$: $$net^{l}=K^{l}*_{full}O^{l-1}+B^{l}$$
Differential of the loss function: $$de=tr\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot dnet^{l}\right)$$
Differential of the net input: $$dnet^{l}=dK^{l}*_{full}O^{l-1}+dB^{l}$$
Substituting the differential of the net input into the differential of the loss:
$$\begin{aligned}
de&=tr\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot \left(dK^{l}*_{full}O^{l-1}+dB^{l}\right)\right)\\
&=tr\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot \left(dK^{l}*_{full}O^{l-1}\right)+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)\\
&=tr\left(\left(\frac{\partial e}{\partial net^{l}}^{T}*_{valid}\left(O^{l-1}\right)_{rot}^{T}\right)\cdot dK^{l}+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)
\end{aligned}$$
Partial derivative of the loss with respect to the layer-$l$ kernel $K^{l}$:
$$\begin{aligned}
\frac{\partial e}{\partial K^{l}}&=\left(\frac{\partial e}{\partial net^{l}}^{T}*_{valid}\left(O^{l-1}\right)_{rot}^{T}\right)^{T}\\
&=\frac{\partial e}{\partial net^{l}}*_{valid}O^{l-1}_{rot}
\end{aligned}$$
Partial derivative of the loss with respect to the layer-$l$ bias $B^{l}$: $$\frac{\partial e}{\partial B^{l}}=\frac{\partial e}{\partial net^{l}}$$
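Although the forward pass is now in full mode, the kernel gradient is still a valid convolution. The following sketch checks this numerically. It is not from the original text and rests on the same assumptions as before: true (flipped) convolution, 180-degree rotation for $O_{rot}$, and the loss $e=\frac{1}{2}\sum (net^{l})^{2}$. Helper names are illustrative.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution (one operand flipped); square arrays assumed."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    k = b.shape[0]
    if mode == "full":
        a = np.pad(a, k - 1)                 # zero-pad every side by k - 1
    flipped = b[::-1, ::-1]
    size = a.shape[0] - k + 1
    out = np.empty((size, size))
    for r in range(size):
        for c in range(size):
            out[r, c] = np.sum(a[r:r + k, c:c + k] * flipped)
    return out

def numeric_grad(f, X, eps=1e-5):
    """Central finite differences of the scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        G[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return G

def loss(K, O, B):
    net = conv2d(K, O, "full") + B           # full-mode forward pass
    return 0.5 * np.sum(net ** 2)            # assumed loss, so de/dnet = net

rng = np.random.default_rng(2)
n, m = 3, 6
K = rng.standard_normal((n, n))
O = rng.standard_normal((m, m))
B = rng.standard_normal((m + n - 1, m + n - 1))

delta = conv2d(K, O, "full") + B                 # de/dnet, shape (m+n-1, m+n-1)
grad_K = conv2d(delta, O[::-1, ::-1], "valid")   # de/dnet *_valid O_rot

ok_K = np.allclose(grad_K, numeric_grad(lambda X: loss(X, O, B), K), atol=1e-6)
print(grad_K.shape, ok_K)
```

The shapes explain why valid mode appears here: the error map has $m+n-1$ entries per side, and a valid convolution with the $m$-sized input leaves $(m+n-1)-m+1=n$, the kernel size.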
2.2. Recurrence formula for $\frac{\partial e}{\partial net^{l}}$
Loss function: $$e=g(net^{l+1})$$
Net input of layer $l+1$: $$net^{l+1}=K^{l+1}*_{full}O^{l}+B^{l+1}$$
Output of layer $l$: $$O^{l}=f(net^{l})$$
Differential of the loss function: $$de=tr\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot dnet^{l+1}\right)$$
Differential of the net input: $$dnet^{l+1}=K^{l+1}*_{full}dO^{l}$$
Differential of the output: $$dO^{l}=f^{'}(net^{l})\odot dnet^{l}$$
Substituting both differentials into the differential of the loss:
$$\begin{aligned}
de&=tr\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot \left(K^{l+1}*_{full}\left(f^{'}(net^{l})\odot dnet^{l}\right)\right)\right)\\
&=tr\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{rot}^{T}\right)\cdot \left(f^{'}(net^{l})\odot dnet^{l}\right)\right)\\
&=tr\left(\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{rot}^{T}\right)\odot \left(f^{'}(net^{l})\right)^{T}\right)\cdot dnet^{l}\right)
\end{aligned}$$
Therefore:
$$\begin{aligned}
\frac{\partial e}{\partial net^{l}}&=\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{rot}^{T}\right)\odot \left(f^{'}(net^{l})\right)^{T}\right)^{T}\\
&=\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{rot}^{T}\right)^{T}\odot f^{'}(net^{l})\\
&=\left(\frac{\partial e}{\partial net^{l+1}}*_{valid}K^{l+1}_{rot}\right)\odot f^{'}(net^{l})
\end{aligned}$$
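For a full-mode forward pass the error is propagated back with a valid convolution, the mirror image of the valid-mode case. A numerical check is sketched below. The activation $f=\tanh$, the loss $e=\frac{1}{2}\sum (net^{l+1})^{2}$, and the reading of $*$ as true (flipped) convolution are assumptions made for the example, and the helper names are illustrative.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution (one operand flipped); square arrays assumed."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    k = b.shape[0]
    if mode == "full":
        a = np.pad(a, k - 1)                 # zero-pad every side by k - 1
    flipped = b[::-1, ::-1]
    size = a.shape[0] - k + 1
    out = np.empty((size, size))
    for r in range(size):
        for c in range(size):
            out[r, c] = np.sum(a[r:r + k, c:c + k] * flipped)
    return out

def numeric_grad(f, X, eps=1e-5):
    """Central finite differences of the scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        G[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return G

def loss(net_l, K_next, B_next):
    O_l = np.tanh(net_l)                              # assumed activation
    net_next = conv2d(K_next, O_l, "full") + B_next   # full forward pass
    return 0.5 * np.sum(net_next ** 2)                # assumed loss

rng = np.random.default_rng(3)
n, m = 3, 6
net_l = rng.standard_normal((m, m))
K_next = rng.standard_normal((n, n))
B_next = rng.standard_normal((m + n - 1, m + n - 1))

delta_next = conv2d(K_next, np.tanh(net_l), "full") + B_next   # de/dnet^{l+1}
f_prime = 1.0 - np.tanh(net_l) ** 2                             # tanh'(net^l)
# Recurrence: (de/dnet^{l+1} *_valid K_rot^{l+1}) ⊙ f'(net^l)
delta_l = conv2d(delta_next, K_next[::-1, ::-1], "valid") * f_prime

ok = np.allclose(delta_l,
                 numeric_grad(lambda X: loss(X, K_next, B_next), net_l),
                 atol=1e-6)
print(delta_l.shape, ok)
```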
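II. Formulas for multiple input channels and multiple output channels
In the multi-input, multi-output case, the kernel $K^{l}$, the input $O^{l-1}$, and the output $net^{l}$ are all four-dimensional matrices.
A four-dimensional matrix $A$ can be understood as a two-dimensional matrix whose elements are themselves two-dimensional matrices.
The operation $A_{r}^{t}$ applies a rotation and a transposition to every element of the matrix $A$.
The operation $A^{T}$ transposes the matrix $A$, that is, it transposes both $A$ itself and every element of $A$.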
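The two operations can be made concrete on a NumPy array. This is only a sketch: the axis layout `(block row, block column, element row, element column)`, the reading of "rotation" as a 180-degree rotation, and the function names `rot_t` and `T` are assumptions made for illustration.

```python
import numpy as np

def rot_t(A):
    """A_r^t: rotate every 2-D element by 180 degrees, then transpose it."""
    return A[:, :, ::-1, ::-1].transpose(0, 1, 3, 2)

def T(A):
    """A^T: transpose the block layout and every 2-D element."""
    return A.transpose(1, 0, 3, 2)

# A 2 x 3 grid of 2 x 2 elements with distinct entries.
A = np.arange(2 * 3 * 2 * 2).reshape(2, 3, 2, 2)

# (A_r^t)^T: the two element-level transposes cancel, so what remains is
# the transposed block layout with every element rotated by 180 degrees.
combined = T(rot_t(A))
print(combined.shape)
print(np.array_equal(combined[1, 0], A[0, 1][::-1, ::-1]))
```

This is why the final formulas of this section contain $\left(\left(\cdot\right)_{r}^{t}\right)^{T}$: the combination swaps the two block indices and rotates each element, without transposing it.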
1. $net^{l}=K^{l}*_{valid}O^{l-1}+B^{l}$
1.1. $\frac{\partial e}{\partial K^{l}}$ and $\frac{\partial e}{\partial B^{l}}$
Kernel $K^{l}\in\mathbb{R}^{m\times p\times i\times i}$,
input $O^{l-1}\in\mathbb{R}^{p\times n\times j\times j}$,
output $net^{l}\in\mathbb{R}^{m\times n\times (j-i+1)\times (j-i+1)}$.
The input has $n$ batches, each batch contains $p$ channels, and each channel has size $j\times j$.
The output has $n$ batches, each batch contains $m$ channels, and each channel has size $(j-i+1)\times(j-i+1)$.
There are $m\times p$ kernels, each of size $i\times i$.
Loss function: $$e=g\left(net^{l}\right)$$
Net input of layer $l$: $$net^{l}=K^{l}*_{valid}O^{l-1}+B^{l}$$
Differential of the loss function: $$de=tr\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot dnet^{l}\right)$$
Differential of the net input: $$dnet^{l}=dK^{l}*_{valid}O^{l-1}+dB^{l}$$
Substituting the differential of the net input into the differential of the loss:
$$\begin{aligned}
de&=tr\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot \left(dK^{l}*_{valid}O^{l-1}+dB^{l}\right)\right)\\
&=tr\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot \left(dK^{l}*_{valid}O^{l-1}\right)+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)\\
&=tr\left(\left(\left(O^{l-1}\right)_{r}^{t} *_{valid}\frac{\partial e}{\partial net^{l}}^{T}\right)\cdot dK^{l}+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)
\end{aligned}$$
Partial derivative of the loss with respect to the layer-$l$ kernel $K^{l}$:
$$\begin{aligned}
\frac{\partial e}{\partial K^{l}}&=\left(\left(O^{l-1}\right)_{r}^{t} *_{valid}\frac{\partial e}{\partial net^{l}}^{T}\right)^{T}\\
&=\frac{\partial e}{\partial net^{l}}*_{valid}\left(\left(O^{l-1}\right)_{r}^{t}\right)^{T}
\end{aligned}$$
Partial derivative of the loss with respect to the layer-$l$ bias $B^{l}$: $$\frac{\partial e}{\partial B^{l}}=\frac{\partial e}{\partial net^{l}}$$
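The multi-channel kernel gradient can be checked with a block-matrix product whose element-wise product is a 2-D convolution. This sketch is not from the original text. It assumes the four-dimensional product is a block-matrix product in which each block pair is combined by true (flipped) convolution and the results are summed over the shared index, and it uses the loss $e=\frac{1}{2}\sum (net^{l})^{2}$. All helper names are illustrative.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution (one operand flipped); square arrays assumed."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    k = b.shape[0]
    if mode == "full":
        a = np.pad(a, k - 1)                 # zero-pad every side by k - 1
    flipped = b[::-1, ::-1]
    size = a.shape[0] - k + 1
    out = np.empty((size, size))
    for r in range(size):
        for c in range(size):
            out[r, c] = np.sum(a[r:r + k, c:c + k] * flipped)
    return out

def block_conv(X, Y, mode):
    """Block-matrix product: result[a, c] = sum_b conv2d(X[a, b], Y[b, c])."""
    rows, inner, cols = X.shape[0], X.shape[1], Y.shape[1]
    return np.array([[sum(conv2d(X[a, b], Y[b, c], mode) for b in range(inner))
                      for c in range(cols)] for a in range(rows)])

def rot_t(A):                                # A_r^t: rotate and transpose elements
    return A[:, :, ::-1, ::-1].transpose(0, 1, 3, 2)

def T(A):                                    # A^T: transpose layout and elements
    return A.transpose(1, 0, 3, 2)

def numeric_grad(f, X, eps=1e-5):
    """Central finite differences of the scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        G[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return G

def loss(K, O, B):
    net = block_conv(K, O, "valid") + B
    return 0.5 * np.sum(net ** 2)            # assumed loss, so de/dnet = net

rng = np.random.default_rng(4)
m, p, n, i, j = 2, 3, 2, 2, 4                # sizes named as in the text
K = rng.standard_normal((m, p, i, i))
O = rng.standard_normal((p, n, j, j))
B = rng.standard_normal((m, n, j - i + 1, j - i + 1))

delta = block_conv(K, O, "valid") + B                 # de/dnet
grad_K = block_conv(delta, T(rot_t(O)), "valid")      # de/dnet *_valid ((O)_r^t)^T

ok_K = np.allclose(grad_K, numeric_grad(lambda X: loss(X, O, B), K), atol=1e-6)
print(grad_K.shape, ok_K)
```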
1.2. Recurrence formula for $\frac{\partial e}{\partial net^{l}}$
Loss function: $$e=g(net^{l+1})$$
Net input of layer $l+1$: $$net^{l+1}=K^{l+1}*_{valid}O^{l}+B^{l+1}$$
Output of layer $l$: $$O^{l}=f(net^{l})$$
Differential of the loss function: $$de=tr\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot dnet^{l+1}\right)$$
Differential of the net input: $$dnet^{l+1}=K^{l+1}*_{valid}dO^{l}$$
Differential of the output: $$dO^{l}=f^{'}(net^{l})\odot dnet^{l}$$
Substituting both differentials into the differential of the loss:
$$\begin{aligned}
de&=tr\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot \left(K^{l+1}*_{valid}\left(f^{'}(net^{l})\odot dnet^{l}\right)\right)\right)\\
&=tr\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{r}^{t}\right)\cdot \left(f^{'}(net^{l})\odot dnet^{l}\right)\right)\\
&=tr\left(\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{r}^{t}\right)\odot \left(f^{'}(net^{l})\right)^{T}\right)\cdot dnet^{l}\right)
\end{aligned}$$
Therefore:
$$\begin{aligned}
\frac{\partial e}{\partial net^{l}}&=\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{r}^{t}\right)\odot \left(f^{'}(net^{l})\right)^{T}\right)^{T}\\
&=\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{r}^{t}\right)^{T}\odot f^{'}(net^{l})\\
&=\left(\left(\left(K^{l+1}\right)_{r}^{t}\right)^{T}*_{full}\frac{\partial e}{\partial net^{l+1}}\right)\odot f^{'}(net^{l})
\end{aligned}$$
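Written block by block, the last line may be easier to read. Assuming the four-dimensional product is a block-matrix product summed over the shared index, and writing $X_{a,b}$ for the two-dimensional block of $X$ at block position $(a,b)$ (a notation introduced here, not used in the original), the recurrence becomes:

$$\left(\frac{\partial e}{\partial net^{l}}\right)_{b,c}=\left(\sum_{a=1}^{m}\left(K^{l+1}_{a,b}\right)_{rot}*_{full}\left(\frac{\partial e}{\partial net^{l+1}}\right)_{a,c}\right)\odot f^{'}\left(net^{l}_{b,c}\right),\qquad 1\le b\le p,\ 1\le c\le n.$$

Each input channel $b$ thus collects the error from all $m$ output channels it fed into, every contribution being the single-channel recurrence of Part I.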
2. $net^{l}=K^{l}*_{full}O^{l-1}+B^{l}$
Kernel $K^{l}\in\mathbb{R}^{m\times p\times i\times i}$,
input $O^{l-1}\in\mathbb{R}^{p\times n\times j\times j}$,
output $net^{l}\in\mathbb{R}^{m\times n\times (j+i-1)\times (j+i-1)}$.
The input has $n$ batches, each batch contains $p$ channels, and each channel has size $j\times j$.
The output has $n$ batches, each batch contains $m$ channels, and each channel has size $(j+i-1)\times(j+i-1)$.
There are $m\times p$ kernels, each of size $i\times i$.
2.1. $\frac{\partial e}{\partial K^{l}}$ and $\frac{\partial e}{\partial B^{l}}$
Loss function: $$e=g\left(net^{l}\right)$$
Net input of layer $l$: $$net^{l}=K^{l}*_{full}O^{l-1}+B^{l}$$
Differential of the loss function: $$de=tr\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot dnet^{l}\right)$$
Differential of the net input: $$dnet^{l}=dK^{l}*_{full}O^{l-1}+dB^{l}$$
Substituting the differential of the net input into the differential of the loss:
$$\begin{aligned}
de&=tr\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot \left(dK^{l}*_{full}O^{l-1}+dB^{l}\right)\right)\\
&=tr\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot \left(dK^{l}*_{full}O^{l-1}\right)+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)\\
&=tr\left(\left(\left(O^{l-1}\right)_{r}^{t}*_{valid}\frac{\partial e}{\partial net^{l}}^{T}\right)\cdot dK^{l}+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)
\end{aligned}$$
Partial derivative of the loss with respect to the layer-$l$ kernel $K^{l}$:
$$\begin{aligned}
\frac{\partial e}{\partial K^{l}}&=\left(\left(O^{l-1}\right)_{r}^{t}*_{valid}\frac{\partial e}{\partial net^{l}}^{T}\right)^{T}\\
&=\frac{\partial e}{\partial net^{l}}*_{valid}\left(\left(O^{l-1}\right)_{r}^{t}\right)^{T}
\end{aligned}$$
Partial derivative of the loss with respect to the layer-$l$ bias $B^{l}$: $$\frac{\partial e}{\partial B^{l}}=\frac{\partial e}{\partial net^{l}}$$
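Block by block, and assuming the four-dimensional product is a block-matrix product summed over the shared index, the kernel gradient can be spelled out as follows. The subscript notation $X_{a,b}$ for the two-dimensional block at block position $(a,b)$ is introduced here for illustration only.

$$\left(\frac{\partial e}{\partial K^{l}}\right)_{a,b}=\sum_{c=1}^{n}\left(\frac{\partial e}{\partial net^{l}}\right)_{a,c}*_{valid}\left(O^{l-1}_{b,c}\right)_{rot},\qquad 1\le a\le m,\ 1\le b\le p.$$

Each term is a valid convolution of a $(j+i-1)\times(j+i-1)$ error block with a $j\times j$ input block, which leaves $(j+i-1)-j+1=i$ entries per side, the size of one kernel. The sum over $c$ accumulates the contributions of all $n$ batches.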
2.2. Recurrence formula for $\frac{\partial e}{\partial net^{l}}$
Loss function: $$e=g(net^{l+1})$$
Net input of layer $l+1$: $$net^{l+1}=K^{l+1}*_{full}O^{l}+B^{l+1}$$
Output of layer $l$: $$O^{l}=f(net^{l})$$
Differential of the loss function: $$de=tr\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot dnet^{l+1}\right)$$
Differential of the net input: $$dnet^{l+1}=K^{l+1}*_{full}dO^{l}$$
Differential of the output: $$dO^{l}=f^{'}(net^{l})\odot dnet^{l}$$
Substituting both differentials into the differential of the loss:
$$\begin{aligned}
de&=tr\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot \left(K^{l+1}*_{full}\left(f^{'}(net^{l})\odot dnet^{l}\right)\right)\right)\\
&=tr\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{r}^{t}\right)\cdot \left(f^{'}(net^{l})\odot dnet^{l}\right)\right)\\
&=tr\left(\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{r}^{t}\right)\odot \left(f^{'}(net^{l})\right)^{T}\right)\cdot dnet^{l}\right)
\end{aligned}$$
Therefore:
$$\begin{aligned}
\frac{\partial e}{\partial net^{l}}&=\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{r}^{t}\right)\odot \left(f^{'}(net^{l})\right)^{T}\right)^{T}\\
&=\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{r}^{t}\right)^{T}\odot f^{'}(net^{l})\\
&=\left(\left(\left(K^{l+1}\right)_{r}^{t}\right)^{T}*_{valid}\frac{\partial e}{\partial net^{l+1}}\right)\odot f^{'}(net^{l})
\end{aligned}$$
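The multi-channel recurrence for full mode can be checked numerically as well. The sketch below makes several assumptions that are not in the original: the activation is $f=\tanh$, the loss is $e=\frac{1}{2}\sum (net^{l+1})^{2}$, and the four-dimensional product is a block-matrix product whose element-wise product is the true (flipped) 2-D convolution. All helper names are illustrative.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution (one operand flipped); square arrays assumed."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    k = b.shape[0]
    if mode == "full":
        a = np.pad(a, k - 1)                 # zero-pad every side by k - 1
    flipped = b[::-1, ::-1]
    size = a.shape[0] - k + 1
    out = np.empty((size, size))
    for r in range(size):
        for c in range(size):
            out[r, c] = np.sum(a[r:r + k, c:c + k] * flipped)
    return out

def block_conv(X, Y, mode):
    """Block-matrix product: result[a, c] = sum_b conv2d(X[a, b], Y[b, c])."""
    rows, inner, cols = X.shape[0], X.shape[1], Y.shape[1]
    return np.array([[sum(conv2d(X[a, b], Y[b, c], mode) for b in range(inner))
                      for c in range(cols)] for a in range(rows)])

def rot_t(A):                                # A_r^t: rotate and transpose elements
    return A[:, :, ::-1, ::-1].transpose(0, 1, 3, 2)

def T(A):                                    # A^T: transpose layout and elements
    return A.transpose(1, 0, 3, 2)

def numeric_grad(f, X, eps=1e-5):
    """Central finite differences of the scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        G[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return G

def loss(net_l, K_next, B_next):
    O_l = np.tanh(net_l)                                  # assumed activation
    net_next = block_conv(K_next, O_l, "full") + B_next   # full forward pass
    return 0.5 * np.sum(net_next ** 2)                    # assumed loss

rng = np.random.default_rng(5)
m, p, n, i, j = 2, 3, 2, 2, 4                # sizes named as in the text
net_l = rng.standard_normal((p, n, j, j))
K_next = rng.standard_normal((m, p, i, i))
B_next = rng.standard_normal((m, n, j + i - 1, j + i - 1))

delta_next = block_conv(K_next, np.tanh(net_l), "full") + B_next  # de/dnet^{l+1}
f_prime = 1.0 - np.tanh(net_l) ** 2                                # tanh'(net^l)
# Recurrence: (((K^{l+1})_r^t)^T *_valid de/dnet^{l+1}) ⊙ f'(net^l)
delta_l = block_conv(T(rot_t(K_next)), delta_next, "valid") * f_prime

ok = np.allclose(delta_l,
                 numeric_grad(lambda X: loss(X, K_next, B_next), net_l),
                 atol=1e-6)
print(delta_l.shape, ok)
```

Swapping the two mode strings (`"valid"` in the forward pass, `"full"` in the recurrence) and shrinking `B_next` to `(m, n, j - i + 1, j - i + 1)` gives the corresponding check for the valid-mode recurrence.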