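Error Backpropagation Formulas for Convolutional Neural Networks: the Convolutional Layer

This article derives the error backpropagation formulas for the convolutional layer of a convolutional neural network.

A convolutional layer consists of a convolution followed by an activation. In a typical convolutional neural network, each convolutional layer has several input channels and several output channels, and therefore contains several convolution kernels. This article first gives the formulas for a single input channel and a single output channel, then for multiple input and output channels. For the convolution itself, both the valid and the full modes are discussed.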

I. Single input channel, single output channel

In the single-input, single-output case, the kernel $K^{l}$, the input $O^{l-1}$, and the output $net^{l}$ are all two-dimensional matrices.

1. $net^{l}=K^{l}*_{valid}O^{l-1}+B^{l}$

Kernel $K^{l}\in\mathbb{R}^{n\times n}$,
input $O^{l-1}\in\mathbb{R}^{m\times m}$,
output $net^{l}\in\mathbb{R}^{(m-n+1)\times(m-n+1)}$.
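The text does not spell out the index convention behind $*_{valid}$. The sketch below assumes true convolution (the smaller operand is rotated by 180° before sliding), which is the convention under which the gradient formulas in this article hold. `conv2d` is a hypothetical helper written for illustration, not a library function, and the concrete sizes are arbitrary.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution of two matrices (illustrative helper, plain loops)."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    kh, kw = b.shape
    if mode == "full":
        # full mode = valid mode applied to the zero-padded larger operand
        a = np.pad(a, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    flipped = b[::-1, ::-1]                  # rotate the smaller operand by 180 degrees
    out = np.empty((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(a[r:r + kh, c:c + kw] * flipped)
    return out

m, n = 5, 3
O_prev = np.arange(m * m, dtype=float).reshape(m, m)   # input O^{l-1}
K = np.arange(n * n, dtype=float).reshape(n, n)        # kernel K^l
B = np.zeros((m - n + 1, m - n + 1))                   # bias B^l, same shape as net^l

net = conv2d(K, O_prev, "valid") + B
print(net.shape)   # (3, 3), i.e. (m - n + 1) x (m - n + 1)
```

Reducing the full mode to the valid mode on a padded operand keeps a single loop for both cases.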

1.1. $\frac{\partial e}{\partial K^{l}}$ and $\frac{\partial e}{\partial B^{l}}$

Loss function: $e=g\left(net^{l}\right)$

Net input of layer $l$: $net^{l}=K^{l}*_{valid}O^{l-1}+B^{l}$

Differential of the loss: $de=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot dnet^{l}\right)$

Differential of the net input: $dnet^{l}=dK^{l}*_{valid}O^{l-1}+dB^{l}$

Substituting the differential of the net input into the differential of the loss gives

$$
\begin{aligned}
de&=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot\left(dK^{l}*_{valid}O^{l-1}+dB^{l}\right)\right)\\
&=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot\left(dK^{l}*_{valid}O^{l-1}\right)+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)\\
&=\operatorname{tr}\left(\left(\frac{\partial e}{\partial net^{l}}^{T}*_{valid}\left(O^{l-1}\right)_{rot}^{T}\right)\cdot dK^{l}+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)
\end{aligned}
$$

Here $A_{rot}$ denotes the matrix $A$ rotated by 180°.

Partial derivative of the loss with respect to the layer-$l$ kernel $K^{l}$:

$$
\frac{\partial e}{\partial K^{l}}=\left(\frac{\partial e}{\partial net^{l}}^{T}*_{valid}\left(O^{l-1}\right)_{rot}^{T}\right)^{T}=\frac{\partial e}{\partial net^{l}}*_{valid}O^{l-1}_{rot}
$$

Partial derivative of the loss with respect to the layer-$l$ bias $B^{l}$:

$$
\frac{\partial e}{\partial B^{l}}=\frac{\partial e}{\partial net^{l}}
$$
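The two results for the valid mode can be checked against finite differences. The sketch below is only an illustration under stated assumptions: `conv2d` and `numeric_grad` are hypothetical helpers, $*$ is taken to be true convolution, and the loss is assumed to be $e=\frac{1}{2}\sum (net^{l})^{2}$, so that $\frac{\partial e}{\partial net^{l}}=net^{l}$.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution of two matrices (illustrative helper, plain loops)."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    kh, kw = b.shape
    if mode == "full":
        a = np.pad(a, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    flipped = b[::-1, ::-1]
    out = np.empty((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(a[r:r + kh, c:c + kw] * flipped)
    return out

def numeric_grad(f, X, eps=1e-6):
    """Central finite differences of the scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += eps
        Xm[idx] -= eps
        G[idx] = (f(Xp) - f(Xm)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
m, n = 6, 3
O_prev = rng.standard_normal((m, m))                 # O^{l-1}
K = rng.standard_normal((n, n))                      # K^l
B = rng.standard_normal((m - n + 1, m - n + 1))      # B^l

def loss(K, B):
    net = conv2d(K, O_prev, "valid") + B
    return 0.5 * np.sum(net ** 2)                    # assumed loss

delta = conv2d(K, O_prev, "valid") + B               # de/dnet^l for the assumed loss
grad_K = conv2d(delta, O_prev[::-1, ::-1], "valid")  # de/dnet^l *valid O_rot
grad_B = delta                                       # de/dB^l = de/dnet^l

print(np.allclose(grad_K, numeric_grad(lambda k: loss(k, B), K), atol=1e-5))  # True
print(np.allclose(grad_B, numeric_grad(lambda b: loss(K, b), B), atol=1e-5))  # True
```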

1.2. Recurrence for $\frac{\partial e}{\partial net^{l}}$

Loss function: $e=g(net^{l+1})$

Net input of layer $l+1$: $net^{l+1}=K^{l+1}*_{valid}O^{l}+B^{l+1}$

Output of layer $l$: $O^{l}=f(net^{l})$

Differential of the loss: $de=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot dnet^{l+1}\right)$

Differential of the net input: $dnet^{l+1}=K^{l+1}*_{valid}dO^{l}$

Differential of the output: $dO^{l}=f'(net^{l})\odot dnet^{l}$

$$
\begin{aligned}
de&=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot\left(K^{l+1}*_{valid}\left(f'(net^{l})\odot dnet^{l}\right)\right)\right)\\
&=\operatorname{tr}\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{rot}^{T}\right)\cdot\left(f'(net^{l})\odot dnet^{l}\right)\right)\\
&=\operatorname{tr}\left(\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{rot}^{T}\right)\odot\left(f'(net^{l})\right)^{T}\right)\cdot dnet^{l}\right)
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial e}{\partial net^{l}}&=\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{rot}^{T}\right)\odot\left(f'(net^{l})\right)^{T}\right)^{T}\\
&=\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{rot}^{T}\right)^{T}\odot f'(net^{l})\\
&=\left(\frac{\partial e}{\partial net^{l+1}}*_{full}K^{l+1}_{rot}\right)\odot f'(net^{l})
\end{aligned}
$$
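A small numerical check of this recurrence, as a sketch rather than a reference implementation. Assumptions: `conv2d` and `numeric_grad` are hypothetical helpers, $*$ is true convolution, the activation $f$ is `tanh`, and the loss is $e=\frac{1}{2}\sum (net^{l+1})^{2}$, so that $\frac{\partial e}{\partial net^{l+1}}=net^{l+1}$.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution of two matrices (illustrative helper, plain loops)."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    kh, kw = b.shape
    if mode == "full":
        a = np.pad(a, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    flipped = b[::-1, ::-1]
    out = np.empty((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(a[r:r + kh, c:c + kw] * flipped)
    return out

def numeric_grad(f, X, eps=1e-6):
    """Central finite differences of the scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += eps
        Xm[idx] -= eps
        G[idx] = (f(Xp) - f(Xm)) / (2 * eps)
    return G

def f_prime(x):
    return 1.0 - np.tanh(x) ** 2             # derivative of the assumed activation tanh

rng = np.random.default_rng(1)
m, n = 6, 3
net_l = rng.standard_normal((m, m))                   # net^l
K_next = rng.standard_normal((n, n))                  # K^{l+1}
B_next = rng.standard_normal((m - n + 1, m - n + 1))  # B^{l+1}

def net_next(net_l):
    return conv2d(K_next, np.tanh(net_l), "valid") + B_next

def loss(net_l):
    return 0.5 * np.sum(net_next(net_l) ** 2)         # assumed loss

delta_next = net_next(net_l)                          # de/dnet^{l+1}
delta_l = conv2d(delta_next, K_next[::-1, ::-1], "full") * f_prime(net_l)

print(np.allclose(delta_l, numeric_grad(loss, net_l), atol=1e-5))  # True
```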

2. $net^{l}=K^{l}*_{full}O^{l-1}+B^{l}$

Kernel $K^{l}\in\mathbb{R}^{n\times n}$,
input $O^{l-1}\in\mathbb{R}^{m\times m}$,
output $net^{l}\in\mathbb{R}^{(m+n-1)\times(m+n-1)}$.

2.1. $\frac{\partial e}{\partial K^{l}}$ and $\frac{\partial e}{\partial B^{l}}$

Loss function: $e=g\left(net^{l}\right)$

Net input of layer $l$: $net^{l}=K^{l}*_{full}O^{l-1}+B^{l}$

Differential of the loss: $de=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot dnet^{l}\right)$

Differential of the net input: $dnet^{l}=dK^{l}*_{full}O^{l-1}+dB^{l}$

$$
\begin{aligned}
de&=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot\left(dK^{l}*_{full}O^{l-1}+dB^{l}\right)\right)\\
&=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot\left(dK^{l}*_{full}O^{l-1}\right)+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)\\
&=\operatorname{tr}\left(\left(\frac{\partial e}{\partial net^{l}}^{T}*_{valid}\left(O^{l-1}\right)_{rot}^{T}\right)\cdot dK^{l}+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)
\end{aligned}
$$

Partial derivative of the loss with respect to the layer-$l$ kernel $K^{l}$:

$$
\frac{\partial e}{\partial K^{l}}=\left(\frac{\partial e}{\partial net^{l}}^{T}*_{valid}\left(O^{l-1}\right)_{rot}^{T}\right)^{T}=\frac{\partial e}{\partial net^{l}}*_{valid}O^{l-1}_{rot}
$$

Partial derivative of the loss with respect to the layer-$l$ bias $B^{l}$:

$$
\frac{\partial e}{\partial B^{l}}=\frac{\partial e}{\partial net^{l}}
$$
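For the full mode the kernel gradient is still a valid-mode convolution, because $\frac{\partial e}{\partial net^{l}}$ is now the larger operand. The sketch below checks this numerically. Assumptions: `conv2d` and `numeric_grad` are hypothetical helpers, $*$ is true convolution, and the loss is $e=\frac{1}{2}\sum (net^{l})^{2}$.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution of two matrices (illustrative helper, plain loops)."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    kh, kw = b.shape
    if mode == "full":
        a = np.pad(a, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    flipped = b[::-1, ::-1]
    out = np.empty((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(a[r:r + kh, c:c + kw] * flipped)
    return out

def numeric_grad(f, X, eps=1e-6):
    """Central finite differences of the scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += eps
        Xm[idx] -= eps
        G[idx] = (f(Xp) - f(Xm)) / (2 * eps)
    return G

rng = np.random.default_rng(2)
m, n = 5, 3
O_prev = rng.standard_normal((m, m))                 # O^{l-1}
K = rng.standard_normal((n, n))                      # K^l
B = rng.standard_normal((m + n - 1, m + n - 1))      # B^l, same shape as net^l

def loss(K, B):
    net = conv2d(K, O_prev, "full") + B
    return 0.5 * np.sum(net ** 2)                    # assumed loss

delta = conv2d(K, O_prev, "full") + B                # de/dnet^l, (m+n-1) x (m+n-1)
grad_K = conv2d(delta, O_prev[::-1, ::-1], "valid")  # n x n
grad_B = delta

print(grad_K.shape)                                                            # (3, 3)
print(np.allclose(grad_K, numeric_grad(lambda k: loss(k, B), K), atol=1e-5))   # True
```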

2.2. Recurrence for $\frac{\partial e}{\partial net^{l}}$

Loss function: $e=g(net^{l+1})$

Net input of layer $l+1$: $net^{l+1}=K^{l+1}*_{full}O^{l}+B^{l+1}$

Output of layer $l$: $O^{l}=f(net^{l})$

Differential of the loss: $de=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot dnet^{l+1}\right)$

Differential of the net input: $dnet^{l+1}=K^{l+1}*_{full}dO^{l}$

Differential of the output: $dO^{l}=f'(net^{l})\odot dnet^{l}$

$$
\begin{aligned}
de&=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot\left(K^{l+1}*_{full}\left(f'(net^{l})\odot dnet^{l}\right)\right)\right)\\
&=\operatorname{tr}\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{rot}^{T}\right)\cdot\left(f'(net^{l})\odot dnet^{l}\right)\right)\\
&=\operatorname{tr}\left(\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{rot}^{T}\right)\odot\left(f'(net^{l})\right)^{T}\right)\cdot dnet^{l}\right)
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial e}{\partial net^{l}}&=\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{rot}^{T}\right)\odot\left(f'(net^{l})\right)^{T}\right)^{T}\\
&=\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{rot}^{T}\right)^{T}\odot f'(net^{l})\\
&=\left(\frac{\partial e}{\partial net^{l+1}}*_{valid}K^{l+1}_{rot}\right)\odot f'(net^{l})
\end{aligned}
$$
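The full-mode recurrence mirrors the valid-mode one with the two modes exchanged. A numerical sketch follows. Assumptions: `conv2d` and `numeric_grad` are hypothetical helpers, $*$ is true convolution, $f$ is `tanh`, and the loss is $e=\frac{1}{2}\sum (net^{l+1})^{2}$.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution of two matrices (illustrative helper, plain loops)."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    kh, kw = b.shape
    if mode == "full":
        a = np.pad(a, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    flipped = b[::-1, ::-1]
    out = np.empty((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(a[r:r + kh, c:c + kw] * flipped)
    return out

def numeric_grad(f, X, eps=1e-6):
    """Central finite differences of the scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += eps
        Xm[idx] -= eps
        G[idx] = (f(Xp) - f(Xm)) / (2 * eps)
    return G

def f_prime(x):
    return 1.0 - np.tanh(x) ** 2             # derivative of the assumed activation tanh

rng = np.random.default_rng(3)
m, n = 5, 3
net_l = rng.standard_normal((m, m))                   # net^l
K_next = rng.standard_normal((n, n))                  # K^{l+1}
B_next = rng.standard_normal((m + n - 1, m + n - 1))  # B^{l+1}

def net_next(net_l):
    return conv2d(K_next, np.tanh(net_l), "full") + B_next

def loss(net_l):
    return 0.5 * np.sum(net_next(net_l) ** 2)         # assumed loss

delta_next = net_next(net_l)                          # de/dnet^{l+1}
delta_l = conv2d(delta_next, K_next[::-1, ::-1], "valid") * f_prime(net_l)

print(np.allclose(delta_l, numeric_grad(loss, net_l), atol=1e-5))  # True
```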

II. Multiple input channels, multiple output channels

In the multi-input, multi-output case, the kernel $K^{l}$, the input $O^{l-1}$, and the output $net^{l}$ are all four-dimensional matrices.

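A four-dimensional matrix $A$ can be viewed as a two-dimensional matrix whose elements are themselves two-dimensional matrices.
$A_{r}^{t}$ denotes applying a rotation and a transpose to every element of $A$.
$A^{T}$ denotes transposing the matrix $A$ itself and also transposing every element of $A$.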
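On a NumPy array of shape (rows, cols, h, w), the two operations can be sketched as axis manipulations. The function names `rot_t` and `T` are hypothetical, and the rotation is assumed to be by 180°, as in the single-channel formulas.

```python
import numpy as np

def rot_t(A):
    """A_r^t: rotate every 2-D block by 180 degrees, then transpose it."""
    return A[:, :, ::-1, ::-1].transpose(0, 1, 3, 2)

def T(A):
    """A^T: transpose the block layout and transpose every block."""
    return A.transpose(1, 0, 3, 2)

A = np.arange(2 * 3 * 2 * 2).reshape(2, 3, 2, 2)   # a 2 x 3 grid of 2 x 2 blocks
print(rot_t(A).shape)   # (2, 3, 2, 2)
print(T(A).shape)       # (3, 2, 2, 2)

# In (A_r^t)^T the two per-block transposes cancel, so block (1, 0) of the
# result is block (0, 1) of A rotated by 180 degrees.
print(np.array_equal(T(rot_t(A))[1, 0], A[0, 1, ::-1, ::-1]))   # True
```

The composite $\left(A_{r}^{t}\right)^{T}$ is the form that appears in the final multi-channel formulas: blocks rotated, block layout transposed.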

1. $net^{l}=K^{l}*_{valid}O^{l-1}+B^{l}$

1.1. $\frac{\partial e}{\partial K^{l}}$ and $\frac{\partial e}{\partial B^{l}}$

Kernel $K^{l}\in\mathbb{R}^{m\times p\times i\times i}$,
input $O^{l-1}\in\mathbb{R}^{p\times n\times j\times j}$,
output $net^{l}\in\mathbb{R}^{m\times n\times(j-i+1)\times(j-i+1)}$.
The input has $n$ batch entries, each with $p$ channels of size $j\times j$.
The output has $n$ batch entries, each with $m$ channels of size $(j-i+1)\times(j-i+1)$.
There are $m\times p$ kernels, each of size $i\times i$.

Loss function: $e=g\left(net^{l}\right)$

Net input of layer $l$: $net^{l}=K^{l}*_{valid}O^{l-1}+B^{l}$

Differential of the loss: $de=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot dnet^{l}\right)$

Differential of the net input: $dnet^{l}=dK^{l}*_{valid}O^{l-1}+dB^{l}$

$$
\begin{aligned}
de&=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot\left(dK^{l}*_{valid}O^{l-1}+dB^{l}\right)\right)\\
&=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot\left(dK^{l}*_{valid}O^{l-1}\right)+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)\\
&=\operatorname{tr}\left(\left(\left(O^{l-1}\right)_{r}^{t}*_{valid}\frac{\partial e}{\partial net^{l}}^{T}\right)\cdot dK^{l}+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)
\end{aligned}
$$

Partial derivative of the loss with respect to the layer-$l$ kernel $K^{l}$:

$$
\frac{\partial e}{\partial K^{l}}=\left(\left(O^{l-1}\right)_{r}^{t}*_{valid}\frac{\partial e}{\partial net^{l}}^{T}\right)^{T}=\frac{\partial e}{\partial net^{l}}*_{valid}\left(\left(O^{l-1}\right)_{r}^{t}\right)^{T}
$$

Partial derivative of the loss with respect to the layer-$l$ bias $B^{l}$:

$$
\frac{\partial e}{\partial B^{l}}=\frac{\partial e}{\partial net^{l}}
$$
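The multi-channel kernel gradient can be checked the same way as the single-channel one. The sketch below assumes that $*$ between four-dimensional matrices is a block-matrix product whose element-wise product is a true 2-D convolution (the text implies this but does not state it). `conv2d`, `block_conv`, `rot_t`, `T`, and `numeric_grad` are hypothetical helpers, and the loss is assumed to be $e=\frac{1}{2}\sum (net^{l})^{2}$.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution of two matrices (illustrative helper, plain loops)."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    kh, kw = b.shape
    if mode == "full":
        a = np.pad(a, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    flipped = b[::-1, ::-1]
    out = np.empty((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(a[r:r + kh, c:c + kw] * flipped)
    return out

def block_conv(A, Bm, mode):
    """Block-matrix product whose element-wise product is a 2-D convolution."""
    rows = [[sum(conv2d(A[a, b], Bm[b, c], mode) for b in range(A.shape[1]))
             for c in range(Bm.shape[1])]
            for a in range(A.shape[0])]
    return np.array(rows)

def rot_t(A):
    return A[:, :, ::-1, ::-1].transpose(0, 1, 3, 2)   # A_r^t

def T(A):
    return A.transpose(1, 0, 3, 2)                     # A^T

def numeric_grad(f, X, eps=1e-6):
    """Central finite differences of the scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += eps
        Xm[idx] -= eps
        G[idx] = (f(Xp) - f(Xm)) / (2 * eps)
    return G

rng = np.random.default_rng(4)
m, p, n, i, j = 2, 3, 2, 2, 4                  # sizes named as in the text
K = rng.standard_normal((m, p, i, i))          # K^l
O_prev = rng.standard_normal((p, n, j, j))     # O^{l-1}
B = rng.standard_normal((m, n, j - i + 1, j - i + 1))

def loss(K):
    return 0.5 * np.sum((block_conv(K, O_prev, "valid") + B) ** 2)  # assumed loss

delta = block_conv(K, O_prev, "valid") + B     # de/dnet^l
grad_K = block_conv(delta, T(rot_t(O_prev)), "valid")

print(grad_K.shape)                                             # (2, 3, 2, 2)
print(np.allclose(grad_K, numeric_grad(loss, K), atol=1e-5))    # True
```

Written out per block, the formula sums over the batch index: block $(a,b)$ of the gradient is $\sum_{c}\frac{\partial e}{\partial net^{l}}[a,c]*_{valid}\left(O^{l-1}[b,c]\right)_{rot}$.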

1.2. Recurrence for $\frac{\partial e}{\partial net^{l}}$

Loss function: $e=g(net^{l+1})$

Net input of layer $l+1$: $net^{l+1}=K^{l+1}*_{valid}O^{l}+B^{l+1}$

Output of layer $l$: $O^{l}=f(net^{l})$

Differential of the loss: $de=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot dnet^{l+1}\right)$

Differential of the net input: $dnet^{l+1}=K^{l+1}*_{valid}dO^{l}$

Differential of the output: $dO^{l}=f'(net^{l})\odot dnet^{l}$

$$
\begin{aligned}
de&=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot\left(K^{l+1}*_{valid}\left(f'(net^{l})\odot dnet^{l}\right)\right)\right)\\
&=\operatorname{tr}\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{r}^{t}\right)\cdot\left(f'(net^{l})\odot dnet^{l}\right)\right)\\
&=\operatorname{tr}\left(\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{r}^{t}\right)\odot\left(f'(net^{l})\right)^{T}\right)\cdot dnet^{l}\right)
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial e}{\partial net^{l}}&=\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{r}^{t}\right)\odot\left(f'(net^{l})\right)^{T}\right)^{T}\\
&=\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{full}\left(K^{l+1}\right)_{r}^{t}\right)^{T}\odot f'(net^{l})\\
&=\left(\left(\left(K^{l+1}\right)_{r}^{t}\right)^{T}*_{full}\frac{\partial e}{\partial net^{l+1}}\right)\odot f'(net^{l})
\end{aligned}
$$
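A numerical sketch of the multi-channel recurrence for the valid mode. Assumptions: $*$ between four-dimensional matrices is a block-matrix product whose element-wise product is a true 2-D convolution; `conv2d`, `block_conv`, `rot_t`, `T`, and `numeric_grad` are hypothetical helpers; $f$ is `tanh`; and the loss is $e=\frac{1}{2}\sum (net^{l+1})^{2}$.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution of two matrices (illustrative helper, plain loops)."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    kh, kw = b.shape
    if mode == "full":
        a = np.pad(a, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    flipped = b[::-1, ::-1]
    out = np.empty((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(a[r:r + kh, c:c + kw] * flipped)
    return out

def block_conv(A, Bm, mode):
    """Block-matrix product whose element-wise product is a 2-D convolution."""
    rows = [[sum(conv2d(A[a, b], Bm[b, c], mode) for b in range(A.shape[1]))
             for c in range(Bm.shape[1])]
            for a in range(A.shape[0])]
    return np.array(rows)

def rot_t(A):
    return A[:, :, ::-1, ::-1].transpose(0, 1, 3, 2)   # A_r^t

def T(A):
    return A.transpose(1, 0, 3, 2)                     # A^T

def numeric_grad(f, X, eps=1e-6):
    """Central finite differences of the scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += eps
        Xm[idx] -= eps
        G[idx] = (f(Xp) - f(Xm)) / (2 * eps)
    return G

def f_prime(x):
    return 1.0 - np.tanh(x) ** 2             # derivative of the assumed activation tanh

rng = np.random.default_rng(5)
m, p, n, i, j = 2, 3, 2, 2, 4                # sizes named as in the text
K_next = rng.standard_normal((m, p, i, i))   # K^{l+1}
net_l = rng.standard_normal((p, n, j, j))    # net^l
B_next = rng.standard_normal((m, n, j - i + 1, j - i + 1))

def net_next(net_l):
    return block_conv(K_next, np.tanh(net_l), "valid") + B_next

def loss(net_l):
    return 0.5 * np.sum(net_next(net_l) ** 2)          # assumed loss

delta_next = net_next(net_l)                 # de/dnet^{l+1}
delta_l = block_conv(T(rot_t(K_next)), delta_next, "full") * f_prime(net_l)

print(np.allclose(delta_l, numeric_grad(loss, net_l), atol=1e-5))   # True
```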

2. $net^{l}=K^{l}*_{full}O^{l-1}+B^{l}$

Kernel $K^{l}\in\mathbb{R}^{m\times p\times i\times i}$,
input $O^{l-1}\in\mathbb{R}^{p\times n\times j\times j}$,
output $net^{l}\in\mathbb{R}^{m\times n\times(j+i-1)\times(j+i-1)}$.
The input has $n$ batch entries, each with $p$ channels of size $j\times j$.
The output has $n$ batch entries, each with $m$ channels of size $(j+i-1)\times(j+i-1)$.
There are $m\times p$ kernels, each of size $i\times i$.

2.1. $\frac{\partial e}{\partial K^{l}}$ and $\frac{\partial e}{\partial B^{l}}$

Loss function: $e=g\left(net^{l}\right)$

Net input of layer $l$: $net^{l}=K^{l}*_{full}O^{l-1}+B^{l}$

Differential of the loss: $de=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot dnet^{l}\right)$

Differential of the net input: $dnet^{l}=dK^{l}*_{full}O^{l-1}+dB^{l}$

$$
\begin{aligned}
de&=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot\left(dK^{l}*_{full}O^{l-1}+dB^{l}\right)\right)\\
&=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l}}^{T}\cdot\left(dK^{l}*_{full}O^{l-1}\right)+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)\\
&=\operatorname{tr}\left(\left(\left(O^{l-1}\right)_{r}^{t}*_{valid}\frac{\partial e}{\partial net^{l}}^{T}\right)\cdot dK^{l}+\frac{\partial e}{\partial net^{l}}^{T}\cdot dB^{l}\right)
\end{aligned}
$$

Partial derivative of the loss with respect to the layer-$l$ kernel $K^{l}$:

$$
\frac{\partial e}{\partial K^{l}}=\left(\left(O^{l-1}\right)_{r}^{t}*_{valid}\frac{\partial e}{\partial net^{l}}^{T}\right)^{T}=\frac{\partial e}{\partial net^{l}}*_{valid}\left(\left(O^{l-1}\right)_{r}^{t}\right)^{T}
$$

Partial derivative of the loss with respect to the layer-$l$ bias $B^{l}$:

$$
\frac{\partial e}{\partial B^{l}}=\frac{\partial e}{\partial net^{l}}
$$
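A finite-difference check of the multi-channel kernel gradient for the full mode, as a sketch under stated assumptions: $*$ between four-dimensional matrices is a block-matrix product whose element-wise product is a true 2-D convolution; `conv2d`, `block_conv`, `rot_t`, `T`, and `numeric_grad` are hypothetical helpers; and the loss is $e=\frac{1}{2}\sum (net^{l})^{2}$.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution of two matrices (illustrative helper, plain loops)."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    kh, kw = b.shape
    if mode == "full":
        a = np.pad(a, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    flipped = b[::-1, ::-1]
    out = np.empty((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(a[r:r + kh, c:c + kw] * flipped)
    return out

def block_conv(A, Bm, mode):
    """Block-matrix product whose element-wise product is a 2-D convolution."""
    rows = [[sum(conv2d(A[a, b], Bm[b, c], mode) for b in range(A.shape[1]))
             for c in range(Bm.shape[1])]
            for a in range(A.shape[0])]
    return np.array(rows)

def rot_t(A):
    return A[:, :, ::-1, ::-1].transpose(0, 1, 3, 2)   # A_r^t

def T(A):
    return A.transpose(1, 0, 3, 2)                     # A^T

def numeric_grad(f, X, eps=1e-6):
    """Central finite differences of the scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += eps
        Xm[idx] -= eps
        G[idx] = (f(Xp) - f(Xm)) / (2 * eps)
    return G

rng = np.random.default_rng(6)
m, p, n, i, j = 2, 3, 2, 2, 4                  # sizes named as in the text
K = rng.standard_normal((m, p, i, i))          # K^l
O_prev = rng.standard_normal((p, n, j, j))     # O^{l-1}
B = rng.standard_normal((m, n, j + i - 1, j + i - 1))

def loss(K):
    return 0.5 * np.sum((block_conv(K, O_prev, "full") + B) ** 2)   # assumed loss

delta = block_conv(K, O_prev, "full") + B      # de/dnet^l
grad_K = block_conv(delta, T(rot_t(O_prev)), "valid")

print(grad_K.shape)                                             # (2, 3, 2, 2)
print(np.allclose(grad_K, numeric_grad(loss, K), atol=1e-5))    # True
```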

2.2. Recurrence for $\frac{\partial e}{\partial net^{l}}$

Loss function: $e=g(net^{l+1})$

Net input of layer $l+1$: $net^{l+1}=K^{l+1}*_{full}O^{l}+B^{l+1}$

Output of layer $l$: $O^{l}=f(net^{l})$

Differential of the loss: $de=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot dnet^{l+1}\right)$

Differential of the net input: $dnet^{l+1}=K^{l+1}*_{full}dO^{l}$

Differential of the output: $dO^{l}=f'(net^{l})\odot dnet^{l}$

$$
\begin{aligned}
de&=\operatorname{tr}\left(\frac{\partial e}{\partial net^{l+1}}^{T}\cdot\left(K^{l+1}*_{full}\left(f'(net^{l})\odot dnet^{l}\right)\right)\right)\\
&=\operatorname{tr}\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{r}^{t}\right)\cdot\left(f'(net^{l})\odot dnet^{l}\right)\right)\\
&=\operatorname{tr}\left(\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{r}^{t}\right)\odot\left(f'(net^{l})\right)^{T}\right)\cdot dnet^{l}\right)
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial e}{\partial net^{l}}&=\left(\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{r}^{t}\right)\odot\left(f'(net^{l})\right)^{T}\right)^{T}\\
&=\left(\frac{\partial e}{\partial net^{l+1}}^{T}*_{valid}\left(K^{l+1}\right)_{r}^{t}\right)^{T}\odot f'(net^{l})\\
&=\left(\left(\left(K^{l+1}\right)_{r}^{t}\right)^{T}*_{valid}\frac{\partial e}{\partial net^{l+1}}\right)\odot f'(net^{l})
\end{aligned}
$$
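A numerical sketch of the multi-channel recurrence for the full mode. Assumptions: $*$ between four-dimensional matrices is a block-matrix product whose element-wise product is a true 2-D convolution; `conv2d`, `block_conv`, `rot_t`, `T`, and `numeric_grad` are hypothetical helpers; $f$ is `tanh`; and the loss is $e=\frac{1}{2}\sum (net^{l+1})^{2}$.

```python
import numpy as np

def conv2d(a, b, mode="valid"):
    """True 2-D convolution of two matrices (illustrative helper, plain loops)."""
    if a.shape[0] < b.shape[0]:
        a, b = b, a                          # convolution is commutative
    kh, kw = b.shape
    if mode == "full":
        a = np.pad(a, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    flipped = b[::-1, ::-1]
    out = np.empty((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(a[r:r + kh, c:c + kw] * flipped)
    return out

def block_conv(A, Bm, mode):
    """Block-matrix product whose element-wise product is a 2-D convolution."""
    rows = [[sum(conv2d(A[a, b], Bm[b, c], mode) for b in range(A.shape[1]))
             for c in range(Bm.shape[1])]
            for a in range(A.shape[0])]
    return np.array(rows)

def rot_t(A):
    return A[:, :, ::-1, ::-1].transpose(0, 1, 3, 2)   # A_r^t

def T(A):
    return A.transpose(1, 0, 3, 2)                     # A^T

def numeric_grad(f, X, eps=1e-6):
    """Central finite differences of the scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += eps
        Xm[idx] -= eps
        G[idx] = (f(Xp) - f(Xm)) / (2 * eps)
    return G

def f_prime(x):
    return 1.0 - np.tanh(x) ** 2             # derivative of the assumed activation tanh

rng = np.random.default_rng(7)
m, p, n, i, j = 2, 3, 2, 2, 4                # sizes named as in the text
K_next = rng.standard_normal((m, p, i, i))   # K^{l+1}
net_l = rng.standard_normal((p, n, j, j))    # net^l
B_next = rng.standard_normal((m, n, j + i - 1, j + i - 1))

def net_next(net_l):
    return block_conv(K_next, np.tanh(net_l), "full") + B_next

def loss(net_l):
    return 0.5 * np.sum(net_next(net_l) ** 2)          # assumed loss

delta_next = net_next(net_l)                 # de/dnet^{l+1}
delta_l = block_conv(T(rot_t(K_next)), delta_next, "valid") * f_prime(net_l)

print(np.allclose(delta_l, numeric_grad(loss, net_l), atol=1e-5))   # True
```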
