Forward and Backward Propagation in Convolutional Neural Networks (CNN), Part 2


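This article builds on "Forward and Backward Propagation in Convolutional Neural Networks (CNN), Part 1" with some extensions. It consists mainly of formulas.

The case padding=1, stride=1

8x8 input and 3x3 kernel

Consider a slightly larger input $X$ and a kernel $W$: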

$$X=\left[ \begin{matrix}
X_{11} & X_{12} & X_{13} & X_{14} & X_{15} & X_{16} & X_{17} & X_{18} \\
X_{21} & X_{22} & X_{23} & X_{24} & X_{25} & X_{26} & X_{27} & X_{28} \\
X_{31} & X_{32} & X_{33} & X_{34} & X_{35} & X_{36} & X_{37} & X_{38} \\
X_{41} & X_{42} & X_{43} & X_{44} & X_{45} & X_{46} & X_{47} & X_{48} \\
X_{51} & X_{52} & X_{53} & X_{54} & X_{55} & X_{56} & X_{57} & X_{58} \\
X_{61} & X_{62} & X_{63} & X_{64} & X_{65} & X_{66} & X_{67} & X_{68} \\
X_{71} & X_{72} & X_{73} & X_{74} & X_{75} & X_{76} & X_{77} & X_{78} \\
X_{81} & X_{82} & X_{83} & X_{84} & X_{85} & X_{86} & X_{87} & X_{88}
\end{matrix} \right]$$

$$W=\left[ \begin{matrix}
W_{11} & W_{12} & W_{13} \\
W_{21} & W_{22} & W_{23} \\
W_{31} & W_{32} & W_{33}
\end{matrix} \right]$$

$$\text{padding } p=1,\quad \text{stride } s=1,\quad Y=\mathrm{conv2}(X,W)$$

$$Y=\left[ \begin{matrix}
Y_{11} & Y_{12} & Y_{13} & Y_{14} & Y_{15} & Y_{16} & Y_{17} & Y_{18} \\
Y_{21} & Y_{22} & Y_{23} & Y_{24} & Y_{25} & Y_{26} & Y_{27} & Y_{28} \\
Y_{31} & Y_{32} & Y_{33} & Y_{34} & Y_{35} & Y_{36} & Y_{37} & Y_{38} \\
Y_{41} & Y_{42} & Y_{43} & Y_{44} & Y_{45} & Y_{46} & Y_{47} & Y_{48} \\
Y_{51} & Y_{52} & Y_{53} & Y_{54} & Y_{55} & Y_{56} & Y_{57} & Y_{58} \\
Y_{61} & Y_{62} & Y_{63} & Y_{64} & Y_{65} & Y_{66} & Y_{67} & Y_{68} \\
Y_{71} & Y_{72} & Y_{73} & Y_{74} & Y_{75} & Y_{76} & Y_{77} & Y_{78} \\
Y_{81} & Y_{82} & Y_{83} & Y_{84} & Y_{85} & Y_{86} & Y_{87} & Y_{88}
\end{matrix} \right]$$

Derivative with respect to the input

Flip $W$ (rotate it by 180°):

$$W'=\left[ \begin{matrix}
W_{33} & W_{32} & W_{31} \\
W_{23} & W_{22} & W_{21} \\
W_{13} & W_{12} & W_{11}
\end{matrix} \right]$$


$$X^{pad}=\left[ \begin{matrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & X_{11} & X_{12} & X_{13} & X_{14} & X_{15} & X_{16} & X_{17} & X_{18} & 0 \\
0 & X_{21} & X_{22} & X_{23} & X_{24} & X_{25} & X_{26} & X_{27} & X_{28} & 0 \\
0 & X_{31} & X_{32} & X_{33} & X_{34} & X_{35} & X_{36} & X_{37} & X_{38} & 0 \\
0 & X_{41} & X_{42} & X_{43} & X_{44} & X_{45} & X_{46} & X_{47} & X_{48} & 0 \\
0 & X_{51} & X_{52} & X_{53} & X_{54} & X_{55} & X_{56} & X_{57} & X_{58} & 0 \\
0 & X_{61} & X_{62} & X_{63} & X_{64} & X_{65} & X_{66} & X_{67} & X_{68} & 0 \\
0 & X_{71} & X_{72} & X_{73} & X_{74} & X_{75} & X_{76} & X_{77} & X_{78} & 0 \\
0 & X_{81} & X_{82} & X_{83} & X_{84} & X_{85} & X_{86} & X_{87} & X_{88} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{matrix} \right]$$

$$X_{i,j}=X^{pad}_{i+p,j+p}$$

$$Y_{i,j}=\sum_{u=0}^{2}\sum_{v=0}^{2}X^{pad}_{i+u,j+v}\cdot W'_{1+u,1+v}$$
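The forward formula can be checked numerically. The following is a minimal NumPy sketch, not code from the original article: the function name `conv2_forward` and the random test data are assumptions made for illustration, and indices are 0-based, so `Y[0, 0]` corresponds to $Y_{11}$.

```python
import numpy as np

def conv2_forward(X, W, pad=1):
    """Y[i, j] = sum over u, v of X_pad[i+u, j+v] * W_flipped[u, v] (stride 1)."""
    k = W.shape[0]
    W_flipped = W[::-1, ::-1]      # W' in the text: W rotated by 180 degrees
    X_pad = np.pad(X, pad)         # zero border of width `pad` on every side
    out_h = X_pad.shape[0] - k + 1
    out_w = X_pad.shape[1] - k + 1
    Y = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            Y[i, j] = np.sum(X_pad[i:i + k, j:j + k] * W_flipped)
    return Y

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))
W = rng.standard_normal((3, 3))
Y = conv2_forward(X, W)

# Top-left output written out by hand: the window is [0 0 0; 0 X11 X12; 0 X21 X22],
# so only four products are non-zero.
y11_by_hand = (X[0, 0] * W[1, 1] + X[0, 1] * W[1, 0]
               + X[1, 0] * W[0, 1] + X[1, 1] * W[0, 0])
print(Y.shape, np.isclose(Y[0, 0], y11_by_hand))
```

With an 8x8 input, a 3x3 kernel and padding 1, the output keeps the 8x8 shape of the input.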

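Consider the case of $X_{11}$. It appears in the four outputs below, where $\cdot$ between two matrices denotes the sum of their element-wise products: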

$$Y_{11}=\left[ \begin{matrix} 0 & 0 & 0 \\ 0 & X_{11} & X_{12} \\ 0 & X_{21} & X_{22} \end{matrix} \right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right],$$

$$Y_{12}=\left[ \begin{matrix} 0 & 0 & 0 \\ X_{11} & X_{12} & X_{13} \\ X_{21} & X_{22} & X_{23} \end{matrix} \right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right],$$

$$Y_{21}=\left[ \begin{matrix} 0 & X_{11} & X_{12} \\ 0 & X_{21} & X_{22} \\ 0 & X_{31} & X_{32} \end{matrix} \right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right],$$

$$Y_{22}=\left[ \begin{matrix} X_{11} & X_{12} & X_{13} \\ X_{21} & X_{22} & X_{23} \\ X_{31} & X_{32} & X_{33} \end{matrix} \right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right]$$

$$\frac{\partial E}{\partial X_{11}}=\frac{\partial E}{\partial Y_{11}}\cdot W_{22}+\frac{\partial E}{\partial Y_{12}}\cdot W_{23}+\frac{\partial E}{\partial Y_{21}}\cdot W_{32}+\frac{\partial E}{\partial Y_{22}}\cdot W_{33}$$

$$\frac{\partial E}{\partial X_{11}}=\left[ \begin{matrix} 0 & 0 & 0 \\ \\ 0 & \frac{\partial E}{\partial Y_{11}} & \frac{\partial E}{\partial Y_{12}} \\ \\ 0 & \frac{\partial E}{\partial Y_{21}} & \frac{\partial E}{\partial Y_{22}} \end{matrix} \right]\cdot W$$

Next, consider $\frac{\partial E}{\partial X_{12}}$. Besides the outputs above, $X_{12}$ also appears in $Y_{13}$ and $Y_{23}$:

$$Y_{13}=\left[ \begin{matrix} 0 & 0 & 0 \\ X_{12} & X_{13} & X_{14} \\ X_{22} & X_{23} & X_{24} \end{matrix} \right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right],$$

$$Y_{23}=\left[ \begin{matrix} X_{12} & X_{13} & X_{14} \\ X_{22} & X_{23} & X_{24} \\ X_{32} & X_{33} & X_{34} \end{matrix} \right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right]$$

$$\frac{\partial E}{\partial X_{12}}=\frac{\partial E}{\partial Y_{11}}\cdot W_{21}+\frac{\partial E}{\partial Y_{12}}\cdot W_{22}+\frac{\partial E}{\partial Y_{13}}\cdot W_{23}+\frac{\partial E}{\partial Y_{21}}\cdot W_{31}+\frac{\partial E}{\partial Y_{22}}\cdot W_{32}+\frac{\partial E}{\partial Y_{23}}\cdot W_{33}$$


$$\frac{\partial E}{\partial X_{12}}=\left[ \begin{matrix} 0 & 0 & 0 \\ \\ \frac{\partial E}{\partial Y_{11}} & \frac{\partial E}{\partial Y_{12}} & \frac{\partial E}{\partial Y_{13}} \\ \\ \frac{\partial E}{\partial Y_{21}} & \frac{\partial E}{\partial Y_{22}} & \frac{\partial E}{\partial Y_{23}} \end{matrix} \right]\cdot W$$

Next, consider $\frac{\partial E}{\partial X_{21}}$, which additionally involves $Y_{31}$ and $Y_{32}$:

$$Y_{31}=\left[ \begin{matrix} 0 & X_{21} & X_{22} \\ 0 & X_{31} & X_{32} \\ 0 & X_{41} & X_{42} \end{matrix} \right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right],$$

$$Y_{32}=\left[ \begin{matrix} X_{21} & X_{22} & X_{23} \\ X_{31} & X_{32} & X_{33} \\ X_{41} & X_{42} & X_{43} \end{matrix} \right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right]$$

(derivation omitted)

$$\frac{\partial E}{\partial X_{21}}=\left[ \begin{matrix} 0 & \frac{\partial E}{\partial Y_{11}} & \frac{\partial E}{\partial Y_{12}} \\ \\ 0 & \frac{\partial E}{\partial Y_{21}} & \frac{\partial E}{\partial Y_{22}} \\ \\ 0 & \frac{\partial E}{\partial Y_{31}} & \frac{\partial E}{\partial Y_{32}} \end{matrix} \right]\cdot W$$
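As a sketch of the step the article skips (reconstructed here from the window layout of the forward pass, not taken from the original derivation): $X_{21}$ appears in $Y_{11}$, $Y_{12}$, $Y_{21}$, $Y_{22}$, $Y_{31}$ and $Y_{32}$, so the chain rule gives

$$\frac{\partial E}{\partial X_{21}}=\frac{\partial E}{\partial Y_{11}}\cdot W_{12}+\frac{\partial E}{\partial Y_{12}}\cdot W_{13}+\frac{\partial E}{\partial Y_{21}}\cdot W_{22}+\frac{\partial E}{\partial Y_{22}}\cdot W_{23}+\frac{\partial E}{\partial Y_{31}}\cdot W_{32}+\frac{\partial E}{\partial Y_{32}}\cdot W_{33}$$

Each weight is the entry of $W'$ at the position $X_{21}$ occupies in the corresponding window, and the sum agrees term by term with the matrix form.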

In the same way, pad $\frac{\partial E}{\partial Y}$ with zeros on all four sides to obtain $\left(\frac{\partial E}{\partial Y}\right)^{pad}$. Then:

$$\frac{\partial E}{\partial X}=\mathrm{Correlation}\left(\left(\frac{\partial E}{\partial Y}\right)^{pad},W\right)$$
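One way to gain confidence in this result is to compare it with finite differences. The sketch below is illustrative only: the helper names (`correlate_valid`, `forward`, `grad_input`) are hypothetical, and the loss is assumed to be $E=\sum_{i,j} G_{i,j}Y_{i,j}$ for a random matrix `G`, so that `G` plays the role of $\partial E/\partial Y$.

```python
import numpy as np

def correlate_valid(A, K):
    """Slide K over A without flipping; each output is sum(window * K)."""
    kh, kw = K.shape
    out = np.zeros((A.shape[0] - kh + 1, A.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(A[i:i + kh, j:j + kw] * K)
    return out

def forward(X, W):
    # padding=1, stride=1: correlate the padded input with the flipped kernel W'
    return correlate_valid(np.pad(X, 1), W[::-1, ::-1])

def grad_input(dY, W):
    # dE/dX = Correlation(pad(dE/dY), W), using the unflipped W
    return correlate_valid(np.pad(dY, 1), W)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))
W = rng.standard_normal((3, 3))
G = rng.standard_normal((8, 8))   # stand-in for dE/dY, with E = sum(G * Y)

dX = grad_input(G, W)

# Central finite differences of E with respect to every X[a, b]
eps = 1e-4
dX_num = np.zeros_like(X)
for a in range(8):
    for b in range(8):
        Xp, Xm = X.copy(), X.copy()
        Xp[a, b] += eps
        Xm[a, b] -= eps
        dX_num[a, b] = (np.sum(G * forward(Xp, W))
                        - np.sum(G * forward(Xm, W))) / (2 * eps)

print(np.allclose(dX, dX_num, atol=1e-6))
```

Because $E$ is linear in $X$ here, the finite-difference estimate differs from the analytic gradient only by rounding error.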

Derivative with respect to the kernel

The result first:

$$\frac{\partial E}{\partial W}=\mathrm{Rot180}\left(\mathrm{Correlation}\left(\frac{\partial E}{\partial Y},X\right)\right)$$

The $\mathrm{Rot180}$ operation is a horizontal flip followed by a vertical flip.
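In NumPy terms, $\mathrm{Rot180}$ can be written in several equivalent ways. This small sketch uses an arbitrary 3x3 example matrix chosen for illustration:

```python
import numpy as np

W = np.arange(1, 10).reshape(3, 3)       # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

rot_by_flips = np.flipud(np.fliplr(W))   # horizontal flip, then vertical flip
rot_by_slicing = W[::-1, ::-1]           # reverse both axes at once
rot_by_rot90 = np.rot90(W, 2)            # two successive 90-degree rotations

print(rot_by_flips)
# [[9 8 7]
#  [6 5 4]
#  [3 2 1]]
```

All three expressions produce the same array, which is also how $W'$ is obtained from $W$.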

Start from the forward formula:

$$Y_{i,j}=\sum_{u=0}^{2}\sum_{v=0}^{2}X^{pad}_{i+u,j+v}\cdot W'_{1+u,1+v}$$


$$Y_{i,j}=\left[ \begin{matrix} X^{pad}_{i,j} & X^{pad}_{i,j+1} & X^{pad}_{i,j+2} \\ \\ X^{pad}_{i+1,j} & X^{pad}_{i+1,j+1} & X^{pad}_{i+1,j+2} \\ \\ X^{pad}_{i+2,j} & X^{pad}_{i+2,j+1} & X^{pad}_{i+2,j+2} \end{matrix} \right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right]$$

Therefore,

$$\frac{\partial Y_{i,j}}{\partial W_{11}}=X^{pad}_{i+2,j+2},\quad \frac{\partial Y_{i,j}}{\partial W_{12}}=X^{pad}_{i+2,j+1},\quad \frac{\partial Y_{i,j}}{\partial W_{13}}=X^{pad}_{i+2,j}$$

$$\frac{\partial Y_{i,j}}{\partial W_{21}}=X^{pad}_{i+1,j+2},\quad \frac{\partial Y_{i,j}}{\partial W_{22}}=X^{pad}_{i+1,j+1},\quad \frac{\partial Y_{i,j}}{\partial W_{23}}=X^{pad}_{i+1,j}$$

$$\frac{\partial Y_{i,j}}{\partial W_{31}}=X^{pad}_{i,j+2},\quad \frac{\partial Y_{i,j}}{\partial W_{32}}=X^{pad}_{i,j+1},\quad \frac{\partial Y_{i,j}}{\partial W_{33}}=X^{pad}_{i,j}$$

$$\frac{\partial E}{\partial W_{11}}=\sum_i \sum_j\frac{\partial E}{\partial Y_{i,j}}\cdot\frac{\partial Y_{i,j}}{\partial W_{11}}=\sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i+2,j+2}$$

Similarly,

$$\frac{\partial E}{\partial W_{12}}=\sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i+2,j+1},\quad \frac{\partial E}{\partial W_{13}}=\sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i+2,j}$$

$$\frac{\partial E}{\partial W_{21}}=\sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i+1,j+2},\quad \frac{\partial E}{\partial W_{22}}=\sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i+1,j+1},\quad \frac{\partial E}{\partial W_{23}}=\sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i+1,j}$$

$$\frac{\partial E}{\partial W_{31}}=\sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i,j+2},\quad \frac{\partial E}{\partial W_{32}}=\sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i,j+1},\quad \frac{\partial E}{\partial W_{33}}=\sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i,j}$$

Putting these together,

$$\begin{aligned}
\frac{\partial E}{\partial W}&=\left[ \begin{matrix} \frac{\partial E}{\partial W_{11}} & \frac{\partial E}{\partial W_{12}} & \frac{\partial E}{\partial W_{13}} \\ \\ \frac{\partial E}{\partial W_{21}} & \frac{\partial E}{\partial W_{22}} & \frac{\partial E}{\partial W_{23}} \\ \\ \frac{\partial E}{\partial W_{31}} & \frac{\partial E}{\partial W_{32}} & \frac{\partial E}{\partial W_{33}} \end{matrix} \right] \\
&=\left[ \begin{matrix} \sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i+2,j+2} & \sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i+2,j+1} & \sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i+2,j} \\ \\ \sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i+1,j+2} & \sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i+1,j+1} & \sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i+1,j} \\ \\ \sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i,j+2} & \sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i,j+1} & \sum_i \sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i,j} \end{matrix} \right]
\end{aligned}$$
This is exactly $\partial E/\partial W=\mathrm{Rot180}\left(\mathrm{Correlation}(\partial E/\partial Y, X)\right)$ with padding=1, stride=1: $\partial E/\partial Y$ slides over the padded input $X^{pad}$, and the resulting $3\times 3$ matrix is rotated by 180°.

This completes the proof.
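The kernel gradient can be checked numerically in the same spirit. This is a hedged sketch rather than the article's own code: the helper names are hypothetical, and the loss is again assumed to be $E=\sum_{i,j} G_{i,j}Y_{i,j}$ so that `G` stands in for $\partial E/\partial Y$.

```python
import numpy as np

def correlate_valid(A, K):
    """Slide K over A without flipping; each output is sum(window * K)."""
    kh, kw = K.shape
    out = np.zeros((A.shape[0] - kh + 1, A.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(A[i:i + kh, j:j + kw] * K)
    return out

def forward(X, W):
    # padding=1, stride=1: correlate the padded input with the flipped kernel W'
    return correlate_valid(np.pad(X, 1), W[::-1, ::-1])

def grad_kernel(dY, X):
    # Slide dE/dY (8x8) over the padded input (10x10) to get a 3x3 array,
    # then rotate that array by 180 degrees.
    return correlate_valid(np.pad(X, 1), dY)[::-1, ::-1]

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))
W = rng.standard_normal((3, 3))
G = rng.standard_normal((8, 8))   # stand-in for dE/dY, with E = sum(G * Y)

dW = grad_kernel(G, X)

# Central finite differences of E with respect to every W[a, b]
eps = 1e-4
dW_num = np.zeros_like(W)
for a in range(3):
    for b in range(3):
        Wp, Wm = W.copy(), W.copy()
        Wp[a, b] += eps
        Wm[a, b] -= eps
        dW_num[a, b] = (np.sum(G * forward(X, Wp))
                        - np.sum(G * forward(X, Wm))) / (2 * eps)

print(np.allclose(dW, dW_num, atol=1e-6))
```

In 0-based indexing, `dW[2, 2]` corresponds to $\partial E/\partial W_{33}=\sum_i\sum_j \frac{\partial E}{\partial Y_{i,j}}\cdot X^{pad}_{i,j}$.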

The case padding=1, stride=2

8x8 input and 3x3 kernel

$$\text{padding } p=1,\quad \text{stride } s=2,\quad Y=\mathrm{conv2}(X,W)$$

$$Y=\left[ \begin{matrix}
Y_{11} & Y_{12} & Y_{13} & Y_{14} \\
Y_{21} & Y_{22} & Y_{23} & Y_{24} \\
Y_{31} & Y_{32} & Y_{33} & Y_{34} \\
Y_{41} & Y_{42} & Y_{43} & Y_{44}
\end{matrix} \right]$$
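The 4x4 output size follows from $\lfloor (8+2\cdot 1-3)/2\rfloor+1=4$. A minimal NumPy sketch of the strided forward pass, with the hypothetical function name `conv2_forward` and random test data assumed for illustration (0-based indices):

```python
import numpy as np

def conv2_forward(X, W, pad=1, stride=2):
    """Strided forward pass: the window for Y[i, j] starts at (i*stride, j*stride)."""
    k = W.shape[0]
    W_flipped = W[::-1, ::-1]      # W' in the text
    X_pad = np.pad(X, pad)         # zero border of width `pad` on every side
    out_h = (X_pad.shape[0] - k) // stride + 1
    out_w = (X_pad.shape[1] - k) // stride + 1
    Y = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            Y[i, j] = np.sum(X_pad[r:r + k, c:c + k] * W_flipped)
    return Y

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))
W = rng.standard_normal((3, 3))
Y = conv2_forward(X, W)

# Y_12 by hand: the window is [0 0 0; X12 X13 X14; X22 X23 X24]
y12_by_hand = (X[0, 1] * W[1, 2] + X[0, 2] * W[1, 1] + X[0, 3] * W[1, 0]
               + X[1, 1] * W[0, 2] + X[1, 2] * W[0, 1] + X[1, 3] * W[0, 0])
print(Y.shape, np.isclose(Y[0, 1], y12_by_hand))
```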

Derivative with respect to the input $X$

A figure first, to aid understanding.
[Figure: image missing from the source]

$$Y_{11}=\left[ \begin{matrix} 0 & 0 & 0 \\ 0 & X_{11} & X_{12} \\ 0 & X_{21} & X_{22} \end{matrix}\right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right],\quad Y_{12}=\left[ \begin{matrix} 0 & 0 & 0 \\ X_{12} & X_{13} & X_{14} \\ X_{22} & X_{23} & X_{24} \end{matrix}\right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right]$$

$$Y_{13}=\left[ \begin{matrix} 0 & 0 & 0 \\ X_{14} & X_{15} & X_{16} \\ X_{24} & X_{25} & X_{26} \end{matrix}\right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right],\quad Y_{14}=\left[ \begin{matrix} 0 & 0 & 0 \\ X_{16} & X_{17} & X_{18} \\ X_{26} & X_{27} & X_{28} \end{matrix}\right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right]$$

$$Y_{21}=\left[ \begin{matrix} 0 & X_{21} & X_{22} \\ 0 & X_{31} & X_{32} \\ 0 & X_{41} & X_{42} \end{matrix}\right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right],\quad Y_{22}=\left[ \begin{matrix} X_{22} & X_{23} & X_{24} \\ X_{32} & X_{33} & X_{34} \\ X_{42} & X_{43} & X_{44} \end{matrix}\right]\cdot\left[ \begin{matrix} W_{33} & W_{32} & W_{31} \\ W_{23} & W_{22} & W_{21} \\ W_{13} & W_{12} & W_{11} \end{matrix} \right]$$

……

Omitting the intermediate derivation,

$$\frac{\partial E}{\partial X_{11}}=\left[ \begin{matrix} 0 & 0 & 0 \\ 0 & \frac{\partial E}{\partial Y_{11}} & 0 \\ 0 & 0 & 0 \end{matrix} \right]\cdot W$$

$$\frac{\partial E}{\partial X_{12}}=\left[ \begin{matrix} 0 & 0 & 0 \\ \frac{\partial E}{\partial Y_{11}} & 0 & \frac{\partial E}{\partial Y_{12}} \\ 0 & 0 & 0 \end{matrix} \right]\cdot W$$

$$\frac{\partial E}{\partial X_{17}}=\left[ \begin{matrix} 0 & 0 & 0 \\ 0 & \frac{\partial E}{\partial Y_{14}} & 0 \\ 0 & 0 & 0 \end{matrix} \right]\cdot W$$

$$\frac{\partial E}{\partial X_{18}}=\left[ \begin{matrix} 0 & 0 & 0 \\ \frac{\partial E}{\partial Y_{14}} & 0 & 0 \\ 0 & 0 & 0 \end{matrix} \right]\cdot W$$

……

$$\frac{\partial E}{\partial X_{23}}=\left[ \begin{matrix} 0 & \frac{\partial E}{\partial Y_{12}} & 0 \\ 0 & 0 & 0 \\ 0 & \frac{\partial E}{\partial Y_{22}} & 0 \end{matrix} \right]\cdot W$$

$$\frac{\partial E}{\partial X_{24}}=\left[ \begin{matrix} \frac{\partial E}{\partial Y_{12}} & 0 & \frac{\partial E}{\partial Y_{13}} \\ 0 & 0 & 0 \\ \frac{\partial E}{\partial Y_{22}} & 0 & \frac{\partial E}{\partial Y_{23}} \end{matrix} \right]\cdot W$$

……

$$\frac{\partial E}{\partial X}=\mathrm{Correlation}\left(\mathrm{Upsample}\left(\frac{\partial E}{\partial Y},\ stride=2\right)^{pad},W\right)$$
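The upsample-then-pad recipe can be sketched and checked in NumPy as follows. The helper names are hypothetical, and two readings of the formula are assumptions made here: $\mathrm{Upsample}$ is taken to place the entries of $\partial E/\partial Y$ on an 8x8 zero grid at every second position, and the loss is $E=\sum_{i,j} G_{i,j}Y_{i,j}$ with `G` standing in for $\partial E/\partial Y$.

```python
import numpy as np

def correlate_valid(A, K):
    """Slide K over A without flipping; each output is sum(window * K)."""
    kh, kw = K.shape
    out = np.zeros((A.shape[0] - kh + 1, A.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(A[i:i + kh, j:j + kw] * K)
    return out

def forward(X, W, stride=2):
    # padding=1: compute the stride-1 result, then keep every `stride`-th output
    full = correlate_valid(np.pad(X, 1), W[::-1, ::-1])
    return full[::stride, ::stride]

def grad_input(dY, W, in_shape, stride=2):
    # Upsample: spread dE/dY over a zero grid of the input's size, `stride` apart
    U = np.zeros(in_shape)
    U[::stride, ::stride] = dY
    # Pad with one ring of zeros, then correlate with the unflipped W
    return correlate_valid(np.pad(U, 1), W)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))
W = rng.standard_normal((3, 3))
G = rng.standard_normal((4, 4))   # stand-in for dE/dY, with E = sum(G * Y)

dX = grad_input(G, W, X.shape)

# Central finite differences of E with respect to every X[a, b]
eps = 1e-4
dX_num = np.zeros_like(X)
for a in range(8):
    for b in range(8):
        Xp, Xm = X.copy(), X.copy()
        Xp[a, b] += eps
        Xm[a, b] -= eps
        dX_num[a, b] = (np.sum(G * forward(Xp, W))
                        - np.sum(G * forward(Xm, W))) / (2 * eps)

print(np.allclose(dX, dX_num, atol=1e-6))
```

For example, `dX[0, 1]` equals `G[0, 0] * W[1, 0] + G[0, 1] * W[1, 2]`, which is the expression for $\partial E/\partial X_{12}$ in the text.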

Derivative with respect to the kernel $W$

(derivation omitted)

$$\frac{\partial E}{\partial Y}=\left[\begin{matrix} \frac{\partial E}{\partial Y_{11}} & \frac{\partial E}{\partial Y_{12}} & \frac{\partial E}{\partial Y_{13}} & \frac{\partial E}{\partial Y_{14}} \\ \\ \frac{\partial E}{\partial Y_{21}} & \frac{\partial E}{\partial Y_{22}} & \frac{\partial E}{\partial Y_{23}} & \frac{\partial E}{\partial Y_{24}} \\ \\ \frac{\partial E}{\partial Y_{31}} & \frac{\partial E}{\partial Y_{32}} & \frac{\partial E}{\partial Y_{33}} & \frac{\partial E}{\partial Y_{34}} \\ \\ \frac{\partial E}{\partial Y_{41}} & \frac{\partial E}{\partial Y_{42}} & \frac{\partial E}{\partial Y_{43}} & \frac{\partial E}{\partial Y_{44}} \end{matrix}\right]$$

Then,
$$\frac{\partial E}{\partial W_{33}}=\frac{\partial E}{\partial Y}\cdot\left[ \begin{matrix} 0 & 0 & 0 & 0 \\ 0 & X_{22} & X_{24} & X_{26} \\ 0 & X_{42} & X_{44} & X_{46} \\ 0 & X_{62} & X_{64} & X_{66} \end{matrix} \right],\quad \frac{\partial E}{\partial W_{32}}=\frac{\partial E}{\partial Y}\cdot\left[ \begin{matrix} 0 & 0 & 0 & 0 \\ X_{21} & X_{23} & X_{25} & X_{27} \\ X_{41} & X_{43} & X_{45} & X_{47} \\ X_{61} & X_{63} & X_{65} & X_{67} \end{matrix} \right],\quad \frac{\partial E}{\partial W_{31}}=\frac{\partial E}{\partial Y}\cdot\left[ \begin{matrix} 0 & 0 & 0 & 0 \\ X_{22} & X_{24} & X_{26} & X_{28} \\ X_{42} & X_{44} & X_{46} & X_{48} \\ X_{62} & X_{64} & X_{66} & X_{68} \end{matrix} \right]$$

$$\frac{\partial E}{\partial W_{23}}=\frac{\partial E}{\partial Y}\cdot\left[ \begin{matrix} 0 & X_{12} & X_{14} & X_{16} \\ 0 & X_{32} & X_{34} & X_{36} \\ 0 & X_{52} & X_{54} & X_{56} \\ 0 & X_{72} & X_{74} & X_{76} \end{matrix} \right],\quad \frac{\partial E}{\partial W_{22}}=\frac{\partial E}{\partial Y}\cdot\left[ \begin{matrix} X_{11} & X_{13} & X_{15} & X_{17} \\ X_{31} & X_{33} & X_{35} & X_{37} \\ X_{51} & X_{53} & X_{55} & X_{57} \\ X_{71} & X_{73} & X_{75} & X_{77} \end{matrix} \right],\quad \frac{\partial E}{\partial W_{21}}=\frac{\partial E}{\partial Y}\cdot\left[ \begin{matrix} X_{12} & X_{14} & X_{16} & X_{18} \\ X_{32} & X_{34} & X_{36} & X_{38} \\ X_{52} & X_{54} & X_{56} & X_{58} \\ X_{72} & X_{74} & X_{76} & X_{78} \end{matrix} \right]$$

$$\frac{\partial E}{\partial W_{13}}=\frac{\partial E}{\partial Y}\cdot\left[ \begin{matrix} 0 & X_{22} & X_{24} & X_{26} \\ 0 & X_{42} & X_{44} & X_{46} \\ 0 & X_{62} & X_{64} & X_{66} \\ 0 & X_{82} & X_{84} & X_{86} \end{matrix} \right],\quad \frac{\partial E}{\partial W_{12}}=\frac{\partial E}{\partial Y}\cdot\left[ \begin{matrix} X_{21} & X_{23} & X_{25} & X_{27} \\ X_{41} & X_{43} & X_{45} & X_{47} \\ X_{61} & X_{63} & X_{65} & X_{67} \\ X_{81} & X_{83} & X_{85} & X_{87} \end{matrix} \right],\quad \frac{\partial E}{\partial W_{11}}=\frac{\partial E}{\partial Y}\cdot\left[ \begin{matrix} X_{22} & X_{24} & X_{26} & X_{28} \\ X_{42} & X_{44} & X_{46} & X_{48} \\ X_{62} & X_{64} & X_{66} & X_{68} \\ X_{82} & X_{84} & X_{86} & X_{88} \end{matrix} \right]$$

Hence, with padding=1, stride=2,

$$\frac{\partial E}{\partial W}=\mathrm{Rot180}\left(\mathrm{Intersect\_Correlation}\left(\frac{\partial E}{\partial Y}, X^{pad}\right)\right)$$

The $\mathrm{Intersect\_Correlation}$ operation deserves careful thought. As the nine expressions for $\partial E/\partial W_{ab}$ show, each element of $\partial E/\partial Y$ is paired with every second element of $X^{pad}$ in each direction, that is, $X^{pad}$ is sampled with a step equal to the stride.
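One possible reading of $\mathrm{Intersect\_Correlation}$, written as a NumPy sketch: for each of the nine offsets, sample $X^{pad}$ with a step of 2 and take the sum of element-wise products with $\partial E/\partial Y$. The function names are hypothetical and this interpretation is an assumption, checked here against finite differences with $E=\sum_{i,j} G_{i,j}Y_{i,j}$.

```python
import numpy as np

def forward(X, W, stride=2):
    """padding=1 forward pass; the window for Y[i, j] starts at (i*stride, j*stride)."""
    k = W.shape[0]
    X_pad = np.pad(X, 1)
    W_flipped = W[::-1, ::-1]
    n = (X_pad.shape[0] - k) // stride + 1
    Y = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            r, c = i * stride, j * stride
            Y[i, j] = np.sum(X_pad[r:r + k, c:c + k] * W_flipped)
    return Y

def grad_kernel(dY, X, k=3, stride=2):
    X_pad = np.pad(X, 1)
    n_h, n_w = dY.shape
    C = np.zeros((k, k))
    for r in range(k):
        for c in range(k):
            # Every `stride`-th element of X_pad, starting at offset (r, c)
            sampled = X_pad[r:r + stride * n_h:stride, c:c + stride * n_w:stride]
            C[r, c] = np.sum(dY * sampled)
    return C[::-1, ::-1]           # Rot180

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))
W = rng.standard_normal((3, 3))
G = rng.standard_normal((4, 4))    # stand-in for dE/dY, with E = sum(G * Y)

dW = grad_kernel(G, X)

# Central finite differences of E with respect to every W[a, b]
eps = 1e-4
dW_num = np.zeros_like(W)
for a in range(3):
    for b in range(3):
        Wp, Wm = W.copy(), W.copy()
        Wp[a, b] += eps
        Wm[a, b] -= eps
        dW_num[a, b] = (np.sum(G * forward(X, Wp))
                        - np.sum(G * forward(X, Wm))) / (2 * eps)

print(np.allclose(dW, dW_num, atol=1e-6))
```

In 0-based indexing, `dW[1, 1]` pairs `G` with `X[0::2, 0::2]` (the $X_{11}, X_{13}, \dots$ matrix for $W_{22}$), and `dW[0, 0]` pairs it with `X[1::2, 1::2]` (the matrix for $W_{11}$).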

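With this, the convolutional layers of the ResNet used by Yolo V3 have essentially been fully derived.

Apologies: this article contains a mistake. Convolution in a CNN is actually 3D, not 2D as treated here. I have not revised the text, but the reasoning is much the same.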
