Notes on a Simple Derivation of the Backpropagation Algorithm in Deep Learning

1. Fully Connected Neural Networks

[Figure: a two-layer fully connected network with two neurons per layer]

The forward pass of this structure can be written as:

$$z^{(1)} = W^{(1)}x + b^{(1)}$$
$$a^{(1)} = \sigma(z^{(1)})$$
$$z^{(2)} = W^{(2)}a^{(1)} + b^{(2)}$$
$$a^{(2)} = \sigma(z^{(2)})$$
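
To make the notation concrete, here is a minimal NumPy sketch of this forward pass. The shapes, the random initialization, and the choice of the logistic sigmoid for $\sigma$ are assumptions for illustration only:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied elementwise; one common choice for σ."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Hypothetical parameters for the 2-layer, 2-neurons-per-layer network.
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
W2, b2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
x = rng.normal(size=(2, 1))   # input column vector

z1 = W1 @ x + b1              # z^(1) = W^(1) x + b^(1)
a1 = sigmoid(z1)              # a^(1) = σ(z^(1))
z2 = W2 @ a1 + b2             # z^(2) = W^(2) a^(1) + b^(2)
a2 = sigmoid(z2)              # a^(2) = σ(z^(2))
```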

2. Notation

  1. $\delta^{(x)}_{i}$: the scalar $\frac{\partial Loss}{\partial z_i^{(x)}}$.

  2. $\delta^{(x)}$: the vector with components $\delta^{(x)}_{1}, \delta^{(x)}_{2}, \ldots, \delta^{(x)}_{n}$.

  3. $\odot$: elementwise multiplication. For example:

$$\left[ \begin{matrix} a\\b \end{matrix} \right] \odot \left[ \begin{matrix} c\\d \end{matrix} \right] = \left[ \begin{matrix} ac\\bd \end{matrix} \right]$$

$$\left[ \begin{matrix} a&b\\c&d \end{matrix} \right] \odot \left[ \begin{matrix} e\\f \end{matrix} \right] = \left[ \begin{matrix} ae&be\\cf&df \end{matrix} \right]$$
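
In NumPy terms, $\odot$ corresponds to the broadcasting `*` operator; a small sketch reproducing the two examples above:

```python
import numpy as np

# Column vector ⊙ column vector: elementwise product.
a = np.array([[1.0], [2.0]])
c = np.array([[3.0], [4.0]])
print(a * c)                  # [[3.], [8.]]

# Matrix ⊙ column vector: broadcasting scales each row of the
# matrix by the corresponding entry of the vector.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([[5.0], [6.0]])
print(A * v)                  # [[ 5. 10.]
                              #  [18. 24.]]
```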

3. Deriving the Fully Connected Backpropagation Formulas

For the 2-layer, 2-neurons-per-layer fully connected network shown above, we now derive the backpropagation equations.

We want to compute $\frac{\partial Loss}{\partial W^{(1)}}, \frac{\partial Loss}{\partial W^{(2)}}, \frac{\partial Loss}{\partial b^{(1)}}, \frac{\partial Loss}{\partial b^{(2)}}, \delta^{(1)}, \delta^{(2)}$.

From the model diagram and the chain rule we obtain the equations below.
To find $\delta^{(2)}$, first consider:

$$\delta^{(2)}_1=\frac{\partial Loss}{\partial a^{(2)}_1}\cdot\frac{\partial a^{(2)}_1}{\partial z^{(2)}_1} = \frac{\partial Loss}{\partial a^{(2)}_1}\cdot\sigma'(z^{(2)}_1)$$

$$\delta^{(2)}_2=\frac{\partial Loss}{\partial a^{(2)}_2}\cdot\frac{\partial a^{(2)}_2}{\partial z^{(2)}_2} = \frac{\partial Loss}{\partial a^{(2)}_2}\cdot\sigma'(z^{(2)}_2)$$
Then

$$\delta^{(2)}=\left[ \begin{matrix} \delta^{(2)}_1 \\ \delta^{(2)}_2 \end{matrix} \right] = \left[ \begin{matrix} \frac{\partial Loss}{\partial a^{(2)}_1} \\ \frac{\partial Loss}{\partial a^{(2)}_2} \end{matrix} \right] \odot \left[ \begin{matrix} \sigma'(z^{(2)}_1) \\ \sigma'(z^{(2)}_2) \end{matrix} \right]$$


To find $\frac{\partial Loss}{\partial W^{(2)}}$, first consider:

$$\frac{\partial Loss}{\partial w^{(2)}_{11}} = \delta^{(2)}_1\cdot\frac{\partial z^{(2)}_1}{\partial w^{(2)}_{11}} = \delta^{(2)}_1\cdot a^{(1)}_1$$

$$\frac{\partial Loss}{\partial w^{(2)}_{12}} = \delta^{(2)}_1\cdot\frac{\partial z^{(2)}_1}{\partial w^{(2)}_{12}} = \delta^{(2)}_1\cdot a^{(1)}_2$$

$$\frac{\partial Loss}{\partial w^{(2)}_{21}} = \delta^{(2)}_2\cdot\frac{\partial z^{(2)}_2}{\partial w^{(2)}_{21}} = \delta^{(2)}_2\cdot a^{(1)}_1$$

$$\frac{\partial Loss}{\partial w^{(2)}_{22}} = \delta^{(2)}_2\cdot\frac{\partial z^{(2)}_2}{\partial w^{(2)}_{22}} = \delta^{(2)}_2\cdot a^{(1)}_2$$

Then

$$\frac{\partial Loss}{\partial W^{(2)}} = \left[ \begin{matrix} \delta^{(2)}_1 \cdot a^{(1)}_1 & \delta^{(2)}_1\cdot a^{(1)}_2 \\ \delta^{(2)}_2 \cdot a^{(1)}_1 & \delta^{(2)}_2 \cdot a^{(1)}_2 \end{matrix} \right] = \left[ \begin{matrix} \delta^{(2)}_1 \\ \delta^{(2)}_2 \end{matrix} \right] \cdot \left[ \begin{matrix} a^{(1)}_1 & a^{(1)}_2 \end{matrix} \right] = \delta^{(2)}\cdot a^{(1)T}$$


To find $\frac{\partial Loss}{\partial b^{(2)}}$, first consider:

$$\frac{\partial Loss}{\partial b^{(2)}_1} = \delta^{(2)}_1\cdot 1$$

$$\frac{\partial Loss}{\partial b^{(2)}_2} = \delta^{(2)}_2\cdot 1$$

Then

$$\frac{\partial Loss}{\partial b^{(2)}} = \left[ \begin{matrix} \delta^{(2)}_1 \\ \delta^{(2)}_2 \end{matrix} \right] \odot \left[ \begin{matrix} 1 \end{matrix} \right] = \delta^{(2)}$$


To find $\delta^{(1)}$, first consider:

$$\delta^{(1)}_1 = \delta^{(2)}_1 \cdot \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_1} \cdot \frac{\partial a_1^{(1)}}{\partial z_1^{(1)}} + \delta^{(2)}_2 \cdot \frac{\partial z^{(2)}_{2}}{\partial a^{(1)}_1} \cdot \frac{\partial a_1^{(1)}}{\partial z_1^{(1)}} = \delta^{(2)}_1 \cdot w^{(2)}_{11} \cdot \sigma'(z_1^{(1)}) + \delta^{(2)}_2 \cdot w_{21}^{(2)} \cdot \sigma'(z_1^{(1)})$$

$$\delta^{(1)}_2 = \delta^{(2)}_1 \cdot \frac{\partial z^{(2)}_{1}}{\partial a^{(1)}_2} \cdot \frac{\partial a_2^{(1)}}{\partial z_2^{(1)}} + \delta^{(2)}_2 \cdot \frac{\partial z^{(2)}_{2}}{\partial a^{(1)}_2} \cdot \frac{\partial a_2^{(1)}}{\partial z_2^{(1)}} = \delta^{(2)}_1 \cdot w^{(2)}_{12} \cdot \sigma'(z_2^{(1)}) + \delta^{(2)}_2 \cdot w_{22}^{(2)} \cdot \sigma'(z_2^{(1)})$$

Then

$$\delta^{(1)}=\left[ \begin{matrix} w_{11}^{(2)} & w_{21}^{(2)} \\ w_{12}^{(2)} & w_{22}^{(2)} \end{matrix} \right] \cdot \left[ \begin{matrix} \delta^{(2)}_1 \\ \delta^{(2)}_2 \end{matrix} \right] \odot \left[ \begin{matrix} \sigma'(z^{(1)}_1) \\ \sigma'(z_2^{(1)}) \end{matrix} \right]=W^{(2)T} \cdot \delta^{(2)} \odot \sigma'(z^{(1)})$$


The gradients $\frac{\partial Loss}{\partial W^{(1)}}, \frac{\partial Loss}{\partial b^{(1)}}$ are obtained exactly as $\frac{\partial Loss}{\partial W^{(2)}}, \frac{\partial Loss}{\partial b^{(2)}}$ above, so we omit the details here.
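
Putting the four results together, here is a self-contained sketch of one full backward pass. The MSE loss $Loss = \frac{1}{2}\|a^{(2)}-t\|^2$ (so that $\frac{\partial Loss}{\partial a^{(2)}} = a^{(2)}-t$) and the logistic sigmoid (so that $\sigma'(z)=\sigma(z)(1-\sigma(z))=a(1-a)$) are my own assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
W2, b2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
x = rng.normal(size=(2, 1))
t = np.array([[1.0], [0.0]])            # hypothetical target

# Forward pass (section 1).
z1 = W1 @ x + b1; a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

# Backward pass. With MSE, ∂Loss/∂a^(2) = a2 - t; σ'(z) = a(1 - a).
delta2 = (a2 - t) * (a2 * (1 - a2))         # δ^(2) = ∂Loss/∂a^(2) ⊙ σ'(z^(2))
dW2 = delta2 @ a1.T                         # ∂Loss/∂W^(2) = δ^(2) · a^(1)T
db2 = delta2                                # ∂Loss/∂b^(2) = δ^(2)
delta1 = (W2.T @ delta2) * (a1 * (1 - a1))  # δ^(1) = W^(2)T δ^(2) ⊙ σ'(z^(1))
dW1 = delta1 @ x.T                          # same pattern as layer 2, with x as input
db1 = delta1
```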

Simplifying the Derivation with Vector Calculus

The derivation above splits everything into scalars, applies the chain rule, and then reassembles the results into matrix products. This is mathematically rigorous but rather tedious.

In practice there is a simpler (if somewhat hand-wavy) approach: apply the chain rule directly to vectors, then use the matrix dimensions to reorder the factors and decide where to transpose.

We want $\delta^{(2)} = \frac{\partial Loss}{\partial z^{(2)}} = \frac{\partial Loss}{\partial a^{(2)}} \cdot \frac{\partial a^{(2)}}{\partial z^{(2)}}$.

Since $dim\{\delta^{(2)}\} = 2\times 1$ and $dim\{\frac{\partial Loss}{\partial a^{(2)}}\} = 2\times 1$, the rules of matrix multiplication say we should in theory have $dim\{\frac{\partial a^{(2)}}{\partial z^{(2)}}\} = 1\times 1$.

In fact, when a column vector ($a^{(2)}$) is differentiated with respect to a column vector ($z^{(2)}$) in this setting, the derivative is usually taken elementwise, e.g. $\frac{\partial a^{(2)}}{\partial z^{(2)}}=\left[ \begin{matrix} \sigma'(z^{(2)}_1) \\ \sigma'(z^{(2)}_2) \end{matrix} \right]=\sigma'(z^{(2)})$, which means $dim\{\frac{\partial a^{(2)}}{\partial z^{(2)}}\} = 2\times 1$.

So what now? Two $2\times 1$ matrices cannot be matrix-multiplied, but the $\odot$ operation fits exactly, so:

$$\delta^{(2)} = \frac{\partial Loss}{\partial z^{(2)}} = \frac{\partial Loss}{\partial a^{(2)}} \cdot \frac{\partial a^{(2)}}{\partial z^{(2)}}=\frac{\partial Loss}{\partial a^{(2)}} \odot \sigma'(z^{(2)})$$

At this point a sharp reader will surely object: this is nonsense, with no real justification.

Fair enough. Come at me.

Let's look at another example:

We want

$$\frac{\partial Loss}{\partial W^{(2)}} = \left(\frac{\partial Loss}{\partial a^{(2)}} \cdot \frac{\partial a^{(2)}}{\partial z^{(2)}}\right) \cdot \frac{\partial z^{(2)}}{\partial W^{(2)}} = \delta^{(2)} \cdot \frac{\partial z^{(2)}}{\partial W^{(2)}}$$

Note that $dim\{\frac{\partial Loss}{\partial W^{(2)}}\}=2\times 2$ and $dim\{\delta^{(2)}\}=2\times 1$, so in theory we should have $dim\{\frac{\partial z^{(2)}}{\partial W^{(2)}}\} = 1\times 2$.

From $z^{(2)}=W^{(2)}a^{(1)}+b^{(2)}$ we know $dim\{a^{(1)}\}=2\times 1$, so we set $\frac{\partial z^{(2)}}{\partial W^{(2)}}=a^{(1)T}$.

Hence $\frac{\partial Loss}{\partial W^{(2)}}= \delta^{(2)} \cdot a^{(1)T}$.

Hand-wavy as it is, this yields exactly the same formula as the rigorous derivation above.

One more example:

$$\delta^{(1)} = \frac{\partial Loss}{\partial z^{(1)}}=\frac{\partial Loss}{\partial z^{(2)}} \cdot \frac{\partial z^{(2)}}{\partial a^{(1)}} \cdot \frac{\partial a^{(1)}}{\partial z^{(1)}}$$

Here $dim\{\frac{\partial Loss}{\partial z^{(1)}}\}=n\times 1$, $dim\{\frac{\partial Loss}{\partial z^{(2)}}\}=m\times 1$, $dim\{\frac{\partial z^{(2)}}{\partial a^{(1)}}\} = \,?$, and $dim\{\frac{\partial a^{(1)}}{\partial z^{(1)}}\}=n\times 1$.

From $z^{(2)}=W^{(2)}a^{(1)}+b^{(2)}$, the factor $\frac{\partial z^{(2)}}{\partial a^{(1)}}$ must come from $W^{(2)}$, and $dim\{W^{(2)}\}=m\times n$. A bit of shape-juggling (reordering and transposing; here $n=m=2$) should then give:

$$\delta^{(1)} = W^{(2)T} \cdot \delta^{(2)} \odot \sigma'(z^{(1)})$$

This agrees with the result of the rigorous derivation above.
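
A fair way to defend this shape-juggling is a finite-difference check. The sketch below (again assuming an MSE loss and the logistic sigmoid, as before) compares the analytic $\frac{\partial Loss}{\partial W^{(1)}} = \delta^{(1)}\cdot x^{T}$ with numerical differences:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
W2, b2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
x, t = rng.normal(size=(2, 1)), rng.normal(size=(2, 1))

def loss(W1_):
    """Assumed MSE loss as a function of W^(1), everything else fixed."""
    a1 = sigmoid(W1_ @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    return 0.5 * np.sum((a2 - t) ** 2)

# Analytic gradient from the backprop formulas.
a1 = sigmoid(W1 @ x + b1)
a2 = sigmoid(W2 @ a1 + b2)
delta2 = (a2 - t) * (a2 * (1 - a2))
delta1 = (W2.T @ delta2) * (a1 * (1 - a1))
dW1 = delta1 @ x.T

# Central finite differences, entry by entry.
num, eps = np.zeros_like(W1), 1e-6
for i in range(2):
    for j in range(2):
        Wp, Wm = W1.copy(), W1.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.max(np.abs(dW1 - num)))   # should be tiny, on the order of 1e-10
```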


4. Backpropagation Through a Convolutional Layer

Let the input be $X$, the convolution kernel $K$, and the output $Y$.

Then $X \otimes K=Y$.

If $X$ has width $x$, $K$ has width $k$, and $Y$ has width $y$, then $y=x-k+1$.

$$\left[ \begin{matrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{matrix} \right] \otimes \left[ \begin{matrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{matrix} \right] = \left[ \begin{matrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{matrix} \right]$$

Expanding, we can write:

$$y_{11}=x_{11}k_{11}+x_{12}k_{12}+x_{21}k_{21}+x_{22}k_{22}$$
$$y_{12}=x_{12}k_{11}+x_{13}k_{12}+x_{22}k_{21}+x_{23}k_{22}$$
$$y_{21}=x_{21}k_{11}+x_{22}k_{12}+x_{31}k_{21}+x_{32}k_{22}$$
$$y_{22}=x_{22}k_{11}+x_{23}k_{12}+x_{32}k_{21}+x_{33}k_{22}$$

Define

$$\delta_{ij}=\frac{\partial Loss}{\partial y_{ij}}=\nabla y_{ij}$$

Then, by the chain rule:

$$\frac{\partial Loss}{\partial k_{11}}=\delta_{11} \cdot x_{11} + \delta_{12} \cdot x_{12}+\delta_{21} \cdot x_{21}+\delta_{22} \cdot x_{22}$$
$$\frac{\partial Loss}{\partial k_{12}}=\delta_{11} \cdot x_{12} + \delta_{12} \cdot x_{13}+\delta_{21} \cdot x_{22}+\delta_{22} \cdot x_{23}$$
$$\frac{\partial Loss}{\partial k_{21}}=\delta_{11} \cdot x_{21} + \delta_{12} \cdot x_{22}+\delta_{21} \cdot x_{31}+\delta_{22} \cdot x_{32}$$
$$\frac{\partial Loss}{\partial k_{22}}=\delta_{11} \cdot x_{22} + \delta_{12} \cdot x_{23}+\delta_{21} \cdot x_{32}+\delta_{22} \cdot x_{33}$$

We observe that

$$\left[ \begin{matrix} \nabla k_{11} &\nabla k_{12} \\ \nabla k_{21} &\nabla k_{22} \end{matrix} \right] = \left[ \begin{matrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{matrix} \right] \otimes \left[ \begin{matrix} \nabla y_{11} & \nabla y_{12} \\ \nabla y_{21} & \nabla y_{22} \end{matrix} \right]$$

This is itself exactly a convolution, so we can write

$$\frac{\partial Loss}{\partial K}=\frac{\partial Loss}{\partial Y}\cdot\frac{\partial Y}{\partial K}=X \otimes \nabla Y$$

or, in short, $\nabla K=X \otimes \nabla Y$.

This shows that gradient propagation through a convolutional layer is again a convolution operation.
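
A small NumPy sketch to verify $\nabla K = X \otimes \nabla Y$. The helper `corr2d` is a naive "valid" sliding-window implementation of the $\otimes$ used in this post (no kernel flipping), and the toy loss $Loss = \sum_{ij} y_{ij}\,\nabla y_{ij}$ with a fixed `dY` is an assumption chosen so that $\frac{\partial Loss}{\partial Y}$ equals `dY` exactly:

```python
import numpy as np

def corr2d(X, K):
    """Naive 'valid' sliding-window ⊗ (cross-correlation, no kernel flip)."""
    k = K.shape[0]
    y = X.shape[0] - k + 1
    return np.array([[np.sum(X[i:i+k, j:j+k] * K) for j in range(y)]
                     for i in range(y)])

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 3))
K = rng.normal(size=(2, 2))
dY = rng.normal(size=(2, 2))           # stand-in for the upstream gradient ∇Y

dK = corr2d(X, dY)                     # ∇K = X ⊗ ∇Y, same shape as K

# Finite-difference spot check with the toy loss Loss = Σ (X ⊗ K) ⊙ ∇Y.
def loss(K_):
    return np.sum(corr2d(X, K_) * dY)

eps = 1e-6
Kp = K.copy()
Kp[0, 0] += eps
print((loss(Kp) - loss(K)) / eps, dK[0, 0])   # the two numbers should agree
```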

With the kernel gradient $\nabla K$ in hand, we omit the analogous derivation for the gradient with respect to $X$ and state the result directly:

$$\nabla X= pad(\nabla Y) \otimes rot_{180}(K)$$

Here the $rot_{180}$ operation rotates the matrix by $180^\circ$, which can be realized as a left-right flip followed by a top-bottom flip.

The $pad$ operation surrounds the matrix with zeros. Why pad? To make the dimensions match. Say $\nabla X$ has width $r$ and $K$ has width $k$; then the $Y$ produced by the forward pass has width $r-k+1$. If $pad(\nabla Y)$ has width $?$, we need $r=?-k+1$, so $?=r+k-1$. The $pad$ operation therefore grows a matrix of width $r-k+1$ to width $r+k-1$, i.e. $k-1$ zeros on each side.

In our example, $pad(\nabla Y)$ is:

$$\left[ \begin{matrix} 0&0&0&0 \\ 0&\nabla y_{11} & \nabla y_{12} & 0\\ 0&\nabla y_{21} & \nabla y_{22} & 0\\ 0&0&0&0 \end{matrix} \right]$$

and $rot_{180}(K)$ is:

$$\left[ \begin{matrix} k_{22} & k_{21} \\ k_{12} & k_{11} \end{matrix} \right]$$
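
The same kind of sketch verifies $\nabla X = pad(\nabla Y) \otimes rot_{180}(K)$, again self-contained and using the same assumed toy loss:

```python
import numpy as np

def corr2d(X, K):
    """Naive 'valid' sliding-window ⊗, as in the previous sketch."""
    k = K.shape[0]
    y = X.shape[0] - k + 1
    return np.array([[np.sum(X[i:i+k, j:j+k] * K) for j in range(y)]
                     for i in range(y)])

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 3))
K = rng.normal(size=(2, 2))
dY = rng.normal(size=(2, 2))           # stand-in for ∇Y
k = K.shape[0]

# ∇X = pad(∇Y) ⊗ rot180(K): pad k-1 zeros on each side, rotate K by 180°.
dX = corr2d(np.pad(dY, k - 1), np.rot90(K, 2))
assert dX.shape == X.shape

# Finite-difference spot check with the toy loss Loss = Σ (X ⊗ K) ⊙ ∇Y.
def loss(X_):
    return np.sum(corr2d(X_, K) * dY)

eps = 1e-6
Xp = X.copy()
Xp[1, 1] += eps
print((loss(Xp) - loss(X)) / eps, dX[1, 1])   # the two numbers should agree
```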


5. Implementing the Convolution Operation

Convolution can also be implemented as a matrix multiplication.

For example, the following convolution can be rewritten as a matrix product:

$$\left[ \begin{matrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{matrix} \right] \otimes \left[ \begin{matrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{matrix} \right] = \left[ \begin{matrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{matrix} \right]$$


$$y_{11}=x_{11}k_{11}+x_{12}k_{12}+x_{21}k_{21}+x_{22}k_{22}$$
$$y_{12}=x_{12}k_{11}+x_{13}k_{12}+x_{22}k_{21}+x_{23}k_{22}$$
$$y_{21}=x_{21}k_{11}+x_{22}k_{12}+x_{31}k_{21}+x_{32}k_{22}$$
$$y_{22}=x_{22}k_{11}+x_{23}k_{12}+x_{32}k_{21}+x_{33}k_{22}$$

$$\left[\begin{matrix} y_{11}\\y_{12}\\y_{21}\\y_{22} \end{matrix}\right] = \left[\begin{matrix} x_{11}&x_{12}&x_{21}&x_{22}\\ x_{12}&x_{13}&x_{22}&x_{23}\\ x_{21}&x_{22}&x_{31}&x_{32}\\ x_{22}&x_{23}&x_{32}&x_{33} \end{matrix}\right] \cdot \left[\begin{matrix} k_{11}\\ k_{12}\\k_{21}\\k_{22} \end{matrix}\right]$$

As the example shows, turning the convolution into a matrix product requires flattening $Y$ and $K$ into column vectors and unrolling $X$ into a matrix with one row per output position: each row is a flattened $k\times k$ patch of $X$, giving a $y^2 \times k^2$ matrix (here $4\times 4$, since $y=k=2$).

Once the matrix multiplication is done, simply reshape the $y$ vector back into a matrix.
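
A minimal sketch of this unroll-and-multiply idea (commonly known as im2col; the helper name is mine):

```python
import numpy as np

def im2col(X, k):
    """Unroll every k×k patch of X into a row; result has shape (y*y, k*k)."""
    y = X.shape[0] - k + 1
    return np.array([X[i:i+k, j:j+k].ravel()
                     for i in range(y) for j in range(y)])

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 3))
K = rng.normal(size=(2, 2))
k = K.shape[0]
y = X.shape[0] - k + 1

# Convolution as a matrix product: unroll X, flatten K, multiply, reshape.
Y = (im2col(X, k) @ K.ravel()).reshape(y, y)

# Cross-check against the naive sliding-window definition.
Y_ref = np.array([[np.sum(X[i:i+k, j:j+k] * K) for j in range(y)]
                  for i in range(y)])
assert np.allclose(Y, Y_ref)
```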
