NNDL Assignment 8: Convolution, Derivatives, Backpropagation

1. Prove that wide convolution is commutative, i.e. $\mathrm{rot180}(W)\,\widetilde{\otimes}\,X=\mathrm{rot180}(X)\,\widetilde{\otimes}\,W$.

Let

$$
W=\begin{pmatrix} w_{11}&w_{12}\\ w_{21}&w_{22}\end{pmatrix},\qquad
X=\begin{pmatrix} x_{11}&x_{12}&x_{13}\\ x_{21}&x_{22}&x_{23}\\ x_{31}&x_{32}&x_{33}\end{pmatrix}
$$

$$
\mathrm{rot180}(W)=\begin{pmatrix} w_{22}&w_{21}\\ w_{12}&w_{11}\end{pmatrix},\qquad
\mathrm{rot180}(X)=\begin{pmatrix} x_{33}&x_{32}&x_{31}\\ x_{23}&x_{22}&x_{21}\\ x_{13}&x_{12}&x_{11}\end{pmatrix}
$$

Wide convolution $\widetilde{\otimes}$ first zero-pads the second operand by $K-1$ on every side, where $K$ is the size of the first operand, so $\widetilde{X}$ is 5×5 and $\widetilde{W}$ is 6×6:

$$
\widetilde{W}=\begin{pmatrix}
0&0&0&0&0&0\\
0&0&0&0&0&0\\
0&0&w_{11}&w_{12}&0&0\\
0&0&w_{21}&w_{22}&0&0\\
0&0&0&0&0&0\\
0&0&0&0&0&0
\end{pmatrix},\qquad
\widetilde{X}=\begin{pmatrix}
0&0&0&0&0\\
0&x_{11}&x_{12}&x_{13}&0\\
0&x_{21}&x_{22}&x_{23}&0\\
0&x_{31}&x_{32}&x_{33}&0\\
0&0&0&0&0
\end{pmatrix}
$$

$$
\mathrm{rot180}(W)\,\widetilde{\otimes}\,X=\mathrm{rot180}(W)\otimes\widetilde{X}
=\begin{pmatrix} w_{22}&w_{21}\\ w_{12}&w_{11}\end{pmatrix}\otimes
\begin{pmatrix}
0&0&0&0&0\\
0&x_{11}&x_{12}&x_{13}&0\\
0&x_{21}&x_{22}&x_{23}&0\\
0&x_{31}&x_{32}&x_{33}&0\\
0&0&0&0&0
\end{pmatrix}
$$

$$
=\begin{pmatrix}
w_{11}x_{11} & w_{12}x_{11}+w_{11}x_{12} & w_{12}x_{12}+w_{11}x_{13} & w_{12}x_{13}\\
w_{21}x_{11}+w_{11}x_{21} & w_{22}x_{11}+w_{21}x_{12}+w_{12}x_{21}+w_{11}x_{22} & w_{22}x_{12}+w_{21}x_{13}+w_{12}x_{22}+w_{11}x_{23} & w_{22}x_{13}+w_{12}x_{23}\\
w_{21}x_{21}+w_{11}x_{31} & w_{22}x_{21}+w_{21}x_{22}+w_{12}x_{31}+w_{11}x_{32} & w_{22}x_{22}+w_{21}x_{23}+w_{12}x_{32}+w_{11}x_{33} & w_{22}x_{23}+w_{12}x_{33}\\
w_{21}x_{31} & w_{22}x_{31}+w_{21}x_{32} & w_{22}x_{32}+w_{21}x_{33} & w_{22}x_{33}
\end{pmatrix}
$$

$$
\mathrm{rot180}(X)\,\widetilde{\otimes}\,W=\mathrm{rot180}(X)\otimes\widetilde{W}
=\begin{pmatrix} x_{33}&x_{32}&x_{31}\\ x_{23}&x_{22}&x_{21}\\ x_{13}&x_{12}&x_{11}\end{pmatrix}\otimes
\begin{pmatrix}
0&0&0&0&0&0\\
0&0&0&0&0&0\\
0&0&w_{11}&w_{12}&0&0\\
0&0&w_{21}&w_{22}&0&0\\
0&0&0&0&0&0\\
0&0&0&0&0&0
\end{pmatrix}
$$

$$
=\begin{pmatrix}
w_{11}x_{11} & w_{11}x_{12}+w_{12}x_{11} & w_{11}x_{13}+w_{12}x_{12} & w_{12}x_{13}\\
w_{11}x_{21}+w_{21}x_{11} & w_{11}x_{22}+w_{12}x_{21}+w_{21}x_{12}+w_{22}x_{11} & w_{11}x_{23}+w_{12}x_{22}+w_{21}x_{13}+w_{22}x_{12} & w_{12}x_{23}+w_{22}x_{13}\\
w_{11}x_{31}+w_{21}x_{21} & w_{11}x_{32}+w_{12}x_{31}+w_{21}x_{22}+w_{22}x_{21} & w_{11}x_{33}+w_{12}x_{32}+w_{21}x_{23}+w_{22}x_{22} & w_{12}x_{33}+w_{22}x_{23}\\
w_{21}x_{31} & w_{21}x_{32}+w_{22}x_{31} & w_{21}x_{33}+w_{22}x_{32} & w_{22}x_{33}
\end{pmatrix}
$$
A concrete example:
[Figure]

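Comparing $\mathrm{rot180}(W)\,\widetilde{\otimes}\,X$ with $\mathrm{rot180}(X)\,\widetilde{\otimes}\,W$ entry by entry shows that the two results are equal. The same holds for larger $W$ and $X$: with 1-based indices, entry $(i,j)$ of both results is $\sum_{s+u=i+1,\;t+v=j+1} w_{uv}\,x_{st}$, i.e. the full 2-D convolution of $W$ and $X$, which is symmetric in its two arguments.

[Figure]
[Figure]

[Figure]
Hence wide convolution is commutative.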

See also: Wide convolution is commutative (宽卷积具有交换性).
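As a quick numerical sanity check of the identity (not a proof), here is a minimal NumPy sketch. It assumes, matching the element-wise formulas above, that ⊗ is computed as a cross-correlation and that the wide version zero-pads the second operand by K−1 on every side; the helper names `corr2d_valid`, `wide_corr2d` and `rot180` are my own.

```python
import numpy as np

def corr2d_valid(k, x):
    """'Valid' cross-correlation: slide kernel k over x with no padding."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(k * x[i:i + kh, j:j + kw])
    return out

def wide_corr2d(k, x):
    """Wide version: zero-pad x by (K-1) on every side, then valid cross-correlation."""
    kh, kw = k.shape
    x_pad = np.pad(x, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    return corr2d_valid(k, x_pad)

def rot180(a):
    """Rotate a 2-D array by 180 degrees (flip both axes)."""
    return np.rot90(a, 2)

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 2))
X = rng.standard_normal((3, 3))

lhs = wide_corr2d(rot180(W), X)   # X is padded to 5x5
rhs = wide_corr2d(rot180(X), W)   # W is padded to 6x6
print(lhs.shape)                  # (4, 4)
print(np.allclose(lhs, rhs))      # True
```

Any random sizes work the same way; for example, entry `[2, 2]` of `lhs` (0-indexed) equals w22·x22 + w21·x23 + w12·x32 + w11·x33.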

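2. A convolutional layer takes a 100 × 100 × 256 feature-map group as input, uses 3 × 3 kernels, and outputs a 100 × 100 × 256 feature-map group. Compute its time and space complexity. If a 1 × 1 convolution is inserted first to produce a 100 × 100 × 64 feature map, followed by the 3 × 3 convolution that produces the 100 × 100 × 256 feature-map group, what are the time and space complexity?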

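  • Time complexity: the number of floating-point operations, i.e. the amount of computation.

    Formula: $O_H\times O_W\times C_{out}\times K_H\times K_W\times C_{in}$ (strictly speaking, this counts multiply-accumulate operations; each one is two floating-point operations).

    Here $O_H, O_W$ are the height and width of the output feature map, $K_H, K_W$ are the height and width of the kernel, and $C_{in}, C_{out}$ are the numbers of input and output channels.
    Time complexity determines how long the model takes to train and to predict. If it is too high, training and prediction become slow, so you can neither validate ideas and improve the model quickly nor make fast predictions.

  • Space complexity: the number of model parameters plus the total size of the feature maps output by each layer.

    Formula: $K_H\times K_W\times C_{in}\times C_{out}+C_{out}\,(\text{bias})+O_H\times O_W\times C_{out}$

    The parameter part of the space complexity determines the model size. The more parameters a model has, the more data it needs for training, and since real-world datasets are usually not very large, such a model overfits more easily.

    Some of the articles I read include the output feature maps when computing space complexity and some do not; I include them here.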

[Figure]
Case 1, a single 3 × 3 convolution:
Time complexity: $100\times100\times256\times3\times3\times256=5.89824\times10^9$
Space complexity: $3\times3\times256\times256+256+100\times100\times256=3.15008\times10^6$

[Figure]

Case 2, a 1 × 1 convolution followed by a 3 × 3 convolution:
Time complexity: $100\times100\times64\times1\times1\times256+100\times100\times256\times3\times3\times64=1.6384\times10^9$
Space complexity: $\left(1\times1\times256\times64+64+100\times100\times64\right)+\left(3\times3\times64\times256+256+100\times100\times256\right)=3.36416\times10^6$

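Comparing the two designs: inserting the 1 × 1 convolution cuts the time complexity from $5.89824\times10^9$ to $1.6384\times10^9$ (about 3.6 times less) and the number of parameters from 590,080 to 164,160. The total space complexity as defined above, however, rises slightly (from $3.15008\times10^6$ to $3.36416\times10^6$), because the extra 100 × 100 × 64 intermediate feature map (640,000 values) outweighs the parameters saved. Counting parameters only, the 1 × 1 convolution reduces both time and space complexity.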
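The arithmetic for both cases can be reproduced in a few lines. This is a sketch that simply encodes the two formulas used in this section (multiply-accumulates for time, parameters plus output feature maps for space); `conv_cost` is a hypothetical helper name.

```python
def conv_cost(h_out, w_out, k_h, k_w, c_in, c_out):
    """Return (multiply-accumulates, parameters incl. bias, output feature-map size) of one conv layer."""
    macs = h_out * w_out * c_out * k_h * k_w * c_in
    params = k_h * k_w * c_in * c_out + c_out
    fmap = h_out * w_out * c_out
    return macs, params, fmap

# Case 1: one 3x3 conv, 100x100x256 -> 100x100x256
plain = conv_cost(100, 100, 3, 3, 256, 256)

# Case 2: 1x1 conv 256 -> 64, then 3x3 conv 64 -> 256
reduce_1x1 = conv_cost(100, 100, 1, 1, 256, 64)
expand_3x3 = conv_cost(100, 100, 3, 3, 64, 256)
bottleneck = tuple(a + b for a, b in zip(reduce_1x1, expand_3x3))

for name, (macs, params, fmap) in [("3x3 only", plain), ("1x1 + 3x3", bottleneck)]:
    print(f"{name}: time={macs:.5e}, params={params}, space={params + fmap:.5e}")
```

It prints a time of 5.89824e+09 vs 1.63840e+09 and a space of 3.15008e+06 vs 3.36416e+06, with 590080 vs 164160 parameters; dropping the `fmap` term gives the parameter-only view.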

3. For a 2-D convolution with a 3 × 3 input and a 2 × 2 kernel, rewrite the convolution as an affine transformation. See Eq. (5.45).

[Figure]
The figure I drew above shows intuitively how the convolution is turned into an affine transformation. Flattening $X$ row by row into a 9-dimensional vector (and the 2 × 2 output $Z$ into a 4-dimensional one), the convolution becomes a multiplication by a sparse 4 × 9 matrix; adding a bias $b$ to every entry turns this linear map into an affine one.

$$
W=\begin{pmatrix} w_{11}&w_{12}\\ w_{21}&w_{22}\end{pmatrix},\qquad
X=\begin{pmatrix} x_{11}&x_{12}&x_{13}\\ x_{21}&x_{22}&x_{23}\\ x_{31}&x_{32}&x_{33}\end{pmatrix}
$$

$$
Z=W\otimes X=
\begin{bmatrix}
w_{11}&w_{12}&0&w_{21}&w_{22}&0&0&0&0\\
0&w_{11}&w_{12}&0&w_{21}&w_{22}&0&0&0\\
0&0&0&w_{11}&w_{12}&0&w_{21}&w_{22}&0\\
0&0&0&0&w_{11}&w_{12}&0&w_{21}&w_{22}
\end{bmatrix}
\begin{bmatrix} x_{11}\\ x_{12}\\ x_{13}\\ x_{21}\\ x_{22}\\ x_{23}\\ x_{31}\\ x_{32}\\ x_{33}\end{bmatrix}
$$
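The sparse matrix can be generated mechanically from any kernel. Below is a minimal sketch, assuming row-major flattening, stride 1, no padding and ⊗ computed as cross-correlation; `conv_as_matrix` and `corr2d_valid` are hypothetical names, and the numbers 1 to 4 and 1 to 9 merely stand in for w11..w22 and x11..x33.

```python
import numpy as np

def conv_as_matrix(w, in_h, in_w):
    """Build C so that C @ x.ravel() equals the valid cross-correlation of x with w."""
    kh, kw = w.shape
    out_h, out_w = in_h - kh + 1, in_w - kw + 1
    C = np.zeros((out_h * out_w, in_h * in_w))
    for i in range(out_h):
        for j in range(out_w):
            row = i * out_w + j                      # output y_{i+1, j+1}, row-major
            for u in range(kh):
                for v in range(kw):
                    C[row, (i + u) * in_w + (j + v)] = w[u, v]
    return C

def corr2d_valid(k, x):
    """Direct 'valid' cross-correlation, used only as a reference."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return np.array([[np.sum(k * x[i:i + kh, j:j + kw]) for j in range(ow)] for i in range(oh)])

W = np.array([[1.0, 2.0], [3.0, 4.0]])   # placeholder for [[w11, w12], [w21, w22]]
X = np.arange(1.0, 10.0).reshape(3, 3)    # placeholder for x11..x33

C = conv_as_matrix(W, 3, 3)
print(C)                                   # 4x9, same zero pattern as the matrix above
print(np.allclose(C @ X.ravel(), corr2d_valid(W, X).ravel()))   # True
```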

4. Read "5.3.1 Backpropagation in Convolutional Neural Networks" and illustrate the derivation with an example.

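Overall structure of a convolutional network:

[Figure]
First, the forward pass: the input goes through convolution, activation, pooling and fully connected layers, and the loss $f(Y)$ is computed.
[Figure]
Then comes the backward pass. It starts with the fully connected layers, whose backpropagation I derived in an earlier post: Derivation of backpropagation for fully connected layers (全连接层反向传播的推导).
Next is the pooling layer. Backpropagation through a pooling (downsampling) layer is simple: it is essentially an upsampling step. For max pooling, each gradient is passed back only to the position that held the maximum; for average pooling, it is split evenly over the pooling window.

[Figure]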
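The "upsampling" view of pooling backpropagation can be made concrete. Here is a minimal sketch, assuming non-overlapping 2 × 2 pooling windows (stride equal to the window size) and input sides divisible by the window size; `maxpool_backward` and `avgpool_backward` are hypothetical names.

```python
import numpy as np

def maxpool_backward(x, grad_out, k=2):
    """Route each output gradient to the position of the max in its k x k window."""
    grad_in = np.zeros_like(x)
    for i in range(grad_out.shape[0]):
        for j in range(grad_out.shape[1]):
            win = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            u, v = np.unravel_index(np.argmax(win), win.shape)
            grad_in[i * k + u, j * k + v] = grad_out[i, j]
    return grad_in

def avgpool_backward(grad_out, k=2):
    """Spread each output gradient evenly over its k x k window (upsample, then divide by k*k)."""
    return np.kron(grad_out, np.ones((k, k))) / (k * k)

x = np.array([[1., 5., 2., 0.],
              [3., 4., 8., 1.],
              [0., 2., 6., 7.],
              [9., 1., 3., 4.]])
g = np.array([[1., 2.],
              [3., 4.]])          # gradient w.r.t. the 2x2 pooled output

print(maxpool_backward(x, g))     # nonzero only where 5, 8, 9 and 7 sit
print(avgpool_backward(g))        # each value of g spread as g/4 over its window
```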

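Next comes backpropagation through the convolutional layer.

Note: the example below has no activation function. If there is one, each gradient must additionally be multiplied (element-wise) by the derivative of the activation function.

[Figure]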
Forward pass (stride 1, no padding):

$$
\begin{aligned}
y_{11}&=w_{11}x_{11}+w_{12}x_{12}+w_{21}x_{21}+w_{22}x_{22}+b\\
y_{12}&=w_{11}x_{12}+w_{12}x_{13}+w_{21}x_{22}+w_{22}x_{23}+b\\
y_{21}&=w_{11}x_{21}+w_{12}x_{22}+w_{21}x_{31}+w_{22}x_{32}+b\\
y_{22}&=w_{11}x_{22}+w_{12}x_{23}+w_{21}x_{32}+w_{22}x_{33}+b
\end{aligned}
$$

Matrix form:

$$
\begin{bmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22}\end{bmatrix}
=
\begin{bmatrix}
w_{11}&w_{12}&0&w_{21}&w_{22}&0&0&0&0\\
0&w_{11}&w_{12}&0&w_{21}&w_{22}&0&0&0\\
0&0&0&w_{11}&w_{12}&0&w_{21}&w_{22}&0\\
0&0&0&0&w_{11}&w_{12}&0&w_{21}&w_{22}
\end{bmatrix}
\begin{bmatrix} x_{11}\\ x_{12}\\ x_{13}\\ x_{21}\\ x_{22}\\ x_{23}\\ x_{31}\\ x_{32}\\ x_{33}\end{bmatrix}
+\begin{bmatrix} b\\ b\\ b\\ b\end{bmatrix}
$$

Partial derivatives of $f(Y)$ with respect to $W$:

$$
\begin{aligned}
\frac{\partial f(Y)}{\partial w_{11}}&=\frac{\partial f(Y)}{\partial y_{11}}x_{11}+\frac{\partial f(Y)}{\partial y_{12}}x_{12}+\frac{\partial f(Y)}{\partial y_{21}}x_{21}+\frac{\partial f(Y)}{\partial y_{22}}x_{22}\\
\frac{\partial f(Y)}{\partial w_{12}}&=\frac{\partial f(Y)}{\partial y_{11}}x_{12}+\frac{\partial f(Y)}{\partial y_{12}}x_{13}+\frac{\partial f(Y)}{\partial y_{21}}x_{22}+\frac{\partial f(Y)}{\partial y_{22}}x_{23}\\
\frac{\partial f(Y)}{\partial w_{21}}&=\frac{\partial f(Y)}{\partial y_{11}}x_{21}+\frac{\partial f(Y)}{\partial y_{12}}x_{22}+\frac{\partial f(Y)}{\partial y_{21}}x_{31}+\frac{\partial f(Y)}{\partial y_{22}}x_{32}\\
\frac{\partial f(Y)}{\partial w_{22}}&=\frac{\partial f(Y)}{\partial y_{11}}x_{22}+\frac{\partial f(Y)}{\partial y_{12}}x_{23}+\frac{\partial f(Y)}{\partial y_{21}}x_{32}+\frac{\partial f(Y)}{\partial y_{22}}x_{33}
\end{aligned}
$$

Matrix form:

$$
\begin{bmatrix}
\frac{\partial f(Y)}{\partial w_{11}}\\ \frac{\partial f(Y)}{\partial w_{12}}\\ \frac{\partial f(Y)}{\partial w_{21}}\\ \frac{\partial f(Y)}{\partial w_{22}}
\end{bmatrix}
=
\begin{bmatrix}
x_{11}&x_{12}&x_{21}&x_{22}\\
x_{12}&x_{13}&x_{22}&x_{23}\\
x_{21}&x_{22}&x_{31}&x_{32}\\
x_{22}&x_{23}&x_{32}&x_{33}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial f(Y)}{\partial y_{11}}\\ \frac{\partial f(Y)}{\partial y_{12}}\\ \frac{\partial f(Y)}{\partial y_{21}}\\ \frac{\partial f(Y)}{\partial y_{22}}
\end{bmatrix}
$$

Convolution form:

$$
\begin{bmatrix}
\frac{\partial f(Y)}{\partial w_{11}}&\frac{\partial f(Y)}{\partial w_{12}}\\
\frac{\partial f(Y)}{\partial w_{21}}&\frac{\partial f(Y)}{\partial w_{22}}
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial f(Y)}{\partial y_{11}}&\frac{\partial f(Y)}{\partial y_{12}}\\
\frac{\partial f(Y)}{\partial y_{21}}&\frac{\partial f(Y)}{\partial y_{22}}
\end{bmatrix}
\otimes
\begin{bmatrix}
x_{11}&x_{12}&x_{13}\\ x_{21}&x_{22}&x_{23}\\ x_{31}&x_{32}&x_{33}
\end{bmatrix}
$$

That is,

$$
\frac{\partial f(Y)}{\partial W}=\frac{\partial f(Y)}{\partial Y}\otimes X
$$

Partial derivatives of $f(Y)$ with respect to $X$:

$$
\begin{aligned}
\frac{\partial f(Y)}{\partial x_{11}}&=\frac{\partial f(Y)}{\partial y_{11}}w_{11}\\
\frac{\partial f(Y)}{\partial x_{12}}&=\frac{\partial f(Y)}{\partial y_{11}}w_{12}+\frac{\partial f(Y)}{\partial y_{12}}w_{11}\\
\frac{\partial f(Y)}{\partial x_{13}}&=\frac{\partial f(Y)}{\partial y_{12}}w_{12}\\
\frac{\partial f(Y)}{\partial x_{21}}&=\frac{\partial f(Y)}{\partial y_{11}}w_{21}+\frac{\partial f(Y)}{\partial y_{21}}w_{11}\\
\frac{\partial f(Y)}{\partial x_{22}}&=\frac{\partial f(Y)}{\partial y_{11}}w_{22}+\frac{\partial f(Y)}{\partial y_{12}}w_{21}+\frac{\partial f(Y)}{\partial y_{21}}w_{12}+\frac{\partial f(Y)}{\partial y_{22}}w_{11}\\
\frac{\partial f(Y)}{\partial x_{23}}&=\frac{\partial f(Y)}{\partial y_{12}}w_{22}+\frac{\partial f(Y)}{\partial y_{22}}w_{12}\\
\frac{\partial f(Y)}{\partial x_{31}}&=\frac{\partial f(Y)}{\partial y_{21}}w_{21}\\
\frac{\partial f(Y)}{\partial x_{32}}&=\frac{\partial f(Y)}{\partial y_{21}}w_{22}+\frac{\partial f(Y)}{\partial y_{22}}w_{21}\\
\frac{\partial f(Y)}{\partial x_{33}}&=\frac{\partial f(Y)}{\partial y_{22}}w_{22}
\end{aligned}
$$

Matrix form:

$$
\begin{bmatrix}
\frac{\partial f(Y)}{\partial x_{11}}\\ \frac{\partial f(Y)}{\partial x_{12}}\\ \frac{\partial f(Y)}{\partial x_{13}}\\
\frac{\partial f(Y)}{\partial x_{21}}\\ \frac{\partial f(Y)}{\partial x_{22}}\\ \frac{\partial f(Y)}{\partial x_{23}}\\
\frac{\partial f(Y)}{\partial x_{31}}\\ \frac{\partial f(Y)}{\partial x_{32}}\\ \frac{\partial f(Y)}{\partial x_{33}}
\end{bmatrix}
=
\begin{bmatrix}
w_{11}&0&0&0\\
w_{12}&w_{11}&0&0\\
0&w_{12}&0&0\\
w_{21}&0&w_{11}&0\\
w_{22}&w_{21}&w_{12}&w_{11}\\
0&w_{22}&0&w_{12}\\
0&0&w_{21}&0\\
0&0&w_{22}&w_{21}\\
0&0&0&w_{22}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial f(Y)}{\partial y_{11}}\\ \frac{\partial f(Y)}{\partial y_{12}}\\ \frac{\partial f(Y)}{\partial y_{21}}\\ \frac{\partial f(Y)}{\partial y_{22}}
\end{bmatrix}
$$

Convolution form:

$$
\begin{bmatrix}
\frac{\partial f(Y)}{\partial x_{11}}&\frac{\partial f(Y)}{\partial x_{12}}&\frac{\partial f(Y)}{\partial x_{13}}\\
\frac{\partial f(Y)}{\partial x_{21}}&\frac{\partial f(Y)}{\partial x_{22}}&\frac{\partial f(Y)}{\partial x_{23}}\\
\frac{\partial f(Y)}{\partial x_{31}}&\frac{\partial f(Y)}{\partial x_{32}}&\frac{\partial f(Y)}{\partial x_{33}}
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial f(Y)}{\partial y_{22}}&\frac{\partial f(Y)}{\partial y_{21}}\\
\frac{\partial f(Y)}{\partial y_{12}}&\frac{\partial f(Y)}{\partial y_{11}}
\end{bmatrix}
\otimes
\begin{bmatrix}
0&0&0&0\\ 0&w_{11}&w_{12}&0\\ 0&w_{21}&w_{22}&0\\ 0&0&0&0
\end{bmatrix}
=
\begin{bmatrix} w_{22}&w_{21}\\ w_{12}&w_{11}\end{bmatrix}
\otimes
\begin{bmatrix}
0&0&0&0\\
0&\frac{\partial f(Y)}{\partial y_{11}}&\frac{\partial f(Y)}{\partial y_{12}}&0\\
0&\frac{\partial f(Y)}{\partial y_{21}}&\frac{\partial f(Y)}{\partial y_{22}}&0\\
0&0&0&0
\end{bmatrix}
$$

That is,

$$
\frac{\partial f(Y)}{\partial X}=\mathrm{rot180}\!\left(\frac{\partial f(Y)}{\partial Y}\right)\widetilde{\otimes}\,W=\mathrm{rot180}(W)\,\widetilde{\otimes}\,\frac{\partial f(Y)}{\partial Y}
$$

where the second equality is the commutativity of wide convolution from Question 1.

Partial derivative of $f(Y)$ with respect to $b$:

$$
\frac{\partial f(Y)}{\partial b}=\frac{\partial f(Y)}{\partial y_{11}}+\frac{\partial f(Y)}{\partial y_{12}}+\frac{\partial f(Y)}{\partial y_{21}}+\frac{\partial f(Y)}{\partial y_{22}}
$$
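The three closed-form gradients can be checked against finite differences. This is a sketch under my own assumptions: a toy loss f(Y) = Σ G ⊙ Y with a random G, so that ∂f/∂Y = G exactly; no activation, as in the example; ⊗ computed as cross-correlation. All helper names are hypothetical.

```python
import numpy as np

def corr2d_valid(k, x):
    """'Valid' cross-correlation of x with kernel k."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return np.array([[np.sum(k * x[i:i + kh, j:j + kw]) for j in range(ow)] for i in range(oh)])

def wide_corr2d(k, x):
    """Zero-pad x by K-1 on each side, then valid cross-correlation."""
    kh, kw = k.shape
    return corr2d_valid(k, np.pad(x, ((kh - 1, kh - 1), (kw - 1, kw - 1))))

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 3))
W = rng.standard_normal((2, 2))
b = 0.5
G = rng.standard_normal((2, 2))        # plays the role of df/dY

def f(X, W, b):
    """Toy loss f(Y) = sum(G * Y) with Y = W (x) X + b, so that df/dY = G."""
    return np.sum(G * (corr2d_valid(W, X) + b))

# Closed-form gradients from the derivation above
dW = corr2d_valid(G, X)                 # df/dW = df/dY (x) X
dX = wide_corr2d(np.rot90(W, 2), G)     # df/dX = rot180(W) (wide) df/dY
db = G.sum()                            # df/db = sum of df/dY

def numeric_grad(fun, A, eps=1e-6):
    """Central differences of fun() w.r.t. each entry of A (perturbed in place, then restored)."""
    grad = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        old = A[idx]
        A[idx] = old + eps
        plus = fun()
        A[idx] = old - eps
        minus = fun()
        A[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

print(np.allclose(dW, numeric_grad(lambda: f(X, W, b), W)))              # True
print(np.allclose(dX, numeric_grad(lambda: f(X, W, b), X)))              # True
print(np.isclose(db, (f(X, W, b + 1e-6) - f(X, W, b - 1e-6)) / 2e-6))    # True
```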

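5. Ignoring the activation function, show that the forward computation and the backpropagation (Eq. (5.39)) of a convolutional layer are related by a transpose.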

Using the formulas from the derivation above:
Forward pass:

$$
\begin{bmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22}\end{bmatrix}
=
\begin{bmatrix}
w_{11}&w_{12}&0&w_{21}&w_{22}&0&0&0&0\\
0&w_{11}&w_{12}&0&w_{21}&w_{22}&0&0&0\\
0&0&0&w_{11}&w_{12}&0&w_{21}&w_{22}&0\\
0&0&0&0&w_{11}&w_{12}&0&w_{21}&w_{22}
\end{bmatrix}
\begin{bmatrix} x_{11}\\ x_{12}\\ x_{13}\\ x_{21}\\ x_{22}\\ x_{23}\\ x_{31}\\ x_{32}\\ x_{33}\end{bmatrix}
+\begin{bmatrix} b\\ b\\ b\\ b\end{bmatrix}
$$

Backward pass:

$$
\begin{bmatrix}
\frac{\partial f(Y)}{\partial x_{11}}\\ \frac{\partial f(Y)}{\partial x_{12}}\\ \frac{\partial f(Y)}{\partial x_{13}}\\
\frac{\partial f(Y)}{\partial x_{21}}\\ \frac{\partial f(Y)}{\partial x_{22}}\\ \frac{\partial f(Y)}{\partial x_{23}}\\
\frac{\partial f(Y)}{\partial x_{31}}\\ \frac{\partial f(Y)}{\partial x_{32}}\\ \frac{\partial f(Y)}{\partial x_{33}}
\end{bmatrix}
=
\begin{bmatrix}
w_{11}&0&0&0\\
w_{12}&w_{11}&0&0\\
0&w_{12}&0&0\\
w_{21}&0&w_{11}&0\\
w_{22}&w_{21}&w_{12}&w_{11}\\
0&w_{22}&0&w_{12}\\
0&0&w_{21}&0\\
0&0&w_{22}&w_{21}\\
0&0&0&w_{22}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial f(Y)}{\partial y_{11}}\\ \frac{\partial f(Y)}{\partial y_{12}}\\ \frac{\partial f(Y)}{\partial y_{21}}\\ \frac{\partial f(Y)}{\partial y_{22}}
\end{bmatrix}
$$

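So the forward computation and the backpropagation of a convolutional layer are related by a transpose: the 9 × 4 matrix in the backward pass is exactly the transpose of the 4 × 9 matrix in the forward pass.
In the forward pass, the net input of layer $l+1$ is $z^{l+1}=W^{l+1}z^l$, where $W^{l+1}$ denotes the sparse matrix form of the convolution.
In the backward pass, the error term of layer $l$ is $\delta^l=\left(W^{l+1}\right)^{\mathsf T}\delta^{l+1}$.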
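The transpose relationship can also be checked in code: build the forward matrix from W, multiply the layer-(l+1) error by its transpose, and compare with rot180(W) ⊗~ δ. A sketch assuming row-major flattening, stride 1, cross-correlation, and no activation or bias; the helper names are my own.

```python
import numpy as np

def conv_as_matrix(w, in_h, in_w):
    """Sparse matrix C with C @ x.ravel() == valid cross-correlation of x with w (row-major)."""
    kh, kw = w.shape
    out_h, out_w = in_h - kh + 1, in_w - kw + 1
    C = np.zeros((out_h * out_w, in_h * in_w))
    for i in range(out_h):
        for j in range(out_w):
            for u in range(kh):
                for v in range(kw):
                    C[i * out_w + j, (i + u) * in_w + (j + v)] = w[u, v]
    return C

def wide_corr2d(k, x):
    """Zero-pad x by K-1 on each side, then valid cross-correlation with k."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    oh, ow = xp.shape[0] - kh + 1, xp.shape[1] - kw + 1
    return np.array([[np.sum(k * xp[i:i + kh, j:j + kw]) for j in range(ow)] for i in range(oh)])

rng = np.random.default_rng(2)
W = rng.standard_normal((2, 2))
delta_next = rng.standard_normal((2, 2))   # error term of layer l+1 (same shape as Y)

C = conv_as_matrix(W, 3, 3)                 # forward:  z^{l+1} = C z^l
delta_matrix = C.T @ delta_next.ravel()     # backward: delta^l = C^T delta^{l+1}
delta_conv = wide_corr2d(np.rot90(W, 2), delta_next).ravel()   # rot180(W) (wide) delta^{l+1}

print(np.allclose(delta_matrix, delta_conv))   # True
```

This is also the idea behind transposed convolution: multiplying by $C^{\mathsf T}$ maps a 2 × 2 map back to a 3 × 3 one.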

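6. In a dilated convolution with kernel size $K$ and dilation rate $D$, how should the zero padding $P$ be set so that the convolution is an equal-width convolution?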

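Dilated (atrous) convolution: note that the holes are inserted into the kernel, which enlarges the receptive field without increasing the number of parameters.
[Figure]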
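What "inserting holes into the kernel" means can be shown explicitly. The sketch below materialises a dilated kernel by placing D−1 zeros between neighbouring taps; it is only an illustration (frameworks expose dilation as a layer argument instead, e.g. `dilation` in PyTorch's `nn.Conv2d`), and `dilate_kernel` is a hypothetical name.

```python
import numpy as np

def dilate_kernel(w, d):
    """Insert (d-1) zeros between neighbouring taps of kernel w (dilation rate d)."""
    k_h, k_w = w.shape
    out = np.zeros((d * (k_h - 1) + 1, d * (k_w - 1) + 1))
    out[::d, ::d] = w          # original taps land on every d-th row and column
    return out

w = np.arange(1.0, 10.0).reshape(3, 3)   # a 3x3 kernel with 9 parameters
w_d2 = dilate_kernel(w, 2)

print(w_d2.shape)                  # (5, 5): effective size D*(K-1)+1
print(np.count_nonzero(w_d2))      # 9: the number of parameters is unchanged
```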

First, look at the formula below for computing the output size from the input size, kernel size, padding and stride.

[Figure]
[Figure]
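Putting the two pieces together answers the question. A short derivation, assuming stride $S=1$, the same padding $P$ on both sides, input width $M$, and writing $K'$ for the effective size of the dilated kernel (notation mine):

$$
\begin{aligned}
K' &= K + (K-1)(D-1) = D(K-1) + 1\\
M' &= \left\lfloor \frac{M + 2P - K'}{S} \right\rfloor + 1 = M + 2P - D(K-1) \qquad (S = 1)\\
M' = M \;&\Longrightarrow\; P = \frac{D(K-1)}{2}
\end{aligned}
$$

For example, $K=3$ and $D=2$ give $K'=5$ and $P=2$. $P$ is an integer whenever $D(K-1)$ is even (in particular for any odd $K$); otherwise an equal-width output requires asymmetric padding.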

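Summary:
The biggest takeaway from this assignment was deriving the convolutional-layer formulas by hand, which gave me a much deeper understanding of backpropagation through convolutional layers. I also learned that wide convolution is commutative, how to compute the time and space complexity of a convolutional layer, that the forward and backward passes of a convolutional layer are related by a "transpose" (note: not an inverse, only a transpose in form), and how dilated convolution works.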

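Reminder: always save promptly after drawing figures or editing the article and formulas! I didn't save my convolution backpropagation formulas in time and had to derive them all over again!!! On the bright side, it made them stick.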

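References:
Transposed convolution explained in depth: understanding how transposed convolution works (深度解析转置卷积,理解转置卷积的原理)
Wide convolution is commutative (宽卷积具有交换性)
Backpropagation algorithm for convolutional neural networks (CNN) (卷积神经网络(CNN)反向传播算法), by 刘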
