Convolution, derivatives, backpropagation
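- 1. Prove that wide convolution is commutative, i.e. $\mathrm{rot180}(W)\,\widetilde{\otimes}\,X = \mathrm{rot180}(X)\,\widetilde{\otimes}\,W$.
- 2. For a convolutional layer whose input is a 100 × 100 × 256 group of feature maps and which uses 3 × 3 kernels to produce a 100 × 100 × 256 group of feature maps, find the time and space complexity. If a 1 × 1 convolution is introduced first to obtain 100 × 100 × 64 feature maps, followed by a 3 × 3 convolution that produces 100 × 100 × 256 feature maps, find the time and space complexity.
- 3. For a two-dimensional convolution with a 3 × 3 input and a 2 × 2 kernel, rewrite the convolution as an affine transformation. See Eq. (5.45).
- 4. Read "5.3.1 Backpropagation in Convolutional Neural Networks" and illustrate the derivation with an example.
- 5. Ignoring the activation function, show that the forward computation and backpropagation (Eq. (5.39)) of a convolutional layer are related by a transpose.
- 6. In dilated convolution, with kernel size $K$ and dilation rate $D$, how should the zero padding $P$ be set so that the convolution is an equal-width convolution?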
1. Prove that wide convolution is commutative, i.e. $\mathrm{rot180}(W)\,\widetilde{\otimes}\,X = \mathrm{rot180}(X)\,\widetilde{\otimes}\,W$.
$$
W=\begin{pmatrix} w_{11} & w_{12}\\ w_{21} & w_{22} \end{pmatrix}
\qquad
X=\begin{pmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23}\\ x_{31} & x_{32} & x_{33} \end{pmatrix}
$$
$$
\mathrm{rot180}(W)=\begin{pmatrix} w_{22} & w_{21}\\ w_{12} & w_{11} \end{pmatrix}
\qquad
\mathrm{rot180}(X)=\begin{pmatrix} x_{33} & x_{32} & x_{31}\\ x_{23} & x_{22} & x_{21}\\ x_{13} & x_{12} & x_{11} \end{pmatrix}
$$
$$
\widetilde{W}=\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & w_{11} & w_{12} & 0 & 0\\
0 & 0 & w_{21} & w_{22} & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
\qquad
\widetilde{X}=\begin{pmatrix}
0 & 0 & 0 & 0 & 0\\
0 & x_{11} & x_{12} & x_{13} & 0\\
0 & x_{21} & x_{22} & x_{23} & 0\\
0 & x_{31} & x_{32} & x_{33} & 0\\
0 & 0 & 0 & 0 & 0
\end{pmatrix}
$$
$$
\mathrm{rot180}(W)\,\widetilde{\otimes}\,X
=\mathrm{rot180}(W)\otimes\widetilde{X}
=\begin{pmatrix} w_{22} & w_{21}\\ w_{12} & w_{11} \end{pmatrix}
\otimes
\begin{pmatrix}
0 & 0 & 0 & 0 & 0\\
0 & x_{11} & x_{12} & x_{13} & 0\\
0 & x_{21} & x_{22} & x_{23} & 0\\
0 & x_{31} & x_{32} & x_{33} & 0\\
0 & 0 & 0 & 0 & 0
\end{pmatrix}
$$
$$
=\begin{pmatrix}
w_{11}x_{11} & w_{12}x_{11}+w_{11}x_{12} & w_{12}x_{12}+w_{11}x_{13} & w_{12}x_{13}\\
w_{21}x_{11}+w_{11}x_{21} & w_{22}x_{11}+w_{21}x_{12}+w_{12}x_{21}+w_{11}x_{22} & w_{22}x_{12}+w_{21}x_{13}+w_{12}x_{22}+w_{11}x_{23} & w_{22}x_{13}+w_{12}x_{23}\\
w_{21}x_{21}+w_{11}x_{31} & w_{22}x_{21}+w_{21}x_{22}+w_{12}x_{31}+w_{11}x_{32} & w_{22}x_{22}+w_{21}x_{23}+w_{12}x_{32}+w_{11}x_{33} & w_{22}x_{23}+w_{12}x_{33}\\
w_{21}x_{31} & w_{22}x_{31}+w_{21}x_{32} & w_{22}x_{32}+w_{21}x_{33} & w_{22}x_{33}
\end{pmatrix}
$$
$$
\mathrm{rot180}(X)\,\widetilde{\otimes}\,W
=\mathrm{rot180}(X)\otimes\widetilde{W}
=\begin{pmatrix} x_{33} & x_{32} & x_{31}\\ x_{23} & x_{22} & x_{21}\\ x_{13} & x_{12} & x_{11} \end{pmatrix}
\otimes
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & w_{11} & w_{12} & 0 & 0\\
0 & 0 & w_{21} & w_{22} & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
$$
$$
=\begin{pmatrix}
w_{11}x_{11} & w_{11}x_{12}+w_{12}x_{11} & w_{11}x_{13}+w_{12}x_{12} & w_{12}x_{13}\\
w_{11}x_{21}+w_{21}x_{11} & w_{11}x_{22}+w_{12}x_{21}+w_{21}x_{12}+w_{22}x_{11} & w_{11}x_{23}+w_{12}x_{22}+w_{21}x_{13}+w_{22}x_{12} & w_{12}x_{23}+w_{22}x_{13}\\
w_{11}x_{31}+w_{21}x_{21} & w_{11}x_{32}+w_{12}x_{31}+w_{21}x_{22}+w_{22}x_{21} & w_{11}x_{33}+w_{12}x_{32}+w_{21}x_{23}+w_{22}x_{22} & w_{12}x_{33}+w_{22}x_{23}\\
w_{21}x_{31} & w_{21}x_{32}+w_{22}x_{31} & w_{21}x_{33}+w_{22}x_{32} & w_{22}x_{33}
\end{pmatrix}
$$
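The computation above is a concrete example with a 2 × 2 $W$ and a 3 × 3 $X$. Comparing the results of $\mathrm{rot180}(W)\,\widetilde{\otimes}\,X$ and $\mathrm{rot180}(X)\,\widetilde{\otimes}\,W$ entry by entry shows that the two are equal. The same argument extends to a larger $W$ and a larger $X$.
Hence wide convolution is commutative.
See also: "Wide convolution is commutative" (宽卷积具有交换性).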
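The identity can also be checked numerically. The following is a minimal NumPy sketch, not part of the original derivation: the helper names `xcorr2d`, `wide_xcorr2d`, and `rot180` are made up for this illustration, and $\otimes$ is assumed to mean cross-correlation without kernel flipping, as in the worked example above.

```python
import numpy as np

def xcorr2d(kernel, image):
    """Valid (unpadded) 2-D cross-correlation, standing in for the ⊗ operator."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(kernel * image[i:i + kh, j:j + kw])
    return out

def wide_xcorr2d(kernel, image):
    """Wide version: zero-pad the image by (kernel size - 1) on every side first."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    return xcorr2d(kernel, padded)

def rot180(a):
    """Rotate a 2-D array by 180 degrees."""
    return np.rot90(a, 2)

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 2))
X = rng.standard_normal((3, 3))

lhs = wide_xcorr2d(rot180(W), X)  # rot180(W) wide-convolved with X
rhs = wide_xcorr2d(rot180(X), W)  # rot180(X) wide-convolved with W
print(lhs.shape, np.allclose(lhs, rhs))
```

Changing the shapes of `W` and `X` is a quick way to probe the claim that the result generalizes to larger sizes.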
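2. For a convolutional layer whose input is a 100 × 100 × 256 group of feature maps and which uses 3 × 3 kernels to produce a 100 × 100 × 256 group of feature maps, find the time and space complexity. If a 1 × 1 convolution is introduced first to obtain 100 × 100 × 64 feature maps, followed by a 3 × 3 convolution that produces 100 × 100 × 256 feature maps, find the time and space complexity.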
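- Time complexity is the number of floating-point operations, i.e. the amount of computation.
Formula: $O_H\times O_W\times C_{out}\times K_H\times K_W\times C_{in}$
where $O_H, O_W$ are the height and width of the output feature map, $K_H, K_W$ are the height and width of the kernel, and $C_{in}, C_{out}$ are the numbers of input and output channels.
Time complexity determines how long training and prediction take. If it is too high, both become slow, so ideas cannot be validated quickly, the model cannot be improved quickly, and fast prediction is out of reach.
- Space complexity is the number of model parameters plus the total size of the feature maps output by each layer.
Formula: $K_H\times K_W\times C_{in}\times C_{out}+C_{out}\,(\text{bias})+O_H\times O_W\times C_{out}$
Space complexity reflects the number of model parameters. The more parameters a model has, the more data is needed to train it; real-world datasets are usually not very large, so a large model overfits more easily.
In the articles I have read, some people include the output feature maps when computing space complexity and some do not. I include them here.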
Direct 3 × 3 convolution (256 → 256 channels):
Time complexity:
$$
100\times 100\times 256\times 3\times 3\times 256=5.89824\times 10^{9}
$$
Space complexity:
$$
3\times 3\times 256\times 256+256+100\times 100\times 256=3.15008\times 10^{6}
$$
With a 1 × 1 convolution first (256 → 64 → 256 channels):
Time complexity:
$$
100\times 100\times 64\times 1\times 1\times 256+100\times 100\times 256\times 3\times 3\times 64=1.6384\times 10^{9}
$$
Space complexity:
$$
\left( 1\times 1\times 256\times 64+64+100\times 100\times 64 \right) +\left( 3\times 3\times 64\times 256+256+100\times 100\times 256 \right) =3.36416\times 10^{6}
$$
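Comparing the figures before and after adding the $1\times 1$ convolution: the time complexity drops by about 72% (from $5.89824\times 10^{9}$ to $1.6384\times 10^{9}$), and the parameter count drops from 590,080 to 164,160. The space complexity as defined here, which also counts output feature maps, rises slightly (from $3.15008\times 10^{6}$ to $3.36416\times 10^{6}$) because of the extra intermediate $100\times 100\times 64$ feature map. So the $1\times 1$ convolution reduces the computation and the number of parameters, but not the feature-map storage.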
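The arithmetic above is easy to reproduce. The following is a small sketch using the two formulas from this section; the function name `conv_cost` and its return layout are hypothetical choices for this illustration.

```python
def conv_cost(out_h, out_w, c_in, c_out, k_h, k_w):
    """Return (time complexity, parameter count, output feature-map size) for one conv layer."""
    time = out_h * out_w * c_out * k_h * k_w * c_in
    params = k_h * k_w * c_in * c_out + c_out  # weights plus one bias per output channel
    fmap = out_h * out_w * c_out
    return time, params, fmap

# Direct 3x3 convolution, 256 -> 256 channels.
direct = conv_cost(100, 100, 256, 256, 3, 3)

# 1x1 convolution (256 -> 64) followed by 3x3 convolution (64 -> 256).
reduce_1x1 = conv_cost(100, 100, 256, 64, 1, 1)
conv_3x3 = conv_cost(100, 100, 64, 256, 3, 3)
bottleneck = tuple(a + b for a, b in zip(reduce_1x1, conv_3x3))

for name, (time, params, fmap) in [("direct", direct), ("bottleneck", bottleneck)]:
    print(f"{name}: time={time:.5e}, params={params}, space={params + fmap:.5e}")
```

Keeping the parameter count and the feature-map size as separate fields makes it visible which part of the space complexity shrinks and which part grows.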
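3. For a two-dimensional convolution with a 3 × 3 input and a 2 × 2 kernel, rewrite the convolution as an affine transformation. See Eq. (5.45).
The diagram I drew above shows intuitively how the convolution is turned into an affine transformation.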
$$
W=\begin{pmatrix} w_{11} & w_{12}\\ w_{21} & w_{22} \end{pmatrix}
\qquad
X=\begin{pmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23}\\ x_{31} & x_{32} & x_{33} \end{pmatrix}
$$
$$
Z=W\otimes X=
\begin{bmatrix}
w_{11} & w_{12} & 0 & w_{21} & w_{22} & 0 & 0 & 0 & 0\\
0 & w_{11} & w_{12} & 0 & w_{21} & w_{22} & 0 & 0 & 0\\
0 & 0 & 0 & w_{11} & w_{12} & 0 & w_{21} & w_{22} & 0\\
0 & 0 & 0 & 0 & w_{11} & w_{12} & 0 & w_{21} & w_{22}
\end{bmatrix}
\begin{bmatrix}
x_{11}\\ x_{12}\\ x_{13}\\ x_{21}\\ x_{22}\\ x_{23}\\ x_{31}\\ x_{32}\\ x_{33}
\end{bmatrix}
$$
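The sparse matrix above can be built programmatically and compared with a direct sliding-window computation. This is a sketch under the assumption that both $X$ and $Z$ are flattened row by row; `conv_as_matrix` is a hypothetical helper name, and the numeric values of `W` and `X` are arbitrary test data.

```python
import numpy as np

def conv_as_matrix(W, in_h, in_w):
    """Build C such that C @ X.ravel() equals the valid cross-correlation of W and X, flattened."""
    kh, kw = W.shape
    oh, ow = in_h - kh + 1, in_w - kw + 1
    C = np.zeros((oh * ow, in_h * in_w))
    for i in range(oh):              # output row
        for j in range(ow):          # output column
            for u in range(kh):      # kernel row
                for v in range(kw):  # kernel column
                    C[i * ow + j, (i + u) * in_w + (j + v)] = W[u, v]
    return C

W = np.array([[1.0, 2.0], [3.0, 4.0]])
X = np.arange(1.0, 10.0).reshape(3, 3)

C = conv_as_matrix(W, 3, 3)
z = C @ X.ravel()

# Direct sliding-window result for comparison.
direct = np.array([[np.sum(W * X[i:i + 2, j:j + 2]) for j in range(2)]
                   for i in range(2)])
print(C)
print(z, np.allclose(z, direct.ravel()))
```

With these values the first row of `C` is `[1, 2, 0, 3, 4, 0, 0, 0, 0]`, which mirrors the pattern $w_{11}, w_{12}, 0, w_{21}, w_{22}, 0, 0, 0, 0$ in the matrix above.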
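4. Read "5.3.1 Backpropagation in Convolutional Neural Networks" and illustrate the derivation with an example.
Overall structure of a convolutional network
First, the forward pass goes through convolution, activation, pooling, and the fully connected layers to produce the loss $f(Y)$.
Then comes the backward pass. It starts with the fully connected layers, which I derived earlier (see: Derivation of backpropagation for fully connected layers).
Next is the pooling layer. Backpropagation through a pooling (downsampling) layer is fairly simple: it is essentially an upsampling step.
After that comes backpropagation through the convolutional layer.
Note: the example below has no activation function. With an activation function, the gradient must also be multiplied by the derivative of the activation function.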
$$
y_{11}=w_{11}x_{11}+w_{12}x_{12}+w_{21}x_{21}+w_{22}x_{22}+b
$$
$$
y_{12}=w_{11}x_{12}+w_{12}x_{13}+w_{21}x_{22}+w_{22}x_{23}+b
$$
$$
y_{21}=w_{11}x_{21}+w_{12}x_{22}+w_{21}x_{31}+w_{22}x_{32}+b
$$
$$
y_{22}=w_{11}x_{22}+w_{12}x_{23}+w_{21}x_{32}+w_{22}x_{33}+b
$$
Matrix form:
$$
\begin{bmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22} \end{bmatrix}
=
\begin{bmatrix}
w_{11} & w_{12} & 0 & w_{21} & w_{22} & 0 & 0 & 0 & 0\\
0 & w_{11} & w_{12} & 0 & w_{21} & w_{22} & 0 & 0 & 0\\
0 & 0 & 0 & w_{11} & w_{12} & 0 & w_{21} & w_{22} & 0\\
0 & 0 & 0 & 0 & w_{11} & w_{12} & 0 & w_{21} & w_{22}
\end{bmatrix}
\begin{bmatrix}
x_{11}\\ x_{12}\\ x_{13}\\ x_{21}\\ x_{22}\\ x_{23}\\ x_{31}\\ x_{32}\\ x_{33}
\end{bmatrix}
+b\begin{bmatrix} 1\\ 1\\ 1\\ 1 \end{bmatrix}
$$
Partial derivatives of $f(Y)$ with respect to $W$:
$$
\frac{\partial f(Y)}{\partial w_{11}}=\frac{\partial f(Y)}{\partial y_{11}}x_{11}+\frac{\partial f(Y)}{\partial y_{12}}x_{12}+\frac{\partial f(Y)}{\partial y_{21}}x_{21}+\frac{\partial f(Y)}{\partial y_{22}}x_{22}
$$
$$
\frac{\partial f(Y)}{\partial w_{12}}=\frac{\partial f(Y)}{\partial y_{11}}x_{12}+\frac{\partial f(Y)}{\partial y_{12}}x_{13}+\frac{\partial f(Y)}{\partial y_{21}}x_{22}+\frac{\partial f(Y)}{\partial y_{22}}x_{23}
$$
$$
\frac{\partial f(Y)}{\partial w_{21}}=\frac{\partial f(Y)}{\partial y_{11}}x_{21}+\frac{\partial f(Y)}{\partial y_{12}}x_{22}+\frac{\partial f(Y)}{\partial y_{21}}x_{31}+\frac{\partial f(Y)}{\partial y_{22}}x_{32}
$$
$$
\frac{\partial f(Y)}{\partial w_{22}}=\frac{\partial f(Y)}{\partial y_{11}}x_{22}+\frac{\partial f(Y)}{\partial y_{12}}x_{23}+\frac{\partial f(Y)}{\partial y_{21}}x_{32}+\frac{\partial f(Y)}{\partial y_{22}}x_{33}
$$
Matrix form:
$$
\begin{bmatrix}
\frac{\partial f(Y)}{\partial w_{11}}\\
\frac{\partial f(Y)}{\partial w_{12}}\\
\frac{\partial f(Y)}{\partial w_{21}}\\
\frac{\partial f(Y)}{\partial w_{22}}
\end{bmatrix}
=
\begin{bmatrix}
x_{11} & x_{12} & x_{21} & x_{22}\\
x_{12} & x_{13} & x_{22} & x_{23}\\
x_{21} & x_{22} & x_{31} & x_{32}\\
x_{22} & x_{23} & x_{32} & x_{33}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial f(Y)}{\partial y_{11}}\\
\frac{\partial f(Y)}{\partial y_{12}}\\
\frac{\partial f(Y)}{\partial y_{21}}\\
\frac{\partial f(Y)}{\partial y_{22}}
\end{bmatrix}
$$
Convolution form:
$$
\begin{bmatrix}
\frac{\partial f(Y)}{\partial w_{11}} & \frac{\partial f(Y)}{\partial w_{12}}\\
\frac{\partial f(Y)}{\partial w_{21}} & \frac{\partial f(Y)}{\partial w_{22}}
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial f(Y)}{\partial y_{11}} & \frac{\partial f(Y)}{\partial y_{12}}\\
\frac{\partial f(Y)}{\partial y_{21}} & \frac{\partial f(Y)}{\partial y_{22}}
\end{bmatrix}
\otimes
\begin{bmatrix}
x_{11} & x_{12} & x_{13}\\
x_{21} & x_{22} & x_{23}\\
x_{31} & x_{32} & x_{33}
\end{bmatrix}
$$
That is:
$$
\frac{\partial f(Y)}{\partial W}=\frac{\partial f(Y)}{\partial Y}\otimes X
$$
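The weight-gradient formula can be sanity-checked against finite differences. The sketch below assumes a made-up loss $f(Y)=\tfrac12\sum_{ij} y_{ij}^2$, chosen only because its derivative with respect to $Y$ is $Y$ itself; the helper names `xcorr2d` and `loss` are hypothetical and not from the original.

```python
import numpy as np

def xcorr2d(kernel, image):
    """Valid 2-D cross-correlation, standing in for the ⊗ operator."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    return np.array([[np.sum(kernel * image[i:i + kh, j:j + kw])
                      for j in range(ow)] for i in range(oh)])

def loss(W, X, b):
    """Assumed loss f(Y) = 0.5 * sum(Y**2) with Y = W ⊗ X + b, so df/dY = Y."""
    Y = xcorr2d(W, X) + b
    return 0.5 * np.sum(Y ** 2)

rng = np.random.default_rng(1)
W = rng.standard_normal((2, 2))
X = rng.standard_normal((3, 3))
b = 0.3

dY = xcorr2d(W, X) + b   # df/dY for this particular loss
grad_W = xcorr2d(dY, X)  # the formula df/dW = df/dY ⊗ X

# Central finite differences, perturbing one weight at a time.
eps = 1e-6
numeric = np.zeros_like(W)
for u in range(2):
    for v in range(2):
        step = np.zeros_like(W)
        step[u, v] = eps
        numeric[u, v] = (loss(W + step, X, b) - loss(W - step, X, b)) / (2 * eps)

print(np.allclose(grad_W, numeric, atol=1e-5))
```

Any differentiable loss would do here; only the line computing `dY` depends on the choice.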
Partial derivatives of $f(Y)$ with respect to $X$:
$$
\frac{\partial f(Y)}{\partial x_{11}}=\frac{\partial f(Y)}{\partial y_{11}}w_{11}
$$
$$
\frac{\partial f(Y)}{\partial x_{12}}=\frac{\partial f(Y)}{\partial y_{11}}w_{12}+\frac{\partial f(Y)}{\partial y_{12}}w_{11}
$$
$$
\frac{\partial f(Y)}{\partial x_{13}}=\frac{\partial f(Y)}{\partial y_{12}}w_{12}
$$
$$
\frac{\partial f(Y)}{\partial x_{21}}=\frac{\partial f(Y)}{\partial y_{11}}w_{21}+\frac{\partial f(Y)}{\partial y_{21}}w_{11}
$$
$$
\frac{\partial f(Y)}{\partial x_{22}}=\frac{\partial f(Y)}{\partial y_{11}}w_{22}+\frac{\partial f(Y)}{\partial y_{12}}w_{21}+\frac{\partial f(Y)}{\partial y_{21}}w_{12}+\frac{\partial f(Y)}{\partial y_{22}}w_{11}
$$
$$
\frac{\partial f(Y)}{\partial x_{23}}=\frac{\partial f(Y)}{\partial y_{12}}w_{22}+\frac{\partial f(Y)}{\partial y_{22}}w_{12}
$$
$$
\frac{\partial f(Y)}{\partial x_{31}}=\frac{\partial f(Y)}{\partial y_{21}}w_{21}
$$
$$
\frac{\partial f(Y)}{\partial x_{32}}=\frac{\partial f(Y)}{\partial y_{21}}w_{22}+\frac{\partial f(Y)}{\partial y_{22}}w_{21}
$$
$$
\frac{\partial f(Y)}{\partial x_{33}}=\frac{\partial f(Y)}{\partial y_{22}}w_{22}
$$
Matrix form:
$$
\begin{bmatrix}
\frac{\partial f(Y)}{\partial x_{11}}\\
\frac{\partial f(Y)}{\partial x_{12}}\\
\frac{\partial f(Y)}{\partial x_{13}}\\
\frac{\partial f(Y)}{\partial x_{21}}\\
\frac{\partial f(Y)}{\partial x_{22}}\\
\frac{\partial f(Y)}{\partial x_{23}}\\
\frac{\partial f(Y)}{\partial x_{31}}\\
\frac{\partial f(Y)}{\partial x_{32}}\\
\frac{\partial f(Y)}{\partial x_{33}}
\end{bmatrix}
=
\begin{bmatrix}
w_{11} & 0 & 0 & 0\\
w_{12} & w_{11} & 0 & 0\\
0 & w_{12} & 0 & 0\\
w_{21} & 0 & w_{11} & 0\\
w_{22} & w_{21} & w_{12} & w_{11}\\
0 & w_{22} & 0 & w_{12}\\
0 & 0 & w_{21} & 0\\
0 & 0 & w_{22} & w_{21}\\
0 & 0 & 0 & w_{22}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial f(Y)}{\partial y_{11}}\\
\frac{\partial f(Y)}{\partial y_{12}}\\
\frac{\partial f(Y)}{\partial y_{21}}\\
\frac{\partial f(Y)}{\partial y_{22}}
\end{bmatrix}
$$
Convolution form:
$$
\begin{aligned}
\begin{bmatrix}
\frac{\partial f(Y)}{\partial x_{11}} & \frac{\partial f(Y)}{\partial x_{12}} & \frac{\partial f(Y)}{\partial x_{13}}\\
\frac{\partial f(Y)}{\partial x_{21}} & \frac{\partial f(Y)}{\partial x_{22}} & \frac{\partial f(Y)}{\partial x_{23}}\\
\frac{\partial f(Y)}{\partial x_{31}} & \frac{\partial f(Y)}{\partial x_{32}} & \frac{\partial f(Y)}{\partial x_{33}}
\end{bmatrix}
&=
\begin{bmatrix}
\frac{\partial f(Y)}{\partial y_{22}} & \frac{\partial f(Y)}{\partial y_{21}}\\
\frac{\partial f(Y)}{\partial y_{12}} & \frac{\partial f(Y)}{\partial y_{11}}
\end{bmatrix}
\otimes
\begin{bmatrix}
0 & 0 & 0 & 0\\
0 & w_{11} & w_{12} & 0\\
0 & w_{21} & w_{22} & 0\\
0 & 0 & 0 & 0
\end{bmatrix}\\
&=
\begin{bmatrix}
w_{22} & w_{21}\\
w_{12} & w_{11}
\end{bmatrix}
\otimes
\begin{bmatrix}
0 & 0 & 0 & 0\\
0 & \frac{\partial f(Y)}{\partial y_{11}} & \frac{\partial f(Y)}{\partial y_{12}} & 0\\
0 & \frac{\partial f(Y)}{\partial y_{21}} & \frac{\partial f(Y)}{\partial y_{22}} & 0\\
0 & 0 & 0 & 0
\end{bmatrix}
\end{aligned}
$$
That is:
$$
\begin{aligned}
\frac{\partial f(Y)}{\partial X}
&=\mathrm{rot180}\left( \frac{\partial f(Y)}{\partial Y} \right) \widetilde{\otimes}\,W\\
&=\mathrm{rot180}(W)\,\widetilde{\otimes}\,\frac{\partial f(Y)}{\partial Y}
\end{aligned}
$$
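The input-gradient formula can be checked the same way. This sketch again assumes the made-up loss $f(Y)=\tfrac12\sum_{ij} y_{ij}^2$ (so that $\partial f/\partial Y = Y$); the helpers `xcorr2d`, `wide_xcorr2d`, and `loss` are hypothetical names introduced for this illustration.

```python
import numpy as np

def xcorr2d(kernel, image):
    """Valid 2-D cross-correlation, standing in for the ⊗ operator."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    return np.array([[np.sum(kernel * image[i:i + kh, j:j + kw])
                      for j in range(ow)] for i in range(oh)])

def wide_xcorr2d(kernel, image):
    """Wide cross-correlation: zero-pad the image by (kernel size - 1) per side."""
    kh, kw = kernel.shape
    return xcorr2d(kernel, np.pad(image, ((kh - 1, kh - 1), (kw - 1, kw - 1))))

def loss(W, X, b):
    """Assumed loss f(Y) = 0.5 * sum(Y**2) with Y = W ⊗ X + b, so df/dY = Y."""
    return 0.5 * np.sum((xcorr2d(W, X) + b) ** 2)

rng = np.random.default_rng(2)
W = rng.standard_normal((2, 2))
X = rng.standard_normal((3, 3))
b = -0.1

dY = xcorr2d(W, X) + b                      # df/dY for this particular loss
grad_X = wide_xcorr2d(np.rot90(W, 2), dY)   # rot180(W) wide-convolved with df/dY
grad_X_alt = wide_xcorr2d(np.rot90(dY, 2), W)  # rot180(df/dY) wide-convolved with W

# Central finite differences, perturbing one input pixel at a time.
eps = 1e-6
numeric = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        step = np.zeros_like(X)
        step[i, j] = eps
        numeric[i, j] = (loss(W, X + step, b) - loss(W, X - step, b)) / (2 * eps)

print(np.allclose(grad_X, numeric, atol=1e-5), np.allclose(grad_X, grad_X_alt))
```

The second comparison exercises the commutativity of wide convolution from Problem 1, which is what allows the two forms of the result to be swapped.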
Partial derivative of $f(Y)$ with respect to $b$:
$$
\frac{\partial f(Y)}{\partial b}=\frac{\partial f(Y)}{\partial y_{11}}+\frac{\partial f(Y)}{\partial y_{12}}+\frac{\partial f(Y)}{\partial y_{21}}+\frac{\partial f(Y)}{\partial y_{22}}
$$
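Because the same scalar $b$ is added to every output entry, its gradient is just the sum of all entries of $\partial f(Y)/\partial Y$. A short check, assuming the same made-up quadratic loss as a stand-in for $f$ and arbitrary numeric test values for `W`, `X`, and `b`:

```python
import numpy as np

def forward(W, X, b):
    """Y = W ⊗ X + b for a 2x2 kernel on a 3x3 input (valid cross-correlation)."""
    return np.array([[np.sum(W * X[i:i + 2, j:j + 2]) for j in range(2)]
                     for i in range(2)]) + b

def loss(W, X, b):
    """Assumed loss f(Y) = 0.5 * sum(Y**2), so df/dY = Y."""
    return 0.5 * np.sum(forward(W, X, b) ** 2)

W = np.array([[0.5, -1.0], [2.0, 0.25]])
X = np.arange(9.0).reshape(3, 3) / 10.0
b = 0.7

dY = forward(W, X, b)  # df/dY for this particular loss
grad_b = dY.sum()      # sum of the four partial derivatives

eps = 1e-6
numeric_b = (loss(W, X, b + eps) - loss(W, X, b - eps)) / (2 * eps)
print(np.isclose(grad_b, numeric_b, atol=1e-5))
```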
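5. Ignoring the activation function, show that the forward computation and backpropagation (Eq. (5.39)) of a convolutional layer are related by a transpose.
Using the formulas from my derivation above:
Forward computation: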
$$
y_{11}=w_{11}x_{11}+w_{12}x_{12}+w_{21}x_{21}+w_{22}x_{22}+b
$$
$$
y_{12}=w_{11}x_{12}+w_{12}x_{13}+w_{21}x_{22}+w_{22}x_{23}+b
$$
$$
y_{21}=w_{11}x_{21}+w_{12}x_{22}+w_{21}x_{31}+w_{22}x_{32}+b
$$
$$
y_{22}=w_{11}x_{22}+w_{12}x_{23}+w_{21}x_{32}+w_{22}x_{33}+b
$$
Matrix form:
$$
\begin{bmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22} \end{bmatrix}
=
\begin{bmatrix}
w_{11} & w_{12} & 0 & w_{21} & w_{22} & 0 & 0 & 0 & 0\\
0 & w_{11} & w_{12} & 0 & w_{21} & w_{22} & 0 & 0 & 0\\
0 & 0 & 0 & w_{11} & w_{12} & 0 & w_{21} & w_{22} & 0\\
0 & 0 & 0 & 0 & w_{11} & w_{12} & 0 & w_{21} & w_{22}
\end{bmatrix}
\begin{bmatrix}
x_{11}\\ x_{12}\\ x_{13}\\ x_{21}\\ x_{22}\\ x_{23}\\ x_{31}\\ x_{32}\\ x_{33}
\end{bmatrix}
+b\begin{bmatrix} 1\\ 1\\ 1\\ 1 \end{bmatrix}
$$
Backpropagation:
Matrix form:
$$
\begin{bmatrix}
\frac{\partial f(Y)}{\partial x_{11}}\\
\frac{\partial f(Y)}{\partial x_{12}}\\
\frac{\partial f(Y)}{\partial x_{13}}\\
\frac{\partial f(Y)}{\partial x_{21}}\\
\frac{\partial f(Y)}{\partial x_{22}}\\
\frac{\partial f(Y)}{\partial x_{23}}\\
\frac{\partial f(Y)}{\partial x_{31}}\\
\frac{\partial f(Y)}{\partial x_{32}}\\
\frac{\partial f(Y)}{\partial x_{33}}
\end{bmatrix}
=
\begin{bmatrix}
w_{11} & 0 & 0 & 0\\
w_{12} & w_{11} & 0 & 0\\
0 & w_{12} & 0 & 0\\
w_{21} & 0 & w_{11} & 0\\
w_{22} & w_{21} & w_{12} & w_{11}\\
0 & w_{22} & 0 & w_{12}\\
0 & 0 & w_{21} & 0\\
0 & 0 & w_{22} & w_{21}\\
0 & 0 & 0 & w_{22}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial f(Y)}{\partial y_{11}}\\
\frac{\partial f(Y)}{\partial y_{12}}\\
\frac{\partial f(Y)}{\partial y_{21}}\\
\frac{\partial f(Y)}{\partial y_{22}}
\end{bmatrix}
$$
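This shows that the forward computation and backpropagation of a convolutional layer are related by a transpose: the $9\times 4$ matrix used in backpropagation is the transpose of the $4\times 9$ weight matrix used in the forward computation.
In the forward computation, the net input of layer $l+1$ is
$$
z^{l+1}=W^{l+1}z^{l}
$$
In backpropagation, the error of layer $l$ is
$$
\sigma^{l}=\left( W^{l+1} \right)^{T}\sigma^{l+1}
$$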
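The transpose relation can be verified numerically: multiplying the upstream gradient by the transpose of the forward matrix should give the same result as the wide-convolution form of the input gradient. This is a sketch with hypothetical helper names (`conv_matrix`, `wide_xcorr2d`) and random test data standing in for $W$ and $\partial f(Y)/\partial Y$.

```python
import numpy as np

def conv_matrix(W, in_h, in_w):
    """Forward matrix C with C @ X.ravel() equal to the flattened valid cross-correlation."""
    kh, kw = W.shape
    oh, ow = in_h - kh + 1, in_w - kw + 1
    C = np.zeros((oh * ow, in_h * in_w))
    for i in range(oh):
        for j in range(ow):
            for u in range(kh):
                for v in range(kw):
                    C[i * ow + j, (i + u) * in_w + (j + v)] = W[u, v]
    return C

def wide_xcorr2d(kernel, image):
    """Wide cross-correlation: zero-pad by (kernel size - 1) per side, then slide."""
    kh, kw = kernel.shape
    p = np.pad(image, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    oh, ow = p.shape[0] - kh + 1, p.shape[1] - kw + 1
    return np.array([[np.sum(kernel * p[i:i + kh, j:j + kw])
                      for j in range(ow)] for i in range(oh)])

rng = np.random.default_rng(3)
W = rng.standard_normal((2, 2))
dY = rng.standard_normal((2, 2))  # stands in for df/dY

C = conv_matrix(W, 3, 3)                         # 4 x 9, forward pass
via_transpose = (C.T @ dY.ravel()).reshape(3, 3)  # 9 x 4 matrix applied to df/dY
via_wide_conv = wide_xcorr2d(np.rot90(W, 2), dY)  # rot180(W) wide-convolved with df/dY

print(C.T.shape, np.allclose(via_transpose, via_wide_conv))
```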
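6. In dilated convolution, with kernel size $K$ and dilation rate $D$, how should the zero padding $P$ be set so that the convolution is an equal-width convolution?
Dilated (atrous) convolution: note that the holes are inserted into the convolution kernel, which enlarges the receptive field without increasing the number of parameters.
First, recall how the output size $M'$ follows from the input size $M$, kernel size $K$, padding $P$, and stride $S$: $M' = \lfloor (M-K+2P)/S \rfloor + 1$.
With dilation rate $D$, the kernel covers an effective size of $K' = K+(K-1)(D-1)$. Setting $S=1$ and requiring $M'=M$ gives $2P = K'-1$, that is, $P = (K-1)D/2$.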
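A quick way to sanity-check a padding choice for dilated convolution is to run a one-dimensional dilated cross-correlation and compare lengths. The sketch below assumes stride 1 and symmetric padding, and uses the rule $P=(K-1)D/2$, which follows from the effective kernel size $K'=K+(K-1)(D-1)$ and the standard output-size formula; `same_padding` and `dilated_xcorr1d` are hypothetical helper names.

```python
import numpy as np

def same_padding(K, D):
    """Zeros per side that keep the width unchanged at stride 1; needs (K - 1) * D to be even."""
    total = (K - 1) * D
    if total % 2:
        raise ValueError("(K - 1) * D is odd, so symmetric padding cannot keep the width")
    return total // 2

def dilated_xcorr1d(w, x, D, P):
    """1-D dilated cross-correlation with stride 1 and P zeros on each side."""
    K = len(w)
    xp = np.pad(x, P)
    span = (K - 1) * D + 1  # effective kernel size K'
    n_out = len(xp) - span + 1
    # xp[i:i + span:D] picks the K input samples that the dilated kernel touches.
    return np.array([np.dot(w, xp[i:i + span:D]) for i in range(n_out)])

x = np.arange(10.0)
for K, D in [(3, 1), (3, 2), (5, 3)]:
    P = same_padding(K, D)
    y = dilated_xcorr1d(np.ones(K), x, D, P)
    print(f"K={K}, D={D}, P={P}, input width={len(x)}, output width={len(y)}")
```

For an odd kernel size $K$ the product $(K-1)D$ is always even, so the symmetric padding exists for every dilation rate.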
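Summary:
The biggest takeaway from this exercise was deriving the convolutional layer's formulas by hand, which gave me a much deeper understanding of how backpropagation works in a convolutional layer. I also learned that wide convolution is commutative, how to compute the time and space complexity of a convolutional layer, and that the forward and backward passes of a convolutional layer are related by a "transpose" (note: not an inverse operation, only a transpose in form). I also learned about dilated convolution.
Reminder: after drawing figures, editing the post, or typing up formulas, always save right away. I did not save the convolution backpropagation formulas in time and had to derive them all over again!!! It did leave a deeper impression, though.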