Pooling layers in a neural network generally have no parameters to update, yet they still have to take part in backpropagation and pass gradients through. So how should those gradients be passed?
Forward propagation
Average pooling and max pooling are the two most common pooling methods. Let's first review the forward pass of a pooling layer.
Take a 3x3 input with a 2x2 pooling kernel, stride 1, and no padding; the output is then 2x2.
Average pooling
$$\left[\begin{matrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23}\\ x_{31} & x_{32} & x_{33} \end{matrix}\right] \rightarrow \left[\begin{matrix} \frac{x_{11}+x_{12}+x_{21}+x_{22}}{4} & \frac{x_{12}+x_{13}+x_{22}+x_{23}}{4} \\ \frac{x_{21}+x_{22}+x_{31}+x_{32}}{4} & \frac{x_{22}+x_{23}+x_{32}+x_{33}}{4} \end{matrix}\right]$$
Max pooling
$$\left[\begin{matrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23}\\ x_{31} & x_{32} & x_{33} \end{matrix}\right] \rightarrow \left[\begin{matrix} \max\{x_{11},x_{12},x_{21},x_{22}\} & \max\{x_{12},x_{13},x_{22},x_{23}\} \\ \max\{x_{21},x_{22},x_{31},x_{32}\} & \max\{x_{22},x_{23},x_{32},x_{33}\} \end{matrix}\right]$$
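As a sanity check, both forward passes above can be sketched with NumPy (stride 1, no padding, matching the 3x3 example; the name `pool2d` is my own choice for this sketch):

```python
import numpy as np

def pool2d(x, k=2, mode="avg"):
    """Naive k x k pooling with stride 1 and no padding."""
    h, w = x.shape
    out = np.empty((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = x[i:i + k, j:j + k]
            out[i, j] = window.mean() if mode == "avg" else window.max()
    return out

x = np.arange(1.0, 10.0).reshape(3, 3)   # the 3x3 input x_11 .. x_33
print(pool2d(x, mode="avg"))  # [[3. 4.] [6. 7.]]
print(pool2d(x, mode="max"))  # [[5. 6.] [8. 9.]]
```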
Backpropagation
As explained in 《优雅地理解神经网络反向传播》 (Understanding Neural Network Backpropagation, Elegantly), during backpropagation each layer computes the gradient of the loss with respect to that layer's input and passes it to the previous layer.
Let the loss at the output layer be $L$, and write $z_{ij}$ for the outputs of the pooling layer.
Average pooling
$$\left[\begin{matrix} \frac{\partial L}{\partial x_{11}} & \frac{\partial L}{\partial x_{12}} & \frac{\partial L}{\partial x_{13}}\\ \frac{\partial L}{\partial x_{21}} & \frac{\partial L}{\partial x_{22}} & \frac{\partial L}{\partial x_{23}}\\ \frac{\partial L}{\partial x_{31}} & \frac{\partial L}{\partial x_{32}} & \frac{\partial L}{\partial x_{33}} \end{matrix}\right] \leftarrow \left[\begin{matrix} \frac{\partial L}{\partial z_{11}} & \frac{\partial L}{\partial z_{12}} \\ \frac{\partial L}{\partial z_{21}} & \frac{\partial L}{\partial z_{22}} \end{matrix}\right]$$
That is, following the direction of the arrow: given all the $\frac{\partial L}{\partial z_{ij}}$, solve for all the $\frac{\partial L}{\partial x_{ij}}$.
Take the case $(i,j)=(2,2)$ above: $x_{22}$ contributes to the final loss $L$ through all four terms $\frac{\partial L}{\partial z_{11}}, \frac{\partial L}{\partial z_{12}}, \frac{\partial L}{\partial z_{21}}, \frac{\partial L}{\partial z_{22}}$.
Then, by the chain rule, and since each window mean gives $\frac{\partial z_{pq}}{\partial x_{22}}=\frac{1}{4}$,

$$\frac{\partial L}{\partial x_{22}}=\frac{\partial L}{\partial z_{11}}\frac{\partial z_{11}}{\partial x_{22}} + \frac{\partial L}{\partial z_{12}}\frac{\partial z_{12}}{\partial x_{22}}+\frac{\partial L}{\partial z_{21}}\frac{\partial z_{21}}{\partial x_{22}}+\frac{\partial L}{\partial z_{22}}\frac{\partial z_{22}}{\partial x_{22}} = \frac{1}{4}\left(\frac{\partial L}{\partial z_{11}}+\frac{\partial L}{\partial z_{12}}+\frac{\partial L}{\partial z_{21}}+\frac{\partial L}{\partial z_{22}}\right)$$
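This chain-rule result for $x_{22}$ can be checked numerically with a finite difference, using the toy loss $L=\sum_{p,q} z_{pq}$ so that every upstream gradient is 1 (a sketch; all names here are my own):

```python
import numpy as np

def avg_pool(x, k=2):
    """2x2 average pooling, stride 1, no padding."""
    h, w = x.shape
    return np.array([[x[i:i + k, j:j + k].mean()
                      for j in range(w - k + 1)]
                     for i in range(h - k + 1)])

rng = np.random.default_rng(0)
x = rng.random((3, 3))
eps = 1e-6

# Perturb x_22 (0-indexed [1, 1]) and measure the change in L = sum(z).
x_eps = x.copy()
x_eps[1, 1] += eps
numeric = (avg_pool(x_eps).sum() - avg_pool(x).sum()) / eps

# Analytically: x_22 sits in all four windows, each contributing 1/4.
analytic = 4 * (1 / 4)
print(numeric, analytic)  # both approximately 1.0
```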
Similarly, one easily obtains
$$\left[\begin{matrix} \frac{\partial L}{\partial x_{11}} & \frac{\partial L}{\partial x_{12}} & \frac{\partial L}{\partial x_{13}}\\ \frac{\partial L}{\partial x_{21}} & \frac{\partial L}{\partial x_{22}} & \frac{\partial L}{\partial x_{23}}\\ \frac{\partial L}{\partial x_{31}} & \frac{\partial L}{\partial x_{32}} & \frac{\partial L}{\partial x_{33}} \end{matrix}\right] = \frac{1}{4}\left[\begin{matrix} \frac{\partial L}{\partial z_{11}} & \frac{\partial L}{\partial z_{11}}+\frac{\partial L}{\partial z_{12}} & \frac{\partial L}{\partial z_{12}}\\ \frac{\partial L}{\partial z_{11}}+\frac{\partial L}{\partial z_{21}} & \frac{\partial L}{\partial z_{11}}+\frac{\partial L}{\partial z_{12}}+\frac{\partial L}{\partial z_{21}}+\frac{\partial L}{\partial z_{22}} & \frac{\partial L}{\partial z_{12}}+\frac{\partial L}{\partial z_{22}}\\ \frac{\partial L}{\partial z_{21}} & \frac{\partial L}{\partial z_{21}}+\frac{\partial L}{\partial z_{22}} & \frac{\partial L}{\partial z_{22}} \end{matrix}\right]$$
To simplify handling of the boundaries one can use padding; the same result can also be written in the following form:
$$=\frac{1}{4}\left(\frac{\partial L}{\partial z_{11}} \left[\begin{matrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{matrix}\right] +\frac{\partial L}{\partial z_{12}}\left[\begin{matrix} 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{matrix}\right] +\frac{\partial L}{\partial z_{21}}\left[\begin{matrix} 0 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \end{matrix}\right] +\frac{\partial L}{\partial z_{22}}\left[\begin{matrix} 0 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{matrix}\right]\right)$$
This can be understood as follows: for each $z_{ij}$, distribute its contribution to the loss (the partial derivative) back to the inputs $x_{i'j'}$ that produced it.
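This "spread each $\frac{\partial L}{\partial z_{ij}}$ evenly over its window" rule translates almost line for line into NumPy (a sketch; `avg_pool_backward` is my own name, stride 1 assumed):

```python
import numpy as np

def avg_pool_backward(dz, in_shape, k=2):
    """Spread each upstream gradient dL/dz_ij uniformly (1/k^2) over its window."""
    dx = np.zeros(in_shape)
    for i in range(dz.shape[0]):
        for j in range(dz.shape[1]):
            dx[i:i + k, j:j + k] += dz[i, j] / (k * k)
    return dx

dz = np.ones((2, 2))  # pretend every dL/dz_ij equals 1
dx = avg_pool_backward(dz, (3, 3))
print(dx)
# [[0.25 0.5  0.25]
#  [0.5  1.   0.5 ]
#  [0.25 0.5  0.25]]
```

Note how the overlapping windows make the center element $x_{22}$ accumulate four contributions, exactly as in the chain-rule derivation above.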
Max pooling
At each position of the pooling window, only the largest $x$ in that region contributes to the loss; the behavior is similar to the ReLU activation. We therefore need to record the position of the maximum $x$ in each window.
For example, suppose that in the forward pass the maximum of window $(p,q)$ lies at position $(i,j)_{p,q}$, so that
$$\left[\begin{matrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23}\\ x_{31} & x_{32} & x_{33} \end{matrix}\right] \rightarrow \left[\begin{matrix} x_{(i,j)_{1,1}} & x_{(i,j)_{1,2}} \\ x_{(i,j)_{2,1}} & x_{(i,j)_{2,2}} \end{matrix}\right]$$
Then in the backward pass,
$$\left[\begin{matrix} \frac{\partial L}{\partial x_{11}} & \frac{\partial L}{\partial x_{12}} & \frac{\partial L}{\partial x_{13}}\\ \frac{\partial L}{\partial x_{21}} & \frac{\partial L}{\partial x_{22}} & \frac{\partial L}{\partial x_{23}}\\ \frac{\partial L}{\partial x_{31}} & \frac{\partial L}{\partial x_{32}} & \frac{\partial L}{\partial x_{33}} \end{matrix}\right] = \frac{\partial L}{\partial z_{11}} \delta_{(i,j)_{1,1}}+\frac{\partial L}{\partial z_{12}} \delta_{(i,j)_{1,2}}+\frac{\partial L}{\partial z_{21}} \delta_{(i,j)_{2,1}}+\frac{\partial L}{\partial z_{22}} \delta_{(i,j)_{2,2}}$$
where $\delta_{(i,j)}$ denotes the matrix that is 1 at position $(i,j)$ and 0 everywhere else.
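The max-pooling backward pass above can be sketched in NumPy; the function names (`max_pool_forward`, `max_pool_backward`) and the stride-1 setting are my own choices for this example:

```python
import numpy as np

def max_pool_forward(x, k=2):
    """Max pooling (stride 1, no padding); records the argmax position per window."""
    h, w = x.shape
    oh, ow = h - k + 1, w - k + 1
    z = np.empty((oh, ow))
    pos = np.empty((oh, ow, 2), dtype=int)
    for i in range(oh):
        for j in range(ow):
            win = x[i:i + k, j:j + k]
            p, q = np.unravel_index(np.argmax(win), win.shape)
            z[i, j] = win[p, q]
            pos[i, j] = (i + p, j + q)   # the (i,j)_{p,q} of the text
    return z, pos

def max_pool_backward(dz, pos, in_shape):
    """Route each dL/dz_ij to its recorded max position; overlaps accumulate."""
    dx = np.zeros(in_shape)
    for i in range(dz.shape[0]):
        for j in range(dz.shape[1]):
            p, q = pos[i, j]
            dx[p, q] += dz[i, j]
    return dx

x = np.arange(1.0, 10.0).reshape(3, 3)   # every window's max is its bottom-right entry
z, pos = max_pool_forward(x)
dx = max_pool_backward(np.ones((2, 2)), pos, x.shape)
print(dx)  # gradient lands only on x_22, x_23, x_32, x_33
```

This is also why frameworks keep the argmax indices from the forward pass around (PyTorch's `nn.MaxPool2d`, for instance, can return them via `return_indices=True`).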