Backpropagation Through Pooling Layers


A pooling layer in a neural network generally has no parameters to update, yet it still has to take part in backpropagation and pass gradients through. How, then, should the gradients be passed?

Forward propagation

Average pooling and max pooling are the two most common pooling methods. Let us first review the forward pass of a pooling layer.
Take a 3×3 input with a 2×2 pooling kernel (stride 1, no padding) as the running example; the output is then 2×2.

Average pooling

$$
\left[\begin{matrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{matrix}\right]
\rightarrow
\left[\begin{matrix} \frac{x_{11}+x_{12}+x_{21}+x_{22}}{4} & \frac{x_{12}+x_{13}+x_{22}+x_{23}}{4} \\ \frac{x_{21}+x_{22}+x_{31}+x_{32}}{4} & \frac{x_{22}+x_{23}+x_{32}+x_{33}}{4} \end{matrix}\right]
$$
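This forward pass can be sketched in NumPy. A minimal illustration, assuming a 2×2 kernel with stride 1; `avg_pool2d` is a hypothetical helper written for this post, not a library function:

```python
import numpy as np

def avg_pool2d(x, k=2, stride=1):
    """Average pooling over k x k windows, no padding (illustrative helper)."""
    h, w = x.shape
    oh, ow = (h - k) // stride + 1, (w - k) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # mean of the k x k window whose top-left corner is (i*stride, j*stride)
            out[i, j] = x[i*stride:i*stride+k, j*stride:j*stride+k].mean()
    return out

x = np.arange(1.0, 10.0).reshape(3, 3)  # the 3x3 input of the example
print(avg_pool2d(x))  # 2x2 output with values [[3, 4], [6, 7]]
```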

Max pooling

$$
\left[\begin{matrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{matrix}\right]
\rightarrow
\left[\begin{matrix} \max\{x_{11},x_{12},x_{21},x_{22}\} & \max\{x_{12},x_{13},x_{22},x_{23}\} \\ \max\{x_{21},x_{22},x_{31},x_{32}\} & \max\{x_{22},x_{23},x_{32},x_{33}\} \end{matrix}\right]
$$
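Max pooling only changes the window reduction, but as we will see below, the backward pass needs to know where each maximum came from, so the forward pass should record it. A minimal NumPy sketch with the same hypothetical-helper caveat as before:

```python
import numpy as np

def max_pool2d(x, k=2, stride=1):
    """Max pooling that also records each window's argmax position (for backprop)."""
    h, w = x.shape
    oh, ow = (h - k) // stride + 1, (w - k) // stride + 1
    out = np.empty((oh, ow))
    argmax = np.empty((oh, ow, 2), dtype=int)  # input coordinates of each winner
    for i in range(oh):
        for j in range(ow):
            win = x[i*stride:i*stride+k, j*stride:j*stride+k]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            out[i, j] = win[r, c]
            argmax[i, j] = (i * stride + r, j * stride + c)
    return out, argmax

x = np.arange(1.0, 10.0).reshape(3, 3)
out, argmax = max_pool2d(x)  # out holds [[5, 6], [8, 9]]; argmax the winners' (i', j')
```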

Backpropagation

As discussed in the earlier post 《优雅地理解神经网络反向传播》, during backpropagation each layer computes the gradient of the loss with respect to its own input and passes it to the layer before it.
Denote the loss function at the output layer by $L$.

Average pooling

$$
\left[\begin{matrix} \frac{\partial L}{\partial x_{11}} & \frac{\partial L}{\partial x_{12}} & \frac{\partial L}{\partial x_{13}} \\ \frac{\partial L}{\partial x_{21}} & \frac{\partial L}{\partial x_{22}} & \frac{\partial L}{\partial x_{23}} \\ \frac{\partial L}{\partial x_{31}} & \frac{\partial L}{\partial x_{32}} & \frac{\partial L}{\partial x_{33}} \end{matrix}\right]
\leftarrow
\left[\begin{matrix} \frac{\partial L}{\partial z_{11}} & \frac{\partial L}{\partial z_{12}} \\ \frac{\partial L}{\partial z_{21}} & \frac{\partial L}{\partial z_{22}} \end{matrix}\right]
$$
That is, following the arrow: given all the $\frac{\partial L}{\partial z_{ij}}$ (where $z$ denotes the pooling output), compute all the $\frac{\partial L}{\partial x_{ij}}$.

Consider, for example, the case $(i,j)=(2,2)$ above: $x_{22}$ appears in all four pooling windows, so its contribution to the final loss $L$ shows up in all four terms $\frac{\partial L}{\partial z_{11}}, \frac{\partial L}{\partial z_{12}}, \frac{\partial L}{\partial z_{21}}, \frac{\partial L}{\partial z_{22}}$.

By the chain rule,

$$
\frac{\partial L}{\partial x_{22}}
= \frac{\partial L}{\partial z_{11}}\frac{\partial z_{11}}{\partial x_{22}}
+ \frac{\partial L}{\partial z_{12}}\frac{\partial z_{12}}{\partial x_{22}}
+ \frac{\partial L}{\partial z_{21}}\frac{\partial z_{21}}{\partial x_{22}}
+ \frac{\partial L}{\partial z_{22}}\frac{\partial z_{22}}{\partial x_{22}}
= \frac{1}{4}\left(\frac{\partial L}{\partial z_{11}}+\frac{\partial L}{\partial z_{12}}+\frac{\partial L}{\partial z_{21}}+\frac{\partial L}{\partial z_{22}}\right)
$$

The remaining entries follow in the same way:

$$
\left[\begin{matrix} \frac{\partial L}{\partial x_{11}} & \frac{\partial L}{\partial x_{12}} & \frac{\partial L}{\partial x_{13}} \\ \frac{\partial L}{\partial x_{21}} & \frac{\partial L}{\partial x_{22}} & \frac{\partial L}{\partial x_{23}} \\ \frac{\partial L}{\partial x_{31}} & \frac{\partial L}{\partial x_{32}} & \frac{\partial L}{\partial x_{33}} \end{matrix}\right]
= \frac{1}{4}\left[\begin{matrix}
\frac{\partial L}{\partial z_{11}} & \frac{\partial L}{\partial z_{11}}+\frac{\partial L}{\partial z_{12}} & \frac{\partial L}{\partial z_{12}} \\
\frac{\partial L}{\partial z_{11}}+\frac{\partial L}{\partial z_{21}} & \frac{\partial L}{\partial z_{11}}+\frac{\partial L}{\partial z_{12}}+\frac{\partial L}{\partial z_{21}}+\frac{\partial L}{\partial z_{22}} & \frac{\partial L}{\partial z_{12}}+\frac{\partial L}{\partial z_{22}} \\
\frac{\partial L}{\partial z_{21}} & \frac{\partial L}{\partial z_{21}}+\frac{\partial L}{\partial z_{22}} & \frac{\partial L}{\partial z_{22}}
\end{matrix}\right]
$$
To make the boundary handling easier one may pad the gradient map; equivalently, the result can be written as:

$$
= \frac{1}{4}\left(
\frac{\partial L}{\partial z_{11}}\left[\begin{matrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{matrix}\right]
+ \frac{\partial L}{\partial z_{12}}\left[\begin{matrix} 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{matrix}\right]
+ \frac{\partial L}{\partial z_{21}}\left[\begin{matrix} 0 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \end{matrix}\right]
+ \frac{\partial L}{\partial z_{22}}\left[\begin{matrix} 0 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{matrix}\right]
\right)
$$

This can be read as follows:
for each $z_{ij}$, its contribution to the loss (the partial derivative) is distributed evenly back to the inputs $x_{i'j'}$ that produced it.
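The distribution rule above can be sketched directly: each upstream gradient is spread as $1/4$ over its 2×2 window, and overlapping windows accumulate. The helper name is illustrative, not a library API:

```python
import numpy as np

def avg_pool2d_backward(dz, in_shape, k=2, stride=1):
    """Spread each dL/dz_ij equally (1/k^2 each) over its k x k input window."""
    dx = np.zeros(in_shape)
    for i in range(dz.shape[0]):
        for j in range(dz.shape[1]):
            # overlapping windows add up, exactly like the indicator-matrix sum
            dx[i*stride:i*stride+k, j*stride:j*stride+k] += dz[i, j] / (k * k)
    return dx

dz = np.ones((2, 2))  # pretend every dL/dz_ij equals 1
print(avg_pool2d_backward(dz, (3, 3)) * 4)
# corners receive one gradient, edges two, and the centre x_22 all four
```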

Max pooling

At each position of the pooling kernel, only the largest $x$ in that window contributes to the loss; the behaviour resembles the ReLU activation. The position of the maximal $x$ must therefore be recorded during the forward pass.
For example, suppose the forward pass gives
$$
\left[\begin{matrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{matrix}\right]
\rightarrow
\left[\begin{matrix} x_{(i,j)_{1,1}} & x_{(i,j)_{1,2}} \\ x_{(i,j)_{2,1}} & x_{(i,j)_{2,2}} \end{matrix}\right]
$$
where $(i,j)_{r,c}$ is the index of the maximum within window $(r,c)$. Then during backpropagation,

$$
\left[\begin{matrix} \frac{\partial L}{\partial x_{11}} & \frac{\partial L}{\partial x_{12}} & \frac{\partial L}{\partial x_{13}} \\ \frac{\partial L}{\partial x_{21}} & \frac{\partial L}{\partial x_{22}} & \frac{\partial L}{\partial x_{23}} \\ \frac{\partial L}{\partial x_{31}} & \frac{\partial L}{\partial x_{32}} & \frac{\partial L}{\partial x_{33}} \end{matrix}\right]
= \frac{\partial L}{\partial z_{11}}\,\delta_{(i,j)_{1,1}}
+ \frac{\partial L}{\partial z_{12}}\,\delta_{(i,j)_{1,2}}
+ \frac{\partial L}{\partial z_{21}}\,\delta_{(i,j)_{2,1}}
+ \frac{\partial L}{\partial z_{22}}\,\delta_{(i,j)_{2,2}}
$$

where $\delta_{(i,j)}$ denotes the matrix that is 1 at position $(i,j)$ and 0 everywhere else.
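This routing rule can be sketched in the same NumPy style: each $\frac{\partial L}{\partial z_{ij}}$ is sent to the single recorded argmax position and every other input receives zero. The `argmax` array below is the hypothetical record a forward pass over the 3×3 example would produce with a 2×2 kernel and stride 1:

```python
import numpy as np

def max_pool2d_backward(dz, argmax, in_shape):
    """Route each dL/dz_ij to its recorded argmax position; all other inputs get 0."""
    dx = np.zeros(in_shape)
    for i in range(dz.shape[0]):
        for j in range(dz.shape[1]):
            r, c = argmax[i, j]
            dx[r, c] += dz[i, j]  # += because overlapping windows may share a winner
    return dx

# argmax positions recorded for x = [[1,2,3],[4,5,6],[7,8,9]]:
# the window maxima 5, 6, 8, 9 sit at (1,1), (1,2), (2,1), (2,2)
argmax = np.array([[[1, 1], [1, 2]],
                   [[2, 1], [2, 2]]])
dz = np.array([[1.0, 2.0],
               [3.0, 4.0]])
print(max_pool2d_backward(dz, argmax, (3, 3)))
# only the four winning positions are non-zero
```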
