Machine Learning Review: Backpropagation for Convolution, Part 2: Backpropagation for 2D Convolution with Stride s

My personal blog: https://huaxuan0720.github.io/ , you are welcome to visit.

Preface

In the previous post we worked through backpropagation for convolution with stride 1, but in practice the stride is often greater than 1. Backpropagation in that case differs slightly from the stride-1 case, though less than one might expect, so below we walk through strides greater than 1. Please note: all derivations here are carried out only for the specific parameter settings chosen in this post and are not fully general, but every step carries over to the general case. What follows is therefore not a rigorous proof of the backpropagation algorithm and avoids heavy formula manipulation; the aim is to understand convolutional backpropagation in a simple, concrete way, and hopefully this helps build intuition for backpropagation in general.

Note that throughout this post, the stride of every forward convolution is fixed at 2.

1. Parameter Setup

Here we take the data matrix (denoted $x$) to be 5x5 and the convolution kernel (denoted $k$) to be 3x3. Since the stride is 2, the convolution yields a 2x2 output matrix (denoted $u$). The bias, denoted $b$, is added to the matrix produced by the convolution.

The parameters are summarized below:

| Parameter | Setting |
| --- | --- |
| Input matrix $x$ | a 2D matrix of size 5x5 |
| Kernel $k$ | a 2D matrix of size 3x3 |
| Stride $stride$ | 2 |
| padding | VALID |
| Bias $b$ | a scalar (float) |
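
Before moving on, it is worth checking where the 2x2 output size comes from. With VALID padding, an $n \times n$ input, an $f \times f$ kernel, and stride $s$, the output side length follows the standard formula:

$$o = \left\lfloor \frac{n - f}{s} \right\rfloor + 1 = \left\lfloor \frac{5 - 3}{2} \right\rfloor + 1 = 2$$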

As before, we denote the convolution operation by $conv$, so the convolution can be written as (keep in mind that the stride here is 2):

$$x \; conv \; k + b = u$$

Expanding this, we get:

$$\begin{bmatrix} x_{1,1} & x_{1,2} & x_{1,3} & x_{1,4} & x_{1,5} \\ x_{2,1} & x_{2,2} & x_{2,3} & x_{2,4} & x_{2,5} \\ x_{3,1} & x_{3,2} & x_{3,3} & x_{3,4} & x_{3,5} \\ x_{4,1} & x_{4,2} & x_{4,3} & x_{4,4} & x_{4,5} \\ x_{5,1} & x_{5,2} & x_{5,3} & x_{5,4} & x_{5,5} \end{bmatrix} \; conv \; \begin{bmatrix} k_{1,1} & k_{1,2} & k_{1,3} \\ k_{2,1} & k_{2,2} & k_{2,3} \\ k_{3,1} & k_{3,2} & k_{3,3} \end{bmatrix} + b = \begin{bmatrix} u_{1,1} & u_{1,2} \\ u_{2,1} & u_{2,2} \end{bmatrix}$$

Expanding the matrix $u$ further, we have:

$$\begin{bmatrix} u_{1,1} & u_{1,2} \\ u_{2,1} & u_{2,2} \end{bmatrix} = \begin{bmatrix} \begin{matrix} x_{1,1}k_{1,1} + x_{1,2}k_{1,2} + x_{1,3}k_{1,3} + \\ x_{2,1}k_{2,1} + x_{2,2}k_{2,2} + x_{2,3}k_{2,3} + \\ x_{3,1}k_{3,1} + x_{3,2}k_{3,2} + x_{3,3}k_{3,3} + b \end{matrix} & \begin{matrix} x_{1,3}k_{1,1} + x_{1,4}k_{1,2} + x_{1,5}k_{1,3} + \\ x_{2,3}k_{2,1} + x_{2,4}k_{2,2} + x_{2,5}k_{2,3} + \\ x_{3,3}k_{3,1} + x_{3,4}k_{3,2} + x_{3,5}k_{3,3} + b \end{matrix} \\ \\ \begin{matrix} x_{3,1}k_{1,1} + x_{3,2}k_{1,2} + x_{3,3}k_{1,3} + \\ x_{4,1}k_{2,1} + x_{4,2}k_{2,2} + x_{4,3}k_{2,3} + \\ x_{5,1}k_{3,1} + x_{5,2}k_{3,2} + x_{5,3}k_{3,3} + b \end{matrix} & \begin{matrix} x_{3,3}k_{1,1} + x_{3,4}k_{1,2} + x_{3,5}k_{1,3} + \\ x_{4,3}k_{2,1} + x_{4,4}k_{2,2} + x_{4,5}k_{2,3} + \\ x_{5,3}k_{3,1} + x_{5,4}k_{3,2} + x_{5,5}k_{3,3} + b \end{matrix} \end{bmatrix}$$
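
As a quick sanity check of the expansion above, here is a minimal NumPy sketch (the function name `conv2d` and the test values are my own, not from the original derivation) that computes the stride-2 VALID convolution with direct loops and confirms the 2x2 output:

```python
import numpy as np

def conv2d(x, k, b=0.0, stride=1):
    """VALID 2D convolution with a given stride, written with direct loops."""
    n, f = x.shape[0], k.shape[0]
    out = (n - f) // stride + 1
    u = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            # window of x aligned with output position (i, j)
            patch = x[i * stride:i * stride + f, j * stride:j * stride + f]
            u[i, j] = np.sum(patch * k) + b
    return u

x = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3))
u = conv2d(x, k, b=1.0, stride=2)
print(u.shape)  # (2, 2)
# u[0, 1] should equal the sum of the window x[0:3, 2:5] plus the bias,
# matching the u_{1,2} entry in the expansion above
assert np.isclose(u[0, 1], x[0:3, 2:5].sum() + 1.0)
```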

2. Error Propagation

The stride-2 2D convolution is written out completely in the expression above; the next step is to propagate the error backward. As in the stride-1 case, we can organize the result in a table: each column corresponds to one particular output $\partial u_{i,j}$, each row to one particular input $\partial x_{p,k}$, and the cell where a row and column meet holds the partial derivative of that output with respect to that input, i.e. $\frac{\partial u_{i,j}}{\partial x_{p,k}}$. The table is as follows:

| | $\partial u_{1,1}$ | $\partial u_{1,2}$ | $\partial u_{2,1}$ | $\partial u_{2,2}$ | $\frac{\partial L}{\partial x_{i,j}}$ |
| --- | --- | --- | --- | --- | --- |
| $\partial x_{1,1}$ | $k_{1,1}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{1,1}} = \delta_{1,1} k_{1,1}$ |
| $\partial x_{1,2}$ | $k_{1,2}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{1,2}} = \delta_{1,1} k_{1,2}$ |
| $\partial x_{1,3}$ | $k_{1,3}$ | $k_{1,1}$ | 0 | 0 | $\frac{\partial L}{\partial x_{1,3}} = \delta_{1,1} k_{1,3} + \delta_{1,2} k_{1,1}$ |
| $\partial x_{1,4}$ | 0 | $k_{1,2}$ | 0 | 0 | $\frac{\partial L}{\partial x_{1,4}} = \delta_{1,2} k_{1,2}$ |
| $\partial x_{1,5}$ | 0 | $k_{1,3}$ | 0 | 0 | $\frac{\partial L}{\partial x_{1,5}} = \delta_{1,2} k_{1,3}$ |
| $\partial x_{2,1}$ | $k_{2,1}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{2,1}} = \delta_{1,1} k_{2,1}$ |
| $\partial x_{2,2}$ | $k_{2,2}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{2,2}} = \delta_{1,1} k_{2,2}$ |
| $\partial x_{2,3}$ | $k_{2,3}$ | $k_{2,1}$ | 0 | 0 | $\frac{\partial L}{\partial x_{2,3}} = \delta_{1,1} k_{2,3} + \delta_{1,2} k_{2,1}$ |
| $\partial x_{2,4}$ | 0 | $k_{2,2}$ | 0 | 0 | $\frac{\partial L}{\partial x_{2,4}} = \delta_{1,2} k_{2,2}$ |
| $\partial x_{2,5}$ | 0 | $k_{2,3}$ | 0 | 0 | $\frac{\partial L}{\partial x_{2,5}} = \delta_{1,2} k_{2,3}$ |
| $\partial x_{3,1}$ | $k_{3,1}$ | 0 | $k_{1,1}$ | 0 | $\frac{\partial L}{\partial x_{3,1}} = \delta_{1,1} k_{3,1} + \delta_{2,1} k_{1,1}$ |
| $\partial x_{3,2}$ | $k_{3,2}$ | 0 | $k_{1,2}$ | 0 | $\frac{\partial L}{\partial x_{3,2}} = \delta_{1,1} k_{3,2} + \delta_{2,1} k_{1,2}$ |
| $\partial x_{3,3}$ | $k_{3,3}$ | $k_{3,1}$ | $k_{1,3}$ | $k_{1,1}$ | $\frac{\partial L}{\partial x_{3,3}} = \delta_{1,1} k_{3,3} + \delta_{1,2} k_{3,1} + \delta_{2,1} k_{1,3} + \delta_{2,2} k_{1,1}$ |
| $\partial x_{3,4}$ | 0 | $k_{3,2}$ | 0 | $k_{1,2}$ | $\frac{\partial L}{\partial x_{3,4}} = \delta_{1,2} k_{3,2} + \delta_{2,2} k_{1,2}$ |
| $\partial x_{3,5}$ | 0 | $k_{3,3}$ | 0 | $k_{1,3}$ | $\frac{\partial L}{\partial x_{3,5}} = \delta_{1,2} k_{3,3} + \delta_{2,2} k_{1,3}$ |
| $\partial x_{4,1}$ | 0 | 0 | $k_{2,1}$ | 0 | $\frac{\partial L}{\partial x_{4,1}} = \delta_{2,1} k_{2,1}$ |
| $\partial x_{4,2}$ | 0 | 0 | $k_{2,2}$ | 0 | $\frac{\partial L}{\partial x_{4,2}} = \delta_{2,1} k_{2,2}$ |
| $\partial x_{4,3}$ | 0 | 0 | $k_{2,3}$ | $k_{2,1}$ | $\frac{\partial L}{\partial x_{4,3}} = \delta_{2,1} k_{2,3} + \delta_{2,2} k_{2,1}$ |
| $\partial x_{4,4}$ | 0 | 0 | 0 | $k_{2,2}$ | $\frac{\partial L}{\partial x_{4,4}} = \delta_{2,2} k_{2,2}$ |
| $\partial x_{4,5}$ | 0 | 0 | 0 | $k_{2,3}$ | $\frac{\partial L}{\partial x_{4,5}} = \delta_{2,2} k_{2,3}$ |
| $\partial x_{5,1}$ | 0 | 0 | $k_{3,1}$ | 0 | $\frac{\partial L}{\partial x_{5,1}} = \delta_{2,1} k_{3,1}$ |
| $\partial x_{5,2}$ | 0 | 0 | $k_{3,2}$ | 0 | $\frac{\partial L}{\partial x_{5,2}} = \delta_{2,1} k_{3,2}$ |
| $\partial x_{5,3}$ | 0 | 0 | $k_{3,3}$ | $k_{3,1}$ | $\frac{\partial L}{\partial x_{5,3}} = \delta_{2,1} k_{3,3} + \delta_{2,2} k_{3,1}$ |
| $\partial x_{5,4}$ | 0 | 0 | 0 | $k_{3,2}$ | $\frac{\partial L}{\partial x_{5,4}} = \delta_{2,2} k_{3,2}$ |
| $\partial x_{5,5}$ | 0 | 0 | 0 | $k_{3,3}$ | $\frac{\partial L}{\partial x_{5,5}} = \delta_{2,2} k_{3,3}$ |

As you can see, the entries still repeat in a very regular pattern.

Suppose the error propagated back from the following layer is $\delta$, i.e.:

$$\delta = \begin{bmatrix} \delta_{1,1} & \delta_{1,2} \\ \delta_{2,1} & \delta_{2,2} \end{bmatrix}$$

Here $\delta_{i,j} = \frac{\partial L}{\partial u_{i,j}}$: one error term for each output element, where $L$ denotes the final loss, which we want to be as small as possible. By the chain rule for partial derivatives, we then have:

$$\frac{\partial L}{\partial x_{i,j}} = \sum_{p=1}^{2} \sum_{k=1}^{2} \frac{\partial L}{\partial u_{p,k}} \cdot \frac{\partial u_{p,k}}{\partial x_{i,j}} = \sum_{p=1}^{2} \sum_{k=1}^{2} \delta_{p,k} \cdot \frac{\partial u_{p,k}}{\partial x_{i,j}}$$

Take $\frac{\partial L}{\partial x_{3,3}}$ as an example. We have:

$$\begin{aligned} \frac{\partial L}{\partial x_{3,3}} &= \sum_{p=1}^{2} \sum_{k=1}^{2} \frac{\partial L}{\partial u_{p,k}} \cdot \frac{\partial u_{p,k}}{\partial x_{3,3}} = \sum_{p=1}^{2} \sum_{k=1}^{2} \delta_{p,k} \cdot \frac{\partial u_{p,k}}{\partial x_{3,3}} \\ &= \delta_{1,1}\frac{\partial u_{1,1}}{\partial x_{3,3}} + \delta_{1,2}\frac{\partial u_{1,2}}{\partial x_{3,3}} + \delta_{2,1}\frac{\partial u_{2,1}}{\partial x_{3,3}} + \delta_{2,2}\frac{\partial u_{2,2}}{\partial x_{3,3}} \\ &= \delta_{1,1}k_{3,3} + \delta_{1,2}k_{3,1} + \delta_{2,1}k_{1,3} + \delta_{2,2}k_{1,1} \end{aligned}$$
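
To make this concrete, here is a small numerical check (a sketch with randomly chosen values, not from the original derivation): we define $L = \sum_{p,k} \delta_{p,k} u_{p,k}$ so that $\partial L / \partial u_{p,k} = \delta_{p,k}$, and compare the formula above with a finite-difference estimate of $\partial L / \partial x_{3,3}$:

```python
import numpy as np

def conv2d(x, k, b=0.0, stride=1):
    """VALID 2D convolution with a given stride."""
    n, f = x.shape[0], k.shape[0]
    out = (n - f) // stride + 1
    u = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            u[i, j] = np.sum(x[i*stride:i*stride+f, j*stride:j*stride+f] * k) + b
    return u

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))
k = rng.standard_normal((3, 3))
delta = rng.standard_normal((2, 2))  # plays the role of dL/du

def loss(x):
    return np.sum(delta * conv2d(x, k, b=0.5, stride=2))

# formula from the derivation above (indices are 1-based in the text)
analytic = (delta[0, 0]*k[2, 2] + delta[0, 1]*k[2, 0]
            + delta[1, 0]*k[0, 2] + delta[1, 1]*k[0, 0])

# central finite difference with respect to x[2, 2] (i.e. x_{3,3})
eps = 1e-6
xp, xm = x.copy(), x.copy()
xp[2, 2] += eps
xm[2, 2] -= eps
numeric = (loss(xp) - loss(xm)) / (2 * eps)
print(analytic, numeric)  # the two values should agree closely
```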

Similarly, we can compute the partial derivative for every element of the input matrix; all of these results are listed in the table above.

As with error propagation for stride-1 convolution, we need to process the incoming error matrix and the kernel in some way and then convolve them to obtain the error that should be passed on to the next layer of the backward pass. So there are three questions to answer: 1. how to process the error matrix; 2. how to process the kernel; 3. how to carry out the convolution.

Again, consider $\frac{\partial L}{\partial x_{3,3}}$ on its own. It involves every element of the kernel, which does not match the shape of the incoming error matrix. To make the two shapes compatible, we insert some zeros into the error matrix, and, just as in stride-1 convolutional backpropagation, we rotate the kernel by 180°. This gives:

$$\frac{\partial L}{\partial x_{3,3}} = \begin{bmatrix} \delta_{1,1} & 0 & \delta_{1,2} \\ 0 & 0 & 0 \\ \delta_{2,1} & 0 & \delta_{2,2} \end{bmatrix} \; conv \; \begin{bmatrix} k_{3,3} & k_{3,2} & k_{3,1} \\ k_{2,3} & k_{2,2} & k_{2,1} \\ k_{1,3} & k_{1,2} & k_{1,1} \end{bmatrix}$$

Because the padding policy is always VALID and the two matrices above have the same shape, the stride parameter does not affect the result of this convolution.

If, following our earlier strategy, we additionally pad the zero-inserted error matrix with a suitable number of zeros around its border, we get:

$$\begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \delta_{1,1} & 0 & \delta_{1,2} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \delta_{2,1} & 0 & \delta_{2,2} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \; conv \; \begin{bmatrix} k_{3,3} & k_{3,2} & k_{3,1} \\ k_{2,3} & k_{2,2} & k_{2,1} \\ k_{1,3} & k_{1,2} & k_{1,1} \end{bmatrix}$$

Again, the convolution above uses a stride of 1.

Call the result of the convolution above $conv1$. Next, arrange the $\frac{\partial L}{\partial x_{i,j}}$ in their corresponding positions and call the resulting matrix $conv2$, i.e.:

$$conv2 = \begin{bmatrix} \frac{\partial L}{\partial x_{1,1}} & \frac{\partial L}{\partial x_{1,2}} & \frac{\partial L}{\partial x_{1,3}} & \frac{\partial L}{\partial x_{1,4}} & \frac{\partial L}{\partial x_{1,5}} \\ \frac{\partial L}{\partial x_{2,1}} & \frac{\partial L}{\partial x_{2,2}} & \frac{\partial L}{\partial x_{2,3}} & \frac{\partial L}{\partial x_{2,4}} & \frac{\partial L}{\partial x_{2,5}} \\ \frac{\partial L}{\partial x_{3,1}} & \frac{\partial L}{\partial x_{3,2}} & \frac{\partial L}{\partial x_{3,3}} & \frac{\partial L}{\partial x_{3,4}} & \frac{\partial L}{\partial x_{3,5}} \\ \frac{\partial L}{\partial x_{4,1}} & \frac{\partial L}{\partial x_{4,2}} & \frac{\partial L}{\partial x_{4,3}} & \frac{\partial L}{\partial x_{4,4}} & \frac{\partial L}{\partial x_{4,5}} \\ \frac{\partial L}{\partial x_{5,1}} & \frac{\partial L}{\partial x_{5,2}} & \frac{\partial L}{\partial x_{5,3}} & \frac{\partial L}{\partial x_{5,4}} & \frac{\partial L}{\partial x_{5,5}} \end{bmatrix}$$

Carrying out the computation, we find that $conv1$ and $conv2$ are exactly equal, i.e.:

$$\begin{bmatrix} \frac{\partial L}{\partial x_{1,1}} & \frac{\partial L}{\partial x_{1,2}} & \frac{\partial L}{\partial x_{1,3}} & \frac{\partial L}{\partial x_{1,4}} & \frac{\partial L}{\partial x_{1,5}} \\ \frac{\partial L}{\partial x_{2,1}} & \frac{\partial L}{\partial x_{2,2}} & \frac{\partial L}{\partial x_{2,3}} & \frac{\partial L}{\partial x_{2,4}} & \frac{\partial L}{\partial x_{2,5}} \\ \frac{\partial L}{\partial x_{3,1}} & \frac{\partial L}{\partial x_{3,2}} & \frac{\partial L}{\partial x_{3,3}} & \frac{\partial L}{\partial x_{3,4}} & \frac{\partial L}{\partial x_{3,5}} \\ \frac{\partial L}{\partial x_{4,1}} & \frac{\partial L}{\partial x_{4,2}} & \frac{\partial L}{\partial x_{4,3}} & \frac{\partial L}{\partial x_{4,4}} & \frac{\partial L}{\partial x_{4,5}} \\ \frac{\partial L}{\partial x_{5,1}} & \frac{\partial L}{\partial x_{5,2}} & \frac{\partial L}{\partial x_{5,3}} & \frac{\partial L}{\partial x_{5,4}} & \frac{\partial L}{\partial x_{5,5}} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \delta_{1,1} & 0 & \delta_{1,2} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \delta_{2,1} & 0 & \delta_{2,2} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \; conv \; \begin{bmatrix} k_{3,3} & k_{3,2} & k_{3,1} \\ k_{2,3} & k_{2,2} & k_{2,1} \\ k_{1,3} & k_{1,2} & k_{1,1} \end{bmatrix}$$

Of the three questions raised earlier, two therefore have the same answers as for stride-1 2D convolution; the only difference is that we must insert zeros between adjacent elements of the error matrix before the convolution that produces the propagated error.

How Zeros Are Inserted into the Error Matrix

Clearly, the only question left is how to insert zeros into the error matrix. We state the conclusion directly: **between each pair of adjacent elements, insert (stride - 1) zeros; in other words, adjacent elements end up spaced stride positions apart.** This is natural, because the stride is the only variable that differs from the earlier setup, so the condition to satisfy must depend on the stride.

Once the appropriate number of zeros has been inserted between elements, the next steps are to pad the border of the error matrix with the appropriate number of zero layers, rotate the kernel by 180°, and convolve at stride 1, which yields the error matrix to propagate onward. These two steps are identical to the stride-1 backpropagation algorithm.
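
Here is a minimal NumPy sketch of exactly these steps (the helper names `dilate` and `conv2d` and the random test values are my own); it checks the insert-pad-rotate-convolve recipe against a brute-force accumulation of $\delta_{p,k} \cdot k$ over each input window:

```python
import numpy as np

def dilate(delta, stride):
    """Insert (stride - 1) zeros between adjacent elements of delta."""
    m = delta.shape[0]
    out = np.zeros((stride * (m - 1) + 1,) * 2)
    out[::stride, ::stride] = delta
    return out

def conv2d(x, k, stride=1):
    """VALID 2D convolution with a given stride."""
    n, f = x.shape[0], k.shape[0]
    out = (n - f) // stride + 1
    u = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            u[i, j] = np.sum(x[i*stride:i*stride+f, j*stride:j*stride+f] * k)
    return u

rng = np.random.default_rng(1)
k = rng.standard_normal((3, 3))
delta = rng.standard_normal((2, 2))
stride = 2

# 1. insert zeros, 2. pad the border with (kernel_size - 1) zero layers,
# 3. rotate the kernel 180°, 4. stride-1 convolution
d = np.pad(dilate(delta, stride), 2)
grad_x = conv2d(d, np.rot90(k, 2), stride=1)

# brute-force check: accumulate delta[p, q] * k over each input window
ref = np.zeros((5, 5))
for p in range(2):
    for q in range(2):
        ref[p*stride:p*stride+3, q*stride:q*stride+3] += delta[p, q] * k
assert np.allclose(grad_x, ref)
print(grad_x.shape)  # (5, 5), matching the input matrix x
```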

3. Parameter Update

With the backward propagation of the error settled, the next problem is updating the parameters. Using the same definitions as before, suppose the error received from the following layer at this stage is $\delta$, i.e.:

$$\delta = \begin{bmatrix} \delta_{1,1} & \delta_{1,2} \\ \delta_{2,1} & \delta_{2,2} \end{bmatrix}$$

Then, by the chain rule for partial derivatives, we get the following; take $\frac{\partial L}{\partial k_{1,1}}$ as an example:

$$\begin{aligned} \frac{\partial L}{\partial k_{1,1}} &= \frac{\partial L}{\partial u_{1,1}} \frac{\partial u_{1,1}}{\partial k_{1,1}} + \frac{\partial L}{\partial u_{1,2}} \frac{\partial u_{1,2}}{\partial k_{1,1}} + \frac{\partial L}{\partial u_{2,1}} \frac{\partial u_{2,1}}{\partial k_{1,1}} + \frac{\partial L}{\partial u_{2,2}} \frac{\partial u_{2,2}}{\partial k_{1,1}} \\ &= \delta_{1,1} \frac{\partial u_{1,1}}{\partial k_{1,1}} + \delta_{1,2} \frac{\partial u_{1,2}}{\partial k_{1,1}} + \delta_{2,1} \frac{\partial u_{2,1}}{\partial k_{1,1}} + \delta_{2,2} \frac{\partial u_{2,2}}{\partial k_{1,1}} \\ &= \delta_{1,1} x_{1,1} + \delta_{1,2} x_{1,3} + \delta_{2,1} x_{3,1} + \delta_{2,2} x_{3,3} \end{aligned}$$

Similarly, we work out all of the partial derivatives and collect them below:

$$\frac{\partial L}{\partial k_{1,1}} = \delta_{1,1} x_{1,1} + \delta_{1,2} x_{1,3} + \delta_{2,1} x_{3,1} + \delta_{2,2} x_{3,3}$$
$$\frac{\partial L}{\partial k_{1,2}} = \delta_{1,1} x_{1,2} + \delta_{1,2} x_{1,4} + \delta_{2,1} x_{3,2} + \delta_{2,2} x_{3,4}$$
$$\frac{\partial L}{\partial k_{1,3}} = \delta_{1,1} x_{1,3} + \delta_{1,2} x_{1,5} + \delta_{2,1} x_{3,3} + \delta_{2,2} x_{3,5}$$
$$\frac{\partial L}{\partial k_{2,1}} = \delta_{1,1} x_{2,1} + \delta_{1,2} x_{2,3} + \delta_{2,1} x_{4,1} + \delta_{2,2} x_{4,3}$$
$$\frac{\partial L}{\partial k_{2,2}} = \delta_{1,1} x_{2,2} + \delta_{1,2} x_{2,4} + \delta_{2,1} x_{4,2} + \delta_{2,2} x_{4,4}$$
$$\frac{\partial L}{\partial k_{2,3}} = \delta_{1,1} x_{2,3} + \delta_{1,2} x_{2,5} + \delta_{2,1} x_{4,3} + \delta_{2,2} x_{4,5}$$
$$\frac{\partial L}{\partial k_{3,1}} = \delta_{1,1} x_{3,1} + \delta_{1,2} x_{3,3} + \delta_{2,1} x_{5,1} + \delta_{2,2} x_{5,3}$$
$$\frac{\partial L}{\partial k_{3,2}} = \delta_{1,1} x_{3,2} + \delta_{1,2} x_{3,4} + \delta_{2,1} x_{5,2} + \delta_{2,2} x_{5,4}$$
$$\frac{\partial L}{\partial k_{3,3}} = \delta_{1,1} x_{3,3} + \delta_{1,2} x_{3,5} + \delta_{2,1} x_{5,3} + \delta_{2,2} x_{5,5}$$

$$\frac{\partial L}{\partial b} = \delta_{1,1} + \delta_{1,2} + \delta_{2,1} + \delta_{2,2}$$

Similar to the error propagation above, we find that inserting zeros into the error matrix makes its dimensions match the input matrix $x$. That is:

$$\frac{\partial L}{\partial k} = \left[ \frac{\partial L}{\partial k_{i,j}} \right] = \begin{bmatrix} x_{1,1} & x_{1,2} & x_{1,3} & x_{1,4} & x_{1,5} \\ x_{2,1} & x_{2,2} & x_{2,3} & x_{2,4} & x_{2,5} \\ x_{3,1} & x_{3,2} & x_{3,3} & x_{3,4} & x_{3,5} \\ x_{4,1} & x_{4,2} & x_{4,3} & x_{4,4} & x_{4,5} \\ x_{5,1} & x_{5,2} & x_{5,3} & x_{5,4} & x_{5,5} \end{bmatrix} \; conv \; \begin{bmatrix} \delta_{1,1} & 0 & \delta_{1,2} \\ 0 & 0 & 0 \\ \delta_{2,1} & 0 & \delta_{2,2} \end{bmatrix}$$

From this we can see that **updating the kernel parameters also requires inserting zeros into the error matrix, in exactly the same way as during error propagation.** We can therefore summarize the kernel update for backpropagation through a stride-$s$ convolution: 1. insert the appropriate number of zeros into the received error matrix; 2. convolve the input matrix $x$ with the zero-inserted error matrix at stride 1, which yields the kernel's update gradient.

Likewise, the derivation above shows that regardless of the convolution variant, the update gradient of the bias $b$ is simply the sum of the elements of the received error matrix.
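
As a check of both parameter gradients, here is a minimal sketch (the helper names and random test values are my own, consistent with the earlier snippets):

```python
import numpy as np

def dilate(delta, stride):
    """Insert (stride - 1) zeros between adjacent elements of delta."""
    m = delta.shape[0]
    out = np.zeros((stride * (m - 1) + 1,) * 2)
    out[::stride, ::stride] = delta
    return out

def conv2d(x, k, stride=1):
    """VALID 2D convolution with a given stride."""
    n, f = x.shape[0], k.shape[0]
    out = (n - f) // stride + 1
    u = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            u[i, j] = np.sum(x[i*stride:i*stride+f, j*stride:j*stride+f] * k)
    return u

rng = np.random.default_rng(2)
x = rng.standard_normal((5, 5))
delta = rng.standard_normal((2, 2))
stride = 2

# dL/dk = x conv (zero-inserted delta) at stride 1
grad_k = conv2d(x, dilate(delta, stride), stride=1)

# brute-force check against the per-element formulas above
ref = np.zeros((3, 3))
for p in range(2):
    for q in range(2):
        ref += delta[p, q] * x[p*stride:p*stride+3, q*stride:q*stride+3]
assert np.allclose(grad_k, ref)

grad_b = delta.sum()  # dL/db is just the sum of the error elements
print(grad_k.shape, grad_b)  # (3, 3) and a scalar
```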

4. Summary

The procedure above can be summarized as follows:

| Parameter | Setting |
| --- | --- |
| Input matrix $x$ | a 2D matrix |
| Kernel $k$ | a 2D matrix |
| Stride $stride$ | a positive integer s |
| padding | VALID |
| Bias $b$ | a scalar (float) |

Forward pass:

```
conv(x, kernel, bias, stride, "VALID")
```

Backward pass:

```
conv_backward(error, x, kernel, bias):
    # Compute the error to pass to the next layer
    1. Insert zeros into the received error matrix so that (stride - 1)
       zeros sit between each pair of adjacent elements.
    2. Pad the border of the error matrix with a suitable number of zeros.
    3. Rotate kernel by 180°.
    4. Convolve the zero-padded error with the rotated kernel at stride 1,
       which gives the error new_error to pass to the next layer.

    # Update the parameters
    1. Insert zeros into the received error matrix so that (stride - 1)
       zeros sit between each pair of adjacent elements.
    2. Convolve the input matrix x with the zero-inserted error matrix at
       stride 1, which gives the update gradient of kernel.
    3. Sum all elements of the received error matrix to get the update
       gradient of bias.
    4. kernel := kernel - learning_rate * (gradient of kernel)
    5. bias := bias - learning_rate * (gradient of bias)

    # Return the error, to be propagated to the next layer
    return new_error
```
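
The pseudocode above translates almost line for line into NumPy. Below is a minimal runnable sketch of the whole procedure; the function names and test values are my own, and it assumes square inputs and kernels with VALID padding, as in the rest of this post:

```python
import numpy as np

def conv(x, kernel, bias=0.0, stride=1):
    """Forward pass: VALID 2D convolution with the given stride."""
    n, f = x.shape[0], kernel.shape[0]
    out = (n - f) // stride + 1
    u = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            u[i, j] = np.sum(x[i*stride:i*stride+f, j*stride:j*stride+f] * kernel) + bias
    return u

def conv_backward(error, x, kernel, bias, stride=1, lr=0.01):
    """Backward pass following the pseudocode above.

    Returns (new_error, new_kernel, new_bias)."""
    f = kernel.shape[0]
    # insert (stride - 1) zeros between adjacent error elements
    m = error.shape[0]
    dilated = np.zeros((stride * (m - 1) + 1,) * 2)
    dilated[::stride, ::stride] = error

    # error to pass on: pad, rotate the kernel 180°, stride-1 convolution
    padded = np.pad(dilated, f - 1)
    new_error = conv(padded, np.rot90(kernel, 2), stride=1)

    # parameter gradients: x conv dilated error, and the sum of the error
    grad_kernel = conv(x, dilated, stride=1)
    grad_bias = error.sum()

    new_kernel = kernel - lr * grad_kernel
    new_bias = bias - lr * grad_bias
    return new_error, new_kernel, new_bias

# quick demonstration with the 5x5 / 3x3 / stride-2 setup from this post
rng = np.random.default_rng(3)
x = rng.standard_normal((5, 5))
kernel = rng.standard_normal((3, 3))
bias = 0.1
u = conv(x, kernel, bias, stride=2)       # shape (2, 2)
delta = rng.standard_normal(u.shape)      # error from the next layer
new_error, kernel, bias = conv_backward(delta, x, kernel, bias, stride=2)
print(u.shape, new_error.shape)           # (2, 2) (5, 5)
```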