CNN | 03 Training the Convolutional Layer

3 Training the Convolutional Layer

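As with a fully connected layer, training a convolutional layer starts from the error matrix passed back from the layer behind it (the one closer to the output), and then computes:

  1. the error term (gradient) of this layer's weight matrix
  2. the error matrix that this layer must pass back to the layer in front of it

In the description below, we assume that the error matrix from the layer behind has already been obtained and has already been propagated backward through the activation function.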

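3.1 Computing the gradient matrix for backpropagation

Forward formula:

$$
Z = W * A + b \tag{0}
$$

Here W is the convolution kernel, * denotes the convolution (cross-correlation) operation, A is the input of the current layer, b is the bias (not drawn in the figure), and Z is the output of the current layer before the activation function is applied.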

A concrete example makes the analysis easier. Figure 17-21 shows the forward computation.

Figure 17-21 Forward convolution

Written out element by element, this gives:

$$
z_{11} = w_{11} \cdot a_{11} + w_{12} \cdot a_{12} + w_{21} \cdot a_{21} + w_{22} \cdot a_{22} + b \tag{1}
$$
$$
z_{12} = w_{11} \cdot a_{12} + w_{12} \cdot a_{13} + w_{21} \cdot a_{22} + w_{22} \cdot a_{23} + b \tag{2}
$$
$$
z_{21} = w_{11} \cdot a_{21} + w_{12} \cdot a_{22} + w_{21} \cdot a_{31} + w_{22} \cdot a_{32} + b \tag{3}
$$
$$
z_{22} = w_{11} \cdot a_{22} + w_{12} \cdot a_{23} + w_{21} \cdot a_{32} + w_{22} \cdot a_{33} + b \tag{4}
$$
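The four formulas can be checked with a few lines of NumPy. This is a minimal sketch, assuming a 3x3 input and a 2x2 kernel as in the formulas; the helper name `corr2d` and the numeric values are made up for illustration.

```python
import numpy as np

def corr2d(a, w, b=0.0):
    """Valid cross-correlation of a 2-D input with a 2-D kernel, stride 1."""
    kh, kw = w.shape
    oh, ow = a.shape[0] - kh + 1, a.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # each output element is the sum of one window times the kernel, plus b
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * w) + b
    return out

A = np.arange(1.0, 10.0).reshape(3, 3)    # a11..a33 = 1..9 (illustrative values)
W = np.array([[1.0, 2.0], [3.0, 4.0]])    # w11..w22 (illustrative values)
b = 0.5
Z = corr2d(A, W, b)

# z11 written out exactly as in formula (1)
z11 = W[0, 0] * A[0, 0] + W[0, 1] * A[0, 1] + W[1, 0] * A[1, 0] + W[1, 1] * A[1, 1] + b
print(Z)    # Z == [[37.5, 47.5], [67.5, 77.5]]
```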

Gradient of the loss function J with respect to a11:

$$
\frac{\partial J}{\partial a_{11}} = \frac{\partial J}{\partial z_{11}} \frac{\partial z_{11}}{\partial a_{11}} = \delta_{z11} \cdot w_{11} \tag{5}
$$

In this formula, $\delta_{z11}$ is the gradient passed back from the later part of the network to unit z11 of this layer.

To get the gradient of J with respect to a12, look at the forward formulas first: a12 contributes to both z11 and z12, so the two partial derivatives must be added:

$$
\frac{\partial J}{\partial a_{12}} = \frac{\partial J}{\partial z_{11}} \frac{\partial z_{11}}{\partial a_{12}} + \frac{\partial J}{\partial z_{12}} \frac{\partial z_{12}}{\partial a_{12}} = \delta_{z11} \cdot w_{12} + \delta_{z12} \cdot w_{11} \tag{6}
$$

The most complex case is the gradient of a22, because the forward formulas show that a22 contributes to every output:

$$
\begin{aligned}
\frac{\partial J}{\partial a_{22}} &= \frac{\partial J}{\partial z_{11}} \frac{\partial z_{11}}{\partial a_{22}} + \frac{\partial J}{\partial z_{12}} \frac{\partial z_{12}}{\partial a_{22}} + \frac{\partial J}{\partial z_{21}} \frac{\partial z_{21}}{\partial a_{22}} + \frac{\partial J}{\partial z_{22}} \frac{\partial z_{22}}{\partial a_{22}} \\
&= \delta_{z11} \cdot w_{22} + \delta_{z12} \cdot w_{21} + \delta_{z21} \cdot w_{12} + \delta_{z22} \cdot w_{11}
\end{aligned} \tag{7}
$$

The gradients of all the other elements of A are obtained in the same way.
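Formulas 5, 6 and 7 can be verified numerically. The sketch below assumes a surrogate loss J = sum(delta_z ⊙ Z), chosen only because its gradient with respect to Z is exactly delta_z; the names `corr2d`, `J` and `numeric_grad` and the random values are illustrative, not part of the original text.

```python
import numpy as np

def corr2d(a, w):
    """Valid cross-correlation, stride 1."""
    kh, kw = w.shape
    oh, ow = a.shape[0] - kh + 1, a.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
W = rng.normal(size=(2, 2))
delta_z = rng.normal(size=(2, 2))   # stands in for the back-propagated dJ/dZ

def J(a):
    # surrogate loss: its gradient with respect to Z equals delta_z
    return np.sum(delta_z * corr2d(a, W))

def numeric_grad(i, j, eps=1e-6):
    # central finite difference of J with respect to A[i, j]
    ap, am = A.copy(), A.copy()
    ap[i, j] += eps
    am[i, j] -= eps
    return (J(ap) - J(am)) / (2 * eps)

# formulas (5), (6) and (7), with indices shifted to 0-based
g11 = delta_z[0, 0] * W[0, 0]
g12 = delta_z[0, 0] * W[0, 1] + delta_z[0, 1] * W[0, 0]
g22 = (delta_z[0, 0] * W[1, 1] + delta_z[0, 1] * W[1, 0]
       + delta_z[1, 0] * W[0, 1] + delta_z[1, 1] * W[0, 0])

print(np.isclose(numeric_grad(1, 1), g22, atol=1e-6))
```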

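Look at the order of the w terms in formula 7: it looks as if the original kernel has been rotated by 180 degrees and then convolved with the incoming error terms, which yields the error term of every element. Formulas 5 and 6 contain fewer terms only because a11 and a12 lie at the corner and on the edge of the input; this is the same effect as padding in the forward convolution. So we zero-pad the incoming error matrix Delta-In and convolve it with the kernel rotated by 180 degrees to obtain the outgoing error matrix Delta-Out, as shown in Figure 17-22.

Figure 17-22 Error backpropagation through a convolution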

All of this can be unified into one concise formula:

$$
\delta_{out} = \delta_{in} * W^{rot180} \tag{8}
$$

This error matrix can then continue to be propagated backward to the layer in front.

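  • When Weights is 3x3, $\delta_{in}$ needs padding=2, i.e. 2 rings of zeros, so that convolving it with Weights yields a $\delta_{out}$ of the correct size
  • When Weights is 5x5, $\delta_{in}$ needs padding=4, i.e. 4 rings of zeros, so that convolving it with Weights yields a $\delta_{out}$ of the correct size
  • In general, when Weights is NxN (with stride 1), $\delta_{in}$ needs padding=N-1, i.e. N-1 rings of zeros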
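The padding rule and formula 8 can be sketched as follows for stride 1. The function names `corr2d` and `backward_input` and the numeric values are assumptions made for illustration.

```python
import numpy as np

def corr2d(a, w):
    """Valid cross-correlation, stride 1."""
    kh, kw = w.shape
    oh, ow = a.shape[0] - kh + 1, a.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * w)
    return out

def backward_input(delta_in, w):
    """Formula (8) for stride 1: zero-pad by (kernel size - 1) on every side,
    then cross-correlate with the kernel rotated by 180 degrees."""
    ph, pw = w.shape[0] - 1, w.shape[1] - 1
    padded = np.pad(delta_in, ((ph, ph), (pw, pw)))   # np.pad fills with zeros by default
    return corr2d(padded, np.rot90(w, 2))

delta_in = np.array([[1.0, 2.0], [3.0, 4.0]])     # dz11..dz22 (illustrative)
W = np.array([[10.0, 20.0], [30.0, 40.0]])        # w11..w22 (illustrative)
delta_out = backward_input(delta_in, W)
print(delta_out)
# delta_out[0, 0] == 10   (formula 5: dz11*w11)
# delta_out[0, 1] == 40   (formula 6: dz11*w12 + dz12*w11)
# delta_out[1, 1] == 200  (formula 7)
```

The result has the 3x3 shape of the input A, and its corner, edge and center entries reproduce formulas 5, 6 and 7.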

Example:

Forward, with stride=1: $A^{(10 \times 8)} * W^{(5 \times 5)} = Z^{(6 \times 4)}$

Backward: $\delta_z^{(6 \times 4)}$ + 4 padding = $\delta_z^{(14 \times 12)}$

Then: $\delta_z^{(14 \times 12)} * W^{rot180(5 \times 5)} = \delta_a^{(10 \times 8)}$
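The shape arithmetic of this example can be reproduced with the usual output-size formula for a convolution. The helper `conv_output_size` is a hypothetical name introduced only for this sketch.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis for input length n and kernel length k."""
    return (n + 2 * padding - k) // stride + 1

k = 5
# forward: 10x8 input, 5x5 kernel, stride 1
fwd = (conv_output_size(10, k), conv_output_size(8, k))
# backward: add k-1 = 4 rings of zeros around the error matrix
padded = (fwd[0] + 2 * (k - 1), fwd[1] + 2 * (k - 1))
# convolving the padded error matrix with the rotated 5x5 kernel
bwd = (conv_output_size(padded[0], k), conv_output_size(padded[1], k))

print(fwd, padded, bwd)   # (6, 4) (14, 12) (10, 8)
```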

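3.2 Restoring the gradient matrix when the stride is not 1

First compare the convolution results for stride=1 and stride=2, shown in Figure 17-23.

Figure 17-23 Comparison of convolution results with stride 1 and stride 2

The two differ only in the gray part of the middle result. So if the error matrix passed in during backpropagation has the 2x2 shape produced by stride=2, we only need to fill in a cross of zeros to turn it into a 3x3 error matrix, and then the stride-1 algorithm applies.

By the same reasoning, a stride of 3 requires a cross made of double lines. So, when the stride of the current convolutional layer is S (S>1):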

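  1. Get the shape of the error matrix passed back from the layer behind, say $M \times N$
  2. Initialize a zero matrix of size $((M-1) \cdot S + 1) \times ((N-1) \cdot S + 1)$
  3. Put the values of the first row of the incoming error matrix at positions 0, S, 2S, 3S… of row 0 of the zero matrix
  4. Then put the values of the second row of the error matrix at positions 0, S, 2S, 3S… of row S of the zero matrix, and so on for the remaining rows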

With a stride of 2, a concrete example looks like this:

$$
\begin{bmatrix}
\delta_{11} & 0 & \delta_{12} & 0 & \delta_{13}\\
0 & 0 & 0 & 0 & 0\\
\delta_{21} & 0 & \delta_{22} & 0 & \delta_{23}\\
\end{bmatrix}
$$

With a stride of 3, it looks like this:

$$
\begin{bmatrix}
\delta_{11} & 0 & 0 & \delta_{12} & 0 & 0 & \delta_{13}\\
0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0\\
\delta_{21} & 0 & 0 & \delta_{22} & 0 & 0 & \delta_{23}\\
\end{bmatrix}
$$
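The expansion shown in the two matrices can be sketched with NumPy slicing. This is an illustration only: the function name `expand_delta` is hypothetical, and the output size is taken from the two examples above (a 2x3 error matrix becomes 3x5 for stride 2 and 4x7 for stride 3).

```python
import numpy as np

def expand_delta(delta, stride):
    """Spread a stride-S error matrix onto a stride-1 grid, filling the gaps with zeros."""
    if stride == 1:
        return delta.copy()
    m, n = delta.shape
    out = np.zeros(((m - 1) * stride + 1, (n - 1) * stride + 1), dtype=delta.dtype)
    # row r of delta lands on row r*S, column c on column c*S
    out[::stride, ::stride] = delta
    return out

# the integers 11..23 stand in for delta_11..delta_23
delta = np.array([[11, 12, 13],
                  [21, 22, 23]])
expanded_2 = expand_delta(delta, 2)
expanded_3 = expand_delta(delta, 3)
print(expanded_2)
print(expanded_3)
```

After this step the expanded matrix can be padded and convolved with the rotated kernel exactly as in the stride-1 case.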

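3.3 Gradient computation with multiple kernels

Having multiple kernels means having multiple output channels.

This is the dimension-increasing convolution of section 14.1, shown in Figure 17-24.

Figure 17-24 Dimension-increasing convolution

Forward formulas: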

$$
\begin{aligned}
z_{111} &= w_{111} \cdot a_{11} + w_{112} \cdot a_{12} + w_{121} \cdot a_{21} + w_{122} \cdot a_{22} \\
z_{112} &= w_{111} \cdot a_{12} + w_{112} \cdot a_{13} + w_{121} \cdot a_{22} + w_{122} \cdot a_{23} \\
z_{121} &= w_{111} \cdot a_{21} + w_{112} \cdot a_{22} + w_{121} \cdot a_{31} + w_{122} \cdot a_{32} \\
z_{122} &= w_{111} \cdot a_{22} + w_{112} \cdot a_{23} + w_{121} \cdot a_{32} + w_{122} \cdot a_{33}
\end{aligned}
$$

$$
\begin{aligned}
z_{211} &= w_{211} \cdot a_{11} + w_{212} \cdot a_{12} + w_{221} \cdot a_{21} + w_{222} \cdot a_{22} \\
z_{212} &= w_{211} \cdot a_{12} + w_{212} \cdot a_{13} + w_{221} \cdot a_{22} + w_{222} \cdot a_{23} \\
z_{221} &= w_{211} \cdot a_{21} + w_{212} \cdot a_{22} + w_{221} \cdot a_{31} + w_{222} \cdot a_{32} \\
z_{222} &= w_{211} \cdot a_{22} + w_{212} \cdot a_{23} + w_{221} \cdot a_{32} + w_{222} \cdot a_{33}
\end{aligned}
$$

Gradient of J with respect to a22:

$$
\begin{aligned}
\frac{\partial J}{\partial a_{22}} &= \frac{\partial J}{\partial Z_{1}} \frac{\partial Z_{1}}{\partial a_{22}} + \frac{\partial J}{\partial Z_{2}} \frac{\partial Z_{2}}{\partial a_{22}} \\
&= \frac{\partial J}{\partial z_{111}} \frac{\partial z_{111}}{\partial a_{22}} + \frac{\partial J}{\partial z_{112}} \frac{\partial z_{112}}{\partial a_{22}} + \frac{\partial J}{\partial z_{121}} \frac{\partial z_{121}}{\partial a_{22}} + \frac{\partial J}{\partial z_{122}} \frac{\partial z_{122}}{\partial a_{22}} \\
&\quad + \frac{\partial J}{\partial z_{211}} \frac{\partial z_{211}}{\partial a_{22}} + \frac{\partial J}{\partial z_{212}} \frac{\partial z_{212}}{\partial a_{22}} + \frac{\partial J}{\partial z_{221}} \frac{\partial z_{221}}{\partial a_{22}} + \frac{\partial J}{\partial z_{222}} \frac{\partial z_{222}}{\partial a_{22}} \\
&= (\delta_{z111} \cdot w_{122} + \delta_{z112} \cdot w_{121} + \delta_{z121} \cdot w_{112} + \delta_{z122} \cdot w_{111}) \\
&\quad + (\delta_{z211} \cdot w_{222} + \delta_{z212} \cdot w_{221} + \delta_{z221} \cdot w_{212} + \delta_{z222} \cdot w_{211}) \\
&= \delta_{z1} * W_1^{rot180} + \delta_{z2} * W_2^{rot180}
\end{aligned}
$$

So, similarly to formula 8, we first pad each $\delta_{in}$ with zeros, then convolve it with its corresponding rotated kernel, and finally add up the results to obtain the gradient matrix to be passed back:

$$
\delta_{out} = \sum_m \delta_{in\_m} * W^{rot180}_m \tag{9}
$$
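Formula 9 can be sketched for two kernels as below, assuming stride 1 and a single 3x3 input layer as in the forward formulas of this section. All function names and the random values are illustrative.

```python
import numpy as np

def corr2d(a, w):
    """Valid cross-correlation, stride 1."""
    kh, kw = w.shape
    oh, ow = a.shape[0] - kh + 1, a.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * w)
    return out

def backward_input(delta_in, w):
    """Zero-pad by (kernel size - 1), then cross-correlate with the rotated kernel."""
    ph, pw = w.shape[0] - 1, w.shape[1] - 1
    padded = np.pad(delta_in, ((ph, ph), (pw, pw)))
    return corr2d(padded, np.rot90(w, 2))

def backward_input_multi_kernel(deltas, kernels):
    """Formula (9): add up the per-kernel contributions."""
    return sum(backward_input(d, w) for d, w in zip(deltas, kernels))

rng = np.random.default_rng(0)
kernels = rng.normal(size=(2, 2, 2))   # W1 and W2, each 2x2
deltas = rng.normal(size=(2, 2, 2))    # delta_z1 and delta_z2, each 2x2
delta_out = backward_input_multi_kernel(deltas, kernels)

# the a22 entry written out term by term, as in the derivation above
a22 = sum(d[0, 0] * w[1, 1] + d[0, 1] * w[1, 0] + d[1, 0] * w[0, 1] + d[1, 1] * w[0, 0]
          for d, w in zip(deltas, kernels))
print(np.isclose(delta_out[1, 1], a22))
```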

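3.4 Gradient computation with multiple inputs

When the input consists of multiple layers (channels), each input layer must have its own kernel, as shown in Figure 17-25.

Figure 17-25 Convolution over multiple layers requires one kernel per layer

This gives the forward formulas: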

$$
\begin{aligned}
z_{11} &= w_{111} \cdot a_{111} + w_{112} \cdot a_{112} + w_{121} \cdot a_{121} + w_{122} \cdot a_{122} \\
&+ w_{211} \cdot a_{211} + w_{212} \cdot a_{212} + w_{221} \cdot a_{221} + w_{222} \cdot a_{222}
\end{aligned} \tag{10}
$$
$$
\begin{aligned}
z_{12} &= w_{111} \cdot a_{112} + w_{112} \cdot a_{113} + w_{121} \cdot a_{122} + w_{122} \cdot a_{123} \\
&+ w_{211} \cdot a_{212} + w_{212} \cdot a_{213} + w_{221} \cdot a_{222} + w_{222} \cdot a_{223}
\end{aligned} \tag{11}
$$
$$
\begin{aligned}
z_{21} &= w_{111} \cdot a_{121} + w_{112} \cdot a_{122} + w_{121} \cdot a_{131} + w_{122} \cdot a_{132} \\
&+ w_{211} \cdot a_{221} + w_{212} \cdot a_{222} + w_{221} \cdot a_{231} + w_{222} \cdot a_{232}
\end{aligned} \tag{12}
$$
$$
\begin{aligned}
z_{22} &= w_{111} \cdot a_{122} + w_{112} \cdot a_{123} + w_{121} \cdot a_{132} + w_{122} \cdot a_{133} \\
&+ w_{211} \cdot a_{222} + w_{212} \cdot a_{223} + w_{221} \cdot a_{232} + w_{222} \cdot a_{233}
\end{aligned} \tag{13}
$$

The most complex case in the first input layer is the gradient of J with respect to a122:

$$
\begin{aligned}
\frac{\partial J}{\partial a_{122}} &= \frac{\partial J}{\partial z_{11}}\frac{\partial z_{11}}{\partial a_{122}} + \frac{\partial J}{\partial z_{12}}\frac{\partial z_{12}}{\partial a_{122}} + \frac{\partial J}{\partial z_{21}}\frac{\partial z_{21}}{\partial a_{122}} + \frac{\partial J}{\partial z_{22}}\frac{\partial z_{22}}{\partial a_{122}} \\
&= \delta_{z11} \cdot w_{122} + \delta_{z12} \cdot w_{121} + \delta_{z21} \cdot w_{112} + \delta_{z22} \cdot w_{111}
\end{aligned}
$$

Generalizing gives:

$$
\delta_{out1} = \delta_{in} * W_1^{rot180} \tag{14}
$$

Likewise, the most complex case in the second input layer is the gradient of J with respect to a222:

$$
\begin{aligned}
\frac{\partial J}{\partial a_{222}} &= \frac{\partial J}{\partial z_{11}}\frac{\partial z_{11}}{\partial a_{222}} + \frac{\partial J}{\partial z_{12}}\frac{\partial z_{12}}{\partial a_{222}} + \frac{\partial J}{\partial z_{21}}\frac{\partial z_{21}}{\partial a_{222}} + \frac{\partial J}{\partial z_{22}}\frac{\partial z_{22}}{\partial a_{222}} \\
&= \delta_{z11} \cdot w_{222} + \delta_{z12} \cdot w_{221} + \delta_{z21} \cdot w_{212} + \delta_{z22} \cdot w_{211}
\end{aligned}
$$

Generalizing gives:

$$
\delta_{out2} = \delta_{in} * W_2^{rot180} \tag{15}
$$
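Formulas 14 and 15 say that each input layer receives its own error matrix, computed from the same incoming error matrix and that layer's sub-kernel. A minimal sketch under the assumptions of this section (two 3x3 input layers, 2x2 sub-kernels, stride 1); all names and the random values are illustrative.

```python
import numpy as np

def corr2d(a, w):
    """Valid cross-correlation, stride 1."""
    kh, kw = w.shape
    oh, ow = a.shape[0] - kh + 1, a.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * w)
    return out

def backward_input(delta_in, w):
    """Zero-pad by (kernel size - 1), then cross-correlate with the rotated kernel."""
    ph, pw = w.shape[0] - 1, w.shape[1] - 1
    padded = np.pad(delta_in, ((ph, ph), (pw, pw)))
    return corr2d(padded, np.rot90(w, 2))

rng = np.random.default_rng(0)
kernels = rng.normal(size=(2, 2, 2))   # kernels[c] is the sub-kernel for input layer c
delta_in = rng.normal(size=(2, 2))     # one error matrix, shared by both input layers

# one outgoing error matrix per input layer: formulas (14) and (15)
delta_out = np.stack([backward_input(delta_in, w) for w in kernels])

# center entries written out term by term, as in the two derivations above
d, W1, W2 = delta_in, kernels[0], kernels[1]
a122 = d[0, 0] * W1[1, 1] + d[0, 1] * W1[1, 0] + d[1, 0] * W1[0, 1] + d[1, 1] * W1[0, 0]
a222 = d[0, 0] * W2[1, 1] + d[0, 1] * W2[1, 0] + d[1, 0] * W2[0, 1] + d[1, 1] * W2[0, 0]
print(delta_out.shape)   # (2, 3, 3)
```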

3.5 Gradient of the weights (kernel)

Figure 17-26 shows the forward convolution we are already familiar with.

Figure 17-26 Forward convolution

To find the gradient of J with respect to w11, note from the forward formulas that w11 contributes to every z, so:

$$
\begin{aligned}
\frac{\partial J}{\partial w_{11}} &= \frac{\partial J}{\partial z_{11}}\frac{\partial z_{11}}{\partial w_{11}} + \frac{\partial J}{\partial z_{12}}\frac{\partial z_{12}}{\partial w_{11}} + \frac{\partial J}{\partial z_{21}}\frac{\partial z_{21}}{\partial w_{11}} + \frac{\partial J}{\partial z_{22}}\frac{\partial z_{22}}{\partial w_{11}} \\
&= \delta_{z11} \cdot a_{11} + \delta_{z12} \cdot a_{12} + \delta_{z21} \cdot a_{21} + \delta_{z22} \cdot a_{22}
\end{aligned} \tag{9}
$$

The same holds for w12:

$$
\begin{aligned}
\frac{\partial J}{\partial w_{12}} &= \frac{\partial J}{\partial z_{11}}\frac{\partial z_{11}}{\partial w_{12}} + \frac{\partial J}{\partial z_{12}}\frac{\partial z_{12}}{\partial w_{12}} + \frac{\partial J}{\partial z_{21}}\frac{\partial z_{21}}{\partial w_{12}} + \frac{\partial J}{\partial z_{22}}\frac{\partial z_{22}}{\partial w_{12}} \\
&= \delta_{z11} \cdot a_{12} + \delta_{z12} \cdot a_{13} + \delta_{z21} \cdot a_{22} + \delta_{z22} \cdot a_{23}
\end{aligned} \tag{10}
$$

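The two formulas above (for w11 and w12) are again a standard convolution (cross-correlation), with A as the input and the incoming error matrix acting as the kernel, so the process can be pictured as in Figure 17-27.

Figure 17-27 Computing the kernel gradient

Summarized as one formula:

$$
\delta_w = A * \delta_{in} \tag{11}
$$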
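The kernel-gradient formula can be checked against the two element-wise expressions for w11 and w12. A minimal sketch, assuming a 3x3 input and a 2x2 error matrix; `corr2d` and the numeric values are illustrative.

```python
import numpy as np

def corr2d(a, w):
    """Valid cross-correlation, stride 1."""
    kh, kw = w.shape
    oh, ow = a.shape[0] - kh + 1, a.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * w)
    return out

A = np.arange(1.0, 10.0).reshape(3, 3)          # a11..a33 = 1..9 (illustrative)
delta_in = np.array([[1.0, 2.0], [3.0, 4.0]])   # dz11..dz22 (illustrative)

# delta_w = A * delta_in: the error matrix slides over the input like a kernel
dW = corr2d(A, delta_in)

# element-wise expressions for w11 and w12 from the derivation above
dw11 = (delta_in[0, 0] * A[0, 0] + delta_in[0, 1] * A[0, 1]
        + delta_in[1, 0] * A[1, 0] + delta_in[1, 1] * A[1, 1])
dw12 = (delta_in[0, 0] * A[0, 1] + delta_in[0, 1] * A[0, 2]
        + delta_in[1, 0] * A[1, 1] + delta_in[1, 1] * A[1, 2])
print(dW)   # dW == [[37, 47], [67, 77]]
```

The result has the same 2x2 shape as the kernel, which is what a gradient-descent update of W requires.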

3.6 Gradient of the bias

From forward formulas 1, 2, 3 and 4 we get:

$$
\begin{aligned}
\frac{\partial J}{\partial b} &= \frac{\partial J}{\partial z_{11}}\frac{\partial z_{11}}{\partial b} + \frac{\partial J}{\partial z_{12}}\frac{\partial z_{12}}{\partial b} + \frac{\partial J}{\partial z_{21}}\frac{\partial z_{21}}{\partial b} + \frac{\partial J}{\partial z_{22}}\frac{\partial z_{22}}{\partial b} \\
&= \delta_{z11} + \delta_{z12} + \delta_{z21} + \delta_{z22}
\end{aligned} \tag{12}
$$

So the bias gradient is the sum of all elements of $\delta_{in}$:

$$
\delta_b = \sum_{i,j} \delta_{zij} \tag{13}
$$

Each kernel W may consist of several filters, also called sub-kernels, but a kernel has only one bias no matter how many sub-kernels it has.
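In code, the bias gradient is a plain sum over the incoming error matrix. The sketch below also checks it with a finite difference on a surrogate loss J(b) = sum(delta_in ⊙ (Z0 + b)); `Z0`, `J` and all values are assumptions made for illustration.

```python
import numpy as np

delta_in = np.array([[0.1, -0.2], [0.3, 0.4]])   # illustrative error matrix
db = delta_in.sum()                              # sum of all elements of delta_in

# finite-difference check: the gradient of this surrogate loss w.r.t. Z is delta_in
Z0 = np.array([[1.0, 2.0], [3.0, 4.0]])          # stands in for W * A

def J(b):
    return np.sum(delta_in * (Z0 + b))

eps = 1e-6
db_numeric = (J(eps) - J(-eps)) / (2 * eps)

# with several kernels (output channels), each kernel has its own error matrix
# and its own single bias, so the sum runs over the two spatial axes only
deltas = np.stack([delta_in, 2 * delta_in])
db_per_kernel = deltas.sum(axis=(1, 2))
print(db, db_numeric, db_per_kernel)
```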

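3.7 A worked example of computing the kernel gradient

Below we use a simple example to illustrate how a kernel is trained. We first create a sample image, then convolve it with a "horizontal edge detection" operator as the kernel; the comparison is shown in Figure 17-28.

Figure 17-28 Original image and the result of convolving it with the horizontal edge detection operator

The left side is the original image (an 80x80 grayscale image); the right side is the result after the 3x3 convolution (a 78x78 grayscale image). Because the operator detects horizontal edges, only the horizontal edges of the original image remain.

Kernel matrix:

$$
w = \begin{pmatrix} 0 & -1 & 0 \\ 0 & 2 & 0 \\ 0 & -1 & 0 \end{pmatrix}
$$

Now let us turn the problem around: given an original image (as on the left) and a target image (as on the right), how do we find the corresponding kernel?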

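We studied linear fitting earlier, and this problem is in fact of the same nature, except that fitting a line to a set of points becomes fitting an image to an image, as shown in Table 17-3.

Table 17-3 Comparison of line fitting and image fitting

| | Sample data | Label data | Prediction | Formula | Loss function |
|---|---|---|---|---|---|
| Line fitting | sample points x | label values y | predicted line z | $z = x \cdot w + b$ | mean squared error |
| Image fitting | original image x | target image y | predicted image z | $z = x * w + b$ | mean squared error |

In line fitting, the mean squared error measures the distance between the predictions and the sample points; in image fitting, it can be computed directly from the differences between corresponding pixels of the two images.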

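To simplify the problem we set b=0 and solve only for the kernel w, so the forward formulas are:

$$
z = x * w
$$
$$
loss = \frac{1}{2}(z-y)^2
$$

The gradient of w in the backward pass (from the kernel-gradient formula $\delta_w = A * \delta_{in}$ of section 3.5):

$$
\frac{\partial loss}{\partial w} = \frac{\partial loss}{\partial z} \frac{\partial z}{\partial w} = x * (z-y)
$$

That is, the gradient of w is obtained by subtracting the target image y from the predicted image z and convolving the original image x with the result, where x is the image being convolved and z-y acts as the kernel.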
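Written element by element, the same result can be derived as follows. This is a sketch that assumes the loss is summed over all pixels and uses 1-based indices, with p, q indexing the kernel and i, j indexing the output; these index names are introduced here for illustration.

$$
\begin{aligned}
z_{ij} &= \sum_{p,q} x_{i+p-1,\,j+q-1} \cdot w_{pq}, \qquad loss = \frac{1}{2} \sum_{i,j} (z_{ij} - y_{ij})^2 \\
\frac{\partial loss}{\partial w_{pq}} &= \sum_{i,j} (z_{ij} - y_{ij}) \frac{\partial z_{ij}}{\partial w_{pq}} = \sum_{i,j} (z_{ij} - y_{ij}) \cdot x_{i+p-1,\,j+q-1}
\end{aligned}
$$

The last sum is exactly element (p, q) of the cross-correlation $x * (z-y)$. Dividing the loss by the number of pixels m only scales this gradient by 1/m.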

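The training code is implemented as follows:

import numpy as np

# create_zero_array and jit_conv_2d are helper functions defined elsewhere;
# they are not shown in this section.
def train(x, w, b, y):
    # output array that jit_conv_2d fills in during the forward pass
    output = create_zero_array(x, w)
    for i in range(10000):
        # forward: output = x * w + b
        jit_conv_2d(x, w, b, output)
        # loss: mean squared error over all m pixels
        delta = output - y
        m = delta.shape[0] * delta.shape[1]
        loss = np.sum(np.multiply(delta, delta)) / 2 / m
        print(i, loss)
        if loss < 1e-7:
            break
        # backward: convolve x with delta to get the gradient of w
        dw = np.zeros(w.shape)
        jit_conv_2d(x, delta, b, dw)
        # gradient descent step with learning rate 0.5
        w = w - 0.5 * dw / m
    return w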

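The loop runs for at most 10000 iterations:

  1. Do one forward pass with jit_conv_2d(x,w…)
  2. Compute the loss to check the stopping condition; iteration stops once the loss drops below 1e-7
  3. Compute delta
  4. Do one backward pass with jit_conv_2d(x,delta) to get the gradient of w
  5. Finally, update the kernel w

Output: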

......
3458 1.0063169744079507e-07
3459 1.0031151142628902e-07
3460 9.999234418532805e-08
w_true:
 [[ 0 -1  0]
 [ 0  2  0]
 [ 0 -1  0]]
w_result:
 [[-1.86879237e-03 -9.97261724e-01 -1.01212359e-03]
 [ 2.58961697e-03  1.99494606e+00  2.74435794e-03]
 [-8.67754199e-04 -9.97404263e-01 -1.87580756e-03]]
w allclose: True
y allclose: True

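Iteration stops at step 3460, when the loss falls below 1e-7. Comparing w_true with w_result shows that the two are very close; for example, -1.86879237e-03 is close to 0 and -9.97261724e-01 is close to -1. Comparing the true kernel and the trained kernel with numpy.allclose() gives True (the entries differ by up to about 5e-3, so this presumably uses a tolerance looser than the default).

The convolution results are of course also very close, with a small error, and allclose again gives True. Figure 17-29 compares the convolution results visually.

Figure 17-29 Difference between the convolution results of the true and the trained kernel

The human eye cannot see any difference. This gives an intuitive sense that training a kernel is not a complicated process.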
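The training loop above relies on helpers (`create_zero_array`, `jit_conv_2d`) that are not shown in this section. The sketch below is a self-contained variant under several assumptions that are not from the original: a plain NumPy stand-in `corr2d` replaces `jit_conv_2d`, a random 20x20 array replaces the sample image, and the kernel starts from zeros. It will not reproduce the iteration counts printed above.

```python
import numpy as np

def corr2d(a, w):
    """Valid cross-correlation, stride 1, accumulated one kernel entry at a time."""
    kh, kw = w.shape
    oh, ow = a.shape[0] - kh + 1, a.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            out += w[i, j] * a[i:i + oh, j:j + ow]
    return out

def train(x, y, kernel_shape=(3, 3), lr=0.5, max_iter=10000, tol=1e-7):
    w = np.zeros(kernel_shape)          # assumed starting point
    m = y.size
    for it in range(max_iter):
        z = corr2d(x, w)                # forward: z = x * w (b = 0)
        delta = z - y                   # d(loss)/dz
        loss = np.sum(delta ** 2) / (2 * m)
        if loss < tol:
            break
        dw = corr2d(x, delta)           # kernel gradient: x * (z - y)
        w -= lr * dw / m                # gradient descent step
    return w, loss, it

rng = np.random.default_rng(0)
x = rng.random((20, 20))                # stand-in for the sample image
w_true = np.array([[0.0, -1.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, -1.0, 0.0]])   # horizontal edge detection kernel
y = corr2d(x, w_true)                   # target image

w_fit, final_loss, n_iter = train(x, y)
print(n_iter, final_loss)
print(np.round(w_fit, 3))
```

Because the target is generated by the true kernel and the loss is quadratic in w, plain gradient descent is enough here; the learning rate 0.5 is taken from the code above and may need to be lowered for images with a different value range.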

Code location

ch17, Level3
