Convolutional Neural Networks: Formula Derivation and NumPy Implementation

This article focuses mainly on the code implementation of the network; for the detailed formula derivation see: https://zhuanlan.zhihu.com/p/61898234
Complete code: https://github.com/hui126/Deep_Learning_Coding/blob/main/Conv.py

A convolutional neural network can be viewed as an extension of a perceptron network: the number of neurons equals the number of channels of the image, and the values fed into the network become tensors instead of vectors. The biggest difference from a perceptron network is weight sharing, i.e., within the convolution for each channel a single kernel is shared across all spatial positions.

Forward Pass

Using NumPy, suppose the input feature map $a$ has dimensions $(1, 3, 4, 4)$, the convolution kernel $w$ has dimensions $(2, 3, 2, 2)$, and the stride is $(1, 1)$:
$$a[0,0,:,:] = \left[\begin{matrix} 1 & 6 & 6 & 2 \\ 4 & 3 & 4 & 3 \\ 3 & 6 & 7 & 7 \\ 1 & 5 & 1 & 2 \end{matrix}\right], \quad a[0,1,:,:] = \left[\begin{matrix} 9 & 1 & 6 & 7 \\ 5 & 3 & 2 & 6 \\ 5 & 7 & 1 & 7 \\ 6 & 8 & 5 & 8 \end{matrix}\right], \quad a[0,2,:,:] = \left[\begin{matrix} 8 & 6 & 9 & 5 \\ 4 & 6 & 1 & 6 \\ 2 & 3 & 3 & 8 \\ 3 & 5 & 3 & 6 \end{matrix}\right]$$
The convolution kernels are
$$w[0,0,:,:] = \left[\begin{matrix} 5 & 9 \\ 5 & 8\end{matrix}\right], w[0,1,:,:] = \left[\begin{matrix} 1 & 1 \\ 8 & 8\end{matrix}\right], w[0,2,:,:] = \left[\begin{matrix} 7 & 1 \\ 2 & 8\end{matrix}\right] \\ w[1,0,:,:] = \left[\begin{matrix} 5 & 6 \\ 9 & 3\end{matrix}\right], w[1,1,:,:] = \left[\begin{matrix} 2 & 1 \\ 9 & 1\end{matrix}\right], w[1,2,:,:] = \left[\begin{matrix} 8 & 4 \\ 3 & 6\end{matrix}\right]$$
and the bias is $b = [1, 2]$.

Then the output feature map is
$$z[i,j,:,:] = \sum_{k=0}^{2} a[i,k,:,:] * w[j,k,:,:] + b[j]$$
which gives:
$$z[0,0,:,:] = \left[\begin{matrix} 296 & 250 & 288 \\ 277 & 280 & 294 \\ 302 & 297 & 315 \end{matrix}\right], \quad z[0,1,:,:] = \left[\begin{matrix} 291 & 252 & 263 \\ 230 & 267 & 239 \\ 223 & 283 & 257 \end{matrix}\right]$$
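As a concrete check of the formula above, the following is a minimal NumPy sketch (not part of the original implementation) that evaluates the convolution with plain nested loops and reproduces $z[0,0,:,:]$ and $z[0,1,:,:]$.

import numpy as np

# Example input (1, 3, 4, 4), kernels (2, 3, 2, 2) and bias b = [1, 2] from the text.
a = np.array([[[[1, 6, 6, 2], [4, 3, 4, 3], [3, 6, 7, 7], [1, 5, 1, 2]],
               [[9, 1, 6, 7], [5, 3, 2, 6], [5, 7, 1, 7], [6, 8, 5, 8]],
               [[8, 6, 9, 5], [4, 6, 1, 6], [2, 3, 3, 8], [3, 5, 3, 6]]]])
w = np.array([[[[5, 9], [5, 8]], [[1, 1], [8, 8]], [[7, 1], [2, 8]]],
              [[[5, 6], [9, 3]], [[2, 1], [9, 1]], [[8, 4], [3, 6]]]])
b = np.array([1, 2])

B, C_in, H, W = a.shape
C_out, _, kh, kw = w.shape
H_out, W_out = H - kh + 1, W - kw + 1      # stride (1, 1), no padding -> 3 x 3 output

# Direct evaluation of z[i, j] = sum_k a[i, k] * w[j, k] + b[j].
z = np.zeros((B, C_out, H_out, W_out))
for i in range(B):
    for j in range(C_out):
        for p in range(H_out):
            for q in range(W_out):
                z[i, j, p, q] = np.sum(a[i, :, p:p + kh, q:q + kw] * w[j]) + b[j]

print(z[0, 0])   # matches z[0,0,:,:] above
print(z[0, 1])   # matches z[0,1,:,:] above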
To simplify the backward gradient computation, we transform the kernel and the input feature map so that the convolution becomes a matrix multiplication. The kernel is reshaped into a $(3\cdot2\cdot2,\ 2)$ matrix,
$$w_t = w.reshape(2, -1).T = \left[\begin{matrix} 5&9&5&8&1&1&8&8&7&1&2&8\\ 5&6&9&3&2&1&9&1&8&4&3&6 \end{matrix}\right]^T$$
and the input feature map is transformed into a $(1, 9, 12)$ matrix,
$$a_t[0] = \left[\begin{matrix} 1& 6& 4& 3& 9& 1& 5& 3& 8& 6& 4& 6 \\ 6& 6& 3& 4& 1& 6& 3& 2& 6& 9& 6& 1\\ 6& 2& 4& 3& 6& 7& 2& 6& 9& 5& 1& 6\\ 4& 3& 3& 6& 5& 3& 5& 7& 4& 6& 2& 3\\ 3& 4& 6& 7& 3& 2& 7& 1& 6& 1& 3& 3\\ 4& 3& 7& 7& 2& 6& 1& 7& 1& 6& 3& 8\\ 3& 6& 1& 5& 5& 7& 6& 8& 2& 3& 3& 5\\ 6& 7& 5& 1& 7& 1& 8& 5& 3& 3& 5& 3\\ 7& 7& 1& 2& 1& 7& 5& 8& 3& 8& 3& 6 \end{matrix}\right]$$
Then $z_t[0] = a_t[0]\, w_t$, where each column of $z_t[0]$ holds one output channel of the first output feature map, so $z = z_t.transpose([0, 2, 1]).reshape(1, 2, 3, 3) + b.reshape(1, -1, 1, 1)$.
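Continuing the sketch above (reusing the arrays a, w, b and the shapes already defined), the same result can be obtained with a single matrix product after this im2col-style transformation:

# Build a_t of shape (1, 9, 12): one row per kernel position,
# holding the window values of all input channels flattened.
a_t = np.empty((B, H_out * W_out, C_in * kh * kw))
ind = 0
for p in range(H_out):
    for q in range(W_out):
        a_t[:, ind, :] = a[:, :, p:p + kh, q:q + kw].reshape(B, -1)
        ind += 1

# w_t has shape (12, 2): column j is the flattened kernel of output channel j.
w_t = w.reshape(C_out, -1).T

z_t = a_t @ w_t                                   # (1, 9, 2)
z_mat = z_t.transpose([0, 2, 1]).reshape(B, C_out, H_out, W_out) + b.reshape(1, -1, 1, 1)
print(np.allclose(z_mat, z))                      # True: same result as the nested loops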


Notation: let the number of neurons (output feature-map channels) of each layer be $n^l$, with $L$ layers in total, where $n^0$ is the number of channels of the input image.

$a^{l-1}$: input feature map of convolution layer $l$, with dimensions $(B, C^{l-1}, H^{l-1}, W^{l-1})$;

$z^l$: convolution output before the activation function, with dimensions $(B, C^l, H^l, W^l)$;

$h(z)$: activation function;

$w^l$: convolution kernel, with dimensions $(C^l, C^{l-1}, h^l, w^l)$;

$b^l$: bias, with dimensions $(C^l,)$.

The convolution output is
$$z^{l}[i,j,:,:] = \sum^{C^{l-1}-1}_{k=0}a^{l-1}[i,k,:,:]*w^l[j,k,:,:] + b^l[j], \quad i=0,\cdots,B-1,\ j=0,\cdots,C^l-1$$
Converting the convolution into a matrix multiplication,
$$a_t^{l-1} = trans(a^{l-1}), \quad dim=(B,\ H^l\cdot W^l,\ h^l\cdot w^l\cdot C^{l-1})$$
where $trans(a^{l-1})$ unrolls the values under every kernel position into one row and stores them in $a^{l-1}_t$.
$$w_t = w.reshape(C^l, -1).T$$

$$z_t^l = a^{l-1}_t w_t \\ z^l = z_t^l.transpose([0, 2, 1]).reshape(B, C^l, H^l, W^l) + b.reshape(1, -1, 1, 1)$$

def forward(self, inputs):
    # Pad the input, then cache the shapes needed by the backward pass.
    inputs = self.pad(inputs)
    self.input_shape = inputs.shape
    self.batch_size, in_channels, self.H_in, self.W_in = inputs.shape
    assert in_channels == self.in_channels, 'inputs dim1({}) is not equal to convolutional in_channels({})'.format(in_channels, self.in_channels)

    self.H_out = (inputs.shape[2] - self.kernel_size[0]) // self.stride[0] + 1
    self.W_out = (inputs.shape[3] - self.kernel_size[1]) // self.stride[1] + 1

    # im2col buffer: one row per kernel position, one column per kernel weight.
    self.input_trans = np.empty((self.batch_size, self.H_out * self.W_out, self.kernel_trans.shape[0]))

    ind = 0
    h = 0
    while h + self.kernel_size[0] <= inputs.shape[2]:
        w = 0
        while w + self.kernel_size[1] <= inputs.shape[3]:
            self.input_trans[:, ind, :] = inputs[:, :, h:h + self.kernel_size[0], w:w + self.kernel_size[1]].reshape(self.batch_size, -1)
            w += self.stride[1]
            ind += 1
        h += self.stride[0]

    # Convolution as a matrix product, then reshape back to (B, C_out, H_out, W_out).
    output = self.input_trans @ self.kernel_trans
    output = output.transpose([0, 2, 1]).reshape(self.batch_size, self.out_channels, self.H_out, self.W_out)
    if self.bias is not None:
        output += self.bias.reshape(1, -1, 1, 1)
    return self.input_trans, output

Backward Pass

As with a fully connected layer, backpropagation first computes the error of the loss function with respect to $z^l$, and then the derivatives with respect to the convolution kernel and the bias.

Suppose the input feature map is
$$a = \left[\begin{matrix} a_{11}&a_{12}&a_{13}&a_{14} \\a_{21}&a_{22}&a_{23}&a_{24} \\a_{31}&a_{32}&a_{33}&a_{34}\\a_{41}&a_{42}&a_{43}&a_{44}\end{matrix}\right]$$
the convolution kernel is
$$w = \left[\begin{matrix}w_{11}&w_{12}\\w_{21}&w_{22}\end{matrix}\right]$$
the stride is $(1,1)$, and the backpropagated error of the loss with respect to the convolution output is
$$\delta = \left[\begin{matrix} \delta_{11}&\delta_{12}&\delta_{13} \\ \delta_{21}&\delta_{22}&\delta_{23} \\\delta_{31}&\delta_{32}&\delta_{33} \end{matrix}\right]$$
Then the gradient of the loss with respect to the input feature map is
$$\left[\begin{matrix} w_{11}\delta_{11} & w_{11}\delta_{12}+w_{12}\delta_{11}& w_{12}\delta_{12}+w_{11}\delta_{13}& w_{12}\delta_{13} \\ w_{21}\delta_{11}+w_{11}\delta_{21}& w_{22}\delta_{11}+w_{21}\delta_{12}+w_{12}\delta_{21}+w_{11}\delta_{22}& w_{22}\delta_{12}+w_{21}\delta_{13}+w_{12}\delta_{22}+w_{11}\delta_{23}& w_{21}\delta_{13}+w_{11}\delta_{23} \\ w_{21}\delta_{21}+w_{11}\delta_{31}& w_{22}\delta_{21}+w_{21}\delta_{22}+w_{12}\delta_{31}+w_{11}\delta_{32}& w_{22}\delta_{22}+w_{21}\delta_{23}+w_{12}\delta_{32}+w_{11}\delta_{33}& w_{21}\delta_{23}+w_{11}\delta_{33} \\ w_{21}\delta_{31} & w_{22}\delta_{31}+w_{21}\delta_{32}& w_{22}\delta_{32}+w_{21}\delta_{33}& w_{22}\delta_{33} \end{matrix}\right]$$
That is, zero-padding the error on the output map and then convolving it with the kernel rotated by 180 degrees yields the error of the loss with respect to the input feature map.
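This equivalence is easy to verify numerically. The sketch below (illustrative only, using random values rather than anything from the text) accumulates the gradient window by window as in the matrix above, then recomputes it by zero-padding $\delta$ and convolving with the 180-degree-rotated kernel.

import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4))          # single-channel 4x4 input, as in the example
w = rng.normal(size=(2, 2))          # 2x2 kernel, stride (1, 1)
delta = rng.normal(size=(3, 3))      # error w.r.t. the 3x3 convolution output

# Route 1: scatter-add, each delta value is distributed over its input window.
grad_a = np.zeros_like(a)
for i in range(3):
    for j in range(3):
        grad_a[i:i + 2, j:j + 2] += delta[i, j] * w

# Route 2: zero-pad delta by (kh - 1, kw - 1) and convolve with rot180(w).
delta_pad = np.pad(delta, 1)
w_rot = np.rot90(w, 2)
grad_a2 = np.zeros_like(a)
for m in range(4):
    for n in range(4):
        grad_a2[m, n] = np.sum(delta_pad[m:m + 2, n:n + 2] * w_rot)

print(np.allclose(grad_a, grad_a2))  # True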

Therefore, when $\delta^{l+1}$ is known, $\delta^l$ is computed as
$$\delta^l = \delta^{l+1}*ROT180(w^{l+1})\odot \frac{\partial a^l}{\partial z^l}$$


Here we again take a different route: using the transformed forward formulas of the previous section, we compute the backpropagated error of the loss with respect to $a^{l-1}_t$ and then map the result back to the corresponding $a^{l-1}$. Given $\delta^l$,
$$\delta^l_t = \delta^l.transpose([0, 2, 3, 1]).reshape(B, H^l\cdot W^l, C^l)$$
The tensor product of $\delta^l_t$ ($dim=(B, H^lW^l, C^l)$) and $(w^l_t)^T$ ($dim=(C^l, C^{l-1}h^lw^l)$) then gives the gradient of the loss with respect to $a^{l-1}_t$:
$$\frac{\partial C}{\partial a^{l-1}_t} = np.tensordot(\delta^l_t, (w^l_t)^T, [(2),(0)])$$
where $[(2),(0)]$ means contracting the third axis of $\delta^l_t$ with the first axis of $(w^l_t)^T$; the result has dimensions $(B, H^lW^l, C^{l-1}h^lw^l)$.

Scattering this intermediate error back (adding each entry onto the input position it came from) yields the gradient of the loss with respect to $a^{l-1}$.

def backward(self, grad):
    # Rearrange the incoming error to (B, H_out*W_out, C_out), matching z_t.
    grad_trans = grad.transpose([0, 2, 3, 1]).reshape(self.batch_size, -1, self.out_channels)
    # Error w.r.t. the im2col matrix: contract the channel axis with kernel_trans.T.
    grad_backward_trans = np.tensordot(grad_trans, self.kernel_trans.T, axes=([2], [0]))
    grad_backward = np.zeros(self.input_shape)

    # Scatter each row back to its kernel window, accumulating overlaps.
    ind = 0
    for ih in range(grad.shape[2]):
        begin_h = ih * self.stride[0]
        for iw in range(grad.shape[3]):
            begin_w = iw * self.stride[1]
            grad_backward[:, :, begin_h:(begin_h + self.kernel_size[0]), begin_w:(begin_w + self.kernel_size[1])] += \
                grad_backward_trans[:, ind, :].reshape(self.batch_size, self.in_channels, self.kernel_size[0], self.kernel_size[1])
            ind += 1
    # Remove the padding added in the forward pass.
    grad_backward = grad_backward[:, :, self.padding[0]:self.input_shape[2] - self.padding[0], self.padding[1]:self.input_shape[3] - self.padding[1]]

    # Gradients w.r.t. the transformed kernel and the bias.
    self.grad_k_trans = np.tensordot(self.input_trans, grad_trans, axes=([0, 1], [0, 1]))
    if self.bias is not None:
        self.grad_b = np.sum(grad_trans, axis=(0, 1)).reshape(1, -1)
    return grad_backward

Given $\delta^l_t$, the gradients of the loss with respect to $w^l_t$ and $b^l$ are
$$\frac{\partial C}{\partial w^l_t} = np.tensordot(a^{l-1}_t,\delta^l_t,[(0,1),(0,1)]) \\ \frac{\partial C}{\partial b^l} = np.sum(\delta^l_t, axis=(0,1))$$
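Since $z_t^l = a_t^{l-1} w_t^l$ is just a batched matrix product, the first formula is the batch sum of $a_t[i]^T \delta_t[i]$; here is a small illustrative check with random shapes (none of these numbers come from the text):

import numpy as np

rng = np.random.default_rng(1)
B, P, K, C = 2, 9, 12, 4                 # batch, positions H*W, C_in*kh*kw, C_out
a_t = rng.normal(size=(B, P, K))
delta_t = rng.normal(size=(B, P, C))

# Formula from the text: contract the batch and position axes.
grad_w_t = np.tensordot(a_t, delta_t, axes=([0, 1], [0, 1]))      # (K, C)

# Explicit accumulation over the batch.
grad_w_ref = sum(a_t[i].T @ delta_t[i] for i in range(B))
print(np.allclose(grad_w_t, grad_w_ref))                          # True

grad_b = np.sum(delta_t, axis=(0, 1))                             # (C,): one value per output channel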
Max pooling can be handled with a similar idea.
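As an illustration of that idea, here is a minimal max-pooling sketch (a simplified stand-in, not necessarily the MaxPool2d class used later): the forward pass records where the maximum of each window sits, and the backward pass routes the incoming gradient back to only those positions.

import numpy as np

class SimpleMaxPool2d:
    """Minimal max-pooling sketch assuming kernel size == stride and no padding."""

    def __init__(self, kernel_size=(2, 2)):
        self.kh, self.kw = kernel_size

    def forward(self, x):
        B, C, H, W = x.shape
        self.x_shape = x.shape
        H_out, W_out = H // self.kh, W // self.kw
        # Expose each (kh, kw) window as separate axes and take the window max.
        self.windows = x[:, :, :H_out * self.kh, :W_out * self.kw].reshape(
            B, C, H_out, self.kh, W_out, self.kw)
        out = self.windows.max(axis=(3, 5))
        # Mask marking the maximum inside each window (ties all receive the gradient here).
        self.mask = self.windows == out[:, :, :, None, :, None]
        return out

    def backward(self, grad):
        B, C, H_out, W_out = grad.shape
        # Broadcast the output gradient over each window, keep it only at the max.
        g = self.mask * grad[:, :, :, None, :, None]
        g = g.reshape(B, C, H_out * self.kh, W_out * self.kw)
        out = np.zeros(self.x_shape)
        out[:, :, :H_out * self.kh, :W_out * self.kw] = g
        return out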

Suppose the input feature map $a$ has dimensions $(16, 3, 128, 128)$, the kernel $w$ is $(8, 3, 3, 3)$, the bias $b$ is $(8,)$, the stride is $(1, 1)$, and no padding is used. The forward pass then proceeds as follows:

(Figure: forward-pass computation for this example.)

After obtaining the backpropagated error $\delta$ of the output feature map, of shape $(16, 8, 126, 126)$, the backpropagated error with respect to the input feature map follows the transformed formulas above:
$$\delta_t = \delta.transpose([0, 2, 3, 1]).reshape(16, 126 \cdot 126, 8), \quad \frac{\partial C}{\partial a_t} = np.tensordot(\delta_t, w_t^T, [(2),(0)])$$
with result of shape $(16, 126 \cdot 126, 27)$, which is then scattered back to the input shape $(16, 3, 128, 128)$.
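For reference, a minimal sketch (random data, illustrative only) that traces the shapes of this example through the transformed forward and backward computations:

import numpy as np

a = np.random.rand(16, 3, 128, 128)
w = np.random.rand(8, 3, 3, 3)
b = np.random.rand(8)
B, C_in, H, W = a.shape
C_out, _, kh, kw = w.shape
H_out, W_out = H - kh + 1, W - kw + 1                      # 126, 126

# Forward: im2col, matrix product, reshape back to feature-map layout.
a_t = np.empty((B, H_out * W_out, C_in * kh * kw))         # (16, 15876, 27)
ind = 0
for i in range(H_out):
    for j in range(W_out):
        a_t[:, ind, :] = a[:, :, i:i + kh, j:j + kw].reshape(B, -1)
        ind += 1
w_t = w.reshape(C_out, -1).T                               # (27, 8)
z = (a_t @ w_t).transpose([0, 2, 1]).reshape(B, C_out, H_out, W_out) + b.reshape(1, -1, 1, 1)
print(z.shape)                                             # (16, 8, 126, 126)

# Backward: given delta with the shape of z, recover the gradient w.r.t. a.
delta = np.random.rand(*z.shape)
delta_t = delta.transpose([0, 2, 3, 1]).reshape(B, -1, C_out)        # (16, 15876, 8)
grad_a_t = np.tensordot(delta_t, w_t.T, axes=([2], [0]))             # (16, 15876, 27)
grad_a = np.zeros_like(a)
ind = 0
for i in range(H_out):
    for j in range(W_out):
        grad_a[:, :, i:i + kh, j:j + kw] += grad_a_t[:, ind, :].reshape(B, C_in, kh, kw)
        ind += 1
print(grad_a.shape)                                        # (16, 3, 128, 128)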
To classify the MNIST dataset, the network is built as follows:

layers = [Conv2d(in_channels=1, out_channels=6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),
          MaxPool2d(kernel_size=(2, 2), stride=(2, 2)),
          ReLU(),
          Conv2d(in_channels=6, out_channels=16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),
          MaxPool2d(kernel_size=(2, 2), stride=(2, 2)),
          ReLU(),
          Flatten(),
          Linear(in_features=784, out_features=120),
          ReLU(),
          Linear(in_features=120, out_features=10)]
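Given the gradients that backward() stores on each layer, one stochastic gradient descent update for the Conv2d layer above reduces to two in-place subtractions. A minimal sketch (the helper name sgd_step and the learning rate are illustrative, not taken from the repository):

def sgd_step(conv, lr=0.01):
    # Update the transformed kernel and the bias with the gradients
    # accumulated in backward() (grad_k_trans and grad_b).
    conv.kernel_trans -= lr * conv.grad_k_trans
    if conv.bias is not None:
        conv.bias -= lr * conv.grad_b.reshape(conv.bias.shape)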

Stochastic gradient descent (as sketched above) is used for the parameter updates. After 5 epochs of training, the training loss curve is:

(Figure: training loss curve.)

The validation accuracy curve is:

(Figure: validation accuracy curve.)

The accuracy on the test set is 0.9798.

GitHub: https://github.com/hui126/Deep_Learning_Coding
