Implementing the Forward and Backward Pass of a Convolutional Layer in Python

Convolution Layer Forward

For the forward pass of a convolutional layer we first ignore the activation, i.e. take f(x) = x. The forward formula of the pure convolutional layer is then:

$$\mathrm{out}_{n,f,h_o,w_o} = \mathrm{conv}(XP, W, b, \mathrm{params}) = \sum_{c=0}^{C-1} XP_{n,\,c,\,h_o S+(1:HH),\,w_o S+(1:WW)} \ast W_{f,c,:,:} + b_f$$

where $\ast$ denotes elementwise multiplication of the $HH \times WW$ input window with the filter followed by summation over the window, for every output position $h_o = 0 \ldots H_o-1$, $w_o = 0 \ldots W_o-1$.

N is the number of inputs (n indexes them); for example, with 100 input images, N = 100.

C is the number of input channels; for example, RGB images have C = 3.

S is the stride: with stride 1 the window moves one position at a time; with stride 2 it skips every other position. If strides are new to you, read up on them elsewhere first.

XP is the zero-padded input; with no padding, XP = X. If zero-padding is new to you, read up on it elsewhere first.

F is the number of filters; each filter has height HH and width WW. Ho and Wo are the output height and width.
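
As a quick sanity check of the output-size formula used in the code below, Ho = 1 + (H + 2*pad - HH)/stride and Wo = 1 + (W + 2*pad - WW)/stride, here is a minimal sketch with an assumed 4x4 input, 3x3 filter, stride 1 and no padding:

H, W = 4, 4      # input height and width
HH, WW = 3, 3    # filter height and width
S, P = 1, 0      # stride and zero-padding

Ho = 1 + (H + 2 * P - HH) // S
Wo = 1 + (W + 2 * P - WW) // S
print(Ho, Wo)    # -> 2 2, matching the example output shape further below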

From this formula we can write the most basic forward pass. To understand the principle, don't worry about how many nested for loops you end up with; optimizing them is work for later. Once the formula above is clear, the implementation below should be easy to follow.

import numpy as np


def conv_forward_naive(x, w, b, conv_param):
  """
  A naive implementation of the forward pass for a convolutional layer.

  The input consists of N data points, each with C channels, height H and width
  W. We convolve each input with F different filters, where each filter spans
  all C channels and has height HH and width WW.

  Input:
  - x: Input data of shape (N, C, H, W)
  - w: Filter weights of shape (F, C, HH, WW)
  - b: Biases, of shape (F,)
  - conv_param: A dictionary with the following keys:
    - 'stride': The number of pixels between adjacent receptive fields in the
      horizontal and vertical directions.
    - 'pad': The number of pixels that will be used to zero-pad the input.

  Returns a tuple of:
  - out: Output data, of shape (N, F, H', W') where H' and W' are given by
    H' = 1 + (H + 2 * pad - HH) / stride
    W' = 1 + (W + 2 * pad - WW) / stride
  - cache: (x, w, b, conv_param)
  """
  out = None
  N,C,H,W = x.shape
  F,_,HH,WW = w.shape
  S = conv_param['stride']
  P = conv_param['pad']
  Ho = 1 + (H + 2 * P - HH) // S
  Wo = 1 + (W + 2 * P - WW) // S
  x_pad = np.zeros((N,C,H+2*P,W+2*P))
  x_pad[:,:,P:P+H,P:P+W]=x
  #x_pad = np.pad(x, ((0,), (0,), (P,), (P,)), 'constant')
  out = np.zeros((N,F,Ho,Wo))

  for f in range(F):
    for i in range(Ho):
      for j in range(Wo):
        # (N,C,HH,WW) window * (C,HH,WW) filter broadcasts to (N,C,HH,WW); sum over axes (1,2,3) -> (N,)
        out[:,f,i,j] = np.sum(x_pad[:, :, i*S : i*S+HH, j*S : j*S+WW] * w[f, :, :, :], axis=(1, 2, 3))

    out[:,f,:,:]+=b[f]
  cache = (x, w, b, conv_param)
  return out, cache

Let's try it on a small example:

x_shape = (2, 3, 4, 4) #n,c,h,w
w_shape = (2, 3, 3, 3) #f,c,hh,ww
x = np.ones(x_shape)
w = np.ones(w_shape)
b = np.array([1,2])

conv_param = {'stride': 1, 'pad': 0}
out, _ = conv_forward_naive(x, w, b, conv_param)

print(out)
print(out.shape)  #n,f,ho,wo
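
As an extra sanity check (my addition, not part of the original code, assuming SciPy is available): for stride 1 and pad 0 the same forward result can be reproduced per sample and per filter with scipy.signal.correlate2d.

from scipy.signal import correlate2d

ref = np.zeros_like(out)
for n in range(x_shape[0]):
    for f in range(w_shape[0]):
        # sum the 'valid' cross-correlations over the input channels, then add the bias
        ref[n, f] = sum(correlate2d(x[n, c], w[f, c], mode='valid')
                        for c in range(x_shape[1])) + b[f]

print(np.allclose(out, ref))  # expect True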

Convolution Layer Backward

The backward pass is more involved, but once you are comfortable with partial derivatives and the chain rule it shouldn't be a problem.

Assume the convolutional layer is followed directly by the loss layer. Then

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \mathrm{out}} \cdot \frac{\partial \mathrm{out}}{\partial w}$$

$$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial \mathrm{out}} \cdot \frac{\partial \mathrm{out}}{\partial x}$$

$$\frac{\partial L}{\partial b} = \frac{\partial L}{\partial \mathrm{out}} \cdot \frac{\partial \mathrm{out}}{\partial b}$$

Moreover,

$$\frac{\partial L}{\partial \mathrm{out}} = dout$$

dout is already known during the convolutional layer's backward pass, so the formulas themselves look simple; only the index bookkeeping is a bit messy. Let's derive them step by step.

$$\frac{\partial L}{\partial W_{f,c,:,:}} = \sum_{n=0}^{N-1}\sum_{h_o=0}^{H_o-1}\sum_{w_o=0}^{W_o-1} dout_{n,f,h_o,w_o} \cdot \frac{\partial \left(XP_{n,c,h_{win},w_{win}} \ast W_{f,c,:,:}\right)}{\partial W_{f,c,:,:}} = \sum_{n=0}^{N-1}\sum_{h_o=0}^{H_o-1}\sum_{w_o=0}^{W_o-1} dout_{n,f,h_o,w_o} \cdot XP_{n,c,h_{win},w_{win}}$$

To keep things simple, instead of differentiating with respect to a single scalar w we differentiate with respect to the 2-D matrix W_{f,c,:,:}; this makes the partial derivatives of dout easier to read. The derivative with respect to x is handled the same way. The subscripts h_win, w_win of XP are shorthand for h_o·S+(1:HH) and w_o·S+(1:WW) from the forward formula.

$$\frac{\partial L}{\partial XP_{n,c,h_{win},w_{win}}} = \sum_{f=0}^{F-1}\sum_{h_o=0}^{H_o-1}\sum_{w_o=0}^{W_o-1} dout_{n,f,h_o,w_o} \cdot \frac{\partial \left(XP_{n,c,h_{win},w_{win}} \ast W_{f,c,:,:}\right)}{\partial XP_{n,c,h_{win},w_{win}}} = \sum_{f=0}^{F-1}\sum_{h_o=0}^{H_o-1}\sum_{w_o=0}^{W_o-1} dout_{n,f,h_o,w_o} \cdot W_{f,c,:,:}$$

$$\frac{\partial L}{\partial b_f} = \sum_{n=0}^{N-1}\sum_{h_o=0}^{H_o-1}\sum_{w_o=0}^{W_o-1} dout_{n,f,h_o,w_o} \cdot \frac{\partial \left(XP_{n,c,h_{win},w_{win}} \ast W_{f,c,:,:} + b_f\right)}{\partial b_f} = \sum_{n=0}^{N-1}\sum_{h_o=0}^{H_o-1}\sum_{w_o=0}^{W_o-1} dout_{n,f,h_o,w_o}$$

Once the formulas above are clear, the implementation below is much easier to follow.

def conv_backward_naive(dout, cache):
  """
  A naive implementation of the backward pass for a convolutional layer.

  Inputs:
  - dout: Upstream derivatives.
  - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive

  Returns a tuple of:
  - dx: Gradient with respect to x
  - dw: Gradient with respect to w
  - db: Gradient with respect to b
  """
  dx, dw, db = None, None, None

  N, F, H1, W1 = dout.shape
  x, w, b, conv_param = cache
  N, C, H, W = x.shape
  HH = w.shape[2]
  WW = w.shape[3]
  S = conv_param['stride']
  P = conv_param['pad']


  dx, dw, db = np.zeros_like(x), np.zeros_like(w), np.zeros_like(b)
  x_pad = np.pad(x, [(0,0), (0,0), (P,P), (P,P)], 'constant')
  dx_pad = np.pad(dx, [(0,0), (0,0), (P,P), (P,P)], 'constant')
  db = np.sum(dout, axis=(0,2,3))

  for n in range(N):
    for i in range(H1):
      for j in range(W1):
        # Window we want to apply the respective f th filter over (C, HH, WW)
        x_window = x_pad[n, :, i * S : i * S + HH, j * S : j * S + WW]

        for f in range(F):
          dw[f] += x_window * dout[n, f, i, j] #F,C,HH,WW
          #C,HH,WW
          dx_pad[n, :, i * S : i * S + HH, j * S : j * S + WW] += w[f] * dout[n, f, i, j]

  dx = dx_pad[:, :, P:P+H, P:P+W]

  return dx, dw, db

The implementation above is the most basic one. In MATLAB, the built-in conv function is used to speed this up, which is where the "flip the filter by 180 degrees twice" trick mentioned in many blog posts comes from; all the flipping actually makes the process harder to understand. The forward and backward passes of a convolutional layer have no direct relation to the convolution of signal processing: they are just correlation (sliding-window elementwise multiply-and-sum) operations. Everything else is an acceleration trick.
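
To see where the famous 180-degree flip comes from, here is a small self-contained sketch (my addition, assuming stride 1, pad 0 and that SciPy is available). scipy.signal.convolve2d rotates its kernel by 180 degrees internally, so a 'full' convolution of dout with the unflipped filter reproduces the dx computed by the loops above:

from scipy.signal import convolve2d

np.random.seed(0)
x = np.random.randn(1, 2, 5, 5)   # N=1, C=2, H=W=5
w = np.random.randn(3, 2, 3, 3)   # F=3, C=2, HH=WW=3
b = np.zeros(3)
conv_param = {'stride': 1, 'pad': 0}

out, cache = conv_forward_naive(x, w, b, conv_param)
dout = np.random.randn(*out.shape)
dx, dw, db = conv_backward_naive(dout, cache)

# dx[n,c] = sum_f full-convolution(dout[n,f], w[f,c]); convolve2d does the 180-degree flip for us
dx_flip = np.zeros_like(x)
for n in range(x.shape[0]):
    for c in range(x.shape[1]):
        dx_flip[n, c] = sum(convolve2d(dout[n, f], w[f, c], mode='full')
                            for f in range(w.shape[0]))

print(np.allclose(dx, dx_flip))  # expect True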

Let's also run an example for the backward pass:

x_shape = (2, 3, 4, 4)
w_shape = (2, 3, 3, 3)
x = np.ones(x_shape)
w = np.ones(w_shape)
b = np.array([1,2])

conv_param = {'stride': 1, 'pad': 0}

Ho = (x_shape[3]+2*conv_param['pad']-w_shape[3])//conv_param['stride']+1
Wo = Ho

dout = np.ones((x_shape[0], w_shape[0], Ho, Wo))

out, cache = conv_forward_naive(x, w, b, conv_param)
dx, dw, db = conv_backward_naive(dout, cache)

print "out shape",out.shape
print "dw=========================="
print dw
print "dx=========================="
print dx
print "db=========================="
print db
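
A further way to double-check the analytic gradients is a numerical gradient check. The helper eval_numerical_gradient_array below is my own minimal finite-difference sketch, not part of the original post:

def eval_numerical_gradient_array(f, x, df, h=1e-5):
  """Centered finite differences of sum(f(x) * df) with respect to x."""
  grad = np.zeros_like(x)
  it = np.nditer(x, flags=['multi_index'])
  while not it.finished:
    ix = it.multi_index
    old = x[ix]
    x[ix] = old + h
    pos = f(x).copy()
    x[ix] = old - h
    neg = f(x).copy()
    x[ix] = old                      # restore the original value
    grad[ix] = np.sum((pos - neg) * df) / (2 * h)
    it.iternext()
  return grad

dx_num = eval_numerical_gradient_array(
    lambda xx: conv_forward_naive(xx, w, b, conv_param)[0], x, dout)
dw_num = eval_numerical_gradient_array(
    lambda ww: conv_forward_naive(x, ww, b, conv_param)[0], w, dout)

print(np.max(np.abs(dx - dx_num)))  # should be close to 0
print(np.max(np.abs(dw - dw_num)))  # should be close to 0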
