Stanford UFLDL Tutorial - Convolutional Neural Network

Convolutional Neural Network


Overview

A Convolutional Neural Network (CNN) is comprised of one or more convolutional layers (often with a subsampling step) followed by one or more fully connected layers, as in a standard multilayer neural network. The architecture of a CNN is designed to take advantage of the 2D structure of an input image (or other 2D input such as a speech signal). This is achieved with local connections and tied weights, followed by some form of pooling, which results in translation-invariant features. Another benefit of CNNs is that they are easier to train and have many fewer parameters than fully connected networks with the same number of hidden units. In this article we will discuss the architecture of a CNN and the backpropagation algorithm used to compute the gradient with respect to the parameters of the model in order to apply gradient-based optimization. See the respective tutorials on convolution and pooling for more details on those specific operations.

Architecture

A CNN consists of a number of convolutional and subsampling layers, optionally followed by fully connected layers. The input to a convolutional layer is an m × m × r image, where m is the height and width of the image and r is the number of channels; e.g. an RGB image has r = 3. The convolutional layer has k filters (or kernels) of size n × n × q, where n is smaller than the dimension of the image and q can either be the same as the number of channels r or smaller, and may vary for each kernel. The size of the filters gives rise to the locally connected structure: each filter is convolved with the image, and together the k filters produce k feature maps of size (m − n + 1) × (m − n + 1). Each map is then subsampled, typically with mean or max pooling over p × p contiguous regions, where p ranges from 2 for small images (e.g. MNIST) up to usually no more than 5 for larger inputs. Either before or after the subsampling layer, an additive bias and a sigmoidal nonlinearity are applied to each feature map. The figure below illustrates a full layer in a CNN consisting of convolutional and subsampling sublayers. Units of the same color have tied weights.

Fig 1: First layer of a convolutional neural network with pooling. Units of the same color have tied weights and units of different color represent different filter maps.
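To make the shapes concrete, here is a minimal NumPy sketch of one convolutional and mean-pooling sublayer. It is an illustration rather than the tutorial's implementation: it assumes a single channel (r = q = 1) and non-overlapping p × p pooling, and it computes a cross-correlation (a true convolution would flip each filter first); all names are illustrative.

    import numpy as np

    def conv_pool_forward(image, filters, bias, p=2):
        # image: (m, m); filters: (k, n, n); bias: (k,)
        m = image.shape[0]
        k, n, _ = filters.shape
        conv_dim = m - n + 1          # each feature map is (m-n+1) x (m-n+1)
        pool_dim = conv_dim // p      # after p x p mean pooling

        features = np.zeros((k, conv_dim, conv_dim))
        for f in range(k):
            for i in range(conv_dim):
                for j in range(conv_dim):
                    # cross-correlation; a true convolution flips filters[f]
                    features[f, i, j] = np.sum(image[i:i+n, j:j+n] * filters[f])

        # additive bias and sigmoidal nonlinearity applied to each feature map
        features = 1.0 / (1.0 + np.exp(-(features + bias[:, None, None])))

        pooled = np.zeros((k, pool_dim, pool_dim))
        for f in range(k):
            for i in range(pool_dim):
                for j in range(pool_dim):
                    pooled[f, i, j] = features[f, i*p:(i+1)*p, j*p:(j+1)*p].mean()
        return features, pooled

For a 28 × 28 MNIST image with 9 × 9 filters and p = 2, this yields 20 × 20 feature maps pooled down to 10 × 10, matching the m − n + 1 size above.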

After the convolutional layers there may be any number of fully connected layers. The densely connected layers are identical to the layers in a standard multilayer neural network.

Back Propagation

Let δ^{(l+1)} be the error term for the (l+1)-st layer in the network with a cost function J(W,b;x,y), where (W,b) are the parameters and (x,y) are the training data and label pairs. If the l-th layer is densely connected to the (l+1)-st layer, then the error for the l-th layer is computed as

\delta^{(l)} = \left( (W^{(l)})^T \delta^{(l+1)} \right) \bullet f'(z^{(l)})

and the gradients are

\nabla_{W^{(l)}} J(W,b;x,y) = \delta^{(l+1)} (a^{(l)})^T, \qquad \nabla_{b^{(l)}} J(W,b;x,y) = \delta^{(l+1)}.
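For a single training pair, these two formulas translate directly into a few lines of NumPy. Below is a minimal sketch, assuming a sigmoid activation (so f′(z) = f(z)(1 − f(z))) and flat vectors for one example; the helper name is illustrative.

    import numpy as np

    def dense_backprop_step(W_l, delta_next, z_l, a_l):
        # delta(l) = ((W(l))^T delta(l+1)) . f'(z(l)), elementwise product
        f_z = 1.0 / (1.0 + np.exp(-z_l))    # sigmoid f(z(l))
        delta_l = (W_l.T @ delta_next) * (f_z * (1.0 - f_z))
        grad_W = np.outer(delta_next, a_l)  # delta(l+1) (a(l))^T
        grad_b = delta_next                 # gradient w.r.t. the bias
        return delta_l, grad_W, grad_b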

If the l-th layer is a convolutional and subsampling layer, then the error is propagated through as

\delta_k^{(l)} = \text{upsample}\left( (W_k^{(l)})^T \delta_k^{(l+1)} \right) \bullet f'(z_k^{(l)})

where k indexes the filter number, f′(z_k^{(l)}) is the derivative of the activation function, and • denotes the element-wise product. The upsample operation has to propagate the error through the pooling layer by calculating the error w.r.t. each unit incoming to the pooling layer. For example, if we use mean pooling, then upsample simply distributes the error for a single pooling unit uniformly among the units which feed into it in the previous layer. In max pooling, the unit which was chosen as the max receives all the error, since very small changes in the input would perturb the result only through that unit.
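Both cases of upsample can be written compactly. Below is a minimal sketch, assuming non-overlapping p × p pooling regions on 2-D maps; the names are illustrative.

    import numpy as np

    def upsample_mean(delta_pooled, p):
        # mean pooling: each of the p*p units contributed 1/(p*p) to the mean,
        # so each receives that fraction of the pooled unit's error
        return np.kron(delta_pooled, np.ones((p, p))) / (p * p)

    def upsample_max(delta_pooled, pre_pool, p):
        # max pooling: route all the error to the unit that was the max;
        # pre_pool is the feature map as it entered the pooling layer
        delta = np.zeros_like(pre_pool)
        rows, cols = delta_pooled.shape
        for i in range(rows):
            for j in range(cols):
                region = pre_pool[i*p:(i+1)*p, j*p:(j+1)*p]
                r, c = np.unravel_index(np.argmax(region), region.shape)
                delta[i*p + r, j*p + c] = delta_pooled[i, j]
        return delta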

Finally, to calculate the gradient w.r.t. the filter maps, we rely on the border-handling convolution operation again and flip the error matrix δ_k^{(l)} the same way we flip the filters in the convolutional layer.

\nabla_{W_k^{(l)}} J(W,b;x,y) = \sum_{i=1}^{m} (a_i^{(l)}) \ast \text{rot90}(\delta_k^{(l+1)}, 2), \qquad \nabla_{b_k^{(l)}} J(W,b;x,y) = \sum_{a,b} (\delta_k^{(l+1)})_{a,b}.

where a^{(l)} is the input to the l-th layer, and a^{(1)} is the input image. The operation (a_i^{(l)}) ∗ δ_k^{(l+1)} is the "valid" convolution between the i-th input in the l-th layer and the error w.r.t. the k-th filter.
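Under the same conventions, both gradients can be computed with SciPy. Below is a minimal sketch for a single input map and a single filter's error map; scipy.signal.convolve2d, like MATLAB's conv2, performs a true convolution, so the call below is a literal implementation of the formula's "valid" convolution with the rotated error matrix.

    import numpy as np
    from scipy.signal import convolve2d

    def filter_gradients(a_l, delta_next):
        # a_l: (m, m) input map; delta_next: (m-n+1, m-n+1) error map.
        # The 'valid' convolution with the 180-degree-rotated error matrix
        # yields the (n, n) gradient for the filter.
        grad_W = convolve2d(a_l, np.rot90(delta_next, 2), mode='valid')
        grad_b = delta_next.sum()       # sum over all (a, b) entries
        return grad_W, grad_b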


from: http://ufldl.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/
