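Convolutional Neural Networks

Convolutional Neural Networks (CNNs) Explained


A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within a limited receptive field. It performs very well on large-scale image processing.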

This article was first published at: http://www.liuhe.website/index.php?/Articles/single/37


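Overview

Convolutional neural networks (CNNs / ConvNets) are very similar to ordinary neural networks: both are made up of neurons with learnable weights and biases. Each neuron receives some inputs and computes a dot product, the network outputs a score for each class, and many of the computational tricks used in ordinary neural networks still apply.

So what is different? A convolutional neural network assumes by default that its input is an image, which lets us encode certain properties into the network architecture. This makes the forward function more efficient and greatly reduces the number of parameters.

3D volumes of neurons
A convolutional neural network exploits the fact that the input is an image and arranges its neurons in three dimensions: width, height, and depth (note that depth here is not the depth of the network; it describes the arrangement of the neurons). For example, if the input image is 32 × 32 × 3 (RGB), then the input neurons also have dimensions 32 × 32 × 3. The figures below illustrate this: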

[Figure: a regular neural network]

[Figure: a convolutional neural network]

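A convolutional neural network is made up of many layers. Each layer takes a three-dimensional input and produces a three-dimensional output. Some layers have parameters and some do not.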
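To make the volume layout concrete, here is a minimal NumPy sketch. The array name `image`, the (height, width, depth) axis order, and the 3 × 3 patch size are assumptions for illustration; the article only fixes the 32 × 32 × 3 input size. The sketch counts the weights a single fully connected neuron would need for this input and compares that with a neuron that only sees a small local patch.

```python
import numpy as np

# A 32 x 32 RGB image stored as a 3-D volume: (height, width, depth).
image = np.zeros((32, 32, 3), dtype=np.float32)

height, width, depth = image.shape

# A fully connected neuron needs one weight per input value.
weights_per_fc_neuron = height * width * depth

# A neuron that only looks at a 3 x 3 spatial patch (across the full depth)
# needs far fewer. The 3 x 3 patch size is an illustrative assumption.
weights_per_local_neuron = 3 * 3 * depth

print(image.shape)               # (32, 32, 3)
print(weights_per_fc_neuron)     # 3072
print(weights_per_local_neuron)  # 27
```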


Layers used to build ConvNets

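A convolutional neural network usually contains the following kinds of layers:

  • Convolutional layer. Each convolutional layer consists of several convolutional units, whose parameters are optimized by backpropagation. The purpose of convolution is to extract different features of the input. The first convolutional layer may only extract low-level features such as edges, lines, and corners, while deeper networks iteratively build more complex features from these low-level ones.
  • Rectified Linear Units layer (ReLU layer). The activation function of the neurons in this layer is the rectified linear unit (ReLU), $f(x) = \max(0, x)$.

… : depth, that is, the depth of the output volume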

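Then the number of hidden units that fit along one dimension (width or height) of the output volume can be computed with the following formula:

$\frac{W - F + 2P}{S} + 1$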
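The formula can be checked with a few lines of Python. The definitions of the symbols did not survive in the text above, so this sketch assumes the convention of the CS231n notes: W is the input width or height, F is the filter (receptive field) size, P is the zero padding added on each side, and S is the stride. The function name `conv_output_size` is hypothetical.

```python
def conv_output_size(w: int, f: int, p: int, s: int) -> int:
    """Return (W - F + 2P) / S + 1, the output size along one dimension.

    Assumed meanings (not confirmed by the surviving text):
    w = input width or height, f = filter size,
    p = zero padding on each side, s = stride.
    """
    span = w - f + 2 * p
    if span < 0 or span % s != 0:
        # The filter does not tile the padded input evenly with this stride.
        raise ValueError("hyperparameters do not fit the input")
    return span // s + 1


print(conv_output_size(32, 5, 2, 1))  # 32
print(conv_output_size(7, 3, 0, 2))   # 3
```

Raising an error when the division is not exact is a design choice of this sketch. Some libraries silently round down instead.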
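… In this case the convolution kernel actually has 9 neurons, and their outputs together form a 3×3 matrix called the feature map. The first neuron is connected to the first 3×3 patch of the image, and the second neuron is connected to the second patch (note that the patches overlap, just as your gaze moves continuously when it sweeps across a scene). The figure below shows the details.

[Figure: convolution]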

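The upper part of the figure shows the output of the first neuron, and the lower part shows the output of the second neuron. Each neuron still computes

$f(x) = \mathrm{act}\left(\sum_{i,j}^{n} \theta_{(n-i)(n-j)}\, x_{ij} + b\right)$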
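As a minimal sketch of this per-neuron computation, the code below slides one n × n kernel over a 2-D image and applies the formula at every position. Several details are assumptions rather than the author's setup: the 5 × 5 input (chosen so that a 3 × 3 kernel yields the 3 × 3 feature map described above), ReLU as the activation `act`, stride 1 with no padding, and 0-based indexing, under which the flipped index becomes n-1-i.

```python
import numpy as np


def relu(z):
    """Rectified linear unit, used here as the activation act."""
    return np.maximum(0, z)


def feature_map(image, theta, b):
    """Slide an n x n kernel over a 2-D image with stride 1 and no padding.

    Each output entry is one neuron:
    act(sum_ij theta[n-1-i, n-1-j] * x[i, j] + b), with 0-based indices,
    so the kernel is flipped as in the formula.
    """
    n = theta.shape[0]
    flipped = theta[::-1, ::-1]
    out_h = image.shape[0] - n + 1
    out_w = image.shape[1] - n + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + n, c:c + n]  # overlapping local region
            out[r, c] = relu(np.sum(flipped * patch) + b)
    return out


# Hypothetical 5 x 5 input and 3 x 3 kernel, chosen so the result is 3 x 3.
image = np.arange(25, dtype=float).reshape(5, 5)
theta = np.zeros((3, 3))
theta[1, 1] = 1.0  # identity kernel: each neuron copies its patch centre
result = feature_map(image, theta, b=0.0)
print(result.shape)  # (3, 3)
```

With the identity kernel the feature map equals the central 3 × 3 region of the input, which makes the sliding-window behaviour easy to verify by eye.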
…, then their convolution is defined as
$f(m,n) * g(m,n) = \sum_{u}^{\infty} \sum_{v}^{\infty} f(u,v)\, g(m-u,\, n-v)$
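For finite arrays the definition can be evaluated directly. The sketch below assumes both functions are zero outside the stored array, so the infinite sums reduce to finite ones. The name `conv2d_full` is hypothetical, and the example arrays are made up for illustration.

```python
import numpy as np


def conv2d_full(f, g):
    """Discrete 2-D convolution of two finite arrays.

    Implements (f * g)(m, n) = sum_u sum_v f(u, v) g(m - u, n - v),
    treating both arrays as zero outside their stored extent.
    """
    fh, fw = f.shape
    gh, gw = g.shape
    out = np.zeros((fh + gh - 1, fw + gw - 1))
    for u in range(fh):
        for v in range(fw):
            # f(u, v) contributes a copy of g shifted by (u, v).
            out[u:u + gh, v:v + gw] += f[u, v] * g
    return out


f = np.array([[1.0, 2.0],
              [3.0, 4.0]])
delta = np.array([[1.0]])  # unit impulse
g = np.array([[0.0, 1.0],
              [1.0, 0.0]])

print(conv2d_full(f, delta))  # same values as f
print(conv2d_full(f, g))
```

Convolving with the unit impulse returns `f` unchanged, and swapping the two arguments gives the same result, which are two quick sanity checks on the definition.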
… biases.
  • In the output volume, the d-th depth slice is obtained by convolving the d-th filter with the input volume and then adding the bias.
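The rule for depth slices can be sketched as a naive forward pass. This is an illustration under stated assumptions, not the author's implementation: the array layouts (H, W, D) and (K, F, F, D), the function name `conv_layer`, the absence of padding, and the random test data are all chosen here. It also uses the sliding dot product without flipping the filter (cross-correlation), which is what deep learning libraries commonly call convolution.

```python
import numpy as np


def conv_layer(x, filters, biases, stride=1):
    """Naive convolutional layer forward pass (no padding).

    x:       input volume, shape (H, W, D)
    filters: K filters, shape (K, F, F, D)
    biases:  one bias per filter, shape (K,)
    Returns an output volume of shape (H_out, W_out, K).
    """
    h, w, _ = x.shape
    k, f, _, _ = filters.shape
    h_out = (h - f) // stride + 1
    w_out = (w - f) // stride + 1
    out = np.zeros((h_out, w_out, k))
    for d in range(k):  # the d-th depth slice comes from the d-th filter
        for r in range(h_out):
            for c in range(w_out):
                r0, c0 = r * stride, c * stride
                patch = x[r0:r0 + f, c0:c0 + f, :]
                out[r, c, d] = np.sum(patch * filters[d]) + biases[d]
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5, 3))
filters = rng.standard_normal((2, 3, 3, 3))
biases = np.array([0.5, -0.5])
out = conv_layer(x, filters, biases)
print(out.shape)  # (3, 3, 2)
```

The output depth equals the number of filters, while each filter spans the full depth of the input.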

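  • Pooling layer

    Pooling downsamples its input in order to shrink the feature maps. It operates on each depth slice independently, usually with a 2×2 window. Where a convolutional layer performs convolution, a pooling layer usually performs one of the following operations:
    * Max pooling. Takes the maximum of the 4 points. This is the most commonly used pooling method.
    * Mean pooling. Takes the mean of the 4 points.
    * Gaussian pooling. Borrows the idea of Gaussian blur. Rarely used.
    * Trainable pooling. A trained function f takes the 4 points as input and outputs 1 point. Rarely used.

    The most common pooling layer uses a 2×2 window with stride 2 and downsamples every depth slice of the input. Each MAX operation is taken over four numbers, as shown in the figure below:
    [Figure: pooling]

    The pooling operation leaves the depth unchanged.

    If the input size of the pooling layer is not a multiple of 2, the usual approach is to zero-pad the edges up to a multiple of 2 and then pool.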
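Max and mean pooling with a 2×2 window and stride 2 can be sketched in NumPy as follows. The function name `pool2x2`, the (H, W, D) layout, the example values, and the choice to zero-pad at the bottom and right edges are assumptions of this sketch. The text only says that odd sizes are zero-padded up to a multiple of 2.

```python
import numpy as np


def pool2x2(x, mode="max"):
    """2 x 2 pooling with stride 2, applied to every depth slice independently.

    x has shape (H, W, D). Odd H or W is zero-padded at the bottom/right edge
    to the next multiple of 2 (which edge to pad is an assumption).
    """
    h, w, d = x.shape
    x = np.pad(x, ((0, h % 2), (0, w % 2), (0, 0)), mode="constant")
    h2, w2 = x.shape[0] // 2, x.shape[1] // 2
    # Group pixels into 2 x 2 blocks: shape (h2, 2, w2, 2, D).
    blocks = x.reshape(h2, 2, w2, 2, d)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "mean":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'mean'")


depth_slice = np.array([[1, 1, 2, 4],
                        [5, 6, 7, 8],
                        [3, 2, 1, 0],
                        [1, 2, 3, 4]], dtype=float)
x = depth_slice[:, :, np.newaxis]   # one depth slice, shape (4, 4, 1)
print(pool2x2(x, "max")[:, :, 0])   # block maxima: 6, 8 / 3, 4
print(pool2x2(x, "mean")[:, :, 0])  # block means: 3.25, 5.25 / 2, 2
```

The depth axis is never reduced, so the output keeps the input depth while the width and height are halved.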


    Pooling layer summary

    • Accepts an input volume of size $W_1 \times H_1 \times D_1$

    Case Studies

    Networks built by leading researchers

    • LeNet. The first successful applications of Convolutional Networks were developed by Yann LeCun in the 1990s. Of these, the best known is the LeNet architecture that was used to read zip codes, digits, etc.
    • AlexNet. The first work that popularized Convolutional Networks in Computer Vision was the AlexNet, developed by Alex Krizhevsky, Ilya Sutskever and Geoff Hinton. The AlexNet was submitted to the ImageNet ILSVRC challenge in 2012 and significantly outperformed the runner-up (top 5 error of 16% compared to the runner-up with 26% error). The network had an architecture very similar to LeNet, but was deeper, bigger, and featured Convolutional Layers stacked on top of each other (previously it was common to only have a single CONV layer immediately followed by a POOL layer).
    • ZF Net. The ILSVRC 2013 winner was a Convolutional Network from Matthew Zeiler and Rob Fergus. It became known as the ZFNet (short for Zeiler & Fergus Net). It was an improvement on AlexNet by tweaking the architecture hyperparameters, in particular by expanding the size of the middle convolutional layers.
    • GoogLeNet. The ILSVRC 2014 winner was a Convolutional Network from Szegedy et al. from Google. Its main contribution was the development of an Inception Module that dramatically reduced the number of parameters in the network (4M, compared to AlexNet with 60M). Additionally, this paper uses Average Pooling instead of Fully Connected layers at the top of the ConvNet, eliminating a large amount of parameters that do not seem to matter much.
    • VGGNet. The runner-up in ILSVRC 2014 was the network from Karen Simonyan and Andrew Zisserman that became known as the VGGNet. Its main contribution was in showing that the depth of the network is a critical component for good performance. Their final best network contains 16 CONV/FC layers and, appealingly, features an extremely homogeneous architecture that only performs 3x3 convolutions and 2x2 pooling from the beginning to the end. It was later found that despite its slightly weaker classification performance, the VGG ConvNet features outperform those of GoogLeNet in multiple transfer learning tasks. Hence, the VGG network is currently the most preferred choice in the community when extracting CNN features from images. In particular, their pretrained model is available for plug and play use in Caffe. A downside of the VGGNet is that it is more expensive to evaluate and uses a lot more memory and parameters (140M).
    • ResNet. Residual Network developed by Kaiming He et al. was the winner of ILSVRC 2015. It features an interesting architecture with special skip connections and features heavy use of batch normalization. The architecture is also missing fully connected layers at the end of the network. The reader is also referred to Kaiming’s presentation (video, slides), and some recent experiments that reproduce these networks in Torch.

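    References

    CS231n: Convolutional Neural Networks for Visual Recognition
    Convolutional neural network (Wikipedia)
    Feature extraction using convolution
    A comprehensive analysis of convolutional neural networks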
