Paper Reading Notes (3): Network In Network (NIN)

This post is a digest of the paper. Since the authors' original wording is the easiest to understand, the key sentences are excerpted here to help quickly recall the main points of the article. A full translation of the paper will be added later, and the post will be updated over time.


We propose a novel deep network structure called “Network In Network” (NIN) to enhance model discriminability for local patches within the receptive field.

The conventional convolutional layer uses linear filters followed by a nonlinear activation function to scan the input.
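
As a point of reference, this conventional layer can be sketched as a single linear filter bank followed by a pointwise nonlinearity. A minimal PyTorch sketch; the filter count, kernel size, and input shape below are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

# Conventional convolutional layer: a linear filter bank (the GLM) scanned
# over the input, followed by a nonlinear activation at every local position.
glm_layer = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=5, padding=2),  # linear filters
    nn.ReLU(),                                   # pointwise nonlinearity
)

x = torch.randn(1, 3, 32, 32)     # a dummy RGB input
feature_maps = glm_layer(x)       # -> shape (1, 96, 32, 32)
```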

Instead, we build micro neural networks with more complex structures to abstract the data within the receptive field. We instantiate the micro neural network with a multilayer perceptron, which is a potent function approximator.

The feature maps are obtained by sliding the micro networks over the input in a similar manner as CNN; they are then fed into the next layer.

Deep NIN can be implemented by stacking multiple of the above described structures.
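
In code, the mlpconv idea is commonly realized as an ordinary convolution followed by 1×1 convolutions with ReLUs: the 1×1 convolutions play the role of the micro MLP's fully connected layers, shared across all spatial positions. A minimal sketch along those lines; the channel widths and kernel sizes are placeholders, not the paper's exact settings:

```python
import torch
import torch.nn as nn

def mlpconv(in_ch, mid_ch, out_ch, kernel_size, padding):
    """One mlpconv block: a k x k convolution followed by a two-layer
    micro MLP, written here as 1x1 convolutions shared over positions."""
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size, padding=padding), nn.ReLU(),
        nn.Conv2d(mid_ch, mid_ch, kernel_size=1), nn.ReLU(),   # MLP hidden layer
        nn.Conv2d(mid_ch, out_ch, kernel_size=1), nn.ReLU(),   # MLP output layer
    )

# Deep NIN: stack several such blocks (widths are illustrative).
stack = nn.Sequential(
    mlpconv(3, 96, 96, kernel_size=5, padding=2),
    nn.MaxPool2d(3, stride=2, padding=1),
    mlpconv(96, 192, 192, kernel_size=5, padding=2),
)
print(stack(torch.randn(1, 3, 32, 32)).shape)   # torch.Size([1, 192, 16, 16])
```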

With enhanced local modeling via the micro network, we are able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.

Convolution layers take inner product of the linear filter and the underlying receptive field followed by a nonlinear activation function at every local portion of the input. The resulting outputs are called feature maps.

The convolution filter in CNN is a generalized linear model (GLM) for the underlying data patch, and we argue that the level of abstraction is low with GLM.

In NIN, the GLM is replaced with a “micro network” structure which is a general nonlinear function approximator.

In this work, we choose multilayer perceptron [3] as the instantiation of the micro network, which is a universal function approximator and a neural network trainable by back-propagation.

The resulting structure which we call an mlpconv layer is compared with CNN in Figure 1. Both the linear convolutional layer and the mlpconv layer map the local receptive field to an output feature vector.

The mlpconv maps the input local patch to the output feature vector with a multilayer perceptron (MLP) consisting of multiple fully connected layers with nonlinear activation functions.

The feature maps are obtained by sliding the MLP over the input in a similar manner as CNN and are then fed into the next layer.
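
The sliding need not be implemented literally: a fully connected layer shared across every spatial position computes the same thing as a 1×1 convolution, which is why mlpconv layers are usually written with 1×1 convolutions. A small equivalence check; the shapes below are arbitrary:

```python
import torch
import torch.nn as nn

# A fully connected layer shared across spatial positions equals a 1x1
# convolution; this is how the micro MLP "slides" over the input in practice.
c_in, c_out = 8, 4
conv1x1 = nn.Conv2d(c_in, c_out, kernel_size=1)
fc = nn.Linear(c_in, c_out)

with torch.no_grad():                                  # tie the two layers' weights
    fc.weight.copy_(conv1x1.weight.view(c_out, c_in))
    fc.bias.copy_(conv1x1.bias)

x = torch.randn(2, c_in, 6, 6)
y_conv = conv1x1(x)
y_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)   # apply the FC layer per position
print(torch.allclose(y_conv, y_fc, atol=1e-6))         # True
```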

The overall structure of the NIN is the stacking of multiple mlpconv layers. It is called “Network In Network” (NIN) as we have micro networks (MLP), which are composing elements of the overall deep network, within mlpconv layers.

Instead of adopting the traditional fully connected layers for classification in CNN, we directly output the spatial average of the feature maps from the last mlpconv layer as the confidence of categories via a global average pooling layer, and then the resulting vector is fed into the softmax layer.
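
Concretely, if the last mlpconv layer emits one feature map per category, the classification head reduces to a spatial mean followed by softmax. A minimal sketch; the class count and map size are placeholders:

```python
import torch
import torch.nn.functional as F

num_classes = 10
# Stand-in for the last mlpconv layer's output: one map per category.
last_maps = torch.randn(4, num_classes, 8, 8)   # (batch, classes, H, W)

scores = last_maps.mean(dim=(2, 3))             # global average pooling
probs = F.softmax(scores, dim=1)                # category confidences
print(probs.shape)                              # torch.Size([4, 10])
```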

In traditional CNN, it is difficult to interpret how the category level information from the objective cost layer is passed back to the previous convolution layer due to the fully connected layers which act as a black box in between.

In contrast, global average pooling is more meaningful and interpretable as it enforces correspondence between feature maps and categories, which is made possible by a stronger local modeling using the micro network.

Furthermore, the fully connected layers are prone to overfitting and heavily depend on dropout regularization [4] [5], while global average pooling is itself a structural regularizer, which natively prevents overfitting for the overall structure.
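
One way to see why global average pooling acts as a structural regularizer is to count parameters: the pooling itself has none, while a fully connected classifier over the same feature maps can be large. A rough comparison; the feature-map size and class count below are illustrative:

```python
import torch.nn as nn

channels, height, width, num_classes = 192, 8, 8, 10

# Traditional head: flatten the last feature maps, classify with an FC layer.
fc_head = nn.Linear(channels * height * width, num_classes)

# GAP head: average each map; the only parameters sit in the preceding
# 1x1 convolution that produces one map per category.
gap_head = nn.Sequential(
    nn.Conv2d(channels, num_classes, kernel_size=1),
    nn.AdaptiveAvgPool2d(1),
)

def count(m):
    return sum(p.numel() for p in m.parameters())

print(count(fc_head))   # 192*8*8*10 + 10 = 122890 parameters
print(count(gap_head))  # 192*10 + 10     = 1930 parameters
```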

A micro network is introduced within each convolutional layer to compute more abstract features for local patches.

NIN is proposed from a more general perspective: the micro network is integrated into the CNN structure in pursuit of better abstractions for all levels of features.

Given no priors about the distributions of the latent concepts, it is desirable to use a universal function approximator for feature extraction of the local patches, as it is capable of approximating more abstract representations of the latent concepts.

Radial basis network and multilayer perceptron are two well known universal function approximators.

We choose multilayer perceptron in this work for two reasons.

First, multilayer perceptron is compatible with the structure of convolutional neural networks, which is trained using back-propagation.

Second, multilayer perceptron can be a deep model itself, which is consistent with the spirit of feature re-use [2].

This new type of layer is called mlpconv in this paper, in which MLP replaces the GLM to convolve over the input.

Mlpconv layer differs from maxout layer in that the convex function approximator is replaced by a universal function approximator, which has greater capability in modeling various distributions of latent concepts.
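
For contrast with the maxout layer mentioned here: a maxout feature map is the element-wise maximum over several linear feature maps, hence a convex (piecewise-linear) approximator, whereas the mlpconv micro MLP is a general nonlinear one. A rough sketch of a convolutional maxout unit; the number of pieces and channel widths are illustrative:

```python
import torch
import torch.nn as nn

class MaxoutConv2d(nn.Module):
    """Maxout unit: element-wise max over k linear feature maps, i.e. a
    convex piecewise-linear function of the underlying patch."""
    def __init__(self, in_ch, out_ch, k, kernel_size, padding):
        super().__init__()
        self.k, self.out_ch = k, out_ch
        self.conv = nn.Conv2d(in_ch, out_ch * k, kernel_size, padding=padding)

    def forward(self, x):
        y = self.conv(x)                       # (N, out_ch * k, H, W)
        n, _, h, w = y.shape
        return y.view(n, self.out_ch, self.k, h, w).max(dim=2).values

x = torch.randn(1, 3, 32, 32)
print(MaxoutConv2d(3, 96, k=2, kernel_size=5, padding=2)(x).shape)  # (1, 96, 32, 32)
```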

Dropout is proposed by Hinton et al. [5] as a regularizer which randomly sets half of the activations to the fully connected layers to zero during training. It has improved the generalization ability and largely prevents overfitting [4].
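
For reference, this form of dropout is a standard building block in modern frameworks; a minimal usage sketch with an arbitrary fully connected layer:

```python
import torch
import torch.nn as nn

fc = nn.Linear(512, 256)
drop = nn.Dropout(p=0.5)          # zero each activation with probability 0.5

x = torch.randn(8, 512)
h = drop(torch.relu(fc(x)))       # dropout applied to the FC activations
# In evaluation mode (model.eval()), dropout acts as the identity.
```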

In this paper, we propose another strategy called global average pooling to replace the traditional fully connected layers in CNN.

The idea is to generate one feature map for each corresponding category of the classification task in the last mlpconv layer. Instead of adding fully connected layers on top of the feature maps, we take the average of each feature map, and the resulting vector is fed directly into the softmax layer.

One advantage of global average pooling over the fully connected layers is that it is more native to the convolution structure by enforcing correspondences between feature maps and categories. Thus the feature maps can be easily interpreted as categories confidence maps.

Another advantage is that there is no parameter to optimize in the global average pooling thus overfitting is avoided at this layer.

Furthermore, global average pooling sums out the spatial information, thus it is more robust to spatial translations of the input.
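
The translation-robustness claim is easy to probe: because the pooling averages over all spatial positions, circularly shifting the feature maps leaves the pooled vector unchanged (a shift that pushes content off the map changes it only slightly). A quick check:

```python
import torch

fmaps = torch.randn(1, 10, 8, 8)                          # maps from the last mlpconv layer
shifted = torch.roll(fmaps, shifts=(2, 3), dims=(2, 3))   # circular spatial shift

print(torch.allclose(fmaps.mean(dim=(2, 3)),
                     shifted.mean(dim=(2, 3))))           # True: pooled vector unchanged
```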

We can see global average pooling as a structural regularizer that explicitly enforces feature maps to be confidence maps of concepts (categories).

This is made possible by the mlpconv layers, as they make better approximations to the confidence maps than GLMs.

The overall structure of NIN is a stack of mlpconv layers, on top of which lie the global average pooling and the objective cost layer. Sub-sampling layers can be added in between the mlpconv layers as in CNN and maxout networks.
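
Putting the pieces together, a NIN classifier stacks mlpconv blocks with sub-sampling in between, ends the last block with one map per category, and finishes with global average pooling feeding the softmax/cost layer. The sketch below is only loosely modeled on the paper's three-block design; the widths, kernel sizes, and dropout placement are illustrative rather than the exact published configuration (the mlpconv helper repeats the earlier sketch so this snippet runs on its own):

```python
import torch
import torch.nn as nn

def mlpconv(in_ch, mid_ch, out_ch, k, pad):
    # k x k convolution followed by a two-layer micro MLP (1x1 convolutions).
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, k, padding=pad), nn.ReLU(),
        nn.Conv2d(mid_ch, mid_ch, 1), nn.ReLU(),
        nn.Conv2d(mid_ch, out_ch, 1), nn.ReLU(),
    )

num_classes = 10
nin = nn.Sequential(
    mlpconv(3, 192, 96, k=5, pad=2),
    nn.MaxPool2d(3, stride=2, padding=1),        # sub-sampling between mlpconv blocks
    nn.Dropout(0.5),
    mlpconv(96, 192, 192, k=5, pad=2),
    nn.MaxPool2d(3, stride=2, padding=1),
    nn.Dropout(0.5),
    mlpconv(192, 192, num_classes, k=3, pad=1),  # last block: one map per category
    nn.AdaptiveAvgPool2d(1),                     # global average pooling
    nn.Flatten(),                                # (N, num_classes) scores for softmax
)

print(nin(torch.randn(2, 3, 32, 32)).shape)      # torch.Size([2, 10])
```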

We proposed a novel deep network called “Network In Network” (NIN) for classification tasks.

This new structure consists of mlpconv layers which use multilayer perceptrons to convolve the input and a global average pooling layer as a replacement for the fully connected layers in conventional CNN.

Mlpconv layers model the local patches better, and global average pooling acts as a structural regularizer that prevents overfitting globally.

Through visualization of the feature maps, we demonstrated that feature maps from the last mlpconv layer of NIN were confidence maps of the categories, and this motivates the possibility of performing object detection via NIN.
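
Because each map of the last mlpconv layer corresponds to one category, the map itself (taken before pooling) can be read as a coarse spatial confidence map, which is what the visualization and the detection remark point to. A small sketch of extracting it; the tensor here stands in for a real model's output:

```python
import torch
import torch.nn.functional as F

# Stand-in for the final mlpconv output of one image: one map per category
# (here 10 classes, 8x8 maps).
last_maps = torch.randn(1, 10, 8, 8)

probs = F.softmax(last_maps.mean(dim=(2, 3)), dim=1)   # image-level confidences
top_class = probs.argmax(dim=1).item()                 # predicted category

confidence_map = last_maps[0, top_class]               # 8x8 spatial confidence map
print(top_class, confidence_map.shape)                 # e.g. 3 torch.Size([8, 8])
```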
