1 Abstract
1.1 The problem this paper sets out to solve
By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant.
1.2 Consequences of the traditional approach of increasing depth and width
1. Bigger size typically means a larger number of parameters, which makes the enlarged network more prone to over-fitting, especially if the number of labeled examples in the training set is limited.
2. The other drawback of uniformly increased network size is the dramatically increased use of computational resources.
2 How to solve the problems caused by increasing depth and width
2.1 The Inception module: hand-crafted sparse connectivity with multi-scale receptive fields and multi-scale fusion (increasing the density and correlation of features; roughly speaking, a lateral feature pyramid with a fusion step at the end)
A fundamental way of solving both of these issues would be to introduce sparsity and replace the fully connected layers by the sparse ones, even inside the convolutions.
Consider a 3x3 filter bank extracting 256 features: those features tend to be spread uniformly across the feature maps, which can be viewed as a sparsely connected feature set. As an extreme comparison, the features extracted by 64 filters are clearly denser than the same information spread across 256 filters.
Therefore, by using 1x1, 3x3, and 5x5 kernels with, say, 96, 96, and 64 filters respectively, we extract features at different scales while keeping the total filter-bank size unchanged. Features within the same scale become more strongly correlated and denser, while correlations between features at different scales are weakened.
In summary, this can be understood as decomposing a bank of 256 uniformly distributed features into several groups of strongly correlated features. The output still has 256 features, but with less redundant information (somewhat reminiscent of grouped convolutions).
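The split above can be made concrete with a quick parameter count. The input depth (192) is a hypothetical example value, not one fixed by the text; the filter counts (96/96/64) are the ones quoted above:

```python
# Parameter count of one monolithic 3x3 filter bank vs. a multi-scale
# split (1x1 / 3x3 / 5x5) that keeps the total output at 256 feature maps.
# The input depth c_in = 192 is an illustrative assumption.

def conv_params(k, c_in, c_out):
    """Number of weights in a k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out

c_in = 192  # assumed input depth

# Single uniform bank: 256 filters, all 3x3.
uniform = conv_params(3, c_in, 256)

# Multi-scale split: 96 x 1x1, 96 x 3x3, 64 x 5x5 -> still 256 outputs.
split = (conv_params(1, c_in, 96)
         + conv_params(3, c_in, 96)
         + conv_params(5, c_in, 64))

print(uniform)       # 442368
print(split)         # 491520
print(96 + 96 + 64)  # 256 output feature maps either way
```

Note that the split by itself does not reduce parameters (the 5x5 branch is expensive), which is exactly the problem that the 1x1 reductions in section 2.2 address.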
2.2 Using 1x1 convolutions (the problem left by 2.1: the 3x3 and 5x5 branches still produce very many parameters in deeper layers)
1 × 1 convolutions have dual purpose: most critically, they are used mainly as dimension reduction modules to remove computational bottlenecks, that would otherwise limit the size of our networks. This allows for not just increasing the depth, but also the width of our networks without a significant performance penalty.
The figure below (which I drew myself) shows how depth (the number of feature-map channels) affects the convolution parameter count, and what adding a 1x1 convolution changes.
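The bottleneck effect can also be checked with a quick calculation. The channel counts below (256 in, 64 out, a 32-channel 1x1 reduction) are hypothetical example values chosen only to illustrate the savings:

```python
# Effect of a 1x1 "bottleneck" placed before an expensive 5x5 convolution.
# All channel counts here are illustrative assumptions.

def conv_params(k, c_in, c_out):
    """Number of weights in a k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out

c_in, c_out, c_mid = 256, 64, 32

# Direct 5x5 convolution on the full 256-channel input.
direct = conv_params(5, c_in, c_out)

# 1x1 reduction to 32 channels first, then 5x5 on the reduced input.
bottleneck = conv_params(1, c_in, c_mid) + conv_params(5, c_mid, c_out)

print(direct)               # 409600
print(bottleneck)           # 59392
print(direct / bottleneck)  # roughly a 7x reduction
```

This is the "dimension reduction module" role described in the quote above: the 1x1 layer shrinks the channel dimension before the costly spatial convolution, removing the computational bottleneck without shrinking the spatial resolution.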