Deep Learning | Can Global Average Pooling Replace Fully Connected Layers?

自然最美。

Tags: deep learning

Original link: https://www.cnblogs.com/hutao722/p/10008581.html

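While reading the All Convolutional Net paper, I noticed that it uses global average pooling.

Here is a brief introduction to global average pooling.

The concept comes from the Network in Network paper.

Its main purpose is to replace the fully connected layers: each feature map of the last layer is average-pooled over its entire spatial extent, producing a single value per map, and these values together form the final feature vector, which is fed into softmax.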

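For example:

Suppose the last layer outputs 10 feature maps of size 6 × 6. Global average pooling computes the mean over all pixels of each feature map and outputs one value per map, so the 10 feature maps yield 10 values. Arranged as a 1 × 10 vector, they form the feature vector that is passed to the softmax classifier.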
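The example above can be sketched in NumPy. This is only an illustration: the activations are random stand-ins for the output of a real network, and the names `feature_maps`, `gap_vector` and `softmax` are chosen here, not taken from the original post.

```python
import numpy as np

# Hypothetical activations: 10 feature maps of size 6 x 6 (channels-first layout).
rng = np.random.default_rng(0)
feature_maps = rng.standard_normal((10, 6, 6))

# Global average pooling: one mean per feature map, giving a vector of 10 values.
gap_vector = feature_maps.mean(axis=(1, 2))

def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# The pooled vector goes straight into softmax, with no fully connected layer in between.
probs = softmax(gap_vector)
print(gap_vector.shape)  # (10,)
```

Each entry of `gap_vector` depends on exactly one feature map, which is why the maps can be read as per-category confidence maps.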

[Figure: comparison of fully connected layers and global average pooling, taken from a slide deck]

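According to the original paper, the main goal is to replace the fully connected layers and so reduce the number of parameters: computed this way, the global average pooling layer has no trainable parameters at all.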
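To make the saving concrete, here is a rough parameter count for the 10-map, 6 × 6 example. It assumes a hypothetical alternative in which the maps are flattened and fed to a single fully connected layer with 10 outputs; the helper `dense_param_count` is introduced here for illustration only.

```python
def dense_param_count(n_inputs, n_outputs):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_outputs + n_outputs

num_maps, height, width, num_classes = 10, 6, 6, 10

# Hypothetical FC head: flatten 10 x 6 x 6 = 360 values, then a 360 -> 10 layer.
fc_params = dense_param_count(num_maps * height * width, num_classes)

# GAP only averages, so it adds no trainable parameters.
gap_params = 0

print(fc_params)   # 3610
print(gap_params)  # 0
```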

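This is also tied to the rest of Network in Network. The paper proposes mlpconv, a small multilayer perceptron that acts like a nonlinear convolution kernel and is evaluated on each local patch of the image.

Because it is slid across the image, it still captures spatial structure and can take over the role of pooling, but it introduces additional parameters. To balance this, the authors proposed using global average pooling.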
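A minimal sketch of the mlpconv idea, under several assumptions that are not from the original post: a two-layer perceptron with ReLU activations, stride 1, no padding, and made-up sizes (a 3 × 8 × 8 input, 3 × 3 patches, 16 hidden units, 10 output maps). It is written as an explicit loop for readability, not speed.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mlpconv(image, w1, b1, w2, b2, k):
    """Slide a small two-layer perceptron over every k x k patch (stride 1, no padding)."""
    channels, height, width = image.shape
    out_h, out_w = height - k + 1, width - k + 1
    out = np.empty((w2.shape[0], out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[:, i:i + k, j:j + k].reshape(-1)  # flatten the local patch
            hidden = relu(w1 @ patch + b1)                  # first perceptron layer
            out[:, i, j] = relu(w2 @ hidden + b2)           # second layer: one value per output map
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((3, 8, 8))           # hypothetical input
k, hidden_units, num_maps = 3, 16, 10            # hypothetical sizes
w1 = rng.standard_normal((hidden_units, 3 * k * k)) * 0.1
b1 = np.zeros(hidden_units)
w2 = rng.standard_normal((num_maps, hidden_units)) * 0.1
b2 = np.zeros(num_maps)

maps = mlpconv(image, w1, b1, w2, b2, k)         # shape (10, 6, 6)
mlpconv_params = w1.size + b1.size + w2.size + b2.size
scores = maps.mean(axis=(1, 2))                  # GAP on top adds no parameters
print(maps.shape, mlpconv_params)                # (10, 6, 6) 618
```

The perceptron is where the extra parameters come from; the global average pooling step that follows contributes none.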

1. What is GAP?
Let's start with the definition in the original paper:

In this paper, we propose another strategy called global average pooling to replace the traditional fully connected layers in CNN. The idea is to generate one feature map for each corresponding category of the classification task in the last mlpconv layer. Instead of adding fully connected layers on top of the feature maps, we take the average of each feature map, and the resulting vector is fed directly into the softmax layer. One advantage of global average pooling over the fully connected layers is that it is more native to the convolution structure by enforcing correspondences between feature maps and categories. Thus the feature maps can be easily interpreted as categories confidence maps. Another advantage is that there is no parameter to optimize in the global average pooling thus overfitting is avoided at this layer. Futhermore, global average pooling sums out the spatial information, thus it is more robust to spatial translations of the input.

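In short, GAP replaces the fully connected (FC) layers after the convolutional layers. It has two advantages. First, GAP makes the mapping from feature maps to the final categories simpler and more natural. Second, unlike FC layers, it has no parameters that need to be trained and tuned, and having fewer parameters makes the model more robust and more resistant to overfitting.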

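A picture makes the way GAP works more intuitive:
[Figure: how GAP works]
Suppose the last convolutional layer outputs a three-dimensional feature map of size h × w × d, here 6 × 6 × 3. After GAP, the output has size 1 × 1 × 3: each h × w slice is averaged into a single value.
NOTE: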

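Both GAP and GMP (global max pooling) reduce the number of parameters. This helps avoid overfitting, and it also fits the structure of a CNN better: each feature map as a whole is associated with a category output, rather than the individual units of the feature maps being connected directly to the category outputs.

The difference is that GMP keeps only the most strongly activated region of each feature map. As a result, even if only a single region of a feature map is related to some class, that feature map still has a large influence on the final prediction. GAP takes every region into account, so the result is not dominated by one or two unusual regions.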
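The contrast can be seen on two hand-made 6 × 6 feature maps. The values are invented for illustration: one map is zero everywhere except for a single strong activation, the other is uniformly moderate.

```python
import numpy as np

# Map 0: zero everywhere except one very strong region.
spiky = np.zeros((6, 6))
spiky[2, 3] = 36.0

# Map 1: moderate activation in every region.
uniform = np.full((6, 6), 2.0)

feature_maps = np.stack([spiky, uniform])   # shape (2, 6, 6)

gap = feature_maps.mean(axis=(1, 2))        # considers every region
gmp = feature_maps.max(axis=(1, 2))         # keeps only the strongest region

print(gap)  # [1. 2.]
print(gmp)  # [36.  2.]
```

Under GMP the single spike makes map 0 score far higher than map 1, while under GAP the uniformly active map scores higher.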

2. GAP in Keras
GAP is typically placed after the convolutional layers and before the output layer:

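from tensorflow.keras import layers

# x is the output tensor of the preceding convolutional blocks
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)  # last pooling layer of the convolutional base
x = layers.GlobalAveragePooling2D()(x)  # GAP layer
prediction = layers.Dense(10, activation='softmax')(x)  # output layer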

Now let's look at how GAP is actually implemented:

@tf_export('keras.layers.GlobalAveragePooling2D',
           'keras.layers.GlobalAvgPool2D')
class GlobalAveragePooling2D(GlobalPooling2D):
  """Global average pooling operation for spatial data.
  Arguments:
      data_format: A string,
          one of `channels_last` (default) or `channels_first`.
          The ordering of the dimensions in the inputs.
          `channels_last` corresponds to inputs with shape
          `(batch, height, width, channels)` while `channels_first`
          corresponds to inputs with shape
          `(batch, channels, height, width)`.
          It defaults to the `image_data_format` value found in your
          Keras config file at `~/.keras/keras.json`.
          If you never set it, then it will be "channels_last".
  Input shape:
      - If `data_format='channels_last'`:
          4D tensor with shape:
          `(batch_size, rows, cols, channels)`
      - If `data_format='channels_first'`:
          4D tensor with shape:
          `(batch_size, channels, rows, cols)`
  Output shape:
      2D tensor with shape:
      `(batch_size, channels)`
  """

  def call(self, inputs):
    if self.data_format == 'channels_last':
      return backend.mean(inputs, axis=[1, 2])
    else:
      return backend.mean(inputs, axis=[2, 3])

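The implementation is simple: it averages the feature data over the height and width dimensions. For the NHWC layout (batch, height, width, channels), axis=[1, 2]; for NCHW (batch, channels, height, width), axis=[2, 3].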
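The same axis choice can be checked with plain NumPy, used here as a stand-in for `backend.mean`. The batch of shape (4, 6, 6, 3) is random, hypothetical data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch in NHWC layout: 4 samples, 6 x 6 spatial size, 3 channels.
nhwc = rng.standard_normal((4, 6, 6, 3))

# The same data rearranged into NCHW layout.
nchw = np.transpose(nhwc, (0, 3, 1, 2))

gap_nhwc = nhwc.mean(axis=(1, 2))  # average over height and width
gap_nchw = nchw.mean(axis=(2, 3))  # height and width now sit at axes 2 and 3

print(gap_nhwc.shape, gap_nchw.shape)  # (4, 3) (4, 3)
print(np.allclose(gap_nhwc, gap_nchw))  # True
```

Either way the result is a 2D tensor of shape (batch_size, channels), matching the docstring above.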

    if include_top:
        # Classification block
        x = layers.Flatten(name='flatten')(x)
        x = layers.Dense(4096, activation='relu', name='fc1')(x)
        x = layers.Dense(4096, activation='relu', name='fc2')(x)
        x = layers.Dense(classes, activation='softmax', name='predictions')(x)
    else:
        if pooling == 'avg':
            x = layers.GlobalAveragePooling2D()(x)
        elif pooling == 'max':
            x = layers.GlobalMaxPooling2D()(x)
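A rough count shows how much the two branches differ in size. The numbers below are assumptions rather than something stated in the post: the convolutional base is taken to end in a 7 × 7 × 512 feature map (as in VGG16 with 224 × 224 inputs), `classes` is taken to be 1000, and the GAP branch is given a hypothetical single softmax layer on top of the pooled vector.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one Dense layer."""
    return n_in * n_out + n_out

# Assumed shape of the last feature map and number of classes.
height, width, channels, classes = 7, 7, 512, 1000

# include_top branch: Flatten -> fc1 (4096) -> fc2 (4096) -> predictions.
flat = height * width * channels
top_params = (dense_params(flat, 4096)
              + dense_params(4096, 4096)
              + dense_params(4096, classes))

# pooling == 'avg' branch: GAP has no parameters; only the hypothetical
# Dense(classes) head on the 512-dimensional pooled vector is counted.
gap_head_params = dense_params(channels, classes)

print(top_params)       # 123642856
print(gap_head_params)  # 513000
```

Under these assumptions the fully connected block needs about 124 million parameters, against roughly half a million for the GAP-based head.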