Building a VGG-based Faster R-CNN network with the Object Detection API

Latest recommended article published 2021-04-25 20:30:40

_szxy

Categories: tensorflow, deep learning object detection

Article link: https://blog.csdn.net/q199502092010/article/details/89669841

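Building on the previous post and two reference blog posts, this article describes in detail how to build a VGG-based Faster R-CNN network with the TensorFlow Object Detection API. The work mainly consists of modifying the configuration file and the feature extractor. Although VGG has no advantage over modern networks such as ResNet, the author still completed the experimental pipeline. The remaining problems are that the heavily encapsulated code is hard to adjust, and that adding new features such as Soft NMS to the existing framework is challenging.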

Building on the previous post, https://blog.csdn.net/q199502092010/article/details/89381472, and with reference to these two articles:
https://www.jianshu.com/p/b9e38c1c94b1
https://www.jianshu.com/p/4b5ff96e70b5
I built a Faster R-CNN network with a VGG backbone using the TensorFlow Object Detection API.

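Building a new network only requires swapping in a new feature extractor. The previous post already registered the new feature extractor in the config, so this post focuses on modifying the feature extraction network itself. The VGG network definitions live in the research/slim/nets folder. If you just want something that works, you do not actually need to change anything there. In practice, however, the layers before pool5 are usually used as the feature extractor, so the layers that are not needed can be commented out.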
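
To see why the cut is usually made before pool5, here is a rough sketch in plain Python (no TensorFlow needed) that tracks the feature map size through the vgg_a layer blocks. It assumes the usual slim layout: 3x3 convolutions with SAME padding that keep the spatial size, and 2x2 max-pools with stride 2 that halve it, rounding down. The names `VGG_A_LAYERS` and `feature_map_size` are made up for this illustration and are not part of slim or the Object Detection API.

```python
# Convolution blocks and pooling layers of vgg_a, in order.
# Assumption: each conv block keeps the spatial size (3x3, SAME padding) and
# each pool halves it (2x2, stride 2, rounding down).
VGG_A_LAYERS = [
    ("conv1", "conv"), ("pool1", "pool"),
    ("conv2", "conv"), ("pool2", "pool"),
    ("conv3", "conv"), ("pool3", "pool"),
    ("conv4", "conv"), ("pool4", "pool"),
    ("conv5", "conv"), ("pool5", "pool"),
]


def feature_map_size(height, width, last_layer="conv5"):
    """Return (height, width, stride) of the feature map at `last_layer`."""
    stride = 1
    for name, kind in VGG_A_LAYERS:
        if kind == "pool":
            # A 2x2 max-pool with stride 2 halves each spatial dimension.
            height, width, stride = height // 2, width // 2, stride * 2
        if name == last_layer:
            return height, width, stride
    raise ValueError("unknown layer: %s" % last_layer)


print(feature_map_size(224, 224))           # (14, 14, 16)
print(feature_map_size(224, 224, "pool5"))  # (7, 7, 32)
```

Stopping at conv5 keeps the feature stride at 16, while including pool5 would coarsen it to 32 and leave the region proposal stage with a feature map half as large in each dimension.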

def vgg_a(inputs,
          num_classes=1000,
          is_training=True,
          dropout_keep_prob=0.5,
          spatial_squeeze=True,
          scope='vgg_a',
          fc_conv_padding='VALID',
          global_pool=False):
  """Oxford Net VGG 11-Layers version A Example.

  Note: All the fully_connected layers have been transformed to conv2d layers.
        To use in classification mode, resize input to 224x224.

  Args:
    inputs: a tensor of size [batch_size, height, width, channels].
    num_classes: number of predicted classes. If 0 or None, the logits layer is
      omitted and the input features to the logits layer are returned instead.
    is_training: whether or not the model is being trained.
    dropout_keep_prob: the probability that activations are kept in the dropout
      layers during training.
    spatial_squeeze: whether or not should squeeze the spatial dimensions of the
      outputs. Useful to remove unnecessary dimensions for classification.
    scope: Optional scope for the variables.
    fc_conv_padding: the type of padding to use for the fully connected layer
      that is implemented as a convolutional layer. Use 'SAME' padding if you
      are applying the network in a fully convolutional manner and want to
      get a prediction map downsampled by a factor of 32 as an output.
      Otherwise, the output prediction map will be (input / 32) - 6 in case of
      'VALID' padding.
    global_pool: Optional boolean flag. If True, the input to the classification
      layer is avgpooled to size 1x1, for any input size. (This is not part
      of the original VGG architecture.)

  Returns:
    net: the output of the logits layer (if num_classes is a non-zero integer),
      or the input to the logits layer (if num_classes is 0 or None).
    end_points: a dict of tensors with intermediate activations.
  """
  with tf.variable_scope(scope, &#