Weakly supervised object recognition with convolutional neural networks (paper notes)

This post looks at weakly supervised object recognition with convolutional neural networks: it explains what weakly supervised learning is and walks through the network architecture, including the convolutional adaptation layers, global max pooling, and the loss function. The paper treats the fully connected layers as convolutions so that the network can accept images of nearly arbitrary size, and uses global max pooling to search for the highest-scoring object position.

1. Model Overview

1.1 What is weakly supervised learning?

https://stackoverflow.com/questions/18944805/what-is-weakly-supervised-learning-bootstrapping

In short: In weakly supervised learning, you use a limited amount of labeled data.
In other words, the sample labels are imperfect. Imperfect how? They may be insufficient, cover only part of the content, or simply be wrong; all of these count.
Images, for example, are mostly labeled by hand, so mistakes and missing annotations are bound to appear.

2. Network architecture

[Figure: network architecture]

Overall, the network consists of 5 convolutional layers and 4 fully connected layers.
To adapt this architecture to weakly supervised learning we introduce the following three modifications. First, we treat the fully connected layers as convolutions, which allows us to deal with nearly arbitrary-sized images as input. Second, we explicitly search for the highest scoring object position in the image by adding a single global max-pooling layer at the output. Third, we use a cost function that can explicitly model multiple objects present in the image.
Not entirely clear yet; let's keep reading.

2.1 Convolutional adaptation layers

Goal: treat the fully connected layers as convolutions, which allows us to deal with nearly arbitrary-sized images as input.
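
To make this concrete, here is a minimal sketch (my own illustration in PyTorch, not the authors' code) of what "treating a fully connected layer as a convolution" means: an FC layer that expects a 6 x 6 x 256 feature map is numerically identical to a 6 x 6 convolution carrying the same weights, and the convolutional form also accepts larger feature maps, producing a spatial map of outputs instead of a single vector.

```python
import torch
import torch.nn as nn

fc = nn.Linear(256 * 6 * 6, 4096)             # an FC6-style fully connected layer
conv = nn.Conv2d(256, 4096, kernel_size=6)    # its convolutional twin
conv.weight.data = fc.weight.data.view(4096, 256, 6, 6)  # same weights, reshaped
conv.bias.data = fc.bias.data

feat = torch.randn(1, 256, 6, 6)              # the feature-map size FC6 expects
print(torch.allclose(fc(feat.flatten(1)), conv(feat).flatten(1), atol=1e-4))  # True

bigger = torch.randn(1, 256, 10, 10)          # a larger feature map...
print(conv(bigger).shape)                     # ...yields a 5 x 5 grid of 4096-d outputs
```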
The paper refers to the following earlier work:
《Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks》
The main idea of that paper is transfer learning: take a network that has been trained on a large dataset and reuse it for other tasks.
Transfer learning aims to transfer knowledge between related source and target domains
As that paper puts it:
To address this problem, we propose to transfer image representations learned with CNNs on large datasets to other visual recognition tasks with limited training data.

However, the targets of the original classification problem differ from those of the new one, so the authors replace the final output layer of the original network with FCa and FCb. (My understanding: training a deep network directly from scratch gives poor results because there are too many parameters and too little data. By first training the network on a large dataset, then removing the final output layer and adding two fully connected layers, the parameters that actually need to be learned are just those of these two layers, so the number of trainable parameters is greatly reduced.)
In order to achieve the transfer, we remove the output layer FC8 of the pre-trained network and add an adaptation layer formed by two fully connected layers FCa and FCb
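
Below is a hedged sketch of this transfer set-up, using torchvision's AlexNet as a stand-in for the pre-trained network; the width of FCa and the target class count are placeholder choices of mine, not values from the paper.

```python
import torch.nn as nn
from torchvision import models

num_target_classes = 20                         # e.g. Pascal VOC (assumption)
net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Drop FC8 (the 1000-way ImageNet classifier); keep FC6/FC7 and their weights.
kept = list(net.classifier.children())[:-1]

# FCa and FCb: the adaptation layer, the only part trained from scratch.
adaptation = [
    nn.Linear(4096, 2048),                      # FCa (width is a placeholder)
    nn.ReLU(inplace=True),
    nn.Linear(2048, num_target_classes),        # FCb
]
net.classifier = nn.Sequential(*kept, *adaptation)
```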

So how do we make the two layers we defined ourselves, i.e. the adaptation layer, better bridge the differences between the source problem and the current one?
we train the adaptation layer using a procedure inspired by training sliding window object detectors (e.g. [A discriminatively trained, multiscale, deformable part model]) described next.

Method:
Extract patches: roughly speaking, take small square crops of the image, with some overlap between neighboring crops; see the paper for the exact settings. A rough sketch follows below.
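
A small, hedged illustration of this sliding-window style patch extraction; the patch size and stride here are placeholder values, not the paper's exact settings.

```python
import numpy as np

def extract_patches(image, patch_size=224, stride=112):
    """Return overlapping square patches from an H x W x C image array."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, stride):
        for left in range(0, w - patch_size + 1, stride):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return np.stack(patches)

# Example: a 448 x 448 RGB image yields a 3 x 3 grid of overlapping patches.
dummy = np.zeros((448, 448, 3), dtype=np.uint8)
print(extract_patches(dummy).shape)  # (9, 224, 224, 3)
```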

2.2 Back to the main topic

Before introducing the model in this paper, let's first take a look at the following paper.
《ImageNet Classification with Deep Convolutional Neural Networks》

[Figure: AlexNet architecture]

The model is structured as in the figure above: the 224 * 224 * 3 input image is convolved with 96 kernels of size 11 * 11 * 3, then with 256 kernels of size 5 * 5 * 48 (the depth here is 48 rather than 96 because the authors split the channels across two GPUs during training), and so on; see the paper for the full details.
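
For reference, the five-convolution stack can be written out as follows. This is a sketch without the two-GPU channel split (so conv2 sees all 96 input channels instead of 48), and the padding values follow common single-GPU re-implementations rather than the original paper.

```python
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),    # conv1
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2),              # conv2
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1),             # conv3
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, padding=1),             # conv4
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),             # conv5
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

x = torch.zeros(1, 3, 224, 224)        # a dummy 224 x 224 RGB image
print(features(x).shape)               # torch.Size([1, 256, 6, 6])
```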

The present paper takes the network architecture of the paper mentioned above and makes a few further modifications.

As shown in the figure below:

[Figure: the adapted network architecture]
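
Putting the pieces together, here is a hedged end-to-end sketch of the adapted model: a convolutional trunk, the adaptation layers re-expressed as convolutions, a per-class score map, and a global max pooling that keeps the highest-scoring position for each class. The channel widths, the torchvision trunk, and the pairing with a multi-label loss are my own illustrative choices, not the authors' released code.

```python
import torch
import torch.nn as nn
from torchvision import models

class WeaklySupervisedCNN(nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        # Any AlexNet-style convolutional trunk works; torchvision's is a stand-in.
        self.features = models.alexnet(weights=None).features     # -> 256 channels
        self.adaptation = nn.Sequential(
            nn.Conv2d(256, 4096, kernel_size=6),        # former FC6
            nn.ReLU(inplace=True),
            nn.Conv2d(4096, 4096, kernel_size=1),       # former FC7
            nn.ReLU(inplace=True),
            nn.Conv2d(4096, num_classes, kernel_size=1) # per-class score map
        )

    def forward(self, x):
        score_map = self.adaptation(self.features(x))   # (N, K, h, w)
        # Global max pooling: keep the highest-scoring location per class.
        return score_map.flatten(2).max(dim=2).values   # (N, K)

model = WeaklySupervisedCNN(num_classes=20)
logits = model(torch.zeros(2, 3, 320, 320))             # nearly arbitrary-sized input
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(2, 20))  # multi-label cost
```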
