Weakly supervised object recognition with convolutional neural networks (paper notes)

This post looks at weakly supervised object recognition with convolutional neural networks: it explains what weakly supervised learning is and walks through the network architecture, including the convolutional adaptation layers, global max pooling, and the loss function. The paper treats the fully connected layers as convolutions so that the network can accept images of nearly arbitrary size, and uses global max pooling to search for the highest-scoring object position.

1. Model Overview

1.1 What is weakly supervised learning?

https://stackoverflow.com/questions/18944805/what-is-weakly-supervised-learning-bootstrapping

In short: In weakly supervised learning, you use a limited amount of labeled data.
In other words, the sample labels are imperfect. Imperfect how? They may be insufficient, cover only part of the content, or simply be wrong; all of these count.
Images, for example, are mostly labeled by hand, so mistakes and missing annotations are bound to appear.

2. Network architecture

[Figure: network architecture]

Overall, the network consists of 5 convolutional layers and 4 fully connected layers.
To adapt this architecture to weakly supervised learning we introduce the following three modifications. First, we treat the fully connected layers as convolutions, which allows us to deal with nearly arbitrary-sized images as input. Second, we explicitly search for the highest scoring object position in the image by adding a single global max-pooling layer at the output. Third, we use a cost function that can explicitly model multiple objects present in the image.
Not entirely clear yet; let's keep reading.

2.1 Convolutional adaptation layers

Goal: treat the fully connected layers as convolutions, which allows us to deal with nearly arbitrary-sized images as input.
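
To make this concrete, here is a minimal sketch (my own illustration in PyTorch, not the authors' code) of what "treating a fully connected layer as a convolution" means: an FC layer that expects a 6 x 6 x 256 feature map is numerically identical to a 6 x 6 convolution carrying the same weights, and the convolutional form also accepts larger feature maps, producing a spatial map of outputs instead of a single vector.

```python
import torch
import torch.nn as nn

fc = nn.Linear(256 * 6 * 6, 4096)             # an FC6-style fully connected layer
conv = nn.Conv2d(256, 4096, kernel_size=6)    # its convolutional twin
conv.weight.data = fc.weight.data.view(4096, 256, 6, 6)  # same weights, reshaped
conv.bias.data = fc.bias.data

feat = torch.randn(1, 256, 6, 6)              # the feature-map size FC6 expects
print(torch.allclose(fc(feat.flatten(1)), conv(feat).flatten(1), atol=1e-4))  # True

bigger = torch.randn(1, 256, 10, 10)          # a larger feature map...
print(conv(bigger).shape)                     # ...yields a 5 x 5 grid of 4096-d outputs
```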
The paper refers to the following earlier work:
《Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks》
The main idea of that paper is transfer learning: take a network that has been trained on a large dataset and reuse it for other tasks.
Transfer learning aims to transfer knowledge between related source and target domains
As that paper puts it:
To address this problem, we propose to transfer image representations learned with CNNs on large datasets to other visual recognition tasks with limited training data.

However, the targets of the original classification problem differ from those of the new one, so the authors replace the final output layer of the original network with FCa and FCb. (My understanding: training a deep network directly from scratch gives poor results because there are too many parameters and too little data. By first training the network on a large dataset, then removing the final output layer and adding two fully connected layers, the parameters that actually need to be learned are just those of these two layers, so the number of trainable parameters is greatly reduced.)
In order to achieve the transfer, we remove the output layer FC8 of the pre-trained network and add an adaptation layer formed by two fully connected layers FCa and FCb
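
Below is a hedged sketch of this transfer set-up, using torchvision's AlexNet as a stand-in for the pre-trained network; the width of FCa and the target class count are placeholder choices of mine, not values from the paper.

```python
import torch.nn as nn
from torchvision import models

num_target_classes = 20                         # e.g. Pascal VOC (assumption)
net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Drop FC8 (the 1000-way ImageNet classifier); keep FC6/FC7 and their weights.
kept = list(net.classifier.children())[:-1]

# FCa and FCb: the adaptation layer, the only part trained from scratch.
adaptation = [
    nn.Linear(4096, 2048),                      # FCa (width is a placeholder)
    nn.ReLU(inplace=True),
    nn.Linear(2048, num_target_classes),        # FCb
]
net.classifier = nn.Sequential(*kept, *adaptation)
```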

So how do we make the two layers we defined ourselves, i.e. the adaptation layer, better bridge the differences between the source problem and the current one?
we train the adaptation layer using a procedure inspired by training sliding window object detectors (e.g. [A discriminatively trained, multiscale, deformable part model]) described next.

Method:
Extract patches: roughly speaking, take small square crops of the image, with some overlap between neighboring crops; see the paper for the exact settings. A rough sketch follows below.
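
A small, hedged illustration of this sliding-window style patch extraction; the patch size and stride here are placeholder values, not the paper's exact settings.

```python
import numpy as np

def extract_patches(image, patch_size=224, stride=112):
    """Return overlapping square patches from an H x W x C image array."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, stride):
        for left in range(0, w - patch_size + 1, stride):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return np.stack(patches)

# Example: a 448 x 448 RGB image yields a 3 x 3 grid of overlapping patches.
dummy = np.zeros((448, 448, 3), dtype=np.uint8)
print(extract_patches(dummy).shape)  # (9, 224, 224, 3)
```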

2.2 Back to the main topic

Before introducing the model in this paper, let's first take a look at the following paper.
《ImageNet Classification with Deep Convolutional Neural Networks》

[Figure: AlexNet architecture]

The model is structured as in the figure above: the 224 * 224 * 3 input image is convolved with 96 kernels of size 11 * 11 * 3, then with 256 kernels of size 5 * 5 * 48 (the depth here is 48 rather than 96 because the authors split the channels across two GPUs during training), and so on; see the paper for the full details.
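
For reference, the five-convolution stack can be written out as follows. This is a sketch without the two-GPU channel split (so conv2 sees all 96 input channels instead of 48), and the padding values follow common single-GPU re-implementations rather than the original paper.

```python
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),    # conv1
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2),              # conv2
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1),             # conv3
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, padding=1),             # conv4
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),             # conv5
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

x = torch.zeros(1, 3, 224, 224)        # a dummy 224 x 224 RGB image
print(features(x).shape)               # torch.Size([1, 256, 6, 6])
```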

The present paper takes the network architecture of the paper mentioned above and makes a few further modifications.

As shown in the figure below:

[Figure: the adapted network architecture]
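
Putting the pieces together, here is a hedged end-to-end sketch of the adapted model: a convolutional trunk, the adaptation layers re-expressed as convolutions, a per-class score map, and a global max pooling that keeps the highest-scoring position for each class. The channel widths, the torchvision trunk, and the pairing with a multi-label loss are my own illustrative choices, not the authors' released code.

```python
import torch
import torch.nn as nn
from torchvision import models

class WeaklySupervisedCNN(nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        # Any AlexNet-style convolutional trunk works; torchvision's is a stand-in.
        self.features = models.alexnet(weights=None).features     # -> 256 channels
        self.adaptation = nn.Sequential(
            nn.Conv2d(256, 4096, kernel_size=6),        # former FC6
            nn.ReLU(inplace=True),
            nn.Conv2d(4096, 4096, kernel_size=1),       # former FC7
            nn.ReLU(inplace=True),
            nn.Conv2d(4096, num_classes, kernel_size=1) # per-class score map
        )

    def forward(self, x):
        score_map = self.adaptation(self.features(x))   # (N, K, h, w)
        # Global max pooling: keep the highest-scoring location per class.
        return score_map.flatten(2).max(dim=2).values   # (N, K)

model = WeaklySupervisedCNN(num_classes=20)
logits = model(torch.zeros(2, 3, 320, 320))             # nearly arbitrary-sized input
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(2, 20))  # multi-label cost
```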
