Understanding Fast-RCNN (1)



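As is well known, Fast R-CNN is used for object detection in images. For example, if an image contains a cat, the algorithm can detect that there is a cat and draw a red box around it.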

Object detection, like depth estimation and semantic segmentation, belongs to the field of image understanding.


Let's start with the Abstract, from which we learn:

1, This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection.

2, Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks.

Also, Fast R-CNN really is fast; I won't go into the details here. The authors have released their source code, which deserves praise.


Next comes the Introduction, which covers some basic concepts:

1, Compared to image classification, object detection is a more challenging task that requires more complex methods to solve.

Object detection is complex for the following reasons:

1, Complexity arises because detection requires the accurate localization of objects

2, Numerous candidate object locations must be processed

3, These candidates provide only rough localization that must be refined to achieve precise localization

4, Solutions to these problems often compromise speed, accuracy, or simplicity
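Point 3 says rough proposals must be refined. This is usually done with bounding-box regression, where the network predicts offsets that move a proposal toward the true box. Below is a minimal Python sketch of that refinement. It assumes the R-CNN-style parameterization: center shifts are scaled by the proposal's width and height, and size changes are on a log scale. It also assumes continuous (x1, y1, x2, y2) coordinates. The helper names `center_size`, `encode` and `decode` are hypothetical.

```python
import math

def center_size(box):
    # Convert (x1, y1, x2, y2) to (center x, center y, width, height); continuous coords assumed.
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2, y1 + h / 2, w, h

def encode(proposal, gt):
    # Regression targets (tx, ty, tw, th) that would move `proposal` onto `gt`.
    px, py, pw, ph = center_size(proposal)
    gx, gy, gw, gh = center_size(gt)
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def decode(proposal, deltas):
    # Apply predicted offsets to a proposal to get the refined box.
    px, py, pw, ph = center_size(proposal)
    tx, ty, tw, th = deltas
    cx, cy = px + tx * pw, py + ty * ph
    w, h = pw * math.exp(tw), ph * math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

For example, `encode((10, 10, 50, 50), (12, 8, 56, 52))` gives roughly `(0.1, 0.0, 0.0953, 0.0953)`. Decoding those offsets against the same proposal recovers the target box up to floating-point error. Normalizing by proposal size keeps the targets comparable across small and large proposals.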

The contribution of this paper is:

We propose a single-stage training algorithm that jointly learns to classify object proposals and refine their spatial locations.


The authors review earlier classic methods such as R-CNN and SPPnet. First, an overview:

The Region-based Convolutional Network method (R-CNN) achieves excellent object detection accuracy by using a deep ConvNet to classify object proposals.

R-CNN itself has several problems:

1, Training is a multi-stage pipeline

2, Training is expensive in space and time

3, Object detection is slow

In short, R-CNN is slow. The reason it is slow:

1, R-CNN is slow because it performs a ConvNet forward pass for each object proposal, without sharing computation
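A toy Python/NumPy sketch can make "without sharing computation" concrete by counting how often the backbone runs. It rests on a few assumptions. `backbone` is a hypothetical stand-in for a ConvNet (a 2×2 average pool with stride 2, not a real network). Proposals are given in image pixels with even coordinates. The fixed-size warping that R-CNN applies to each crop is omitted.

```python
import numpy as np

calls = {"backbone": 0}  # counts how many times the stand-in "ConvNet" runs

def backbone(img):
    """Hypothetical stand-in for a ConvNet: 2x2 average pooling with stride 2."""
    calls["backbone"] += 1
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def rcnn_style(img, proposals):
    # R-CNN style: crop each proposal from the image, one backbone pass per crop.
    return [backbone(img[y1:y2, x1:x2]) for (x1, y1, x2, y2) in proposals]

def shared_style(img, proposals, stride=2):
    # Shared style: one backbone pass for the whole image, then crop the feature map
    # after projecting each proposal by the backbone's total stride.
    fmap = backbone(img)
    return [fmap[y1 // stride:y2 // stride, x1 // stride:x2 // stride]
            for (x1, y1, x2, y2) in proposals]

img = np.arange(64 * 64, dtype=float).reshape(64, 64)
proposals = [(0, 0, 32, 32), (16, 8, 48, 40), (10, 20, 30, 60)]

calls["backbone"] = 0
per_proposal = rcnn_style(img, proposals)
print("R-CNN style passes:", calls["backbone"])  # prints 3

calls["backbone"] = 0
shared = shared_style(img, proposals)
print("shared passes:", calls["backbone"])  # prints 1
```

With this toy backbone and even coordinates, both routes give identical features, so the only difference is the number of passes. With about 2000 proposals per image, that gap dominates the cost. For a real ConvNet the two routes are not numerically identical, because features in the shared map also see context outside the box.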

This led to SPPnets, an improvement on R-CNN that focuses on sharing computation:

Spatial pyramid pooling networks (SPPnets) were proposed to speed up R-CNN by sharing computation

A brief description of SPPnets:

1, The SPPnet method computes a convolutional feature map for the entire input image and then classifies each object proposal using a feature vector extracted from the shared feature map

2, Features are extracted for a proposal by max-pooling the portion of the feature map inside the proposal into a fixed-size output

3, Multiple output sizes are pooled and then concatenated as in spatial pyramid pooling
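Points 2 and 3 above can be sketched in NumPy. The part of the shared feature map inside a proposal is max-pooled into n × n bins for several pyramid levels n, and the results are concatenated. Proposals of any size therefore yield a vector of the same length. Several details are assumptions for illustration, not SPPnet's exact configuration: the levels (1, 2, 4), the floor/ceil bin boundaries, and the function name `spp_pool`. The box is assumed to be already projected into feature-map coordinates and non-empty.

```python
import math
import numpy as np

def spp_pool(fmap, box, levels=(1, 2, 4)):
    """Spatial pyramid max pooling of one proposal.

    fmap:   feature map of shape (C, H, W)
    box:    (x1, y1, x2, y2) in feature-map coordinates, end-exclusive, non-empty
    levels: pyramid sizes n; each level yields C * n * n values (assumed levels)
    """
    x1, y1, x2, y2 = box
    region = fmap[:, y1:y2, x1:x2]
    c, h, w = region.shape
    outputs = []
    for n in levels:
        pooled = np.empty((c, n, n), dtype=fmap.dtype)
        for i in range(n):
            # floor/ceil bounds guarantee every bin holds at least one cell
            ys, ye = math.floor(i * h / n), math.ceil((i + 1) * h / n)
            for j in range(n):
                xs, xe = math.floor(j * w / n), math.ceil((j + 1) * w / n)
                pooled[:, i, j] = region[:, ys:ye, xs:xe].max(axis=(1, 2))
        outputs.append(pooled.ravel())
    # Concatenate all levels into one fixed-length vector
    return np.concatenate(outputs)

fmap = np.arange(8 * 20 * 30, dtype=float).reshape(8, 20, 30)  # (C, H, W)
small = spp_pool(fmap, (2, 3, 7, 6))     # a 3 x 5 region
large = spp_pool(fmap, (0, 0, 30, 20))   # the whole map
print(small.shape, large.shape)  # prints (168,) (168,)
```

The output length is C · (1 + 4 + 16) = 168 regardless of the region's size. This fixed length is what lets a fully connected classifier accept proposals of arbitrary shape.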

I had not read the SPPnet paper before, so I can't explain exactly what it is for; treat this as background only.

SPPnet also has some drawbacks. No rush; let's follow the authors' account:

1, Like R-CNN, training is a multi-stage pipeline that involves extracting features, fine-tuning a network with log loss, training SVMs, and finally fitting bounding-box regressors.

2, But unlike R-CNN, the fine-tuning algorithm cannot update the convolutional layers that precede the spatial pyramid pooling.

3, Features are also written to disk

As a result:

Unsurprisingly, this limitation (fixed convolutional layers) limits the accuracy of very deep networks


Contributions

The authors say their method overcomes these difficulties:

1, Higher detection quality than R-CNN, SPPnet

2, Training is single-stage, using a multi-task loss

3, Training can update all network layers

4, No disk storage is required for feature caching
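Contribution 2 mentions a multi-task loss. The Fast R-CNN paper defines it later. Each labeled RoI contributes a log loss on its true class u. Only when u is not background (u ≥ 1), it also contributes a smooth L1 loss between the predicted box offsets for class u and the regression target v, weighted by λ. Below is a minimal NumPy sketch for a single RoI. The argument names and array shapes are assumptions, and class 0 is taken as background.

```python
import numpy as np

def smooth_l1(x):
    # 0.5 * x^2 where |x| < 1, |x| - 0.5 elsewhere (less sensitive to outliers than L2)
    a = np.abs(x)
    return np.where(a < 1.0, 0.5 * a ** 2, a - 0.5)

def multitask_loss(class_scores, u, box_deltas, v, lam=1.0):
    """Loss for one RoI: L = L_cls(p, u) + lam * [u >= 1] * L_loc(t^u, v).

    class_scores: (K + 1,) raw scores; class 0 is background (assumption)
    u:            ground-truth class index
    box_deltas:   (K + 1, 4) predicted offsets t^k, one row per class
    v:            (4,) regression target for the ground-truth box
    """
    s = class_scores - class_scores.max()   # shift for a numerically stable softmax
    log_p = s - np.log(np.exp(s).sum())     # log-softmax over the K + 1 classes
    l_cls = -log_p[u]                       # log loss for the true class
    # Box loss only for non-background RoIs, using the offsets predicted for class u
    l_loc = smooth_l1(box_deltas[u] - v).sum() if u >= 1 else 0.0
    return float(l_cls + lam * l_loc)
```

The default `lam=1.0` matches the value the paper reports using in its experiments. Because both terms come from one network, a single backward pass trains classification and box refinement together. That is what makes the training single-stage.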


To be continued next time!
