Weakly Supervised Object Detection: An End-to-End Training Pipeline

Object detection is a well-known computer vision problem in which a tremendous amount of research has been done. Fully supervised methods have become the state of the art for object detection. However, due to the inconvenience of gathering a large amount of data with accurate object-level annotations, weakly supervised object detection (a semi-supervised approach) has recently been attracting a lot of attention.

Table of Contents

  1. Introduction
  2. Motivation
  3. Understanding the Basic Building Blocks
  4. The Method
  5. Experiments and Results
  6. Conclusion

1. Introduction

  • In weakly supervised object detection [1], only image-level annotations are available, indicating whether an object class is present in the image or not.

  • This differs from the baseline fully supervised object detection, which relies on instance-level (bounding-box) annotations.
  • Usually, this is a two-phase learning procedure: 1. a multiple instance learning (MIL) detector and 2. a fully supervised learning detector with bounding-box regression (explained in detail in Section 3).

  • In [1], a single end-to-end network is designed with both a multiple instance learning detector and bounding-box regression, in order to avoid the local minima problem (explained in Section 2) that arises with the two-phase approach.

Figure 1: The learning strategy comparison of existing weakly supervised object detection methods (above the solid blue line) and our proposed method (below the solid blue line). [Source: [1]]

2. Motivation

  • The earlier two-phase approach uses multiple instance learning to train a MIL detector, which uses a CNN as a feature extractor.
  • The second phase uses a fully supervised detector, Fast R-CNN, to further refine (regress) the object locations, using the region proposals output by the first phase as pseudo ground truth (GT) for supervision.
  • This two-phase approach can lead to the following local minima problem.

2.1 Local Minima Problem

  • Sometimes the multiple instance learning detector in the first phase starts off with only weakly accurate bounding boxes. The detector may focus on very discriminative parts of the objects, e.g., the head of a cat.

  • This can lead to wrong region proposals, which are then used as pseudo ground truth (pseudo GT, since instance-level annotations are not available) for the next phase.
  • Ultimately, the accurate location of the object cannot be learned in the second phase, since the input is already overfitted to the wrong region.

Figure 2: (1) Detection results of MIL detector; (2) Fast R-CNN with pseudo GT from MIL detector. [Source: [1]]

Therefore, the MIL detector and the bounding-box regressor are jointly trained, so that the regressor can start adjusting the predicted boxes before the MIL detector over-commits to small discriminative parts and produces inaccurate results.

3. Understanding the Basic Building Blocks

3.1 Multiple Instance Learning (MIL)

  • MIL is a variation of supervised learning that assigns a single label to a set (bag) of instances instead of labeling each individual instance.
  • A particular bag is labeled as negative if all the instances in the bag are negative.
  • If at least one positive instance is present, then that bag is labeled positive.
  • MIL is a weakly supervised learning process that selects object predictions from region proposals generated by an external method, which in [1] is the Selective Search Windows (SSW) method (Section 3.3). A minimal sketch of the bag-pooling idea is shown below.
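
To make the bag/instance relationship concrete, here is a minimal, hypothetical PyTorch sketch of max-pooling MIL over per-proposal class scores. It illustrates the general MIL idea only, not the code or architecture of [1]; all names and shapes are illustrative.

```python
# A minimal sketch of MIL bag labeling with max-pooling over instance scores.
# This illustrates the general MIL idea, not code from [1].
import torch

def bag_scores_from_instances(instance_scores: torch.Tensor) -> torch.Tensor:
    # instance_scores: (num_proposals, num_classes), per-proposal scores in [0, 1].
    # A bag (image) is positive for a class if at least one instance is positive,
    # which max-pooling over the instance axis captures.
    return instance_scores.max(dim=0).values

# Hypothetical usage: 5 proposals, 3 classes, image-level multi-label target y.
instance_scores = torch.rand(5, 3)
image_scores = bag_scores_from_instances(instance_scores)
y = torch.tensor([1.0, 0.0, 1.0])
loss = torch.nn.functional.binary_cross_entropy(image_scores, y)
```

With only image-level labels available, a MIL detector can be trained by applying such a loss to the pooled bag scores rather than to individual proposals.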

3.2 Fully Supervised Learning Detector (Fast R-CNN)

  • The architecture of Fast R-CNN comprises a CNN, pre-trained on ImageNet, that is used for feature extraction.
  • The final pooling layer is replaced by an RoI pooling layer, which extracts a fixed-size feature for each proposed object region.

  • The final fully connected layer is replaced by two branches: 1. a classification branch and 2. a bounding-box regression branch.

  • The classification branch predicts the class to which the object belongs, and the regression branch refines the coordinates of the bounding box (see the sketch of such a two-branch head below).
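
To make the two-branch head concrete, below is a minimal, hypothetical PyTorch sketch of a Fast R-CNN style head. Feature sizes assume a VGG-16-like backbone with 7x7 RoI pooling and 20 object classes (PASCAL VOC); it is an illustration in the spirit of [4], not the exact architecture used in [1].

```python
# Hypothetical Fast R-CNN style head: shared FC layers followed by a
# classification branch and a bounding-box regression branch.
import torch
import torch.nn as nn

class FastRCNNHead(nn.Module):
    def __init__(self, roi_feat_dim: int = 7 * 7 * 512, num_classes: int = 20):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(roi_feat_dim, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
        )
        self.cls_branch = nn.Linear(4096, num_classes + 1)   # +1 for background
        self.reg_branch = nn.Linear(4096, num_classes * 4)   # 4 offsets per class

    def forward(self, roi_feats: torch.Tensor):
        x = self.shared(roi_feats.flatten(start_dim=1))
        cls_scores = self.cls_branch(x)   # softmax is applied inside the loss
        box_deltas = self.reg_branch(x)
        return cls_scores, box_deltas

# Hypothetical usage with 128 RoI-pooled features of size 512 x 7 x 7.
head = FastRCNNHead()
cls_scores, box_deltas = head(torch.randn(128, 512, 7, 7))
```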

3.3 Selective Search Windows

  • Selective search is a region proposal algorithm used in object detection.
  • It uses a hierarchical grouping of similar regions based on color, texture, size, and shape.
  • It begins by over-segmenting the image, adds the bounding boxes corresponding to the segmented parts to the list of region proposals, then groups adjacent segments based on similarity and repeats the procedure (a short usage sketch follows).
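
The snippet below shows one common way to run selective search via OpenCV's contrib module (requires the opencv-contrib-python package), in the spirit of [5]. It illustrates the SSW proposal step in general, not the exact configuration used in [1]; the image path is a placeholder.

```python
# Generate selective-search region proposals with OpenCV (opencv-contrib-python).
import cv2

img = cv2.imread("example.jpg")  # hypothetical input image path
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()   # or switchToSelectiveSearchQuality() for more boxes
rects = ss.process()               # array of (x, y, w, h) region proposals
print(f"{len(rects)} region proposals generated")
```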

4. The Method

  • The proposed weakly supervised object detection network [1] has three major components: a guided attention module (GAM), a MIL branch, and a regression branch.

  • An enhanced feature map is first extracted from the given input image by the CNN backbone together with the GAM.
  • An RoI pooling layer then generates the region features, which are sent to the MIL branch and the regression branch.
  • The MIL branch proposes object locations and categories, which are taken as pseudo GT by the regression branch for location regression and classification (a high-level data-flow sketch follows Figure 4).

Figure 4: Architecture of the proposed network from [1]. (1) Generate discriminative features using the attention mechanism. (2) Generate the RoI features from the enhanced feature map. (3) MIL branch: feed the extracted RoI features into a MIL network for pseudo GT box annotation initialization. (4) Regression branch: feed the extracted RoI features and generated pseudo GT to the regression branch for RoI classification and regression. [Source: [1]]
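
The data flow above can be summarized in a short, hypothetical pseudo-pipeline. Every argument is a placeholder callable standing in for the component described in Sections 4.1 to 4.3; this is not the authors' implementation.

```python
# Hypothetical end-to-end data flow of the proposed network.
def forward_pipeline(image, proposals, backbone, gam, roi_pool, mil_branch, det_branch):
    feat = backbone(image)                          # base CNN feature map
    enhanced = gam(feat)                            # attention-enhanced features (Section 4.1)
    roi_feats = roi_pool(enhanced, proposals)       # per-proposal region features
    pseudo_gt = mil_branch(roi_feats)               # MIL branch: pseudo GT boxes/classes (Section 4.2)
    cls_scores, box_deltas = det_branch(roi_feats)  # multi-task branch (Section 4.3)
    return pseudo_gt, cls_scores, box_deltas
```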

4.1 Guided Attention Module

  • The following is the conventional spatial neural attention structure.
  • The attention module takes the feature map X extracted from the ConvNet as input and generates a spatially normalized attention weight map as output.
  • The output attention map is multiplied with the original feature map X to obtain the attended feature. The attended feature is then added to X to obtain an enhanced feature map.
  • This helps to give more importance to relevant features and to suppress irrelevant ones.
  • To guide the learning of the attention weights, a classification loss is added.
  • To get the classification score vector, the attention map is fed to another convolutional layer and a Global Average Pooling (GAP) layer (a simplified sketch is given after the note below).

For a detailed mathematical treatment, refer to Section 3.1 of [1].
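
Below is a simplified, speculative PyTorch sketch of the attention mechanism described above. It follows the multiply-then-add residual structure from the bullets, but the classification path here applies a 1x1 convolution and GAP to the attended feature, which is only one interpretation of the description; see Section 3.1 of [1] for the exact formulation. All layer sizes are assumptions.

```python
# A simplified guided-attention sketch: spatial attention re-weights the feature map,
# the attended feature is added back as a residual, and an auxiliary classification
# path (conv + GAP) is supervised with image-level labels to guide the attention.
import torch
import torch.nn as nn

class GuidedAttentionModule(nn.Module):
    def __init__(self, in_channels: int = 512, num_classes: int = 20):
        super().__init__()
        self.att_conv = nn.Conv2d(in_channels, 1, kernel_size=1)        # spatial attention logits
        self.cls_conv = nn.Conv2d(in_channels, num_classes, kernel_size=1)
        self.gap = nn.AdaptiveAvgPool2d(1)                               # global average pooling

    def forward(self, x: torch.Tensor):
        b, c, h, w = x.shape
        att = self.att_conv(x).view(b, 1, h * w)
        att = torch.softmax(att, dim=-1).view(b, 1, h, w)   # spatially normalized weights
        attended = x * att                                   # attended feature
        enhanced = x + attended                              # enhanced feature map
        cls_logits = self.gap(self.cls_conv(attended)).flatten(1)  # image-level class scores
        return enhanced, cls_logits

# Hypothetical usage: the classification logits would be trained with the
# image-level labels (e.g., multi-label BCE) to guide the attention weights.
gam = GuidedAttentionModule()
enhanced, cls_logits = gam(torch.randn(2, 512, 14, 14))
```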

4.2 MIL Branch

  • The MIL branch is introduced to initialize the pseudo GT annotations.
  • The network adopted here is Online Instance Classifier Refinement (OICR), which is based on WSDDN, chosen for its effectiveness and end-to-end trainability.
  • WSDDN employs two streams, classification and detection; when the two streams are combined, instance-level predictions can be achieved (see the two-stream sketch below).
  • WSDDN has its own disadvantages, so to further improve the performance in generating tight bounding boxes, OICR and its upgraded version, Proposal Cluster Learning (PCL), are used.
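
To make the two-stream idea concrete, here is a minimal PyTorch sketch of WSDDN-style scoring. The layer sizes are illustrative, and the OICR/PCL refinement branches built on top of this are omitted; it should be read as a sketch of the idea, not the implementation from [1].

```python
# WSDDN-style two-stream MIL head: softmax over classes (classification stream)
# and softmax over proposals (detection stream), combined element-wise; summing
# over proposals gives image-level scores for the MIL loss.
import torch
import torch.nn as nn

class TwoStreamMILHead(nn.Module):
    def __init__(self, feat_dim: int = 4096, num_classes: int = 20):
        super().__init__()
        self.cls_stream = nn.Linear(feat_dim, num_classes)
        self.det_stream = nn.Linear(feat_dim, num_classes)

    def forward(self, roi_feats: torch.Tensor):
        # roi_feats: (num_proposals, feat_dim)
        cls = torch.softmax(self.cls_stream(roi_feats), dim=1)  # softmax over classes
        det = torch.softmax(self.det_stream(roi_feats), dim=0)  # softmax over proposals
        instance_scores = cls * det                  # per-proposal, per-class scores
        image_scores = instance_scores.sum(dim=0)    # image-level scores for the MIL loss
        return instance_scores, image_scores

# Hypothetical usage with 300 proposals.
head = TwoStreamMILHead()
instance_scores, image_scores = head(torch.randn(300, 4096))
```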

4.3 Multi-Task Branch

  • A multi-task branch performs fully supervised classification and regression after the pseudo GT annotations are generated.
  • The detection branch has two sibling branches. The first branch predicts a discrete probability distribution, computed by a softmax over the outputs of a fully connected layer.
  • The second sibling branch outputs bounding-box regression offsets for each of the object classes.
  • This works similarly to the Fast R-CNN architecture (a hedged sketch of the combined loss follows this list).
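
As a rough illustration of how the two sibling outputs are supervised with the pseudo GT, here is a hedged PyTorch sketch of the combined loss. The variable names, shapes, and loss weighting are assumptions for illustration, not taken from [1].

```python
# Combined classification + box-regression loss against pseudo GT (illustrative only).
import torch
import torch.nn.functional as F

def multi_task_loss(cls_scores, box_deltas, pseudo_labels, pseudo_box_targets,
                    reg_weight: float = 1.0):
    # cls_scores:         (num_rois, num_classes + 1) raw logits, class 0 = background
    # box_deltas:         (num_rois, 4) offsets for each RoI's assigned class
    #                     (assumed already gathered from the per-class outputs)
    # pseudo_labels:      (num_rois,) class indices from the MIL branch (pseudo GT)
    # pseudo_box_targets: (num_rois, 4) regression targets derived from the pseudo GT
    cls_loss = F.cross_entropy(cls_scores, pseudo_labels)
    fg = pseudo_labels > 0                      # regress only foreground RoIs
    if fg.any():
        reg_loss = F.smooth_l1_loss(box_deltas[fg], pseudo_box_targets[fg])
    else:
        reg_loss = box_deltas.sum() * 0.0       # keep the graph connected if no foreground
    return cls_loss + reg_weight * reg_loss
```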

5. Experiments and Results

5.1 Datasets and Evaluation Metrics

  • The datasets used for evaluation are PASCAL VOC 2007 and 2012, which comprise 9,963 and 22,531 images over 20 classes, respectively. The trainval set used for training contains 5,011 images for PASCAL VOC 2007 and 11,540 for PASCAL VOC 2012.
  • The evaluation metrics Average Precision (AP) and the mean of AP (mAP) are used to test the model on the test set. To measure localization accuracy, Correct Localization (CorLoc) is also used to evaluate the model.
  • PASCAL criterion: an IoU > 0.5 between ground-truth boxes and predicted boxes is used for evaluation (a small IoU helper is sketched below).
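
For reference, the IoU > 0.5 criterion can be computed with a small helper like the one below. Boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates; this is a generic illustration, not evaluation code from [1].

```python
# Intersection-over-Union between two axis-aligned boxes and the PASCAL 0.5 threshold.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def is_correct_detection(pred_box, gt_box, threshold: float = 0.5):
    return iou(pred_box, gt_box) > threshold
```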

5.2 Comparison with State-of-the-Art

  • The proposed method improves mAP (48.6%) over all other methods on the PASCAL VOC 2007 test set.

Figure 5: Comparison of AP performance (%) on the PASCAL VOC 2007 test set. The upper part shows results of single end-to-end models; the lower part shows results of multi-phase approaches or ensemble models. [Source: [1]]
  • The proposed method improves mAP (46.8%) over all other methods on the PASCAL VOC 2012 test set.

Figure 6: Comparison of AP performance (%) on the PASCAL VOC 2012 test set. The upper part shows results of single end-to-end models; the lower part shows results of multi-phase approaches or ensemble models. [Source: [1]]
  • The proposed method improves correct localization (CorLoc) performance (66.8%) over all other methods on the PASCAL VOC 2007 trainval set.

Figure 7: Comparison of correct localization (CorLoc) (%) on PASCAL VOC 2007 trainval. The upper part shows results of single end-to-end models; the lower part shows results of multi-phase approaches or ensemble models. [Source: [1]]
  • The proposed method improves correct localization (CorLoc) performance (69.5%) over all other methods on the PASCAL VOC 2012 trainval set.

Figure 8: Comparison of correct localization (CorLoc) (%) on PASCAL VOC 2012 trainval. The upper part shows results of single end-to-end models; the lower part shows results of multi-phase approaches or ensemble models. [Source: [1]]

5.3 Improvement with the Proposed Method

Figure 9: Detection results of the MIL detector (left part), Fast R-CNN with pseudo GT from the MIL detector (middle part), and the proposed jointly trained network (right part) [1] at different training iterations. [Source: [1]]

For implementation details and full results, refer to Section 4 of [1].

6. Conclusion

  • A novel framework [1] is presented for the task of weakly supervised object detection, which proves to be better than the traditional approaches in this field.
  • The proposed method, which jointly optimizes MIL detection and regression in an end-to-end manner, achieves the desired results by eliminating the local minima problem and reaches higher accuracy than the state of the art on the PASCAL VOC 2007 and 2012 datasets.
  • For better feature learning, a guided attention module (GAM) is introduced. The proposed framework could also prove useful for future visual learning tasks.

7. References

[1] Ke Yang, Dongsheng Li, and Yong Dou. "Towards Precise End-to-end Weakly Supervised Object Detection Network". ICCV, 2019.

[2] Maximilian Ilse, Jakub M. Tomczak, and Max Welling. "Attention-based Deep Multiple Instance Learning." ICML, 2018.

[3] Jyoti G. Wadmare, Sunita R. Patil. "Improvising Weakly Supervised Object Detection (WSOD) using Deep Learning Technique." International Journal of Engineering and Advanced Technology (IJEAT), 2020.

[4] https://towardsdatascience.com/fast-r-cnn-for-object-detection-a-technical-summary-a0ff94faa022

[5] https://www.learnopencv.com/selective-search-for-object-detection-cpp-python/

Translated from: https://medium.com/visionwizard/weakly-supervised-object-detection-a-precise-end-to-end-approach-ed48d51128fc
