可变形部件模型DPM和HOG特征for行人检测Discriminatively trained deformable part models

本文介绍了一种基于变形部件模型的目标检测系统,该系统使用混合变形部件模型表示对象,并采用判别训练方法仅需图像中对象的边界框进行训练。系统在PASCAL和INRIA人数据集上取得了优秀的检测效果。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

Discriminatively trained deformable part models

Version 5 (Sept. 5, 2012)

 

Introduction

Over the past few years we have developed a complete learning-based system for detecting and localizing objects in images. Our system represents objects using mixtures of deformable part models. These models are trained using a discriminative method that only requires bounding boxes for the objects in an image. The approach leads to efficient object detectors that achieve state of the art results on the PASCAL and INRIA person datasets. 

At a high level our system can be characterized by the combination of

  1. Strong low-level features based on histograms of oriented gradients (HOG)
  2. Efficient matching algorithms for deformable part-based models (pictorial structures)
  3. Discriminative learning with latent variables (latent SVM)

This work was awarded the PASCAL VOC "Lifetime Achievement" Prize in 2010.


Code

Here you can download a complete implementation of our system. The current implementation extends the system in [2] as described in [6]. The models in this implementation are structured using the grammar formalism presented in [4]. Previous releases are available below.

The distribution contains object detection and model learning code, as well as models trained on the PASCAL and INRIA Person datasets.

This release also includes code for

  1. Rescoring detections based on contextual information
  2. The fast cascade detection algorithm described in [3]
  3. Training the person detection grammar described in [5]

The system is implemented in MATLAB, with helper functions written in C/C++ for efficiency reasons. The software was tested on several versions of Linux and Mac OS X using MATLAB version R2011a. Earlier versions of MATLAB should also work, though there may be compatibility issues with releases prior to 2009.

For questions regarding the source code please read the FAQ first. Contact Ross Girshick at ross...@gmail.com (click the "..." to reveal the email address) if you're still stuck.

Source code and model download: voc-release5.tgz (updated Sept. 5, 2012).

New: I also maintain a repository on github that includes bug fixes, speed improvements, and other updates. In general that code will produce different (though similar) results to the tables listed below.

What's changed since voc-release4? changelog

This project has been supported by the National Science Foundation under Grant No. 0534820, 0746569 and 0811340.

How to cite

When citing our system, please cite reference [2] and the website for this specific release. Bibtex entries are provided below for your convenience.

@misc{voc-release5,
 author = "Girshick, R. B. and Felzenszwalb, P. F. and McAllester, D.",
 title = "Discriminatively Trained Deformable Part Models, Release 5",
 howpublished = "http://people.cs.uchicago.edu/~rbg/latent-release5/"}

@article{lsvm-pami,
  title = "Object Detection with Discriminatively Trained Part Based Models",
  author = "Felzenszwalb, P. F. and Girshick, R. B. and McAllester, D. and Ramanan, D.",
  journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
  year = "2010", volume = "32", number = "9", pages = "1627--1645"}


References
  1. P. Felzenszwalb, D. McAllester, D. Ramanan
    A Discriminatively Trained, Multiscale, Deformable Part Model
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008
  2. P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan
    Object Detection with Discriminatively Trained Part Based Models
    IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 9, Sep. 2010 
    pdf
  3. P. Felzenszwalb, R. Girshick, D. McAllester
    Cascade Object Detection with Deformable Part Models
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010
    pdf
  4. P. Felzenszwalb, D. McAllester
    Object Detection Grammars
    University of Chicago, Computer Science TR-2010-02, February 2010
    pdf
  5. R. Girshick, P. Felzenszwalb, D. McAllester
    Object Detection with Grammar Models
    Neural Information Processing Systems (NIPS), 2011
    pdf
  6. R. Girshick
    From Rigid Templates to Grammars: Object Detection with Structured Models
    Ph.D. dissertation, The University of Chicago, Apr. 2012
    pdf slides

Example detections

  


Detection results — PASCAL datasets

The models included with the source code were trained on the train+val dataset from each year and evaluated on the corresponding test dataset. 
This is exactly the protocol of the "comp3" competition. Below are the average precision scores we obtain in each category.

Table 1. PASCAL VOC 2010 comp3
  aero bicycle bird boat bottle bus car cat chair cow table dog horse mbike person plant sheep sofa train tv mean
without context 45.6 49.0 11.0 11.6 27.2 50.5 43.1 23.6 17.2 23.2 10.7 20.5 42.5 44.5 41.3 8.7 29.0 18.7 40.0 34.5 29.6
with context 48.2 52.2 14.8 13.8 28.7 53.2 44.9 26.0 18.4 24.4 13.7 23.1 45.8 50.5 43.7 9.8 31.1 21.5 44.4 35.7 32.2
with context &
extra octave
49.2 53.8 13.1 15.3 35.5 53.4 49.7 27.0 17.2 28.8 14.7 17.8 46.4 51.2 47.7 10.8 34.2 20.7 43.8 38.3 33.4
person detection
grammar
                            49.9            
Table 2. PASCAL VOC 2007 comp3
  aero bicycle bird boat bottle bus car cat chair cow table dog horse mbike person plant sheep sofa train tv mean
without context 33.2 60.3 10.2 16.1 27.3 54.3 58.2 23.0 20.0 24.1 26.7 12.7 58.1 48.2 43.2 12.0 21.1 36.1 46.0 43.5 33.7
with context 36.6 62.2 12.1 17.6 28.7 54.6 60.4 25.5 21.1 25.6 26.6 14.6 60.9 50.7 44.7 14.3 21.5 38.2 49.3 43.6 35.4
person detection
grammar
                            48.7            

Detection Results — INRIA Person

We also trained and tested a model on the INRIA Person dataset.
We scored the model using the PASCAL evaluation methodology in the complete test dataset, including images without people.

Annotations for the INRIA dataset in the PASCAL VOC format are available: 
   INRIA person training README    INRIA person annotations 

INRIA Person average precision: 88.0

Plot of Recall / False positives per image (FPPI) 


Previous Releases

voc-release4
voc-release3
voc-release2
voc-release1

Back to rbg's homepage


from: http://www.cs.berkeley.edu/~rbg/latent/index.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值