something about pedestrian detection

some related papers:

[1] Piotr Dollár, Christian Wojek, Bernt Schiele, and Pietro Perona. Pedestrian Detection: An Evaluation of the State of the Art. TPAMI, 34(4), 2012

[2] Mohamed Hussein, Fatih Porikli, and Larry Davis. A Comprehensive Evaluation Framework and a Comparative Study for Human Detectors. TITS, 10(3), 2009

[3] Markus Enzweiler and Dariu M. Gavrila. Monocular Pedestrian Detection: Survey and Experiments. TPAMI, 31(12), 2009

[4] David Gerónimo, Antonio M. López, Angel D. Sappa, and Thorsten Graf. Survey of Pedestrian Detection for Advanced Driver Assistance Systems. TPAMI, 32(7), 2010

[5] Tarak Gandhi and Mohan Manubhai Trivedi. Pedestrian Protection Systems: Issues, Survey, and Challenges. TITS, 8(3), 2007


    The performance of pedestrian detection still has much room for improvement, especially at low resolutions and for partially occluded pedestrians.
    The descriptors used to represent features and the training window size matter more for detector performance than the imaged electromagnetic band (visible vs. NIR). The choice between resizing images and resizing features also has a significant impact on performance, depending on the descriptor.
    HOG/linSVM has a clear advantage at higher image resolutions and lower processing speeds (e.g., independent of the number of support vectors), while the wavelet-based AdaBoost cascade is superior at lower image resolutions and (near) real-time processing speeds. This paper also contributes a pedestrian data set captured while driving in urban traffic.
    For low-resolution pedestrian images (18×36 pixels), dense Haar wavelet features are the most viable option; HOG features perform best at intermediate resolutions (48×96 pixels).

Sensors
   It is a marvel that the human visual system can process vast amounts of scene data and extract, in real time, the information that enables driving. Video sensors are therefore a natural choice for intelligent driver support systems. Cameras are the most widely used sensors, owing to the high potential of visual features, high spatial resolution, and rich texture and color cues; however, image analysis is far from simple, and VS cameras can be blinded by glaring light sources. Visible spectrum (VS): 0.4-0.75 um, typically for daytime.
   Thermal infrared (TIR) sensors are very effective for detecting pedestrians at night, but they are quite expensive, have low resolution, are harder to integrate (they cannot see through windshields), and have yet to produce convincing results. They are also less effective in hot daytime conditions, when there is little temperature difference between pedestrians and the background, and they can be misled by other hot objects, changing weather conditions (i.e., relative temperature changes), time of year/season, etc. TIR: 6-14 um, captures relative temperature, suitable for daytime and nighttime; sometimes called night vision.
   Near infrared (NIR) sensors, paired with an illuminator, are useful for night vision and are less expensive than TIR sensors. They also have higher resolution and produce images that resemble visible-light images, which suits standard image processing techniques. NIR: 0.4-1.4 um, works across the VS+NIR spectrum, suitable for daytime and nighttime.
   VS, NIR, and TIR are passive sensors, while radars and laser scanners, as time-of-flight sensors, are active. Active sensors directly provide accurate depth information and are in general convenient for detecting objects and giving superior range estimates out to larger distances than passive ones. However, they have low angular resolution, and radar becomes unreliable at 10-15 m in real scenes due to reflections from other objects (humans have low reflectance). Laser scanners, working with infrared beams, can detect pedestrians while providing accurate distance estimates, but they are very expensive and, unlike radar, are affected by adverse weather conditions just as cameras are.
   Fusion of multiple sensors is used in many systems to obtain complementary information and improve overall system performance. An important issue in multisensor fusion is the registration of the images from the different sensors.

Data sets

  1. ETH: A. Ess, B. Leibe, and L. Van Gool, “Depth and Appearance for Mobile Scene Analysis,” Proc. IEEE Int’l Conf. Computer Vision, 2007.
  2. TUD-Brussels: C. Wojek, S. Walk, and B. Schiele, “Multi-Cue Onboard Pedestrian Detection,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
  3. Daimler-DB: M. Enzweiler and D.M. Gavrila, “Monocular Pedestrian Detection: Survey and Experiments,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 12, pp. 2179-2195, Dec. 2009.
  4. INRIA: N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005. http://lear.inrialpes.fr/data;
  5. Caltech pedestrian data set: P. Dollár, C. Wojek, B. Schiele, and P. Perona, “Pedestrian Detection: A Benchmark,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.
  6. the DaimlerChrysler Pedestrian Classification Benchmark, samples of 18×36 pixels. http://www.science.uva.nl/research/isla/downloads/pedestrians;
  7. the Computer Vision Center Pedestrian Database, samples from 140×280 down to 12×24 pixels. http://www.cvc.uab.es/adas/databases;
  8. MIT pedestrian data set, which is outdated. http://cbcl.mit.edu/software-datasets;
  9. USC pedestrian detection test set, divided into front/rear full view, front/rear partial interhuman occlusions, and front/rear/side viewed pedestrians. http://iris.usc.edu/Vision-Users/OldUsers/bowu/DatasetWebpage/dataset.html

   The resource website: www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/

Applications:
   robotics, entertainment, surveillance, care for the elderly and disabled, content-based indexing, and pedestrian protection in traffic.

Challenges:
   Pedestrian appearance shows very high variability due to pose changes, different clothes, carried objects, and different heights; backgrounds are cluttered in outdoor urban scenarios and on highways, under a wide range of illumination and weather conditions (e.g., shadows and poor contrast in the visible spectrum); pedestrians occlude each other and are occluded by the environment; scenes are dynamic because both pedestrians and the camera move; pedestrians appear from multiple viewing angles; and the required performance (e.g., false alarm rate and miss rate) and processing speed are demanding.

Evaluation Methodology
   The evaluation criteria, on identical test data, include sensitivity, the number of FPs per unit time or per frame, processing speed, localization tolerance, coverage area, etc. (PS: an ROC curve properly plots sensitivity vs. 1-specificity, where specificity = TN/(TN+FP); in practice, ROC curves for detection usually plot sensitivity vs. the average number of FPs per frame.)
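The definitions in the parenthesis as a tiny worked example (all counts here are hypothetical, purely for illustration):

```python
# basic rates from confusion counts
TP, FP, TN, FN = 80, 5, 900, 20
sensitivity = TP / float(TP + FN)   # a.k.a. recall / detection rate
specificity = TN / float(TN + FP)   # true negative rate
miss_rate   = 1.0 - sensitivity
n_frames    = 100                   # hypothetical number of test frames
fppi        = FP / float(n_frames)  # average false positives per image
print(sensitivity, specificity, miss_rate, fppi)
```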
    1) Full image evaluation: A detection system takes an image and returns a BB and a score or confidence for each detection, performing multiscale detection and non-maximal suppression (NMS). The match criterion is the well-known PASCAL measure (proposed in 2010): a detected BB and a ground truth BB match if their overlap a0 = area(BBdt ∩ BBgt) / area(BBdt ∪ BBgt) exceeds 0.5.
    Each detected BB and each ground truth BB may be matched at most once. Detections with the highest confidence are matched first; if a detected BB matches multiple ground truth BBs, the match with the highest overlap is used, which may be suboptimal in crowded scenes. Unmatched detected BBs count as false positives and unmatched ground truth BBs as false negatives.
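A minimal sketch of this greedy matching under the PASCAL criterion (the helper names are mine, not from any benchmark toolbox; boxes are assumed to be (x1, y1, x2, y2) tuples):

```python
import numpy as np

def iou(a, b):
    """PASCAL overlap: area of intersection over area of union."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)

def greedy_match(dets, scores, gts, thr=0.5):
    """Match detections to ground truth, highest confidence first.
    Each BB is matched at most once; a detection overlapping several
    ground truths takes the one with the highest overlap."""
    tp = fp = 0
    matched = set()
    for i in np.argsort(scores)[::-1]:           # best score first
        best_j, best_ov = -1, thr
        for j, g in enumerate(gts):
            if j not in matched and iou(dets[i], g) > best_ov:
                best_j, best_ov = j, iou(dets[i], g)
        if best_j >= 0:
            matched.add(best_j)
            tp += 1                               # matched detection
        else:
            fp += 1                               # unmatched detection
    fn = len(gts) - len(matched)                  # unmatched ground truth
    return tp, fp, fn
```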
  ROC curves (miss rate vs. false positives per image (FPPI), drawn as log-log (DET) plots) are obtained by varying the threshold on detection confidence, so that the same range of false alarm rates is covered. The log-average miss rate summarizes detector performance: it averages the miss rate at nine FPPI rates evenly spaced in log space over the range 10^-2 to 10^0, giving a stable and informative assessment of performance.
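A sketch of the log-average miss rate from a sampled curve, reusing numpy as np from the snippet above (fppi is assumed sorted ascending with miss_rate aligned to it; how to read the curve at FPPI values it never reaches is my own convention):

```python
def log_average_miss_rate(fppi, miss_rate):
    """Average miss rate at nine FPPI values evenly spaced in log
    space between 10^-2 and 10^0, averaged in the log domain."""
    fppi, miss_rate = np.asarray(fppi), np.asarray(miss_rate)
    mrs = []
    for ref in np.logspace(-2, 0, 9):
        # curve value at the largest FPPI not exceeding ref;
        # fall back to the first point if the curve starts above ref
        idx = np.searchsorted(fppi, ref, side="right") - 1
        mrs.append(miss_rate[max(idx, 0)])
    return float(np.exp(np.mean(np.log(np.maximum(mrs, 1e-10)))))
```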
  2) Confidence intervals: with multiple test runs there are multiple ROC curves, and for each point at a given false alarm rate there is a range of miss rates, so confidence intervals on the miss rate can be evaluated. (I take the confidence intervals to be the variance of the miss rate.) The Average Log Miss Rate (ALMR) is introduced to evaluate these confidence intervals via a 10-fold cross validation: with 5 data sets, 3 for training and 2 for testing, there are C(5,3) = 10 experiments. The average ROC curve can also be plotted. Based on ALMR, curves are compared across all experiments using a box plot of the mean, confidence interval, and range of the ALMR scores, with one box per experiment's 10-fold cross validation.
  3) ALMR: ALMR = -(1/n) * sum_{i=1..n} log(mr_i + sigma), where n is the number of points on the ROC curve and mr_i is the miss rate of point i. sigma is a small regularization constant and is not significant when comparing curves. The higher the ALMR, the lower the miss rate over the curve on average and the better the performance. ALMR is related to the geometric mean of the miss rate values and is also proportional to the area under the curve in the log-log domain.
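With the formula written out, ALMR is a one-liner (again reusing np; the negated sign is my reading of "higher ALMR = lower miss rate"):

```python
def almr(miss_rates, sigma=1e-4):
    """Average Log Miss Rate: the negated mean log miss rate, i.e. the
    negated log of the (regularized) geometric mean. Higher is better."""
    mr = np.asarray(miss_rates, dtype=float)
    return float(-np.mean(np.log(mr + sigma)))
```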


  4) Filtering ground truth of different types: Sometimes portions of a data set should be excluded, such as pedestrians under 20 pixels high or truncated by image boundaries, regions labeled “Person?”, or regions labeled “People”. This paper introduces the notion of ignore regions. Ground truth BBs selected to be ignored, denoted BBig, need not be matched; however, matches to them are not considered mistakes either, i.e., a detected BB matched to a BBig does not count as a true positive and an unmatched BBig does not count as a false negative. A detected BB can only match a BBig if it does not match any ordinary ground truth BB, and multiple matches to a single BBig are allowed. The match criterion between a detected BB and a BBig is area(BBdt ∩ BBig) / area(BBdt) > Th, where Th is usually set to 0.5; this threshold has a significant influence on the required localization accuracy.
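This asymmetric criterion is easy to confuse with the PASCAL one, so a sketch: note the denominator is the detection's own area, so a small detection inside a large ignore region still matches.

```python
def matches_ignore(det, bb_ig, th=0.5):
    """area(det ∩ bb_ig) / area(det) > th, boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(det[0], bb_ig[0]), max(det[1], bb_ig[1])
    ix2, iy2 = min(det[2], bb_ig[2]), min(det[3], bb_ig[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    return inter / float((det[2] - det[0]) * (det[3] - det[1])) > th
```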
  5) Filtering detections at different scales: When evaluating performance over a fixed scale range, detections far outside the range under consideration should not influence the evaluation. Three possible filtering strategies: strict filtering, where all detections outside the selected range are removed prior to matching, which produces extra false negatives and underreports performance; postfiltering, where detections outside the selected range are allowed to match ground truth BBs inside the range and, after matching, any unmatched detected BB outside the range is removed and does not count as a false positive, which overreports performance; and expanded filtering, a good compromise between the two, where all detections outside an expanded evaluation range are removed prior to evaluation: if the range is from S0 to S1, the expanded range is from S0/r to S1*r, where the ratio r is often set to 1.25. This paper argues that expanded filtering reflects true performance most accurately.
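Expanded filtering in code form (a sketch; using pixel height as the scale measure is my assumption):

```python
def expanded_filter(dets, s0, s1, r=1.25, height=lambda d: d[3] - d[1]):
    """Keep only detections whose height lies in the expanded range
    [s0/r, s1*r]; matching/evaluation itself still targets [s0, s1]."""
    return [d for d in dets if s0 / r <= height(d) <= s1 * r]
```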
  6) Per-window (i.e., cropped window) vs. whole image: PW evaluation is commonly used to compare classifiers (as opposed to detectors) or to evaluate systems that perform automatic ROI generation, but not all detectors are based on classifiers. Moreover, this paper finds that in practice PW and full-image performance are only weakly correlated: better PW performance does not imply better detection performance. The reasons include the choices of spatial and scale stride and of NMS, and the fact that the tested windows are typically not the same. Full-image metrics provide a natural measure of the error of an overall detection system and should be the standard for general object detection instead of cropped-window metrics. For whole images, evaluation comes in two variants, resizing images or resizing features: a multisize image pyramid with a fixed scanning window size, or a single image size with different scanning window sizes.
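The "resizing images" variant, sketched: a fixed window scans a pyramid of progressively downscaled images, so larger pedestrians are caught at smaller scales (the window size and scale step below are typical values, not taken from the paper):

```python
def pyramid_scales(img_h, img_w, win_h=128, win_w=64, step=1.2):
    """Scales at which to resize the image so that a fixed
    win_h x win_w window covers ever larger pedestrians."""
    scales, s = [], 1.0
    while img_h * s >= win_h and img_w * s >= win_w:
        scales.append(s)
        s /= step
    return scales
```

The "resizing features" alternative avoids recomputing features at every pyramid level; FPDW ([2] in the detector list below) builds on approximating features across nearby scales.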
  7) The evaluation of n detectors on m data sets is analyzed using the nonparametric Friedman test along with Shaffer's post-hoc analysis. The n detectors are ranked on each data set by log-average miss rate, which yields a total of m rankings of the n detectors; the post-hoc pairwise comparisons form an n×n matrix. Detectors are ordered by improving mean rank (displayed in brackets).
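scipy ships the Friedman test (though not Shaffer's post-hoc); a sketch with made-up miss rates, which also shows one way to compute the mean ranks (cf. Problem 2 below):

```python
from scipy.stats import friedmanchisquare
import numpy as np

# log-average miss rates: rows = m data sets, columns = n detectors
# (numbers are made up purely for illustration)
lamr = np.array([[0.45, 0.51, 0.60],
                 [0.38, 0.42, 0.55],
                 [0.50, 0.49, 0.63],
                 [0.41, 0.47, 0.58]])

# one argument per detector: are rankings consistent across data sets?
stat, p = friedmanchisquare(*lamr.T)
print("Friedman chi2 = %.3f, p = %.4f" % (stat, p))

# rank 1 = lowest miss rate on a data set; average over data sets
ranks = lamr.argsort(axis=1).argsort(axis=1) + 1
print("mean ranks:", ranks.mean(axis=0))
```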
 



Detectors with best performance:

[1] S. Walk, N. Majer, K. Schindler, and B. Schiele, “New Features and Insights for Pedestrian Detection,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010
[2] P. Dollár, S. Belongie, and P. Perona, “The Fastest Pedestrian Detector in the West,” Proc. British Machine Vision Conf., 2010.
[3] P. Dollár, Z. Tu, P. Perona, and S. Belongie, “Integral Channel Features,” Proc. British Machine Vision Conf., 2009. (the earlier version of FPDW)
[4] A. Bar-Hillel, D. Levi, E. Krupka, and C. Goldberg, “Part-Based Feature Synthesis for Human Detection,” Proc. European Conf. Computer Vision, 2010.

HOG detector: Q. Zhu, S. Avidan, M.-C. Yeh, and K.-T. Cheng, “Fast human detection using a cascade of histograms of oriented gradients,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recog., New York, Jun. 2006, pp. 1491–1498.
COV detector: O. Tuzel, F. Porikli, and P. Meer, “Human detection via classification on Riemannian manifolds,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recog., 2007, pp. 1–8.
LogitBoost algorithm: J. Friedman, T. Hastie, and R. Tibshirani, “Additive logistic regression: A statistical view of boosting,” Ann. Stat., vol. 28, no. 2, pp. 337–407, 2000.
wavelet-based AdaBoost cascade: P. Viola, M. Jones, and D. Snow, “Detecting Pedestrians Using Patterns of Motion and Appearance,” Int’l J. Computer Vision, vol. 63, no. 2, pp. 153-161, 2005.
HOG/linSVM: N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 886-893, 2005.
NN/LRF: C. Wöhler and J. Anlauf, “An Adaptable Time-Delay Neural-Network Algorithm for Image Sequence Analysis,” IEEE Trans. Neural Networks, vol. 10, no. 6, pp. 1531-1536, Nov. 1999.
combined shape-texture detection: D.M. Gavrila and S. Munder, “Multi-Cue Pedestrian Detection and Tracking from a Moving Vehicle,” Int’l J. Computer Vision, vol. 73, no. 1, pp. 41-59, 2007.
PROTECTOR: D. Gavrila, J. Giebel, and S. Munder, “Vision-Based Pedestrian Detection: The PROTECTOR System,” Proc. IEEE Intelligent Vehicles Symp., pp. 13-18, 2004. &&  D. M. Gavrila and S.Munder, “Multi-cue pedestrian detection and tracking from a moving vehicle,” Int. J. Comput. Vis., vol. 73, no. 1, pp. 41– 59, Jun. 2007.
SAVE-U: M. M. Meinecke, M. A. Obojski, M. Töns, and M. Dehesa, “SAVE-U: First experiences with a pre-crash system for enhancing pedestrian safety,” in Proc. 5th Eur. Congr. Intell. Transp., Jun. 2005. && M. Töns, R. Doerfler, M.-M. Meinecke, and M. A. Obojski, “Radar sensors and sensor platform used for pedestrian protection in the EC-funded project SAVE-U,” in Proc. IEEE Intell. Veh. Symp., Jun. 2004, pp. 813–818.
Moving object detection with moving cameras: Y. Zhang, S. J. Kiselewich, W. A. Bauson, and R. Hammoud, “Robust moving object detection at distance in the visible spectrum and beyond using a moving camera,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog. Workshop, 2006, p. 131.
CONDENSATION (a variant of the particle filter): M. Isard and A. Blake, “Contour Tracking by Stochastic Propagation of Conditional Density,” Proc. European Conf. Computer Vision, pp. 343-356, 1996.

Problems

  1. What are the detection rate and effective range of face detection? Can face detection substitute for pedestrian detection?
  2. How are the detectors ordered by improving mean rank, i.e., how is the mean rank calculated? How is a critical difference diagram obtained at a given confidence level?
  3. How to train the cascade classifier?
  4. What is the LogitBoost algorithm for training a layer?
  5. How to compute COV (covariance) features?
  6. How to resize features?