Object Recognition, Computer Vision, and the Caltech 101: A Response to Pinto et al.

Reposted 2011-06-12 10:59:00

Yann LeCun, David G. Lowe, Jitendra Malik, Jim Mutch, Pietro Perona, and Tomaso Poggio

Readers of the recent paper “Why is Real-World Visual Object Recognition Hard?” [8] who are unfamiliar with the literature on computer vision are likely to come away with the impression that the problem of making visual recognition invariant with respect to position, scale, and pose has been overlooked. We would therefore like to clarify two main points.

(1) The paper criticizes the popular Caltech 101 benchmark dataset for not containing images of objects at a variety of positions, scales, and poses. It is true that Caltech 101 does not test these kinds of variability; however, this omission is intentional. Techniques for addressing these issues were the focus of much work in the 1980s [11]. For example, datasets like that of Murase and Nayar [6] focused on the problem of recognizing specific objects from a variety of 3D poses, but did not address the issue of object categories and the attendant intra-category variation in shape and texture. Pinto et al.’s synthetic dataset is in much the same spirit as Murase and Nayar’s. Caltech 101 was created to test a system [3,4] that was already position, scale, and pose invariant, with the goal of focusing on the more difficult problem of categorization. Its lack of position, scale, and pose variation is stated explicitly on the Caltech 101 website [2], where the dataset is available for download, and is often restated in later papers that use the dataset (including three of the five cited in the paper’s Fig. 1). This is not to say that Caltech 101 is without problems. For example, as the authors state, the correlation between object classes and backgrounds is a concern, and the relative success of their “toy” model does seem to suggest that the baseline for what counts as good performance on this dataset should be raised.

(2) The paper mentions the existence of other standard datasets (LabelMe [10], Peekaboom [12], StreetScenes [1], NORB [5], PASCAL [7]), many of which contain other forms of variability such as position, scale, and pose variation, occlusion, and multiple objects. But the authors do not mention that, unlike their “toy” model, most of the computer vision and bio-inspired algorithms they cite also address some of these issues and have in fact been tested on more than one dataset. Thus, many of these algorithms should be capable of dealing fairly well with the “difficult” task of the paper’s Fig. 2, on which the authors’ algorithm, unsurprisingly, fails. Caltech 101 is one of the most popular datasets currently in use, but it is by no means the sole standard of success on the object recognition problem. See [9] for a recent review of current datasets and the types of variability contained in each.

In conclusion, researchers in computer vision are well aware of the need for invariance to position, scale, and pose, among other challenges in visual recognition. We wish to reassure PLoS readers that research on these topics is alive and well.


[1] Bileschi S (2006) StreetScenes: Towards scene understanding in still images. [Ph.D. Thesis]. Cambridge (Massachusetts): MIT EECS.

[2] Caltech 101 dataset (accessed 2008-02-17) Available: http://www.vision.caltech....

[3] Fei-Fei L, Fergus R, and Perona P (2004) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. IEEE CVPR 2004, Workshop on Generative-Model Based Vision.

[4] Fergus R, Perona P, and Zisserman A (2003) Object class recognition by unsupervised scale-invariant learning. IEEE CVPR 2003: 264–271.

[5] LeCun Y, Huang FJ, and Bottou L (2004) Learning methods for generic object recognition with invariance to pose and lighting. IEEE CVPR 2004: 97–104.

[6] Murase H and Nayar SK (1995) Visual learning and recognition of 3-D objects from appearance. International Journal of Computer Vision 14: 5–24.

[7] PASCAL Object Recognition Database Collection, Visual Object Classes Challenge (accessed 2007-12-26) Available: http://www.pascal-network....

[8] Pinto N, Cox DD, and DiCarlo JJ (2008) Why is Real-World Visual Object Recognition Hard? PLoS Computational Biology, 4(1):e27.

[9] Ponce J, Berg TL, Everingham MR, Forsyth DA, Hebert M, Lazebnik S, Marszalek M, Schmid C, Russell BC, Torralba A, Williams CKI, Zhang J, and Zisserman A (2006) Dataset issues in object recognition. In: Ponce J, Hebert M, Schmid C, and Zisserman A, editors. Toward Category-Level Object Recognition. LNCS 4170. Springer-Verlag. pp. 29–48.

[10] Russell B, Torralba A, Murphy K, and Freeman WT (2005) LabelMe: a database and web-based tool for image annotation. Cambridge (Massachusetts): MIT Artificial Intelligence Lab Memo AIM-2005-025.

[11] Ullman S (1996) High-Level Vision: Object Recognition and Visual Cognition. MIT Press.

[12] Von Ahn L, Liu R, and Blum M (2006) Peekaboom: a game for locating objects in images. ACM SIGCHI 2006: 55–64.
