Object Recognition, Computer Vision, and the Caltech 101: A Response to Pinto et al.


Yann LeCun, David G. Lowe, Jitendra Malik, Jim Mutch, Pietro Perona, and Tomaso Poggio


Readers of the recent paper “Why is Real-World Visual Object Recognition Hard?” [8] who are unfamiliar with the literature on computer vision are likely to come away with the impression that the problem of making visual recognition invariant with respect to position, scale, and pose has been overlooked. We would therefore like to clarify two main points.

(1) The paper criticizes the popular Caltech 101 benchmark dataset for not containing images of objects at a variety of positions, scales, and poses. It is true that Caltech 101 does not test these kinds of variability; however, this omission is intentional. Techniques for addressing these issues were the focus of much work in the 1980s [11]. For example, datasets like that of Murase and Nayar [6] focused on the problem of recognizing specific objects from a variety of 3D poses, but did not address the issue of object categories and the attendant intra-category variation in shape and texture. Pinto et al.’s synthetic dataset is in much the same spirit as Murase and Nayar’s. Caltech 101 was created to test a system [3,4] that was already position, scale, and pose invariant, with the goal of focusing on the more difficult problem of categorization. Its lack of position, scale, and pose variation is stated explicitly on the Caltech 101 website [2], where the dataset is available for download, and is often explicitly restated in later papers that use the dataset (including three of the five cited in Fig. 1 of [8]). This is not to say that Caltech 101 is without problems. For example, as the authors state, correlation of object classes and backgrounds is a concern, and the relative success of their “toy” model does seem to suggest that the baseline for what is considered good performance on this dataset should be raised.

(2) The paper mentions the existence of other standard datasets (LabelMe [10], Peekaboom [12], StreetScenes [1], NORB [5], PASCAL [7]), many of which contain other forms of variability such as position, scale, and pose variation, occlusion, and multiple objects. But the authors do not mention that, unlike their “toy” model, most of the computer vision and bio-inspired algorithms they cite also address some of these issues and have in fact been tested on more than one dataset. Many of these algorithms should therefore be capable of dealing fairly well with the “difficult” task of the paper’s Fig. 2, on which the authors’ algorithm, unsurprisingly, fails. Caltech 101 is one of the most popular datasets currently in use, but it is by no means the sole standard of success on the object recognition problem. See [9] for a recent review of current datasets and the types of variability contained in each.

In conclusion, researchers in computer vision are well aware of the need for invariance to position, scale, and pose, among other challenges in visual recognition. We wish to reassure PLoS readers that research on these topics is alive and well.


References

[1] Bileschi S (2006) StreetScenes: Towards scene understanding in still images. [Ph.D. Thesis]. Cambridge (Massachusetts): MIT EECS.

[2] Caltech 101 dataset (accessed 2008-02-17) Available: http://www.vision.caltech....

[3] Fei-Fei L, Fergus R, and Perona P (2004) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. IEEE CVPR 2004, Workshop on Generative-Model Based Vision.

[4] Fergus R, Perona P, and Zisserman A (2003) Object class recognition by unsupervised scale-invariant learning. IEEE CVPR 2003: 264–271.

[5] LeCun Y, Huang FJ, and Bottou L (2004) Learning methods for generic object recognition with invariance to pose and lighting. IEEE CVPR 2004: 97–104.

[6] Murase H and Nayar SK (1995) Visual learning and recognition of 3-D objects from appearance. International Journal of Computer Vision, Vol. 14, pp. 5-24.

[7] PASCAL Object Recognition Database Collection, Visual Object Classes Challenge (accessed 2007-12-26) Available: http://www.pascal-network....

[8] Pinto N, Cox DD, and DiCarlo JJ (2008) Why is Real-World Visual Object Recognition Hard? PLoS Computational Biology, 4(1):e27.

[9] Ponce J, Berg TL, Everingham MR, Forsyth DA, Hebert M, Lazebnik S, Marszalek M, Schmid C, Russell BC, Torralba A, Williams CKI, Zhang J, and Zisserman A (2006) Dataset Issues in Object Recognition. In Toward Category-Level Object Recognition, eds. Ponce J, Hebert M, Schmid C, and Zisserman A, LNCS 4170, Springer-Verlag, pp 29-48.

[10] Russell B, Torralba A, Murphy K, and Freeman WT (2005) LabelMe: a database and web-based tool for image annotation. Cambridge (Massachusetts): MIT Artificial Intelligence Lab Memo AIM-2005-025.

[11] Ullman S (1996) High-Level Vision: Object Recognition and Visual Cognition. MIT Press.

[12] Von Ahn L, Liu R, and Blum M (2006) Peekaboom: a game for locating objects in images. ACM SIGCHI 2006: 55–64.
