UCB Visual Object and Activity Recognition, Class CS 294-43

Prof. Trevor Darrell, trevor@eecs.berkeley.edu, Spring 2011


This course will cover computer vision techniques for object and category recognition, as well as recognition of human activity from video streams.  Recognition of individual objects or activities (the coffee cup on your desk, a particular chair in your office, a video of you riding your bike) or generic categories (any cup, chair, or cycling event) is an essential capability for a variety of robotics and multimedia applications.  The advent of standardized datasets and evaluation regimes has spurred considerable innovation in this arena, with performance on benchmark evaluations increasing dramatically in recent years.  This course will review methods that have achieved success on such datasets, and will also consider the techniques needed for real-time interactive application on robots or mobile devices, e.g. domestic service robots or mobile phones that can retrieve information about objects in the environment based on visual observation.  This class will be based exclusively on readings from the recent literature, including those appearing at the CVPR, ICCV, and NIPS conferences.


The format of the course this year will be primarily discussion-based, with each class beginning with a short overview of the topic by the instructor, followed by detailed student-led presentations and structured critique of selected papers.  All students will be expected to actively discuss each paper each week.  Class size will be limited to those who have preregistered, or to 16 students, whichever is greater, to foster an environment conducive to discussion.


Each week will focus on a different subtopic of object and activity recognition, covering three to five different papers from the recent literature.  These papers will be presented jointly by two or three students, one acting as a primary presenter and the other student(s) as discussant.  Each student will be expected to act as presenter once and as discussant once during the term.  The presenting students will choose the papers from the list suggested for that subtopic, or they are welcome to suggest other papers. 


Students are expected to be involved in a related research project during the term and to be experimenting with a technique covered in the course.  (Graduate students who are not actively involved in a research project outside of the course can work on a class project specific to this course or joint with another course; undergraduates who are not actively involved in a related research project are not allowed in the course.)  Students will be expected to present their research progress in a ten-minute presentation in the last class.  Grades will be based entirely on in-class presentations and participation.


This course will meet once a week, Fridays from 10am to 12 noon, in the 7th floor conference room (Newton room) of Sutardja Dai Hall.


The first class will be January 28th.  The introductory class, which would have been scheduled for January 21st, will happen virtually; please contact the instructor if you are not already on the email list.


Prerequisites: prior Computer Vision and Machine Learning courses, or permission of instructor. Advanced undergraduates allowed only with permission of instructor and if they are actively participating in a related research project.  Students should already be familiar with or be willing to learn on their own: basic image processing in MATLAB; Optic Flow; Edge Detection; Support Vector Machines;  Gaussian Mixture Models;  Hidden Markov Models, etc.; students must be able to read and understand at a basic level recent conference papers in the computer vision literature.

DRAFT Syllabus (class members, please see the Google site for the most up-to-date version):

January 28, 2011   Global Features    

Background readings:

1.         A. Oliva and A. Torralba, "Modeling the shape of the scene: A holistic representation of the spatial envelope," International Journal of Computer Vision, vol. 42, no. 3, pp. 145-175, May 2001. http://dx.doi.org/10.1023/A:1011139631724

2.         A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," ICCV 2003, pp. 726-733 vol.2. http://dx.doi.org/10.1109/ICCV.2003.1238420

3.         N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in CVPR '05: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005, pp. 886-893.


Contemporary readings:

4.         P. F. Felzenszwalb, R. B. Girshick, and D. McAllester, "Cascade Object Detection with Deformable Part Models", CVPR 2010.


5.         T. Deselaers and V. Ferrari, "Global and efficient self-similarity for object classification and detection", CVPR 2010.


February 4, 2011   Local Features

Background readings:

6.         D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, November 2004.


7.         T. Lindeberg, "Feature detection with automatic scale selection," International Journal of Computer Vision, vol. 30, no. 2, pp. 79-116, November 1998. http://dx.doi.org/10.1023/A:1008045108935

8.         J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide baseline stereo from maximally stable extremal regions," in Proceedings of British Machine Vision Conference, vol. 1, London, 2002, pp. 384-393. http://citeseer.ist.psu.edu/608213.html

9.         K. Mikolajczyk and C. Schmid, "Scale & affine invariant interest point detectors," Int. J. Comput. Vision, vol. 60, no. 1, pp. 63-86, October 2004.


10.     I. Laptev, "On space-time interest points," International Journal of Computer Vision, vol. 64, no. 2-3, pp. 107-123, September 2005. http://dx.doi.org/10.1007/s11263-005-1838-7

Contemporary readings:

11.     L. Bo, X. Ren, and D. Fox, "Kernel Descriptors for Visual Recognition", NIPS 2010, http://books.nips.cc/papers/files/nips23/NIPS2010_0821.pdf

12.     L. Bourdev, S. Maji, T. Brox, and J. Malik, "Detecting People Using Mutually Consistent Poselet Activations", ECCV 2010.


February 11, 2011          Bag-of-Words and Correspondence Kernels

Background readings:

13.     C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka, "Visual categorization with bags of keypoints," in ECCV International Workshop on Statistical Learning in Computer Vision, 2004. http://www.xrce.xerox.com/Publications/Attachments/2004%2D010/2004_010.pdf

14.     K. Grauman and T. Darrell, "The pyramid match kernel: discriminative classification with sets of image features," ICCV, vol. 2, 2005, pp. 1458-1465 Vol. 2. http://dx.doi.org/10.1109/ICCV.2005.239

15.     S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," CVPR, vol. 2, 2006, pp. 2169-2178. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1641019

Contemporary readings:

16.     S. Maji and A. C. Berg, "Max-margin additive classifiers for detection", ICCV 2009, http://dx.doi.org/10.1109/ICCV.2009.5459203

17.     A. Vedaldi and A. Zisserman, "Efficient Additive Kernels via Explicit Feature Maps", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5539949

18.     A. Kovashka and K. Grauman, "Learning a hierarchy of discriminative space-time neighborhood features for human action recognition", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5539881

February 18, 2011          Segmentation and Region Proposals

Background readings:

19.     J. Shotton, M. Johnson, and R. Cipolla, "Semantic texton forests for image categorization and segmentation," in Computer Vision and Pattern Recognition, 2008. CVPR 2008. http://dx.doi.org/10.1109/CVPR.2008.4587503

Contemporary readings:

20.     Y. Yang, S. Hallman, D. Ramanan, and C. Fowlkes, "Layered Object Detection for Multi-Class Segmentation", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5540070

21.     F. Li, J. Carreira and C. Sminchisescu, "Object Recognition as Ranking Holistic Figure-Ground Hypotheses", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5539839

22.     B. Alexe, T. Deselaers, V. Ferrari, "What is an object?", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5540226

23.     B. Packer, S. Gould, and D. Koller, "A Unified Contour-Pixel Model for Figure-Ground Segmentation", ECCV 2010, http://dx.doi.org/10.1007/978-3-642-15555-0_25

24.     I. Endres and D. Hoiem, "Category Independent Object Proposals", ECCV 2010, http://dx.doi.org/10.1007/978-3-642-15555-0_42

March 4, 2011        Descriptor Sparse Coding and Topic Models                 

Background reading:

25.     Olshausen B. and Field D. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research (1997) vol. 37 (23) pp. 3311-3325 http://www.chaos.gwdg.de/~michael/CNS_course_2004/papers_max/OlshausenField1997.pdf

Contemporary readings:

26.     Raina et al. Self-taught learning: Transfer learning from unlabeled data. ICML (2007). http://dx.doi.org/10.1145/1273496.1273592

27.     Fritz M., Black M., Bradski G., Karayev S., Darrell T. An Additive Latent Feature Model for Transparent Object Recognition. NIPS (2009) http://books.nips.cc/papers/files/nips22/NIPS2009_0397.pdf

28.     Wang et al. Locality-constrained Linear Coding for Image Classification. CVPR (2010) http://dx.doi.org/10.1109/CVPR.2010.5540018

March 11, 2011      Hashing and Metric Learning       

Background readings:

29.     G. Shakhnarovich, P. Viola, and T. Darrell, "Fast pose estimation with parameter-sensitive hashing," ICCV 2003, http://dx.doi.org/10.1109/ICCV.2003.1238424

30.     A. Frome, Y. Singer, F. Sha, and J. Malik, "Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification", ICCV 2007, http://dx.doi.org/10.1109/ICCV.2007.4408839

Contemporary readings:

31.     P. Jain, B. Kulis, and K. Grauman, Fast Similarity Search for Learned Metrics, CVPR 2008/PAMI 2009, http://doi.ieeecomputersociety.org/10.1109/TPAMI.2009.151

32.     B. Kulis and T. Darrell, "Learning to Hash with Binary Reconstructive Embeddings", NIPS 2009, http://books.nips.cc/papers/files/nips22/NIPS2009_0971.pdf

March 18, 2011       Temporal Models          

Background readings:

33.     J. Niebles, H. Wang, and L. Fei-Fei, "Unsupervised learning of human action categories using spatial-temporal words," International Journal of Computer Vision. 79(3): 299-318. 2008 Available: http://dx.doi.org/10.1007/s11263-007-0122-4

Contemporary readings:

34.     K. Prabhakar, S. Oh, P. Wang, G. D. Abowd, J Rehg, "Temporal Causality for the Analysis of Visual Events", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5539871

35.     A. Yao, J. Gall, L. Van Gool, "A Hough Transform-Based Voting Framework for Action Recognition", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5539883

36.     J.C. Niebles, C. Chen, and L. Fei-Fei, "Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification", ECCV 2010, http://dx.doi.org/10.1007/978-3-642-15552-9_29

37.     D. Weinland, M. Ozuysal and P. Fua, "Making Action Recognition Robust to Occlusions and Viewpoint Changes", ECCV 2010, http://dx.doi.org/10.1007/978-3-642-15558-1_46

38.     P. Matikainen, M. Hebert and R. Sukthankar, "Representing Pairwise Spatial and Temporal Relations for Action Recognition", ECCV 2010, http://dx.doi.org/10.1007/978-3-642-15549-9_37

39.     T. Lan, Y. Wang, W. Yang and G. Mori, "Beyond Actions: Discriminative Models for Contextual Group Activities", NIPS 2010, http://books.nips.cc/papers/files/nips23/NIPS2010_0115.pdf

April 1, 2011         Image and Text Models

Background readings:

40.     K. Barnard and D. Forsyth, "Learning the Semantics of Words and Pictures," International Conference on Computer Vision, vol 2, pp. 408-415, 2001, http://doi.ieeecomputersociety.org/10.1109/ICCV.2001.937654

41.     D. Blei and M. Jordan, "Modeling Annotated Data", SIGIR '03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, http://dx.doi.org/10.1145/860435.860460

42.     T. Berg and D. Forsyth, "Animals on the Web", CVPR 2006, http://dx.doi.org/10.1109/CVPR.2006.57

Contemporary readings:

43.     Chong Wang, D. Blei, Fei-Fei Li, "Simultaneous image classification and annotation," CVPR 2009, http://doi.ieeecomputersociety.org/10.1109/CVPRW.2009.5206800

44.     K. Saenko and T. Darrell, "Filtering Abstract Senses From Image Search Results", NIPS 2009, http://books.nips.cc/papers/files/nips22/NIPS2009_1143.pdf

45.     A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier and D. Forsyth, "Every Picture Tells a Story: Generating Sentences from Images", ECCV 2010, http://dx.doi.org/10.1007/978-3-642-15561-1_2

46.     B. Siddiquie and A. Gupta, "Beyond Active Noun Tagging: Modeling Contextual Interactions for Multi-Class Active Learning", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5540044

April 8, 2011         Crowdsourcing and Active Learning

Background readings:

47.     L. von Ahn and L. Dabbish, "Labeling images with a computer game", SIGCHI 2004, http://dx.doi.org/10.1145/985692.985733

48.     A. Kapoor, K. Grauman, R. Urtasun, and T. Darrell, "Active Learning with Gaussian Processes for Object Categorization" ICCV 2007. http://doi.ieeecomputersociety.org/10.1109/ICCV.2007.4408844

Contemporary readings:

49.     J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. "ImageNet: A Large-Scale Hierarchical Image Database". In CVPR, 2009. http://doi.ieeecomputersociety.org/10.1109/CVPRW.2009.5206848

50.     S. Vijayanarasimhan, P. Jain, K. Grauman, "Far-sighted active learning on a budget for image and video recognition", CVPR 2010. http://dx.doi.org/10.1109/CVPR.2010.5540055

51.     P. Welinder, S. Branson, S. Belongie, P. Perona, "The Multidimensional Wisdom of Crowds", NIPS 2010. http://books.nips.cc/papers/files/nips23/NIPS2010_0577.pdf

52.     S. Branson, C. Wah, B. Babenko, F. Schroff, P. Welinder, P. Perona, S. Belongie, "Visual Recognition with Humans in the Loop", ECCV 2010. http://dx.doi.org/10.1007/978-3-642-15561-1_32   

April 15, 2011         Scene and Image Context      

Background readings:

53.     A. Torralba, K. P. Murphy, and W. T. Freeman, "Contextual models for object detection using boosted random fields," in Advances in Neural Information Processing Systems 17 (NIPS), 2005, pp. 1401-1408. http://dspace.mit.edu/handle/1721.1/6740

54.     D. Hoiem, A. A. Efros, and M. Hebert, "Putting objects in perspective," in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, vol. 2, 2006, pp. 2137-2144. http://dx.doi.org/10.1109/CVPR.2006.232

55.     L.-J. Li and L. Fei-Fei, "What, where and who? classifying events by scene and object recognition," in Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, 2007, pp. 1-8. http://dx.doi.org/10.1109/ICCV.2007.4408872

Contemporary readings:

56.     S. Bao, M. Sun, S. Savarese, "Toward coherent object detection and scene layout understanding", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5540229

57.     B. Yao and L. Fei-Fei. "Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities.", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5540235

58.     A. Gupta, A. Efros and M. Hebert, "Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics". ECCV 2010, http://dx.doi.org/10.1007/978-3-642-15561-1_35 

April 22, 2011         Taxonomies and Sub-category Recognition                 

Background readings:

59.     A. Zweig and D. Weinshall, "Exploiting object hierarchy: Combining models from different category levels," in Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, 2007, pp. 1-8. Available: http://dx.doi.org/10.1109/ICCV.2007.4409064

60.     G. Griffin and P. Perona, "Learning and using taxonomies for fast visual categorization," in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, 2008, pp. 1-8. Available: http://dx.doi.org/10.1109/CVPR.2008.4587410

61.     J. Sivic, B. C. Russell, A. Zisserman, W. T. Freeman, and A. A. Efros, "Unsupervised discovery of visual object class hierarchies," in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, 2008, pp. 1-8. Available: http://dx.doi.org/10.1109/CVPR.2008.4587622

Contemporary readings:

62.     L.-J. Li, C. Wang, Y. Lim, D. Blei and L. Fei-Fei. "Building and Using a Semantivisual Image Hierarchy", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5540027

63.     M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych, and B. Schiele, "What helps where – and why? Semantic relatedness for knowledge transfer", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5540121

April 29, 2011         Domain Adaptation

64.     K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting Visual Category Models to New Domains", ECCV 2010, http://dx.doi.org/10.1007/978-3-642-15561-1_16

65.     A. Bergamo and L. Torresani, "Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach", NIPS 2010, http://books.nips.cc/papers/files/nips23/NIPS2010_0093.pdf

66.     L. Cao, Z. Liu, T. Huang, "Cross-dataset action detection", CVPR 2010, http://dx.doi.org/10.1109/CVPR.2010.5539875  



