Deep Learning in Computer Vision

凌风探梅

In recent years, Deep Learning has become a dominant Machine Learning tool for a wide variety of domains. One of its biggest successes has been in Computer Vision, where performance on problems such as object and action recognition has improved dramatically. In this course, we will read up on various Computer Vision problems and the state-of-the-art techniques involving different neural architectures, and brainstorm about promising new directions.

Please sign up here at the beginning of class.

This class is a graduate seminar course in computer vision. The class will cover a diverse set of topics in Computer Vision and various Neural Network architectures. It will be an interactive course where we will discuss interesting topics on demand and the latest research buzz. The goal of the class is to learn about different domains of vision; to understand, identify and analyze the main challenges, what works and what doesn't; and to identify interesting new directions for future research.

Prerequisites: Courses in computer vision and/or machine learning (e.g., CSC320, CSC420, CSC411) are highly recommended (otherwise you will need some additional reading), and basic programming skills are required for projects.

Time and Location

Winter 2016

Day: Tuesday
Time: 9am-11am
Room: ES B149 (Earth Science Building at 5 Bancroft Avenue)

Instructor

Sanja Fidler

Email: fidler@cs dot toronto dot edu
Homepage: http://www.cs.toronto.edu/~fidler
Office hours: by appointment (send email)

When emailing me, please put CSC2523 in the subject line.

Forum

This class uses Piazza. On this webpage, we will post announcements and assignments. Students will also be able to post questions and discussions in a forum-style manner, either to their instructors or to their peers.

We will have an invited speaker for this course:

Raquel Urtasun
Assistant Professor, University of Toronto
Talk title: Deep Structured Models

as well as several invited lectures / tutorials:

Yuri Burda, Postdoctoral Fellow, University of Toronto: Lecture on Variational Autoencoders
Ryan Kiros, PhD student, University of Toronto: Lecture on Recurrent Neural Networks and Neural Language Models
Jimmy Ba, PhD student, University of Toronto: Lecture on Neural Programming
Yukun Zhu, MSc student, University of Toronto: Lecture on Convolutional Neural Networks
Elman Mansimov, Research Assistant, University of Toronto: Lecture on Image Generation with Neural Networks
Emilio Parisotto, MSc student, University of Toronto: Lecture on Deep Reinforcement Learning
Renjie Liao, PhD student, University of Toronto: Lecture on Highway and Residual Networks

Each student will need to write two paper reviews each week, present once or twice in class (depending on enrollment), participate in class discussions, and complete a project (done individually or in pairs).

Grading

The final grade will consist of the following:
`Participation` (attendance, participation in discussions, reviews)	15%
`Presentation` (presentation of papers in class)	25%
`Project` (proposal, final report)	60%
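
As a rough illustration of how the weights above combine, the sketch below computes a weighted final grade. It assumes each component is scored on a 0-100 scale, which the course page does not state; the function name `final_grade` and the example scores are hypothetical.

```python
# Weights come from the grading table above.
# Assumption: every component score is on a 0-100 scale (not stated in the source).
WEIGHTS = {
    "participation": 0.15,
    "presentation": 0.25,
    "project": 0.60,
}


def final_grade(scores):
    """Return the weighted sum of the component scores (hypothetical helper)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Made-up example scores, for illustration only.
    example = {"participation": 80, "presentation": 90, "project": 70}
    print(final_grade(example))
```

Because the project carries 60% of the weight, it dominates the result: in this sketch, a ten-point change in the project score moves the final grade by six points.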

The first class will present a short overview of neural network architectures; however, the details will be covered when reading on particular topics. Readings will touch on a diverse set of topics in Computer Vision. The course will be interactive: we will add interesting topics on demand and the latest research buzz.

Schedule

Date	Topic	Reading / Material	Speaker	Slides
Jan 12	Admin & Introduction(s)		Sanja Fidler	admin
Convolutional Neural Networks
Jan 19	Convolutional Neural Nets (tutorial)	Resources: Stanford's CS231n class, VGG's Practical CNN Tutorial Code: CNN Tutorial for TensorFlow, Tutorial for Caffe, CNN Tutorial for Theano	Yukun Zhu (invited)	[pdf] [code]
	Image Segmentation	Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs [PDF] [code] L-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L Yuille	Shenlong Wang	[pdf] [code]
Jan 26	Very Deep Networks	Highway Networks [PDF] [code] Rupesh Kumar Srivastava, Klaus Greff, Jurgen Schmidhuber; Deep Residual Learning for Image Recognition [PDF] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun	Renjie Liao (invited)	[pdf]
	Object Detection	Rich feature hierarchies for accurate object detection and semantic segmentation [PDF] [code] Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik; Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [PDF] [code (Matlab)] [code (Python)] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun	Kaustav Kundu	[pdf]
Feb 2	Stereo Siamese Networks	Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches [PDF] [code] Jure Žbontar, Yann LeCun; Learning to Compare Image Patches via Convolutional Neural Networks [PDF] [code] Sergey Zagoruyko, Nikos Komodakis	Wenjie Luo	[pdf]
	Depth from Single Image	Designing Deep Networks for Surface Normal Estimation [PDF] Xiaolong Wang, David Fouhey, Abhinav Gupta	Mian Wei	[pptx] [pdf]
Feb 9	Image Generation	Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [PDF] Alec Radford, Luke Metz, Soumith Chintala; Generating Images from Captions with Attention [PDF] Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov	Elman Mansimov (invited)	[pdf]
	Domain Adaptation, Zero-shot Learning	Simultaneous Deep Transfer Across Domains and Tasks [PDF] Eric Tzeng, Judy Hoffman, Trevor Darrell; Predicting Deep Zero-Shot Convolutional Neural Networks using Textual Descriptions [PDF] Jimmy Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov	Lluis Castrejon	[pdf]
Recurrent Neural Networks
Feb 23	RNNs and Neural Language Models	Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models [PDF] [code] Ryan Kiros, Ruslan Salakhutdinov, Richard Zemel; Skip-Thought Vectors [PDF] [code] Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler	Jamie Kiros (invited)
Mar 1	Modeling Words	Efficient Estimation of Word Representations in Vector Space [PDF] [code] Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean	Eleni Triantafillou	[pdf]
	Describing Videos	Sequence to Sequence -- Video to Text [PDF] Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko	Erin Grant	[pdf]
	Image-based QA	Ask Your Neurons: A Neural-based Approach to Answering Questions about Images [PDF] Mateusz Malinowski, Marcus Rohrbach, Mario Fritz	Yunpeng Li	[pdf]
Mar 8	Variational Autoencoders	Auto-Encoding Variational Bayes [PDF] Diederik P Kingma, Max Welling; Tutorial: Bayesian Reasoning and Deep Learning [PDF] Shakir Mohamed	Yura Burda (invited)	[pdf]
	Text-based QA	End-To-End Memory Networks [PDF] Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus	Marina Samuel	[pdf]
	Neural Reasoning	Recursive Neural Networks Can Learn Logical Semantics [PDF] Samuel R. Bowman, Christopher Potts, Christopher D. Manning	Rodrigo Toro Icarte
Mar 15	Neural Programming		Jimmy Ba (invited)
	Conversation Models	A Neural Conversational Model [PDF] Oriol Vinyals, Quoc Le	Caner Berkay Antmen
	Sentiment Analysis	Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank [PDF] Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng and Christopher Potts	Zhicong Lu
	Video Representations	Unsupervised Learning of Video Representations using LSTMs [PDF] Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov	Kamyar Ghasemipour
	Visual Attention	Recurrent Models of Visual Attention [PDF] Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu	Matthew Shepherd
	Direction Following (Robotics)	Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences [PDF] Hongyuan Mei, Mohit Bansal, Matthew R. Walter	Alan Yusheng Wu

Tutorials, related courses:

Introduction to Neural Networks, CSC321 course at University of Toronto
Course on Convolutional Neural Networks, CS231n course at Stanford University
Course on Probabilistic Graphical Models, CSC412 course at University of Toronto, advanced machine learning course

Software:

Caffe: Deep learning for image classification
TensorFlow: Open Source Software Library for Machine Intelligence (good software for deep learning)
Theano: Deep learning library
MXNet: Deep learning library
Torch: Scientific computing framework with wide support for machine learning algorithms
LIBSVM: A Library for Support Vector Machines (Matlab, Python)
scikit-learn: Machine learning in Python

Popular datasets:

ImageNet: Large-scale object dataset
Microsoft COCO: Large-scale image recognition, segmentation, and captioning dataset
MNIST: Handwritten digits
PASCAL VOC: Object recognition dataset
KITTI: Autonomous driving dataset
NYUv2: Indoor RGB-D dataset
LSUN: Large-scale Scene Understanding challenge
VQA: Visual question answering dataset
Madlibs: Visual Madlibs (question answering)
Flickr30K: Image captioning dataset
Flickr30K Entities: Flickr30K with phrase-to-region correspondences
MovieDescription: a dataset for automatic description of movie clips
Action datasets: a list of action recognition datasets
MPI Sintel Dataset: optical flow dataset
BookCorpus: a corpus of 11,000 books

Online demos:

Lots of cool Toronto Deep Learning Demos: image classification and captioning demos
Lots of cool demos for ConvNets by Andrej Karpathy
Reinforcement Learning with Neural Nets (read paper for more info)
Places: scene classification with neural nets
CRF as RNN: Semantic Image Segmentation
drawNet: visualization of ConvNet activations
Visualization of ConvNets for digit classification
AI-painter: modify your photo in a certain style (e.g., Van Gogh); uses neural nets as explained in this paper

Main conferences:

NIPS (Neural Information Processing Systems)
ICML (International Conference on Machine Learning)
ICLR (International Conference on Learning Representations)
AISTATS (International Conference on Artificial Intelligence and Statistics)
CVPR (IEEE Conference on Computer Vision and Pattern Recognition)
ICCV (International Conference on Computer Vision)
ECCV (European Conference on Computer Vision)
ACL (Association for Computational Linguistics)
EMNLP (Conference on Empirical Methods in Natural Language Processing)
