Lecture 1 Introduction to Deep Learning
1. Definition of Computer Vision and Deep Learning
Computer Vision: Building artificial systems that process, perceive, and reason about visual data.
Deep Learning: Hierarchical learning algorithms with many “layers”, (very) loosely inspired by the brain.
2. Brief History of Computer Vision
Early Exploration of the Visual Cortex, Hubel and Wiesel, 1959
In their cat visual cortex experiments in 1958, Hubel and Wiesel first observed that neurons were sensitive to moving edge stimuli. They went on to define simple and complex cells and discovered the columnar functional organization of the visual cortex. This research is considered the beginning of computer vision for two reasons:
- It was the first work to emphasize oriented edges, which were widely used in later computer vision architectures.
- They found that the cat's visual cortex processes information hierarchically. First, simple cells respond to the orientation of light stimuli; then, through successive neural connections, increasingly complex cells respond to increasingly complex and abstract information.
Stages of Visual Representation, David Marr, 1970s
Marr held that computer vision is the use of effective symbols to describe images of the external world. Its core is to deduce the structure of the external world from the structure of the image. Vision begins with images, passes through a series of processing and transformation steps, and finally arrives at recognition of the external world. Marr described three stages of representation:
- Primal Sketch (2-D sketch): The primal sketch is obtained from the input image. It captures the locations where image intensity changes sharply, together with their geometric distribution and organization.
- 2.5-D Sketch: It describes the surface normals, approximate depth, and discontinuous contours of visible surfaces in an observer-centered coordinate system.
- 3-D Model Representation: It describes the spatial organization of shapes using hierarchical representations built from surface and volumetric primitives in an object-centered coordinate system.
Recognition via Edge Detection, John Canny, David Lowe, 1980s
The Canny algorithm is a classical edge detection algorithm proposed by John F. Canny in 1986. In 1987, David Lowe proposed a more complex theory of recognition that builds on detected edges. The steps of the Canny algorithm are as follows:
- Apply a Gaussian blur to suppress noise
- Compute the gradient magnitude and direction
- Apply non-maximum suppression
- Apply a double threshold to separate strong edges from weak edges
- Connect weak edges that are linked to strong edges
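The five steps above can be sketched in plain Python on a small grayscale image stored as a list of rows. This is a minimal illustration under stated assumptions, not Canny's original formulation: the 3x3 Gaussian and Sobel kernels, the ratio-based thresholds, and the names `filter3x3` and `canny_sketch` are all choices made here for illustration.

```python
import math

# Assumed 3x3 kernels; real implementations often use larger Gaussians.
GAUSS = [[1 / 16, 2 / 16, 1 / 16],
         [2 / 16, 4 / 16, 2 / 16],
         [1 / 16, 2 / 16, 1 / 16]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
# Neighbor offsets (dy, dx) along the gradient for 0, 45, 90, 135 degrees.
NEIGHBORS = [((0, 1), (0, -1)), ((1, 1), (-1, -1)),
             ((1, 0), (-1, 0)), ((1, -1), (-1, 1))]


def filter3x3(img, kernel):
    """Apply a 3x3 kernel (cross-correlation), replicating border pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for j in range(3):
                for i in range(3):
                    yy = min(max(y + j - 1, 0), h - 1)
                    xx = min(max(x + i - 1, 0), w - 1)
                    out[y][x] += kernel[j][i] * img[yy][xx]
    return out


def canny_sketch(img, low_ratio=0.2, high_ratio=0.5):
    """Return a boolean edge map; thresholds are fractions of the peak gradient."""
    h, w = len(img), len(img[0])
    # Step 1: Gaussian blur.
    blurred = filter3x3(img, GAUSS)
    # Step 2: gradient magnitude and direction.
    gx, gy = filter3x3(blurred, SOBEL_X), filter3x3(blurred, SOBEL_Y)
    mag = [[math.hypot(gx[y][x], gy[y][x]) for x in range(w)] for y in range(h)]

    def mag_at(y, x):
        return mag[y][x] if 0 <= y < h and 0 <= x < w else 0.0

    # Step 3: non-maximum suppression along the quantized gradient direction.
    thin = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            angle = math.degrees(math.atan2(gy[y][x], gx[y][x])) % 180
            (ay, ax), (by, bx) = NEIGHBORS[round(angle / 45) % 4]
            if (mag[y][x] >= mag_at(y + ay, x + ax)
                    and mag[y][x] >= mag_at(y + by, x + bx)):
                thin[y][x] = mag[y][x]
    # Step 4: double threshold; strong pixels seed the edge map.
    peak = max(max(row) for row in thin)
    high, low = high_ratio * peak, low_ratio * peak
    stack = [(y, x) for y in range(h) for x in range(w)
             if peak > 0 and thin[y][x] >= high]
    # Step 5: grow edges from strong pixels through connected weak pixels.
    edges = [[False] * w for _ in range(h)]
    while stack:
        y, x = stack.pop()
        if edges[y][x]:
            continue
        edges[y][x] = True
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and not edges[ny][nx] and thin[ny][nx] >= low):
                    stack.append((ny, nx))
    return edges


# Toy input: a vertical step edge between columns 3 and 4.
image = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
edge_map = canny_sketch(image)
```

On this toy image the detected edge is two pixels wide (columns 3 and 4), because the two columns on either side of the step have equal gradient magnitude and the sketch keeps ties during non-maximum suppression.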
Recognition via Matching, David Lowe, 1999
In 1999, David Lowe proposed a different approach: recognition through matching, using SIFT. The idea is to compute a feature vector at each keypoint in the image. Each feature vector is a real-valued encoding of local appearance, constructed so that invariance across different images is built into the vector. Even if the underlying image changes slightly (for example in brightness, rotation, or viewpoint), the feature vectors can still be used for image recognition through matching.
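The matching step can be illustrated with a small sketch. It does not compute SIFT descriptors; it assumes feature vectors are already available and shows only nearest-neighbor matching by Euclidean distance after scaling each vector to unit length, which makes the match insensitive to a uniform change in vector magnitude. The helper names `normalize` and `match_features` and the toy vectors are hypothetical.

```python
import math


def normalize(vec):
    """Scale a feature vector to unit length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec] if norm > 0 else list(vec)


def match_features(query, database):
    """For each query vector, return the index of the nearest database vector."""
    db = [normalize(d) for d in database]
    matches = []
    for q in query:
        qn = normalize(q)
        dists = [math.dist(qn, d) for d in db]
        matches.append(min(range(len(db)), key=dists.__getitem__))
    return matches


# Toy feature vectors standing in for keypoint descriptors of a known object.
database = [[1.0, 0.0, 0.0, 2.0],
            [0.0, 3.0, 1.0, 0.0],
            [2.0, 2.0, 0.0, 1.0]]
# The same features seen in a new image: rescaled, slightly perturbed, reordered.
query = [[0.0, 4.5, 1.6, 0.1],
         [3.1, 2.9, 0.0, 1.5],
         [1.5, 0.1, 0.0, 3.0]]

print(match_features(query, database))  # prints [1, 2, 0]
```

Each query vector is matched to its database counterpart despite the change in magnitude and the small perturbations, which is the property the matching approach relies on.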
Large Scale Visual Recognition Challenge (from 2010) and AlexNet: Deep Learning Goes Mainstream, Krizhevsky, 2012
In 2012, AlexNet greatly reduced the error rate of image recognition, which made researchers realize that deep learning would become the mainstream of computer vision research.
3. Brief History of Deep Learning
Perceptron, Frank Rosenblatt, 1958
Rosenblatt proposed one of the earliest algorithms that could learn from data; it could learn to recognize letters of the alphabet. Today we recognize it as a linear classifier.
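A minimal sketch of a perceptron as a linear classifier, using the classic mistake-driven update rule. The toy dataset (the logical AND of two inputs), the labels in {-1, +1}, and the function names are assumptions made for illustration; Rosenblatt's original system was implemented in hardware and differed in its details.

```python
def score(weights, bias, x):
    """Linear score w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Learn weights and bias; labels must be -1 or +1."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            # Update only when the example is misclassified (or on the boundary).
            if y * score(weights, bias, x) <= 0:
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


def predict(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else -1


# Linearly separable toy data: logical AND.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]
weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
```

Because the decision rule is a single linear threshold, this only works when the classes are linearly separable, as they are for AND.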
Backprop, Rumelhart, Hinton, 1985
Rumelhart et al. proposed the backpropagation algorithm for computing gradients in neural networks and used it to successfully train perceptrons with multiple layers.
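To make the idea of computing gradients by backpropagation concrete, here is a minimal sketch for a tiny two-layer perceptron. The architecture (two inputs, three tanh hidden units, one linear output), the squared-error loss, the parameter values, and all function names are assumptions chosen for illustration, not the setup of the original paper.

```python
import math


def forward(params, x):
    """Two-layer perceptron: tanh hidden layer, linear scalar output."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y_hat = sum(w * hi for w, hi in zip(W2, h)) + b2
    return h, y_hat


def loss_and_grads(params, x, y):
    """Squared-error loss and its gradients, obtained with the chain rule."""
    W1, b1, W2, b2 = params
    h, y_hat = forward(params, x)
    loss = 0.5 * (y_hat - y) ** 2
    d_out = y_hat - y                       # dL/dy_hat
    dW2 = [d_out * hi for hi in h]          # output-layer weight gradients
    db2 = d_out
    # Propagate the error back through W2 and the tanh derivative 1 - h^2.
    d_pre = [d_out * w * (1.0 - hi * hi) for w, hi in zip(W2, h)]
    dW1 = [[d * xi for xi in x] for d in d_pre]
    db1 = list(d_pre)
    return loss, (dW1, db1, dW2, db2)


def sgd_step(params, grads, lr=0.1):
    """One gradient-descent update; returns new parameters."""
    W1, b1, W2, b2 = params
    dW1, db1, dW2, db2 = grads
    return ([[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, dW1)],
            [b - lr * g for b, g in zip(b1, db1)],
            [w - lr * g for w, g in zip(W2, dW2)],
            b2 - lr * db2)


# Fixed illustrative parameters and a single training example.
init_params = ([[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]],
               [0.0, 0.1, -0.1],
               [0.7, -0.4, 0.2],
               0.05)
x, y = [1.0, 2.0], 1.0

initial_loss, grads = loss_and_grads(init_params, x, y)
params = init_params
for _ in range(50):
    _, step_grads = loss_and_grads(params, x, y)
    params = sgd_step(params, step_grads)
final_loss, _ = loss_and_grads(params, x, y)
print(initial_loss, final_loss)
```

A common sanity check for hand-written gradients is to compare them with a finite-difference estimate of the loss for a few parameters.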
Convolutional networks: LeNet, LeCun et al., 1998
LeNet applied the backpropagation algorithm to a Neocognitron-like architecture. It learned to recognize handwritten digits and was deployed by NEC in a commercial system for processing handwritten checks.
ConvNets are everywhere, 2012 to Present
Nowadays convolutional networks are widely used in computer vision:
- Image Classification
- Image Retrieval
- Object Detection
- Video Classification
- Pose Recognition
4. Summary
In 2012, deep learning opened the door to computer vision with a striking performance. To understand what happened that year, it helps to take a 50-year perspective. In my opinion, the breakthrough is attributable to the way algorithms, data, and computing have evolved over the last 50 years. It is the improvement of algorithms, the massive increase in data, and the development of GPUs that have enabled new applications, represented by convolutional networks, to change the world.