Some Concepts
- The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. --> deep learning
- A computer can reason automatically about statements written in formal languages, using logical inference rules. --> knowledge base approach
- AI systems need the ability to acquire their own knowledge by extracting patterns from raw data. --> machine learning --> the representation of the data ++(features)++ has an enormous effect on the performance of ML (see the representation sketch after this list)
- eg1.logistic regression
- eg2.naive Bayes
- representation learning (use ML to learn the representation itself) --> must separate the factors of variation that explain the observed data --> addressed by DL
- eg1.autoencoder: the combination of an encoder function (converts the input data into a different representation) and a decoder function (converts the new representation back into the original format) (see the autoencoder sketch after this list)
- Deep learning
- eg. the feedforward deep network, or multilayer perceptron (MLP)
- two perspectives:
- learning the right representation for data
- depth allows the computer to learn a multi-step computer program
- two ways of measuring the depth of a model:
- the depth of the computational graph: the number of sequential instructions that must be executed
- the depth of the graph describing how concepts are related to each other (usually used in deep probabilistic models)
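A minimal sketch of the representation point above: the same linear classifier (logistic regression, eg1) on a toy two-circles dataset, first on raw coordinates and then on a single hand-built radius feature. The dataset, the feature choice, and the use of scikit-learn are illustrative assumptions, not from the book.

```python
# Sketch: how the choice of representation (features) affects a simple ML model.
# Assumes scikit-learn is available; the two-circles data and the hand-crafted
# "radius" feature are illustrative choices only.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression

# Two concentric circles: not linearly separable in Cartesian coordinates.
X, y = make_circles(n_samples=1000, noise=0.05, factor=0.4, random_state=0)

# 1) Raw (x, y) features: a linear model struggles.
raw_acc = LogisticRegression().fit(X, y).score(X, y)

# 2) Hand-engineered feature: distance from the origin (polar radius).
radius = np.linalg.norm(X, axis=1, keepdims=True)
feat_acc = LogisticRegression().fit(radius, y).score(radius, y)

print(f"accuracy with raw coordinates: {raw_acc:.2f}")   # near chance
print(f"accuracy with radius feature:  {feat_acc:.2f}")  # near 1.0
```

With the raw coordinates the classes are not linearly separable, so the same model stays near chance; with the radius feature it separates them almost perfectly.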
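And a minimal autoencoder sketch (eg1 above) to make the encoder/decoder idea concrete; PyTorch, the layer sizes, and the random training data are illustrative assumptions.

```python
# Sketch of an autoencoder: the encoder maps the input to a new representation
# (code), the decoder maps the code back to the original format.
# Layer sizes and the random stand-in data are illustrative only.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 4))
decoder = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 64))

params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(256, 64)                # stand-in for real input data
for step in range(200):
    code = encoder(x)                   # new (compressed) representation
    reconstruction = decoder(code)      # back to the original format
    loss = loss_fn(reconstruction, x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```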
Summary
Challenges
- how to get informal knowledge (knowledge about the world) into a computer
- many of the factors of variation influence every single piece of data we observe
Organization of the book
Historical Trends in Deep Learning
- Deep learning dates back to the 1940s; it only appears to be new.
- Known as cybernetics in the 1940s-1960s.
- Known as connectionism in the 1980s-1990s.
- Known as deep learning since its resurgence beginning in 2006.
- The neural perspective on DL:
- the brain provides a proof by example that intelligent behavior is possible, and a conceptually straightforward path to building intelligence is to reverse engineer the computational principles behind the brain and duplicate its functionality.
- it would be deeply interesting to understand the brain and the principles that underlie human intelligence.
- modern deep learning goes beyond the neural perspective and appeals to a more general principle of learning multiple levels of composition.
- the earliest predecessors were simple linear models motivated from a neuroscientific perspective
- the weights of the classifier were set by hand
- In the 1950s, the perceptron became the first model that could learn the weights defining the categories given examples of inputs from each category.
- adaptive linear element (ADALINE) ++proposed around the same time++
- the training algorithm for ADALINE is stochastic gradient descent (SGD)
- perceptron and ADALINE are linear models; they cannot learn the XOR function (see the XOR sketch after this list)
- Diminished role of neuroscience --> we do not have enough information about the brain to use it as a guide.
- The Neocognitron (1980) is the basis of the modern convolutional network (1998). Most modern NNs are based on a model neuron called the rectified linear unit.
- the original Cognitron (1975) was a more complicated, brain-inspired version of this neuron
- the modern simplified versions were developed from several viewpoints:
- Nair and Hinton (2010) and Glorot et al. (2011a) --> neuroscience as an influence
- Jarrett et al. (2009) --> more engineering-oriented influences
- connectionism or parallel distributed processing (1986 and 1995)
- the central idea: a large number of simple computational units can achieve intelligent behavior when networked together.
- distributed representation (1986)
- successful use of back-propagation to train deep neural networks with internal representations, and popularization of the back-propagation algorithm (1986a and 1987)
- some fundamental mathematical difficulties in modeling long sequences were identified (1991 and 1994)
- the long short-term memory (LSTM) network was introduced to resolve some of these difficulties (1997)
- Kernel machines (1992, 1995 and 1999) and graphical models (1998) became popular
- NNs continued to obtain impressive results on some tasks (1998b and 2001), and the Canadian Institute for Advanced Research (CIFAR) helped keep NN research alive.
- In 2006, Hinton showed that a deep belief network could be efficiently trained using a strategy called greedy layer-wise pretraining.
- greedy layer-wise pretraining was quickly used to train many other kinds of deep networks (2007)
- this wave popularized the term "deep learning", emphasizing the ability to train deeper networks than before and the theoretical importance of depth (2007, 2011, 2014a and 2014)
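A small sketch of the XOR limitation noted above: a single linear unit trained with stochastic gradient descent (ADALINE-style) on the four XOR points. The learning rate and number of updates are arbitrary illustrative choices.

```python
# Sketch: a single linear unit trained with SGD (as ADALINE was) cannot
# represent XOR. Learning rate and update count are arbitrary choices.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)  # XOR targets

rng = np.random.default_rng(0)
w = rng.normal(size=2)
b = 0.0
lr = 0.1

for step in range(1000):
    i = rng.integers(len(X))          # pick one example (stochastic update)
    pred = X[i] @ w + b
    err = pred - y[i]
    w -= lr * err * X[i]              # gradient of the squared error
    b -= lr * err

preds = (X @ w + b > 0.5).astype(float)
print(preds, "accuracy:", (preds == y).mean())  # never 1.0: no linear rule fits XOR
```

The weights settle near w ≈ 0, b ≈ 0.5, so no threshold on the output classifies all four points correctly; adding a hidden layer with a nonlinearity is what removes this limitation.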
Increasing Dataset Sizes
- 1950s, first experiments with ANNs conducted; 1990s, used in commercial applications
Increasing Model Sizes
Increasing Accuracy, Complexity and Real-World Impact
- 1986a, earliest deep models recognized individual objects only in tightly cropped, extremely small images.
- 2012, modern object recognition networks handle high-resolution, uncropped photographs --> top-5 error dropped from 26.1% to 15.3% --> later down to 3.6%
- 2010, 2010b, 2011 and 2012a, error rates of speech recognition dropped sharply with DL
- 2013, DL achieved successes in pedestrian detection and image segmentation
- 2012, DL achieved superhuman performance in traffic sign classification.
- 2014d, NNs can output an entire sequence of characters transcribed from an image.
- 2013, it was previously believed that this required labeling of the individual elements of the sequence.
- 2014 and 2015, RNNs --> sequence-to-sequence learning for machine translation
- 2015, DL extended to reinforcement learning (e.g., DeepMind's system learned to play Atari games at human level).
- many other applications, such as medicine (2014)…