# 1. linear regression

use MLE (maximum likelihood estimation) to derive the loss function: under a Gaussian noise assumption, maximizing the likelihood is equivalent to minimizing the squared error
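A sketch of that equivalence (the standard derivation, stated here for reference): assume each target is the linear prediction plus Gaussian noise; maximizing the log-likelihood then reduces to minimizing the sum of squared errors.

```latex
y_i = w^\top x_i + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2)

\log L(w) = -\frac{T}{2}\log(2\pi\sigma^2)
            - \frac{1}{2\sigma^2}\sum_{i=1}^{T}\bigl(y_i - w^\top x_i\bigr)^2

\hat{w}_{\mathrm{MLE}} = \arg\max_w \log L(w)
                       = \arg\min_w \sum_{i=1}^{T}\bigl(y_i - w^\top x_i\bigr)^2
```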


# 2. logistic regression: binary classification

use MLE to derive the loss function: with a Bernoulli likelihood, maximizing the likelihood is equivalent to minimizing the cross-entropy (log) loss
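A sketch of the derivation: model each label as Bernoulli with success probability $\hat{y}_i = \sigma(w^\top x_i)$; the negative log-likelihood is exactly the cross-entropy loss.

```latex
p(y_i \mid x_i; w) = \hat{y}_i^{\,y_i}\,(1 - \hat{y}_i)^{1 - y_i},
\qquad \hat{y}_i = \sigma(w^\top x_i)

-\log L(w) = -\sum_{i=1}^{T}\Bigl[\, y_i \log \hat{y}_i
             + (1 - y_i)\log(1 - \hat{y}_i) \Bigr]
```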

# 4. Neural Network

activation functions: sigmoid (range $(0,1)$), tanh (range $(-1,1)$), rectified linear / ReLU (range $[0,+\infty)$)
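Minimal implementations of the three activations (a sketch; the variable names are illustrative):

```python
import numpy as np

def sigmoid(x):
    # squashes input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes input into (-1, 1)
    return np.tanh(x)

def relu(x):
    # zero for negative input, identity otherwise: range [0, +inf)
    return np.maximum(0.0, x)

x = np.linspace(-5, 5, 11)
print(sigmoid(x).min(), sigmoid(x).max())  # stays inside (0, 1)
print(relu(x).min())                       # never negative
```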

forward propagation

backpropagation algorithm
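A minimal sketch of forward propagation and backpropagation together, for a one-hidden-layer network with sigmoid activations and squared-error loss (the dataset, layer sizes, and learning rate are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy dataset (illustrative): XOR, which requires a hidden layer
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)   # hidden -> output
lr = 0.5
losses = []

for _ in range(5000):
    # forward propagation: input -> hidden -> output
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(np.mean((out - y) ** 2))

    # backpropagation: chain rule applied layer by layer
    d_out = (out - y) * out * (1 - out)   # gradient at output pre-activation
    d_h = (d_out @ W2.T) * h * (1 - h)    # gradient at hidden pre-activation

    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(losses[0], losses[-1])  # the loss decreases over training
```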

# Supervised CNN

feature extraction by convolution

SGD
compared with batch GD, SGD uses just a single training example or a small set of examples called a "minibatch", usually around 256.

note the terms "minibatch", "epoch" (one full pass over the whole data set), and "shuffle"

One final but important point regarding SGD is the order in which we present the data to the algorithm. If the data is given in some meaningful order, this can bias the gradient and lead to poor convergence. Generally a good method to avoid this is to randomly shuffle the data prior to each epoch of training.
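A sketch of minibatch SGD with per-epoch shuffling, on made-up linear-regression data (the dataset, model, and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up 1-d linear-regression data: y ~ 3x plus noise
X = rng.normal(size=1000)
y = 3.0 * X + 0.1 * rng.normal(size=1000)

w = 0.0
lr = 0.1
batch_size = 256  # the usual minibatch size mentioned above

for epoch in range(20):
    # shuffle before each epoch so minibatch gradients are not
    # biased by any meaningful ordering of the data
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        grad = np.mean(2 * (w * xb - yb) * xb)  # minibatch gradient of MSE
        w -= lr * grad

print(w)  # close to the true slope 3
```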

Momentum
If the objective has the form of a long shallow ravine leading to the optimum and steep walls on the sides, standard SGD will tend to oscillate across the narrow ravine since the negative gradient will point down one of the steep sides rather than along the ravine towards the optimum. The objectives of deep architectures have this form near local optima and thus standard SGD can lead to very slow convergence particularly after the initial steep gains. Momentum is one method for pushing the objective more quickly along the shallow ravine.
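A minimal sketch of the ravine effect, comparing plain gradient descent with classical momentum on an ill-conditioned quadratic (the curvatures and hyperparameters are illustrative):

```python
import numpy as np

# ill-conditioned quadratic: steep walls along w0, a shallow ravine along w1
H = np.array([100.0, 1.0])  # per-coordinate curvatures

def grad(w):
    return H * w  # gradient of f(w) = 0.5 * (100*w0^2 + 1*w1^2)

w_sgd = np.array([1.0, 1.0])   # plain gradient descent
w_mom = np.array([1.0, 1.0])   # gradient descent with momentum
v = np.zeros(2)
lr, mu = 0.015, 0.9

for _ in range(200):
    w_sgd = w_sgd - lr * grad(w_sgd)
    v = mu * v - lr * grad(w_mom)   # velocity accumulates past gradients
    w_mom = w_mom + v

# momentum ends much closer to the optimum at the origin
print(np.linalg.norm(w_sgd), np.linalg.norm(w_mom))
```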

Note

filter / kernel: for example, an 8×8 patch used for convolution

after convolution, we get a feature map.
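A sketch of "valid" 2-D convolution producing a feature map (the 3×3 averaging kernel is illustrative; a real CNN learns its kernels):

```python
import numpy as np

def conv2d_valid(image, kernel):
    # slide the kernel over the image ("valid" mode: no padding);
    # each output entry is the sum of an elementwise product of
    # the kernel with one image patch
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25.0).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0       # averaging filter (illustrative)
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3): the extracted feature map
```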

A CNN consists of three kinds of layers:
- convolutional layers
- subsampling layers (i.e. pooling layers)
- normal fully connected NN layers

Comparison

## unsupervised learning

autoencoder

feature extraction for unsupervised learning, when we don't have training labels.
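A minimal sketch of a linear autoencoder trained by gradient descent on the reconstruction error (the data, dimensions, and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up data: 100 samples in 8 dimensions, compressed to a 3-d code
X = rng.normal(size=(100, 8))

W_enc = rng.normal(0, 0.1, (8, 3))  # encoder weights
W_dec = rng.normal(0, 0.1, (3, 8))  # decoder weights
lr = 0.05
loss0 = loss = None

for step in range(1000):
    code = X @ W_enc        # encode: learned features of the input
    recon = code @ W_dec    # decode: reconstruct the input from the code
    err = recon - X
    loss = np.mean(err ** 2)
    if loss0 is None:
        loss0 = loss
    # gradients of the reconstruction error
    g_dec = (code.T @ err) / len(X)
    g_enc = (X.T @ (err @ W_dec.T)) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

print(loss0, loss)  # reconstruction error drops as the code is learned
```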

## Statistical Language Modeling (SLM)

ref: http://homepages.inf.ed.ac.uk/lzhang10/slm.html

definition: $p(w_1, \dots, w_T)$, the probability of a word sequence

probabilistic chain rule:
$p(w_1, \dots, w_T) = p(w_1)\prod_{i=2}^{T} p(w_i \mid w_1, \dots, w_{i-1}) = p(w_1)\prod_{i=2}^{T} p(w_i \mid h_i)$, where $h_i$ denotes the history of the $i$-th word $w_i$
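A toy illustration of the chain rule with a bigram approximation $p(w_i \mid h_i) \approx p(w_i \mid w_{i-1})$; the probabilities below are made up:

```python
# toy bigram language model (probabilities are invented for illustration)
p_first = {"the": 0.5, "a": 0.5}
p_next = {
    ("the", "cat"): 0.4, ("the", "dog"): 0.6,
    ("cat", "sat"): 1.0, ("dog", "sat"): 1.0,
}

def sequence_prob(words):
    # chain rule: p(w_1..w_T) = p(w_1) * prod_i p(w_i | w_{i-1})
    prob = p_first[words[0]]
    for prev, cur in zip(words, words[1:]):
        prob *= p_next[(prev, cur)]
    return prob

print(sequence_prob(["the", "cat", "sat"]))  # 0.5 * 0.4 * 1.0 = 0.2
```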

## RNN

definition
The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. But for many tasks that’s a very bad idea.

i.e. an RNN models the dependencies between training samples: each sample has some dependency or sequential relationship with its neighbors, rather than being independent.

note
the ref (http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/) wildml has provided a specific implementation example.
several issues should be noted:

1. In the text-generation code:

```python
# sample the next word from the model's output distribution,
# rejecting the unknown-word token until a real word is drawn
next_word_probs = model.forward_propagation(new_sentence)
sampled_word = word_to_index[unknown_token]
while sampled_word == word_to_index[unknown_token]:
    samples = np.random.multinomial(1, next_word_probs[-1])
    sampled_word = np.argmax(samples)
```

`np.random.multinomial(1, next_word_probs[-1])` draws a one-hot sample from the predicted distribution over the vocabulary at the last position, and `np.argmax` converts it back to a word index.

## LSTM

a type of RNN that can capture long-range dependencies

## loss function

least mean squares (squared-error loss)
cross-entropy loss, i.e. log loss / logistic loss
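Minimal implementations of the two losses (function names are illustrative):

```python
import numpy as np

def mse_loss(y_true, y_pred):
    # least mean squares: average squared difference
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    # binary cross-entropy (log loss), as in logistic regression
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

y = np.array([1.0, 0.0, 1.0])
p = np.array([0.9, 0.1, 0.8])
print(mse_loss(y, p), cross_entropy_loss(y, p))
```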

## Backpropagation Alg.

computational graph
forward-mode differentiation: computes the partial derivatives with respect to only one input at a time, so if the computational graph has many inputs, getting all the partial derivatives this way is slow. Seeding on one input $b$, a forward sweep yields $\frac{\partial \text{Node}}{\partial b}$ for every node, summing contributions over all paths from $b$ to that node.

reverse-mode differentiation is faster: a single backward sweep computes the partial derivatives of the output with respect to every node, $\frac{\partial Z}{\partial \text{Node}}$ for all nodes at once. This is exactly backpropagation.
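A hand-worked reverse pass on a tiny computational graph, $e = (a + b)\,b$, showing how one backward sweep yields the derivative of the output with respect to every node (values are illustrative):

```python
# forward pass: evaluate each node of the graph e = (a + b) * b
a, b = 2.0, 3.0
c = a + b        # c = 5
e = c * b        # e = 15

# reverse pass: propagate de/d(node) from the output back to every node
de_de = 1.0
de_dc = de_de * b            # de/dc = b
de_db_direct = de_de * c     # path through the product: de/db |_c = c
de_da = de_dc * 1.0          # de/da = de/dc * dc/da
de_db = de_db_direct + de_dc * 1.0  # sum over both paths reaching b

# analytically: e = ab + b^2, so de/da = b = 3 and de/db = a + 2b = 8
print(de_da, de_db)  # 3.0 8.0
```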
