# 1. Linear regression

Use maximum likelihood estimation (MLE) to understand the loss function.
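
A sketch of the MLE argument, assuming the usual model (not spelled out in these notes) $y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$ with i.i.d. Gaussian noise $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$ over $m$ training examples. The log-likelihood is

$$
\ell(\theta) = \sum_{i=1}^{m} \log \left[ \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right) \right]
= m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2
$$

The first term does not depend on $\theta$, so under these assumptions maximizing $\ell(\theta)$ is the same as minimizing the sum of squared errors $\sum_{i=1}^{m} (y^{(i)} - \theta^T x^{(i)})^2$.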


# 2. Logistic regression: binary classification

Use MLE to understand the loss function.
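
A sketch of the same argument for the binary case, assuming labels $y \in \{0, 1\}$ and the hypothesis $h_\theta(x) = 1 / (1 + e^{-\theta^T x})$ interpreted as $p(y = 1 \mid x; \theta)$ (this notation is an assumption, not from the notes):

$$
p(y \mid x; \theta) = h_\theta(x)^{y} \, \left(1 - h_\theta(x)\right)^{1 - y}
$$

$$
\ell(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]
$$

Maximizing $\ell(\theta)$ is equivalent to minimizing $-\ell(\theta)$, which is the cross-entropy (log) loss.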

# 4. Neural Network

Activation functions: sigmoid, with range (0, 1); tanh, with range (-1, 1); rectified linear (ReLU), with range [0, +inf).
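
A minimal NumPy sketch of the three activation functions. The function names are chosen here for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes any real input into (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
activations = {"sigmoid": sigmoid(z), "tanh": tanh(z), "relu": relu(z)}
```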

forward propagation

backpropagation algorithm
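
The two passes can be sketched for a network with one hidden layer. This is only an illustration under assumptions that the notes do not state: sigmoid activations in both layers, a squared-error loss $\frac{1}{2}\|a_2 - y\|^2$, and hypothetical names `W1`, `b1`, `W2`, `b2` for the parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward propagation: compute the activations layer by layer."""
    a1 = sigmoid(W1 @ x + b1)   # hidden-layer activations
    a2 = sigmoid(W2 @ a1 + b2)  # network output
    return a1, a2

def backward(x, y, W2, a1, a2):
    """Backpropagation for the loss 0.5 * ||a2 - y||^2."""
    delta2 = (a2 - y) * a2 * (1.0 - a2)         # error term at the output layer
    delta1 = (W2.T @ delta2) * a1 * (1.0 - a1)  # error term pushed back to the hidden layer
    # Gradients with respect to W1, b1, W2, b2, in that order
    return np.outer(delta1, x), delta1, np.outer(delta2, a1), delta2

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), np.array([1.0])
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

a1, a2 = forward(x, W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(x, y, W2, a1, a2)
```

Each gradient has the same shape as the parameter it belongs to, so a gradient-descent update is simply `W1 -= lr * dW1` and so on.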

# Supervised CNN

feature extraction by convolution

SGD
Compared with batch gradient descent, SGD computes each update from a single training example or from a small set of examples called a “minibatch”, typically 256 examples.

Note the terms “minibatch”, “epoch” (one pass over the whole data set), and “shuffle”.

One final but important point regarding SGD is the order in which we present the data to the algorithm. If the data is given in some meaningful order, this can bias the gradient and lead to poor convergence. Generally a good method to avoid this is to randomly shuffle the data prior to each epoch of training.
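
A minimal sketch of minibatch SGD with a fresh shuffle before every epoch. The linear least-squares objective, the function name `sgd_linear_regression`, and the hyperparameter values other than the batch size are assumptions made for illustration.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.1, batch_size=256, epochs=20, seed=0):
    """Fit y ~ X @ theta with minibatch SGD."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the data before each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ theta - y[idx]
            # Gradient of half the mean squared error on this minibatch
            grad = X[idx].T @ residual / len(idx)
            theta -= lr * grad
    return theta

# Synthetic, noise-free data so that the true parameters are recoverable
rng = np.random.default_rng(1)
X = rng.normal(size=(2048, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta
theta = sgd_linear_regression(X, y)
```

With 2048 examples and a batch size of 256, one epoch here consists of 8 parameter updates.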

Momentum
If the objective has the form of a long shallow ravine leading to the optimum and steep walls on the sides, standard SGD will tend to oscillate across the narrow ravine since the negative gradient will point down one of the steep sides rather than along the ravine towards the optimum. The objectives of deep architectures have this form near local optima and thus standard SGD can lead to very slow convergence particularly after the initial steep gains. Momentum is one method for pushing the objective more quickly along the shallow ravine.
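
One common form of the momentum update is $v \leftarrow \gamma v + \alpha \nabla_\theta J(\theta)$, $\theta \leftarrow \theta - v$. The sketch below applies it to a hypothetical ravine-shaped objective $J(\theta) = \frac{1}{2}(\theta_1^2 + 100\,\theta_2^2)$. The objective, the exact gradient, and the values of `lr` and `gamma` are all assumptions chosen for illustration.

```python
import numpy as np

def momentum_step(theta, velocity, grad, lr=0.01, gamma=0.9):
    """One momentum update: accumulate a velocity, then move along it."""
    velocity = gamma * velocity + lr * grad
    theta = theta - velocity
    return theta, velocity

def grad_J(theta):
    """Gradient of J = 0.5 * (theta[0]**2 + 100 * theta[1]**2): shallow in one direction, steep in the other."""
    return np.array([1.0, 100.0]) * theta

theta = np.array([10.0, 1.0])
velocity = np.zeros(2)
for _ in range(200):
    theta, velocity = momentum_step(theta, velocity, grad_J(theta))
```

The velocity averages out the oscillating component across the steep direction and keeps building up along the shallow one, which is why progress along the ravine is faster than with the plain update `theta -= lr * grad`.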

Note

Filter / kernel: the small patch of weights that is convolved with the input, for example an 8×8 patch.

After convolution, we get a feature map.
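
A naive sketch of how one kernel produces one feature map. It assumes a single-channel image, no padding, stride 1, and no kernel flip (so strictly speaking it is cross-correlation). The 28×28 image size and the averaging kernel are made up for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over every position where it fits entirely inside the image."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Elementwise product of the kernel with the patch under it, then sum
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

image = np.arange(28 * 28, dtype=float).reshape(28, 28)
kernel = np.ones((8, 8)) / 64.0  # an 8x8 averaging filter
feature_map = conv2d_valid(image, kernel)
```

An 8×8 kernel on a 28×28 image fits in 28 - 8 + 1 = 21 positions along each axis, so the feature map is 21×21.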

A CNN consists of three kinds of layers:

- ordinary fully connected NN layers
- subsampling layers, which are the pooling layers
- convolutional layers
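
A minimal sketch of what a subsampling (pooling) layer does to a feature map, assuming mean pooling over non-overlapping square regions. The name `mean_pool` is chosen here for illustration.

```python
import numpy as np

def mean_pool(feature_map, pool_size):
    """Average each non-overlapping pool_size x pool_size block of the feature map."""
    h_out = feature_map.shape[0] // pool_size
    w_out = feature_map.shape[1] // pool_size
    # Drop any rows or columns that do not fill a complete block
    trimmed = feature_map[:h_out * pool_size, :w_out * pool_size]
    blocks = trimmed.reshape(h_out, pool_size, w_out, pool_size)
    return blocks.mean(axis=(1, 3))

feature_map = np.arange(16, dtype=float).reshape(4, 4)
pooled = mean_pool(feature_map, 2)
```

Pooling a 4×4 feature map with 2×2 blocks leaves a 2×2 output, so the fully connected layers that follow see far fewer inputs.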

Comparison

## unsupervised learning

autoencoder

Feature extraction for unsupervised learning, used when we don’t have labels for the training data.

## Statistical Language Modeling (SLM)

ref: http://homepages.inf.ed.ac.uk/lzhang10/slm.html
definition
$p(w_1, ..., w_T)$: the probability of a word sequence
probabilistic chain rule
$p(w_1, ..., w_T) = p(w_1) \prod_{i=2}^T p(w_i \mid w_1, ..., w_{i-1}) = p(w_1) \prod_{i=2}^T p(w_i \mid h_i)$, where $h_i$ denotes the history of the $i$th word $w_i$
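
A small worked example of the chain rule. To keep the counts tractable it assumes the history $h_i$ is truncated to the previous word only (a bigram model) and that probabilities are plain relative frequencies. The toy corpus and the name `sentence_prob` are made up for illustration.

```python
from collections import Counter

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]

first_word = Counter(s[0] for s in corpus)
bigram = Counter((s[i - 1], s[i]) for s in corpus for i in range(1, len(s)))
context = Counter(s[i - 1] for s in corpus for i in range(1, len(s)))

def sentence_prob(words):
    """p(w_1) * prod_i p(w_i | w_{i-1}), estimated by relative frequency."""
    p = first_word[words[0]] / len(corpus)
    for prev, cur in zip(words, words[1:]):
        if context[prev] == 0:
            return 0.0  # the history was never seen in the corpus
        p *= bigram[(prev, cur)] / context[prev]
    return p

p_seen = sentence_prob(["the", "cat", "sat"])    # 1 * (2/3) * (1/2)
p_unseen = sentence_prob(["the", "dog", "ran"])  # "dog ran" never occurs
```

The second sentence gets probability zero because one of its bigrams never appears in the corpus, which is the sparsity problem that smoothing and neural language models try to address.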

## RNN

definition
The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. But for many tasks that’s a very bad idea.

That is, an RNN models the dependency between the elements of a sequence: each input depends on, or is ordered relative to, the inputs that came before it.
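
A minimal sketch of a vanilla RNN forward pass, showing how the hidden state carries information from earlier inputs to later ones. It assumes the recurrence $s_t = \tanh(U x_t + W s_{t-1})$, $o_t = \mathrm{softmax}(V s_t)$ with one-hot word inputs. The parameter names `U`, `V`, `W` and the sizes are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def rnn_forward(word_ids, U, V, W):
    """Return one probability distribution over the vocabulary per time step."""
    s = np.zeros(U.shape[0])  # initial hidden state
    outputs = []
    for w in word_ids:
        # U[:, w] is U times the one-hot vector for word w
        s = np.tanh(U[:, w] + W @ s)
        outputs.append(softmax(V @ s))
    return np.array(outputs)

rng = np.random.default_rng(0)
vocab_size, hidden_dim = 5, 4
U = rng.normal(scale=0.1, size=(hidden_dim, vocab_size))
V = rng.normal(scale=0.1, size=(vocab_size, hidden_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))

probs = rnn_forward([0, 3, 1], U, V, W)
```

Because `s` is reused across iterations, the output at step $t$ depends on every input up to step $t$, not only on the current one.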

note
The WildML tutorial (http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/) provides a concrete implementation example.
Several points should be noted:

1. In the code for generating text:

    import numpy as np

    next_word_probs = model.forward_propagation(new_sentence)
    # Start from the unknown token so that the loop below runs at least once
    sampled_word = word_to_index[unknown_token]
    # Resample until the drawn word is not the unknown token
    while sampled_word == word_to_index[unknown_token]:
        samples = np.random.multinomial(1, next_word_probs[-1])
        sampled_word = np.argmax(samples)

## LSTM

A type of RNN that can capture long-range dependencies.

## loss function

least mean squares, i.e. the squared-error loss
cross-entropy loss, also called log loss or logistic loss
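
A minimal NumPy sketch of the two losses, assuming the mean over examples in both cases and binary labels for the cross-entropy. The function names and the clipping constant `eps` are illustrative choices.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of binary labels under predicted probabilities."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y_true = np.array([1.0, 0.0])
p_pred = np.array([0.5, 0.5])
mse = mean_squared_error(y_true, p_pred)
bce = binary_cross_entropy(y_true, p_pred)
```

For a completely uninformative prediction of 0.5 on every example, the cross-entropy equals $\log 2$.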

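## Backpropagation algorithm

computational graph
Forward-mode differentiation computes the partial derivatives with respect to only one input per pass, so it is slow when the computational graph has many inputs. For one input $b$, it sums over all paths from $b$ to the output and gives the derivative of every node on those paths with respect to $b$:

$\frac{\partial\, \text{AllPathNode}}{\partial b}$

Reverse-mode differentiation is faster in this setting: a single pass computes the partial derivative of the output $Z$ with respect to every node, $\frac{\partial Z}{\partial\, \text{AllNode}}$. This is backpropagation.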
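
The difference between the two modes can be sketched on a small graph. The graph $c = a + b$, $d = b + 1$, $e = c \cdot d$ and both function names are chosen here purely for illustration.

```python
def forward_mode(a, b, da, db):
    """Propagate the derivative with respect to ONE input forward through the graph.

    (da, db) selects the input: (1, 0) gives de/da, (0, 1) gives de/db.
    """
    c, dc = a + b, da + db
    d, dd = b + 1.0, db
    e, de = c * d, dc * d + c * dd  # product rule
    return e, de

def reverse_mode(a, b):
    """One backward sweep gives de/d(node) for EVERY node in the graph."""
    c = a + b
    d = b + 1.0
    e = c * d
    de_dc = d  # e = c * d
    de_dd = c
    de_da = de_dc * 1.0                # a reaches e only through c
    de_db = de_dc * 1.0 + de_dd * 1.0  # b reaches e through both c and d
    return e, {"a": de_da, "b": de_db, "c": de_dc, "d": de_dd}

e_a, de_da = forward_mode(2.0, 1.0, 1.0, 0.0)  # first pass: derivative w.r.t. a
e_b, de_db = forward_mode(2.0, 1.0, 0.0, 1.0)  # second pass: derivative w.r.t. b
e, grads = reverse_mode(2.0, 1.0)              # single pass: all derivatives
```

Forward mode needs one pass per input, while reverse mode needs one pass per output. A neural network has many parameters and a single scalar loss, which is why backpropagation uses reverse mode.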
