Deep learning study notes

Original post, 2015-11-26 16:27:04


1. linear regression

use MLE to understand the loss function
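A short derivation of how MLE leads to the squared-error loss. It assumes Gaussian noise with fixed variance $\sigma^2$; the noise model and the symbols $\theta$, $x^{(i)}$, $y^{(i)}$, $m$ are assumptions for illustration and do not come from these notes.

```latex
y^{(i)} = \theta^\top x^{(i)} + \epsilon^{(i)}, \qquad \epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)

\log L(\theta) = \sum_{i=1}^{m} \log p\left(y^{(i)} \mid x^{(i)}; \theta\right)
              = -\frac{m}{2}\log\left(2\pi\sigma^2\right)
                - \frac{1}{2\sigma^2}\sum_{i=1}^{m}\left(y^{(i)} - \theta^\top x^{(i)}\right)^2
```

The first term does not depend on $\theta$, so maximizing the log-likelihood is the same as minimizing the sum of squared errors.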

2. logistic regression: binary classification

use MLE to understand the loss function
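A minimal sketch of the same idea for logistic regression, assuming a Bernoulli model with p(y=1|x) = sigmoid(theta . x). The function names and the toy data are hypothetical. It checks numerically that the cross-entropy loss is the negative log of the likelihood.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_likelihood(theta, X, y):
    # Product over samples of p^y * (1 - p)^(1 - y)
    p = sigmoid(X @ theta)
    return np.prod(p ** y * (1 - p) ** (1 - y))

def neg_log_likelihood(theta, X, y):
    # Summed cross-entropy (log loss) over the samples
    p = sigmoid(X @ theta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy data (hypothetical): first column is the intercept feature
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
theta = np.array([0.1, 0.8])

lik = bernoulli_likelihood(theta, X, y)
nll = neg_log_likelihood(theta, X, y)
```

Minimizing `nll` is therefore the same as maximizing `lik`.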

3. softmax regression: multiclass classification
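A small sketch of the softmax function that turns class scores into probabilities. The max-subtraction trick for numerical stability and the example scores are additions for illustration.

```python
import numpy as np

def softmax(scores):
    # Subtract the row maximum so np.exp cannot overflow; the result is unchanged
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Two samples, three classes (hypothetical scores)
scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(scores)
```

Each row of `probs` sums to 1, and equal scores give a uniform distribution.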

4. Neural Network

activation function: sigmoid (0,1), tanh (-1,1), rectified linear [0,+inf)
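The three activation functions and their output ranges can be sketched as follows; the sample grid `z` is arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Rectified linear unit: max(0, z)
    return np.maximum(0.0, z)

z = np.linspace(-5.0, 5.0, 11)
s = sigmoid(z)   # values strictly between 0 and 1
t = np.tanh(z)   # values strictly between -1 and 1
r = relu(z)      # values in [0, +inf)
```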

forward propagation
The process of taking the input feature x and passing it through the activation functions, layer by layer, to compute the final output prediction.
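A minimal sketch of forward propagation through a fully connected network, assuming sigmoid activations in every layer. The layer sizes, the random weights, and the function name `forward` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Return the activations of every layer, input included."""
    a = x
    activations = [a]
    for W, b in zip(weights, biases):
        z = W @ a + b        # weighted input of the next layer
        a = sigmoid(z)       # activation of the next layer
        activations.append(a)
    return activations

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]  # 3 -> 4 -> 2 units
biases = [np.zeros(4), np.zeros(2)]
x = np.array([0.5, -1.0, 2.0])

activations = forward(x, weights, biases)
prediction = activations[-1]
```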

backpropagation algorithm
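The whole NN likewise defines a loss function, and batch gradient descent is used to fit W and b. The only difference is that the partial derivatives with respect to $W^{(l)}_{ij}$ and $b^{(l)}_j$ are computed with the backpropagation algorithm. The overall idea is to first compute the difference between the final prediction and the true value, then work backward to compute each layer's contribution to that difference; each partial derivative follows from these contributions. Computing the partial derivatives this way is faster.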

Here the partial derivative is the sum of the partial derivatives of the individual training samples, so every evaluation of the partial derivative requires a scan over all training samples.
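A sketch of backpropagation for a network with one hidden layer, assuming sigmoid activations and a squared-error loss summed over the batch. All names, shapes, and data are hypothetical. The gradients are sums over all training samples, and one entry is checked against a numerical derivative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(params, X, Y):
    W1, b1, W2, b2 = params
    A1 = sigmoid(X @ W1.T + b1)
    A2 = sigmoid(A1 @ W2.T + b2)
    return 0.5 * np.sum((A2 - Y) ** 2)   # summed over all samples

def backprop(params, X, Y):
    W1, b1, W2, b2 = params
    A1 = sigmoid(X @ W1.T + b1)          # hidden activations, shape (m, h)
    A2 = sigmoid(A1 @ W2.T + b2)         # predictions, shape (m, k)
    D2 = (A2 - Y) * A2 * (1 - A2)        # output-layer error terms
    D1 = (D2 @ W2) * A1 * (1 - A1)       # error propagated back to the hidden layer
    # Matrix products and sums below add up the per-sample gradients
    return D1.T @ X, D1.sum(axis=0), D2.T @ A1, D2.sum(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Y = rng.uniform(size=(5, 2))
params = [rng.normal(size=(4, 3)), np.zeros(4), rng.normal(size=(2, 4)), np.zeros(2)]
gW1, gb1, gW2, gb2 = backprop(params, X, Y)

# Central-difference check of one weight
eps = 1e-5
plus = [p.copy() for p in params]
minus = [p.copy() for p in params]
plus[0][0, 0] += eps
minus[0][0, 0] -= eps
numeric = (loss(plus, X, Y) - loss(minus, X, Y)) / (2 * eps)
```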

Supervised CNN

feature extraction by convolution
When the dimensionality of the input samples is extremely large, we can first apply convolution to reduce the dimensionality, then apply pooling to reduce it further.

Compared with batch GD, SGD uses just a single training example or a small number of examples, called a “minibatch” (usually 256), for each update.

Note the terms “minibatch”, “epoch” (one iteration over the whole data set), and “shuffle”.

One final but important point regarding SGD is the order in which we present the data to the algorithm. If the data is given in some meaningful order, this can bias the gradient and lead to poor convergence. Generally a good method to avoid this is to randomly shuffle the data prior to each epoch of training.
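A minimal sketch of minibatch SGD with a fresh shuffle before every epoch, applied to a noise-free linear regression problem. The learning rate, batch size, epoch count, and data are assumptions chosen only for illustration.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.1, batch_size=4, epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    m = X.shape[0]
    for _ in range(epochs):
        order = rng.permutation(m)               # reshuffle before each epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ theta - y[idx]
            grad = X[idx].T @ residual / len(idx)  # gradient on this minibatch only
            theta -= lr * grad
    return theta

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(32), rng.normal(size=32)])
true_theta = np.array([1.0, -2.0])
y = X @ true_theta                                # no noise, so SGD can recover true_theta
theta = minibatch_sgd(X, y)
```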

If the objective has the form of a long shallow ravine leading to the optimum and steep walls on the sides, standard SGD will tend to oscillate across the narrow ravine since the negative gradient will point down one of the steep sides rather than along the ravine towards the optimum. The objectives of deep architectures have this form near local optima and thus standard SGD can lead to very slow convergence particularly after the initial steep gains. Momentum is one method for pushing the objective more quickly along the shallow ravine.

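In other words, if the objective function is a steep-walled ravine, each update easily jumps from one side of the ravine to the other, i.e. it descends while oscillating inside the ravine.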
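A toy sketch of the momentum update, v = gamma * v + lr * grad followed by theta = theta - v, on a quadratic "ravine" that is shallow along one axis and steep along the other. The curvatures, learning rate, gamma, and starting point are assumptions picked for this example.

```python
import numpy as np

CURVATURE = np.array([1.0, 50.0])   # shallow along x, steep along y

def objective(theta):
    return 0.5 * np.sum(CURVATURE * theta ** 2)

def gradient(theta):
    return CURVATURE * theta

def descend(theta0, lr, gamma, steps):
    theta = np.array(theta0, dtype=float)
    velocity = np.zeros_like(theta)
    for _ in range(steps):
        velocity = gamma * velocity + lr * gradient(theta)  # accumulate past gradients
        theta = theta - velocity
    return theta

start = [10.0, 1.0]
plain = descend(start, lr=0.035, gamma=0.0, steps=100)          # gamma = 0 is plain gradient descent
with_momentum = descend(start, lr=0.035, gamma=0.9, steps=100)

plain_obj = objective(plain)
momentum_obj = objective(with_momentum)
```

On this toy problem the run with momentum ends at a lower objective after the same number of steps, because the accumulated velocity keeps pushing along the shallow direction.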


filter / kernel: for example, the 8x8 patch used for convolution.

After convolution, we get a feature map.
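A small sketch of how an 8x8 filter produces a feature map, followed by mean pooling. The image size, the random values, and the pooling size are hypothetical. The loop computes a "valid" cross-correlation (no kernel flip), which is a simplifying assumption here.

```python
import numpy as np

def convolve_valid(image, kernel):
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the patch under the filter
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def mean_pool(feature_map, size):
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    # Average over non-overlapping size x size regions
    return trimmed.reshape(h, size, w, size).mean(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.normal(size=(23, 23))
kernel = rng.normal(size=(8, 8))

feature_map = convolve_valid(image, kernel)   # 23 - 8 + 1 = 16 per side
pooled = mean_pool(feature_map, 4)            # 16 / 4 = 4 per side
```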

A CNN consists of three parts:
convolutional layers;
subsampling layers, i.e. the pooling layers;
ordinary fully connected NN layers.


Compared with an ordinary NN, a CNN can use convolution and pooling to handle high-dimensional input data.

Sparse coding && PCA


unsupervised learning


Feature extraction with unsupervised learning, for when we don’t have training labels.

Statistical Language Modeling (SLM)

$p(w_1,\dots,w_T)$: the probability of a word sequence
probabilistic chain rule:
$p(w_1,\dots,w_T) = p(w_1)\prod_{i=2}^{T} p(w_i \mid w_1,\dots,w_{i-1}) = p(w_1)\prod_{i=2}^{T} p(w_i \mid h_i)$, where $h_i$ denotes the history of the $i$-th word $w_i$.
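A toy sketch of the chain rule for a word sequence. To keep it short, the history of each word is truncated to the previous word only (a bigram assumption that is not part of the notes), and every probability in the tables is made up.

```python
import math

# Hypothetical probabilities for a three-word vocabulary
first_word_prob = {"the": 0.5, "cat": 0.3, "sat": 0.2}
next_word_prob = {("the", "cat"): 0.6, ("cat", "sat"): 0.5}

def sentence_log_prob(words):
    # log p(w_1) + sum over i >= 2 of log p(w_i | history), history = previous word
    logp = math.log(first_word_prob[words[0]])
    for prev, cur in zip(words, words[1:]):
        logp += math.log(next_word_prob[(prev, cur)])
    return logp

prob = math.exp(sentence_log_prob(["the", "cat", "sat"]))  # 0.5 * 0.6 * 0.5
```

Summing log-probabilities instead of multiplying probabilities avoids underflow on long sequences.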



The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. But for many tasks that’s a very bad idea.

i.e. an RNN takes the dependency between training samples into account. In other words, the training samples have some dependency or sequential relationship among them.

The reference (WildML) provides a concrete implementation example.
several issues should be noted:

1. In the code for generating text:

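next_word_probs = model.forward_propagation(new_sentence)
# start from the unknown token so the loop below runs at least once
sampled_word = word_to_index[unknown_token]
# resample until the drawn word is not the unknown token
while sampled_word == word_to_index[unknown_token]:
    samples = np.random.multinomial(1, next_word_probs[-1])
    sampled_word = np.argmax(samples)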

So next_word_probs[-1] here actually denotes o[-1], which corresponds to the last word of the input x. samples is then a one-hot vector, and np.argmax retrieves its index.
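A standalone sketch of this sampling step, with a made-up probability vector standing in for next_word_probs[-1]. A single multinomial trial yields a one-hot vector, and np.argmax recovers the sampled index.

```python
import numpy as np

# Hypothetical distribution over a three-word vocabulary
probs = np.array([0.1, 0.7, 0.2])

samples = np.random.multinomial(1, probs)   # one trial: exactly one entry is 1
sampled_word = int(np.argmax(samples))      # index of that entry
```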


A type of RNN that can capture long-range dependencies.

It exists only to overcome the problem that a plain RNN cannot capture long-range dependencies.


loss function

least mean square
cross-entropy loss, i.e. log loss function or logistic loss
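The two losses can be sketched as follows. The function names, the clipping constant, and the example values are assumptions for illustration.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_onehot, probs, eps=1e-12):
    # Clip so that log(0) never occurs; average over the samples
    probs = np.clip(probs, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

mse = mean_squared_error(np.array([1.0, 2.0]), np.array([1.0, 4.0]))

y_onehot = np.array([[1.0, 0.0], [0.0, 1.0]])
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
ce = cross_entropy(y_onehot, probs)
```

With one-hot labels, the cross-entropy reduces to the average negative log-probability assigned to the true class.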

Backpropagation Alg.


Note that $\delta^{(l)}_i$ is the partial derivative of the total error with respect to $z^{(l)}_i$.
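For reference, the usual recurrences for these error terms can be written out as below. This assumes a squared-error loss $J = \tfrac{1}{2}\lVert y - a^{(n_l)}\rVert^2$ for one sample, an activation function $f$, activations $a^{(l)} = f(z^{(l)})$, and an output layer $n_l$; this notation is an assumption and is not spelled out in the notes.

```latex
\delta^{(n_l)}_i = \frac{\partial J}{\partial z^{(n_l)}_i}
                 = -\left(y_i - a^{(n_l)}_i\right) f'\!\left(z^{(n_l)}_i\right)

\delta^{(l)}_i = \left(\sum_{j} W^{(l)}_{ji}\, \delta^{(l+1)}_j\right) f'\!\left(z^{(l)}_i\right)

\frac{\partial J}{\partial W^{(l)}_{ij}} = a^{(l)}_j\, \delta^{(l+1)}_i,
\qquad
\frac{\partial J}{\partial b^{(l)}_i} = \delta^{(l+1)}_i
```

Here $W^{(l)}_{ij}$ connects unit $j$ in layer $l$ to unit $i$ in layer $l+1$, and $b^{(l)}_i$ is the bias feeding unit $i$ in layer $l+1$.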

computational graph
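forward-mode differentiation: each pass computes the partial derivative of the output with respect to only one input, so if the computational graph has many inputs, computing the partial derivatives this way is slow. It covers all the paths from one of the inputs to the output.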


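reverse-mode differentiation is faster: one pass computes the partial derivatives of the output with respect to all nodes, which is exactly backpropagation: $\partial Z / \partial \text{AllNode}$.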
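A hand-written sketch of both modes on a tiny computational graph, e = (a + b) * (b + 1). The graph and the function names are chosen only for illustration. Forward mode needs one pass per input, while reverse mode yields the derivatives for every node in one backward pass.

```python
def forward_mode(a, b, seed_a, seed_b):
    # Propagate (value, derivative with respect to the seeded input) together
    c, dc = a + b, seed_a + seed_b
    d, dd = b + 1.0, seed_b
    e, de = c * d, dc * d + c * dd   # product rule
    return e, de

def reverse_mode(a, b):
    # Forward pass: compute values
    c = a + b
    d = b + 1.0
    e = c * d
    # Backward pass: derivative of e with respect to each node, output first
    de_dc = d
    de_dd = c
    de_da = de_dc * 1.0                  # a reaches e only through c
    de_db = de_dc * 1.0 + de_dd * 1.0    # b reaches e through both c and d
    return e, {"a": de_da, "b": de_db, "c": de_dc, "d": de_dd}

# Forward mode: two passes to get both input derivatives
_, de_da_fwd = forward_mode(2.0, 1.0, seed_a=1.0, seed_b=0.0)
_, de_db_fwd = forward_mode(2.0, 1.0, seed_a=0.0, seed_b=1.0)

# Reverse mode: a single pass gives all of them
value, grads = reverse_mode(2.0, 1.0)
```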
