Some Deep Learning Papers I Have Read Recently

A Fast Learning Algorithm for Deep Belief Nets (2006)
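- The first paper to propose greedy layer-wise pretraining, which opened up deep learning as a research direction. Restricted Boltzmann Machines (RBMs) are pretrained one layer at a time and stacked into a Deep Belief Network (DBN); labels are included when training the top-level RBM. The whole DBN is then fine-tuned. On MNIST it shows no severe overfitting and reaches a lower test error than a standard Neural Network (NN).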

Reducing the Dimensionality of Data with Neural Networks (2006)
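- Proposes the deep autoencoder, published in Science as a dimensionality-reduction method. An autoencoder is a class of algorithms that learn to encode and decode the training data adaptively by minimizing, over a family of functions, the reconstruction error on the training set. The deep autoencoder model uses the Contrastive Divergence (CD) algorithm to train, layer by layer, RBMs that reconstruct their input, then stacks them and fine-tunes the whole network to minimize reconstruction error. As a nonlinear dimensionality-reduction method it clearly outperforms traditional methods in image and text experiments.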
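
To make the CD step concrete, here is a minimal sketch of a single CD-1 update for a binary RBM, written in plain Python. It is an illustration under stated assumptions, not the paper's code: the names `cd1_update`, `hidden_probs`, and `visible_probs` are hypothetical, the update handles one example at a time, and the reconstruction uses probabilities rather than binary samples.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    """p(h_j = 1 | v) for each hidden unit j; W[i][j] links visible i to hidden j."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    """p(v_i = 1 | h) for each visible unit i."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step on a single binary example v0; updates W, b, c in place.

    Returns the squared reconstruction error, useful only for monitoring.
    """
    # Positive phase: hidden probabilities and a binary sample given the data.
    ph0 = hidden_probs(v0, W, c)
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    # Negative phase: one Gibbs step down to the visibles and back up.
    pv1 = visible_probs(h0, W, b)
    ph1 = hidden_probs(pv1, W, c)
    # Update: data-driven statistics minus reconstruction-driven statistics.
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return sum((x - r) ** 2 for x, r in zip(v0, pv1))


if __name__ == "__main__":
    rng = random.Random(0)
    n_visible, n_hidden = 4, 2
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    b = [0.0] * n_visible
    c = [0.0] * n_hidden
    data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]  # toy patterns
    for epoch in range(200):
        err = sum(cd1_update(v, W, b, c, 0.1, rng) for v in data)
    print("reconstruction error in last epoch:", err)
```

In a deep autoencoder, the hidden probabilities of one trained RBM would serve as the input data for the next RBM before the stacked network is fine-tuned.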

Learning Deep Architectures for AI (2009)
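- Bengio's tutorial on deep learning. It covers everything in detail, from the research background through RBMs and CD to several deep learning algorithms, and has an extensive reference list. The drawback is that it is very long.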

A Practical Guide to Training Restricted Boltzmann Machines (2010)
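- A must-read if you want to implement deep learning algorithms yourself. I once tried writing my own implementation and the results were poor; only after reading this paper did I realize how many important details the implementation involves. Reading it alongside code available online also makes that code easier to understand.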

Greedy Layer-Wise Training of Deep Networks (2007)
- Some extensions to DBNs, such as handling real-valued inputs. Based on its experiments, it proposes an explanation for the performance of deep learning.

Why Does Unsupervised Pre-training Help Deep Learning? (2010)
- Summarizes two explanations for the effect of pretraining in deep learning, regularization and help with optimization, and designs experiments to test the role of each.

Autoencoders, Unsupervised Learning, and Deep Architectures (2011)
- An attempt at a unified theoretical analysis of different kinds of autoencoders.

On the Quantitative Analysis of Deep Belief Networks (2008)
- Uses annealed importance sampling (AIS) to estimate the partition function of an RBM, which makes it possible to estimate p(x) and to compare different DBNs.
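
As a rough reminder of why the partition function matters, the standard relations for a binary RBM are sketched below. The notation is an assumption made here, not taken from the paper: visible biases $b$, hidden biases $c$, weights $W_{ij}$ between visible unit $i$ and hidden unit $j$, a tractable base distribution $p_0$, and intermediate unnormalized distributions $p^*_k$ annealing toward the target $p^*_K$.

```latex
% Free energy of a binary RBM and the marginal it defines
F(v) = -\sum_i b_i v_i - \sum_j \log\bigl(1 + e^{\,c_j + \sum_i v_i W_{ij}}\bigr),
\qquad
p(v) = \frac{e^{-F(v)}}{Z},
\qquad
Z = \sum_{v} e^{-F(v)}

% AIS: importance weight of run m and the resulting estimate of Z_K / Z_0
w^{(m)} = \prod_{k=1}^{K} \frac{p^*_k\bigl(v^{(m)}_k\bigr)}{p^*_{k-1}\bigl(v^{(m)}_k\bigr)},
\qquad
\frac{Z_K}{Z_0} \approx \frac{1}{M} \sum_{m=1}^{M} w^{(m)}
```

Here $v^{(m)}_1$ is drawn from $p_0$ and each later $v^{(m)}_{k+1}$ comes from a transition that leaves $p_k$ invariant. Since $Z_0$ is known for the base distribution, the estimated ratio gives $Z_K$, and $\log p(v) = -F(v) - \log Z_K$ follows for any test vector.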

Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient (2008)
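- Proposes the persistent contrastive divergence (PCD) algorithm, which approximates the maximum likelihood estimation objective and can therefore yield a better generative model. The traditional CD algorithm does not take maximizing p(x) as its objective, and another paper proves that CD does not correspond to optimizing any objective function.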
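
The difference from CD can be sketched in a few lines: PCD keeps the negative-phase Gibbs chain alive across parameter updates instead of restarting it at the data each time. The code below is a simplified illustration, not the paper's implementation. It assumes a single persistent chain and one training example per update, and the names `pcd_update` and `gibbs_step` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    """p(h_j = 1 | v); W[i][j] links visible unit i to hidden unit j."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def gibbs_step(v, W, b, c, rng):
    """One full Gibbs step v -> h -> v', returning a binary visible sample."""
    h = [1.0 if rng.random() < p else 0.0 for p in hidden_probs(v, W, c)]
    pv = [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
          for i in range(len(b))]
    return [1.0 if rng.random() < p else 0.0 for p in pv]


def pcd_update(v_data, v_chain, W, b, c, lr, rng):
    """One PCD step; updates W, b, c in place and returns the new chain state."""
    # Positive phase: statistics driven by the training example.
    ph_data = hidden_probs(v_data, W, c)
    # Negative phase: advance the persistent chain; it is NOT reset to v_data.
    v_chain = gibbs_step(v_chain, W, b, c, rng)
    ph_chain = hidden_probs(v_chain, W, c)
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v_data[i] * ph_data[j] - v_chain[i] * ph_chain[j])
        b[i] += lr * (v_data[i] - v_chain[i])
    for j in range(len(c)):
        c[j] += lr * (ph_data[j] - ph_chain[j])
    return v_chain


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[rng.gauss(0.0, 0.01) for _ in range(2)] for _ in range(4)]
    b, c = [0.0] * 4, [0.0] * 2
    data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]  # toy patterns
    chain = [float(rng.random() < 0.5) for _ in range(4)]  # arbitrary start
    for step in range(400):
        chain = pcd_update(data[step % 2], chain, W, b, c, 0.05, rng)
    print("persistent chain state:", chain)
```

The caller has to pass the returned chain state back in on the next update; that carried-over state is the only structural change relative to a CD-1 step.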


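The papers above mainly come from the line of research by Geoffrey Hinton, Yoshua Bengio, and their colleagues in Canada, which builds deep architectures out of RBMs. The other line, represented by Yann LeCun, builds deep architectures out of (deep) convolutional networks, and also includes Andrew Ng's group. I do not know this line well yet. For computer vision problems, though, convolutional networks seem to be more biologically plausible and to carry a stronger model prior, so they generally perform somewhat better.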








