Deep Learning 23: Understanding Dropout, from Reading the Paper "Improving neural networks by preventing co-adaptation of feature detectors"

Background reading: Deep learning:四十一(Dropout简单理解), 深度学习(二十二)Dropout浅层理解与实现, and the paper "Improving neural networks by preventing co-adaptation of feature detectors".

There is not much to add to the theory; the two blog posts cited above already explain it clearly, so let's go straight to the experiments.

 

Notes

1. At test time, the "mean network" is used to compute the hidden-layer outputs: during the forward pass, before reaching the output layer, every hidden unit's output is halved (when the dropout ratio is p = 50%). The paper gives the reason:

At test time, we use the “mean network” that contains all of the hidden units but with their outgoing weights halved to compensate for the fact that twice as many of them are active.

In other words: at test time all hidden units are active, roughly twice as many as during training with 50% dropout, so each unit's outgoing weights are halved to compensate and keep the expected input to the next layer unchanged.

This compensation can equivalently be applied at training time, by scaling the surviving activations x up (dividing by the retain probability 1 - p, where p is the dropout fraction), or at test time, by scaling down (multiplying the weights, or equivalently the activations, by the retain probability).
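To make the "mean network" idea concrete, here is a minimal MATLAB sketch (the toy W, a and the sample count are illustrative, not taken from the original experiment): the Monte Carlo average of the next layer's pre-activation over many random dropout masks approaches the value obtained by simply scaling the activations by the retain probability.

        % sanity check: Monte Carlo average over dropout masks vs. the "mean network"
        rng(0);
        p_drop = 0.5;                          % dropout fraction
        W = randn(3, 5);  a = rand(5, 1);      % toy weights and hidden activations
        nSamples = 100000;
        z_avg = zeros(3, 1);
        for k = 1:nSamples
            mask = rand(size(a)) > p_drop;     % 1 = keep the unit
            z_avg = z_avg + W * (a .* mask);   % next layer's pre-activation under this mask
        end
        z_avg = z_avg / nSamples;
        z_mean = W * (a * (1 - p_drop));       % "mean network": activations scaled by the retain probability
        disp(max(abs(z_avg - z_mean)));        % close to 0, up to Monte Carlo noise

For a single linear layer this equality holds exactly in expectation; once nonlinearities are stacked, the mean network is only an approximation to averaging the outputs of all the dropout sub-networks, which is the approximation the paper argues works well in practice.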

 

2. One point in Deep learning:四十一(Dropout简单理解) is easy to misunderstand:

In the Deep learning:四十一(Dropout简单理解) experiment, nn.dropoutFraction, and in the 深度学习(二十二)Dropout浅层理解与实现 experiment, level, both denote the probability that a neuron is dropped, whereas the probability p in the paper "Dropout: A simple way to prevent neural networks from overfitting" denotes the probability that a neuron is present (i.e. retained, not dropped). That is, p = 1 - dropoutFraction = retain_prob = 1 - level. Without keeping this straight, it is easy to conclude that the code in Deep learning:四十一(Dropout简单理解) contradicts the paper, when in fact the two agree.

Accordingly, the paper "Dropout: A simple way to prevent neural networks from overfitting" points out:

After masking out some neurons as above (setting their activations to 0), the vector x1…x1000 still needs to be rescaled, i.e. multiplied by 1/(1 - p), where p here is the dropout level. If you do not rescale x1…x1000 after zeroing at training time, then you must rescale the weights at test time instead, multiplying them by the retain probability 1 - p.
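As a sketch of the two conventions (the names level, a and mask are illustrative): "inverted dropout" rescales the surviving activations by 1/(1 - level) during training so that nothing needs to change at test time, whereas the DeepLearnToolbox code below leaves training activations unscaled and multiplies by (1 - dropoutFraction) at test time. Both give the same expected activations.

        level = 0.5;                               % probability of dropping a unit
        a = rand(100, 1);                          % toy hidden-layer activations
        mask = rand(size(a)) > level;              % 1 = keep the unit

        % convention A: inverted dropout -- rescale during training, do nothing at test time
        a_train = (a .* mask) / (1 - level);       % E[a_train] equals a
        a_test  = a;

        % convention B (DeepLearnToolbox): no rescale during training, scale at test time
        b_train = a .* mask;
        b_test  = a * (1 - level);                 % same as multiplying the outgoing weights by p = 1 - level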

 

3. The paper says that at test time the weights are multiplied by the retain probability, so why does the code instead do the following:

        %dropout
        if(nn.dropoutFraction > 0)
            if(nn.testing)
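                % test time: use all hidden units but scale their activations by the retain probability (1 - dropoutFraction)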
                nn.a{i} = nn.a{i}.*(1 - nn.dropoutFraction);
            else
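                % training time: sample a binary mask (1 = keep) and zero out the dropped activations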
                nn.dropOutMask{i} = (rand(size(nn.a{i}))>nn.dropoutFraction);
                nn.a{i} = nn.a{i}.*nn.dropOutMask{i};
            end
        end

That is, why is the retain probability multiplied onto the activation values rather than onto the weights?

Answer: because the next layer's pre-activation is z = W*a + b.

From this formula, multiplying the weights W by p changes the product W*a in exactly the same way as multiplying the activations a by p, so scaling the activations at test time is equivalent to scaling the weights.
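A quick numerical check of this equivalence on toy values:

        p = 0.5;  W = randn(4, 6);  a = rand(6, 1);
        disp(max(abs((p * W) * a - W * (p * a))));   % 0 up to floating-point error: scaling W or a by p is the same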

 

4. In the Deep learning:四十一(Dropout简单理解) experiment, what does d denote in the code below?

Code from the back-propagation function nnbp.m:

        if(nn.dropoutFraction>0)
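            % apply the same dropout mask to the back-propagated deltas so dropped units get no gradient; the column of ones is for the bias unit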
            d{i} = d{i} .* [ones(size(d{i},1),1) nn.dropOutMask{i}];
        end

Answer: d is the error (delta) term back-propagated through each layer. The same dropout mask is applied to it so that dropped units receive no gradient; the prepended column of ones corresponds to the bias unit, which is never dropped.

 

5. Advantages and disadvantages of dropout

Answer:

Advantage: it helps prevent overfitting, which is especially useful when the training data is limited.

Disadvantage: it makes training take longer, although it does not affect test time.

 

 

 

Some MATLAB functions

1. A plain-language explanation of using rng in MATLAB to replace rand('seed',sd), randn('seed',sd) and rand('state',sd)
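For example (sd is an arbitrary seed and the mask size is illustrative), seeding with rng makes the dropout masks drawn by rand reproducible, just as the legacy calls did:

        sd = 1;
        % legacy style (deprecated): rand('seed', sd);  randn('seed', sd);  rand('state', sd);
        rng(sd);                        % modern, recommended way to seed the generator
        mask1 = rand(1, 5) > 0.5;
        rng(sd);                        % reset to the same seed
        mask2 = rand(1, 5) > 0.5;
        isequal(mask1, mask2)           % returns 1 (true): same seed, same mask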

 

Experiments

My experiment repeats the one in Deep learning:四十一(Dropout简单理解) and gives the same results; see that post for the details.
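For reference, here is a sketch of what such an experiment looks like with DeepLearnToolbox. It assumes the toolbox's nnsetup/nntrain/nntest interface and MNIST data already loaded as train_x/train_y/test_x/test_y (inputs scaled to [0,1], labels one-hot); the hyperparameters are illustrative and may differ from the original post.

        rng(0);                             % fix the random seed so the dropout masks are reproducible
        nn = nnsetup([784 100 10]);         % 784-100-10 network for MNIST
        nn.dropoutFraction = 0.5;           % drop 50% of hidden units during training
        opts.numepochs = 10;
        opts.batchsize = 100;
        nn = nntrain(nn, train_x, train_y, opts);
        [er, bad] = nntest(nn, test_x, test_y);
        disp(er);                           % classification error rate on the test set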

 

 

 

References:

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. "Dropout: A simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15 (2014): 1929-1958.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.

Srivastava, N. "Improving Neural Networks with Dropout." Master's thesis, University of Toronto, 2013.

Reposted from: https://www.cnblogs.com/dmzhuo/p/5644790.html
