Deep Learning Study Notes (2)

1. Deep Learning Steps

1) Define the function set (the network architecture); define the goodness of a function (the loss); pick the best function
——this gives the trained neural network
2) Check performance on the training set: good results?
——if not: back to step 1); this is not overfitting
——if yes: go to step 3)
Note: compared with k-nearest neighbors or decision trees, deep learning is less prone to this kind of effortless training-set fit (k-NN and decision trees essentially always reach near-100% training accuracy, which is bound to overfit). A deep network can easily fail to get good results even on the training set, so steps 1) and 2) may have to be repeated many times.
3) If the testing-set result is not good
——back to step 1); this is overfitting (training good, testing not good)
Note: a low testing accuracy alone does not prove overfitting; you must also check the training set.
4) If the testing-set result is good
——done (a minimal training-loop sketch of this recipe follows below)
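A minimal sketch of the recipe above, assuming PyTorch; the random placeholder data and all layer sizes are illustrative assumptions, not from the original notes.

```python
import torch
import torch.nn as nn

# Placeholder data standing in for a real train/test split.
X_train, y_train = torch.randn(512, 20), torch.randint(0, 2, (512,))
X_test,  y_test  = torch.randn(128, 20), torch.randint(0, 2, (128,))

# Step 1): define the function set (the network), the goodness of a function
# (the loss), and pick the best function by gradient descent.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def accuracy(X, y):
    with torch.no_grad():
        return (model(X).argmax(dim=1) == y).float().mean().item()

for epoch in range(100):
    optimizer.zero_grad()
    loss_fn(model(X_train), y_train).backward()
    optimizer.step()

# Step 2): check the training set first; poor accuracy here means the network
# is not trained well, which is not overfitting.
print("train acc:", accuracy(X_train, y_train))

# Steps 3)-4): only when training accuracy is good does a poor testing
# accuracy indicate overfitting.
print("test acc:", accuracy(X_test, y_test))
```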

2. Steps 2-4 involve improving the model, but different problems call for different techniques

Problem types:

1) Improving training-set performance (raising the fit)
Techniques: choose a different activation function (sigmoid, ReLU, maxout, etc.); adaptive learning rate
2) Improving testing-set performance (fixing overfitting)
Techniques: dropout; early stopping; regularization (see the sketch below)
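A minimal sketch of two of the testing-set remedies above, regularization and early stopping, assuming PyTorch; the random data, the validation split, and all hyperparameters are placeholder assumptions.

```python
import torch
import torch.nn as nn

X_train, y_train = torch.randn(512, 20), torch.randint(0, 2, (512,))
X_val,   y_val   = torch.randn(128, 20), torch.randint(0, 2, (128,))

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
# Regularization: weight_decay adds an L2 penalty on the weights.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

best_val, bad_epochs, patience = float("inf"), 0, 5
for epoch in range(200):
    optimizer.zero_grad()
    loss_fn(model(X_train), y_train).backward()
    optimizer.step()

    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()
    # Early stopping: stop once the held-out loss has not improved
    # for `patience` consecutive epochs.
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```

Dropout, the third remedy in the list, is sketched in the next subsection.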

Dropout (used to address overfitting: training OK, testing not good)
During the forward pass, each neuron's activation is dropped (stops working) with some probability p. This makes the model generalize better, because it can no longer depend too heavily on any local set of features; randomly deactivating neurons is one way of preventing overfitting.
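A minimal sketch of (inverted) dropout in the forward pass, assuming NumPy; p follows the notes above and is the probability of dropping a unit.

```python
import numpy as np

def dropout_forward(a, p=0.5, training=True):
    """Zero each activation with probability p during training."""
    if not training:
        return a                              # identity at test time
    mask = (np.random.rand(*a.shape) >= p)    # keep each unit with prob 1 - p
    # Scale the survivors by 1/(1-p) so the expected activation matches
    # the test-time (no-dropout) behaviour.
    return a * mask / (1.0 - p)

# Example: roughly half the activations are zeroed at training time.
a = np.ones((2, 8))
print(dropout_forward(a, p=0.5, training=True))
print(dropout_forward(a, p=0.5, training=False))
```

In PyTorch the same behaviour is provided by nn.Dropout(p), which is only active while the model is in train() mode.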

3. With sigmoid as the activation function, more layers do not mean better results

[Figure: training-set accuracy versus the number of layers, with sigmoid activations]
But this is not a case of overfitting when there are more layers (the blue curve is the training-set result, not the testing-set result); the deeper network is simply not well trained.

Reasons:

1) Vanishing gradients
2) Exploding gradients

4. The Vanishing Gradient Problem

According to the backpropagation algorithm, with a single fixed learning rate the earlier layers receive small gradients while the later layers receive large gradients. The earlier layers are still descending slowly from their (nearly random) initialization when the later layers have already converged, so training ends up stuck at a poor local minimum.
Therefore, the small gradients of the earlier layers are the cause of this problem.
A change to the input has an influence that decays progressively as it passes through the layers, because each sigmoid squashes its output into a bounded range.
This is why, in practice, ReLU and its variants are usually chosen as the activation function for the hidden layers.
(An older remedy is layer-wise training: training the network one layer at a time.)
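A small numerical sketch of the effect described above, assuming PyTorch: a chain of sigmoid layers with default random initialization; the depth and layer width are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layers = [nn.Sequential(nn.Linear(50, 50), nn.Sigmoid()) for _ in range(10)]
model = nn.Sequential(*layers)

x = torch.randn(32, 50)
model(x).sum().backward()

for i, layer in enumerate(layers):
    grad_norm = layer[0].weight.grad.norm().item()
    print(f"layer {i:2d} weight-grad norm: {grad_norm:.3e}")
# Expected pattern: the norms shrink by roughly a constant factor per layer
# going backwards, so the earliest layers receive gradients that are orders
# of magnitude smaller than those of the last layers.
```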

5. ReLU (Rectified Linear Unit)

How it addresses the small gradients in the earlier layers:
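A minimal NumPy sketch of the relevant derivative property: the sigmoid derivative is at most 0.25, so each extra sigmoid layer multiplies the backpropagated signal by a small factor, while the ReLU derivative is exactly 1 for any active (positive) unit, so gradients pass through active ReLU units without being attenuated. The sample inputs below are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # never larger than 0.25 (at z = 0)

def relu(z):
    return np.maximum(z, 0.0)

def relu_grad(z):
    return (z > 0).astype(float)    # exactly 1 wherever the unit is active

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("sigmoid'(z):", sigmoid_grad(z))   # all <= 0.25
print("relu'(z):   ", relu_grad(z))      # 0 or 1
```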
