Lecture 6: Training Neural Networks, Part I

CS231n

Review

Recapping earlier lectures: we have covered backpropagation for training neural networks and the architecture of CNNs, so a CNN can be trained with backpropagation. Concretely, each training iteration is (see the minimal sketch after this list):
1. Sample a mini-batch of data
2. Forward-propagate it through the network to compute the loss
3. Backpropagate the gradient of the loss
4. Update the parameters using the gradients
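
A minimal self-contained numpy sketch of these four steps, using a linear softmax classifier in place of a CNN purely for illustration:

```python
import numpy as np

def sample_minibatch(X, y, batch_size):
    idx = np.random.choice(X.shape[0], batch_size)      # 1. sample a mini-batch
    return X[idx], y[idx]

def softmax_loss_and_grad(W, X, y):
    scores = X.dot(W)                                    # 2. forward pass for the loss
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    N = X.shape[0]
    loss = -np.log(probs[np.arange(N), y]).mean()
    dscores = probs
    dscores[np.arange(N), y] -= 1                        # 3. backpropagate the gradient
    dW = X.T.dot(dscores) / N
    return loss, dW

def train(X, y, num_classes, learning_rate=1e-3, num_iters=1000, batch_size=256):
    W = 0.001 * np.random.randn(X.shape[1], num_classes)
    for _ in range(num_iters):
        Xb, yb = sample_minibatch(X, y, batch_size)
        loss, dW = softmax_loss_and_grad(W, Xb, yb)
        W -= learning_rate * dW                          # 4. gradient-descent parameter update
    return W
```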

Training Neural Networks

Activation Functions

sigmoid, leaky ReLU, tanh, maxout, ReLU, ELU, …
Traditional neural networks used the sigmoid, but it has serious problems:
1. Saturated neurons kill the gradient (the same holds for tanh)
2. Its outputs are not zero-centered
3. exp() is somewhat expensive to compute
ReLU, by contrast, has clear advantages:
1. It does not saturate (in the positive region)
2. It is very cheap to compute
3. It converges much faster in practice
4. It is more plausible from a neurobiological point of view
Minor flaws: its output is not zero-centered, and it does not activate for $x < 0$
There are also leaky ReLU and other variants; they are used less often, so we only sketch them briefly in the code below.
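
For reference, a minimal numpy sketch of these activations (the leaky-ReLU slope alpha = 0.01 is an assumed default, not specified in the lecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # saturates for large |x|; output in (0, 1), not zero-centered

def tanh(x):
    return np.tanh(x)                      # zero-centered, but still saturates

def relu(x):
    return np.maximum(0, x)                # cheap; no saturation for x > 0, but no gradient for x < 0

def leaky_relu(x, alpha=0.01):             # alpha = 0.01 is an assumed default slope
    return np.where(x > 0, x, alpha * x)
```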

Data Preprocessing

Zero-centering and normalization: $y = \dfrac{x - \mu}{\sigma}$
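
A sketch of per-feature zero-centering and normalization in numpy, assuming X is an [N x D] data matrix:

```python
import numpy as np

def preprocess(X, eps=1e-8):
    mu = X.mean(axis=0)                    # per-feature mean
    sigma = X.std(axis=0)                  # per-feature standard deviation
    return (X - mu) / (sigma + eps)        # eps guards against constant features
```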

Weight Initialization

Plain Gaussian initialization with small random numbers does not work for deep networks.
Xavier initialization: $w \sim N(\mu, \sigma^2)$ with $w \in \mathbb{R}^{fan_{in} \times fan_{out}}$, $\mu = 0$, $\sigma = \frac{1}{\sqrt{fan_{in}}}$; it does not work well with ReLU.
He et al.: $w \sim N(\mu, \sigma^2)$ with $w \in \mathbb{R}^{fan_{in} \times fan_{out}}$, $\mu = 0$, $\sigma = \sqrt{\frac{2}{fan_{in}}}$.
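
In numpy the two recipes amount to the following sketch, where fan_in and fan_out are the layer's input and output dimensions:

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    # Xavier: std = 1/sqrt(fan_in); suited to tanh-like units, too small for ReLU
    return np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)

def he_init(fan_in, fan_out):
    # He et al.: std = sqrt(2/fan_in); compensates for ReLU zeroing half its inputs
    return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)
```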

Batch Normalization

With batch normalization, plain Gaussian initialization works fine.

$\hat{x} = \dfrac{x - E(x)}{\sqrt{Var(x)}}$

It is usually inserted between the CONV/FC layer and the ReLU, i.e. [CONV + BN + ReLU + pool] or [FC + BN + ReLU + pool].
Problem: do we necessarily want a unit gaussian input to a tanh layer?
A: Most of the probability mass of $N(0, 1)$ lies in $[-3, 3]$, and since $\tanh(3) = 0.995$ and $\tanh(2) = 0.964$, the gradient is already close to vanishing there; it is therefore better to shrink $\sigma$ further.
In actual use, the batch statistics are recomputed at every iteration, and a learned scale and shift are applied:

$\hat{x} = \dfrac{x - \mu}{\sqrt{\sigma^2 + \epsilon}}, \qquad y = \gamma \hat{x} + \beta$
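
A sketch of the training-time forward pass for one mini-batch (gamma and beta are the learned scale and shift; the running statistics used at test time are omitted):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: [N, D] mini-batch; normalize each feature over the batch dimension
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # approximately unit-gaussian per feature
    return gamma * x_hat + beta             # learned scale and shift
```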

Benefits

  • Improves gradient flow through the network
  • Allows higher learning rates
  • Reduces the strong dependence on initialization
  • Acts as a form of regularization in a funny way, and slightly reduces the need for dropout, maybe
Babysitting the Learning Process
  1. Preprocess the data
  2. Choose the architecture
  3. Double check that the loss is reasonable (see the sanity-check sketch after this list)
  4. Try training. Make sure that you can overfit a very small portion of the training data; start with small regularization and find a learning rate that makes the loss go down
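
For example, a softmax classifier over C classes with near-zero weights and no regularization should start at a loss of about $-\log(1/C)$; a self-contained sketch of that check on synthetic data:

```python
import numpy as np

# Sanity check: with near-zero weights and reg = 0, the softmax loss over
# C classes should start near -log(1/C) (~2.303 for C = 10).
N, D, C = 100, 3072, 10
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)
W = 0.0001 * np.random.randn(D, C)                       # small random initialization

scores = X.dot(W)
scores -= scores.max(axis=1, keepdims=True)              # numerical stability
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(N), y]).mean()
print('initial loss %.3f, expected ~%.3f' % (loss, -np.log(1.0 / C)))
```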
Hyperparameter Optimization

Cross-validate in stages: first a coarse search over wide ranges, then a finer search around the best results.
If the cost is ever > 3 * original cost, break out early
It's best to optimize hyperparameters such as the learning rate in log space (see the random-search sketch below).
Q: But this best cross-validation result is worrying. Why?
A: A big gap between training accuracy and testing accuracy means we are overfitting.
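
A sketch of the coarse random search in log space; the sampling ranges and the `train_and_eval` helper are assumptions for illustration, not part of the lecture:

```python
import numpy as np

results = []
for _ in range(100):
    lr  = 10 ** np.random.uniform(-6, -1)     # learning rate sampled log-uniformly
    reg = 10 ** np.random.uniform(-5, 0)      # regularization strength, also log-uniform
    val_acc = train_and_eval(lr=lr, reg=reg, num_epochs=1)   # hypothetical helper; short, coarse runs first
    results.append((val_acc, lr, reg))
results.sort(reverse=True)                     # then zoom in around the best region for the fine stage
```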
