CS231n Lecture 7: Training Neural Networks, Part 2

Optimization

SGD

w -= lr * grad

  • The loss function can have a high condition number: the ratio of the largest to smallest singular value of its Hessian matrix is large. Progress is then very slow along the shallow dimensions and jitters along the steep ones.
  • Easily gets stuck at local minima or saddle points and cannot escape
  • Sensitive to noise from the minibatch gradient
SGD + Momentum
v = rho * v + grad
x -= lr * v
  • Build up “velocity” as a running mean of gradients
  • Rho gives “friction”; typically rho=0.9 or 0.99
    Momentum can carry the update past saddle points and local minima, and smooths out minibatch noise
Nesterov Momentum
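The lecture gives no code for this variant; here is a minimal NumPy sketch of Nesterov momentum using the common "look-ahead" reformulation (the toy quadratic loss and the hyperparameter values are illustrative assumptions, not from the lecture):

import numpy as np

A = np.diag([1.0, 10.0])        # ill-conditioned quadratic loss f(x) = 0.5 * x^T A x
x = np.array([1.0, 1.0])
v = np.zeros_like(x)
rho, lr = 0.9, 0.02             # friction and learning rate

for _ in range(200):
    grad = A @ x                          # gradient of the toy loss at the current x
    old_v = v
    v = rho * v - lr * grad               # velocity update
    x += -rho * old_v + (1 + rho) * v     # "look-ahead" (Nesterov) correction

print(x)                                  # approaches the minimum at [0, 0]

Compared with plain momentum above, the gradient is effectively evaluated at the point the velocity is about to carry us to, which reduces overshoot.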
AdaGrad
grad_squared += grad * grad
x -= lr * grad / (sqrt(grad_squared) + epsilon)

Added element-wise scaling of the gradient based on the historical sum of squares in each dimension
Q: What happens with AdaGrad?
A: Each dimension's step is divided by the square root of its accumulated squared gradients, so progress along steep directions is damped and progress along flat directions is accelerated
Q2: What happens to the step size over long time?
A: The step size decays toward zero, because the sum of squared gradients only grows; RMSProp fixes this with a decaying (leaky) average

RMSProp
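RMSProp is a "leaky" AdaGrad: it keeps a decaying running average of squared gradients instead of an ever-growing sum, so the step size does not shrink to zero. A minimal NumPy sketch on a toy quadratic loss (the loss and hyperparameter values are illustrative assumptions):

import numpy as np

A = np.diag([1.0, 10.0])
x = np.array([1.0, 1.0])
grad_squared = np.zeros_like(x)
lr, decay_rate, eps = 1e-2, 0.99, 1e-7

for _ in range(500):
    grad = A @ x
    grad_squared = decay_rate * grad_squared + (1 - decay_rate) * grad * grad
    x -= lr * grad / (np.sqrt(grad_squared) + eps)   # per-dimension step scaling

print(x)                                             # approaches the minimum at [0, 0]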
Adam

Q: What happens at first timestep?
A: Initially m1 = m2 = 0. Without bias correction the first update would be dx = lr * (1 - beta1) * grad / (sqrt((1 - beta2) * grad * grad) + epsilon), which can be a very large step; Adam's bias-correction terms (dividing m1 by 1 - beta1^t and m2 by 1 - beta2^t) prevent this.
Q: Which one of these learning rates is best to use?
A: Use learning rate decay: start with a relatively large rate and decay it over time (e.g. step or exponential decay). Decay is commonly used with SGD+Momentum and less commonly with Adam.

beta1 = 0.9, beta2 = 0.999, and learning_rate = 1e-3 or 5e-4 is a great starting point for many models!
In practice, Adam is a good default choice in most cases
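A minimal NumPy sketch of full-form Adam with bias correction, matching the update discussed in the Q&A above (the toy quadratic loss is an illustrative assumption):

import numpy as np

A = np.diag([1.0, 10.0])
x = np.array([1.0, 1.0])
m1 = np.zeros_like(x)                    # first moment (momentum-like)
m2 = np.zeros_like(x)                    # second moment (RMSProp-like)
lr, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8

for t in range(1, 5001):
    grad = A @ x
    m1 = beta1 * m1 + (1 - beta1) * grad             # running mean of gradients
    m2 = beta2 * m2 + (1 - beta2) * grad * grad      # running mean of squared gradients
    m1_hat = m1 / (1 - beta1 ** t)                   # bias correction: avoids the huge
    m2_hat = m2 / (1 - beta2 ** t)                   # first steps discussed above
    x -= lr * m1_hat / (np.sqrt(m2_hat) + eps)

print(x)                                             # approaches the minimum at [0, 0]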

Model Ensembles

  1. Train multiple independent models
  2. At test time average their results
    (typically about 2% extra performance)

Instead of training independent models, use multiple snapshots of a single model during training!
Polyak averaging: Instead of using actual parameter vector, keep a moving average of the parameter vector and use that at test time
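A minimal sketch of Polyak averaging as an exponential moving average of the weights (the decay value and the toy parameter dict are illustrative assumptions):

import numpy as np

params = {"W": np.random.randn(3, 3)}                    # toy "model" with one weight matrix
ema_params = {k: v.copy() for k, v in params.items()}    # averaged copy, initialized from current weights
ema_decay = 0.999

def update_ema(params, ema_params, decay=ema_decay):
    # blend the running average toward the current weights after every optimizer step
    for k in params:
        ema_params[k] = decay * ema_params[k] + (1 - decay) * params[k]

# In a training loop: call update_ema(params, ema_params) after each update,
# and evaluate the test-time model with ema_params instead of params.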
These tricks seem to be rarely used in practice.
How to improve single-model performance? Regularization, e.g. Dropout.

Regularization

  • Data Augmentation: Horizontal Flips, Random crops and scales, Color Jitter, …
  • DropConnect
  • Fractional Max Pooling
  • Stochastic Depth
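A minimal NumPy sketch of the data augmentation item above, combining a random crop (after zero padding) with a random horizontal flip; the pad and crop sizes are illustrative assumptions:

import numpy as np

def augment(img, crop=32, pad=4):
    # img: H x W x C array; zero-pad, take a random crop, then maybe flip
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = np.random.randint(0, 2 * pad + 1)
    left = np.random.randint(0, 2 * pad + 1)
    out = padded[top:top + crop, left:left + crop, :]
    if np.random.rand() < 0.5:                       # horizontal flip with probability 0.5
        out = out[:, ::-1, :]
    return out

img = np.random.rand(32, 32, 3)                      # e.g. a CIFAR-10 sized image
print(augment(img).shape)                            # (32, 32, 3)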

Dropout

  • Forces the network to have a redundant representation;
  • Prevents co-adaptation of features
  • Can be interpreted as training a large ensemble of models (that share parameters)

At test time, do not drop units; instead multiply the activations by the dropout probability so their expected value matches training.
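A minimal NumPy sketch of this train/test behavior, with p used as the probability of keeping a unit (p = 0.5 is the usual default; the toy activation shape is an assumption):

import numpy as np

p = 0.5                                      # probability of keeping a unit

def dropout_train(x):
    mask = np.random.rand(*x.shape) < p      # randomly keep each unit with probability p
    return x * mask                          # dropped units contribute zero

def dropout_test(x):
    return x * p                             # scale by p so expected activations match training

h = np.random.randn(4, 10)
print(dropout_train(h).shape, dropout_test(h).shape)   # both (4, 10)

The equivalent "inverted dropout" variant divides by p at training time instead, so the test-time forward pass needs no extra multiplication; this is what most libraries implement.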

Transfer Learning

With CNNs, transfer learning is the norm, not the exception.
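The lecture does not prescribe a framework; a common recipe, sketched here in PyTorch/torchvision as an illustrative assumption, is to take an ImageNet-pretrained model, freeze the convolutional features, and retrain only a new final classifier on the small target dataset:

import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="DEFAULT")         # ImageNet-pretrained backbone
for param in model.parameters():
    param.requires_grad = False                    # freeze all pretrained weights

num_classes = 10                                   # label count of the new dataset (assumed)
model.fc = nn.Linear(model.fc.in_features, num_classes)   # new, trainable classifier head

# With more target data, fine-tune instead: unfreeze some or all layers and
# train them with a smaller learning rate than the new head.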
