Optimization algorithms ----- Week 2

  1. Batch vs. mini-batch gradient descent
    (1) Split the training set into 5000 mini-batches of 1000 examples each, then loop over them (a minimal NumPy sketch follows below):
    for t = 1, …, 5000:
    Forward prop on x^{t}:
    Z^[1] = W^[1] x^{t} + b^[1]
    A^[1] = g^[1](Z^[1])
    …
    A^[L] = g^[L](Z^[L])
    (2) Compute the cost on the mini-batch:
    J^{t} = (1/1000) Σ_{i=1}^{1000} L(ŷ^(i), y^(i)) + (λ/(2·1000)) Σ_l ‖W^[l]‖_F^2
    (3) Backprop to compute the gradients of J^{t} (using (x^{t}, y^{t})), then update each layer:
    W^[l] = W^[l] − α dW^[l]
    b^[l] = b^[l] − α db^[l]
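A minimal NumPy sketch of one epoch of this loop, using a single-layer sigmoid model for brevity; the function name, shapes, and model are illustrative assumptions rather than the course's code:

```python
import numpy as np

def minibatch_gd_epoch(X, Y, W, b, alpha=0.01, batch_size=1000):
    """One epoch of mini-batch gradient descent on a simple sigmoid model.

    X: (n_x, m) inputs, Y: (1, m) labels, W: (1, n_x), b: scalar.
    The model and shapes are illustrative assumptions, not the lecture's exact network.
    """
    m = X.shape[1]
    for t in range(0, m, batch_size):
        X_t, Y_t = X[:, t:t + batch_size], Y[:, t:t + batch_size]  # mini-batch {t}
        m_t = X_t.shape[1]
        # Forward prop on x^{t}
        Z = W @ X_t + b
        A = 1.0 / (1.0 + np.exp(-Z))          # sigmoid activation
        # Backprop to compute the gradients of the mini-batch cost J^{t}
        dZ = A - Y_t
        dW = (dZ @ X_t.T) / m_t
        db = np.sum(dZ) / m_t
        # Gradient descent update
        W = W - alpha * dW
        b = b - alpha * db
    return W, b
```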
  2. Choosing mini-batch size
    (1) If mini-batch size = m (the size of the training set) → Batch gradient descent. (With a large training set, each iteration takes a very long time.)
    (2) If mini-batch size = 1 → Stochastic gradient descent: every example is its own mini-batch. (Very noisy, and it never converges; it keeps oscillating around the minimum.)
    (3) Choose something in between (a mini-batch size that is neither too big nor too small).
  3. Some guidelines for choosing your mini-batch size:
    (1) If the training set is small (m ≤ 2000): use batch gradient descent.
    (2) Typical mini-batch sizes: 64, 128, 256, 512 (powers of 2, i.e. 2^n, reportedly run faster).
    (3) Make sure each mini-batch x^{t}, y^{t} fits in CPU/GPU memory (see the shuffling-and-partitioning sketch below).
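A small sketch of how such mini-batches might be built, shuffling the data and partitioning it into power-of-2 chunks; the function name and the (n_x, m) data layout are assumptions for illustration:

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle the training set and partition it into mini-batches of `batch_size`
    (typically a power of 2: 64, 128, 256, 512). X: (n_x, m), Y: (1, m)."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]
    mini_batches = []
    for t in range(0, m, batch_size):
        mini_batches.append((X_shuf[:, t:t + batch_size], Y_shuf[:, t:t + batch_size]))
    return mini_batches
```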
  4. Exponentially weighted moving averages
    V_t = β V_{t−1} + (1−β) θ_t
    β = 0.9: averages over roughly 1/(1−β) = 10 days' temperature.
    β = 0.98: averages over roughly 1/(1−β) = 50 days' temperature.
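A brief sketch of the recurrence applied to a series of daily temperatures; the function name is a placeholder:

```python
def exp_weighted_average(thetas, beta=0.9):
    """Exponentially weighted moving average V_t = beta*V_{t-1} + (1-beta)*theta_t.
    With beta = 0.9 this averages over roughly 1/(1-beta) = 10 days."""
    v = 0.0
    averages = []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta
        averages.append(v)
    return averages
```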
  5. Bias correction in exponentially weighted averages
    V_t = β V_{t−1} + (1−β) θ_t
    Use V_t / (1 − β^t) in place of V_t, so the early estimates (which start from V_0 = 0) are not biased toward zero.
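The same recurrence with the correction applied, again as an illustrative sketch:

```python
def exp_weighted_average_corrected(thetas, beta=0.9):
    """Same recurrence, but return V_t / (1 - beta^t) so the early estimates
    are not biased toward zero (the running average starts at V_0 = 0)."""
    v = 0.0
    corrected = []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1 - beta) * theta
        corrected.append(v / (1 - beta ** t))
    return corrected
```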
  6. Gradient descent with momentum:
    (1) Compute dW, db on the current mini-batch.
    V_dW = β V_dW + (1−β) dW
    V_db = β V_db + (1−β) db
    (The same form as the general average V_θ = β V_θ + (1−β) θ_t.)
    (2) Update W, b:
    W = W − α V_dW
    b = b − α V_db
    This damps the vertical oscillations of gradient descent while keeping the horizontal movement fast, so the descent toward the minimum proceeds more quickly.
    (3) Implementation details (see the sketch below):
    Initialize V_dW = 0, V_db = 0.
    On iteration t:
    Compute dW, db on the current mini-batch.
    V_dW = β V_dW + (1−β) dW
    V_db = β V_db + (1−β) db
    W = W − α V_dW, b = b − α V_db
    Hyperparameters: α, β; β = 0.9 is a common default.
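A compact sketch of one momentum update step; the helper name and argument order are assumptions:

```python
def momentum_step(W, b, dW, db, v_dW, v_db, alpha=0.01, beta=0.9):
    """One gradient-descent-with-momentum step.
    v_dW and v_db should start as zeros with the same shapes as W and b."""
    # exponentially weighted averages of the gradients
    v_dW = beta * v_dW + (1 - beta) * dW
    v_db = beta * v_db + (1 - beta) * db
    # move in the direction of the smoothed gradient
    W = W - alpha * v_dW
    b = b - alpha * v_db
    return W, b, v_dW, v_db
```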
    (4) RMSprop (root mean square prop)
    On iteration t:
    Compute dW, db on the current mini-batch.
    S_dW = β S_dW + (1−β) (dW)^2   ((dW)^2 is element-wise)
    S_db = β S_db + (1−β) (db)^2
    Update:
    W = W − α dW / (√S_dW + ε)
    b = b − α db / (√S_db + ε)
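A corresponding sketch of one RMSprop step; the helper name and default values are illustrative assumptions:

```python
import numpy as np

def rmsprop_step(W, b, dW, db, s_dW, s_db, alpha=0.001, beta=0.9, eps=1e-8):
    """One RMSprop step. The element-wise square dW**2 feeds a moving average of
    the squared gradients; dividing by its square root shrinks the step along
    coordinates whose gradients oscillate with large magnitude."""
    s_dW = beta * s_dW + (1 - beta) * dW ** 2
    s_db = beta * s_db + (1 - beta) * db ** 2
    W = W - alpha * dW / (np.sqrt(s_dW) + eps)
    b = b - alpha * db / (np.sqrt(s_db) + eps)
    return W, b, s_dW, s_db
```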
    (5) Adam optimization algorithm
    Initialize V_dW = 0, S_dW = 0, V_db = 0, S_db = 0.
    On iteration t:
    Compute dW, db using the current mini-batch (the mini-batch gradient).
    “momentum”:
    V_dW = β_1 V_dW + (1−β_1) dW,  V_db = β_1 V_db + (1−β_1) db
    “RMSprop”:
    S_dW = β_2 S_dW + (1−β_2) (dW)^2,  S_db = β_2 S_db + (1−β_2) (db)^2
    Bias correction:
    V_dW^corrected = V_dW / (1−β_1^t),  V_db^corrected = V_db / (1−β_1^t)
    S_dW^corrected = S_dW / (1−β_2^t),  S_db^corrected = S_db / (1−β_2^t)
    Update:
    W = W − α V_dW^corrected / (√(S_dW^corrected) + ε)
    b = b − α V_db^corrected / (√(S_db^corrected) + ε)
    Hyperparameter choices:
    α: needs to be tuned
    β_1: 0.9 (for the moving average of dW)
    β_2: 0.999 (for the moving average of (dW)^2, element-wise)
    ε: 10^−8
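Putting the pieces together, a sketch of one Adam step at iteration t; the helper name and the default α are assumptions (the notes leave α to be tuned):

```python
import numpy as np

def adam_step(W, b, dW, db, v_dW, v_db, s_dW, s_db, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step at iteration t (t starts at 1).
    v_* are the momentum averages, s_* the RMSprop averages; all start at zero."""
    # "momentum": moving averages of the gradients
    v_dW = beta1 * v_dW + (1 - beta1) * dW
    v_db = beta1 * v_db + (1 - beta1) * db
    # "RMSprop": moving averages of the squared gradients (element-wise)
    s_dW = beta2 * s_dW + (1 - beta2) * dW ** 2
    s_db = beta2 * s_db + (1 - beta2) * db ** 2
    # bias correction
    v_dW_c, v_db_c = v_dW / (1 - beta1 ** t), v_db / (1 - beta1 ** t)
    s_dW_c, s_db_c = s_dW / (1 - beta2 ** t), s_db / (1 - beta2 ** t)
    # parameter update
    W = W - alpha * v_dW_c / (np.sqrt(s_dW_c) + eps)
    b = b - alpha * v_db_c / (np.sqrt(s_db_c) + eps)
    return W, b, v_dW, v_db, s_dW, s_db
```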
  7. Learning rate decay
    epoch: one full pass through the training set
    α = α_0 / (1 + decay_rate · epoch_num)
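A tiny sketch of this schedule (the function name is a placeholder); for example, with α_0 = 0.2 and decay_rate = 1 the rate falls to 0.1, ≈0.067, 0.05, … over successive epochs:

```python
def decayed_learning_rate(alpha0, decay_rate, epoch_num):
    """alpha = alpha0 / (1 + decay_rate * epoch_num)."""
    return alpha0 / (1 + decay_rate * epoch_num)

# illustrative values: alpha0 = 0.2, decay_rate = 1
# epoch 1 -> 0.1, epoch 2 -> ~0.067, epoch 3 -> 0.05
```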
  8. Local optima in neural networks
    In high-dimensional parameter spaces, points where the gradient is zero are far more likely to be saddle points than local optima; long plateaus, where the gradient stays close to zero, are what really slow learning down.