Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization
Andrew Ng
Mini-batch Gradient Descent
Split the huge training set into many small pieces (mini-batches).
With m = 5,000,000 training examples and mini-batches of 1,000 examples each, we get 5,000 mini-batches: X^{1}, ..., X^{5000} and Y^{1}, ..., Y^{5000}
An epoch is a single pass through the training set.
Batch gradient descent's cost decreases on every iteration.
Mini-batch gradient descent's cost may not decrease on every iteration. It trends downwards, but it's going to be a little bit noisier.
mini-batch size = m: Batch gradient descent
mini-batch size = 1: Stochastic gradient descent
Stochastic gradient descent never converges; it ends up oscillating around the minimum. Typical mini-batch sizes are 64, 128, 256, or 512.
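The partitioning step above can be sketched as follows; this is an illustrative NumPy sketch (function and variable names are my own), using the course's column-per-example layout where X is (n_features, m) and Y is (1, m):

```python
import numpy as np

def make_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle the training set, then split it into mini-batches.

    X: (n_features, m) data matrix; Y: (1, m) labels.
    The last mini-batch may be smaller if m is not divisible by batch_size.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)          # shuffle examples (columns)
    X, Y = X[:, perm], Y[:, perm]
    batches = []
    for start in range(0, m, batch_size):
        batches.append((X[:, start:start + batch_size],
                        Y[:, start:start + batch_size]))
    return batches

# One epoch = one pass over all mini-batches.
X = np.random.randn(3, 200)            # toy data: 3 features, 200 examples
Y = np.random.randn(1, 200)
batches = make_mini_batches(X, Y, batch_size=64)
print(len(batches))                    # 4 mini-batches: 64 + 64 + 64 + 8
```

Shuffling before splitting matters: it makes each mini-batch an unbiased sample of the training set, which is part of why the noisy updates still trend toward the minimum.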
Exponentially weighted averages
V_t = β V_{t-1} + (1 − β) θ_t
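A minimal sketch of the recurrence above (the function name and the sample data are my own, not from the notes):

```python
def ewa(theta, beta=0.9):
    """Exponentially weighted average: v_t = beta * v_{t-1} + (1 - beta) * theta_t,
    with v_0 = 0 (no bias correction)."""
    v = 0.0
    out = []
    for t in theta:
        v = beta * v + (1 - beta) * t
        out.append(v)
    return out

# Toy sequence, e.g. daily temperatures:
vals = ewa([10.0, 12.0, 11.0, 13.0], beta=0.9)
print(vals)
```

Note the early values are biased toward 0 because v_0 = 0; with beta = 0.9 the average roughly tracks the last 1 / (1 − beta) = 10 observations.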