1, To be done
1, Why can the gradient of w on slide 21 be all negative? (I think it can only be all positive, because
2, The reason for using Xavier initialization
3, Whether cross validation is still necessary when the model is too large
2, Notes
1, Neural network training steps:
preprocess data: normalize the data to zero mean. There are two methods: the first is to subtract the mean computed over every pixel dimension of the data, i.e. subtract the mean image; the second is to subtract the mean of each channel (r, g, b), i.e. subtract the per-channel mean.
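The two centering methods above can be sketched with numpy as follows (a minimal sketch; the array `X` and its shape are illustrative assumptions, not from the notes):

```python
import numpy as np

# Hypothetical batch of 5 RGB images of size 32x32 (shape is an assumption).
X = np.random.rand(5, 32, 32, 3).astype(np.float64)

# Method 1: subtract the mean image -- one mean value per pixel position.
mean_image = X.mean(axis=0)                 # shape (32, 32, 3)
X_centered_image = X - mean_image

# Method 2: subtract the per-channel mean -- one mean value per r/g/b channel.
per_channel_mean = X.mean(axis=(0, 1, 2))   # shape (3,)
X_centered_channel = X - per_channel_mean
```

After method 1 the data has exactly zero mean at every pixel position; method 2 only zeroes the mean of each channel as a whole, but needs far fewer stored statistics (3 numbers instead of a full image).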
weight initialization: for tanh, use Xavier initialization (np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)); for ReLU, use the modified Xavier initialization np.random.randn(fan_in, fan_out) / np.sqrt(fan_in / 2).
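A runnable version of the two initializers above (a sketch; the fan_in/fan_out values are arbitrary examples):

```python
import numpy as np

fan_in, fan_out = 512, 256  # example layer sizes (assumption)

# Xavier initialization for tanh: weight variance ~ 1/fan_in.
W_tanh = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)

# Modified Xavier for ReLU: variance ~ 2/fan_in, written as in the notes
# with fan_in/2 under the square root. This compensates for ReLU zeroing
# out roughly half of the activations.
W_relu = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in / 2)
```

The extra factor of 2 for ReLU keeps the variance of the layer outputs roughly constant across depth, since ReLU kills about half the incoming signal.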
- batch normalization: usually inserted after fully connected or convolutional layers, and before the nonlinearity.
- hyperparameter optimization: go from coarse to fine.
First stage: only a few epochs to get rough idea of what params work
Second stage: longer running time, finer search (repeat as necessary)
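The coarse-to-fine search can be sketched as random sampling of hyperparameters on a log scale, first over a wide range and then over a narrowed one (a minimal sketch; the function name `sample_params`, the ranges, and the learning-rate/regularization parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_params(lr_exp_range, reg_exp_range, n):
    """Sample n hyperparameter settings log-uniformly:
    each value is 10**u with u drawn uniformly from the given exponent range."""
    return [
        {
            "lr": 10.0 ** rng.uniform(*lr_exp_range),
            "reg": 10.0 ** rng.uniform(*reg_exp_range),
        }
        for _ in range(n)
    ]

# First stage: coarse search over a wide range, only a few epochs per setting.
coarse = sample_params(lr_exp_range=(-6, -1), reg_exp_range=(-5, 0), n=20)

# Second stage (hypothetical): suppose the best coarse runs clustered
# around lr ~ 1e-3; re-sample in a narrower window and train longer.
fine = sample_params(lr_exp_range=(-3.5, -2.5), reg_exp_range=(-4, -2), n=20)
```

Sampling exponents rather than raw values matters because quantities like the learning rate act multiplicatively, so a uniform draw over raw values would waste most samples at the large end of the range.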