Algorithm Design
Decide:
layers
hidden units
learning rates
activation functions
Training:
train sets, 70%
dev sets, 20%
test sets, 10%
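A minimal sketch of the 70/20/10 split above, assuming the examples fit in a list and can be shuffled freely. The function name split_dataset and its parameters are made up for illustration.

```python
import random

def split_dataset(examples, train_frac=0.7, dev_frac=0.2, seed=0):
    """Shuffle the examples and split them into train/dev/test lists."""
    items = list(examples)
    random.Random(seed).shuffle(items)      # fixed seed so the split is repeatable
    n_train = round(len(items) * train_frac)
    n_dev = round(len(items) * dev_frac)
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]          # whatever is left, about 10%
    return train, dev, test

train, dev, test = split_dataset(range(100))
print(len(train), len(dev), len(test))
```

Shuffling before splitting keeps the three sets drawn from the same distribution, which is what makes the dev error a fair estimate.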
Bias and variance
high bias = underfitting
high variance = overfitting
train set error:
dev set error:
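dev > train: overfitting; the model does not generalize well to the dev set (high variance)
dev 15% < train 16% (both high, nearly equal): underfitting (high bias)
dev 30% > train 15%: both high bias and high variance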
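The rules above can be sketched as a small helper. The name diagnose, the base_err parameter (the error a human would make, assumed near 0 here) and the 5-point gap threshold are all assumptions for illustration, not values from these notes.

```python
def diagnose(train_err, dev_err, base_err=0.0, gap=0.05):
    """Classify bias/variance from error rates given as fractions (0.15 = 15%)."""
    high_bias = (train_err - base_err) > gap     # training error far above the baseline
    high_variance = (dev_err - train_err) > gap  # dev error far above training error
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "low bias and low variance"

print(diagnose(0.16, 0.15))  # train 16%, dev 15%
print(diagnose(0.15, 0.30))  # train 15%, dev 30%
```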
Basic recipe
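- High bias?
Check performance on the training data.
New network: more layers, more units; a bigger network to fit the training data.
As long as a human can do the task, a sufficiently large network should be able to perform well.
- High variance?
More data, regularization, other architecture.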
Regularization
- L2 regularization: weight decay
- Frobenius norm
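A rough sketch of how the two items above fit together, using the usual form of the penalty: J_reg = J + (lam / (2m)) * sum over layers of ||W||_F^2, where ||W||_F^2 is the sum of squared entries of a weight matrix. The symbols lam (regularization strength), m (number of training examples) and lr (learning rate), and the function names, are assumptions not taken from these notes. Plain nested lists stand in for matrices.

```python
def frobenius_norm_sq(W):
    """Squared Frobenius norm: sum of the squares of every entry of W."""
    return sum(w * w for row in W for w in row)

def l2_penalty(weights, lam, m):
    """L2 term added to the cost, summed over all layers' weight matrices."""
    return (lam / (2 * m)) * sum(frobenius_norm_sq(W) for W in weights)

def step_with_l2(W, dW, lr, lam, m):
    """One gradient step. The penalty adds (lam/m)*W to the gradient, so each
    weight is multiplied by (1 - lr*lam/m) before the usual update: weight decay."""
    return [[w - lr * (g + (lam / m) * w) for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(W, dW)]

W = [[1.0, 2.0], [3.0, 4.0]]
print(frobenius_norm_sq(W))
print(l2_penalty([W], lam=2.0, m=1))
```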
Numerical approximation of gradients
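Gradient checking:
Estimating the gradient from the slope of the large triangle (two-sided difference, from θ − ε to θ + ε) rather than the small triangle (one-sided difference, from θ to θ + ε) is more accurate.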
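A small sketch comparing the two estimates. The test function f(x) = x^3, the point x = 1 and eps = 0.01 are chosen only for illustration; the exact derivative there is 3.

```python
def one_sided(f, x, eps=1e-2):
    """Small triangle: slope between x and x + eps. Error shrinks like eps."""
    return (f(x + eps) - f(x)) / eps

def two_sided(f, x, eps=1e-2):
    """Large triangle: slope between x - eps and x + eps. Error shrinks like eps**2."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def f(x):
    return x ** 3

exact = 3.0  # derivative of x**3 at x = 1
print("one-sided error:", abs(one_sided(f, 1.0) - exact))
print("two-sided error:", abs(two_sided(f, 1.0) - exact))
```

In a gradient check, the two-sided estimate is the one compared against the gradient computed by backpropagation.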
Dropout layer: at test time, do not apply dropout and do not keep the 1/keep_prob factor in the calculations.
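A minimal sketch of inverted dropout on a flat list of activations, assuming the scaling by 1/keep_prob is done during training so that nothing needs to change at test time. The function name dropout_forward and the training flag are hypothetical.

```python
import random

def dropout_forward(activations, keep_prob, training, rng=random):
    """Inverted dropout for one layer's activations (a list of floats)."""
    if not training:
        # Test time: no units dropped and no 1/keep_prob scaling.
        return list(activations)
    out = []
    for a in activations:
        keep = rng.random() < keep_prob
        # Kept units are scaled up so the expected activation is unchanged.
        out.append(a / keep_prob if keep else 0.0)
    return out

acts = [1.0, 2.0, 3.0, 4.0]
print(dropout_forward(acts, keep_prob=0.8, training=True, rng=random.Random(0)))
print(dropout_forward(acts, keep_prob=0.8, training=False))
```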