1. Where does the error come from?
A more complex model does not necessarily give a smaller error.
The error comes from two sources: error due to 'bias' and error due to 'variance'.
In theory there is an optimal function $\hat f$, but we have no way of knowing it. From the training data we can find $f^*$, and this $f^*$ is only an estimate of $\hat f$.
Bias and Variance of Estimator:
For the sample-mean estimator $m=\frac{1}{N}\sum_{n} x^n$ of the true mean: $\mathrm{Var}[m]=\frac{\sigma^2}{N}$, which shows that the variance of the estimator depends on the sample size $N$.
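As a quick sanity check of $\mathrm{Var}[m]=\frac{\sigma^2}{N}$, here is a minimal simulation (the Gaussian data and the particular sample sizes are illustrative assumptions, not from the original notes):

```python
import numpy as np

# Estimate Var[m] empirically: draw many datasets of size N,
# compute the sample mean m of each, and look at the spread of m.
rng = np.random.default_rng(0)
mu, sigma = 0.0, 2.0

for N in [5, 50, 500]:
    m = rng.normal(mu, sigma, size=(10_000, N)).mean(axis=1)
    print(f"N={N:4d}  empirical Var[m]={m.var():.4f}  sigma^2/N={sigma**2 / N:.4f}")
```

The two columns should agree closely: the larger the sample size, the smaller the variance of the estimator.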
A simple model has small variance and large bias; a complex model has large variance and small bias. This is because a simple model is less affected by the particular sample drawn (intuitively, it is less sensitive to outliers), while a complex model is more sensitive to them.
Bias measures how close the mean of all predictions/estimates $f^*$ is to the true function $\hat f$.
Overfitting is when variance contributes most of the error; underfitting is when bias contributes most of it.
Diagnosis:
Underfitting: if your model cannot even fit the training examples, then you have large bias.
Overfitting: if you can fit the training data but have large error on the testing data, then you probably have large variance.
For underfitting, redesign your model: (1) add more features as input; (2) use a more complex model.
For overfitting: (1) collect more data; (2) regularization.
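To make this diagnosis concrete, here is a hedged sketch that compares training and testing error as model complexity grows (the synthetic sine data and the polynomial candidate models are illustrative assumptions, not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(1)
x_train, x_test = rng.uniform(-1, 1, 30), rng.uniform(-1, 1, 30)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.2, 30)  # noisy "true" function
y_test = np.sin(3 * x_test) + rng.normal(0, 0.2, 30)

for degree in [1, 3, 9, 13]:
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Large train error -> large bias (underfitting);
    # small train error but large test error -> large variance (overfitting).
    print(f"degree={degree:2d}  train={train_err:.3f}  test={test_err:.3f}")
```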
Model Selection
Find a balance between bias and variance: select a model that balances the two kinds of error so as to minimize the total error.
At this point, the public testing set may not be a good indicator of how the model will perform on the private testing set.
Cross Validation
With cross validation, the public set does reflect the model's performance. However, it is not advisable to change the model just because it performs worse on the public set than on the validation set: if you keep steering toward whichever model does best on the public set, you once again cannot know in advance how the model will truly perform on the private set.
N-fold Cross Validation
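A minimal sketch of N-fold cross validation (here 3 folds; the polynomial candidate models and synthetic data are illustrative assumptions, not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.2, 60)

n_folds = 3
folds = np.array_split(rng.permutation(60), n_folds)  # disjoint index folds

for degree in [1, 3, 9]:                               # candidate models
    errs = []
    for k in range(n_folds):
        val_idx = folds[k]                             # hold out fold k
        trn_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        coeffs = np.polyfit(x[trn_idx], y[trn_idx], degree)
        errs.append(np.mean((np.polyval(coeffs, x[val_idx]) - y[val_idx]) ** 2))
    print(f"degree={degree}  avg validation error={np.mean(errs):.3f}")
# Pick the model with the lowest average validation error,
# then retrain it on the full training set.
```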
2. Gradient Descent
Machine learning needs to find the optimal function, so we solve the optimization problem:
$w^*=\arg\min_{w} L(w)$
where $L$ is the loss function and $w$ the parameters.
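As a concrete illustration (not from the original notes), here is a minimal vanilla gradient-descent loop, assuming a simple quadratic loss $L(w)=\frac{1}{N}\lVert Xw-y\rVert^2$:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(0, 0.1, 100)

w = np.zeros(2)
eta = 0.1                                   # learning rate
for t in range(1000):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # dL/dw for the quadratic loss
    w -= eta * grad                         # update step: w <- w - eta * grad
print(w)                                    # should approach [2, -1]
```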
Tip 1: Tuning your learning rates
The learning rate can cause problems. In the lecture's figure, the blue path uses a rate that is too small, so it converges very slowly and wastes time; the green and yellow paths take steps that are too large, so they can never reach the optimum. When there are many parameters we cannot visualize the whole optimization path, but we can still visualize how the loss changes as the parameters are updated.
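For instance, here is a hedged sketch of such a loss-vs-updates plot on a toy one-dimensional loss (the loss function and the particular learning rates are illustrative assumptions, not from the original notes):

```python
import numpy as np
import matplotlib.pyplot as plt

L = lambda w: (w - 3.0) ** 2        # toy loss with optimum at w = 3
grad = lambda w: 2 * (w - 3.0)

for eta, label in [(0.01, "too small"), (0.3, "just right"), (1.05, "too large")]:
    w, losses = -5.0, []
    for t in range(30):
        w -= eta * grad(w)
        losses.append(L(w))
    plt.plot(losses, label=f"eta={eta} ({label})")

plt.xlabel("No. of parameter updates")
plt.ylabel("Loss")
plt.yscale("log")                   # the diverging curve grows exponentially
plt.legend()
plt.show()
```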
Adaptive Learning Rates:
- Popular & simple idea: reduce the learning rate by some factor every few epochs. The learning rate shrinks as the updates proceed because, at the starting point, we are usually far from the optimum and larger steps save time; after several updates we are close to the optimum, so the learning rate should be reduced to let the parameters converge to it.
- The learning rate cannot be one-size-fits-all: give different parameters different learning rates, e.g. Adagrad.
In Adagrad, every parameter gets its own learning rate, as sketched below.
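The standard Adagrad update divides the global learning rate by the root of the accumulated squared gradients: $w^{t+1} \leftarrow w^t - \frac{\eta}{\sqrt{\sum_{i=0}^{t}(g^i)^2}}\, g^t$. A minimal sketch (the toy quadratic loss and the deliberately bad feature scaling are illustrative assumptions, not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2)) * np.array([1.0, 10.0])  # badly scaled features
y = X @ np.array([2.0, -1.0])

w = np.zeros(2)
eta, eps = 1.0, 1e-8
accum = np.zeros(2)                     # per-parameter sum of squared gradients
for t in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    accum += grad ** 2
    w -= eta / (np.sqrt(accum) + eps) * grad  # per-parameter step size
print(w)                                # should approach [2, -1]
```

The parameter tied to the large-scale feature sees larger gradients, so its accumulated denominator grows faster and its effective learning rate shrinks accordingly.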
The apparent contradiction in Adagrad: the current gradient in the numerator says a larger gradient should mean a larger step, while the accumulated gradients in the denominator say the opposite.
An intuitive explanation of the contradiction: the best step size is proportional to the first derivative but inversely proportional to the second derivative, and the denominator in Adagrad can be seen as a cheap approximation of the second derivative.
Reference: https://www.bilibili.com/video/BV1Ht411g7Ef?p=5