http://scott.fortmann-roe.com/docs/BiasVariance.html
bias-variance tradeoff.
- High bias means the model is too simple to capture the relationship between the features and the target, so its estimates are inaccurate. A high-bias model usually has low complexity and scores poorly on both the training set and the validation/test set (the score here is the coefficient of determination, R^2).
- High variance means that models of the same complexity, trained on similar data points, differ significantly from one another: the model is too complex to be consistent. A high-variance model usually has 1) very low training error and high testing error, and 2) a large gap between the training and validation curves, as in the figure below:
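- With a decision tree model:
- When max-depth=1, both the Training Score and the Validation Score are low. This is high bias: the model is too simple, so it does poorly on the training set and on held-out data as well.
- When max-depth=10, the gap between the Training Score and the Validation Score is large and the Training Score is very high, so the model does well on the training set but poorly on held-out data. This is high variance.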
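- The Score on the vertical axis is the coefficient of determination, R^2.
- R^2 usually lies in [0, 1]. It is the square of the correlation between the predicted and actual values of the target variable, read as a proportion: the share of the variation in the target variable that the model's features can explain.
- When R^2 is negative, the predictions are worse than simply predicting the mean of the target variable.
- For the exact formula, see this other blog post: http://blog.csdn.net/duxinyuhi/article/details/52233993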
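
The notes above can be tried out with a small sketch. Everything here is an assumption for illustration rather than the original experiment: the data is a synthetic noisy sine curve, the model is scikit-learn's `DecisionTreeRegressor`, and `r2` is a hypothetical helper that uses the standard definition R^2 = 1 - SS_res / SS_tot.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=300)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Compare a very shallow tree with a deep one.
scores = {}
for depth in (1, 10):
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    train_r2 = r2(y_train, model.predict(X_train))
    val_r2 = r2(y_val, model.predict(X_val))
    scores[depth] = (train_r2, val_r2)
    print(f"max_depth={depth:2d}  train R^2={train_r2:.3f}  "
          f"validation R^2={val_r2:.3f}  gap={train_r2 - val_r2:.3f}")

# Always predicting the mean of the target gives R^2 = 0.
mean_r2 = r2(y_val, np.full_like(y_val, y_val.mean()))
# Predictions worse than the mean (here: the sign-flipped target) give R^2 < 0.
bad_r2 = r2(y_val, -y_val)
print(f"mean predictor R^2={mean_r2:.3f}  sign-flipped R^2={bad_r2:.3f}")
```

On data like this, the depth-1 tree should score modestly on both splits (high bias), while the depth-10 tree should score much higher on the training split than on the validation split (high variance). The exact numbers depend on the data and the random seed.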