Week 6: Model Evaluation

三つ叶

Article link: https://blog.csdn.net/zzhhjjjj/article/details/120821429

Training, Test, and Validation Sets

One way to break down our dataset into the three sets is:

Training set: 60%
Cross validation set: 20%
Test set: 20%
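A minimal sketch of the 60/20/20 split, assuming the examples are stored as NumPy arrays `X` (one row per example) and `y`. The function name `split_train_cv_test` and the `seed` parameter are hypothetical. The data is shuffled first, as an assumption, in case it is stored in some meaningful order.

```python
import numpy as np

def split_train_cv_test(X, y, seed=0):
    """Shuffle, then split into ~60% training / 20% cross-validation / 20% test."""
    m = len(y)
    idx = np.random.default_rng(seed).permutation(m)  # shuffle in case the data is ordered
    n_train = round(0.6 * m)
    n_cv = round(0.2 * m)
    tr = idx[:n_train]
    cv = idx[n_train:n_train + n_cv]
    te = idx[n_train + n_cv:]  # whatever remains (about 20%) becomes the test set
    return (X[tr], y[tr]), (X[cv], y[cv]), (X[te], y[te])

# Hypothetical toy data: 100 examples with one feature
X = np.arange(100, dtype=float).reshape(-1, 1)
y = 2 * X[:, 0] + 1
(X_tr, y_tr), (X_cv, y_cv), (X_te, y_te) = split_train_cv_test(X, y)
print(len(y_tr), len(y_cv), len(y_te))  # 60 20 20
```

Each example lands in exactly one of the three sets, so no example is used both to choose the model and to evaluate it.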

Each of the three sets serves a different purpose. We can calculate three separate error values, one for each set, using the following method:

1. Optimize the parameters in Θ using the training set for each polynomial degree.
2. Find the polynomial degree d with the least error using the cross validation set.
3. Estimate the generalization error using the test set with $J_{test}(\Theta^{(d)})$, where $\Theta^{(d)}$ are the parameters of the degree-d polynomial chosen in step 2 (the one with the lowest cross validation error).

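If we split the data only into a training set and a test set, the test set has to carry out both step 2 and step 3. But in step 2 the test set is what we used to choose d, so using the same test set again in step 3 to evaluate the model is clearly "unfair": d has effectively been fitted to the test set, so $J_{test}$ becomes an optimistic estimate of the generalization error. This is why we introduce a validation (cross validation) set to take over step 2.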
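The three-step procedure can be sketched with NumPy's `polyfit`/`polyval`, assuming a single input feature x and the unregularized squared-error cost. The cubic data, the noise level, and `max_degree=10` are hypothetical choices for illustration only.

```python
import numpy as np

def cost(coeffs, x, y):
    # Unregularized cost J = 1/(2m) * sum((h(x) - y)^2) for a polynomial h
    r = np.polyval(coeffs, x) - y
    return r @ r / (2 * len(y))

def select_degree(x_tr, y_tr, x_cv, y_cv, x_te, y_te, max_degree=10):
    fits, j_cv = [], []
    for d in range(1, max_degree + 1):
        coeffs = np.polyfit(x_tr, y_tr, d)     # step 1: fit Θ on the training set
        fits.append(coeffs)
        j_cv.append(cost(coeffs, x_cv, y_cv))  # step 2: score each degree on the CV set
    best = int(np.argmin(j_cv))
    j_test = cost(fits[best], x_te, y_te)      # step 3: touch the test set only once, at the end
    return best + 1, j_cv, j_test

# Hypothetical noisy cubic data, already in random order, split 60/20/20
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = x ** 3 - x + rng.normal(0, 0.05, 100)
x_tr, x_cv, x_te = x[:60], x[60:80], x[80:]
y_tr, y_cv, y_te = y[:60], y[60:80], y[80:]
d, j_cv, j_test = select_degree(x_tr, y_tr, x_cv, y_cv, x_te, y_te)
print("chosen degree:", d, "estimated generalization error:", j_test)
```

Because d is picked by `j_cv` alone, `j_test` is computed on data that played no part in training or model selection.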

Bias vs. Variance

Regularization and Bias/Variance

How the size of the regularization parameter λ affects the fit

Relationship between the regularization parameter λ and J

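The relationship between λ and J can be explored by training one model per candidate λ and comparing the errors. This is a minimal sketch assuming regularized linear regression solved with the normal equation, where $\theta_0$ is not penalized. Following the usual convention in this course, $J_{train}$ and $J_{cv}$ are computed without the regularization term. The λ grid (0 and then doubling from 0.01), the degree-8 features, and the sin(3x) data are all assumptions for illustration.

```python
import numpy as np

def poly_features(x, degree):
    # Columns x, x^2, ..., x^degree (x is in [-1, 1] here, so no feature scaling)
    return np.column_stack([x ** p for p in range(1, degree + 1)])

def add_bias(X):
    return np.column_stack([np.ones(len(X)), X])

def fit_regularized(X, y, lam):
    # Normal equation theta = (X^T X + lam * L)^-1 X^T y,
    # with L = identity except L[0, 0] = 0 so theta_0 is not penalized
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def cost(X, y, theta):
    # Unregularized J = 1/(2m) * sum((X theta - y)^2)
    r = X @ theta - y
    return r @ r / (2 * len(y))

def select_lambda(X_tr, y_tr, X_cv, y_cv, lambdas):
    j_train, j_cv = [], []
    for lam in lambdas:
        theta = fit_regularized(X_tr, y_tr, lam)  # train WITH the penalty
        j_train.append(cost(X_tr, y_tr, theta))   # evaluate WITHOUT it
        j_cv.append(cost(X_cv, y_cv, theta))
    best = int(np.argmin(j_cv))                  # lambda with the lowest J_cv
    return lambdas[best], j_train, j_cv

rng = np.random.default_rng(0)
x_tr, x_cv = rng.uniform(-1, 1, 30), rng.uniform(-1, 1, 30)
y_tr = np.sin(3 * x_tr) + rng.normal(0, 0.1, 30)
y_cv = np.sin(3 * x_cv) + rng.normal(0, 0.1, 30)
X_tr, X_cv = add_bias(poly_features(x_tr, 8)), add_bias(poly_features(x_cv, 8))
lambdas = [0.0] + [0.01 * 2 ** k for k in range(11)]  # 0, 0.01, 0.02, ..., 10.24
lam_best, j_train, j_cv = select_lambda(X_tr, y_tr, X_cv, y_cv, lambdas)
print("chosen lambda:", lam_best)
```

Small λ tends to give low $J_{train}$ but higher $J_{cv}$ (high variance); very large λ pushes both errors up (high bias). The chosen λ sits where $J_{cv}$ is lowest.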

Effect of adding more training examples on J under high bias and under high variance (learning curves)

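A learning curve is produced by training on the first m examples for m = 1, 2, ..., measuring $J_{train}$ on those m examples and $J_{cv}$ on the whole cross validation set. The sketch below assumes unregularized linear regression (NumPy `lstsq`) and a deliberately high-bias setup, a straight line fitted to quadratic data; the data and helper names are hypothetical.

```python
import numpy as np

def fit_least_squares(X, y):
    # lstsq returns the minimum-norm solution when there are fewer examples than parameters
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cost(X, y, theta):
    r = X @ theta - y
    return r @ r / (2 * len(y))

def learning_curve(X_tr, y_tr, X_cv, y_cv):
    sizes, j_train, j_cv = [], [], []
    for m in range(1, len(y_tr) + 1):
        theta = fit_least_squares(X_tr[:m], y_tr[:m])
        sizes.append(m)
        j_train.append(cost(X_tr[:m], y_tr[:m], theta))  # error on the m examples used
        j_cv.append(cost(X_cv, y_cv, theta))             # error on the whole CV set
    return sizes, j_train, j_cv

rng = np.random.default_rng(1)
x_tr, x_cv = rng.uniform(-1, 1, 60), rng.uniform(-1, 1, 20)
y_tr = x_tr ** 2 + rng.normal(0, 0.05, 60)
y_cv = x_cv ** 2 + rng.normal(0, 0.05, 20)
X_tr = np.column_stack([np.ones(60), x_tr])
X_cv = np.column_stack([np.ones(20), x_cv])
sizes, j_train, j_cv = learning_curve(X_tr, y_tr, X_cv, y_cv)
print(f"m={sizes[-1]}: J_train={j_train[-1]:.4f}, J_cv={j_cv[-1]:.4f}")
```

With high bias, the two curves flatten out close together at a high error, so more data barely helps. With high variance, $J_{train}$ stays low while a gap to $J_{cv}$ remains, and more data tends to narrow that gap.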

Debugging a Learning Algorithm

Our decision process can be broken down as follows:

Getting more training examples: Fixes high variance
Trying smaller sets of features: Fixes high variance
Adding features: Fixes high bias
Adding polynomial features: Fixes high bias
Decreasing λ: Fixes high bias
Increasing λ: Fixes high variance
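The decision process above can be written as a lookup from diagnosis to remedy. This is only a rough sketch: it flags high bias when $J_{train}$ exceeds an acceptable error and high variance when $J_{cv}$ exceeds $J_{train}$ by more than some gap. The function `suggest_fixes` and its thresholds `target_error` and `max_gap` are hypothetical, problem-specific knobs, not values from the course.

```python
REMEDIES = {
    "high bias": ("add features", "add polynomial features", "decrease lambda"),
    "high variance": ("get more training examples", "try a smaller set of features",
                      "increase lambda"),
}

def suggest_fixes(j_train, j_cv, target_error, max_gap):
    """Map training/CV error to the remedies listed above (heuristic thresholds)."""
    problems = []
    if j_train > target_error:      # cannot even fit the training set well
        problems.append("high bias")
    if j_cv - j_train > max_gap:    # fits training data much better than unseen data
        problems.append("high variance")
    return {p: REMEDIES[p] for p in problems}

print(suggest_fixes(0.01, 0.30, target_error=0.1, max_gap=0.1))
```

Both problems can be flagged at once; in that case the remedies pull in opposite directions, which is a sign that the model or the features need rethinking.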

Diagnosing Neural Networks

A neural network with fewer parameters is prone to underfitting. It is also computationally cheaper.
A large neural network with more parameters is prone to overfitting. It is also computationally expensive. In this case you can use regularization (increase λ) to address the overfitting.
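To make "fewer" versus "more" parameters concrete: if layer j has $s_j$ units and $\Theta^{(j)}$ maps it (plus a bias unit) to layer j+1, then $\Theta^{(j)}$ has $s_{j+1} \times (s_j + 1)$ entries. A small sketch that sums this over all layers; the layer sizes in the example are illustrative only.

```python
def count_parameters(layer_sizes):
    """layer_sizes = [s_1, ..., s_L]; Theta^(j) has s_{j+1} * (s_j + 1) entries."""
    return sum((s_in + 1) * s_out
               for s_in, s_out in zip(layer_sizes[:-1], layer_sizes[1:]))

print(count_parameters([400, 25, 10]))        # 10285
print(count_parameters([400, 100, 100, 10]))  # 51210
```

Widening or adding hidden layers multiplies the parameter count quickly, which is where both the overfitting risk and the extra computation come from. The number of hidden layers or units can itself be chosen with the cross validation set, just like the polynomial degree d.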

Precision/Recall

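Consider a tumor-diagnosis problem in which only 0.5% of patients actually have a tumor. A classification problem in which one class is far more frequent than the other is said to have skewed classes.
If we keep using accuracy to evaluate such a problem, a model that always predicts y = 0 has an error rate of only 0.5% (99.5% accuracy) on the tumor problem, even though it never detects a single tumor.
Accuracy = (true positives + true negatives) / (total examples)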
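A quick check of that claim, assuming a hypothetical set of 1000 patients of whom 5 (0.5%) have a tumor (y = 1), and a "model" that always predicts y = 0:

```python
import numpy as np

def accuracy(y_true, y_pred):
    # (true positives + true negatives) / total examples
    return np.mean(y_true == y_pred)

y_true = np.zeros(1000, dtype=int)
y_true[:5] = 1                   # 5 of 1000 patients (0.5%) have a tumor
y_pred = np.zeros_like(y_true)   # always predict y = 0
print(accuracy(y_true, y_pred))  # 0.995
```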
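To deal with this, we introduce precision and recall. Treating the rare class (y = 1, has a tumor) as the positive class:
Precision = (true positives) / (true positives + false positives)
Recall = (true positives) / (true positives + false negatives)
If we now evaluate the model that always predicts y = 0 with precision and recall, its recall is 0 because it has no true positives, and its precision is 0/0, which is undefined and is usually reported as 0. Either way, the two metrics expose what accuracy hid: the model never detects a tumor.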
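A minimal sketch of both metrics, assuming y = 1 is the rare positive class and reporting 0.0 when a denominator is zero (an assumed convention; libraries differ in how they handle 0/0). The data repeats the hypothetical 1000-patient example, and the second predictor is made up for comparison.

```python
import numpy as np

def precision_recall(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    # 0/0 (e.g. the model never predicts y = 1) is reported as 0.0 here
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return float(precision), float(recall)

y_true = np.zeros(1000, dtype=int)
y_true[:5] = 1                            # 5 tumors out of 1000

always_zero = np.zeros_like(y_true)
print(precision_recall(y_true, always_zero))  # (0.0, 0.0)

flag_ten = np.zeros_like(y_true)
flag_ten[:10] = 1                         # flags 10 patients: 5 true, 5 false positives
print(precision_recall(y_true, flag_ten))     # (0.5, 1.0)
```

The always-0 model scores 99.5% accuracy but zero on both metrics, while the second predictor finds every tumor (recall 1.0) at the cost of half its alarms being false (precision 0.5).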