Machine Learning 05 - Model Evaluation and Analysis

I am working through Stanford's Machine Learning course by Andrew Ng and taking notes as I go, for review and consolidation.
My knowledge is limited, so if you spot errors or have suggestions, please bear with me and point them out.

5.1 Evaluation

5.1.1 Hypothesis evaluation

a. Evaluation method
Given a dataset of training examples, we can split it into two parts. Typically, we divide the dataset as follows:

(figure: splitting the dataset into a training set and a test set)

Remark :

  • Both parts should follow the same data distribution.
  • The partition should be chosen randomly, so we should repeat the evaluation several times and average the results.
  • Different problems call for different training and test set sizes.
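A minimal sketch of such a random split in Python (this `train_test_split` is a hypothetical helper written for illustration, not scikit-learn's function of the same name):

```python
import random

def train_test_split(data, test_ratio=0.3, seed=None):
    """Randomly partition `data` into a training part and a test part."""
    rng = random.Random(seed)
    shuffled = data[:]            # copy so the original order is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

data = list(range(10))
train, test = train_test_split(data, test_ratio=0.3, seed=0)
print(len(train), len(test))   # 7 3
```

Repeating the split with different seeds and averaging the test error implements the "evaluate many times and average" remark above.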

b. Performance measurement
For linear regression :

$$J_{test}(\Theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\Theta\left(x_{test}^{(i)}\right) - y_{test}^{(i)} \right)^2$$

For classification :

$$err(h_\Theta(x), y) = \begin{cases} 1 & \text{if } h_\Theta(x) \ge 0.5 \text{ and } y = 0, \text{ or } h_\Theta(x) < 0.5 \text{ and } y = 1 \\ 0 & \text{otherwise} \end{cases}$$

$$\text{Test Error} = \frac{1}{m_{test}} \sum_{i=1}^{m_{test}} err\left(h_\Theta\left(x_{test}^{(i)}\right), y_{test}^{(i)}\right)$$
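Both test-set measures are straightforward to compute; here is a small sketch where the hypotheses `h_lin` and `h_cls` are made up for illustration:

```python
def j_test(h, xs, ys):
    """Squared-error cost on the test set (linear regression)."""
    m = len(xs)
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def misclassification_error(h, xs, ys):
    """Average 0/1 error: err = 1 when the thresholded prediction disagrees with y."""
    errs = [1 if (h(x) >= 0.5) != (y == 1) else 0 for x, y in zip(xs, ys)]
    return sum(errs) / len(errs)

h_lin = lambda x: 2 * x                      # hypothetical regression hypothesis
print(j_test(h_lin, [1, 2], [2, 5]))         # (0 + 1) / (2 * 2) = 0.25

h_cls = lambda x: 0.8 if x > 0 else 0.2      # hypothetical classifier output
print(misclassification_error(h_cls, [1, -1, 1], [1, 0, 0]))   # 1 error out of 3
```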

c. Example 1 : model selection

Let us talk about selecting the polynomial degree.

We can train the model at each degree and compare the errors. It is suggested that we divide the dataset into three parts, usually:

(figure: dataset division into training, cross validation, and test sets)

In general, we can evaluate our hypothesis using the following method :

  • 1. Optimize the parameters Θ on the training set for each polynomial degree.
  • 2. Find the polynomial degree d with the lowest error on the cross validation set.
  • 3. Estimate the generalization error on the test set with $J_{test}(\Theta^{(d)})$.
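The selection loop above can be sketched in plain Python. The data (a quadratic with a small deterministic wiggle) and the helpers `fit_poly`/`cost` are made up for illustration:

```python
def fit_poly(xs, ys, d):
    """Least-squares fit of a degree-d polynomial via the normal equations."""
    n = d + 1
    # Build A = X^T X and b = X^T y for the Vandermonde matrix X.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(n)]
    # Solve A theta = b by Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    theta = [0.0] * n
    for i in range(n - 1, -1, -1):
        theta[i] = (b[i] - sum(A[i][j] * theta[j] for j in range(i + 1, n))) / A[i][i]
    return theta

def cost(theta, xs, ys):
    """Squared-error cost of the polynomial with coefficients theta."""
    pred = lambda x: sum(t * x ** i for i, t in enumerate(theta))
    return sum((pred(x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

# Step 1: fit each degree on the training set.
train_x = [0, 0.5, 1, 1.5, 2, 2.5]
train_y = [x * x + 0.05 * (-1) ** i for i, x in enumerate(train_x)]
# Step 2: pick the degree with the lowest cross validation error.
cv_x = [0.25, 0.75, 1.25, 1.75]
cv_y = [x * x for x in cv_x]
cv_errors = {d: cost(fit_poly(train_x, train_y, d), cv_x, cv_y) for d in (1, 2, 3)}
best_d = min(cv_errors, key=cv_errors.get)
print(best_d)
# Step 3 would report J_test of the chosen degree on a held-out test set.
```

The linear fit (d = 1) underfits the quadratic data, so its cross validation error is far higher than that of the quadratic fit.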

5.1.2 Bias and variance

a. Bias and variance

Given a model, we have two concepts:

Bias : underfitting; both the training error and the cross validation error are high, with $J_{CV}(\Theta) \approx J_{train}(\Theta)$.

Variance : overfitting; the cross validation error is high while the training error is very low, with $J_{train}(\Theta) \ll J_{CV}(\Theta)$.

(figure: bias and variance)

b. Relations of regularization

Consider the regularization parameter λ :

when λ is very large, our fit becomes more rigid (tending toward underfitting).

when λ is very small, we tend to overfit the data.

(figure: relations of regularization)

c. Example 2 : regularization selection

We can find the best λ using the method below :

  • 1. Learn some Θ for each λ in a list of candidate values.
  • 2. Compute the cross validation error for each Θ, and choose the combination of Θ and λ that produces the lowest error on the cross validation set.
  • 3. Use the best combination of Θ and λ to compute the test error.
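The same loop works for λ. A toy sketch, assuming (for simplicity) a one-parameter ridge model $h(x) = \theta x$ with closed-form solution $\theta = \sum x y / (\sum x^2 + \lambda)$; the data values are made up:

```python
def fit_ridge(xs, ys, lam):
    """One-parameter ridge regression h(x) = theta * x (closed form)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_error(theta, xs, ys):
    """Squared-error cost of h(x) = theta * x on the given set."""
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

train_x, train_y = [1, 2, 3], [2.1, 3.9, 6.2]   # roughly y = 2x, with noise
cv_x, cv_y = [1.5, 2.5], [3.0, 5.0]             # exactly y = 2x

# Steps 1 and 2: fit theta for each candidate lambda, score on the CV set.
lambdas = [0, 0.01, 0.1, 1, 10]
errors = {lam: cv_error(fit_ridge(train_x, train_y, lam), cv_x, cv_y)
          for lam in lambdas}
best_lam = min(errors, key=errors.get)
print(best_lam)   # a small but nonzero lambda wins on this data
# Step 3 would report the test error of (theta, best_lam) on a held-out test set.
```

In the full-course setting Θ is a vector and the intercept term is left unregularized; the single-parameter model here is only meant to show the shape of the selection loop.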

5.1.3 Learning curves

a. Relations of dataset

We can easily see that :

when the dataset is very small, we can reach (near) 0 training error while the cross validation error is large. As the dataset gets larger, the training error increases and the cross validation error decreases.

In the high bias problem :

(figure: learning curves in the high bias case)

In the high variance problem :

(figure: learning curves in the high variance case)

b. Example 3 : whether to add data

When experiencing high bias :

  • Low training set size causes low $J_{train}(\Theta)$ and high $J_{CV}(\Theta)$.
  • High training set size causes both $J_{train}(\Theta)$ and $J_{CV}(\Theta)$ to be high.
  • Getting more training data will not help much.

When experiencing high variance :

  • Low training set size causes low $J_{train}(\Theta)$ and high $J_{CV}(\Theta)$.
  • High training set size results in increasing $J_{train}(\Theta)$ and decreasing $J_{CV}(\Theta)$, but $J_{train}(\Theta) < J_{CV}(\Theta)$ still holds by a significant margin.
  • Getting more training data is likely to help.
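The trend is easy to reproduce with a deliberately high-bias model. The sketch below uses a constant predictor (the training mean), an assumption made purely for illustration, and traces both errors as the training set grows:

```python
def mean_model_curve(ys, cv_ys):
    """Train/CV error of a constant (mean) predictor vs. training-set size."""
    results = []
    for m in range(1, len(ys) + 1):
        theta = sum(ys[:m]) / m                                   # "fit": the mean
        train_err = sum((theta - y) ** 2 for y in ys[:m]) / (2 * m)
        cv_err = sum((theta - y) ** 2 for y in cv_ys) / (2 * len(cv_ys))
        results.append((m, train_err, cv_err))
    return results

ys = [1, 2, 3, 4, 5]        # training targets (inputs are unused by this model)
curve = mean_model_curve(ys, [2, 3, 4])
for m, tr, cv in curve:
    print(m, round(tr, 3), round(cv, 3))
```

With m = 1 the training error is exactly 0 while the CV error is large; as m grows the training error rises and the CV error falls, matching the learning-curve picture above.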

5.1.4 Summary

Our decision process can be broken down as follows:

  • Getting more training examples: Fixes high variance
  • Trying smaller sets of features: Fixes high variance
  • Adding features: Fixes high bias
  • Adding polynomial features: Fixes high bias
  • Decreasing λ: Fixes high bias
  • Increasing λ: Fixes high variance

Addition :

A neural network with fewer parameters is prone to underfitting. It is also computationally cheaper.

A neural network with more parameters is prone to overfitting. It is also computationally more expensive.

The techniques above (such as those in 5.1.1) also apply to neural networks.

5.2 System Design

5.2.1 Error analysis

We have learned how to evaluate a model. Next we will discuss the overall process for tackling a machine learning problem.

Error analysis : after model training and evaluation, in order to choose the best method or gain new ideas, we manually inspect the errors on the cross validation set and measure the algorithm's performance.

5.2.2 Examples & Experience

(1) Skewed Data

Skewed data : the classes are highly imbalanced, so the error metric can be very low even for a trivial algorithm (like always predicting y = 0).

Solution : use a better metric

(figure: precision and recall)

Precision :

$$P = \frac{TP}{TP + FP}$$

Recall :

$$R = \frac{TP}{TP + FN}$$

There is a trade-off between P and R.
To compare different algorithms under the P-R metric, we can use the

F-score

$$F = \frac{2PR}{P + R}$$
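The three quantities follow directly from the confusion-matrix counts; the example counts below are made up:

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall, and F-score from confusion-matrix counts."""
    p = tp / (tp + fp)          # fraction of positive predictions that are right
    r = tp / (tp + fn)          # fraction of actual positives that were found
    f = 2 * p * r / (p + r)     # harmonic mean of P and R
    return p, r, f

p, r, f = precision_recall_f(tp=30, fp=20, fn=10)
print(p, r, round(f, 3))   # 0.6 0.75 0.667
```

Because F is the harmonic mean, it stays low unless both P and R are reasonably high, which is exactly why it is preferred over a plain average on skewed data.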

(2) Large Dataset

Training on a lot of data is likely to give good performance when both of the following conditions hold :

  • Our learning algorithm is able to represent fairly complex functions.
  • We have some way to be confident that x contains sufficient information to predict y accurately.

5.2.3 Summary

The recommended approach to solving machine learning problems is :

  • Start with a simple algorithm, implement it quickly, and test the validation error. (model training)
  • Plot learning curves to evaluate the model and decide whether more data, more features, etc. are likely to help. (model evaluation)
  • Manually examine the errors on examples in the cross validation set and try to spot a trend in where most of the errors were made. (error analysis)

Tricks : Numerical Metric

Getting the error as a single numerical value helps assess an algorithm's performance.

More content about numerical metrics will be covered next time.
