1. Deciding what to try next
- Debugging a learning algorithm
- Suppose you have implemented regularized linear regression to predict housing prices. When you test your hypothesis on a new set of houses, however, you find that it makes unacceptably large errors in its predictions. What should you try next?
- Get more training examples
- Try smaller sets of features
- Try getting additional features
- Try adding polynomial features
- Try decreasing the regularization parameter λ
- Try increasing the regularization parameter λ
2. Evaluating a hypothesis
- separate the data into a training set (70%) and a test set (30%)
- Training/Testing procedure for logistic regression
- learn parameters from the training data
- compute test set error
- misclassification error (0/1 misclassification error)
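The train/test procedure above can be sketched in a few lines (a minimal sketch; the split helper and the example labels are made up for illustration):

```python
import numpy as np

def train_test_split(X, y, train_frac=0.7):
    # 70/30 split as in the notes; assumes the data is already shuffled
    n_train = int(len(X) * train_frac)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

def misclassification_error(y_pred, y_true):
    # 0/1 misclassification error: fraction of examples labeled wrongly
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return float(np.mean(y_pred != y_true))
```

For example, `misclassification_error([0, 1, 0, 0], [0, 1, 1, 0])` gives `0.25` (one wrong prediction out of four).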
3. Model selection and training/validation/test sets
- overfitting example
- the training error is likely to be lower than the actual generalization error
- model selection
- select the model with the lowest cross validation error, then report its error on the held-out test set
- training set - 60%
- cross validation set (cv) - 20%
- test set - 20%
- training error
- cross validation error
- test error
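The 60/20/20 procedure can be sketched with polynomial model selection (a minimal sketch; the quadratic data, candidate degrees, and split indices are made up):

```python
import numpy as np

# Made-up quadratic data with Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 1.5 * x**2 + rng.normal(scale=0.1, size=x.shape)

# 60/20/20 split into training, cross validation (cv), and test sets
idx = rng.permutation(len(x))
tr, cv, te = idx[:30], idx[30:40], idx[40:]

def mse(coeffs, xs, ys):
    # Mean squared error of the polynomial with the given coefficients
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

# Fit each candidate degree on the training set only
fits = {d: np.polyfit(x[tr], y[tr], d) for d in range(1, 6)}
# Model selection: pick the degree with the lowest cross validation error
best_d = min(fits, key=lambda d: mse(fits[d], x[cv], y[cv]))
# Generalization estimate: error on the untouched test set
test_error = mse(fits[best_d], x[te], y[te])
```

Selecting on the cv set and reporting on the test set keeps the reported error honest: the test set never influenced any choice.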
4. Diagnosing bias vs. variance
- bias (underfit)
- variance (overfit)
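The diagnosis can be expressed as a rule of thumb over the two errors (a sketch only; the threshold value is illustrative, not from the course):

```python
def diagnose(train_err, cv_err, acceptable_err=0.1):
    # Illustrative rule of thumb (the 0.1 threshold is made up):
    # - high bias (underfit): the training error itself is already high,
    #   and the cv error is close to it
    # - high variance (overfit): training error is low, but there is a
    #   large gap between training and cross validation error
    if train_err > acceptable_err:
        return "high bias (underfit)"
    if cv_err - train_err > acceptable_err:
        return "high variance (overfit)"
    return "ok"
```

Example: `diagnose(0.3, 0.32)` flags high bias, while `diagnose(0.02, 0.25)` flags high variance.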
5. Regularization and bias/variance
- choosing the regularization parameter
- try a range of values (e.g. λ = 0, 0.01, 0.02, 0.04, …, 10), fit the model for each, and pick the λ with the lowest cross validation error
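Choosing λ by cross validation can be sketched with closed-form ridge regression (a minimal sketch; the data, the λ grid, and the 40/20 split are made up):

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form regularized linear regression; for brevity the intercept
    # is regularized along with the other weights
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

def mse(theta, X, y):
    return float(np.mean((X @ theta - y) ** 2))

# Made-up data: 5 features, known weights, Gaussian noise
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=60)
X_tr, y_tr, X_cv, y_cv = X[:40], y[:40], X[40:], y[40:]

# Candidate values: λ = 0, 0.01, 0.02, 0.04, ..., doubling up to ~10
lams = [0.0] + [0.01 * 2**k for k in range(11)]
# Choose the λ whose fitted model has the lowest cross validation error
best_lam = min(lams, key=lambda l: mse(ridge_fit(X_tr, y_tr, l), X_cv, y_cv))
```

Small λ leaves the model free to overfit; very large λ drives the weights toward zero and underfits, so the cv error traces out a U-shaped curve over the grid.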
6. Learning curves
- If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.
- If a learning algorithm is suffering from high variance, getting more training data is likely to help.
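The two statements above can be checked numerically by plotting training and cv error against training set size (a sketch; the sine data and the degree-1 model are made up to exhibit high bias):

```python
import numpy as np

# Made-up data: a sine wave with Gaussian noise, split 70/30 into train/cv
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=100)
idx = rng.permutation(100)
tr, cv = idx[:70], idx[70:]

def learning_curve(degree, sizes):
    # For each training set size m, fit on the first m training examples
    # and record (m, training error, cross validation error)
    points = []
    for m in sizes:
        c = np.polyfit(x[tr[:m]], y[tr[:m]], degree)
        train_err = float(np.mean((np.polyval(c, x[tr[:m]]) - y[tr[:m]]) ** 2))
        cv_err = float(np.mean((np.polyval(c, x[cv]) - y[cv]) ** 2))
        points.append((m, train_err, cv_err))
    return points

# A straight line badly underfits a sine wave: both errors stay high no
# matter how much data is added -- the high-bias signature from the notes
curve = learning_curve(1, [5, 20, 70])
```

In the high-variance case the signature is different: the training error stays low while the cv error starts far above it, and the two curves converge only as more data is added.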
7. Deciding what to try next (revisited)
- "small" neural network (fewer parameters, more prone to underfitting)
- computationally cheaper
- "large" neural network (more parameters, more prone to overfitting)
- computationally more expensive
- use regularization to address overfitting
The definitions:
Variance: measures the extent to which the solutions for individual data sets vary around their average; hence it measures how sensitive the learned function f(x) is to the particular choice of data set.
Bias: represents the extent to which the average prediction over all data sets differs from the desired regression function.
variance: the variance of the estimate itself.
bias: the difference between the expected value of the estimate and the regression function that the sample data are hoped to recover.
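The two quantities combine in the standard bias–variance decomposition (writing $f_D$ for the solution learned from a particular data set $D$ and $h$ for the desired regression function; the expectation is over data sets):

```latex
\mathbb{E}_D\!\left[(f_D(x) - h(x))^2\right]
  = \underbrace{\left(\mathbb{E}_D[f_D(x)] - h(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\left(f_D(x) - \mathbb{E}_D[f_D(x)]\right)^2\right]}_{\text{variance}}
```

The cross term vanishes because $\mathbb{E}_D\!\left[f_D(x) - \mathbb{E}_D[f_D(x)]\right] = 0$, which is why the expected squared error splits cleanly into the two pieces defined above.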
From: http://blog.csdn.net/abcjennifer/article/details/7797502