Coursera ML Notes 6
Tags (space-separated): Machine Learning
Evaluating a Hypothesis
Once we have done some troubleshooting for errors in our predictions by:
- Getting more training examples
- Trying smaller sets of features
- Trying additional features
- Trying polynomial features ($x_1^2$, $x_2^2$, $x_1 x_2$, etc.)
- Increasing or decreasing $\lambda$

we can move on to evaluate our new hypothesis. To evaluate a hypothesis, given a dataset of training examples, we can split up the data into two sets: a training set and a test set. Typically, the training set consists of 70% of your data and the test set is the remaining 30%.
The new procedure using these two sets is then:
1. Learn $\Theta$ and minimize $J_{train}(\Theta)$ using the training set
2. Compute the test set error $J_{test}(\Theta)$
The test set error
1. For linear regression:
$J_{test}(\Theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\Theta(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2$
2. For classification ~ Misclassification error (aka 0/1 misclassification error):
$err(h_\Theta(x), y) = \begin{cases} 1 & \text{if } h_\Theta(x) \ge 0.5 \text{ and } y = 0, \text{ or } h_\Theta(x) < 0.5 \text{ and } y = 1 \\ 0 & \text{otherwise} \end{cases}$
This gives us a binary 0/1 error for each example; the average error over the test set is:
$\text{Test Error} = \frac{1}{m_{test}} \sum_{i=1}^{m_{test}} err(h_\Theta(x_{test}^{(i)}), y_{test}^{(i)})$
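The 70/30 train/test procedure for linear regression can be sketched as follows. This is a minimal illustration on synthetic data (the dataset and variable names are my own, not from the course), using the normal-equation solution via least squares:

```python
# Sketch of the 70/30 train/test evaluation procedure described above.
# Synthetic data and split sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2x + 1 + noise
m = 100
X = rng.uniform(0, 10, size=(m, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.5, size=m)

# Shuffle, then split 70% train / 30% test
idx = rng.permutation(m)
split = int(0.7 * m)
train_idx, test_idx = idx[:split], idx[split:]

# Add intercept column
Xb = np.hstack([np.ones((m, 1)), X])

# 1. Learn Theta on the training set (least squares minimizes J_train)
theta, *_ = np.linalg.lstsq(Xb[train_idx], y[train_idx], rcond=None)

# 2. Test set error: J_test = (1 / (2 * m_test)) * sum of squared errors
m_test = len(test_idx)
preds = Xb[test_idx] @ theta
J_test = np.sum((preds - y[test_idx]) ** 2) / (2 * m_test)
print(round(J_test, 3))
```

The key point is that `theta` is fit only on the training rows, so `J_test` is computed on data the model never saw.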
Model Selection and Train/Validation/Test Sets
One way to break down our dataset into the three sets is:
- Training set: 60%
- Cross validation set: 20%
- Test set: 20%
We can now calculate three separate error values for the three different sets using the following method:
- Optimize the parameters in Θ using the training set for each polynomial degree.
- Find the polynomial degree d with the least error using the cross validation set.
- Estimate the generalization error using the test set with $J_{test}(\Theta^{(d)})$, where d is the degree of the polynomial with the lowest cross validation error.
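The three steps above can be sketched as follows: fit one polynomial model per degree on the training set, pick the degree with the lowest cross validation error, then report the test error of only that chosen model. The synthetic dataset and degree grid are illustrative assumptions:

```python
# Sketch of model selection with a 60/20/20 train/CV/test split.
# The data (a noisy cubic) and the candidate degrees are assumptions.
import numpy as np

rng = np.random.default_rng(1)

m = 200
x = rng.uniform(-3, 3, size=m)
y = 0.5 * x**3 - x + rng.normal(0, 1.0, size=m)  # true signal is cubic

idx = rng.permutation(m)
train, cv, test = idx[:120], idx[120:160], idx[160:]  # 60% / 20% / 20%

def design(x, d):
    """Polynomial design matrix [1, x, x^2, ..., x^d]."""
    return np.vander(x, d + 1, increasing=True)

def cost(theta, x, y, d):
    """Unregularized squared-error cost J."""
    e = design(x, d) @ theta - y
    return e @ e / (2 * len(y))

# Step 1: optimize Theta on the training set for each degree.
# Step 2: pick the degree d with the lowest CV error.
best_d, best_theta, best_cv_err = None, None, np.inf
for d in range(1, 11):
    theta, *_ = np.linalg.lstsq(design(x[train], d), y[train], rcond=None)
    cv_err = cost(theta, x[cv], y[cv], d)
    if cv_err < best_cv_err:
        best_d, best_theta, best_cv_err = d, theta, cv_err

# Step 3: estimate generalization error with the test set, used only once.
test_err = cost(best_theta, x[test], y[test], best_d)
print(best_d, round(test_err, 3))
```

Because the degree was chosen to minimize CV error, the CV error itself is an optimistic estimate; that is why the untouched test set is needed for the final generalization estimate.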
Diagnosing Bias vs. Variance
High bias (underfitting): both $J_{train}(\Theta)$ and $J_{CV}(\Theta)$ will be high. Also, $J_{train}(\Theta) \approx J_{CV}(\Theta)$.
High variance (overfitting): $J_{train}(\Theta)$ will be low and $J_{CV}(\Theta)$ will be much greater than $J_{train}(\Theta)$.
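These two patterns can be seen concretely by fitting an underfit (degree 1) and an overfit (degree 12) polynomial to the same nonlinear data and comparing the two errors. The dataset and degrees are illustrative assumptions:

```python
# Sketch of the bias/variance diagnosis: compare J_train vs J_cv for an
# underfit and an overfit model. Data and degrees are assumptions.
import numpy as np

rng = np.random.default_rng(4)

m = 40
x = rng.uniform(-2, 2, size=m)
y = np.sin(2 * x) + rng.normal(0, 0.2, size=m)

idx = rng.permutation(m)
train, cv = idx[:28], idx[28:]

def J(theta, x, y, d):
    """Unregularized squared-error cost for a degree-d polynomial fit."""
    e = np.vander(x, d + 1, increasing=True) @ theta - y
    return e @ e / (2 * len(y))

results = {}
for d in (1, 12):
    A = np.vander(x[train], d + 1, increasing=True)
    theta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    results[d] = (J(theta, x[train], y[train], d),
                  J(theta, x[cv], y[cv], d))
    print(d, results[d])
# Typical pattern: d=1 -> both errors high and close (high bias);
# d=12 -> J_train low, J_cv noticeably higher (high variance).
```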
Regularization and Bias/Variance
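The regularization parameter $\lambda$ can be selected the same way as the polynomial degree: train with each candidate $\lambda$, then keep the one whose model has the lowest unregularized cross validation error. A minimal sketch, assuming regularized normal-equation linear regression and an illustrative $\lambda$ grid (both are my assumptions, not from the notes):

```python
# Sketch of choosing lambda with a cross validation set. The CV error is
# computed WITHOUT the regularization term. Data and grid are assumptions.
import numpy as np

rng = np.random.default_rng(2)

m, n = 60, 8
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
true_theta = np.zeros(n + 1)
true_theta[:3] = [1.0, 2.0, -1.5]
y = X @ true_theta + rng.normal(0, 0.3, size=m)

idx = rng.permutation(m)
train, cv = idx[:40], idx[40:]

def fit_ridge(X, y, lam):
    """Regularized normal equation; the intercept is not penalized."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def J(theta, X, y):
    """Unregularized squared-error cost."""
    e = X @ theta - y
    return e @ e / (2 * len(y))

lambdas = [0, 0.01, 0.1, 1, 10, 100]
cv_errs = [J(fit_ridge(X[train], y[train], lam), X[cv], y[cv])
           for lam in lambdas]
best_lam = lambdas[int(np.argmin(cv_errs))]
print(best_lam)
```

A large $\lambda$ pushes the fit toward high bias, a tiny $\lambda$ allows high variance; the CV error picks the trade-off point.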
Learning Curves
If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.
If a learning algorithm is suffering from high variance, getting more training data is likely to help.
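A learning curve plots $J_{train}$ and $J_{CV}$ against the number of training examples used. A minimal sketch on synthetic data (dataset and step sizes are illustrative assumptions): train on the first i examples for increasing i, evaluate the training error on those i examples and the CV error on the full CV set.

```python
# Sketch of computing a learning curve as described above.
# The synthetic data and subset sizes are assumptions.
import numpy as np

rng = np.random.default_rng(3)

m = 100
x = rng.uniform(0, 5, size=m)
y = 1.5 * x + 2 + rng.normal(0, 0.4, size=m)
Xb = np.column_stack([np.ones(m), x])

idx = rng.permutation(m)
train, cv = idx[:70], idx[70:]

def J(theta, X, y):
    """Unregularized squared-error cost."""
    e = X @ theta - y
    return e @ e / (2 * len(y))

sizes = list(range(2, 71, 4))
J_train, J_cv = [], []
for i in sizes:
    sub = train[:i]  # train on only the first i examples
    theta, *_ = np.linalg.lstsq(Xb[sub], y[sub], rcond=None)
    J_train.append(J(theta, Xb[sub], y[sub]))  # error on those i examples
    J_cv.append(J(theta, Xb[cv], y[cv]))       # error on the full CV set
```

Plotting `J_train` and `J_cv` against `sizes` shows the diagnostic shapes: with high bias the two curves converge to a high plateau, so more data does not help; with high variance there is a persistent gap that extra data narrows.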
Deciding What to Do Next Revisited
- Getting more training examples: Fixes high variance
- Trying smaller sets of features: Fixes high variance
- Adding features: Fixes high bias
- Adding polynomial features: Fixes high bias
- Decreasing λ: Fixes high bias
- Increasing λ: Fixes high variance
Prioritizing What to Work On
- Collect lots of data (for example, the “honeypot” project, though this doesn’t always work)
- Develop sophisticated features (for example: using email header data in spam emails)
- Develop algorithms to process your input in different ways (recognizing misspellings in spam).
It is difficult to tell which of the options will be most helpful.
Error Analysis
The recommended approach to solving machine learning problems is to:
- Start with a simple algorithm, implement it quickly, and test it early on your cross validation data.
- Plot learning curves to decide if more data, more features, etc. are likely to help.
- Manually examine the errors on examples in the cross validation set and try to spot a trend where most of the errors were made.
Data For Machine Learning
It’s not who has the best algorithm that wins; it’s who has the most data.
Training on a lot of data is likely to give good performance when the following two conditions hold true:
- The features $x \in \mathbb{R}^{n+1}$ contain sufficient information to predict $y$ accurately.
- We use a learning algorithm with many parameters.