1. Training Error
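Define a loss function, for example the squared error:
L(y, f_w(x)) = (y - f_w(x))^2
The training error is defined as the average loss over the houses in the training set:
training error(w) = (1/N) * sum_{i=1}^{N} L(y_i, f_w(x_i))
With squared-error loss, RMSE is simply the square root of the average loss:
RMSE(w) = sqrt( (1/N) * sum_{i=1}^{N} (y_i - f_w(x_i))^2 )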
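The definitions above can be sketched in a few lines of Python. This is a minimal illustration that assumes a squared-error loss; the function names and the toy prices are hypothetical and are not taken from the notes.

```python
import math

def squared_error(y_true, y_pred):
    # Loss for a single house (assumed to be squared error).
    return (y_true - y_pred) ** 2

def training_error(ys, preds):
    # Average loss over all houses in the training set.
    return sum(squared_error(y, p) for y, p in zip(ys, preds)) / len(ys)

def rmse(ys, preds):
    # Square root of the average squared-error loss.
    return math.sqrt(training_error(ys, preds))

# Hypothetical toy data: true prices and model predictions (in $1000s).
prices = [300.0, 450.0, 500.0]
predicted = [310.0, 430.0, 500.0]

print(training_error(prices, predicted))  # (100 + 400 + 0) / 3
print(rmse(prices, predicted))
```

RMSE has the same units as the price itself, which is why it is often easier to interpret than the average squared loss.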
The training error decreases as model complexity increases.
The training error is overly optimistic: because the weights were trained
to fit the training data, it is not a good measure of predictive performance.
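One way to see the training error shrink with complexity is to fit polynomials of increasing degree to the same data. The sketch below assumes NumPy, uses polynomial degree as a stand-in for model complexity, and generates synthetic data from a sine curve; all of these are illustrative choices, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training set (assumption): noisy samples of a sine curve.
x = rng.uniform(0.0, 1.0, size=30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, size=30)

def train_mse(degree):
    # Least-squares polynomial fit, then average squared loss on the same data.
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((y - np.polyval(coeffs, x)) ** 2))

errors = [train_mse(d) for d in range(0, 6)]
for d, e in enumerate(errors):
    print(f"degree {d}: training MSE = {e:.4f}")
```

Because each higher-degree polynomial contains the lower-degree ones as special cases, a least-squares fit can never do worse on the training data as the degree grows.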
2. Generalization error
Suppose we could enumerate all possible pairs of square footage and house price. The generalization
error is the loss averaged over all such pairs, weighted by how likely each pair is under the
distribution.
As model complexity increases, the generalization error first goes down, then goes up.
We cannot compute the generalization error, because the true distribution is unknown.
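In a simulation, where we choose the data distribution ourselves, the weighted average above can be approximated by averaging the loss over a very large fresh sample. The sketch below assumes NumPy, a squared-error loss, and a made-up distribution (uniform inputs, sine relationship, Gaussian noise); with real housing data this distribution is not available, which is exactly why the generalization error cannot be computed.

```python
import numpy as np

rng = np.random.default_rng(0)
NOISE_SD = 0.2  # assumed noise level of the synthetic distribution

def true_f(x):
    # Assumed true relationship; unknown in practice.
    return np.sin(2 * np.pi * x)

def sample(n):
    # Draw n (x, y) pairs from the assumed distribution.
    x = rng.uniform(0.0, 1.0, size=n)
    y = true_f(x) + rng.normal(0.0, NOISE_SD, size=n)
    return x, y

def estimated_generalization_error(model, n=200_000):
    # Monte Carlo estimate: average squared loss over many fresh pairs.
    x, y = sample(n)
    return float(np.mean((y - model(x)) ** 2))

err_true_model = estimated_generalization_error(true_f)
err_constant_model = estimated_generalization_error(lambda x: np.zeros_like(x))
print(err_true_model, err_constant_model)
```

Even the true function has an error close to the noise variance (0.04 here), while a constant model is far worse.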
3. Three errors
Noise:
It is inherent in the data and cannot be reduced by choosing a better model.
Bias:
Consider all possible training sets of size N. The bias is the difference between the average fit
over those training sets and the true relationship.
A low-complexity model has high bias: it is not flexible enough to represent the true relationship.
For a high-complexity model, the average fit is closer to the true relationship, so the bias is low.
Variance:
Variance measures how much the fits differ from one training set to another. For a high-complexity model, the difference between fits is larger, so the variance is high.
Tradeoff:
MSE = bias^2 + variance (we cannot compute bias and variance, because they are defined using the true function),
and the goal is to find the minimum point of the MSE curve.
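Although bias and variance cannot be computed on real data, they can be estimated in a simulation where the true function is known. The sketch below assumes NumPy, a sine curve as the true relationship, polynomial degree as the measure of complexity, and fixed inputs with fresh noise for each training set; every one of these is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    # Assumed true relationship; known only because this is a simulation.
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 30)  # fixed inputs shared by all training sets
x_grid = np.linspace(0.0, 1.0, 50)   # points where the fits are compared

def bias_variance(degree, n_sets=200, noise_sd=0.3):
    fits = np.empty((n_sets, x_grid.size))
    for s in range(n_sets):
        # Each training set differs only in its noise draw.
        y = true_f(x_train) + rng.normal(0.0, noise_sd, size=x_train.size)
        fits[s] = np.polyval(np.polyfit(x_train, y, degree), x_grid)
    avg_fit = fits.mean(axis=0)
    bias_sq = float(np.mean((avg_fit - true_f(x_grid)) ** 2))
    variance = float(np.mean(fits.var(axis=0)))
    mse = float(np.mean((fits - true_f(x_grid)) ** 2))
    return bias_sq, variance, mse

low_bias_sq, low_var, low_mse = bias_variance(degree=1)
high_bias_sq, high_var, high_mse = bias_variance(degree=5)
print("degree 1:", low_bias_sq, low_var, low_mse)
print("degree 5:", high_bias_sq, high_var, high_mse)
```

Here the MSE is measured against the true function, so it excludes the noise term and equals bias^2 + variance for each degree.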
4. Amount of data
If the model complexity is fixed, the true error decreases as the number of data points increases, and it flattens out at
bias^2 + noise, because the model may not be flexible enough to capture the true relationship between x and y.
The training error increases as the number of data points increases, and it flattens out at nearly the same level as the true error.
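These two curves can be reproduced in a small simulation. The sketch below assumes NumPy, a straight-line model fitted to data from a sine curve (so the model is deliberately too simple), and a large held-out sample as a stand-in for the true error; the sample sizes and noise level are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
NOISE_SD = 0.3
DEGREE = 1  # fixed model complexity: a straight line

def sample(n):
    # Assumed data distribution: uniform x, sine relationship, Gaussian noise.
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, NOISE_SD, size=n)
    return x, y

# Large fresh sample used to approximate the true error.
x_big, y_big = sample(20_000)

def average_errors(n_points, n_repeats=200):
    # Average training and true error over many training sets of one size.
    train_errs, true_errs = [], []
    for _ in range(n_repeats):
        x, y = sample(n_points)
        coeffs = np.polyfit(x, y, DEGREE)
        train_errs.append(np.mean((y - np.polyval(coeffs, x)) ** 2))
        true_errs.append(np.mean((y_big - np.polyval(coeffs, x_big)) ** 2))
    return float(np.mean(train_errs)), float(np.mean(true_errs))

sizes = [5, 20, 100, 500]
curve = {n: average_errors(n) for n in sizes}
for n in sizes:
    print(f"N={n}: training error = {curve[n][0]:.3f}, true error = {curve[n][1]:.3f}")
```

With few points the line fits the training set easily but predicts poorly; with many points both errors settle near the same floor, which more data cannot lower for this model.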
5. Validation set
A validation set is needed to tune the model complexity. If we use only the test set, the model complexity
is selected to minimize the test error, so the test error becomes an overly optimistic estimate of the generalization error. We therefore need a training set, a validation set, and a test set.
The validation set is used to choose the model complexity.
The test set is used to approximate the generalization error.
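The three-way split can be sketched as follows. This is a minimal example assuming NumPy, synthetic data, polynomial degree as the complexity being tuned, and a 60/20/20 split; none of these specifics come from the notes.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data set (assumption): noisy samples of a sine curve.
x = rng.uniform(0.0, 1.0, size=300)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, size=300)

# The samples are already in random order, so slicing gives a random split.
x_train, y_train = x[:180], y[:180]
x_val, y_val = x[180:240], y[180:240]
x_test, y_test = x[240:], y[240:]

def mse(coeffs, xs, ys):
    return float(np.mean((ys - np.polyval(coeffs, xs)) ** 2))

# Fit every candidate complexity on the training set only.
candidate_degrees = range(0, 7)
fits = {d: np.polyfit(x_train, y_train, d) for d in candidate_degrees}

# Choose the complexity with the lowest validation error.
val_errors = {d: mse(fits[d], x_val, y_val) for d in candidate_degrees}
best_degree = min(val_errors, key=val_errors.get)

# Touch the test set once, after the choice has been made.
test_error = mse(fits[best_degree], x_test, y_test)
print(best_degree, test_error)
```

The test set plays no part in fitting or selection, so its error remains a fair approximation of the generalization error for the chosen model.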