* Saying the observations are identically distributed means we are drawing random observations without any systematic bias.
* Different sets of observations will give us different point estimates of the coefficients.
In each case, the red star marks the true values of beta_0 and beta_1, and every set of observations we draw produces one green dot (one pair of estimates).
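The picture above can be reproduced with a quick simulation: draw many datasets from the same true model and fit each one. The true model here (y = 2 + 0.5x with unit Gaussian noise) is a hypothetical choice for illustration, not from the notes.

```python
import numpy as np

# Hypothetical true model: y = beta_0 + beta_1 * x + Gaussian noise.
rng = np.random.default_rng(0)
beta_0, beta_1 = 2.0, 0.5            # the "red star" (true coefficients)
x = np.linspace(0, 10, 30)

estimates = []
for _ in range(1000):                # each loop = one set of observations
    y = beta_0 + beta_1 * x + rng.normal(0, 1, size=x.size)
    b1_hat, b0_hat = np.polyfit(x, y, deg=1)   # one "green dot"
    estimates.append((b0_hat, b1_hat))

estimates = np.array(estimates)
print("mean estimate:", estimates.mean(axis=0))    # close to (2.0, 0.5)
print("std of estimates:", estimates.std(axis=0))  # spread of the green dots
```

The mean of the green dots sits near the red star (low bias), while their standard deviation measures the variance of the estimator.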
If we knew the true values of the betas, we could quantify the bias and variance of our fitted model with respect to those true values. This tells us which kinds of algorithms produce high bias or high variance, both of which we want to avoid.
For example, if we fit the observations with a 10th-order polynomial, our model changes drastically from one dataset to the next. With such a high-order polynomial, the fitted coefficients swing widely across different sets of observations, so the algorithm is very sensitive to the particular data we have and gives us high variance.
In the opposite extreme, if we fit our model with a horizontal line, it always predicts a constant no matter what observations we feed in. We then have zero variance but an obvious bias, whose size depends on the value of the constant the line represents.
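These two extremes can be compared numerically. The sketch below again assumes a hypothetical linear true model (y = 2 + 0.5x with unit Gaussian noise) and compares predictions at one test point across many datasets; the specific numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 30)
x0 = 0.8                                   # point at which we compare predictions
true_y0 = 2.0 + 0.5 * x0

preds_poly, preds_const = [], []
for _ in range(500):                       # 500 independent sets of observations
    y = 2.0 + 0.5 * x + rng.normal(0, 1, size=x.size)
    coeffs = np.polyfit(x, y, deg=10)      # flexible 10th-order fit
    preds_poly.append(np.polyval(coeffs, x0))
    preds_const.append(y.mean())           # horizontal line: always a constant

bias_poly = np.mean(preds_poly) - true_y0
bias_const = np.mean(preds_const) - true_y0
# The polynomial's predictions scatter widely (high variance, low bias);
# the horizontal line barely moves (near-zero variance) but sits off target.
print(f"degree-10: bias {bias_poly:+.3f}, variance {np.var(preds_poly):.3f}")
print(f"constant:  bias {bias_const:+.3f}, variance {np.var(preds_const):.3f}")
```

The high-order fit shows much larger variance, while the horizontal line shows much larger bias, matching the trade-off described above.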