05.08.2021
Part 2
Regression
(2) Multiple Linear Regression
Multiple independent variables x0, x1, x2, ... can influence the dependent variable y.
The data set does not need to show a perfect linear correlation.
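As a rough illustration of the idea above, the prediction of a multiple linear regression is the constant plus a weighted sum of the independent variables. The function name `predict` and all numbers below are made up for this sketch; they are not from the course material.

```python
def predict(b0, coefficients, features):
    """Return b0 + b1*x1 + ... + bn*xn for a single observation."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return b0 + sum(b * x for b, x in zip(coefficients, features))

# Made-up constant, coefficients and feature values, for illustration only.
y_hat = predict(b0=10.0, coefficients=[2.0, 0.5, -1.0], features=[3.0, 4.0, 1.0])
print(y_hat)  # 10 + 2*3 + 0.5*4 - 1*1
```

Each coefficient says how much y changes when its variable increases by one unit while the others stay fixed.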
Assumptions of a Linear Regression:
1. Linearity; 2. Homoscedasticity; 3. Multivariate normality;
4. Independence of errors; 5. Lack of multicollinearity
Dummy variables
When there is a categorical variable, we can create a new column for each category, placing 1 in the rows that belong to that category and 0 in all other rows. These columns are called Dummy Variables. With two categories (NY and CA) we keep only one of the columns, and we don't lose any information, because the dropped column is fully determined by the one we keep. Here we keep NY: its coefficient b4 works like a light switch, switched on when the dummy is 1 (NY) and off when it is 0 (CA).
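A minimal sketch of the encoding described above, in plain Python. The sample rows and the helper `make_dummies` are hypothetical; in practice a library encoder would usually do this step.

```python
# Hypothetical sample data: one categorical "state" column.
states = ["NY", "CA", "CA", "NY", "CA"]

def make_dummies(values):
    """Return {category: list of 0/1 flags}, one column per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}

dummies = make_dummies(states)
print(dummies["NY"])  # 1 where the row is NY, 0 elsewhere
print(dummies["CA"])  # 1 where the row is CA, 0 elsewhere

# Keep only the NY column: wherever NY is 0, the state must be CA.
ny_only = dummies["NY"]
recovered = ["NY" if flag == 1 else "CA" for flag in ny_only]
print(recovered == states)  # the original column can be rebuilt from NY alone
```

Rebuilding the original column from the NY dummy alone is what "we don't lose information" means here.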
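Dummy Variable Trap - you can't include the constant b0 and all of the dummy variables (NY with coefficient b4 and CA with coefficient b5) at the same time. The CA column always equals 1 - NY, so one dummy is perfectly predicted by the other, which violates the lack-of-multicollinearity assumption. So, for example, if you have 9 dummy variables, include only 8. Never include all of the dummy variables.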
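The trap can be checked directly: the dummy columns add up to the column of ones that belongs to the constant b0. The sketch below uses hypothetical data and a hypothetical helper `columns_to_keep`, with the baseline category chosen by hand.

```python
# Hypothetical sample data.
states = ["NY", "CA", "CA", "NY", "CA"]
ny = [1 if s == "NY" else 0 for s in states]
ca = [1 if s == "CA" else 0 for s in states]
constant = [1] * len(states)  # the column multiplied by b0

# NY + CA reproduces the constant column exactly, so the three columns
# are linearly dependent and the coefficients cannot be told apart.
dummy_sum = [a + b for a, b in zip(ny, ca)]
trap = dummy_sum == constant
print(trap)

def columns_to_keep(categories, baseline):
    """Drop the baseline category so that k categories give k - 1 dummies."""
    return [c for c in sorted(categories) if c != baseline]

kept = columns_to_keep(["NY", "CA"], baseline="CA")
print(kept)

# Nine made-up categories leave eight dummy columns.
nine = [f"cat{i}" for i in range(9)]
kept_nine = columns_to_keep(nine, baseline="cat0")
print(len(kept_nine))
```

The dropped category becomes the baseline: its effect is absorbed into b0, and every other dummy coefficient is measured relative to it.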
Understanding the P-value and Significance Level (SL)