Linear Regression
1. Problem Understanding:
What type of decision do we need to make?
The answer to this question has to be a yes/no answer.
What type of information do we need to make the decision?
List all the information needed to support the decision. Be specific.
What kind of analysis do we need to perform?
Go back to Predictive Analytics for Business 2 to see the categories of analysis methods.
2.
3. Correlation and R-squared indicate how well the line fits the data.
4. In Google Sheets or Excel, use SLOPE(Y, X) and INTERCEPT(Y, X) to calculate the best-fit linear regression line for the data set. Both functions take the Y (dependent) values as the first argument.
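As a rough illustration of what the slope and intercept calculation does, here is a minimal least-squares sketch in plain Python. The function name slope_intercept and the sample data are made up for this example; it is not how the spreadsheet functions are implemented internally.

```python
def slope_intercept(x, y):
    """Least-squares slope and intercept for regressing y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariation of x and y, and variation of x, around their means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on the line y = 1 + 2x
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
slope, intercept = slope_intercept(x, y)
print(slope, intercept)
```

The best-fit line always passes through the point (mean of X, mean of Y), which is why the intercept can be recovered from the slope and the two means.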
5. Use CORREL(X, Y) to calculate the correlation between X and Y. Use RSQ(Y, X) to calculate the R-squared value of the line.
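The correlation and R-squared calculations can be sketched in plain Python as follows. For a regression with a single X variable, R-squared is the square of the correlation coefficient. The function name correlation and the sample numbers are assumptions made for this example.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y rises with x but not perfectly
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
r_squared = r ** 2  # single-variable regression only
print(round(r, 4), round(r_squared, 4))
```

A correlation close to 1 or -1 means the points lie close to a straight line, while a value near 0 means a line explains little of the variation in Y.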
6. Use the Regression tool under Data Analysis in Excel to generate the model for multiple-variable regression on the dataset.
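To show what a multiple-variable regression computes, here is a minimal ordinary least squares sketch in plain Python that solves the normal equations. It is an illustration only, not Excel's implementation; the function name fit_ols and the data are made up, and the sketch assumes the predictors are not perfectly collinear.

```python
def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] minimizing the squared error."""
    # Prepend a column of ones so the first coefficient is the intercept
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Normal equations: (X^T X) coef = X^T y
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, p):
            factor = A[k][col] / A[col][col]
            for j in range(col, p):
                A[k][j] -= factor * A[col][j]
            b[k] -= factor * b[col]
    # Back substitution
    coef = [0.0] * p
    for i in range(p - 1, -1, -1):
        known = sum(A[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (b[i] - known) / A[i][i]
    return coef

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2,
# so the fitted coefficients should come out close to 1, 2 and 3
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
coef = fit_ols(rows, y)
print([round(c, 6) for c in coef])
```

For real datasets a spreadsheet tool or a numerical library is the safer choice; this sketch only makes the idea of fitting several coefficients at once concrete.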
7. The R-squared value never decreases as more variables are added, even when the new variables have no real explanatory power. So for multiple regression we use the adjusted R-squared value in place of the original one.
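The standard adjustment is: adjusted R-squared = 1 - (1 - R-squared) * (n - 1) / (n - p - 1), where n is the number of observations and p is the number of predictor variables. A small sketch, with a made-up function name and made-up numbers:

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared, which penalizes each extra predictor."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical case: the same R-squared of 0.90 on 50 observations,
# once with 3 predictors and once with 4
with_three = adjusted_r_squared(0.90, 50, 3)
with_four = adjusted_r_squared(0.90, 50, 4)
print(round(with_three, 4), round(with_four, 4))
```

If adding a variable leaves R-squared unchanged, the adjusted value goes down, which signals that the extra variable is not earning its place in the model.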
8. Transforming categorical variables: Dummy variables:
Expenditures = β0 + β1 Avg_Income + β2 Pct_Under_18 + β3 midwest + β4 southeast + β5 west
Each region dummy variable (midwest, southeast, west) can only be 0 or 1.
9. Understanding the equation:
Expenditures = -530 + 0.073 Avg_Income + 1406.36 Pct_Under_18 + 6.53 region
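0.073 means that each additional dollar of average income is associated with 0.073 dollars (about 7 cents) more in expenditures, holding the other variables constant.
1406.36 means that each additional percentage point in Pct_Under_18 is associated with about 14 dollars more in expenditures, assuming Pct_Under_18 is stored as a fraction between 0 and 1.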
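The coefficients can be checked by plugging numbers into the equation and changing one input at a time. The sketch below uses the coefficients from the equation in these notes; the function name predict_expenditures and the input values are made up, and it assumes Pct_Under_18 is stored as a fraction (0.25 for 25 percent).

```python
def predict_expenditures(avg_income, pct_under_18, region):
    """Evaluate the fitted equation from the notes for one observation."""
    return -530 + 0.073 * avg_income + 1406.36 * pct_under_18 + 6.53 * region

# Hypothetical baseline observation
base = predict_expenditures(50000, 0.25, 0)

# One more dollar of average income, everything else unchanged
income_effect = predict_expenditures(50001, 0.25, 0) - base

# One more percentage point under 18 (0.25 -> 0.26), everything else unchanged
pct_effect = predict_expenditures(50000, 0.26, 0) - base

print(round(income_effect, 3), round(pct_effect, 2))
```

Each difference equals the coefficient times the size of the change, which is what "holding the other variables constant" means in practice.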
10. Always create one fewer dummy variable than the number of categories, so that one category (the baseline) is represented by zeros in all the dummy variables.
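The dummy-variable rule can be sketched in plain Python as below. The dummy names midwest, southeast and west come from the equation in these notes; the baseline name "northeast", the function name make_dummies and the sample rows are assumptions made for this example.

```python
def make_dummies(values, baseline):
    """Return one 0/1 column per category, leaving out the baseline."""
    if baseline not in values:
        raise ValueError("baseline category does not appear in the data")
    categories = sorted(set(values) - {baseline})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# Hypothetical region column with four categories
regions = ["midwest", "west", "northeast", "southeast", "west"]
dummies = make_dummies(regions, baseline="northeast")
for name, column in dummies.items():
    print(name, column)
```

Four categories produce three dummy columns, and the row for the baseline category has a 0 in every one of them. The coefficient on each dummy is then read as the difference from the baseline category.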