Linear Regression with One Variable
Cost function
Gradient descent
Linear Regression with Multiple Variables
Gradient descent for multiple variables
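A minimal sketch of batch gradient descent for multivariate linear regression, assuming a design matrix X whose first column is all ones (the intercept term); the toy data and hyperparameters are illustrative, not from the notes.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent minimizing J(theta) = (1/2m) * sum((X@theta - y)^2)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        # Simultaneous update of all theta_j:
        # theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij
        grad = X.T @ (X @ theta - y) / m
        theta -= alpha * grad
    return theta

# Toy data generated from y = 1 + 2*x (assumed example)
X = np.c_[np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])]
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y)
```

With enough iterations theta approaches the true parameters (1, 2) of the generating line.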
Feature Scaling
x_i := (x_i - avg_i) / (max_i - min_i)   (mean normalization of feature i)
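The mean-normalization rule above can be sketched column-wise with NumPy; the house-sizes/bedrooms matrix below is an assumed illustration.

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature to roughly [-0.5, 0.5]: (x - mean) / (max - min)."""
    mu = X.mean(axis=0)
    rng = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / rng

# Example features: size in feet^2, number of bedrooms (assumed data)
X = np.array([[2104.0, 3.0],
              [1416.0, 2.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])
Xs = mean_normalize(X)
```

After scaling, each column has zero mean and a max-to-min range of 1, which keeps the features on comparable scales so gradient descent converges faster.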
Learning rate
If α is too small: slow convergence.
If α is too large: J(θ) may not decrease on every iteration; gradient descent may fail to converge.
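The two failure modes can be seen by tracking J(θ) per iteration on the same toy 1-D problem (an assumed setup, not from the notes) with a small versus an overly large α.

```python
import numpy as np

def J_history(alpha, iters=50):
    """Run gradient descent and record the cost J(theta) after each step."""
    X = np.c_[np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])]
    y = np.array([1.0, 3.0, 5.0, 7.0])
    m = len(y)
    theta = np.zeros(2)
    hist = []
    for _ in range(iters):
        theta -= alpha * X.T @ (X @ theta - y) / m
        hist.append(((X @ theta - y) ** 2).sum() / (2 * m))
    return hist

small = J_history(0.01)  # J decreases every iteration, but slowly
large = J_history(0.6)   # step overshoots the minimum: J blows up
```

Plotting J(θ) against the iteration number is the standard way to pick α: if the curve ever goes up, decrease α.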
Normal equation
X θ = y
XᵀX θ = Xᵀy
(XᵀX)⁻¹(XᵀX) θ = (XᵀX)⁻¹Xᵀy
θ = (XᵀX)⁻¹Xᵀy
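The closed form θ = (XᵀX)⁻¹Xᵀy can be computed directly; the sketch below uses a linear solve instead of an explicit matrix inverse, which is numerically preferable, on the same assumed toy data.

```python
import numpy as np

# Toy data from y = 1 + 2*x (assumed example)
X = np.c_[np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])]
y = np.array([1.0, 3.0, 5.0, 7.0])

# Solve (X^T X) theta = X^T y instead of forming (X^T X)^(-1) explicitly
theta = np.linalg.solve(X.T @ X, X.T @ y)
```

Unlike gradient descent, this needs no learning rate and no iteration, but solving the n×n system costs roughly O(n³), so it becomes slow when the number of features n is very large.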
Normal equation and non-invertibility
•Redundant features (linearly dependent).
E.g. x1 = size in feet²
     x2 = size in m²
•Too many features (e.g. m ≤ n, fewer training examples than features).
Delete some features, or use regularization.
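When XᵀX is singular, e.g. because one feature is a rescaled copy of another, an ordinary solve or inverse fails; the pseudoinverse still returns a solution. A sketch with an assumed redundant feet/metres feature pair:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
# Redundant columns: size in feet and the same size in metres (1 ft = 0.3048 m)
X = np.c_[np.ones(4), x, x * 0.3048]
y = 1 + 2 * x

# X^T X is singular here, so use the Moore-Penrose pseudoinverse
theta = np.linalg.pinv(X.T @ X) @ (X.T @ y)
pred = X @ theta
```

The pseudoinverse picks the minimum-norm θ among the infinitely many that fit; the predictions are still exact, which is why libraries often use pinv even when XᵀX is non-invertible.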