Week 2: Linear Regression with Multiple Variables and the Normal Equation
Linear Regression with Multiple Variables
The Hypothesis Function
Notation:
- n = number of features
- m = number of training examples
- x(i) = the input features of the i-th training example
- x(i)j = value of feature j in the i-th training example

For convenience of notation, define x0 = 1.

The multivariable hypothesis function is:

hθ(x) = θ0x0 + θ1x1 + θ2x2 + ⋯ + θnxn

Collecting the parameters and the features of one example into (n+1)-dimensional column vectors:

θ = [θ0; θ1; ⋯; θn] and x = [x0; x1; ⋯; xn]

we get hθ(x) = θᵀx.

In MATLAB, with X the m-by-(n+1) design matrix whose rows are the training examples (first column all ones), the hypothesis for all examples at once is:

h = X*theta;

Note the distinction: the X in the MATLAB code is the design matrix, but when talking about hθ(x) for a single example, x = [x0; x1; ⋯; xn] is a column vector, so hθ(x) = θᵀx.
Cost Function

J(θ) = 1/(2m) · ∑_{i=1}^{m} (hθ(x(i)) − y(i))²

Attention: J(θ) is a scalar, just a number.

In MATLAB:

J = 1/(2*m) * sum((X*theta - y).^2);
% .^ squares each element (element-wise power, not matrix power)
% sum adds up all elements of the resulting vector
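The notes use MATLAB; as a cross-check, here is an equivalent NumPy sketch of the same vectorized cost computation (the toy data below is made up for illustration):

```python
import numpy as np

def compute_cost(X, theta, y):
    # J(theta) = 1/(2m) * sum((X*theta - y).^2), vectorized
    m = len(y)
    residuals = X @ theta - y
    return (residuals @ residuals) / (2 * m)

# Toy data: X already contains the x0 = 1 column.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

J_perfect = compute_cost(X, np.array([0.0, 1.0]), y)  # this theta fits exactly, so J = 0
J_zero = compute_cost(X, np.zeros(2), y)              # all-zero theta leaves the full error
```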
Gradient Descent for Linear Regression

When specifically applied to the case of linear regression, the gradient descent update becomes:

θj := θj − (α/m) ∑_{i=1}^{m} (hθ(x(i)) − y(i)) x(i)j   (simultaneously for every j)

In MATLAB, the vectorized update is:

theta = theta - (alpha/m) * X' * (X*theta - y);
% (X*theta - y) is [m x 1] and X' is [(n+1) x m], so X' * (X*theta - y) is [(n+1) x 1]
% the matrix multiplication already performs the sum over examples, so no explicit sum is needed
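The MATLAB line above is a single update step; a full training run just repeats it. A minimal NumPy sketch, with toy data and hyperparameters chosen for illustration:

```python
import numpy as np

def gradient_descent(X, y, alpha, num_iters):
    # Repeat the vectorized update: theta -= (alpha/m) * X' * (X*theta - y)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        theta -= (alpha / m) * (X.T @ (X @ theta - y))
    return theta

# Toy data generated from y = 1 + x1, so gradient descent
# should recover theta close to [1, 1].
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
theta = gradient_descent(X, y, alpha=0.1, num_iters=5000)
```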
Feature Normalization: Feature Scaling and Mean Normalization

Idea: make sure features are on a similar scale.

- Feature scaling: get every feature into approximately a −1 ≤ xi ≤ 1 range.
- Mean normalization: replace xi with xi − μi so features have approximately zero mean (do not apply to x0 = 1).

In MATLAB:

mu = mean(X);
sigma = std(X);
X_norm = (X - repmat(mu, m, 1)) ./ repmat(sigma, m, 1);
% see 'help mean' and 'help std' for details
% repmat tiles mu and sigma into m-by-n matrices so the subtraction and division are element-wise
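In NumPy the same normalization is usually written with broadcasting, which plays the role of repmat; passing ddof=1 makes std match MATLAB's default (sample standard deviation). The feature matrix below is made up:

```python
import numpy as np

def feature_normalize(X):
    mu = X.mean(axis=0)                 # column means, like MATLAB's mean(X)
    sigma = X.std(axis=0, ddof=1)       # ddof=1 = sample std, matching MATLAB's std(X)
    X_norm = (X - mu) / sigma           # broadcasting instead of repmat
    return X_norm, mu, sigma

# Illustrative features: house size and number of bedrooms (made-up values).
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])
X_norm, mu, sigma = feature_normalize(X)
```

After normalization every column has zero mean and unit sample standard deviation.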
Learning Rate
Summary:
- If α is too small: slow convergence.
- If α is too large: J(θ) may not decrease on every iteration; may not converge.
- To choose α, try values roughly 3× apart, e.g. ⋯, 0.01, 0.03, 0.1, 0.3, 1, 3, ⋯
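The effect of α can be checked directly by recording J(θ) after each iteration: with a suitable α the cost decreases every step, while a too-large α makes it blow up. A small sketch on toy data (the specific α values here are just examples):

```python
import numpy as np

def cost_history(X, y, alpha, num_iters=50):
    # Run gradient descent, recording J(theta) after every update.
    m, n = X.shape
    theta = np.zeros(n)
    history = []
    for _ in range(num_iters):
        theta -= (alpha / m) * (X.T @ (X @ theta - y))
        r = X @ theta - y
        history.append((r @ r) / (2 * m))
    return history

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

J_good = cost_history(X, y, alpha=0.1)   # small enough: J decreases each iteration
J_bad = cost_history(X, y, alpha=0.5)    # too large for this data: J grows
```

Plotting such histories against the iteration number is the standard way to pick α.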
Features and Polynomial Regression

The hypothesis need not be linear in the original features: we can create new features from existing ones. For example, with a house's size as the single input feature, you can let:

hθ(x) = θ0 + θ1·(size) + θ2·(size)²

or:

hθ(x) = θ0 + θ1·(size) + θ2·√(size)

If features are created this way, feature scaling becomes very important, since e.g. size and size² have very different ranges.
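Creating polynomial features is just building extra columns of the design matrix. A sketch with made-up sizes, which also shows why scaling matters (the columns span very different ranges):

```python
import numpy as np

# Hypothetical house sizes (the numbers are made up for illustration).
size = np.array([1000.0, 1500.0, 2000.0])

# Design matrix for h(x) = t0 + t1*size + t2*size^2: columns 1, size, size^2.
X_poly = np.column_stack([np.ones_like(size), size, size**2])

# The columns have wildly different scales, so normalize before gradient descent.
ranges = X_poly.max(axis=0) - X_poly.min(axis=0)
```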
Normal Equation

Normal equation: a method to solve for θ analytically.

Now,

J(θ0, θ1, ⋯, θn) = 1/(2m) · ∑_{i=1}^{m} (hθ(x(i)) − y(i))²

We set

∂/∂θj J(θ) = 0 (for every j)

and solve for θ0, θ1, ⋯, θn. Then we get this closed-form solution:

θ = (XᵀX)⁻¹ Xᵀ y

In MATLAB:

theta = pinv(X'*X)*X'*y;
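The same one-liner in NumPy, checked on toy data where the exact answer is known:

```python
import numpy as np

def normal_equation(X, y):
    # theta = pinv(X'X) * X' * y, mirroring the MATLAB one-liner
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Toy data generated from y = 1 + x1; the normal equation should
# recover theta = [1, 1] exactly (up to floating point).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
theta = normal_equation(X, y)
```

Unlike gradient descent, there is no α to tune and no iteration: one linear solve gives θ directly.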
Compare Gradient Descent with the Normal Equation

With m training examples and n features:

Gradient Descent | Normal Equation |
---|---|
Need to choose α | No need to choose α |
Needs many iterations | No iteration needed |
Works well even when n is large | Need to compute (XᵀX)⁻¹, which is roughly O(n³), so it is slow when n is very large |
Normal Equation and Non-invertibility

What if XᵀX is non-invertible (singular / degenerate)?
- In MATLAB we use the pinv() function instead of inv(); pinv computes the pseudo-inverse and still returns a usable θ even when XᵀX is singular, so in practice this doesn't matter.

What makes XᵀX non-invertible?
- Redundant features (linearly dependent), e.g. x1 = size in feet², x2 = size in m².
- Too many features (e.g. m ≤ n).

Fix: delete some features, or use regularization.
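The pinv-vs-inv point can be demonstrated directly: with a redundant (linearly dependent) feature, XᵀX is singular, yet pinv still returns a θ that fits. The data and the 10:1 unit factor below are made up for illustration:

```python
import numpy as np

x1 = np.array([1000.0, 2000.0, 3000.0])   # a feature, e.g. size (made-up values)
x2 = x1 / 10                               # redundant copy of x1 in different units
X = np.column_stack([np.ones_like(x1), x1, x2])
y = 0.1 * x1 + 50                          # targets depend only on x1

rank = np.linalg.matrix_rank(X)            # 2 < 3 columns, so X'X is singular

# inv(X'X) would be numerically meaningless here, but pinv still works:
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
predictions = X @ theta
```

pinv spreads the weight across the dependent columns but the resulting predictions still fit the data.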