[ML of Andrew Ng] Week 2: Linear Regression with Multiple Variables and Normal Equation


Linear Regression with Multiple Variables

The Hypothesis Function

h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n

Notation:
For convenience of notation, define x_0 = 1
m = number of training examples
n = number of features
x^{(i)} = input (features) of the i-th training example
x_j^{(i)} = value of feature j in the i-th training example

We can write the parameter vector \theta and the design matrix X as:

\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}
[(n+1) \times 1]

and:

X = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_n^{(1)} \\ 1 & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & & \vdots \\ 1 & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix}
[m \times (n+1)]

So we get:
H = X\theta
[m \times (n+1)] \times [(n+1) \times 1] = [m \times 1]

which looks like:
H = \begin{bmatrix} h_\theta(x^{(1)}) \\ h_\theta(x^{(2)}) \\ \vdots \\ h_\theta(x^{(m)}) \end{bmatrix}
[m \times 1]

In MATLAB:

h = X*theta;

But for a single training example, writing the feature vector as

x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}

the hypothesis is simply h_\theta(x) = \theta^T x
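
As a quick sanity check of the single-example form, here is a minimal sketch (the feature values and parameters are made up; x already contains x_0 = 1):

x = [1; 2104; 5; 1; 45];            % [(n+1) x 1] feature vector with x0 = 1 prepended (made-up values)
theta = [0.1; 0.2; 0.3; 0.4; 0.5];  % [(n+1) x 1] parameter vector (made-up values)
h = theta' * x;                     % scalar hypothesis h_theta(x) = theta' * x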

Cost Function

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

Note: J(\theta) is a scalar, just a single number.
In MATLAB:

J = 1/(2*m) * sum((X*theta - y).^2);
% .^ is the element-wise power operator, so every residual is squared
% sum then adds up all the squared residuals in the resulting vector
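
An equivalent, fully vectorized way to get the same J (a sketch, assuming X, y, and theta have the dimensions above) replaces sum with an inner product:

err = X*theta - y;          % [m x 1] vector of residuals
J = (err' * err) / (2*m);   % same value as the sum form above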

Gradient Descent for Linear Regression

\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)

When specifically applied to the case of linear regression, a new form of the gradient descent equation can be derived.

\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

In MATLAB:

theta = theta - (alpha/m) * X' * (X*theta - y);
% (X*theta - y) is [m x 1] and X' is [(n+1) x m], so X' * (X*theta - y) is [(n+1) x 1]
% the matrix multiplication already sums over the m examples, so no explicit sum is needed
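
Wrapped in a loop it looks roughly like this (a sketch; alpha and num_iters are values you pick yourself, and J_history is just a hypothetical name used to record the cost for convergence checks):

J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta - (alpha/m) * X' * (X*theta - y);      % simultaneous update of every theta_j
    J_history(iter) = 1/(2*m) * sum((X*theta - y).^2);   % record J(theta) after this step
end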

Feature Normalization: Feature Scaling and Mean Normalization

Idea: Make sure features are on a similar scale.

  • Feature Scaling
    Get every feature into approximately a -1 \le x_i \le 1 range.
  • Mean normalization
    Replace x_i with x_i - \mu_i to make features have approximately zero mean (do not apply to x_0 = 1).

In MATLAB:

mu = mean(X);
sigma = std(X);
X_norm = (X - repmat(mu,m,1)) ./ repmat(sigma,m,1);
% see 'help mean' and 'help std' for details
% repmat replicates mu and sigma into m-by-n matrices so the element-wise subtraction and division line up
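
Remember to reuse the same mu and sigma when predicting on new data; a sketch (x_new is a hypothetical 1-by-n row of raw feature values):

x_new_norm = (x_new - mu) ./ sigma;   % normalize with the training-set mu and sigma
price = [1, x_new_norm] * theta;      % prepend x0 = 1, then predict with the learned theta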

Learning Rate

Summary:

  • If α is too small: slow convergence.
  • If α is too large: J(θ) may not decrease on every iteration; may not converge.
  • To choose α, try values like 0.01, 0.03, 0.1, 0.3, 1, 3, ... (see the sketch below).
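
To see whether a given α converges, plot the recorded cost against the iteration number; a sketch reusing the hypothetical J_history vector from the gradient-descent loop above:

plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J(\theta)');   % the curve should decrease on every iteration if alpha is well chosen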

Features and Polynomial Regression

For example:

h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3

You can let:
x_1 = (size), \quad x_2 = (size)^2, \quad x_3 = (size)^3

or:
x_1 = (size), \quad x_2 = \sqrt{(size)}, and set x_3 = 0 (drop the cubic term)
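
A sketch of building such polynomial features in MATLAB (size_col is a hypothetical m-by-1 vector of raw sizes; scaling matters here because the powers grow very quickly):

X_poly = [size_col, size_col.^2, size_col.^3];                 % x1, x2, x3
mu_p = mean(X_poly);  sigma_p = std(X_poly);
X_poly = (X_poly - repmat(mu_p,m,1)) ./ repmat(sigma_p,m,1);   % feature scaling is essential here
X_poly = [ones(m,1), X_poly];                                  % finally add the x0 = 1 column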


Normal Equation

Normal equation: Method to solve for θ analytically.

Now, J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
Set \frac{\partial}{\partial \theta_j} J(\theta) = 0 (for every j)
Solve for \theta_0, \theta_1, \ldots, \theta_n

Then, we get this:

\theta = (X^T X)^{-1} X^T y

In MATLAB:

theta = pinv(X'*X)*X'*y;
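
The only preparation needed is the column of ones; a sketch (data is a hypothetical m-by-n matrix of raw feature values, and no feature scaling is required for the normal equation):

X = [ones(m,1), data];         % prepend the intercept column x0 = 1
theta = pinv(X'*X) * X' * y;   % closed-form solution: no alpha, no iterations, no scaling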

Comparing Gradient Descent with the Normal Equation

m training examples, n features.

Gradient Descent | Normal Equation
--- | ---
Need to choose α | No need to choose α
Needs many iterations | No need to iterate
Works well even when n is large | Need to compute (X^T X)^{-1}; slow if n is very large

Normal equation and non-invertibility

What if X^T X is non-invertible? (singular / degenerate)

  • In MATLAB we use the pinv() function instead of inv(), so in practice this does not matter.

What should we do if X^T X is non-invertible?

  • Redundant features (linearly dependent).
    E.g. x_1 = size in feet^2, x_2 = size in m^2 (see the sketch after this list).
  • Too many features (e.g. m \le n).
    Delete some features, or use regularization.
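
As a tiny illustration of the redundant-feature case (all numbers are made up): a duplicated column makes X'X singular, so inv() would complain, but pinv() still returns a usable minimum-norm solution:

X = [1 2 2; 1 3 3; 1 4 4];     % third column duplicates the second, so X'*X is singular
y = [1; 2; 3];
theta = pinv(X'*X) * X' * y;   % works; inv(X'*X) would warn that the matrix is singular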