Week 2: Linear Regression with Multiple Variables and the Normal Equation
Linear Regression with Multiple Variables
The Hypothesis Function
Notation:
- n = number of features
- m = number of training examples
- x(i) = the input features of the i-th training example
- x(i)j = value of feature j in the i-th training example

For convenience of notation, define x0 = 1.

The multivariable hypothesis function is:

hθ(x) = θ0x0 + θ1x1 + θ2x2 + ⋯ + θnxn

Collecting the parameters and the features of one example into (n+1)-dimensional column vectors:

θ = [θ0; θ1; ⋯; θn] and x = [x0; x1; ⋯; xn]

we get hθ(x) = θᵀx.

In MATLAB, with X the m-by-(n+1) design matrix whose rows are the training examples (first column all ones), the hypothesis for all examples at once is:

h = X*theta;

Note the distinction: the X in the MATLAB code is the design matrix, but when talking about hθ(x) for a single example, x = [x0; x1; ⋯; xn] is a column vector, so hθ(x) = θᵀx.
Cost Function

J(θ) = 1/(2m) · ∑_{i=1}^{m} (hθ(x(i)) − y(i))²

Attention: J(θ) is a scalar, just a number.

In MATLAB:

J = 1/(2*m) * sum((X*theta - y).^2);
% .^ squares each element (element-wise power, not matrix power)
% sum adds up all elements of the resulting vector
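The notes use MATLAB; as a cross-check, here is an equivalent NumPy sketch of the same vectorized cost computation (the toy data below is made up for illustration):

```python
import numpy as np

def compute_cost(X, theta, y):
    # J(theta) = 1/(2m) * sum((X*theta - y).^2), vectorized
    m = len(y)
    residuals = X @ theta - y
    return (residuals @ residuals) / (2 * m)

# Toy data: X already contains the x0 = 1 column.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

J_perfect = compute_cost(X, np.array([0.0, 1.0]), y)  # this theta fits exactly, so J = 0
J_zero = compute_cost(X, np.zeros(2), y)              # all-zero theta leaves the full error
```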
Gradient Descent for Linear Regression

When specifically applied to the case of linear regression, the gradient descent update becomes:

θj := θj − (α/m) ∑_{i=1}^{m} (hθ(x(i)) − y(i)) x(i)j   (simultaneously for every j)

In MATLAB, the vectorized update is:

theta = theta - (alpha/m) * X' * (X*theta - y);
% (X*theta - y) is [m x 1] and X' is [(n+1) x m], so X' * (X*theta - y) is [(n+1) x 1]
% the matrix multiplication already performs the sum over examples, so no explicit sum is needed
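The MATLAB line above is a single update step; a full training run just repeats it. A minimal NumPy sketch, with toy data and hyperparameters chosen for illustration:

```python
import numpy as np

def gradient_descent(X, y, alpha, num_iters):
    # Repeat the vectorized update: theta -= (alpha/m) * X' * (X*theta - y)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        theta -= (alpha / m) * (X.T @ (X @ theta - y))
    return theta

# Toy data generated from y = 1 + x1, so gradient descent
# should recover theta close to [1, 1].
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
theta = gradient_descent(X, y, alpha=0.1, num_iters=5000)
```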
Feature Normalization: Feature Scaling and Mean Normalization

Idea: make sure features are on a similar scale.

- Feature scaling: get every feature into approximately a −1 ≤ xi ≤ 1 range.
- Mean normalization: replace xi with xi − μi so features have approximately zero mean (do not apply to x0 = 1).

In MATLAB:

mu = mean(X);
sigma = std(X);
X_norm = (X - repmat(mu, m, 1)) ./ repmat(sigma, m, 1);
% see 'help mean' and 'help std' for details
% repmat tiles mu and sigma into m-by-n matrices so the subtraction and division are element-wise
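In NumPy the same normalization is usually written with broadcasting, which plays the role of repmat; passing ddof=1 makes std match MATLAB's default (sample standard deviation). The feature matrix below is made up:

```python
import numpy as np

def feature_normalize(X):
    mu = X.mean(axis=0)                 # column means, like MATLAB's mean(X)
    sigma = X.std(axis=0, ddof=1)       # ddof=1 = sample std, matching MATLAB's std(X)
    X_norm = (X - mu) / sigma           # broadcasting instead of repmat
    return X_norm, mu, sigma

# Illustrative features: house size and number of bedrooms (made-up values).
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])
X_norm, mu, sigma = feature_normalize(X)
```

After normalization every column has zero mean and unit sample standard deviation.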
Learning Rate
Summary:
- If α is too small: slow convergence.
- If α is too large: J(θ) may not decrease on every iteration; may not converge.
- To choose α, try values roughly 3× apart, e.g. ⋯, 0.01, 0.03, 0.1, 0.3, 1, 3, ⋯
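The effect of α can be checked directly by recording J(θ) after each iteration: with a suitable α the cost decreases every step, while a too-large α makes it blow up. A small sketch on toy data (the specific α values here are just examples):

```python
import numpy as np

def cost_history(X, y, alpha, num_iters=50):
    # Run gradient descent, recording J(theta) after every update.
    m, n = X.shape
    theta = np.zeros(n)
    history = []
    for _ in range(num_iters):
        theta -= (alpha / m) * (X.T @ (X @ theta - y))
        r = X @ theta - y
        history.append((r @ r) / (2 * m))
    return history

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

J_good = cost_history(X, y, alpha=0.1)   # small enough: J decreases each iteration
J_bad = cost_history(X, y, alpha=0.5)    # too large for this data: J grows
```

Plotting such histories against the iteration number is the standard way to pick α.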
Features and Polynomial Regression

The hypothesis need not be linear in the original features: we can create new features from existing ones. For example, with a house's size as the single input feature, you can let:

hθ(x) = θ0 + θ1·(size) + θ2·(size)²

or:

hθ(x) = θ0 + θ1·(size) + θ2·√(size)

If features are created this way, feature scaling becomes very important, since e.g. size and size² have very different ranges.
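Creating polynomial features is just building extra columns of the design matrix. A sketch with made-up sizes, which also shows why scaling matters (the columns span very different ranges):

```python
import numpy as np

# Hypothetical house sizes (the numbers are made up for illustration).
size = np.array([1000.0, 1500.0, 2000.0])

# Design matrix for h(x) = t0 + t1*size + t2*size^2: columns 1, size, size^2.
X_poly = np.column_stack([np.ones_like(size), size, size**2])

# The columns have wildly different scales, so normalize before gradient descent.
ranges = X_poly.max(axis=0) - X_poly.min(axis=0)
```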
Normal Equation

Normal equation: a method to solve for θ analytically.

Now,

J(θ0, θ1, ⋯, θn) = 1/(2m) · ∑_{i=1}^{m} (hθ(x(i)) − y(i))²

We set

∂/∂θj J(θ) = 0 (for every j)

and solve for θ0, θ1, ⋯, θn. Then we get this closed-form solution:

θ = (XᵀX)⁻¹ Xᵀ y

In MATLAB:

theta = pinv(X'*X)*X'*y;
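The same one-liner in NumPy, checked on toy data where the exact answer is known:

```python
import numpy as np

def normal_equation(X, y):
    # theta = pinv(X'X) * X' * y, mirroring the MATLAB one-liner
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Toy data generated from y = 1 + x1; the normal equation should
# recover theta = [1, 1] exactly (up to floating point).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
theta = normal_equation(X, y)
```

Unlike gradient descent, there is no α to tune and no iteration: one linear solve gives θ directly.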
Compare Gradient Descent with the Normal Equation

With m training examples and n features:

Gradient Descent | Normal Equation |
---|---|
Need to choose α | No need to choose α |
Needs many iterations | No iteration needed |
Works well even when n is large | Need to compute (XᵀX)⁻¹, which is roughly O(n³), so it is slow when n is very large |
Normal Equation and Non-invertibility

What if XᵀX is non-invertible (singular / degenerate)?
- In MATLAB we use the pinv() function instead of inv(); pinv computes the pseudo-inverse and still returns a usable θ even when XᵀX is singular, so in practice this doesn't matter.

What makes XᵀX non-invertible?
- Redundant features (linearly dependent), e.g. x1 = size in feet², x2 = size in m².
- Too many features (e.g. m ≤ n).

Fix: delete some features, or use regularization.
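The pinv-vs-inv point can be demonstrated directly: with a redundant (linearly dependent) feature, XᵀX is singular, yet pinv still returns a θ that fits. The data and the 10:1 unit factor below are made up for illustration:

```python
import numpy as np

x1 = np.array([1000.0, 2000.0, 3000.0])   # a feature, e.g. size (made-up values)
x2 = x1 / 10                               # redundant copy of x1 in different units
X = np.column_stack([np.ones_like(x1), x1, x2])
y = 0.1 * x1 + 50                          # targets depend only on x1

rank = np.linalg.matrix_rank(X)            # 2 < 3 columns, so X'X is singular

# inv(X'X) would be numerically meaningless here, but pinv still works:
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
predictions = X @ theta
```

pinv spreads the weight across the dependent columns but the resulting predictions still fit the data.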