Machine Learning Series: Coursera Week 2 - Linear Regression with Multiple Variables

Contents

1. Multiple Features

1.1 Multiple features

1.2 Gradient descent for multiple variables

1.3 Gradient descent in practice I: Feature scaling

1.4 Gradient descent in practice II: Learning rate

1.5 Summary

1.6 Features and polynomial regression

2. Computing Parameters Analytically

2.1 Normal Equation

2.2 Normal Equation Noninvertibility


1. Multiple Features

1.1 Multiple features

Size (feet^2)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
    (x1)              (x2)                 (x3)                (x4)                 (y)
    2104                5                    1                   45                  460
    1416                3                    2                   40                  232
    1534                3                    2                   30                  315
     852                2                    1                   36                  178

Notation:

n = number of features

m = number of training examples

x^(i) = the input (features) of the ith training example

x^(i)_j = value of feature j in the ith training example

E.g. (from the table above):

x^(2) = [1416; 3; 2; 40],   x^(2)_3 = 2

Hypothesis:

h_θ(x) = θ0 + θ1·x1 + θ2·x2 + ... + θn·xn

Vectorization: define x0 = 1, so that

x = [x0; x1; ...; xn] ∈ R^(n+1),   θ = [θ0; θ1; ...; θn] ∈ R^(n+1),   and   h_θ(x) = θ^T x

This is called multivariate linear regression.
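As a minimal sketch of the vectorized hypothesis (in Python/NumPy rather than the course's Octave; the variable names are mine, not from the lecture):

    import numpy as np

    # The four training examples from the table above: size, bedrooms, floors, age.
    X = np.array([[2104, 5, 1, 45],
                  [1416, 3, 2, 40],
                  [1534, 3, 2, 30],
                  [ 852, 2, 1, 36]], dtype=float)

    # Prepend the x0 = 1 column, giving an m x (n+1) design matrix.
    X = np.c_[np.ones(X.shape[0]), X]

    theta = np.zeros(X.shape[1])   # theta = [theta0; theta1; ...; thetan]
    h = X @ theta                  # h_theta(x^(i)) = theta^T x^(i), computed for all i at once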

 

1.2 Gradient descent for multiple variables

Hypothesis: h_θ(x) = θ^T x = θ0·x0 + θ1·x1 + ... + θn·xn   (with x0 = 1)

Parameters: θ = [θ0; θ1; ...; θn], an (n+1)-dimensional vector

Cost function:

J(θ) = (1 / (2m)) · Σ_{i=1..m} (h_θ(x^(i)) - y^(i))^2

Gradient descent:

Repeat {
    θj := θj - α · (1/m) · Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) · x^(i)_j
}

simultaneously update for every j = 0, 1, ..., n
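A sketch of this update in Python/NumPy (an assumption of mine, since the course itself works in Octave); X is the m x (n+1) design matrix with the x0 = 1 column already included, and y is the m-vector of targets:

    import numpy as np

    def gradient_descent(X, y, alpha=0.01, num_iters=1500):
        # Batch gradient descent for multivariate linear regression.
        m = len(y)
        theta = np.zeros(X.shape[1])
        for _ in range(num_iters):
            error = X @ theta - y              # h_theta(x^(i)) - y^(i) for every example
            gradient = (X.T @ error) / m       # (1/m) * sum_i (error_i * x^(i)_j) for every j
            theta = theta - alpha * gradient   # simultaneous update of all theta_j
        return theta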

 

1.3 Gradient descent in practice I: Feature scaling

Feature scaling:

Idea: make sure features are on a similar scale, so that gradient descent converges faster.

E.g. x1 = size (0~2000 feet^2)

        x2 = number of bedrooms (1~5)

(Figure from Coursera Machine Learning, Week 2: Gradient descent in practice I: Feature scaling)

Feature scaling here: x1 = size / 2000 and x2 = (number of bedrooms) / 5, so both features fall roughly in 0 <= xi <= 1.

More generally, feature scaling means getting every feature into approximately a -1 <= xi <= 1 range:

x0 = 1                     fine as is

0 <= x1 <= 3               close enough, fine

-100 <= x2 <= 100          needs scaling

-0.0001 <= x3 <= 0.0001    needs scaling

As a rule of thumb, a feature whose range stays within about -3 to 3 and is no narrower than about -1/3 to 1/3 is acceptable.

Another form of scaling is mean normalization: replace xi with (xi - μi) / si, where μi is the mean of the feature and si is its range (max - min) or standard deviation.
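A sketch of mean normalization in Python/NumPy (function and variable names are illustrative, not from the course):

    import numpy as np

    def feature_normalize(X):
        # Mean-normalize every feature column of X (do not include the x0 = 1 column here).
        mu = X.mean(axis=0)        # per-feature mean
        sigma = X.std(axis=0)      # per-feature standard deviation (max - min also works)
        X_norm = (X - mu) / sigma
        return X_norm, mu, sigma   # keep mu and sigma to scale new inputs the same way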

 

1.4 Gradient descent in practice II: Learning rate

Gradient descent:

- "Debuggung": How to make sure gradient descent is working correctly

- How to choose learning rate

 

Making sure gradient descent is working correctly:

Plot J(θ) against the number of iterations.

(Figure from Coursera Machine Learning, Week 2: Gradient descent in practice II: Learning rate)

J(θ) should decrease after every iteration. The plot also shows whether gradient descent has converged yet.

Example automatic convergence test:

Declare convergence if J(θ) decreases by less than 10^(-3) in one iteration.

In practice it is better to judge from the plot, because this threshold is hard to pick.

If the plot instead shows J(θ) going up or repeatedly bouncing up and down, gradient descent is not working; use a smaller α:

(Figure from Coursera Machine Learning, Week 2: Gradient descent in practice II: Learning rate)

- For sufficiently small α, J(θ) should decrease on every iteration ------ this holds true for linear regression

- But if α is too small, gradient descent can be slow to converge
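A sketch (Python/NumPy, illustrative names) of recording J(θ) after every iteration and applying the 10^(-3) automatic convergence test from above; plotting the returned history against the iteration number gives the debugging plot described in this section:

    import numpy as np

    def compute_cost(X, y, theta):
        m = len(y)
        error = X @ theta - y
        return (error @ error) / (2 * m)          # J(theta)

    def gradient_descent_with_history(X, y, alpha, max_iters=1000, tol=1e-3):
        m = len(y)
        theta = np.zeros(X.shape[1])
        J_history = [compute_cost(X, y, theta)]
        for _ in range(max_iters):
            theta = theta - alpha * (X.T @ (X @ theta - y)) / m
            J_history.append(compute_cost(X, y, theta))
            # Automatic convergence test: stop once J(theta) drops by less than tol.
            if J_history[-2] - J_history[-1] < tol:
                break
        return theta, J_history                   # plot J_history vs. iteration number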

 

1.5 Summary

- If α is too small: slow convergence.

- If α is too large: J(θ) may not decrease on every iteration and may not converge (slow convergence is also possible).

To choose α, try a range of values spaced roughly 3× apart, e.g. ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ..., and pick the largest value for which J(θ) still decreases rapidly on every iteration (from Coursera Machine Learning, Week 2: Gradient descent in practice II: Learning rate).
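An illustrative sketch of that search, reusing the gradient_descent_with_history helper sketched in 1.4 on a tiny synthetic data set (both the helper and the data are my own assumptions, not course code):

    import numpy as np

    # Small synthetic, already-scaled data set, just to exercise the search.
    rng = np.random.default_rng(0)
    X = np.c_[np.ones(50), rng.normal(size=(50, 2))]   # design matrix with x0 = 1
    y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.1, size=50)

    # Candidate learning rates, spaced roughly 3x apart.
    for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
        theta, J_history = gradient_descent_with_history(X, y, alpha, max_iters=50)
        print(f"alpha = {alpha:<5}  J after {len(J_history) - 1} iterations = {J_history[-1]:.4f}")
    # Plot each J_history against the iteration number and keep the largest alpha
    # whose curve still decreases quickly and smoothly.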

 

1.6 Features and polynomial regression

2. Computing Parameters Analytically

2.1 Normal Equation

A method to solve for θ analytically (i.e. in closed form).

E.g. for a scalar θ ∈ R:

J(θ) = a·θ^2 + b·θ + c

Set the derivative dJ/dθ = 2aθ + b to zero, so the minimum is attained at θ = -b / (2a).

(Figure from Coursera Machine Learning, Week 2: Normal Equation)

 

Normal Equation:

θ = (X^T X)^(-1) X^T y

Note: feature scaling is not needed when using the normal equation.

For m training examples and n features, X is the m × (n+1) design matrix (one training example per row, with x0 = 1 prepended) and y is the m-dimensional vector of targets.
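A sketch of the normal equation in Python/NumPy (again an assumption of mine rather than course code); pinv is the pseudo-inverse, which also copes with the non-invertible cases discussed in 2.2:

    import numpy as np

    def normal_equation(X, y):
        # theta = pinv(X^T X) X^T y, with X the m x (n+1) design matrix (x0 = 1 included).
        return np.linalg.pinv(X.T @ X) @ X.T @ y

    # Illustrative usage on synthetic data with m = 100 examples and n = 3 features.
    rng = np.random.default_rng(0)
    X = np.c_[np.ones(100), rng.normal(size=(100, 3))]
    y = X @ np.array([4.0, 1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
    theta = normal_equation(X, y)   # recovers parameters close to [4, 1, -2, 0.5]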

 

Gradient descent:

- Need to choose α
- Needs many iterations
- Works well even when n is large (O(n^2))

Normal equation:

- No need to choose α
- No iterations needed
- Needs to compute (X^T X)^(-1), which costs O(n^3); slow if n is very large

Once n gets above roughly 10,000, start leaning toward gradient descent.

 

2.2 Normal Equation Noninvertibility

What if X^T X is non-invertible?

This happens when:

1. There are redundant features (linearly dependent features).

E.g. x1 = size in feet^2 and x2 = size in m^2; since 1 m ≈ 3.28 feet, x1 ≈ (3.28)^2 · x2 and the two columns are linearly dependent.

2. There are too many features (e.g. m <= n).

In fact, X^T X is invertible if and only if the columns of X are linearly independent.

The fix is to delete redundant features, drop some of the surplus features, or use regularization.
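A small illustrative sketch of the redundant-feature case (my own example, not course code): when one column is a multiple of another, X^T X becomes singular, a plain inverse is useless, but the pseudo-inverse, or simply dropping the duplicated column, still gives a usable θ:

    import numpy as np

    rng = np.random.default_rng(1)
    size_ft2 = rng.uniform(800, 2500, size=20)   # x1 = size in feet^2
    size_m2 = size_ft2 / (3.28 ** 2)             # x2 = size in m^2, linearly dependent on x1
    X = np.c_[np.ones(20), size_ft2, size_m2]
    y = 50 + 0.2 * size_ft2 + rng.normal(scale=5, size=20)

    A = X.T @ X
    print(np.linalg.matrix_rank(A))              # 2 < 3: A is (numerically) singular
    theta = np.linalg.pinv(A) @ X.T @ y          # the pseudo-inverse still returns a solution
    # np.linalg.inv(A) would raise LinAlgError or return numerically meaningless values here.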

That wraps up the Week 2 review. Polynomial regression (section 1.6) will be covered later in a dedicated post. For more background on the normal equation, see the MIT Linear Algebra lectures:

https://www.bilibili.com/video/av6951511/?p=16

 
