Machine Learning Series: Coursera Week 2 - Linear Regression with Multiple Variables

Contents

1. Multiple Features

1.1 Multiple features

1.2 Gradient descent for multiple variables

1.3 Gradient descent in practice I: Feature scaling

1.4 Gradient descent in practice II: Learning rate

1.5 Summary

1.6 Features and polynomial regression

2. Computing Parameters Analytically

2.1 Normal Equation

2.2 Normal Equation Noninvertibility


1. Multiple Features

1.1 Multiple features

Size (feet^2)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
    (x1)              (x2)                 (x3)                (x4)                 (y)
    2104                5                    1                   45                  460
    1416                3                    2                   40                  232
    1534                3                    2                   30                  315
     852                2                    1                   36                  178

Notation:

n = number of features

m = number of training examples

x^(i) = the input (features) of the ith training example

x^(i)_j = value of feature j in the ith training example

E.g. (from the table above):

x^(2) = [1416; 3; 2; 40],   x^(2)_3 = 2

Hypothesis:

h_θ(x) = θ0 + θ1·x1 + θ2·x2 + ... + θn·xn

Vectorization: define x0 = 1, so that

x = [x0; x1; ...; xn] ∈ R^(n+1),   θ = [θ0; θ1; ...; θn] ∈ R^(n+1),   and   h_θ(x) = θ^T x

This is called multivariate linear regression.
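As a minimal sketch of the vectorized hypothesis (in Python/NumPy rather than the course's Octave; the variable names are mine, not from the lecture):

    import numpy as np

    # The four training examples from the table above: size, bedrooms, floors, age.
    X = np.array([[2104, 5, 1, 45],
                  [1416, 3, 2, 40],
                  [1534, 3, 2, 30],
                  [ 852, 2, 1, 36]], dtype=float)

    # Prepend the x0 = 1 column, giving an m x (n+1) design matrix.
    X = np.c_[np.ones(X.shape[0]), X]

    theta = np.zeros(X.shape[1])   # theta = [theta0; theta1; ...; thetan]
    h = X @ theta                  # h_theta(x^(i)) = theta^T x^(i), computed for all i at once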

 

1.2 Gradient descent for multiple variables

Hypothesis: h_θ(x) = θ^T x = θ0·x0 + θ1·x1 + ... + θn·xn   (with x0 = 1)

Parameters: θ = [θ0; θ1; ...; θn], an (n+1)-dimensional vector

Cost function:

J(θ) = (1 / (2m)) · Σ_{i=1..m} (h_θ(x^(i)) - y^(i))^2

Gradient descent:

Repeat {
    θj := θj - α · (1/m) · Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) · x^(i)_j
}

simultaneously update for every j = 0, 1, ..., n
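A sketch of this update in Python/NumPy (an assumption of mine, since the course itself works in Octave); X is the m x (n+1) design matrix with the x0 = 1 column already included, and y is the m-vector of targets:

    import numpy as np

    def gradient_descent(X, y, alpha=0.01, num_iters=1500):
        # Batch gradient descent for multivariate linear regression.
        m = len(y)
        theta = np.zeros(X.shape[1])
        for _ in range(num_iters):
            error = X @ theta - y              # h_theta(x^(i)) - y^(i) for every example
            gradient = (X.T @ error) / m       # (1/m) * sum_i (error_i * x^(i)_j) for every j
            theta = theta - alpha * gradient   # simultaneous update of all theta_j
        return theta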

 

1.3 Gradient descent in practice I: Feature scaling

Feature scaling:

Idea: make sure features are on a similar scale, so that gradient descent converges faster.

E.g. x1 = size (0~2000 feet^2)

        x2 = number of bedrooms (1~5)

(Figure from Coursera Machine Learning, Week 2: Gradient descent in practice I: Feature scaling)

Feature scaling here: x1 = size / 2000 and x2 = (number of bedrooms) / 5, so both features fall roughly in 0 <= xi <= 1.

More generally, feature scaling means getting every feature into approximately a -1 <= xi <= 1 range:

x0 = 1                     fine as is

0 <= x1 <= 3               close enough, fine

-100 <= x2 <= 100          needs scaling

-0.0001 <= x3 <= 0.0001    needs scaling

As a rule of thumb, a feature whose range stays within about -3 to 3 and is no narrower than about -1/3 to 1/3 is acceptable.

Another form of scaling is mean normalization: replace xi with (xi - μi) / si, where μi is the mean of the feature and si is its range (max - min) or standard deviation.
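A sketch of mean normalization in Python/NumPy (function and variable names are illustrative, not from the course):

    import numpy as np

    def feature_normalize(X):
        # Mean-normalize every feature column of X (do not include the x0 = 1 column here).
        mu = X.mean(axis=0)        # per-feature mean
        sigma = X.std(axis=0)      # per-feature standard deviation (max - min also works)
        X_norm = (X - mu) / sigma
        return X_norm, mu, sigma   # keep mu and sigma to scale new inputs the same way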

 

1.4 Gradient descent in practice II: Learning rate

Gradient descent:

- "Debuggung": How to make sure gradient descent is working correctly

- How to choose learning rate

 

Making sure gradient descent is working correctly:

Plot J(θ) against the number of iterations.

(Figure from Coursera Machine Learning, Week 2: Gradient descent in practice II: Learning rate)

J(θ) should decrease after every iteration. The plot also shows whether gradient descent has converged yet.

Example automatic convergence test:

Declare convergence if J(θ) decreases by less than 10^(-3) in one iteration.

In practice it is better to judge from the plot, because this threshold is hard to pick.

If the plot instead shows J(θ) going up or repeatedly bouncing up and down, gradient descent is not working; use a smaller α:

(Figure from Coursera Machine Learning, Week 2: Gradient descent in practice II: Learning rate)

- For sufficiently small α, J(θ) should decrease on every iteration ------ this holds true for linear regression

- But if α is too small, gradient descent can be slow to converge
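A sketch (Python/NumPy, illustrative names) of recording J(θ) after every iteration and applying the 10^(-3) automatic convergence test from above; plotting the returned history against the iteration number gives the debugging plot described in this section:

    import numpy as np

    def compute_cost(X, y, theta):
        m = len(y)
        error = X @ theta - y
        return (error @ error) / (2 * m)          # J(theta)

    def gradient_descent_with_history(X, y, alpha, max_iters=1000, tol=1e-3):
        m = len(y)
        theta = np.zeros(X.shape[1])
        J_history = [compute_cost(X, y, theta)]
        for _ in range(max_iters):
            theta = theta - alpha * (X.T @ (X @ theta - y)) / m
            J_history.append(compute_cost(X, y, theta))
            # Automatic convergence test: stop once J(theta) drops by less than tol.
            if J_history[-2] - J_history[-1] < tol:
                break
        return theta, J_history                   # plot J_history vs. iteration number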

 

1.5 Summary

- If α is too small: slow convergence.

- If α is too large: J(θ) may not decrease on every iteration and may not converge (slow convergence is also possible).

To choose α, try a range of values spaced roughly 3× apart, e.g. ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ..., and pick the largest value for which J(θ) still decreases rapidly on every iteration (from Coursera Machine Learning, Week 2: Gradient descent in practice II: Learning rate).
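An illustrative sketch of that search, reusing the gradient_descent_with_history helper sketched in 1.4 on a tiny synthetic data set (both the helper and the data are my own assumptions, not course code):

    import numpy as np

    # Small synthetic, already-scaled data set, just to exercise the search.
    rng = np.random.default_rng(0)
    X = np.c_[np.ones(50), rng.normal(size=(50, 2))]   # design matrix with x0 = 1
    y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.1, size=50)

    # Candidate learning rates, spaced roughly 3x apart.
    for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
        theta, J_history = gradient_descent_with_history(X, y, alpha, max_iters=50)
        print(f"alpha = {alpha:<5}  J after {len(J_history) - 1} iterations = {J_history[-1]:.4f}")
    # Plot each J_history against the iteration number and keep the largest alpha
    # whose curve still decreases quickly and smoothly.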

 

1.6 Features and polynomial regression

2. Computing Parameters Analytically

2.1 Normal Equation

A method to solve for θ analytically (i.e. in closed form).

E.g. for a scalar θ ∈ R:

J(θ) = a·θ^2 + b·θ + c

Set the derivative dJ/dθ = 2aθ + b to zero, so the minimum is attained at θ = -b / (2a).

(Figure from Coursera Machine Learning, Week 2: Normal Equation)

 

Normal Equation:

θ = (X^T X)^(-1) X^T y

Note: feature scaling is not needed when using the normal equation.

For m training examples and n features, X is the m × (n+1) design matrix (one training example per row, with x0 = 1 prepended) and y is the m-dimensional vector of targets.
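A sketch of the normal equation in Python/NumPy (again an assumption of mine rather than course code); pinv is the pseudo-inverse, which also copes with the non-invertible cases discussed in 2.2:

    import numpy as np

    def normal_equation(X, y):
        # theta = pinv(X^T X) X^T y, with X the m x (n+1) design matrix (x0 = 1 included).
        return np.linalg.pinv(X.T @ X) @ X.T @ y

    # Illustrative usage on synthetic data with m = 100 examples and n = 3 features.
    rng = np.random.default_rng(0)
    X = np.c_[np.ones(100), rng.normal(size=(100, 3))]
    y = X @ np.array([4.0, 1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
    theta = normal_equation(X, y)   # recovers parameters close to [4, 1, -2, 0.5]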

 

Gradient descent:

- Need to choose α
- Needs many iterations
- Works well even when n is large (O(n^2))

Normal equation:

- No need to choose α
- No iterations needed
- Needs to compute (X^T X)^(-1), which costs O(n^3); slow if n is very large

Once n gets above roughly 10,000, start leaning toward gradient descent.

 

2.2 Normal Equation Noninvertibility

What if X^T X is non-invertible?

This happens when:

1. There are redundant features (linearly dependent features).

E.g. x1 = size in feet^2 and x2 = size in m^2; since 1 m ≈ 3.28 feet, x1 ≈ (3.28)^2 · x2 and the two columns are linearly dependent.

2. There are too many features (e.g. m <= n).

In fact, X^T X is invertible if and only if the columns of X are linearly independent.

The fix is to delete redundant features, drop some of the surplus features, or use regularization.
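A small illustrative sketch of the redundant-feature case (my own example, not course code): when one column is a multiple of another, X^T X becomes singular, a plain inverse is useless, but the pseudo-inverse, or simply dropping the duplicated column, still gives a usable θ:

    import numpy as np

    rng = np.random.default_rng(1)
    size_ft2 = rng.uniform(800, 2500, size=20)   # x1 = size in feet^2
    size_m2 = size_ft2 / (3.28 ** 2)             # x2 = size in m^2, linearly dependent on x1
    X = np.c_[np.ones(20), size_ft2, size_m2]
    y = 50 + 0.2 * size_ft2 + rng.normal(scale=5, size=20)

    A = X.T @ X
    print(np.linalg.matrix_rank(A))              # 2 < 3: A is (numerically) singular
    theta = np.linalg.pinv(A) @ X.T @ y          # the pseudo-inverse still returns a solution
    # np.linalg.inv(A) would raise LinAlgError or return numerically meaningless values here.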

That wraps up the Week 2 review. Polynomial regression (section 1.6) will be covered later in a dedicated post. For more background on the normal equation, see the MIT Linear Algebra lectures:

https://www.bilibili.com/video/av6951511/?p=16

 
