Andrew Ng's Machine Learning, Chapter 5: Linear Regression with Multiple Variables

菲啊菲啊菲

Article link: https://blog.csdn.net/weixin_38768647/article/details/88383297

Multiple variables => Multiple features
Gradient Descent Practice
Another solution: Normal equation

Multiple variables => Multiple features

hypothesis

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$

Vectorize

$h_\theta(x) = \theta^T x$
$x \in \mathbb{R}^{n+1}$ (with $x_0 = 1$), $\theta \in \mathbb{R}^{n+1}$
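
The vectorized hypothesis can be sketched in NumPy as follows. The toy data, the parameter values, and the function name `hypothesis` are illustrative assumptions, not part of the original notes.

```python
import numpy as np

# Hypothetical toy data: 3 examples, 2 features each.
X_raw = np.array([[2104.0, 3.0],
                  [1600.0, 3.0],
                  [2400.0, 4.0]])

# Prepend the bias feature x_0 = 1 so every row lies in R^{n+1}.
X = np.column_stack([np.ones(len(X_raw)), X_raw])

# Arbitrary parameter vector in R^{n+1}, chosen only for illustration.
theta = np.array([1.0, 0.5, 2.0])

def hypothesis(X, theta):
    """Return h_theta(x) = theta^T x for every row x of X."""
    return X @ theta

predictions = hypothesis(X, theta)
```

Stacking the examples as rows of `X` lets a single matrix-vector product evaluate the hypothesis for the whole training set at once.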

Cost function

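The cost function has the same form as in the single-variable case; the parameters are now collected into the vector $\theta$.
$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$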
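
A minimal sketch of the cost computation, assuming the usual squared-error form from the course, $J(\theta) = \frac{1}{2m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})^2$. The name `compute_cost` and the toy data are assumptions made for illustration.

```python
import numpy as np

def compute_cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum of squared residuals."""
    m = len(y)
    residuals = X @ theta - y
    return (residuals @ residuals) / (2 * m)

# Hypothetical data lying exactly on the line y = x (first column is x_0 = 1).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

cost_perfect = compute_cost(X, y, np.array([0.0, 1.0]))  # exact fit
cost_zero = compute_cost(X, y, np.zeros(2))              # all-zero parameters
```

Here `cost_perfect` is zero because the parameters reproduce the data exactly, while the all-zero parameters give a strictly positive cost.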

New algorithm

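Repeat until convergence:
$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$ for $j = 0, 1, \ldots, n$
NOTE: As before, all parameters must be updated simultaneously.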
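
Batch gradient descent for multiple features might look like the sketch below, assuming the standard update $\theta_j := \theta_j - \alpha \frac{1}{m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$. The function name, the default `alpha`, the iteration count, and the data are all illustrative assumptions.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=2000):
    """Run batch gradient descent for linear regression, starting from theta = 0."""
    m, num_params = X.shape
    theta = np.zeros(num_params)
    for _ in range(num_iters):
        # Gradient of J(theta) with respect to every theta_j at once.
        gradient = X.T @ (X @ theta - y) / m
        # Computing the whole gradient before assigning makes the
        # update of all theta_j simultaneous.
        theta = theta - alpha * gradient
    return theta

# Hypothetical data generated from y = 1 + 2x (first column is x_0 = 1).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

theta = gradient_descent(X, y)
```

The vectorized form sidesteps the classic bug of updating $\theta_0$ first and then using the new value when computing the update for $\theta_1$.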

Gradient Descent Practice

Feature Scaling

Idea

Bring every feature into approximately the range $-1 \leq x_i \leq 1$, so that gradient descent converges more quickly.

Mean normalization

$x_i := \frac{x_i - \mu_i}{s_i}$
$\mu_i$ : the mean of feature $i$
$s_i$ : the range of feature $i$ ($\max - \min$) or its standard deviation

NOTE: Do not apply to $x_0=1$ !
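
Mean normalization could be implemented along these lines. This sketch uses the standard deviation for $s_i$ (the range would work equally well); the function name and the data are assumptions.

```python
import numpy as np

def mean_normalize(X_raw):
    """Scale each feature column to zero mean and unit standard deviation."""
    mu = X_raw.mean(axis=0)   # per-feature mean mu_i
    s = X_raw.std(axis=0)     # per-feature scale s_i (standard deviation here)
    return (X_raw - mu) / s, mu, s

# Hypothetical raw features on very different scales.
X_raw = np.array([[2104.0, 3.0],
                  [1600.0, 3.0],
                  [2400.0, 4.0],
                  [1416.0, 2.0]])

X_scaled, mu, s = mean_normalize(X_raw)

# Add x_0 = 1 only after scaling, so the bias column is left untouched.
X = np.column_stack([np.ones(len(X_scaled)), X_scaled])
```

The function returns `mu` and `s` because any new example has to be scaled with the same values before a prediction is made.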

Learning Rate

Debugging

Goal: make sure gradient descent is working correctly.
Plot $J(\theta)$ against the number of iterations:

For debugging: $J(\theta)$ should decrease after every iteration.
If it does not (for example, $J(\theta)$ increases or oscillates):

Try a smaller $\alpha$.
To test for convergence: declare convergence when $J(\theta)$ decreases by less than $10^{-3}$ in one iteration.

Summary: $\alpha$ should be neither too big nor too small

If too small: slow to converge.
If too big: $J(\theta)$ may not decrease on every iteration and may not converge.
To choose $\alpha$ : try values spaced about 3x apart, e.g. …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
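
The debugging advice above can be sketched as a loop that records $J(\theta)$ for each candidate $\alpha$. It stops early if the cost increases (a sign that $\alpha$ is too large) or if the decrease falls below $10^{-3}$. The helper names, the iteration cap, and the data are assumptions.

```python
import numpy as np

def cost(X, y, theta):
    residuals = X @ theta - y
    return (residuals @ residuals) / (2 * len(y))

def run_gd(X, y, alpha, max_iters=500, tol=1e-3):
    """Gradient descent that records J(theta) after every iteration."""
    theta = np.zeros(X.shape[1])
    history = [cost(X, y, theta)]
    for _ in range(max_iters):
        theta = theta - alpha * X.T @ (X @ theta - y) / len(y)
        history.append(cost(X, y, theta))
        if history[-1] > history[-2]:
            break  # J increased: alpha is probably too large
        if history[-2] - history[-1] < tol:
            break  # J decreased by less than tol: declare convergence
    return theta, history

# Hypothetical data generated from y = 1 + 2x.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
    _, history = run_gd(X, y, alpha)
    diverging = history[-1] > history[-2]
    print(f"alpha={alpha}: {len(history) - 1} iterations, "
          f"final J={history[-1]:.4f}, diverging={diverging}")
```

In practice the recorded `history` would be plotted against the iteration number. A fixed threshold such as $10^{-3}$ can also stop a run with a very small $\alpha$ long before it reaches the minimum, which is one reason to inspect the plot rather than rely on the test alone.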

Another solution: Normal equation

A method that solves for $\theta$ analytically, in one step.
It does not carry over to more complex learning algorithms.

Intuition

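To find the minimum, set $\frac{\partial J(\theta)}{\partial \theta_j} = 0$ for every $j$ and solve for $\theta$:
$\theta = (X^TX)^{-1}X^Ty$
Here $X$ is the $m \times (n+1)$ matrix whose rows are the training examples (including $x_0 = 1$), and $y$ is the vector of target values.
Feature scaling is unnecessary with the normal equation.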
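
A small sketch of the normal equation in NumPy. Instead of forming $(X^TX)^{-1}$ explicitly, it solves the equivalent linear system $(X^TX)\theta = X^Ty$, a common numerical choice that the notes do not prescribe. The function name and the data are assumptions.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y, equivalent to theta = (X^T X)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data generated from y = 1 + 2x (first column is x_0 = 1).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

theta = normal_equation(X, y)
```

For this data the result matches the generating parameters $\theta = (1, 2)$, with no learning rate and no iterations.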

Advantage and Disadvantage (Compared to GD)

$m$ training examples, $n$ features

Gradient Descent	Normal Equation
Need to choose $\alpha$	No need to choose $\alpha$
Needs many iterations	No iterations; solved in one step
Works well even when $n$ is large	Needs to compute $(X^TX)^{-1}$, which is $O(n^3)$; slow if $n$ is large
Preferred when roughly $n > 10{,}000$	Preferred when roughly $n < 10{,}000$

Normal equation and non-invertibility

When is $X^TX$ non-invertible?

Redundant features (linearly dependent)
Too many features ($m \leq n$)
=> Delete some features, or use regularization.
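
Both remedies can be illustrated on a toy design matrix with a deliberately redundant feature. The regularized form used here, $\theta = (X^TX + \lambda L)^{-1}X^Ty$ with $L$ the identity matrix except for a zero in the bias position, is an assumption brought in from the usual regularized normal equation; the value of `lam` and the data are also illustrative.

```python
import numpy as np

# Hypothetical data: the third column is exactly 2x the second (redundant).
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 4.0],
              [1.0, 3.0, 6.0],
              [1.0, 4.0, 8.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])  # generated from y = 1 + 2 * x_1

gram = X.T @ X
rank = np.linalg.matrix_rank(gram)  # less than 3, so X^T X is singular

# Remedy 1: delete the redundant feature and solve as usual.
X_reduced = X[:, :2]
theta_reduced = np.linalg.solve(X_reduced.T @ X_reduced, X_reduced.T @ y)

# Remedy 2: regularize. L leaves the bias term theta_0 unpenalized.
lam = 1e-3
L = np.eye(X.shape[1])
L[0, 0] = 0.0
theta_reg = np.linalg.solve(gram + lam * L, X.T @ y)
```

With the redundant column removed the system has a unique solution again. With regularization the matrix $X^TX + \lambda L$ is invertible even though $X^TX$ is not.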
