Andrew Ng Machine Learning, Chapter 5: Linear Regression with Multiple Variables

Multiple variables => Multiple features

Hypothesis

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$ (with the convention $x_0 = 1$)

Vectorize

$h_\theta(x) = \theta^T x$
$x \in R^{n+1},\ \theta \in R^{n+1}$
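
As a concrete illustration (mine, not in the original notes), a minimal NumPy sketch of the vectorized hypothesis; it assumes the design matrix $X$ already has the bias column $x_0 = 1$ prepended:

```python
import numpy as np

# Design matrix: m = 3 examples, n = 2 features, plus the bias column x_0 = 1.
X = np.array([[1.0, 2104.0, 3.0],
              [1.0, 1416.0, 2.0],
              [1.0, 1534.0, 3.0]])
theta = np.array([10.0, 0.1, 5.0])   # theta in R^{n+1}

# h_theta(x) = theta^T x, computed for all m examples at once.
h = X @ theta
print(h)   # shape (3,), one prediction per training example
```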

Cost function

The cost function remains the same as in the univariate case; the parameters are now vectorized as $\theta \in R^{n+1}$.
$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
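
A minimal sketch of this cost in NumPy (variable names are my own; $X$ is again assumed to carry the bias column):

```python
import numpy as np

def compute_cost(X, y, theta):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2, fully vectorized."""
    m = len(y)
    residuals = X @ theta - y        # h_theta(x^(i)) - y^(i) for every i
    return residuals @ residuals / (2 * m)
```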

New algorithm

Repeat: $\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$ (for $j = 0, 1, \dots, n$)
NOTE: As before, all parameters must be updated simultaneously.
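
A sketch of the update loop under the same assumptions (bias column included in $X$; names are my own). Overwriting the whole vector $\theta$ in one expression gives the simultaneous update for free, and the cost history is recorded so it can be plotted, as the Learning Rate section below suggests:

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent for multivariate linear regression."""
    m = len(y)
    J_history = []
    for _ in range(num_iters):
        residuals = X @ theta - y                        # h_theta(x) - y, shape (m,)
        theta = theta - alpha * (X.T @ residuals) / m    # simultaneous update of all theta_j
        J_history.append((residuals @ residuals) / (2 * m))
    return theta, J_history
```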

Gradient Descent in Practice

Feature Scaling

Idea

Get every feature into approximately a $-1 \leq x_i \leq 1$ range, so that gradient descent converges quickly.

Mean normalization

$x_i := \frac{x_i - \mu_i}{s_i}$
$\mu_i$: the mean of feature $i$
$s_i$: the range of feature $i$ ($\max - \min$) or its standard deviation

NOTE: Do not apply this to $x_0 = 1$!
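
A sketch of mean normalization, assuming $X$ here holds only the raw features (the bias column $x_0 = 1$ is appended afterwards, which is why the note above applies):

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature column to roughly [-1, 1] via (x_i - mu_i) / s_i."""
    mu = X.mean(axis=0)                  # mu_i: mean of feature i
    s = X.max(axis=0) - X.min(axis=0)    # s_i: range (max - min); X.std(axis=0) also works
    return (X - mu) / s, mu, s           # keep mu, s to scale new inputs the same way
```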

Learning Rate

Debugging

To make sure gradient descent is working correctly, plot $J(\theta)$ against the number of iterations:

  • To debug: $J(\theta)$ should decrease after every iteration. If it instead increases or oscillates (as in the diverging plot from the lecture), try a smaller $\alpha$.
  • To judge convergence: declare convergence when $J(\theta)$ decreases by less than $10^{-3}$ in one iteration.

Summary: $\alpha$ should be neither too small nor too large.

  • If too small: slow to converge.
  • If too large: $J(\theta)$ may not decrease on every iteration and may never converge.
  • To choose $\alpha$, try values roughly $3\times$ apart: …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
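
A hypothetical sweep over that schedule, combined with the $10^{-3}$ convergence test from above (`fit` is my own helper; `X` and `y` are assumed to be defined and already scaled):

```python
import numpy as np

def fit(X, y, alpha, tol=1e-3, max_iters=10_000):
    """Gradient descent that stops once J(theta) drops by less than tol."""
    m, theta, prev_J = len(y), np.zeros(X.shape[1]), np.inf
    for i in range(max_iters):
        residuals = X @ theta - y
        J = (residuals @ residuals) / (2 * m)
        if prev_J - J < tol:              # converged (or diverging, if J went up)
            return theta, J, i
        theta = theta - alpha * (X.T @ residuals) / m
        prev_J = J
    return theta, prev_J, max_iters

# Pick the largest alpha for which J still decreases steadily:
# for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
#     print(alpha, fit(X, y, alpha)[1:])
```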

Another solution: Normal equation

A method to solve for $\theta$ analytically.
It does not carry over to more complex learning algorithms.

Intuition

To get the minimum, set $\frac{\partial J(\theta)}{\partial \theta_j} = 0$ for every $j$, which yields
$\theta = (X^TX)^{-1}X^Ty$
Feature scaling is unnecessary.
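
A one-line sketch in NumPy; `np.linalg.pinv` (pseudo-inverse) is used instead of a plain inverse so the formula also behaves when $X^TX$ is non-invertible, which the subsection below discusses:

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, solved in closed form."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```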

Advantages and Disadvantages (compared to gradient descent)

m training examples, n features

| Gradient Descent | Normal Equation |
| --- | --- |
| Needs to choose $\alpha$ | No need to choose $\alpha$ |
| Needs many iterations | Solved in one shot; no iterations |
| Works well even when $n$ is large | Needs to compute $(X^TX)^{-1}$, which is $O(n^3)$: slow if $n$ is large |
| Prefer when $n > 10,000$ | Prefer when $n < 10,000$ |

Normal equation and non-invertibility

When is $X^TX$ non-invertible?

  • Redundant features (linearly dependent)
  • Too many features ($m \leq n$)
    => Delete some features, or use regularization (a tiny demo follows below).
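
A tiny demo of the redundant-feature case (my own construction): with $x_2 = 3x_1$ the columns of $X$ are linearly dependent, $X^TX$ is singular, and the pseudo-inverse still returns a usable $\theta$:

```python
import numpy as np

# x_2 = 3 * x_1: a redundant, linearly dependent feature.
X = np.array([[1.0, 1.0, 3.0],
              [1.0, 2.0, 6.0],
              [1.0, 3.0, 9.0]])
y = np.array([1.0, 2.0, 3.0])

print(np.linalg.matrix_rank(X.T @ X))      # 2 < 3, so (X^T X)^{-1} does not exist
print(np.linalg.pinv(X.T @ X) @ X.T @ y)   # pinv still yields a solution
```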