Linear Regression with Multiple Variables
Multiple Features
Notation
$n$ = number of features.
$x^{(i)}$ = input (features) of the $i$-th training example.
$x^{(i)}_j$ = value of feature $j$ in the $i$-th training example.
Hypothesis
Previously: $h_\theta(x) = \theta_0 + \theta_1 x$

Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$

For convenience of notation, define $x_0 = 1$ (i.e. $x^{(i)}_0 = 1$), so that

$$x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^{n+1}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix}$$

$$h_\theta(x) = \begin{bmatrix} \theta_0 & \theta_1 & \theta_2 & \dots & \theta_n \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \theta^T x$$

So the hypothesis can be written:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^T x$$
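The vector form $h_\theta(x) = \theta^T x$ is just a dot product. A minimal NumPy sketch (the values of `theta` and `x` are made up for illustration):

```python
import numpy as np

# theta and x are (n+1)-vectors; x[0] = 1 is the added bias term x_0
theta = np.array([1.0, 2.0, 3.0])  # theta_0, theta_1, theta_2
x = np.array([1.0, 4.0, 5.0])      # x_0 = 1, x_1 = 4, x_2 = 5

# h_theta(x) = theta^T x
h = theta @ x
print(h)  # 1*1 + 2*4 + 3*5 = 24.0
```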
Multivariate Linear Regression
Gradient Descent for Multiple Variables
Find the parameters that minimize the cost function:

repeat until convergence: {
$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)}_0$
$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)}_1$
$\theta_2 := \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)}_2$
$\dots$
}

In short:

repeat until convergence: {
$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)}_j \quad \text{for } j := 0 \dots n$
}

(All $\theta_j$ are updated simultaneously.)
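The update rule above vectorizes naturally: with a design matrix $X$ whose rows are the $x^{(i)}$ (first column all ones), every $\theta_j$ is updated at once via $X^T(X\theta - y)$. A minimal sketch under those assumptions (function name and sample data are made up):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for multivariate linear regression.

    X: (m, n+1) design matrix whose first column is all ones (x_0 = 1).
    y: (m,) vector of targets.
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(num_iters):
        errors = X @ theta - y               # h_theta(x^(i)) - y^(i), all i at once
        theta -= alpha * (X.T @ errors) / m  # simultaneous update of every theta_j
    return theta

# Tiny worked example: data generated from y = 1 + 2*x1
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y, alpha=0.1, num_iters=5000)
print(theta)  # close to [1.0, 2.0]
```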
Gradient Descent in Practice: Feature Scaling
In general, when the features are on similar scales, gradient descent takes a more direct path to the optimum and converges faster.
Feature scaling, or mean normalization:

$$x_i := \frac{x_i - \mu_i}{s_i}$$

where $\mu_i$ is the average value of feature $i$ and $s_i$ is its range (max − min).
For example, if $x_i$ is the house price, with prices ranging from 100 to 2000 and an average of 1000, the input is re-scaled as:

$$x_i := \frac{\text{price} - 1000}{1900}$$
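Mean normalization is a one-liner in NumPy. A sketch with made-up price data (not the exact figures from the example above):

```python
import numpy as np

# Mean normalization: x_i := (x_i - mu_i) / s_i, with s_i = max - min.
# 'prices' is illustrative sample data.
prices = np.array([100.0, 500.0, 1000.0, 1500.0, 2000.0])

mu = prices.mean()               # average value of the feature
s = prices.max() - prices.min()  # range (max - min)
scaled = (prices - mu) / s

print(scaled)  # roughly centered on 0, spanning a range of 1
```

After scaling, every feature lives in a window of width 1 around 0, so no single feature dominates the gradient.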
Gradient Descent in Practice: Learning Rate
Goal: make gradient descent

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

work well, which raises two questions:
"Debugging": how to make sure gradient descent is working correctly.
How to choose the learning rate $\alpha$.
In practice, declare convergence if $J(\theta)$ decreases by less than $10^{-3}$ in a single iteration.

Behavior for different values of $\alpha$:
Summary:
- If $\alpha$ is too small: convergence is slow.
- If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration, and gradient descent may not converge.
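The debugging recipe above can be sketched in code: track $J(\theta)$ after each iteration and stop when the per-iteration decrease drops below the threshold, then compare a few candidate values of $\alpha$. The function names, sample data, and candidate values here are all assumptions for illustration:

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = (1 / 2m) * sum((h_theta(x^(i)) - y^(i))^2)."""
    m = len(y)
    errors = X @ theta - y
    return (errors @ errors) / (2 * m)

def run_until_converged(X, y, alpha, tol=1e-3, max_iters=10000):
    """Stop when J(theta) decreases by less than tol in one iteration."""
    m, n = X.shape
    theta = np.zeros(n)
    prev_j = cost(X, y, theta)
    for it in range(1, max_iters + 1):
        theta -= alpha * (X.T @ (X @ theta - y)) / m
        j = cost(X, y, theta)
        if prev_j - j < tol:  # progress too small -> declare convergence
            return theta, it
        prev_j = j
    return theta, max_iters

# Illustrative data from y = 1 + 2*x1
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Try a few learning rates and compare iteration counts.
for alpha in (0.001, 0.01, 0.1):
    theta, iters = run_until_converged(X, y, alpha)
    print(alpha, iters, theta)
```

Plotting $J(\theta)$ against the iteration number makes the same comparison visual: a good $\alpha$ gives a curve that falls steadily, a too-large one gives a curve that oscillates or grows.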