Machine Learning (Wu Enda / Andrew Ng), Part 3

chapter 27 Multiple features

This chapter starts to talk about a new version of linear regression, a more powerful one that works with multiple variables, or multiple features.

Notation:

$n$ = number of features

$m$ = number of training examples

$x^{(i)}$ = input (features) of the $i$th training example

$x^{(i)}_j$ = value of feature $j$ in the $i$th training example

Hypothesis:

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$

For convenience of notation, define $x_0 = 1$. Then

$x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^{n+1} \qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}$

so

$h_\theta(x) = \theta^T x$

Multivariate linear regression.
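As a quick illustration, here is a minimal NumPy sketch of the vectorized hypothesis $h_\theta(x) = \theta^T x$; the parameter values and the housing features are made up for the example.

```python
import numpy as np

# Minimal sketch of the vectorized hypothesis h_theta(x) = theta^T x.
# The parameter and feature values are made up for illustration.

def hypothesis(theta, x):
    """h_theta(x) = theta^T x for one example x, where x[0] = 1."""
    return theta @ x

theta = np.array([340.0, 0.1, 50.0])  # [theta_0, theta_1, theta_2], arbitrary
x = np.array([1.0, 2104.0, 3.0])      # [x_0 = 1, size, number of bedrooms]
print(hypothesis(theta, x))           # 340 + 0.1*2104 + 50*3 = 700.4
```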

chapter 28 Gradient descent for multiple variables

This chapter covers how to fit the parameters of that hypothesis, i.e. how to use gradient descent for linear regression with multiple features.

Hypothesis: $h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$

Parameters: $\theta$, an $(n+1)$-dimensional vector.

Cost function:

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
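A vectorized version of this cost is a one-liner; the sketch below assumes $X$ is the $(m, n+1)$ design matrix whose first column is all ones and $y$ is the vector of targets.

```python
import numpy as np

# Vectorized sketch of the cost J(theta) above. X is the (m, n+1) design
# matrix (first column all ones); y is the length-m vector of targets.

def compute_cost(X, y, theta):
    error = X @ theta - y                  # h_theta(x^(i)) - y^(i) for all i
    return (error @ error) / (2 * len(y))  # (1/2m) * sum of squared errors
```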

Gradient descent:

Repeat {

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$

} (simultaneously update for every $j = 0, \dots, n$)

New algorithm ($n \ge 1$):

Repeat {

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

} (simultaneously update $\theta_j$ for every $j = 0, \dots, n$)

For example:

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$

$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_1^{(i)}$
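Here is a minimal sketch of this update in NumPy, under the same design-matrix assumptions as above; the defaults for $\alpha$ and the iteration count are hypothetical choices, not values from the notes.

```python
import numpy as np

# Sketch of batch gradient descent for multivariate linear regression.
# X is the (m, n+1) design matrix with a leading column of ones.

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        error = X @ theta - y             # h_theta(x^(i)) - y^(i), shape (m,)
        gradient = (X.T @ error) / m      # (1/m) * sum_i error_i * x_j^(i)
        theta = theta - alpha * gradient  # updates every theta_j at once
    return theta
```

Note that the vectorized update makes the simultaneous update automatic: every component of the new $\theta$ is computed from the old $\theta$ before any component is overwritten.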

chapter 29 Gradient descent in practice 1: feature scaling

Practical tricks for making gradient descent work well.

Feature Scaling:

Idea: make sure features are on a similar scale.

Get every feature into approximately a $-1 \le x_i \le 1$ range.

Mean normalization:

Replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean (do not apply to $x_0 = 1$):

$x_1 \leftarrow \dfrac{x_1 - \mu_1}{s_1}$

where $\mu_1$ is the average value of $x_1$ in the training set, and $s_1$ is the range of values of that feature (max minus min) or its standard deviation.
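A minimal sketch of this transform, assuming $X$ holds the raw features only (no $x_0 = 1$ column, since scaling must not be applied to $x_0$) and using the standard deviation as $s_j$:

```python
import numpy as np

# Sketch of mean normalization / feature scaling. X holds raw features
# only; the x_0 = 1 column is added after scaling, not before.

def mean_normalize(X):
    mu = X.mean(axis=0)      # per-feature average mu_j
    sigma = X.std(axis=0)    # per-feature standard deviation s_j
    return (X - mu) / sigma, mu, sigma
```

Returning mu and sigma matters: the same shift and scale must be reused on any new input at prediction time.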

chapter 30 Gradient descent in practice 2: learning rate

This chapter is about the learning rate $\alpha$:

• "Debugging": how to make sure gradient descent is working correctly.
• How to choose the learning rate $\alpha$.

One automatic test: declare convergence if $J(\theta)$ decreases by less than $10^{-3}$ in one iteration.

But choosing this threshold is pretty difficult, so to check that gradient descent has converged it is usually better to look at a plot of $J(\theta)$ against the number of iterations.

• For sufficiently small $\alpha$, $J(\theta)$ should decrease on every iteration.
• But if $\alpha$ is too small, gradient descent can be slow to converge.
• If $\alpha$ is too large, $J(\theta)$ may not decrease on every iteration, and may not converge at all.
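The plot-based check is easy to automate by recording the cost each iteration, as in this sketch (same design-matrix assumptions as before). A common heuristic from the lectures is to try $\alpha$ values roughly three times apart, e.g. 0.001, 0.003, 0.01, 0.03, 0.1, and pick the largest one that still makes $J(\theta)$ fall steadily.

```python
import numpy as np

# Sketch of the "plot J(theta) per iteration" debugging trick: run the
# usual gradient-descent update, but record the cost after every step.

def gradient_descent_with_history(X, y, alpha=0.01, num_iters=400):
    m = len(y)
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
        error = X @ theta - y
        history.append((error @ error) / (2 * m))  # J should fall each step
    return theta, history  # plot history vs. iteration number to judge alpha
```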

chapter 31 Features and polynomial regression

This chapter is about the choice of features, and how with the right features you can get a different and more powerful learning algorithm, e.g. polynomial regression such as $h_\theta(x) = \theta_0 + \theta_1(\text{size}) + \theta_2(\text{size})^2 + \theta_3(\text{size})^3$.

If you are using gradient descent, it is important to apply feature scaling to get these features into comparable ranges of values: if size runs up to $10^3$, then size$^2$ runs up to $10^6$ and size$^3$ up to $10^9$.

You have broad choices in the features you use; see the sketch below.
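A minimal sketch of building polynomial features from a single input, with made-up sizes; note how far apart the column ranges are before scaling, which is exactly why scaling matters here.

```python
import numpy as np

# Sketch of polynomial features from a single input (house size).
# The sizes are made-up numbers.

size = np.array([104.0, 150.0, 210.0, 315.0])
X_poly = np.column_stack([size, size**2, size**3])  # ranges ~1e2, ~1e4, ~1e7

# Scale each column so gradient descent sees comparable ranges.
X_poly = (X_poly - X_poly.mean(axis=0)) / X_poly.std(axis=0)
```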

chapter 32 Normal equation

For some linear regression problems, the normal equation gives us a much better way to solve for the optimal value of the parameters $\theta$.

Normal equation: a method to solve for $\theta$ analytically,

$\theta = (X^T X)^{-1} X^T y$

Derivation of the normal equation: https://zhuanlan.zhihu.com/p/22474562

Matrix derivatives: https://blog.csdn.net/nomadlx53/article/details/50849941

Advantages and disadvantages of gradient descent vs. the normal equation: gradient descent requires choosing $\alpha$ and needs many iterations, but works well even when $n$ is large; the normal equation needs no $\alpha$ and no iterations, but computing $(X^T X)^{-1}$ costs roughly $O(n^3)$, so it becomes slow when $n$ is very large.

The normal equation method also does not work for more sophisticated learning algorithms, where we have to resort to gradient descent.
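A one-line sketch of the normal equation in NumPy; using pinv (the pseudo-inverse) rather than a plain inverse means a result is still produced even when $X^T X$ is non-invertible, which is the subject of the next chapter.

```python
import numpy as np

# Sketch of the normal equation theta = (X^T X)^{-1} X^T y, where X is
# the (m, n+1) design matrix with a leading column of ones.

def normal_equation(X, y):
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```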

chapter 33 Normal equation and non-invertibility (optional)

What if $X^T X$ is non-invertible? The common causes:

• Redundant features (linearly dependent), e.g. $x_1$ = size in feet$^2$ and $x_2$ = size in m$^2$: since 1 m = 3.28 ft, $x_1 = (3.28)^2\, x_2$, and the two columns are linearly dependent.

• Too many features (e.g. $m \le n$).

Fix: delete some features, or use regularization.
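The redundant-feature case is easy to see numerically; in this sketch (with made-up sizes) the ft$^2$ column is an exact multiple of the m$^2$ column, so $X^T X$ loses rank.

```python
import numpy as np

# Sketch of the redundant-feature case: ft^2 is an exact multiple of m^2,
# so the columns of X are linearly dependent and X^T X is singular.

m2 = np.array([100.0, 150.0, 200.0, 250.0])   # made-up sizes in m^2
ft2 = (3.28 ** 2) * m2                        # 1 m = 3.28 ft
X = np.column_stack([np.ones_like(m2), m2, ft2])

print(np.linalg.matrix_rank(X.T @ X))         # 2, not 3: non-invertible
```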
