Andrew Ng · Machine Learning || Chap 10 Advice for Applying Machine Learning — Notes

This post covers problems that come up when applying machine learning, such as large prediction errors and overfitting, and strategies for improving model performance: getting more training examples, adjusting the feature set, and tuning the regularization parameter. It also introduces learning curves and cross-validation as tools for diagnosing bias and variance, shows how to choose a suitable regularization parameter from them, and discusses how the amount of training data affects high-bias and high-variance problems.

10 Advice for applying machine learning

10-1 Deciding what to try next

Debugging a learning algorithm

Suppose you have implemented regularized linear regression to predict housing prices.

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$
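As a concrete reference, this regularized cost can be sketched in NumPy (the course itself uses Octave; the function name here is just illustrative, and by the usual convention $\theta_0$ is not penalized):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X is the (m, n+1) design matrix whose first column is all ones,
    y the (m,) target vector, lam the regularization strength.
    By convention theta_0 is not penalized.
    """
    m = len(y)
    residuals = X @ theta - y                  # h_theta(x^(i)) - y^(i)
    penalty = lam * np.sum(theta[1:] ** 2)     # skip theta_0
    return (np.sum(residuals ** 2) + penalty) / (2 * m)

# Tiny check: y = 1 + 2x is fit exactly, so the unregularized cost is 0
# and the regularized cost is purely the penalty term.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = np.array([1.0, 2.0])
print(regularized_cost(theta, X, y, lam=0.0))  # 0.0
print(regularized_cost(theta, X, y, lam=1.0))  # 2^2 / (2*3) = 0.666...
```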

However, when you test your hypothesis on a new set of houses, you find that it makes unacceptably large errors in its predictions. What should you try next?

  • Get more training examples
  • Try a smaller set of features
  • Try getting additional features
  • Try adding polynomial features
  • Try decreasing $\lambda$
  • Try increasing $\lambda$

Machine learning diagnostic:

Diagnostic: a test that you can run to gain insight into what is/isn't working with a learning algorithm, and to gain guidance as to how best to improve its performance.

Diagnostics can take time to implement, but doing so can be a very good use of your time.

10-2 Evaluating a hypothesis

Evaluating your hypothesis

A hypothesis that overfits (low training error) may fail to generalize to new examples not in the training set.

Training/testing procedure for linear regression

  • Learn parameter $\theta$ from the training data (minimizing the training error $J(\theta)$)
  • Compute the test set error: $J_{test}(\theta)$
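A minimal sketch of this train/test procedure in NumPy (the synthetic data and helper names are made up for illustration):

```python
import numpy as np

def fit_linear(X, y):
    # Ordinary least squares via lstsq (no regularization here).
    return np.linalg.lstsq(X, y, rcond=None)[0]

def j_error(theta, X, y):
    # Squared-error cost J(theta) = (1/2m) * sum (h(x) - y)^2.
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

# Synthetic housing-style data: y = 3x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3 * x + 1 + rng.normal(0, 0.5, 100)
X = np.column_stack([np.ones_like(x), x])

# Randomly split 70/30 into training and test sets.
idx = rng.permutation(100)
tr, te = idx[:70], idx[70:]

theta = fit_linear(X[tr], y[tr])
train_err = j_error(theta, X[tr], y[tr])
test_err = j_error(theta, X[te], y[te])
print("J_train:", train_err)
print("J_test :", test_err)
```

Shuffling before splitting matters: if the data arrived sorted, a contiguous split would give training and test sets with different distributions.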

Classification problem:

  • Learn parameter $\theta$ from the training data
  • Compute the test set error:

$$J_{test}(\theta) = -\frac{1}{m_{test}}\sum_{i=1}^{m_{test}}\left[y_{test}^{(i)}\log h_\theta(x_{test}^{(i)}) + \left(1 - y_{test}^{(i)}\right)\log\left(1 - h_\theta(x_{test}^{(i)})\right)\right]$$

  • Misclassification error (0/1 misclassification error):

(figure: definition of the 0/1 misclassification error and the corresponding test error)
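The 0/1 misclassification error can be sketched as follows (NumPy; the function name and the 0.5 threshold on the hypothesis output are the usual conventions):

```python
import numpy as np

def misclassification_error(h, y):
    """0/1 misclassification error on a test set.

    h holds the hypothesis outputs h_theta(x) (probabilities),
    y the true 0/1 labels; predictions are thresholded at 0.5.
    """
    predictions = (h >= 0.5).astype(int)
    return np.mean(predictions != y)

h = np.array([0.9, 0.2, 0.6, 0.4])  # hypothetical test-set outputs
y = np.array([1, 0, 0, 1])          # true labels
print(misclassification_error(h, y))  # 2 of 4 wrong -> 0.5
```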

10-3 Model selection and training/validation/test sets

Overfitting example

Once parameters $\theta_0,\theta_1,\cdots,\theta_n$ were fit to some set of data (the training set), the error of the parameters as measured on that data (the training error $J(\theta)$) is likely to be lower than the actual generalization error.

Model selection

Parameter $d$ = degree of polynomial

How well does the model generalize? Report the test set error $J_{test}(\theta^{(d)})$.

Problem: $J_{test}(\theta^{(d)})$ is likely to be an optimistic estimate of the generalization error, i.e. our extra parameter ($d$ = degree of polynomial) is fit to the test set.

Evaluating your hypothesis

  • Training set: 60%
  • Cross-validation set: 20%
  • Test set: 20%

Training/validation/test error

Training error:

$$J_{train}(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

Cross validation error:

$$J_{cv}(\theta) = \frac{1}{2m_{cv}}\sum_{i=1}^{m_{cv}}\left(h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)}\right)^2$$

Test error:

$$J_{test}(\theta) = \frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}\left(h_\theta(x_{test}^{(i)}) - y_{test}^{(i)}\right)^2$$

Pick the model with the lowest cross-validation error (say $d=4$), then estimate the generalization error with the test set: $J_{test}(\theta^{(4)})$.
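The whole model-selection procedure can be sketched as follows (NumPy; the synthetic quadratic data and helper names are illustrative assumptions):

```python
import numpy as np

def poly_design(x, d):
    # Design matrix with columns [1, x, x^2, ..., x^d].
    return np.vander(x, d + 1, increasing=True)

def fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def j(theta, X, y):
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

# Data from a quadratic, so degree 2 should generalize best.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 200)
y = 1 + 2 * x - x ** 2 + rng.normal(0, 0.3, 200)

# 60/20/20 split into training / cross-validation / test sets.
idx = rng.permutation(200)
tr, cv, te = idx[:120], idx[120:160], idx[160:]

degrees = list(range(1, 9))
cv_errors = []
for d in degrees:
    theta_d = fit(poly_design(x[tr], d), y[tr])
    cv_errors.append(j(theta_d, poly_design(x[cv], d), y[cv]))

# Select d by cross-validation error, then report the test error.
best_d = degrees[int(np.argmin(cv_errors))]
theta = fit(poly_design(x[tr], best_d), y[tr])
test_error = j(theta, poly_design(x[te], best_d), y[te])
print("best degree:", best_d, " J_test:", test_error)
```

Because $d$ was chosen using the cross-validation set only, the test error reported at the end remains an unbiased estimate of generalization error.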

10-4 Diagnosing bias vs. variance

Bias/variance

Training error: $J_{train}(\theta)$

Cross-validation error: $J_{cv}(\theta)$

As the degree $d$ of the polynomial increases, $J_{train}(\theta)$ keeps decreasing, while $J_{cv}(\theta)$ first decreases and then increases.

Diagnosing bias vs. variance

Suppose your learning algorithm is performing less well than you were hoping ($J_{cv}(\theta)$ or $J_{test}(\theta)$ is high). Is it a bias problem or a variance problem?

Bias (underfit): $J_{train}(\theta)$ will be high; $J_{cv}(\theta) \approx J_{train}(\theta)$

Variance (overfit): $J_{train}(\theta)$ will be low; $J_{cv}(\theta) \gg J_{train}(\theta)$
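The two signatures can be demonstrated numerically (NumPy sketch; the quadratic ground truth and the degrees 1 and 12 are illustrative choices, not from the lecture):

```python
import numpy as np

def poly_design(x, d):
    return np.vander(x, d + 1, increasing=True)

def fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def j(theta, X, y):
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

# Quadratic ground truth; a line underfits it, degree 12 overfits it.
rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 30)                      # small training set
y = 1 + 2 * x - x ** 2 + rng.normal(0, 0.3, 30)
x_cv = rng.uniform(-2, 2, 100)
y_cv = 1 + 2 * x_cv - x_cv ** 2 + rng.normal(0, 0.3, 100)

theta1 = fit(poly_design(x, 1), y)              # high bias
jt1 = j(theta1, poly_design(x, 1), y)
jc1 = j(theta1, poly_design(x_cv, 1), y_cv)

theta12 = fit(poly_design(x, 12), y)            # high variance
jt12 = j(theta12, poly_design(x, 12), y)
jc12 = j(theta12, poly_design(x_cv, 12), y_cv)

print(f"d=1  (high bias):     J_train={jt1:.3f}  J_cv={jc1:.3f}")
print(f"d=12 (high variance): J_train={jt12:.3f}  J_cv={jc12:.3f}")
```

The high-bias row shows both errors high and close together; the high-variance row shows a low training error with a much larger cross-validation error.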

10-5 Regularization and bias/variance

Linear regression with regularization

Model:

$$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$$

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$

  • Large $\lambda$: high bias (underfit)
  • Intermediate $\lambda$: just right
  • Small $\lambda$: high variance (overfit)

Choosing the regularization parameter $\lambda$

Try a range of values of $\lambda$, fit $\theta^{(i)}$ on the training set for each, and pick the $\lambda$ whose model gives the smallest cross-validation error $J_{cv}(\theta^{(i)})$.

Bias/variance as a function of the regularization parameter $\lambda$: as $\lambda$ grows, $J_{train}(\theta)$ increases, while $J_{cv}(\theta)$ first decreases and then increases.
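This λ sweep can be sketched as follows (NumPy; the regularized normal equation is standard for linear regression, while the doubling grid of λ values, the degree-8 model, and the synthetic data are illustrative assumptions):

```python
import numpy as np

def poly_design(x, d):
    return np.vander(x, d + 1, increasing=True)

def fit_regularized(X, y, lam):
    # Regularized normal equation (X'X + lam*L) theta = X'y,
    # where L is the identity with L[0,0] = 0 so theta_0 is unpenalized.
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def j(theta, X, y):
    # Unregularized squared error, used for J_train and J_cv.
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, 40)
y = 1 + 2 * x - x ** 2 + rng.normal(0, 0.3, 40)
x_cv = rng.uniform(-2, 2, 100)
y_cv = 1 + 2 * x_cv - x_cv ** 2 + rng.normal(0, 0.3, 100)

# Fix a flexible model (degree 8) and sweep lambda on a doubling grid.
d = 8
lambdas = [0.0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
cv_errors = [j(fit_regularized(poly_design(x, d), y, lam),
              poly_design(x_cv, d), y_cv) for lam in lambdas]
best_lam = lambdas[int(np.argmin(cv_errors))]
print("chosen lambda:", best_lam)
```

Note that $J_{cv}$ is evaluated without the penalty term: λ shapes the fit, but model quality is always judged by the plain squared error.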

10-6 Learning curves

Learning curves

Plot $J_{train}(\theta)$ and $J_{cv}(\theta)$ as functions of the training set size $m$.

High bias: as $m$ grows, $J_{train}(\theta)$ and $J_{cv}(\theta)$ get closer and closer, both plateauing at a high error.

If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much

High variance: there is a large gap between $J_{train}(\theta)$ (low) and $J_{cv}(\theta)$ (high).

If a learning algorithm is suffering from high variance, getting more training data is likely to help
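A learning-curve sketch for the high-bias case (NumPy; fitting a straight line to quadratic data is an illustrative setup, not the lecture's example):

```python
import numpy as np

def poly_design(x, d):
    return np.vander(x, d + 1, increasing=True)

def fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def j(theta, X, y):
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

# Quadratic data fit with a straight line: a deliberately high-bias model.
rng = np.random.default_rng(4)
x = rng.uniform(-2, 2, 200)
y = 1 + 2 * x - x ** 2 + rng.normal(0, 0.3, 200)
x_tr, y_tr = x[:150], y[:150]
x_cv, y_cv = x[150:], y[150:]

d = 1
for m in [5, 20, 80, 150]:
    theta = fit(poly_design(x_tr[:m], d), y_tr[:m])
    jt = j(theta, poly_design(x_tr[:m], d), y_tr[:m])
    jc = j(theta, poly_design(x_cv, d), y_cv)
    print(f"m={m:3d}  J_train={jt:.3f}  J_cv={jc:.3f}")
```

As $m$ grows, $J_{train}$ rises toward a plateau and $J_{cv}$ falls toward the same plateau, which stays high no matter how much data is added: the signature that more data alone will not help a high-bias model.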

10-7 Deciding what to try next (revisited)

Debugging a learning algorithm:

  • Get more training examples $\longrightarrow$ fixes high variance
  • Try a smaller set of features $\longrightarrow$ fixes high variance
  • Try getting additional features $\longrightarrow$ fixes high bias
  • Try adding polynomial features $\longrightarrow$ fixes high bias
  • Try decreasing $\lambda$ $\longrightarrow$ fixes high bias
  • Try increasing $\lambda$ $\longrightarrow$ fixes high variance

Neural networks and overfitting
A small neural network has fewer parameters and is more prone to underfitting (though it is computationally cheaper); a large neural network has more parameters and is more prone to overfitting (use regularization $\lambda$ to address the overfitting).
