Solving the Problem of Overfitting

The problem of overfitting

  • Underfitting, high bias

  • Overfitting, high variance:

    If we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples.

Debugging and Diagnosing

  • Addressing overfitting

    -Plotting the hypothesis (impractical when there are too many features)

    • Options:

      1. Reduce number of features

        -Manually select which features to keep.

        -Model selection algorithm

      2. Regularization

        -Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.

        -Works well when we have a lot of features, each of which contributes a bit to predicting $y$.

Cost Function

Regularization

Small values for the parameters $\theta_0,\theta_1,\ldots,\theta_n$:

-“Simpler” hypothesis

-Less prone to overfitting

If we have overfitting, we can reduce the weight that some of the terms in our function carry by increasing their cost.

For example, suppose we want to make the following function more quadratic:

$$\theta_0+\theta_1x+\theta_2x^2+\theta_3x^3+\theta_4x^4$$

We want to eliminate the influence of $\theta_3x^3$ and $\theta_4x^4$.

We can modify our cost function:

$$\min_\theta\ \frac{1}{2m}\sum^{m}_{i=1}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+1000\cdot\theta^2_3+1000\cdot\theta^2_4$$

We have added two extra terms at the end to inflate the cost of $\theta_3$ and $\theta_4$ without actually getting rid of them.

In order for the cost function to get close to zero, we have to reduce the values of $\theta_3$ and $\theta_4$ to near zero ($\theta_3,\theta_4\approx 0$), which in turn greatly reduces the values of $\theta_3x^3$ and $\theta_4x^4$ in the hypothesis.
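As a minimal sketch of this idea (assuming NumPy, with the polynomial features $[1, x, x^2, x^3, x^4]$ precomputed as the columns of `X`; the function name is illustrative):

```python
import numpy as np

def penalized_cost(theta, X, y):
    """Squared-error cost with extra penalty terms on theta_3 and theta_4."""
    m = len(y)
    residuals = X @ theta - y
    base = residuals @ residuals / (2 * m)   # (1/2m) * sum of squared errors
    # Inflating the cost of theta_3 and theta_4 drives them toward zero
    # without removing the x^3 and x^4 features from the model.
    penalty = 1000 * theta[3] ** 2 + 1000 * theta[4] ** 2
    return base + penalty
```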


We could also regularize all of our theta parameters in a single summation:

$$\min_\theta\ \frac{1}{2m}\left[\sum^{m}_{i=1}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\lambda\sum^n_{j=1}\theta_j^2\right]$$


$\lambda$, the regularization parameter, determines how much the costs of the parameters are inflated.

If $\lambda$ is chosen to be too large, it may smooth out the function too much and cause underfitting.
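As a concrete sketch of this regularized cost (assuming NumPy; names are illustrative, and `theta[0]` is excluded from the penalty to match the sum over $j=1,\ldots,n$):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized squared-error cost for linear regression."""
    m = len(y)
    residuals = X @ theta - y
    base = residuals @ residuals             # sum of squared errors
    penalty = lam * np.sum(theta[1:] ** 2)   # lambda * sum_{j>=1} theta_j^2
    return (base + penalty) / (2 * m)
```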

Regularized Linear Regression

Gradient Descent

$$\theta_j:=\theta_j\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}\qquad j=1,2,\ldots,n$$

($\theta_0$ is updated with the ordinary, unregularized gradient step.)

The factor $\left(1-\alpha\frac{\lambda}{m}\right)$ will always be less than 1, so on every update $\theta_j$ is shrunk slightly before the usual gradient step is applied.
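In code, one such update could be sketched as follows (a minimal sketch assuming NumPy; the function and variable names are illustrative, and `theta[0]` receives the ordinary update without the shrink factor):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One step of regularized gradient descent for linear regression."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m   # (1/m) * sum_i (h(x_i) - y_i) * x_ij
    shrink = np.full_like(theta, 1.0 - alpha * lam / m)
    shrink[0] = 1.0                    # the bias theta_0 is not shrunk
    return shrink * theta - alpha * grad
```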

Normal Equation

If $\lambda>0$:

$$\theta=\left(X^TX+\lambda\cdot L\right)^{-1}X^Ty$$

where $L$ is a matrix of dimension $(n+1)\times(n+1)$ ($m$: number of examples; $n$: number of features). It looks like the identity matrix, except for a $0$ in the top-left entry, so the bias term $\theta_0$ is not regularized:

$$L=\begin{bmatrix}0&&&&\\&1&&&\\&&1&&\\&&&\ddots&\\&&&&1\end{bmatrix}$$

If $m<n$, then $X^TX$ is non-invertible (singular). However, when we add the term $\lambda\cdot L$, $X^TX+\lambda\cdot L$ becomes invertible.
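A minimal sketch of this closed-form solution (assuming NumPy; names are illustrative, and `np.linalg.solve` is used instead of forming the inverse explicitly):

```python
import numpy as np

def normal_equation(X, y, lam):
    """Regularized normal equation: theta = (X'X + lambda * L)^(-1) X'y."""
    n_plus_1 = X.shape[1]      # number of columns = n features + bias column
    L = np.eye(n_plus_1)
    L[0, 0] = 0.0              # top-left 0: the bias theta_0 is not regularized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```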

Regularized Logistic Regression

Cost Function

Cost function for logistic regression:

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right)+(1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]$$

We can regularize this equation by adding a term to the end:

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right)+(1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum^n_{j=1}\theta^2_j$$

The second sum, $\sum^n_{j=1}\theta^2_j$, explicitly excludes the bias term $\theta_0$: the index $j$ runs from $1$ to $n$, so $\theta_0$ is never penalized.
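Putting this together in code (a minimal sketch assuming NumPy; names are illustrative, and the penalty again skips `theta[0]`):

```python
import numpy as np

def regularized_logistic_cost(theta, X, y, lam):
    """Regularized cross-entropy cost for logistic regression."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # sigmoid hypothesis h_theta(x)
    cross_entropy = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return cross_entropy + penalty
```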
