Solving the Problem of Overfitting

The problem of overfitting

  • Underfitting, high bias

  • Overfitting, high variance:

    If we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples.
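A minimal sketch of this effect, assuming synthetic data (a noisy sine curve, chosen only for illustration) and using polynomial degree as a stand-in for the number of features. The names `target`, `mse`, `underfit`, and `overfit` are hypothetical helpers, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Hypothetical ground-truth function, assumed only for this illustration.
    return np.sin(2 * np.pi * x)

# Ten noisy training points and fifty fresh points from the same process.
x_train = np.linspace(0.0, 1.0, 10)
y_train = target(x_train) + rng.normal(scale=0.2, size=x_train.size)
x_new = np.linspace(0.03, 0.97, 50)
y_new = target(x_new) + rng.normal(scale=0.2, size=x_new.size)

def mse(coeffs, x, y):
    # Mean squared error of the polynomial with the given coefficients.
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

underfit = np.polyfit(x_train, y_train, deg=1)  # too few features: high bias
overfit = np.polyfit(x_train, y_train, deg=9)   # 10 parameters for 10 points: high variance

train_under = mse(underfit, x_train, y_train)
train_over = mse(overfit, x_train, y_train)
new_over = mse(overfit, x_new, y_new)

print("degree 1, training error:", train_under)
print("degree 9, training error:", train_over)
print("degree 9, error on new examples:", new_over)
```

The degree-9 polynomial passes through every training point, so its training error is essentially zero, yet its error on the new points is larger: the gap between the two numbers is the symptom of overfitting.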

Debugging and Diagnosing

  • Addressing overfitting

    -Plotting the hypothesis (usually impractical when there are many features)

    • Options:

      1. Reduce number of features

        -Manually select which features to keep.

        -Use a model selection algorithm.

      2. Regularization

        -Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.

        -Works well when we have a lot of features, each of which contributes a bit to predicting $y$.

Cost function


Small values for parameters $\theta_0, \theta_1, \dots, \theta_n$

-“Simpler” hypothesis

-Less prone to overfitting

If the hypothesis overfits, we can reduce the weight that some of its terms carry by increasing their cost.


Suppose we want to make the following function more quadratic:

$$\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$$

That is, we want to eliminate the influence of $\theta_3 x^3$ and $\theta_4 x^4$.

We can modify the cost function:

$$\min_\theta \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + 1000\cdot\theta_3^2 + 1000\cdot\theta_4^2$$

The two extra terms at the end inflate the cost of $\theta_3$ and $\theta_4$ without actually getting rid of them.

For the cost function to get close to zero, $\theta_3$ and $\theta_4$ must be reduced to near zero ($\theta_3, \theta_4 \approx 0$), which in turn greatly reduces the contributions of $\theta_3 x^3$ and $\theta_4 x^4$.
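The effect of the two penalty terms can be checked numerically. The sketch below assumes synthetic, roughly quadratic data and solves the penalized least-squares problem in closed form by setting the gradient to zero; `fit_with_penalty` and the data are hypothetical and only illustrate the idea.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data, assumed for illustration: a quadratic trend plus noise.
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.3, size=x.size)

# Design matrix with columns 1, x, x^2, x^3, x^4.
X = np.vander(x, N=5, increasing=True)

def fit_with_penalty(X, y, penalty):
    """Minimize (1/2m) * ||X theta - y||^2 + sum_j penalty[j] * theta_j^2.

    Setting the gradient to zero gives the linear system
    (X^T X / m + 2 * diag(penalty)) theta = X^T y / m.
    """
    m = len(y)
    A = X.T @ X / m + 2.0 * np.diag(penalty)
    b = X.T @ y / m
    return np.linalg.solve(A, b)

theta_plain = fit_with_penalty(X, y, np.zeros(5))
theta_penalized = fit_with_penalty(X, y, np.array([0.0, 0.0, 0.0, 1000.0, 1000.0]))

print("no penalty:        ", theta_plain)
print("1000 * theta_3,4^2:", theta_penalized)
```

With the penalty in place, the last two entries of `theta_penalized` are pushed very close to zero, so the fitted curve behaves almost like a quadratic even though the cubic and quartic features are still present.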

We could also regularize all of our theta parameters (except $\theta_0$) in a single summation:

$$\min_\theta \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$

$\lambda$, the regularization parameter, determines how much the costs of the parameters are inflated.

If $\lambda$ is chosen to be too large, it may smooth out the function too much and cause underfitting.
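As a rough sketch, the regularized cost can be written as a short function. It assumes the penalty is scaled by $\frac{1}{2m}$ like the squared-error term (the convention that matches the $\frac{\lambda}{m}$ factor in the gradient-descent update), that the first column of `X` is all ones, and that `theta[0]` is the bias; the name `regularized_cost` is hypothetical.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Squared-error cost with an L2 penalty on theta[1:], scaled by 1/(2m).

    Assumes X has a leading column of ones, so theta[0] is the bias term
    and is left out of the penalty.
    """
    m = len(y)
    residual = X @ theta - y
    data_term = residual @ residual
    penalty = lam * np.sum(theta[1:] ** 2)
    return (data_term + penalty) / (2 * m)

# Tiny example: two training points that theta = [0, 1] fits exactly.
X = np.array([[1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0])
theta = np.array([0.0, 1.0])

print(regularized_cost(theta, X, y, lam=0.0))  # data term only
print(regularized_cost(theta, X, y, lam=2.0))  # same fit, now paying for theta_1
```

Even a perfect fit has a nonzero cost once $\lambda > 0$, which is why a very large $\lambda$ pushes the parameters toward zero and can cause underfitting.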

Regularized Linear Regression

Gradient Descent

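$$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

This update applies for $j = 1, \dots, n$; $\theta_0$ is updated without the shrinkage factor. For $\alpha, \lambda > 0$, the factor $\left(1 - \alpha\frac{\lambda}{m}\right)$ is always less than 1, so each step shrinks $\theta_j$ slightly before applying the usual gradient update.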
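The update can be sketched in a few lines of NumPy. This is an illustration under stated assumptions: `X` has a leading column of ones, `theta[0]` is the bias and is not shrunk, and the function name `gradient_step`, the toy data, and the step size are all hypothetical.

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent step for linear regression."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m          # (1/m) * sum of (h - y) * x_j
    shrink = np.full(theta.shape, 1.0 - alpha * lam / m)
    shrink[0] = 1.0                           # theta_0 is not regularized
    return theta * shrink - alpha * grad

# Toy data, assumed for illustration: y = x exactly.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

theta = np.zeros(2)
for _ in range(2000):
    theta = gradient_step(theta, X, y, alpha=0.1, lam=1.0)

print(theta)
```

Without regularization this data would give $\theta = (0, 1)$; with $\lambda = 1$ the slope is pulled toward zero and the bias compensates.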

Normal Equation

For $\lambda \geq 0$:

$$\theta = \left(X^TX + \lambda\cdot L\right)^{-1}X^Ty$$


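Here $L$ is the $(n+1)\times(n+1)$ diagonal matrix $\mathrm{diag}(0, 1, 1, \dots, 1)$: the identity matrix with its top-left entry set to 0, so that $\theta_0$ is not regularized.

($m$: number of examples; $n$: number of features)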

If $m < n$, then $X^TX$ is non-invertible (singular). However, when we add the term $\lambda\cdot L$ with $\lambda > 0$, the matrix $X^TX + \lambda\cdot L$ becomes invertible.
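A small sketch of the regularized normal equation, assuming $L$ is the identity with its top-left entry zeroed and that `X` carries a leading column of ones. The function name `normal_equation` and the random data are hypothetical; the example deliberately uses fewer examples than features.

```python
import numpy as np

def normal_equation(X, y, lam):
    """Solve (X^T X + lam * L) theta = X^T y, with L = diag(0, 1, ..., 1)."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0                      # do not regularize the bias term
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

rng = np.random.default_rng(0)
m, n = 3, 5                            # fewer examples than features
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
y = np.array([1.0, 2.0, 3.0])

# rank(X^T X) = rank(X) <= m = 3 < n + 1 = 6, so X^T X alone is singular.
rank = np.linalg.matrix_rank(X)

theta = normal_equation(X, y, lam=1.0)       # solvable once lam > 0
theta_big = normal_equation(X, y, lam=1e6)   # very large lam

print("rank of X:", rank)
print("theta (lam = 1):  ", theta)
print("theta (lam = 1e6):", theta_big)
```

With the very large $\lambda$, every parameter except the bias is driven almost to zero, which is the underfitting behaviour described earlier.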

Regularized Logistic Regression

Cost Function

Cost function for logistic regression:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$

Regularize this equation by adding a term to the end:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

The sum $\sum_{j=1}^{n}\theta_j^2$ starts at $j = 1$, which explicitly excludes the bias term $\theta_0$ from regularization.
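A minimal sketch of the regularized logistic cost, assuming `X` has a leading column of ones so that `theta[0]` is the bias. The names `sigmoid` and `logistic_cost` and the toy data are hypothetical, and no safeguard against `log(0)` is included.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Cross-entropy cost plus (lam / 2m) * sum of theta_j^2 for j >= 1."""
    m = len(y)
    h = sigmoid(X @ theta)
    cross_entropy = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)   # bias theta[0] excluded
    return cross_entropy + penalty

# Toy data, assumed for illustration; the single feature is zero for both examples.
X = np.array([[1.0, 0.0], [1.0, 0.0]])
y = np.array([0.0, 1.0])

print(logistic_cost(np.array([0.0, 0.0]), X, y, lam=10.0))  # log(2): no penalty at theta = 0
print(logistic_cost(np.array([0.0, 2.0]), X, y, lam=10.0))  # same predictions, plus the penalty
```

Both calls produce the same predictions ($h = 0.5$), so the difference between the two costs is exactly the penalty $\frac{\lambda}{2m}\theta_1^2$.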




