Solving the Problem of Overfitting

NZOGGY_

Article link: https://blog.csdn.net/NZOGGY_/article/details/122790734

The problem of overfitting

Underfitting (high bias): the hypothesis is too simple to capture the trend in the data.

Overfitting (high variance): if we have too many features, the learned hypothesis may fit the training set very well but fail to generalize to new examples.
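
To make the contrast concrete, here is a minimal sketch that fits polynomials of increasing degree to a small data set. The data, the noise level, and the helper names (`fit_and_score`, `noisy_quadratic`) are assumptions made for illustration and are not part of the original notes.

```python
import numpy as np

def fit_and_score(degree, x_train, y_train, x_eval, y_eval):
    """Least-squares polynomial fit on the training set; returns the MSE on the evaluation set."""
    coeffs = np.polyfit(x_train, y_train, degree)
    predictions = np.polyval(coeffs, x_eval)
    return float(np.mean((predictions - y_eval) ** 2))

# Hypothetical data: a quadratic trend plus Gaussian noise.
rng = np.random.default_rng(0)

def noisy_quadratic(x):
    return 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0.0, 0.3, size=x.shape)

x_train = np.linspace(-1.0, 1.0, 10)
y_train = noisy_quadratic(x_train)
x_new = np.linspace(-0.9, 0.9, 50)   # examples not seen during fitting
y_new = noisy_quadratic(x_new)

degrees = (1, 2, 9)
train_mse = {d: fit_and_score(d, x_train, y_train, x_train, y_train) for d in degrees}
new_mse = {d: fit_and_score(d, x_train, y_train, x_new, y_new) for d in degrees}

for d in degrees:
    print(f"degree {d}: training MSE = {train_mse[d]:.4f}, new-example MSE = {new_mse[d]:.4f}")
```

The degree-9 polynomial passes through all 10 training points, so its training error is essentially zero. That says nothing about how well it predicts the new examples, which is the point of the overfitting definition above.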

Debugging and Diagnosing

Addressing overfitting

- Plotting the hypothesis (usually impractical when there are many features)
- Options:
  1. Reduce the number of features
    
    - Manually select which features to keep.
    
    - Use a model selection algorithm.
  2. Regularization
    
    - Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
    
    - Works well when we have a lot of features, each of which contributes a bit to predicting $y$.

Cost function

Regularization

Small values for parameters $\theta_0,\theta_1,...,\theta_n$ give:

- A “simpler” hypothesis

- Less prone to overfitting

If the hypothesis overfits, we can reduce the weight that some of its terms carry by increasing their cost.

For example, suppose we want to make the following function more quadratic:

$\theta_0+\theta_1x+\theta_2x^2+\theta_3x^3+\theta_4x^4$

We want to eliminate the influence of $\theta_3x^3$ and $\theta_4x^4$ without removing these features or changing the form of the hypothesis.

We can modify the cost function instead:

$\min_\theta\frac{1}{2m}\sum^{m}_{i=1}{(h_\theta(x^{(i)})-y^{(i)})^2}+1000\cdot\theta^2_3+1000\cdot\theta^2_4$

The two extra terms at the end inflate the cost of $\theta_3$ and $\theta_4$ without actually getting rid of them.

For the cost function to get close to zero, we have to reduce $\theta_3$ and $\theta_4$ to near zero ($\theta_3,\theta_4\approx0$), which in turn greatly reduces the values of $\theta_3x^3$ and $\theta_4x^4$.
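
The effect of the two penalty terms can be checked numerically. The sketch below is only an illustration: the data set is made up, and the closed-form solve is one convenient way to minimize this particular cost, not something the notes prescribe. Setting the gradient of the penalized cost to zero gives $(X^TX+2mP)\theta=X^Ty$, where $P$ is a diagonal matrix holding the penalty weights.

```python
import numpy as np

# Hypothetical data: roughly quadratic, with x kept in [-1, 1].
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0.0, 0.1, size=x.shape)
m = len(x)

# Design matrix with columns 1, x, x^2, x^3, x^4.
X = np.vander(x, N=5, increasing=True)

def fit(penalty_weights):
    """Minimize (1/2m) * sum((X @ theta - y)^2) + sum(penalty_weights * theta^2).

    The gradient is (1/m) X^T (X theta - y) + 2 P theta, so the minimizer
    solves (X^T X + 2m P) theta = X^T y with P = diag(penalty_weights).
    """
    P = np.diag(penalty_weights)
    return np.linalg.solve(X.T @ X + 2 * m * P, X.T @ y)

theta_plain = fit(np.zeros(5))
theta_penalized = fit(np.array([0.0, 0.0, 0.0, 1000.0, 1000.0]))

print("theta_3, theta_4 without penalty:", theta_plain[3:])
print("theta_3, theta_4 with penalty:   ", theta_penalized[3:])
```

With the penalty in place, $\theta_3$ and $\theta_4$ end up very close to zero, so the fitted curve is effectively quadratic even though the cubic and quartic features are still in the model.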

We could also regularize all of our theta parameters (except $\theta_0$) in a single summation:

$\min_\theta\frac{1}{2m}\left[\sum^{m}_{i=1}{(h_\theta(x^{(i)})-y^{(i)})^2}+\lambda\sum^n_{j=1}{\theta_j^2}\right]$

$\lambda$ , the regularization parameter, determines how much the costs of parameters are inflated.

If $\lambda$ is chosen to be too large, it may smooth out the function too much and cause underfitting.
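
As a rough sketch, the regularized cost can be written in a few lines of NumPy. The function name `regularized_cost` and the toy data are assumptions for illustration. The code also assumes the penalty is scaled by $\frac{1}{2m}$ together with the squared-error term, which is the scaling that matches the $\frac{\lambda}{m}$ factor in the gradient descent update later in these notes.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Squared-error cost with an L2 penalty on theta_1..theta_n (theta_0 is excluded).

    Assumes X already contains the leading column of ones (x_0 = 1).
    """
    m = len(y)
    residuals = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)   # skip theta[0], the bias term
    return (np.sum(residuals ** 2) + penalty) / (2 * m)

# Hypothetical data that the line y = x fits exactly.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.0, 1.0])

cost_no_penalty = regularized_cost(theta, X, y, lam=0.0)
cost_with_penalty = regularized_cost(theta, X, y, lam=6.0)
print(cost_no_penalty, cost_with_penalty)
```

Even a perfect fit has a nonzero cost once $\lambda>0$, because the penalty charges for the size of $\theta_1$. The larger $\lambda$ is, the more the minimizer is pulled toward small parameters, which is why a very large $\lambda$ leads to underfitting.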

Regularized Linear Regression

Gradient Descent

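Repeat until convergence, updating $\theta_0$ without the penalty:

$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}$

$\theta_j:=\theta_j(1-\alpha\frac{\lambda}{m})-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$, for $j=1,\dots,n$

For $\alpha,\lambda>0$, the factor $(1-\alpha\frac{\lambda}{m})$ is always less than 1, so each update first shrinks $\theta_j$ slightly and then applies the usual gradient step.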
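
The update rule translates almost directly into code. The following is a minimal sketch, not a tuned implementation: the function names, the learning rate, the iteration count, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient descent update for linear regression.

    X is assumed to contain the leading column of ones, so theta[0] is the
    bias term and is not shrunk.
    """
    m = len(y)
    errors = X @ theta - y                  # h_theta(x^(i)) - y^(i) for every example
    gradient = (X.T @ errors) / m           # unregularized gradient, one entry per theta_j
    shrink = np.full(theta.shape, 1.0 - alpha * lam / m)
    shrink[0] = 1.0                         # theta_0 is not regularized
    return theta * shrink - alpha * gradient

def gradient_descent(X, y, alpha=0.1, lam=1.0, iterations=5000):
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        theta = gradient_step(theta, X, y, alpha, lam)
    return theta

# Hypothetical data lying on the line y = 1 + 2x.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta = gradient_descent(X, y)
print(theta)
```

Because of the shrink factor, the learned slope comes out somewhat smaller than the slope of 2 that an unregularized fit would recover on this data.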

Normal Equation

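For $\lambda\geq0$:

$\theta=(X^TX+\lambda\cdot L)^{-1}X^Ty$

where $L=\begin{bmatrix}0&&&\\&1&&\\&&\ddots&\\&&&1\end{bmatrix}$ is the identity matrix with its top-left entry set to 0, so that $\theta_0$ is not regularized.

$L$ has dimension $(n+1)\times(n+1)$, where $m$ is the number of examples and $n$ is the number of features.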

If $m<n$, then $X^TX$ is non-invertible (singular). However, when $\lambda>0$, adding the term $\lambda\cdot L$ makes $X^TX+\lambda\cdot L$ invertible.
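
The invertibility claim is easy to check on a small case. The sketch below uses a made-up data set with $m=2$ examples and $n=3$ features; the function name and the value $\lambda=0.1$ are assumptions for illustration.

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    """Solve (X^T X + lam * L) theta = X^T y, where L is the identity with L[0, 0] = 0."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0                      # do not regularize the bias term theta_0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Hypothetical data: m = 2 examples, n = 3 features, plus the column x_0 = 1.
X = np.array([[1.0, 2.0, 0.5, 1.0],
              [1.0, 1.0, 3.0, 2.0]])
y = np.array([1.0, 2.0])

L = np.eye(4)
L[0, 0] = 0.0

rank_plain = np.linalg.matrix_rank(X.T @ X)                  # at most m = 2, so singular
rank_regularized = np.linalg.matrix_rank(X.T @ X + 0.1 * L)  # full rank once lam > 0

theta = regularized_normal_equation(X, y, lam=0.1)
print(rank_plain, rank_regularized)
print(theta)
```

`np.linalg.solve` is used instead of forming the inverse explicitly. Both give the same $\theta$ here, and solving the linear system is the usual numerical choice.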

Regularized Logistic Regression

Cost Function

Cost function for logistic regression:

$J(\theta)=-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$

Regularize this equation by adding a term to the end:

$J(\theta)=-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]+\frac{\lambda}{2m}\sum^n_{j=1}{\theta^2_j}$

The sum $\sum^n_{j=1}{\theta^2_j}$ starts at $j=1$, so it explicitly excludes the bias term $\theta_0$.
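
A minimal NumPy sketch of this cost function follows. The names `sigmoid` and `regularized_logistic_cost` and the four-example data set are assumptions for illustration; the hypothesis is taken to be the usual sigmoid of $\theta^Tx$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam):
    """Cross-entropy cost plus (lam / 2m) * sum of theta_j^2 for j >= 1.

    X is assumed to contain the leading column of ones, so theta[0] is the
    bias term and is left out of the penalty.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    cross_entropy = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return cross_entropy + penalty

# Hypothetical data: one feature plus the column x_0 = 1.
X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, -0.5]])
y = np.array([1.0, 0.0, 1.0, 0.0])
theta = np.array([0.3, 2.0])

cost_plain = regularized_logistic_cost(theta, X, y, lam=0.0)
cost_regularized = regularized_logistic_cost(theta, X, y, lam=4.0)
print(cost_plain, cost_regularized)
```

The two costs differ by exactly $\frac{\lambda}{2m}\theta_1^2$. Changing only `theta[0]` changes the cross-entropy part but never the penalty, which is what excluding the bias term means in practice.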
