【ML13】Overfitting and Underfitting

The concepts of overfitting and underfitting

Let's start with an example:

(Figure source: Zhou Zhihua, Machine Learning, Figure 2.1)
Loosely speaking, overfitting means the model takes too much into account, trying to cover every detail and thereby making mistakes; underfitting means the model considers too little and misses many important points, which also produces errors.

Next, let's connect this to linear regression and logistic regression.

For example, in Linear Regression and Logistic Regression, the overfitting and underfitting cases look like this:

(Figure: overfitting and underfitting in linear regression and logistic regression)
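As a concrete illustration (my own sketch with made-up data, not code from the course), the snippet below fits polynomials of different degrees to noisy synthetic samples with NumPy: degree 1 underfits the quadratic trend, degree 2 fits it well, and degree 15 chases the noise, i.e. overfits:

import numpy as np

# Noisy samples from a quadratic trend (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0, 0.1, size=x.shape)

for degree in (1, 2, 15):
    coeffs = np.polyfit(x, y, degree)        # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)
    train_mse = np.mean((y_hat - y) ** 2)
    print(f"degree={degree:2d}  training MSE={train_mse:.4f}")
# Degree 1 leaves a large training error (underfitting); degree 15 pushes the
# training error toward zero by fitting the noise (overfitting) and would do
# poorly on new data.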


How to fix overfitting

Solution 1: add more data to the training set

(Image source: Andrew Ng's Machine Learning course, lesson 3)
Adding more data to the training set improves the trained model.


Solution 2: curate the dataset (feature selection)

Some examples of how to curate the dataset:

1. Too many features: remove the unnecessary ones

When estimating house prices, you might have many attributes, including: the area of the house, the number of bathrooms, the number of floors, the size of the garden, the age of the previous owner, how many dogs the previous owner had, how many children the previous owner had, ...

Some of these are necessary, while others are irrelevant.

Attribute | Necessary?
House area | Necessary
Number of bathrooms | Necessary
Number of floors | Necessary
Garden size | Necessary
Age of the previous owner | Not necessary
Number of dogs the previous owner had | Not necessary
Number of children the previous owner had | Not necessary

In practice, irrelevant attributes bloat the feature set and make it easier for the model to overfit.
Removing the irrelevant data and keeping only the features that actually matter for the prediction helps resolve the overfitting (see the sketch after point 2 below).

2. Too few features: add the necessary ones

When estimating house prices, having only a single attribute such as the house area is not enough either; adding the necessary attributes improves the fitted model.
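A minimal sketch of the feature-selection idea from point 1, using made-up house data and column names (all values and names here are hypothetical):

import numpy as np

# Hypothetical house-price features, one row per house:
# [area, bathrooms, floors, garden, owner_age, owner_dogs, owner_kids]
feature_names = ["area", "bathrooms", "floors", "garden",
                 "owner_age", "owner_dogs", "owner_kids"]
X_all = np.array([
    [120.0, 2, 2, 30.0, 54, 1, 2],
    [ 80.0, 1, 1, 10.0, 33, 0, 1],
    [200.0, 3, 2, 60.0, 61, 2, 3],
])

# Keep only the attributes judged necessary for predicting the price.
keep = ["area", "bathrooms", "floors", "garden"]
keep_idx = [feature_names.index(name) for name in keep]
X = X_all[:, keep_idx]

print(X.shape)   # (3, 4): fewer features means less room to overfit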


Solution 3: Regularization

Regularization can be understood as shrinking the coefficients of very high-order terms. For example, the coefficient of $x^4$ can be reduced from $174$ to $0.0001$, rather than being forced all the way to $0$.

(Image source: Andrew Ng's Machine Learning course, week 3; used here for study purposes only.)

The following content is optional reading.

Given the principles of the three approaches above, how do we actually implement them in a program? When there are only a few coefficients, we can indeed handle them by screening features by hand. But when there are many coefficients, the approach is to add a regularization term to the cost function, which shrinks all the coefficients:

$$J_{(w,b)} = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2$$

$$J_{(w,b)} = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2$$

$$J_{(w,b)} = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2 + \frac{\lambda}{2m} b^2$$

(The extra penalty on $b$ in the last form is rarely used in practice, which is why the derivations below regularize only the $w_j$.)


Also note that the regularization parameter $\lambda$ should be neither too large nor too small. If $\lambda$ is close to $0$, the penalty does essentially nothing. If $\lambda$ approaches infinity, then minimizing $J_{(w,b)}$ all but ignores the difference between predictions and actual values; the coefficients are shrunk heavily, and the model degenerates toward a horizontal line parallel to the $x$-axis, i.e. $y = b$.
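To see this effect numerically, here is a minimal sketch (my own illustration with made-up data, not code from the course) that solves the regularized least-squares problem in closed form for several values of $\lambda$ and prints the resulting coefficients:

import numpy as np

# Hypothetical 1-D data with polynomial features x, x^2, x^3, x^4.
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 30)
y = 0.5 + 1.5 * x + rng.normal(0, 0.2, size=x.shape)
X = np.column_stack([x ** p for p in range(1, 5)])
m, n = X.shape

Xb = np.column_stack([np.ones(m), X])        # prepend a column of ones for b
for lam in (0.0, 1.0, 1e6):
    # Minimizing (1/2m)*||Xb @ theta - y||^2 + (lam/2m)*||w||^2 in closed form
    # gives (Xb^T Xb + lam*I) theta = Xb^T y, with the intercept left unpenalized.
    reg = lam * np.eye(n + 1)
    reg[0, 0] = 0.0                          # do not penalize b
    theta = np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)
    print(f"lambda={lam:g}: b={theta[0]:.3f}, w={np.round(theta[1:], 3)}")
# lambda near 0 behaves like ordinary least squares; a huge lambda shrinks
# every w_j toward 0, leaving roughly the flat line y = b.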


Regularized Linear Regression

First, let's recap the cost function of linear regression and the related update steps.

Recap of the cost function of linear regression

$$J_{(w,b)} = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2$$

$$w_j = w_j - \alpha \frac{\partial}{\partial w_j} J_{(w,b)}$$
$$b = b - \alpha \frac{\partial}{\partial b} J_{(w,b)}$$

$$\frac{\partial}{\partial w_j} J_{(w,b)} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
$$\frac{\partial}{\partial b} J_{(w,b)} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)$$


Adding the regularization term

$$J_{(w,b)} = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2$$

$$w_j = w_j - \alpha \frac{\partial}{\partial w_j} J_{(w,b)}$$
$$b = b - \alpha \frac{\partial}{\partial b} J_{(w,b)}$$

$$\frac{\partial}{\partial w_j} J_{(w,b)} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j$$
$$\frac{\partial}{\partial b} J_{(w,b)} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)$$


So what does regularization actually do for linear regression?

Expand the update rule for the coefficient $w_j$ a bit further:

$$w_j = w_j - \alpha \frac{\partial}{\partial w_j} J_{(w,b)} = w_j - \alpha \left( \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j \right)$$
that is:
$$w_j = \left( w_j - \alpha \frac{\lambda}{m} w_j \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
$$w_j = \left( 1 - \alpha \frac{\lambda}{m} \right) w_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
Here $\alpha$ is the learning rate, with values in $[0, 1]$, commonly $0.01$; $\lambda$ is the regularization coefficient, typically around $1$ or $10$; and $m$ is the number of training examples, a constant. Take $\lambda = 1$ and assume $m = 50$.

Then:
$$w_j = \left( 1 - 0.01 \cdot \frac{1}{50} \right) w_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
$$w_j = 0.9998\, w_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

Compared with the unregularized update rule, the only difference is that the leading $w_j$ term is multiplied by a factor slightly below $1$ ($0.9998$ here), so $w_j$ is shrunk a little on every gradient-descent step.
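A quick numeric check of the decay factor, using the illustrative values above ($\alpha = 0.01$, $\lambda = 1$, $m = 50$) and a hypothetical weight and gradient:

alpha, lam, m = 0.01, 1.0, 50

decay = 1 - alpha * lam / m
print(decay)                     # 0.9998

# One regularized update step: w_j is first shrunk by the decay factor,
# then moved by the usual (unregularized) gradient term.
w_j, grad_j = 2.5, 0.3           # hypothetical current weight and gradient
w_j_new = decay * w_j - alpha * grad_j
print(w_j_new)                   # 2.5 * 0.9998 - 0.01 * 0.3, about 2.4965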


Regularized Logistic Regression

First, again let's recap the cost function of Logistic Regression.

Recap of the cost function of logistic regression

$$J_{(w,b)} = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left( f_{w,b}(x^{(i)}) \right) + \left( 1 - y^{(i)} \right) \log\left( 1 - f_{w,b}(x^{(i)}) \right) \right]$$

$$w_j = w_j - \alpha \frac{\partial}{\partial w_j} J_{(w,b)}$$
$$b = b - \alpha \frac{\partial}{\partial b} J_{(w,b)}$$

$$\frac{\partial}{\partial w_j} J_{(w,b)} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
$$\frac{\partial}{\partial b} J_{(w,b)} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)$$


Adding the regularization term

$$J_{(w,b)} = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left( f_{w,b}(x^{(i)}) \right) + \left( 1 - y^{(i)} \right) \log\left( 1 - f_{w,b}(x^{(i)}) \right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2$$

$$w_j = w_j - \alpha \frac{\partial}{\partial w_j} J_{(w,b)}$$
$$b = b - \alpha \frac{\partial}{\partial b} J_{(w,b)}$$

$$\frac{\partial}{\partial w_j} J_{(w,b)} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j$$
$$\frac{\partial}{\partial b} J_{(w,b)} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)$$


Regularized linear regression in Python

Cost function for regularized linear regression

code:

import numpy as np

def compute_cost_linear_reg(X, y, w, b, lambda_=1):
    # Regularized linear-regression cost:
    # J = (1/2m) * sum_i (f_wb(x_i) - y_i)^2 + (lambda/2m) * sum_j w_j^2
    m = X.shape[0]
    n = len(w)
    cost = 0.
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b            # prediction for example i
        cost = cost + (f_wb_i - y[i]) ** 2      # squared error
    cost = cost / (2 * m)                       # scalar

    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j] ** 2)
    reg_cost = (lambda_ / (2 * m)) * reg_cost   # regularization term (w only)

    total_cost = cost + reg_cost
    return total_cost
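A quick sanity check of the call (the values below are arbitrary, made up just to show the expected shapes):

X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5]])        # 3 examples, 2 features (hypothetical)
y = np.array([3.0, 2.5, 4.0])
w = np.array([0.5, 0.2])
b = 0.1

print(compute_cost_linear_reg(X, y, w, b, lambda_=1.0))
# = unregularized cost + (lambda_ / (2 * m)) * (0.5**2 + 0.2**2)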



Gradient function for regularized linear regression

code:

def compute_gradient_linear_reg(X, y, w, b, lambda_):
    # Gradients of the regularized linear-regression cost w.r.t. w and b.
    m, n = X.shape  # (number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]       # prediction error for example i
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]
        dj_db = dj_db + err
    dj_dw = dj_dw / m
    dj_db = dj_db / m

    for j in range(n):
        dj_dw[j] = dj_dw[j] + (lambda_ / m) * w[j]   # add (lambda/m) * w_j; b is not regularized

    return dj_db, dj_dw
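Called with the same hypothetical X, y, w, b as in the cost example above:

dj_db, dj_dw = compute_gradient_linear_reg(X, y, w, b, lambda_=1.0)
print(dj_db)    # scalar gradient for b (no regularization term)
print(dj_dw)    # per-feature gradients for w, each including (lambda_/m) * w[j]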



Regularized logistic regression in Python

Cost function for regularized Logistic regression

code:

def compute_cost_logistic_reg(X, y, w, b, lambda_=1):
    # Regularized logistic-regression cost; sigmoid() is assumed to be
    # defined elsewhere, e.g. sigmoid(z) = 1 / (1 + np.exp(-z)).
    m, n = X.shape
    cost = 0.
    for i in range(m):
        z_i = np.dot(X[i], w) + b                # linear part for example i
        f_wb_i = sigmoid(z_i)                    # predicted probability
        cost += -y[i] * np.log(f_wb_i) - (1 - y[i]) * np.log(1 - f_wb_i)

    cost = cost / m

    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j] ** 2)
    reg_cost = (lambda_ / (2 * m)) * reg_cost    # regularization term (w only)

    total_cost = cost + reg_cost
    return total_cost
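The snippet above calls a sigmoid helper that is not shown here; below is a plain NumPy stand-in plus a small call with made-up binary data:

import numpy as np

def sigmoid(z):
    # Minimal stand-in for the sigmoid helper used by the functions above.
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0.5, 1.5],
              [1.0, 1.0],
              [1.5, 0.5],
              [3.0, 0.5]])        # 4 examples, 2 features (hypothetical)
y = np.array([0.0, 0.0, 1.0, 1.0])
w = np.array([0.3, -0.2])
b = 0.0

print(compute_cost_logistic_reg(X, y, w, b, lambda_=1.0))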



Gradient function for regularized Logistic regression

code:

def compute_gradient_logistic_reg(X, y, w, b, lambda_):
    # Gradients of the regularized logistic-regression cost w.r.t. w and b.
    m, n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.0

    for i in range(m):
        f_wb_i = sigmoid(np.dot(X[i], w) + b)    # predicted probability for example i
        err_i = f_wb_i - y[i]                    # prediction error
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i, j]
        dj_db = dj_db + err_i
    dj_dw = dj_dw / m
    dj_db = dj_db / m

    for j in range(n):
        dj_dw[j] = dj_dw[j] + (lambda_ / m) * w[j]   # add (lambda/m) * w_j; b is not regularized

    return dj_db, dj_dw
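Using the same hypothetical data (and the sigmoid stand-in) from the logistic cost example above:

dj_db, dj_dw = compute_gradient_logistic_reg(X, y, w, b, lambda_=1.0)
print(dj_db)    # gradient for b (not regularized)
print(dj_dw)    # gradients for w, each including (lambda_/m) * w[j]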


End.
