【Machine Learning】12.过拟合及正则化

1.过拟合

%matplotlib widget
import matplotlib.pyplot as plt
from ipywidgets import Output
from plt_overfit import overfit_example, output
plt.style.use('./deeplearning.mplstyle')
plt.close("all")
display(output)
ofit = overfit_example(False)

1.1 回归问题过拟合

拟合不足
在这里插入图片描述

刚好
在这里插入图片描述

过拟合
在这里插入图片描述

1.2 分类问题过拟合

拟合不足
在这里插入图片描述
拟合刚好

在这里插入图片描述
过拟合
在这里插入图片描述

2.正则化

解决过拟合问题的一个重要方法就是正则化,正则化是一种对所有特征进行“惩罚”的方式,可以让拟合变得平滑,如果 λ \lambda λ的值过小,惩罚效果微乎其微,如果值过大,会直接变成一条斜率为0的直线,下面所述代码中,选择 λ \lambda λ = 1

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from plt_overfit import overfit_example, output
from lab_utils_common import sigmoid
np.set_printoptions(precision=8)
  • 线性回归和逻辑回归的成本函数有显著差异,但将正则化添加到方程中是相同的。
  • 线性回归和逻辑回归的梯度函数非常相似。它们仅在 f w b f_{wb} fwb的实现上有所不同。

在这里插入图片描述

2.1 正则化后的线性回归代价函数

添加正则化后的代价函数:
J ( w , b ) = 1 2 m ∑ i = 0 m − 1 ( f w , b ( x ( i ) ) − y ( i ) ) 2 + λ 2 m ∑ j = 0 n − 1 w j 2 (1) J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2 \tag{1} J(w,b)=2m1i=0m1(fw,b(x(i))y(i))2+2mλj=0n1wj2(1)
where:
f w , b ( x ( i ) ) = w ⋅ x ( i ) + b (2) f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b \tag{2} fw,b(x(i))=wx(i)+b(2)

没添加正则化前的代价函数:

J ( w , b ) = 1 2 m ∑ i = 0 m − 1 ( f w , b ( x ( i ) ) − y ( i ) ) 2 J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 J(w,b)=2m1i=0m1(fw,b(x(i))y(i))2

不同点在于正则化项 regularization term λ 2 m ∑ j = 0 n − 1 w j 2 \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2 2mλj=0n1wj2

注意只有参数w正则化,b并没有

代码实现

def compute_cost_linear_reg(X, y, w, b, lambda_ = 1):
    """
    Computes the cost over all examples
    Args:
      X (ndarray (m,n): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters  
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns:
      total_cost (scalar):  cost 
    """

    m  = X.shape[0]
    n  = len(w)
    cost = 0.
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b                                   #(n,)(n,)=scalar, see np.dot
        cost = cost + (f_wb_i - y[i])**2                               #scalar             
    cost = cost / (2 * m)                                              #scalar  
 
    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2)                                          #scalar
    reg_cost = (lambda_/(2*m)) * reg_cost                              #scalar
    
    total_cost = cost + reg_cost                                       #scalar
    return total_cost                                                  #scalar

调用

np.random.seed(1)
X_tmp = np.random.rand(5,6)
y_tmp = np.array([0,1,0,1,0])
w_tmp = np.random.rand(X_tmp.shape[1]).reshape(-1,)-0.5
b_tmp = 0.5
lambda_tmp = 0.7
cost_tmp = compute_cost_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)

print("Regularized cost:", cost_tmp)


Regularized cost: 0.07917239320214277

2.2 正则化后的逻辑回归代价函数

正则化后的逻辑回归
J ( w , b ) = 1 m ∑ i = 0 m − 1 [ − y ( i ) log ⁡ ( f w , b ( x ( i ) ) ) − ( 1 − y ( i ) ) log ⁡ ( 1 − f w , b ( x ( i ) ) ) ] + λ 2 m ∑ j = 0 n − 1 w j 2 (3) J(\mathbf{w},b) = \frac{1}{m} \sum_{i=0}^{m-1} \left[ -y^{(i)} \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) \right] + \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2 \tag{3} J(w,b)=m1i=0m1[y(i)log(fw,b(x(i)))(1y(i))log(1fw,b(x(i)))]+2mλj=0n1wj2(3)
where:
f w , b ( x ( i ) ) = s i g m o i d ( w ⋅ x ( i ) + b ) (4) f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = sigmoid(\mathbf{w} \cdot \mathbf{x}^{(i)} + b) \tag{4} fw,b(x(i))=sigmoid(wx(i)+b)(4)

比较一下正则化之前的:

J ( w , b ) = 1 m ∑ i = 0 m − 1 [ ( − y ( i ) log ⁡ ( f w , b ( x ( i ) ) ) − ( 1 − y ( i ) ) log ⁡ ( 1 − f w , b ( x ( i ) ) ) ] J(\mathbf{w},b) = \frac{1}{m}\sum_{i=0}^{m-1} \left[ (-y^{(i)} \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right)\right] J(w,b)=m1i=0m1[(y(i)log(fw,b(x(i)))(1y(i))log(1fw,b(x(i)))]

As was the case in linear regression above, the difference is the regularization term, which is
λ 2 m ∑ j = 0 n − 1 w j 2 \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2 2mλj=0n1wj2

代码实现:

def compute_cost_logistic_reg(X, y, w, b, lambda_ = 1):
    """
    Computes the cost over all examples
    Args:
    Args:
      X (ndarray (m,n): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters  
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns:
      total_cost (scalar):  cost 
    """

    m,n  = X.shape
    cost = 0.
    for i in range(m):
        z_i = np.dot(X[i], w) + b                                      #(n,)(n,)=scalar, see np.dot
        f_wb_i = sigmoid(z_i)                                          #scalar
        cost +=  -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)      #scalar
             
    cost = cost/m                                                      #scalar

	# 这之后的部分和线性回归一模一样
    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2)                                          #scalar
    reg_cost = (lambda_/(2*m)) * reg_cost                              #scalar
    
    total_cost = cost + reg_cost                                       #scalar
    return total_cost                                                  #scalar

实际调用:

np.random.seed(1)
X_tmp = np.random.rand(5,6)
y_tmp = np.array([0,1,0,1,0])
w_tmp = np.random.rand(X_tmp.shape[1]).reshape(-1,)-0.5
b_tmp = 0.5
lambda_tmp = 0.7
cost_tmp = compute_cost_logistic_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)

print("Regularized cost:", cost_tmp)

2.3 应用正则化的梯度下降

这是原来的梯度下降
repeat until convergence:    {        w j = w j − α ∂ J ( w , b ) ∂ w j    for j := 0..n-1            b = b − α ∂ J ( w , b ) ∂ b } \begin{align*} &\text{repeat until convergence:} \; \lbrace \\ & \; \; \;w_j = w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{1} \; & \text{for j := 0..n-1} \\ & \; \; \; \; \;b = b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \\ &\rbrace \end{align*} repeat until convergence:{wj=wjαwjJ(w,b)b=bαbJ(w,b)}for j := 0..n-1(1)

线性回归的梯度下降和逻辑回归的梯度下降表达式一样,区别只是 f w b f_{\mathbf{w}b} fwb.

这是正则化之后的梯度下降表达式
∂ J ( w , b ) ∂ w j = 1 m ∑ i = 0 m − 1 ( f w , b ( x ( i ) ) − y ( i ) ) x j ( i ) + λ m w j ∂ J ( w , b ) ∂ b = 1 m ∑ i = 0 m − 1 ( f w , b ( x ( i ) ) − y ( i ) ) \begin{align*} \frac{\partial J(\mathbf{w},b)}{\partial w_j} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} + \frac{\lambda}{m} w_j \tag{2} \\ \frac{\partial J(\mathbf{w},b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{3} \end{align*} wjJ(w,b)bJ(w,b)=m1i=0m1(fw,b(x(i))y(i))xj(i)+mλwj=m1i=0m1(fw,b(x(i))y(i))(2)(3)

  • m is the number of training examples in the data set 数据集数量

  • f w , b ( x ( i ) ) f_{\mathbf{w},b}(x^{(i)}) fw,b(x(i)) is the model’s prediction, while y ( i ) y^{(i)} y(i) is the target

  • For a linear regression model
    f w , b ( x ) = w ⋅ x + b f_{\mathbf{w},b}(x) = \mathbf{w} \cdot \mathbf{x} + b fw,b(x)=wx+b

  • For a logistic regression model
    z = w ⋅ x + b z = \mathbf{w} \cdot \mathbf{x} + b z=wx+b
    f w , b ( x ) = g ( z ) f_{\mathbf{w},b}(x) = g(z) fw,b(x)=g(z)
    where g ( z ) g(z) g(z) is the sigmoid function:
    g ( z ) = 1 1 + e − z g(z) = \frac{1}{1+e^{-z}} g(z)=1+ez1

正则化项是 λ m w j \frac{\lambda}{m} w_j mλwj

线性回归梯度下降代码实现

def compute_gradient_linear_reg(X, y, w, b, lambda_): 
    """
    Computes the gradient for linear regression 
    Args:
      X (ndarray (m,n): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters  
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
      
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. 
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b. 
    """
    m,n = X.shape           #(number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):                             
        err = (np.dot(X[i], w) + b) - y[i]                 
        for j in range(n):                         
            dj_dw[j] = dj_dw[j] + err * X[i, j]               
        dj_db = dj_db + err                        
    dj_dw = dj_dw / m                                
    dj_db = dj_db / m   
    
    for j in range(n):
        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]

    return dj_db, dj_dw

调用:

np.random.seed(1)
X_tmp = np.random.rand(5,3)
y_tmp = np.array([0,1,0,1,0])
w_tmp = np.random.rand(X_tmp.shape[1])
b_tmp = 0.5
lambda_tmp = 0.7
dj_db_tmp, dj_dw_tmp =  compute_gradient_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)

print(f"dj_db: {dj_db_tmp}", )
print(f"Regularized dj_dw:\n {dj_dw_tmp.tolist()}", )

逻辑回归梯度下降代码实现

def compute_gradient_logistic_reg(X, y, w, b, lambda_): 
    """
    Computes the gradient for linear regression 
 
    Args:
      X (ndarray (m,n): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters  
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns
      dj_dw (ndarray Shape (n,)): The gradient of the cost w.r.t. the parameters w. 
      dj_db (scalar)            : The gradient of the cost w.r.t. the parameter b. 
    """
    m,n = X.shape
    dj_dw = np.zeros((n,))                            #(n,)
    dj_db = 0.0                                       #scalar

    for i in range(m):
        f_wb_i = sigmoid(np.dot(X[i],w) + b)          #(n,)(n,)=scalar
        err_i  = f_wb_i  - y[i]                       #scalar
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i,j]      #scalar
        dj_db = dj_db + err_i
    dj_dw = dj_dw/m                                   #(n,)
    dj_db = dj_db/m                                   #scalar

    for j in range(n):
        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]

    return dj_db, dj_dw  

调用:

np.random.seed(1)
X_tmp = np.random.rand(5,3)
y_tmp = np.array([0,1,0,1,0])
w_tmp = np.random.rand(X_tmp.shape[1])
b_tmp = 0.5
lambda_tmp = 0.7
dj_db_tmp, dj_dw_tmp =  compute_gradient_logistic_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)

print(f"dj_db: {dj_db_tmp}", )
print(f"Regularized dj_dw:\n {dj_dw_tmp.tolist()}", )

dj_db: 0.341798994972791
Regularized dj_dw:
 [0.17380012933994293, 0.32007507881566943, 0.10776313396851499]

3.应用正则化后的拟合效果示例

4.课后题

  1. 解决过拟合问题的方法

增加训练数据,正则化,选择更加相关的特征(特征工程)
在这里插入图片描述

  1. bias/variance
    在这里插入图片描述

  2. 考查正则化项中 λ \lambda λ的作用

在这里插入图片描述

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
机器学习中的过拟合和欠拟合是两个重要问题,需要在模型开发和训练中加以解决。过拟合指的是模型在训练数据上表现良好,但是在测试数据上表现不佳的情况。欠拟合则是指模型无法将训练数据的特征准确地捕捉到。为了解决这些问题,研究者们提出了许多方法和技术。 一种解决过拟合和欠拟合问题的方法是正则化。通过对模型的参数加入惩罚项,可以限制模型的复杂度,从而防止过拟合。常见的正则化方法有L1正则化和L2正则化等。 另一种方法是使用集成学习技术。例如,通过训练多个模型并将它们的结果组合起来,可以减少过拟合和欠拟合的风险。常见的集成学习方法包括随机森林和梯度提升树等。 此外,深度学习中也有一些特有的方法用于解决过拟合和欠拟合问题。例如,Dropout技术可以随机地将一些神经元的输出置为零,从而减少神经网络的复杂度,防止过拟合。 参考文献: 1. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. 2. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. 3. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer. 出处: 1. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. 2. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. 3. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值