[Machine Learning Notes] Regularization

1. Introduction

Regularization is a very effective way to deal with overfitting. By shrinking the weight attached to a term, it reduces that term's influence on the model while still keeping the term, so we do not have to delete the feature outright and leave the model incomplete.

2. Regularized Cost Functions

2.1 Linear Regression

After regularization, the cost function becomes:
$$J(\vec{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} \left(f_{\vec{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2$$

where $\lambda$ is the regularization parameter that constrains the weights $w_j$. You can also regularize $b$, but in practice usually only the $w_j$ are regularized.

Viewed as a prior on the weights, a larger $\lambda$ corresponds to a tighter normal (Gaussian) prior that pulls the $w_j$ toward zero, while a smaller $\lambda$ corresponds to a nearly flat (uniform) prior that leaves the $w_j$ essentially unconstrained.

The code for this regularized cost function is as follows:

import numpy as np

def compute_cost_linear_reg(X, y, w, b, lambda_ = 1):
    """
    Computes the cost over all examples
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters  
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns:
      total_cost (scalar):  cost 
    """

    m  = X.shape[0]
    n  = len(w)
    cost = 0.
    # squared-error term (first half of the formula)
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b                                   #(n,)(n,)=scalar, see np.dot
        cost = cost + (f_wb_i - y[i])**2                               #scalar             
    cost = cost / (2 * m)                                              #scalar  
    # regularization term (second half of the formula)
    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2)                                          #scalar
    reg_cost = (lambda_/(2*m)) * reg_cost                              #scalar
    
    total_cost = cost + reg_cost                                       #scalar
    return total_cost                                                  #scalar
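As a quick sanity check, the snippet below calls compute_cost_linear_reg with a small and a large λ to see how the penalty affects the cost. This is a minimal sketch: the random seed, array values, and the two lambda_ settings are made up purely for illustration.

import numpy as np

np.random.seed(1)
X_tmp = np.random.rand(5, 3)                      # 5 made-up examples with 3 features
y_tmp = np.array([0.5, 1.0, 1.5, 2.0, 2.5])       # made-up targets
w_tmp = np.random.rand(3) - 0.5                   # arbitrary starting weights
b_tmp = 0.3

# the regularization term grows linearly with lambda_,
# so the total cost rises as lambda_ increases
print("cost with lambda_=0.1:", compute_cost_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_=0.1))
print("cost with lambda_=100:", compute_cost_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_=100))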

2.2 Logistic Regression

After regularization, the cost function becomes:
$$J(\vec{w},b) = \frac{1}{m} \sum_{i=0}^{m-1} \left[ -y^{(i)} \log\left(f_{\vec{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\vec{w},b}\left( \mathbf{x}^{(i)} \right) \right) \right] + \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2$$

The role of $\lambda$ is the same as above; note that logistic regression uses a different loss function from linear regression.
The code for this regularized cost function is as follows:

def compute_cost_logistic_reg(X, y, w, b, lambda_ = 1):
    """
    Computes the cost over all examples
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters  
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns:
      total_cost (scalar):  cost 
    """

    m,n  = X.shape
    cost = 0.
    # loss term (first part of the formula above)
    for i in range(m):
        z_i = np.dot(X[i], w) + b                                      #(n,)(n,)=scalar, see np.dot
        f_wb_i = sigmoid(z_i)                                          #scalar
        cost +=  -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)      #scalar
             
    cost = cost/m                                                      #scalar
    # regularization term (second part of the formula above)
    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2)                                          #scalar
    reg_cost = (lambda_/(2*m)) * reg_cost                              #scalar
    
    total_cost = cost + reg_cost                                       #scalar
    return total_cost                                                  #scalar
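The code above calls a sigmoid helper that is not defined in this note. Here is a minimal sketch of such a helper (assuming plain NumPy; only the name sigmoid is dictated by the call above):

import numpy as np

def sigmoid(z):
    """
    Computes the sigmoid (logistic) function g(z) = 1 / (1 + e^(-z))
    Args:
      z (scalar or ndarray): input value(s)
    Returns:
      g (scalar or ndarray): sigmoid of z, same shape as z
    """
    g = 1.0 / (1.0 + np.exp(-z))
    return g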

3. Gradient Descent

With the cost function in hand, we use gradient descent to find the best combination of parameters. How do the update rules differ from the unregularized case? 🧐 The overall form is unchanged:
$$\begin{align*} &\text{repeat until convergence:} \; \lbrace \\ & \; \; \;w_j = w_j - \alpha \frac{\partial J(\vec{w},b)}{\partial w_j} \; & \text{for } j \in [0, n-1] \\ & \; \; \; \; \;b = b - \alpha \frac{\partial J(\vec{w},b)}{\partial b} \\ &\rbrace \end{align*}$$

The partial-derivative part, however, does change: because of the added regularization term, differentiating produces an extra $2w_j$, and multiplying it by the constant in front and simplifying gives the results below:
$$\begin{align*} \frac{\partial J(\vec{w},b)}{\partial w_j} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} \left(f_{\vec{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)x_{j}^{(i)} + \frac{\lambda}{m} w_j \\ \frac{\partial J(\vec{w},b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} \left(f_{\vec{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) \end{align*}$$
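Concretely, only the penalty term contributes this extra piece. Differentiating it with respect to $w_j$ gives

$$\frac{\partial}{\partial w_j}\left[\frac{\lambda}{2m}\sum_{k=0}^{n-1} w_k^2\right] = \frac{\lambda}{2m} \cdot 2 w_j = \frac{\lambda}{m} w_j$$

while the derivative with respect to $b$ is unchanged, because $b$ is not regularized.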
Note that the derivative formulas are identical for linear and logistic regression; the only difference is the function $f_{\vec{w},b}(\mathbf{x}^{(i)})$ (a sketch of the full update loop is given after the two gradient functions below):

  • For linear regression, $f_{\vec{w},b}(\mathbf{x}^{(i)}) = \vec{w} \cdot \mathbf{x}^{(i)} + b$. The corresponding gradient computation code is:
def compute_gradient_linear_reg(X, y, w, b, lambda_): 
    """
    Computes the gradient for linear regression 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters  
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
      
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. 
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b. 
    """
    m,n = X.shape           #(number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):    
        # error of the model f on example i: f(x) - y
        err = (np.dot(X[i], w) + b) - y[i]                 
        for j in range(n):                         
            dj_dw[j] = dj_dw[j] + err * X[i, j]               
        dj_db = dj_db + err                        
    dj_dw = dj_dw / m                                
    dj_db = dj_db / m   
    
    # add the regularization term to each weight's gradient
    for j in range(n):
        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]

    return dj_db, dj_dw
  • For logistic regression, $f_{\vec{w},b}(\mathbf{x}^{(i)}) = g(z) = g(\vec{w} \cdot \mathbf{x}^{(i)} + b) = \frac{1}{1+e^{-(\vec{w} \cdot \mathbf{x}^{(i)} + b)}}$. The corresponding gradient computation code is:
def compute_gradient_logistic_reg(X, y, w, b, lambda_): 
    """
    Computes the gradient for logistic regression
 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters  
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns:
      dj_dw (ndarray Shape (n,)): The gradient of the cost w.r.t. the parameters w. 
      dj_db (scalar)            : The gradient of the cost w.r.t. the parameter b. 
    """
    m,n = X.shape
    dj_dw = np.zeros((n,))                            #(n,)
    dj_db = 0.0                                       #scalar

    for i in range(m):
        # model prediction f on example i, using the sigmoid
        f_wb_i = sigmoid(np.dot(X[i],w) + b)          #(n,)(n,)=scalar
        err_i  = f_wb_i  - y[i]                       #scalar
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i,j]      #scalar
        dj_db = dj_db + err_i
    dj_dw = dj_dw/m                                   #(n,)
    dj_db = dj_db/m                                   #scalar

    # add the regularization term to each weight's gradient
    for j in range(n):
        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]

    return dj_db, dj_dw  
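Finally, here is a minimal sketch of the "repeat until convergence" loop from the top of this section. The function name gradient_descent, the fixed iteration count, and the printing interval are my own illustrative choices, not part of the original code; it works with either cost/gradient pair defined above.

import copy

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function,
                     alpha, num_iters, lambda_=1):
    """
    Runs (batch) gradient descent with a regularized cost/gradient pair.
    Args:
      X (ndarray (m,n))  : Data, m examples with n features
      y (ndarray (m,))   : target values
      w_in (ndarray (n,)): initial model parameters
      b_in (scalar)      : initial model parameter
      cost_function      : e.g. compute_cost_linear_reg or compute_cost_logistic_reg
      gradient_function  : e.g. compute_gradient_linear_reg or compute_gradient_logistic_reg
      alpha (scalar)     : learning rate
      num_iters (int)    : number of iterations (stands in for "until convergence")
      lambda_ (scalar)   : controls amount of regularization
    Returns:
      w (ndarray (n,)), b (scalar): fitted parameters
    """
    w = copy.deepcopy(w_in)      # avoid modifying the caller's array
    b = b_in

    for i in range(num_iters):
        # regularized gradients from one of the functions above
        dj_db, dj_dw = gradient_function(X, y, w, b, lambda_)

        # simultaneous update of all parameters
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        # report the regularized cost now and then
        if i % max(1, num_iters // 10) == 0:
            print(f"Iteration {i:5d}: cost {cost_function(X, y, w, b, lambda_):.4f}")

    return w, b

Passing the cost and gradient functions in as arguments keeps a single loop for both linear and logistic regression, which mirrors the point above that only the model function f differs between the two.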