Python Machine Learning: Ridge Regression

Ridge regression addresses some of the problems of ordinary least squares by imposing a penalty on the size of the coefficients. The ridge coefficients minimize a penalized residual sum of squares:

\underset{w}{\min\,} {||X w - y||_2^2 + \alpha ||w||_2^2}

Here, \alpha \geq 0 is a complexity parameter that controls the amount of shrinkage: the larger the value of \alpha, the greater the amount of shrinkage, and thus the coefficients become more robust to collinearity.

>>> from sklearn import linear_model
>>> reg = linear_model.Ridge(alpha=.5)
>>> reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)
>>> reg.coef_
array([ 0.34545455,  0.34545455])

RidgeCV implements ridge regression with built-in cross-validation of the alpha parameter. The object works in the same way as GridSearchCV except that it defaults to Generalized Cross-Validation (GCV), an efficient form of leave-one-out cross-validation:

>>> from sklearn import linear_model
>>> reg = linear_model.RidgeCV(alphas=[0.1, 1.0, 10.0])
>>> reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])       
RidgeCV(alphas=[0.1, 1.0, 10.0], cv=None, fit_intercept=True, scoring=None,
    normalize=False)
>>> reg.alpha_                                      
0.1
Core code (excerpted from scikit-learn's ridge module; it relies on internal helpers such as check_array, check_consistent_length, sag_solver, _rescale_data, and the _solve_* routines that are imported elsewhere in that module, so it is not runnable on its own):

def ridge_regression(X, y, alpha, sample_weight=None, solver='auto',
                     max_iter=None, tol=1e-3, verbose=0, random_state=None,
                     return_n_iter=False, return_intercept=False):
    """Solve the ridge equation by the method of normal equations.
    Parameters
    ----------
    X : {array-like, sparse matrix, LinearOperator},
        shape = [n_samples, n_features]
        Training data

    y : array-like, shape = [n_samples] or [n_samples, n_targets]
        Target values

    alpha : {float, array-like},
        shape = [n_targets] if array-like
        Regularization strength; must be a positive float. Regularization
        improves the conditioning of the problem and reduces the variance of
        the estimates. Larger values specify stronger regularization. Alpha
        corresponds to ``C^-1`` in other linear models such as
        LogisticRegression or LinearSVC. If an array is passed, penalties are
        assumed to be specific to the targets. Hence they must correspond in
        number.

    max_iter : int, optional
        Maximum number of iterations for conjugate gradient solver.
        For 'sparse_cg' and 'lsqr' solvers, the default value is determined
        by scipy.sparse.linalg. For 'sag' solver, the default value is 1000.

    sample_weight : float or numpy array of shape [n_samples]
        Individual weights for each sample. If sample_weight is not None and
        solver='auto', the solver will be set to 'cholesky'.

    solver : {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga'}
        Solver to use in the computational routines:
        - 'auto' chooses the solver automatically based on the type of data.

        - 'svd' uses a Singular Value Decomposition of X to compute the Ridge
          coefficients. More stable for singular matrices than
          'cholesky'.

        - 'cholesky' uses the standard scipy.linalg.solve function to
          obtain a closed-form solution via a Cholesky decomposition of
          dot(X.T, X)

        - 'sparse_cg' uses the conjugate gradient solver as found in
          scipy.sparse.linalg.cg. As an iterative algorithm, this solver is
          more appropriate than 'cholesky' for large-scale data
          (possibility to set `tol` and `max_iter`).

        - 'lsqr' uses the dedicated regularized least-squares routine
          scipy.sparse.linalg.lsqr. It is the fastest but may not be available
          in old scipy versions. It also uses an iterative procedure.

        - 'sag' uses a Stochastic Average Gradient descent, and 'saga' uses
          its improved, unbiased version named SAGA. Both methods also use an
          iterative procedure, and are often faster than other solvers when
          both n_samples and n_features are large. Note that 'sag' and
          'saga' fast convergence is only guaranteed on features with
          approximately the same scale. You can preprocess the data with a
          scaler from sklearn.preprocessing.

        All last five solvers support both dense and sparse data. However,
        only 'sag' and 'saga' support sparse input when ``fit_intercept`` is
        True.

    tol : float
        Precision of the solution.

    verbose : int
        Verbosity level. Setting verbose > 0 will display additional
        information depending on the solver used.

    random_state : int, RandomState instance or None, optional, default None
        The seed of the pseudo random number generator to use when shuffling
        the data.  If int, random_state is the seed used by the random number
        generator; If RandomState instance, random_state is the random number
        generator; If None, the random number generator is the RandomState
        instance used by `np.random`. Used when ``solver`` == 'sag'.

    return_n_iter : boolean, default False
        If True, the method also returns `n_iter`, the actual number of
        iteration performed by the solver.

    return_intercept : boolean, default False
        If True and if X is sparse, the method also returns the intercept,
        and the solver is automatically changed to 'sag'. This is only a
        temporary fix for fitting the intercept with sparse data. For dense
        data, use sklearn.linear_model._preprocess_data before your regression.

    Returns
    -------
    coef : array, shape = [n_features] or [n_targets, n_features]
        Weight vector(s).

    n_iter : int, optional
        The actual number of iteration performed by the solver.
        Only returned if `return_n_iter` is True.

    intercept : float or array, shape = [n_targets]
        The intercept of the model. Only returned if `return_intercept`
        is True and if X is a scipy sparse array.

    This function won't compute the intercept.
    """
    if return_intercept and sparse.issparse(X) and solver != 'sag':
        if solver != 'auto':
            warnings.warn("In Ridge, only 'sag' solver can currently fit the "
                          "intercept when X is sparse. Solver has been "
                          "automatically changed into 'sag'.")
        solver = 'sag'


    # SAG needs X and y columns to be C-contiguous and np.float64
    if solver in ['sag', 'saga']:
        X = check_array(X, accept_sparse=['csr'],
                        dtype=np.float64, order='C')
        y = check_array(y, dtype=np.float64, ensure_2d=False, order='F')
    else:
        X = check_array(X, accept_sparse=['csr', 'csc', 'coo'],
                        dtype=np.float64)
        y = check_array(y, dtype='numeric', ensure_2d=False)
    check_consistent_length(X, y)


    n_samples, n_features = X.shape


    if y.ndim > 2:
        raise ValueError("Target y has the wrong shape %s" % str(y.shape))


    ravel = False
    if y.ndim == 1:
        y = y.reshape(-1, 1)
        ravel = True


    n_samples_, n_targets = y.shape


    if n_samples != n_samples_:
        raise ValueError("Number of samples in X and y does not correspond:"
                         " %d != %d" % (n_samples, n_samples_))


    has_sw = sample_weight is not None


    if solver == 'auto':
        # cholesky if it's a dense array and cg in any other case
        if not sparse.issparse(X) or has_sw:
            solver = 'cholesky'
        else:
            solver = 'sparse_cg'


    elif solver == 'lsqr' and not hasattr(sp_linalg, 'lsqr'):
        warnings.warn("""lsqr not available on this machine, falling back
                      to sparse_cg.""")
        solver = 'sparse_cg'


    if has_sw:
        if np.atleast_1d(sample_weight).ndim > 1:
            raise ValueError("Sample weights must be 1D array or scalar")

        if solver not in ['sag', 'saga']:
            # SAG supports sample_weight directly. For other solvers,
            # we implement sample_weight via a simple rescaling.
            X, y = _rescale_data(X, y, sample_weight)


    # There should be either 1 or n_targets penalties
    alpha = np.asarray(alpha).ravel()
    if alpha.size not in [1, n_targets]:
        raise ValueError("Number of targets and number of penalties "
                         "do not correspond: %d != %d"
                         % (alpha.size, n_targets))


    if alpha.size == 1 and n_targets > 1:
        alpha = np.repeat(alpha, n_targets)


    if solver not in ('sparse_cg', 'cholesky', 'svd', 'lsqr', 'sag', 'saga'):
        raise ValueError('Solver %s not understood' % solver)


    n_iter = None
    if solver == 'sparse_cg':
        coef = _solve_sparse_cg(X, y, alpha, max_iter, tol, verbose)


    elif solver == 'lsqr':
        coef, n_iter = _solve_lsqr(X, y, alpha, max_iter, tol)


    elif solver == 'cholesky':
        if n_features > n_samples:
            K = safe_sparse_dot(X, X.T, dense_output=True)
            try:
                dual_coef = _solve_cholesky_kernel(K, y, alpha)


                coef = safe_sparse_dot(X.T, dual_coef, dense_output=True).T
            except linalg.LinAlgError:
                # use SVD solver if matrix is singular
                solver = 'svd'


        else:
            try:
                coef = _solve_cholesky(X, y, alpha)
            except linalg.LinAlgError:
                # use SVD solver if matrix is singular
                solver = 'svd'


    elif solver in ['sag', 'saga']:
        # precompute max_squared_sum for all targets
        max_squared_sum = row_norms(X, squared=True).max()


        coef = np.empty((y.shape[1], n_features))
        n_iter = np.empty(y.shape[1], dtype=np.int32)
        intercept = np.zeros((y.shape[1], ))
        for i, (alpha_i, target) in enumerate(zip(alpha, y.T)):
            init = {'coef': np.zeros((n_features + int(return_intercept), 1))}
            coef_, n_iter_, _ = sag_solver(
                X, target.ravel(), sample_weight, 'squared', alpha_i, 0,
                max_iter, tol, verbose, random_state, False, max_squared_sum,
                init,
                is_saga=solver == 'saga')
            if return_intercept:
                coef[i] = coef_[:-1]
                intercept[i] = coef_[-1]
            else:
                coef[i] = coef_
            n_iter[i] = n_iter_


        if intercept.shape[0] == 1:
            intercept = intercept[0]
        coef = np.asarray(coef)


    if solver == 'svd':
        if sparse.issparse(X):
            raise TypeError('SVD solver does not support sparse'
                            ' inputs currently')
        coef = _solve_svd(X, y, alpha)


    if ravel:
        # When y was passed as a 1d-array, we flatten the coefficients.
        coef = coef.ravel()


    if return_n_iter and return_intercept:
        return coef, n_iter, intercept
    elif return_intercept:
        return coef, intercept
    elif return_n_iter:
        return coef, n_iter
    else:
        return coef
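
The function above is exposed publicly as sklearn.linear_model.ridge_regression and can be called directly. Below is a minimal sketch on the tiny dataset used earlier; the exact coefficient values may differ slightly across scikit-learn versions:

import numpy as np
from sklearn.linear_model import ridge_regression

X = np.array([[0., 0.], [0., 0.], [1., 1.]])
y = np.array([0., .1, 1.])

# Solve the penalized least-squares problem with alpha = 0.5
coef = ridge_regression(X, y, alpha=0.5)
print(coef)  # roughly [0.345, 0.345], matching Ridge(alpha=0.5).coef_ above

# Ask for the iteration count as well (meaningful for iterative solvers such as 'lsqr')
coef, n_iter = ridge_regression(X, y, alpha=0.5, solver='lsqr',
                                return_n_iter=True)
print(coef, n_iter)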

Lasso is a linear model that estimates sparse coefficients. It is useful in some contexts because it tends to prefer solutions with fewer non-zero parameter values, effectively reducing the number of variables upon which the given solution depends. For this reason, the Lasso and its variants are fundamental to the field of compressed sensing. Under certain conditions, it can recover the exact set of non-zero weights (see Compressive sensing: tomography reconstruction with L1 prior (Lasso)).

Mathematically, it consists of a linear model trained with an \ell_1 prior as regularizer. The objective function to minimize is:

\underset{w}{\min\,} {\frac{1}{2n_{samples}} ||X w - y||_2^2 + \alpha ||w||_1}

The lasso estimate thus solves the minimization of the least-squares penalty with \alpha ||w||_1 added, where \alpha is a constant and ||w||_1 is the \ell_1-norm of the parameter vector.

The implementation in the class Lasso uses coordinate descent as the algorithm to fit the coefficients. See Least Angle Regression for another implementation:

>>> from sklearn import linear_model
>>> reg = linear_model.Lasso(alpha=0.1)
>>> reg.fit([[0, 0], [1, 1]], [0, 1])
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)
>>> reg.predict([[1, 1]])
array([ 0.8])
Lasso regression yields sparse models, so it can be used to perform feature selection, as sketched below.
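
As a minimal sketch of this (the toy data below is made up for illustration), SelectFromModel can wrap a Lasso estimator and keep only the features whose coefficients survive the L1 penalty:

import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(50, 6)
y = 4 * X[:, 2] + 0.1 * rng.randn(50)  # only feature 2 is informative

# Features whose Lasso coefficient is (essentially) zero are discarded.
selector = SelectFromModel(Lasso(alpha=0.1)).fit(X, y)
print(selector.get_support())        # boolean mask of the kept features
X_selected = selector.transform(X)   # reduced design matrix
print(X_selected.shape)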

scikit-learn exposes objects that set the Lasso alpha parameter by cross-validation: LassoCV and LassoLarsCV. LassoLarsCV is based on the Least Angle Regression algorithm.

For high-dimensional datasets with many collinear regressors, LassoCV is most often preferable. However, LassoLarsCV has the advantage of exploring more relevant values of the alpha parameter, and it is often faster than LassoCV when the number of samples is very small compared with the number of features. Alternatively, the estimator LassoLarsIC proposes to use the Akaike information criterion (AIC) and the Bayes information criterion (BIC). This is a computationally cheaper alternative for finding the optimal value of alpha, as the regularization path is computed only once instead of k+1 times when using k-fold cross-validation. However, such criteria need a proper estimation of the degrees of freedom of the solution; they are derived for large samples (asymptotic results) and assume the model is correct, i.e. that the data are actually generated by this model. They also tend to break when the problem is badly conditioned (more features than samples).
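
A minimal sketch of setting alpha by cross-validation with LassoCV (the synthetic data and candidate alphas below are illustrative, not from the original text):

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.RandomState(0)
X = rng.randn(50, 10)
# Only the first two features matter; the rest are noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(50)

# LassoCV evaluates each candidate alpha with k-fold cross-validation
# and keeps the one with the best average score.
reg = LassoCV(alphas=[0.01, 0.1, 1.0], cv=5).fit(X, y)
print(reg.alpha_)   # the alpha selected by cross-validation
print(reg.coef_)    # coefficients of the noise features shrink toward zero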

MultiTaskLasso is a linear model that jointly estimates sparse coefficients for multiple regression problems: y is a 2D array of shape (n_samples, n_tasks). The constraint is that the selected features are the same for all the regression problems, also called tasks.

Mathematically, it consists of a linear model trained with a mixed \ell_1 \ell_2 prior as regularizer. The objective function to minimize is:

\underset{W}{\min\,} {\frac{1}{2n_{samples}} ||X W - Y||_{Fro}^2 + \alpha ||W||_{21}}

where Fro indicates the Frobenius norm:

||A||_{Fro} = \sqrt{\sum_{ij} a_{ij}^2}

and the mixed \ell_1 \ell_2 norm reads:

||A||_{21} = \sum_i \sqrt{\sum_j a_{ij}^2}

The implementation in the class MultiTaskLasso uses coordinate descent as the algorithm to fit the coefficients.
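
A minimal sketch of MultiTaskLasso on toy multi-output data (the data below is made up for illustration); note that coef_ has shape (n_tasks, n_features) and a feature is either kept for all tasks or dropped for all tasks:

import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.RandomState(0)
X = rng.randn(40, 5)
# Two tasks that share the same informative feature (feature 0).
Y = np.column_stack([2 * X[:, 0], -1.5 * X[:, 0]]) + 0.05 * rng.randn(40, 2)

reg = MultiTaskLasso(alpha=0.1).fit(X, Y)
print(reg.coef_)  # non-zero entries appear in the same column for both tasks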

ElasticNet is a linear regression model trained with both the L1 and L2 norms as regularizers. This combination allows for learning a sparse model where few of the weights are non-zero, like Lasso, while still maintaining the regularization properties of Ridge. The convex combination of L1 and L2 is controlled with the l1_ratio parameter.

Elastic-Net is useful when there are multiple features that are correlated with one another. Lasso is likely to pick one of these at random, while Elastic-Net is likely to pick both.

A practical advantage of trading off between Lasso and Ridge is that it allows Elastic-Net to inherit some of Ridge's stability under rotation.

The objective function to minimize is in this case:

\underset{w}{\min\,} {\frac{1}{2n_{samples}} ||X w - y||_2^2 + \alpha \rho ||w||_1 + \frac{\alpha(1-\rho)}{2} ||w||_2^2}

The class ElasticNetCV can be used to set the parameters alpha (\alpha) and l1_ratio (\rho) by cross-validation.
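
A minimal sketch of fitting ElasticNet and of tuning alpha and l1_ratio with ElasticNetCV (the toy data and parameter grid are illustrative):

import numpy as np
from sklearn.linear_model import ElasticNet, ElasticNetCV

rng = np.random.RandomState(0)
X = rng.randn(60, 8)
y = X[:, 0] + X[:, 1] + 0.1 * rng.randn(60)

# A fixed trade-off: l1_ratio=0.5 mixes the L1 and L2 penalties equally.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)

# Let cross-validation pick both alpha and l1_ratio from a small grid.
enet_cv = ElasticNetCV(l1_ratio=[.1, .5, .9], cv=5).fit(X, y)
print(enet_cv.alpha_, enet_cv.l1_ratio_)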

MultiTaskElasticNet is an elastic-net model that jointly estimates sparse coefficients for multiple regression problems: Y is a 2D array of shape (n_samples, n_tasks). The constraint is that the selected features are the same for all the regression problems, also called tasks.

Mathematically, it consists of a linear model trained with a mixed \ell_1 \ell_2 prior and an \ell_2 prior as regularizer. The objective function to minimize is:

\underset{W}{\min\,} {\frac{1}{2n_{samples}} ||X W - Y||_{Fro}^2 + \alpha \rho ||W||_{21} + \frac{\alpha(1-\rho)}{2} ||W||_{Fro}^2}

The implementation in the class MultiTaskElasticNet uses coordinate descent as the algorithm to fit the coefficients.

The class MultiTaskElasticNetCV can be used to set the parameters alpha (\alpha) and l1_ratio (\rho) by cross-validation.
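
A minimal sketch, analogous to the MultiTaskLasso example above, using MultiTaskElasticNetCV to pick alpha and l1_ratio jointly (toy data for illustration):

import numpy as np
from sklearn.linear_model import MultiTaskElasticNetCV

rng = np.random.RandomState(0)
X = rng.randn(40, 5)
Y = np.column_stack([X[:, 0], 0.5 * X[:, 0]]) + 0.05 * rng.randn(40, 2)

reg = MultiTaskElasticNetCV(l1_ratio=[.3, .7], cv=3).fit(X, Y)
print(reg.alpha_, reg.l1_ratio_)
print(reg.coef_)  # shape (n_tasks, n_features) with a shared sparsity pattern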

Bayesian regression techniques can be used to include regularization parameters in the estimation procedure: the regularization parameter is not set in a hard sense but tuned to the data at hand.

This can be done by introducing uninformative priors over the hyperparameters of the model. The \ell_{2} regularization used in Ridge regression is equivalent to finding a maximum a posteriori estimate under a Gaussian prior over the coefficients w with precision \lambda^{-1}. Instead of setting lambda manually, it is possible to treat it as a random variable to be estimated from the data.

To obtain a fully probabilistic model, the output y is assumed to be Gaussian distributed around X w:

p(y|X, w, \alpha) = \mathcal{N}(y|X w, \alpha)

Alpha is again treated as a random variable that is to be estimated from the data.
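
A minimal sketch with BayesianRidge, which implements this kind of model in scikit-learn; the precisions alpha and lambda are estimated from the data during the fit (the toy data below is illustrative):

import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.RandomState(0)
X = rng.randn(30, 3)
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(30)

reg = BayesianRidge().fit(X, y)
print(reg.coef_)                 # estimated weights
print(reg.alpha_, reg.lambda_)   # noise and weight precisions estimated from the data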

The advantages of Bayesian regression are:

  • It adapts to the data at hand.
  • It can be used to include regularization parameters in the estimation procedure.

The disadvantages of Bayesian regression include:

  • Inference of the model can be time consuming.