A Brief Look at sklearn (4): Generalized Linear Models, Part 3

Latest recommended article published on 2024-07-15 14:24:14

NirHeavenX

Category: Study Notes

Article link: https://blog.csdn.net/qsczse943062710/article/details/75733355

BayesianRidge

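Bayesian ridge regression solves a ridge regression problem in a Bayesian framework. The noise is assumed to be Gaussian, and Gamma priors, which are conjugate to the precision of a Gaussian, are placed on the noise precision $\alpha$ and the weight precision $\lambda$. The prior on the weights plays the role of the regularization term (in Bayesian methods a prior acts, to some extent, as a regularizer), which gives the estimate: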

$w = (X^TX+\frac{\lambda}{\alpha}I)^{-1}X^Ty$
where

$\lambda$ is initialized to 1,

$\alpha$ is initialized to 1/var(y), and both are re-estimated in every iteration using the shape and rate parameters of their Gamma priors.
For more on Bayesian regression, see Chapter 7 of MLAPP (Machine Learning: A Probabilistic Perspective).
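
For fixed values of $\lambda$ and $\alpha$, the formula above is an ordinary ridge solution with penalty strength $\lambda/\alpha$. The following is a minimal sketch that checks this numerically; the synthetic data and the hand-picked values of `lam` and `alpha` are assumptions made only for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (illustrative assumption, not from the article)
rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(50)

lam, alpha = 1.0, 4.0      # fixed, hand-picked precisions
ratio = lam / alpha

# Closed form: w = (X^T X + (lambda/alpha) I)^{-1} X^T y
w_closed = np.linalg.solve(X.T @ X + ratio * np.eye(X.shape[1]), X.T @ y)

# Ridge without an intercept minimizes ||y - Xw||^2 + ratio * ||w||^2
w_ridge = Ridge(alpha=ratio, fit_intercept=False).fit(X, y).coef_

print(w_closed)
print(w_ridge)
```

BayesianRidge differs from plain ridge in that it does not keep the ratio fixed but re-estimates $\lambda$ and $\alpha$ from the data.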

Using BayesianRidge

from sklearn.linear_model import BayesianRidge
lr = BayesianRidge()

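Definition of the BayesianRidge class:

class BayesianRidge(LinearModel, RegressorMixin):
    def __init__(self, n_iter=300, tol=1.e-3, alpha_1=1.e-6, alpha_2=1.e-6,
                 lambda_1=1.e-6, lambda_2=1.e-6, compute_score=False,
                 fit_intercept=True, normalize=False, copy_X=True,
                 verbose=False):
        ...

n_iter: the maximum number of iterations
tol: the convergence threshold; iteration stops once the change in the coefficients between two consecutive iterations falls below tol
alpha_1, alpha_2: the shape parameter and the rate (inverse scale) parameter of the Gamma prior over $\alpha$
lambda_1, lambda_2: the shape parameter and the rate (inverse scale) parameter of the Gamma prior over $\lambda$
compute_score: whether to compute the value of the objective function at each iteration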

The fit() method of the BayesianRidge class:

    def fit(self, X, y):
        # type checks and preprocessing
        ...
        alpha_ = 1. / np.var(y)   # initial noise precision
        lambda_ = 1.              # initial weight precision
        XT_y = np.dot(X.T, y)
        U, S, Vh = linalg.svd(X, full_matrices=False)
        eigen_vals_ = S ** 2      # eigenvalues of X^T X
        for iter_ in range(self.n_iter):
            if n_samples > n_features:
                coef_ = np.dot(Vh.T,
                               Vh / (eigen_vals_ + lambda_ / alpha_)[:, None])
                coef_ = np.dot(coef_, XT_y)
                ...
            else:
                coef_ = np.dot(X.T, np.dot(
                    U / (eigen_vals_ + lambda_ / alpha_)[None, :], U.T))
                coef_ = np.dot(coef_, y)
                ...
            # update alpha_ and lambda_
            # (rmse_ holds the sum of squared residuals, despite its name)
            rmse_ = np.sum((y - np.dot(X, coef_)) ** 2)
            gamma_ = (np.sum((alpha_ * eigen_vals_) /
                      (lambda_ + alpha_ * eigen_vals_)))
            lambda_ = ((gamma_ + 2 * lambda_1) /
                       (np.sum(coef_ ** 2) + 2 * lambda_2))
            alpha_ = ((n_samples - gamma_ + 2 * alpha_1) /
                      (rmse_ + 2 * alpha_2))

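The approach resembles ridge regression with solver set to svd: X is decomposed once by SVD, and the algorithm then iterates. In each iteration the coefficients are recomputed from the current $\alpha$ and $\lambda$, after which $\alpha$ and $\lambda$ in the formula are updated using the alpha_1, alpha_2, lambda_1 and lambda_2 passed in.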
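
The iteration can be reproduced outside the class. Below is a minimal NumPy sketch of the loop for the n_samples > n_features case. The function name `bayesian_ridge_sketch`, the synthetic data, and the placement of the convergence check are assumptions for illustration; it omits the intercept handling, scoring and input validation that the real fit() performs.

```python
import numpy as np

def bayesian_ridge_sketch(X, y, n_iter=300, tol=1e-3,
                          alpha_1=1e-6, alpha_2=1e-6,
                          lambda_1=1e-6, lambda_2=1e-6):
    """Hypothetical re-implementation of the update loop (no intercept)."""
    n_samples = X.shape[0]
    alpha_ = 1.0 / np.var(y)      # initial noise precision
    lambda_ = 1.0                 # initial weight precision
    XT_y = X.T @ y
    _, S, Vh = np.linalg.svd(X, full_matrices=False)
    eigen_vals_ = S ** 2          # eigenvalues of X^T X
    coef_old = None
    for _ in range(n_iter):
        # w = V diag(1 / (s^2 + lambda/alpha)) V^T X^T y
        coef_ = Vh.T @ ((Vh @ XT_y) / (eigen_vals_ + lambda_ / alpha_))
        # Re-estimate the two precisions from the current fit
        sse = np.sum((y - X @ coef_) ** 2)
        gamma_ = np.sum(alpha_ * eigen_vals_ / (lambda_ + alpha_ * eigen_vals_))
        lambda_ = (gamma_ + 2 * lambda_1) / (np.sum(coef_ ** 2) + 2 * lambda_2)
        alpha_ = (n_samples - gamma_ + 2 * alpha_1) / (sse + 2 * alpha_2)
        # Stop once the coefficients barely change between iterations
        if coef_old is not None and np.sum(np.abs(coef_old - coef_)) < tol:
            break
        coef_old = coef_.copy()
    return coef_, alpha_, lambda_

# Synthetic data: noise standard deviation 0.1, so the true precision is 100
rng = np.random.RandomState(0)
X = rng.randn(200, 3)
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.randn(200)

coef_, alpha_, lambda_ = bayesian_ridge_sketch(X, y)
print(coef_, alpha_, lambda_)
```

On data like this the estimated alpha_ should land near the true noise precision, which is a quick sanity check on the update formulas.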

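Return value of fit():

self: the BayesianRidge instance

Attributes:

coef_: the model coefficients
alpha_: the estimated precision of the noise
lambda_: the estimated precision of the weights
scores_: the value of the objective function at each iteration, optional (only when compute_score=True)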
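
A short usage sketch that fits the estimator and reads back the attributes listed above. The synthetic data is an assumption for illustration; only constructor arguments that appear in the class definition above are passed.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic regression data (illustrative assumption)
rng = np.random.RandomState(0)
X = rng.randn(200, 3)
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.randn(200)

model = BayesianRidge(compute_score=True).fit(X, y)

print("coef_  :", model.coef_)     # estimated weights
print("alpha_ :", model.alpha_)    # estimated noise precision
print("lambda_:", model.lambda_)   # estimated weight precision
print("scores_:", model.scores_)   # objective value per iteration
```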

LogisticRegression

Despite its name, LogisticRegression is a classifier; its loss function is the cross-entropy loss. For an introduction to the model, see: LogisticRegression

Using LogisticRegression

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()

Definition of the LogisticRegression class:

class LogisticRegression(BaseEstimator, LinearClassifierMixin,
                         _LearntSelectorMixin, SparseCoefMixin):
    def __init__(self, penalty='l2', dual=False, tol=1e-4, C=1.0,
                 fit_intercept=True, intercept_scaling=1, class_weight=None,
                 random_state=None, solver='liblinear', max_iter=100,
                 multi_class='ovr', verbose=0, warm_start=False, n_jobs=1):
        ...

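The inheritance chain also shows that it is a classifier: it derives from LinearClassifierMixin. SparseCoefMixin is a mixin for estimators whose coefficients may be sparse, which matters when the l1 penalty is chosen; its sparsify() method converts the coefficient matrix to a sparse format.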

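penalty: str, the regularization strategy; either 'l1' or 'l2'.
dual: bool, whether to solve the primal or the dual problem (the dual form is available only with penalty='l2' and solver='liblinear'); when there are more samples than features, dual=False is recommended
C: float, the inverse of the regularization strength; smaller values mean stronger regularization
intercept_scaling: float, takes effect only when solver='liblinear' and fit_intercept=True. A synthetic feature with the constant value intercept_scaling is appended to every sample, a common trick for folding the bias b into the weight vector: each sample becomes [x1, …, xn, intercept_scaling], and the intercept is intercept_scaling times the weight learned for that feature. That weight is regularized like any other, so this parameter controls how strongly regularization affects the intercept
solver: str, the optimization algorithm
Possible values:
- newton-cg: Newton conjugate gradient; l2 penalty only
- lbfgs: a quasi-Newton method; l2 penalty only
- liblinear: suited to small datasets
- sag: stochastic average gradient descent; l2 penalty only, suited to large datasets
multi_class: str, the multiclass strategy
Possible values:
- ovr: one-vs-rest
- multinomial: fits a single multinomial model directly; not available with liblinear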
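
To make the interplay of penalty, solver and sparsify() concrete, here is a minimal sketch that fits an l1-penalized model with liblinear and then converts the coefficients to a sparse matrix. The synthetic data and the choice C=0.1 are assumptions for illustration; defaults and accepted arguments vary between scikit-learn releases.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression

# 20 features, but only the first two determine the label (assumed setup)
rng = np.random.RandomState(0)
X = rng.randn(300, 20)
y = (X[:, 0] - 2.0 * X[:, 1] > 0).astype(int)

# l1 penalty with a fairly strong regularization (small C)
clf = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
n_nonzero = int(np.count_nonzero(clf.coef_))
print("non-zero coefficients:", n_nonzero, "of", clf.coef_.size)

# sparsify() replaces the dense coef_ array with a scipy.sparse matrix
clf.sparsify()
print("coef_ is sparse:", sparse.issparse(clf.coef_))
print("training accuracy:", clf.score(X, y))
```

Storing coef_ sparsely only pays off when most coefficients are exactly zero, which is why it is paired with the l1 penalty here.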

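The fit() method of the LogisticRegression class:

    def fit(self, X, y, sample_weight=None):
        # type checks and preprocessing
        ...
        # choose the solving strategy for the classification problem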

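The strategy that fit() selects depends on several factors, so it is not listed in full here. Two points are worth noting:

With solver=liblinear, fit() calls _fit_liblinear(), the same function that implements the linear support vector classifier (LinearSVC).
With sag, the per-class problems are fitted in parallel, so it is best to set n_jobs in that case.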

Return value of fit():

self: the LogisticRegression instance

Attributes:

coef_: the model coefficients
intercept_: the bias b
n_iter_: the number of iterations actually run
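
In the binary case, these attributes are enough to reproduce the model's predicted probabilities by hand, since the model is a sigmoid applied to a linear score. A minimal sketch on assumed synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem (illustrative assumption)
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression(solver="liblinear").fit(X, y)

# Linear score z = X w + b, then the logistic (sigmoid) function
z = X @ clf.coef_.ravel() + clf.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

# Probability of the positive class as reported by sklearn
p_sklearn = clf.predict_proba(X)[:, 1]

print(clf.coef_.shape, clf.intercept_.shape, clf.n_iter_)
print(np.abs(p_manual - p_sklearn).max())
```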

Stochastic Gradient Descent

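Stochastic Gradient Descent (SGD) is not a model but an iterative algorithm for solving optimization problems: a loss function is defined, and the model is fitted by minimizing that loss. For details on SGD, see: Gradient Descent. Because SGD is so widely used as an optimizer, sklearn provides SGDClassifier and SGDRegressor specifically for solving linear problems with it.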
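
To show what "iterative" means here, the following is a minimal NumPy sketch of plain SGD for a linear model with squared loss. It illustrates the idea only and is not sklearn's implementation; the data, the constant step size `eta` and the number of epochs are assumptions.

```python
import numpy as np

# Synthetic linear data (illustrative assumption)
rng = np.random.RandomState(0)
X = rng.randn(500, 2)
w_true, b_true = np.array([2.0, -1.0]), 0.5
y = X @ w_true + b_true + 0.01 * rng.randn(500)

w, b = np.zeros(2), 0.0
eta = 0.01                               # constant learning rate
for epoch in range(20):
    for i in rng.permutation(len(X)):    # visit samples in random order
        err = X[i] @ w + b - y[i]        # derivative of 0.5 * err**2 w.r.t. the prediction
        w -= eta * err * X[i]            # gradient step for the weights
        b -= eta * err                   # gradient step for the bias

print(w, b)
```

Each update uses a single sample, which is what distinguishes SGD from batch gradient descent; swapping the `err` line for the derivative of another loss changes the model being fitted.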

SGDClassifier

Using SGDClassifier

from sklearn.linear_model import SGDClassifier
lr = SGDClassifier()

Definition of the SGDClassifier class:

class SGDClassifier(BaseSGDClassifier, _LearntSelectorMixin):
    def __init__(self, loss="hinge", penalty='l2', alpha=0.0001, l1_ratio=0.15,
                 fit_intercept=True, n_iter=5, shuffle=True, verbose=0,
                 epsilon=DEFAULT_EPSILON, n_jobs=1, random_state=None,
                 learning_rate="optimal", eta0=0.0, power_t=0.5,
                 class_weight=None, warm_start=False, average=False):
        super(SGDClassifier, self).__init__(**args)  # **args stands for all of the parameters above

SGDClassifier's initializer only forwards its arguments to BaseSGDClassifier, and the class does not override fit(); both come directly from BaseSGDClassifier.

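loss: str, the loss function
Possible values:
- hinge: hinge loss, which yields a linear SVM
- squared_hinge: squared hinge loss
- perceptron: the perceptron loss, similar to hinge
- log: logistic regression loss, i.e. the cross-entropy loss
- modified_huber: similar to the soft-margin SVM, with some tolerance to outliers
- squared_loss: squared error
- huber: Huber loss, quadratic for small residuals and linear for large ones
- epsilon_insensitive: ignores errors smaller than epsilon and is linear beyond that
- squared_epsilon_insensitive: like epsilon_insensitive, but squared beyond epsilon
  The last four are regression losses and are mainly used with SGDRegressor
alpha: float, the regularization coefficient; also used to compute the learning rate when learning_rate="optimal"
l1_ratio: float, same as l1_ratio in ElasticNet; controls the mix between the l1 and l2 penalties
epsilon: float, the threshold used by the huber, epsilon_insensitive and squared_epsilon_insensitive losses
learning_rate: str, the learning-rate schedule
Possible values:
- constant: learning rate = eta0
- optimal: the learning rate is computed from alpha
- invscaling: learning rate = $\frac{eta0}{t^{power\_t}}$
eta0: the initial learning rate; with learning_rate="optimal" the default of 0 is fine, because that schedule does not use eta0
power_t: the exponent used by the invscaling schedule.
shuffle: whether to shuffle the training data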
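
The three schedules can be written as one small function. This is a sketch for illustration: the function `learning_rate` is hypothetical and not part of sklearn, and the `optimal` branch follows the formula given in the scikit-learn documentation, 1 / (alpha * (t + t0)), with the heuristic offset t0 treated here as a plain argument.

```python
def learning_rate(schedule, t, eta0=0.01, power_t=0.25, alpha=0.0001, t0=1.0):
    """Hypothetical helper returning the step size at update number t (t >= 1)."""
    if schedule == "constant":
        return eta0                           # never changes
    if schedule == "invscaling":
        return eta0 / t ** power_t            # decays polynomially in t
    if schedule == "optimal":
        return 1.0 / (alpha * (t + t0))       # depends on alpha, not on eta0
    raise ValueError("unknown schedule: %r" % schedule)

for t in (1, 16, 10000):
    print(t,
          learning_rate("constant", t),
          learning_rate("invscaling", t),
          learning_rate("optimal", t))
```

The `optimal` branch makes it visible why eta0 can stay at 0 for that schedule: it never enters the formula.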

Return value of BaseSGDClassifier's fit():

self: the BaseSGDClassifier instance

Attributes:

coef_: the model coefficients
intercept_: the bias b
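
A minimal usage sketch with the hinge loss, i.e. a linear SVM trained by SGD. The two well-separated Gaussian blobs are an assumption for illustration. n_iter is deliberately not passed, because newer scikit-learn releases replaced it with max_iter and tol.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Two well-separated clusters (illustrative assumption)
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2) - 3.0, rng.randn(100, 2) + 3.0])
y = np.array([0] * 100 + [1] * 100)

clf = SGDClassifier(loss="hinge", penalty="l2", alpha=0.0001,
                    random_state=0).fit(X, y)

acc = clf.score(X, y)
print("coef_     :", clf.coef_)        # shape (1, n_features) for a binary problem
print("intercept_:", clf.intercept_)   # shape (1,)
print("accuracy  :", acc)
```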

SGDRegressor

Using SGDRegressor

from sklearn.linear_model import SGDRegressor
lr = SGDRegressor()

Definition of the SGDRegressor class:

class SGDRegressor(BaseSGDRegressor, _LearntSelectorMixin):
    def __init__(self, loss="squared_loss", penalty="l2", alpha=0.0001,
                 l1_ratio=0.15, fit_intercept=True, n_iter=5, shuffle=True,
                 verbose=0, epsilon=DEFAULT_EPSILON, random_state=None,
                 learning_rate="invscaling", eta0=0.01, power_t=0.25,
                 warm_start=False, average=False):
        super(SGDRegressor, self).__init__(**args)  # **args stands for all of the parameters above

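SGDRegressor inherits from BaseSGDRegressor and SGDClassifier inherits from BaseSGDClassifier;
BaseSGDRegressor and BaseSGDClassifier both inherit from the BaseSGD class.
The parameters have the same meaning as in SGDClassifier, but some defaults differ.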

loss: str, the loss function; SGDRegressor accepts only the regression losses
Possible values:
- squared_loss: squared error
- huber: Huber loss, quadratic for small residuals and linear for large ones
- epsilon_insensitive: ignores errors smaller than epsilon and is linear beyond that
- squared_epsilon_insensitive: like epsilon_insensitive, but squared beyond epsilon
alpha, l1_ratio, epsilon, learning_rate, eta0, power_t, shuffle: same meaning as in SGDClassifier. The defaults that differ are learning_rate="invscaling", eta0=0.01 and power_t=0.25, so eta0 is actually used here.

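Return value of BaseSGDRegressor's fit():

self: the BaseSGDRegressor instance

Attributes:

coef_: the model coefficients
intercept_: the bias b
average_coef_: the averaged coefficients assigned to the features
average_intercept_: the averaged bias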
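
A minimal usage sketch with the default schedule spelled out explicitly. The synthetic data is an assumption; max_iter and tol are used instead of n_iter because newer scikit-learn releases replaced that argument, and the averaged attributes are not read because their availability depends on the release.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic regression data with roughly unit-variance features (assumed)
rng = np.random.RandomState(0)
X = rng.randn(500, 3)
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.randn(500)

reg = SGDRegressor(penalty="l2", alpha=0.0001,
                   learning_rate="invscaling", eta0=0.01, power_t=0.25,
                   max_iter=100, tol=None,    # run exactly 100 epochs
                   random_state=0).fit(X, y)

print("coef_     :", reg.coef_)
print("intercept_:", reg.intercept_)
```

SGD is sensitive to feature scale, so real data is usually standardized before fitting; the data here is already on a unit scale.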

Perceptron

Using Perceptron

from sklearn.linear_model import Perceptron
lr = Perceptron()

Definition of the Perceptron class:

class Perceptron(BaseSGDClassifier, _LearntSelectorMixin):
    def __init__(self, penalty=None, alpha=0.0001, fit_intercept=True,
                 n_iter=5, shuffle=True, verbose=0, eta0=1.0, n_jobs=1,
                 random_state=0, class_weight=None, warm_start=False):
        super(Perceptron, self).__init__(**args)  # **args stands for all of the parameters above

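Perceptron also inherits from BaseSGDClassifier and applies no regularization by default. The perceptron is a precursor of the linear SVM: it does not require margin maximization, and only misclassified samples contribute to its loss.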
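
The difference from the SVM's hinge loss is easiest to see side by side. The sketch below uses the standard textbook definitions with labels in {-1, +1}; the helper names and sample values are assumptions for illustration.

```python
import numpy as np

def hinge_loss(y, score):
    # Penalizes every sample whose margin y * score is below 1
    return np.maximum(0.0, 1.0 - y * score)

def perceptron_loss(y, score):
    # Penalizes only samples on the wrong side of the boundary
    return np.maximum(0.0, -y * score)

y = np.array([1, 1, 1, -1])
score = np.array([2.0, 0.5, -1.0, -0.2])   # decision values w.x + b

print(hinge_loss(y, score))        # [0.  0.5 2.  0.8]
print(perceptron_loss(y, score))   # [0. 0. 1. 0.]
```

The second and fourth samples are classified correctly but lie inside the margin, so only the hinge loss penalizes them.
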
Apart from the initializer, it overrides nothing and reuses every method of BaseSGDClassifier. For the return value, see SGDClassifier.
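
Since Perceptron only changes the defaults, it can be compared with an SGDClassifier configured the way the scikit-learn documentation describes as equivalent: perceptron loss, a constant learning rate with eta0=1, and no penalty. A minimal sketch on assumed synthetic data:

```python
import numpy as np
from sklearn.linear_model import Perceptron, SGDClassifier

# Two well-separated clusters (illustrative assumption)
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2) - 3.0, rng.randn(100, 2) + 3.0])
y = np.array([0] * 100 + [1] * 100)

per = Perceptron(random_state=0).fit(X, y)
sgd = SGDClassifier(loss="perceptron", learning_rate="constant", eta0=1.0,
                    penalty=None, random_state=0).fit(X, y)

# Compare the two fitted models on the training data
same = bool((per.predict(X) == sgd.predict(X)).all())
print("Perceptron coef_   :", per.coef_, per.intercept_)
print("SGDClassifier coef_:", sgd.coef_, sgd.intercept_)
print("identical predictions:", same)
```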

Other linear models

Besides the models above, sklearn provides some other models:

TheilSenRegressor
HuberRegressor
RANSACRegressor

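Cross-validated versions of the linear models

sklearn also provides versions that run cross-validation automatically; appending CV to the class name gives the cross-validated counterpart. As the base classes in parentheses show, some of them inherit from the original model while others derive from shared CV base classes. They include, but are not limited to: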

RidgeCV(_BaseRidgeCV, RegressorMixin)
LarsCV(Lars)
LassoCV(LinearModelCV, RegressorMixin)
ElasticNetCV(LinearModelCV, RegressorMixin)
LassoLarsCV(LarsCV)
MultiTaskElasticNetCV(LinearModelCV, RegressorMixin)
MultiTaskLassoCV(LinearModelCV, RegressorMixin)
OrthogonalMatchingPursuitCV(LinearModel, RegressorMixin)
LogisticRegressionCV(LogisticRegression, BaseEstimator, LinearClassifierMixin, _LearntSelectorMixin)
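
As an illustration of how one of these classes is used, the sketch below fits LassoCV with 5-fold cross-validation and reads back the selected regularization strength. The synthetic data and the choice cv=5 are assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data in which two of the five features are irrelevant (assumed)
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.randn(100)

# cv=5 requests 5-fold cross-validation over an automatically built alpha grid
model = LassoCV(cv=5, random_state=0).fit(X, y)

print("selected alpha_:", model.alpha_)
print("candidate grid size:", len(model.alphas_))
print("coef_:", model.coef_)
```

After the search, the estimator is refitted on the full data with the selected alpha_, so it can be used for prediction like a plain Lasso.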

Compared with their base models, these models take one additional initializer parameter, cv, whose value can be: