1.The three logistic regression names in sklearn
In sklearn, the logistic regression (LR) code lives in the linear_model module. The module's __init__ file exports the following names:
__all__ = ['ARDRegression',
'BayesianRidge',
'ElasticNet',
'ElasticNetCV',
'Hinge',
'Huber',
'HuberRegressor',
'Lars',
'LarsCV',
'Lasso',
'LassoCV',
'LassoLars',
'LassoLarsCV',
'LassoLarsIC',
'LinearRegression',
'Log',
'LogisticRegression',
'LogisticRegressionCV',
'ModifiedHuber',
'MultiTaskElasticNet',
'MultiTaskElasticNetCV',
'MultiTaskLasso',
'MultiTaskLassoCV',
'OrthogonalMatchingPursuit',
'OrthogonalMatchingPursuitCV',
'PassiveAggressiveClassifier',
'PassiveAggressiveRegressor',
'Perceptron',
'Ridge',
'RidgeCV',
'RidgeClassifier',
'RidgeClassifierCV',
'SGDClassifier',
'SGDRegressor',
'SquaredLoss',
'TheilSenRegressor',
'enet_path',
'lars_path',
'lars_path_gram',
'lasso_path',
'logistic_regression_path',
'orthogonal_mp',
'orthogonal_mp_gram',
'ridge_regression',
'RANSACRegressor']
As the list shows, three exported names relate to LR, two classes and one function:
LogisticRegression
LogisticRegressionCV
logistic_regression_path
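The same check can be done programmatically. The snippet below is a minimal sketch, assuming scikit-learn is installed; the variable names logistic_names and kinds are illustrative only. On versions where logistic_regression_path has already been removed, only the two classes will be listed.

```python
import inspect

import sklearn.linear_model as lm

# Collect every exported name that mentions "logistic".
logistic_names = [name for name in lm.__all__ if "logistic" in name.lower()]

# Record whether each exported object is a class or a function.
kinds = {
    name: "class" if inspect.isclass(getattr(lm, name)) else "function"
    for name in logistic_names
}

for name, kind in sorted(kinds.items()):
    print(f"{name}: {kind}")
```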
2.logistic_regression_path
@deprecated('logistic_regression_path was deprecated in version 0.21 and '
            'will be removed in version 0.23.0')
def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
                             max_iter=100, tol=1e-4, verbose=0,
                             solver='lbfgs', coef=None,
                             class_weight=None, dual=False, penalty='l2',
                             intercept_scaling=1., multi_class='auto',
                             random_state=None, check_input=True,
                             max_squared_sum=None, sample_weight=None,
                             l1_ratio=None):
    """Compute a Logistic Regression model for a list of regularization
    parameters.

    This is an implementation that uses the result of the previous model
    to speed up computations along the set of solutions, making it faster
    than sequentially calling LogisticRegression for the different parameters.
    Note that there will be no speedup with liblinear solver, since it does
    not handle warm-starting.

    .. deprecated:: 0.21
        ``logistic_regression_path`` was deprecated in version 0.21 and will
        be removed in 0.23.

    Read more in the :ref:`User Guide <logistic_regression>`.
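First, note that logistic_regression_path is deprecated and scheduled for removal in version 0.23.0.
The docstring makes its purpose clear: when fitting models for a list of regularization parameters, it reuses the result of the previous model as the starting point for the next one, which speeds up training. It also stresses that the liblinear solver gets no speedup from logistic_regression_path, because liblinear does not support warm-starting.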
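The warm-starting idea can be sketched with the public estimator API. This is only an illustration of the principle, not the internal implementation of logistic_regression_path; it assumes scikit-learn and NumPy are installed, and the synthetic dataset and the grid Cs are made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Candidate regularization parameters, from strong to weak regularization.
Cs = np.logspace(-2, 2, 5)

# warm_start=True makes each fit() start from the previous coefficients.
clf = LogisticRegression(solver="lbfgs", warm_start=True, max_iter=1000)

coefs = []
for C in Cs:
    clf.set_params(C=C)
    clf.fit(X, y)  # starts from the solution found for the previous C
    coefs.append(clf.coef_.ravel().copy())

coefs = np.array(coefs)  # one row of coefficients per value of C
print(coefs.shape)
```

Replacing solver="lbfgs" with solver="liblinear" would still run, but each fit would start from scratch, which matches the note in the docstring.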
3.LogisticRegression
class LogisticRegression(BaseEstimator, LinearClassifierMixin,
                         SparseCoefMixin):
    """
    Logistic Regression (aka logit, MaxEnt) classifier.

    In the multiclass case, the training algorithm uses the one-vs-rest (OvR)
    scheme if the 'multi_class' option is set to 'ovr', and uses the
    cross-entropy loss if the 'multi_class' option is set to 'multinomial'.
    (Currently the 'multinomial' option is supported only by the 'lbfgs',
    'sag', 'saga' and 'newton-cg' solvers.)

    This class implements regularized logistic regression using the
    'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers. **Note
    that regularization is applied by default**. It can handle both dense
    and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit
    floats for optimal performance; any other input format will be converted
    (and copied).

    The 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization
    with primal formulation, or no regularization. The 'liblinear' solver
    supports both L1 and L2 regularization, with a dual formulation only for
    the L2 penalty. The Elastic-Net regularization is only supported by the
    'saga' solver.

    Read more in the :ref:`User Guide <logistic_regression>`.
    ...
    def __init__(self, penalty='l2', dual=False, tol=1e-4, C=1.0,
                 fit_intercept=True, intercept_scaling=1, class_weight=None,
                 random_state=None, solver='lbfgs', max_iter=100,
                 multi_class='auto', verbose=0, warm_start=False, n_jobs=None,
                 l1_ratio=None):
        self.penalty = penalty
        self.dual = dual
        self.tol = tol
        self.C = C
        self.fit_intercept = fit_intercept
        self.intercept_scaling = intercept_scaling
        self.class_weight = class_weight
        self.random_state = random_state
        self.solver = solver
        self.max_iter = max_iter
        self.multi_class = multi_class
        self.verbose = verbose
        self.warm_start = warm_start
        self.n_jobs = n_jobs
        self.l1_ratio = l1_ratio
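A few key points from the docstring:
1.For multiclass problems, setting multi_class to 'ovr' uses the one-vs-rest scheme, while setting it to 'multinomial' uses the cross-entropy loss.
2.The available solvers are 'liblinear', 'newton-cg', 'sag', 'saga' and 'lbfgs'. 'newton-cg', 'sag' and 'lbfgs' support only L2 regularization (or none), 'liblinear' supports both L1 and L2, and Elastic-Net regularization is available only with 'saga'.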
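The solver and penalty pairings described above can be tried out as follows. This is a minimal sketch, assuming a scikit-learn version that accepts the penalty and l1_ratio parameters shown in the quoted signature (newer releases may emit deprecation warnings for some of them); the dataset and the configs dictionary are made up for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data, standardized so that 'saga' converges more easily.
X, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                           random_state=0)
X = StandardScaler().fit_transform(X)

# One valid penalty/solver pairing per regularization type.
configs = {
    "l2 + lbfgs": dict(penalty="l2", solver="lbfgs"),
    "l1 + liblinear": dict(penalty="l1", solver="liblinear"),
    "elasticnet + saga": dict(penalty="elasticnet", solver="saga",
                              l1_ratio=0.5),
}

models = {}
for label, kwargs in configs.items():
    models[label] = LogisticRegression(max_iter=5000, **kwargs).fit(X, y)
    # Count coefficients driven exactly to zero by the penalty.
    n_zero = int((models[label].coef_ == 0).sum())
    print(f"{label}: {n_zero} zero coefficients")
```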
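Some of the parameters of the initializer are worth a closer look:
1.dual: whether to solve the primal or the dual formulation of the objective; default False.
2.tol: tolerance for the stopping criterion; default 1e-4.
3.C: inverse of the regularization strength, so smaller values mean stronger regularization; default 1.0.
4.solver: the optimization algorithm; default lbfgs.
5.max_iter: maximum number of iterations; default 100.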
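The defaults listed above can be read back from an estimator, and the role of C can be checked by comparing coefficient norms. This is a minimal sketch, assuming scikit-learn 0.22 or later (where lbfgs is the default solver); the dataset and variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Read the default values of the parameters discussed above.
params = LogisticRegression().get_params()
defaults = {k: params[k] for k in ("dual", "tol", "C", "solver", "max_iter")}
print(defaults)

# Synthetic data to compare strong and weak regularization.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

strong = LogisticRegression(C=0.01, max_iter=1000).fit(X, y)  # small C
weak = LogisticRegression(C=100.0, max_iter=1000).fit(X, y)   # large C

# A smaller C penalizes the weights more, so their norm shrinks.
norm_strong = float(np.linalg.norm(strong.coef_))
norm_weak = float(np.linalg.norm(weak.coef_))
print(norm_strong, norm_weak)
```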
4.LogisticRegressionCV
class LogisticRegressionCV(LogisticRegression, BaseEstimator,
                           LinearClassifierMixin):
    """Logistic Regression CV (aka logit, MaxEnt) classifier.

    See glossary entry for :term:`cross-validation estimator`.

    This class implements logistic regression using liblinear, newton-cg, sag
    of lbfgs optimizer. The newton-cg, sag and lbfgs solvers support only L2
    regularization with primal formulation. The liblinear solver supports both
    L1 and L2 regularization, with a dual formulation only for the L2 penalty.
    Elastic-Net penalty is only supported by the saga solver.

    For the grid of `Cs` values and `l1_ratios` values, the best hyperparameter
    is selected by the cross-validator
    :class:`~sklearn.model_selection.StratifiedKFold`, but it can be changed
    using the :term:`cv` parameter. The 'newton-cg', 'sag', 'saga' and 'lbfgs'
    solvers can warm-start the coefficients (see :term:`Glossary<warm_start>`).
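LogisticRegressionCV is used in much the same way as LogisticRegression. The difference is that LogisticRegressionCV selects the regularization parameter C by cross-validation and, for the Elastic-Net penalty, also selects the mix of L1 and L2 regularization from the candidates given in l1_ratios.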
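A short sketch of selecting C by cross-validation follows. It assumes scikit-learn and NumPy are installed and sticks to the default L2 penalty, so l1_ratios is not exercised here; the dataset and the name best_C are made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Cs=5 builds a grid of 5 candidate values on a log scale;
# cv=3 evaluates each candidate with 3-fold stratified cross-validation.
clf = LogisticRegressionCV(Cs=5, cv=3, max_iter=1000).fit(X, y)

# Cs_ holds the candidate grid, C_ the selected value.
# np.ravel handles C_ being either an array or a scalar.
best_C = float(np.ravel(clf.C_)[0])
print("candidates:", clf.Cs_)
print("selected C:", best_C)
```

After fitting, clf can be used like a regular LogisticRegression, for example through clf.predict(X) or clf.score(X, y).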