# Reducing Variance Through Regularization

# Load libraries
from sklearn.linear_model import LogisticRegressionCV
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Standardize the features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Regularization reduces variance through a penalty term; LogisticRegressionCV
# tunes the strength C, and the L1 and L2 penalties penalize complex models
logistic_regression = LogisticRegressionCV(
penalty='l2', Cs=10, random_state=0, n_jobs=-1)
model = logistic_regression.fit(features_standardized, target)
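Once fitted, the model behaves like any scikit-learn classifier. As a quick sanity check (not part of the recipe itself, just an illustrative use of the fitted model):

# Predict the class and class probabilities of the first observation
print(model.predict(features_standardized[:1]))
print(model.predict_proba(features_standardized[:1]))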
## Discussion
Regularization is a method of penalizing complex models to reduce their variance. Specifically, a penalty term is added to the loss function we are trying to minimize, typically the L1 and L2 penalties.
In the case of the L1 penalty:

$$\alpha \sum_{j=1}^{p} \left| \hat{\beta}_j \right|$$

where $\hat{\beta}_j$ is the parameter of the $j$th of $p$ features being learned and $\alpha$ is a hyperparameter denoting the regularization strength.
With the L2 penalty:

$$\alpha \sum_{j=1}^{p} \hat{\beta}_j^2$$
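Both penalties are available through LogisticRegressionCV's penalty argument. A minimal sketch of the L1 variant (the solver and max_iter values below are assumptions; the default 'lbfgs' solver supports only the L2 penalty, while 'saga' supports L1):

# L1-penalized logistic regression; a solver that supports L1, such as
# 'saga', must be selected explicitly
logistic_l1 = LogisticRegressionCV(
    penalty='l1', solver='saga', Cs=10, max_iter=5000,
    random_state=0, n_jobs=-1)
model_l1 = logistic_l1.fit(features_standardized, target)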
Higher values of $\alpha$ increase the penalty for larger parameter values (i.e., more complex models). scikit-learn follows the common convention of using C instead of $\alpha$, where C is the inverse of the regularization strength: $C = \frac{1}{\alpha}$. To reduce variance while using logistic regression, we can treat C as a hyperparameter to be tuned to find the value of C that creates the best model. In scikit-learn we can use the LogisticRegressionCV class to efficiently tune C.
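For example, a sketch of tuning C over an explicit grid of candidate values (the logspace grid and cv=5 below are illustrative choices, not part of the original recipe):

import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Candidate values of C = 1/alpha spanning several orders of magnitude
candidate_Cs = np.logspace(-4, 4, 20)

logistic_regression = LogisticRegressionCV(
    Cs=candidate_Cs,  # explicit grid instead of an integer count
    penalty='l2',
    cv=5,             # 5-fold cross-validation
    random_state=0,
    n_jobs=-1)
model = logistic_regression.fit(features_standardized, target)

# The value of C selected by cross-validation (one entry per class)
print(model.C_)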