# Analyze LogisticRegression in detail on the Breast Cancer dataset
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)
logreg = LogisticRegression().fit(X_train, y_train)
print("Training set score: {:.3f}".format(logreg.score(X_train, y_train)))
print("Test set score: {:.3f}".format(logreg.score(X_test, y_test)))
Training set score: 0.946
Test set score: 0.958
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:939: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html.
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html
# The default value C=1 already gives quite good performance: about 95% accuracy
# on both the training set and the test set. But because training and test
# performance are so close, the model is likely underfitting. Let's try
# increasing C to fit a more flexible model:
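A minimal sketch of the "increase C" experiment, for comparing training and test accuracy side by side. The C values 0.01, 1 and 100 are chosen only for illustration. Raising `max_iter` to 10000 is an assumption not in the original code: it gives the lbfgs solver more iterations on the unscaled features, though it may still warn for large C. The warning's other suggestion, scaling the data (e.g. with `StandardScaler`), is not shown here.

```python
# Sketch: compare LogisticRegression at several values of C.
# Assumption: max_iter=10000 (default is 100) is added only to help lbfgs
# converge on the unscaled features; it is not part of the original code.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

results = {}
for C in [0.01, 1, 100]:
    # Larger C means weaker regularization, i.e. a more flexible model
    model = LogisticRegression(C=C, max_iter=10000).fit(X_train, y_train)
    results[C] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print("C={:<6} Training set score: {:.3f}  Test set score: {:.3f}".format(
        C, *results[C]))
```

If a larger C raises the training score while the test score stays flat or drops, the extra flexibility is fitting the training data more closely rather than generalizing better.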