sklearn in Practice: Breast Cancer Detection (Logistic Regression)

Latest recommended article published on 2024-07-26 15:23:07

Douhh_sisy

Categories: Machine Learning, scikit-learn. Tags: sklearn

Article link: https://blog.csdn.net/Douhh_sisy/article/details/80610178

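This article walks through a hands-on breast cancer detection example with sklearn, based on 569 samples and 30 features, of which 357 are positive samples. The dataset is split into a training set and a test set, and the model is then tuned by adding second-order polynomial features and applying L1 regularization. Adding the second-order features raises the feature count to 495, and L1 regularization keeps 94 of them. In the experiments, the second-order model with L1 regularization performs best on both the training set and the cross-validation set, but it overfits, so adding more training data is recommended.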

Summary generated automatically by CSDN.
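
The figure of 495 features can be checked by hand. The sketch below counts the terms produced by a second-order expansion of 30 features, assuming the constant (bias) column is left out, as it would be with `PolynomialFeatures(degree=2, include_bias=False)`. That setting is an assumption made here to match the number in the summary, and the helper name `n_polynomial_features` is hypothetical.

```python
from math import comb


def n_polynomial_features(n_features: int, degree: int) -> int:
    """Count monomials of total degree 1..degree in n_features variables.

    The constant term is excluded (assumption: no bias column).
    """
    return comb(n_features + degree, degree) - 1


n = 30
linear = n            # the original features x_i
squares = n           # the squared terms x_i ** 2
cross = comb(n, 2)    # the pairwise products x_i * x_j with i < j

print(linear, squares, cross, linear + squares + cross)  # 30 30 435 495
print(n_polynomial_features(30, 2))                      # 495
```

With the bias column included, the count would be 496 instead.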

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

# Load the data
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
X = cancer.data
y = cancer.target
print('data shape: {0};target shape: {1} no. positive: {2}; no. negative: {3}'.format(
    X.shape, y.shape, y[y==1].shape[0], y[y==0].shape[0]))  # shape[0] is the length of the array's first dimension
print(cancer.data[0])  # print the first sample

data shape: (569, 30);target shape: (569,) no. positive: 357; no. negative: 212
[1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01 2.776e-01 3.001e-01
 1.471e-01 2.419e-01 7.871e-02 1.095e+00 9.053e-01 8.589e+00 1.534e+02
 6.399e-03 4.904e-02 5.373e-02 1.587e-02 3.003e-02 6.193e-03 2.538e+01
 1.733e+01 1.846e+02 2.019e+03 1.622e-01 6.656e-01 7.119e-01 2.654e-01
 4.601e-01 1.189e-01]

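The dataset has 569 samples with 30 features each. 357 samples carry label 1, which is treated as the positive class here, and 212 carry label 0. Note that in scikit-learn's encoding of this dataset label 1 means benign and label 0 means malignant, so "positive" in this walkthrough does not mean a positive cancer diagnosis.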
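
To see which diagnosis each label value stands for, the class names can be printed next to the counts. This is a minimal sketch that reloads the dataset so it runs on its own; the variable names `label_names` and `counts` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
y = cancer.target

# target_names is indexed by label value: index 0 is label 0, index 1 is label 1
label_names = [str(name) for name in cancer.target_names]

# counts[k] is the number of samples whose label equals k
counts = np.bincount(y)

for label, (name, count) in enumerate(zip(label_names, counts)):
    print(f"label {label} ({name}): {count} samples")
```

The classes are moderately imbalanced (roughly 63% to 37%), which is worth keeping in mind when reading plain accuracy scores.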

print(len(cancer.feature_names))
cancer.feature_names

30

array(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
       'mean smoothness', 'mean compactness', 'mean concavity',
       'mean concave points', 'mean symmetry', 'mean fractal dimension',
       'radius error', 'texture error', 'perimeter error', 'area error',
       'smoothness error', 'compactness error', 'concavity erro