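Can linear-regression labels in Python be categorical? The Road to Machine Learning: predicting benign vs. malignant tumors with linear classifiers in Python...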

Latest recommended article published on 2022-07-15 17:19:20

weixin_39756696

Article tags: can linear-regression labels in Python be categorical

import numpy as np
import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import classification_report

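'''
Linear classifiers
Among the most basic and commonly used machine learning models
Limited by the assumption that the features relate linearly to the classification target
Logistic regression: longer computation time, slightly better model performance
Stochastic gradient descent (SGD) parameter estimation: shorter computation time, slightly lower model performance
'''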

'''
1 Data preprocessing
'''

# Create the list of column (feature) names
column_names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
                'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell size',
                'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']

# Read the dataset with pandas.read_csv
data = pd.read_csv('./data/breast/breast-cancer-wisconsin.data', names=column_names)

# Replace '?' with the standard missing-value representation
data = data.replace(to_replace='?', value=np.nan)

# Drop rows with missing values: a row is discarded if any one of its dimensions is missing
data = data.dropna(how='any')

# Print the number of samples and the number of dimensions in data
# print(data.shape)
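The cleaning steps above can be tried without the data file. The following is a minimal sketch on a made-up three-column frame (the rows in `toy` are hypothetical, not taken from the dataset). It also converts `Bare Nuclei` to a numeric dtype, a step the original script does not perform explicitly; it is shown here on the assumption that the `'?'` entries cause pandas to read that column as strings.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the real file; '?' marks a missing value
toy = pd.DataFrame({
    'Clump Thickness': [5, 3, 8, 1],
    'Bare Nuclei': ['1', '?', '10', '2'],
    'Class': [2, 2, 4, 2],
})

# Same two steps as the script: '?' -> NaN, then drop every row that contains a NaN
cleaned = toy.replace(to_replace='?', value=np.nan).dropna(how='any')

# Extra step (not in the original script): turn the string column into numbers
cleaned = cleaned.assign(**{'Bare Nuclei': pd.to_numeric(cleaned['Bare Nuclei'])})

print(cleaned.shape)
print(cleaned.dtypes)
```

Checking `dtypes` after cleaning is a cheap way to confirm that every feature column is numeric before it reaches the scaler.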

'''
2 Preparing the training and test data for benign/malignant tumors
'''

# Randomly sample 25% of the data for testing and use the remaining 75% for training
x_train, x_test, y_train, y_test = train_test_split(data[column_names[1:10]],
                                                    data[column_names[10]],
                                                    test_size=0.25,
                                                    random_state=33)

# Check the number of training and test samples and their class distribution
# print(y_train.value_counts())
# print(y_test.value_counts())
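As a quick check of the split sizes, here is a small sketch that applies the same `test_size=0.25` to 683 placeholder samples (683 = 512 + 171, the counts this post reports after cleaning). The arrays `X` and `y` are stand-ins, not the real features or labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 683                              # 512 training + 171 test samples
X = np.arange(n_samples).reshape(-1, 1)      # placeholder features
y = np.zeros(n_samples, dtype=int)           # placeholder labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=33)

# The test share is rounded up, the training share gets the rest
print(len(X_tr), len(X_te))
```

The split is random rather than stratified, so the benign/malignant ratio in the two parts is only approximately equal; passing `stratify=y` would be one way to keep the ratios aligned, although the original script does not do so.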

'''
The training set has 512 samples: 344 benign and 168 malignant
2    344
4    168
Name: Class, dtype: int64
The test set has 171 samples: 100 benign and 71 malignant
2    100
4     71
Name: Class, dtype: int64
'''

'''
3 Prediction with machine learning models
'''

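# Standardize the data so that every feature dimension has variance 1 and mean 0;
# this keeps the predictions from being dominated by features with large values
ss = StandardScaler()
x_train = ss.fit_transform(x_train)  # fit the scaler on x_train and standardize it
x_test = ss.transform(x_test)        # standardize x_test with the rule learned from x_train; do not refit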
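To make the "same rule for train and test" point concrete, the sketch below reproduces the standardization by hand with NumPy and compares it with `StandardScaler`. The arrays `train` and `test` are made-up integers in the 1 to 10 range, chosen only to resemble the feature scale; they are not the real data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.integers(1, 11, size=(20, 3)).astype(float)  # made-up values in 1..10
test = rng.integers(1, 11, size=(5, 3)).astype(float)

# Statistics come from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)          # population standard deviation (ddof=0)

train_manual = (train - mean) / std
test_manual = (test - mean) / std    # the test data reuses the training mean and std

ss = StandardScaler()
train_sk = ss.fit_transform(train)
test_sk = ss.transform(test)

print(np.allclose(train_manual, train_sk), np.allclose(test_manual, test_sk))
```

Calling `fit_transform` on the test set instead would compute a second mean and standard deviation, so the two sets would no longer be on the same scale.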

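# Learn and predict with two methods: logistic regression and SGD parameter estimation
lr = LogisticRegression()   # initialize the logistic regression model
sgdc = SGDClassifier()      # initialize the SGD classifier

# Train logistic regression on the training set
lr.fit(x_train, y_train)
# After training, predict on the test set; the predictions are stored in lr_y_predict
lr_y_predict = lr.predict(x_test)

# Train the SGD classifier on the training set
sgdc.fit(x_train, y_train)
# After training, predict on the test set; the predictions are stored in sgdc_y_predict
sgdc_y_predict = sgdc.predict(x_test)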
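Both models are linear classifiers: each sample gets a score w·x + b, and the sign of that score decides the class. The sketch below illustrates this for `LogisticRegression` on made-up data that uses the same label coding (2 and 4); the data and the weight vector used to generate it are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Made-up data: three features, labels 2 or 4 depending on a noisy linear rule
X = rng.normal(size=(200, 3))
y = np.where(X @ np.array([1.5, -2.0, 0.5]) + 0.3 * rng.normal(size=200) > 0, 4, 2)

clf = LogisticRegression().fit(X, y)

# Linear score of every sample: w . x + b
scores = X @ clf.coef_.ravel() + clf.intercept_[0]

# A positive score selects the second entry of clf.classes_ (4), otherwise the first (2)
manual_pred = clf.classes_[(scores > 0).astype(int)]

print(np.array_equal(manual_pred, clf.predict(X)))
```

The same relationship between `coef_`, `intercept_`, and `predict` holds for a fitted `SGDClassifier`; the two models differ in how the weights are estimated, not in the form of the decision rule.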

'''
4 Performance analysis
'''

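# Use the model's built-in score function to get the accuracy of logistic regression on the test set
print("Logistic regression accuracy:", lr.score(x_test, y_test))
# Other metrics for logistic regression
print("Other metrics for logistic regression:\n", classification_report(y_test, lr_y_predict, target_names=["Benign", "Malignant"]))

# Performance analysis of the SGD classifier
print("SGD classifier accuracy:", sgdc.score(x_test, y_test))
# Other metrics for the SGD classifier
print("Other metrics for the SGD classifier:\n", classification_report(y_test, sgdc_y_predict, target_names=["Benign", "Malignant"]))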

'''
Columns in the report: recall, precision, f1-score, support

Logistic regression accuracy: 0.9707602339181286
Other metrics for logistic regression:
             precision    recall  f1-score   support

     Benign       0.96      0.99      0.98       100
  Malignant       0.99      0.94      0.96        71

avg / total       0.97      0.97      0.97       171

SGD classifier accuracy: 0.9649122807017544
Other metrics for the SGD classifier:
             precision    recall  f1-score   support

     Benign       0.97      0.97      0.97       100
  Malignant       0.96      0.96      0.96        71

avg / total       0.96      0.96      0.96       171
'''
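The logistic regression figures can be cross-checked with a little arithmetic. The sketch below works backwards from the rounded report: a recall of 0.99 on 100 benign samples implies 99 correct, and a recall of 0.94 on 71 malignant samples implies 67 correct. These counts are inferred from the rounded values, not printed by the script, so treat them as a reconstruction.

```python
# Counts inferred from the rounded report (an assumption, not script output)
benign_correct, benign_missed = 99, 1     # 100 benign test samples
malig_correct, malig_missed = 67, 4       # 71 malignant test samples

# Precision: correct predictions of a class / all predictions of that class
precision_benign = benign_correct / (benign_correct + malig_missed)
precision_malig = malig_correct / (malig_correct + benign_missed)

# Recall: correct predictions of a class / all true members of that class
recall_benign = benign_correct / (benign_correct + benign_missed)
recall_malig = malig_correct / (malig_correct + malig_missed)

# F1: harmonic mean of precision and recall
f1_benign = 2 * precision_benign * recall_benign / (precision_benign + recall_benign)
f1_malig = 2 * precision_malig * recall_malig / (precision_malig + recall_malig)

# Accuracy: all correct predictions / all test samples
accuracy = (benign_correct + malig_correct) / 171

print(round(precision_benign, 2), round(recall_benign, 2), round(f1_benign, 2))
print(round(precision_malig, 2), round(recall_malig, 2), round(f1_malig, 2))
print(accuracy)
```

With these counts, 166 of the 171 test samples are classified correctly, which agrees with the reported accuracy; the 4 malignant tumors predicted as benign are the costlier errors in a screening setting, which is why recall on the malignant class deserves attention alongside accuracy.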
