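Logistic regression: the output of a linear regression expression is used as the input to logistic regression, which solves binary classification problems.
To use linear regression for binary classification, its output is passed through the sigmoid function, which maps any real value to a probability in (0, 1).
A probability below 0.5 is assigned to class 0, and a probability above 0.5 is assigned to class 1.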
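The mapping from a linear score to a class label can be sketched as follows. This is a minimal illustration, not a trained model: the weights `w`, the bias `b`, and the samples in `X` are made-up values, and sending a probability of exactly 0.5 to class 0 is one possible convention.

```python
import numpy as np

def sigmoid(z):
    """Map any real value to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and samples, made up for illustration only.
w = np.array([0.8, -0.5])
b = 0.1
X = np.array([[2.0, 1.0],
              [-1.0, 3.0],
              [0.0, 0.0]])

z = X @ w + b                    # linear regression output, one score per sample
p = sigmoid(z)                   # interpreted as the probability of class 1
labels = (p > 0.5).astype(int)   # above 0.5 -> class 1, otherwise class 0

print(p)
print(labels)
```

A score of 0 corresponds to a probability of exactly 0.5, so thresholding the probability at 0.5 is the same as checking the sign of the linear score.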
2. Binary classification of cancer with logistic regression
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report  # report with precision, recall and F1
def logistic():
    """
    Binary classification of breast cancer cases with logistic regression.
    :return: None
    """
    # Column names (the data file has no header row)
    column = ['Sample code number', 'Clump Thickness',
              'Uniformity of Cell Size',
              'Uniformity of Cell Shape',
              'Marginal Adhesion',
              'Single Epithelial Cell Size',
              'Bare Nuclei',
              'Bland Chromatin',
              'Normal Nucleoli',
              'Mitoses',
              'Class:']
    data = pd.read_csv("./breast-cancer-wisconsin.data", names=column)
    print(data)

    # Handle missing values: they are marked with '?' in the file
    data = data.replace(to_replace='?', value=np.nan)
    data = data.dropna()  # drop the rows that contain NaN

    # Split the data: columns 1..9 are features (the slice excludes index 10), column 10 is the label
    x_train, x_test, y_train, y_test = train_test_split(
        data[column[1:10]], data[column[10]], test_size=0.25)
    print(x_train)

    # Standardize: fit the scaler on the training set only, then reuse it on the test set
    std = StandardScaler()
    x_train = std.fit_transform(x_train)
    x_test = std.transform(x_test)

    # Fit the logistic regression model and predict
    LR = LogisticRegression(C=1.0)
    LR.fit(x_train, y_train)
    print(LR.coef_)
    y_predict = LR.predict(x_test)
    print("Accuracy", LR.score(x_test, y_test))
    print("Recall", classification_report(y_test, y_predict, labels=[2, 4],
                                          target_names=['benign', 'malignant']))
    return None


if __name__ == "__main__":
    logistic()
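A note on the standardization step: the scaler is normally fitted on the training set only, and the test set is then transformed with the mean and standard deviation learned there. The sketch below uses made-up one-column arrays (not the cancer data) to show how refitting on the test set gives different scaled values for the same inputs.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up numbers, for illustration only.
x_train = np.array([[1.0], [2.0], [3.0]])
x_test = np.array([[2.0], [4.0]])

scaler = StandardScaler()
x_train_std = scaler.fit_transform(x_train)  # learns mean and std from the training data
x_test_std = scaler.transform(x_test)        # reuses the training mean and std

# Refitting on the test set uses the test set's own mean and std instead.
x_test_refit = StandardScaler().fit_transform(x_test)

print(scaler.mean_)
print(x_test_std.ravel())
print(x_test_refit.ravel())
```

With the shared scaler, a test value equal to the training mean maps to 0, so train and test features stay on the same scale. With the refitted scaler, the same value maps to -1 because it is measured against the test set's own statistics.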
Output of one run of logistic() (the train/test split is random, so the numbers vary from run to run):
Sample code number Clump Thickness ... Mitoses Class:
0 1000025 5 ... 1 2
1 1002945 5 ... 1 2
2 1015425 3 ... 1 2
3 1016277 6 ... 1 2
4 1017023 4 ... 1 2
.. ... ... ... ... ...
694 776715 3 ... 1 2
695 841769 2 ... 1 2
696 888820 5 ... 2 4
697 897471 4 ... 1 4
698 897471 4 ... 1 4
[699 rows x 11 columns]
Clump Thickness Uniformity of Cell Size ... Normal Nucleoli Mitoses
146 3 4 ... 1 1
426 5 3 ... 1 1
620 3 1 ... 1 1
214 10 10 ... 6 1
336 6 5 ... 4 1
.. ... ... ... ... ...
91 3 1 ... 1 1
121 4 2 ... 1 1
421 10 10 ... 2 1
311 1 1 ... 1 1
344 7 6 ... 5 3
[512 rows x 9 columns]
[[ 1.49940035 -0.03587566 1.04958732 0.50454901 0.58707274 1.36663014
0.78557885 0.41617397 0.65672664]]
Accuracy 0.9532163742690059
Recall               precision    recall  f1-score   support

      benign       0.97      0.96      0.97       116
   malignant       0.91      0.95      0.93        55

    accuracy                           0.95       171
   macro avg       0.94      0.95      0.95       171
weighted avg       0.95      0.95      0.95       171
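The metric to focus on is recall, because a missed malignant case costs more than a false alarm. Recall for the malignant class is 0.95, which means that out of 100 patients who actually have a malignant tumor, about 5 would not be detected by the model.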
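Recall for a class is TP / (TP + FN): the share of true cases of that class that the model finds. As a minimal sketch, it can be computed by hand from a confusion matrix and checked against `recall_score`. The label lists below are made up for illustration and are not the predictions from the run above. They reuse the dataset's encoding, 2 for benign and 4 for malignant.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Made-up labels for illustration: 2 = benign, 4 = malignant.
y_true = [4, 4, 4, 4, 2, 2, 2, 2, 2, 2]
y_pred = [4, 4, 4, 2, 2, 2, 2, 2, 2, 4]

# Rows are true classes, columns are predicted classes, in the order [2, 4].
cm = confusion_matrix(y_true, y_pred, labels=[2, 4])
tn, fp, fn, tp = cm.ravel()

# Recall of the malignant class: detected malignant cases / all malignant cases.
recall_manual = tp / (tp + fn)
recall_sklearn = recall_score(y_true, y_pred, pos_label=4)

print(cm)
print(recall_manual, recall_sklearn)
```

Here one of the four malignant cases is predicted as benign, so the malignant recall is 3 / 4.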