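Given a data set, use the kNN (k-nearest neighbors) method to classify tumors as benign or malignant (benign tumors are labeled "0", malignant tumors "1"). A sample of the data is shown below: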
radius | texture | perimeter | area | smoothness | compactness | symmetry | fractal_dimension | class |
23 | 12 | 151 | 954 | 0.143 | 0.278 | 0.242 | 0.079 | 1 |
9 | 13 | 133 | 1326 | 0.143 | 0.079 | 0.181 | 0.057 | 0 |
21 | 27 | 130 | 1203 | 0.125 | 0.16 | 0.207 | 0.06 | 1 |
14 | 16 | 78 | 386 | 0.07 | 0.284 | 0.26 | 0.097 | 1 |
9 | 19 | 135 | 1297 | 0.141 | 0.133 | 0.181 | 0.059 | 1 |
25 | 25 | 83 | 477 | 0.128 | 0.17 | 0.209 | 0.076 | 0 |
16 | 26 | 120 | 1040 | 0.095 | 0.109 | 0.179 | 0.057 | 1 |
15 | 18 | 90 | 578 | 0.119 | 0.165 | 0.22 | 0.075 | 1 |
19 | 24 | 88 | 520 | 0.127 | 0.193 | 0.235 | 0.074 | 1 |
25 | 11 | 84 | 476 | 0.119 | 0.24 | 0.203 | 0.082 | 1 |
24 | 21 | 103 | 798 | 0.082 | 0.067 | 0.153 | 0.057 | 1 |
17 | 15 | 104 | 781 | 0.097 | 0.129 | 0.184 | 0.061 | 1 |
14 | 15 | 132 | 1123 | 0.097 | 0.246 | 0.24 | 0.078 | 0 |
12 | 22 | 104 | 783 | 0.084 | 0.1 | 0.185 | 0.053 | 1 |
12 | 13 | 94 | 578 | 0.113 | 0.229 | 0.207 | 0.077 | 1 |
22 | 19 | 97 | 659 | 0.114 | 0.16 | 0.23 | 0.071 | 1 |
10 | 16 | 95 | 685 | 0.099 | 0.072 | 0.159 | 0.059 | 1 |
15 | 14 | 108 | 799 | 0.117 | 0.202 | 0.216 | 0.074 | 1 |
20 | 14 | 130 | 1260 | 0.098 | 0.103 | 0.158 | 0.054 | 1 |
17 | 11 | 87 | 566 | 0.098 | 0.081 | 0.189 | 0.058 | 0 |
16 | 14 | 86 | 520 | 0.108 | 0.127 | 0.197 | 0.068 | 0 |
17 | 24 | 60 | 274 | 0.102 | 0.065 | 0.182 | 0.069 | 0 |
20 | 27 | 103 | 704 | 0.107 | 0.214 | 0.252 | 0.07 | 1 |
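The columns in the sample sit on very different scales: area runs from a few hundred to over a thousand, while fractal_dimension stays below 0.1. Since kNN compares samples by distance, the largest-valued column can dominate unless the features are standardized first. The following is a minimal sketch of that effect, assuming Euclidean distance and a z-score with the population standard deviation (the convention scikit-learn's StandardScaler uses). It uses only the first three rows of the table, and the helper names `euclidean`, `standardize`, and `area_share` are illustrative, not part of the original script.

```python
import math
import statistics

# First three rows of the sample table (features only, class column dropped).
columns = ["radius", "texture", "perimeter", "area",
           "smoothness", "compactness", "symmetry", "fractal_dimension"]
rows = [
    [23, 12, 151, 954, 0.143, 0.278, 0.242, 0.079],
    [9, 13, 133, 1326, 0.143, 0.079, 0.181, 0.057],
    [21, 27, 130, 1203, 0.125, 0.16, 0.207, 0.06],
]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize(data):
    """Z-score each column: subtract the column mean, divide by the population std."""
    cols = list(zip(*data))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in data]


def area_share(a, b):
    """Fraction of the squared distance between a and b that comes from the area column."""
    idx = columns.index("area")
    squared = [(x - y) ** 2 for x, y in zip(a, b)]
    return squared[idx] / sum(squared)


scaled = standardize(rows)
print(f"raw distance:    {euclidean(rows[0], rows[1]):.2f}")
print(f"scaled distance: {euclidean(scaled[0], scaled[1]):.2f}")
print(f"area share, raw:    {area_share(rows[0], rows[1]):.3f}")
print(f"area share, scaled: {area_share(scaled[0], scaled[1]):.3f}")
```

On the raw values, area accounts for more than 99% of the squared distance between the first two rows; after standardization its share falls well below half, so the other features also influence which neighbors are considered close.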
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Load the data and preview the first rows (displays in a notebook)
data1 = pd.read_csv('./data/cancer1.csv')
data1.head()

# Separate the features from the class label
X = data1.drop('class', axis=1)
y = data1['class']

# Hold out 25% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=6)

# Standardize: fit the scaler on the training set only, then apply it to both sets
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

# Train a kNN classifier with scikit-learn's default settings
model = KNeighborsClassifier()
model.fit(X_train, y_train)

# Evaluate on the training set
print("Evaluation metrics on the training set:")
model_score = model.score(X_train, y_train)
print()
print('The accuracy of train data', model_score)
print('--------------------------------------------------------------------------')
y_train_predict = model.predict(X_train)
model_report1 = classification_report(y_train, y_train_predict)
print(model_report1)
print('$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$')

# Evaluate on the test set
print("Evaluation metrics on the test set:")
model_score = model.score(X_test, y_test)
print()
print('The accuracy of test data is', model_score)
print('--------------------------------------------------------------------------')
y_predict = model.predict(X_test)
model_report = classification_report(y_test, y_predict)
print(model_report)
print('--------------------------------------------------------------------------')

Evaluation metrics on the training set:

The accuracy of train data 0.8266666666666667
--------------------------------------------------------------------------
              precision    recall  f1-score   support

           0       0.80      0.71      0.75        28
           1       0.84      0.89      0.87        47

    accuracy                           0.83        75
   macro avg       0.82      0.80      0.81        75
weighted avg       0.83      0.83      0.82        75

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Evaluation metrics on the test set:

The accuracy of test data is 0.84
--------------------------------------------------------------------------
              precision    recall  f1-score   support

           0       1.00      0.60      0.75        10
           1       0.79      1.00      0.88        15

    accuracy                           0.84        25
   macro avg       0.89      0.80      0.82        25
weighted avg       0.87      0.84      0.83        25
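
The test-set figures above can be cross-checked by hand. With a support of 10 and recall 0.60 for class 0, and a support of 15 and recall 1.00 for class 1, the report implies that 6 benign samples were classified correctly, 4 benign samples were predicted as malignant, and all 15 malignant samples were caught. The sketch below recomputes the metrics from those counts. The confusion counts are inferred from the report rather than printed by the original script, and the names `confusion`, `precision`, `recall`, and `f1` are illustrative helpers.

```python
# Confusion counts inferred from the test-set report (an assumption derived from
# support and recall, not an output of the original script).
# Keys are (true class, predicted class); 0 = benign, 1 = malignant.
confusion = {
    (0, 0): 6,   # benign predicted as benign: recall 0.60 * support 10
    (0, 1): 4,   # benign predicted as malignant
    (1, 0): 0,   # malignant predicted as benign
    (1, 1): 15,  # malignant predicted as malignant: recall 1.00 * support 15
}


def precision(cls):
    """Correct predictions of cls divided by all predictions of cls."""
    predicted = sum(n for (_, pred), n in confusion.items() if pred == cls)
    return confusion[(cls, cls)] / predicted


def recall(cls):
    """Correct predictions of cls divided by all true members of cls."""
    actual = sum(n for (true, _), n in confusion.items() if true == cls)
    return confusion[(cls, cls)] / actual


def f1(cls):
    """Harmonic mean of precision and recall for cls."""
    p, r = precision(cls), recall(cls)
    return 2 * p * r / (p + r)


total = sum(confusion.values())
accuracy = (confusion[(0, 0)] + confusion[(1, 1)]) / total

for cls in (0, 1):
    print(f"class {cls}: precision={precision(cls):.2f} "
          f"recall={recall(cls):.2f} f1={f1(cls):.2f}")
print(f"accuracy={accuracy:.2f}")
```

These counts give an accuracy of 21/25 = 0.84 and a class 1 precision of 15/19, which rounds to 0.79, matching the report. Under this reading, every test-set error is a benign tumor flagged as malignant, and no malignant tumor is missed.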