Classifying Benign and Malignant Tumors with KNN


pbxxk




Tags: machine learning, artificial intelligence

Article link: https://blog.csdn.net/pbxxk/article/details/131936349


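This article walks through classifying tumor data with the k-nearest neighbors (KNN) algorithm: loading the data, splitting it into training and test sets, standardizing the features, and training and evaluating the model. The model reaches an accuracy of about 83% on the training set and 84% on the test set, and classification reports give the precision, recall, and F1 score for each class.

(Summary generated automatically by CSDN.)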

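We have a dataset and want to use the KNN method to classify tumors as benign or malignant (benign tumors are labeled 0, malignant tumors are labeled 1). A sample of the data is shown below: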

radius	texture	perimeter	area	smoothness	compactness	symmetry	fractal_dimension	class
23	12	151	954	0.143	0.278	0.242	0.079	1
9	13	133	1326	0.143	0.079	0.181	0.057	0
21	27	130	1203	0.125	0.16	0.207	0.06	1
14	16	78	386	0.07	0.284	0.26	0.097	1
9	19	135	1297	0.141	0.133	0.181	0.059	1
25	25	83	477	0.128	0.17	0.209	0.076	0
16	26	120	1040	0.095	0.109	0.179	0.057	1
15	18	90	578	0.119	0.165	0.22	0.075	1
19	24	88	520	0.127	0.193	0.235	0.074	1
25	11	84	476	0.119	0.24	0.203	0.082	1
24	21	103	798	0.082	0.067	0.153	0.057	1
17	15	104	781	0.097	0.129	0.184	0.061	1
14	15	132	1123	0.097	0.246	0.24	0.078	0
12	22	104	783	0.084	0.1	0.185	0.053	1
12	13	94	578	0.113	0.229	0.207	0.077	1
22	19	97	659	0.114	0.16	0.23	0.071	1
10	16	95	685	0.099	0.072	0.159	0.059	1
15	14	108	799	0.117	0.202	0.216	0.074	1
20	14	130	1260	0.098	0.103	0.158	0.054	1
17	11	87	566	0.098	0.081	0.189	0.058	0
16	14	86	520	0.108	0.127	0.197	0.068	0
17	24	60	274	0.102	0.065	0.182	0.069	0
20	27	103	704	0.107	0.214	0.252	0.07	1
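`KNeighborsClassifier` in the scikit-learn script that follows hides the mechanics, so here is a rough plain-Python sketch of the two steps it relies on: z-score standardization (what `StandardScaler` does, using the population standard deviation) and a majority vote among the k nearest neighbors under Euclidean distance, which corresponds to `KNeighborsClassifier`'s defaults (`n_neighbors=5`, uniform weights, Minkowski metric with `p=2`). It runs on the 23 sample rows above. The 18/5 split and the helper names `fit_scaler`, `scale` and `knn_predict` are hypothetical, chosen only for illustration, and tie-breaking between equidistant neighbors may differ from scikit-learn's.

```python
import math
from collections import Counter

# The 23 sample rows from the table above: 8 features, then the class label
# (0 = benign, 1 = malignant).
ROWS = [
    [23, 12, 151, 954, 0.143, 0.278, 0.242, 0.079, 1],
    [9, 13, 133, 1326, 0.143, 0.079, 0.181, 0.057, 0],
    [21, 27, 130, 1203, 0.125, 0.16, 0.207, 0.06, 1],
    [14, 16, 78, 386, 0.07, 0.284, 0.26, 0.097, 1],
    [9, 19, 135, 1297, 0.141, 0.133, 0.181, 0.059, 1],
    [25, 25, 83, 477, 0.128, 0.17, 0.209, 0.076, 0],
    [16, 26, 120, 1040, 0.095, 0.109, 0.179, 0.057, 1],
    [15, 18, 90, 578, 0.119, 0.165, 0.22, 0.075, 1],
    [19, 24, 88, 520, 0.127, 0.193, 0.235, 0.074, 1],
    [25, 11, 84, 476, 0.119, 0.24, 0.203, 0.082, 1],
    [24, 21, 103, 798, 0.082, 0.067, 0.153, 0.057, 1],
    [17, 15, 104, 781, 0.097, 0.129, 0.184, 0.061, 1],
    [14, 15, 132, 1123, 0.097, 0.246, 0.24, 0.078, 0],
    [12, 22, 104, 783, 0.084, 0.1, 0.185, 0.053, 1],
    [12, 13, 94, 578, 0.113, 0.229, 0.207, 0.077, 1],
    [22, 19, 97, 659, 0.114, 0.16, 0.23, 0.071, 1],
    [10, 16, 95, 685, 0.099, 0.072, 0.159, 0.059, 1],
    [15, 14, 108, 799, 0.117, 0.202, 0.216, 0.074, 1],
    [20, 14, 130, 1260, 0.098, 0.103, 0.158, 0.054, 1],
    [17, 11, 87, 566, 0.098, 0.081, 0.189, 0.058, 0],
    [16, 14, 86, 520, 0.108, 0.127, 0.197, 0.068, 0],
    [17, 24, 60, 274, 0.102, 0.065, 0.182, 0.069, 0],
    [20, 27, 103, 704, 0.107, 0.214, 0.252, 0.07, 1],
]

def fit_scaler(X):
    """Per-feature (mean, population std), learned from training rows only."""
    stats = []
    for col in zip(*X):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        stats.append((mean, std if std > 0 else 1.0))  # guard constant columns
    return stats

def scale(X, stats):
    """Apply z = (x - mean) / std using previously learned statistics."""
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]

def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical split for illustration: first 18 rows train, last 5 are queries.
X = [row[:-1] for row in ROWS]
y = [row[-1] for row in ROWS]
X_train, y_train, X_query, y_query = X[:18], y[:18], X[18:], y[18:]
stats = fit_scaler(X_train)
X_train_scaled = scale(X_train, stats)
for x, actual in zip(scale(X_query, stats), y_query):
    print("predicted", knn_predict(X_train_scaled, y_train, x), "actual", actual)
```

With only 18 training rows the printed predictions say little about model quality; the point is the order of operations. Standardization matters for KNN because raw `area` values run into the hundreds or thousands while `smoothness` stays around 0.1, so without rescaling the distance would be driven almost entirely by `area` and `perimeter`.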

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# Load the data; the 'class' column is the label (0 = benign, 1 = malignant)
data1 = pd.read_csv('./data/cancer1.csv')
data1.head()  # preview the first rows (only displayed in a notebook)

# Features are every column except the label
X = data1.drop('class', axis=1)
y = data1['class']

# Hold out 25% of the samples as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=6)

# Standardize the features: fit on the training set only, then reuse the same
# means and standard deviations for the test set to avoid data leakage
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

# KNN with the default settings (k = 5, uniform weights, Euclidean distance)
model = KNeighborsClassifier()
model.fit(X_train, y_train)

# Evaluate on the training set
print("Evaluation metrics on the training set:")
model_score = model.score(X_train, y_train)  # accuracy
print()
print('The accuracy of train data is', model_score)
print('--------------------------------------------------------------------------')
y_train_predict = model.predict(X_train)
model_report1 = classification_report(y_train, y_train_predict)
print(model_report1)
print('$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$')

# Evaluate on the test set
print("Evaluation metrics on the test set:")
model_score = model.score(X_test, y_test)  # accuracy
print()
print('The accuracy of test data is', model_score)
print('--------------------------------------------------------------------------')
y_predict = model.predict(X_test)
model_report = classification_report(y_test, y_predict)
print(model_report)
print('--------------------------------------------------------------------------')
```

Output:

```
Evaluation metrics on the training set:

The accuracy of train data is 0.8266666666666667
--------------------------------------------------------------------------
              precision    recall  f1-score   support

           0       0.80      0.71      0.75        28
           1       0.84      0.89      0.87        47

    accuracy                           0.83        75
   macro avg       0.82      0.80      0.81        75
weighted avg       0.83      0.83      0.82        75

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Evaluation metrics on the test set:

The accuracy of test data is 0.84
--------------------------------------------------------------------------
              precision    recall  f1-score   support

           0       1.00      0.60      0.75        10
           1       0.79      1.00      0.88        15

    accuracy                           0.84        25
   macro avg       0.89      0.80      0.82        25
weighted avg       0.87      0.84      0.83        25
```
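The reports only print rounded ratios, but each one pins down a single 2×2 confusion matrix. On the test set, a recall of 0.60 on 10 benign samples means 6 were classified correctly and 4 were labeled malignant, and a class-0 precision of 1.00 means no malignant sample was labeled benign, giving [[6, 4], [0, 15]]; the same reasoning gives [[20, 8], [5, 42]] for the training set. These matrices are inferred from the reports, not printed by the script (`sklearn.metrics.confusion_matrix(y_test, y_predict)` would show them directly). The sketch below recomputes every figure in both reports from them using precision = TP / (TP + FP), recall = TP / (TP + FN) and F1 = 2PR / (P + R); the function name `report_from_confusion` is hypothetical.

```python
# Confusion matrices inferred from the printed reports (rows = true class,
# columns = predicted class; class 0 = benign, 1 = malignant).
TRAIN_CM = [[20, 8], [5, 42]]
TEST_CM = [[6, 4], [0, 15]]

def report_from_confusion(cm):
    """Per-class (precision, recall, f1, support), accuracy, macro and weighted averages."""
    n = len(cm)
    total = sum(map(sum, cm))
    per_class = []
    for c in range(n):
        tp = cm[c][c]
        support = sum(cm[c])                         # true members of class c (row sum)
        predicted = sum(cm[r][c] for r in range(n))  # samples predicted as c (column sum)
        precision = tp / predicted if predicted else 0.0  # 0.0 when undefined
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1, support))
    accuracy = sum(cm[c][c] for c in range(n)) / total
    # macro: plain mean over classes; weighted: mean weighted by each class's support
    macro = tuple(sum(pc[i] for pc in per_class) / n for i in range(3))
    weighted = tuple(sum(pc[i] * pc[3] for pc in per_class) / total for i in range(3))
    return per_class, accuracy, macro, weighted

for name, cm in (("train", TRAIN_CM), ("test", TEST_CM)):
    per_class, accuracy, macro, weighted = report_from_confusion(cm)
    print(f"{name}: accuracy = {accuracy:.2f}")
    for label, (p, r, f, s) in enumerate(per_class):
        print(f"  class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
    print("  macro avg:    precision={:.2f} recall={:.2f} f1={:.2f}".format(*macro))
    print("  weighted avg: precision={:.2f} recall={:.2f} f1={:.2f}".format(*weighted))
```

Read this way, the test accuracy of 0.84 hides an asymmetry: every malignant tumor in the test set was caught (recall 1.00 for class 1), but 4 of the 10 benign tumors were flagged as malignant.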